Hyperspectral Soil Heavy Metal Prediction via Privileged-Informed Residual Correction

Mangafić, Alen; Oštir, Krištof; Kolar, Mitja; Zupan, Marko

doi:10.3390/rs17121987


by Alen Mangafić ¹,²,*, Krištof Oštir ², Mitja Kolar ³ and Marko Zupan ⁴

¹ Geodetic Institute of Slovenia, Jamova cesta 2, 1000 Ljubljana, Slovenia
² Faculty of Civil and Geodetic Engineering, University of Ljubljana, Jamova cesta 2, 1000 Ljubljana, Slovenia
³ Faculty of Chemistry and Chemical Technology, University of Ljubljana, Večna pot 113, 1000 Ljubljana, Slovenia
⁴ Biotechnical Faculty, University of Ljubljana, Jamnikarjeva ulica 101, 1000 Ljubljana, Slovenia
* Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(12), 1987; https://doi.org/10.3390/rs17121987

Submission received: 13 May 2025 / Revised: 3 June 2025 / Accepted: 6 June 2025 / Published: 8 June 2025

(This article belongs to the Special Issue Hyperspectral Sensors for Soil Parameters and Crop Parameters Retrieval)


Abstract

This study integrates hyperspectral remote sensing with chemical and pedological data to estimate Zn, Pb, and Cd concentrations in the upper soil layers. Conducted in agricultural fields east and northeast of Celje, Slovenia, an area impacted by past industrial activities such as zinc ore smelting, the research combines remote sensing and soil sampling to rapidly identify and map soil pollution over large surfaces. A multi-sensor approach was employed, combining two airborne hyperspectral cameras (VNIR and SWIR), laboratory spectrometry, soil parameters, and concentrations of chemical covariates measured with portable XRF and ICP-OES, with a direct comparison of both techniques for this specific purpose. Atmospheric correction and signal transformations were applied to improve modeling. The importance of covariates was evaluated using conditional permutations to assess their contribution to the prediction of metal concentrations. The proposed framework utilizes spectral data and privileged information during training, improving prediction accuracy through a multi-stage model architecture. Here, a base model trained on spectral data is corrected using privileged information. During inference, the model functions without relying on privileged data, providing a scalable and cost-effective solution for large-scale environmental monitoring. Our model achieved a reduction in prediction RMSE for Zn and Cd maps in comparison to the baseline models, translating to more precise identification of possibly polluted zones. However, for Pb, no improvements were observed, potentially due to variability in the data, including spectral issues or imbalances in the training and test datasets.

Keywords:

hyperspectral imagery; aerial remote sensing; imaging spectroscopy; digital soil mapping; pedometric mapping; soil contamination; environmental monitoring; residual modeling

1. Introduction

Substances that are found in areas where they are not desired or expected are referred to as contaminants. When the concentration of these contaminants exceeds the thresholds defined by certain standards or laws, they are termed pollutants or soil contaminants [1]. There are various standards and regulations regarding these concentrations and how to name them. In Slovenia, for instance, the Environmental Protection Act designates substances causing soil contamination as hazardous materials [2]. For the sake of simplicity, we will refer to these substances as pollutants hereafter.

Soils are considered contaminated when pollutants accumulate in quantities that impair their chemical, physical, or biological functions by reducing fertility, degrading structure, or threatening ecosystems. Common sources include industrial emissions, agricultural runoff, urban activities, and atmospheric deposition [3]. The chemical nature of pollutants determines their environmental behaviors, such as whether they degrade, become adsorbed onto soil particles, or remain mobile in the profile. Inorganic pollutants like heavy metals, particularly Zn, Pb, and Cd, pose a long-term risk due to their persistence and potential for bioaccumulation. Their environmental residence times range from decades to millennia, with Zn persisting up to 510, Cd 1100, and Pb up to 5900 years in soil [4]. Their solubility and uptake depend on multiple factors, including pH, organic matter, and redox conditions [5]. While Zn is an essential micronutrient, both Cd and Pb are toxic to soil biota even at low concentrations. They inhibit microbial respiration, reduce enzymatic activity, and impair organic matter decomposition, with elevated levels shown to reduce earthworm and microbial populations [6]. In our study area east and northeast of Celje, Slovenia, which is historically impacted by zinc ore smelting, concentrations of Zn, Pb, and Cd exceed natural background levels [3,7]. Recently, Slovenia was penalized EUR 1.2 million by the European Court of Justice, case C-318/23, due to prolonged noncompliance with EU waste management obligations at the Bukovžlak landfill site near Celje [8], an area explicitly covered in this study, thus underscoring the urgency and direct local relevance of addressing environmental contamination.

Conventional methods for evaluating soil contamination rely on field sampling and laboratory analysis following national standards. Although effective, these methods are expensive, time-consuming, and constrained in terms of spatial coverage. Traditional geostatistical methods, such as kriging [9,10], which depend on point observations and/or point auxiliary variables [11], fail to capture fine-scale spatial variability. In contrast, hybrid geostatistical and machine learning techniques, which incorporate auxiliary variables like remote sensing data and data fusion approaches [11,12,13,14], can capture fine-scale spatial variability. Remote sensing technologies, especially hyperspectral imaging (or imaging spectrometry; hereafter referred to as HSI), offer an efficient and scalable alternative for large-scale soil monitoring by capturing high-resolution, spatially continuous spectral information that reflects variations in soil properties across the surface. The relationship between reflectance and wavelength forms a spectral signature that can be used to identify, classify, or quantify surface materials. Multispectral systems like OLI on Landsat 8 [15] and MSI on Sentinel-2 [16] collect data in broader bands (30–180 nm), whereas HSI sensors acquire information in much narrower and contiguous bands, often with spectral bandwidths of 5 nm or less, enabling detection of subtle absorption features linked to specific chemical or mineralogical components [17].

These spectral features reflect the presence of key soil constituents, such as clay minerals, organic matter, carbonates, and iron oxides, which, in turn, influence or co-vary with soil processes and properties [18]. These components, commonly referred to as chromophores [19], exert a strong influence on soil reflectance, particularly in the visible and near-infrared (VNIR) and short-wavelength infrared (SWIR) range, and are therefore critical in modeling soil characteristics. Their spectral response is affected by several soil factors, including concentration, particle size, moisture content, soil organic carbon, and pH, all of which modify how the soil interacts with incoming radiation [18,19,20,21,22]. By capturing these variations, HSI provides a robust framework for quantifying spatial variability in soil physical and chemical parameters at fine scales [23]. In HSI, analytes do not always exhibit direct spectral signatures but can be detected indirectly through their association with spectral-active soil constituents. For heavy metals, this includes iron and manganese oxyhydroxides, clay minerals, and organic matter, which influence reflectance and absorption properties in the VNIR-SWIR. Changes in the concentrations of the constituents shape and alter the intensity of spectral signatures, making it possible to infer mineralogical composition, presence, and other chemical properties [24].

Soil moisture is another critical variable in hyperspectral analysis, significantly affecting spectral reflectance. Optimal results are achieved when volumetric moisture lies within the range of 0.15–0.40 g/cm³, as deviations outside this range can introduce nonlinear spectral distortions or mask absorption features due to surface water acting as an optical filter [25]. This sensitivity underscores the importance of integrating both spectral and environmental data during feature engineering and of verifying that the spectral data are suitable for modeling.

An increasing number of studies have investigated the use of HSI to predict heavy metal concentrations in soils, with a focus on metals such as Zn, Pb, Cd, Cu, and Ni [22,25,26,27,28,29,30,31,32,33,34,35,36]. These studies employ both direct and indirect approaches, relying on specific spectral absorption features or correlations with chromophoric soil components such as iron oxides, clays, and organic matter. While traditional regression models like partial least squares regression (PLSR) and multiple linear regression (MLR) remain widely used, recent work increasingly leverages machine learning techniques, including support vector machines (SVM), random forests (RF), and gradient boosting algorithms (such as XGBoost), to enhance prediction performance [37]. Some data science approaches have explored the use of privileged information, that is, data available only during training but not at prediction time, as a way to improve model accuracy, a concept formalized in paradigms such as Learning Using Privileged Information (LUPI) [37,38,39,40,41] and privileged learning [42]. This allows models to benefit from rich auxiliary data during learning, directly in the spectral learning context, while remaining deployable in settings where only a subset of features, such as spectral data, is available.

In this study, we adopted a multi-sensor approach, combining airborne VNIR and SWIR data with field and laboratory measurements, including a field spectrometer, portable X-ray fluorescence analyzer (pXRF), and inductively coupled plasma optical emission spectroscopy (ICP-OES). A key innovation lies in a privileged information framework that leverages chemical and pedological parameters during training to refine predictions, while the final model operates solely on hyperspectral inputs. This approach, inspired by but distinct from classical LUPI, is implemented using a multi-stage cascade of random forests: privileged chemical and pedological data are used to model residuals during training, and a spectral-only correction model learns to replicate these adjustments. Instead of stratifying residuals by their magnitude and modeling local environmental corrections per group as in hierarchical residual correction [43], we directly model residuals from privileged chemical and pedological variables, then transfer this correction back into spectral space. Unlike gradient boosting, which models residuals based solely on internal prediction errors and requires all training features at inference, our method leverages external domain knowledge from privileged inputs to inform spectral correction while remaining fully deployable without auxiliary data. This enables effective knowledge transfer and supports interpretable, scalable mapping of Zn, Pb, and Cd in contaminated soils.

2. Materials and Methods

2.1. Study Area

Our study area covers 355 hectares east and northeast of Celje, Slovenia (Figure 1), a region historically affected by zinc ore smelting (Cinkarna Celje) and related past industrial activities (EMO), leading to elevated concentrations of Zn, Pb, and Cd in the soil, which exceed national geochemical background values [4,7]. We focused on arable fields without vegetation to ensure direct spectral visibility of bare soil surfaces for HSI. Agricultural fields were selected due to their more homogeneous soil profiles resulting from ploughing, and to minimize phenological variability, we targeted plots with identical crop rotation cycles during the observation period. The chosen plots were not in direct proximity to industrial sources. This site selection was intentional to avoid steep pollution gradients typically associated with point sources such as industry. As a result, the uniform triangular sampling grid was appropriate for capturing the spatial variability of trace metals under relatively homogeneous field conditions.

A broader surrounding area was also considered, encompassing agricultural and urban land uses. Emissions from historical industrial waste, such as titanium gypsum and residues from sphalerite processing, are key contributors to contamination [44]. According to prior national soil monitoring campaigns [3], the region exhibits excessive levels of Zn, Pb, and Cd, particularly in the A horizon (topsoil). Soil pH in the area ranges from 5.4 to 5.7, a range that increases metal solubility and bioavailability, especially for Zn and Cd [45,46].

To support modeling, we analyzed land use data [47], noting that 34% of the surface is urban and the remainder is mainly agricultural. We prioritized fields with rotation of maize and spring cereals (e.g., wheat, oats, barley), based on timing and their suitability for spring-to-autumn imaging conditions [48,49]. Field selection was refined using farmer declarations [50], cadastral ownership, and visual field inspections. We excluded plots with external soil deposits or recent anthropogenic interventions. The final plot selection was based on confirmed tillage status and visible gradients of metal concentration, verified through field pXRF screening. This selection ensured a diverse contamination profile and comparable environmental conditions across plots, which is essential for developing robust and transferable hyperspectral prediction models. The final eight experimental fields are spatially dispersed, topographically uniform, and consist of non-amended native soils.

2.2. Soil Sampling and Analysis

2.2.1. Sampling Design

Sampling locations on each selected field were laid out in a triangular grid, following approximately equilateral triangle geometries. This design is optimal for geostatistical analyses such as kriging [51] and yields datasets that remain suitable for further research involving geostatistical approaches. Sampling point coordinates were obtained using a survey-grade GNSS receiver (Leica GS07) with the real-time kinematic (RTK) technique within the D96/TM reference system, achieving horizontal accuracy of a few centimeters, which is sufficient for alignment with HSI. Samples were collected from a 0–5 cm soil depth. In such ploughed agricultural land, this surface layer is typically well-mixed due to tillage, making it appropriate for summarizing surface properties while matching the sensitivity range of hyperspectral imaging.

Each sampling site was composed of three subsamples taken from the vertices of a 50 cm triangle, with the sampling point defined at the centroid. This incremental approach helps mitigate micro-scale heterogeneity and ensures more representative compositing. Samples were sealed in moisture-proof bags, protected from light, and transported to the laboratory under controlled conditions to avoid physical or chemical changes, following ISO 18400-105 standards [52]. Spectroradiometric and pedochemical analyses were first performed on individual subsamples, and the results were aggregated for selected parts of the modeling workflow. An example of a soil sampling site and the corresponding distribution of sampling points within a field is shown in Figure 2.

Two sampling campaigns were conducted:

Model development samples: Ninety-seven sites were sampled on 11 March 2020.
Verification samples: Twenty-two sites were sampled on 25 March 2022, immediately following the hyperspectral flight.

Verification involved both disturbed and undisturbed sampling using Kopecky cylinders (100 cm³) to determine bulk density and water content at the time of imaging. These measurements were conducted to assess whether moisture conditions remained within the acceptable operational range for hyperspectral analysis, ensuring the reliability of spectral interpretations and reducing the risk of signal distortion associated with excessive or insufficient soil moisture.

Soil sampling was designed to support laboratory analyses of:

Concentrations of Zn, Pb, and Cd (using ICP-OES and pXRF);
Soil bulk density (ρ_b);
pH;
Total carbon (C);
Total nitrogen (N);
Organic carbon (C_org);
Carbonate content (CaCO₃);
Plant-available phosphorus (P₂O₅) and potassium (K₂O).

2.2.2. Analysis of Soil Parameters

Disturbed subsamples were dried at 40 °C for 3 days, cleared of coarse fragments, ground in a ceramic mortar, and sieved to 2 mm (SIST ISO 11464 [53]). Subsamples were homogenized per plot. A 100 g portion was preserved for spectroradiometric measurements (see Section 2.2.4); the remainder was analyzed for the following soil parameters:

ρ_b and gravimetric water content (H₂O) (SIST EN ISO 11272 [54], ISO 11465 [55]);
pH in CaCl₂ (SIST ISO 10390 [56]);
C, N, and C_org (SIST ISO 10694 [57], SIST ISO 13878 [58]);
CaCO₃ content (SIST ISO 10693 [59]);
P₂O₅ and K₂O (ÖNORM L 1087 [60]).

Full methodology and instrumentation are documented in [61], following the principles established by [62].

Training set soils were acidic to slightly acidic (pH 4.7–6.9), with organic matter ranging from 2.9% to 4.8%. Carbonate content was generally low (<1%). P₂O₅ and K₂O concentrations were highly variable, with maximum values of 35.8 and 53.7 mg/100 g. Some samples showed enriched nutrient status, possibly due to fertilization or finer soil texture.

Validation samples had pH 5.0–7.0, indicating slightly lower acidity. Organic matter values remained within a similar range. Carbonate content varied more widely, with most samples below 3%. Available P₂O₅ and K₂O were notably higher, especially in samples from field Z.

Gravimetric water content ranged from 0.25 to 0.42 g/cm³ (median ~0.29 g/cm³), within the optimal range for HSI [25]. Only one sample slightly exceeded this range (the 0.42 g/cm³ one), without major impact on overall data quality.

2.2.3. Analysis of Heavy Metal Concentrations

Zn, Pb, and Cd concentrations were measured using:

Portable X-ray fluorescence (pXRF) with an Olympus Delta 50 (2 × 60–90 s per sample; Cd detection limit = 4.5 ppm);
ICP-OES after microwave-assisted digestion (SIST ISO 12914 [63], ISO 22036 [64]).

Zn, Pb, and Cd concentrations in the pXRF training set exhibit substantial variability across samples (Table 1). Zn levels exceed the threshold limit (200 ppm) in all cases, with the highest concentrations recorded in field Z (768 ppm) and field T (645.9 ppm), both surpassing the warning threshold (300 ppm). Samples from field Z approach the critical value of 720 ppm. Pb concentrations exceed the threshold (85 ppm) in samples from fields K (114.1 ppm), T (121 ppm), and Z (122.8 ppm), with values from T and Z also exceeding the warning threshold (100 ppm). Cd levels are below the detection limit in most samples, except in fields T (5.1 ppm) and Z (7.6 ppm), both of which clearly exceed the Cd warning threshold.

In the validation set (22 independent samples), concentrations of Zn, Pb, and Cd are generally higher than those observed in the training set. Zn levels in multiple samples from fields Z and T exceed the critical threshold of 720 ppm, with values ranging from approximately 815 ppm to over 930 ppm. This reflects a consistent pattern of elevated Zn concentrations in these two fields. Pb levels are also markedly higher, particularly in field Z, where some samples exceed the warning threshold of 100 ppm and reach values above 200 ppm. Cd concentrations show a general increase as well, with several samples approaching the critical level of 12 ppm. However, in most validation samples, Cd levels remain lower than the highest concentrations recorded in fields T and Z within the training set. It is important to note that threshold, warning, and critical values [65] are defined for aqua regia-based analyses, whereas the measurements above were obtained using pXRF. As such, direct comparison with regulatory limits is not appropriate at this stage. In the following, Zn, Pb, and Cd concentrations determined by ICP-OES are presented (Table 2), enabling direct alignment with official regulatory benchmarks and providing a more accurate assessment of soil compliance.

ICP-OES results from the training set also reveal pronounced spatial differences in Zn, Pb, and Cd concentrations. Zn levels exceed the threshold value of 200 ppm in fields K (298.4 ppm), T (562.4 ppm), and Z (565.5 ppm), with fields T and Z also surpassing the warning threshold of 300 ppm. Pb concentrations exceed the threshold of 85 ppm in fields K (92.0 ppm) and T (92.5 ppm) but remain below the warning limit of 100 ppm, while field Z (83.9 ppm) lies just below the threshold. Cd is below detection limits in most fields, except for K (2.3 ppm), T (4.7 ppm), and Z (6.2 ppm), all of which exceed the Cd warning threshold of 2 ppm; field Z comes closest to the critical limit of 12 ppm.
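The comparison against regulatory limits described above can be sketched in a few lines of Python. This is only an illustration: the function name `classify` and the dictionary layout are our own choices, and only the Zn limits and the three field values quoted in the text are used.

```python
# Zn limit values in ppm, as quoted in the text for aqua regia-based analyses.
ZN_LIMITS = {"threshold": 200.0, "warning": 300.0, "critical": 720.0}


def classify(concentration_ppm, limits):
    """Return the name of the highest limit that the concentration exceeds."""
    label = "below threshold"
    # Walk the limits from lowest to highest so that the last match wins.
    for name, value in sorted(limits.items(), key=lambda kv: kv[1]):
        if concentration_ppm > value:
            label = name
    return label


# ICP-OES training-set Zn values for fields K, T, and Z quoted in the text.
zn_icp = {"K": 298.4, "T": 562.4, "Z": 565.5}
zn_classes = {field: classify(value, ZN_LIMITS) for field, value in zn_icp.items()}
print(zn_classes)
```

The same helper could be reused for Pb and Cd once a complete set of limit values is supplied for each element.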

In the validation set, ICP-OES results again confirm elevated concentrations of Zn, Pb, and Cd compared to the training set. Zn levels in several samples from fields Z and T exceed the critical threshold of 720 ppm, with values reaching over 1000 ppm. Pb concentrations are consistently high, with all validation samples exceeding the regulatory threshold of 85 ppm, and numerous samples surpassing the warning level of 100 ppm. Cd levels are also elevated, with several samples ranging from approximately 8 to 9.4 ppm, below the critical threshold, but notably higher than background levels. In the remaining validation samples, Cd concentrations tend to be lower than those recorded in fields T and Z within the training set.

Unlike the pXRF measurements, the ICP-OES results are directly comparable with regulatory limits established for aqua regia-extracted concentrations. Both pXRF and ICP-OES analyses indicate that the verification fields (Z and T) exhibit not only higher concentrations of Zn and Pb but also a broader range of Cd values compared to the training fields. These findings highlight the importance of using the verification dataset to assess the physical plausibility of model predictions. Specifically, the wider value distributions and elevated concentrations present an opportunity to evaluate whether the developed models can generalize effectively to unseen data, or whether they exhibit signs of overfitting or underfitting in the presence of more complex or extreme field conditions.

2.2.4. Laboratory Spectroradiometry

Laboratory spectroradiometric measurements were performed using an ASD FieldSpec III Pro FR (350–2500 nm; ASD Inc., Boulder, CO, USA) under controlled conditions. A Zenith Polymer white reference (95% reflectance) was used for calibration. For each sample, 30 replicate spectra were averaged to reduce noise. The device provides a sampling interval of 1.4 nm (350–700 nm) and 2.0 nm (700–2500 nm), with spectral resolutions of 3.0 nm and 10.0 nm, respectively. The procedure followed the methodology described in [23,66]. Spectral measurements captured high-resolution reflectance profiles, enabling identification of diagnostic absorption features linked to mineral composition and organic matter [26,67]. Despite minor deviations from strict protocols for moisture and thermal stability [68], data quality remains sufficient for predictive modeling.

2.3. Hyperspectral Data Acquisition and Preprocessing

HSI was carried out on 24 March 2022 using a dual-camera setup mounted on a light aircraft: HySpex VNIR-1600 (400–1000 nm [69]) and HySpex SWIR-384 (930–2500 nm [70]). Both sensors were co-registered, allowing synchronized acquisition across the full optical-infrared range. The target product was a radiometrically and geometrically corrected hyperspectral orthomosaic, referenced to D96/TM, with spatial resolution of 0.2 m (VNIR) and 0.6 m (SWIR). The flight followed strict illumination and meteorological constraints: sun elevation >30°, clear sky, and soil moisture below 0.40 g/cm³. Flight altitude was ~1250 m a.s.l., resulting in an effective sensor-to-ground distance of ~995 m.

Orthorectification used GNSS/INS-aided aerotriangulation [71], with a 1 m DSM derived from Slovenia’s national LiDAR dataset [72]. Spatial accuracy was validated using 10 RTK-measured ground control points, yielding a planimetric RMSE_xy of 0.96 m, which is acceptable for our use case.

Atmospheric correction is essential to retrieve surface reflectance from at-sensor radiance and to minimize atmospheric absorption effects, particularly in SWIR regions sensitive to water vapor. We used the ATCOR4 model [73]. A critical parameter in this correction is the column water vapor (WV), expressed as the equivalent height (cm) of precipitable water. Incorrect WV assumptions can lead to systematic over- or under-correction, particularly in the spectral regions around 1400 nm and 1900 nm, and must be carefully considered, especially in early spring over agricultural soils. Therefore, we performed a custom WV estimation based on local meteorological data and physical modeling.

WV estimation procedure:

1. Meteorological data retrieval and interpolation:
Air temperature at 2 m (T, °C) and relative humidity (RH, %) [74], measured at 07:00, 14:00, and 21:00, were interpolated to the time of hyperspectral acquisition (11:53) using cubic splines [75], giving 19.2 °C and 21.9% RH.

2. Dew point calculation:
Dew point temperature ($T_{d}$) was calculated using the Magnus formula presented in Equation (1) [76]:

T_{d} = \frac{b \left[ \ln \left( \frac{RH}{100} \right) + \frac{a \cdot T}{T + b} \right]}{a - \ln \left( \frac{RH}{100} \right) - \frac{a \cdot T}{T + b}} = -3.1\ ^{\circ}\mathrm{C}

(1)

where

$a$ is an empirical constant related to the shape of the saturation vapor pressure curve (typical value: 17.67).
$b$ is an empirical constant related to the saturation point temperature (typical value: 243.5 °C).
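As a quick numerical check of the dew point step, the Magnus relation can be evaluated directly. Writing γ = ln(RH/100) + a·T/(T + b), the dew point is T_d = b·γ/(a − γ). The sketch below uses the constants and interpolated inputs quoted in the text; the function name `dew_point_c` is ours.

```python
import math

A = 17.67   # dimensionless Magnus constant quoted in the text
B = 243.5   # Magnus constant in degrees Celsius quoted in the text


def dew_point_c(temp_c, rh_percent, a=A, b=B):
    """Magnus dew point (deg C) from air temperature (deg C) and RH (%)."""
    gamma = math.log(rh_percent / 100.0) + a * temp_c / (temp_c + b)
    return b * gamma / (a - gamma)


# Interpolated conditions at acquisition time: 19.2 deg C and 21.9 % RH.
td = dew_point_c(19.2, 21.9)
print(round(td, 1))  # -3.1
```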

3. Air pressure at flight altitude:
Air pressure (p) at flight altitude (h = 1250 m) was estimated based on the surface pressure p₀ = 996 hPa, measured at the meteorological station. A simplified barometric formula (Equation (2)) was used, assuming an average vertical pressure gradient of Δp = 1 hPa per Δh = 8 m in the lower atmosphere [77]:

p = p_{0} - \left( \frac{h - h_{0}}{\Delta h} \right) \cdot \Delta p = 870\ \mathrm{hPa}

(2)
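Equation (2) is a linear pressure drop with height and is easy to reproduce. In the sketch below we assume that the reference level h₀ equals the average elevation of the study area (255 m) and that p₀ is the 996 hPa surface value listed among the inputs of the water vapor step; both are our reading of the text rather than stated station metadata.

```python
def pressure_at_altitude(p0_hpa, h_m, h0_m, dh_m=8.0, dp_hpa=1.0):
    """Linear pressure decrease of dp_hpa per dh_m metres above level h0_m."""
    return p0_hpa - (h_m - h0_m) / dh_m * dp_hpa


# Assumed inputs: p0 = 996 hPa at h0 = 255 m, flight altitude h = 1250 m.
p_flight = pressure_at_altitude(p0_hpa=996.0, h_m=1250.0, h0_m=255.0)
print(round(p_flight, 1))  # 871.6
```

Under these assumptions the result is within about 2 hPa of the 870 hPa reported in Equation (2).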

4. Estimation of column water vapor:
Water vapor content between the average elevation of the study area (255 m) and the flight altitude (1250 m) was estimated by integrating specific humidity over the vertical column [78,79]. The input parameters are the pressure values (996 and 870 hPa) and the dew point temperature (−3.1 °C). The resulting column water vapor is $WV$ = 0.42 cm.
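The column integration can be sketched as follows. This is our own minimal reconstruction, not necessarily the exact procedure of [78,79]: we assume a constant dew point (hence constant vapor pressure) throughout the layer, a Magnus-type vapor pressure with a 6.112 hPa prefactor, and trapezoidal integration of specific humidity q over pressure, WV = (1/g) ∫ q dp.

```python
import math

G = 9.80665   # gravitational acceleration, m/s^2
EPS = 0.622   # ratio of molar masses, water vapour to dry air


def vapour_pressure_hpa(dew_point_c, a=17.67, b=243.5):
    """Magnus-type vapour pressure (hPa) evaluated at the dew point."""
    return 6.112 * math.exp(a * dew_point_c / (dew_point_c + b))


def column_water_vapour_cm(p_bottom_hpa, p_top_hpa, dew_point_c, steps=200):
    """Precipitable water (cm) between two pressure levels.

    Assumes the dew point is constant within the layer (a simplification).
    """
    e = vapour_pressure_hpa(dew_point_c)
    dp = (p_bottom_hpa - p_top_hpa) / steps
    integral_hpa = 0.0
    for i in range(steps):
        p_lo = p_top_hpa + i * dp
        p_hi = p_lo + dp
        q_lo = EPS * e / (p_lo - (1.0 - EPS) * e)   # specific humidity, kg/kg
        q_hi = EPS * e / (p_hi - (1.0 - EPS) * e)
        integral_hpa += 0.5 * (q_lo + q_hi) * dp    # trapezoidal rule
    kg_per_m2 = integral_hpa * 100.0 / G            # hPa -> Pa, divide by g
    return kg_per_m2 / 10.0                         # 10 kg/m^2 of water = 1 cm


wv_cm = column_water_vapour_cm(996.0, 870.0, -3.1)
print(round(wv_cm, 2))  # about 0.42
```

With the inputs quoted in the text, this simplified integration lands close to the reported 0.42 cm.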

Atmospheric correction was performed in ATCOR4 using the rural aerosol model and a water vapor value of 0.4 cm (equivalently, 0.4 g/cm²). This choice reflects the dry atmospheric conditions present during the acquisition period, as indicated by meteorological data and dew point analysis. Since atmospheric moisture levels in March are often considerably higher, site-specific WV estimation is critical to ensure accurate correction, particularly in sensitive spectral regions.

2.4. Feature Engineering

After applying atmospheric corrections, we proceeded to generate harmonized, spectrally consistent inputs for heavy metal concentration modeling. The HySpex system used consists of two pushbroom sensors: VNIR, covering the 400–1000 nm range, and SWIR, covering 930–2500 nm. To produce a continuous spectral dataset across both sensors, we applied a multi-step workflow: signal harmonization, removal of unreliable channels, spectral smoothing, global normalization using reference spectra, and merging of corrected channels. The output was a single reflectance vector per pixel, suitable for downstream transformation and modeling. VNIR and SWIR sensors overlap in the range of 950–980 nm. We computed a normalization factor based on the average reflectance across these overlapping bands and scaled the VNIR reflectance to match SWIR levels. VNIR channels in the overlap were then removed due to lower signal-to-noise ratio (SNR), and SWIR bands were retained.
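The VNIR-to-SWIR harmonization in the 950–980 nm overlap can be illustrated with a small sketch. The spectra below are synthetic placeholders, the function name `harmonize` is ours, and the merge rule (VNIR kept below the overlap, SWIR kept from the overlap onward) is a simplification of the workflow described above.

```python
def harmonize(vnir, swir, overlap=(950.0, 980.0)):
    """Scale VNIR to SWIR level using mean reflectance in the overlap.

    vnir, swir: lists of (wavelength_nm, reflectance) sorted by wavelength.
    Returns the scaling factor and the merged spectrum.
    """
    lo, hi = overlap
    vnir_ov = [r for w, r in vnir if lo <= w <= hi]
    swir_ov = [r for w, r in swir if lo <= w <= hi]
    factor = (sum(swir_ov) / len(swir_ov)) / (sum(vnir_ov) / len(vnir_ov))
    # Scale VNIR, drop its overlap bands, and keep the SWIR bands instead.
    merged = [(w, r * factor) for w, r in vnir if w < lo]
    merged += [(w, r) for w, r in swir if w >= lo]
    return factor, merged


# Synthetic example values (not measured data).
vnir = [(900.0, 0.20), (950.0, 0.22), (965.0, 0.22), (980.0, 0.22)]
swir = [(950.0, 0.275), (965.0, 0.275), (980.0, 0.275), (1000.0, 0.28)]
factor, merged = harmonize(vnir, swir)
print(round(factor, 3), len(merged))
```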

Following this harmonization, we removed spectral regions affected by known noise sources. These included strong atmospheric water vapor absorption zones (1350–1410 nm and 1810–1930 nm), low-SNR regions near sensor edges (VNIR: 400–435 nm and >965 nm; SWIR: <965 nm and 2480–2500 nm), and other segments prone to sensor instability or electromagnetic interference (750–770 nm, 930–950 nm, 1130–1150 nm, 1370–1390 nm, 1870–1890 nm, 2070–2090 nm) [80,81,82]. After these exclusions, 391 spectral bands remained.
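A band mask built from the exclusion intervals listed above might look like the sketch below. The 5 nm wavelength grid is hypothetical (the actual HySpex band centers are not given here), so the number of retained bands will not match the 391 reported; the sensor-edge split at 965 nm is assumed to be handled when the two sensors are merged.

```python
# Exclusion intervals in nm, taken from the text.
EXCLUDED_NM = [
    (1350, 1410), (1810, 1930),                 # water vapour absorption
    (750, 770), (930, 950), (1130, 1150),       # unstable segments
    (1370, 1390), (1870, 1890), (2070, 2090),
    (2480, 2500),                               # SWIR upper edge
]
VNIR_LOWER_EDGE_NM = 435                        # 400-435 nm is dropped


def keep_band(wavelength_nm):
    """True if the band lies outside every excluded interval."""
    if wavelength_nm <= VNIR_LOWER_EDGE_NM:
        return False
    return not any(lo <= wavelength_nm <= hi for lo, hi in EXCLUDED_NM)


# Hypothetical 5 nm grid used only to exercise the mask.
wavelengths = list(range(400, 2501, 5))
kept = [w for w in wavelengths if keep_band(w)]
print(len(wavelengths), len(kept))
```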

To reduce high-frequency noise, we applied Savitzky–Golay [83] filtering with a second-degree polynomial and a window size of 11 bands. This configuration is known to preserve absorption features while effectively suppressing random spectral fluctuations [66,84,85].
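For readers who want to see what this filter does, the sketch below implements second-degree Savitzky–Golay smoothing with an 11-band window using the closed-form convolution weights. It is a standalone illustration, not the code used in the study; leaving the first and last five bands unfiltered is our simplification of edge handling.

```python
def savgol_quadratic_weights(half_width):
    """Closed-form smoothing weights for a quadratic fit, window 2m + 1."""
    m = half_width
    norm = (2 * m + 3) * (2 * m + 1) * (2 * m - 1)
    return [3.0 * (3 * m * m + 3 * m - 1 - 5 * i * i) / norm
            for i in range(-m, m + 1)]


def savgol_smooth(values, half_width=5):
    """Smooth a spectrum; bands closer than half_width to an edge are kept."""
    weights = savgol_quadratic_weights(half_width)
    m = half_width
    out = list(values)
    for k in range(m, len(values) - m):
        out[k] = sum(w * values[k + i]
                     for w, i in zip(weights, range(-m, m + 1)))
    return out


# A quadratic signal passes through a second-degree filter unchanged.
signal = [0.001 * x * x + 0.1 for x in range(30)]
smoothed = savgol_smooth(signal)
print(max(abs(a - b) for a, b in zip(signal, smoothed)) < 1e-9)  # True
```

For the 11-band window the weights are (−36, 9, 44, 69, 84, 89, 84, 69, 44, 9, −36)/429, which is why broad absorption features are preserved while band-to-band noise is averaged out.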

First, an empirical correction was applied using a Spectralon target with 50% reflectance, which was placed in the field of view and captured by the HSI sensors. This correction served to boost the recorded signal levels, aligning them more closely with expected reflectance values. Next, we applied a global empirical normalization using laboratory-acquired reference spectra of subsamples (n = 291). These spectra were interpolated to HSI wavelengths and processed identically in terms of band removal and filtering. We then computed the mean reflectance across all laboratory spectra measurements of subsamples and compared it to the mean reflectance of the imagery. The ratio between these means was applied as a single global scaling factor across all image spectra [86,87]. This normalization raised absolute reflectance levels to match reference data while preserving the spectral shape and variability.

With these preprocessed and harmonized reflectance spectra, we applied multiple spectral transformations to enhance features relevant to soil and metal-related spectral absorption. First, we performed baseline correction by subtracting the minimum reflectance value of each spectrum [88,89]. This removed vertical offsets without modifying the shape of the spectral curvature.

We then computed first and second spectral derivatives using Savitzky–Golay filters. The first derivative (second-order polynomial) highlights changes in slope, while the second derivative (third-order polynomial) emphasizes curvature transitions associated with subtle absorption features [90].
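The derivative filters can be sketched in the same closed-form way. The window size for the derivatives is not stated above, so the 11-band window (half-width 5) used here is an assumption; for a symmetric window, the second-derivative weights of a third-order fit coincide with those of a second-order fit.

```python
def savgol_derivative_weights(half_width, order):
    """Closed-form Savitzky-Golay weights for the 1st or 2nd derivative."""
    m = half_width
    idx = range(-m, m + 1)
    if order == 1:
        norm = sum(i * i for i in idx)
        return [i / norm for i in idx]
    if order == 2:
        norm = m * (m + 1) * (2 * m - 1) * (2 * m + 1) * (2 * m + 3)
        return [30.0 * (3 * i * i - m * (m + 1)) / norm for i in idx]
    raise ValueError("only derivative orders 1 and 2 are supported")


def savgol_derivative(values, order, half_width=5, spacing=1.0):
    """Derivative of a uniformly sampled spectrum; edge bands are None."""
    weights = savgol_derivative_weights(half_width, order)
    m = half_width
    out = [None] * len(values)
    for k in range(m, len(values) - m):
        acc = sum(w * values[k + i]
                  for w, i in zip(weights, range(-m, m + 1)))
        out[k] = acc / spacing ** order
    return out


# Check on y = x^2: the first derivative is 2x, the second is 2.
y = [float(x * x) for x in range(21)]
d1 = savgol_derivative(y, order=1)
d2 = savgol_derivative(y, order=2)
print(round(d1[10], 6), round(d2[10], 6))  # 20.0 2.0
```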

Continuum removal was performed using a convex hull-fitting method to isolate absorption features by normalizing spectra against their upper envelope [91]. This technique enhances contrast in absorption regions while minimizing broad-scale reflectance trends. An example of the impact of these preprocessing and transformation steps on the spectra is shown in Figure 3.
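A minimal sketch of convex-hull continuum removal is given below. It builds the upper hull with a monotone-chain scan, interpolates the hull linearly at every band, and divides the spectrum by it. The five-band spectrum is synthetic, and the function names are ours; strictly increasing wavelengths and positive reflectance are assumed.

```python
def upper_hull(points):
    """Upper convex hull of (x, y) points sorted by increasing x."""
    hull = []
    for p in points:
        # Drop the last vertex while it lies on or below the chord to p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def continuum_removed(wavelengths, reflectance):
    """Divide each reflectance value by the interpolated upper envelope."""
    hull = upper_hull(list(zip(wavelengths, reflectance)))
    out, j = [], 0
    for w, r in zip(wavelengths, reflectance):
        while hull[j + 1][0] < w:
            j += 1
        (x1, y1), (x2, y2) = hull[j], hull[j + 1]
        continuum = y1 + (y2 - y1) * (w - x1) / (x2 - x1)
        out.append(r / continuum)
    return out


# Synthetic spectrum with one absorption feature near 1200 nm.
wl = [1000.0, 1100.0, 1200.0, 1300.0, 1400.0]
refl = [0.30, 0.34, 0.20, 0.36, 0.38]
cr = continuum_removed(wl, refl)
print([round(v, 3) for v in cr])
```

Bands that touch the envelope map to 1, and the depth of an absorption feature is read as 1 minus the continuum-removed value.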

Finally, we applied dimensionality reduction to address spectral collinearity and compress redundant information. Principal Component Analysis (PCA) was computed using smoothed spectra, extracting both 20 and 50 components. In parallel, Kernel PCA (KPCA) with Nyström approximation was applied to capture potential nonlinear patterns that PCA might miss. KPCA achieves this by implicitly mapping spectral data into higher-dimensional spaces using kernel functions, where complex relationships become linearly separable [92,93]. The Nyström approximation was employed to ensure computational efficiency. Importantly, the exact transformation models fitted on the training dataset, including all parameters and mapping functions from PCA and KPCA, were directly applied to the HSI imagery. This ensured numerical consistency between model training and inference.

Heavy metal concentrations were determined for each sample as the mean of three subsample increments. This procedure was applied consistently for pXRF and ICP-OES measurements. Where values fell below detection limits (only Cd measured with pXRF), we assigned an empirical numerical value to retain modeling consistency: 0.61 ppm, the regional median natural background level for the Inner Carniola–Celje Basin [94]. This approach avoids zeroes, which could destabilize modeling.
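A small sketch of this aggregation is shown below. Two details are assumptions rather than statements from the text: below-detection readings are encoded as NaN, and the substitution is applied to individual increments before averaging. The function name `sample_means` is hypothetical.

```python
import numpy as np

CD_BACKGROUND_PPM = 0.61  # regional median background quoted in the text

def sample_means(subsample_values, fill_value):
    """Replace below-detection readings (NaN here) and average the three increments."""
    values = np.asarray(subsample_values, dtype=float)
    filled = np.where(np.isnan(values), fill_value, values)
    return filled.mean(axis=1)

# Two hypothetical sites with three pXRF Cd increments each.
cd_pxrf = np.array([[1.2, np.nan, 0.9],
                    [np.nan, np.nan, np.nan]])
cd_site = sample_means(cd_pxrf, CD_BACKGROUND_PPM)
```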

Spectroradiometric measurements on subsamples (n = 291) were used to harmonize airborne HSI imagery, ensuring spectral consistency between field and laboratory data. For modeling purposes, laboratory spectra remained associated with individual subsamples, while chemical concentrations were averaged across subsamples to represent each sample site (n = 291).

We have three final datasets for modeling:

Laboratory spectra of subsamples in HSI feature space;
One based on pXRF concentrations and HSI spectra;
One based on ICP-OES concentrations and HSI spectra.

Each dataset was paired with a full set of spectral representations, including Savitzky–Golay smoothed spectra, baseline-corrected spectra, first and second derivatives, continuum-removed spectra, and the outputs of PCA and KPCA with 20 and 50 components. In addition to spectral data, we appended privileged features comprising concentrations of Cu, Ni, and As, as well as pedological properties: bulk density (ρ_b), pH in CaCl₂, total carbon (C), total nitrogen (N), organic carbon (C_org), organic matter, calcium carbonate content (CaCO₃), plant-available phosphorus (P₂O₅), and potassium (K₂O).

Strong collinearity was observed among several pedological variables. Total carbon, organic carbon, and organic matter were perfectly correlated (r = 1.00), while total nitrogen showed a high correlation with both total carbon and organic matter (r = 0.96). Due to this redundancy, only organic matter was retained for further analysis.

For pXRF subsamples (n = 291), Zn showed a strong correlation with Pb (r = 0.79) and moderate correlations with Cd (0.67), Cu (0.54), organic matter (0.48), CaCO₃ (0.46), and pH (0.30). Pb was correlated with Cu (0.63), As (0.69), and Cd (0.35), while Cd was correlated with pH (0.40) and CaCO₃ (0.30). All were statistically significant (p ≤ 0.05).

In pXRF samples, Zn correlated strongly with Pb (0.76) and moderately with Cd (0.46), Cu (0.56), and As (0.48). Cd correlated with pH (0.33), CaCO₃ (0.35), and total carbon (0.36).

ICP-OES data showed a strong correlation between Zn and Cd (r = 0.88) and between Pb and Cd (0.88), with moderate correlations of Zn with Pb (0.66) and pH (0.41). Other relationships, while weaker, remained statistically significant.

These enriched datasets formed the basis for all subsequent model training and validation.

2.5. Multi-Stage Privileged Learning with Spectral Residual Correction

Random forest [95] was used throughout the multi-stage privileged learning framework to leverage its robustness in modeling high-dimensional and nonlinear relationships. In the initial phase, conditional permutation importance analysis was applied to laboratory spectra (measured at the subsample level using ASD and mapped into the HSI feature space) and to pedological and chemical variables, at both subsample (pXRF) and sample levels (pXRF and ICP-OES), to identify the most relevant predictors. Based on these results, optimized variable combinations were selected. For model development, HSI spectra were extracted at each sampling location using all nine pixels from 3 × 3 windows centered on the sampling point. Consequently, each site contributed nine HSI spectra, resulting in a training set approximately nine times larger than the number of chemical concentration samples. Methodological consistency was maintained by using a random forest regressor (RFR) at each stage.

2.5.1. Conditional Permutation Importance for Privileged Feature Selection

A conditional permutation approach was employed to evaluate variable importance across all three core datasets, targeting the prediction of selected heavy metals. This method measures the decline in model performance after randomly permuting a variable, while controlling for multicollinearity by fixing strongly correlated pairs (r ≥ 0.70), thereby preventing biased importance estimates without excluding any variables [96,97].

Spectral features were treated as a locked component throughout, as they were present in all models. Correlated pairs fixed during permutation included the following:

pXRF subsamples: Zn-Pb, Zn-Cd, Pb-As, Cd-As;
pXRF samples: Zn-Pb, Zn-Cd, Pb-As, Cd-As, and
ICP-OES samples: Zn-Cd.

Permutation tests were conducted using an RFR, suitable for capturing nonlinear and hierarchical patterns in high-dimensional data [98]. Variable importance was assessed via 5-fold cross-validation to ensure robustness. This method improves upon impurity-based importance scores (e.g., Gini, entropy), which are known to favor variables with broader ranges or early tree splits [96]. While computationally intensive and sensitive to small sample sizes [99], conditional permutation offers more reliable insights for datasets with complex dependencies, such as those used in this study.
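The exact permutation procedure is not spelled out at this point, so the sketch below shows a related and simpler technique, grouped permutation importance, as one possible reading of "fixing strongly correlated pairs": columns in the same group are shuffled together so that their mutual correlation is preserved. It is not the authors' implementation. The function name, the column groups, the 100 trees, and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold

def grouped_permutation_importance(X, y, groups, n_splits=5, seed=0):
    """Mean drop in held-out R² when all columns of a group are shuffled together."""
    rng = np.random.default_rng(seed)
    drops = {name: [] for name in groups}
    folds = KFold(n_splits, shuffle=True, random_state=seed)
    for train, test in folds.split(X):
        model = RandomForestRegressor(n_estimators=100, random_state=seed)
        model.fit(X[train], y[train])
        base = r2_score(y[test], model.predict(X[test]))
        for name, cols in groups.items():
            X_perm = X[test].copy()
            order = rng.permutation(len(test))
            # Same row order for every column in the group keeps the pair intact.
            X_perm[:, cols] = X_perm[order][:, cols]
            drops[name].append(base - r2_score(y[test], model.predict(X_perm)))
    return {name: float(np.mean(v)) for name, v in drops.items()}

# Synthetic data: columns a and b are correlated and informative, c is noise.
rng = np.random.default_rng(1)
n = 200
a = rng.normal(size=n)
b = a + 0.3 * rng.normal(size=n)
c = rng.normal(size=n)
X = np.column_stack([a, b, c])
y = 3.0 * a + 0.1 * rng.normal(size=n)
importance = grouped_permutation_importance(X, y, {"a+b": [0, 1], "c": [2]})
```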

We evaluated the following combinations:

A total of 1023 permutations for pXRF datasets, using 10 variables + target (Cd, Zn, Pb, Cu, Ni, Hg, organic matter, pH, CaCO₃, ρ_b, K₂O, P₂O₅);
A total of 511 permutations for the ICP-OES dataset, same as for pXRF, excluding Hg (9 variables + target).
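The reported totals match 2ⁿ − 1, the number of non-empty subsets of n candidate variables (n = 10 and n = 9). That reading is an inference from the numbers rather than something stated explicitly, and the helper `non_empty_subsets` below is hypothetical.

```python
from itertools import combinations

def non_empty_subsets(variables):
    """Yield every non-empty combination of the given variables."""
    for size in range(1, len(variables) + 1):
        yield from combinations(variables, size)

n_pxrf = sum(1 for _ in non_empty_subsets(range(10)))  # 10 candidate variables
n_icp = sum(1 for _ in non_empty_subsets(range(9)))    # 9 candidate variables
```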

Model performance under permutation was evaluated using R² and RMSE [ppm]. While permutation importance was calculated via the decline in R² (reflecting each variable’s contribution to explained variance), final model selection prioritized RMSE. Although R² indicates explained variance, it may not reliably capture predictive accuracy in nonlinear models [100] such as RFR. Therefore, we prioritized RMSE for selecting optimal permutations as it quantifies prediction error directly in the target variable’s units and remains robust to violations of linearity assumptions. Unlike R², which can take negative values when models overfit noise or fail to generalize [101,102], RMSE was chosen as a more reliable estimate of predictive performance. To enable comparison independent of the data scale, performance was expressed both in measurement units (ppm) and relative to the target variable’s standard deviation (σᵧ). This approach helps assess the proportion of variability in the target variable explained by the model within the training set. By reporting the RMSE in both absolute and relative terms, we gain a clearer understanding of the model’s ability to capture meaningful patterns in the data while accounting for the variability inherent in the target variable.
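The two ways of reporting RMSE can be sketched as follows. The use of the population standard deviation (`ddof=0`) is an assumption; with that choice, the squared relative RMSE equals 1 − R² on the same data, which links the two metrics discussed above. The function names and example values are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error in the units of the target (ppm here)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def relative_rmse(y_true, y_pred):
    """RMSE as a percentage of the target's standard deviation."""
    return 100.0 * rmse(y_true, y_pred) / float(np.std(y_true))

y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 320.0, 380.0])
abs_err = rmse(y_true, y_pred)
rel_err = relative_rmse(y_true, y_pred)
```

A model that always predicts the mean of the target scores 100% on this scale, so values well below 100% indicate that the model explains a substantial part of the variability.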

2.5.2. Privileged-Informed Residual Ensemble Architecture

Following the conditional permutation importance analysis, a three-stage additive ensemble framework was developed to integrate privileged information during model training while ensuring that the final predictive model relied exclusively on hyperspectral inputs at inference.

In the first phase, the full spectral cube (generally comprising 391 bands, or 20/50 bands in cases of dimensionality reduction) was combined with privileged variables, including Cd, Zn, Pb, Cu, Ni, Hg, organic matter, pH, CaCO₃, ρ_b, K₂O, and P₂O₅. When a specific heavy metal (e.g., Zn, Pb, or Cd) was selected as the target variable, it was excluded from the privileged feature set to avoid information leakage.

An initial RFR model was trained to predict the target concentrations from hyperspectral data:

$$\hat{y}_{s} = f_{RF}(X_{s}) \tag{3}$$

where $X_{s} \in \mathbb{R}^{d}$ denotes the HSI input and $d \in \{20, 50, 391\}$ its dimensionality. Residuals were then calculated as:

$$\epsilon = y - \hat{y}_{s} \tag{4}$$

where $y$ denotes the laboratory-measured concentration of the target.

A secondary RFR model was trained to predict residuals using only the privileged variables:

$$\hat{\epsilon} = g_{RF}(X_{p}) \tag{5}$$

where $X_{p} \in \mathbb{R}^{d}$ represents the set of privileged pedological and chemical variables, with $d = 10$ for the pXRF dataset and $d = 9$ for the ICP-OES dataset.

Finally, a third RFR model was trained on spectral data $X_{s}$ to approximate the privileged-modeled residuals:

$$\hat{\epsilon}_{s} = h_{RF}(X_{s}) \tag{6}$$

The final prediction combines the outputs of the baseline and correction models:

$$\hat{y}_{final} = f_{RF}(X_{s}) + h_{RF}(X_{s}) \tag{7}$$

Model training was performed using RFR configured with 1000 trees. No maximum tree depth was explicitly set, following the default configuration to allow sufficient model flexibility. Validation was conducted using 5-fold cross-validation with randomized splits, assuming sample independence. Model performance was evaluated based on RMSE [ppm] and R². To assess the benefit of the privileged-informed ensemble approach, results were systematically compared against a baseline RFR model trained solely on hyperspectral data without privileged correction. This direct comparison with the pure spectral RFR allowed for the evaluation of the extent to which residual correction using privileged information improved predictive accuracy.
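The three stages in Equations (3) to (7) can be sketched with scikit-learn as below. The class name `ResidualCorrectedRF` is hypothetical, the residuals are computed in-sample (whether the study used in-sample or out-of-fold residuals is not stated here), and the demonstration uses 50 trees and synthetic data purely to keep it fast.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

class ResidualCorrectedRF:
    """Spectral baseline plus a spectral model of privileged-predicted residuals."""

    def __init__(self, n_estimators=1000, random_state=0):
        kw = dict(n_estimators=n_estimators, random_state=random_state)
        self.f = RandomForestRegressor(**kw)  # Eq. (3): spectra -> target
        self.g = RandomForestRegressor(**kw)  # Eq. (5): privileged -> residual
        self.h = RandomForestRegressor(**kw)  # Eq. (6): spectra -> modeled residual

    def fit(self, X_s, X_p, y):
        self.f.fit(X_s, y)
        residuals = y - self.f.predict(X_s)   # Eq. (4), in-sample (assumption)
        self.g.fit(X_p, residuals)
        self.h.fit(X_s, self.g.predict(X_p))
        return self

    def predict(self, X_s):
        # Eq. (7): privileged variables are not needed at inference.
        return self.f.predict(X_s) + self.h.predict(X_s)

rng = np.random.default_rng(0)
n = 150
X_s = rng.normal(size=(n, 20))  # stand-in for spectral features
X_p = rng.normal(size=(n, 9))   # stand-in for privileged variables
y = X_s[:, 0] + 0.5 * X_p[:, 0] + 0.1 * rng.normal(size=n)

model = ResidualCorrectedRF(n_estimators=50).fit(X_s, X_p, y)
y_hat = model.predict(X_s)
```

Note that `predict` takes only the spectral matrix, which mirrors the requirement that the deployed model runs on imagery alone.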

2.5.3. Validation of Heavy Metal Concentration Predictions

During the validation of predicted concentrations of Zn, Pb, and Cd, a significant methodological challenge arose due to temporal misalignment between the reference data and hyperspectral imagery. Samples used for predictive model development were collected in 2020, whereas the HSI and validation samples were obtained in 2022.

Although agricultural fields remained vegetation-free and experienced similar environmental conditions throughout the interval, minor natural or anthropogenic variations could not be entirely ruled out. Establishing methodological robustness under these temporal assumptions was, therefore, critical for validating our approach.

To mitigate concerns regarding data consistency across different years, the validation was conducted under the assumption of temporal consistency. This assumption is supported by three principal arguments:

1. Stability of agricultural practices

Between 2020 and 2022, the same types and quantities of fertilizers were applied, and identical soil management procedures were implemented. There were no significant agro-technical changes during this period.

2. Absence of major external influences

No extreme weather events or anthropogenic disturbances were recorded that could have significantly altered the soil properties in the study area during the relevant time frame.

3. Inertia of heavy metals in soil

Heavy metals are characterized by low mobility in soil environments, remaining relatively stable over short- to medium-term periods.

The predictive performance of the models was evaluated using a pixel-level validation approach. Predicted values were extracted at each sampling point from the 2022 HSI using multiple spatial window sizes, enabling the quantification of predictive accuracy at varying spatial scales. The window sizes analyzed included

3 × 3 pixels (1.8 m × 1.8 m),
5 × 5 pixels (3.0 m × 3.0 m), and
7 × 7 pixels (4.2 m × 4.2 m).

For each window, mean predicted concentrations were calculated and compared to the corresponding field validation measurements. Pearson correlation coefficients (r) and associated p-values were computed to assess the strength and statistical significance of the predictions across different spatial scales.
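A minimal sketch of the window-based comparison follows, assuming the prediction map is a 2D array and that each validation point has already been converted to a row and column index (the window sizes listed above imply a pixel size of 0.6 m). The function name `window_mean`, the toy map, the site positions, and the measured values are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr

def window_mean(pred_map, row, col, size):
    """Mean prediction in a size x size window centred on (row, col), clipped at edges."""
    half = size // 2
    r0, c0 = max(row - half, 0), max(col - half, 0)
    window = pred_map[r0:row + half + 1, c0:col + half + 1]
    return float(np.nanmean(window))

pred_map = np.arange(49, dtype=float).reshape(7, 7)  # toy prediction map
sites = [(1, 1), (3, 3), (5, 5), (2, 4)]             # validation points (row, col)
measured = np.array([9.0, 23.0, 41.0, 17.0])         # toy field measurements

results = {}
for size in (3, 5, 7):
    means = [window_mean(pred_map, r, c, size) for r, c in sites]
    r_value, p_value = pearsonr(means, measured)
    results[size] = (r_value, p_value)
```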

Model performance results for the privileged-informed spectral residual ensemble were systematically compared to those of an RFR model trained solely on hyperspectral inputs without privileged correction. This direct comparison allowed for the evaluation of the extent to which incorporating privileged-based residual corrections improved predictive accuracy and spatial consistency.

3. Results

3.1. Conditional Permutation Results

The conditional permutation framework identified optimal auxiliary variable combinations that significantly improved spectral predictions of Zn, Pb, and Cd concentrations. These improvements are demonstrated through the best-performing models for each element, summarized in Table 3, Table 4 and Table 5, which compare results to baseline RFR models trained solely on spectral data. RMSE values, reported in absolute units (ppm) and relative to the target variable’s standard deviation (%σᵧ), highlight the predictive gains achieved by integrating auxiliary variables.

For Zn, the RMSE values ranged from 68.1 to 96.8 ppm, with R² values between 0.76 and 0.93 (Table 3). The inclusion of variables such as Pb, pH, and organic matter, which exhibit strong linear relationships with Zn, resulted in high R² values and moderate variability in RMSE. These improvements reflect the robust spectral activity of Zn and the synergistic interactions with auxiliary variables. The observed Zn-Pb covariation (r = 0.79, Section 2.4) underscores their shared sulfide mineralogy in contaminated soils [103]. Additionally, K₂O likely contributes to the improved model performance through its role in soil fertility or mineral binding mechanisms. The relative RMSE values, such as 26% of σᵧ for PCA 20 compared to 51.5% for the direct model, highlight the mitigating effect of auxiliary variables on spectral ambiguities.

Pb models displayed a distinct performance pattern, with RMSE ranging from 14.9 to 24.4 ppm and R² values fluctuating between 0.38 and 0.88 (Table 4). The wide variability in R² values, in contrast to a relatively stable RMSE (Δ~9.5 ppm), suggests that Pb has weaker spectral activity and relies heavily on auxiliary variables. Proxies such as Zn (r = 0.76) and As (r = 0.69) played a significant role in stabilizing Pb predictions. Nonlinear models incorporating auxiliary data, such as KPCA 20 (RMSE = 16.3 ppm), highlight their critical role in improving performance. In contrast, spectral-only nonlinear models performed poorly, as evidenced by the baseline Savitzky–Golay + ICP-OES model (R² = −0.02). The relative RMSE values, such as 34.4% of σᵧ for PCA 20 compared to 67.6% for the direct model, further underscore the significant contribution of covariates.

Cd presented unique challenges due to its trace concentrations, with RMSE values ranging from 0.8 to 2.5 ppm and R² values between 0.54 and 0.90. The best-performing model, which combined the second derivative with ICP-OES data (RMSE = 0.8 ppm), successfully isolated the interactions between Cd and carbonate and phosphate phases, utilizing covariates such as CaCO₃ and P₂O₅. Strong correlations between Zn and Cd (r = 0.88) and Pb and Cd (r = 0.88) (Section 2.4) further supported the necessity of incorporating ICP-OES data. Spectral-only models, such as the continuum removal + pXRF model (RMSE = 2.5 ppm), showed inferior performance. The relative RMSE values, with 29.6% of σᵧ for the second derivative + ICP-OES model compared to 70.2% for the direct model, underscore the precision gained through the incorporation of geochemical proxies.

For Zn, the most important variables were Pb, pH, and organic matter, reflecting the influence of sulfide mineralogy and adsorption dynamics. In Pb models, Zn, As, and pH were critical for stabilizing predictions, reflecting co-deposition processes and pH-dependent solubility [104,105]. For Cd, CaCO₃ and P₂O₅ played a crucial role in resolving the trace adsorption mechanisms associated with this metal. Figure 4 displays the cumulative frequency of auxiliary variables across all evaluated models (pXRF subsamples and pXRF and ICP-OES samples) that outperformed direct spectral-only models when ranked by RMSE. The aggregated results highlight the most critical variables: Pb for Zn predictions, As for Pb predictions, and Zn for Cd predictions, emphasizing their consistent importance in improving accuracy across all permutations, datasets, and contamination scenarios. The prominence of As for Pb and Zn for Cd predictions reflects their co-occurrence as point-source pollutants, typically emitted together in industrial processes such as smelting, where these metals originate from shared contamination pathways.

3.2. Selected Heavy Metal Concentration Inferred from HSI

The best-performing models for Zn, Pb, and Cd, based on spatial window sizes, are summarized in Figure 5, Figure 6 and Figure 7. The models were evaluated across three different spatial window sizes, and key observations were made for each target element.

In Figure 5, we can see the distribution of errors for all the ICP-OES (a–c) models compared to the baseline RFR and pXRF models (d–f). The errors for the ICP-OES models are closer to 0 than those of the baseline RFR, indicating a better performance of the corrected RFR models. The pXRF models consistently performed worse and generally underestimated Zn concentrations. Specifically, we observe that the pXRF model is systematically biased towards lower predictions as the error distribution is shifted to the negative side. The error distribution for all the targets also has several smaller peaks, suggesting that the different combinations of features used in the models create multiple prediction regimes. This reflects the complexity of the relationships between the spectral features and target concentrations.

Similarly, for Pb, the ICP-OES-corrected RFR models again outperform the baseline RFR (Figure 6). The pXRF-based models are worse, showing larger errors.

For Cd, the ICP-OES model also shows slightly better results, but the improvements are more noticeable in the 7 × 7 window (Figure 7c). The errors for Cd are smaller in this case, and the ICP-OES model shows a more refined distribution of prediction errors, indicating that it effectively captures the geochemical interactions influencing Cd concentrations. However, for the pXRF models, the results remain suboptimal, with higher errors and less consistency across windows. Overall, the ICP-OES-corrected models performed better than their pXRF-based counterparts.

The corrected RFR and the baseline RFR managed to generalize well across the datasets, with the ICP-OES-corrected models consistently showing a better fit. The pXRF-based models, while not as successful, still demonstrated the ability to capture some features, though they generally underestimated the target metals. We also note that in these plots, we present the results for all models that previously performed well in the permutation importance analysis. This approach validated the robustness of the selected models, confirming that the permutation-based methodology was effective in identifying the most relevant features for prediction.

Among all models, the best evaluation metrics for Zn in terms of RMSE and mean prediction error were achieved by the model using baseline-corrected spectra paired with pXRF, where residual errors were corrected by integrating a combination of As, Cd, Cu, Ni, and Pb. During the conditional permutation process, this model achieved an R² of 0.70 and an RMSE of 136.2 ppm. When applied to the HSI dataset with a 3 × 3 window size, the model produced an inference RMSE of 161.9 ppm compared to target values, demonstrating strong generalization across data sources. The model also achieved a pixel-wise correlation of 0.89 within the selected window, confirming high spatial prediction accuracy.

For comparison, the baseline RFR model yielded an RMSE of 167.6 ppm on the same HSI dataset. The privileged correction thus improved inference accuracy by 3.4%, a modest but consistent enhancement observed across configurations. Critically, all privileged Zn models based on pXRF outperformed baseline equivalents (median ΔRMSE = 4.8%), demonstrating statistically robust improvement (p < 0.05) despite the inherent noise of cross-sensor validation. This consistency suggests the method systematically leverages multi-element relationships rather than fitting dataset-specific artifacts. It is possible that such methods, when applied to higher-quality HSI data with better preprocessing, would yield even greater improvements. While the privileged-informed models demonstrated improved accuracy for Zn and Cd predictions within the concentration ranges represented in the training set (2020), the validation set (2022) exhibited generally higher concentrations of Zn, Pb, and Cd (Section 2.2.2). This represents a domain shift where predictions required extrapolation beyond the training data distribution. The observed results likely reflect the inherent challenge of extrapolating to previously unobserved concentrations.
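The 3.4% figure quoted above can be reproduced from the two reported RMSE values, assuming the improvement is expressed relative to the baseline RMSE:

```latex
\Delta_{\mathrm{RMSE}}
  = \frac{\mathrm{RMSE}_{\mathrm{baseline}} - \mathrm{RMSE}_{\mathrm{corrected}}}
         {\mathrm{RMSE}_{\mathrm{baseline}}} \times 100\%
  = \frac{167.6 - 161.9}{167.6} \times 100\%
  \approx 3.4\%
```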

In Figure 8, we can see the prediction map alongside a selected field subset, providing a visual representation of the model’s performance across the area of interest. The map displays the entire study area, with two sample fields highlighted, which include validation sites marked as black points. The largest quantities of Zn were predicted in the southwest area, which is in proximity to the industrial zone, indicating a higher concentration of the metal in this region.

For Pb, the best predictions were not achieved by the corrected model but instead by the baseline RFR model, though both exhibited poor overall performance. The baseline RFR model yielded marginally better results with an RMSE of 32.2 ppm and an R² of 0.48. By comparison, the corrected model performed slightly worse, producing an RMSE of 33.2 ppm and an R² of 0.44. These results indicate that while the baseline RFR model showed a small advantage for Pb predictions, neither approach demonstrated robust accuracy.

For Cd, results revealed unexpected performance from a Savitzky–Golay-preprocessed model using Cu, Pb, and Zn as auxiliary variables. Despite omitting advanced spectral derivations or dimensionality reduction, the model achieved an RMSE of 1.78 ppm and R² of 0.64, with a validation correlation of 0.91. Spatial predictions mirrored Zn trends in the southern field (industrial proximity zone), where both metals showed elevated concentrations. However, the northern field low-Zn area (Figure 9) exhibited no Cd-Zn correlation, though validation points there fell anomalously high against model predictions, suggesting potential localized sampling artifacts. Notably, this outperformed initial permutations (R² = 0.50, RMSE = 2.70 ppm), highlighting the value of multi-element correction even for Cd. All privileged Cd models based on ICP-OES outperformed baseline equivalents, with a median ΔRMSE of 0.5 ppm.

4. Discussion

The proposed models showed improvements for predicting Zn and Cd concentrations, while for Pb, no improvements were observed. The privileged-informed residual correction framework enhanced predictions for Zn and Cd, with RMSE reductions of 3.4–4.8% for Zn by utilizing auxiliary variables such as Pb, pH, and organic matter. Prediction maps reveal contamination hotspots in proximity to industrial zones, demonstrating the framework's ability to capture spatial variability. For Cd, the model achieved an RMSE of 1.78 ppm (R² = 0.64) with key variables like CaCO₃, P₂O₅, and Zn, aligning with Zn trends in spatial predictions. However, for Pb, no improvements were observed, possibly due to spectral limitations and data imbalances, with baseline models performing better (RMSE: 32.2 vs. 33.2 ppm).

Celje has specific contamination sources and geochemical conditions. However, it shares key traits with other legacy smelting areas. These include the co-occurrence of Zn, Pb, and Cd in sulfide-rich substrates and industrial residues, and soil properties that promote metal mobility. This makes Celje representative of a broader class of contaminated sites. The framework is suited to such conditions. For other types of polluted environments, such as mining areas without smelting, urban zones, or agricultural land, direct application would require adaptation. This depends on whether key properties like spectral diversity and relevant metal associations are present. To reduce dependence on the original training environment and improve generalization, the framework separates model structure from local empirical patterns. Predictors are identified using conditional permutation importance. The model must be trained from scratch on local spectral and auxiliary data. Instead of transferring learned parameters, the method reconstructs predictive relationships through spectral residual correction on new samples. This helps prevent overfitting and improves model robustness.

This study proposes a proof-of-concept framework leveraging spectral residuals, with several avenues for enhancement. Future iterations could explore multi-head neural networks or dual-input architectures that process laboratory and HSI data concurrently. In such a configuration, the network could first learn error patterns from controlled laboratory spectra before transferring these corrections to HSI datasets. This error-knowledge transfer could refine sensor calibration, bridging discrepancies between laboratory standards and HSI-derived measurements.

This ensemble-based spectral residual correction architecture differs fundamentally from the LUPI concept, particularly in how privileged information is incorporated. While both approaches utilize privileged data exclusively during training and not during inference, our method employs an additive ensemble of baseline and correction models rather than embedding privileged knowledge directly into the model structure. This modular design ensures compatibility with standard ML libraries (e.g., scikit-learn, cuML), as privileged learning operates as a parallel correction layer rather than necessitating algorithm-level modifications. Consequently, the framework remains scalable, adaptable to diverse model families, and straightforward to update without privileged-specific re-engineering. In contrast, traditional LUPI methods often require custom loss functions or optimizer adjustments, rendering them inflexible and challenging to deploy in operational settings.

Our methodology uses RF due to its ability to capture both linear and nonlinear relationships without assuming a functional form. Since the nature of soil–metal interactions is not known in advance, we avoided model-specific assumptions and used a model-agnostic conditional permutation approach. While methods like PLSR may perform better in strictly linear settings, our framework is designed to remain flexible across diverse conditions.

While model development employed randomized 5-fold cross-validation under sample independence assumptions, several methodological refinements warrant attention. First, the current methodology does not explicitly address spatial autocorrelation inherent in hyperspectral soil datasets. Incorporating spatial lag variables or hybrid approaches such as regression kriging of residuals could better model distance–decay relationships in contamination patterns. However, care must be taken when combining such spatial residual correction with existing corrections based on spectral patterns, as both may encode overlapping structures. Without proper separation, this could lead to redundant adjustments or amplification of local biases. To mitigate this, spatial corrections should be applied only after confirming that residuals retain spatial dependence not already captured by spectral variables. Such adaptations could also enable the integration of predictive spatial maps as covariates in subsequent modeling stages.

Though the ensemble architecture supports spatial block cross-validation, performance estimates could gain geostatistical robustness through variogram-informed, spatially stratified cross-validation schemes. Partitioning data by spatial clusters derived from variogram ranges would preserve geographic continuity while maintaining feature-space diversity during training and validation.
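A spatially blocked split of the kind discussed here might look like the sketch below. It is an illustration of the idea, not part of the study: blocks are formed with k-means on site coordinates, and the number of blocks (10) is an arbitrary assumption that would in practice be informed by the variogram range. The coordinates are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
coords = rng.uniform(0, 1000, size=(90, 2))  # synthetic site coordinates in metres

# Assign every site to a spatial block; block count is an assumption.
blocks = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(coords)

# GroupKFold keeps all sites of a block on the same side of each split.
splits = list(GroupKFold(n_splits=5).split(coords, groups=blocks))
```

Each fold then tests on blocks that were never seen during training, which gives a more conservative error estimate when neighbouring sites are spatially autocorrelated.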

Additionally, while our study introduces spectral residual correction as a method, it cannot fully address the underlying reasons for the variability in prediction performance observed across different metals. The influence of privileged information, once translated back into the spectral domain, may either capture unique non-spectral knowledge or simply reinforce weak relationships that were already present in the spectral data. This variability in performance, particularly between metals like Zn and Pb, suggests that future studies should explore how privileged information interacts with the spectral data and whether the improvements stem from capturing new knowledge or enhancing existing spectral patterns.

Soil moisture was not explicitly modeled in our framework. Instead, we enforced acquisition constraints to ensure that all data were collected under suitable moisture conditions. The HSI flight was planned during a dry period, and gravimetric measurements confirmed that all samples were below the methodological threshold. This approach minimized spectral distortions caused by moisture and ensured that observed variation reflected soil composition. Explicit correction for moisture was not performed as it would introduce additional dependencies and make it unclear whether improvements were due to privileged variables or moisture compensation. A proper assessment of moisture effects would require multiple acquisitions under different soil moisture levels, which was beyond the scope of this study.

Lastly, while neural networks are well suited to modeling nonlinear interactions between spectral and privileged data, their deployment demands significantly larger training datasets to mitigate overfitting risks. Furthermore, the divergent statistical scales and noise profiles of hyperspectral features and pedochemical variables complicate unified neural network training, potentially hindering convergence and interpretability. These constraints underscore the practicality of the RFR-based ensemble approach adopted here, particularly given limited field samples and heterogeneous input characteristics.

5. Conclusions

In this study, we developed a multi-stage ensemble framework that strategically incorporates privileged soil information during model training while maintaining fully spectral-based prediction at inference. Random forest regressors were used to model both direct concentration predictions from hyperspectral data and residual corrections informed by auxiliary pedochemical parameters.

Validation was conducted using temporally independent hyperspectral data and field samples, with predictive performance evaluated through pixel-level assessments across multiple spatial scales.

A comparison to a pure spectral random forest baseline demonstrated that the privileged-informed residual correction improved overall predictive accuracy for Zn and Cd, achieving a reduction in RMSE compared to the baseline models. However, for Pb, no improvements were observed. This result may be influenced by variability in the data, including spectral issues or imbalances between the training and test datasets, which could have affected the model’s ability to improve Pb predictions. This suggests that future studies should explore how these data-related factors interact with privileged information and adjust models accordingly for specific metals.

Author Contributions

Conceptualization, A.M., K.O., M.K. and M.Z.; methodology, A.M., K.O., M.K. and M.Z.; software, A.M.; validation, A.M., M.K. and M.Z.; formal analysis, A.M.; investigation, A.M. and M.Z.; resources, A.M., M.K. and M.Z.; data curation, A.M.; writing—original draft preparation, A.M.; writing—review and editing, K.O., M.K. and M.Z.; visualization, A.M.; supervision, K.O. and M.K.; project administration, A.M.; funding acquisition, A.M., M.K. and M.Z. All authors have read and agreed to the published version of the manuscript.

Funding

The authors would like to thank the Slovenian Research and Innovation Agency (ARIS) for funding this work through the Research Programmes P1-0153 and P2-0406.

Data Availability Statement

The datasets presented in this study are not publicly available as the research is part of an ongoing study. Requests to access the data should be directed to Alen Mangafić.

Acknowledgments

We gratefully acknowledge the Department of Soil Science and Soil Protection, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences, Prague, for providing access to their laboratory for soil spectral measurements.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

As	Arsenic
Cd	Cadmium
C	Total carbon
CaCO₃	Calcium carbonate
Cu	Copper
GNSS	Global Navigation Satellite System
ICP-OES	Inductively Coupled Plasma—Optical Emission Spectrometry
K₂O	Plant-available potassium
KPCA	Kernel Principal Component Analysis
N	Total nitrogen
Ni	Nickel
Pb	Lead
P₂O₅	Plant-available phosphorus
PCA	Principal Component Analysis
PRF	Privileged Random Forest
pXRF	Portable X-ray Fluorescence analyzer
RF	Random Forest
RFR	Random Forest Regressor
RMSE	Root Mean Square Error
ρ_b	Bulk Density
SNR	Signal-to-Noise Ratio
SWIR	Short-wave Infrared
VNIR	Visible and Near-Infrared
Zn	Zinc

References

Mitra, S.; Patnaik, P.; Kebbekus, B. Environmental Chemical Analysis, 2nd ed.; CRC Press: Boca Raton, FL, USA, 2018. [Google Scholar] [CrossRef]
Government of Slovenia. Zakon o Varstvu Okolja (ZVO-2). 2022. Available online: https://www.uradni-list.si/_pdf/2022/Ur/u2022044.pdf (accessed on 10 April 2025).
Zupan, M.; Grčman, H.; Lobnik, F. Raziskave Onesnaženosti Tal Slovenije; Agencija RS Za Okolje: Ljubljana, Slovenia, 2008.
Lobnik, F.; Zupan, M.; Grčman, H. Onesnaženost Tal in Rastlin v Celjski Kotlini. In Onesnaženost Okolja in Naravni Viri Kot Omejitveni Dejavnik Razvoja v Sloveniji—Modelni Pristop za Degradirana Območja: Zbornik 1. konference; Inštitut za Okolje in Prostor: Celje, Slovenia, 2010. [Google Scholar]
Adriano, D.C. Trace Elements in Terrestrial Environments, 2nd ed.; Springer: New York, NY, USA, 2001. [Google Scholar] [CrossRef]
Richardson, M. Environmental Xenobiotics; CRC Press: London, UK, 1996. [Google Scholar] [CrossRef]
Pirc, S.; Šajn, R. Vloga geokemije v ugotavljanju kemične obremenitve okolja. In Kemizacija okolja in življenja – do katere mere? [Chemical Changes of the Environment and Life – Up to Which Extent?]; Proceedings of the European Year of Nature Conservation 1995 Project; Slovensko ekološko gibanje: Ljubljana, Slovenia, 1997; pp. 165–186. [Google Scholar]
Commission v Slovenia (Décharge de Bukovžlak). 2025. Available online: https://curia.europa.eu/juris/document/document.jsf?docid=299095 (accessed on 13 May 2025).
Krige, D.G. A Statistical Approach to Some Basic Mine Valuation Problems on the Witwatersrand. J. South. Afr. Inst. Min. Metall. 1951, 52, 119–139. Available online: https://wiredspace.wits.ac.za/items/ae034a42-dd51-44c9-b405-d14a37f76472 (accessed on 7 June 2025).
Wang, X.J. Kriging and Heavy Metal Pollution Assessment in Wastewater Irrigated Agricultural Soil of Beijing’s Eastern Farming Regions. J. Environ. Sci. Health Part A 1998, 33, 1057–1073. [Google Scholar] [CrossRef]
Wu, J.; Norvell, W.A.; Welch, R.M. Kriging on Highly Skewed Data for DTPA-Extractable Soil Zn with Auxiliary Information for pH and Organic Carbon. Geoderma 2006, 134, 187–199. [Google Scholar] [CrossRef]
Hengl, T.; Heuvelink, G.B.M.; Rossiter, D.G. About Regression-Kriging: From Equations to Case Studies. Comput. Geosci. 2007, 33, 1301–1315. [Google Scholar] [CrossRef]
Ikoyi, I.O.; Heuvelink, G.B.M.; De goede, R.G.M. Geostatistical Modelling and Mapping of Nematode-Based Soil Ecological Quality Indices in a Polluted Nature Reserve. Pedosphere 2021, 31, 670–682. [Google Scholar] [CrossRef]
Hengl, T.; Nussbaum, M.; Wright, M.N.; Heuvelink, G.B.M.; Gräler, B. Random Forest as a Generic Framework for Predictive Modeling of Spatial and Spatio-Temporal Variables. PeerJ 2018, 6, e5518. [Google Scholar] [CrossRef]
Irons, J.R.; Dwyer, J.L.; Barsi, J.A. The next Landsat Satellite: The Landsat Data Continuity Mission. Remote Sens. Environ. 2012, 122, 11–21. [Google Scholar] [CrossRef]
Pour, A.B.; Ranjbar, H.; Sekandari, M.; Abd El-Wahed, M.; Hossain, M.S.; Hashim, M.; Yousefi, M.; Zoheir, B.; Wambo, J.D.T.; Muslim, A.M. 2—Remote Sensing for Mineral Exploration. In Geospatial Analysis Applied to Mineral Exploration; Pour, A.B., Parsa, M., Eldosouky, A.M., Eds.; Elsevier: Amsterdam, The Netherlands, 2023; pp. 17–149. [Google Scholar] [CrossRef]
Pu, R. Hyperspectral Remote Sensing: Fundamentals and Practices; CRC Press: Boca Raton, FL, USA, 2017. [Google Scholar] [CrossRef]
Chabrillat, S.; Eisele, A.; Guillaso, S.; Rogaß, C.; Ben-Dor, E.; Kaufmann, H. HYSOMA: An Easy-to-Use Software Interface for Soil Mapping Applications of Hyperspectral Imagery. In Proceedings of the 7th EARSeL SIG Imaging Spectroscopy Workshop, Edinburgh, UK, 11–13 April 2011. [Google Scholar]
Ben-Dor, E. Quantitative Remote Sensing of Soil Properties. In Advances in Agronomy; Elsevier: Amsterdam, The Netherlands, 2002; Volume 75, pp. 173–243. [Google Scholar] [CrossRef]
Ben-Dor, E.; Irons, J.R.; Epema, G.F. Soil Reflectance. In Remote Sensing for the Earth Sciences: Manual of Remote Sensing; Rencz, A.N., Ed.; John Wiley & Sons: New York, NY, USA, 1999; pp. 111–188. [Google Scholar]
Chabrillat, S.; Ben-Dor, E.; Cierniewski, J.; Gomez, C.; Schmid, T.; van Wesemael, B. Imaging Spectroscopy for Soil Mapping and Monitoring. Surv. Geophys. 2019, 40, 361–399. [Google Scholar] [CrossRef]
Ben-Dor, E.; Inbar, Y.; Chen, Y. The Reflectance Spectra of Organic Matter in the Visible Near-Infrared and Short Wave Infrared Region (400–2500 nm) during a Controlled Decomposition Process. Remote Sens. Environ. 1997, 61, 1–15. [Google Scholar] [CrossRef]
Gholizadeh, A.; Saberioon, M.; Ben-Dor, E.; Borůvka, L. Monitoring of Selected Soil Contaminants Using Proximal and Remote Sensing Techniques: Background, State-of-the-Art and Future Perspectives. Crit. Rev. Environ. Sci. Technol. 2018, 48, 243–278. [Google Scholar] [CrossRef]
Resmini, R.G.; Kappus, M.E.; Aldrich, W.S.; Harsanyi, J.C.; Anderson, M. Mineral Mapping with HYperspectral Digital Imagery Collection Experiment (HYDICE) Sensor Data at Cuprite, Nevada, USA. Int. J. Remote Sens. 1997, 18, 1553–1570. [Google Scholar] [CrossRef]
Weidong, L.; Baret, F.; Xingfa, G.; Qingxi, T.; Lanfen, Z.; Bing, Z. Relating Soil Surface Moisture to Reflectance. Remote Sens. Environ. 2002, 81, 238–246. [Google Scholar] [CrossRef]
Omran, E.-S.E. Inference Model to Predict Heavy Metals of Bahr El Baqar Soils, Egypt Using Spectroscopy and Chemometrics Technique. Model. Earth Syst. Environ. 2016, 2, 1–17. [Google Scholar] [CrossRef]
Francos, N.; Gholizadeh, A.; Ben-Dor, E. Spatial Distribution of Lead (Pb) in Soil: A Case Study in a Contaminated Area of the Czech Republic. Geomat. Nat. Hazards Risk 2022, 13, 610–620. [Google Scholar] [CrossRef]
Bian, Z.; Sun, L.; Tian, K.; Liu, B.; Zhang, X.; Mao, Z.; Huang, B.; Wu, L. Estimation of Heavy Metals in Tailings and Soils Using Hyperspectral Technology: A Case Study in a Tin-Polymetallic Mining Area. Bull. Environ. Contam. Toxicol. 2021, 107, 1022–1031. [Google Scholar] [CrossRef]
Zhang, B.; Guo, B.; Zou, B.; Wei, W.; Lei, Y.; Li, T. Retrieving Soil Heavy Metals Concentrations Based on GaoFen-5 Hyperspectral Satellite Image at an Opencast Coal Mine, Inner Mongolia, China. Environ. Pollut. 2022, 300, 118981. [Google Scholar] [CrossRef]
Fu, P.; Zhang, J.; Yuan, Z.; Feng, J.; Zhang, Y.; Meng, F.; Zhou, S. Estimating the Heavy Metal Contents in Entisols from a Mining Area Based on Improved Spectral Indices and Catboost. Sensors 2024, 24, 1492. [Google Scholar] [CrossRef]
Sun, W.; Liu, S.; Zhang, X.; Zhu, H. Performance of Hyperspectral Data in Predicting and Mapping Zinc Concentration in Soil. Sci. Total Environ. 2022, 824, 153766. [Google Scholar] [CrossRef]
Zhang, Z.-H.; Guo, F.; Xu, Z.; Yang, X.-Y.; Wu, K.-Z. On Retrieving the Chromium and Zinc Concentrations in the Arable Soil by the Hyperspectral Reflectance Based on the Deep Forest. Ecol. Indic. 2022, 144, 109440. [Google Scholar] [CrossRef]
Liu, Z.; Lu, Y.; Peng, Y.; Zhao, L.; Wang, G.; Hu, Y. Estimation of Soil Heavy Metal Content Using Hyperspectral Data. Remote Sens. 2019, 11, 1464. [Google Scholar] [CrossRef]
Cui, S.; Zhou, K.; Ding, R.; Cheng, Y.; Jiang, G. Estimation of Soil Copper Content Based on Fractional-Order Derivative Spectroscopy and Spectral Characteristic Band Selection. Spectrochim. Acta. A. Mol. Biomol. Spectrosc. 2022, 275, 121190. [Google Scholar] [CrossRef] [PubMed]
Koerting, F.; Koellner, N.; Mielke, C.; Rogass, C.; Kuras, A.; Altenberger, U.; Kaestner, F.; Hildebrand, C. Hyperspectral Imaging Data of the Northern Mine Face and of Laboratory Samples of the Copper-Gold-Pyrite Mine Apliki, Nicosia District, Republic of Cyprus. GFZ Data Serv. 2021. [Google Scholar] [CrossRef]
Jiang, G.; Zhou, S.; Cui, S.; Chen, T.; Wang, J.; Chen, X.; Liao, S.; Zhou, K. Exploring the Potential of HySpex Hyperspectral Imagery for Extraction of Copper Content. Sensors 2020, 20, 6325. [Google Scholar] [CrossRef] [PubMed]
Wang, Y.; Zou, B.; Chai, L.; Lin, Z.; Feng, H.; Tang, Y.; Tian, R.; Tu, Y.; Zhang, B.; Zou, H. Monitoring of Soil Heavy Metals Based on Hyperspectral Remote Sensing: A Review. Earth-Sci. Rev. 2024, 254, 104814. [Google Scholar] [CrossRef]
Vapnik, V.; Izmailov, R. Learning Using Privileged Information: Similarity Control and Knowledge Transfer. J. Mach. Learn. Res. 2015, 16, 2023–2049. [Google Scholar]
Lapin, M.; Hein, M.; Schiele, B. Learning Using Privileged Information: SVM+ and Weighted SVM. Neural Netw. 2014, 53, 95–108. [Google Scholar] [CrossRef]
Serra-Toro, C.; Traver, V.J.; Pla, F. Exploring Some Practical Issues of SVM+: Is Really Privileged Information That Helps? Pattern Recognit. Lett. 2014, 42, 40–46. [Google Scholar] [CrossRef]
Li, X.; Du, B.; Xu, C.; Zhang, Y.; Zhang, L.; Tao, D. Robust Learning with Imperfect Privileged Information. Artif. Intell. 2020, 282, 103246. [Google Scholar] [CrossRef]
Moradi, M.; Syeda-Mahmood, T.; Hor, S. Tree-Based Transforms for Privileged Learning. In Machine Learning in Medical Imaging; Wang, L., Adeli, E., Wang, Q., Shi, Y., Suk, H.-I., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 188–195. [Google Scholar] [CrossRef]
Wang, Y.; Zou, B.; Li, S.; Tian, R.; Zhang, B.; Feng, H.; Tang, Y. A Hierarchical Residual Correction-Based Hyperspectral Inversion Method for Soil Heavy Metals Considering Spatial Heterogeneity. J. Hazard. Mater. 2024, 479, 135699. [Google Scholar] [CrossRef]
Grilc, V.; Husić, M. Nastajanje in Ravnanje z Industrijskimi Odpadki v Mestni Občini Celje. In Onesnaženost Okolja in Naravni Viri Kot Omejitveni Dejavnik Razvoja v Sloveniji—Celjska Kotlina Kot Modelni Pristop Za Degradirana Območja; Inštitut Za Okolje in Prostor: Celje, Slovenia, 2013. [Google Scholar]
Brümmer, G.; Herms, U. Influence of Soil Reaction and Organic Matter on the Solubility of Heavy Metals in Soils. In Effects of Accumulation of Air Pollutants in Forest Ecosystems, Proceedings of a Workshop, Göttingen, Germany, 16–18 May 1982; Ulrich, B., Pankrath, J., Eds.; Springer Netherlands: Dordrecht, The Netherlands, 1983; pp. 233–243. [Google Scholar] [CrossRef]
Evans, L.J. Chemistry of Metal Retention by Soils. Environ. Sci. Technol. 1989, 23, 1046–1056. [Google Scholar] [CrossRef]
MKGP. Evidenca Dejanske Rabe Kmetijskih in Gozdnih Zemljišč. Available online: https://rkg.gov.si/vstop/ (accessed on 10 April 2025).
Pevec, T. Setev Koruze. Kmetijsko Gozdarska Zbornica Slovenije, Zavod Celje. Available online: https://www.kgzs.si/uploads/kgzs_-_zavod_ce/travnistvo_in_pasnistvo/setev_koruze.doc (accessed on 12 January 2023).
Škerbot, I. S Setvijo Jarih Žit Ne Odlašamo. Kmetijsko Gozdarska Zbornica Slovenije, Zavod Celje. Available online: https://www.kmetijskizavod-celje.si/aktualno/s-setvijo-jarih-zit-ne-odlasamo-2022-02-25 (accessed on 12 January 2023).
MKGP. Sloj Kmetijskih Rastlin Iz Zbirnih Vlog (KMRS). 2019. Available online: https://podatki.gov.si/dataset/sloj-kmetijskih-rastlin-iz-zbirnih-vlog-kmrs (accessed on 31 May 2019).
Yfantis, E.A.; Flatman, G.T.; Behar, J.V. Efficiency of Kriging Estimation for Square, Triangular, and Hexagonal Grids. Math. Geol. 1987, 19, 183–205. [Google Scholar] [CrossRef]
ISO 18400-105:2017; Soil Quality—Sampling—Part 105: Packaging, Transport, Storage and Preservation of Samples. International Organization for Standardization: Geneva, Switzerland, 2017.
SIST ISO 11464:2008; Soil Quality—Pretreatment of Samples for Physico-Chemical Analyses. Slovenian Institute for Standardization: Ljubljana, Slovenia, 2008.
SIST EN ISO 11272:2020; Soil Quality—Determination of Dry Bulk Density. Slovenian Institute for Standardization: Ljubljana, Slovenia, 2020.
ISO 11465:1993; Soil Quality—Determination of Dry Matter and Water Content on a Mass Basis—Gravimetric Method. International Organization for Standardization: Geneva, Switzerland, 1993.
SIST ISO 10390:2005; Soil Quality—Determination of pH. Slovenian Institute for Standardization: Ljubljana, Slovenia, 2005.
SIST ISO 10694:2005; Soil Quality—Determination of Organic and Total Carbon After Dry Combustion (Elementary Analysis). Slovenian Institute for Standardization: Ljubljana, Slovenia, 2005.
SIST ISO 13878:2005; Soil Quality—Determination of Total Nitrogen Content by Dry Combustion (“Elemental Analysis”). Slovenian Institute for Standardization: Ljubljana, Slovenia, 2005.
SIST ISO 10693:1995; Soil Quality—Determination of Calcium Carbonate Content—Volumetric Method. Slovenian Institute for Standardization: Ljubljana, Slovenia, 2005.
ÖNORM L 1087:2012; Chemical Analyses of Soils—Determination of "Plant-Available" Phosphorus and Potassium by the Calcium-Acetate-Lactate (CAL) Method. Austrian Standards Institute: Vienna, Austria, 2012.
Jenko, T. Variabilnost Osnovnih Pedoloških Lastnosti Na Izbranih Njivskih Površinah. Bachelor’s Thesis, Univerza v Ljubljani, Biotehniška Fakulteta (samozaložba T. Jenko), Ljubljana, Slovenia, 2022. [Google Scholar]
Grčman, H.; Zupan, M. Praktična Pedologija; Biotehniška fakulteta, Center za pedologija in varstvo okolja: Ljubljana, Slovenia, 2010. [Google Scholar]
SIST ISO 12914:2019; Soil Quality—Microwave-Assisted Extraction of the Aqua Regia Soluble Fraction for the Determination of Elements. Slovenian Institute for Standardization: Ljubljana, Slovenia, 2019.
ISO 22036:2024; Environmental Solid Matrices—Determination of Elements Using Inductively Coupled Plasma Optical Emission Spectrometry (ICP-OES). International Organization for Standardization: Geneva, Switzerland, 2024.
Uredba o Merilih Za Ugotavljanje Stopnje Obremenjenosti Okolja Zaradi Onesnaženosti Tal z Nevarnimi Snovmi. Available online: http://pisrs.si (accessed on 10 January 2023).
Gholizadeh, A.; Saberioon, M.; Carmon, N.; Boruvka, L.; Ben-Dor, E. Examining the Performance of PARACUDA-II Data-Mining Engine versus Selected Techniques to Model Soil Carbon from Reflectance Spectra. Remote Sens. 2018, 10, 1172. [Google Scholar] [CrossRef]
Stoner, E.R.; Baumgardner, M.F.; Weismiller, R.A.; Biehl, L.L.; Robinson, B.F. Extension of Laboratory-Measured Soil Spectra to Field Conditions. Soil Sci. Soc. Am. J. 1980, 44, 572–574. [Google Scholar] [CrossRef]
Chabrillat, S.; Gholizadeh, A.; Neumann, C.; Berger, D.; Milewski, R.; Ogen, Y.; Ben-Dor, E. Preparing a Soil Spectral Library Using the Internal Soil Standard (ISS) Method: Influence of Extreme Different Humidity Laboratory Conditions. Geoderma 2019, 355, 113855. [Google Scholar] [CrossRef]
SphereOptics. NEO HySpex Hyperspectral Cameras. 2015. Available online: http://sphereoptics.de/wp-content/uploads/2015/01/NEO-HySpex-Hyperspectral-Cameras.pdf (accessed on 10 April 2025).
HySpex. HySpex SWIR-384: High-Resolution SWIR Hyperspectral Camera. Available online: https://www.hyspex.com/hyspex-products/hyspex-classic/hyspex-swir-384/ (accessed on 10 April 2025).
ReSe. PARGE Airborne Image Rectification. Available online: https://www.rese-apps.com/software/parge/index.html (accessed on 15 March 2025).
GURS. Portal CLSS: Pregledovalnik Podatkov Cikličnega Laserskega Skeniranja Slovenije (Geodetska uprava Republike Slovenije). Available online: https://clss.si/ (accessed on 15 March 2025).
ReSe. ATCOR for Airborne Remote Sensing. Available online: https://www.rese-apps.com/software/atcor-4-airborne/index.html (accessed on 17 March 2025).
ARSO. meteo.si—Uradna Vremenska Napoved za Slovenijo—Državna Meteorološka Služba RS—Vreme Podrobneje. Available online: https://meteo.arso.gov.si/met/sl/app/webmet/#webmet==8Sdwx2bhR2cv0WZ0V2bvEGcw9ydlJWblR3LwVnaz9SYtVmYh9iclFGbt9SaulGdugXbsx3cs9mdl5WahxXYyNGapZXZ8tHZv1WYp5mOnMHbvZXZulWYnwCchJXYtVGdlJnOn0UQQdSf (accessed on 17 March 2025).
Press, W.H. Numerical Recipes 3rd Edition: The Art of Scientific Computing; Cambridge University Press: Cambridge, UK, 2007. [Google Scholar]
Alduchov, O.A.; Eskridge, R.E. Improved Magnus Form Approximation of Saturation Vapor Pressure. J. Appl. Meteorol. Climatol. 1996, 35, 601–609. [Google Scholar] [CrossRef]
Barry, R.G.; Chorley, R. Atmosphere, Weather and Climate, 8th ed.; Routledge: London, UK, 2004. [Google Scholar] [CrossRef]
Salby, M.L. Fundamentals of Atmospheric Physics; International Geophysics Series; Academic Press: San Diego, CA, USA, 1996. [Google Scholar]
MetPy. precipitable_water—MetPy 1.6. Available online: https://unidata.github.io/MetPy/latest/api/generated/metpy.calc.precipitable_water.html (accessed on 21 March 2025).
Vane, G.; Goetz, A.F.H. Terrestrial Imaging Spectroscopy. Remote Sens. Environ. 1988, 24, 1–29. [Google Scholar] [CrossRef]
Clark, R.N.; Swayze, G.A.; Livo, K.E.; Kokaly, R.F.; King, T.V.V.; Dalton, J.B.; Vance, J.S.; Rockwell, B.W.; Hoefen, T.; McDougal, R.R. Surface Reflectance Calibration of Terrestrial Imaging Spectroscopy Data: A Tutorial Using AVIRIS. In Proceedings of the 10th Airborne Earth Science Workshop; Jet Propulsion Laboratory: Pasadena, CA, USA, 2002. [Google Scholar]
Qiao, X.-X.; Wang, C.; Feng, M.-C.; Yang, W.-D.; Ding, G.-W.; Sun, H.; Liang, Z.-Y.; Shi, C.-C. Hyperspectral Estimation of Soil Organic Matter Based on Different Spectral Preprocessing Techniques. Spectrosc. Lett. 2017, 50, 156–163. [Google Scholar] [CrossRef]
Savitzky, A.; Golay, M.J.E. Smoothing and Differentiation of Data by Simplified Least Squares Procedures. Anal. Chem. 1964, 36, 1627–1639. [Google Scholar] [CrossRef]
Ren, H.-Y.; Zhuang, D.-F.; Singh, A.N.; Pan, J.-J.; Qiu, D.-S.; Shi, R.-H. Estimation of As and Cu Contamination in Agricultural Soils Around a Mining Area by Reflectance Spectroscopy: A Case Study. Pedosphere 2009, 19, 719–726. [Google Scholar] [CrossRef]
Song, Y.; Li, F.; Yang, Z.; Ayoko, G.A.; Frost, R.L.; Ji, J. Diffuse Reflectance Spectroscopy for Monitoring Potentially Toxic Elements in the Agricultural Soils of Changjiang River Delta, China. Appl. Clay Sci. 2012, 64, 75–83. [Google Scholar] [CrossRef]
Smith, G.M.; Milton, E.J. The Use of the Empirical Line Method to Calibrate Remotely Sensed Data to Reflectance. Int. J. Remote Sens. 1999, 20, 2653–2662. [Google Scholar] [CrossRef]
Ben-Dor, E.; Chabrillat, S.; Demattê, J.A.M.; Taylor, G.R.; Hill, J.; Whiting, M.L.; Sommer, S. Using Imaging Spectroscopy to Study Soil Properties. Remote Sens. Environ. 2009, 113, S38–S55. [Google Scholar] [CrossRef]
Barnes, R.J.; Dhanoa, M.S.; Lister, S.J. Standard Normal Variate Transformation and De-Trending of Near-Infrared Diffuse Reflectance Spectra. Appl. Spectrosc. 1989, 43, 772–777. [Google Scholar] [CrossRef]
Burger, J.; Geladi, P. Hyperspectral NIR Image Regression Part I: Calibration and Correction. J. Chemom. 2005, 19, 355–363. [Google Scholar] [CrossRef]
Demetriades-Shah, T.H.; Steven, M.D.; Clark, J.A. High Resolution Derivative Spectra in Remote Sensing. Remote Sens. Environ. 1990, 33, 55–64. [Google Scholar] [CrossRef]
Clark, R.N.; Roush, T.L. Reflectance Spectroscopy: Quantitative Analysis Techniques for Remote Sensing Applications. J. Geophys. Res. Solid Earth 1984, 89, 6329–6340. [Google Scholar] [CrossRef]
Schölkopf, B.; Smola, A.; Müller, K.-R. Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Comput. 1998, 10, 1299–1319. [Google Scholar] [CrossRef]
Williams, C.; Seeger, M. Using the Nyström Method to Speed Up Kernel Machines. In Advances in Neural Information Processing Systems; Leen, T., Dietterich, T., Tresp, V., Eds.; MIT Press: Cambridge, MA, USA, 2000; Volume 13. [Google Scholar]
Gosar, M.; Šajn, R.; Bavec, Š.; Gaberšek, M.; Pezdir, V.; Miler, M. Geochemical Background and Threshold for 47 Chemical Elements in Slovenian Topsoil. Geologija 2019, 62, 7–59. [Google Scholar] [CrossRef]
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
Strobl, C.; Boulesteix, A.-L.; Kneib, T.; Augustin, T.; Zeileis, A. Conditional Variable Importance for Random Forests. BMC Bioinformatics 2008, 9, 307. [Google Scholar] [CrossRef]
Nicodemus, K.K. Letter to the Editor: On the Stability and Ranking of Predictors from Random Forest Variable Importance Measures. Brief. Bioinform. 2011, 12, 369–373. [Google Scholar] [CrossRef] [PubMed]
Biau, G.; Scornet, E. A Random Forest Guided Tour. arXiv 2015, arXiv:1511.05741. [Google Scholar] [CrossRef]
Altmann, A.; Toloşi, L.; Sander, O.; Lengauer, T. Permutation Importance: A Corrected Feature Importance Measure. Bioinformatics 2010, 26, 1340–1347. [Google Scholar] [CrossRef]
Shmueli, G. To Explain or to Predict? Stat. Sci. 2010, 25, 289–310. [Google Scholar] [CrossRef]
Barten, A.P. The Coefficient of Determination for Regression without a Constant Term. In The Practice of Econometrics: Studies on Demand, Forecasting, Money and Income; Heijmans, R., Neudecker, H., Eds.; Springer Netherlands: Dordrecht, The Netherlands, 1987; pp. 181–189. [Google Scholar] [CrossRef]
Colin Cameron, A.; Windmeijer, F.A.G. An R-Squared Measure of Goodness of Fit for Some Common Nonlinear Regression Models. J. Econom. 1997, 77, 329–342. [Google Scholar] [CrossRef]
Smieja-Król, B.; Pawlyta, M.; Gałka, M. Ultrafine Multi-Metal (Zn, Cd, Pb) Sulfide Aggregates Formation in Periodically Water-Logged Organic Soil. Sci. Total Environ. 2022, 820, 153308. [Google Scholar] [CrossRef]
Sun, Q.; Yang, H.; Feng, X.; Liang, Y.; Gao, P.; Song, Y. Synchronous Stabilization of Pb, Zn, Cd, and As in Lead Smelting Slag by Industrial Solid Waste. Chemosphere 2023, 339, 139755. [Google Scholar] [CrossRef]
Smieja-Król, B.; Pawlyta, M.; Kądziołka-Gaweł, M.; Fiałkiewicz-Kozieł, B. Formation of Zn and Pb Sulfides in a Redox-Sensitive Modern System Due to High Atmospheric Fallout. Geochim. Cosmochim. Acta 2022, 318, 126–143. [Google Scholar] [CrossRef]

Figure 1. The study area with full hyperspectral data coverage (RGB: 2200 nm, 850 nm, 660 nm) overlaid on aerial imagery. The right panel shows a data cube visualization (RGB: 850 nm, 660 nm, 560 nm) of a representative portion of the scene, included to illustrate the high spectral detail of the hyperspectral data.

Figure 2. (a) An example of a soil sampling site, with highlighted triangle vertices indicating points projected onto the ground; (b) an example of sampling point distribution within a field.

Figure 3. Spectral transformations of HSI.

Figure 4. Frequency of auxiliary variables in the 10% best-performing models based on RMSE [ppm].

Figure 5. Prediction error distributions for Zn models across verification window sizes: ICP-OES (a–c) and pXRF (d–f), comparing RF and privileged RF (PRF) models.

Figure 6. Prediction error distributions for Pb models across verification window sizes: ICP-OES (a–c) and pXRF (d–f), comparing RF and privileged RF (PRF) models.

Figure 7. Prediction error distributions for Cd models across verification window sizes: ICP-OES (a–c) and pXRF (d–f), comparing RF and privileged RF (PRF) models.

Figure 8. Predicted Zn concentrations [ppm] across the entire study area. All agricultural fields are shown, with two validation fields highlighted where ground-truth samples were collected.

Figure 9. Predicted Cd concentrations [ppm] across the entire study area. All agricultural fields are shown, with two validation fields highlighted where ground-truth samples were collected.

Table 1. Heavy metal concentrations measured with pXRF in samples used for model training.

Field ID and Number of Samples [tot. 97]	Depth	Zn	Pb	Cd	Cu	Ni	As
	cm	[ppm]	[ppm]	[ppm]	[ppm]	[ppm]	[ppm]
P [6]	0–5	212.6	74.2	BDL *	16.0	15.6	12.1
H1 [10]	0–5	220.0	74.3	BDL *	20.6	20.6	14.4
H3 [18]	0–5	226.7	69.2	BDL *	BDL *	15.0	13.7
M1 [4]	0–5	205.2	72.5	BDL *	21.4	15.0	12.7
M2 [3]	0–5	178.7	57.7	BDL *	20.8	15.0	12.0
K [14]	0–5	379.8	114.1	BDL *	31.1	17.1	17.2
T [21]	0–5	645.9	121.0	5.1	24.8	15.0	16.5
Z [17]	0–5	768.0	122.8	7.6	27.6	15.0	14.0
Threshold value [2]		200.0	85.0	1.0	60.0	50.0	20.0
Warning value [2]		300.0	100.0	2.0	100.0	70.0	30.0
Critical value [2]		720.0	530.0	12.0	300.0	210.0	55.0
Slovenia-median [3,7]	0–5	99.0	42.0	0.6	26.3	29.2	10.2

* BDL: below detection limit.

Table 2. Heavy metal concentrations measured with ICP-OES in samples used for model training.

Field ID and Number of Samples [tot. 97]	Depth	Zn	Pb	Cd	Cu	Ni	As
	cm	[ppm]	[ppm]	[ppm]	[ppm]	[ppm]	[ppm]
P [6]	0–5	178.0	68.0	BDL *	19.5	24.9	1.6
H1 [10]	0–5	172.2	58.4	BDL *	19.1	25.1	15.1
H3 [18]	0–5	193.1	51.3	BDL *	20.2	25.9	6.8
M1 [4]	0–5	155.5	58.0	BDL *	24.5	24.9	8.8
M2 [3]	0–5	116.0	44.3	BDL *	25.3	22.0	13.6
K [14]	0–5	298.4	92.0	2.3	29.2	28.7	24.1
T [21]	0–5	562.4	92.5	4.7	21.2	21.5	4.8
Z [17]	0–5	565.5	83.9	6.2	23.5	25.2	12.6
Threshold value [2]		200.0	85.0	1.0	60.0	50.0	20.0
Warning value [2]		300.0	100.0	2.0	100.0	70.0	30.0
Critical value [2]		720.0	530.0	12.0	300.0	210.0	55.0
Slovenia-median [3,7]	0–5	99.0	42.0	0.6	26.3	29.2	10.2

* BDL: below detection limit.

Table 3. Conditional permutation results for Zn: best auxiliary variable combinations and corresponding model performance across spectral preprocessing methods.

		Model with Auxiliary Variables				Direct Model
Spectrum	Dataset	Best Variable Combination	R²	RMSE [ppm]	Rel. RMSE [%]	R²	RMSE [ppm]	Rel. RMSE [%]
Savitzky–Golay	sub. pXRF	Cu, Pb	0.79	119.3	45.6	0.63	157.7	60.2
Savitzky–Golay	samples pXRF	As, Cd, Cu, Hg, Ni, Pb	0.69	133.8	51.7	0.49	176.5	68.2
Savitzky–Golay	samples ICP-OES	CaCO₃, Cd, Pb	0.63	117.0	56.7	0.43	148.5	72.0
baseline corr.	sub. pXRF	Cd, Cu, Pb	0.83	108.2	41.3	0.71	139.9	53.4
baseline corr.	samples pXRF	As, Cd, Cu, Hg, Ni, Pb	0.73	129.4	50.1	0.49	166.8	64.5
baseline corr.	samples ICP-OES	Cd, Cu, Ni, organic matter, Pb, P₂O₅	0.62	122.7	59.5	0.35	150.8	73.1
first derivative	sub. pXRF	As, CaCO₃, Cd, Cu, Ni, Pb	0.91	79.5	30.4	0.83	105.6	40.4
first derivative	samples pXRF	As, CaCO₃, Cd, Cu, Hg, Pb	0.78	104.2	40.3	0.68	134.2	51.9
first derivative	samples ICP-OES	As, CaCO₃, Cd, Cu, Ni, Pb	0.73	101.4	49.2	0.62	115.9	56.2
sec. derivative	sub. pXRF	As, Cd, Cu, Hg, Ni, Pb	0.90	83.7	32.0	0.80	114.6	43.8
sec. derivative	samples pXRF	As, CaCO₃, Cd, Cu, Ni, Pb	0.84	99.0	38.3	0.69	133.0	51.4
sec. derivative	samples ICP-OES	As, CaCO₃, Cd, Pb	0.72	97.3	47.2	0.60	116.4	56.5
continuum rem.	sub. pXRF	Cd, Cu, Ni, Pb	0.81	113.2	43.3	0.70	141.0	53.9
continuum rem.	samples pXRF	As, Cd, Cu, Hg, Ni, Pb	0.65	130.0	50.3	0.43	173.1	67.0
continuum rem.	samples ICP-OES	As, CaCO₃, Cd, Cu, Ni, Pb	0.57	116.3	56.4	0.35	144.0	69.8
PCA 20	sub. pXRF	As, K₂O, organic matter, Pb, P₂O₅, pH	0.93	68.1	26.0	0.73	134.8	51.5
PCA 20	samples pXRF	As, Cd, K₂O, organic matter, Pb, P₂O₅, pH	0.87	88.4	34.2	0.37	190.4	73.6
PCA 20	samples ICP-OES	Cd, Cu, Ni, K₂O, organic matter, Pb, P₂O₅, pH	0.80	88.8	43.1	0.35	152.7	74.1
PCA 50	sub. pXRF	As, Cd, Cu, K₂O, organic matter, Pb, P₂O₅, pH	0.87	94.2	36.0	0.60	164.1	62.7
PCA 50	samples pXRF	As, Cd, Cu, K₂O, organic matter, Pb, P₂O₅, pH	0.78	117.9	45.6	0.24	208.8	80.8
PCA 50	samples ICP-OES	Cd, Pb, organic matter, P₂O₅	0.69	110.6	53.6	0.28	165.9	80.5
KPCA 20	sub. pXRF	As, bulk density, Cd, Cu, Pb, pH, organic matter, P₂O₅	0.91	74.9	28.6	0.63	154.0	58.8
KPCA 20	samples pXRF	As, Cd, Cu, Pb, P₂O₅, pH	0.89	83.4	32.2	0.56	160.0	61.9
KPCA 20	samples ICP-OES	As, Cd, organic matter, Pb, P₂O₅, pH	0.79	84.3	40.9	0.37	143.2	69.4
KPCA 50	sub. pXRF	As, bulk density, CaCO₃, Cd, Cu, Ni, Pb, Hg, pH, P₂O₅	0.88	88.6	33.8	0.63	156.1	59.6
KPCA 50	samples pXRF	As, Cd, Cu, K₂O, organic matter, Pb, P₂O₅, pH	0.83	102.6	39.7	0.43	175.9	68.0
KPCA 50	samples ICP-OES	CaCO₃, Cd, Cu, organic matter, Pb, P₂O₅	0.71	99.4	48.2	0.52	133.5	64.7

Table 4. Conditional permutation results for Pb: best auxiliary variable combinations and corresponding model performance across spectral preprocessing methods.

		Model with Auxiliary Variables				Direct Model
Spectrum	Dataset	Best Variable Combination	R²	RMSE [ppm]	Rel. RMSE [%]	R²	RMSE [ppm]	Rel. RMSE [%]
Savitzky–Golay	sub. pXRF	As, Cd, Cu, Ni, Zn, Hg	0.56	28.4	65.6	0.18	38.6	89.2
Savitzky–Golay	samples pXRF	As, Cd, Cu, Hg, Ni, Zn	0.39	29.4	69.2	0.01	41.0	96.4
Savitzky–Golay	samples ICP-OES	As, CaCO₃, Cd, Cu, Ni, Zn	−0.02	28.6	83.5	−0.05	33.8	98.5
baseline corr.	sub. pXRF	As, Cd, Cu, Ni, Zn, Hg	0.59	27.6	63.7	0.41	32.4	74.8
baseline corr.	samples pXRF	As, Cd, Cu, Hg, Ni, Zn	0.51	28.5	67.0	0.11	36.7	86.3
baseline corr.	samples ICP-OES	Cu, Ni, Zn	0.34	26.2	76.5	0.16	30.2	88.2
first derivative	sub. pXRF	As, CaCO₃, Cd, Cu, Hg, Zn	0.79	19.4	44.9	0.59	27.2	62.8
first derivative	samples pXRF	As, CaCO₃, Cd, Hg, Ni, Zn	0.33	29.0	68.1	0.32	32.4	76.2
first derivative	samples ICP-OES	As, CaCO₃, Cd, Cu, Ni, Zn	0.26	27.6	80.6	0.19	29.4	85.9
sec. derivative	sub. pXRF	As, Cd, Cu, Hg, Ni, Zn	0.78	20.1	46.5	0.56	28.5	65.8
sec. derivative	samples pXRF	As, CaCO₃, Cu, Hg, Ni, Zn	0.47	29.7	69.9	0.23	35.0	82.3
sec. derivative	samples ICP-OES	As, CaCO₃, Cd, Cu, Ni, Zn	0.16	28.0	81.6	0.08	31.4	91.8
continuum rem.	sub. pXRF	Cd, Cu, Ni, Zn	0.61	26.8	61.9	0.35	34.2	79.0
continuum rem.	samples pXRF	As, Cd, Cu, Hg, Ni, Zn	0.28	30.0	70.4	0.25	35.6	83.7
continuum rem.	samples ICP-OES	As, CaCO₃, Cd, Cu, Ni, Zn	0.13	27.8	81.0	0.02	32.3	94.3
PCA 20	sub. pXRF	As, CaCO₃, Cd, Hg, Zn	0.88	14.9	34.4	0.53	29.3	67.6
PCA 20	samples pXRF	As, Cd, Cu, Hg, Zn	0.76	20.3	47.6	0.27	35.4	83.1
PCA 20	samples ICP-OES	As, CaCO₃, Cd, Cu, Zn	0.39	25.2	73.5	0.10	31.3	91.3
PCA 50	sub. pXRF	As, Cd, Cu, Hg, Zn	0.80	19.2	44.3	0.42	32.7	75.5
PCA 50	samples pXRF	As, Cd, Cu, Zn	0.64	24.7	57.9	0.19	36.2	85.0
PCA 50	samples ICP-OES	Cd, Cu, Ni, Zn	0.35	26.2	76.5	0.02	32.4	94.4
KPCA 20	sub. pXRF	As, Cu, Hg, CaCO₃, organic matter, pH, P₂O₅, K₂O, Zn	0.86	16.3	37.6	0.26	36.6	84.6
KPCA 20	samples pXRF	As, Cd, Cu, Hg, Zn	0.79	18.7	43.9	0.09	38.3	90.1
KPCA 20	samples ICP-OES	As, CaCO₃, Cd, Cu, Zn	0.50	22.3	65.1	0.07	31.5	92.0
KPCA 50	sub. pXRF	As, Cd, Cu, CaCO₃, organic matter, pH, P₂O₅, Zn	0.78	20.0	46.2	0.32	35.1	81.1
KPCA 50	samples pXRF	As, CaCO₃, Cu, Hg, Ni, Zn	0.66	23.7	55.8	0.11	37.8	88.8
KPCA 50	samples ICP-OES	CaCO₃, Cd, Cu, Ni, Zn	0.38	24.6	71.7	0.27	28.3	82.7

Table 5. Conditional permutation results for Cd: best auxiliary variable combinations and corresponding model performance across spectral preprocessing methods.

		Model with Auxiliary Variables				Direct Model
Spectrum	Dataset	Best Variable Combination	R²	RMSE [ppm]	Rel. RMSE [%]	R²	RMSE [ppm]	Rel. RMSE [%]
Savitzky–Golay	sub. pXRF	CaCO₃, organic matter, Pb, pH, P₂O₅, Zn	0.51	2.6	69.1	0.45	2.8	72.7
Savitzky–Golay	samples pXRF	As, Cu, Hg, Ni, Pb, Zn	0.46	2.3	69.2	0.43	2.5	73.8
Savitzky–Golay	samples ICP-OES	CaCO₃, Cu, Ni, Pb, P₂O₅, Zn	0.74	1.3	48.6	0.64	1.6	57.3
baseline corr.	sub. pXRF	bulk density, Cu, Pb, Hg, K₂O, pH, P₂O₅, Zn	0.42	2.9	76.8	0.39	2.9	76.8
baseline corr.	samples pXRF	As, Cu, Hg, Ni, Pb, Zn	0.32	2.3	68.9	0.50	2.3	67.8
baseline corr.	samples ICP-OES	As, CaCO₃, Cu, Ni, Pb, Zn	0.67	1.5	54.3	0.54	1.7	62.1
first derivative	sub. pXRF	bulk density, CaCO₃, Cu, K₂O, organic matter, P₂O₅, Pb, pH, Zn	0.55	2.5	65.5	0.49	2.7	69.5
first derivative	samples pXRF	CaCO₃, Cu, K₂O	0.65	1.9	56.7	0.63	1.9	57.4
first derivative	samples ICP-OES	CaCO₃, Pb, Zn	0.88	0.9	32.5	0.81	1.1	39.2
sec. derivative	sub. pXRF	bulk density, Cu, Hg, Ni, Zn, K₂O, P₂O₅, pH	0.51	2.6	68.9	0.49	2.7	70.2
sec. derivative	samples pXRF	As, bulk density, CaCO₃, Cu, Hg, K₂O, Ni, organic matter, Pb, pH, P₂O₅, Zn	0.33	2.0	60.3	0.62	1.9	58.1
sec. derivative	samples ICP-OES	As, CaCO₃, Cu, Ni, Pb, Zn	0.90	0.8	29.6	0.86	1.0	35.5
continuum rem.	sub. pXRF	As, bulk density, CaCO₃, Cu, Hg, organic matter, pH, P₂O₅, Zn	0.46	2.8	72.1	0.31	3.1	81.7
continuum rem.	samples pXRF	As, CaCO₃, Cu, Hg, K₂O, organic matter, P₂O₅, Zn	0.53	2.2	64.3	0.44	2.3	69.8
continuum rem.	samples ICP-OES	Cu, Ni, Pb, Zn	0.59	1.3	49.1	0.51	1.6	56.5
PCA 20	sub. pXRF	As, CaCO₃, Cu, Hg, K₂O, organic matter, Pb, pH, P₂O₅, Zn	0.55	2.5	65.7	0.46	2.8	72.6
PCA 20	samples pXRF	As, CaCO₃, Ni, K₂O, organic matter, Pb, pH, P₂O₅, Zn	0.27	2.0	59.6	0.45	2.4	70.6
PCA 20	samples ICP-OES	Ni, K₂O, organic matter, Pb, pH, P₂O₅, Zn	0.85	1.0	37.0	0.55	1.7	62.1
PCA 50	sub. pXRF	Hg, Ni, K₂O, Pb, P₂O₅, pH, Zn	0.53	2.6	68.1	0.31	3.0	79.3
PCA 50	samples pXRF	CaCO₃, Cu, Ni, K₂O, organic matter, Pb, P₂O₅, pH, Zn	0.25	2.3	67.1	0.34	2.6	77.4
PCA 50	samples ICP-OES	Ni, K₂O, organic matter, Pb, P₂O₅, pH, Zn	0.73	1.4	49.5	0.39	2.0	72.7
KPCA 20	sub. pXRF	As, bulk density, CaCO₃, K₂O, organic matter, Pb, pH, P₂O₅, Zn	0.58	2.5	64.4	0.45	2.8	73.0
KPCA 20	samples pXRF	As, CaCO₃, Cu, Ni, Pb, Zn, K₂O, P₂O₅	0.31	2.1	63.7	0.47	2.3	69.0
KPCA 20	samples ICP-OES	bulk density, K₂O, Ni, organic matter, Pb, P₂O₅, pH, Zn	0.80	1.1	38.5	0.59	1.6	57.3
KPCA 50	sub. pXRF	CaCO₃, Cu, Hg, Ni, Pb, pH, P₂O₅, Zn	0.55	2.5	65.3	0.45	2.8	73.0
KPCA 50	samples pXRF	As, CaCO₃, Cu, K₂O, Ni, organic matter, Pb, pH, P₂O₅, Zn	0.30	2.2	66.1	0.46	2.4	70.2
KPCA 50	samples ICP-OES	CaCO₃, Cu, Ni, organic matter, Pb, P₂O₅, Zn	0.76	1.2	43.7	0.56	1.7	60.3
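
For readers who want to reproduce the three columns reported in these tables, the sketch below computes R², RMSE, and relative RMSE from observed and predicted concentrations. It rests on stated assumptions rather than on the authors' code: relative RMSE is taken here as RMSE divided by the mean of the observed values, expressed in percent (this section does not give the definition), and the function name `regression_metrics` and the toy values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE and relative RMSE (percent) for paired values.

    Assumption: relative RMSE = 100 * RMSE / mean(observed). The study may
    normalise differently (for example by the range of observed values).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_obs = sum(y_true) / n
    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_obs) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": rmse,
        "rel_rmse": 100.0 * rmse / mean_obs,
    }


# Toy values for illustration only, not measurements from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Under this definition R² can fall below zero when a model predicts worse than the mean of the observations, which is why low R² values and high relative RMSE values tend to appear together in the tables.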

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Mangafić, A.; Oštir, K.; Kolar, M.; Zupan, M. Hyperspectral Soil Heavy Metal Prediction via Privileged-Informed Residual Correction. Remote Sens. 2025, 17, 1987. https://doi.org/10.3390/rs17121987

