1. Introduction
Rising global food demand and increasing environmental concerns have made monitoring soil health a critical priority for sustainable agriculture [1]. Precision agriculture, which involves tailoring agricultural inputs to site-specific conditions, has emerged as a promising approach to improve crop yields while minimizing negative environmental impacts [2]. Accurate estimation of soil properties, such as nutrient levels and pH, is central to precision agriculture, as these properties directly influence crop growth, soil health, and the effectiveness of agricultural interventions [3].
Traditional methods for assessing soil properties rely on physical sampling and laboratory analysis. While these methods are reliable, they are labor-intensive and costly, and they provide only point-based measurements that may not represent the variability across a larger field [4]. Such sparse and localized sampling can miss important spatial patterns of soil nutrients or contaminants [4]. Remote sensing techniques have therefore gained traction as a complementary, non-invasive approach to soil analysis [4,5,6,7,8].
Hyperspectral imaging (HSI) captures reflectance data across hundreds of narrow, contiguous spectral bands, providing a means to assess soil characteristics over broad areas without direct contact with the ground [
9,
10]. Soil properties (such as organic matter, moisture, or mineral content) impart distinctive and measurable features in the soil’s spectral signature [
11,
12,
13,
14]. Soil organic matter strongly influences visible to near-infrared reflectance through light absorption, with higher organic content typically decreasing overall soil reflectance [
15,
16]. Soil moisture content significantly affects spectral reflectance across the entire spectrum, particularly in shortwave infrared regions where water absorption bands at 1440 nm and 1930 nm are prominent [
17,
18,
19]. Clay minerals and iron oxides exhibit characteristic absorption features in the visible and near-infrared regions, with iron-bearing minerals dominating visible region absorption (400–700 nm) and clay minerals showing distinctive features in the shortwave infrared [
20,
21,
22]. By measuring reflectance across a wide spectrum, HSI can differentiate materials based on their unique spectral signatures, enabling the detection of variations in soil composition and condition [23,24]. This capability makes HSI a valuable tool for mapping soil properties over large agricultural regions rapidly and cost-effectively.
Despite its potential, hyperspectral data poses significant challenges due to its high dimensionality and complexity. A single HSI scene can have hundreds of bands, resulting in a large feature space from which relevant spectral–spatial patterns are difficult to extract. Traditional machine learning methods frequently struggle with these high-dimensional data and the nonlinear relationships between spectral features and soil properties [
25]. Simpler models may fail to capture the variety of spectral signatures associated with different soil parameters, particularly under changing field conditions (e.g., moisture or surface residue changes) [
26]. This complexity demands more advanced analytical techniques capable of extracting meaningful information from HSI while avoiding overfitting and noise sensitivity.
Deep learning (DL) approaches, particularly convolutional neural networks (CNNs), have shown considerable success in various computer vision tasks, including hyperspectral image analysis [
27,
28,
29,
30,
31]. Specialized deep models have achieved state-of-the-art performance in tasks such as hyperspectral image classification and segmentation [
32]. However, most existing DL models for hyperspectral regression require a large amount of labeled training data to learn effectively. Labeled data in agricultural applications (for example, ground truth soil measurements coincident with HSI) is frequently limited, expensive to obtain, and does not usually cover all conditions [
33,
34]. As a result, purely deep models are prone to overfitting on small datasets and may perform poorly when applied to new regions or soil types [
35,
36]. Recent advances in self-supervised learning, and contrastive learning in particular, offer a promising avenue to tackle the data scarcity problem. By pulling together (in feature space) different augmented views of the same sample and pushing apart views of different samples, contrastive frameworks enable models to capture spectral patterns without relying on extensive labeled datasets [
37]. Such techniques have produced impressive results in both general computer vision and remote sensing applications [
38,
39], indicating the possibility of improving feature extraction for hyperspectral data. Pretraining a model in a self-supervised manner on a large collection of HSI (without the need for ground truth labels) allows one to initialize the model with more robust and informative spectral feature encodings for downstream regression tasks.
Another promising approach to improving soil property prediction is through the development of hybrid frameworks that combine the strengths of deep learning and classical machine learning [
40,
41]. In this hybrid approach, a deep neural network can act as a feature extractor, distilling the high-dimensional hyperspectral data into a compact set of informative features, while traditional machine learning models (or ensembles) can be used as final predictors for soil parameters [
42]. By combining deep and shallow learners, one can achieve a form of regularization [
43]. The deep model transforms the input into a lower-dimensional feature space, while the downstream ML model reduces overfitting through ensemble averaging and other constraints [
44]. This synergy is especially useful in scenarios with limited training data [
45].
In this paper, we introduce HyperSoilNet, a hybrid framework for soil property estimation from hyperspectral imagery. HyperSoilNet integrates a hyperspectral-native CNN backbone with a self-supervised contrastive learning scheme and a machine learning ensemble for regression. We apply HyperSoilNet to the HyperView Challenge dataset [
46], a recent benchmark for soil property prediction from satellite-based HSI focusing on four important soil parameters: potassium oxide (K$_2$O), phosphorus pentoxide (P$_2$O$_5$), magnesium (Mg), and pH. Our experimental results show that the proposed approach outperforms existing state-of-the-art models on this dataset, highlighting its potential for advancing precision agriculture and sustainable soil management.
In summary, our contributions are the following: (1) We propose a hybrid framework (HyperSoilNet) that integrates a contrastively pretrained hyperspectral CNN backbone with an ensemble of traditional ML regressors to estimate soil properties. (2) We demonstrate that this approach outperforms existing methods on the public HyperView benchmark dataset. (3) We provide an analysis of the framework’s components. The remainder of this paper is organized as follows:
Section 2 reviews relevant literature for soil property estimation using hyperspectral data. Section 3 details the proposed hybrid methodology. Section 4 presents the experimental setup and results, and Section 5 presents a discussion of the results and insights into the model’s performance. Finally, Section 6 concludes the paper with a summary and suggestions for future work.
2. Related Work
Hyperspectral Imaging for Soil Analysis: Hyperspectral remote sensing has a rich history in soil and agricultural applications, providing a means to assess soil properties across large areas with high spectral fidelity. HSI enables the identification of soil constituents such as minerals, organic matter, moisture, and nutrients based on their spectral signatures [
47]. Numerous studies [
5,
6] have leveraged hyperspectral data for in situ soil property estimation, often in the context of precision agriculture and land management. Early approaches [
48,
49] drew from techniques in spectroscopy and chemometrics, using statistical analysis of spectra or handcrafted spectral indices to infer soil parameters. For example, vegetation and soil indices (like NDVI and its soil-adjusted variants) have been employed to indirectly estimate properties like soil organic carbon or fertility [
50]. More specifically, partial least squares regression (PLSR) and other multivariate regression methods have traditionally been used to model the relationship between lab-measured soil properties and their spectral reflectance, particularly in studies involving soil spectral libraries or field spectrometer data [
51,
52]. These traditional methods were effective in many cases, establishing a baseline performance for soil prediction tasks. However, as the availability of hyperspectral imagery has grown, so has the need to map soil properties at scale, introducing greater variability in soil conditions and imaging factors. To address this complexity, the community has gradually incorporated more powerful machine learning techniques than linear regression.
Classical Machine Learning Approaches: A variety of traditional machine learning (ML) models have been used to predict soil properties from spectral data. Support vector machines (SVMs) and random forest (RF) ensembles are popular choices that have demonstrated strong performance in numerous case studies [
49]. These models can capture nonlinear relationships between spectral features and soil properties and tend to be more robust than simple linear models. For instance, Abdulraheem et al. [
4] provide a comprehensive review of remote sensing methods for soil measurement, highlighting the effectiveness of tree-based ensembles and kernel methods in this domain. In many studies, a common workflow is to first perform feature extraction or selection on the hyperspectral data (for example, using principal components, band selection, or expert-designed spectral features) and then train an ML regressor on those features [
53,
54]. While these traditional approaches can achieve good performance, particularly when calibrated to a specific region or dataset, they may struggle to generalize broadly. One significant limitation is that manually crafted features or shallow decision boundaries may not fully capture the complex, high-order interactions found in full-spectrum hyperspectral data.
Deep Learning Methods: Deep learning has increasingly been explored for modeling hyperspectral data, including soil parameter estimation. Deep neural networks can automatically learn feature representations from raw spectral images, potentially uncovering subtler patterns than manual feature engineering. For example, Zhong et al. [
55] demonstrated that a deep CNN outperformed a shallow CNN and traditional ML methods for predicting soil properties in the large LUCAS soil dataset. Other architectures such as autoencoders and recurrent neural networks (RNNs) have also been investigated. Autoencoders (stacked denoising autoencoders, in particular) have been used to learn unsupervised spectral features that improve subsequent prediction of multiple soil attributes [
56,
57]. More recently, attention mechanisms and transformer-based architectures have been introduced to hyperspectral analysis [
58,
59]. Overall, deep learning methods have pushed the performance boundaries in soil spectroscopy, but they often require careful regularization, large training datasets, or transfer learning to be effective, due to the risk of overfitting in data-scarce scenarios [
35,
36].
Hyperspectral Estimation of Soil Nutrients and pH: In contrast to other soil properties like organic matter and nitrogen, research on hyperspectral estimation of potassium (K), phosphorus (P), magnesium (Mg), and pH has been limited, despite the importance of soil nutrients and pH for agricultural applications. This limitation stems partly from the fact that these nutrients do not exhibit distinctive spectral features in the visible to shortwave infrared (400–2500 nm) region [
60]. Traditional approaches for nutrient estimation have relied on PLSR models combined with spectral preprocessing techniques. Peng et al. [
61] developed methods using PLSR combined with variable selection algorithms to estimate soil total nitrogen, phosphorus, and potassium, finding that potassium showed better prediction accuracy ($R^2$ = 0.82) compared to nitrogen and phosphorus due to its metallic nature and higher spectral sensitivity. Mahajan et al. [
62] utilized field spectroscopy with PLSR to monitor wheat nutrient content including nitrogen, phosphorus, potassium, and sulfur, demonstrating the effectiveness of visible-shortwave infrared reflectance (350–2500 nm) for macronutrient detection in agricultural applications. Riad et al. [
63] investigated soil nutrient prediction using Landsat-8 satellite imagery in northern Bangladesh, developing a machine learning-based hybrid classification model with over 1500 satellite images to identify major soil nutrients and support agricultural decision-making in regions using traditional farming practices.
Recent advances in machine learning have improved nutrient estimation accuracy. Chlouveraki et al. [
64] investigated combinations of Principal Component Regression (PCR), Automatic Relevance Determination (ARD), PLSR, and Multi-Layer Perceptrons (MLP) for predicting macronutrients including nitrogen, phosphorus, potassium, calcium, and magnesium from hyperspectral data. Their study revealed that feature extraction and selection techniques are crucial for refining the high-dimensional spectral input space. Castaldi et al. [
60] utilized PRISMA hyperspectral satellite data with machine learning algorithms to retrieve topsoil nutrients (N, P, K) and pH, finding that PRISMA data provided slightly better accuracy than Sentinel-2 for nutrient retrieval, with shortwave infrared bands being particularly important for P and K estimation.
For pH estimation specifically, research has shown more promising results due to pH’s influence on iron oxide content and overall soil reflectance characteristics. Jain et al. [
65] developed novel spectral indices specifically for soil pH estimation using hyperspectral data, achieving $R^2$ values of 0.86 for AfSIS soil pH and 0.945 for LUCAS-2009 soil pH using artificial neural networks combined with principal component analysis. Yang et al. [66] compared multiple machine learning approaches including PLSR, least squares-support vector machines, extreme learning machines, and Cubist regression for pH prediction from vis-NIR spectra, with extreme learning machines showing the best performance ($R^2$ = 0.74, RMSE = 0.42 for pH). The performance differences between linear and nonlinear methods highlight the complex relationships between soil pH and spectral reflectance patterns.
Recent deep learning approaches have further advanced nutrient estimation capabilities. Sun et al. [
67] developed a hybrid CBiResNet-BiLSTM model for soil total nitrogen estimation, achieving $R^2$ = 0.937 compared to traditional PLSR ($R^2$ = 0.883), an absolute improvement of 0.054 (5.4 percentage points). However, comprehensive studies comparing traditional ML, deep learning, and hybrid approaches specifically for K, P, Mg, and pH estimation from hyperspectral data remain limited. Most existing research focuses on individual nutrients or combines nutrients with other soil properties, making direct performance comparisons challenging. The lack of standardized datasets and evaluation protocols for these specific properties further complicates progress assessment in the field.
Hybrid and Ensemble Frameworks: An emerging trend in the field is the development of hybrid frameworks that seek to harness the complementary advantages of different approaches. Rather than viewing classical ML and deep learning as mutually exclusive solutions, recent research shows that combining them can lead to more robust and generalized models [
40,
41,
42]. A related strategy is the ensembling of heterogeneous learners. For example, ensemble models that combine the outputs of neural networks and traditional ML models can often outperform either model alone by reducing variance and exploiting different modeling strengths. The benefits of such hybrid strategies were evident in the 2022 HyperView Challenge (a competition for predicting soil properties from hyperspectral images). The HyperView Challenge winner, EagleEyes [68], combined random forest and KNN with hand-crafted features, achieving a score of 0.781. However, its reliance on manual feature engineering limits scalability. HyperSoilNet advances this paradigm by integrating a self-supervised hyperspectral CNN backbone with an ML ensemble, leveraging unlabeled data to automate feature extraction while maintaining prediction accuracy.
Research Gaps and Motivation: Despite the progress in applying traditional ML and DL to hyperspectral soil data, there remain noteworthy gaps in the literature [
6,
49]. Many studies either rely solely on classical ML or on end-to-end deep networks; relatively few attempts have been made to integrate these approaches into a cohesive framework for regression tasks [
69,
70]. The potential of self-supervised learning in this domain is also largely untapped [
38], with a few attempts such as SSL-SoilNet [
71] for soil organic carbon prediction and early work on soil moisture estimation using Self-Organizing Maps [
72]. Most prior works train on labeled data only [
58,
64,
73], overlooking the value of abundant unlabeled hyperspectral data to pretrain models [
74]. While recent reviews highlight self-supervised learning as a rising trend in remote sensing [
38], and foundational work like Tile2Vec [
75] and Seasonal Contrast [
76] has demonstrated the effectiveness of contrastive learning in related domains, applications in soil property prediction are still in their early stages, with limited methodological diversity and no established benchmark datasets for SSL evaluation in hyperspectral soil analysis [
38,
77]. Recent advances in hyperspectral SSL frameworks such as SpectralEarth [
78] and SatMAE [
79] provide transferable methodologies, though their application to soil analysis remains unexplored. Additionally, as noted in our review of nutrient-specific research, there is a significant gap in comprehensive studies that systematically compare different modeling approaches for K, P, Mg, and pH estimation, with most research focusing on individual properties or broader soil characteristic assessments. These gaps motivate our work: we develop a hybrid approach that combines self-supervised feature learning with ensemble modeling to improve generalization. To our knowledge, this is the first approach to integrate contrastive self-supervised learning with a classical ML ensemble for hyperspectral soil property estimation. In the following sections, we build upon the insights from prior work and detail how our method is designed to advance the state of the art in non-invasive soil property estimation.
3. Methodology
In this work, we present a hybrid framework for soil property estimation from hyperspectral imagery called HyperSoilNet. Our approach integrates a pretrained deep learning backbone with traditional machine learning regressors to effectively leverage both representation learning and ensemble prediction. The entire workflow is designed to address the challenges of high-dimensional hyperspectral data while maximizing prediction accuracy for soil properties.
3.1. Dataset Characteristics and Analysis
The experiments in this study were conducted using the HyperView dataset, a collection of high-quality hyperspectral imagery and corresponding ground truth soil measurements, provided as part of the HyperView Challenge organized by KP Labs, ESA, and QZ Solutions [46]. The dataset consists of hyperspectral images for training and validation, with ground truth soil property measurements provided in CSV format. All soil attributes were collected and measured by the challenge organizers; the authors did not conduct any soil sampling or laboratory analysis.
The hyperspectral imagery was acquired over agricultural fields in Poland in March 2021 using a HySpex VS-725 (Norsk Elektro Optikk, Oslo, Norway) hyperspectral imager mounted on a Piper PA-31 Navajo (Piper Aircraft Corporation, Vero Beach, FL, USA) aircraft (Figure 1). This imaging system comprises SWIR-384 and VNIR-1800 imagers (Norsk Elektro Optikk, Oslo, Norway), capturing a total of 430 hyperspectral bands, which were subsequently reduced to 150 bands to match the spectral range of the Intuition-1 satellite’s onboard sensor [80]. The HyperView dataset contains 150 contiguous hyperspectral bands spanning 462.08 to 938.37 nm with approximately 3.2 nm spectral resolution, corresponding to the VNIR portion of the electromagnetic spectrum. Ground truth measurements of soil properties were obtained by the challenge organizers through in situ sampling and analysis using the Mehlich 3 methodology [
81,
82]. The four target soil properties measured were potassium oxide (K$_2$O), phosphorus pentoxide (P$_2$O$_5$), magnesium (Mg), and soil pH. This field-based data collection ensures that models trained on this dataset reflect realistic agricultural conditions, including natural variations in soil moisture, surface roughness, and atmospheric effects, making the results applicable to practical precision agriculture scenarios. The complete dataset consists of 2886 patches (1732 for training and 1154 for testing), with each patch containing 150 spectral bands and representing a field with the four ground truth soil parameters provided in CSV format.
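As a quick consistency check on the figures quoted above, the following minimal sketch rebuilds the band grid and maps wavelengths to band indices. It assumes evenly spaced band centers between 462.08 nm and 938.37 nm, which the text does not state explicitly; the helper names `band_index` and `band_range` are hypothetical.

```python
import numpy as np

# Values quoted in the text: 150 bands spanning 462.08 to 938.37 nm.
N_BANDS = 150
LAMBDA_MIN_NM = 462.08
LAMBDA_MAX_NM = 938.37

# Assumption: band centers are evenly spaced between the two end points.
wavelengths = np.linspace(LAMBDA_MIN_NM, LAMBDA_MAX_NM, N_BANDS)
spacing_nm = float(wavelengths[1] - wavelengths[0])


def band_index(wavelength_nm: float) -> int:
    """Return the 0-based index of the band center closest to a wavelength."""
    return int(np.argmin(np.abs(wavelengths - wavelength_nm)))


def band_range(lo_nm: float, hi_nm: float) -> tuple:
    """Return (first, last) 0-based indices of band centers in [lo_nm, hi_nm]."""
    idx = np.flatnonzero((wavelengths >= lo_nm) & (wavelengths <= hi_nm))
    return int(idx[0]), int(idx[-1])


print(f"band spacing: {spacing_nm:.2f} nm")
print("bands covering 750 nm and above:", band_range(750.0, LAMBDA_MAX_NM))
```

Under the even-spacing assumption, the spacing works out to (938.37 − 462.08) / 149 ≈ 3.20 nm, in line with the approximately 3.2 nm figure given above.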
Analysis of the spectral characteristics reveals distinctive reflectance patterns across different soil property levels (
Figure 2). The spectral differences between soil parameter levels are most pronounced in the NIR region (750–938 nm), with a distinctive reflectance increase after the Red Edge region (700–750 nm). Fields with high Mg and pH levels show notably higher reflectance in the NIR region compared to those with lower values, while K$_2$O and P$_2$O$_5$ differences are more subtle across the entire spectral range. While soil spectral reflectance mechanisms involve complex interactions between soil components [
14,
48], the dataset provides sufficient spectral information for soil property discrimination as demonstrated by the challenge results and the performance of participating methods.
The distribution of soil properties across the training dataset exhibits notable patterns, with K$_2$O, P$_2$O$_5$, and Mg showing right-skewed distributions (most fields having low to medium values), while pH follows an approximately normal distribution centered around 6.8. Correlation analysis between soil properties (Figure 3) revealed a moderate positive correlation between K$_2$O and P$_2$O$_5$ (r = 0.41), suggesting these macronutrients share common dynamics in the studied soils. K$_2$O and Mg showed a weaker positive relationship (r = 0.23), while a slight negative correlation exists between P$_2$O$_5$ and Mg (r = −0.10). The correlations between pH and other properties were particularly weak (r = 0.01 to 0.17), indicating that soil acidity is largely independent of nutrient content in this dataset.
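The pairwise Pearson correlations reported above can be computed as in the following sketch. The target values here are synthetic placeholders generated for illustration only (they are not the HyperView measurements), and the column order is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
names = ["K2O", "P2O5", "Mg", "pH"]

# Synthetic stand-in for the ground truth table (one row per field).
# The first two columns share a common factor so they correlate positively.
n_fields = 500
shared = rng.normal(size=n_fields)
targets = np.column_stack([
    shared + rng.normal(size=n_fields),   # placeholder for K2O
    shared + rng.normal(size=n_fields),   # placeholder for P2O5
    rng.normal(size=n_fields),            # placeholder for Mg
    rng.normal(size=n_fields),            # placeholder for pH
])

# Pearson correlation matrix; columns are variables, rows are observations.
corr = np.corrcoef(targets, rowvar=False)

for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"r({names[i]}, {names[j]}) = {corr[i, j]:+.2f}")
```

With the real measurements loaded in place of `targets`, the same call yields the matrix visualized in the correlation figure.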
These observations guided our feature engineering process and model design choices, highlighting the need for techniques that can capture the subtle spectral variations in the VIS (462–700 nm) and Red Edge (700–750 nm) regions, while leveraging the more prominent differences in the NIR bands (750–938 nm). The varying correlation strengths between soil properties further supported our multi-task learning approach, which can leverage shared information for correlated properties while maintaining specificity for more independent variables like pH.
3.2. Framework Overview
HyperSoilNet consists of two main components, as illustrated in
Figure 4. The foundation of the first component is a pretrained Hyperspectral-Native CNN Backbone based on the HyperKon architecture [
83], which was previously trained on a large collection of hyperspectral satellite imagery using contrastive learning. This backbone serves as our feature extractor, providing robust spectral–spatial representations learned from diverse hyperspectral data, and we adapt it for soil property estimation. The second component is an ML Ensemble Module, which employs multiple traditional machine learning regressors that operate on the extracted features to predict soil properties with enhanced robustness.
3.3. Pretrained Backbone and Architectural Adaptations
The HyperKon architecture is based on ResNeXt’s multibranch cardinality design, comprising multiple convolutional blocks with squeeze-and-excitation attention mechanisms. The network contains approximately 5.54M parameters distributed across residual blocks with cardinality of 32 and bottleneck width of 4. The architecture includes an initial convolutional layer followed by four main residual stages with
blocks, respectively, each incorporating squeeze-and-excitation modules for adaptive feature recalibration. The network was pretrained using self-supervised contrastive learning on the EnHyperSet-1 dataset, which contains 800 hyperspectral scenes (200 Level 1B, 200 Level 1C, 400 Level 2A) with 224 spectral bands ranging from 420–2450 nm, covering diverse global urban, forest, and agricultural environments. The pretraining utilized NT-Xent contrastive loss for 1000 epochs with batch size 32 and Adam optimizer with learning rate
. Complete architectural specifications and training procedures are detailed in [
83].
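For readers unfamiliar with the NT-Xent objective mentioned above, the following NumPy sketch computes it for a batch of paired embeddings. It is an illustrative forward pass only, not the pretraining code of [83]; the temperature value of 0.5 and the function name `nt_xent_loss` are assumptions, since the text does not report them.

```python
import numpy as np


def nt_xent_loss(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.5) -> float:
    """NT-Xent loss for two batches of embeddings, each of shape (N, D).

    Row i of z1 and row i of z2 are two augmented views of the same sample.
    The temperature default is an assumed value, not taken from the paper.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)                  # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)      # unit-length rows
    sim = (z @ z.T) / temperature                         # scaled cosine similarity
    np.fill_diagonal(sim, -np.inf)                        # a view is never its own negative

    # Index of the positive partner for every row: i <-> i + N.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-sum-exp over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_denominator = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob_positive = sim[np.arange(2 * n), positives] - log_denominator
    return float(-log_prob_positive.mean())


rng = np.random.default_rng(0)
anchor = rng.normal(size=(8, 16))
print("aligned views:  ", nt_xent_loss(anchor, anchor + 0.01 * rng.normal(size=(8, 16))))
print("unrelated views:", nt_xent_loss(anchor, rng.normal(size=(8, 16))))
```

The loss is lower when the two views of each sample are close in feature space, which is the behavior contrastive pretraining optimizes for.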
We adapt the pretrained model for our specific task of soil property estimation. Our first key adaptation is the integration of a spectral attention mechanism after the initial convolutional layers. This mechanism is specifically designed to emphasize the most informative wavelengths for soil property estimation based on our spectral analysis findings. The attention module works by spatially pooling the feature map to generate channel-wise weights that highlight important spectral bands. This process can be formulated as
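\[ F' = F \otimes \sigma\left(W_2\, \delta\left(W_1\, \mathrm{GAP}(F)\right)\right) \]
where $F$ is the feature map, GAP is global average pooling, $W_1$ and $W_2$ are weights of the MLP with reduction ratio $r$, $\delta$ is the ReLU function, $\sigma$ is the sigmoid function, and ⊗ denotes channel-wise multiplication.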
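The attention step described above (global average pooling, a two-layer MLP with ReLU, a sigmoid gate, and channel-wise multiplication) can be sketched in NumPy as follows. This is a forward pass with random weights for illustration; the channel count of 64, the reduction ratio of 16, the omission of bias terms, and the function name are all assumptions rather than values from the paper.

```python
import numpy as np


def spectral_attention(feature_map: np.ndarray, w1: np.ndarray, w2: np.ndarray):
    """Apply squeeze-and-excitation style gating to a (C, H, W) feature map.

    w1 has shape (C // r, C) and w2 has shape (C, C // r), where r is the
    reduction ratio. Returns the reweighted map and the per-channel gate.
    """
    squeezed = feature_map.mean(axis=(1, 2))          # global average pooling -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)           # ReLU on the reduced vector
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))       # sigmoid -> weights in (0, 1)
    return feature_map * gate[:, None, None], gate    # channel-wise multiplication


rng = np.random.default_rng(0)
channels, reduction = 64, 16                          # assumed sizes
features = rng.normal(size=(channels, 5, 5))
w1 = rng.normal(size=(channels // reduction, channels))
w2 = rng.normal(size=(channels, channels // reduction))

reweighted, gate = spectral_attention(features, w1, w2)
print(reweighted.shape, float(gate.min()), float(gate.max()))
```

In a trained network the two weight matrices are learned, so the gate emphasizes channels that carry the most informative spectral content.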
As a second adaptation, we incorporate a global context module that combines global average pooling and global max pooling operations, followed by concatenation and dimension reduction. This module helps capture both overall field characteristics and the most distinctive spectral features, providing a more comprehensive representation of the soil sample. The final output of our adapted backbone is a 128-dimensional feature embedding vector that encapsulates the complex spectral–spatial patterns associated with different soil properties. These CNN-extracted features serve as the primary input to our machine learning ensemble, providing each algorithm with rich spectral–spatial representations that capture hierarchical patterns learned through self-supervised pretraining on diverse hyperspectral imagery.
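A minimal sketch of the global context module is given below. It assumes that the dimension reduction is a single linear projection to 128 dimensions and that the incoming feature map has 256 channels; neither detail is specified in the text, and the function name is hypothetical.

```python
import numpy as np


def global_context(feature_map: np.ndarray, w_reduce: np.ndarray) -> np.ndarray:
    """Pool a (C, H, W) feature map into a compact embedding.

    Global average pooling summarizes overall field characteristics, global
    max pooling keeps the strongest responses, and w_reduce (out_dim, 2C)
    projects their concatenation down to the embedding size.
    """
    avg_pooled = feature_map.mean(axis=(1, 2))            # (C,)
    max_pooled = feature_map.max(axis=(1, 2))             # (C,)
    pooled = np.concatenate([avg_pooled, max_pooled])     # (2C,)
    return w_reduce @ pooled                              # (out_dim,)


rng = np.random.default_rng(0)
channels, embedding_dim = 256, 128                        # channel count is assumed
features = rng.normal(size=(channels, 4, 4))
w_reduce = rng.normal(size=(embedding_dim, 2 * channels)) / np.sqrt(2 * channels)

embedding = global_context(features, w_reduce)
print(embedding.shape)
```

Because both pooling operations collapse the spatial axes, the embedding size does not depend on the patch dimensions.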
3.4. Feature Engineering and Processing
To maximize information extraction from hyperspectral data, we implement a comprehensive feature extraction process guided by our spectral analysis findings. We process the raw hyperspectral patches ($X \in \mathbb{R}^{w \times h \times c}$, where $w$ and $h$ are spatial dimensions and $c$ represents 150 spectral bands) through several complementary transformations designed to capture different aspects of the spectral–spatial information.
The first set of features focuses on spectral characteristics through the computation of average spectral reflectance and its first-, second-, and third-order derivatives. These derivatives highlight subtle variations and absorption features specific to different soil minerals, which are often not apparent in the raw reflectance data. The derivative operation can be expressed as
\[ R^{(n)}(\lambda) = \frac{d^{n} R(\lambda)}{d\lambda^{n}} \]
where $R(\lambda)$ is the reflectance at wavelength $\lambda$, and $n$ is the derivative order. This approach is motivated by our observation that spectral differences between soil parameter levels are most pronounced in the NIR region, with distinctive patterns after the Red Edge region (bands 60–80).
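The derivative features can be approximated numerically with finite differences, as in the sketch below. It assumes a channel-first patch layout and uses `numpy.gradient` applied repeatedly; the paper does not state which numerical scheme was used, and the function name is hypothetical.

```python
import numpy as np


def spectral_derivative_features(patch: np.ndarray, wavelengths: np.ndarray,
                                 max_order: int = 3) -> np.ndarray:
    """Return the mean spectrum and its derivatives up to max_order.

    patch has shape (C, H, W); the result has shape (max_order + 1, C), with
    row 0 holding the spatially averaged reflectance and row n the n-th
    finite-difference derivative with respect to wavelength.
    """
    mean_spectrum = patch.reshape(patch.shape[0], -1).mean(axis=1)
    rows = [mean_spectrum]
    current = mean_spectrum
    for _ in range(max_order):
        current = np.gradient(current, wavelengths)   # central differences
        rows.append(current)
    return np.stack(rows)


rng = np.random.default_rng(0)
wavelengths = np.linspace(462.08, 938.37, 150)        # assumed even band spacing
patch = rng.random((150, 11, 11))                     # synthetic placeholder patch
features = spectral_derivative_features(patch, wavelengths)
print(features.shape)
```

Higher-order finite differences amplify noise, so in practice a smoothing step is often applied before differentiation; whether the authors did so is not stated.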
For capturing multi-scale spectral patterns, we apply discrete wavelet transforms (DWT) using the Meyer wavelet [
84]. The wavelet transform decomposes the signal into approximation ($cA_j$) and detail ($cD_j$) coefficients:
\[ \mathrm{DWT}(x) \rightarrow \left\{ cA_j,\; cD_j \right\} \]
where $cA_j$ represents approximation coefficients and $cD_j$ represents detail coefficients at decomposition level
j (we use
). This multi-resolution analysis enables the detection of features at different spectral scales, which we found particularly valuable for differentiating between similar soil types with subtle spectral differences.
To capture dominant spatial–spectral patterns, we apply singular value decomposition (SVD) to each spectral channel. For a given spectral band $B_k$ represented as a matrix, the SVD can be written as
\[ B_k = U \Sigma V^{T} \]
where $\Sigma$ contains the singular values $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$. We use the top five singular values and their ratios as features to capture the dominant spectral–spatial patterns within each field while reducing dimensionality.
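The per-band SVD features can be sketched as follows. The text does not specify which ratios are taken, so this example assumes ratios of consecutive singular values; the 11 × 11 patch size and the function name are likewise placeholders.

```python
import numpy as np


def svd_features(patch: np.ndarray, k: int = 5, eps: float = 1e-12) -> np.ndarray:
    """Top-k singular values and consecutive ratios for every spectral band.

    patch has shape (C, H, W) with min(H, W) >= k. The result has shape
    (C, 2k - 1): k singular values followed by k - 1 ratios s[i+1] / s[i].
    """
    # numpy applies the SVD to each (H, W) band of the stacked array.
    singular_values = np.linalg.svd(patch, compute_uv=False)[:, :k]
    ratios = singular_values[:, 1:] / (singular_values[:, :-1] + eps)
    return np.concatenate([singular_values, ratios], axis=1)


rng = np.random.default_rng(0)
patch = rng.random((150, 11, 11))          # synthetic placeholder patch
features = svd_features(patch)
print(features.shape)
```

A band that is nearly uniform across the field is close to rank one, so its first ratio is close to zero; larger ratios indicate more spatial structure within the band.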
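Finally, we extract frequency domain characteristics through Fast Fourier Transforms (FFT):
\[ \hat{x}_k = \sum_{m=0}^{N-1} x_m \, e^{-2\pi i k m / N}, \quad k = 0, \ldots, N-1 \]
where $x_m$ denotes the $m$-th sample of the input signal and $N$ is its length.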
The real and imaginary components of the FFT enhance the representation of periodic patterns in the spectral signatures, which can be indicative of certain mineral compositions within the soil.
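A minimal sketch of the frequency-domain features is shown below. It assumes the transform is applied to the spatially averaged spectrum and uses the real-input FFT, which keeps only the non-redundant half of the coefficients; both choices are assumptions, as the text does not specify the input signal.

```python
import numpy as np


def fft_features(mean_spectrum: np.ndarray) -> np.ndarray:
    """Concatenate real and imaginary parts of the one-sided FFT of a spectrum."""
    coefficients = np.fft.rfft(mean_spectrum)     # length N // 2 + 1 for real input
    return np.concatenate([coefficients.real, coefficients.imag])


rng = np.random.default_rng(0)
patch = rng.random((150, 11, 11))                 # synthetic placeholder patch
mean_spectrum = patch.reshape(150, -1).mean(axis=1)
features = fft_features(mean_spectrum)
print(features.shape)
```

For a 150-band spectrum this yields 76 real and 76 imaginary components; the first real component equals the sum of the spectrum, and its imaginary counterpart is zero.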
These feature engineering techniques were selected based on their ability to capture the specific spectral characteristics observed in our dataset analysis. The correlation patterns between soil properties (
Figure 3) further informed our approach, as we needed to capture both shared and property-specific spectral patterns in the data. The 128-dimensional CNN feature vector, combined with these engineered spectral features, provides each machine learning algorithm with complementary representations that enhance predictive performance. For Random Forest, the CNN features serve as input variables for decision tree splitting, enabling the discovery of complex spectral–spatial decision boundaries that leverage multi-band interactions beyond traditional spectral indices. XGBoost utilizes these features within its gradient boosting framework, where the rich CNN representations allow the algorithm to model subtle spectral variations and build more accurate predictive trees by focusing on residual errors in the learned feature space. The KNN algorithm operates directly in the CNN feature space, using Euclidean distance calculations where hyperspectral patches with similar soil properties are positioned closer together based on learned spectral–spatial patterns rather than raw spectral similarity, resulting in more meaningful similarity metrics for soil property prediction.
3.5. Machine Learning Ensemble
The features extracted by the adapted backbone serve as input to a machine learning ensemble comprising Random Forest [
43], XGBoost [
85], and K-Nearest Neighbors (KNN) [
86] regressors. Our choice of these algorithms is based on their complementary strengths for soil property modeling, as revealed through our experimental analysis and supported by extensive literature demonstrating their effectiveness in hyperspectral soil analysis [
4,
49]. Random Forest provides robust performance with good resistance to overfitting through its ensemble of decision trees [
43]. The random subspace method enables it to capture different aspects of the spectral–spatial features, performing well even when specific regions of the spectrum contain noise or atmospheric effects [
44,
87]. XGBoost, as a gradient boosting framework, sequentially improves predictions by fitting each new tree to the residual errors of the current ensemble [
85], making it particularly valuable for accurately predicting extreme values of soil properties, which are less common in the dataset but agriculturally important [
45,
58]. KNN, as a non-parametric method, captures local patterns in the feature space [
88], making it effective for fields with similar spectral signatures and providing a contrasting approach to the tree-based methods, thus improving ensemble diversity [
14].
Each regressor in the ensemble is independently optimized with tailored configurations determined through a systematic grid search with 5-fold cross-validation on the training dataset. The Random Forest uses 100 decision trees with mean-squared error as the split criterion, maximum depth of 20, and minimum samples per leaf of 5. The choice of 100 trees provides a good balance between computational efficiency and model stability, as recommended in the literature for ensemble methods on moderate-sized datasets [
43,
87]. The maximum depth of 20 prevents overfitting while allowing sufficient model complexity to capture spectral–spatial relationships, and minimum samples per leaf of 5 ensures adequate statistical support for leaf nodes [
45]. We employ bootstrap sampling with sample weights inversely proportional to property frequency to address imbalance in the distribution of property values. The XGBoost regressor is configured with a learning rate of 0.1, 100 boosting rounds, a maximum tree depth of 5, L1 regularization (alpha) of 0.01, and L2 regularization (lambda) of 1.0. The learning rate of 0.1 is a commonly recommended conservative value that ensures stable convergence while maintaining reasonable training time [
85]. The maximum depth of 5 and regularization parameters (alpha = 0.01, lambda = 1.0) were selected to prevent overfitting in the high-dimensional feature space typical of hyperspectral applications [
58]. The 100 boosting rounds were determined through cross-validation, and early stopping with a patience of 15 rounds provides an additional guard against overfitting. The KNN regressor uses 7 neighbors with distance-weighted averaging based on Euclidean distance in the feature space and applies a standardization preprocessor so that all feature dimensions contribute comparably to the distance calculations.
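The hyperparameters listed above can be collected in one place as a small configuration sketch. The dictionary names and the `build_regressors` helper are hypothetical; the keys follow the parameter names of scikit-learn's `RandomForestRegressor` and `KNeighborsRegressor` and of the `XGBRegressor` wrapper from the xgboost package, which is assumed here to be the XGBoost interface used.

```python
# Hyperparameters as reported in the text; the names of these dicts are illustrative.
RF_PARAMS = {
    "n_estimators": 100,
    "criterion": "squared_error",  # mean-squared error split criterion
    "max_depth": 20,
    "min_samples_leaf": 5,
    "bootstrap": True,
}
XGB_PARAMS = {
    "learning_rate": 0.1,
    "n_estimators": 100,           # boosting rounds
    "max_depth": 5,
    "reg_alpha": 0.01,             # L1 regularization
    "reg_lambda": 1.0,             # L2 regularization
    "early_stopping_rounds": 15,   # requires an eval_set when calling fit()
}
KNN_PARAMS = {"n_neighbors": 7, "weights": "distance", "metric": "euclidean"}

def build_regressors():
    """Instantiate the three regressors (imports are deferred until called)."""
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from xgboost import XGBRegressor

    return {
        "rf": RandomForestRegressor(**RF_PARAMS),
        "xgb": XGBRegressor(**XGB_PARAMS),
        # Standardize features before computing Euclidean distances.
        "knn": make_pipeline(StandardScaler(), KNeighborsRegressor(**KNN_PARAMS)),
    }
```

Sample weights for the Random Forest would be passed at fit time (for example through the `sample_weight` argument of `fit`) rather than in the constructor.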
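We implement a property-specific weighted ensemble that assigns different weights to each regressor based on its performance for each soil property. These weights are determined using Bayesian optimization to minimize the validation error for each property:
$$\hat{y}_{i,p} = w_{p}^{\mathrm{RF}}\,\hat{y}_{i,p}^{\mathrm{RF}} + w_{p}^{\mathrm{XGB}}\,\hat{y}_{i,p}^{\mathrm{XGB}} + w_{p}^{\mathrm{KNN}}\,\hat{y}_{i,p}^{\mathrm{KNN}}$$
where $\hat{y}_{i,p}$ is the final ensemble prediction for property p (K, P2O5, Mg, or pH) on sample i, and $w_{p}^{\mathrm{RF}}$, $w_{p}^{\mathrm{XGB}}$, and $w_{p}^{\mathrm{KNN}}$ are the optimized weights for each regressor on property p such that $w_{p}^{\mathrm{RF}} + w_{p}^{\mathrm{XGB}} + w_{p}^{\mathrm{KNN}} = 1$.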
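The weighted combination described above can be sketched in a few lines of Python. The function name `ensemble_predict`, the regressor keys, and the weights in the usage note are placeholders chosen for illustration; they are not the optimized values from this study.

```python
def ensemble_predict(preds, weights, tol=1e-9):
    """Blend per-regressor predictions for a single soil property.

    preds   -- dict mapping regressor name to a list of predictions
    weights -- dict mapping regressor name to its weight (must sum to 1)
    """
    if abs(sum(weights.values()) - 1.0) > tol:
        raise ValueError("ensemble weights must sum to 1")
    n_samples = len(next(iter(preds.values())))
    return [
        sum(weights[name] * preds[name][i] for name in weights)
        for i in range(n_samples)
    ]
```

For example, `ensemble_predict({"rf": [1.0], "xgb": [3.0], "knn": [5.0]}, {"rf": 0.5, "xgb": 0.25, "knn": 0.25})` blends the three predictions for one sample with made-up weights. One such weight set would be kept per property.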
The optimal weights varied by soil property, reflecting the strength of each regressor for different soil characteristics. For potassium (K), Random Forest and XGBoost contributed most significantly to the prediction. For phosphorus pentoxide (P2O5), XGBoost received the highest weight, suggesting its effectiveness for this property. Magnesium (Mg) prediction relied more heavily on Random Forest, while pH prediction was dominated by XGBoost. This property-specific weighting strategy improved overall prediction accuracy by 3–5% compared to simple averaging, with the most significant improvements observed for pH and P2O5 predictions.
3.6. Training and Implementation Details
We implemented HyperSoilNet using PyTorch 2.5.1 for the CNN backbone and scikit-learn 1.4.2 for the ML ensemble. All experiments were conducted on an NVIDIA A100 GPU with 40 GB memory. The training process consisted of two main phases: backbone fine-tuning and ensemble training.
The backbone fine-tuning hyperparameters were selected based on established practices for transfer learning in hyperspectral analysis and validated through preliminary experiments [
35,
36,
89]. The backbone was fine-tuned for 100 epochs with a batch size of 24 using the AdamW optimizer with a weight decay of 1 × 10⁻⁴. The batch size of 24 was chosen to maximize GPU memory utilization while ensuring stable gradient estimates, and 100 epochs provided sufficient training time for convergence without overfitting on the limited labeled data [
83]. The AdamW optimizer with weight decay of 1 × 10⁻⁴ is recommended for fine-tuning pretrained models, providing adaptive learning rates and effective regularization [
90]. We employed a cosine annealing learning rate schedule starting from 1 × 10⁻⁴ and decreasing to 1 × 10⁻⁶. The initial learning rate of 1 × 10⁻⁴ is conservative for fine-tuning pretrained networks, preventing catastrophic forgetting while allowing parameter adaptation [
89]. During fine-tuning, we used a multi-task loss function combining the mean squared error (MSE) of each soil property:
$$\mathcal{L} = \sum_{p \in \{\mathrm{K},\, \mathrm{P_2O_5},\, \mathrm{Mg},\, \mathrm{pH}\}} \lambda_{p}\, \mathrm{MSE}_{p}$$
where $\lambda_{p}$ are property-specific weights (1.0, 1.2, 1.0, and 1.5 for K, P2O5, Mg, and pH, respectively) determined based on the property distributions and relative prediction difficulties.
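The fine-tuning schedule and loss described above can be illustrated with a short pure-Python sketch. It assumes the standard cosine annealing formula and assumes that the per-property MSE terms are combined as a weighted sum; the function names are hypothetical, and the actual training code uses PyTorch.

```python
import math

def cosine_lr(epoch, total_epochs=100, lr_max=1e-4, lr_min=1e-6):
    """Cosine annealing from lr_max at epoch 0 down to lr_min at the last epoch."""
    progress = epoch / total_epochs
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

# Property-specific loss weights reported in the text.
LOSS_WEIGHTS = {"K": 1.0, "P2O5": 1.2, "Mg": 1.0, "pH": 1.5}

def multitask_loss(y_true, y_pred, weights=LOSS_WEIGHTS):
    """Weighted sum of per-property MSE (assumed form of the multi-task loss).

    y_true and y_pred map each property name to a list of values.
    """
    total = 0.0
    for prop, weight in weights.items():
        sq_errs = [(t - p) ** 2 for t, p in zip(y_true[prop], y_pred[prop])]
        total += weight * sum(sq_errs) / len(sq_errs)
    return total
```

With these defaults the learning rate starts at 1 × 10⁻⁴, passes through the midpoint of the range at epoch 50, and reaches 1 × 10⁻⁶ at epoch 100. The higher weight on pH means that errors on that property contribute more to the gradient than equally sized errors on K or Mg.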
After fine-tuning the backbone, we extracted features for all training samples and trained the ML ensemble. Each regressor (RF, XGBoost, KNN) was trained independently using its optimal hyperparameters. We employed 5-fold stratified cross-validation to ensure robust performance evaluation and prevent overfitting. For the final model, we trained each regressor on the full training set and optimized the ensemble weights using a held-out validation set (20% of the training data).
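As an illustration of fitting ensemble weights on a held-out validation set, the sketch below searches weight triples that sum to one. It deliberately substitutes a plain exhaustive grid search over the simplex for the Bayesian optimization used in this work, so it shows the objective being minimized rather than the actual optimizer; all names are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def search_weights(y_val, p_rf, p_xgb, p_knn, steps=20):
    """Grid search over (w_rf, w_xgb, w_knn) with w >= 0 and sum(w) == 1.

    Returns the best weight triple and its validation MSE.
    """
    best_w, best_err = None, float("inf")
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            c = steps - a - b
            w = (a / steps, b / steps, c / steps)
            blend = [
                w[0] * r + w[1] * x + w[2] * k
                for r, x, k in zip(p_rf, p_xgb, p_knn)
            ]
            err = mse(y_val, blend)
            if err < best_err:
                best_w, best_err = w, err
    return best_w, best_err
```

The search would be run once per soil property on the validation predictions of the three regressors. A Bayesian optimizer explores the same constrained space with far fewer evaluations, which matters little for three weights but keeps the procedure uniform if more regressors are added.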
5. Discussion
5.1. Analysis of Property-Specific Performance
Our cross-validation results reveal interesting patterns in the performance of HyperSoilNet across different soil properties. The varying prediction accuracy can be related to both the distribution characteristics of each property in the dataset and the correlations between different soil parameters. More importantly, these performance differences have fundamental physical and chemical bases rooted in how each soil property manifests in hyperspectral reflectance patterns.
Phosphorus pentoxide (P2O5) achieved the highest R² (0.786), which may be attributed to its characteristic spectral response patterns that are captured effectively by the hyperspectral bands. Phosphorus in soils is primarily associated with iron and aluminum phosphates that exhibit characteristic absorption features in the near-infrared region (800–1200 nm) [
14]. These mineral phases create distinct spectral signatures because phosphate groups interact with metal cations to form crystalline structures with specific vibrational frequencies detectable in the hyperspectral range [
48]. The P-O stretching and bending modes in phosphate minerals contribute to diagnostic absorption features that are well-captured by the 150-band hyperspectral data used in this study. This high prediction accuracy facilitates more precise soil management decisions for phosphorus fertilization. Furthermore, P2O5 showed a moderate positive correlation with potassium (r = 0.41), allowing the model to leverage shared information between these properties.
Potassium oxide (K2O) showed similarly strong predictive performance (R² = 0.771), which may be attributed to its correlation with P2O5 and its own distinct patterns in the hyperspectral data. Potassium in agricultural soils occurs primarily in K-bearing minerals such as feldspars, micas, and clay minerals. These minerals exhibit diagnostic spectral features related to Al-OH and Mg-OH stretching vibrations in the shortwave infrared region (1400–2400 nm) [
21]. The crystalline structure of K-feldspars and the layer silicate structure of micas create characteristic absorption patterns that are distinguishable from other soil components [
23]. Additionally, exchangeable potassium associated with clay mineral surfaces can influence the overall spectral response through changes in surface chemistry and hydration states. The correlation between K2O and P2O5 suggests shared dynamics in the soil that the model can leverage for prediction, potentially through shared features or parameters in the neural network. This strong performance for both macronutrients is encouraging for precision agriculture applications where nutrient management is critical.
Magnesium (Mg) predictions were less accurate (R² = 0.686), which may be related to the slight negative correlation with P2O5 (r = −0.10) that might introduce competing patterns that complicate prediction. The lower accuracy for magnesium can be explained by its complex occurrence in multiple mineral phases with overlapping spectral characteristics. Magnesium occurs in primary minerals such as olivine and pyroxene, secondary minerals like chlorite and vermiculite, and as exchangeable cations on clay surfaces [
20]. Unlike phosphorus and potassium, which have more distinct mineral associations, magnesium’s spectral signature is often masked or confounded by iron oxides and organic matter, which dominate reflectance in the visible and near-infrared regions [
22]. The Mg-OH absorption features in sheet silicates occur in similar wavelength ranges to other hydroxyl-bearing minerals, making spectral discrimination more challenging. Additionally, Mg showed greater variability in its relationships with other soil properties compared to K2O and P2O5, suggesting greater heterogeneity in how this nutrient is distributed across the agricultural fields in the dataset.
Soil pH was the most challenging property to predict (R² = 0.529), which aligns with its weak correlations with other properties (r = 0.01 to 0.17). The difficulty in predicting soil pH stems from its complex physicochemical nature as an integrative measure of multiple soil processes rather than a direct mineral component. Soil pH reflects the balance of acid-producing and acid-neutralizing reactions involving carbonates, organic acids, clay mineral surface chemistry, and aluminum hydrolysis [
24]. Unlike nutrients that are associated with specific mineral phases, pH influences soil reflectance indirectly through its effects on iron oxide crystallinity (hematite vs. goethite), organic matter decomposition products, and clay mineral surface charge [
12]. These indirect relationships create more variable and context-dependent spectral patterns that are difficult to capture consistently across different soil types and management conditions. pH determination involves complex chemical interactions in soil that may be influenced by multiple factors, including soil texture, organic matter content, and mineral composition. The independent nature of pH compared to the nutrient properties means the model cannot leverage correlations to improve prediction accuracy, requiring the algorithm to rely solely on the direct spectral-pH relationships present in the data.
These findings suggest that both the inherent complexity of soil property-spectral relationships and the correlations between properties influence prediction accuracy. Our property-specific ensemble weighting strategy helps mitigate these challenges by optimizing the regressor combination for each property, giving more weight to the algorithms that perform best for that specific soil parameter. The varying performance across properties also highlights the importance of multi-task learning approaches that can account for both shared and property-specific patterns in hyperspectral soil data.
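For readers who want to reproduce the two kinds of statistics discussed in this subsection, the sketch below computes a coefficient of determination and a Pearson correlation in pure Python. It assumes that the per-property scores quoted above are coefficients of determination; the function names and inputs are illustrative.

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

A model that always predicts the mean of the observed values scores 0 on the first measure, so a value such as 0.529 indicates that roughly half of the variance in that property is explained. The second function applied to two columns of laboratory measurements gives the inter-property correlations r.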
5.2. Advantages of the Hybrid Approach
The superior performance of our hybrid approach compared to both end-to-end deep learning (Variant C in our ablation study) and individual ML regressors (Variants D1–D3) highlights several key advantages of combining these methodologies for soil property estimation. Our hybrid framework leverages complementary strengths from both paradigms: the CNN backbone excels at extracting complex spectral–spatial patterns from raw hyperspectral data, while the ML ensemble provides robust regression with lower risk of overfitting. This combination is particularly valuable given the limited labeled data available in the Hyperview dataset, where end-to-end deep learning approaches would be more prone to overfitting without extensive regularization.
Our property-specific ensemble weighting strategy further enhances this hybrid approach by allowing the model to adapt to the unique characteristics of each soil property. The optimal weights determined through Bayesian optimization reveal that different regressors excel at different properties. XGBoost received higher weights for P2O5 and pH prediction, while Random Forest contributed more significantly to K and Mg prediction. This adaptive weighting can be understood in terms of the statistical properties of each soil parameter and the corresponding modeling strengths of each regressor. For instance, the strength of XGBoost in modeling non-linear relationships and handling outliers makes it particularly effective for pH, which showed the lowest correlation with other properties and more scattered distribution patterns.
The hybrid framework also improves generalization by combining multiple prediction approaches, reducing the risk of overfitting to specific spectral patterns or soil conditions in the training data. This ensemble effect is evident in the strong performance on the challenge test set, which likely contains fields with different characteristics than those in the training set. The diversity of the ensemble members, spanning both tree-based (Random Forest, XGBoost) and distance-based (KNN) methods, ensures that the model can handle a wide range of spectral signatures and soil conditions.
Our ablation results quantify these advantages, showing that the full HyperSoilNet (Variant A) achieved a custom score of 0.683 ± 0.011 (lower is better), significantly outperforming both the CNN-only approach (Variant C, 0.738 ± 0.012) and the best individual regressor (Variant D1, 0.779 ± 0.008). The largest degradation occurred when removing the pretraining (Variant B, 0.820 ± 0.015), highlighting the critical role of transfer learning from a larger dataset in establishing a strong foundation for soil property prediction. The HyperKon backbone was pretrained using self-supervised contrastive learning on a diverse collection of hyperspectral satellite imagery spanning multiple geographical regions and land cover types, providing robust spectral–spatial feature representations that transfer effectively to soil analysis tasks [
83]. This extensive pretraining on unlabeled hyperspectral data enables the model to learn generalizable spectral patterns that are not achievable when training solely on the limited labeled soil data.
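The custom score used in the ablation comparison can be sketched as follows, under the assumption that it follows the Hyperview challenge convention: the MSE of each property is divided by the MSE of a baseline that always predicts the training-set mean, and the ratios are averaged. This definition is not restated in this section, so the sketch should be read as an assumption; the function names are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def baseline_relative_score(y_true, y_pred, train_means):
    """Average of per-property MSE ratios against a mean-predicting baseline.

    y_true, y_pred -- dicts mapping property name to a list of values
    train_means    -- dict mapping property name to its training-set mean
    Lower is better; 1.0 matches the baseline and 0.0 is a perfect fit.
    """
    ratios = []
    for prop, targets in y_true.items():
        baseline = [train_means[prop]] * len(targets)
        ratios.append(mse(targets, y_pred[prop]) / mse(targets, baseline))
    return sum(ratios) / len(ratios)
```

Under this reading, a score of 0.683 means that the model's squared error is on average about 68% of the baseline's, which is why smaller values indicate better variants.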
5.3. Limitations and Future Directions
Despite its promising performance, our approach has several limitations that warrant further investigation in future work. The geographic specificity of the model represents the most significant constraint for large-scale cross-regional application, as it was developed and evaluated exclusively on data from Polish agricultural regions. Soil characteristics and spectral signatures vary considerably across regions due to differences in parent material, climate, vegetation, and land management practices, so models trained on region-specific data may not transfer directly to other geographical contexts without adaptation. Developing truly generalizable models remains a significant research gap in hyperspectral soil analysis, where most studies, including ours, focus on specific regions or datasets.
Current hyperspectral soil property estimation approaches, including machine learning and deep learning methods, often exhibit limited transferability when applied to new geographical regions or different soil types. This limitation is not unique to our approach but represents a broader challenge in the field, where models tend to perform well within their training domains but show degraded performance when applied to areas with different soil characteristics, climatic conditions, or agricultural practices. The development of more robust, generalizable models that can maintain performance across diverse geographical regions remains an active area of research requiring coordinated efforts across multiple institutions and datasets.
Future research should validate the approach on diverse datasets spanning different geographical areas and soil types to assess and improve generalization. Potential strategies for improving cross-regional generalization include domain adaptation techniques, transfer learning approaches, and the development of standardized spectral correction methods that can account for regional variations in soil composition and environmental conditions. Additionally, collaborative efforts to create multi-regional datasets with consistent measurement protocols could facilitate the development of more globally applicable soil analysis models.
The current model also does not account for temporal dynamics in soil spectral signatures, which can vary significantly due to changing moisture conditions, vegetation cover, or management practices throughout growing seasons. Soil moisture, in particular, has a strong influence on spectral reflectance across the entire spectrum, potentially confounding the relationship between reflectance and nutrient content. Incorporating temporal modeling or developing moisture-invariant spectral indices could improve robustness across different seasonal conditions. Multi-temporal datasets that capture the same fields under different moisture conditions would be particularly valuable for this research direction.
From an interpretability perspective, while our spectral analysis provides insights into the relationship between soil properties and spectral signatures, the deep learning component still operates partially as a black box. This limitation can impede adoption by agricultural practitioners who need to understand and trust model predictions. Developing more physically interpretable models that directly relate spectral features to known soil absorption mechanisms would enhance trust and facilitate adoption. Techniques such as layer-wise relevance propagation or gradient visualization could help elucidate which spectral regions most influence predictions for each soil property.
Our current framework focuses on four commonly measured soil properties (K, P2O5, Mg, and pH), but precision agriculture requires monitoring of additional parameters such as nitrogen, organic carbon, soil texture, and moisture. Extending the model to predict these properties would increase its practical utility for comprehensive soil management. This extension would likely require additional labeled data for these properties and potentially different spectral regions or features to capture their unique signatures.
Future research directions to address these limitations include developing soil-specific pretraining techniques that incorporate domain knowledge about spectral absorption features of different soil constituents; exploring physics-informed neural networks that integrate spectroscopic principles directly into the model architecture; investigating active learning approaches to optimize ground sampling strategies based on model uncertainty; creating multi-modal frameworks that combine hyperspectral data with other sensing technologies (e.g., thermal, SAR) for comprehensive soil health assessment; developing domain adaptation and transfer learning methods specifically designed for cross-regional soil analysis applications; and extending the approach to higher spatial resolution imagery (e.g., from drones) for field-scale precision management. Additionally, establishing international collaborative frameworks for sharing hyperspectral soil datasets across different geographical regions could facilitate the development of more robust and generalizable soil analysis algorithms. These advancements would further improve the accuracy, interpretability, and practical utility of hyperspectral soil property estimation for precision agriculture and sustainable land management.
5.4. Broader Implications for Precision Agriculture
The capabilities demonstrated by HyperSoilNet have significant implications for precision agriculture and sustainable land management. Accurate remote estimation of soil properties could substantially reduce the need for extensive soil sampling and laboratory analysis, lowering costs and enabling more frequent monitoring. Unlike traditional soil testing based on sparse point samples, hyperspectral imagery provides continuous spatial coverage, revealing field heterogeneity and enabling site-specific management. More precise information about soil nutrient status allows farmers to apply fertilizers only where and when needed, reducing environmental impacts while maintaining productivity.
However, the practical deployment of such systems for large-scale cross-regional applications requires careful consideration of the geographical limitations discussed above. While our approach demonstrates promising results within the Polish agricultural context, broader implementation would necessitate region-specific validation and potential model adaptation to account for local soil characteristics and environmental conditions.
The approach could be scaled from individual fields to regional or national agricultural monitoring systems, supporting policy decisions and environmental assessment. The soil property maps generated by our approach could feed directly into variable-rate application equipment, enabling automated and optimized resource management. By advancing the accuracy and reliability of non-invasive soil analysis, HyperSoilNet represents a step toward more sustainable and efficient agricultural systems that balance productivity with environmental stewardship. The development of more generalizable approaches that can maintain performance across diverse geographical regions remains a key research priority for realizing the full potential of hyperspectral soil analysis in global precision agriculture applications.
6. Conclusions
In this work, we introduced HyperSoilNet, a novel hybrid framework that integrates a pretrained hyperspectral CNN backbone with an ensemble of classical regression models to estimate soil properties from hyperspectral imagery. Our approach represents a practical solution for non-invasive soil analysis, directly addressing the challenges posed by limited labeled data and the high dimensionality of hyperspectral images.
The main contributions of this work include the development of a hybrid framework that leverages the complementary strengths of deep learning for feature extraction and traditional machine learning for robust regression; the implementation of soil-specific adaptations to the HyperKon backbone; the introduction of a property-specific weighted ensemble approach that optimizes prediction performance for each soil parameter individually; and an evaluation on the Hyperview Challenge dataset.
Our experiments confirmed that the combination of a pretrained hyperspectral backbone and a carefully designed ML ensemble outperforms both end-to-end deep learning approaches and traditional feature engineering methods. The ablation studies highlighted the importance of each component, with the pretrained backbone providing the foundation for effective feature extraction and the ensemble approach ensuring robust predictions across diverse soil conditions.
Overall, HyperSoilNet contributes a robust and efficient approach for soil property estimation from hyperspectral imagery, supporting more informed decision-making in precision agriculture and sustainable land management.