Residual Attention Network with Atrous Spatial Pyramid Pooling for Soil Element Estimation in LUCAS Hyperspectral Data

Deng, Yun; Cao, Yuchen; Chen, Shouxue; Cheng, Xiaohui

doi:10.3390/app15137457


¹ Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin 541006, China

² College of Computer Science and Engineering, Guilin University of Technology, Guilin 541006, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(13), 7457; https://doi.org/10.3390/app15137457

Submission received: 9 June 2025 / Revised: 25 June 2025 / Accepted: 27 June 2025 / Published: 3 July 2025

(This article belongs to the Special Issue Advanced Agricultural Technologies: Monitoring, Modeling, and Machine Learning Techniques)


Abstract

Visible and near-infrared (Vis–NIR) spectroscopy enables the rapid prediction of soil properties but faces three limitations with conventional machine learning: information loss and overfitting from high-dimensional spectral features; inadequate modeling of nonlinear soil–spectra relationships; and failure to integrate multi-scale spatial features. To address these challenges, we propose ReSE-AP Net, a multi-scale attention residual network with spatial pyramid pooling. Built on convolutional residual blocks, the model incorporates a squeeze-and-excitation channel attention mechanism to recalibrate feature weights and an atrous spatial pyramid pooling (ASPP) module to extract multi-resolution spectral features. This architecture synergistically represents weak absorption peaks (400–1000 nm) and broad spectral bands (1000–2500 nm), overcoming single-scale modeling limitations. Validation on the LUCAS2009 dataset demonstrated that ReSE-AP Net outperformed conventional machine learning by improving the R² by 2.8–36.5% and reducing the RMSE by 14.2–69.2%. Compared with existing deep learning methods, it increased the R² by 0.4–25.5% for clay, silt, sand, organic carbon, calcium carbonate, and phosphorus predictions, and decreased the RMSE by 0.7–39.0%. Our contributions include statistical analysis of LUCAS2009 spectra, identification of conventional method limitations, development of the ReSE-AP Net model, ablation studies, and comprehensive comparisons with alternative approaches.

Keywords: atrous spatial pyramid pooling; LUCAS2009 dataset; multi-scale feature extraction; nonlinear soil–spectra relationships; Vis–NIR

1. Introduction

Soil, as an essential component of Earth’s ecosystems, serves not only as the foundational medium for agricultural production [1], but also as a critical mediator sustaining biodiversity and facilitating carbon cycling processes [2]. Under the dual pressures of climate change and human activities, global soil degradation has exhibited an alarming trend [1,3,4,5,6]. This situation highlights the urgent need for precise and efficient soil monitoring technologies to achieve the “Zero Hunger” and “Life on Land” objectives outlined in the United Nations Sustainable Development Goals (SDGs) [7,8,9,10,11,12]. Visible and near-infrared (Vis–NIR) spectroscopy, characterized by its rapid and non-destructive analytical capabilities, has become an integral tool in contemporary soil analysis. Utilizing spectral response information within the wavelength range of 400–2500 nm, Vis–NIR spectroscopy enables the effective assessment of critical soil parameters, such as organic matter and heavy metals, thus offering technological feasibility for large-scale soil surveys [13,14,15,16,17,18].

In terms of methodological frameworks for soil spectral modeling, traditional machine learning techniques have established diversified approaches including partial least squares regression (PLSR), support vector machine regression (SVR), random forest (RF), ridge regression, and gradient boosting trees (XGBoost). Numerous scholars have conducted extensive research in the field of soil hyperspectral inversion. For instance, P Jia et al. utilized the extremely randomized tree (ERT) model to predict soil electrical conductivity in northwestern China [19] while L Jia et al. employed the marine predators algorithm to optimize random forest models for predicting the soil organic matter content [20]. Wu B et al. applied an optimized XGBoost model to retrieve the soil copper content [21], while Zhang M et al. compared linear models (GWR, PLSR) with nonlinear models (RF, SVM) to predict the arsenic concentration in soils from Pingtan Island [22]. Z Gao et al. inverted the total nitrogen content in apple orchard soils during fertilization using hyperspectral data and various machine learning regression methods [23]. Q Song et al. leveraged UAV hyperspectral data and compared PLSR with ensemble learning models for the inversion of soil textures (sand, silt, clay) [24]. Zhou W et al. combined laboratory-based spectral data with random forest and Bayesian data fusion methods to estimate the soil organic carbon in the Three-River Headwater Region [25]. Zhong Q et al. utilized hyperspectral data in conjunction with extreme learning machine (ELM) and support vector machine (SVM) for urban soil nickel concentration inversion [26]. Subi X et al. developed hyperspectral models for soil organic matter (SOM) in arid regions of northwest China, comparing multiple linear regression and machine learning approaches [27]. Chen S et al. applied continuous wavelet transform (CWT) coupled with extreme learning machine (ELM) for the rapid inversion of soil moisture content [28]. 
While the above studies have demonstrated considerable success in soil hyperspectral inversion using machine learning, three major issues remain. (1) Existing machine learning models often rely on principal component analysis (PCA) or manual band selection for dimensionality reduction when dealing with large-scale datasets such as the LUCAS2009. However, linear dimensionality reduction methods compromise spectral continuity (e.g., absorption peak shapes and adjacent-band relationships), leading to diminished sensitivity to subtle spectral signals (e.g., heavy metal feature peaks). (2) Even when utilizing nonlinear models (e.g., RF, XGBoost), their tree-based feature splitting mechanisms essentially represent piecewise linear approximations, thus failing to adequately characterize complex nonlinear coupling between soil elements and spectral features. (3) Existing methodologies predominantly adopt single-scale modeling, unable to simultaneously capture local spectral details and global trends, thereby resulting in fragmented cross-band relational information.

Within the methodological framework of deep learning, approaches such as convolutional neural networks (CNN) and long short-term memory networks (LSTM) exhibit significant advantages compared with traditional machine learning paradigms. Specifically, deep learning circumvents limitations associated with manual feature engineering by leveraging autonomous feature learning mechanisms, effectively captures complex higher-order response relationships through deep nonlinear mappings, and enhances practical applicability via end-to-end data-driven modeling frameworks. From a theoretical perspective, these methods demonstrate superiority in feature representational capacity (representation learning), efficiency in processing high-dimensional data (dimensional invariance), multi-task generalization (parameter sharing), and robustness to noise (distributed representations), thereby providing a novel paradigm for modeling complex soil–spectral interactions. Empirically, the suitability and advantages of deep architectures have been validated by previous studies: for instance, Sheng Wang et al. [29] employed an LSTM-based framework to capture dependencies within spectral sequences; Wang H et al. [30] proposed a CNN-LSTM hybrid architecture for the joint extraction of spatial and temporal features; and Li H’s group [31] developed a dual-branch CNN architecture effectively integrating heterogeneous features, achieving breakthroughs in various application scenarios.

To address the core issues inherent in traditional methods (difficulties in processing high-dimensional data, insufficient nonlinear representation, and multi-scale fragmentation), this study introduces a novel deep-learning-based model termed the multi-scale attention residual network (ReSE-AP Net). Building upon the residual convolutional neural network (ResNet) structure, the proposed model incorporates innovations across multiple dimensions. (1) Channel attention mechanism: By embedding a squeeze-and-excitation (SE) module, global average pooling is employed to capture statistical channel-wise feature responses, and a two-layer fully connected network dynamically calibrates feature channels, significantly enhancing the representation of critical spectral regions such as heavy-metal-sensitive bands. (2) Multi-scale feature pyramid construction: An atrous spatial pyramid pooling (ASPP) module based on dilated convolutions is designed to simultaneously capture the local spectral details and global spectral trends through parallel convolutional branches with varying receptive fields (dilation rates of 1, 2, and 4). (3) Hierarchical feature fusion: Employing residual skip-connections to facilitate cross-layer information interaction, local textural features from shallow layers (e.g., baseline reflectance fluctuations) and abstracted nonlinear spectral combinations from deep layers are integrated, creating a multi-granularity feature representation system.

To validate the effectiveness of the proposed model, rigorous comparative experiments were conducted using the LUCAS2009 benchmark dataset. The experiments included traditional machine learning models (PLSR, Ridge, SVR, RF, XGBoost) and mainstream deep learning models (VGG, ResNet, temporal convolutional network (TCN), Transformer). Evaluation metrics employed were the coefficient of determination (R²) and root mean square error (RMSE). Results indicated that the ReSE-AP Net model significantly outperformed traditional machine learning methods across all elements, achieving improvements of 2.8–36.5% in R² and reductions of 14.2–69.2% in RMSE. Compared with contemporary deep learning models commonly used in the field, the ReSE-AP Net achieved a superior R² performance for more than half of the soil elements, improving by approximately 0.2–25.5% while maintaining a comparable performance with the best deep learning models for the remaining elements. Moreover, the proposed model consistently exhibited superior RMSE performance, outperforming all other deep learning models except for matching the TCN performance on pH (H₂O), demonstrating improvements of approximately 0.7–39.0%, thus confirming its excellent predictive accuracy and generalization capability.

2. Materials and Methods

2.1. Data Sets and Data Processing

The LUCAS 2009 dataset [32], a flagship outcome of the EU-led Land Use/Cover Area frame Survey, is recognized as one of the most representative continental-scale environmental databases in Europe due to its stringent soil sampling and analytical standards. Implementing a systematic 2 km × 2 km grid design, the survey spans the 25 EU Member States and adjacent regions, encompassing approximately 19,000 locations where topsoil (0–20 cm) was sampled with fine granularity. At each site, composite sampling was rigorously applied: five sub-samples were collected in a cross pattern within a 2 m radius centered on the geo-referenced point and subsequently pooled to form a 0.5 kg topsoil sample, thereby objectively representing the soil properties of roughly 4 m² of land [33]. The sampling network covers diverse land-use categories, including arable land, grassland, and forest, with a particularly high proportion of agricultural sites.

During laboratory analysis, each soil sample underwent standardized pre-treatments, including air-drying, homogenization, and quality control, before being systematically characterized for fifteen core parameters: texture fractions (clay, silt, sand), chemical properties (pH, organic carbon, carbonates, total nitrogen, available phosphorus, and potassium), physical structure (coarse fragment content), and functional attributes (cation exchange capacity). Multispectral reflectance spectra were additionally acquired for a subset of samples, providing multi-dimensional inputs for subsequent soil-health assessments and carbon-stock modeling. Although the dataset’s sampling density (one point per 4 km²) and its emphasis on agricultural land impose constraints on fine-scale ecological studies and analyses of non-agricultural soils, its rigorous stratified sampling scheme, harmonized analytical protocols, and open-access policy render it an indispensable benchmark for evaluating EU agricultural policies, investigating the soil-degradation–climate interactions, and validating remote-sensing inversion models. To date, it continues to play an irreplaceable role in environmental science, agricultural management, and carbon-cycle research.

The rigorous hierarchical sampling framework, standardized analytical methods, and open-access nature of the LUCAS dataset provide a robust foundation for this study. This methodological integrity ensures the reliability of our model’s performance evaluation while minimizing the propagation of errors originating from data inaccuracies. A critical consideration, however, is the dataset’s pronounced skew toward agricultural land cover. Accordingly, the generalizability of our findings to extensive non-agricultural ecosystems warrants further investigation, representing a clear and compelling avenue for subsequent research efforts.

2.1.1. Dataset Statistics and Division

The LUCAS dataset employed in this study comprised 19,036 samples, but individual feature columns exhibited varying degrees of missingness. To mitigate the impact of missing data on model training without incurring excessive data loss, the following strategy was adopted: for any target variable under prediction, only those rows with missing values in that specific column were removed. This approach balances sample size with data integrity, thereby enhancing model robustness and generalization. The remaining data were partitioned into a training set (66.6%) and an independent test set (33.3%), with fivefold cross-validation applied to the training set and 20% of the training samples withheld as a validation subset.
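The per-target filtering and the roughly two-thirds/one-third partition described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_for_target`, the use of a random shuffle, and the seed are assumptions, since the text does not state how rows were assigned to each split.

```python
import numpy as np

def split_for_target(X, y, test_fraction=1 / 3, seed=0):
    """Drop rows whose target is missing, then split into train and test parts.

    Assumption: rows are assigned to the splits by a seeded random shuffle.
    """
    keep = ~np.isnan(y)                       # only rows missing THIS target are removed
    X_kept, y_kept = X[keep], y[keep]
    order = np.random.default_rng(seed).permutation(len(y_kept))
    n_test = int(round(len(y_kept) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X_kept[train_idx], y_kept[train_idx], X_kept[test_idx], y_kept[test_idx]

# Toy data: 12 samples, 5 features, 3 missing target values.
rng = np.random.default_rng(1)
X = rng.random((12, 5))
y = rng.random(12)
y[[2, 5, 9]] = np.nan
X_train, y_train, X_test, y_test = split_for_target(X, y)
print(len(y_train), len(y_test))  # 6 3
```

Because the filter is applied per target, a sample that lacks one property can still contribute to the models for the others.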

Descriptive statistics were computed for each split, including the size (number of observations), mean (arithmetic average, reflecting central tendency), std (standard deviation, measuring dispersion), median (middle value, an alternative measure of central tendency), mode (most frequent value), kurtosis (peakedness, indicating tail heaviness), and Iqr (inter-quartile range, a robust measure of spread). The results are summarized in Table 1.

From a statistical perspective, nutrient-related variables such as organic carbon (OC), CaCO₃, total nitrogen (N), phosphorus (P), and potassium (K) display pronounced right-skewness; P and K additionally exhibit leptokurtic, heavy-tailed distributions, implying a prevalence of low values interspersed with a few extreme highs. High coefficients of variation for cation-exchange capacity (CEC) and K highlight marked spatial heterogeneity in soil fertility.

With respect to soil texture, the mean fractions of clay (18.88%), silt (38.23%), and sand (42.88%) indicate that the study region is dominated by sandy loam. The close agreement between the median (37%) and mean for silt suggests an approximately symmetric distribution, whereas the wide Iqr for sand denotes substantial variability in sand content. The large divergence between the mean (49.92) and median (20.80) of OC—together with a kurtosis of 13.53—revealed a mixture of high-organic soils and typical arable soils. Collectively, the dataset captures the pronounced heterogeneity of European soils, posing a non-trivial challenge for predictive modeling. Nevertheless, the training and test sets exhibited strong concordance in key statistics (mean, standard deviation, median), and apart from a minor discrepancy in the Iqr of K, all other parameters maintained stable Iqr values across splits. This indicates a sound data partitioning strategy with no evidence of significant data leakage, thus providing a solid foundation for subsequent model training and evaluation.

2.1.2. Data Preprocessing

For data preprocessing, this study employed piecewise pooling averaging (PPA), also known as the bin-averaging method. PPA is a dimensionality-reduction technique that applies local mean pooling to high-dimensional spectral data: the spectrum is partitioned into fixed intervals, and the mean of each interval is computed to generate a compressed feature set. This procedure preserves global trend information while effectively suppressing random noise and lowering computational cost. Given that Vis–NIR spectra typically comprise thousands of bands, PPA was used to condense the original 4200-dimensional spectral vectors to 128 dimensions, substantially improving both computational efficiency and training speed without sacrificing predictive accuracy. Let the original spectral matrix be

$A_{x} \in \mathbb{R}^{m \times n}$, where $m$ is the number of samples and $n$ is the feature dimension. For a target of $b$ bins, the bin width is given by Formula (1):

$$bin\_size = \left\lfloor \frac{n}{b} \right\rfloor \tag{1}$$

The binning operation of PPA can be expressed as Formula (2):

$$A_{x}^{'}[:, i] = \frac{1}{bin\_size} \sum_{k = 1}^{bin\_size} A_{x}[:, (i - 1) \cdot bin\_size + k] \tag{2}$$

where $i \in \{1, 2, \dots, b\}$ and the final output is $A_{x}^{'} \in \mathbb{R}^{m \times b}$.
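Formulas (1) and (2) amount to a reshape followed by a mean, which the following NumPy sketch illustrates. The function name `piecewise_pooling_average` is hypothetical, and the handling of leftover bands is an assumption: with n = 4200 and b = 128 the bin width is 32, so 128 × 32 = 4096 bands are used and the sketch simply drops the trailing 104, which the text does not specify.

```python
import numpy as np

def piecewise_pooling_average(spectra: np.ndarray, n_bins: int) -> np.ndarray:
    """Compress (m, n) spectra to (m, n_bins) by averaging fixed-width bins."""
    m, n = spectra.shape
    bin_size = n // n_bins                     # Formula (1): floor(n / b)
    if bin_size == 0:
        raise ValueError("n_bins must not exceed the number of bands")
    # Assumption: bands beyond n_bins * bin_size are discarded.
    trimmed = spectra[:, : n_bins * bin_size]
    # Formula (2): mean of each consecutive block of bin_size bands.
    return trimmed.reshape(m, n_bins, bin_size).mean(axis=2)

rng = np.random.default_rng(0)
spectra = rng.random((4, 4200))                # 4 synthetic spectra
compressed = piecewise_pooling_average(spectra, n_bins=128)
print(compressed.shape)  # (4, 128)
```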

2.2. Modeling Method

This study proposes a multi-scale attention residual network (ReSE-AP Net) that synergistically integrates residual architecture, channel attention, and multi-scale feature fusion to efficiently decode complex spectral information. Centered on residual convolutions, the network employs skip connections to merge local spectral details with global abstract features, thereby alleviating gradient-vanishing issues. Within each residual block, a squeeze-and-excitation (SE) attention mechanism dynamically enhances responses at critical spectral bands through global feature statistics while suppressing noise. The model further incorporates atrous spatial pyramid pooling (ASPP) to extract spectral features at multiple scales in parallel, simultaneously capturing fine structures of weak absorption peaks and overarching trends of broad spectral ranges. Ultimately, feature fusion followed by nonlinear mapping enables end-to-end prediction, furnishing a robust deep-learning framework for spectral analysis.

2.2.1. Overall Model Structure

The overall architecture of the model is depicted in Figure 1. Training data were first partitioned with a batch size of 320, and piecewise pooling averaging (PPA) was employed to compress the 4200-dimensional spectra to 128 dimensions. After preprocessing, the input tensor had a shape [320,1,128] corresponding to [batch_size,channel,seq_length].

An initial convolutional module was placed at the network front end to extract low-level features; this consists of a convolutional layer (kernel size = 3, padding = 1) followed by a ReLU activation function, thereby introducing nonlinearity. The resulting features are forwarded to a residual network augmented with a squeeze-and-excitation (SE) channel-attention mechanism. This residual network contains two residual blocks, each comprising two convolutional layers—the first expanding the channel dimension and the second maintaining it—together with batch normalization and ReLU activation. Within the main branch of each block, the features produced by the two convolutions are re-weighted by the SE attention to dynamically enhance informative spectral bands and suppress noise. The attention-refined output is then added to the shortcut pathway, forming the residual connection. The shortcut both mitigates gradient vanishing and network degradation and enables lower-level information to flow directly to deeper layers, fostering feature reuse and preventing information loss.

The output of the residual network is subsequently fed into an atrous spatial pyramid pooling (ASPP) module. ASPP comprises three parallel atrous-convolution branches with dilation rates of 1, 2, and 4, respectively, to capture multi-scale features with varying receptive fields. Concurrently, a global-average-pooling branch compresses the sequence dimension to obtain global statistics, which are then restored to the original sequence length via nearest-neighbor upsampling to align with the atrous branches. Features from the atrous convolutions and the global branch are merged in a fusion layer, yielding a composite representation that integrates local, intermediate, large-scale, and global information.

The fused features are further downsampled by a max-pooling layer (kernel size = 2, stride = 2), flattened, and passed through a fully connected layer to produce the final output, enabling end-to-end mapping from spectra to soil-element predictions.

2.2.2. SE Attention Mechanism and Residual Convolutional Network

The squeeze-and-excitation (SE) attention mechanism constitutes a canonical form of channel attention, designed to augment the representational capacity of convolutional neural networks while reducing the training overhead. It comprises two principal operations—squeeze and excitation. In the squeeze phase, global average pooling is applied to each channel feature map, collapsing its spatial dimensions to generate a channel descriptor that encapsulates the channel’s global response. This operation is formalized in Equation (3):

$$z_{c} = \frac{1}{N} \sum_{i = 1}^{N} x_{bic} \tag{3}$$

where $x_{bic}$ is the value of channel $c$ at position $i$ for the $b$-th sample in the batch, $N$ is the total number of elements in the channel, and $z_{c}$ is the compressed value of channel $c$. During the excitation phase, the squeezed descriptors are passed through a nonlinear transformation to produce a weight vector whose length equals the number of channels, with each element quantifying the importance of its corresponding channel. This operation can be formulated as Equation (4):

$$s = \sigma_{ReLU}(W_{2}\, \sigma_{Sigmoid}(W_{1} z + b_{1}) + b_{2}) \tag{4}$$

where $z = [z_{1}, z_{2}, \dots, z_{C}]$ is the compressed feature vector, $W_{1}$ and $W_{2}$ are the weight parameters of the fully connected layers, $b_{1}$ and $b_{2}$ are the bias terms, and $\sigma$ denotes an activation function. Subsequently, the channel-wise weights generated in the excitation phase are applied to the original feature maps via channel-specific multiplication, thereby re-calibrating the features. This operation can be represented by Equation (5):

$$X_{scaled} = s \cdot x \tag{5}$$

where $x$ is the original feature map and $X_{scaled}$ is the feature map after channel weighting. The squeeze-and-excitation (SE) attention mechanism thus recalibrates each channel feature map in a convolutional neural network through two successive operations, squeeze and excitation, thereby markedly enhancing the representational capacity. A schematic of the SE module adopted in this study is illustrated in Figure 2, where B denotes the batch size, C is the number of channels, L is the sequence length, and r is the reduction (compression) ratio.
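Equations (3)–(5) can be traced step by step in a few lines of NumPy. The sketch below is illustrative only: the function name `se_recalibrate`, the tensor sizes, and the random weights are assumptions. The activation order follows Equation (4) as printed (sigmoid inside, ReLU outside); the commonly cited SE formulation applies ReLU in the hidden layer and a sigmoid at the output, so the order here should be checked against the authors' implementation.

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def se_recalibrate(x, W1, b1, W2, b2):
    """x: (B, C, L); W1: (C // r, C); W2: (C, C // r). Returns (scaled x, weights s)."""
    z = x.mean(axis=2)                     # Eq. (3): squeeze over the sequence axis -> (B, C)
    hidden = sigmoid(z @ W1.T + b1)        # inner activation, as written in Eq. (4)
    s = relu(hidden @ W2.T + b2)           # outer activation, as written in Eq. (4) -> (B, C)
    return s[:, :, None] * x, s            # Eq. (5): channel-wise rescaling

# Hypothetical sizes: batch 2, 8 channels, length 16, reduction ratio 4.
rng = np.random.default_rng(0)
B, C, L, r = 2, 8, 16, 4
x = rng.standard_normal((B, C, L))
W1, b1 = rng.standard_normal((C // r, C)), np.zeros(C // r)
W2, b2 = rng.standard_normal((C, C // r)), np.zeros(C)
x_scaled, s = se_recalibrate(x, W1, b1, W2, b2)
print(x_scaled.shape, s.shape)  # (2, 8, 16) (2, 8)
```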

As a paradigmatic deep-network architecture, the residual neural network alleviates the vanishing-gradient problem commonly encountered during the training of very deep models by incorporating residual learning and cross-layer identity mappings, thereby substantially enhancing the feature representation and generalization capabilities. Each residual block in a ResNet can be formulated as Equation (6):

$$y_{l} = h(x_{l}) + F(x_{l}, \{W_{i}\}) \tag{6}$$

where $y_{l}$ is the output of the $l$-th residual block, $x_{l}$ is the input of the $l$-th residual block, $h(x_{l})$ is the skip connection, $F(x_{l}, \{W_{i}\})$ is the residual function, and $W_{i}$ denotes the weight parameters. In ReSE-AP Net, $F(x_{l}, \{W_{i}\})$ is expressed as Equation (7):

$$F(x_{l}, \{W_{i}\}) = s \cdot (W_{2} \cdot \sigma_{ReLU}(W_{1} \cdot x_{l} + b_{1}) + b_{2}) \tag{7}$$

where $W_{1}$ and $W_{2}$ are the weight parameters of the two convolutional layers, $b_{1}$ and $b_{2}$ are the bias terms, and $s$ is the channel weight vector calculated by the SE module. Finally, the overall expression of the residual network weighted by SE attention is given by Equation (8):

$$y_{L} = x_{L} + \sum_{i = 1}^{L} F(x_{i}, \{W_{i}\}) \tag{8}$$

where $x_{L}$ is the input of the last residual block, and $L$ is the total number of residual blocks.

Experimental results indicate that excessively deep architectures (e.g., ResNet-152) deteriorate performance in the target task rather than improving it. Detailed analysis attributes this degradation to two primary factors: (i) the inherent parameter redundancy of very deep networks results in a mismatch between model complexity and dataset size, thereby inducing severe overfitting; and (ii) over-parameterized models exhibit gradient instability during back-propagation, substantially complicating training. In response, a streamlined shallow residual architecture is proposed. As illustrated in Figure 3 (where C1, C2, and C3 denote different channel dimensions), the network consists of only two residual blocks, striking a judicious balance between model capacity and computational efficiency. Empirical evidence demonstrates that relative to deeper residual networks, this shallow design preserves the feature-extraction capability while markedly reducing complexity, consequently shortening the per-iteration training time and facilitating rapid model updates.

This work innovatively integrates the squeeze-and-excitation (SE) channel-attention mechanism into the residual network for two principal reasons:

Heterogeneous channel importance. Conventional convolutions treat all channels equally; however, their contributions to the target task vary substantially, and some channels even convey redundant or noisy information. The SE mechanism adaptively learns channel-specific weights, suppressing less informative channels and amplifying pivotal ones, thereby improving feature utilization.

Explicit modeling of inter-channel dependencies. While residual networks mitigate gradient vanishing via skip connections, they do not explicitly model relationships among channels. By employing nonlinear mapping to capture such dependencies, the SE attention further augments representational power.

The formal mathematical definitions and computational procedures of this module are provided in Equations (9)–(13).

$$F_{l} = \mathrm{BN}(\mathrm{ReLU}(\mathrm{Conv1D}(H_{l}, W_{l}^{(1)}))) \tag{9}$$

$$F_{l} = \mathrm{BN}(\mathrm{ReLU}(\mathrm{Conv1D}(F_{l}, W_{l}^{(2)}))) \tag{10}$$

$$S_{l} = \mathrm{ReLU}(W_{l}^{(fc2)} \cdot \mathrm{Sigmoid}(W_{l}^{(fc1)} \cdot \mathrm{GAP}(F_{l}))) \tag{11}$$

$$F_{l}^{SE} = F_{l} \otimes S_{l} \tag{12}$$

$$H_{l+1} = F_{l}^{SE} + \mathrm{ShortCut}(H_{l}) \tag{13}$$

where $H_{l}$ is the input to the $l$-th residual block; $\mathrm{BN}(\cdot)$ denotes batch normalization; $\mathrm{ReLU}(\cdot)$ and $\mathrm{Sigmoid}(\cdot)$ are the two activation functions; $\mathrm{Conv1D}(\cdot)$ denotes one-dimensional convolution; $W_{l}^{(1)}$ and $W_{l}^{(2)}$ are the weight parameters of the two convolution operations; $W_{l}^{(fc1)}$ and $W_{l}^{(fc2)}$ are the two weight parameters used by the SE attention mechanism to compute the channel weights; $\mathrm{GAP}(\cdot)$ denotes global average pooling (the symbolic form of Equation (3)); $S_{l}$ is the weight vector computed by the channel attention mechanism; and $\otimes$ denotes channel-wise multiplication.

Moreover, the SE channel-attention mechanism adds only a negligible number of parameters and incurs minimal computational overhead, which keeps the design lightweight and helps maintain training efficiency. In the proposed design, the SE module is inserted at the end of the main branch of each residual block, immediately before the residual summation; performing channel-wise recalibration prior to feature fusion yields a more discriminative combined representation. Because the shortcut branch primarily serves as an unobstructed gradient-flow pathway to alleviate vanishing gradients, no SE module is applied to this branch. Given that the two residual blocks produce feature maps with different channel dimensions, separate SE modules, each matched to its respective dimensionality, are deployed, thereby ensuring dimensional compatibility and preventing cross-interference among the attention weights.
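The data flow of Equations (9)–(13) can be followed in a small forward-pass sketch. It is written in plain NumPy so that every step is explicit, and it is not the authors' PyTorch implementation. Several details are assumptions: the helper names, the channel sizes, batch normalization without learnable scale and shift, and a 1 × 1 convolution as the `ShortCut` projection (the text does not say how the shortcut matches the expanded channel dimension).

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def conv1d_same(x, W, dilation=1):
    """x: (B, C_in, L); W: (C_out, C_in, K) with odd K; zero padding keeps length L."""
    K, L = W.shape[2], x.shape[2]
    pad = dilation * (K - 1) // 2
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad)))
    out = np.zeros((x.shape[0], W.shape[0], L))
    for k in range(K):
        window = xp[:, :, k * dilation : k * dilation + L]
        out += np.einsum("oc,bcl->bol", W[:, :, k], window)
    return out

def batch_norm(x, eps=1e-5):
    """Per-channel normalization over batch and length (no learnable affine: assumption)."""
    mean = x.mean(axis=(0, 2), keepdims=True)
    var = x.var(axis=(0, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def se_residual_block(H, p):
    F = batch_norm(relu(conv1d_same(H, p["W1"])))           # Eq. (9): expands channels
    F = batch_norm(relu(conv1d_same(F, p["W2"])))           # Eq. (10): keeps channels
    z = F.mean(axis=2)                                      # GAP over the sequence axis
    S = relu(sigmoid(z @ p["Wfc1"].T) @ p["Wfc2"].T)        # Eq. (11), order as printed
    F_se = F * S[:, :, None]                                # Eq. (12)
    return F_se + conv1d_same(H, p["Ws"])                   # Eq. (13), 1x1 shortcut (assumption)

# Hypothetical sizes: 4 -> 8 channels, sequence length 128, reduction ratio 4.
rng = np.random.default_rng(0)
params = {
    "W1": rng.standard_normal((8, 4, 3)) * 0.1,
    "W2": rng.standard_normal((8, 8, 3)) * 0.1,
    "Wfc1": rng.standard_normal((2, 8)),
    "Wfc2": rng.standard_normal((8, 2)),
    "Ws": rng.standard_normal((8, 4, 1)) * 0.1,
}
H = rng.standard_normal((2, 4, 128))
H_next = se_residual_block(H, params)
print(H_next.shape)  # (2, 8, 128)
```

Applying the SE weights before the addition, as in Equation (13), leaves the shortcut path untouched, which matches the stated intent of keeping that branch as a direct gradient pathway.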

2.2.3. Atrous Spatial Pyramid Pooling

Atrous spatial pyramid pooling (ASPP) is a multi-scale feature-extraction strategy that constructs a pyramid of atrous-convolution branches with distinct dilation rates within a convolutional neural network. By substantially enlarging the effective receptive field without a significant increase in parameters, ASPP enables the network to capture contextual information at multiple spatial scales while preserving feature-map resolution, thereby enhancing its ability to recognize objects of varying sizes. Atrous convolution expands the kernel’s field of view through sparse sampling, circumventing the detail loss usually caused by downsampling, whereas the parallel multi-branch design endows the model with rich scale awareness. The convolution operations corresponding to the different dilation rates are formally defined in Equations (14)–(16).

$$x_{1} = \sigma_{ReLU}(W_{1} *_{r_{1}} x + b_{1}) \tag{14}$$

$$x_{2} = \sigma_{ReLU}(W_{2} *_{r_{2}} x + b_{2}) \tag{15}$$

$$x_{3} = \sigma_{ReLU}(W_{3} *_{r_{3}} x + b_{3}) \tag{16}$$

where $W_{1}, W_{2}, W_{3}$ are the weight parameters of the three convolutional layers, $b_{1}, b_{2}, b_{3}$ are the bias terms, $r_{1}, r_{2}, r_{3}$ are the dilation rates, $*_{r}$ denotes convolution with dilation rate $r$, $x$ is the input feature sequence, and $\sigma_{ReLU}$ is the ReLU activation function. The global-average-pooling branch compresses the input feature map into a global feature vector, refines the channel dimensionality via a 1 × 1 convolution, and subsequently restores the spatial resolution through upsampling. This process provides global contextual information, thereby compensating for the limited receptive field of local convolutions. The corresponding mathematical formulations are presented in Equations (17)–(19).

$$\bar{x}_{c} = \frac{1}{L} \sum_{i = 1}^{L} x_{bic} \tag{17}$$

$$x_{4} = \sigma_{ReLU}(W_{4} \cdot \bar{x} + b_{4}) \tag{18}$$

$$x_{4}^{'} = L_{(1)} \cdot x_{4}^{T} \tag{19}$$

where $x_{bic}$ is the value of channel $c$ at position $i$ for the $b$-th sample in the batch, $L$ is the total length of the sequence, $\bar{x}_{c}$ is the compressed value of channel $c$, $\bar{x} = [\bar{x}_{1}, \bar{x}_{2}, \dots, \bar{x}_{C}]$, and $L_{(1)}$ is the all-ones column vector. Subsequently, the features extracted from the multi-scale atrous-convolution branches and the global-average-pooling branch are concatenated along the channel dimension and fused via a 1 × 1 convolution to realize cross-scale interaction and compression, as formulated in Equations (20) and (21).

$$x_{cat} = [x_{1}, x_{2}, x_{3}, x_{4}^{'}] \tag{20}$$

$$y_{out} = \sigma_{ReLU}(W_{5} \cdot x_{cat} + b_{5}) \tag{21}$$

where $x_{cat}$ is the result of concatenating the features obtained by the convolutions with different receptive fields with the features obtained by global pooling, and $y_{out}$ is the cross-scale feature output after convolutional fusion.

In the present task, distinct spectral bands in hyperspectral data correspond to characteristic absorption features of various substances. The ASPP module, with its parallel multi-branch design, concurrently captures local details (small dilation rate), medium- to long-range dependencies (large dilation rate), and global context (pooling branch). The fused multi-scale features enhance the model’s robustness to spectral noise and local occlusions, rendering ASPP particularly well-suited to the high spectral dimensionality of hyperspectral data. The ASPP architecture implemented in this study is illustrated in Figure 4, where B denotes the batch size, C is the number of channels, and L is the sequence length. Within the overall framework, the ResNet backbone extracts deep representations via residual connections but may overlook cross-scale contextual information; the ASPP module refines these high-level features at multiple scales. Simultaneously, the SE module in the residual network focuses on channel-wise importance, whereas ASPP emphasizes spatial multi-scale information. Their combination realizes “channel-spatial” dual-attention, markedly enhancing the expression of salient features. The complete mathematical formulation of this module is provided in Equations (22)–(26).

C_{1} = \mathrm{ReLU}(\mathrm{Conv1D}(H, W_{1}))

(22)

C_{2} = \mathrm{ReLU}(\mathrm{Conv1D}(H, W_{2}))

(23)

C_{4} = \mathrm{ReLU}(\mathrm{Conv1D}(H, W_{4}))

(24)

G = \mathrm{Upsample}(\mathrm{GAP}(H))

(25)

F_{fusion} = \mathrm{Concat}(C_{1}, C_{2}, C_{4}, G)

(26)

where C_{1}, C_{2}, and C_{4} are the outputs of the convolutions with three different dilation rates; H is the input received by the ASPP module; W_{1}, W_{2}, and W_{4} are the weight parameters of the three convolutions; Upsample() denotes upsampling; Conv1D() denotes one-dimensional convolution; GAP() denotes global average pooling; and Concat() denotes the concatenation and fusion of C_{1}, C_{2}, C_{4}, and G along the channel dimension.
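The branch structure of Equations (22)–(26) can be sketched in a few lines of pure Python. This is an illustration under stated assumptions rather than the implementation used in the study: the input is reduced to a single channel, the dilation rates are assumed to be 1, 2, and 4 (suggested by the subscripts but not stated explicitly), the kernel values are arbitrary, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence


def dilated_conv1d_relu(h: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Single-channel atrous convolution with zero padding (length preserved), then ReLU.

    As in deep-learning frameworks, this is a cross-correlation (kernel not flipped).
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("use an odd kernel size so that padding is symmetric")
    half = (k - 1) // 2
    out = []
    for t in range(len(h)):
        s = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - half) * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(h):            # positions outside the signal count as zero
                s += w * h[idx]
        out.append(max(0.0, s))
    return out


def gap_upsample(h: List[float]) -> List[float]:
    """Eq. (25): global average pooling, broadcast back to the input length."""
    mean = sum(h) / len(h)
    return [mean] * len(h)


def aspp_branches(h: List[float], kernels: Dict[int, List[float]],
                  dilations: Sequence[int] = (1, 2, 4)) -> List[List[float]]:
    """Eqs. (22)-(26): parallel dilated branches plus the pooled branch, stacked as channels."""
    branches = [dilated_conv1d_relu(h, kernels[d], d) for d in dilations]
    branches.append(gap_upsample(h))
    return branches


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Span of input positions seen by one output of a dilated convolution."""
    return (kernel_size - 1) * dilation + 1


h = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]       # toy spectrum (hypothetical)
kernels = {d: [1.0, 1.0, 1.0] for d in (1, 2, 4)}  # arbitrary weights for illustration
f_fusion = aspp_branches(h, kernels)
print(len(f_fusion), [receptive_field(3, d) for d in (1, 2, 4)])
```

With a kernel of size 3, the three assumed dilation rates cover 3, 5, and 9 input positions, respectively, while the pooled branch summarizes the whole sequence; every branch keeps the input length so that the outputs can be concatenated.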

2.2.4. Model Evaluation

In this study, model performance was assessed using the coefficient of determination (R²) and the root mean square error (RMSE). The coefficient of determination quantifies the degree of correspondence between the predicted and observed values, representing the proportion of variance in the response variable that is accounted for by the predictive model; an R² value approaching 1 indicates a superior goodness of fit. RMSE measures the average discrepancy between the predicted and observed values, thereby reflecting the overall predictive accuracy; a lower RMSE denotes reduced error and more precise predictions. The mathematical formulations of R² and RMSE are provided in Equations (27) and (28), respectively.

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

(27)

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}

(28)
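Equations (27) and (28) translate directly into code. The following is a minimal standard-library sketch; the function names and the toy values are illustrative and are not taken from the study.

```python
import math
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, Eq. (27)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are identical")
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error, Eq. (28)."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)


observed = [1.0, 2.0, 3.0, 4.0]    # toy values
predicted = [1.5, 2.0, 3.0, 3.5]
print(round(r2_score(observed, predicted), 3))  # 0.9
print(round(rmse(observed, predicted), 4))      # 0.3536
```

Note that RMSE carries the unit of the target variable, so it is comparable only within one soil property, whereas R² is dimensionless.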

2.2.5. Experimental Setup

The experiments were conducted on a system equipped with a 14th-generation Intel Core i7-14700HX processor (20 cores/28 threads; Intel, Santa Clara, CA, USA) and an NVIDIA GeForce RTX 4060 GPU (NVIDIA, Santa Clara, CA, USA). The software environment comprised Windows 11 as the operating system, Python 3.11 as the programming language, and PyTorch 2.3.0 as the deep-learning framework. With this configuration, each training epoch took approximately 40 s and occupied about 14 GB of memory.

3. Experimental Results

3.1. Ablation Experiment

The study conducted a systematic evaluation of individual model components, namely the residual block alone (ResBlock), the residual network with channel attention (ResNet + SE), and the complete multi-scale attention residual network (ReSE-AP Net). The performance outcomes are summarized in Table 2 (comparison of R²) and Table 3 (comparison of RMSE). The data revealed that augmenting the baseline ResBlock with an SE attention mechanism yielded an average increase of 3.3% in R² and a 7.5% reduction in RMSE, indicating that channel-wise recalibration markedly improves the feature discriminability. Incorporating ASPP on top of the ResNet + SE architecture further enhanced the average R² by 2.1% and reduced the RMSE by an additional 6.0%. Relative to the standalone ResBlock, the full ReSE-AP Net achieved a 5.2% average gain in R² and a 13.1% decrease in RMSE. These findings substantiate the efficacy of parallel multi-receptive-field extraction in strengthening cross-scale feature representation.
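The averages quoted above can be recomputed from Tables 2 and 3. The sketch below assumes one plausible reading of the text: the R² gain is the mean absolute difference across the 11 properties, and the RMSE reduction is the mean relative reduction. The function and variable names are illustrative. Under this reading, adding SE to the ResBlock gives a mean R² gain of about 0.033 and a mean RMSE reduction of about 7.5%.

```python
from statistics import mean

# Column order: Clay, Silt, Sand, pH (CaCl2), pH (H2O), OC, CaCO3, N, P, K, CEC
R2_TABLE = {
    "ResBlock":    [0.8162, 0.6408, 0.7442, 0.9210, 0.9173, 0.9574,
                    0.9480, 0.9101, 0.3179, 0.4810, 0.7062],
    "ResNet + SE": [0.8482, 0.7255, 0.7912, 0.9288, 0.9200, 0.9613,
                    0.9512, 0.9224, 0.3590, 0.5100, 0.8019],
    "ReSE-AP Net": [0.8653, 0.7467, 0.8055, 0.9384, 0.9278, 0.9656,
                    0.9642, 0.9393, 0.4148, 0.5568, 0.8172],
}
RMSE_TABLE = {
    "ResBlock":    [6.065, 10.477, 12.536, 0.410, 0.396, 18.832,
                    25.086, 1.120, 25.412, 170.121, 7.877],
    "ResNet + SE": [5.401, 9.643, 11.987, 0.385, 0.371, 17.808,
                    24.412, 0.941, 24.634, 165.195, 6.553],
    "ReSE-AP Net": [4.773, 9.204, 11.511, 0.354, 0.364, 16.871,
                    22.711, 0.922, 23.214, 149.404, 6.185],
}


def mean_r2_gain(base: str, improved: str) -> float:
    """Mean absolute R2 difference across properties (assumed definition)."""
    return mean(i - b for b, i in zip(R2_TABLE[base], R2_TABLE[improved]))


def mean_rmse_reduction(base: str, improved: str) -> float:
    """Mean relative RMSE reduction across properties (assumed definition)."""
    return mean((b - i) / b for b, i in zip(RMSE_TABLE[base], RMSE_TABLE[improved]))


for base, improved in [("ResBlock", "ResNet + SE"),
                       ("ResNet + SE", "ReSE-AP Net"),
                       ("ResBlock", "ReSE-AP Net")]:
    print(f"{base} -> {improved}: "
          f"R2 gain {mean_r2_gain(base, improved):.4f}, "
          f"RMSE reduction {mean_rmse_reduction(base, improved):.1%}")
```

Averaging relative RMSE reductions, rather than raw differences, keeps properties with large units (for example K in mg/kg) from dominating the mean.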

3.2. Comparative Experiment

To further assess the practical predictive capability of ReSE-AP Net, a series of benchmark models were selected for comparison including conventional machine learning algorithms (PLSR, Ridge, SVR, RF, XGBoost) and commonly used deep-learning architectures in this domain (VGG-11, TCN, ResNet-152, Transformer). All models were trained on the LUCAS 2009 dataset under identical preprocessing procedures to ensure experimental fairness. The training hyper-parameters were uniformly set as follows: 3000 epochs, a learning rate of 0.0003175, a batch size of 320, the Adam optimizer, and mean squared error (MSE) loss. The resulting performance on the independent test set is summarized in Table 4 (comparison of R²) and Table 5 (comparison of RMSE).
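For reference, the shared hyper-parameters can be collected in a small configuration object, and the number of optimizer updates they imply can be derived from the training-set sizes in Table 1. This is a sketch: the `TrainConfig` class and `steps_per_epoch` helper are hypothetical, and the calculation assumes that the last, partially filled batch of each epoch is kept rather than dropped.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyper-parameters stated in the text (the class itself is illustrative)."""
    epochs: int = 3000
    learning_rate: float = 0.0003175
    batch_size: int = 320
    optimizer: str = "Adam"
    loss: str = "MSE"


def steps_per_epoch(n_train: int, batch_size: int) -> int:
    """Mini-batches per epoch, assuming the final partial batch is kept."""
    return math.ceil(n_train / batch_size)


cfg = TrainConfig()
# Training-set sizes from Table 1: 12,687 for most properties, 11,959 for clay/silt/sand.
for n_train in (12687, 11959):
    steps = steps_per_epoch(n_train, cfg.batch_size)
    print(n_train, steps, steps * cfg.epochs)
```

Under these assumptions, an epoch over 12,687 training samples consists of 40 mini-batches, so 3000 epochs correspond to 120,000 parameter updates.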

As shown in Table 4 and Table 5, partial least squares regression (PLSR) delivered the best overall performance among the conventional machine learning models, followed by support vector regression (SVR). Within the deep-learning cohort, the Transformer architecture achieved the highest aggregate accuracy across elemental predictions. ReSE-AP Net outperformed all machine learning models for every element: for OC, its R² exceeded that of the best machine learning performer (SVR) by roughly 2.8% and surpassed the poorest performer (random forest, RF) by about 9.2%. For the most divergent indicator, K, ReSE-AP Net improved upon PLSR and RF by approximately 18.63% and 41.79%, respectively. Compared with the deep-learning baselines, ReSE-AP Net surpassed the Transformer model on six of the eleven evaluated properties (clay, silt, sand, OC, CaCO₃, and P) and was comparable to or slightly below it on the remaining five (pH in CaCl₂, pH in H₂O, N, K, and CEC; Table 4), underscoring the proposed network’s strong capability for multi-element hyperspectral prediction.

With respect to the RMSE metric, PLSR again delivered the lowest prediction errors among the traditional machine learning models, followed by SVR. Within the deep learning cohort, the Transformer architecture achieved the best RMSE on over half of the evaluated indicators, with VGG-11 ranking second. ReSE-AP Net achieved the lowest RMSE on nearly every indicator. For the most closely matched indicator, phosphorus (P), ReSE-AP Net reduced the RMSE by approximately 14.2% relative to the best machine learning performer (PLSR) and by about 25.2% relative to the poorest performer (RF). For calcium carbonate (CaCO₃), the indicator exhibiting the largest discrepancy, ReSE-AP Net lowered the RMSE by roughly 41.9% compared with PLSR and by 69.2% compared with RF. Against the deep learning baselines, ReSE-AP Net outperformed all counterparts on every indicator except for the pH in H₂O, where it was essentially tied with TCN (0.364 vs. 0.363; Table 5). For silt, the closest indicator among the deep models, ReSE-AP Net achieved a further 1.6% reduction in RMSE over the best deep model (VGG-11) and a 24.4% reduction relative to the worst (ResNet-152). For the pH in CaCl₂, which showed the greatest gap, ReSE-AP Net lowered the RMSE by about 1% compared with TCN and by approximately 39.0% compared with ResNet-152. It is worth noting that the mean RMSE of the nine other models for each of the 11 properties was 6.526% (clay), 11.838% (silt), 15.494% (sand), 0.584 (pH in CaCl₂), 0.555 (pH in H₂O), 24.611 g/kg (OC), 41.107 g/kg (CaCO₃), 1.293 g/kg (N), 28.261 mg/kg (P), 190.939 mg/kg (K), and 7.876 cmol(+)/kg (CEC). Relative to these means, ReSE-AP Net reduced the RMSE by approximately 26.9% (clay), 22.2% (silt), 25.7% (sand), 39.5% (pH in CaCl₂), 34.6% (pH in H₂O), 31.4% (OC), 44.8% (CaCO₃), 28.7% (N), 17.9% (P), 21.8% (K), and 21.5% (CEC).
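The relative RMSE reductions quoted in this paragraph follow from Table 5 by simple arithmetic. The sketch below reproduces the calculation for three properties (clay, CaCO₃, and P) as a worked check; the dictionary layout and function names are illustrative, and the reduction is taken as (baseline − ReSE-AP Net) / baseline.

```python
from statistics import mean

# RMSE values copied from Table 5 for three of the eleven properties.
BASELINE_RMSE = {
    "PLSR":        {"Clay": 6.335, "CaCO3": 39.103, "P": 27.067},
    "Ridge":       {"Clay": 6.558, "CaCO3": 42.915, "P": 27.593},
    "SVR":         {"Clay": 6.442, "CaCO3": 39.121, "P": 28.551},
    "RF":          {"Clay": 8.976, "CaCO3": 73.746, "P": 31.030},
    "XGBoost":     {"Clay": 8.607, "CaCO3": 60.776, "P": 30.506},
    "VGG-11":      {"Clay": 4.951, "CaCO3": 25.494, "P": 25.385},
    "TCN":         {"Clay": 5.304, "CaCO3": 26.800, "P": 27.010},
    "ResNet-152":  {"Clay": 6.709, "CaCO3": 30.377, "P": 29.176},
    "Transformer": {"Clay": 4.860, "CaCO3": 31.640, "P": 28.040},
}
RESE_AP_RMSE = {"Clay": 4.773, "CaCO3": 22.711, "P": 23.214}


def reduction_vs(model: str, prop: str) -> float:
    """Relative RMSE reduction of ReSE-AP Net against a single baseline model."""
    base = BASELINE_RMSE[model][prop]
    return (base - RESE_AP_RMSE[prop]) / base


def reduction_vs_mean(prop: str) -> float:
    """Relative RMSE reduction against the mean of all nine baseline models."""
    base = mean(row[prop] for row in BASELINE_RMSE.values())
    return (base - RESE_AP_RMSE[prop]) / base


print(f"P vs PLSR:      {reduction_vs('PLSR', 'P'):.1%}")
print(f"P vs RF:        {reduction_vs('RF', 'P'):.1%}")
print(f"CaCO3 vs PLSR:  {reduction_vs('PLSR', 'CaCO3'):.1%}")
print(f"CaCO3 vs RF:    {reduction_vs('RF', 'CaCO3'):.1%}")
print(f"Clay vs mean:   {reduction_vs_mean('Clay'):.1%}")
```

Extending the dictionaries with the remaining columns of Table 5 allows the other percentages in the paragraph to be checked in the same way.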

Figure 5 compares the coefficient of determination (R²) obtained by the competing models for each soil attribute, whereas Figure 6 presents the corresponding root mean square error (RMSE). In Figure 5, the closer a model’s curve is to the upper boundary, the better its predictive performance; conversely, in Figure 6, curves nearer the lower boundary indicate smaller errors and hence higher accuracy. Within the same plot, a greater separation between two curves signifies a more pronounced performance disparity. With the exception of the P, K, and CEC indicators, ReSE-AP Net occupies the uppermost, or nearly uppermost, position for all remaining attributes in Figure 5. Likewise, in Figure 6, the ReSE-AP Net curve lies at the very bottom for almost every element, underscoring its overall predictive accuracy.

4. Further Evaluation and Discussion

To further quantify and evaluate the model’s fitting quality and overall performance, scatter plots of the observed versus predicted values were generated (Figure 7). Each plot contained numerous points and two curves: the red line denoted y = x, representing the ideal scenario in which the predicted values perfectly match the observations, whereas the blue curve corresponded to the regression fit of the model’s predictions. Point density was color-coded, with deeper (reddish) hues indicating higher concentrations of samples. In every plot, the red and blue curves intersected; this intersection marks the point at which the predicted and observed values are equal. When the intersection lay within the region of highest point density, the model exhibited a superior fitting performance and robustness within the principal data distribution. The results show that for most soil-element predictions, the intersection of the curves for ReSE-AP Net fell within the densest region, confirming its strong predictive capability and generalization. Nonetheless, for the pH and sand indicators, the intersection only appeared in relatively dense regions rather than the densest area, suggesting that the model’s performance on these two attributes could be further improved.
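The intersection discussed above can be located analytically if the blue curve is assumed to be a straight-line least-squares fit of predicted on observed values (an assumption; the fit type is not specified). For a fitted line ŷ = a·y + b with a ≠ 1, the line crosses y = x at y* = b / (1 − a). The helper names and toy data below are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(observed: Sequence[float], predicted: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit: predicted = slope * observed + intercept."""
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, predicted))
    if sxx == 0:
        raise ValueError("observed values must not all be identical")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def identity_crossing(slope: float, intercept: float) -> float:
    """Observed value at which the fitted line meets the ideal line y = x."""
    if slope == 1.0:
        raise ValueError("the fitted line is parallel to y = x")
    return intercept / (1.0 - slope)


observed = [1.0, 2.0, 3.0, 4.0]    # toy values
predicted = [1.4, 2.2, 3.0, 3.8]   # low values over-predicted, high values under-predicted
slope, intercept = fit_line(observed, predicted)
crossing = identity_crossing(slope, intercept)
print(round(slope, 3), round(intercept, 3), round(crossing, 3))  # 0.8 0.6 3.0
```

A slope below 1, as in this toy case, means that values below the crossing point are over-predicted and values above it are under-predicted, which is why the location of the crossing relative to the densest region of samples is informative.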

It is pertinent to contrast our ReSE-AP Net with recent related works that also leverage ASPP-like structures for hyperspectral data analysis, notably the contributions from Liu et al. [34] and Liu et al. [35].

Liu et al. [34] ingeniously adapted the ResNet-50 architecture for weed detection by replacing its latter stages with an ASPP module. While this design proved effective for their specific task, our preliminary experiments indicated that deeper networks, such as ResNet-152, did not necessarily yield a superior performance in our soil property prediction context, suggesting that the optimal network depth is task-dependent. Furthermore, their model lacked an explicit attention mechanism, which we identified as a key component for refining spectral features. In contrast, ReSE-AP Net is architecturally optimized in two ways: first, it employs a residual network of a deliberately chosen, more moderate depth to prevent overfitting and capture salient features effectively, and second, it integrates the SE channel attention mechanism within the feature extraction backbone, enabling progressive feature refinement and noise suppression.

In another relevant study, Liu et al. [35] proposed RAANet for semantic segmentation, which innovatively incorporates a residual structure within the ASPP module itself and deploys a dense arrangement of attention modules both inside and outside the ASPP. While this approach is novel and effective, its primary focus is on a complex, attention-augmented ASPP, with comparatively less emphasis on the initial deep feature extraction process. This may risk underutilizing the rich information embedded in the original hyperspectral data. ReSE-AP Net adopts a fundamentally different strategy by prioritizing the front-end feature extraction. Our model leverages a synergistic combination of residual connections and SE attention to ensure that features are comprehensively extracted and purified before they are channeled into the multi-scale analysis stage. This strategic divergence underscores the unique architectural philosophy of our approach.

In summary, the novelty of ReSE-AP Net, when benchmarked against these state-of-the-art models, is threefold:

(1): Endogenous refinement through deeply embedded attention: We pioneered the concept of embedding channel attention within each fundamental building block of the feature extraction backbone. This facilitated a progressive, layer-by-layer purification of spectral features, fundamentally enhancing the quality of the feature maps that are subsequently fed into the multi-scale analysis module.
(2): We propose a novel two-stage architectural paradigm with a clear division of labor: A front-end network dedicated to feature purification and a back-end module focused on multi-scale fusion. This represents a strategic innovation over existing models that either lack a purification stage or conflate it with multi-scale analysis.
(3): We successfully adapted and validated the efficacy of the ASPP module, a technique predominantly used in 2D image processing, for the task of one-dimensional hyperspectral inversion. Our results confirm that ASPP is a highly effective tool for capturing multi-scale contextual information within 1D spectral data, thereby establishing its utility for this new domain.

5. Conclusions

This study first elucidated the significance of soil-element prediction and its relevance to sustainable agriculture. The publicly available LUCAS 2009 soil dataset was then introduced, outliers were removed, and a suite of descriptive statistics—size, mean, std, median, mode, kurtosis, and IQR—were computed and interpreted to demonstrate the scientific soundness of the data partitioning strategy. After data cleansing, piecewise pooling averaging (PPA) was applied to reduce the dimensionality of the spectral inputs. Building on these preparations, a multi-scale attention residual network based on spatial pyramid pooling (ReSE-AP Net) was proposed and employed for visible–near-infrared (Vis–NIR) hyperspectral inversion of multiple soil elements on the LUCAS 2009 dataset. The model extracts initial features via a front-end convolutional layer, propagates salient information through residual blocks augmented with SE channel attention, and enhances predictive accuracy and robustness by leveraging multi-scale feature extraction and fusion within the ASPP module. Experimental results showed that ReSE-AP Net outperformed all mainstream traditional machine learning models and equaled or surpassed widely used deep learning architectures, with particularly strong performance in terms of RMSE; its success on a publicly available dataset further attests to its generalization capability.

Despite the demonstrated robustness and high performance of the proposed ReSE-AP Net, we acknowledge several limitations that warrant discussion and outline clear avenues for future research.

(1): The reliance on the LUCAS 2009 dataset, while ensuring high data quality and standardization, introduced a potential bias. The dataset is geographically confined to Europe and is predominantly composed of agricultural soils. Consequently, the model’s generalizability to other geographical regions, diverse land-use types (e.g., forests, wetlands), or less standardized, private datasets remains an open question requiring empirical validation. Future work will therefore focus on acquiring and testing the model on such heterogeneous datasets to rigorously assess its real-world applicability.
(2): A nuanced analysis of the performance metrics revealed a noteworthy finding. Although ReSE-AP Net surpassed all baseline models in terms of RMSE across all soil properties, its performance on the R² metric for pH, N, and K was merely on par with the Transformer architecture. We hypothesize two complementary reasons for this observation. One pertains to the inherent inductive biases of the models: the Transformer’s self-attention mechanism may be more adept at capturing the global, long-range spectral dependencies upon which the prediction of these particular elements relies, whereas our CNN-based model excels at leveraging local features. The other reason, suggested by the superior RMSE of our model, is that ReSE-AP Net achieves exceptional accuracy on the majority of samples within the central data distribution but may be less effective than the Transformer at fitting the extreme values that heavily influence the R² score. This indicates a clear opportunity for refinement.

To address this, our immediate future work will concentrate on enhancing the model architecture. A primary strategy will be to introduce an adaptive weighting mechanism within the ASPP module. This mechanism will be designed to dynamically assign weights during the fusion of multi-scale convolutional features, thereby amplifying salient feature information while suppressing irrelevant noise. In principle, such a modification should augment the feature fusion capability of the ASPP module, leading to an overall improvement in the model’s predictive power, especially in capturing the full variance of the data. This promising direction is currently under active investigation.

In conclusion, while acknowledging these areas for further improvement, the ReSE-AP Net model, as presented, demonstrates strong predictive capabilities for a wide range of soil elements, offering a valuable and high-performance benchmark for the field of soil spectroscopy.

Author Contributions

Conceptualization, Y.D. and Y.C.; Methodology, Y.C.; Software, Y.C.; Validation, S.C., Y.C. and Y.D.; Formal analysis, X.C.; Investigation, Y.C.; Resources, Y.D.; Data curation, Y.D.; Writing—original draft preparation, Y.C.; Writing—review and editing, Y.D.; Supervision, X.C.; Project administration, Y.D.; Funding acquisition, X.C. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Guangxi Key Research and Development Program (GuikeAB24010338, GuikeAB25069340), the National Natural Science Foundation of China (32360374), and the Innovation Project of Guangxi Graduate Education (YCSW2025405).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The public LUCAS dataset can be accessed at https://esdac.jrc.ec.europa.eu/projects/lucas (accessed on 1 March 2025).

Acknowledgments

We would like to express our gratitude to all of the researchers who participated in the experiment for their efforts. Meanwhile, we also wish to thank the institutions that provided us with financial assistance. At the same time, we declare that we have not used any artificial intelligence tools to manipulate and generate any experimental data and results.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Gomiero, T. Soil degradation, land scarcity and food security: Reviewing a complex challenge. Sustainability 2016, 8, 281.
2. Sun, W.; Huang, Y.; Zhang, W.; Yongqiang, Y. Carbon sequestration and its potential in agricultural soils of China. Glob. Biogeochem. Cycles 2010, 24.
3. Mandal, D.; Roy, T. Climate Change Impact on Soil Erosion and Land Degradation. In Climate Change Impacts on Soil-Plant-Atmosphere Continuum; Springer Nature: Singapore, 2024; pp. 139–161.
4. Rhodes, C.J. Soil erosion, climate change and global food security: Challenges and strategies. Sci. Prog. 2014, 97, 97–153.
5. Prăvălie, R. Exploring the multiple land degradation pathways across the planet. Earth-Sci. Rev. 2021, 220, 103689.
6. Bhattacharyya, R.; Ghosh, B.N.; Mishra, P.K.; Mandal, B.; Rao, C.S.; Sarkar, D.; Das, K.; Anil, K.S.; Lalitha, M.; Hati, K.M.; et al. Soil degradation in India: Challenges and potential solutions. Sustainability 2015, 7, 3528–3570.
7. Lal, R.; Bouma, J.; Brevik, E.; Dawson, L.; Field, D.J.; Glaser, B.; Hatano, R.; Hartemink, A.E.; Kosaki, T.; Lascelles, B.; et al. Soils and sustainable development goals of the United Nations: An International Union of Soil Sciences perspective. Geoderma Reg. 2021, 25, e00398.
8. Mikhailova, E.A.; Zurqani, H.A.; Lin, L.; Hao, Z.; Post, C.J.; Schlautman, M.A.; Shepherd, G.B. Opportunities for monitoring soil and land development to support United Nations (UN) Sustainable Development Goals (SDGs): A Case study of the United States of America (USA). Land 2023, 12, 1853.
9. Mikhailova, E.A.; Post, C.J.; Nelson, D.G. Integrating United Nations Sustainable Development Goals in Soil Science Education. Soil Syst. 2024, 8, 29.
10. Pandey, P.C.; Pandey, M. Highlighting the role of agriculture and geospatial technology in food security and sustainable development goals. Sustain. Dev. 2023, 31, 3175–3195.
11. Bouma, J. Contributing pedological expertise towards achieving the United Nations sustainable development goals. Geoderma 2020, 375, 114508.
12. Atukunda, P.; Eide, W.B.; Kardel, K.R.; Iversen, P.O.; Westerberg, A.C. Unlocking the potential for achievement of the UN Sustainable Development Goal 2–‘Zero Hunger’–in Africa: Targets, strategies, synergies and challenges. Food Nutr. Res. 2021, 65, 10–29219.
13. Olatunde, K.A. Soil Characterization Using Visible Near Infrared Diffuse Reflectance Spectroscopy (VNIR DRS). Ph.D. Thesis, University of Reading, Berkshire, UK, 2018.
14. Luo, B.; Sun, H.; Zhang, L.; Chen, F.; Wu, K. Advances in the tea plants phenoty using hyperspectral imaging technology. Front. Plant Sci. 2024, 15, 1442225.
15. Piccini, C.; Metzger, K.; Debaene, G.; Stenberg, B.; Götzinger, S.; Borůvka, L.; Sandén, T.; Bragazza, L.; Liebisch, F. In-field soil spectroscopy in Vis–NIR range for fast and reliable soil analysis: A review. Eur. J. Soil Sci. 2024, 75, e13481.
16. Stenberg, B.; Rossel, R.A.V.; Mouazen, A.M.; Wetterlind, J. Visible and near infrared spectroscopy in soil science. Adv. Agron. 2010, 107, 163–215.
17. Leone, A.P.A.; Viscarra-Rossel, R.; Amenta, P.; Buondonno, A. Prediction of soil properties with PLSR and vis-NIR spectroscopy: Application to mediterranean soils from Southern Italy. Curr. Anal. Chem. 2012, 8, 283–299.
18. Zhao, L.Y.M.H.; Zhou, W.; Liu, Z.-H.; Pan, Y.-C.; Shi, Z.; Wang, G.-X. Estimation methods for soil mercury content using hyperspectral remote sensing. Sustainability 2018, 10, 2474.
19. Jia, P.; Zhang, J.; He, W.; Hu, Y.; Zeng, R.; Zamanian, K.; Jia, K.; Zhao, X. Combination of hyperspectral and machine learning to invert soil electrical conductivity. Remote Sens. 2022, 14, 2602.
20. Jia, L.; Zu, W.; Yang, F.; Gao, L.; Gu, G.; Zhao, M. Estimating Organic Matter Content in Hyperspectral Wetland Soil Using Marine-Predators-Algorithm-Based Random Forest and Multiple Differential Transformations. Appl. Sci. 2023, 13, 10693.
21. Wu, B.; Yang, K.; Li, Y.; He, J. Hyperspectral Inversion of Heavy Metal Copper Content in Corn Leaves Based on DRS–XGBoost. Sustainability 2023, 15, 16770.
22. Zheng, M.; Luan, H.; Liu, G.; Sha, J.; Duan, Z.; Wang, L. Ground-based hyperspectral retrieval of soil arsenic concentration in Pingtan island, China. Remote Sens. 2023, 15, 4349.
23. Gao, Z.; Wang, W.; Wang, H.; Li, R. Selection of Spectral Parameters and Optimization of Estimation Models for Soil Total Nitrogen Content During Fertilization Period in Apple Orchards. Horticulturae 2024, 10, 358.
24. Song, Q.; Gao, X.; Song, Y.; Li, Q.; Chen, Z.; Li, R.; Zhang, H. Estimation and mapping of soil texture content based on unmanned aerial vehicle hyperspectral imaging. Sci. Rep. 2023, 13, 14097.
25. Zhou, W.; Li, H.; Wen, S.; Xie, L.; Wang, T.; Tian, Y.; Yu, W. Simulation of soil organic carbon content based on laboratory spectrum in the three-rivers source region of China. Remote Sens. 2022, 14, 1521.
26. Zhong, Q.; Eziz, M.; Sawut, R.; Ainiwaer, M.; Li, H.; Wang, L. Application of a hyperspectral remote sensing model for the inversion of nickel content in urban soil. Sustainability 2023, 15, 13948.
27. Subi, X.; Eziz, M.; Zhong, Q. Hyperspectral Estimation Model of Organic Matter Content in Farmland Soil in the Arid Zone. Sustainability 2023, 15, 13719.
28. Chen, S.; Gao, J.; Loum, F.; Tuo, Y.; Tan, S.; Shan, Y.; Luo, L.; Xu, Z.; Zhang, Z.; Huang, X. Rapid estimation of soil water content based on hyperspectral reflectance combined with continuous wavelet transform, feature extraction, and extreme learning machine. PeerJ 2024, 12, e17954.
29. Wang, S.; Guan, K.; Zhang, C.; Lee, D.; Margenot, A.J.; Ge, Y.; Peng, J.; Zhou, W.; Zhou, Q.; Huang, Y. Using soil library hyperspectral reflectance and machine learning to predict soil organic carbon: Assessing potential of airborne and spaceborne optical soil sensing. Remote Sens. Environ. 2022, 271, 112914.
30. Wang, H.; Zhang, L.; Zhao, J.; Hu, X.; Ma, X. Application of hyperspectral technology combined with genetic algorithm to optimize convolution long-and short-memory hybrid neural network model in soil moisture and organic matter. Appl. Sci. 2022, 12, 10333.
31. Li, H.; Ju, W.; Song, Y.; Cao, Y.; Yang, W.; Li, M. Soil organic matter content prediction based on two-branch convolutional neural network combining image and spectral features. Comput. Electron. Agric. 2024, 217, 108561.
32. Toth, G.; Jones, A.; Montanarella, L.; Alewell, C.; Ballabio, C.; Carre, F.; De Brogniez, D.; Guicharnaud, R.A.; Gardi, C.; Hermann, T.; et al. LUCAS Topsoil Survey—Methodology, Data and Results; Publications Office of the European Union: Luxembourg, 2013.
33. Cao, L.; Sun, M.; Yang, Z.; Jiang, D.; Yin, D.; Duan, Y. A novel transformer-CNN approach for predicting soil properties from LUCAS Vis-NIR spectral data. Agronomy 2024, 14, 1998.
34. Liu, T.; Zhao, Y.; Wang, H.; Wu, W.; Yang, T.; Zhang, W.; Zhu, S.; Sun, C.; Yao, Z. Harnessing UAVs and deep learning for accurate grass weed detection in wheat fields: A study on biomass and yield implications. Plant Methods 2024, 20, 144.
35. Liu, R.; Tao, F.; Liu, X.; Na, J.; Leng, H.; Wu, J.; Zhou, T. RAANet: A residual ASPP with attention framework for semantic segmentation of high-resolution remote sensing images. Remote Sens. 2022, 14, 3109.

Figure 1. Overall architecture of ReSE-AP Net.

Figure 2. SE attention mechanism.

Figure 3. Residual convolutional network structure.

Figure 4. Atrous spatial pyramid pooling (ASPP).

Figure 5. Comparison chart of R².

Figure 6. RMSE comparison chart.

Figure 7. Scatter plot of the fitting of the true values and predicted values.

Table 1. Dataset statistics table.

Element	Set	Size	Mean	Std	Median	Mode	Kurtosis	IQR
Clay (%)	Complete	17,939	18.88	13.00	17.00	4.00	0.69	18.50
	Test	5980	18.88	13.01	17.00	4.00	0.69	18.25
	Train	11,959	18.89	13.00	17.00	4.00	0.69	18.50
Silt (%)	Complete	17,939	38.23	18.30	37.00	32.00	−0.54	26.00
	Test	5980	38.23	18.30	37.00	32.00	−0.54	26.00
	Train	11,959	38.23	18.30	37.00	32.00	−0.54	26.00
Sand (%)	Complete	17,939	42.88	26.11	42.00	5.00	−1.10	45.00
	Test	5980	42.87	26.11	42.00	5.00	−1.10	45.00
	Train	11,959	42.89	26.11	42.00	5.00	−1.10	45.00
pH in CaCl₂	Complete	19,031	5.59	1.43	5.64	7.36	−1.30	2.68
	Test	6344	5.60	1.43	5.64	7.36	−1.29	2.68
	Train	12,687	5.59	1.43	5.65	7.36	−1.30	2.68
pH in H₂O	Complete	19,031	6.20	1.35	6.21	7.76	−1.24	2.45
	Test	6344	6.20	1.35	6.21	7.76	−1.24	2.45
	Train	12,687	6.20	1.35	6.21	7.76	−1.24	2.45
OC (g/kg)	Complete	19,031	49.92	91.19	20.80	11.40	13.53	27.00
	Test	6344	49.34	90.40	20.60	11.40	13.95	26.70
	Train	12,687	50.22	91.58	20.80	11.40	13.32	27.10
CaCO₃ (g/kg)	Complete	19,031	51.61	125.33	1.00	0.00	9.07	12.00
	Test	6344	51.71	125.59	1.00	0.00	9.00	12.00
	Train	12,687	51.56	125.21	1.00	0.00	9.10	12.00
N (g/kg)	Complete	19,031	2.92	3.75	1.70	1.20	16.85	1.70
	Test	6344	2.91	3.73	1.70	1.20	16.31	1.70
	Train	12,687	2.92	3.76	1.70	1.20	17.11	1.70
P (mg/kg)	Complete	19,031	30.05	32.81	22.30	0.00	173.50	32.00
	Test	6344	29.98	32.66	22.20	0.00	71.46	31.90
	Train	12,687	30.09	32.88	22.40	0.00	223.09	32.00
K (mg/kg)	Complete	19,030	196.99	229.29	136.40	0.00	166.43	176.80
	Test	6343	195.13	229.66	135.50	0.00	185.74	174.40
	Train	12,687	197.92	229.11	137.00	0.00	156.71	177.95
CEC (cmol(+)/kg)	Complete	19,031	15.75	14.48	12.40	0.00	34.08	13.30
	Test	6344	15.73	14.55	12.40	0.00	37.99	13.30
	Train	12,687	15.76	14.45	12.40	0.00	32.07	13.30

Table 2. Comparison table of R² in the ablation experiments.

Model	Clay	Silt	Sand	pH in CaCl₂	pH in H₂O	OC	CaCO₃	N	P	K	CEC
ResBlock	0.8162	0.6408	0.7442	0.9210	0.9173	0.9574	0.9480	0.9101	0.3179	0.4810	0.7062
ResNet + SE	0.8482	0.7255	0.7912	0.9288	0.9200	0.9613	0.9512	0.9224	0.3590	0.5100	0.8019
ReSE-AP Net	0.8653	0.7467	0.8055	0.9384	0.9278	0.9656	0.9642	0.9393	0.4148	0.5568	0.8172

Table 3. Comparison table of RMSE in the ablation experiments.

Model	Clay (%)	Silt (%)	Sand (%)	pH in CaCl₂	pH in H₂O	OC (g/kg)	CaCO₃ (g/kg)	N (g/kg)	P (mg/kg)	K (mg/kg)	CEC (cmol(+)/kg)
ResBlock	6.065	10.477	12.536	0.410	0.396	18.832	25.086	1.120	25.412	170.121	7.877
ResNet + SE	5.401	9.643	11.987	0.385	0.371	17.808	24.412	0.941	24.634	165.195	6.553
ReSE-AP Net	4.773	9.204	11.511	0.354	0.364	16.871	22.711	0.922	23.214	149.404	6.185

Table 4. Comparison table of R².

Model	Clay	Silt	Sand	pH in CaCl₂	pH in H₂O	OC	CaCO₃	N	P	K	CEC
PLSR	0.763	0.549	0.649	0.895	0.886	0.917	0.903	0.868	0.277	0.371	0.724
Ridge	0.746	0.505	0.610	0.872	0.864	0.907	0.883	0.845	0.249	0.335	0.688
SVR	0.755	0.522	0.583	0.622	0.642	0.937	0.903	0.902	0.196	0.277	0.738
RF	0.524	0.416	0.453	0.610	0.622	0.874	0.656	0.797	0.050	0.139	0.529
XGBoost	0.562	0.423	0.491	0.684	0.711	0.893	0.765	0.810	0.082	0.161	0.561
VGG11	0.855	0.739	0.785	0.928	0.920	0.961	0.959	0.932	0.364	0.480	0.808
TCN	0.834	0.705	0.767	0.938	0.928	0.955	0.954	0.926	0.280	0.469	0.769
ResNet152	0.734	0.557	0.618	0.834	0.849	0.941	0.942	0.891	0.160	0.338	0.716
Transformer	0.850	0.720	0.770	0.940	0.930	0.960	0.960	0.940	0.410	0.600	0.830
ReSE-AP Net	0.865	0.747	0.806	0.938	0.928	0.966	0.964	0.939	0.415	0.557	0.817

Table 5. RMSE comparison table.

Model	Clay (%)	Silt (%)	Sand (%)	pH in CaCl₂	pH in H₂O	OC (g/kg)	CaCO₃ (g/kg)	N (g/kg)	P (mg/kg)	K (mg/kg)	CEC (cmol(+)/kg)
PLSR	6.335	12.281	15.461	0.461	0.456	26.325	39.103	1.359	27.067	188.292	7.608
Ridge	6.558	12.869	16.293	0.511	0.498	27.911	42.915	1.475	27.593	193.599	8.094
SVR	6.442	12.653	16.860	0.877	0.809	22.850	39.121	1.173	28.551	201.808	7.416
RF	8.976	13.979	19.305	0.890	0.832	32.466	73.746	1.686	31.030	220.221	9.939
XGBoost	8.607	13.901	18.628	0.801	0.727	29.921	60.776	1.631	30.506	217.350	9.600
VGG11	4.951	9.355	12.101	0.383	0.381	17.923	25.494	0.977	25.385	171.063	6.344
TCN	5.304	9.933	12.598	0.356	0.363	19.310	26.800	1.016	27.010	172.962	6.963
ResNet152	6.709	12.174	16.122	0.579	0.543	21.994	30.377	1.237	29.176	193.032	7.598
Transformer	4.860	9.400	12.080	0.400	0.390	22.800	31.640	1.090	28.040	160.130	7.330
ReSE-AP Net	4.773	9.204	11.511	0.354	0.364	16.871	22.711	0.922	23.214	149.404	6.185

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
