Article

A Spatiotemporal Attention-Guided Graph Neural Network for Precise Hyperspectral Estimation of Corn Nitrogen Content

China Agricultural University, Beijing 100083, China
* Author to whom correspondence should be addressed.
Agronomy 2025, 15(5), 1041; https://doi.org/10.3390/agronomy15051041
Submission received: 29 March 2025 / Revised: 20 April 2025 / Accepted: 24 April 2025 / Published: 26 April 2025

Abstract

A hyperspectral maize nitrogen content prediction model is proposed that integrates a dynamic spectral–spatiotemporal attention mechanism with a graph neural network to enhance the accuracy and stability of nitrogen estimation. Across multiple experiments, the proposed method achieved outstanding performance on the test set, with R² = 0.93, an RMSE of 0.35, and an MAE of 0.48, significantly outperforming comparative models including SVM, RF, ResNet, and ViT. In experiments conducted across different growth stages, the best performance was observed during the grain-filling stage, where R² reached 0.96. In terms of accuracy, recall, and precision, the proposed model exhibited an average improvement exceeding 15%, demonstrating strong adaptability to temporal variation and generalization across spatial conditions. These results provide robust technical support for large-scale, nondestructive nitrogen monitoring in agricultural applications.

1. Introduction

Maize is one of the most important food crops globally, and its growth, development, and yield are influenced by various factors. Among them, nitrogen (N), as one of the essential macronutrients, plays a crucial role in maize growth, physiological metabolism, and final yield [1,2,3]. Proper application of nitrogen fertilizer can not only improve crop yield and quality but also effectively reduce environmental pollution. However, excessive nitrogen application can cause soil nutrient imbalance, groundwater contamination, increased atmospheric nitrogen oxide emissions, and ecosystem degradation [4,5]. Conversely, insufficient nitrogen fertilization may reduce crop growth rate and yield, lowering the economic returns of agricultural production [6]. Therefore, accurately measuring crop nitrogen content and optimizing nitrogen application strategies have become important research directions in smart agriculture. Traditional methods for nitrogen content measurement mainly rely on chemical analysis techniques, such as the Kjeldahl method and the Dumas combustion method [7,8]. Bofana José et al. [9] used the Kjeldahl method to determine total nitrogen levels in soil. Their results revealed distinct soil nitrogen credit and debit patterns, with significant differences between preplanting and post-harvest analyses for maize and cotton, showing high uptake and minimal fixation. Contrary to expectations, soybean exhibited high uptake and low fixation, which complicates the determination of optimal crop rotation intervals. However, such chemical methods are destructive, labor-intensive, and costly, and they are difficult to scale for rapid monitoring of large areas. In recent years, with the development of remote sensing technology, hyperspectral remote sensing has received extensive attention for the estimation of crop physiological and biochemical parameters [10]. 
Regarding maize nitrogen content estimation, hyperspectral data are considered to have great application potential. Hyperspectral data can provide information on leaf reflectance, absorption, and scattering at different wavelengths, allowing inference of nitrogen content in maize leaves [11]. Li et al. [1] proposed a hybrid approach combining in situ measurements with mechanistic model simulations to improve maize nitrogen estimation. The results showed that the hybrid method consistently outperformed others across four test datasets (with the RMSE ranging from 10.08% to 10.84%). However, these methods usually rely on only a portion of spectral bands and fail to fully exploit the potential of hyperspectral data. In addition, traditional machine learning methods, such as Support Vector Machines (SVMs) and Random Forests (RFs), have been applied to nitrogen content prediction using hyperspectral data, yet they exhibit varying degrees of limitations. SVMs typically rely on manually extracted features and are thus heavily dependent on feature engineering. In contrast, Random Forests possess a certain level of feature selection capability and can be applied directly to high-dimensional raw data. However, due to their relatively shallow model structure, they struggle to capture the complex temporal dependencies and spatial structural information embedded in spectral sequences. Consequently, although these methods can achieve reasonable prediction accuracy under specific conditions, their generalization ability and adaptability remain limited in complex agricultural environments [12].
To better leverage the advantages of hyperspectral data, deep learning methods have been widely applied in agricultural remote sensing in recent years. Convolutional Neural Networks (CNNs) are widely used for spectral feature extraction. Through hierarchical convolution operations, CNNs can automatically learn spatial features from spectral imagery [13]. Sabzi Sajad et al. [14] combined hyperspectral imaging (HSI) with three regression methods to predict nitrogen (N) content in cucumber (Cucumis sativus L., var. Super Arshiya-F1) leaves. The three methods were a hybrid artificial neural network–particle swarm optimization model (ANN-PSO), partial least squares regression (PLSR), and a one-dimensional convolutional neural network (1D-CNN). The results indicated that PLSR slightly outperformed the CNN and ANN-PSO. Temporal models such as Recurrent Neural Networks (RNNs) can partially address temporal dependencies in hyperspectral data. However, they have high computational complexity and suffer from vanishing gradients when handling long sequences [15]. In recent years, transformer-based attention mechanisms have made breakthroughs in natural language processing and computer vision [16]. In agricultural remote sensing, transformer architectures have been applied to tasks such as vegetation index prediction, disease identification, and soil nutrient estimation, demonstrating excellent performance. Zhang et al. [17] proposed a novel deep learning framework called the Self-Supervised Spectral–Spatial Vision Transformer (SSVT). It achieved an accuracy of 0.96 and showed good generalization and reproducibility in wheat nitrogen status estimation. Moreover, Graph Neural Networks (GNNs), as deep learning models capable of capturing topological structural information, show potential advantages in hyperspectral modeling [18]. 
GNNs can model hyperspectral data as graph structures and learn spatial features through node message propagation, improving prediction stability and robustness for nitrogen content.
Based on the above context, the objective of this study is to propose a hyperspectral-based maize nitrogen content measurement method that integrates dynamic spectral–temporal attention fusion with a GNN to enhance prediction accuracy and generalization. The main contributions of this study are summarized as follows:
  • Temporal–spatial fusion feature extraction module based on the transformer architecture: This structure integrates temporal sequences and spatial locations using a self-attention mechanism to capture global dependencies among spectral bands, thereby improving attention to key bands and enhancing prediction accuracy.
  • Joint modeling of spectral and graph structures: This study innovatively transforms hyperspectral data into graph representations. By leveraging a GNN to propagate and aggregate spectral features between adjacent crop samples, the model enhances spatial dependency learning, thereby improving generalization performance in complex field environments.
  • A dynamic attention weight update mechanism: Considering the significant variation in maize nitrogen content across growth stages, a dynamic attention update strategy was designed that incorporates temporal windows and field-level information, enabling the model to adjust its focus according to phenological stages and enhancing the modeling of temporal features.

2. Related Work

2.1. Traditional Machine Learning for Hyperspectral Estimation of Maize Nitrogen Content

In the study of maize nitrogen content estimation, traditional machine learning methods have been widely used for the modeling and analysis of hyperspectral data. Hyperspectral data contain rich spectral information, and reflectance at different wavelengths can indicate the physiological and biochemical status of crops [19]. To simplify data processing and enhance model generalization, researchers often employ spectral indices as input features for machine learning models. The most commonly used indices include the Normalized Difference Vegetation Index (NDVI), Ratio Vegetation Index (RVI), and Structure Index (SI) [20]. Spectral indices are easy to compute and interpret. However, they typically use only a small subset of the available bands and overlook nonlinear relationships among spectral bands. Therefore, traditional spectral-index-based methods face limitations in the accurate estimation of maize nitrogen content. To overcome these limitations, researchers have introduced traditional machine learning algorithms such as SVM and RF to model the relationship between spectral data and maize nitrogen content [21,22]. SVM is a classification and regression algorithm based on the principle of Structural Risk Minimization (SRM). The basic idea is to find an optimal hyperplane that maximizes the margin (classification) or minimizes the regression error (regression) [23]. In regression tasks, SVM adopts the ε-insensitive loss function, and its objective function is given by [24]
$$\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^{2} + C\sum_{i=1}^{n}\xi_i,$$
Here, $w$ denotes the weight vector (the model parameters), $b$ is the bias term, and $C$ is the regularization parameter. The slack variables $\xi_i$ measure how far each training residual falls outside the $\varepsilon$-insensitive tube. SVM demonstrates strong generalization capability in hyperspectral data analysis and is effective for high-dimensional datasets. However, its performance depends heavily on the choice of kernel function, and its computational cost becomes a concern on large-scale datasets. RF is an ensemble learning method based on decision trees. It builds multiple trees and aggregates their predictions through averaging (regression) or voting (classification) to improve generalization [25]. The RF prediction function can be expressed as [26]
$$\hat{y} = \frac{1}{M}\sum_{m=1}^{M} h_m(x),$$
where $M$ is the number of trees in the forest and $h_m(x)$ is the prediction of the $m$-th tree. RF is robust, resistant to overfitting, and capable of automatic feature selection. However, when dealing with high-dimensional hyperspectral data, the random sampling process may result in the underutilization of critical spectral features.
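The two formulas above can be made concrete with a short NumPy sketch. It assumes a linear kernel and minimizes the primal ε-insensitive objective by plain full-batch subgradient descent. This is a simplification: library SVR implementations solve the kernelized dual problem instead. The sketch also illustrates the RF averaging rule, with arbitrary callables standing in for trained trees. The synthetic data and the values of `C`, `eps`, `lr`, and `epochs` are hypothetical.

```python
import numpy as np

def svr_objective(w, b, X, y, C=1.0, eps=0.1):
    """Primal epsilon-SVR objective: 0.5 * ||w||^2 + C * sum(xi_i),
    with slack xi_i = max(0, |y_i - (w . x_i + b)| - eps)."""
    residual = y - (X @ w + b)
    xi = np.maximum(0.0, np.abs(residual) - eps)
    return 0.5 * w @ w + C * xi.sum()

def fit_linear_svr(X, y, C=1.0, eps=0.1, lr=1e-3, epochs=2000):
    """Minimize the primal objective by subgradient descent (linear kernel only)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        residual = y - (X @ w + b)
        # Subgradient of the eps-insensitive loss: nonzero only outside the tube
        s = np.where(np.abs(residual) > eps, np.sign(residual), 0.0)
        w -= lr * (w - C * (X.T @ s))
        b -= lr * (-C * s.sum())
    return w, b

def ensemble_average(predictors, X):
    """RF-style aggregation: y_hat = (1/M) * sum_m h_m(x)."""
    return np.mean([h(X) for h in predictors], axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                      # hypothetical features
true_w = np.array([0.5, -1.0, 0.0, 2.0, 0.3])
y = X @ true_w + 0.05 * rng.normal(size=200)

w_fit, b_fit = fit_linear_svr(X, y)
print(svr_objective(np.zeros(5), 0.0, X, y), svr_objective(w_fit, b_fit, X, y))

# Two stand-in "trees" predicting x0 and 3*x0 average to 2*x0
avg = ensemble_average([lambda Z: Z[:, 0], lambda Z: 3 * Z[:, 0]], X)
```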

2.2. Deep Learning-Based Hyperspectral Estimation of Maize Nitrogen Content

In recent years, deep learning approaches have achieved remarkable progress in hyperspectral data analysis, particularly in predicting crop physiological and biochemical properties [27,28]. CNNs and RNNs are two widely used deep learning models in this domain, designed to extract spatial and temporal features, respectively. CNNs, originally developed for computer vision, have demonstrated strong feature extraction capabilities in hyperspectral applications [29]. They utilize receptive fields and weight-sharing mechanisms to capture local patterns and progressively extract high-level features through stacked convolutional layers. For hyperspectral data, CNNs can be used to extract spectral–spatial features. The basic computation is defined as [30]
$$Z = f(W * X + b),$$
where $X$ represents the input spectral data, $W$ is the convolution kernel, $*$ denotes the convolution operation, $b$ is the bias, and $f(\cdot)$ is a nonlinear activation function such as ReLU. CNNs also employ pooling layers to reduce dimensionality and enhance robustness. While effective in extracting local spectral features, CNNs are limited in capturing long-range dependencies across spectral bands [31]. In contrast, RNNs excel at modeling sequential data and have thus been applied to extract temporal features from spectral sequences. Hyperspectral data often exhibit temporal characteristics, such as reflectance changes during different growth stages of maize [32]. RNNs leverage recurrent connections to retain information from previous time steps, enabling the modeling of long-term dependencies. The RNN update rule is given by [33]
$$h_t = f\left(W_h h_{t-1} + W_x x_t + b\right),$$
where $h_t$ is the hidden state at time $t$, $h_{t-1}$ is the previous hidden state, $x_t$ is the input at time $t$, $W_h$ and $W_x$ are weight matrices, and $b$ is the bias term. $f(\cdot)$ typically denotes a nonlinear activation function [34]. Although CNNs and RNNs have achieved success in hyperspectral data analysis, they still exhibit certain limitations. CNNs rely on local receptive fields and struggle to capture global spectral dependencies. RNNs entail high computational complexity and limited scalability for long sequences.
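The convolution and recurrence formulas above can be illustrated with a minimal NumPy sketch. It assumes a single-channel 1-D spectrum, a hypothetical 3-tap kernel, ReLU for the CNN, and tanh for the RNN; all sizes and weights are placeholders. Deep learning frameworks usually implement "convolution" as cross-correlation. With the symmetric kernel used here, the two coincide.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def conv1d_layer(x, kernel, bias):
    """Z = f(W * X + b) for one 1-D spectrum: true convolution, 'valid' mode, ReLU."""
    return relu(np.convolve(x, kernel, mode="valid") + bias)

def rnn_forward(xs, W_h, W_x, b, f=np.tanh):
    """Unroll h_t = f(W_h h_{t-1} + W_x x_t + b) over a sequence, starting from h_0 = 0."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in xs:
        h = f(W_h @ h + W_x @ x_t + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(1)
spectrum = rng.random(240)                  # one pixel with 240 bands (placeholder values)
kernel = np.array([0.25, 0.5, 0.25])        # hypothetical smoothing-like kernel
Z = conv1d_layer(spectrum, kernel, bias=-0.1)

seq = rng.random((3, 30))                   # 3 growth stages x 30 features (hypothetical)
hidden = 16
H = rnn_forward(seq,
                0.1 * rng.normal(size=(hidden, hidden)),
                0.1 * rng.normal(size=(hidden, 30)),
                np.zeros(hidden))
```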

3. Materials and Method

3.1. Data Collection

The hyperspectral data used in this study were collected between March and September 2023 at a controlled agricultural experimental base in Jungar Banner (39°51′56″ N, 111°13′36″ E), Ordos City, Inner Mongolia Autonomous Region, China. This region exhibits a typical temperate semi-arid climate with prolonged annual sunshine duration, making it suitable for maize cultivation and field experiments under different nitrogen fertilization conditions. The experimental site features loamy sand soil with an average pH of 7.8 (slightly alkaline), organic matter content of 12.3 g/kg, total nitrogen of 0.68 g/kg, available phosphorus (Olsen-P) of 18.6 mg/kg, and available potassium of 145.2 mg/kg, indicating a moderate fertility level. These baseline characteristics support stable maize growth and provide a reliable foundation for nitrogen treatment experiments and hyperspectral modeling. The maize variety used in the experiment was “Zhengdan 958”, a widely cultivated hybrid in northern China known for its high yield potential, strong stress resistance, and adaptability across diverse ecological regions. It has been extensively adopted in large-scale agricultural production and is particularly valued for its stable performance and pronounced nitrogen responsiveness, making it ideal for nitrogen content monitoring studies. Hyperspectral images were acquired using a Nano-Hyperspec imaging spectrometer manufactured by Headwall Photonics, which covers a spectral range of 400 nm to 1000 nm, including the visible and near-infrared bands closely related to nitrogen content in the maize canopy. This sensor provides exceptional spectral resolution of 2.5 nm, capturing information across 240 spectral bands, with a spatial resolution of 1.5 mm, enabling high-precision imaging of maize canopies in the field and ensuring the accuracy and stability of spectral data. 
To maintain consistency and reproducibility in data acquisition, a mobile overhead rail platform was constructed to mount the imaging device and control the imaging height. The imaging height was set to 1.8 m. This height captured the upper maize canopy while avoiding interference from weeds, shadows, and soil backgrounds, thereby ensuring the purity of the collected data. Data were collected at three critical maize growth stages: jointing, tasseling, and grain filling. At these stages, 1004, 1287, and 1167 images were collected, respectively, to capture the complete nitrogen accumulation process, as shown in Figure 1 and Figure 2. Hyperspectral images were acquired under consistent lighting conditions, specifically during cloudless late-morning periods, to ensure uniform spectral reflectance. The spatial resolution was 1.5 mm/pixel, and each image measured 1010 × 1010 pixels. In total, 3458 hyperspectral images were collected across the jointing, tasseling, and grain-filling stages, with a daily average of around 120 images. Due to issues such as uneven illumination, sensor vibration, or rail misalignment, 312 images were deemed unusable and excluded, corresponding to a rejection rate of approximately 9.02%.
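As a quick consistency check on the acquisition statistics, the per-stage counts reported above sum to the stated total, and the rejection rate follows directly from them. The usable-image count is derived here; it is not reported in the text.

```python
# Image counts per growth stage, as reported in the text
stage_counts = {"jointing": 1004, "tasseling": 1287, "grain_filling": 1167}
rejected = 312

total = sum(stage_counts.values())
usable = total - rejected            # derived, not stated in the text
rejection_rate = rejected / total
print(total, usable, f"{rejection_rate:.2%}")
```

This prints `3458 3146 9.02%`.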
Imaging was performed daily between 9:00 a.m. and 11:00 a.m. to minimize the influence of strong direct sunlight at noon, which could cause high contrast and shadow interference, ensuring consistent illumination conditions. A white reference panel with 99% reflectance was used for reflectance calibration during the imaging process. Standard white and black panels were imaged before and after each data collection session to ensure the accuracy of radiometric calibration. Throughout the imaging process, consistent exposure time and shutter parameters were maintained across all samples to minimize the impact of system errors. Real-time imaging was performed using Hyperspec III software provided by Headwall, and the acquired images were saved in the original ENVI format for subsequent processing. The raw images underwent radiometric correction and spectral calibration. Geometric registration algorithms were applied to align all images within a unified spatial reference frame, correcting minor deviations caused by rail movement. In the corrected images, representative maize canopy regions were extracted through region cropping, excluding edge backgrounds and nontarget areas, to ensure that only high-quality canopy reflectance data were used for subsequent analysis.
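The white/dark reference calibration described above can be sketched as follows. The sketch assumes the common flat-field formula R = ρ · (raw − dark) / (white − dark), with panel reflectance ρ = 0.99 taken from the text. The exact formula the authors used is not stated, so treat it as an assumption. Averaging the references imaged before and after a session (the `average_references` helper) is also a hypothetical choice, and the digital-number values are made up.

```python
import numpy as np

def average_references(before, after):
    """Hypothetical: average the reference frames imaged before and after a session."""
    return (np.asarray(before, dtype=float) + np.asarray(after, dtype=float)) / 2.0

def calibrate_reflectance(raw, white, dark, panel_reflectance=0.99):
    """Flat-field calibration, element-wise: R = rho * (raw - dark) / (white - dark).

    raw, white, and dark must broadcast against each other
    (e.g. raw: pixels x bands, white/dark: bands).
    """
    raw = np.asarray(raw, dtype=float)
    white = np.asarray(white, dtype=float)
    dark = np.asarray(dark, dtype=float)
    denom = np.clip(white - dark, 1e-6, None)   # guard against zero or negative denominators
    return panel_reflectance * (raw - dark) / denom

white = average_references([1000.0, 1200.0], [1020.0, 1180.0])   # per-band DNs (made up)
dark = np.array([50.0, 60.0])
raw = np.array([[530.0, 630.0],       # a mid-reflectance pixel
                [1010.0, 1190.0]])    # a pixel as bright as the white panel
reflectance = calibrate_reflectance(raw, white, dark)
```

A pixel as bright as the white reference maps to 0.99, and one at the dark level maps to 0.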
To investigate the impact of varying nitrogen levels on maize spectral responses, three fertilization treatments were established in the field: (1) no nitrogen application (N0) as a nitrogen-deficient control, (2) conventional nitrogen application (N1) based on the locally recommended rate, and (3) high nitrogen application (N2) at 150% of the recommended rate. A randomized block design with three replications per treatment was employed to ensure experimental robustness and repeatability. Immediately following hyperspectral imaging, representative functional leaves were collected from each plot to obtain ground-truth nitrogen content values for model training. The samples were oven-dried at 70 °C for 48 h, finely ground, and analyzed using the Kjeldahl method to determine total nitrogen content. Each nitrogen measurement was precisely aligned with its corresponding hyperspectral image, enabling accurate mapping between spectral features and biochemical reference values, as shown in Table 1.
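The randomized block layout described above can be sketched as follows. The sketch assumes that the three replications correspond to three complete blocks, each containing every treatment once in a randomly shuffled plot order. The text does not give the absolute recommended rate, so nitrogen rates are expressed relative to N1. The block naming and the `seed` are hypothetical.

```python
import numpy as np

# Relative N rates: N0 = none, N1 = local recommendation, N2 = 150% of it
TREATMENTS = {"N0": 0.0, "N1": 1.0, "N2": 1.5}

def rcbd_layout(treatments, n_blocks=3, seed=None):
    """Randomized complete block design: each block holds every treatment once,
    in an independently shuffled plot order."""
    rng = np.random.default_rng(seed)
    names = list(treatments)
    return {
        f"block_{b + 1}": [names[i] for i in rng.permutation(len(names))]
        for b in range(n_blocks)
    }

layout = rcbd_layout(TREATMENTS, n_blocks=3, seed=42)
for block, plots in layout.items():
    print(block, plots)
```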

3.2. Data Preprocessing

The primary goals of data preprocessing are to remove noise, reduce redundant information, and improve data separability so that subsequent modeling can learn useful features more efficiently. In this study, data preprocessing mainly includes spectral smoothing, dimensionality reduction, and standardization or normalization. Hyperspectral data are often affected by the measurement environment, instrument stability, and atmospheric conditions, which may introduce significant noise. Therefore, spectral smoothing is applied first to reduce noise interference while keeping the spectral curve smooth. Savitzky–Golay (SG) filtering is a widely used spectral smoothing method that performs local polynomial fitting while preserving the key characteristics of the spectral signal. Its basic principle is to fit a low-order polynomial to the samples within a sliding window by least squares and to replace the center sample with the fitted value. The resulting smoother is a fixed weighted sum of the samples in the window. Its mathematical expression is as follows [35]:
$$\hat{y}_i = \sum_{j=-k}^{k} c_j\, y_{i+j},$$
Here, $\hat{y}_i$ denotes the smoothed spectral value, $y_{i+j}$ represents the original spectral value, $c_j$ is the fitting coefficient, and $k$ indicates the half-width of the sliding window. Compared with traditional moving average filters, SG filtering better retains spectral details while reducing noise interference, resulting in smoother spectral curves that benefit subsequent feature extraction and modeling. In SG filtering, a third-order polynomial ($d = 3$) was selected after comparative testing to balance spectral smoothness and detail preservation. Another major challenge of hyperspectral data is their high dimensionality: each sample contains 240 spectral bands, far more than traditional imagery. High-dimensional data not only increase computational cost but may also trigger the curse of dimensionality, making it difficult for models to effectively extract useful features. Therefore, dimensionality reduction is a necessary step in spectral analysis. Principal Component Analysis (PCA) is a widely used dimensionality reduction method. It projects high-dimensional data into a lower-dimensional space via a linear transformation while preserving the directions of largest variance. PCA removes low-variance components while retaining the main data structure. In hyperspectral analysis, PCA effectively reduces dimensionality, enhances computational efficiency, and lowers model complexity. For dimensionality reduction, the number of principal components in PCA was empirically set to 30. These components retained over 98% of the cumulative spectral variance across all samples, preserving critical spectral features while reducing redundancy. In addition, different spectral bands of hyperspectral data may have different physical magnitudes, so feeding raw data directly into the model may slow convergence or even degrade prediction accuracy. 
Hence, standardization or normalization is necessary to scale all spectral bands within a comparable range and enhance model stability. Standardization rescales the data to zero mean and unit variance, and is calculated as [36]
$$X' = \frac{X - \mu}{\sigma},$$
where $X$ is the raw data, $\mu$ is the mean, and $\sigma$ is the standard deviation. Standardization gives all spectral bands comparable scales, which helps deep learning models converge faster. Normalization, on the other hand, scales data to a fixed range (e.g., $[0, 1]$ or $[-1, 1]$). The $[0, 1]$ case is computed as [37]
$$X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}},$$
where $X_{\min}$ and $X_{\max}$ are the minimum and maximum values of the data, respectively. Normalization effectively mitigates the dominance of large-magnitude bands during training and ensures balanced contributions across bands. In hyperspectral processing, the choice between standardization and normalization depends on the application. Standardization is preferred for normally distributed data, while normalization is used for non-normal distributions, as shown in Figure 3.
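The preprocessing chain above (SG smoothing, PCA, then standardization or normalization) can be sketched in NumPy as follows. The polynomial order $d = 3$ and the 30 principal components come from the text. The window half-width $k = 5$ (11 bands) is an assumption, because the text does not report a window size. The mirrored edge handling and the synthetic spectra are also assumptions. The SG weights $c_j$ are obtained by least-squares fitting, matching the sum $\hat{y}_i = \sum_j c_j y_{i+j}$.

```python
import numpy as np

def savgol_coefficients(half_width, order):
    """Least-squares SG weights c_j (j = -k..k) giving the fitted value at the window centre."""
    j = np.arange(-half_width, half_width + 1)
    A = np.vander(j, order + 1, increasing=True)   # columns: j^0, j^1, ..., j^d
    return np.linalg.pinv(A)[0]                     # row 0 = polynomial value at j = 0

def savgol_smooth(spectra, half_width=5, order=3):
    """Apply y_hat_i = sum_j c_j * y_{i+j} along the band axis (mirrored edges)."""
    c = savgol_coefficients(half_width, order)
    padded = np.pad(spectra, ((0, 0), (half_width, half_width)), mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, 2 * half_width + 1, axis=1)
    return windows @ c

def pca_reduce(X, n_components=30):
    """Project centred data onto the leading principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    return Xc @ Vt[:n_components].T, explained[:n_components]

def standardize(X):
    """Zero mean and unit variance per column."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def min_max(X):
    """Rescale each column to [0, 1]."""
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

rng = np.random.default_rng(0)
bands = np.linspace(400, 1000, 240)                     # 240 bands, 400-1000 nm
spectra = np.sin(bands / 80.0)[None, :] * rng.uniform(0.5, 1.5, size=(100, 1))
spectra += 0.02 * rng.normal(size=spectra.shape)       # synthetic noisy spectra
smoothed = savgol_smooth(spectra)
scores, ratios = pca_reduce(smoothed, n_components=30)
Z = standardize(scores)
M = min_max(scores)
```

Because an order-$d$ SG filter reproduces any polynomial of degree at most $d$ exactly, a cubic test signal passes through unchanged away from the edges. If SciPy is available, `scipy.signal.savgol_filter(x, window_length, polyorder)` provides the same interior smoothing with its own edge handling.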

3.3. Proposed Method

3.3.1. Overall View

The objective of this study is to develop a precise maize nitrogen content estimation model based on hyperspectral data by integrating dynamic spectral information, spatiotemporal attention mechanisms, and a GNN. This approach aims to improve prediction accuracy and generalization under complex agricultural conditions. The proposed method first captures the spectral response of maize at different growth stages through time-series hyperspectral data, thereby modeling the dynamic changes in nitrogen content over time. Building on this, a spatiotemporal attention mechanism is designed to integrate both temporal and spatial dimensions. This mechanism dynamically adjusts attention weights according to the spectral characteristics of different growth stages and the spatial distribution of the plots, enhancing the model’s ability to focus on key spatiotemporal features. To further improve the model’s capability to characterize in-field spatial heterogeneity, a graph-based structure is built on top of the extracted spatiotemporal features to model spatial relationships among samples. Specifically, a graph is constructed from the geographical proximity or spectral similarity of plots in the actual field layout. Each field sample is represented as a node, and edges encode spatial or spectral relationships between samples. The fused features extracted by the spatiotemporal attention mechanism serve as the initial node representations of the GNN, preserving temporal dynamics and spatial contextual information. Graph convolution operations are then applied to propagate and aggregate node features across the graph, enabling spectral feature fusion and spatial context awareness among neighboring nodes. This process models spatial consistency while retaining local spectral variations. 
The entire process can be mathematically described by the following equation:
$$Z = \mathcal{G}\big(\mathrm{ST\text{-}Attn}(X);\, A\big),$$
where $X \in \mathbb{R}^{N \times w \times f}$ denotes the input temporal hyperspectral data, with $N$ representing the number of samples, $w$ the temporal window length, and $f$ the spectral dimension (i.e., the number of bands at each time point). $\mathrm{ST\text{-}Attn}(\cdot)$ refers to the proposed spatiotemporal attention mechanism designed to extract and fuse dynamic spectral–temporal features. $A \in \mathbb{R}^{N \times N}$ is the spatial adjacency matrix constructed based on geographic proximity or spectral similarity. $\mathcal{G}(\cdot\,; A)$ represents the feature propagation and aggregation process performed over the graph structure. The output $Z \in \mathbb{R}^{N \times F}$ is the final sample representation, which integrates spectral–temporal dependencies with spatial structural information. By combining the self-attention mechanism from the transformer architecture with the graph convolutional operation of GNNs, the model is capable of capturing both the global dependencies across spectral bands and the spatial associations between samples, thereby enabling multi-dimensional feature modeling. To ensure model stability and scalability in real-world applications, a dynamic weight update strategy is introduced during training. This allows the model to adapt to nitrogen content variations across different maize growth stages and spatial distributions. Through the construction of a joint spatiotemporal feature extraction framework and the incorporation of graph-based learning mechanisms, the proposed approach provides an efficient and interpretable solution for hyperspectral data-driven crop nitrogen content estimation.
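The composition $Z = \mathcal{G}(\mathrm{ST\text{-}Attn}(X); A)$ can be traced at the level of array shapes with the following sketch. It shows how an $(N, w, f)$ input becomes an $(N, F)$ output. Both stages are hypothetical stand-ins, not the authors' modules: a softmax-weighted pooling over time replaces ST-Attn, and a single symmetric-normalized propagation step replaces $\mathcal{G}$. All sizes, weights, and the random graph are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
N, w, f, d, F = 12, 3, 30, 16, 8   # samples, time window, bands, hidden, output (hypothetical)

def st_attn_placeholder(X, W_proj):
    """Hypothetical stand-in for ST-Attn: softmax pooling over time, then a linear map.
    Maps (N, w, f) -> (N, d)."""
    scores = X.mean(axis=2)                                     # (N, w) crude per-step score
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)               # softmax over the w steps
    pooled = np.einsum("nw,nwf->nf", weights, X)                # (N, f)
    return pooled @ W_proj                                      # (N, d)

def graph_propagate(H, A, W):
    """Stand-in for G(.; A): ReLU(D^-1/2 (A + I) D^-1/2 H W). Maps (N, d) -> (N, F)."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    A_hat = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]
    return np.maximum(A_hat @ H @ W, 0.0)

X = rng.random((N, w, f))
A = np.triu((rng.random((N, N)) < 0.3).astype(float), 1)
A = A + A.T                                                     # symmetric, no self-loops
Z = graph_propagate(st_attn_placeholder(X, rng.normal(size=(f, d))),
                    A, rng.normal(size=(d, F)))
```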

3.3.2. Dynamic Spectral–Spatiotemporal Attention Fusion Module

As shown in Figure 4, the proposed dynamic spectral–spatiotemporal attention fusion module is designed to fully exploit the dynamic evolution of maize hyperspectral data across different growth stages and field locations. By integrating both temporal and spatial modeling mechanisms, the module overcomes the limitations of traditional self-attention mechanisms, which are confined to modeling within a single dimension.
Structurally, the module takes two inputs: temporal hyperspectral data $X_{\text{temporal}} \in \mathbb{R}^{N \times w \times f}$ and spatial feature data $X_{\text{spatial}} \in \mathbb{R}^{N \times f}$. Here, $N$ represents the number of samples, $w$ denotes the temporal window length, and $f$ indicates the spectral dimension (number of bands) at each time point or spatial location. The temporal branch employs a temporal encoder, which begins with positional encoding to capture temporal order information. This is followed by two stacked multi-head attention layers combined with layer normalization and feed-forward layers to model the dynamic changes in spectral features over time. The resulting temporal feature $h_{\text{temp}} \in \mathbb{R}^{d_{\text{temp}}}$ is then projected to a unified dimension of $d_{\text{common}} = 128$ via a fully connected layer. The spatial branch utilizes a spatial encoder composed of two Conv1D layers with a kernel size of 3 and a stride of 1, outputting 64 and 128 channels, respectively, followed by max-pooling with a kernel size of 2. The output undergoes batch normalization, flattening, and a dense layer to produce the spatial feature $h_{\text{spat}} \in \mathbb{R}^{d_{\text{spat}}}$, which is also projected to $d_{\text{common}} = 128$. In standard self-attention, $Q = K = V$ are all drawn from the same temporal sequence. The proposed module instead adopts a cross-attention mechanism that enables feature interaction between the heterogeneous temporal and spatial branches. Let the projected temporal feature be $Q = h_{\text{temp}}^{\text{proj}}$ and the projected spatial feature be $K = V = h_{\text{spat}}^{\text{proj}}$. The cross-attention computation is then defined as [38]
$$\mathrm{CrossAttention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V.$$
The fused output $h_{\text{fusion}} \in \mathbb{R}^{d_{\text{common}}}$ is subsequently passed through a fully connected layer $W_{\text{out}} \in \mathbb{R}^{d_{\text{common}} \times d_{\text{output}}}$ to generate the final spatiotemporal joint representation for regression prediction. Compared to conventional self-attention mechanisms where $Q = K = V$, the proposed cross-attention mechanism introduces a complementary interaction between heterogeneous sources, allowing the model to capture the synergies between spectral bands and spatial locations. To ensure contextual consistency and temporal sensitivity, a learnable positional encoding $PE_t \in \mathbb{R}^{w \times d}$ is added to the original input. This allows each time point’s feature to perceive the entire growth process through the multi-head attention layer. The mathematical advantage of the dynamic fusion mechanism lies in enhancing the model’s nonlinear expressive capacity. Let $f(x) = \phi(h_{\text{fusion}})$ represent the final prediction function, where $\phi(\cdot)$ denotes the regressor and $h_{\text{fusion}}$ is the output from the spatiotemporal attention module. The attention weight matrix $A = \mathrm{softmax}(QK^{\top}/\sqrt{d_k})$ is both differentiable and nonlinear. It follows that $h_{\text{fusion}}$ has a higher-order expressive capacity than a single temporal or spatial representation. Specifically, for certain $i, j$, it can be shown that
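$$\frac{\partial f}{\partial x_i} \neq \frac{\partial f}{\partial x_j}, \quad \text{and} \quad \frac{\partial^{2} f}{\partial x_i\,\partial x_j} \neq 0.$$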
This indicates that the model can capture not only the first-order responses between input features but also their interaction terms, enabling the accurate modeling of nitrogen content variations. In summary, the dynamic spectral–spatiotemporal attention fusion module exhibits significant advantages over conventional attention mechanisms in terms of structural design, feature dimensionality, cross-domain interaction, and theoretical expressive power, making it a key component for achieving high-precision nitrogen content estimation.
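The cross-attention formula above can be sketched for a single head in NumPy. One property is worth making explicit: with a single key, the softmax weight is exactly 1 and the output simply equals $V$. The sketch therefore assumes that the temporal branch supplies $w$ query tokens and the spatial branch supplies a short sequence of key/value tokens. The token counts, the small random positional encoding standing in for the learnable $PE_t$, the mean pooling over time, and the random read-out weights are all hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)    # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(Q, K, V):
    """Single-head softmax(Q K^T / sqrt(d_k)) V.
    Q: (n_q, d_k) from the temporal branch; K, V: (n_kv, d_k) from the spatial branch."""
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)   # (n_q, n_kv); each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
w, n_kv, d_common = 3, 8, 128                     # 3 stages; 8 spatial tokens (hypothetical)
pos_enc = 0.02 * rng.normal(size=(w, d_common))   # stand-in for the learnable PE_t
temporal_tokens = rng.normal(size=(w, d_common)) + pos_enc
spatial_tokens = rng.normal(size=(n_kv, d_common))

fused, attn = cross_attention(temporal_tokens, spatial_tokens, spatial_tokens)
h_fusion = fused.mean(axis=0)                     # hypothetical pooling over time -> (128,)
W_out = rng.normal(size=(d_common, 1)) / np.sqrt(d_common)
y_hat = h_fusion @ W_out                          # regression read-out, shape (1,)
```

In PyTorch, `torch.nn.MultiheadAttention` accepts separate `query`, `key`, and `value` tensors. This is one common way to build such a cross-attention block with multiple heads.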

3.3.3. Graph Neural Network

As shown in Figure 5, a GNN is introduced as an extension of the dynamic spectral–spatiotemporal attention fusion module to further enhance the model’s ability to capture field spatial heterogeneity and the local structural dependencies among crops. Unlike traditional hyperspectral analysis models that treat samples independently, this approach represents each field sample as a node in a graph and constructs edges based on either geographical proximity or spectral similarity, forming a graph structure G = ( V , E ) , where V denotes the set of nodes and E represents the set of edges.
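The edge construction described above can be sketched as a symmetric k-nearest-neighbor graph. The same function accepts plot coordinates (geographic proximity) or spectral feature vectors (spectral similarity). The value `k = 4`, the Euclidean distance, the "keep an edge if either endpoint selects it" symmetrization, and the 3 × 3 plot grid with 10 m spacing are all assumptions made for illustration.

```python
import numpy as np

def knn_adjacency(features, k=4):
    """Symmetric kNN adjacency (no self-loops) from node features:
    plot coordinates for geographic proximity, or spectra for spectral similarity."""
    X = np.asarray(features, dtype=float)
    sq = np.sum(X**2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T     # pairwise squared distances
    np.fill_diagonal(dist2, np.inf)                       # never pick a node as its own neighbor
    nn = np.argsort(dist2, axis=1)[:, :k]                 # k nearest nodes per row
    n = len(X)
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    return np.maximum(A, A.T)                             # keep an edge if either side chose it

# Hypothetical 3 x 3 grid of plots with 10 m spacing
coords = np.array([(x, y) for x in range(0, 30, 10) for y in range(0, 30, 10)], dtype=float)
A = knn_adjacency(coords, k=4)
```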
Each node $v_i$ is associated with a feature representation $h_i^{(0)} \in \mathbb{R}^{F}$, which is initialized with the output $h_{\text{fusion}}$ of the dynamic spectral–spatiotemporal attention module. The graph structure is constructed using either a neighborhood search strategy or a spatial-distance-based k-nearest neighbor (kNN) rule to ensure biological and spatial consistency in node connections. The GNN module consists of three graph convolution layers (GCLs) with hidden dimensions $d_1 = 128$, $d_2 = 64$, and $d_3 = 32$, respectively, and an output dimension $d_{\text{out}} = 1$ for nitrogen content regression prediction. Within each graph convolution layer, the feature update for node $i$ is computed as follows [39]:
$$h_i^{(l+1)} = \sigma\left( \sum_{j \in \mathcal{N}(i)} \frac{1}{c_{ij}} \, W^{(l)\top} h_j^{(l)} \right),$$
where $h_i^{(l)}$ is the feature of node $i$ at the $l$-th layer, $W^{(l)} \in \mathbb{R}^{d_l \times d_{l+1}}$ is the learnable weight matrix of the $l$-th layer, $\mathcal{N}(i)$ is the set of neighbors of node $i$, $c_{ij}$ is the normalization coefficient (e.g., $c_{ij} = \sqrt{|\mathcal{N}(i)|\,|\mathcal{N}(j)|}$), and $\sigma(\cdot)$ is the nonlinear activation function (LeakyReLU in this study). To help the model retain node-specific information during the update process, a residual connection is incorporated as follows:
$$h_i^{(l+1)} = \alpha \cdot h_i^{(l)} + (1 - \alpha) \cdot \hat{h}_i^{(l+1)},$$
where $\alpha \in [0, 1]$ is a balance coefficient and $\hat{h}_i^{(l+1)}$ is the updated value after neighborhood aggregation. This mechanism preserves the original node information at each update step, improving model stability and convergence speed. From a graph signal propagation perspective, the GNN can be analyzed mathematically through the full node feature matrix $H^{(l)} \in \mathbb{R}^{N \times d_l}$, for which a single graph convolution operation can be expressed as
$$H^{(l+1)} = \sigma\left( \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)} \right),$$
where $\tilde{A} = A + I$ is the adjacency matrix with self-loops and $\tilde{D}$ is the corresponding degree matrix, $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. This formulation performs Laplacian smoothing on the graph, which suppresses high-frequency noise and enhances local consistency. It is therefore well suited to agricultural scenarios in which adjacent samples exhibit similar spectral characteristics. In the regression task, the final output of the GNN module is passed through a multi-layer perceptron (MLP), which produces the predicted value $\hat{y}_i = f(h_i^{(3)})$. The model is optimized using the mean squared error (MSE) loss function:
$$\mathcal{L}_{\mathrm{GNN}} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2,$$
where $y_i$ is the ground-truth nitrogen content and $\hat{y}_i$ is the predicted value. The inclusion of the GNN module provides several key advantages for the task addressed in this study. First, it models the spatial structure among samples. This is particularly beneficial in agricultural applications, where geographic adjacency and the spatial consistency of crop physiological states are critical for accurate prediction. Second, GNNs are naturally suited to irregular, non-Euclidean (graph-structured) data, which makes them adaptable to real-world agricultural scenarios with irregular field shapes and spectral variations. Third, when integrated with the dynamic spectral–spatiotemporal attention module, the GNN allows the model to capture both micro-level information (spectral band features) and macro-level information (regional spatial relationships), enhancing both interpretability and stability in maize nitrogen content prediction. The GNN extension combines carefully designed network parameters, multi-level graph convolution layers, and a robust mathematical modeling framework. Together, these significantly enhance the model’s ability to capture spatial structures and offer a reliable technical pathway for future intelligent agricultural remote sensing applications across multiple crops and time phases. The pseudocode of the complete method is shown in Figure 6.
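The propagation rule, the residual mixing, and the MSE loss above can be sketched with dense matrices in plain Python. This is a minimal illustration of the standard symmetric-normalized graph convolution, not the authors' implementation. The toy graph, the identity weights, and the LeakyReLU slope of 0.01 are hypothetical. A practical version would use sparse operations in a framework such as PyTorch.

```python
import math

def matmul(a, b):
    # Plain-Python product of an (n x k) and a (k x m) matrix stored as lists of rows.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def normalized_adjacency(n, edges):
    """Return D~^{-1/2} (A + I) D~^{-1/2} for an undirected edge set over n nodes."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A + I (self-loops)
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]  # D~_ii = sum_j A~_ij
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def leaky_relu(v, slope=0.01):
    return v if v > 0 else slope * v

def gcn_layer(a_hat, h, w):
    """One propagation step sigma(A_hat H W), with LeakyReLU as sigma."""
    z = matmul(matmul(a_hat, h), w)
    return [[leaky_relu(v) for v in row] for row in z]

def residual_mix(h_old, h_new, alpha):
    """alpha * h^(l) + (1 - alpha) * h_hat^(l+1); both inputs must have the same width."""
    return [[alpha * o + (1.0 - alpha) * u for o, u in zip(row_o, row_n)]
            for row_o, row_n in zip(h_old, h_new)]

def mse(y, y_hat):
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

# Hypothetical toy graph: nodes 0 and 1 are neighbours, node 2 is isolated.
a_hat = normalized_adjacency(3, {(0, 1)})
h0 = [[1.0], [0.0], [5.0]]  # one scalar feature per node
w0 = [[1.0]]                # identity weights, to expose the smoothing effect
h1 = gcn_layer(a_hat, h0, w0)
print(h1)                   # [[0.5], [0.5], [5.0]]: connected nodes are averaged
print(residual_mix(h0, h1, alpha=0.5))
```

The toy output shows the smoothing behaviour described above. The two connected nodes converge to a shared value, and the isolated node keeps its own value.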

3.4. Hardware and Software Platform

Model training and experiments in this study were conducted on a high-performance computing platform using NVIDIA A100 GPUs to accelerate deep learning model training and improve computational efficiency. The computing environment included an Intel Xeon 64-core CPU and 512 GB of RAM to support the high computational demands of hyperspectral data. NVMe SSDs were used for data storage to improve data access speeds and optimize the processing of large-scale hyperspectral datasets. All experiments ran on the Ubuntu 20.04 operating system and used Docker containers to manage the training environment, ensuring experimental reproducibility and consistency across different settings.
Regarding software, Python 3.8 was used as the main programming language. The deep learning framework was PyTorch 2.0, combined with CUDA 12.0 for GPU acceleration. Data processing relied on libraries such as NumPy 1.18.0, Pandas 1.0.0, and OpenCV 4.2.0, while Spectral Python (SPy) 0.19 was used for hyperspectral-specific processing. Matplotlib 3.7.2 and Seaborn 0.12.3 were used for visualization. To improve training efficiency, data loading used PyTorch’s built-in DataLoader with batch loading and parallel worker processes for preprocessing to reduce I/O bottlenecks. The dataset was divided into 80% for training, 10% for validation, and 10% for testing, with the held-out test set used to assess generalization. The loss function was MSE, and the optimizer was Adam (Adaptive Moment Estimation) with an initial learning rate of 0.001. A step decay strategy halved the learning rate every 10 epochs. A weight decay coefficient (L2 regularization) of $10^{-5}$ was applied to mitigate overfitting. Dropout with a rate of 0.3 was also used during training to improve generalization: at each iteration, each neuron was randomly deactivated with probability 0.3. Supported by robust hardware and carefully selected hyperparameters, the training process was stable and generalized well, yielding reliable and reproducible experimental results.
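The learning-rate schedule and the 80/10/10 split described above are easy to state precisely. In PyTorch, the schedule would typically be expressed as `torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)` combined with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)`, with the scheduler stepped once per epoch. The plain-Python sketch below reproduces only the arithmetic. The random seed and the assignment of individual samples to splits are assumptions, because the text does not specify how samples were partitioned.

```python
import random

def step_decay_lr(epoch, base_lr=1e-3, step_size=10, gamma=0.5):
    """Learning rate for a 0-indexed epoch when it is multiplied by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)

def split_indices(n, train=0.8, val=0.1, seed=0):
    """Shuffle sample indices with a fixed (hypothetical) seed and cut them into train/val/test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * train)
    n_val = int(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

print([step_decay_lr(e) for e in (0, 9, 10, 20)])
train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))  # 80 10 10
```

A purely random split at the sample level can place neighbouring plots in both training and test sets. For spatially correlated field data, a grouped or block split is a common alternative.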

3.5. Model Evaluation

In the task of predicting maize nitrogen content from hyperspectral data, three core metrics were used to evaluate model performance comprehensively: the coefficient of determination ($R^2$), root mean squared error (RMSE), and mean absolute error (MAE). These metrics assess different aspects of performance, such as model fit, absolute error magnitude, and error dispersion. $R^2$ measures the proportion of variance in the true values that is explained by the model predictions. It is at most 1, with higher values indicating better fit, and it becomes negative on data where the model predicts worse than the mean of the ground truths. RMSE quantifies the average magnitude of the prediction errors in the same unit as the original data, which gives a direct interpretation of prediction accuracy. MAE is the mean absolute deviation between predictions and ground truths and is more robust to outliers than RMSE. Their formulas are as follows:
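$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2},$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|,$$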
where y i is the ground truth, y ^ i is the predicted value, y ¯ is the mean of the ground truths, and n is the number of samples.
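The three formulas above translate directly into code. The sketch below uses only the Python standard library. The sample values are hypothetical and were chosen so that the arithmetic is easy to follow by hand.

```python
import math

def r2_score(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def rmse(y, y_hat):
    """Root mean squared error, in the same unit as y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

# Hypothetical values, not data from the study.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]  # a single error of size 2
print(r2_score(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
print(r2_score(y_true, [4.0, 3.0, 2.0, 1.0]))  # -3.0: worse than predicting the mean
```

Here a single error of size 2 gives RMSE = 1.0 but MAE = 0.5, which reflects the heavier weight RMSE places on large errors. For any set of predictions, MAE never exceeds RMSE (by the Cauchy–Schwarz inequality). This offers a quick consistency check when both metrics are reported for the same split.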

3.6. Baseline

This study selected several representative traditional machine learning and deep learning models as baselines: SVM [40], RF [41], ResNet [42], GoogLeNet [43], vision transformer (ViT) [44], and Swin-transformer [45]. These models have been widely applied in image classification and object recognition tasks and rest on well-established theoretical foundations. SVM constructs an optimal hyperplane in a high-dimensional feature space to achieve maximum-margin classification. Its objective is $\min \frac{1}{2}\|w\|^2$ subject to the constraint $y_i(w \cdot x_i + b) \geq 1$, where $w$ denotes the weight vector, $b$ is the bias term, $y_i$ is the class label, and $x_i$ is the input sample. SVM is suitable for small-sample, linearly separable, or kernel-transformable problems, but it is limited in handling large-scale image data and extracting unstructured image features. RF, in contrast, is an ensemble learning method that constructs multiple decision trees and combines their outputs by voting (or, in regression, by averaging). It is robust to variations in feature dimensionality and relatively insensitive to outliers. ResNet introduces residual connections by adding an identity mapping, $F(x) + x$, within each convolutional block to alleviate the vanishing gradient problem in deep networks. This enables the training of deeper architectures capable of capturing more complex image features. GoogLeNet enhances multi-scale feature representation with the inception module, which performs parallel convolutions with kernels of different sizes. Among transformer-based architectures, ViT partitions an image into patches and maps them into a sequence of vectors. It then captures global dependencies among patches using the self-attention mechanism. While ViT demonstrates significant advantages in image representation learning, it is prone to overfitting under small-sample conditions. 
Swin-transformer, an improved version of ViT, incorporates a window-based multi-head self-attention mechanism that confines attention computations within local windows. By applying a window-shifting strategy, global context is gradually constructed. This design balances computational efficiency with global modeling capability and is particularly suitable for high-resolution agricultural image analysis.
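The two token-partitioning steps that distinguish ViT and Swin-transformer can be illustrated on a small grid. The first is ViT's division into non-overlapping patches. The second is Swin's cyclic shift before window partitioning. This sketch is a simplification and makes several assumptions: it uses a single band (no channel dimension), leaves out the linear patch embedding, and omits the attention masks Swin applies to wrapped-around regions. In Swin, the shift is half the window size, and windows are cut with the same routine as `patchify`.

```python
def patchify(image, p):
    """Split an H x W grid (list of rows) into non-overlapping p x p patches in raster order,
    flattening each patch row-major (the input to a ViT-style patch embedding)."""
    h, w = len(image), len(image[0])
    if h % p or w % p:
        raise ValueError("H and W must be divisible by the patch size")
    return [[image[top + r][left + c] for r in range(p) for c in range(p)]
            for top in range(0, h, p) for left in range(0, w, p)]

def cyclic_shift(image, s):
    """Roll the grid up and to the left by s cells (Swin's shifted-window step)."""
    h, w = len(image), len(image[0])
    return [[image[(r + s) % h][(c + s) % w] for c in range(w)] for r in range(h)]

# Hypothetical 4 x 4 single-band image with values 0..15.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
print(patchify(img, 2))                      # four 2 x 2 patches, four values each
print(patchify(cyclic_shift(img, 1), 2)[0])  # first window after shifting by half the window
```

After the shift, the first window mixes pixels that belonged to four different windows before the shift. This mixing is how shifted windows propagate context across window boundaries.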

4. Results and Discussion

4.1. Experimental Results of Nitrogen Content Estimation Models

The purpose of this experiment is to systematically compare the predictive performance of different types of models for maize nitrogen content estimation based on hyperspectral data. This evaluation includes traditional machine learning methods, classical CNNs, ViT architectures, and the proposed model that integrates dynamic spectral–spatiotemporal attention and a GNN. To ensure fair comparison, the same dataset division and evaluation metrics, including RMSE, MAE, and the coefficient of determination ($R^2$), were employed.
As shown in Table 2, the overall trend indicates that predictive performance improves as model complexity and structural expressiveness increase. The proposed method achieves the best performance, with RMSE = 0.35, MAE = 0.48, and $R^2 = 0.93$. It significantly outperforms all baseline models, which validates the potential of combining spatiotemporal attention and graph-based modeling for hyperspectral regression tasks. In terms of mathematical modeling and structural characteristics, traditional machine learning methods such as SVM and RF rely primarily on fixed kernel functions or ensemble tree structures. This limits their ability to extract deep nonlinear features automatically from high-dimensional hyperspectral data and results in relatively low prediction accuracy. CNN models, including ResNet and GoogLeNet, improve performance by extracting local spatial features through deep convolutional structures. However, they have limited capability to model global spectral dependencies and temporal dynamics. Attention-based models, such as Swin-transformer and ViT, employ self-attention to capture long-range dependencies across spectral bands. This sharpens the model’s focus on critical spectral regions and reduces prediction errors. Nevertheless, pure vision transformer architectures lack explicit modeling of temporal dynamics and spatial dependencies and still encounter performance bottlenecks. In contrast, the proposed method dynamically constructs attention weights across temporal and spatial dimensions and integrates a GNN for regional information propagation. Mathematically, this enables the extraction of high-order interactive features by combining global attention with local graph-based smoothing. 
While self-attention models learn global weighted representations of spectral bands, the GNN propagation mechanism effectively applies graph Laplacian smoothing to model spatial correlations between neighboring samples. This combined approach not only leverages the intrinsic features of individual samples but also incorporates their relational structure, thereby significantly improving the model’s approximation of actual nitrogen content distribution.

4.2. Visualization of Attention Mechanisms and Response to Key Factors

To gain a deeper understanding of the model’s response mechanisms to key spectral and spatiotemporal features in hyperspectral corn nitrogen content estimation, a comprehensive visualization analysis was conducted on six types of attention representations. An attention-based correlation heatmap was constructed to evaluate the statistical associations between the six spatiotemporal attention signals extracted by the model and critical domain-specific factors relevant to nitrogen modeling. The horizontal axis of the heatmap corresponds to six distinct attention types: phenological dynamic attention, spatial-structure-aware attention, key-spectral-band-focused attention, multi-temporal trend attention, spatiotemporal contextual attention, and inter-regional relational attention. The vertical axis represents six categories of task-relevant hyperspectral factors: vegetation index, canopy structure index, soil reflectance feature, growth stage encoding, spatial location context, and spectral nitrogen signature. Each cell in the heatmap quantifies the statistical correlation between a given attention mechanism and a corresponding modeling factor, where higher values indicate a stronger discriminative focus.
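A correlation heatmap of the kind described above can be sketched as a grid of correlation coefficients. Each coefficient is computed across samples between one summary value per attention type and one value per factor. The text does not name the correlation statistic, so Pearson's coefficient is an assumption here. The factor names, attention names, and values below are hypothetical and were chosen only for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def correlation_grid(attention, factors):
    """Heatmap cells: rows are factor names, columns are attention names.

    attention / factors: dicts mapping a name to one value per sample.
    """
    return {f_name: {a_name: pearson(f_vals, a_vals)
                     for a_name, a_vals in attention.items()}
            for f_name, f_vals in factors.items()}

# Hypothetical per-sample summaries for three samples (not data from the study).
attention = {"key_spectral_band": [0.2, 0.5, 0.9], "phenological_dynamic": [0.7, 0.4, 0.6]}
factors = {"spectral_nitrogen_signature": [1.1, 1.9, 3.2], "growth_stage_encoding": [0.0, 1.0, 2.0]}
grid = correlation_grid(attention, factors)
for row_name, row in grid.items():
    print(row_name, {k: round(v, 2) for k, v in row.items()})
```

A rank-based statistic such as Spearman's coefficient would be a reasonable substitute if the attention summaries are not expected to relate linearly to the factors.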
As shown in Figure 7, the key-spectral-band-focused attention exhibited the strongest response to the spectral nitrogen signature, with a peak correlation of 0.40, highlighting its ability to localize spectral regions such as the red edge and near-infrared bands that are sensitive to nitrogen content variation. The phenological dynamic attention showed high correlation (up to 0.36) with the growth stage encoding, indicating its effectiveness in modeling the temporal spectral dynamics associated with different developmental stages of the crop. The spatial-structure-aware attention demonstrated pronounced sensitivity to the spatial location context factor, validating its capacity to capture spatial heterogeneity among field plots. The multi-temporal trend attention correlated strongly with the vegetation index and temporal evolution factors, enabling it to capture spectral changes that reflect nitrogen accumulation over time. Meanwhile, the spatiotemporal contextual attention and inter-regional relational attention maintained moderate to high correlations across multiple factors, suggesting robust global modeling and generalization capabilities. These findings collectively confirm that the proposed multi-branch attention mechanism establishes semantically distinct functional modules within the network. Each attention stream contributes to enhanced precision and robustness in hyperspectral nitrogen estimation by targeting specific dimensions of the data space.

4.3. Performance of the Proposed Method at Different Growth Stages

This experiment aims to analyze the variation in prediction accuracy of the proposed dynamic spectral–spatiotemporal attention and GNN model across different maize growth stages, thereby evaluating the model’s generalization ability and sensitivity over time. The physiological and metabolic characteristics of maize, as well as nitrogen absorption rates, exhibit significant differences across growth stages, leading to dynamic changes in hyperspectral responses. Therefore, experiments were conducted at three critical growth stages: the jointing stage, tasseling stage, and grain-filling stage. This approach enables the assessment of model stability and reliability in processing hyperspectral data across different temporal phases.
As shown in Table 3, prediction errors gradually decrease as maize growth progresses, with the best performance observed during the grain-filling stage, where RMSE = 0.33, MAE = 0.45, and $R^2 = 0.96$. This suggests that the proposed model achieves greater accuracy and stability in nitrogen content estimation during the later growth stages. During the jointing stage, the maize plants are in the early rapid growth phase, where spectral responses such as vegetation indices and nitrogen-sensitive bands are not yet fully expressed, resulting in slightly lower prediction accuracy. During the tasseling stage, model performance improves significantly, indicating that hyperspectral features in this stage more accurately reflect nitrogen content. The grain-filling stage represents the peak period of nitrogen accumulation, during which the increase in nitrogen content is primarily attributed to the sustained remobilization and redistribution of nitrogen from vegetative organs to the developing kernels, rather than continuous external nitrogen input. Throughout this process, the plant maintains a relatively stable nitrogen metabolic level by efficiently utilizing its internal nitrogen resources. This physiological trait leads to more consistent and distinct spectral reflectance characteristics, thereby creating favorable conditions for the model to extract key wavelength information and facilitating improved prediction accuracy. From a mathematical modeling perspective, these experimental results demonstrate the model’s advantage in capturing dynamic temporal features. Traditional models without explicit temporal encoding or growth stage information often struggle to distinguish hyperspectral variations across time, resulting in inconsistent performance. 
The proposed method incorporates dynamic attention weights to model spectral features across different temporal windows and employs learnable positional encoding to enhance the model’s sensitivity to growth stage variations. This enables adaptive feature extraction across different stages. Additionally, the GNN reconstructs sample graphs at each growth stage and propagates information among neighboring nodes, reinforcing spatial consistency among similar maize samples within the same stage. Notably, during the grain-filling stage of maize, vegetative growth gradually ceases, and the plant enters the later reproductive phase dominated by kernel filling and biomass accumulation. At this stage, nitrogen is extensively remobilized from vegetative organs such as leaves and stems to the developing kernels, resulting in the highest concentration of nitrogen distribution. Functional leaves—particularly those in the middle to upper canopy—maintain strong photosynthetic activity and support a stable level of nitrogen metabolism, while the lower leaves have generally senesced and degraded, contributing minimally to the spectral signal. This physiological status leads to reduced variability in nitrogen content among canopy leaves and yields more consistent and discriminative reflectance in nitrogen-sensitive spectral bands. In addition, as the plant structure stabilizes and leaf area expansion slows down, the interference of external environmental factors on spectral responses is correspondingly reduced. These characteristics result in a spatially regular nitrogen distribution, functionally concentrated leaf activity, and clearer spectral expression, which collectively enable the graph convolution mechanism of the GNN to effectively capture and smooth these spatial dependencies, thereby improving the overall prediction performance at this stage.

4.4. Ablation Study of Different Attention Mechanisms

This experiment was designed to evaluate comprehensively the modeling capabilities and individual contributions of different attention mechanisms in hyperspectral maize nitrogen content prediction. Two mainstream attention architectures, Coordinate Attention and Triplet Attention, were compared with the proposed dynamic spectral–spatiotemporal attention mechanism, with a focus on their respective strengths in multi-dimensional hyperspectral feature extraction. Hyperspectral data contain redundancy and nonlinear couplings, so properly designed attention modules are crucial for guiding the model toward informative spectral bands and spatial–temporal regions, thereby improving estimation accuracy. To ensure a fair comparison, the same backbone architecture and training configuration were used across all models, and only the attention modules were varied. Prediction performance was evaluated using root mean squared error (RMSE), mean absolute error (MAE), and the coefficient of determination ($R^2$), capturing both error magnitude and model fit.
As shown in Table 4, the Coordinate Attention model achieved RMSE = 0.54, MAE = 0.66, and $R^2 = 0.74$ on the test set. This architecture factorizes channel attention into two one-dimensional encodings along the horizontal and vertical coordinate directions. This captures long-range dependencies along one spatial direction while preserving positional information along the other. However, because it is designed primarily for spatial feature modeling in conventional image domains, its effectiveness is limited when handling joint spectral–temporal structures. The Triplet Attention model enhances feature representation through three attention branches that capture cross-dimension interactions between the channel and spatial dimensions. It performed better, with RMSE = 0.47, MAE = 0.56, and $R^2 = 0.85$, reflecting a stronger feature integration capability. In contrast, the proposed dynamic spectral–spatiotemporal attention mechanism achieved the best results across all metrics (RMSE = 0.35, MAE = 0.48, and $R^2 = 0.93$), significantly outperforming the other attention structures. This module integrates temporal sequence modeling, spatial context awareness, and cross-scale interaction through learnable positional encodings, heterogeneous attention branches, and cross-domain fusion strategies. The inherent asymmetry and contextual responsiveness of this structure allow it to construct higher-order spectral–spatiotemporal dependencies aligned with the complexities of real agricultural environments. As a result, it demonstrates superior generalization capability and robustness.

4.5. Ablation Study on GNN Contribution

To quantitatively assess the contribution of the Graph Neural Network (GNN) component in the proposed model, a supplementary ablation study was conducted by removing or altering the graph structure while retaining the spatiotemporal attention module. Specifically, three configurations were tested: (1) the full model with both spatiotemporal attention and GNN (baseline), (2) a variant without the GNN module (only the attention fusion module), and (3) a simplified version using a fully connected layer (FC) instead of graph-based propagation to simulate global feature aggregation without spatial structure modeling. The same dataset partitioning, training strategies, and evaluation metrics were used for consistency.
As shown in Table 5, removing the GNN module led to a notable decline in performance, with $R^2$ dropping from 0.93 to 0.88 and RMSE increasing from 0.35 to 0.44. When the GNN was replaced with a fully connected layer to simulate global aggregation, a slight improvement was observed compared to the purely attention-based variant, but it still fell short of the full model. These results confirm that the GNN component plays a critical role in leveraging spatial dependencies and enhancing the generalization ability of the model. By constructing a structured graph representation based on spatial adjacency or spectral similarity, the GNN captures regional correlations and reinforces local consistency, which cannot be fully substituted by nonstructural aggregation methods. This highlights the necessity of including GNNs in hyperspectral modeling for agricultural prediction tasks involving spatial heterogeneity.
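For reference, the Table 5 values for the variant without the GNN module imply the following relative changes:

$$\frac{\mathrm{RMSE}_{\text{no GNN}} - \mathrm{RMSE}_{\text{full}}}{\mathrm{RMSE}_{\text{full}}} = \frac{0.44 - 0.35}{0.35} = \frac{0.09}{0.35} \approx 0.257 \;(\approx 25.7\%), \qquad R^2_{\text{no GNN}} - R^2_{\text{full}} = 0.88 - 0.93 = -0.05.$$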

4.6. Computational Efficiency and Inference Speed Analysis

To assess the practical applicability of the proposed model in real-world agricultural scenarios, particularly on edge or mobile platforms, this section evaluates the computational requirements and inference latency of various nitrogen content estimation models, including our proposed method. Experiments were conducted on an NVIDIA A100 GPU and an NVIDIA Jetson Xavier NX edge device to simulate both high-performance and resource-constrained environments. Metrics include the number of parameters, FLOPs, average inference time per sample, and GPU memory usage.
As shown in Table 6, the proposed method requires moderate computational resources, with 37.2 M parameters and 7.8 GFLOPs, resulting in an average inference time of 9.4 milliseconds per sample on the A100 GPU and 27.8 milliseconds on Jetson NX. While it is more computationally intensive than traditional models such as SVM or RF, it remains significantly more efficient than large transformer models like ViT. Importantly, the model can be deployed on edge devices such as the Jetson NX with acceptable latency, supporting real-time or near-real-time field applications. These results validate the practical feasibility of the proposed model in precision agriculture environments where both accuracy and speed are essential.
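Per-sample latency figures such as those in Table 6 are usually measured with a warm-up phase followed by timed sequential calls. The sketch below shows that procedure in plain Python and converts the reported latencies into implied throughput. The stand-in predictor and the warm-up count are hypothetical. On a GPU, kernels run asynchronously, so a PyTorch measurement would call `torch.cuda.synchronize()` before reading the clock (or use CUDA events). This sketch does not handle that step.

```python
import time

def mean_latency_ms(predict, samples, warmup=5):
    """Average wall-clock time per sample (batch size 1), in milliseconds."""
    for s in samples[:warmup]:
        predict(s)  # warm-up calls, excluded from timing (caches, lazy initialisation)
    start = time.perf_counter()
    for s in samples:
        predict(s)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(samples)

def throughput_per_second(latency_ms):
    """Samples per second implied by a sequential per-sample latency."""
    return 1000.0 / latency_ms

def dummy_predict(x):
    # Hypothetical stand-in for the trained model.
    return sum(v * v for v in x)

data = [[float(i)] * 64 for i in range(200)]
print(f"{mean_latency_ms(dummy_predict, data):.4f} ms per sample")
# Throughput implied by the latencies reported in Table 6 (A100, Jetson NX).
print(round(throughput_per_second(9.4), 1), round(throughput_per_second(27.8), 1))
```

The reported 9.4 ms and 27.8 ms correspond to roughly 106 and 36 samples per second, respectively, when samples are processed one at a time. Batched inference would typically yield higher throughput.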

4.7. Discussion

The hyperspectral-image-based maize nitrogen content estimation model proposed in this study demonstrated stable and accurate predictive performance across multiple experiments, with particularly strong results observed during the grain-filling stage. At this growth phase, the plant transitions into the late reproductive period, during which nitrogen is extensively translocated from vegetative organs to the developing kernels. The canopy structure becomes more stable and leaf functionality is concentrated in the upper layers, resulting in higher spectral consistency within nitrogen-sensitive wavelength bands and enhancing the model’s discriminative capability. Comparative ablation experiments involving various attention mechanisms and graph structure configurations confirmed the critical role of the graph neural network in modeling canopy spatial structure. Deployment experiments on edge devices such as the Jetson NX further demonstrated favorable inference speed and resource efficiency, supporting practical use in field monitoring or real-time UAV-based analysis. When integrated into fertilization management practices, the proposed approach provides high spatiotemporal resolution decision support for precise nutrient regulation, particularly suitable for applications such as topdressing window identification and nitrogen distribution evaluation. Future work may extend the model to diverse regions, crops, and agronomic practices to enhance its generalization capability and in-field robustness.

4.8. Limitation and Future Work

Although this study has achieved significant results in hyperspectral maize nitrogen content prediction, certain limitations remain. First, the graph structure is constructed mainly from spatial adjacency or similarity rules between samples. It does not integrate multi-source information beyond the hyperspectral observations, such as meteorological data, soil characteristics, and management measures. This design simplifies graph construction and is computationally efficient, but it is clearly limited in simulating multi-factor interactions in complex agricultural environments, which may affect the generalization ability of the model. Second, the dynamic spectral–spatiotemporal attention mechanism enhances the model’s ability to perceive features across different growth stages. Even so, the model still faces challenges from the degradation of temporal dependency information over long time spans or under complex temporal conditions. Additionally, the data used in this study were primarily collected from Jungar Banner in Inner Mongolia, a relatively homogeneous geographic region. To some extent, this limits the model’s adaptability to varying climatic conditions and agronomic management practices. Moreover, because hyperspectral data are costly to acquire, the available sample size is relatively limited. This constraint may reduce model performance in extreme growth states or at the edges of the study area, and the model requires additional training to adapt to new regions or crops. Future research could explore several directions for further optimization and extension. On the one hand, the framework could be extended to incorporate data from multiple regions and diverse field types, enabling a more comprehensive evaluation of the model’s generalization ability and robustness. 
On the other hand, one potential avenue involves expanding the current approach to a multi-source data fusion framework by incorporating heterogeneous inputs such as meteorological data, soil characteristics, and UAV imagery, with the aim of enhancing the model’s robustness and adaptability under varying environmental conditions. While this direction holds significant potential, we acknowledge that it also imposes higher demands on data integration and knowledge engineering. Another promising direction lies in improving the graph neural network by adopting a dynamic graph construction mechanism, allowing edge weights between nodes to update dynamically based on temporal evolution or agronomic management practices. Such an enhancement would better capture real-time changes in field structures. Additionally, further efforts should focus on model lightweighting to enable deployment on edge computing devices or mobile platforms, thereby facilitating the practical implementation of precision agriculture technologies. Finally, incorporating agronomic expert knowledge and domain-specific priors into the model could help develop a more interpretable and controllable nitrogen content prediction system. This would provide a solid theoretical foundation and technical support for intelligent fertilization and sustainable agricultural development.

5. Conclusions

As one of the most important cereal crops globally, maize relies heavily on nitrogen nutrition, which plays a decisive role in yield formation and quality improvement. Intelligent and precise crop nutrient management requires an efficient, robust, and generalizable nondestructive method for nitrogen content monitoring. Traditional nitrogen measurement approaches suffer from limitations such as low efficiency and poor regional adaptability. To address them, a hyperspectral nitrogen estimation model is proposed that integrates a dynamic spectral–spatiotemporal attention mechanism with a graph neural network. This model effectively captures the spectral–temporal feature variations that occur throughout crop development and accurately models spatial–structural relationships among field samples, thereby enhancing both the precision and the stability of nitrogen content estimation. The methodological innovations of this study lie mainly in three aspects. First, a transformer-based dynamic spectral–spatiotemporal attention fusion module is proposed to enable heterogeneous feature modeling across different growth stages and spatially distributed maize samples. Second, a graph neural network is incorporated to construct graph structures among field samples, thereby improving the model’s sensitivity to spatial heterogeneity and local consistency. Third, multi-stage experiments validate the robustness and generalization capability of the proposed approach relative to various baseline models and across phenological stages. In comparative experiments, the proposed method achieved the highest prediction accuracy on the test set, with a coefficient of determination $R^2$ of 0.93, significantly outperforming mainstream models such as SVM, RF, ResNet, and ViT. 
Furthermore, in stage-wise evaluations during the jointing, tasseling, and grain-filling phases, the model achieved $R^2$ values of 0.92, 0.95, and 0.96, respectively, demonstrating excellent temporal adaptability. Additionally, the model exhibited superior performance in terms of MAE and RMSE, with average improvements exceeding 15% while maintaining leading results in extended regression metrics such as precision and recall. Overall, the proposed method offers an effective solution for intelligent modeling of agricultural hyperspectral data and nitrogen monitoring, providing a solid technical foundation for the advancement of smart agriculture and precision fertilization systems.

Author Contributions

Conceptualization, F.L., B.Z., Y.H. and C.L.; data curation, X.X. and C.D.; formal analysis, W.L., C.D. and L.L.; funding acquisition, C.L.; investigation, W.L.; methodology, F.L., B.Z. and Y.H.; project administration, C.L.; resources, X.X.; software, F.L., B.Z. and Y.H.; supervision, C.L.; validation, W.L. and L.L.; visualization, X.X., C.D. and L.L.; writing—original draft, F.L., B.Z., Y.H., X.X., W.L., C.D., L.L. and C.L.; F.L., B.Z. and Y.H. contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

The authors would like to express their sincere gratitude to the Computer Association of China Agricultural University (ECC) for their valuable technical support. Upon acceptance of this paper, the project code and the dataset will be made publicly available to facilitate further research and development in this field.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Li, J.; Ge, Y.; Puntel, L.A.; Heeren, D.M.; Bai, G.; Balboa, G.R.; Gamon, J.A.; Arkebauer, T.J.; Shi, Y. Integrating UAV hyperspectral data and radiative transfer model simulation to quantitatively estimate maize leaf and canopy nitrogen content. Int. J. Appl. Earth Obs. Geoinf. 2024, 129, 103817.
  2. Saudy, H.S.; Mohamed El-Metwally, I. Effect of irrigation, nitrogen sources, and metribuzin on performance of maize and its weeds. Commun. Soil Sci. Plant Anal. 2023, 54, 22–35.
  3. Zhao, X.; Dong, Q.; Han, Y.; Zhang, K.; Shi, X.; Yang, X.; Yuan, Y.; Zhou, D.; Wang, K.; Wang, X.; et al. Maize/peanut intercropping improves nutrient uptake of side-row maize and system microbial community diversity. BMC Microbiol. 2022, 22, 14.
  4. Liang, Y.; Wang, J.; Wang, Z.; Hu, D.; Jiang, Y.; Han, Y.; Wang, Y. Fulvic acid alleviates the stress of low nitrogen on maize by promoting root development and nitrogen metabolism. Physiol. Plant. 2024, 176, e14249.
  5. Wang, S.; Guan, K.; Wang, Z.; Ainsworth, E.A.; Zheng, T.; Townsend, P.A.; Liu, N.; Nafziger, E.; Masters, M.D.; Li, K.; et al. Airborne hyperspectral imaging of nitrogen deficiency on crop traits and yield of maize by machine learning and radiative transfer modeling. Int. J. Appl. Earth Obs. Geoinf. 2021, 105, 102617.
  6. Li, Z.W.; Wang, G.Y.; Khan, K.; Yang, L.; Chi, Y.X.; Wang, Y.; Zhou, X.B. Irrigation combines with nitrogen application to optimize soil carbon and nitrogen, increase maize yield, and nitrogen use efficiency. Plant Soil 2024, 499, 605–620.
  7. Shtylla, B.; Pikuli, K.; Morava, K.; Karapanci, N.; Zhidro, N. Determination of Proteins by the Kjeldahl Method in Cereals of the Markets in Korça City. JASRD J. Agric. Sustain. Rural Dev. 2024, 2, 26–33.
  8. Kumar, K.; Parihar, C.M.; Nayak, H.S.; Sena, D.R.; Godara, S.; Dhakar, R.; Patra, K.; Sarkar, A.; Bharadwaj, S.; Ghasal, P.C.; et al. Modeling maize growth and nitrogen dynamics using CERES-Maize (DSSAT) under diverse nitrogen management options in a conservation agriculture-based maize-wheat system. Sci. Rep. 2024, 14, 11743.
  9. Bofana, J.; Mussane, R.D.; Tamele, R.A.; Costa, A.C.; Savanguane, B.; Popinsky, I.; Matusse, A.F. Quantifying the Spatiotemporal Variations of Soil Nitrogen Fixation or Absorption from Soybean, Cotton, and Maize Planted Fields to Support Sustainable Agriculture Practices. Nitrogen 2024, 5, 1135–1155.
  10. Shen, J.; Huang, Y.; Chen, W.; Li, M.; Tan, W.; Wang, R.; Deng, Y.; Gong, Y.; Ai, S.; Liu, N. Assessing the Transferability of Models for Predicting Foliar Nutrient Concentrations Across Maize Cultivars. Remote Sens. 2025, 17, 652.
  11. Guo, A.; Huang, W.; Wang, K.; Qian, B.; Cheng, X. Early monitoring of maize northern leaf blight using vegetation indices and plant traits from multiangle hyperspectral data. Agriculture 2024, 14, 1311.
  12. Mandal, D.; de Siqueira, R.; Longchamps, L.; Khosla, R. Machine learning and fluorosensing for estimation of maize nitrogen status at early growth-stages. Comput. Electron. Agric. 2024, 225, 109341.
  13. Wang, Z.; Fan, S.; An, T.; Zhang, C.; Chen, L.; Huang, W. Detection of insect-damaged maize seed using hyperspectral imaging and hybrid 1D-CNN-BiLSTM model. Infrared Phys. Technol. 2024, 137, 105208.
  14. Sabzi, S.; Pourdarbani, R.; Rohban, M.H.; García-Mateos, G.; Arribas, J.I. Estimation of nitrogen content in cucumber plant (Cucumis sativus L.) leaves using hyperspectral imaging data with neural network and partial least squares regressions. Chemom. Intell. Lab. Syst. 2021, 217, 104404.
  15. Geng, J.; Yang, C.; Li, Y.; Lan, L.; Luo, Q. MPA-RNN: A novel attention-based recurrent neural networks for total nitrogen prediction. IEEE Trans. Ind. Inform. 2022, 18, 6516–6525.
  16. Zhang, X.; Han, L.; Sobeih, T.; Lappin, L.; Lee, M.A.; Howard, A.; Kisdi, A. The self-supervised spectral–spatial vision transformer network for accurate prediction of wheat nitrogen status from UAV imagery. Remote Sens. 2022, 14, 1400.
  17. Zhang, X.; Han, L.; Sobeih, T.; Lappin, L.; Lee, M.; Howard, A.; Kisdi, A. The self-supervised spectral-spatial attention-based transformer network for automated, accurate prediction of crop nitrogen status from UAV imagery. arXiv 2021, arXiv:2111.06839.
  18. Gupta, A.; Singh, A. Agri-gnn: A novel genotypic-topological graph neural network framework built on graphsage for optimized yield prediction. arXiv 2023, arXiv:2310.13037.
  19. da Silva, B.C.; de Mello Prado, R.; Baio, F.H.R.; Campos, C.N.S.; Teodoro, L.P.R.; Teodoro, P.E.; Santana, D.C.; Fernandes, T.F.S.; da Silva Junior, C.A.; de Souza Loureiro, E. New approach for predicting nitrogen and pigments in maize from hyperspectral data and machine learning models. Remote Sens. Appl. Soc. Environ. 2024, 33, 101110.
  20. Ma, C.; Zhai, L.; Li, C.; Wang, Y. Hyperspectral estimation of nitrogen content in different leaf positions of wheat using machine learning models. Appl. Sci. 2022, 12, 7427.
  21. Shu, M.; Zhu, J.; Yang, X.; Gu, X.; Li, B.; Ma, Y. A spectral decomposition method for estimating the leaf nitrogen status of maize by UAV-based hyperspectral imaging. Comput. Electron. Agric. 2023, 212, 108100.
  22. Sun, Q.; Chen, L.; Gu, X.; Zhang, S.; Dai, M.; Zhou, J.; Gu, L.; Zhen, W. Estimation of canopy nitrogen nutrient status in lodging maize using unmanned aerial vehicles hyperspectral data. Ecol. Inform. 2023, 78, 102315.
  23. Chen, B.; Lu, X.; Yu, S.; Gu, S.; Huang, G.; Guo, X.; Zhao, C. The application of machine learning models based on leaf spectral reflectance for estimating the nitrogen nutrient index in maize. Agriculture 2022, 12, 1839.
  24. Cherkassky, V.; Ma, Y. Practical selection of SVM parameters and noise estimation for SVM regression. Neural Netw. 2004, 17, 113–126.
  25. Zhang, Y.; Xia, C.; Zhang, X.; Cheng, X.; Feng, G.; Wang, Y.; Gao, Q. Estimating the maize biomass by crop height and narrowband vegetation indices derived from UAV-based hyperspectral images. Ecol. Indic. 2021, 129, 107985.
  26. Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
  27. Yang, W.; Nigon, T.; Hao, Z.; Paiao, G.D.; Fernández, F.G.; Mulla, D.; Yang, C. Estimation of corn yield based on hyperspectral imagery and convolutional neural network. Comput. Electron. Agric. 2021, 184, 106092.
  28. Zhao, R.; An, L.; Tang, W.; Qiao, L.; Wang, N.; Li, M.; Sun, H.; Liu, G. Improving chlorophyll content detection to suit maize dynamic growth effects by deep features of hyperspectral data. Field Crop. Res. 2023, 297, 108929.
  29. Sun, L.; Yang, C.; Wang, J.; Cui, X.; Suo, X.; Fan, X.; Ji, P.; Gao, L.; Zhang, Y. Automatic Modeling Prediction Method of Nitrogen Content in Maize Leaves Based on Machine Vision and CNN. Agronomy 2024, 14, 124.
  30. Zhao, X.; Wang, L.; Zhang, Y.; Han, X.; Deveci, M.; Parmar, M. A review of convolutional neural networks in computer vision. Artif. Intell. Rev. 2024, 57, 99.
  31. Zhang, L.; An, D.; Wei, Y.; Liu, J.; Wu, J. Prediction of oil content in single maize kernel based on hyperspectral imaging and attention convolution neural network. Food Chem. 2022, 395, 133563.
  32. Gallo, I.; Boschetti, M.; Rehman, A.U.; Candiani, G. Self-supervised convolutional neural network learning in a hybrid approach framework to estimate chlorophyll and nitrogen content of maize from hyperspectral images. Remote Sens. 2023, 15, 4765.
  33. Das, S.; Tariq, A.; Santos, T.; Kantareddy, S.S.; Banerjee, I. Recurrent neural networks (RNNs): Architectures, training tricks, and introduction to influential research. Mach. Learn. Brain Disord. 2023, 197, 117–138.
  34. Toledo, C.A.; Crawford, M.; Vyn, T. Maize yield prediction based on multi-modality remote sensing and LSTM models in nitrogen management practice trials. In Proceedings of the 2022 12th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (WHISPERS), Rome, Italy, 13–16 September 2022; pp. 1–7.
  35. Savitzky, A.; Golay, M.J. Smoothing and differentiation of data by simplified least squares procedures. Anal. Chem. 1964, 36, 1627–1639.
  36. Kreutzer, J.S.; Caplan, B.; DeLuca, J. Encyclopedia of Clinical Neuropsychology; Springer: New York, NY, USA, 2011; Volume 28.
  37. Asesh, A. Normalization and bias in time series data. In Proceedings of the Conference on Multimedia, Interaction, Design and Innovation, Online, 9–10 December 2021; Springer: Cham, Switzerland, 2021; pp. 88–97.
  38. Shaw, P.; Uszkoreit, J.; Vaswani, A. Self-attention with relative position representations. arXiv 2018, arXiv:1803.02155.
  39. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
  40. Hearst, M.A.; Dumais, S.T.; Osuna, E.; Platt, J.; Scholkopf, B. Support vector machines. IEEE Intell. Syst. Their Appl. 1998, 13, 18–28.
  41. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  42. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Online, 5 July 2016; pp. 770–778.
  43. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Online, 20 December 2015; pp. 1–9.
  44. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
  45. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Online, 4 August 2021; pp. 10012–10022.
Figure 1. Illustration of sample distribution across maize growth stages. Hyperspectral images were collected during three representative stages: jointing stage (1004 samples), tasseling stage (1287 samples), and grain-filling stage (1167 samples).
Figure 2. Representative image samples collected at three key maize growth stages: jointing stage, heading stage, and grain-filling stage.
Figure 3. NDVI visualization of maize canopy during preprocessing. The left panel shows the grayscale NDVI map, while the right panel presents the pseudo-color rendering, highlighting spatial variations in canopy reflectance related to nitrogen content.
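Figure 3 relies on NDVI, computed per pixel as (NIR − Red) / (NIR + Red). A minimal sketch follows, assuming the cube is stored as height × width × bands of reflectance and that the red and near-infrared bands are taken as the bands nearest to 670 nm and 800 nm; these wavelengths are a common choice, not one stated in the paper, and the helper names and synthetic cube are hypothetical.

```python
import numpy as np


def ndvi(nir: np.ndarray, red: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / (nir + red + eps)  # eps guards against division by zero


def nearest_band(wavelengths: np.ndarray, target_nm: float) -> int:
    """Index of the band whose centre wavelength is closest to target_nm (hypothetical helper)."""
    return int(np.argmin(np.abs(wavelengths - target_nm)))


# Hypothetical cube: 4 x 4 pixels, 100 bands spanning 400-1000 nm.
rng = np.random.default_rng(0)
wavelengths = np.linspace(400.0, 1000.0, 100)
cube = rng.uniform(0.02, 0.6, size=(4, 4, 100))
red_idx = nearest_band(wavelengths, 670.0)  # assumed red band
nir_idx = nearest_band(wavelengths, 800.0)  # assumed NIR band
ndvi_map = ndvi(cube[..., nir_idx], cube[..., red_idx])
# Map NDVI from [-1, 1] to 0-255 for a grayscale rendering like the left panel of Figure 3.
gray = np.round((ndvi_map + 1.0) / 2.0 * 255).astype(np.uint8)
```

A pseudo-color view like the right panel can be produced by passing `ndvi_map` to any colormap-based image display.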
Figure 4. Structural diagram of the dynamic spectral–spatiotemporal attention fusion module, which consists of dual-branch encoders for temporal and spatial inputs.
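The dual-branch idea in Figure 4 can be sketched as two self-attention encoders, one over growth-stage tokens and one over spatial patch tokens, whose pooled outputs are mixed by input-dependent softmax weights. This NumPy sketch rests on several assumptions not stated in the paper: single-head attention, mean pooling, a two-way gate standing in for the "dynamic" fusion, and random weights. The class and argument names are hypothetical, and the actual module may differ in every one of these choices.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(z - z.max(axis=axis, keepdims=True))  # shift by max for numerical stability
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over (n_tokens, d) inputs."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    return softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v


class DualBranchFusion:
    """Hypothetical temporal/spatial dual-branch encoder with input-dependent (gated) fusion."""

    def __init__(self, d: int, seed: int = 0):
        rng = np.random.default_rng(seed)

        def proj():
            return rng.normal(scale=d ** -0.5, size=(d, d))

        self.temporal = (proj(), proj(), proj())  # Q, K, V for growth-stage tokens
        self.spatial = (proj(), proj(), proj())   # Q, K, V for spatial patch tokens
        self.gate = rng.normal(scale=d ** -0.5, size=(2 * d, 2))

    def __call__(self, temporal_tokens, spatial_tokens):
        # Encode each branch, then mean-pool its tokens into one summary vector.
        t = self_attention(temporal_tokens, *self.temporal).mean(axis=0)
        s = self_attention(spatial_tokens, *self.spatial).mean(axis=0)
        # Branch weights are recomputed for every input, so the mix is "dynamic".
        alpha = softmax(np.concatenate([t, s]) @ self.gate)
        return alpha[0] * t + alpha[1] * s, alpha


# Hypothetical inputs: 3 growth-stage embeddings and 9 spatial patch embeddings of width 16.
rng = np.random.default_rng(1)
fusion = DualBranchFusion(d=16)
stage_tokens = rng.normal(size=(3, 16))
patch_tokens = rng.normal(size=(9, 16))
fused, alpha = fusion(stage_tokens, patch_tokens)
```

Because `alpha` depends on the current sample, a plot whose spatial context is more informative than its temporal trajectory can be weighted accordingly, which is the intuition behind a dynamic rather than fixed fusion.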
Figure 5. Schematic diagram of the GNN structure, illustrating the node feature update and node coordinate update processes.
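The split in Figure 5 between a node feature update and a node coordinate update matches the E(n)-equivariant graph neural network (EGNN) of Satorras et al. (2021). That correspondence is an assumption, not something the caption confirms. The sketch below follows the EGNN equations with small random MLPs in NumPy, adds a residual connection on features, and normalizes the coordinate update by node degree; all function names and sizes are hypothetical.

```python
import numpy as np


def init_mlp(rng, d_in, d_hidden, d_out):
    """Random weights for a two-layer perceptron (hypothetical initialisation)."""
    return (rng.normal(scale=d_in ** -0.5, size=(d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(scale=d_hidden ** -0.5, size=(d_hidden, d_out)), np.zeros(d_out))


def mlp(params, x):
    """x -> W2 SiLU(W1 x + b1) + b2, applied over the last axis."""
    w1, b1, w2, b2 = params
    z = x @ w1 + b1
    return (z / (1.0 + np.exp(-z))) @ w2 + b2


def egnn_layer(h, x, adj, phi_e, phi_x, phi_h):
    """One EGNN step. h: (n, d) features, x: (n, c) coordinates, adj: (n, n) 0/1, no self-loops."""
    n = h.shape[0]
    diff = x[:, None, :] - x[None, :, :]                 # x_i - x_j
    dist2 = (diff ** 2).sum(axis=-1, keepdims=True)      # ||x_i - x_j||^2, rotation-invariant
    pair = np.concatenate([np.repeat(h[:, None, :], n, axis=1),
                           np.repeat(h[None, :, :], n, axis=0), dist2], axis=-1)
    mask = adj[..., None]
    m = mlp(phi_e, pair) * mask                          # edge messages m_ij on existing edges only
    # Coordinate update: shift x_i along (x_i - x_j), weighted by a scalar computed from m_ij.
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
    x_new = x + (diff * mlp(phi_x, m) * mask).sum(axis=1) / deg
    # Feature update (with residual): combine h_i with the summed incoming messages.
    h_new = h + mlp(phi_h, np.concatenate([h, m.sum(axis=1)], axis=-1))
    return h_new, x_new


# Hypothetical sizes: 5 samples, 8 features, 2-D field coordinates, 16-dim messages.
rng = np.random.default_rng(0)
n, d, c, dm = 5, 8, 2, 16
h0 = rng.normal(size=(n, d))
x0 = rng.normal(size=(n, c))
adj = np.ones((n, n)) - np.eye(n)
phi_e = init_mlp(rng, 2 * d + 1, 32, dm)
phi_x = init_mlp(rng, dm, 32, 1)
phi_h = init_mlp(rng, d + dm, 32, d)
h1, x1 = egnn_layer(h0, x0, adj, phi_e, phi_x, phi_h)
```

Because messages depend on coordinates only through squared distances, translating or rotating the field coordinates leaves the features unchanged and moves the updated coordinates in the same way, so predictions do not depend on an arbitrary choice of field reference frame.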
Figure 6. Pseudocode.
Figure 7. Heatmap showing the correlation between six types of attention representations and key hyperspectral features for maize nitrogen content estimation. The x-axis denotes different attention mechanisms, and the y-axis denotes critical spectral or spatiotemporal factors. Color intensity indicates the strength of correlation.
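A heatmap like Figure 7 amounts to a Pearson correlation matrix between per-sample attention summaries and per-sample spectral or spatiotemporal factors. The minimal sketch below assumes that each attention type is reduced to one scalar per sample, which the caption does not specify, and uses synthetic data with one deliberately planted correlation; the function name is hypothetical.

```python
import numpy as np


def correlation_block(attn: np.ndarray, feats: np.ndarray) -> np.ndarray:
    """Pearson correlation of each feature (rows, y-axis) with each attention type (cols, x-axis).

    attn: (n_samples, n_attention_types); feats: (n_samples, n_features).
    """
    k = attn.shape[1]
    full = np.corrcoef(np.hstack([attn, feats]), rowvar=False)  # columns are variables
    return full[k:, :k]


# Synthetic data: 200 samples, 6 attention types, 4 key factors.
rng = np.random.default_rng(0)
attn = rng.normal(size=(200, 6))
feats = rng.normal(size=(200, 4))
feats[:, 0] = attn[:, 0] + 0.1 * rng.normal(size=200)  # planted strong correlation
corr = correlation_block(attn, feats)  # (4, 6) matrix, ready for a heatmap display
```

Each cell lies in [-1, 1]; the color intensity in the figure corresponds to the magnitude of such a coefficient.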
Table 1. Nitrogen fertilization treatments used in the field experiment.
| Group | Treatment Code | Description | Fertilizer Rate (kg N/ha) |
|---|---|---|---|
| 1 | N0 | No nitrogen (control) | 0 |
| 2 | N1 | Recommended nitrogen application (N rate) | 180 |
| 3 | N2 | High nitrogen application (150% of N1) | 270 |
Table 2. Experimental results of nitrogen content estimation models.
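| Model | RMSE | MAE | R² |
|---|---|---|---|
| SVM | 0.67 | 0.76 | 0.84 |
| RF | 0.61 | 0.73 | 0.86 |
| ResNet | 0.58 | 0.69 | 0.87 |
| GoogleNet | 0.53 | 0.65 | 0.88 |
| Swin-transformer | 0.48 | 0.60 | 0.90 |
| ViT | 0.42 | 0.54 | 0.88 |
| Proposed method | 0.35 | 0.48 | 0.93 |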
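The three columns of Table 2 follow the standard definitions: RMSE is the square root of the mean squared residual, MAE is the mean absolute residual, and R² = 1 − SS_res / SS_tot. A minimal sketch with hypothetical measured and predicted nitrogen values (units not restated here) is shown below.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for paired observations and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    mae = float(np.mean(np.abs(resid)))
    ss_res = float(np.sum(resid ** 2))                    # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total sum of squares
    return rmse, mae, 1.0 - ss_res / ss_tot


# Hypothetical nitrogen measurements and model predictions.
y_true = [2.1, 2.8, 3.4, 1.9, 2.5]
y_pred = [2.3, 2.6, 3.1, 2.0, 2.6]
rmse, mae, r2 = regression_metrics(y_true, y_pred)
```

By construction RMSE is never smaller than MAE on the same residuals (a consequence of the power-mean inequality), which is a useful sanity check when tabulating the two metrics side by side.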
Table 3. Performance of the proposed method at different growth stages.
| Growth Stage | RMSE | MAE | R² |
|---|---|---|---|
| Jointing stage | 0.38 | 0.52 | 0.92 |
| Tasseling stage | 0.36 | 0.49 | 0.95 |
| Grain-filling stage | 0.33 | 0.45 | 0.96 |
Table 4. Ablation study of different attention mechanisms.
| Model | RMSE | MAE | R² |
|---|---|---|---|
| Coordinate Attention | 0.54 | 0.66 | 0.74 |
| Triplet Attention | 0.47 | 0.56 | 0.85 |
| Proposed method | 0.35 | 0.48 | 0.93 |
Table 5. Ablation study on the contribution of the GNN in nitrogen content estimation.
| Model | RMSE | MAE | R² |
|---|---|---|---|
| Full model (attention + GNN) | 0.35 | 0.48 | 0.93 |
| Without GNN (attention module only) | 0.44 | 0.57 | 0.88 |
| Attention + FC layer in place of the GNN | 0.41 | 0.54 | 0.89 |
Table 6. Computational efficiency and inference speed comparison of nitrogen content estimation models.
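| Model | Params (M) | FLOPs (G) | Inference Time (ms) | Memory Usage (MB) |
|---|---|---|---|---|
| SVM | - | - | 1.4 | 52 |
| RF | - | - | 1.6 | 63 |
| ResNet | 23.5 | 4.1 | 8.9 | 430 |
| ViT | 86.7 | 15.3 | 12.5 | 812 |
| Proposed method | 37.2 | 7.8 | 9.4 | 598 |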
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Lu, F.; Zhang, B.; Hou, Y.; Xiong, X.; Dong, C.; Lu, W.; Li, L.; Lv, C. A Spatiotemporal Attention-Guided Graph Neural Network for Precise Hyperspectral Estimation of Corn Nitrogen Content. Agronomy 2025, 15, 1041. https://doi.org/10.3390/agronomy15051041

AMA Style

Lu F, Zhang B, Hou Y, Xiong X, Dong C, Lu W, Li L, Lv C. A Spatiotemporal Attention-Guided Graph Neural Network for Precise Hyperspectral Estimation of Corn Nitrogen Content. Agronomy. 2025; 15(5):1041. https://doi.org/10.3390/agronomy15051041

Chicago/Turabian Style

Lu, Feiyu, Boming Zhang, Yifei Hou, Xiao Xiong, Chaoran Dong, Wenbo Lu, Liangxue Li, and Chunli Lv. 2025. "A Spatiotemporal Attention-Guided Graph Neural Network for Precise Hyperspectral Estimation of Corn Nitrogen Content" Agronomy 15, no. 5: 1041. https://doi.org/10.3390/agronomy15051041

APA Style

Lu, F., Zhang, B., Hou, Y., Xiong, X., Dong, C., Lu, W., Li, L., & Lv, C. (2025). A Spatiotemporal Attention-Guided Graph Neural Network for Precise Hyperspectral Estimation of Corn Nitrogen Content. Agronomy, 15(5), 1041. https://doi.org/10.3390/agronomy15051041

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
