Article

Multi-Source Feature Selection and Explainable Machine Learning Approach for Mapping Nitrogen Balance Index in Winter Wheat Based on Sentinel-2 Data

1 College of Natural Resources and Environment, Northwest A&F University, Xianyang 712100, China
2 Key Laboratory of Plant Nutrition and Agri-Environment in Northwest China, Ministry of Agriculture, Xianyang 712100, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(18), 3196; https://doi.org/10.3390/rs17183196
Submission received: 19 July 2025 / Revised: 11 September 2025 / Accepted: 12 September 2025 / Published: 16 September 2025
(This article belongs to the Special Issue Perspectives of Remote Sensing for Precision Agriculture)

Highlights

What are the main findings?
  • The Sen2Res super-resolution improved Sentinel-2 features, enhancing NBI correlations in winter wheat.
  • The optimized RF model (R2 = 0.77, RMSE = 1.57) outperformed linear models; SHAP highlighted red-edge/NIR dominance (~75%).
What is the implication of the main finding?
  • The workflow enables regional, high-accuracy NBI monitoring and supports precision fertilization.
  • Combining SHAP interpretation with model outputs improves the transparency and transferability of nutrient monitoring.

Abstract

The Nitrogen Balance Index is a key indicator of crop nitrogen status, but conventional monitoring methods are invasive, costly, and unsuitable for large-scale application. This study targets early-season winter wheat in the Guanzhong Plain and proposes a framework that integrates Sentinel-2 imagery with Sen2Res super-resolution reconstruction, multi-feature optimization, and interpretable machine learning. Super-resolved imagery demonstrated improved spatial detail and enhanced correlations between reflectance, texture, and vegetation indices and the Nitrogen Balance Index compared to native imagery. A two-stage feature-selection strategy, combining correlation analysis and recursive feature elimination, identified a compact set of key variables. Among the tested algorithms, the random forest model achieved the highest accuracy, with R2 = 0.77 and RMSE = 1.57, representing an improvement of about 20% over linear models. Shapley Additive Explanations revealed that red-edge and near-infrared features accounted for up to 75% of predictive contributions, highlighting their physiological relevance to nitrogen metabolism. Overall, this study contributes to the remote sensing of crop nitrogen status through three aspects: (1) integration of super-resolution with feature fusion to overcome coarse spatial resolution, (2) adoption of a two-stage feature optimization strategy to reduce redundancy, and (3) incorporation of interpretable modeling to improve transparency. The proposed framework supports regional-scale NBI monitoring and provides a scientific basis for precision fertilization.

1. Introduction

Wheat is a major staple crop globally, constituting a fundamental part of agricultural civilization and playing a pivotal role in global food security [1]. One of the primary objectives in precision agriculture is to decrease the input–output ratio in crop production, a process influenced by various factors [2]. Among these, nitrogen input represents a critical factor. Nitrogen regulates photosynthesis and serves as an essential macronutrient for cellular structure and function, especially during the early stages of crop growth and development [3,4,5]. Excessive nitrogen fertilization can cause soil pollution and premature crop maturation, ultimately decreasing yield. In contrast, nitrogen deficiency leads to poor crop development, reduced flag leaf density, and heightened susceptibility to pests and diseases [6]. Therefore, dynamic monitoring of nitrogen content during the early stages of crop growth is essential for effective field management and the implementation of variable-rate fertilization strategies.
Traditional wet chemistry methods have historically been used to determine plant nitrogen concentration; however, these approaches are destructive. In recent decades, leaf-clip sensor technologies for assessing plant nitrogen status have advanced rapidly, with devices such as the SPAD-502 estimating chlorophyll content to guide nitrogen fertilization management. Although numerous studies have demonstrated a strong correlation between chlorophyll content and nitrogen status, this relationship is susceptible to leaf water content, thickness, and structure, potentially reducing measurement accuracy [7].
The Nitrogen Balance Index (NBI) is a dimensionless indicator of plant nitrogen status that reflects the balance between optical signals related to chlorophyll and to flavonoids [8]. NBI is defined as the ratio of chlorophyll to flavonols (Chl/Flav), with flavonols constituting a subclass of polyphenols. When nitrogen is sufficient, synthesis of photosynthetic pigments and protein complexes associated with pigments increases, whereas production of flavonoids for ultraviolet screening and antioxidant functions is relatively reduced, so NBI increases. Under nitrogen limitation or other stresses, the opposite pattern occurs, and NBI decreases [9,10]. Simultaneous measurement of polyphenols in the leaf epidermis and chlorophyll content helps overcome the limitations of chlorophyll-based estimation, as the chlorophyll-to-polyphenol ratio is less affected by heterogeneity in the chlorophyll distribution across the leaf surface [11].
NBI captures the balance between primary and secondary metabolism, helps differentiate nitrogen deficiency from other environmental stresses, and, being unitless, facilitates comparisons across platforms and regions. Accordingly, NBI is regarded as an important parameter derived from fluorescence measurements [12,13] and a robust indicator for evaluating crop nitrogen status.
Recent research has increasingly focused on the NBI, which is derived from the Dualex® Scientific chlorophyll fluorescence sensor (Force-A, Orsay, France). Huang et al. [14] used the Dualex to assess nitrogen status in cold-region rice at various growth stages, finding NBI to be highly sensitive to nitrogen status and particularly suitable for estimating Leaf Nitrogen Concentration, Plant Nitrogen Concentration, and Nitrogen Nutrition Index. Dong et al. [15] identified the Dualex 4 as an effective and promising tool for monitoring crop nitrogen status, reporting a strong correlation between maize NBI and nitrogen nutrition indices at multiple growth stages. Jiang et al. [16] compared the use of the SPAD-502 and the Dualex 4 Scientific+ for non-destructive diagnosis of nitrogen status in winter wheat. They found that the NBI measured with the Dualex 4 Scientific+ provided better estimates of five nitrogen status indicators: leaf nitrogen concentration, leaf nitrogen accumulation, plant nitrogen concentration, plant nitrogen accumulation, and the nitrogen nutrition index, than readings from the SPAD-502.
Although the Dualex device enables nondestructive detection of plant nitrogen status, it remains costly, time-consuming, and labor-intensive for large-scale crop nitrogen mapping. Therefore, remote sensing technology represents a promising alternative.
Most existing remote sensing research on NBI estimation is grounded in relating in situ spectroscopy to the Nitrogen Balance Index. Typically, spectral transformations, vegetation indices, and selected bands are used to develop predictive models applied to crops including maize, wheat, and melon [17,18,19]. However, these methodologies are insufficient for enabling large-scale monitoring of crop NBI. Recent advances in sensor technology have enabled the acquisition of satellite multispectral data with high spatial and temporal resolution, facilitating large-scale ground observation and targeted analyses.
The Sentinel-2 satellite constellation represents an important data source for monitoring crop biophysical variables. It consists of two satellites, Sentinel-2A (launched in 2015) and Sentinel-2B (launched in 2017), both equipped with identical Multispectral Instrument (MSI) sensors. These provide 13 spectral bands, including three red-edge bands centered at 705 nm, 740 nm, and 783 nm, which are particularly sensitive to crop nitrogen content [20]. Compared with other satellite sensors, Sentinel-2 offers higher spatial resolution (10 m), shorter revisit intervals (5 days), and free data accessibility.
Consequently, Sentinel-2 has been widely adopted for crop growth monitoring, including the assessment of chlorophyll content, biomass, leaf area index, water content, nutrient composition, and yield estimation [21,22,23,24,25,26]. Belgiu et al. [27] used Sentinel-2 imagery to predict the nutritional composition of major crops, including maize, soybean, rice, and wheat, demonstrating that remote sensing data can provide an economical, efficient, and spatially explicit representation of crop nutritional quality. Delloye et al. [28] employed Sentinel-2 imagery to invert green area index, chlorophyll content, and canopy chlorophyll content in winter wheat, thereby informing nitrogen management strategies. Jamali et al. [29] estimated wheat leaf area index, leaf dry weight, specific leaf area, and specific leaf weight using Sentinel-2 data combined with machine learning algorithms, achieving accurate and robust predictions. However, to date, few studies have integrated Sentinel-2 imagery with the NBI to assess nitrogen status in wheat.
Currently, machine learning algorithms have become mainstream in crop remote sensing research, as they effectively handle non-linear relationships among crop growth parameters and spectral reflectance data, can manage limited sample sizes and noise, and do not require the normality assumptions associated with linear algorithms [30]. Additionally, unlike traditional physical models, machine learning algorithms do not suffer from ill-posed inversion problems [31]. However, the prediction process of machine learning models is often opaque and difficult to interpret. Although tree-based models can provide feature importance scores, they do not reveal how each feature individually contributes to predictions or how each sample is processed, thus making them so-called “black-box” models [32].
Interpretability has become a prominent area of research in the development of machine learning and artificial intelligence, with the Shapley value emerging as a powerful tool for explaining complex relationships between multivariate features and predictive models [33]. Shapley Additive Explanations (SHAP), a game theory-based approach, enables post hoc global and local interpretation of black-box models, thereby improving the transparency and reliability of model predictions and providing valuable insights for data collection and experimental design [34]. Due to its relatively recent adoption, only a limited number of studies have applied SHAP to interpret prediction models in crop monitoring.
Despite substantial progress in research on crop NBI, several important issues remain to be addressed. First, it remains unclear which Sentinel-2 predictors, particularly red-edge and narrowband near-infrared features, provide consistent information for monitoring crop NBI. Second, the effects of common spatial preprocessing methods (for example, image resampling and super-resolution) and feature selection procedures on the accuracy and stability of different machine learning models have not been systematically evaluated. Third, model interpretability is often descriptive rather than transparent, gives insufficient weight to the relative contributions of feature groups, and is frequently conflated with causal explanation.
Therefore, this study utilizes Sentinel-2 satellite data to develop a Nitrogen Balance Index estimation model for winter wheat by extracting spectral and spatial features from imagery, with the objective of optimizing regional nitrogen field management. The main components of the study are as follows: (1) comparing the stability and performance of the original nearest neighbor resampled images with the Sen2Res super-resolution images; (2) using spectral reflectance, texture parameters, and vegetation indices as input features for machine learning models and applying correlation analysis and recursive feature elimination to select relevant variables and evaluate their influence on Nitrogen Balance Index estimation; and (3) training models with five machine learning algorithms, namely, ridge regression, partial least squares regression, support vector regression, extreme gradient boosting, and random forest regression, evaluating their performance and accuracy, as well as applying Shapley Additive Explanations to interpret the best performing model.

2. Materials and Methods

2.1. Overview of the Study Area

The study area is situated at the boundary between Wugong County (108°01′E–108°19′E, 34°12′N–34°26′N) and Fufeng County (107°45′E–108°03′E, 34°12′N–34°37′N) in Shaanxi Province, China. As shown in Figure 1, it is located in the western part of the Guanzhong Plain and is characterized mainly by loess terraces and Weihe River terraces. The region has a warm temperate, semi-humid continental monsoon climate with abundant sunlight and well-defined seasons. In Wugong County, the average annual precipitation is 633 mm, the mean temperature is 12.9 °C, and there are approximately 2095 h of sunshine annually. In Fufeng County, these values are 592 mm, 12.4 °C, and approximately 2134 h of sunshine annually.
The experimental area lies within a semi-arid, irrigated agricultural zone, where Lou soil is the predominant soil type. The primary crops are wheat and maize, which are grown in a double-cropping system each year. Winter wheat in the Guanzhong region of Shaanxi is typically sown from late September to early October and harvested in early June of the following year. Fertilization and field management practices for winter wheat adhere to local agricultural conventions. Field sampling sites include: A1 (Yingxi Village, Wugong County, 108°08′E, 34°32′N); A2 (Nie Village, Wugong County, 108°07′E, 34°33′N); A3 (Liangmaxiyao Village, Wugong County, 108°05′E, 34°34′N); A4 (Maxi Village, Fufeng County, 108°03′E, 34°33′N); and A5 (Juliang Village, Fufeng County, 108°01′E, 34°39′N).

2.2. Nitrogen Balance Index Acquisition

Based on the natural conditions and winter wheat cultivation patterns, plots characterized by flat terrain and extensive wheat coverage were selected as observation sites within each experimental zone. To minimize interference from mixed pixels in satellite imagery, the observation plots were situated away from roads, buildings, and ditches. The Nitrogen Balance Index (NBI) was measured using a Dualex Scientific+ instrument (FORCE-A, France). The instrument estimates leaf chlorophyll (Chl) from the leaf transmittance ratio between far red and near infrared wavelengths, infers flavonoids (Flav) and anthocyanins (Anth) by differencing the screening effect on near infrared chlorophyll fluorescence under alternate excitation (ultraviolet A versus red, green versus red), and then computes the Nitrogen Balance Index as NBI = Chl/Flav [35]. According to the manufacturer, the overall measurement error is approximately 5%, the typical sampling spot is about 19.6 mm2, and a single acquisition requires less than 500 ms [36].
At each sampling point, six leaves were chosen from adjacent wheat plants exhibiting similar growth conditions. Measurements were taken three times per leaf, from the petiole to the tip, avoiding the main leaf veins. The mean value was used to represent NBI for each sampling point. As optimal light conditions are essential for image data acquisition and processing, data collection was conducted between 10:00 a.m. and 2:00 p.m. during clear, calm, and well-lit weather.
NBI data were gathered within three days following 9 April 2018. Due to variability in wheat sowing dates across the region, the observed growth stages ranged from booting to heading. For each 20 × 20 m plot, the mean NBI value from two diagonal sampling points was used to match the Sentinel-2 pixel resolution, and the geographic coordinates of the plot center were recorded. Across five large field-observation areas, 150 point measurements and 75 field level samples were collected. The NBI ranged from 17.10 to 36.79 (mean 27.14, standard deviation 3.68, coefficient of variation 13.56%). Figure 2 presents a schematic of a portion of the sampling area and a box plot of NBI sample distribution.
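As a minimal illustration of this aggregation step, the sketch below averages the Dualex readings first per leaf and sampling point and then per 20 × 20 m plot; the file name and column names are hypothetical, and the pandas library is assumed.

```python
import pandas as pd

# leaf_readings: one row per Dualex reading (illustrative column names)
# columns: plot_id, point_id, leaf_id, nbi
leaf_readings = pd.read_csv("dualex_readings.csv")   # hypothetical file

# Step 1: average the readings per sampling point (3 readings x 6 leaves)
point_nbi = (leaf_readings
             .groupby(["plot_id", "point_id"])["nbi"]
             .mean())

# Step 2: average the two diagonal sampling points to get one NBI value per 20 m plot
plot_nbi = point_nbi.groupby("plot_id").mean().rename("nbi_plot")
```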

2.3. Sentinel-2 Image Processing

Sentinel-2 imagery was obtained from the European Space Agency (ESA) data portal (https://dataspace.copernicus.eu/), accessed on 6 December 2024. The onboard Multispectral Instrument (MSI) offers 13 spectral bands, ranging from the visible (VIS) to the shortwave infrared (SWIR) regions, with three spatial resolutions. Bands 1 (coastal aerosol), 9 (water vapor), and 10 (cirrus) have a spatial resolution of 60 m. The three red-edge (RE) bands (Bands 5, 6, and 7), the narrow NIR band (Band 8A), and the SWIR bands (Bands 11 and 12) are provided at 20 m resolution, while the remaining four bands (Bands 2, 3, 4, and 8) in the VIS and NIR regions offer 10 m resolution [37]. The image utilized for the regional experiment was a cloud-free Level-1C Sentinel-2A scene acquired on 9 April 2018, exhibiting excellent quality. The primary parameters of the Sentinel-2A spectral bands are presented in Table 1.
Level-1C Sentinel-2A imagery is radiometrically and geometrically corrected by default. Atmospheric correction was applied using the Sen2Cor plugin (Copernicus Data Center). The atmospherically corrected Level-2A images were resampled to a 10 m spatial resolution using both the nearest neighbor method and the Sen2Res super-resolution tool in SNAP. The Sen2Res super-resolution approach extracts details from the highest-resolution bands and transfers this information to lower-resolution bands, thus generating images in which all spectral bands attain the optimal resolution. Specifically, the method separates band-specific reflectance from shared geometric scene structure, then applies this model to unmix lower-resolution bands, thereby optimizing reflectance and sub-pixel characteristics [38].
To evaluate the performance of Sen2Res reconstruction, information entropy and mean gradient were computed for each band of both Sen2Res and nearest neighbor resampled images. This study utilized ten Sentinel-2A bands: B2, B3, and B4 (blue, green, and red visible spectrum); B5, B6, and B7 (red-edge); B8 and B8A (near-infrared); and B11 and B12 (shortwave infrared).
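The two quality metrics can be computed directly from the band arrays. The sketch below is a minimal Python example, assuming each band has been read into a NumPy array; the function names and the 256-bin histogram are illustrative choices rather than the study's exact implementation.

```python
import numpy as np

def band_entropy(band, bins=256):
    """Shannon information entropy of one band (higher values indicate richer information)."""
    hist, _ = np.histogram(band[np.isfinite(band)], bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mean_gradient(band):
    """Average gradient magnitude of one band (higher values indicate sharper detail)."""
    gy, gx = np.gradient(band.astype(np.float64))
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))

# Example: compare the same 20 m band before and after Sen2Res reconstruction
# print(band_entropy(b05_nearest), band_entropy(b05_sen2res))
# print(mean_gradient(b05_nearest), mean_gradient(b05_sen2res))
```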

2.4. Selection of Vegetation Indices and Texture Parameters

Vegetation indices are mathematical combinations of different spectral bands designed to reduce soil background and environmental noise. Various indices can adapt to diverse and complex environmental conditions [39]. Therefore, 32 vegetation indices commonly used in crop monitoring were selected based on previous studies. These include two-band, three-band, and four-band indices. Additionally, the three red-edge bands and two near-infrared bands were substituted into various index formulas, resulting in the computation of 70 vegetation indices for analysis. Table 2 presents the selected vegetation indices and their corresponding Sentinel-2A band formulas.
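As an illustration of how the band-combination formulas in Table 2 translate into code, the sketch below computes two representative indices (NDVI and the red-edge chlorophyll index) from Sentinel-2 reflectance arrays; the band variable names and the small constant added to the denominators are assumptions made for numerical safety.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index, e.g., NDVI8A = (B8A - B4)/(B8A + B4)."""
    return (nir - red) / (nir + red + 1e-10)

def ci_red_edge(nir, red_edge):
    """Red-edge chlorophyll index, e.g., CI740-B8A = B8A/B6 - 1."""
    return nir / (red_edge + 1e-10) - 1.0

# b4, b6, b8a: 10 m reflectance arrays (illustrative names)
# ndvi8a = ndvi(b8a, b4)
# ci740_b8a = ci_red_edge(b8a, b6)
```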
The application of spatial texture features in crop parameter estimation has become a prominent research focus. In this study, second-order statistical filtering methods were applied. Second-order statistics are derived from the gray-level co-occurrence matrix (GLCM), which is a relative frequency matrix representing how often pairs of pixel values occur in adjacent windows separated by a specific distance and direction. The GLCM quantifies the frequency of relationships between a pixel and its defined spatial neighborhood [40]. Eight specific texture features were computed: Mean, Variance, Homogeneity, Contrast, Dissimilarity, Entropy, Second Moment, and Correlation. Texture parameters were calculated for 10 bands, resulting in a total of 80 texture features. As this study does not involve parameter standardization, default settings were used.
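A minimal sketch of the GLCM texture extraction is given below, using scikit-image; the quantization to 32 gray levels, the single distance and direction, and the manual computation of the GLCM mean, variance, and entropy are illustrative assumptions, since the study reports using default settings.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(band, levels=32, distance=1, angle=0.0):
    """Eight GLCM texture statistics for one band (reflectance quantized to 'levels' gray levels)."""
    finite = band[np.isfinite(band)]
    q = np.digitize(band, np.linspace(finite.min(), finite.max(), levels - 1)).astype(np.uint8)
    glcm = graycomatrix(q, distances=[distance], angles=[angle],
                        levels=levels, symmetric=True, normed=True)
    p = glcm[:, :, 0, 0]                       # normalized co-occurrence probabilities
    i = np.arange(levels).reshape(-1, 1)       # gray-level index of the reference pixel
    mu = float(np.sum(i * p))                  # GLCM mean
    return {
        "mean": mu,
        "variance": float(np.sum((i - mu) ** 2 * p)),
        "entropy": float(-np.sum(p[p > 0] * np.log2(p[p > 0]))),
        "contrast": float(graycoprops(glcm, "contrast")[0, 0]),
        "dissimilarity": float(graycoprops(glcm, "dissimilarity")[0, 0]),
        "homogeneity": float(graycoprops(glcm, "homogeneity")[0, 0]),
        "second_moment": float(graycoprops(glcm, "ASM")[0, 0]),
        "correlation": float(graycoprops(glcm, "correlation")[0, 0]),
    }
```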

2.5. Methodology and Techniques

The machine learning models employed in this study include Ridge Regression (RIDGE), Partial Least Squares Regression (PLSR), Support Vector Regression (SVR), Extreme Gradient Boosting (XGBOOST), and Random Forest Regression (RFR).
RIDGE [41] is an enhanced form of linear regression that incorporates an L2 regularization term into the regression coefficients to address overfitting in linear models. PLSR [42] is another adapted linear regression technique, primarily used to address multicollinearity issues. SVR [43] maps input data into a high-dimensional space and seeks a hyperplane that best fits the data, ensuring that most data points fall within a defined margin near the plane. SVR offers several advantages, including strong noise resistance, global optimality, and effective handling of high-dimensional data. XGBOOST [44] is a tree-based ensemble learning algorithm that belongs to the Boosting family and is characterized by high speed, accuracy, and adaptability to diverse data types. RFR [45] is also a tree-based ensemble learning method built on the Bagging (Bootstrap Aggregating) approach. RFR can handle high dimensional data, is resistant to overfitting, estimates feature importance, and provides high prediction accuracy.
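For reference, the five regressors can be instantiated with scikit-learn and the xgboost package as sketched below; the hyperparameter values shown are placeholders, since the actual values were selected by grid search (Section 2.5).

```python
from sklearn.linear_model import Ridge
from sklearn.cross_decomposition import PLSRegression
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

# Placeholder hyperparameters; each model was tuned by grid search in the study.
models = {
    "RIDGE": Ridge(alpha=1.0),
    "PLSR": PLSRegression(n_components=3),
    "SVR": SVR(kernel="rbf", C=10.0, epsilon=0.1),
    "XGBOOST": XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05),
    "RFR": RandomForestRegressor(n_estimators=500, max_features="sqrt", random_state=0),
}
```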
Shapley Additive Explanations (SHAP) was employed to interpret how the trained model uses each input feature under the current data distribution [34]. The method originates from cooperative game theory and decomposes the prediction for an individual sample into a baseline output plus a set of additive feature contributions, thereby indicating at the local level whether each feature acts positively or negatively on that sample’s prediction [46]. At the global level, aggregating absolute contributions across samples yields a stable ranking of feature importance [47]. SHAP summarizes how information is used within the model and is employed to improve model transparency and interpretability.
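A minimal sketch of this workflow with the shap package is shown below; it assumes a fitted tree ensemble (here called rfr_model) and a feature matrix X_test, both of which are illustrative names.

```python
import numpy as np
import shap

# rfr_model: fitted RandomForestRegressor; X_test: matrix of the selected features
explainer = shap.TreeExplainer(rfr_model)      # exact SHAP values for tree ensembles
shap_values = explainer.shap_values(X_test)    # shape: (n_samples, n_features)

# Global importance: mean absolute SHAP value per feature (as in Figure 9)
global_importance = np.abs(shap_values).mean(axis=0)

# Local explanation for one sample: baseline plus additive feature contributions
# prediction_i ~ explainer.expected_value + shap_values[i].sum()
```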
In this study, feature selection was conducted using the correlation coefficient method (CC) and recursive feature elimination (RFE). RFE [48] begins with all features and iteratively removes the least important ones until an optimal subset is obtained. This process reduces the number of features required and accelerates prediction. The dataset was randomly divided into a training set (70%) and an independent testing set (30%). Given the sample size, five-fold cross-validation was used to evaluate model accuracy, and grid search was applied to identify optimal hyperparameters for each model.
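The evaluation protocol described above can be sketched as follows for the random forest model; the parameter grid and the random seed are illustrative choices, not the study's exact settings.

```python
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestRegressor

# X: feature matrix (one of sets A-D), y: measured NBI
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

param_grid = {                      # illustrative grid, not the paper's exact values
    "n_estimators": [200, 500, 800],
    "max_depth": [None, 5, 10],
    "max_features": ["sqrt", 0.5],
}
search = GridSearchCV(RandomForestRegressor(random_state=42),
                      param_grid, cv=5, scoring="neg_root_mean_squared_error")
search.fit(X_train, y_train)
best_rfr = search.best_estimator_
```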
The prediction strategy involved comparing Sen2Res-enhanced super-resolution imagery with the original nearest neighbor resampled imagery to identify the optimal spectral and spatial information. Correlation analysis was performed between band reflectance, vegetation indices, spatial texture parameters, and the measured NBI. Features with a correlation coefficient above 0.5 at a highly significant level (p < 0.01) were retained. The selected single-band reflectance features were defined as feature set A. Single-band reflectance combined with texture parameters formed feature set B, while reflectance, texture, and vegetation indices together constituted feature set C. Feature sets A, B, and C were individually used as input variables for model training, with NBI as the target variable.
Given that a large number of variables may reduce model generalization and stability, a new prediction approach was proposed to address this issue. The first step remained consistent, while the second step applied RFE again to subset C for further refinement, resulting in a refined feature set D for model training.
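A compact sketch of this two-stage selection, a Pearson correlation filter followed by recursive feature elimination with a random-forest estimator, is given below; the variable names and the target of eight retained features (the size of set D reported in Section 3.3) are assumptions made for illustration.

```python
import pandas as pd
from scipy.stats import pearsonr
from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestRegressor

def correlation_filter(X: pd.DataFrame, y, r_min=0.5, p_max=0.01):
    """Stage 1: keep features with |r| > r_min at p < p_max (feature set C)."""
    keep = []
    for col in X.columns:
        r, p = pearsonr(X[col], y)
        if abs(r) > r_min and p < p_max:
            keep.append(col)
    return X[keep]

# Stage 2: recursive feature elimination on set C to obtain the compact set D
X_c = correlation_filter(X_all, y)                 # X_all: bands + textures + indices
rfe = RFE(RandomForestRegressor(random_state=42), n_features_to_select=8)
X_d = X_c.loc[:, rfe.fit(X_c, y).support_]         # the eight retained variables
```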
Finally, the SHAP method was employed to interpret the best-performing model. In summary, four prediction strategies were implemented to obtain optimal results. All data processing and visualization were conducted using ENVI 5.3, ArcGIS 10.6, SNAP, and Python 3.8 within the Anaconda environment.

2.6. Model Evaluation

To evaluate the accuracy of the winter wheat NBI estimation model, the coefficient of determination (R2) and root mean square error (RMSE) were used to assess model reliability. R2 indicates the degree of fit between the estimated and observed values, while RMSE reflects the dispersion between them. A higher R2 value closer to 1 and a lower RMSE indicate better model learning and estimation ability. The formulas are as follows:
R^2 = \frac{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}
RMSE = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n}}
where y_i, \hat{y}_i, and \bar{y} denote the observed values, the predicted values, and the mean of the observed values in the test set, respectively.
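Equivalently, both metrics can be computed on the independent test set with scikit-learn, as sketched below; best_rfr, X_test, and y_test refer to the illustrative objects from the tuning sketch in Section 2.5.

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error

y_pred = best_rfr.predict(X_test)                 # predictions on the independent test set
r2 = r2_score(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"R2 = {r2:.2f}, RMSE = {rmse:.2f}")
```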
Table 2. Selected vegetation indices and their corresponding Sentinel-2A band formulas.
Vegetation Index | Full Name | Sentinel-2 Corresponding Formula | Reference
NDVI8or8A | Normalized Difference Vegetation Index | (B8 or B8A − B4)/(B8 or B8A + B4) | [49]
SR8or8A | Simple Ratio Index | (B8 or B8A)/B4 | [50]
EVI8or8A | Enhanced Vegetation Index | 2.5((B8 or B8A − B4)/(B8 or B8A + 6B4 − 7.5B2 + 1)) | [51]
ARVI8or8A | Atmospherically Resistant Vegetation Index | (B8 or B8A − 2B4 + B2)/(B8 or B8A + 2B4 − B2) | [52]
RENDVI740or783or3 | Red-Edge Normalized Difference Vegetation Index | (B6 or B7 − B5 or B6)/(B6 or B7 + B5 or B6) | [53]
mSR705or740 | Modified Red-Edge Simple Ratio Index | (B6 − B2)/(B5 + B2) or (B7 − B2)/(B6 + B2) | [54]
mNDVI705or740 | Modified Red-Edge Normalized Difference Vegetation Index | (B6 or B7 − B5 or B6)/(B6 or B7 + B5 or B6 − 2B2) | [55]
VOG705or740or783 | Vogelmann Red-Edge Index 1 | (B6 or B7)/(B5 or B6) | [56]
SIPI8or8A | Structure Insensitive Pigment Index | (B8 or B8A − B2)/(B8 or B8A + B4) | [57]
REDRG1 | Red-Edge RG1 | ((B5 + B6 + B7)/3)/B2 | [58]
REDRG2 | Red-Edge RG2 | ((B8 + B8A)/2)/((B5 + B6 + B7)/3) | [58]
ARI1or1-1or1-2 | Anthocyanin Reflectance Index 1 | (1/B3) − (1/(B5 or B6 or B7)) | [59]
ARI2-1or2-2or2-3 | Anthocyanin Reflectance Index 2 | B8 × (ARI1 or 1-1 or 1-2) | [59]
ARI2-4or2-5or2-6 | Anthocyanin Reflectance Index 2 | B8A × (ARI1 or 1-1 or 1-2) | [59]
CI705or740or783-B8 | Red-Edge Chlorophyll Index | B8/(B5 or B6 or B7) − 1 | [60]
CI705or740or783-B8A | Red-Edge Chlorophyll Index | B8A/(B5 or B6 or B7) − 1 | [60]
DVI8or8A | Difference Vegetation Index | (B8 or B8A) − B4 | [61]
NRI | Nitrogen Reflectance Index | (B3 − B4)/(B3 + B4) | [62]
SAVI8or8A | Soil Adjusted Vegetation Index | 1.5(B8 or B8A − B4)/(B8 or B8A + B4 + 0.5) | [63]
TVI705or740or783 | Triangular Vegetation Index | 0.5(120(B5 or B6 or B7 − B3) − 200(B4 − B3)) | [64]
RVSI | Red-Edge Vegetation Stress Index | (B5 + B7)/2 − B6 | [65]
MCARI705or740or783 | Modified Chlorophyll Absorption Ratio Index | ((B5 or B6 or B7 − B4) − 0.2(B5 or B6 or B7 − B3)) × (B5 or B6 or B7)/B4 | [66]
VARI | Visible Atmospherically Resistant Index | (B3 − B4)/(B3 + B4 − B2) | [67]
TCARI705or740or783 | Transformed Chlorophyll Absorption Ratio Index | 3((B5 or B6 or B7 − B4) − 0.2(B5 or B6 or B7 − B3) × (B5 or B6 or B7)/B4) | [68]
OSARI8or8A | Optimized Soil Adjusted Vegetation Index | 1.16(B8 or B8A − B4)/(B8 or B8A + B4 + 0.16) | [69]
NPCI | Chlorophyll Normalized Vegetation Index | (B4 − B2)/(B4 + B2) | [70]
MTVI8or8A | Modified Triangular Vegetation Index | 1.5(1.2(B8 or B8A − B3) − 2.5(B4 − B3)) | [71]
IDVI8or8A | Inverted Difference Vegetation Index | (1 + (B8 or B8A − B4))/(1 − (B8 or B8A − B4)) | [72]
OSAVI2 | Optimized Soil Adjusted Vegetation Index 2 | 1.16(B6 − B5)/(B6 + B5 + 0.16) | [73]
MCARI2 | Modified Chlorophyll Absorption Ratio Index 2 | ((B6 − B5) − 0.2(B6 − B3)) × (B6/B5) | [73]
TCARI2 | Transformed Chlorophyll Absorption Ratio Index 2 | 3((B6 − B5) − 0.2(B6 − B3) × (B6/B5)) | [73]
MSI8or8A | Moisture Stress Index | B11/(B8 or B8A) | [74]
NDII8or8A | Normalized Difference Infrared Index | 1.5(B8 or B8A − B11)/(B8 or B8A + B11) | [75]
LSWI8or8A | Land Surface Water Index | (B8 or B8A − B12)/(B8 or B8A + B12) | [76]

3. Results

3.1. Comparison Between Sen2Res and Original Images

To evaluate the effectiveness of the Sen2Res imagery, a qualitative comparison was performed between the super-resolved images and the original nearest-neighbor resampled images.
Figure 3 illustrates the comparison of spatial details between the original and enhanced images. All images were displayed using default settings without stretch adjustment. The original 10 m resolution image (Figure 3a) clearly distinguishes rivers, roads, buildings, and farmland within the study area. The enhanced images (Figure 3b,d) show high consistency with the original image in terms of spatial detail. In comparison to the original 20 m resolution images (Figure 3c,e), the enhanced images exhibit sharper object boundaries, richer spatial textures, clearer color gradation, and significantly improved contrast. Furthermore, the spectral variation between the enhanced and original images remains consistent across land cover types. The original 20 m images exhibit lower resolution and reduced clarity in object identification compared to the enhanced images.
In addition to the qualitative assessment, the study quantitatively evaluated image quality before and after enhancement. Table 3 presents the changes in information entropy and mean gradient across bands before and after enhancement. From the perspective of information theory, image information entropy reflects the richness of image information. A higher entropy value indicates greater information content and improved image quality. The mean gradient characterizes image sharpness; a higher gradient indicates more detail and clearer imagery. The Sen2Res method was applied to enhance the spatial resolution of selected images to 10 m. For the six original 20 m resolution bands (B5, B6, B7, B8A, B11, and B12), entropy increased by approximately 2.6 times, from around 10 to 26, and the mean gradient also improved, though to a lesser extent.
Overall, the enhanced imagery successfully increased the spatial resolution of the 20 m bands to 10 m while preserving the original spectral characteristics.

3.2. Analysis of Correlation Results

Correlation analysis was conducted between NBI and spectral reflectance, texture parameters, and vegetation indices to explore their relationships and extract relevant features. Figure 4, Figure 5 and Figure 6 show the correlations between NBI and spectral reflectance, texture parameters, and vegetation indices, respectively.
The correlation distribution of spectral reflectance for both enhanced and original images (Figure 4) was similar, displaying negative correlations in the visible and shortwave infrared bands and positive correlations in the near-infrared bands. The enhanced imagery outperformed the original, with correlation values improving by approximately 2%. Beyond the 705 nm band, correlation values increased by approximately 2.5 times. Among the 10 bands, 6 exhibited highly significant correlations. After enhancement, the red-edge B5 (704 nm) band also demonstrated a highly significant correlation. The band with the highest absolute correlation |R| was B8A (0.738), followed by the red-edge B7 band (0.716) and B8 (0.705); the remaining bands had R values of 0.542 (B6), 0.333 (B4), and 0.300 (B5).
Overall, the red to near-infrared bands exhibited stronger correlations than other bands. Notably, the SWIR bands B11 and B12 are not suitable for NBI estimation.
As shown in Figure 5, the correlation performance of texture parameters was generally poor. Among the 80 texture parameters, only four exhibited highly significant correlations, with enhanced images generally yielding better correlations than the original images for these parameters. The remaining 76 parameters showed no meaningful correlation with NBI; therefore, no clear comparison could be made between the enhanced and original images for those features. Among the eight types of texture features analyzed, only the “mean” parameter demonstrated a strong correlation. The highest correlation coefficient was R = 0.696 for B8A_M, followed by R = 0.681 for B8_M and R = 0.670 for B7_M. The remaining seven texture features were affected by substantial noise, which hindered their ability to characterize correlations with NBI. The bands with stronger correlations primarily belonged to the near-infrared spectral region.
Most of the selected vegetation indices exhibited highly significant correlations with NBI (Figure 6). Vegetation indices constructed using enhanced bands outperformed those constructed using the original bands in terms of correlation strength. Among all vegetation indices, 34 had absolute correlation coefficients |R| greater than 0.5. Of these, 19 had |R| values greater than 0.6. The top-performing indices in descending order were: IDVI8A (0.711), CI740-B8A (0.706), DVI8A (0.706), SAVI8A (0.698), MTVI8A (0.688), IDVI8 (0.685), DVI8 (0.682), VOG740 (0.678), SAVI8 (0.676), TVI783 (0.675), MTVI8 (0.668), EVI8A (0.665), mNDVI740 (0.661), RVSI (0.660), REDRG2 (0.658), EVI8 (0.650), CI740-B8 (0.649), and mSR740 (0.646).
These highly correlated vegetation indices included two-band, three-band, and four-band formulations. Most were constructed using bands B3, B4, B6, B7, B8, and B8A. Among indices using red-edge bands, those incorporating the 740 nm band exhibited superior correlation compared to the other two red-edge bands. The 783 nm band also demonstrated strong performance, while indices involving the 705 nm band exhibited weaker correlation with NBI. Indices constructed using the narrow 865 nm band yielded stronger correlations than those employing the broader 842 nm band.
In summary, the enhanced images exhibited a better response to NBI than the original images (Figure 4, Figure 5 and Figure 6). Spectral reflectance and vegetation indices performed well, whereas texture parameters exhibited a weaker response to NBI. Predictive factors with absolute correlation coefficients greater than 0.5 were selected. In total, 4 spectral bands, 3 texture parameters, and 34 vegetation indices were chosen (Table 4). Additionally, RFE was applied to further optimize the feature set.

3.3. Model Results

Models trained on datasets with different feature variables were evaluated on an independent test set. Dataset A consisted of 4 variables, dataset B included 7 variables, and dataset C contained 41 variables, as shown in Figure 7 and Figure 8. Table 5 lists the main hyperparameters for each model. The results indicated that for the single-band dataset A, the RFR model performed best, with R2 = 0.64 and RMSE = 1.97. The worst-performing model was SVR, with R2 = 0.57 and RMSE = 2.16. The performance of the XGBOOST model was comparable to that of RIDGE and PLSR, with R2 around 0.60 and RMSE near 2.06. For datasets with a small number of variables from a single feature category, the accuracy difference between machine learning algorithms and traditional statistical regression methods was minimal.
In the case of dataset B, which included spectral bands and texture parameters, the addition of spatial information led to improved accuracy across all models. The improvements were particularly significant for the XGBOOST and RFR models, with an increase in R2 of approximately 0.10 and a decrease in RMSE of about 0.30. The best performance was observed in the RFR model (R2 = 0.75, RMSE = 1.63), followed by the XGBOOST model (R2 = 0.73, RMSE = 1.70).
For dataset C, which combined spectral bands, texture features, and vegetation indices, the number of variables increased to 41 due to the inclusion of vegetation indices. However, the increased variable count did not lead to improved model accuracy. The performance of all models, except for SVR, declined; the SVR model maintained stable performance (R2 = 0.62, RMSE = 2.02). Machine learning algorithms still outperformed traditional regression models, with XGBOOST and RFR remaining the top performers.
Excessive predictive variables introduce complex collinearity and substantial redundancy, impairing model interpretability. Therefore, RFE was employed to perform a second round of variable selection to reduce model complexity and improve interpretability. Through RFE, eight variables were extracted from the original 41: three spectral bands (B7, B8, B8A), two texture parameters (B7_M, B8A_M), and three vegetation indices (RENDVI3, ARI2-3, CI740-B8A), comprising the D dataset. With dataset D, model accuracy improved further among machine learning algorithms. The best-performing model, RFR, achieved an R2 of 0.77 and RMSE of 1.57, followed by XGBOOST (R2 = 0.76, RMSE = 1.63). PLSR and RIDGE models showed limited adaptability to dataset D, with minimal improvements in accuracy.
In terms of dataset comparison, the integration of CC and RFE for combining spectral, texture, and vegetation indices (dataset D) with RFR yielded the best results (CC-RFE-DRFR). The next best model was XGBOOST (CC-RFE-DXGB), and both algorithms also performed well on dataset B. Moreover, the 1:1 scatter plots of predicted versus observed values (Figure 8) displayed a more uniform sample distribution, a wider value range, and minimal underestimation or overestimation. Among the five predictive models, the two ensemble tree-based algorithms performed the best. The SVR model exhibited significant improvement with increased feature numbers and model tuning, while PLSR and RIDGE showed limited capability in handling complex information.

3.4. SHAP Analysis

To understand the prediction process of the model, this study employed the SHAP method to evaluate the predictors selected by the RFE algorithm, using the best-performing RFR and XGBOOST models. The results are presented in Figure 9 and Figure 10. Figure 9 shows the SHAP-based feature importance distribution, in which the bar length indicates the importance of each feature, determined by the mean absolute SHAP value across the entire prediction dataset. Figure 9a,b correspond to the RFR and XGBOOST models, respectively.
For the RFR model, spectral band B8A (865 nm) made the largest contribution, followed by B8 (842 nm) and RENDVI3, whereas B7 (783 nm) contributed the least. In the XGBOOST model, spectral band B8A (865 nm) and the texture parameter B7_M were the most important features, followed by RENDVI3 and CI740-B8A, with B8A_M having the lowest importance. In both models, the average SHAP value impact of B8A (865 nm) exceeded 0.8, indicating its strong importance for model prediction. The importance of the texture feature B7_M significantly increased in the XGBOOST model, suggesting that further investigation is needed to understand its role under different scenarios.
Figure 10 illustrates the cumulative influence of each feature on model output as well as the SHAP value distribution for individual features. Figure 10a,b correspond to the RFR model, while Figure 10c,d correspond to the XGBOOST model. In the line plots, each line represents a sample, revealing the effect of individual features on the prediction process. In the density plots, each point represents a sample, with colors representing feature values ranging from blue (low) to red (high). The vertical displacement of samples for a given feature reflects variability, where larger sample groups exhibit wider clusters.
The path plots for the RFR model display a more uniform distribution compared to XGBOOST. In contrast, the XGBOOST model exhibits some abnormally steep paths, suggesting considerable differences in sample sensitivity to specific features. In the RFR model (Figure 10b), most features demonstrate a clear pattern: low feature values correspond to negative effects, while high values correspond to positive effects, with relatively weak feature interactions. In XGBOOST (Figure 10d), important features exhibit relatively consistent influence; however, the sample distribution indicates significant nonlinear effects.

3.5. Spatial Distribution of NBI

The top four models in terms of performance (CC-RFE-DRFR, CC-RFE-DXGBOOST, CC-RFE-BRFR, CC-RFE-BXGBOOST) were selected to map the spatial distribution of NBI in the study area (Figure 11).
The overall NBI distribution patterns were consistent among models, with lower values observed in the central fan-shaped region and the southwest, as well as sporadic low-value plots in the northeast. These patterns may be attributed to differences in sowing time and variations in temperature and precipitation. In the Guanzhong rainfed agricultural region, drought conditions can influence both nitrogen production and transport in plants. No significant differences were found in the predicted distributions across models. Both overestimations and underestimations were observed, but the predicted values were generally consistent with ground observations (Table 6). This indicates that the method based on Sentinel-2 imagery can assist field managers in assessing nitrogen status and formulating more effective management strategies. This not only contributes to food security but also promotes sustainable agricultural development.
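For completeness, the sketch below shows one way a fitted model could be applied pixel-wise to a stacked feature raster to produce such an NBI map; the file names, the rasterio-based I/O, and the variable best_rfr are illustrative assumptions rather than the study's exact implementation.

```python
import numpy as np
import rasterio

# feature_stack.tif: layers ordered exactly as the model's training features (set D)
with rasterio.open("feature_stack.tif") as src:      # hypothetical file name
    stack = src.read()                               # (n_features, rows, cols)
    profile = src.profile

n_feat, rows, cols = stack.shape
flat = stack.reshape(n_feat, -1).T                   # (pixels, n_features)
valid = np.all(np.isfinite(flat), axis=1)            # skip nodata pixels

nbi = np.full(rows * cols, np.nan, dtype=np.float32)
nbi[valid] = best_rfr.predict(flat[valid])           # best_rfr: fitted model from Section 2.5

profile.update(count=1, dtype="float32", nodata=np.nan)
with rasterio.open("nbi_map.tif", "w", **profile) as dst:
    dst.write(nbi.reshape(rows, cols), 1)
```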

4. Discussion

Nitrogen deficiency reduces leaf chlorophyll content and increases polyphenol levels; therefore, NBI is a useful indicator of changes in crop nitrogen status [77]. Monitoring NBI during the early growth stages of winter wheat can guide fertilization and nitrogen application decisions, thereby improving yield, reducing resource waste, and minimizing agricultural environmental pollution. This study investigates the feasibility of using Sentinel-2 imagery to estimate NBI of winter wheat across large spatial scales.

4.1. Comparison Between Sen2Res Super-Resolved Images and Original Images

Traditional remote sensing satellite images often have low spatial resolution; super-resolution enhancement techniques can be used to improve image resolution. These techniques have been applied to improve land cover classification, monitor environmental and ecosystem changes, and support post-disaster assessment and emergency response [78,79,80,81,82]. Compared to multi-source image fusion methods for enhancing Sentinel-2 resolution, single-image super-resolution techniques can reduce fusion-induced errors and uncertainties [83]. The temporal resolution and imaging angle of a single sensor are more consistent, helping to maintain a low baseline of data source errors between the satellite and ground observations.
In this study, the Sen2Res super-resolution plugin in the Sentinel-2 processing software SNAP 7.0 was used to enhance 20 m resolution spectral bands to 10 m. Compared with the original nearest-neighbor resampled images, the enhanced images displayed finer spatial details, clearer land features (Figure 3), higher resolution, preserved spectral characteristics, and improved band information entropy and average gradient (Table 3), all indicating good image quality. This improvement can be attributed to the Sen2Res method, which begins with the highest-resolution bands, separates band reflectance information from scene components, and incrementally unmixes the low-resolution bands to optimize reflectance and sub-pixel features. As a result, the method increases pixel counts, reduces distortion, and enhances object information, thereby improving the visual perception of the digital images.
A further comparison was made between parameters extracted from enhanced and original images in relation to NBI. The results (Figure 4, Figure 5 and Figure 6) showed that parameters derived from enhanced images—including spectral reflectance, spatial texture, and vegetation indices—were more strongly correlated with NBI than those from the original images, supporting the study’s conclusions. Zhang et al. [84] also achieved similar results using the SupReME super-resolution algorithm applied to Sentinel-2 imagery for summer maize monitoring.
The Sen2Res method used in this study is straightforward and easy to implement; however, the processing time is relatively long, requiring several hours to generate one image for the study area. This presents a limitation for large-scale remote sensing data processing. Currently, both reconstruction-based and deep learning-based approaches face similar challenges. Therefore, balancing super-resolution performance and computational efficiency remains a key direction for future optimization.

4.2. Response of Spectral, Textural, and Vegetation Indices to NBI

Spectral, textural, and vegetation indices have been widely used in previous studies [85,86,87] to assess crop parameters such as chlorophyll content, leaf area index, and biomass; in this study, we further examined their relationship with NBI of winter wheat (Figure 4, Figure 5 and Figure 6).
In the spectral domain, red-edge bands at 740 nm and 783 nm, as well as near-infrared bands at 842 nm and 865 nm, showed strong responses to NBI, whereas the visible spectrum performed poorly. This may be attributed to uneven refraction resulting from variations in leaf internal structure, thickness, moisture content, and canopy architecture, all of which reduce the effectiveness of visible spectral responses. In contrast, red-edge and near-infrared bands are less sensitive to such variations. It is generally accepted that the red-edge to near-infrared bands of satellites are crucial for retrieving crop parameters. The narrow near-infrared B8A band of Sentinel-2 showed better correlations than the broader B8 band, indicating that narrow bands can more accurately capture spectral information and detect subtle spectral differences. Belgiu et al. [27] also found that the narrow band B8A of Sentinel-2 outperformed the broad band B8.
At the 10 m resolution of Sentinel-2, texture features should be interpreted as indirect proxies for canopy structure at the field scale rather than as direct responses to nitrogen. In this study, texture was derived from GLCM metrics, whose variation primarily reflects row spacing and orientation, canopy cover, and the spatial heterogeneity of bare soil and shadow patches [88]. Nitrogen affects chlorophyll content, leaf area index, and the distribution of leaf angles, which in turn alter local spectral contrast and homogeneity; therefore, its linkage to texture is indirect and mediated [89]. Accordingly, texture is treated as a structural background descriptor that complements spectral reflectance and vegetation index information.
Although spatial texture parameters have been extensively used to retrieve crop parameters from UAV imagery, the eight texture parameters derived from the gray level co-occurrence matrix in this study showed low sensitivity to NBI across the ten studied bands. Only the mean value demonstrated relatively strong performance in the near-infrared band, whereas other texture parameters were not suitable for use with Sentinel-2 imagery. This may be due to a limited number of pixels at the field sampling scale and the impact of mixed pixels.
Given that larger sampling scales may unnecessarily increase labor and resource expenditure, priority should be placed on identifying spatial texture types better suited for satellite data, such as wavelet transform, Fourier transform, and Gabor filters [90]. Importantly, field boundaries, nontarget land cover, and soil background can weaken the robustness of the relationship between texture metrics and NBI. Therefore, satellite imagery with higher spatial resolution is a practical way to mitigate spatially distributed noise.
Different vegetation indices may perform variably across regions and growth stages. Most of the vegetation indices selected in this study exhibited strong responsiveness to NBI, demonstrating their adaptability for indicating variation in crop nitrogen content. Vegetation indices constructed using green, red-edge, and near-infrared bands performed especially well. Indices incorporating red-edge bands generally exhibited highly significant responses to NBI, further confirming that indices using two or more band combinations—particularly those based on red-edge bands—can substantially reduce interference from soil background, atmospheric effects, shadows, and canopy structure. This observation is consistent with previous research [30] and further confirms that other crop parameters share similar spectral response patterns with NBI.
Within spectral, texture, and vegetation index features, the red edge (705 to 783 nm) and the near infrared (842 nm and 865 nm) were dominant, consistent with their sensitivity to the chlorophyll absorption edge and to multilayer canopy scattering, as documented in numerous crop studies [58,91,92]. During peak growth, high chlorophyll and high leaf area index conditions cause parts of the visible bands and broadband vegetation indices to saturate, whereas red-edge vegetation indices retain sensitivity over a wider range [93]. At the 10 m pixel size, leaf-level noise is suppressed, and field structure is captured, but boundary noise is inevitably introduced; these limitations should be acknowledged.

4.3. Feature Selection and Machine Learning Algorithms

In the estimation of NBI, spectral reflectance, texture parameters, and vegetation indices serve as important feature sources; however, including too many features may introduce redundant information and noise, increasing model complexity and thereby reducing prediction accuracy and generalization ability. In this study, among the feature sets selected by correlation coefficients, the inclusion of all 41 features significantly deteriorated the model’s prediction performance, validating the need for feature selection prior to model training. Furthermore, the RFE method was applied to obtain an optimal and simplified feature subset (set D), comprising spectral, spatial texture, and vegetation index features. Feature set D achieved the best prediction performance.
RFR and XGBOOST exhibited different performance across datasets (Figure 7). This divergence reflects differences in sample distribution, sample size, and noise structure among datasets, and the two ensemble methods strike different balances between bias and variance. Moreover, the accuracy gap between linear methods (PLSR and RIDGE) and ensemble methods (RFR and XGBOOST) was not significant on datasets A and C, because a single predictive feature (dataset A) and an excessive number of redundant features (dataset C) both limit model learning: the former provides little exploitable signal, whereas the latter introduces noise that exceeds the models’ capacity to characterize the data.
In the SHAP analysis, the importance rankings in the RFR model are related to feature independence and interpretability. This suggests that the wavelengths at 865 nm (B8A) and 842 nm (B8) may represent highly independent features that have strong associations with the response variable. The XGBOOST model excels at handling complex relationships among features, and the increased contribution of B7_M suggests that it interacts with other features. Moreover, the more uniform distribution of feature contributions in XGBOOST indicates a reliance on the synergistic effect of multiple features. This also demonstrates that features extracted by the RFE algorithm possess not only strong independence but also significant interrelationships. The study by Singh et al. [94] indicated that wavelengths between 750 and 900 nm are typically used by tree-based models to predict plant nitrogen status, which is consistent with the findings of this study.
Across the four better performing models (Figure 11), the NBI prediction maps show divergent spatial patterns in the high NBI range. One likely cause is the sparsity of high-value samples in the dataset, which leads algorithms to follow different extrapolation paths at the upper tail and thereby amplifies local error and spatial heterogeneity. Tree based ensemble learning methods, such as RFR and XGBOOST, can capture mild nonlinearity, but when tail samples are scarce they are more sensitive to hyperparameters and stochasticity. Consequently, the differences at the high end in Figure 11 are more plausibly attributed to variation in extrapolation uncertainty under sparse conditions rather than to deficiencies in the information sources themselves.
Based on this reasoning, application oriented model selection should prioritize algorithms that are more stable across datasets and that exhibit smaller tail errors for extrapolative mapping of crop parameters. Moreover, when data reveal stable nonlinearity and sample coverage is sufficient, ensemble learning methods provide measurable gains.
The RFR algorithm consistently performed best across all feature sets, followed by the XGBOOST algorithm; both are tree-based ensemble algorithms. The XGBOOST algorithm rapidly constructs trees and leverages parallel computing, improving upon Gradient Boosting Machines (GBMs) through strong regularization and advanced boosting strategies. These features help mitigate overfitting and often enable XGBOOST to outperform other algorithms. The RFR algorithm performs well with fewer parameter settings and is widely used in related studies for its robustness and accuracy in predicting crop growth parameters.
The optimal model (CC-RFE-DRFR) was established by applying the RFR algorithm to feature set D, selected through correlation analysis and recursive feature elimination. This performance is attributed to RFR’s introduction of two layers of randomness and its ensemble learning approach, which enables the combination of weak learners into a strong one. Its insensitivity to data collinearity also enhances noise resistance and reduces the risk of overfitting.

5. Conclusions

This study is the first to attempt to directly predict the regional winter wheat Nitrogen Balance Index using Sentinel-2 data, utilizing image super-resolution comparison, feature construction, machine learning algorithms, and SHAP value-based model interpretation, subsequently followed by regional-scale predictions.
In this process, the built-in super-resolution method in SNAP was applied to enhance spectral quality, spatial texture, and vegetation index accuracy; however, spatial texture performance remained suboptimal, indicating that future research should consider more advanced super-resolution techniques. The optimal prediction strategy combined tree-based models with correlation analysis and recursive feature elimination. SHAP-based interpretability analysis showed that Random Forest Regressor emphasized variable independence and interpretability, whereas Extreme Gradient Boosting captured more complex interactions among variables.
The resulting prediction maps were generally consistent with observed field conditions, thus providing valuable data support for farmers to assess nitrogen status, improve soil quality during early winter wheat growth, prevent soil pollution, reduce input costs, and enhance yield.

Author Contributions

Conceptualization, B.S. and Q.C.; methodology, B.S.; software, B.S., Y.G., L.L. and P.L.; validation, X.C. and Y.G.; formal analysis, L.L. and P.L.; investigation, B.S.; resources, Q.C.; data curation, B.S.; writing—original draft preparation, B.S.; writing—review and editing, X.C.; visualization, Y.G.; supervision, Q.C.; project administration, Q.C.; funding acquisition, Q.C. All authors have read and agreed to the published version of the manuscript. We confirm that none of the material in this manuscript has been published or is under consideration for publication elsewhere.

Funding

This research was funded by the National Natural Science Foundation of China (Grant Nos. 41701398 and 42071240). The authors appreciate the anonymous reviewers for constructive comments on our manuscript.

Data Availability Statement

Data will be made available on request.

Acknowledgments

We would like to thank all the students in Chang’s team for collecting the data for us.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Shiferaw, B.; Smale, M.; Braun, H.-J.; Duveiller, E.; Reynolds, M.; Muricho, G. Crops that feed the world 10. Past successes and future challenges to the role played by wheat in global food security. Food Secur. 2013, 5, 291–317. [Google Scholar] [CrossRef]
  2. Fu, Y.; Yang, G.; Li, Z.; Li, H.; Li, Z.; Xu, X.; Song, X.; Zhang, Y.; Duan, D.; Zhao, C.; et al. Progress of hyperspectral data processing and modelling for cereal crop nitrogen monitoring. Comput. Electron. Agric. 2020, 172, 105321. [Google Scholar] [CrossRef]
  3. Abdel-Rahman, E.M.; Ahmed, F.B.; Van Den Berg, M. Imaging spectroscopy for estimating sugarcane leaf nitrogen concentration. In Proceedings of the SPIE Remote Sensing, Cardiff, UK, 15–18 September 2008; Neale, C.M.U., Owe, M., D’Urso, G., Eds.; SPIE: Bellingham, WA, USA, 2008; Volume 7104, p. 71040V. [Google Scholar] [CrossRef]
  4. Andrews, M.; Raven, J.A.; Lea, P.J. Do plants need nitrate? The mechanisms by which nitrogen form affects plants. Ann. Appl. Biol. 2013, 163, 174–199. [Google Scholar] [CrossRef]
  5. Soltanikazemi, M.; Minaei, S.; Shafizadeh-Moghadam, H.; Mahdavian, A. Field-scale estimation of sugarcane leaf nitrogen content using vegetation indices and spectral bands of Sentinel-2: Application of random forest and support vector regression. Comput. Electron. Agric. 2022, 200, 107130. [Google Scholar] [CrossRef]
  6. Swarbreck, S.M.; Wang, M.; Wang, Y.; Kindred, D.; Sylvester-Bradley, R.; Shi, W.; Varinderpal-Singh; Bentley, A.R.; Griffiths, H. A roadmap for lowering crop nitrogen requirement. Trends Plant Sci. 2019, 24, 892–904. [Google Scholar] [CrossRef] [PubMed]
  7. Cerovic, Z.G.; Ounis, A.; Cartelat, A.; Latouche, G.; Goulas, Y.; Meyer, S.; Moya, I. The use of chlorophyll fluorescence excitation spectra for the non-destructive in situ assessment of UV-absorbing compounds in leaves. Plant Cell Environ. 2002, 25, 1663–1676. [Google Scholar] [CrossRef]
  8. Shi, P.; Wang, Y.; Yin, C.; Fan, K.; Qian, Y.; Chen, G. Mitigating Saturation Effects in Rice Nitrogen Estimation Using Dualex Measurements and Machine Learning. Front. Plant Sci. 2024, 15, 1518272. [Google Scholar] [CrossRef]
  9. Li, J.; Zhang, F.; Qian, X.; Zhu, Y.; Shen, G. Quantification of rice canopy nitrogen balance index with digital imagery from unmanned aerial vehicle. Remote Sens. Lett. 2015, 6, 183–189. [Google Scholar] [CrossRef]
  10. Tremblay, N.; Wang, Z.; Cerovic, Z.G. Sensing crop nitrogen status with fluorescence indicators: A review. Agron. Sustain. Dev. 2012, 32, 451–464. [Google Scholar] [CrossRef]
  11. Cartelat, A.; Cerovic, Z.G.; Goulas, Y.; Meyer, S.; Lelarge, C.; Prioul, J.L.; Barbottin, A.; Jeuffroy, M.H.; Gate, P.; Agati, G.; et al. Optically assessed contents of leaf polyphenolics and chlorophyll as indicators of nitrogen deficiency in wheat (Triticum aestivum L.). Field Crops Res. 2005, 91, 35–49. [Google Scholar] [CrossRef]
  12. Ali, M.M.; Al Ani, A.; Eamus, D.; Tan, D.K.Y. Leaf nitrogen determination using non destructive techniques—A review. J. Plant Nutr. 2017, 40, 928–953. [Google Scholar] [CrossRef]
  13. Tao, W.; Chen, Q.; Li, W.; Gao, S.; Li, J.; Wang, Y.; Ahmad, S.; Ding, Y.; Li, G. Optimizing rice yield: Evaluating the nitrogen supply characteristics of slow and controlled release fertilizers using the leaf nitrogen balance index. J. Integr. Agric. 2024, 23, S2095311924000820. [Google Scholar] [CrossRef]
  14. Huang, S.; Miao, Y.; Yuan, F.; Cao, Q.; Ye, H.; Lenz-Wiedemann, V.I.S.; Bareth, G. In-season diagnosis of rice nitrogen status using proximal fluorescence canopy sensor at different growth stages. Remote Sens. 2019, 11, 1847. [Google Scholar] [CrossRef]
  15. Dong, R.; Miao, Y.; Wang, X.; Chen, Z.; Yuan, F. Improving maize nitrogen nutrition index prediction using leaf fluorescence sensor combined with environmental and management variables. Field Crops Res. 2021, 269, 108180. [Google Scholar] [CrossRef]
  16. Jiang, J.; Wang, C.; Wang, H.; Fu, Z.; Cao, Q.; Tian, Y.; Zhu, Y.; Cao, W.; Liu, X. Evaluation of three portable optical sensors for non-destructive diagnosis of nitrogen status in winter wheat. Sensors 2021, 21, 5579. [Google Scholar] [CrossRef]
  17. Fan, K.; Li, F.; Chen, X.; Li, Z.; Mulla, D. Nitrogen balance index prediction of winter wheat by canopy hyperspectral transformation and machine learning. Remote Sens. 2022, 14, 3504. [Google Scholar] [CrossRef]
  18. Gabriel, J.L.; Zarco-Tejada, P.J.; López-Herrera, P.J.; Pérez-Martín, E.; Alonso-Ayuso, M.; Quemada, M. Airborne and ground level sensors for monitoring nitrogen status in a maize crop. Biosyst. Eng. 2017, 160, 124–133. [Google Scholar] [CrossRef]
  19. Padilla, F.M.; Peña-Fleitas, M.T.; Gallardo, M.; Thompson, R.B. Evaluation of optical sensor measurements of canopy reflectance and of leaf flavonols and chlorophyll contents to assess crop nitrogen status of muskmelon. Eur. J. Agron. 2014, 58, 39–52. [Google Scholar] [CrossRef]
  20. Segarra, J.; Buchaillot, M.L.; Araus, J.L.; Kefauver, S.C. Remote sensing for precision agriculture: Sentinel-2 improved features and applications. Agronomy 2020, 10, 641. [Google Scholar] [CrossRef]
  21. Clevers, J.; Kooistra, L.; Van Den Brande, M. Using Sentinel-2 data for retrieving LAI and leaf and canopy chlorophyll content of a potato crop. Remote Sens. 2017, 9, 405. [Google Scholar] [CrossRef]
  22. Hassanpour, R.; Majnooni-Heris, A.; Fard, A.F.; Tasumi, M. Spatio-temporal monitoring of plant water status using optical remote sensing data and in situ measurements. Adv. Space Res. 2024, 74, 4688–4704. [Google Scholar] [CrossRef]
  23. Hunt, M.L.; Blackburn, G.A.; Carrasco, L.; Redhead, J.W.; Rowland, C.S. High resolution wheat yield mapping using Sentinel-2. Remote Sens. Environ. 2019, 233, 111410. [Google Scholar] [CrossRef]
  24. Rossi, M.; Candiani, G.; Nutini, F.; Gianinetto, M.; Boschetti, M. Sentinel-2 estimation of CNC and LAI in rice cropping system through hybrid approach modelling. Eur. J. Remote Sens. 2023, 56, 2117651. [Google Scholar] [CrossRef]
  25. Sharifi, A. Using Sentinel-2 data to predict nitrogen uptake in maize crop. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 2656–2662. [Google Scholar] [CrossRef]
  26. Weiss, M.; Jacob, F.; Duveiller, G. Remote sensing for agricultural applications: A meta-review. Remote Sens. Environ. 2020, 236, 111402. [Google Scholar] [CrossRef]
  27. Belgiu, M.; Marshall, M.; Boschetti, M.; Pepe, M.; Stein, A.; Nelson, A. PRISMA and Sentinel-2 spectral response to the nutrient composition of grains. Remote Sens. Environ. 2023, 292, 113567. [Google Scholar] [CrossRef]
  28. Delloye, C.; Weiss, M.; Defourny, P. Retrieval of the canopy chlorophyll content from Sentinel-2 spectral bands to estimate nitrogen uptake in intensive winter wheat cropping systems. Remote Sens. Environ. 2018, 216, 245–261. [Google Scholar] [CrossRef]
  29. Jamali, M.; Soufizadeh, S.; Yeganeh, B.; Emam, Y. Wheat leaf traits monitoring based on machine learning algorithms and high-resolution satellite imagery. Ecol. Inform. 2023, 74, 101967. [Google Scholar] [CrossRef]
  30. Kganyago, M.; Adjorlolo, C.; Mhangara, P.; Tsoeleng, L. Optical remote sensing of crop biophysical and biochemical parameters: An overview of advances in sensor technologies and machine learning algorithms for precision agriculture. Comput. Electron. Agric. 2024, 218, 108730. [Google Scholar] [CrossRef]
  31. Baret, F.; Houles, V.; Guerif, M. Quantification of plant stress using remote sensing observations and crop models: The case of nitrogen management. J. Exp. Bot. 2006, 58, 869–880. [Google Scholar] [CrossRef]
  32. Houborg, R.; McCabe, M.F. A Cubesat enabled Spatio-Temporal Enhancement Method (CESTEM) utilizing Planet, Landsat and MODIS data. Remote Sens. Environ. 2018, 209, 211–226. [Google Scholar] [CrossRef]
  33. Brenning, A. Interpreting machine-learning models in transformed feature space with an application to remote-sensing classification. Mach. Learn. 2023, 112, 3455–3471. [Google Scholar] [CrossRef]
  34. Albinet, F.; Peng, Y.; Eguchi, T.; Smolders, E.; Dercon, G. Prediction of exchangeable potassium in soil through mid-infrared spectroscopy and deep learning: From prediction to explainability. Artif. Intell. Agric. 2022, 6, 230–241. [Google Scholar] [CrossRef]
  35. Neugart, S.; Tobler, M.A.; Barnes, P.W. Rapid Adjustment in Epidermal UV Sunscreen: Comparison of Optical Measurement Techniques and Response to Changing Solar UV Radiation Conditions. Physiol. Plant. 2021, 173, 725–735. [Google Scholar] [CrossRef] [PubMed]
  36. Dong, T.; Shang, J.; Qian, B.; Liu, J.; Ma, B.; Kovacs, J.M.; Walters, D.; Jiao, X.; Geng, X.; Shi, Y. Assessment of Portable Chlorophyll Meters for Measuring Crop Leaf Chlorophyll Concentration. Remote Sens. 2019, 11, 2706. [Google Scholar] [CrossRef]
  37. European Space Agency (ESA). Sentinel-2 User Handbook; ESA Standard Document; ESA: Paris, France, 2015; pp. 1–64. [Google Scholar]
  38. Brodu, N. Super-resolving multiresolution images with band-independent geometry of multispectral pixels. IEEE Trans. Geosci. Remote Sens. 2017, 55, 4610–4617. [Google Scholar] [CrossRef]
  39. Araus, J.L.; Kefauver, S.C.; Vergara-Díaz, O.; Gracia-Romero, A.; Rezzouk, F.Z.; Segarra, J.; Buchaillot, M.L.; Chang-Espino, M.; Vatter, T.; Sanchez-Bragado, R.; et al. Crop phenotyping in a context of global change: What to measure and how to do it. J. Integr. Plant Biol. 2022, 64, 592–618. [Google Scholar] [CrossRef]
  40. Hlatshwayo, S.T.; Mutanga, O.; Lottering, R.T.; Kiala, Z.; Ismail, R. Mapping forest above-ground biomass in the reforested Buffelsdraai landfill site using texture combinations computed from SPOT-6 pan-sharpened imagery. Int. J. Appl. Earth Obs. Geoinf. 2019, 74, 65–77. [Google Scholar] [CrossRef]
  41. Vigneau, E.; Devaux, M.F.; Qannari, E.M.; Robert, P. Principal component regression, ridge regression and ridge principal component regression in spectroscopy calibration. J. Chemom. 1997, 11, 239–249. [Google Scholar] [CrossRef]
  42. Wold, S.; Sjöström, M.; Eriksson, L. PLS-regression: A basic tool of chemometrics. Chemom. Intell. Lab. Syst. 2001, 58, 109–130. [Google Scholar] [CrossRef]
  43. Awad, M.; Khanna, R. Support vector regression. In Efficient Learning Machines: Theories, Concepts, and Applications for Engineers and System Designers; Apress: Berkeley, CA, USA, 2015; pp. 67–80. [Google Scholar]
  44. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16), San Francisco, CA, USA, 13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 785–794. [Google Scholar] [CrossRef]
  45. Schonlau, M.; Zou, R.Y. The random forest algorithm for statistical learning. Stata J. 2020, 20, 3–29. [Google Scholar] [CrossRef]
  46. Aas, K.; Jullum, M.; Løland, A. Explaining individual predictions when features are dependent: More accurate approximations to Shapley values. Artif. Intell. 2021, 298, 103502. [Google Scholar] [CrossRef]
  47. Štrumbelj, E.; Kononenko, I. Explaining prediction models and individual predictions with feature contributions. Knowl. Inf. Syst. 2014, 41, 647–665. [Google Scholar] [CrossRef]
  48. Darst, B.F.; Malecki, K.C.; Engelman, C.D. Using recursive feature elimination in random forest to account for correlated variables in high dimensional data. BMC Genet. 2018, 19, 65. [Google Scholar] [CrossRef] [PubMed]
  49. Tucker, C.J. Red and photographic infrared linear combinations for monitoring vegetation. Remote Sens. Environ. 1979, 8, 127–150. [Google Scholar] [CrossRef]
  50. Jordan, C.F. Derivation of leaf-area index from quality of light on the forest floor. Ecology 1969, 50, 663–666. [Google Scholar] [CrossRef]
  51. Miura, T.; Huete, A.R.; Yoshioka, H. Evaluation of sensor calibration uncertainties on vegetation indices for MODIS. IEEE Trans. Geosci. Remote Sens. 2000, 38, 1399–1409. [Google Scholar] [CrossRef]
  52. Kaufman, Y.J.; Tanré, D. Atmospherically resistant vegetation index (ARVI) for EOS-MODIS. IEEE Trans. Geosci. Remote Sens. 1992, 30, 261–270. [Google Scholar] [CrossRef]
  53. Gitelson, A.A.; Merzlyak, M.N. Remote estimation of chlorophyll content in higher plant leaves. Int. J. Remote Sens. 1997, 18, 2691–2697. [Google Scholar] [CrossRef]
  54. Chen, J.M. Evaluation of vegetation indices and a modified simple ratio for boreal applications. Can. J. Remote Sens. 1996, 22, 229–242. [Google Scholar] [CrossRef]
  55. Sims, D.A.; Gamon, J.A. Relationships between leaf pigment content and spectral reflectance across a wide range of species, leaf structures and developmental stages. Remote Sens. Environ. 2002, 81, 337–354. [Google Scholar] [CrossRef]
  56. Vogelmann, J.E.; Rock, B.N.; Moss, D.M. Red edge spectral measurements from sugar maple leaves. Int. J. Remote Sens. 1993, 14, 1563–1575. [Google Scholar] [CrossRef]
  57. Peñuelas, J.; Baret, F.; Filella, I. Semi-empirical indices to assess carotenoids/chlorophyll a ratio from leaf spectral reflectance. Photosynthetica 1995, 31, 221–230. [Google Scholar]
  58. Frampton, W.J.; Dash, J.; Watmough, G.; Milton, E.J. Evaluating the capabilities of Sentinel-2 for quantitative estimation of biophysical variables in vegetation. ISPRS J. Photogramm. Remote Sens. 2013, 82, 83–92. [Google Scholar] [CrossRef]
  59. Gitelson, A.A.; Merzlyak, M.N.; Chivkunova, O.B. Optical properties and nondestructive estimation of anthocyanin content in plant leaves. Photochem. Photobiol. 2001, 74, 38–45. [Google Scholar] [CrossRef]
  60. Gitelson, A.A.; Viña, A.; Ciganda, V.; Rundquist, D.C.; Arkebauer, T.J. Remote estimation of canopy chlorophyll content in crops. Geophys. Res. Lett. 2005, 32, 2005GL022688. [Google Scholar] [CrossRef]
  61. Richardson, A.J.; Wiegand, C.L. Distinguishing vegetation from soil background information. Photogramm. Eng. Remote Sens. 1977, 43, 1541–1552. [Google Scholar]
  62. Bausch, W.C.; Duke, H. Remote sensing of plant nitrogen status in corn. Trans. ASAE 1996, 39, 1869–1875. [Google Scholar] [CrossRef]
  63. Huete, A.R. A soil-adjusted vegetation index (SAVI). Remote Sens. Environ. 1988, 25, 295–309. [Google Scholar] [CrossRef]
  64. Broge, N.H.; Leblanc, E. Comparing prediction power and stability of broadband and hyperspectral vegetation indices for estimation of green leaf area index and canopy chlorophyll density. Remote Sens. Environ. 2001, 76, 156–172. [Google Scholar] [CrossRef]
  65. Merton, R.; Huntington, J. Early simulation results of the ARIES-1 satellite sensor for multi-temporal vegetation research derived from AVIRIS. In Proceedings of the Eighth Annual JPL Airborne Earth Science Workshop, Pasadena, CA, USA, 9–11 February 1999; Jet Propulsion Laboratory: Pasadena, CA, USA, 1999; pp. 9–11. [Google Scholar]
  66. Daughtry, C. Estimating corn leaf chlorophyll concentration from leaf and canopy reflectance. Remote Sens. Environ. 2000, 74, 229–239. [Google Scholar] [CrossRef]
  67. Gitelson, A.A.; Kaufman, Y.J.; Stark, R.; Rundquist, D. Novel algorithms for remote estimation of vegetation fraction. Remote Sens. Environ. 2002, 80, 76–87. [Google Scholar] [CrossRef]
  68. Haboudane, D.; Miller, J.R.; Tremblay, N.; Zarco-Tejada, P.J.; Dextraze, L. Integrated narrow-band vegetation indices for prediction of crop chlorophyll content for application to precision agriculture. Remote Sens. Environ. 2002, 81, 416–426. [Google Scholar] [CrossRef]
  69. Rondeaux, G.; Steven, M.; Baret, F. Optimization of soil-adjusted vegetation indices. Remote Sens. Environ. 1996, 55, 95–107. [Google Scholar] [CrossRef]
  70. Clay, D.E.; Kim, K.; Chang, J.; Clay, S.A.; Dalsted, K. Characterizing water and nitrogen stress in corn using remote sensing. Agron. J. 2006, 98, 579–587. [Google Scholar] [CrossRef]
  71. Haboudane, D. Hyperspectral vegetation indices and novel algorithms for predicting green LAI of crop canopies: Modeling and validation in the context of precision agriculture. Remote Sens. Environ. 2004, 90, 337–352. [Google Scholar] [CrossRef]
  72. Sun, Y.; Ren, H.; Zhang, T.; Zhang, C.; Qin, Q. Crop leaf area index retrieval based on inverted difference vegetation index and NDVI. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1662–1666. [Google Scholar] [CrossRef]
  73. Wu, C.; Niu, Z.; Tang, Q.; Huang, W. Estimating chlorophyll content from hyperspectral vegetation indices: Modeling and validation. Agric. For. Meteorol. 2008, 148, 1230–1241. [Google Scholar] [CrossRef]
  74. Ceccato, P.; Flasse, S.; Tarantola, S.; Jacquemoud, S.; Grégoire, J.-M. Detecting vegetation leaf water content using reflectance in the optical domain. Remote Sens. Environ. 2001, 77, 22–33. [Google Scholar] [CrossRef]
  75. Ji, L.; Zhang, L.; Wylie, B.K.; Rover, J. On the terminology of the spectral vegetation index (NIR − SWIR)/(NIR + SWIR). Int. J. Remote Sens. 2011, 32, 6901–6909. [Google Scholar] [CrossRef]
  76. Chandrasekar, K.; Sesha Sai, M.; Roy, P.; Dwevedi, R. Land surface water index (LSWI) response to rainfall and NDVI using the MODIS vegetation index product. Int. J. Remote Sens. 2010, 31, 3987–4005. [Google Scholar] [CrossRef]
  77. Padilla, F.M.; Peña-Fleitas, M.T.; Gallardo, M.; Thompson, R.B. Proximal optical sensing of cucumber crop N status using chlorophyll fluorescence indices. Eur. J. Agron. 2016, 73, 83–97. [Google Scholar] [CrossRef]
  78. Gunjan, V.K.; Senatore, S.; Kumar, A.; Gao, X.-Z.; Merugu, S. (Eds.) Advances in Cybernetics, Cognition, and Machine Learning for Communication Technologies; Lecture Notes in Electrical Engineering; Springer: Singapore, 2020. [Google Scholar] [CrossRef]
  79. Huang, Y.; Wen, X.; Gao, Y.; Zhang, Y.; Lin, G. Tree species classification in UAV remote sensing images based on super-resolution reconstruction and deep learning. Remote Sens. 2023, 15, 2942. [Google Scholar] [CrossRef]
  80. Rohith, G.; Kumar, L.S. Paradigm shifts in super-resolution techniques for remote sensing applications. Vis. Comput. 2021, 37, 1965–2008. [Google Scholar] [CrossRef]
  81. Wang, P.; Bayram, B.; Sertel, E. A comprehensive review on deep learning based remote sensing image super-resolution methods. Earth Sci. Rev. 2022, 232, 104110. [Google Scholar] [CrossRef]
  82. Zhang, B.; Ma, M.; Wang, M.; Hong, D.; Yu, L.; Wang, J.; Gong, P.; Huang, X. Enhanced resolution of FY4 remote sensing visible spectrum images utilizing super-resolution and transfer learning techniques. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 7391–7399. [Google Scholar] [CrossRef]
  83. Chen, H.; He, X.; Qing, L.; Wu, Y.; Ren, C.; Sheriff, R.E.; Zhu, C. Real-world single image super-resolution: A brief review. Inf. Fusion 2022, 79, 124–145. [Google Scholar] [CrossRef]
  84. Zhang, M.; Su, W.; Fu, Y.; Zhu, D.; Xue, J.-H.; Huang, J.; Wang, W.; Wu, J.; Yao, C. Super-resolution enhancement of Sentinel-2 image for retrieving LAI and chlorophyll content of summer corn. Eur. J. Agron. 2019, 111, 125938. [Google Scholar] [CrossRef]
  85. Chen, X.; Li, F.; Chang, Q.; Miao, Y.; Yu, K. Improving winter wheat plant nitrogen concentration prediction by combining proximal hyperspectral sensing and weather information with machine learning. Comput. Electron. Agric. 2025, 232, 110072. [Google Scholar] [CrossRef]
  86. Guo, Y.; Jiang, S.; Miao, H.; Song, Z.; Yu, J.; Guo, S.; Chang, Q. Ground-based hyperspectral estimation of maize leaf chlorophyll content considering phenological characteristics. Remote Sens. 2024, 16, 2133. [Google Scholar] [CrossRef]
  87. Zhang, Y.; Ta, N.; Guo, S.; Chen, Q.; Zhao, L.; Li, F.; Chang, Q. Combining spectral and textural information from UAV RGB images for leaf area index monitoring in kiwifruit orchard. Remote Sens. 2022, 14, 1063. [Google Scholar] [CrossRef]
  88. Iqbal, N.; Mumtaz, R.; Shafi, U.; Zaidi, S.M.H. Gray Level Co-Occurrence Matrix (GLCM) Texture Based Crop Classification Using Low Altitude Remote Sensing Platforms. PeerJ Comput. Sci. 2021, 7, e536. [Google Scholar] [CrossRef]
  89. Darvishzadeh, R.; Skidmore, A.; Abdullah, H.; Cherenet, E.; Ali, A.; Wang, T.; Nieuwenhuis, W.; Heurich, M.; Vrieling, A.; O’Connor, B.; et al. Mapping Leaf Chlorophyll Content from Sentinel-2 and RapidEye Data in Spruce Stands Using the Invertible Forest Reflectance Model. Int. J. Appl. Earth Obs. Geoinf. 2019, 79, 58–70. [Google Scholar] [CrossRef]
  90. Humeau-Heurtier, A. Texture feature extraction methods: A survey. IEEE Access 2019, 7, 8975–9000. [Google Scholar] [CrossRef]
  91. Misra, G.; Cawkwell, F.; Wingler, A. Status of Phenological Research Using Sentinel-2 Data: A Review. Remote Sens. 2020, 12, 2760. [Google Scholar] [CrossRef]
  92. Gao, S.; Zhong, R.; He, Q.; Yan, K.; Ma, X.; Chen, X.; Pu, J.; Gao, S.; Qi, J.; Yin, G.; et al. Evaluating the Saturation Effect of Vegetation Indices in Forests Using 3D Radiative Transfer Simulations and Satellite Observations. Remote Sens. Environ. 2023, 295, 113665. [Google Scholar] [CrossRef]
  93. Tesfaye, A.A.; Awoke, B.G. Evaluation of the Saturation Property of Vegetation Indices Derived from Sentinel-2 in Mixed Crop–Forest Ecosystem. Spatial Inf. Res. 2021, 29, 109–121. [Google Scholar] [CrossRef]
  94. Singh, H.; Roy, A.; Setia, R.; Pateriya, B. Estimation of nitrogen content in wheat from proximal hyperspectral data using machine learning and explainable artificial intelligence (XAI) approach. Model. Earth Syst. Environ. 2022, 8, 2505–2511. [Google Scholar] [CrossRef]
Figure 1. Overview map of the study area.
Figure 2. Schematic diagrams of partial sample collection in the study area and the boxplot of sample data distribution. Panels (a,c,d,f) show partial sampling locations. Panel (b) presents the boxplot of sample data distribution. Panel (e) displays the measurement process using the Dualex Scientific+ instrument.
Figure 3. Comparison between original and Sen2Res-enhanced images. (a) Original 10 m true-color image with R/G/B = Band 4/Band 3/Band 2; (b,c) represent enhanced and original false-color images with R/G/B = Band 12/Band 11/Band 5, respectively; (d,e) represent enhanced and original false-color images with R/G/B = Band 8A/Band 6/Band 5, respectively.
Figure 4. Correlation distribution between spectral reflectance and NBI.
Figure 5. Correlation distribution between texture parameters and NBI. M, V, H, C, D, E, S, and Co represent Mean, Variance, Homogeneity, Contrast, Dissimilarity, Entropy, Second Moment, and Correlation, respectively. The same abbreviations apply in subsequent figures.
Figure 6. Distribution of the correlation between vegetation indices and NBI.
Figure 7. Model performance results. (a) R2 values of different models; (b) RMSE values of different models.
Figure 8. 1:1 distribution diagram of predicted and measured values.
Figure 9. SHAP feature importance distribution. (a) random forest regression (RFR) model; (b) extreme gradient boosting (XGBOOST) model.
Figure 10. Feature paths and SHAP value distributions for specific features. The y-axis represents feature importance ranked from highest to lowest; in (a,c), the x-axis denotes the range of model predictions, while in (b,d), the x-axis represents the SHAP values contributing to the model output.
Figure 11. Distribution maps of NBI. (a) corresponds to the CC-RFE-DRFR method, (b) to the CC-RFE-DXGBOOST method, (c) to the CC-RFE-BRFR method, and (d) to the CC-RFE-BXGBOOST method.
Table 1. Main spectral band parameters of the Sentinel-2A satellite.

Band | Center Wavelength (nm) | Bandwidth (nm) | Spatial Resolution (m)
B1-Coastal aerosol | 443 | 20 | 60
B2-Blue | 490 | 65 | 10
B3-Green | 560 | 35 | 10
B4-Red | 665 | 30 | 10
B5-Red edge 1 | 705 | 15 | 20
B6-Red edge 2 | 740 | 15 | 20
B7-Red edge 3 | 783 | 20 | 20
B8-NIR | 842 | 145 | 10
B8A-NIR narrow | 865 | 20 | 20
B9-Water vapor | 945 | 20 | 60
B10-Cirrus | 1375 | 30 | 60
B11-SWIR 1 | 1610 | 90 | 20
B12-SWIR 2 | 2190 | 180 | 20
Table 3. Changes in information entropy and mean gradient of each band before and after image enhancement.

Metric | Image Type | B2 | B3 | B4 | B5 | B6 | B7 | B8 | B8A | B11 | B12
Entropy | Sen2Res | 9.923 | 9.906 | 10.281 | 26.845 | 26.845 | 26.845 | 10.416 | 26.845 | 26.845 | 26.845
Entropy | Linearest | 9.923 | 9.906 | 10.281 | 10.697 | 10.999 | 11.562 | 10.416 | 11.585 | 11.014 | 11.194
Mean Gradient | Sen2Res | 0.009 | 0.010 | 0.015 | 0.016 | 0.013 | 0.015 | 0.020 | 0.017 | 0.014 | 0.016
Mean Gradient | Linearest | 0.009 | 0.010 | 0.015 | 0.010 | 0.010 | 0.014 | 0.020 | 0.014 | 0.011 | 0.013
Table 4. Selected predictive factors.

Factor Type | Factor Name
Spectral band | B6, B7, B8, B8A
Texture parameter | B7_M, B8_M, B8A_M
Vegetation index | SR8A, EVI8, EVI8A, RENDVI783, RENDVI3, mSR740, mNDVI740, VOG740, VOG783, SIPI8A, REDRG2, ARI2-2, ARI2-3, ARI2-5, ARI2-6, CI705-B8, CI740-B8, CI705-B8A, CI740-B8A, DVI8, DVI8A, SAVI8, SAVI8A, TVI740, TVI783, RVSI, MCARI783, TCARI783, OSARI8, OSARI8A, MTVI8, MTVI8A, IDVI8, IDVI8A
RFE | B7, B8, B8A, B7_M, B8A_M, RENDVI3, ARI2-3, CI740-B8A
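The two-stage selection behind Table 4 (correlation screening followed by recursive feature elimination) can be sketched as follows. The file name, correlation threshold, and base-learner settings are illustrative assumptions; only the number of retained variables follows the table above.

```python
# Sketch of correlation screening + RFE (thresholds and names are assumptions).
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

# Hypothetical sample table: one row per plot, candidate features plus measured NBI
df = pd.read_csv("nbi_samples.csv")
X_all, y = df.drop(columns=["NBI"]), df["NBI"]

# Stage 1: keep features whose absolute correlation with NBI exceeds a cut-off
corr = X_all.corrwith(y).abs()
candidates = corr[corr > 0.5].index.tolist()

# Stage 2: recursive feature elimination with a random forest base learner
rfe = RFE(estimator=RandomForestRegressor(n_estimators=100, random_state=42),
          n_features_to_select=8)    # eight variables retained, as in Table 4
rfe.fit(X_all[candidates], y)
print([f for f, keep in zip(candidates, rfe.support_) if keep])
```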
Table 5. Grid search results for the main model hyperparameters.

Models | Dataset A | Dataset B | Dataset C | Dataset D
RIDGE | α: 0.001 | α: 0.0001 | α: 0.01 | α: 0.0001
PLSR | n_components: 4 | n_components: 6 | n_components: 6 | n_components: 4
SVR | C: 4, kernel: rbf | C: 4, kernel: rbf | C: 5, kernel: rbf | C: 5, kernel: rbf
XGBOOST | max_depth: 2, n_estimators: 530 | max_depth: 3, n_estimators: 90 | max_depth: 2, n_estimators: 95 | max_depth: 3, n_estimators: 570
RFR | max_depth: 3, n_estimators: 30 | max_depth: 3, n_estimators: 70 | max_depth: 3, n_estimators: 120 | max_depth: 3, n_estimators: 50
Notes: α denotes the L2 regularization strength; n_components denotes the number of latent variables; C denotes the penalty parameter; kernel denotes the kernel function; max_depth denotes the maximum tree depth; n_estimators denotes the number of base learners.
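A cross-validated grid search of the kind summarized in Table 5 could be run as sketched below for the two tree-based models. The search grids, fold count, scoring metric, and input file are assumptions rather than the study's exact settings.

```python
# Illustrative grid search for RFR and XGBOOST (grids and scoring are assumptions).
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

# Hypothetical training table for one of the feature datasets (A-D)
df = pd.read_csv("dataset_D.csv")
X, y = df.drop(columns=["NBI"]), df["NBI"]

models = {
    "RFR": (RandomForestRegressor(random_state=42),
            {"max_depth": [2, 3, 4], "n_estimators": list(range(10, 160, 10))}),
    "XGBOOST": (XGBRegressor(random_state=42),
                {"max_depth": [2, 3, 4], "n_estimators": list(range(50, 600, 10))}),
}

for name, (model, grid) in models.items():
    search = GridSearchCV(model, grid, cv=5,
                          scoring="neg_root_mean_squared_error")
    search.fit(X, y)
    print(name, search.best_params_)
```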
Table 6. Comparison of NBI distributions between the sample region and the region of model extrapolation.

NBI Distribution | Max | Min | Mean | SD | CV (%)
Sample | 36.79 | 17.10 | 27.14 | 3.68 | 13.56
CC-RFE-DRFR | 36.39 | 17.82 | 23.45 | 3.78 | 16.12
CC-RFE-DXGBOOST | 36.05 | 17.15 | 22.58 | 4.45 | 19.71
CC-RFE-BRFR | 34.42 | 17.39 | 23.87 | 3.88 | 16.25
CC-RFE-BXGBOOST | 34.62 | 17.31 | 23.01 | 4.18 | 18.17
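Statistics of the kind reported in Table 6 can be reproduced from a predicted NBI map with a few lines of array code; the sketch below assumes a NumPy array of per-pixel predictions already masked to wheat pixels, and the synthetic example values are purely illustrative.

```python
# Summary statistics (Max, Min, Mean, SD, CV) for a hypothetical NBI prediction array.
import numpy as np

def nbi_summary(pred: np.ndarray) -> dict:
    pred = pred[np.isfinite(pred)]            # drop masked / no-data pixels
    mean, sd = pred.mean(), pred.std(ddof=1)
    return {"Max": pred.max(), "Min": pred.min(), "Mean": mean,
            "SD": sd, "CV (%)": 100 * sd / mean}

# Example with synthetic values in the observed NBI range
rng = np.random.default_rng(0)
print(nbi_summary(rng.normal(23.5, 3.8, size=10_000)))
```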