Mapping Cropland Soil Nutrients Contents Based on Multi-Spectral Remote Sensing and Machine Learning

Abstract: Nitrogen (N) and phosphorus (P) are primary indicators of soil nutrients in agriculture. Accurate management of these nutrients is essential for ensuring food security. High-resolution, multi-spectral remote sensing images can provide crucial information for mapping soil nutrients at the field scale. This study compares the capabilities of ZH-1 and Sentinel-2 satellite data, along with different spectral indices, in mapping soil nutrients (total N and Olsen-P) using two machine learning algorithms, random forest (RF) and XGBoost (XGB). Two agricultural fields in Suihua City were selected as the study areas for this investigation. The results showed that Sentinel-2 data performed best in estimating the total N content in soil using the RF model (R² = 0.74, RMSE = 0.10 g/kg). However, for the soil Olsen-P content, the XGBoost model performed better with ZH-1 data (R² = 0.75, RMSE = 9.79 mg/kg) than the RF model. This study demonstrates that both ZH-1 and Sentinel-2 satellite data perform well in terms of accurately mapping soil total N and Olsen-P contents using machine learning. Due to its higher spectral and spatial resolution, ZH-1 remote sensing data provides more detailed information on soil nutrient content during Olsen-P inversion and exhibits comparable accuracy.


Introduction
In modern agricultural practices, guiding precision agriculture development and ensuring food security are of paramount importance [1][2][3]. Digital soil nutrient mapping plays a crucial role in achieving these objectives as it provides essential information about the soil properties that directly influence crop growth and development [4,5]. Among various soil nutrients, total nitrogen and available phosphorus are key indicators of soil fertility and plant nutrition [6,7]. Accurate mapping of the spatial distribution of total nitrogen and available phosphorus is critical for optimizing agricultural productivity and resource management [8].
Traditional digital soil nutrient mapping methods typically rely on interpolation of ground survey data, resulting in coarse spatial resolution and limited guidance for precision agriculture [9][10][11]. Alternatively, estimating nutrients from ground spectrometer measurements combined with nutrient-related spectral information may face challenges in broad-scale applications [12,13]. In large-scale farmlands or extensive regions, traditional approaches may encounter issues such as high data acquisition costs, time-consuming processes, and reliance on ground field surveys. These limiting factors hinder the application of nutrient mapping techniques in guiding precision agriculture and achieving sustainable agricultural development [14,15]. Sentinel-2, with a revisit period of 5-10 days and a spatial resolution of 10 m, captures 13 image bands, including visible light, near-infrared, and short-wave infrared, providing valuable spectral information for inferring the soil nutrient content [16,17]. Additionally, the ZH-1 hyperspectral satellite has a revisit period of six days for a single satellite and a combined revisit period of approximately one day for the constellation of eight hyperspectral satellites. It has a spatial resolution of 10 m, a spectral resolution of 2.5 nanometers, and a wavelength range of 400-1000 nanometers, enabling detailed hyperspectral data to be gathered for a more accurate characterization of soil properties [18,19].
In recent years, with the development of emerging technologies, digital soil mapping has been applied using various techniques. Commonly used models include multiple linear regression [20], principal component analysis regression [21], the generalized additive model [22], and kriging interpolation [23]. Moreover, machine learning algorithms (e.g., support vector machines, decision trees, random forests, artificial neural networks) have been widely employed in remote sensing studies [24][25][26][27]. These algorithms offer advantages by learning from limited data and reducing errors through adaptive learning processes [24]. However, research on soil total nitrogen and available phosphorus mapping at higher spatial resolutions is still lacking [28]. Machine learning algorithms may not be universally applicable in different environments. Therefore, it is necessary to evaluate the applicability of different machine learning algorithms in our own context to understand the distribution of soil total nitrogen and available phosphorus content.
Hence, this study adopts the random forest (RF) and extreme gradient boosting (XGB) regression methods and introduces Zhuhai-1 (ZH-1) hyperspectral data for the first time on a field scale to explore their potential and effectiveness in mapping total nitrogen (total N) and available phosphorus (Olsen-P), providing valuable insights for soil nutrient estimation.

Materials and Methods
The technical workflow of this study is illustrated in Figure 1. It is primarily divided into three parts: data preprocessing, model training and validation, and model application. In the data preprocessing stage, both Sentinel-2 and ZH-1 data underwent radiometric calibration, followed by atmospheric correction using the Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) method to ensure the accuracy of surface reflectance [29,30]. Subsequently, the coarse spatial resolution bands of Sentinel-2 were resampled to 10 m using the nearest neighbor interpolation method, aligning the spatial resolution of both Sentinel-2 and ZH-1. Finally, specific bands from Sentinel-2 and ZH-1 were selected to calculate the vegetation and soil indices.
For the model training and validation stage, the surface reflectance of the two remote sensing datasets, along with vegetation and soil index values at their respective sampling points, were utilized as feature values, with total nitrogen (total N) and Olsen-P serving as label values. The pixel values for the mentioned sampling points were extracted using the rasterio library in Python. Due to different feature combinations, eight different datasets were formed while training two nutrient prediction models. Considering prior relevant studies on soil nutrient inversion, RF and XGB were selected as prediction models for machine learning regression [27,31].
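The pixel extraction step can be illustrated with a minimal sketch. The study used rasterio; the version below uses plain NumPy instead, to show the underlying lookup from map coordinates to row and column indices. The function name `sample_pixels`, the raster origin, and the coordinates are hypothetical values chosen for illustration, and a north-up raster with square pixels is assumed.

```python
import numpy as np

def sample_pixels(stack, origin_x, origin_y, pixel_size, xs, ys):
    """Return band values at map coordinates as an (n_points, n_bands) array.

    stack: array of shape (bands, rows, cols), north-up, with (origin_x, origin_y)
    being the upper-left corner of the upper-left pixel (assumed layout).
    """
    cols = np.floor((np.asarray(xs) - origin_x) / pixel_size).astype(int)
    rows = np.floor((origin_y - np.asarray(ys)) / pixel_size).astype(int)
    return stack[:, rows, cols].T

# Hypothetical 2-band, 4 x 5 raster with 10 m pixels.
stack = np.arange(2 * 4 * 5, dtype=float).reshape(2, 4, 5)
values = sample_pixels(
    stack, 500000.0, 5200000.0, 10.0,
    xs=[500005.0, 500045.0], ys=[5199995.0, 5199965.0],
)
print(values)
```

Each row of `values` is the feature vector of one sampling point, which can then be joined with the laboratory measurements to form the training table.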
To be specific, 90% of each dataset was used as training data, while the remaining 10% was reserved as a validation set to assess the model's accuracy. In the model application stage, the best-performing model was saved. Data with the same features as the model inputs were then used for soil nutrient inversion at the field scale, producing a spatial distribution map of the soil nutrient content.
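As a minimal sketch of the 90/10 split, the example below uses scikit-learn's `train_test_split` on placeholder data. Pooling the 72 and 49 samples of the two fields into one dataset of 121 samples, the number of features, and the random seed are assumptions made for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples = 72 + 49                 # samples of both fields pooled (assumption)
X = rng.random((n_samples, 10))     # placeholder features, not the study's data
y = rng.random(n_samples)           # placeholder nutrient contents

# Hold out 10% of the samples for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.1, random_state=42
)
print(len(y_train), len(y_val))
```

With 121 samples, scikit-learn rounds the held-out share up, so the validation set contains 13 samples.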

Study Area
The experimental site is located in Suihua City, which is a significant core area of the black soil zone and an important grain production region in central Heilongjiang Province, China (Figure 2). It is situated to the east of the Songnen Plain, at the junction of the Xiaoxing'an Mountains and the Songnen Plain in the middle reaches of the Hulan River (latitude: 46°19′ N to 47°09′ N, longitude: 126°25′ E to 127°23′ E) [32]. The forested area in the north belongs to the semi-humid and semi-arid monsoon climate zone in the northern temperate zone. Spring is relatively dry, with little rainfall, while summer is humid and hot, with more rainfall. Autumn is cool, but the temperature drops quickly. Winter is cold, with a freezing period of up to six months, as the area belongs to a distinct continental climate [33]. The average annual temperature is about 2.9 °C, and the annual average precipitation is 552.5 mm, of which 70% is concentrated from June to August [34]. The annual average sunshine hours are 2395 h, the annual effective accumulated temperature is 2852.6 °C, and the area belongs to the second accumulated temperature zone in the province [35].

Figure 1. The proposed methodological framework for mapping soil nutrient content.


Field Data Collection and Laboratory Analysis
This study used soil samples collected from two experimental fields on 17 September 2020 and 22 October 2020. The samples were collected during the non-planting stage of the field, and the nutrient content in the samples reflects the soil nutrient status of the sampling site. Soil samples were collected using the systematic sampling method from the top 20 cm of the field surface, with approximately 500 g of soil placed in each sample bag. The sample bags were marked with sample numbers, and the latitude and longitude coordinates of the sampling points were recorded using GPS. A total of 72 and 49 soil samples were collected from these two experimental fields, respectively.

The soil samples were sent to the Laboratory of the Institute of Soil Science, Chinese Academy of Sciences, for physicochemical analysis and processing. The samples were further dried, and stones, residues, and impurities were removed. The samples were then ground into a powder for subsequent physical and chemical analysis, including the determination of total nitrogen and Olsen-P in the soil using the Kjeldahl digestion method and the colorimetric method, respectively. The Kjeldahl digestion method is a widely used approach for determining soil total nitrogen content, involving high-temperature digestion with strong oxidizing agents to convert organic and inorganic nitrogen compounds into analyzable forms. The determination of Olsen-P using the colorimetric method involved extracting phosphate ions from the soil samples into solution, followed by their reaction with reagents containing chromogenic agents to generate colored compounds. Subsequently, the absorbance of these compounds was measured in order to rapidly and accurately quantify the available phosphorus content in the soil [36]. The statistical results of the nitrogen and phosphorus contents measured at each sampling point are presented in Table 1.

Remote Sensing Data Acquisition and Preprocessing
Sentinel-2, as a part of the Copernicus program of the European Space Agency (ESA), is composed of multiple satellites that acquire medium-resolution images for various applications, such as forest monitoring, water quality assessment, land cover change detection, and disaster management [37]. This mission includes two satellites, namely, Sentinel-2A and Sentinel-2B, which share similar designs and orbits. Each satellite is equipped with a multi-spectral instrument (MSI) and utilizes a three-mirror anastigmat telescope with a 150 mm aperture and 600 mm focal length [16]. It captures 13 image bands, including different spectral ranges such as visible light, near-infrared, and shortwave infrared [17]. Using a push-broom method, MSI acquires high-resolution images with a swath width of 290 km [38]. Sentinel-2 offers temporal data continuity with a revisit period of 5 days, and its data are accessible to all users [39]. However, the spatial resolution of the bands varies, as shown in Table 2: bands 2, 3, 4, and 8 are 10 m, bands 5, 6, 7, 8A, 11, and 12 are 20 m, and bands 9 and 10 are 60 m [40].

The ZH-1 hyperspectral satellite (OHS) adopts a push-broom imaging technique with a spatial resolution of 10 m, a spectral resolution of 2.5 nanometers, and a wavelength range of 400-1000 nanometers, as shown in Table 3. Due to storage limitations and compression design, it transmits 32 spectral bands, and it has a weight of 71 kg [41]. Each hyperspectral satellite can orbit the Earth approximately 15-16 times a day, with a maximum data acquisition time of about 8 min per orbit. Currently, a single hyperspectral satellite has a revisit period of six days, while the combined revisit period of eight hyperspectral satellites is about one day [18].

To ensure that the surface reflectance obtained for the study area faithfully represented conditions at the time of soil sample collection, images acquired close to the sampling dates were used. We acquired the Sentinel-2 image from 15 October 2020 through the Google Earth Engine (GEE), which fell within the field sampling period. The image was free of clouds and therefore highly suitable for our study. Additionally, the ZH-1 image from 19 October 2020 was obtained from the ZH-1 Remote Sensing Data Service Platform (https://www.obtdata.com, accessed on 24 April 2022) and was downloaded through an application specifically designed for educational and research purposes.

Prior to the experiment, both images underwent preprocessing, which included radiometric correction, atmospheric correction, and geometric correction. Furthermore, in the Sentinel-2 imagery, bands with spatial resolutions other than 10 m were resampled to achieve a uniform 10 m resolution.
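The resampling of coarser Sentinel-2 bands to 10 m with nearest neighbor interpolation can be sketched as follows. This is a simplified illustration that assumes the 20 m and 10 m grids are aligned and the scale factor is an integer, in which case nearest neighbor upsampling reduces to repeating each pixel; the function name and the reflectance values are hypothetical.

```python
import numpy as np

def upsample_nearest(band, factor):
    """Nearest neighbor upsampling by an integer factor (aligned grids assumed)."""
    return np.repeat(np.repeat(band, factor, axis=0), factor, axis=1)

# Hypothetical 20 m band of 2 x 2 pixels, resampled to 10 m (factor 2).
band_20m = np.array([[0.10, 0.20],
                     [0.30, 0.40]])
band_10m = upsample_nearest(band_20m, 2)
print(band_10m.shape)
```

Because no new values are interpolated, the resampled band keeps the original reflectance values, which is one reason nearest neighbor resampling is often preferred before regression on spectral features.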

Spectral Indices
In this study, we derived spectral indices from the bands of Sentinel-2 and ZH-1. This selection was motivated by various vegetation indices reported in previous studies, with the aim of augmenting spatial information and improving regression accuracy by incorporating additional vegetation indices. In a similar study, Zinhle carefully screened vegetation indices and identified the following indices as playing a significant role in soil nutrient inversion [42]. Accordingly, we continued to utilize these vegetation indices in our experiments. The vegetation indices based on vegetation reflectance include normalized difference vegetation indices (NDVIRE1n, NDVIRE2n, NDVIRE3n) in the narrow bands, as well as a modified simple ratio (MSRRE). Furthermore, these indices encompass the plant senescence reflectance index (PSRI), the enhanced vegetation index (EVI), and the green normalized difference vegetation index (GNDVI) [28,43]. The final spectral indices derived from Sentinel-2 which were used in this study are summarized in Table 4. Additionally, we selected the ZH-1 bands corresponding to the Sentinel-2 bands to calculate the same indices, and the results are presented in Table 5.
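As an illustration of how such indices are computed from band reflectance, the sketch below implements GNDVI and EVI in their commonly published forms. The mapping of Sentinel-2 bands (B8 as near-infrared, B3 as green, B4 as red, B2 as blue) and the reflectance values are assumptions for illustration, and the exact formulations used in this study may differ.

```python
import numpy as np

def gndvi(nir, green):
    """Green normalized difference vegetation index (standard form)."""
    return (nir - green) / (nir + green)

def evi(nir, red, blue):
    """Enhanced vegetation index (standard form with coefficients 2.5, 6, 7.5, 1)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

# Hypothetical surface reflectance values for one pixel.
b2_blue, b3_green, b4_red, b8_nir = 0.05, 0.10, 0.10, 0.40

print(gndvi(b8_nir, b3_green))
print(evi(b8_nir, b4_red, b2_blue))
```

The same functions accept NumPy arrays, so a whole band can be passed in to obtain an index image.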
Table 4. Spectral indices used in this study with Sentinel-2 data. The table consists of seven vegetation indices and five soil indices. From left to right, the columns give the names of the spectral indices, their calculation formulas, the meanings of the spectral indices, and the reference numbers of the literature sources that utilized these spectral indices.

Vegetation index | Equation | Purpose | Source
PSRI | … | Senescence-induced reflectance changes | [44]
NDVIRE1n | … | Sparse biomass | [45]
NDVIRE2n | … | Sparse biomass | [45]
NDVIRE3n | … | Sparse biomass | [45]
MSRRE | … | Correction for leaf specular reflection | [46]
EVI | … | Chlorophyll-sensitive | [47]
GNDVI | … | Chlorophyll-sensitive | [48]

Soil index | Equation | Property | Source
… | … | Average reflectance magnitude | [49]
CI | … | Soil color | [49]
HI | … | Hematite content | [50]
SI | … | Spectral slope | [49]

Table 5. Spectral indices used in this study with ZH-1 data. The table consists of seven vegetation indices and five soil indices. From left to right, the columns give the names of the spectral indices, their calculation formulas, the meanings of the spectral indices, and the reference numbers of the literature sources that utilized these spectral indices.

Vegetation index | Equation | Property | Source
PSRI | … | Senescence-induced reflectance changes | [44]
NDVIRE1n | … | Sparse biomass | [45]
NDVIRE2n | … | Sparse biomass | [45]
NDVIRE3n | … | Sparse biomass | [45]
MSRRE | … | Correction for leaf specular reflection | [46]
EVI | 2.5 × … | Chlorophyll-sensitive | [47]
GNDVI | … | Chlorophyll-sensitive | [48]

Soil index | Equation | Property | Source
… | (… 3)^0.5 | Average reflectance magnitude | [49]
CI | … | Soil color | [49]
HI | … | Hematite content | [50]
SI | … | Spectral slope | [49]

Machine Learning Regression Models

Random Forest Regression
Random forest is a supervised ensemble learning method built on decision trees, as shown in Figure 3. This versatile algorithm is capable of effectively handling both classification and regression problems [51,52]. The fundamental concept underlying random forest involves creating a forest comprising multiple decision trees, where each tree serves as a base learner and the entire forest forms the ensemble [53,54]. For regression, the final prediction is obtained by averaging the outputs of all trees in the forest. Additionally, the algorithm utilizes out-of-bag samples, that is, the data points not drawn when training a given tree, which can be leveraged for model evaluation and for assessing variable importance [55,56]. Notably, random forest can handle high-dimensional feature data without necessitating feature selection. John employed a set of machine learning algorithms, including an artificial neural network (ANN), a support vector machine (SVM), cubist regression, random forest (RF), and multiple linear regression (MLR), to predict SOC levels. Among these models, RF demonstrated the best performance, with an R-squared value of 0.68 [27]. Furthermore, the algorithm has a concise set of parameters, including the number of decision trees (n_estimators), the maximum depth of each decision tree (max_depth), and the minimum number of samples required for a node to split (min_samples_split) [57,58]. For this investigation, we conducted our analysis using the scikit-learn library in the Python environment. Using a grid search approach, we determined the optimal parameters and established a model to predict soil nutrient content.
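A minimal sketch of a random forest regressor with the three parameters named above, together with out-of-bag evaluation, might look as follows. The synthetic data and the parameter values are assumptions chosen for illustration and are not the tuned settings of this study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((121, 8))                      # synthetic features (illustration only)
y = 2.0 * X[:, 0] - X[:, 3] + 0.05 * rng.standard_normal(121)

rf = RandomForestRegressor(
    n_estimators=200,        # number of decision trees
    max_depth=None,          # grow each tree until its leaves are pure or too small
    min_samples_split=2,     # minimum samples required to split a node
    oob_score=True,          # evaluate on out-of-bag samples
    random_state=0,
)
rf.fit(X, y)

oob_r2 = rf.oob_score_       # R^2 estimated from out-of-bag predictions
print(round(oob_r2, 3))
```

The out-of-bag score gives an internal accuracy estimate without setting aside additional samples, which is convenient when the number of field samples is small.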

Extreme Gradient Boosting Regression
The extreme gradient boosting (XGB) algorithm, introduced by Chen and Guestrin in 2016, is a machine learning approach illustrated in Figure 4. It has demonstrated remarkable performance in numerous international data mining competitions, in some cases surpassing deep learning algorithms. XGB falls under the category of gradient boosting ensemble algorithms, making it applicable to both classification and regression tasks [59,60]. The XGB training process involves two stages: fitting the input training dataset and fitting the residuals. The fitting process is iterated until it meets the convergence criterion. This training scheme combines many weak learners into a substantially stronger model. The main hyperparameters of XGBoost include the number of decision trees, the learning rate, the maximum depth of trees, the minimum sample weight, the subsample ratio used in each iteration, and the weight of the L1 regularization term [57,61]. Miao employed three machine learning models, i.e., XGBoost, RF, and LightGBM, based on Sentinel-2 images for the purpose of estimating leaf nutrient levels. The results demonstrated that XGBoost outperformed the other models in terms of estimating leaf C (with R² values of 0.655, 0.799, and 0.829 for spring, summer, and winter, respectively), N (with R² values of 0.668, 0.743, and 0.704), and P (with R² values of 0.539, 0.622, and 0.596) [62]. In this study, the XGB algorithm was adopted due to its ability to mitigate overfitting issues and its superior performance [63]. The xgboost library in the Python environment was utilized for modeling purposes in this research.
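The idea of repeatedly fitting residuals can be sketched with plain gradient boosting on squared error. This is not the xgboost library and it omits XGBoost's regularization and second-order terms; it uses scikit-learn decision trees as weak learners, and the function names, data, and settings are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosted_trees(X, y, n_rounds=50, learning_rate=0.1, max_depth=2):
    """Fit shallow trees sequentially, each one to the current residuals."""
    base = float(np.mean(y))             # initial prediction: the mean of y
    pred = np.full(len(y), base)
    trees, history = [], []
    for _ in range(n_rounds):
        residual = y - pred              # what the ensemble still gets wrong
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residual)
        pred = pred + learning_rate * tree.predict(X)
        trees.append(tree)
        history.append(float(np.mean((y - pred) ** 2)))   # training MSE
    return base, trees, history

def predict_boosted(base, trees, X, learning_rate=0.1):
    """Sum the shrunken tree outputs on top of the base prediction."""
    return base + learning_rate * sum(tree.predict(X) for tree in trees)

rng = np.random.default_rng(0)
X = rng.random((100, 4))                 # synthetic data (illustration only)
y = np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1]

base, trees, history = fit_boosted_trees(X, y)
print(history[0], history[-1])
```

The training error recorded in `history` shrinks as rounds are added, which is why the number of trees, the learning rate, and the tree depth need to be tuned together to avoid overfitting.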


Experiments
In this study, we used images of the feature variables to estimate the contents of different soil nutrients (total nitrogen and Olsen-P). In order to enhance the accuracy and generalization capability of the model and to achieve a more stable performance, we allocated a larger share of the samples to training, allowing the model to better learn the features and patterns within the data. The dataset was divided into 90% training and 10% testing subsets. Drawing inspiration from previous research, our aim was to compare the effectiveness of different feature combinations in capturing spectral information related to vegetation and soil, thereby enhancing feature representation and obtaining more accurate and effective soil parameter estimation models. To achieve this, we employed two models, random forest (RF) and XGBoost (XGB), along with two types of remote sensing data with various combinations of variables, which are summarized in Table 6.
For the two soil nutrients (total nitrogen and Olsen-P), the experimental setups comprised the combinations listed in Table 6. In this research, we employed a grid search as the method for model tuning. Grid search is a widely used parameter optimization technique aimed at determining the optimal hyperparameter combinations for machine learning models. It involves traversing a predefined grid of parameter values, exploring different combinations, and evaluating the performance of each combination to identify the best parameter settings [64].
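A grid search of this kind can be sketched with scikit-learn's `GridSearchCV`. The parameter grid, the three-fold cross-validation, the scoring metric, and the synthetic data below are assumptions for illustration and do not reproduce the grid used in this study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.random((60, 5))                        # synthetic features (illustration only)
y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(60)

# Hypothetical grid: every combination of these values is evaluated.
param_grid = {
    "n_estimators": [20, 50],
    "max_depth": [3, None],
    "min_samples_split": [2, 4],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)

best_rf = search.best_estimator_               # refit on all data with the best settings
print(search.best_params_)
```

Here 2 × 2 × 2 = 8 parameter combinations are each scored by cross-validation, and the combination with the lowest RMSE is retained.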


Model Evaluation
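This study used common machine learning validation metrics to evaluate the prediction performance of the RF and XGB models. These included mean absolute error (MAE), root mean square error (RMSE), percent bias (PBIAS), and the coefficient of determination (R²), as shown in Equations (1)-(4):

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|P_i - O_i\right| \tag{1}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^2} \tag{2}$$

$$\mathrm{PBIAS} = 100 \times \frac{\sum_{i=1}^{n}\left(O_i - P_i\right)}{\sum_{i=1}^{n} O_i} \tag{3}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(O_i - P_i\right)^2}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2} \tag{4}$$

where n represents the number of sample points, P_i is the predicted soil content, O_i is the observed soil content at site i, and \bar{O} is the mean of the observed values. Under this sign convention, a negative PBIAS indicates that predictions exceed observations on average.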
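The four metrics named above can be computed with a short NumPy function. The sketch assumes the standard definitions and a PBIAS sign convention in which negative values mean that predictions exceed observations, as implied by the interpretation of the results; the function name and the sample values are hypothetical.

```python
import numpy as np

def evaluate(observed, predicted):
    """Return MAE, RMSE, PBIAS (%), and R^2 for paired observations and predictions."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    mae = np.mean(np.abs(p - o))
    rmse = np.sqrt(np.mean((p - o) ** 2))
    pbias = 100.0 * np.sum(o - p) / np.sum(o)      # negative: overestimation
    r2 = 1.0 - np.sum((o - p) ** 2) / np.sum((o - o.mean()) ** 2)
    return {"MAE": mae, "RMSE": rmse, "PBIAS": pbias, "R2": r2}

# Hypothetical values for illustration.
metrics = evaluate([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.5])
print(metrics)
```

In this toy example the predictions sum to slightly more than the observations, so PBIAS is negative.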

Model Evaluation
Model performance statistics on the testing data (n = 13 samples) are shown in Tables 7 and 8. Regarding the estimation of the total nitrogen content in soil, the random forest model performed well, especially the RF1 variant (represented by experiment number 1 in Table 6). This model exhibited the lowest root mean square error (RMSE) and mean absolute error (MAE), indicating the highest accuracy in estimating soil nitrogen content (RMSE = 0.10 g/kg, MAE = 0.07 g/kg), and it also achieved the highest R-squared value (R² = 0.74). It is noteworthy that, based on the percent bias (PBIAS = −2.66), the predicted values of total nitrogen were slightly higher than the observed values.
Overall, among all models, the XGBoost (XGB) model in Experiment 8 performed the most poorly. This model incorporated the raw bands, soil indices, and vegetation indices of ZH-1. It had larger errors, as reflected by the higher RMSE and MAE values (RMSE = 0.16 g/kg, MAE = 0.11 g/kg), and it achieved the lowest R-squared value (R² = 0.31). Moreover, this model overestimated the total nitrogen content, as indicated by the percent bias (PBIAS = −5.01). In the inversion model for total soil nitrogen, the overestimation of predicted values is likely attributable to the presence of features with low correlations in the dataset, which negatively impact the model. In future research, we will address this issue and work towards mitigating its influence. The value of PBIAS changes (increasing or decreasing) as features are added to the model. Considering the characteristics of the model, this trend may be attributed to the introduction of noise from features with lower correlations in the dataset.
The XGB model from Experiment 5 emerged as the top-performing model for Olsen-P estimation among all experiments. It demonstrated the highest accuracy in terms of predicting Olsen-P content, with the lowest root mean square error (RMSE = 9.79 mg/kg) and mean absolute error (MAE = 6.41 mg/kg), along with the highest R-squared value (R² = 0.75). The predicted values were slightly lower than the observed values, as indicated by the percent bias (PBIAS = 2.31).
On the other hand, the XGB model from Experiment 4 exhibited the poorest performance. This model involved the original bands of Sentinel-2, soil indices, and vegetation indices. It had larger errors, which was evident from the elevated RMSE (14.97 mg/kg) and MAE (11.48 mg/kg), and the lowest R-squared value (R² = 0.40). Furthermore, this model underestimated the Olsen-P content, as indicated by the percent bias (PBIAS = 5.50).
The effectiveness of the inversion models for total nitrogen and Olsen-P was evaluated using the Taylor diagram shown in Figure 5 (generated using the plotrix package in R). The Olsen-P inversion models performed well, with correlation coefficients ranging from 0.8 to 0.95, demonstrating close agreement with the observed values. For the total nitrogen inversion models, the correlation coefficients were primarily distributed between 0.8 and 0.9. The inversion results were consistent with the observations across multiple models. According to the Taylor diagram, the best-performing total nitrogen inversion model was RF1, which corresponds with the model evaluation results presented in Table 7. RF1 displayed the highest accuracy in terms of estimating the total nitrogen content in the soil. Among the RF models, RF7 emerged as the optimal Olsen-P inversion model.
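The quantities summarized by a Taylor diagram can be computed directly. The study generated its diagram with the plotrix package in R; the Python sketch below only illustrates the underlying statistics (correlation coefficient, standard deviations, and centered root mean square difference) on hypothetical values, and the function name is an assumption.

```python
import numpy as np

def taylor_stats(observed, predicted):
    """Return correlation, both standard deviations, and centered RMS difference."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    r = np.corrcoef(o, p)[0, 1]
    sd_o, sd_p = o.std(), p.std()          # population standard deviations
    crmsd = np.sqrt(np.mean(((p - p.mean()) - (o - o.mean())) ** 2))
    return r, sd_o, sd_p, crmsd

# Hypothetical observed and predicted contents.
obs = np.array([1.30, 1.45, 1.52, 1.61, 1.70])
pred = np.array([1.35, 1.40, 1.55, 1.58, 1.66])
r, sd_o, sd_p, crmsd = taylor_stats(obs, pred)
print(r, sd_o, sd_p, crmsd)
```

These statistics satisfy crmsd² = sd_o² + sd_p² − 2·sd_o·sd_p·r, which is the geometric relation that allows all three to be shown in a single diagram.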

Variable Importance
In this study, we selected the best-performing RF and XGB models for the inversion of each soil nutrient content, then compared and analyzed the importance of each feature. The feature contributions differed between the soil total nitrogen and Olsen-P inversion models, as shown in Figures 6 and 7. For soil total nitrogen, RF1 performed best, with Sentinel-2's B3 band contributing over 20%. Among the four XGB models, XGB1 performed best; its highest-contributing feature was the same as that of RF1, at nearly 15%. For Olsen-P, the top-performing models were RF7 and XGB5. In the RF7 model, the feature with the highest contribution was ZH-1's B2 band, which contributed nearly 8%. In the XGB5 model, the feature with the largest contribution was ZH-1's B27 band, also contributing nearly 8%. This indicates that Sentinel-2 data played a significant role in the inversion of soil total nitrogen, whereas ZH-1 data contributed more prominently to the Olsen-P inversion. The differences in feature importance between models can be attributed to the presence of highly correlated features in the input data, which overshadowed the importance of other features and caused their contributions to vary.
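One simple way to check for the highly correlated features mentioned above is to screen the inputs with pairwise Pearson correlation. The following is a minimal sketch rather than the procedure used in the study: the helper names, the threshold of 0.9, and the reflectance values are all hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(features, threshold=0.9):
    """List feature pairs whose absolute correlation reaches the threshold."""
    pairs = []
    for (name_a, col_a), (name_b, col_b) in combinations(features.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, round(r, 3)))
    return pairs


# Hypothetical reflectance values for three bands at five sample points.
features = {
    "B2": [0.10, 0.12, 0.15, 0.11, 0.18],
    "B3": [0.20, 0.24, 0.30, 0.22, 0.36],  # twice B2, so perfectly correlated
    "B11": [0.30, 0.25, 0.31, 0.35, 0.27],
}
flagged = correlated_pairs(features)
```

When two inputs are flagged in this way, a tree ensemble can split on either one interchangeably, so the importance reported for each may understate their joint contribution.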

Mapping Soil Nutrients Content
Zinhle demonstrated that combining machine learning methods with remote sensing data and derived spectral indices can accurately predict soil total nitrogen content and generate spatial distribution maps [42]. Therefore, in this study, different models were selected to predict and map the spatial distribution of soil total nitrogen and Olsen-P content. The RF1 and XGB1 models were chosen to predict and map soil total nitrogen content, as shown in Figure 8a,b, which reveals significant spatial variation in soil total nitrogen content between the two experimental fields. In experimental field 1, the soil total nitrogen content in the southern part was notably higher than that in the northern part, while in experimental field 2, the overall soil total nitrogen content was higher in the northern part, ranging from 1.28 to 1.70 g/kg. The spatial distribution of the soil Olsen-P content shown in Figure 8c,d was generally consistent with that of soil total nitrogen content, with the Olsen-P content ranging from 16.34 to 68.76 mg/kg. The scatter plots in Figure 9 show that the data points are not highly clustered. Additionally, the spatial distributions of soil nutrient content predicted by the two models agree, indicating that the experimental design in this study is reliable and capable of producing trustworthy results.

Discussion
This study aims to evaluate the applicability of ZH-1 and Sentinel-2 satellite data for mapping soil nutrients (total nitrogen and Olsen-P) in farmland soil in Suihua City, China. Two machine learning algorithms, random forest (RF) and XGBoost (XGB), were employed to assess the predictive capabilities of these data for soil nutrient content.
Regarding the soil total nitrogen content, our results demonstrate that the RF model performed best when using Sentinel-2 data, with an R² of 0.74 and an RMSE of 0.10 g/kg. Conversely, for the soil Olsen-P content, the XGB model performed best when using ZH-1 data, with an R² of 0.75 and an RMSE of 9.79 mg/kg, surpassing the RF model. The superior performance of the Sentinel-2 model in predicting soil total nitrogen can be attributed to its sensitivity to nitrogen compounds in the short-wave infrared range, as indicated by the prominent contributions of Sentinel-2's B11 and B12 bands in Figure 6 [65]. In contrast, ZH-1's spectral range is 400-1000 nm; hence, for soil total nitrogen inversion, combining Sentinel-2 data with machine learning algorithms yields better results. The inversion of soil total nitrogen showed a general tendency toward overestimation, mainly due to the redundancy of features in the dataset. When ZH-1 data were included as features, the overestimation of soil total nitrogen increased significantly, likely because of the rapid increase in the number of features. In future research, this issue will be addressed by optimizing the selection of model features to improve model accuracy.
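The accuracy statistics quoted above, together with the overestimation tendency, can be made concrete with a minimal sketch. It assumes that R² is the coefficient of determination, 1 − SS_res/SS_tot; the exact definition used in the study is not stated in this section. The mean bias term is added here only to illustrate overestimation, and the sample values are hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return R^2, RMSE and mean bias (predicted minus observed)."""
    n = len(observed)
    mean_o = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return {
        "r2": 1 - ss_res / ss_tot,  # assumed definition of R^2
        "rmse": math.sqrt(ss_res / n),
        "bias": sum(p - o for o, p in zip(observed, predicted)) / n,
    }


# Hypothetical total nitrogen values in g/kg; every prediction is 0.1 too high.
obs = [1.2, 1.4, 1.6, 1.8]
pred = [1.3, 1.5, 1.7, 1.9]
metrics = regression_metrics(obs, pred)
```

A positive bias indicates systematic overestimation; in this toy case the bias and the RMSE coincide because every prediction is offset by the same amount.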
This study incorporated vegetation indices and soil indices to construct diverse datasets for model training, with the aim of enhancing model accuracy and comparing the performance of models with different input features. Although the overall model performance did not improve significantly when vegetation indices were used as partial features, the combination with Sentinel-2 vegetation indices slightly outperformed the model using soil indices for soil total nitrogen modeling, as indicated in Tables 7 and 8. Furthermore, in the models that used vegetation indices, their contribution was generally greater than that of the individual spectral bands. These findings are consistent with the observations made by Zhang, in which conventional spectral indices also played a role in nitrogen estimation [25]. Thus, judiciously selecting vegetation indices and soil indices as model input features can enhance spatial information and improve regression accuracy.
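As an illustration of how a spectral index is derived from band reflectances, the sketch below computes the normalized difference vegetation index (NDVI) from Sentinel-2's near-infrared (B8) and red (B4) bands. That NDVI is among the indices used in this study is an assumption; the function name and reflectance values are hypothetical.

```python
import math


def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    if total == 0:
        # The index is undefined when both reflectances are zero.
        return math.nan
    return (nir - red) / total


# Hypothetical surface reflectances for one pixel (Sentinel-2 B8 and B4).
value = ndvi(nir=0.5, red=0.1)
```

Normalized indices of this kind take values between -1 and 1 for non-negative reflectances, so they can be added to the raw bands as extra model inputs.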
Both the ZH-1 and Sentinel-2 satellite data performed well in accurately mapping soil total nitrogen and Olsen-P contents using machine learning regression models. Due to its higher spectral and spatial resolution, ZH-1 remote sensing data provided more detailed information on soil nutrient content during Olsen-P inversion, with considerable accuracy. This finding highlights the potential of ZH-1 data to provide valuable information on soil nutrient variation at a finer scale. It also aligns with the studies by Sebastian [66] and Kawamura [67], which indicate that hyperspectral remote sensing images have a certain advantage over multispectral remote sensing images in capturing key soil parameters. In the context of precise soil nutrient mapping, the spatial resolution of digital soil mapping products for total nitrogen and Olsen-P has increased from 250 m to 30 m [31]. During the production of these products, the inversion accuracy and the spatial resolution of the nutrient distribution largely depend on the spatial and spectral resolution of the input remote sensing data. This study introduced ZH-1 hyperspectral remote sensing data and demonstrated its excellent potential in soil Olsen-P content retrieval experiments. The Olsen-P inversion mapping method proposed in this study makes it possible to obtain the spatial distribution of soil Olsen-P content in farmland more rapidly and accurately from remote sensing data, thus promoting the development and implementation of precision agriculture.
Our research has certain limitations. The sample collection was limited and focused solely on a relatively small geographic area; larger-scale precision agriculture projects will require a more extensive and comprehensive collection of soil samples. With broader coverage, future research will incorporate ground-based environmental factors and other data (such as soil type, DEM, slope, and aspect) as supplementary features in the model, with the aim of enhancing predictive accuracy. This enhancement should improve the inference and prediction of soil nutrient contents, thereby providing more reliable support for decision-making regarding agricultural production. Additionally, greater attention should be given to feature selection and to the interpretability of the models, both of which are important for the advancement of precision agriculture.

Conclusions
Using machine learning regression methods combined with Sentinel-2 multispectral data and ZH-1 hyperspectral data, this study estimated the total nitrogen and Olsen-P contents of farmland soils by inversion. Experiments with different combinations of predictive factors showed that different factors affect the prediction of soil parameters differently, and that the best-performing model depends on the soil parameter. The RF1 model performed best in estimating the total nitrogen content, reaching R² = 0.74 and RMSE = 0.10 g/kg, while the XGB5 model performed best in estimating the Olsen-P content, reaching R² = 0.75 and RMSE = 9.79 mg/kg. Comparative analysis also showed that, when predicting soil total nitrogen content, the original bands of Sentinel-2 contribute more to the prediction results, indicating that Sentinel-2 plays an important role in predicting soil total nitrogen content. When predicting soil Olsen-P content, the original bands of ZH-1 hyperspectral data contribute more to, and have a positive impact on, the prediction results. These high-contribution features are the basis for establishing soil parameter prediction models. Finally, this study generated spatial distribution maps of soil total nitrogen and Olsen-P, which can serve as valuable tools to guide agricultural production decision-making and aid in formulating field-scale soil nutrient management plans to increase crop yields and enhance food security. These maps hold great potential for promoting both the development and the implementation of precision agriculture.

Figure 1. The proposed methodological framework for mapping soil nutrient content.

Figure 2. Illustration of the research area.

Figure 3. A simplified schematic diagram of the random forest regression model.

Figure 4. A concise schematic diagram illustrating the XGBoost regression model. In the figure, 'x' represents features, 'y' represents labels, 'i' indicates sequence values, and 'n' denotes the number of trees.

Figure 5. Taylor diagrams for the 16 experiments for the two nutrients: (a) Taylor diagram for total nitrogen; (b) Taylor diagram for Olsen-P.

Figure 6. Contributions of key features to the prediction of soil total nitrogen content using the RF1 and XGB1 models. The y-axis represents the contribution of each feature, and the x-axis represents the different features.

Figure 7. Contributions of key features to the prediction of soil Olsen-P content using the RF7 and XGB5 models. The y-axis represents the contribution of each feature, and the x-axis represents the different features.

Figure 8. Spatial distribution maps of soil nutrient content: (a) total nitrogen mapped with the random forest model for experiment 1; (b) total nitrogen mapped with the extreme gradient boosting model for experiment 1; (c) Olsen-P mapped with the random forest model for experiment 7; (d) Olsen-P mapped with the extreme gradient boosting model for experiment 5.

Figure 9. Scatter plots of soil nutrient content, where the horizontal axis represents observed values and the vertical axis represents predicted values. The red lines indicate the trend lines. (a,b) RF1 and XGB1, respectively, for the soil total nitrogen inversion; (c,d) RF7 and XGB5, respectively, for the Olsen-P inversion.

Table 1. Statistical table of physical and chemical data for 121 soil samples.

Table 6. The different data configurations for the machine learning regression experiments.

Table 7. Model evaluation statistics for total nitrogen in the different experiments.

Table 8. Model evaluation statistics for Olsen-P in the different experiments.
This model exhibited strong correlation and minimal errors when compared to the actual measured values.