An Integrated Yield-Based Methodology for Improving Soil Nutrient Management at a Regional Scale

The relationships between crop yield and its selected related impact factors has often been explored using ordinary least squares regression (OLSR). However, this model is non-spatial and non-robust. This study first used stepwise regression to identify the main factors affecting winter wheat yield from twelve potential related factors in Yucheng County, China. Next, robust geographically weighted regression (RGWR) was used to explore the spatially non-stationary relationships between wheat yield and its main impact factors. Then, its modeling effect was compared with that of GWR and OLSR. Last, robust geostatistical analysis was conducted for spatial soil management measures in low-yield areas. Results showed that: (i) three main impact factors on wheat yield were identified by stepwise regression, namely soil organic matter, soil total phosphorus, and pH; (ii) the spatially non-stationary effects of the main impact factors on wheat yield were revealed by RGWR but were ignored by OLSR; (iii) RGWR obtained the best modeling effect (RI = 52.31%); (iv) robust geostatistics obtains a better spatial prediction effect and the low-yield areas are mainly located in the northeast and the middle east of the study area. Therefore, the integrated yield-based methodology effectively improves soil nutrient management at a regional scale.


Introduction
With a growing human population [1], global food demand will double by 2050 [2]. In China, 30% to 50% more food will be needed to meet the country's demands in the next two to three decades [3]. Therefore, the Chinese Government has taken drastic steps to increase food production to feed its 1.3 billion people [4]. Wheat is a staple food in China, and increasing wheat yield per unit area is crucial to ensuring the country's food supply.
Wheat yield may be influenced by many factors, such as soil physical and chemical properties, climate, irrigation, tillage, and so on [5][6][7][8][9][10][11][12]. Since the influence degree of environmental factors in each agricultural area is not identical, the main limiting factors for crop yield are usually different. For example, in an agricultural areas rich in a certain soil nutrient, this nutrient is often not the key limiting factor for crop growth and development. However, in the areas with severe deficiency of this nutrient, the application of this fertilizer can often significantly increase crop yield. Therefore, it is crucial to determine the main impact factors on wheat yields. Stepwise regression could choose explanatory variables using an automatic procedure [13]. A variable is added to or subtracted from the explanatory variable set based on some prespecified criterion in each step. Stepwise regression is a potential tool for identifying the main impact factors on wheat yield in this study.
In previous studies, global regression techniques, such as ordinary least squares regression (OLSR), were usually used to explore the relationships between crop yield and its related impact factors [14][15][16]. However, these models are non-spatial; that is to say, spatial coordinates were not considered in regression coefficient estimation, and thus only constant regression coefficients could be obtained. However, the strength or concentration of soil physicochemical properties varies with spatial position, so their effects on crop yield are not constant. Therefore, global relationships obtained by the non-spatial models (e.g., OLSR) may deviate considerably from those observed locally, providing only an average impression of the relationships over an entire region [17,18].
Geographically weighted regression (GWR) is a spatially local regression technology [19]. This model takes the spatial coordinates into the coefficient estimation, permitting the coefficients to vary with spatial position. However, the locally linear estimation of GWR also causes this technology to be sensitive to outliers [20,21]. With the strengthening of human activities, spatial outliers commonly exist in the geochemical data sets measured at a regional scale. These outliers may further affect the performance of OLSR and basic GWR for exploring the relationships between wheat yield and its main impact factors [20][21][22][23]. In recent years, a robust local regression model, robust geographically weighted regression (RGWR), has emerged [24]. In theory, RGWR may be an effective tool for exploring the spatially non-stationary influences of main impact factors on wheat yield at a regional scale.
In addition, determining the spatial distribution pattern of wheat yield is a prerequisite for precisely managing the soil nutrients in low-yield areas. Geostatistical models can greatly minimize the estimation errors and investigation costs [25,26]. However, the traditionally-used Matheron variogram is sensitive to outliers, which may further affect the spatial prediction accuracy of wheat yield. Therefore, it is necessary to establish robust statistics in the spatial prediction of environmental factors in the area with high-intensity human influence.
The main objectives of this study are to: (i) identify the main impact factors on winter wheat yield in Yucheng County, China, using stepwise regression; (ii) establish RGWR for exploring spatially non-stationary relationships between wheat yield and its related main impact factors; (iii) compare the performance of RGWR, GWR, and OLSR in modeling the relationships; (iv) conduct robust geostatistical analysis and further suggest some spatially precise soil management measures in the low-yield areas of the county.

Study Area and Data
Yucheng County (116 • 22 ~116 • 45 E, 36 • 40 ~37 • 12 N) is located in the northwest of Shandong Province, with an area of 990 km 2 ( Figure 1). This county belongs to the North China Plain and has a better irrigation and drainage system. The cultivated land covers 683 km 2 , accounting for 69% of the total land area in the county [27]. The annual average temperature and precipitation are 13.1 • C and 593.2 mm, respectively. The Yellow River runs through the county. Winter wheat (Triticum aestivum L.) is one of the most important crops in the County. Climate, topography, and agronomic measures for winter wheat are similar across the county. Main soil subgroups in this county are Calcaric-Endorusti-Ustic Cambosols (Fluvisols), Calaric-Ochri-Aquic Cambosols (Fluvic Cambisols), Parasalic-Ochric-Aquic Cambosols (Calcaric Fluvisols) and Typic-Aqui-Orthic Halosols (Calcaric-Fluvic Cambisols) [28,29].

Determination of the Main Limiting Factors
In this study area, the main limiting factors on winter wheat yield were identif using stepwise regression from twelve potential related soil factors, namely soil pH, c ion exchange capacity (CEC), soil organic matter (SOM), total nitrogen (TN), total ph phorus (TP), total potassium (TK), available boron, copper, iron, zinc, molybdenum, a manganese.

Soil Sampling and Chemical Analysis
Ninety-nine pairs of wheat grains and surface soils (0~20 cm depth) were collected June 2012 ( Figure 1). The wheat genotype in the study area was mainly the Ji-Mai-22. T sampling locations were randomly arranged throughout the study area, and their coor nate information was recorded using a handheld GPS receiver (MAP60CSX, Garmin L Lenexa, KS, USA). The wheat and the corresponding surface soil were collected from same location. Each sample comprised four subsamples within a distance of 10 m s rounding a specific sampling location. For wheat sampling, one square meter of wh was collected at each subsample location and was sent to the lab for drying, threshi weighing, and finally calculating the wheat yield.
All soil samples were air-dried at room temperature (20~22 °C). After remov stones and other debris, soil samples were sieved into soil particles less than 2 mm portion of each soil sample (about 50 g) was then ground to pass through a sieve of 1/1 mm meshes. Standard methods were adopted to measure the following 12 soil propert pH (glass pH electrodes method), CEC (ammonium acetate method), SOM (potassiu dichromate-wet combustion method), TN (Kjeldahl method with H2SO4 + H2O2 digestio TP (acid digestion and colorimetry method), TK (acid digestion and atomic absorpt spectroscopy method), available boron (hot water extraction and colorimetry metho and available copper, iron, zinc, molybdenum, and manganese (diethylentriaminepe aacetic acid and atomic absorption spectroscopy method) [30]. Quality control was bas on the certified samples (GBW 07413) and analysis duplicates.

Determination of the Main Limiting Factors
In this study area, the main limiting factors on winter wheat yield were identified using stepwise regression from twelve potential related soil factors, namely soil pH, cation exchange capacity (CEC), soil organic matter (SOM), total nitrogen (TN), total phosphorus (TP), total potassium (TK), available boron, copper, iron, zinc, molybdenum, and manganese.

Soil Sampling and Chemical Analysis
Ninety-nine pairs of wheat grains and surface soils (0~20 cm depth) were collected in June 2012 ( Figure 1). The wheat genotype in the study area was mainly the Ji-Mai-22. The sampling locations were randomly arranged throughout the study area, and their coordinate information was recorded using a handheld GPS receiver (MAP60CSX, Garmin Ltd., Lenexa, KS, USA). The wheat and the corresponding surface soil were collected from the same location. Each sample comprised four subsamples within a distance of 10 m surrounding a specific sampling location. For wheat sampling, one square meter of wheat was collected at each subsample location and was sent to the lab for drying, threshing, weighing, and finally calculating the wheat yield.
All soil samples were air-dried at room temperature (20~22 • C). After removing stones and other debris, soil samples were sieved into soil particles less than 2 mm. A portion of each soil sample (about 50 g) was then ground to pass through a sieve of 1/100 mm meshes. Standard methods were adopted to measure the following 12 soil properties: pH (glass pH electrodes method), CEC (ammonium acetate method), SOM (potassium dichromate-wet combustion method), TN (Kjeldahl method with H 2 SO 4 + H 2 O 2 digestion), TP (acid digestion and colorimetry method), TK (acid digestion and atomic absorption spectroscopy method), available boron (hot water extraction and colorimetry method), and available copper, iron, zinc, molybdenum, and manganese (diethylentriaminepentaacetic acid and atomic absorption spectroscopy method) [30]. Quality control was based on the certified samples (GBW 07413) and analysis duplicates.

Robust Geostatistics
Kriging is expressed as the weighted sums of the adjacent sample points by taking into account the spatial dependence of the sample data. Variogram describes the spatial Agronomy 2022, 12, 298 4 of 13 correlation structure of a continuous variable. In the kriging system, variogram is calculated as [31]: where E{ ·} is the expectation value of the sample data in brackets; z i (u) is the target variable z i at location u; h is the separation distance vector. Matheron's estimator is the most widely used estimator for the variogram [32]. However, this estimator is sensitive to outliers [33][34][35]. In this study, three robust variogram estimators were first used, namely the Cressie-Hawkins estimator [33], Dowd's estimator [36], and Genton's estimator [37]. Then, standardized square prediction error (SSPE) at each sampling site were calculated [38]: where z * (u) and σ 2 (u) are kriging estimate and kriging variance, respectively, at location u. The expectation of the median value of SSPE is 0.455. The optimal variogram estimator corresponds to the one with the median value of SSPE closest to 0.455 [38]. In this study, the robust ordinary kriging z * ROK (u) is written: subject to: where λ α is the weight of the αth sample in the search radius used for kriging calculation. For detailed descriptions of ROK, please refer to the related literature [38].

Robust Geographically Weighted Regression
Traditional global regression techniques, such as ordinary least squares regression (OLSR), assume that the relationships between the dependent variable (i.e., wheat yield) and its explanatory variables (i.e., main related soil factors) are constant throughout the study area. OLSR model is expressed as [20]: where y, x i , and ε represent the wheat yield, the ith related soil factor, and the Gaussian error term, respectively; β 0 and β i represent the model intercept and the regression coefficient for the ith related soil factor x i , respectively; and p is the number of the main related soil factors in the OLSR model. This type of regression model is non-spatial; that is to say, spatial coordinates are not considered in estimating regression parameters. Geographically weighted regression (GWR) takes the spatial locations of samples into the estimation of regression coefficients, permitting the coefficients to vary spatially. The GWR model is expressed as [20,39]: where y(µ, v), x i (µ, v), and ε(µ, ν) represent wheat yield, the ith main soil factor, and the Gaussian error term, respectively, at the location (µ, ν); β 0 (µ, v) and β i (µ, v) represent the intercept and the local regression coefficient for the ith main soil factor, respectively, at where X is an [m × (p + 1)] data matrix of main related soil factors with p being the number of the main related soil factors and m being the number of observed data at the regression point (µ, ν); Y is an (m × 1) data vector of the wheat yield; W(u, v) is an (m × m) local weights diagonal matrix which is calculated from a kernel function that places more weights on samples spatially closer to the calibration location [20]. As the sample density varies with spatial position, the adaptive bi-square kernel function was adopted for weight estimation. The optimal adaptive bandwidth was determined based on the corrected Akaike information criterion (AICc) [20]. Thirty-five surrounding points (i.e., optimal adaptive bandwidth) were finally used to calibrate the basic GWR model at each location.
Since outliers are widely present in geochemical datasets, robust geographically weighted regression (RGWR) borrowed from the robust multiple linear regression paradigm was used to reduce the influences of outliers on the estimation of local regression coefficients. This technology is to refit a basic GWR with a filtered dataset by removing sample data with large externally studentized residuals of an initial GWR fit [20,21]. Details about the RGWR model were described in Fotheringham et al. [20] and Harris et al. [21].

Model Validation
In order to evaluate the modeling effects of the regression models (i.e., RGWR, GWR, and OLSR), the regression residuals (n = 99) were analyzed. Traditionally, two indices are often used to evaluate modeling effects based on predicted and measured values, namely mean absolute error (MAE) and root-mean-square error (RMSE). The two indices can be calculated using the following equations: where n is the number of validation points; z i and z * i are the measured and predicted wheat yield, respectively, at the ith validation site. Lower MAE and RMSE values indicate higher prediction accuracy. However, the above indices are susceptible to outliers. High outliers usually make MAE and RMSE overestimated, which in turn affects the assessment results. In order to weaken the influence of outliers on model validation, this study used the median rather than the mean in the validation indices. When MAE and RMSE are calculated using the idea of the median, both indices are transformed into median absolute error (MAE*): The relative improvement (RI) of one model over the other is calculated using [40]: where MAE * R and MAE * E are the MAE * of the reference and evaluated models, respectively. In this study, OLSR, GWR, and RGWR were performed on the "GWmodel" packages [20] in R 3.3.3 software (R Development Core Team); robust variograms were fitted on the "Georob" package [41] in R 3.3.3 software (R Development Core Team); ArcGIS (version 10.0) was used for producing maps.

Wheat Yield and Its Main Impact Factor
Stepwise regression analysis was used to identify the main impact factors on wheat yield in Yucheng County. Three optimal explanatory variables (i.e., SOM, TP, and pH) were identified from the 12 soil properties (i.e., SOM, pH, CEC, TN, TP, TK, and available boron, copper, iron, zinc, molybdenum and manganese). The stepwise regression model is given as follows: In this model, the spatial coordinate was not considered [42]. The relationships derived from the global model provided only average influences on wheat yield over the entire region.
Descriptive statistics of sample data for wheat yield and its three main impact factors (i.e., SOM, soil TP, and soil pH) are listed in Table 1. The soil samples in the whole study area are alkaline. The mean of soil pH is 8.56 in the study area. Such an alkaline soil environment may weaken the transformation ability of soil nutrients from fixed forms to available forms directly or indirectly [43,44]. The coefficients of variation (CVs) of soil SOM, TP, and wheat yield range from 15.12-38.49%, showing moderate variability for these indicators [45].

Global Influences of the Main Related Factors on Wheat Yield
Pearson's correlation coefficients (r) between wheat yield and its main related factors are shown in Table 2. Wheat yield is negatively correlated with soil pH (r = −0.56) and positively correlated with SOM (r = 0.71) and TP (r = 0.62). The correlation between the three indicators (i.e., SOM, TP, and pH) is weak (Table 2). Therefore, SOM, TP, and soil pH all strongly influence wheat yield. However, since the traditional correlation analysis does not consider spatial coordinates, the above results can only reflect the global rather than local effects.

Robust Geostatistical Analysis of Wheat Yield and the Main Related Factors
The medians of the SSPE obtained by leave-one-out cross-validation are presented in Table 3. The medians of SSPE corresponding to the traditional Matheron estimator deviated the farthest from 0.455. The main reason is that the traditional Matheron estimator is sensitive to outliers. The optimal estimators were the Dowd estimator for wheat yield and SOM and Genton estimators for soil TP and pH. Therefore, the above robust estimators rather than the traditionally-used Matheron estimator were adopted in this study. There was no apparent anisotropy for wheat yield, SOM, soil TP and pH. Therefore, all robust variograms were fitted omnidirectionally. Parameters of the optimal robust variogram are shown in Table 4. Except that the experimental variogram corresponding to wheat yield is fitted by the spherical model, the experimental variograms corresponding to the others are fitted by the exponential model. The nugget/sill ratios range from 19.79% to 31.58%, showing moderate spatial self-dependency for all four indicators [41]. The spatial distribution maps of wheat yield and its main impact factors generated by ROK are presented in Figure 2. High-yield areas are mainly concentrated in the west and southeast of the study area, and low-yield areas are mainly located in the northeast of the study area. It can be seen from Figure 2 that multiple factors jointly affected the wheat yield in the study area. The low-yield areas in the northeast also have lower SOM, TP, and stronger soil alkalinity; high-yield areas in the west also have higher SOM, TP, and weaker soil alkalinity. However, only a qualitative effect of these soil factors on wheat yield can be obtained from Figure 2. Therefore, it is necessary to explore further the spatially non-stationary influences of main related factors on winter wheat yield.

Model Validation for RGWR
Validation indices for OLSR, GWR and RGWR are shown in Table 5. RGWR produced the best modeling effect (MAE* = 0.5), and OLSR produced the worst modeling effect (MAE* = 1.04). Relative improvement accuracy over OLSR was 52.31% for RGWR and 34.1% for GWR. Therefore, RGWR performed better than GWR and OLSR for modeling the relationships between wheat yield and its related impact factors (i.e., SOM, soil TP, and soil pH). Therefore, RGWR could effectively reveal the spatially non-stationary influences of main related factors on winter wheat yield in this study.

Spatially Non-Stationary Relationships between Wheat Yield and Its Main Impact Factors
Local regression coefficients (i.e., intercepts, slopes) generated by RGWR analysis of wheat yield against its main impact factors are presented in Figure 3. The spatially non-stationary relationship maps of wheat yield vs. its main impact factors could be used to explain the local causes of low wheat yield in the study area. The relationships between wheat yield and its three related soil factors were spatially non-stationary ( Figure 3); that is to say, the regression coefficients vary with spatial position. Their corresponding local regression coefficients explain the influences of SOM, soil TP, and soil pH on wheat yield. In general, positive regression coefficients represent the positive influence on wheat yield, and negative regression coefficients represent the negative influence on wheat yield. Therefore, the measures such as reducing soil alkalinity and increasing soil SOM and TP levels should be conducted for increasing wheat yield in this county. The influence strength of the related impact factors on wheat yield varies with spatial location. The areas with the strongest impact were mainly located in the southeast, east, and west of the study area. The results are crucial information for improving soil nutrient management.

Delineating Areas for Soil Improvement
Soil nutrient management areas (wheat yield < 6.00 t hm −2 ) delineated based on the spatial distribution pattern of wheat yield are shown in Figure 4. Low-yield areas are mainly located in the northeast (i.e., subarea I) and the middle east (i.e., subarea II) of the study area (Figure 4). Local regression coefficients generated by RGWR indicate that soil pH has a negative contribution to wheat yield while SOM and soil TP have positive contributions ( Figure 3). Thus, reducing soil alkalinity, increasing SOM and soil TP should be important measures to increase wheat yield in Yucheng County. Thus, applying organic fertilizer, phosphate fertilizer, or/and acid fertilizer may be important measures to improve soil nutrition, which may be beneficial to increase wheat production in Yucheng County. As the influence strength of the main impact factors (i.e., SOM, soil TP, and soil pH) on wheat yield varies with spatial location (Figure 3), improvement measures should be different for different low-yield areas. For example, SOM has a greater impact on wheat

Delineating Areas for Soil Improvement
Soil nutrient management areas (wheat yield < 6.00 t hm −2 ) delineated based on the spatial distribution pattern of wheat yield are shown in Figure 4. Low-yield areas are mainly located in the northeast (i.e., subarea I) and the middle east (i.e., subarea II) of the study area ( Figure 4). Local regression coefficients generated by RGWR indicate that soil pH has a negative contribution to wheat yield while SOM and soil TP have positive contributions ( Figure 3). Thus, reducing soil alkalinity, increasing SOM and soil TP should be important measures to increase wheat yield in Yucheng County. Thus, applying organic fertilizer, phosphate fertilizer, or/and acid fertilizer may be important measures to improve soil nutrition, which may be beneficial to increase wheat production in Yucheng County. As the influence strength of the main impact factors (i.e., SOM, soil TP, and soil pH) on wheat yield varies with spatial location (Figure 3), improvement measures should be different for different low-yield areas. For example, SOM has a greater impact on wheat yield in subarea I than in subarea II; soil pH has a greater impact on wheat yield in subarea I than in subarea II (Figures 3 and 4). also change to a certain extent over time, the main limiting factors of wheat yield may also change after a larger time span. After a long period of time or in other study areas, the main limiting factors of crop yield and their spatially varying impact need to be re-determined. In addition, wheat cultivars may also have an important influence on wheat yield [47]. Ji-Mai-22, one of the most important wheat cultivars in Yucheng County, was considered in this study. If other wheat cultivars were planted in the same study area, the relationships between wheat yield and related soil properties might not be identical. Usually, wheat yield is influenced by multiple factors, such as soil physical and chemical properties, climate, irrigation, tillage and so on [5,6,8,12]. At the national scale of China, climate factors may be important indicators affecting the spatial variation of wheat yield [46]. Because the environmental factors in each region are not identical, the main factors that limit the growth of wheat in each region are different. Stepwise regression is an effective model for identifying the main impact factors on crop yield at a regional scale. However, this model is a type of global regression, and spatial coordinate was not considered in regression coefficient estimation. Thus, stepwise regression can only qualitatively identify the main factors that impact wheat yield in the entire study area. RGWR takes into account spatial coordinates in the regression analysis and is robust to outliers. Therefore, RGWR was used to explore the spatially non-stationary influences of main related soil factors on winter wheat yield. Model validation shows that the relationships generated by RGWR are more accurate than those generated by GWR and OLSR.
This study mainly focused on improving soil nutrient management at the county scale to increase wheat yield. Crop yields at the regional scale are often affected by local limiting soil factors. In order to increase the crop yields in a spatially targeted manner, the limiting soil factors must be determined first, and then the spatially non-stationary relationship between crop yield and the limiting soil factors should be explored.
It is worth noting that the content of specific soil factors and their spatial variability in each study area are quite different, and the limiting factors of wheat yield in this study area are not necessarily the same as those in other areas. Moreover, since soil factors will also change to a certain extent over time, the main limiting factors of wheat yield may also change after a larger time span. After a long period of time or in other study areas, the main limiting factors of crop yield and their spatially varying impact need to be re-determined. In addition, wheat cultivars may also have an important influence on wheat yield [47]. Ji-Mai-22, one of the most important wheat cultivars in Yucheng County, was considered in this study. If other wheat cultivars were planted in the same study area, the relationships between wheat yield and related soil properties might not be identical.

Conclusions
Stepwise regression was used to identify the most important impact factor on wheat yield from 12 potential impact factors. Then, RGWR was established to explore the relationships between winter wheat yield and its main related soil factors, and its modeling effect was further compared with the traditionally-used OLSR and GWR. Finally, robust geostatistical analysis was conducted for spatially precise soil management measures in the low-yield areas of the county. Results showed that: (i) SOM, soil TP, and pH were the main impact factors on wheat yield; (ii) the relative improvement accuracy over OLSR was 52.31% for RGWR and 34.1% for GWR; (iii) RGWR revealed the spatially non-stationary influences of the main impact factors on wheat yield, which were ignored by the traditional OLSR model; (iv) robust geostatistics obtain a better spatial prediction effect and the low-yield areas are mainly located in the northeast and middle east of the study area. Therefore, the method recommended in this study provided information for guideline field management.