Application of Parameter Optimization Methods Based on Kalman Formula to the Soil—Crop System Model

Soil–crop system models are effective tools for optimizing water and nitrogen application schemes, saving resources and protecting the environment. To guarantee model prediction accuracy, we must apply parameter optimization methods for model calibration. The performance of two different parameter optimization methods based on the Kalman formula are evaluated for a parameter identification of the soil Water Heat Carbon Nitrogen Simulator (WHCNS) model using mean bias error (ME), root-mean-square error (RMSE) and an index of agreement (IA). One is the iterative local updating ensemble smoother (ILUES), and the other is the DiffeRential Evolution Adaptive Metropolis with Kalman-inspired proposal distribution (DREAMkzs). Our main results are as follows: (1) Both ILUES and DREAMkzs algorithms performed well in model parameter calibration with the RMSE_Maximum a posteriori (RMSE_MAP) values were 0.0255 and 0.0253, respectively; (2) ILUES significantly accelerated the process to the reference values in the artificial case, while outperforming in the calibration of multimodal parameter distribution in the practical case; and (3) the DREAMkzs algorithm considerably accelerated the burn-in process compared with the original algorithm without Kalman-formula-based sampling for parameter optimization of the WHCNS model. In conclusion, ILUES and DREAMkzs can be applied to a parameter identification of the WHCNS model for more accurate prediction results and faster simulation efficiency, contributing to the popularization of the model.


Introduction
Unreasonable water and nitrogen management practices in intensive agricultural systems have caused serious resource waste and environmental pollution [1]. Field experiments and numerical simulations are the main approaches to optimize water and nitrogen application schemes for the construction of high-yield and low-pollution cropping systems [2][3][4]. Due to the long time consumption and high cost of field experiments, numerical models are crucial tools for analyzing how soil-crop systems respond to field management schemes. There are many relevant soil-crop system models, such as RZWQM [5], WNMM [6], SWAT [7] and SPWS [8]. DRAINMOD is often combined with other models, such as ADAPT, RZWQM2 and DSSAT [9][10][11][12]. There are also many other soil-crop models, such as HERMES [13], CHANI-EPIC [14], DAYCENT [15] and NLEAP-GIS [16]. DNDC can simulate the processes responsible for the production, consumption and transport of nitrous oxide [17]. The soil Water Heat Carbon Nitrogen Simulator (WHCNS) model can be applicable to studies of water and nitrogen management under the complex conditions of intensive cropping systems in North China [18].
However, parameters must be calibrated and validated before the application of soilcrop system models to ensure prediction accuracy [19][20][21][22]. There are several cases for exploring the calibration of system models through combining parameter optimization methods where C L, f MD represents the cross-covariance matrix between the model parameters M The multiple data assimilation (MDA) scheme was applied as a supplementary for the model parameter update to achieve better optimization results for nonlinear problems [41,42]. The iterative form of an ES, which is defined as an ESMDA, assimilates the observations multiple times. The factor α i should be applied to inflate the measurement error for obtaining the reasonable results in iteration i. The factor should obey ∑ N i=1 1/α i 2 = 1, where N is the total assimilation time.

Theory of the DREAMkzs Algorithm
The formula of the Kalman-inspired proposal distribution, which accelerates the convergence of the chain to the posterior distribution, is shown as follows: where θ θ θ is a vector of model parameters; r (t−1) represents the residual vector of parameters θ θ θ (t−1) , and (t−1) denotes a vector of random draw from the distribution of measurement errors ( (t−1) ∼ N(0, R), with R as the covariance matrix of measurement errors. The Kalman gain K is defined as: where C θ θ θd signifies the cross-covariance matrix of model parameters and simulation results, and C dd denotes the covariance matrix of model simulation results. The candidate can be derived from the samples in the archive, through the mixture of parallel direction, snooker and Kalman trial moves. However, the Kalman-inspired proposal distribution introduces asymmetry even though it can shorten the burn-in processes. The approaches that can remedy this defect at present will considerably deteriorate the sampling efficiency. To avoid this disadvantage, the Kalman-inspired proposal distribution is only applied to the first T k steps of the Markov chain, after which the parallel direction and snooker proposal distributions are used for sampling.

WHCNS Model
The WHCNS, which has a detailed introduction in the literature [43], was selected as the forward soil-crop system model for our simulation. The depth of the soil profile was 180 cm, assuming that nitrogen moved beyond this depth could not be utilized by the crop. In total, 32 uncertain hydraulic parameters are listed in Table 1. The soil profile was divided into eight layers with four parameters for each layer. Considering the 8 crop parameters and 5 nitrogen transformation parameters shown in Table 2, there were 45 parameters that needed to be calibrated in all. The prior distributions of the parameters were assumed to be uniform distributions, and the ranges are listed in Tables 1 and 2. Mean bias error (ME), root mean square error (RMSE) and index of agreement (IA) are three different statistics applied for the evaluation of the model performance. The corresponding equations for them are as follows: where n is the number of observations, Si is the simulated value, Oi is the measured value, and O is the mean of the measured values. The RMSE represents the average difference between the simulations and the observations. The range of IA is limited to 0-1, and a better model performance is indicated if the value is closer to 1.

Description of Field Experiment Conditions
The study area was located within Alxa Left Banner, Inner Mongolia, China. The total potential evaporation is 20 times more than the average annual precipitation, which reaches 116 mm/year. The oasis cropping system is single crop growth during middle April to early October, which is dominated by spring maize. Irrigation is typically applied as flood irrigation and depends on groundwater, ranging from 40 to 70 m in depth. Typical N fertilizer application rates are about 280-350 kg N ha −1 [44]. The field experiment was undertaken at the experimental site over two spring maize growth periods. Four treatments were replicated three times in the 20 m × 20 m experimental plots each year. Maize was planted on April 12 and harvested on October 18. Volumetric soil water content was measured from 16 points of the soil profile by taking a value every 10 cm from a depth of 20 cm to 170 cm on 17 days (the 41st, 52nd, 54th, 63rd, 70th, 74th, 82nd, 91st, 95th, 104th, 109th, 115th, 124th, 135th, 146th, 154th and 160th day after sowing), using time domain reflectometry (TDR) probes. Detailed information on water and N management practices are presented in Table 3 [45]. Four irrigation-fertilizer treatments, namely I std N std (W 1 N 1 ), I std N csv (W 1 N 2 ), I csv N std (W 2 N 1 ) and I csv N csv (W 2 N 2 ), can be derived by combining two irrigation treatments with two N-fertilization applications.

Results
We demonstrate the performance of the ILUES compared with the ESMDA through both synthetic and practical cases, which are mentioned in Sections 3.1.1 and 3.1.2, respectively. Then, the applicability of the DREAMkzs algorithm is evaluated in Section 3.2. The parameters that need to be optimized include 32 soil hydraulic parameters for 8 layers of the soil profile, 5 nitrogen transformation parameters and 8 crop parameters, which are contained in Tables 1 and 2. Uniform prior parameter distributions are assumed based on the ranges listed in Tables 1 and 2.

Synthetic Case
To demonstrate the performance of the ILUES algorithm, a synthetic case is discussed in this section. The reference soil water content values indicated with red points in Figure 1a,b are derived through the following steps: First, randomly sample the reference parameter values from the corresponding prior uniform distributions consistent with the ranges listed in Tables 1 and 2; second, generate the reference output values by collecting soil water content at 16 different points of the soil profile on the 17 days mentioned in Section 2.2.2, which are derived by running the forward WHCNS model; and third, disturb the simulated soil water content values using the assumed measurement error ε ∼ N (0, 0.005 2 ). The ILUES and MDAES algorithms are executed with the number of iterations as 3 and an ensemble size of 500. Figure 1a shows the fitting results of the simulated soil water content and the corresponding reference values derived by the ILUES algorithm with the RMSE_Maximum a posteriori (RMSE_MAP) is equal to 0.0104, while Figure 1b shows similar results derived by the MDAES algorithm with RMSE_MAP = 0.0093. Then, we come to a preliminary conclusion that the ILUES algorithm is able to obtain a satisfactory fitting effect equivalent to ESMDA. Figure 1b shows similar results derived by the MDAES algorithm with RMSE_MAP = 0.0093. Then, we come to a preliminary conclusion that the ILUES algorithm is able to obtain a satisfactory fitting effect equivalent to ESMDA.  Figure 1 shows that acceptable simulation accuracy can be obtained by both the IL UES and ESMDA algorithms after three iterations. To further demonstrate the proposed algorithm's parameter estimation capabilities, the trace plots of model parameters cali brated by the two methods are compared in Figures 2 and 3. In Figure 2, the ILUES algo rithm's sampling efficiency for all soil hydraulic parameters is higher than that of the ESMDA algorithm because its sampling trace derived converged to the corresponding reference values faster. Figure 3 shows a similar performance, except for Tsum. This resul indicates that the ILUES method's ability to explore the parameter space is superior to the ESMDA method, as noted in the literature [40,41].  Figure 1 shows that acceptable simulation accuracy can be obtained by both the ILUES and ESMDA algorithms after three iterations. To further demonstrate the proposed algorithm's parameter estimation capabilities, the trace plots of model parameters calibrated by the two methods are compared in Figures 2 and 3. In Figure 2, the ILUES algorithm's sampling efficiency for all soil hydraulic parameters is higher than that of the ESMDA algorithm because its sampling trace derived converged to the corresponding reference values faster. Figure 3 shows a similar performance, except for T sum . This result indicates that the ILUES method's ability to explore the parameter space is superior to the ESMDA method, as noted in the literature [40,41].

Practical Case
To further demonstrate the ILUES algorithm's performance compared wit ESMDA method, the soil hydraulic, crop and nitrogen parameters of the WHCNS m are simultaneously identified through the practical W2N1 case mentioned in Table 3 red points shown in Figure 4a,b indicate volumetric soil water content measured depths of 20 cm, 40 cm, 60 cm, 80 cm, 100 cm, 120 cm, 140 cm and 160 cm; further those displayed in Figure 4c,d indicate volumetric soil water content measured depths of 30 cm, 50 cm, 70 cm, 90 cm, 110 cm, 130 cm and 170 cm. Volumetric soil content at each selected depth of soil profile are measured at 17 days, which is ment in Section 2.2.2. That is to say, there are 136 observations at 8 layers for 17 days that to be calibrated, and 119 measurement values at 7 layers for 17 days are used to va the calibrated model's capacity for accurate prediction. Figure 4a,b are the calibrati sults of the ESMDA and ILUES algorithms, and the RMSE_MAP values for bot 0.0255, meaning both methods have a similar data assimilation capability. Figur show validation results for the two different inverse methods with RMSE_MAPs, w are 0.0265 and 0.0269, respectively, indicating that the ILUES algorithm can deriv sonable results without overfitting. Therefore, both methods can reliably optimize th ward model system for accurate predictions.

Practical Case
To further demonstrate the ILUES algorithm's performance compared with the ES-MDA method, the soil hydraulic, crop and nitrogen parameters of the WHCNS model are simultaneously identified through the practical W 2 N 1 case mentioned in Table 3. The red points shown in Figure 4a,b indicate volumetric soil water content measured from depths of 20 cm, 40 cm, 60 cm, 80 cm, 100 cm, 120 cm, 140 cm and 160 cm; furthermore, those displayed in Figure 4c,d indicate volumetric soil water content measured from depths of 30 cm, 50 cm, 70 cm, 90 cm, 110 cm, 130 cm and 170 cm. Volumetric soil water content at each selected depth of soil profile are measured at 17 days, which is mentioned in Section 2.2.2. That is to say, there are 136 observations at 8 layers for 17 days that need to be calibrated, and 119 measurement values at 7 layers for 17 days are used to validate the calibrated model's capacity for accurate prediction. Figure 4a,b are the calibration results of the ESMDA and ILUES algorithms, and the RMSE_MAP values for both are 0.0255, meaning both methods have a similar data assimilation capability. Figure 4c,d show validation results for the two different inverse methods with RMSE_MAPs, which are 0.0265 and 0.0269, respectively, indicating that the ILUES algorithm can derive reasonable results without overfitting. Therefore, both methods can reliably optimize the forward model system for accurate predictions.         Table 3. Figure 6 shows evaluations of the DREAMzs and DREAMkzs algorithms' performances by applying the WHCNS model's parameter calibration based on the practical W1N1 case mentioned in Table 3. We take Pp = 0.7, Ps = 0.1, Pk = 0.2 and Tk = 0.2T, where Pp, Ps and Pk represent the selection probabilities of the parallel direction, snooker and Kalman-inspired proposal distributions, respectively, and T denotes the total number of model evaluations. Both MCMC algorithms were run with the number of chains N set to 3 and the number of generations in each chain T set to 1000. The red points shown in Figure 6a,b indicate the volumetric soil water content measured from depths of 20 cm, 40 cm, 60 cm, 80 cm, 100 cm, 120 cm, 140 cm and 160 cm. Volumetric soil water content at each selected soil profile depth is measured at 17 days, which is mentioned in Section 2.2.2. In total, 136 observations at 8 layers for 17 days need to be calibrated. Figure 6a,b show that the MCMC methods, including both DREAMkzs and DREAMzs, performed well in improving the level of fit between the observed states and simulated results, with RMSE_MAP values of 0.0253 and 0.0255, respectively. This preliminarily proved that the recommended DREAMkzs algorithm is feasible for accurately estimating practical observations.  Table 3. Figure 6 shows evaluations of the DREAMzs and DREAMkzs algorithms' performances by applying the WHCNS model's parameter calibration based on the practical W 1 N 1 case mentioned in Table 3. We take P p = 0.7, P s = 0.1, P k = 0.2 and T k = 0.2T, where P p , P s and P k represent the selection probabilities of the parallel direction, snooker and Kalman-inspired proposal distributions, respectively, and T denotes the total number of model evaluations. Both MCMC algorithms were run with the number of chains N set to 3 and the number of generations in each chain T set to 1000. The red points shown in Figure 6a,b indicate the volumetric soil water content measured from depths of 20 cm, 40 cm, 60 cm, 80 cm, 100 cm, 120 cm, 140 cm and 160 cm. Volumetric soil water content at each selected soil profile depth is measured at 17 days, which is mentioned in Section 2.2.2. In total, 136 observations at 8 layers for 17 days need to be calibrated. Figure 6a,b show that the MCMC methods, including both DREAMkzs and DREAMzs, performed well in improving the level of fit between the observed states and simulated results, with RMSE_MAP values of 0.0253 and 0.0255, respectively. This preliminarily proved that the recommended DREAMkzs algorithm is feasible for accurately estimating practical observations.  Figure 7a,b display the evolution of the RMSE between the simulated and observe soil water content derived by DREAMkzs and DREAMzs, respectively. The horizontal re dashed line depicts the reference value of 0.0255. DREAMkzs significantly accelerates th optimization process compared with DREAMzs. Figure 7c Figure 7a,b display the evolution of the RMSE between the simulated and observed soil water content derived by DREAMkzs and DREAMzs, respectively. The horizontal red dashed line depicts the reference value of 0.0255. DREAMkzs significantly accelerates the optimization process compared with DREAMzs. Figure 7c,d show the distributions of ME and IA obtained by Equations (6) and (8), respectively. The results indicate that the simulated soil water content derived by both algorithms agreed well with the practical observations. Moreover, DREAMkzs improves the sampling efficiency considerably more than DREAMzs. However, the number of generations is small because the original parameter ranges are determined based on the LM algorithm's optimized results. More generations may be required if there are no preliminary optimization results.

Discussion
We found that the proposed parameter optimization methods can explore parameter sampling spaces and considerably improve sampling efficiency, increasing the WHCNS model's prediction accuracy. Reliable simulation results can be derived by the calibrated models, which popularize the WHCNS model. In addition, models with high prediction accuracy can be applied to analyze the effect of climate, types of plants and field management on soil water movement, crop growth and soil nitrogen transport, contributing to the development of field management practices that not only save resources but also decrease the environmental pollution induced by excessive water and fertilizer application.
Obvious differences exist between the two Kalman-formula-based parameter optimization methods mentioned above, even though both of them can identify the model with acceptable accuracy. First, the Kalman formula plays different roles in the two proposed parameter optimization algorithms. The Kalman formula is utilized to update the parameter ensemble, while the Kalman-inspired proposal distribution is just applied to sample a parameter candidate in the DREAMkzs algorithm. Second, the influence degree of the Kalman formula on inversion results is different. In ILUES, the Kalman gain exists throughout the whole parameter inversion process [40]. However, the Kalman-inspired proposal distribution is only used in the first several steps to avoid introducing asymmetry in the sampled candidate states [46]. In addition, parameter calibration results that enable a good fitness to the observed data can make a satisfying prediction under the same

Discussion
We found that the proposed parameter optimization methods can explore parameter sampling spaces and considerably improve sampling efficiency, increasing the WHCNS model's prediction accuracy. Reliable simulation results can be derived by the calibrated models, which popularize the WHCNS model. In addition, models with high prediction accuracy can be applied to analyze the effect of climate, types of plants and field management on soil water movement, crop growth and soil nitrogen transport, contributing to the development of field management practices that not only save resources but also decrease the environmental pollution induced by excessive water and fertilizer application.
Obvious differences exist between the two Kalman-formula-based parameter optimization methods mentioned above, even though both of them can identify the model with acceptable accuracy. First, the Kalman formula plays different roles in the two proposed parameter optimization algorithms. The Kalman formula is utilized to update the parameter ensemble, while the Kalman-inspired proposal distribution is just applied to sample a parameter candidate in the DREAMkzs algorithm. Second, the influence degree of the Kalman formula on inversion results is different. In ILUES, the Kalman gain exists throughout the whole parameter inversion process [40]. However, the Kalman-inspired proposal distribution is only used in the first several steps to avoid introducing asymmetry in the sampled candidate states [46]. In addition, parameter calibration results that enable a good fitness to the observed data can make a satisfying prediction under the same treatment.
However, it does not guarantee comparable estimations accuracy for other types of field management practices.
ILUES and DREAMkzs are newly proposed parameter inversion methods that have not been previously applied in the data assimilation of soil-crop system models. However, soil water content was the single-state variable used for the models' parameter calibrations, and improvements can be made by considering other state variables, such as soil nitrate concentration, crop N uptake and yield [30]. Meanwhile, we consider the soil hydraulic, soil nitrogen transformation and crop parameters simultaneously, which induced a high dimension. Global sensitivity analysis can be adopted to reduce parameter dimensions, as in [47]. In addition, the effects of applying the two proposed methods to high nonlinear models need to be further verified. Meanwhile, an increase in the parameter dimension can reduce the sampling efficiency. Therefore, more attention should be paid to further improving the parameter optimization performance of soil-crop system models.

Conclusions
Both ILUES and DREAMkzs are parameter optimization methods based on the Kalman formula. They can be successfully applied to a data assimilation of the WHCNS model, even though the Kalman formula produces a different effect for them. Our main conclusions are as follows: (1) The models calibrated using both ESMDA and ILUES algorithms under the condition of W 2 N 1 are promisingly robust. The parameter distribution approaches the reference value faster under the artificial situation. Moreover, the performance of ILUES in modeling multimodal parameter distributions under a practical situation is superior to that of ESMDA under the same number of model evaluations condition. (2) DREAMkzs was successful in considerably accelerating the burn-in procedure compared with the DREAMzs with a comparative soil water content fitting accuracy under the same water and nitrate management practices.
Accurate model prediction is conducive to the formulation of field management measures for energy conservation and high yield. Furthermore, reducing input cost and increasing yield can raise farmers' incomes. The applicability of the latest parameter inversion methods in other fields to the WHCHS model can also be evaluated. At the same time, the two data assimilation methods mentioned in this paper can also be applied to other soil-crop system models. In addition, deep learning is currently popular and not limited by Gaussian hypothesis; therefore, it can be considered for introduction into the parameter inversion of a soil-crop system model, whether in whole or part.