Mechanism Analysis and Demonstration of Effective Information Extraction in the System Differential Response Inversion Estimation Method

Abstract: The system differential response method for inverse estimation has received much attention in the hydrology literature. However, its underlying mechanisms remain largely unexplored, highlighting the need for this study. This study proposes the relation degree coefficient (RC), a concept describing the nonlinear relationships between different variables, and demonstrates the selective information extraction ability of the method from a theoretical perspective for the first time. Synthetic cases were constructed to demonstrate the performance of the method for various variables to be estimated. The results show that the useful information is extracted from the relationship between the variables to be estimated and the observed discharge. In addition, there is a general trend suggesting that incorporating more variables into the inversion estimation can enhance estimation performance.


Problems of the Optimization Based on Objective Functions
Maximizing effective information utilization is the basis of model establishment, optimal parameter search, and accurate simulation and prediction [1][2][3][4]. The flow information at the outlet of a river basin can be seen as the information treasure trove of hydrological modeling research because it contains all the information that impacts the calculation results of the model. The inversion estimation method of system differential response proposed by our research group provides an effective method for the selective utilization of flow information. It has been widely verified in real-time flood forecasting error corrections, model parameter calibration, and error estimations of rainfall, runoff yield, and initial soil water content. To deeply understand and analyze the theory and mechanism of the system differential response method, this paper analyzes and demonstrates the basic theoretical issues of the extraction effect and selective extraction ability of the inversion estimation information of the system differential response.
Model parameter calibration is the earliest application of the system differential response inversion estimation method [4,5], which has attracted extensive attention [6][7][8][9][10][11]. Calibration methods of model parameters are conventionally based on the responses of the objective function and use the provided information to search for the optimal parameters. These methods have two essential problems [4]. The first problem is that the objective function, e.g., the error square operation, not only complicates the surface of the objective function but also changes the original optimal values and induces irrelevant local optimal values. We illustrate this problem through a parameter estimation example, i.e., estimating a from the nonlinear function given in Equation (1). Let us take the samples (x = 1, y = 2) and (x = 2, y = 2). For the first sample, parameter a has two possible local optimal solutions, a1 = 1 and a2 = −2. There is only one solution for the second sample, a1 = 1. Parameter a should satisfy both samples; hence, a1 = 1 is the unique solution of Equation (1). We can use the sum of squared errors of the two samples as the objective function, as shown in Equation (2).
The local optimal solutions can be obtained by setting the derivative dF(a)/da = 0, which yields the cubic equation
$$6 - a - 3a^{2} - 2a^{3} = 0 \quad (3)$$
In theory, the above equation has three local optimal solutions, and interestingly, −2 is not a local optimal solution of Equation (3). This shows that the square error operation not only induces uncorrelated local optimal solutions (their number increases from 2 to 3) but also changes the solution of the original function.
The second problem is that summing the error squares in the objective function loses much of the effective information for determining parameters. Similarly, we take Equation (1) as an example without losing generality. If the error squares are not aggregated, each error square is taken as an individual objective function, F1(a) and F2(a), as in Equation (4). Setting dF1(a)/da = 0 and dF2(a)/da = 0 then gives Equation (5). The local optimal solutions of Equation (5) are a1 = 1, a2 = −2, and a3 = −0.5. This shows that if the error squares are not summed, the original two local optimal solutions a1 and a2 can still be found. But after summation, a2 cannot be found, as shown above. This implies a significant loss of information in the aggregation operations commonly used in optimization problems. For a complex nonlinear system such as a watershed hydrological model, information loss, changes in local optimal values, and the increase in uncorrelated local optimal values can be more pronounced when objective functions based on the sum of error squares are used. Therefore, as a step forward, the method of system differential response proposed by our research group extracts parameters directly from the surface of the parametric function. This method does not lose information and does not increase the number of local optimal values, and the search speed is very fast. For more information, see papers [4][5][6][7].
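The two-sample example can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes that the model takes the values f(a; x = 1) = a² + a and f(a; x = 2) = 2a; this per-sample form is an assumption inferred from the stated solutions and from the cubic 6 − a − 3a² − 2a³ = 0, and the function names are hypothetical.

```python
# Assumed per-sample model values (hypothetical, inferred from the stated roots):
#   sample 1 (x = 1, y = 2): f1(a) = a**2 + a
#   sample 2 (x = 2, y = 2): f2(a) = 2 * a

def r1(a):
    """Residual of sample 1 under the assumed model."""
    return 2.0 - (a * a + a)

def r2(a):
    """Residual of sample 2 under the assumed model."""
    return 2.0 - 2.0 * a

def dF_sum(a):
    """Half the derivative of the aggregated objective F = r1**2 + r2**2."""
    return r1(a) * (-(2.0 * a + 1.0)) + r2(a) * (-2.0)

def dF1(a):
    """Half the derivative of the individual objective F1 = r1**2."""
    return r1(a) * (-(2.0 * a + 1.0))

def dF2(a):
    """Half the derivative of the individual objective F2 = r2**2."""
    return r2(a) * (-2.0)

def cubic(a):
    """Left-hand side of the cubic 6 - a - 3a^2 - 2a^3 = 0."""
    return 6.0 - a - 3.0 * a**2 - 2.0 * a**3

# Evaluate every candidate solution against each stationarity condition.
for a in (1.0, -2.0, -0.5):
    print(a, dF_sum(a), dF1(a), dF2(a))
```

Under this assumption, a = 1 makes every derivative vanish, while a = −2 is a stationary point of F1 only and disappears once the two squared errors are summed, which is the information loss described above.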
Flood forecasting error correction is another important application field of the method. Hydrologic models are generally designed to model the rainfall-runoff physical process, which is generally considered to be nonlinear and time-varying. Conceptual rainfall-runoff models simplify and conceptualize these complex processes using a set of simple mathematical equations. Conceptual rainfall-runoff models are generally reported to be robust and reliable in flood forecasting applications. However, it is generally accepted that the satisfactory application of such a model can be hampered by many factors, including errors in the inputs and inadequacy of the model, which generally includes errors in the model structure, errors in the model parameters, and errors in the model initial conditions. Many studies have indicated that sequential data assimilation methods represented by particle filtering and the ensemble Kalman filter, as well as variational assimilation methods exemplified by the 4D-variational method, can effectively reduce the uncertainties of flood forecasting. However, the successful application of these algorithms heavily relies on statistical assumptions that align with real-world conditions. In practice, it is often challenging to provide precise statistical assumptions. For instance, estimating the background error covariance in variational methods is difficult. Assumptions are typically made based on empirical evidence. If the given statistical assumptions deviate significantly from the actual conditions, the effectiveness of these methods might be compromised, sometimes even leading to divergence in results.
The method is applied to estimate the errors existing in rainfall [12], runoff yield [13], and soil water content [14], and the results demonstrate that the method can significantly improve the forecasting accuracy. For example, the proposed method outperforms the autoregressive technique when applying the methods to improve the forecast performance of a lumped hydrological model [15]. As the driving force input of the hydrological model, the error in rainfall is critical to the model's accuracy. There are two types of rainfall errors: the observation error caused by instruments and equipment of measuring points and the error caused by obtaining the areal rainfall from the measured point rainfall. In arid and semi-arid basins, the second type of error is often the main error source of the model [12,13,16]. The dynamic system inversion estimation method of differential response is used to dynamically estimate the rainfall error, which indirectly modifies the flood prediction results and greatly improves the flood prediction accuracy [12,13]. Later, many scholars put forward improved methods accordingly [14][15][16][17]. Rainfall is not only the main error source of the hydrological model but also the main source of error in the integrated water-sediment model [16]. The static system differential response inversion method is further used to estimate the total rainfall of each flood or rainfall error [16] to reduce the influence of rainfall error on the flow and sediment model simulation results. The error of calculated runoff is one of the main sources of model calculation error [13].
The inversion estimation method of system differential response is used in the estimation of runoff error [13,18,19], and a variety of improved methods have been derived [20][21][22][23][24], which has received much attention in the context of hydrological modeling [25][26][27][28][29][30]. Other applications include the inversion correction of initial soil water content error [14], the inversion correction of sediment yield error [31], and the error correction of comprehensive multi-factors [15]. However, these applications are mainly demonstrations of the effect of the method, and some theoretical problems of the method itself are rarely explored. In this paper, the mechanism of how the system differential response extracts effective information from the flow hydrograph is expounded and demonstrated theoretically.

The Inverse Estimation Method of System Differential Response and Theoretical Demonstration of Its Effective Information Extraction

Basic Method
A hydrological model can be expressed as
$$Q_t = f(\Phi; t) \quad (6)$$
where Q is the model output; Φ is a vector composed of model parameters, inputs, intermediate variables, and initial values to be estimated; and t is a discrete-time auxiliary variable.
According to the inversion estimation method of system differential response [4,5], the cyclic estimation Formula (7) can be used to calculate the Φ to be estimated. In Formula (7), j is the number of estimation cycles, the deviation term denotes the deviation of the model simulation, and S is the system differential response matrix. Equation (7) can be resolved iteratively by minimizing the sum of squared deviations of the model simulation. That is, the iterative estimation makes Equation (8) satisfy Equation (9). Therefore, the system differential response method is convergent and optimal.
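The cyclic estimation can be illustrated with a short sketch. It assumes that one cycle takes the common least-squares (Gauss-Newton type) form Φ(j+1) = Φ(j) + (SᵀS)⁻¹Sᵀ(Q − f(Φ(j); t)), with S approximated by forward differences; this form, the toy model, and all function names are assumptions made for illustration rather than the exact Formula (7).

```python
import math

def model(phi, t):
    """Hypothetical toy model f(phi; t) = phi[0] * exp(-phi[1] * t)."""
    return phi[0] * math.exp(-phi[1] * t)

def sensitivity_matrix(f, phi, times, h=1e-6):
    """Forward-difference approximation of S[i][k] = d f(phi; t_i) / d phi[k]."""
    base = [f(phi, t) for t in times]
    S = [[0.0] * len(phi) for _ in times]
    for k in range(len(phi)):
        bumped = list(phi)
        bumped[k] += h
        for i, t in enumerate(times):
            S[i][k] = (f(bumped, t) - base[i]) / h
    return S

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def differential_response_update(f, phi, times, observed):
    """One cycle (assumed form): phi + (S^T S)^-1 S^T (observed - simulated)."""
    S = sensitivity_matrix(f, phi, times)
    dq = [q - f(phi, t) for q, t in zip(observed, times)]
    m = len(phi)
    StS = [[sum(row[a] * row[c] for row in S) for c in range(m)] for a in range(m)]
    Stq = [sum(row[a] * d for row, d in zip(S, dq)) for a in range(m)]
    step = solve(StS, Stq)
    return [p + s for p, s in zip(phi, step)]

# Synthetic, error-free observations generated from known "true" values.
times = list(range(10))
true_phi = [2.0, 0.5]
observed = [model(true_phi, t) for t in times]

phi = [1.5, 0.4]  # initial guess
for j in range(10):
    phi = differential_response_update(model, phi, times, observed)
print(phi)
```

Because the update works on the deviation series itself rather than on an aggregated objective value, each cycle uses the full shape of the response curves; with error-free data and a nearby initial guess, the iteration in this toy case returns to the predefined values.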

The Relation Degree Coefficient and the Information Extraction Mechanism
For the convenience of demonstration without losing generality, it is assumed that the model has only one factor P to be estimated. According to the inversion estimation method of system differential response (Equation (7)), P is estimated as Equation (10).
In Equation (10), $\{\partial f(P;t)/\partial P|_{P=P^{(j)}}\}_{t=1}^{n}$ is the system differential response curve series evaluated at $P = P^{(j)}$, $P^{(j)}$ is known at step j, and $P^{(j+1)}$ is the new value to be estimated from the system differential response. In analogy to the concept of the correlation coefficient in statistics, Equation (10) is transformed to Equation (11) to demonstrate the mechanism of information extraction of system differential response.
The relation degree coefficient (RC), remaining information (RI), and unit information (UI) are then defined as follows.
It can be seen from Equation (12) that if the two series $\{Q_t - f(P^{(j)}; t)\}_{t=1}^{n}$ and $\{\partial f(P;t)/\partial P|_{P=P^{(j)}}\}_{t=1}^{n}$ have a zero mean, RC is equivalent to the correlation coefficient. Therefore, RC can be understood as the coefficient reflecting the correlation degree between the series of differential response curves of the system and the series of deviations calculated by the model. Moreover, it can be proved mathematically that the range of the RC value is between −1 and 1, similar to that of the correlation coefficient. The residual information RI is expressed by overall differences between the observed flow and the flow calculated by the model and thus usually contains both effective and harmful information. The harmful information should not affect the inversion estimation result of P. Therefore, RC can be used to filter out harmful information. UI is the amount of effective information needed to estimate a unit variable, which is numerically equal to the sum of the system differential response curve squares. It can be seen that RC expresses the correlation between the differential response curve and the deviation series of the model calculation, suggesting that the method extracts the effective information needed for estimation through the correlation between the differential response curve and the residual information series.
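The filtering role of RC can be made concrete with a small sketch. It assumes that RC is the uncentered correlation between the deviation series and the differential response series (the form that reduces to the correlation coefficient for zero-mean series) and that the single-factor increment is the least-squares ratio Σ s·d / Σ s²; both forms, the sample numbers, and the function names are illustrative assumptions.

```python
import math

def relation_degree(deviation, response):
    """Assumed RC: uncentered correlation of the deviation and response series."""
    num = sum(d * s for d, s in zip(deviation, response))
    den = math.sqrt(sum(d * d for d in deviation) * sum(s * s for s in response))
    return num / den

def single_factor_step(deviation, response):
    """Assumed least-squares increment for one factor: sum(s * d) / sum(s * s)."""
    unit_information = sum(s * s for s in response)  # sum of squared responses
    return sum(d * s for d, s in zip(deviation, response)) / unit_information

# Hypothetical differential response curve of the factor P.
response = [0.1, 0.3, 0.4, 0.2]
# "Harmful" component chosen to be orthogonal to the response curve.
harmful = [0.3, -0.1, 0.0, 0.0]
# Deviation series: an error of 2 units in P plus the harmful component.
deviation = [2.0 * s + h for s, h in zip(response, harmful)]

rc = relation_degree(deviation, response)
step = single_factor_step(deviation, response)
print(rc, step)
```

In this constructed case the increment recovers the 2-unit error in P even though the deviation series contains an additional component, because that component is uncorrelated with the response curve. The Cauchy-Schwarz inequality keeps the assumed RC within −1 and 1.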
The differential response curve of the system varies with different variables to be updated, and the statistical characteristics of the associated residual information series also differ.This achieves the distinction and extraction of the necessary information for updating different factors.

Analysis and Demonstration of Information Extraction Effect
It is sometimes difficult to demonstrate the rationality of a new method from the mathematical perspective, and hence, synthetic cases are used for in-depth analysis in this study.

Synthetic Case
Because the true model structure, parameters, inputs, outputs, and all related errors are unknown in real-world applications, we construct a synthetic case using a conceptual hydrological model, in which all the true values are known (predefined). Therefore, the performance of the method can be directly verified and compared by evaluating the accuracy of the updated variables. Specifically, the model structure is set, the model parameters and inputs are given, and the model output is calculated according to the inputs. As such, the error-free synthetic case is constructed. Then, a series of random errors is generated and superimposed on the "true" values, according to the need for demonstration, to form a synthetic case with errors closer to reality. In this study, the random number series follows a uniform distribution ranging from 0 to 1. Then, the error series $\{e_t\}_{t=1}^{n}$ is calculated by Equation (13), where $e_{\max}$ is the maximum possible error, which is proportional to the mean absolute error of the study variable.
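The construction of a noisy synthetic series can be sketched as follows. The mapping from a uniform random number r in [0, 1) to an error, e = e_max(2r − 1), is an assumption chosen so that errors are symmetric about zero and bounded by e_max; it is not necessarily the form of Equation (13), and the "true" series below is made up for illustration.

```python
import random

def error_series(n, e_max, rng):
    """Assumed mapping: uniform draws in [0, 1) become errors in [-e_max, e_max)."""
    return [e_max * (2.0 * rng.random() - 1.0) for _ in range(n)]

# Hypothetical error-free ("true") series of the study variable.
truth = [1.0, 4.2, 7.6, 6.8, 2.4]
e_max = 0.5  # illustrative maximum possible error

rng = random.Random(42)  # fixed seed so the synthetic case is reproducible
errors = error_series(len(truth), e_max, rng)
noisy = [q + e for q, e in zip(truth, errors)]
print(noisy)
```

Fixing the seed keeps the synthetic case repeatable, so that differences between runs reflect the estimation method rather than the random draw.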

Single-Factor Information Extraction under the Influence of Noise Intensity
The most important question in demonstrating an information extraction method is whether the extraction is effective and how its effect is influenced by errors. In this paper, we therefore take the unit hydrograph confluence model as an example to analyze the error inversion estimation of flood runoff. The unit hydrographs are shown in Figure 1. Suppose there are two runoff generation periods with 10 mm and 12 mm runoff depths in the synthetic case. One hundred thousand sets of error series generated by Equation (13) are added to the process of runoff and flow in each period. The calculation results are shown in Table 1. In the table, V1 and V2 are the mean square errors of the estimated runoff. δ1 and δ2 are the relative errors of the estimated runoff yield. α represents the proportional coefficient between the maximum possible error and the average discharge, as shown in Equation (14).
$$e_{\max} = \alpha \bar{Q} \quad (14)$$
where $\bar{Q}$ is the average flow. We can see from Equation (14) that α also reflects the ratio of the absolute mean value of the error series to the average discharge.
According to the results in Table 1, the mean square error and the relative error of the runoff estimation are both at a small level, less than 0.048 and 3.8%, respectively, although the error proportion coefficient relative to the average flow is 0.2, which is quite a large scale for the average flow error.This indicates that the inversion estimation method of system differential response has a good effect when there is a single factor error.
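An experiment of this kind can be sketched in a few lines. The sketch below is a simplified stand-in, not the configuration behind Table 1: the unit hydrograph ordinates are hypothetical, errors are added to the flow series only, the error mapping e = e_max(2r − 1) is assumed, and 2000 trials are used instead of one hundred thousand. Because routing is linear in the runoff depths, the least-squares estimate coincides with a single differential response cycle.

```python
import random

UH = [0.1, 0.3, 0.4, 0.2]            # hypothetical unit hydrograph ordinates
TRUE_RUNOFF = [10.0, 12.0]           # runoff depths of the two periods (mm)
N = len(UH) + len(TRUE_RUNOFF) - 1   # length of the routed flow series

def response_curves():
    """Curve k is the flow response to one unit of runoff in period k."""
    cols = []
    for k in range(len(TRUE_RUNOFF)):
        col = [0.0] * N
        for i, u in enumerate(UH):
            col[k + i] = u
        cols.append(col)
    return cols

def route(runoff, cols):
    """Linear routing: flow is the runoff-weighted sum of the response curves."""
    return [sum(r * col[t] for r, col in zip(runoff, cols)) for t in range(N)]

def estimate_runoff(flow, cols):
    """Least-squares estimate of the two runoff depths (2x2 normal equations)."""
    s1, s2 = cols
    a11 = sum(x * x for x in s1)
    a12 = sum(x * y for x, y in zip(s1, s2))
    a22 = sum(y * y for y in s2)
    b1 = sum(x * q for x, q in zip(s1, flow))
    b2 = sum(y * q for y, q in zip(s2, flow))
    det = a11 * a22 - a12 * a12
    return [(a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]

def monte_carlo(alpha, trials, seed=0):
    """Average the estimates over noisy flows with e_max = alpha * mean flow."""
    rng = random.Random(seed)
    cols = response_curves()
    clean = route(TRUE_RUNOFF, cols)
    e_max = alpha * sum(clean) / N
    totals = [0.0, 0.0]
    for _ in range(trials):
        noisy = [q + e_max * (2.0 * rng.random() - 1.0) for q in clean]
        est = estimate_runoff(noisy, cols)
        totals = [s + e for s, e in zip(totals, est)]
    return [s / trials for s in totals]

mean_est = monte_carlo(alpha=0.2, trials=2000)
print(mean_est)
```

The spread of individual estimates depends strongly on the shape and length of the unit hydrograph, so the numbers produced by this toy setup are not comparable with those in Table 1; the sketch only shows how such an experiment is organized.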

Multi-Factor Information Extraction Affected by Noise Intensity
Multi-factor information extraction is more complex than that of a single factor, and the biggest problem is whether the information can be extracted separately when errors arise from multiple factors. For the convenience of the graphic display of the extraction effect, this paper takes the unit hydrograph routing model as an example to analyze and demonstrate the mechanism. Then, the general effects of the method under multi-factor errors are verified through the final performance of improving runoff simulations.

Feature Analysis of Distinguishing and Extracting Information for Two-Factor Inversion Estimation
Figure 1 shows the unit hydrographs (i.e., system differential response curves) of the surface and underground runoff components (denoted as "Us" and "Ug", respectively). Figure 2 shows the total residual information of discharge and the error series (denoted as "DQ" and "e", respectively). It can be seen from the figure that the residual information with rapid changes in the early stage is positive, while negative information with relatively gradual changes is observed in the late stage. Figure 3 shows the effective information processes of surface runoff and underground runoff extracted by the inversion estimation method of system differential response. These graphs show the information features extracted by the system differential response inversion method: through the correlation between each differential response curve and the residual flow process information, the method extracts the components of the residual information that have the same shape as the surface and underground runoff differential response curves, and these components yield the updated surface and underground runoff.
Water 2023, 15, x FOR PEER REVIEW

Demonstration of Multi-Factor Information Extraction Effect
In the watershed hydrological system, rainfall (P), evaporation (E), initial soil water content (W), and runoff yield (R) are the main factors affecting the final flow calculation performance. Although conducting theoretical and visual analyses like single-factor and two-factor analyses is impossible, the effect of system differential response inversion estimation can still be comprehensively verified using a synthetic case. We constructed the synthetic case by applying the Xinanjiang hydrological model in the Shuangjiangxi basin. The Xinanjiang (XAJ) rainfall-runoff model is a conceptual rainfall-runoff model proposed by Zhao [32] and has been extensively used in most humid and semi-humid regions in China. At the heart of its architecture is the proposition that a watershed's soil moisture storage capacity distribution can be systematically represented by a specific curve. A typical structure of the XAJ model consists of a runoff generation module, a runoff separation module, a flow concentration module, and a routing module.
Rainfall and evaporation observation data from 32 historical floods are used to calculate the synthetic flood process. Error series of rainfall, evaporation, initial soil water, and discharge are randomly generated in the same way as in the previous section. The same statistical index of flow is used to compare the effect of error inversion estimation of different factors. The calculation procedure is shown in Figure 4, and the results are in Table 2. In the table, R² is the coefficient of determination, and RE is the relative effective coefficient. An R² value of 1 indicates a perfect match of the model output to the observed data, while an R² value of 0 means that the performance of the model is only as accurate as the mean of the observations. RE represents the degree of reduction of the model calculation error using the inverse estimation method of system differential response compared to the original simulations. P, E, W, and R in brackets after RE are the estimated rainfall, evaporation, initial soil water, and runoff, respectively. The R² and RE calculation formulas are shown in Equations (15) and (16), respectively.
In Equations (15) and (16), Q, QC, and QCU represent the observed flow, model calculated flow, and estimated flow, respectively.
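The two indices can be sketched as follows. The formulas are assumptions consistent with the verbal definitions above: R² is taken as one minus the ratio of the squared simulation error to the squared deviation of the observations from their mean, and RE as one minus the ratio of the squared error after updating to the squared error before updating. They are not necessarily identical to Equations (15) and (16), and the sample series are made up.

```python
def r_squared(observed, simulated):
    """Assumed R^2: 1 - sum((Q - Qsim)^2) / sum((Q - mean(Q))^2)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((q - s) ** 2 for q, s in zip(observed, simulated))
    sst = sum((q - mean_obs) ** 2 for q in observed)
    return 1.0 - sse / sst

def relative_effectiveness(observed, calculated, updated):
    """Assumed RE: 1 - sum((Q - QCU)^2) / sum((Q - QC)^2)."""
    err_before = sum((q - c) ** 2 for q, c in zip(observed, calculated))
    err_after = sum((q - u) ** 2 for q, u in zip(observed, updated))
    return 1.0 - err_after / err_before

# Made-up series: observed flow Q, calculated flow QC, and updated flow QCU.
Q = [1.0, 2.0, 3.0, 4.0]
QC = [1.5, 2.5, 2.5, 3.5]
QCU = [1.1, 2.1, 2.9, 3.9]

print(r_squared(Q, QC), r_squared(Q, QCU), relative_effectiveness(Q, QC, QCU))
```

Under these assumed definitions, RE equals 1 when the update removes the calculation error entirely and 0 when it brings no improvement.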
In Table 2, findings derived from 32 distinct flood events are detailed. The results of R² underscore the significant reduction in forecasting errors achieved by implementing distinct inversion estimates for each flood event. The average REs of the 32 floods are 0.832, 0.726, 0.695, 0.904, and 0.835 for single-factor P, E, W, R, and multi-factor PEW, respectively. The most effective among them is the inversion estimation of runoff, followed by the simultaneous inversion estimation of P, E, and W, followed by the single-factor estimation of P, E, and W. The reason is that most of the error factors, including the errors from rainfall, evaporation, initial soil moisture, and the structure of the runoff model, are embedded in runoff errors and thus are considered indirectly when we estimate the runoff errors. However, the simultaneous estimation of P, E, and W only considers the errors of P, E, and W without considering those of the model structure. The single-factor estimation considers a single error factor, resulting in a lower RE. The results show that the more error factors are involved in the inversion estimation, the more effective information contained in the flow hydrograph can be extracted by the system differential response, which makes the inversion estimation better.

Conclusions
The system differential response method for inverse estimation has received much attention in the hydrology literature, while its underlying mechanisms remain largely unexplored. To this end, this paper analyzes the method's formation mechanism of selective extraction from a theoretical perspective and proposes a new concept (i.e., relation degree coefficient) to analyze the correlation between nonlinear functional variables. Then, we analyze the mechanism of selective extraction from multiple perspectives and construct synthetic cases to demonstrate the effect of the selective extraction of inversion estimation information of the system differential response. The major conclusions of this paper are as follows:
(1) The inversion estimation method of system differential response can selectively extract effective information, and its mechanism is based on the correlation between the system differential response curve of the factors to be estimated and the information contained in the flow hydrograph.
(2) The more error factors considered in the inversion estimation, the more effective information contained in the flow hydrograph can be extracted by the system differential response, which makes the inversion estimation better.
In this study, synthetic cases were utilized to verify the effectiveness of the proposed method. Therefore, an intuitive idea for future research is to employ real-world cases to further substantiate and augment our findings. The method should be more broadly tested by expanding case studies across different hydrological models and study areas. For example, one can utilize the method to enhance the forecast performance of the HyMOD hydrological model using the CAMELS dataset [33,34]. Additionally, this study analyzed the mechanism of information extraction through the correction of model inputs and model variables. Therefore, future work is needed to include more factors, such as the model parameters and model structure. While the proposed method has already proved its usefulness in the hydrologic community, it also holds potential as a robust estimation tool in other domains, such as target tracking and assessing environmental sustainability [35,36]. For instance, the proposed method can be used to estimate the states and parameters of a maneuvering target by augmenting the state vector with model parameters.

Figure 1. The unit curves of flow routing. Us is the unit hydrograph of surface runoff, and Ug is the unit hydrograph of underground runoff.

Figure 2. Model-calculated flow deviations and error series. DQ represents the total residual information process of the flood discharge, and e represents the error series.

Figure 3. The process line of extracted effective information. IRS and IRG represent the effective information processes of surface runoff and underground runoff, respectively, extracted by the system differential response inversion estimation method.

Author Contributions: Y.C.: Conceptualization, Methodology, Software, Validation, Formal analysis, Data curation, Writing-original draft, Writing-review and editing, Visualization, Funding acquisition. K.L.: Writing-review and editing, Supervision. S.J.: Writing-review and editing, Visualization. Y.S.: Visualization. H.C.: Editing. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Table 1. Results of the estimated runoff obtained by the system differential response method. V1 and V2 are the mean square errors of estimated runoff. δ1 and δ2 are the relative errors of the estimated runoff yield. α represents the proportional coefficient between the maximum possible error and the average discharge.

Table 2. Multi-factor inversion estimation performance. Flood code represents a unique identifier assigned to each analyzed flood event. R² represents the coefficient of determination, indicating the reliability of the model fit. RE(P), RE(E), and RE(W) represent the relative effective coefficient obtained when estimating precipitation, evaporation, and initial soil water content, respectively. RE(PEW) represents the relative effective coefficient obtained when P, E, and W are estimated simultaneously.