Adaptive Conditional Bias-Penalized Kalman Filter for Improved Estimation of Extremes and its Approximation for Reduced Computation

In many signal processing applications of the Kalman filter (KF) and its variants and extensions, accurate estimation of extreme states is often of great importance. When the observations used are uncertain, however, KF suffers from conditional bias (CB), which results in consistent under- and overestimation of extremes in the right and left tails, respectively. Recently, CB-penalized KF, or CBPKF, has been developed to address CB. In this paper, we present an alternative formulation based on variance-inflated KF to reduce computation and algorithmic complexity, and describe an adaptive implementation to improve unconditional performance. For theoretical basis and context, we also provide a complete, self-contained description of CB-penalized Fisher-like estimation and CBPKF. The results from 1-dimensional synthetic experiments for a linear system with varying degrees of nonstationarity show that adaptive CBPKF reduces root mean square error at the extreme tail ends by 20 to 30% over KF while performing comparably to KF in the unconditional sense. The alternative formulation is found to approximate the original formulation very closely while requiring only 1.5 to 3.5 times the computing time of KF, depending on the dimensionality of the problem. Adaptive CBPKF hence offers a significant addition to the dynamic filtering methods for general application in signal processing when accurate estimation of extremes is of importance.


I. INTRODUCTION
Kalman filter (KF) and its variants and extensions are widely used to fuse observations with model predictions in a wide range of applications [20] [21]. In geophysics and environmental science and engineering, the main objective of signal processing is often to improve estimation and prediction of states in their extremes rather than in normal ranges. In hydrologic forecasting, for example, accurate prediction of floods and droughts is far more important than that of streamflow and soil moisture in normal conditions. Because KF minimizes unconditional error variance, its solution tends to improve estimation near the median, where the state of the dynamic system resides most of the time, while often leaving significant biases in the extremes. Such conditional biases (CB) [22] generally result in consistent under- and overestimation of the true states in the upper and lower tails of the distribution, respectively. To address CB, CB-penalized Fisher-like estimation and CB-penalized KF (CBPKF) [23] [24] have recently been developed, which jointly minimize error variance and the expectation of the Type-II CB squared for improved estimation and prediction of extremes. The Type-II CB, defined as $E[\hat{X} \mid X = x] - x$, is associated with failure to detect the event, where $X$, $\hat{X}$ and $x$ denote the unknown truth, the estimate, and the realization of $X$, respectively [25]. The original formulation of CBPKF, however, is computationally very expensive for high-dimensional problems. Also, whereas CBPKF improves performance in the tails, it deteriorates performance in the normal ranges. In this work, we approximate CBPKF with forecast error covariance-inflated KF, referred to hereafter as the variance-inflated KF (VIKF) formulation, as a computationally less expensive and algorithmically simpler alternative, and implement adaptive CBPKF to improve performance in the unconditional mean sense.
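The Type-II CB described above can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a zero-mean, unit-variance Gaussian truth, a single observation Z = X + V, and the minimum-variance linear estimate for that setting. The function name `simulate_type2_cb` and all parameter values are hypothetical. In this setting the estimate is unbiased on average, yet its mean error given X = x equals (gain - 1)x, so large true values are systematically underestimated.

```python
import random
import statistics


def simulate_type2_cb(n_samples=100_000, obs_sd=1.0, threshold=2.0, seed=0):
    """Return (unconditional bias, bias given truth > threshold) for the
    minimum-variance linear estimate of X ~ N(0, 1) from Z = X + V."""
    rng = random.Random(seed)
    gain = 1.0 / (1.0 + obs_sd ** 2)       # optimal gain for unit prior variance
    all_errors, tail_errors = [], []
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)             # truth
        z = x + rng.gauss(0.0, obs_sd)      # uncertain observation
        x_hat = gain * z                    # estimate (prior mean is zero)
        all_errors.append(x_hat - x)
        if x > threshold:                   # condition on the TRUTH (Type-II)
            tail_errors.append(x_hat - x)
    return statistics.fmean(all_errors), statistics.fmean(tail_errors)


unconditional_bias, tail_bias = simulate_type2_cb()
print(f"unconditional bias:   {unconditional_bias:+.3f}")
print(f"bias given truth > 2: {tail_bias:+.3f}")
```

With `obs_sd=1.0` the gain is 0.5, so the conditional bias at a true value x is -0.5x. Increasing `obs_sd` lowers the gain and makes the tail bias more pronounced, which is the effect the CB penalty is designed to counter.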
Elements of CB-penalized Fisher-like estimation have been described in the forms of CB-penalized indicator cokriging for fusion of predicted streamflow from multiple models and observed streamflow [26], CB-penalized kriging for spatial estimation [27] and rainfall estimation [28], and CB-penalized cokriging for fusion of radar rainfall and rain gauge data [29]. The original formulation of CBPKF has been described in [23] and [24]. Its ensemble extension, CB-penalized ensemble KF, or CEnKF, is described in [30] in the context of ensemble data assimilation for flood forecasting.

II. CONDITIONAL BIAS-PENALIZED FISHER-LIKE ESTIMATION
[…] [31] for $X$, or $X^*$, as: […] The error covariance matrix for $X^*$, $E[(X - X^*)(X - X^*)^T]$, is given by: […] With (2), we may write Type-II CB as: […] The observation equation for $Z$ is obtained by inverting (1): […] The ($m \times (n+m)$) matrix, $G$, in (5) is given by: […] where $U^T$ is some ($m \times (n+m)$) nonzero matrix. Using (5) and the identity […], we may write the Bayesian estimate for $E[Z|X]$ in (4) as: […] where […]. Equations (7) and (8) state that the Bayesian estimate of $Z$ given $X$ is given by $HX$ if the a priori state error covariance $\Psi$ is noninformative or there are no observation errors, but by the average of the a priori mean and the observed true state $X$ if the a priori $\Psi$ is perfectly informative or the observations are information-less.
With (4), we may write the quadratic penalty due to Type-II CB as: […] where $I$ denotes the ($m \times m$) identity matrix. Combining $\Sigma_{EV}$ in (3) and $\Sigma_{CB}$ in (9), we have the apparent error covariance, $\Sigma_a$, which reflects both the error covariance and Type-II CB: $\Sigma_a = \Sigma_{EV} + \alpha \Sigma_{CB}$ (10), where $\alpha$ denotes the scalar weight given to the CB penalty term. Minimizing (10) with respect to $W$, or by direct analogy with the Bayesian solution [31], we have: […] The modified structure matrix $\hat{H}$ and observation error covariance matrix $\Lambda$ in (11) are given by: […] Using (11) and the matrix inversion lemma [32], we have for $\Sigma_a$ and $X^*$ in (10) and (2), respectively: […] where $\Delta$ = […]. To render the above Bayesian solution to a Fisher-like solution, we assume no a priori information in $X$ and let $\Psi^{-1}$ in the brackets in (14) and (15) vanish: […] where the scaling matrix $B$ is given by […]. To obtain the estimator of the form, $X^* = WZ$, we impose the unbiasedness condition, $WH = I$. The above condition is satisfied by replacing $[\hat{H}^T \Lambda^{-1} \hat{H}]^{-1}$ with $[\hat{H}^T \Lambda^{-1} H]^{-1}$ and dropping $\Delta$ in (17): […] Finally, we obtain from (3) the error covariance, $\Sigma_{EV}$, associated with $X^*$ in (19): […] Note that, if $\alpha = 0$, we have $\hat{H} = H$ and $\Lambda = R$, and hence the CB-penalized Fisher-like solution, (19) and (20), is reduced to the Fisher solution [31].

III. CONDITIONAL BIAS-PENALIZED KALMAN FILTER
CBPKF results directly from decomposing the augmented matrices and vectors in (19) and (20), as KF does from the Fisher solution [31]. The CBPKF solution, however, is not as simple because the modified observation error covariance matrix, $\Lambda$, is no longer diagonal. An important consideration in casting the CB-penalized Fisher-like solution into CBPKF is to recognize that CB arises from the error-in-variable effects associated with uncertain observations [33], and that the a priori state, represented by the dynamical model forecast, is not subject to CB. We therefore apply the CB penalty to the observations only, and reduce $C$ in (8) to $C_k^T = (C_{1,k}^T \; C_{2,k}^T) = (C_{1,k}^T \; 0)$. Separating the observation and dynamical model components in $\hat{H}$ and $\Lambda$ via the matrix inversion lemma, we have: […] where […] In the above, $H_k$ denotes the ($n \times m$) observation matrix, and $R_k$ denotes the ($n \times n$) observation error covariance matrix. To evaluate the ($m \times n$) matrix, $C_{1,k}^T$, it is necessary to specify $U^T$ in (6). We use $U^T = H^T$, which ensures invertibility of $U^T H$, but other choices are possible. We then have for $C_{1,k}^T$: […] Expanding $W$ in (11) with $\Lambda^{-1}$ partitioned into the blocks $\Gamma_{11,k}$, $\Gamma_{12,k}$, $\Gamma_{21,k}$ and $\Gamma_{22,k}$, we have: […] In (32), the ($m \times n$) and ($m \times m$) weight matrices for the observation and model prediction, $\omega_{1,k}$ and $\omega_{2,k}$, respectively, are given by: […] The apparent CBPKF error covariance, which reflects both $\Sigma_{EV}$ and $\Sigma_{CB}$, is given by (18) as: […] The CBPKF error covariance, which reflects $\Sigma_{EV}$ only, is given by (20) as: […] Because CBPKF minimizes $\Sigma_{a,k|k}$ rather than $\Sigma_{k|k}$, it is not guaranteed a priori that (39) satisfies $\Sigma_{k|k} \le \Sigma_{k|k-1}$. If the above condition is not met, it is necessary to reduce $\alpha$ and repeat the calculations. If $\alpha$ is reduced all the way to zero, CBPKF collapses to KF. The CBPKF estimate may be rewritten into a more familiar form: […] In (40), $Z_k$ denotes the ($n \times 1$) observation vector, and the ($m \times n$) CB-penalized Kalman gain is given by: […] To operate the above as a sequential filter, it is necessary to prescribe $\Psi$ and $\alpha$.
An obvious choice for $\Psi$, i.e., the a priori error covariance of the state, is $\Sigma_{k|k-1}$. Specifying $\alpha$ requires some care. In general, a larger $\alpha$ improves accuracy over the tails but at the expense of increasing unconditional error. Too small an $\alpha$ may not effect a large enough CB penalty, in which case the CBPKF and KF solutions would differ little. Too large an $\alpha$, on the other hand, may severely violate the $\Sigma_{k|k} \le \Sigma_{k|k-1}$ condition, in which case the filter may have to be iterated at additional computational expense with successively reduced $\alpha$. A reasonable strategy for reducing $\alpha$ is […] where $\alpha_i$ denotes the value of $\alpha$ at the i-th iteration [24] [30]. For high-dimensional problems, CBPKF can be computationally very expensive.
Whereas KF requires solving an ($m \times n$) linear system only once per updating or fusion cycle, CBPKF additionally requires solving two ($m \times m$) linear systems (for $C_{1,k}^T$ and $\Gamma_{22}$) and an ($n \times n$) system (for […]$_{11}$), assuming that the structure of the observation equation does not change in time (in which case the term with subscript $2,k$ in (29) may be evaluated only once). To reduce computation, below we approximate CBPKF with KF by inflating the forecast error covariance.

IV. VIKF APPROXIMATION OF CBPKF
The main idea behind this simplification is that, if the gain for the CB penalty, $C$, in (10) can be linearly approximated with $H$, the apparent error covariance $\Sigma_a$ becomes identical to $\Sigma_{EV}$ in (3) but with $\Psi$ inflated by a factor of $1+\alpha$: […] The KF solution for (42) is identical to the standard KF solution but with $\Sigma_{k|k-1}$ replaced by $(1+\alpha)\Sigma_{k|k-1}$: […] With $WH = I$ in (43) for the VIKF solution, we have […] for the apparent filtered error variance of $\hat{X}_{k|k}$ in (42). The error covariance of $\hat{X}_{k|k}$, $\Sigma_{k|k}$, is given by (3) as: […] In (44), the inflated filtered error covariance for a given multiplicative inflation factor is given by: […] Computationally, evaluation of (43) and (44) requires solving two ($m \times n$) and an ($m \times m$) linear systems. As in the original formulation of CBPKF, iterative reduction of $\alpha$ is necessary to ensure $\Sigma_{k|k} \le \Sigma_{k|k-1}$. The above approximation assumes that the CB penalty, $\Sigma_{CB}$, is proportional to the error covariance, $\Sigma_{EV}$. To help ascertain how KF, CBPKF and the VIKF approximation may differ, we compare them in Table I […] forecast error variance $\sigma_{k|k-1}^2 = 1$ and observation error variance of 4 (middle), and $\sigma_{k|k-1}^2 = 4$ and observation error variance of 1 (right). For all cases, we set $h$ to unity and varied $\alpha$ from 0 to 1. The figure indicates that, compared to the KF solution, the VIKF approximation and the CBPKF solution prescribe appreciably larger gains, that the increase in gain is larger for larger $\alpha$, and that the CBPKF gain is larger than the gain in the VIKF approximation for the same value of $\alpha$. The figure also indicates that, compared to KF error variance, CBPKF error variance is larger, and that the increase in error variance is larger for larger $\alpha$. Note that the differences between the KF and CBPKF solutions are the smallest when the forecast error variance, $\sigma_{k|k-1}^2$, exceeds the observation error variance, a reflection of the diminished impact of CB owing to the comparatively smaller uncertainty in the observations. The above development suggests that one may be able to approximate CBPKF very closely with the VIKF-based formulation by adjusting $\alpha$ in the latter.
Below, we evaluate the performance of CBPKF relative to KF and the VIKF-based approximation of CBPKF.
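The VIKF idea can be sketched for the scalar case (m = n = 1) as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the gain is computed from the forecast error variance inflated by (1 + alpha), and the filtered error variance is then evaluated with the uninflated forecast variance using the general expression for a linear update with an arbitrary gain (the Joseph form). The function names and the halving rule for reducing alpha (`shrink=0.5`) are assumptions made for this example.

```python
def vikf_update(x_f, p_f, z, h, r, alpha):
    """Scalar KF update with the forecast error variance inflated by (1 + alpha).

    alpha = 0 recovers the standard KF update.
    """
    p_inflated = (1.0 + alpha) * p_f
    gain = p_inflated * h / (h * p_inflated * h + r)
    x_a = x_f + gain * (z - h * x_f)
    # Error variance of a linear update with this gain (Joseph form),
    # evaluated with the uninflated forecast variance p_f.
    p_a = (1.0 - gain * h) ** 2 * p_f + gain ** 2 * r
    return x_a, p_a, gain


def vikf_update_with_reduction(x_f, p_f, z, h, r, alpha, shrink=0.5, max_iter=30):
    """Reduce alpha until the filtered variance does not exceed the forecast
    variance. The halving rule is an assumption of this sketch."""
    for _ in range(max_iter):
        x_a, p_a, gain = vikf_update(x_f, p_f, z, h, r, alpha)
        if p_a <= p_f:
            return x_a, p_a, gain, alpha
        alpha *= shrink
    x_a, p_a, gain = vikf_update(x_f, p_f, z, h, r, 0.0)  # fall back to KF
    return x_a, p_a, gain, 0.0


# Forecast variance 1, observation error variance 4, h = 1.
for a in (0.0, 0.5, 1.0):
    _, p_a, gain = vikf_update(x_f=0.0, p_f=1.0, z=3.0, h=1.0, r=4.0, alpha=a)
    print(f"alpha={a:.1f}  gain={gain:.3f}  filtered variance={p_a:.3f}")
```

For this setting the gain and the filtered error variance both grow with alpha, in line with the qualitative behavior described above. With alpha = 3 the filtered variance (1.25) would exceed the forecast variance (1), so `vikf_update_with_reduction` halves alpha once before accepting the update.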

V. EVALUATION AND RESULTS
For comparative evaluation, we carried out the synthetic experiments of [24]. We assume the following linear dynamical and observation models with perfectly known statistical parameters: […] where $X_k$ and $X_{k-1}$ denote the state vectors at time steps $k$ and $k-1$, respectively, and $\Phi_{k-1}$ denotes the state transition matrix at time step $k-1$, assumed as $\Phi_{k-1} = \varphi_{k-1} I$ […], and the observation errors are distributed as $N(0, \sigma_{v,k}^2)$ for $i = 1, \ldots, n$. The number of observations, $n$, is assumed to be time-invariant. The observation errors are assumed to be independent among themselves and of the true state. To assess comparative performance under widely varying conditions, we randomly perturbed $\varphi_{k-1}$, $\sigma_{w,k-1}$ and $\sigma_{v,k}$ above according to (48) through (50) below, and used only those deviates that satisfy the bounds: […] In the above, the superscript $p$ signifies that the variable is a perturbation, the noise terms in (48) through (50) denote normally distributed white noise for the respective variables, and the associated standard deviations are those of the white noise added to $\varphi_{k-1}$, $\sigma_{w,k-1}$ and $\sigma_{v,k}$, respectively. The parameter settings (see Table II) are chosen to encompass less predictable (small $\varphi_{k-1}$) to more predictable (large $\varphi_{k-1}$) processes, certain (small $\sigma_{w,k-1}$) to uncertain (large $\sigma_{w,k-1}$) model dynamics, and more informative (small $\sigma_{v,k}$) to less informative (large $\sigma_{v,k}$) observations. The bounds on the perturbed parameters prevent the dynamical model prediction and the observation, $Z_k$, from becoming unrealistically informative, which would otherwise keep the filters operating in unrealistically favorable conditions for extended periods of time. We then apply KF, CBPKF and the VIKF approximation to obtain $\hat{X}_{k|k}$ and $\Sigma_{k|k}$, and verify them against the assumed truth. To evaluate the performance of CBPKF relative to KF, we calculate the percent reduction in root mean square error (RMSE) by CBPKF over KF conditional on the true state exceeding some threshold between 0 and the largest truth. Fig. 2 shows the percent reduction in RMSE by CBPKF over KF for Cases 1 (left), 5 (middle) and 9 (right), representing Groups 1, 2 and 3 in Table II, respectively.
The three groups differ most significantly in the variability of the dynamical model error, and may be characterized as nearly stationary (Group 1), nonstationary (Group 2), and highly nonstationary (Group 3). The range of $\alpha$ values used is [0.1, 1.2] with an increment of 0.1. The numbers of state variables, observations, and updating cycles used in Fig. 2 are 1, 10, and 100,000, respectively, for all cases. The dotted line at 10% reduction in the figure serves as a reference for significant improvement. The figure shows that, at the extreme end of the tail, CBPKF with $\alpha$ of 0.7, 0.6 and 0.5 reduces RMSE by about 15, 25 and 30% for Cases 1, 5 and 9, respectively, but at the expense of increasing unconditional RMSE by about 5%. The general pattern of reduction in RMSE for other cases in Table II is similar within each group and is not shown. We only note here that larger variability in observational uncertainty (i.e., a larger standard deviation of the perturbation to $\sigma_{v,k}$) reduces the relative performance of CBPKF somewhat, and that the magnitude of variability in predictability (i.e., the standard deviation of the perturbation to $\varphi_{k-1}$) has relatively small impact on the relative performance.
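The evaluation procedure above can be mimicked with a much simpler experiment. The sketch below is an illustration under stated assumptions rather than a reproduction of the paper's setup: it uses a scalar state with a single observation per cycle (m = n = 1), fixed parameters without the random perturbations, and the variance-inflated update in place of the original CBPKF formulation. The names `run_experiment`, `conditional_rmse` and all parameter values are hypothetical.

```python
import math
import random


def update(x_f, p_f, z, r, alpha):
    """Scalar update (h = 1) with forecast variance inflated by (1 + alpha)."""
    p_inflated = (1.0 + alpha) * p_f
    gain = p_inflated / (p_inflated + r)
    x_a = x_f + gain * (z - x_f)
    p_a = (1.0 - gain) ** 2 * p_f + gain ** 2 * r   # Joseph form
    return x_a, p_a


def run_experiment(n_steps=50_000, phi=0.8, sigma_w=1.0, sigma_v=2.0,
                   alpha=0.6, seed=1):
    """Simulate x_k = phi * x_{k-1} + w, z_k = x_k + v, and filter the same
    observations with KF (alpha = 0) and the variance-inflated KF."""
    rng = random.Random(seed)
    q, r = sigma_w ** 2, sigma_v ** 2
    x = 0.0
    kf, vikf = (0.0, 0.0), (0.0, 0.0)       # (estimate, error variance)
    truth, kf_est, vikf_est = [], [], []
    for _ in range(n_steps):
        x = phi * x + rng.gauss(0.0, sigma_w)
        z = x + rng.gauss(0.0, sigma_v)
        kf = update(phi * kf[0], phi ** 2 * kf[1] + q, z, r, 0.0)
        vikf = update(phi * vikf[0], phi ** 2 * vikf[1] + q, z, r, alpha)
        truth.append(x)
        kf_est.append(kf[0])
        vikf_est.append(vikf[0])
    return truth, kf_est, vikf_est


def conditional_rmse(truth, estimates, threshold):
    """RMSE over the time steps at which the truth exceeds the threshold."""
    sq = [(e - t) ** 2 for t, e in zip(truth, estimates) if t > threshold]
    return math.sqrt(sum(sq) / len(sq))


truth, kf_est, vikf_est = run_experiment()
for threshold in (-math.inf, 1.0, 2.0, 3.0):
    rmse_kf = conditional_rmse(truth, kf_est, threshold)
    rmse_vikf = conditional_rmse(truth, vikf_est, threshold)
    reduction = 100.0 * (1.0 - rmse_vikf / rmse_kf)
    print(f"truth > {threshold}: RMSE reduction over KF = {reduction:+.1f}%")
```

A threshold of `-math.inf` gives the unconditional comparison, where KF is expected to be at least as good. The conditional comparison at higher thresholds is where a fixed positive alpha is expected to help.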
It was seen in Table I that the VIKF approximation is identical to CBPKF for m=n=1 but for the multiplicative scalar weight for the CB penalty. Numerical experiments indicate that, whereas the above relationship does not hold for other m or n, one may very closely approximate CBPKF with the VIKF-based formulation by adjusting $\alpha$. For example, the VIKF approximation with $\alpha$ increased by a factor of 1.25 to 1.90 differs from CBPKF only by 1% or less for all 12 cases in Table II with m=1 and n=10. The above findings indicate that the VIKF approximation may be used as a computationally less expensive alternative for CBPKF. Table III compares the CPU time among KF, CBPKF and the VIKF approximation for 6 different combinations of m and n using an Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz. The computing time is reported in multiples of that of KF. Note that the original formulation of CBPKF quickly becomes extremely expensive as the dimensionality of the problem increases, whereas the CPU time of the VIKF approximation stays under 3.5 times that of KF for the size of the problems considered.
If the filtered error variance is unbiased, one would expect the mean of the actual error squared associated with the variance to be approximately the same as the variance itself. To verify this, we show in Fig. 3 the filtered error variance vs. the actual error squared for KF (left), the VIKF approximation (middle) and CBPKF (right) for all ranges of filtered error variance. For reference, we plot the one-to-one line representing the unbiased error variance conditional on the magnitude of the filtered error variance and overlay the local regression fit through the actual data points using the R package locfit [34]. The figure shows that all three provide conditionally unbiased estimates of filtered error variance as theoretically expected, and that the VIKF approximation and CBPKF results are extremely similar to each other.
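The check described above can be sketched with binned averages in place of the local regression fit; the substitution is made only to keep the example free of external packages and is not the paper's method. The sketch assumes a scalar state, one observation per cycle, and an observation error standard deviation drawn at random each cycle from a small set known to the filter, so that the filtered error variance actually varies over time. All names and parameter values are hypothetical.

```python
import random


def reliability_by_bin(n_steps=40_000, phi=0.8, sigma_w=1.0, alpha=0.0,
                       obs_sds=(0.5, 1.0, 2.0, 4.0), n_bins=4, seed=2):
    """Return (mean filtered variance, mean squared actual error) per bin,
    with bins holding equal numbers of cycles sorted by filtered variance."""
    rng = random.Random(seed)
    q = sigma_w ** 2
    x, x_a, p_a = 0.0, 0.0, 0.0             # initial state known exactly
    pairs = []
    for _ in range(n_steps):
        sigma_v = rng.choice(obs_sds)        # time-varying, known to the filter
        x = phi * x + rng.gauss(0.0, sigma_w)
        z = x + rng.gauss(0.0, sigma_v)
        x_f, p_f = phi * x_a, phi ** 2 * p_a + q
        p_inflated = (1.0 + alpha) * p_f     # alpha = 0 gives the standard KF
        gain = p_inflated / (p_inflated + sigma_v ** 2)
        x_a = x_f + gain * (z - x_f)
        p_a = (1.0 - gain) ** 2 * p_f + gain ** 2 * sigma_v ** 2
        pairs.append((p_a, (x_a - x) ** 2))
    pairs.sort()
    size = len(pairs) // n_bins
    table = []
    for b in range(n_bins):
        chunk = pairs[b * size:(b + 1) * size]
        mean_var = sum(p for p, _ in chunk) / len(chunk)
        mean_sq_err = sum(e for _, e in chunk) / len(chunk)
        table.append((mean_var, mean_sq_err))
    return table


for mean_var, mean_sq_err in reliability_by_bin():
    print(f"filtered variance {mean_var:.3f}   mean squared error {mean_sq_err:.3f}")
```

If the filtered error variance is conditionally unbiased, the two columns should agree within sampling error in every bin, which corresponds to points falling along the one-to-one line.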

VI. ADAPTIVE CBPKF
Whereas CBPKF or the VIKF approximation significantly improves the accuracy of the estimates over the tails, it deteriorates performance near the median. Fig. 2 suggests that, if $\alpha$ can be prescribed adaptively such that a small/large CB penalty is effected when the system is in the normal/extreme state, the unconditional performance of CBPKF would improve. Because the true state of the system is not known, adaptively specifying $\alpha$ is necessarily an uncertain proposition. There are, however, certain applications in which the normal-vs.-extreme state of the system may be ascertained with higher accuracy than others. For example, the soil moisture state of a catchment may be estimated from assimilating precipitation and streamflow data into hydrologic models [35] [36] [37] [38] [39] [40]. If $\alpha$ is prescribed adaptively based on the best available estimate of the state of the catchment, one may expect improved performance in hydrologic forecasting. In this section, we apply adaptive CBPKF in the synthetic experiment and assess its performance. An obvious strategy for adaptive filtering is to parameterize $\alpha$ in terms of the KF estimate (i.e., the CBPKF estimate with $\alpha = 0$) as the best guess for the true state. The premise of this strategy is that, though it may be conditionally biased, the KF estimate fuses the information available from both the observations and the dynamical model, and hence best captures the relationship between $\alpha$ and the departure of the state of the system from the median. A similar approach has been used in fusing radar rainfall data and rain gauge observations for multisensor precipitation estimation, in which the ordinary cokriging estimate was used to prescribe $\alpha$ in CB-penalized cokriging [29].
Necessarily, the effectiveness of the above strategy depends on the skill of the KF estimate; if the skill is very low, one may not expect significant improvement. Fig. 2 suggests that, qualitatively, $\alpha$ should increase as the state becomes more extreme. To that end, we employed the following model for time-varying $\alpha$: […] where $\alpha_k$ denotes the multiplicative CB penalty factor for CBPKF at time step $k$, $\|\hat{X}_{k|k}\|$ denotes some norm of the KF estimate at time step $k$, and the multiplying factor is the proportionality constant. Fig. 4a shows the RMSE reduction by adaptive CBPKF over KF with $\alpha_k$ proportional to $|\hat{X}_{k|k}|$ for the 12 cases in Table II with m=1 and n=10. The values of the proportionality constant used were 3.0, 1.0 and 0.5 for Groups 1, 2 and 3 in Table II, respectively. The figure shows that adaptive CBPKF performs comparably to KF in the unconditional sense while substantially improving performance in the tails. The rate of reduction in RMSE with respect to the increasing conditioning truth, however, is now slower than that seen in Fig. 2 due to the occurrences of incorrectly specified $\alpha$. To assess the uppermost bound of the feasible performance of adaptive CBPKF, we also specified $\alpha$ with perfect accuracy under (51) by making $\alpha_k$ proportional to $|X_k|$, where $X_k$ denotes the true state. The results are shown in Fig. 4b, for which the values of the proportionality constant used were 3.0, 1.5 and 1.0 for Groups 1, 2 and 3 in Table II, respectively. The figure indicates that adaptive CBPKF with perfectly prescribed $\alpha$ greatly improves performance, outperforming KF even in the unconditional sense. Fig. 4 suggests that, if $\alpha$ can be prescribed more accurately with additional sources of information, the performance of adaptive CBPKF may be improved beyond the level seen in Fig. 4a. Finally, we show in Fig. 5 the example scatter plots of the KF (black) and adaptive CBPKF (red) estimates vs. truth. They are for Cases 1 and 9 in Table II, representing Groups 1 and 3, respectively.
It is readily seen that the CBPKF significantly reduces CB in the tails while keeping its estimates close to the KF estimates in normal ranges.
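The adaptive strategy of this section can be sketched for the scalar case as follows. This is an illustration under several assumptions and not the paper's implementation: the variance-inflated update stands in for CBPKF, a single observation is used per cycle, the penalty weight is set to `kappa` times the absolute value of the alpha = 0 estimate computed from the same forecast, and alpha is halved whenever the filtered variance would exceed the forecast variance. The name `kappa`, its value, and the halving rule are hypothetical choices for this example.

```python
import math
import random


def update(x_f, p_f, z, r, alpha):
    """Scalar update (h = 1) with forecast variance inflated by (1 + alpha)."""
    p_inflated = (1.0 + alpha) * p_f
    gain = p_inflated / (p_inflated + r)
    return x_f + gain * (z - x_f), (1.0 - gain) ** 2 * p_f + gain ** 2 * r


def adaptive_update(x_f, p_f, z, r, kappa, shrink=0.5, max_iter=30):
    """Set alpha from the alpha = 0 estimate, then update with that alpha."""
    x_kf, _ = update(x_f, p_f, z, r, 0.0)    # best guess for the true state
    alpha = kappa * abs(x_kf)
    for _ in range(max_iter):
        x_a, p_a = update(x_f, p_f, z, r, alpha)
        if p_a <= p_f:                       # filtered variance must not grow
            return x_a, p_a
        alpha *= shrink
    return update(x_f, p_f, z, r, 0.0)


def compare(n_steps=100_000, phi=0.8, sigma_w=1.0, sigma_v=2.0, kappa=0.3,
            seed=3):
    """Return rows of (truth, KF estimate, adaptive estimate)."""
    rng = random.Random(seed)
    q, r = sigma_w ** 2, sigma_v ** 2
    x = 0.0
    kf, ad = (0.0, 0.0), (0.0, 0.0)          # (estimate, error variance)
    rows = []
    for _ in range(n_steps):
        x = phi * x + rng.gauss(0.0, sigma_w)
        z = x + rng.gauss(0.0, sigma_v)
        kf = update(phi * kf[0], phi ** 2 * kf[1] + q, z, r, 0.0)
        ad = adaptive_update(phi * ad[0], phi ** 2 * ad[1] + q, z, r, kappa)
        rows.append((x, kf[0], ad[0]))
    return rows


def rmse(rows, column, threshold=-math.inf):
    """RMSE of the given estimate column over rows whose truth exceeds threshold."""
    sq = [(row[column] - row[0]) ** 2 for row in rows if row[0] > threshold]
    return math.sqrt(sum(sq) / len(sq))


rows = compare()
print(f"unconditional RMSE: KF {rmse(rows, 1):.3f}, adaptive {rmse(rows, 2):.3f}")
print(f"RMSE, truth > 3:    KF {rmse(rows, 1, 3.0):.3f}, adaptive {rmse(rows, 2, 3.0):.3f}")
```

Because the gain now depends on the data through alpha, the variance recursion in `adaptive_update` is only approximate. Replacing `abs(x_kf)` with the absolute value of the true state would give the perfectly prescribed variant used to gauge the upper bound of performance.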

VII. CONCLUSIONS
Conditional bias-penalized Kalman filter (CBPKF) has recently been developed to improve estimation and prediction of extremes. The original formulation, however, is computationally very expensive, and deteriorates performance in the normal ranges relative to KF. In this work, we present a computationally less expensive alternative based on the variance-inflated KF (VIKF) approximation, and improve unconditional performance by adaptively prescribing the weight for the CB penalty. For evaluation, we carried out synthetic experiments using linear systems with varying degrees of dynamical model uncertainty, observational uncertainty, and predictability. The results indicate that the VIKF-based approximation of CBPKF provides a computationally much less expensive alternative to the original formulation, and that adaptive CBPKF performs comparably to KF in the unconditional sense while improving estimation of extremes by about 20 to 30% over KF. It is also shown that additional improvement may be possible by improving adaptive prescription of the weight to the CB penalty using additional sources of information. The findings indicate that adaptive CBPKF offers a significant addition to the dynamic filtering methods for general application in signal processing and, in particular, when or where estimation of extremes is of importance. The findings in this work are based on idealized synthetic experiments that satisfy linearity and normality. Additional research is needed to assess performance for nonnormal problems and for nonlinear problems using the ensemble extension [30], and to prescribe the weight for the CB penalty more skillfully.