An Investigation of Adaptive Radius for the Covariance Localization in Ensemble Data Assimilation

: The covariance matrix estimated from the ensemble data assimilation always suffers from ﬁlter collapse because of the spurious correlations induced by the ﬁnite ensemble size. The localization technique is applied to ameliorate this issue, which has been suggested to be effective. In this paper, an adaptive scheme for Schur product covariance localization is proposed, which is easy and efﬁcient to implement in the ensemble data assimilation frameworks. A Gaussian-shaped taper function is selected as the localization taper function for the Schur product in the adaptive localization scheme, and the localization radius is obtained adaptively through a certain criterion of correlations with the background ensembles. An idealized Lorenz96 model with an ensemble Kalman ﬁlter is ﬁrstly examined, showing that the adaptive localization scheme helps to signiﬁcantly reduce the spurious correlations in the small ensemble with low computational cost and provides accurate covariances that are similar to those derived from a much larger ensemble. The investigations of adaptive localization radius reveal that the optimal radius is model-parameter-dependent, vertical-level-dependent and nearly ﬂow-dependent with weather scenarios in a realistic model; for example, the radius of model parameter zonal wind is generally larger than that of temperature. The adaptivity of the localization scheme is also illustrated in the ensemble framework and shows that the adaptive scheme has a positive effect on the assimilated analysis as the well-tuned localization.


Introduction
Background error covariances play an important role in determining the influence of assimilated observations on the multivariate analysis of model states [1]. However, the background error covariances are generally specified under the assumptions of homogeneity and isotropy at the beginning of the DA window in the four-dimensional variation data assimilation system (4D-Var), and the characteristics of background error introduced by 4D-Var are still limited by the linearity assumption and the usage of a static covariance model at the beginning of each 4D-Var cycle in the mainstream numerical weather prediction (NWP) centers [2][3][4].
Ensemble data assimilation has been widely employed for estimating the background error covariances in numerical oceanographic and meteorological dynamic systems [5][6][7]. However, the ensemble data assimilation scheme suffers from a few problems in which the covariance matrix is essentially replaced with the sample covariance matrix. For the practical implementation in oceanic or atmospheric systems, the ensemble size is typically many orders of magnitude smaller than the dimension of the model states, considering the affordable computational load in the operational system. This phenomenon means that the sample covariance matrix is rank-deficient due to the finite ensemble size, which can lead to the sampling error and the long-range spurious correlations in the sample covariance matrix [8][9][10].
The consequence of the spurious correlations is that a state variable may be incorrectly impacted by an observation that is physically remote, and filter divergence occurs when the analysis adjusted by observation information is unable to more accurately represent the true state. For this reason, the localization technique has been developed to overcome this problem [11][12][13][14]. The most common localization scheme is Schur product covariance localization, which attempts to mitigate spurious correlations by applying a weighting function to cut off the spurious correlations at a long distance. While covariance localization has been useful for ameliorating spurious correlations over long distances, they require exhaustive tuning of the localization radius for determining at which point the spurious correlations should be cut off [15,16]. More recently, to avoid the empirical tuning of the localization parameters, several schemes of adaptive localization have been proposed and validated for ensemble data assimilation depending on their specific implementations [17][18][19][20][21][22], e.g., using an ensemble of ensembles to mitigate spurious correlations such as sampling error correction and reducing correlation sampling for adaptive localization [23][24][25][26]; assuming the localization function can be taken as a power of the background error correlation function [27][28][29]; proposing the theory of optimal linear filtering for adaptive localization [30]; proposing a correlation-based adaptive localization method [31][32][33]; and considering machine learning algorithms to capture the adaptive localization parameters [34][35][36]. The adaptive localization technique has been developed in detail, but some of these methods make strong assumptions or do not always work in ensemble data assimilation (EDA)-like frameworks.
In this paper, an adaptive localization scheme is proposed for Schur product covariance localization. We focus on the implementation of the adaptive localization for the ensemblebased background error covariances with two difficulties: which taper function is used for determining the localization weighting coefficient and how the localization radius can be updated with the weather scenario.
The remainder of this paper is organized as follows: In Section 2, a brief introduction of the ensemble Kalman filter and traditional covariance localization approach is given. Section 3 presents the new adaptive localization scheme and its implementation. In Section 4, the adaptivity of this localization scheme is evaluated in the Lorenz96 model [37], and we also demonstrate the numerical results of the adaptive localization scheme in the Data Assimilation Research Testbed (DART) [38] and Weather Research and Forecasting (WRF) system [39]. Finally, conclusions and remaining issues are discussed in Section 5.

Background Material
In this section, we recall the basics of the ensemble Kalman filter (EnKF) and the localization technique. Table 1 describes some notations used in this paper.

Ensemble Kalman Filter
The EnKF is one of the variations developed from the Kalman filter equations by replacing the single forecast trajectory with an ensemble. The initial ensemble is a collection of perturbed estimates of the background state, and these ensemble members approximate a random sample from the probability distribution of the background states and observations by a Monte Carlo procedure. There are two steps involved in using the Kalman filter to assimilate observations in the system: a forecast step and a Kalman-analysis step.
In the forecast step, the EnKF produces an ensemble of forecasts by propagating with the full nonlinear numerical model to the time when observations become available. This step can be performed over a long time period, and the background error covariance matrix could be flow-dependent. The approximation of the evolved background error covariance can be estimated in the ensemble data assimilation: In this expression, S indicates the matrix containing each forecast ensemble anomalies as a column, ; the superscript T stands for the matrix transpose, the superscript f represents the forecast.
In the Kalman-analysis step, a weighting value between the observation and background errors is calculated in the Kalman filter equations. This weighting determines the degree to which the forecast trajectory can be adjusted towards the observational data, and the update equations are commonly written as follows: where x f and x a denote the forecast and analysis variables, respectively; H is the forward operator that maps the model state to the observational space (m y × m x matrix); R is the observational error covariance (m y × m y matrix).

Traditional Covariance Localization
However, current NWP models have state spaces containing elements on the order of 10 7 , and a resource-limited ensemble size is far smaller than the dimension of the state spaces. The undersampling may produce a reduced rank representation of the background error covariance matrix and the development of spurious long-range correlations. Covariance localization is the most commonly method used to mitigate these issues by modifying the ensemble-based covariance (Equation (1)) with its element-wise product [40][41][42]: where • indicates the Schur product. It is ordinarily achieved by a localization matrix ρ, whose structure is a band of nonzero elements around the main diagonal, with ones on the diagonal and falling to zero at the localization radius distance from the diagonal. The covariance localization attempts to cut-off the spurious correlations in the error covariances at a long distance, and effectively improves the estimates of the background error covariance. Thus, the update equations with covariance localization can be written as: For computational efficiency, the localization operations are implemented using the H in the observational space, ρ • (P f H T ) and ρ • (HP f H T ), respectively.

An Adaptive Localization Scheme
In this section, the adaptive localization scheme will be introduced in detail with the localization taper function and adaptive radius. The implementation of the adaptive scheme in the ensemble framework is also illustrated.

Localization Taper Function
The covariance localization scheme cuts off the long-range spurious correlations with its element-wise product like a weighting function. There are different ways of choosing the localization weighting function in localization schemes, such as taper functions [43], the fifth-order polynomial correlation function [44], and the distance-based localization functions for multiphase flows [45].
In the previous section, it was shown that the flow-dependent background error covariance estimated by the ensemble members is an m x × m x real symmetric matrix. For any nonzero column vector u of m x real numbers, the scalar u T P f u is strictly positive: Therefore, the sample covariance matrix estimated from the ensembles is a positive semi-definite matrix, and the localized covariance ρ • P f is also positive semi-definite since it is the Schur product of two positive semi-definite matrices [46,47]. Another advantage of the Schur product is that it increases the rank of the covariance matrix by cutting off all long-range spurious covariances to zero, and the increasing rank would introduce the extra degrees of freedom to the dimension of the model states and the observations. The Gaspari-Cohn function is very popularly used as localization taper function in ensemble frameworks because it is a compactly supported 5th-order piecewise rational function and its Gaussian-shaped is similar to the probability density function of a normal distribution [44,48]; thus the Gaspari-Cohn function is selected as the taper function for the adaptive localization scheme. The Gaspari-Cohn function has a value decreasing from one to zero in an approximation of a Gaussian distribution between zero and a 2 × cutoff distance, shown in Figure 1, that will be transformed and constructed into a localization matrix with the same dimensions as the covariance matrix.

The Threshold Value of the Localization Radius
At present, defining the localization radius is a heuristic process (i.e., it is based on experience), which is computationally expensive and often needs to be redone whenever the weather situation changes or the ensemble system is modified. If the threshold value is larger, then there is a larger localization radius such that the long-range spurious correlations will not be eliminated and more observational data are used for the model state update in Equation (5). Similarly, If the threshold value is small, then the physical dynamic correlations are excessively damped, and more observational data will be excluded. The root mean square error (RMSE) indicates how closely the estimation matches the reference field and is defined as wherex corresponds to the mean of the updated fields in the ensemble, x re f represents the reference field, and N total is the number of model parameters. The Lorenz96 model is a dynamic system formulated by Edward Lorenz in 1996 [37]. It is defined as follows. For i = 1, ..., m x , where it is assumed that Here, x i is the state of the system, m x = 40, F is a forcing constant, and F = 8 is a common value known to cause chaotic behavior.  Figure 2 shows the RMSE corresponding to different radii in the Lorenz96 model with a 20-member ensemble. It can be noticed that different localization radii have different RMSE in the ensemble assimilation. A well-tuned radius will appropriately remove the unphysical spurious correlations and make the data match well, such as loc2 and loc3 in Figure 2. However, a larger radius, such as the label "loc4", will weaken the effect of removing the spurious correlation and have a poor data match with larger RMSE. And when the localization radius is adjusted to an excessively small value, the specified dynamic correlations will be deteriorated and a smaller radius will exclude many useful neighboring observations.
The tuned radius cannot reflect the real-time characteristics of the ensemble system. The adaptive radius needs to correspond adaptively to the correlation coefficient of the ensemble system since the localization technique is used for ameliorating the spurious correlations.
We define the x i are the ensemble members,x(x = E[x]) is the estimate of the mean µ x . Since ensemble members x are the random sampling using the Monte Carlo method in the EnKF, independent and uncorrelated. According to the Strong Law of Large Numbers, when the sample is infinite (N → ∞) that we can get the estimate from the ensemble is exact: The estimate error is ε = x − µ x , we can get the estimation is unbiased with: is the estimate of the true variance σ 2 x . According to the Central Limit Theorem, the estimate values are normally distributed around the true, and the samples are independent, thus: The standard error of the ensemble estimates approximates the square root of the ensemble size 1 The role of covariance localization is to cut-off spurious correlation between state variables that are spatially distant or not physically correlated. Since the estimation error of the ensemble members based on Monte Carlo sampling is 1 , the spurious correlation in the correlation coefficient has seriously affected the accuracy of the estimated value when the correlation coefficient is less than 1 . Therefore, it is considered that the corresponding distance at which the correlation coefficient is less than is the radius of influence of the localization taper function that most of the spurious correlation can be cut-off with such radius. To avoid the effect of negative correlation coefficients, we use the square of the correlation coefficient for the determination of the localization radius E C(r) 2 ≥ 1 N−1 . Some studies have shown that there are strong links between localization and correlation, when the ensemble distribution is gaussian with the assumption of the sampling error is unbiased [49,50]: where ρ ij is the first-order approximation of the localization coefficient ρ ij and C ij is the element of sample correlation. The threshold value of the localization radius for the Gaspari-Cohn function should make ρ ≥ 0 in the adaptive localization scheme, so that spurious correlations can be accurately eliminated. Thus, the Equation (14) can be written The taper coefficient of the localization weighting function should be zero when E C(r) 2 is less than 1 N−1 , in which the corresponding distance r is the localization radius threshold value. The localization radius can be computed adaptively with known statistical properties of sample covariances and updated with the real-time characteristics in the ensemble system. Figure 3a-d shows the square of the correlation coefficient in the Lorenz96 model with different ensemble members, and the dashed line corresponds to the position of the localization radius calculated by the adaptive localization scheme. It can be noticed that the spurious correlation weakens significantly with the increase of ensemble members (Figure 3e-h), and the adaptive radius reflects the range of spurious correlation well and gives a reasonable radius threshold. The larger ensemble has a larger radius than that of the small ensemble size, which can be explained that the accuracy of the estimated correlations of the small ensemble have been severely contaminated, but it is still accurate at the same distance in a larger ensemble.

Implementation
The adaptive covariance localization scheme can be implemented directly as an independent module in the ensemble framework, and the flowchart is shown in Figure 4. The practical implementation in the ensemble framework can be summarized by the following steps:

1.
Integrate all ensemble members forward to the assimilation time step; 2.
Estimate the sample covariance for the forecast ensemble members; 3.
Calculate the correlation coefficient with the background ensembles; 4.
Update the adaptive localization radius and transform the localization matrix; 5.
Update the analysis field with the observation data, operator and error information; 6.
Integrate the ensemble members forward to the next time step; 7.
Repeat steps 2-6 until the assimilation process is completed.

Figure Legend
In the original publication, there was a mistake in the legend for Figure 8. The original schematic diagram of calculating the correlation coefficient for the variables at a single grid point location with distance is prone to misinterpretation, so the expression was redesigned. The correct legend appears below:

Error in Figure
In

Results and Analysis
In this section, experimental studies of the adaptive scheme application on ensemble framework are examined in Lorenz96 model and WRF model.

Preliminary Evaluation in Lorenz96 Model
The effects of the adaptive scheme on eliminating the spurious correlations is firstly examined with the Lorenz96 model in the EnKF. A perfect model scenario is simulated by a fourth-order Runge-Kutta scheme(RK4) for model integration with time step t = 1/40, which is used to guarantee stability with RK4. We perform a free run of the model from an initial condition, and this integration is considered to be the true state. The observations are created from the true state by adding uncorrelated random noise, and only one observation is available at two model grid points. The representations' covariance matrices of the Lorenz96 model calculated from background ensembles using Equation (1) are shown in Figure 5. Each pixel in the grid represents the covariance between two state variables, and the pixels are colored according to the covariance value. Figure 5a shows the sample background error covariance estimated from the 20-member ensemble, and there are still large undesirable correlations in the covariance matrix far from the leading diagonal, and most of them are contributed by the spurious correlations. Accordingly, observations at one location may have an improper impact on state variables that are far away from the observation location, and the weighting placed on the forecasted state and the observations is inferior in the Kalman gain (Equation (3)), which will greatly damage the quality of the analysis. Figure 5c shows the representation background error covariance matrix in Figure 5a after it has been localized. A striking feature is that the localization technique is like a weighting function to cut off long-range spurious correlations and the correlations close to the diagonal are maintained. The accuracy of the localized covariance matrix from the 20-member ensemble is almost the same as that from the 100-member ensemble (Figure 5b). Figure 6 shows the rank histogram of Lorenz96 model variables with different localization scheme 20 ensemble members. The statistical frequencies of each interval are more uniformly distributed with the adaptive localization scheme, reflecting a more reasonable distribution of ensemble members in the assimilation process, which can well suppress the filtering divergence. The signal-to-noise ratios (SNR) of the raw and localized covariance matrix are 1.282 and 6.102, respectively, and a larger SNR of localized covariance strongly supports the fact that the adaptive localization effectively ameliorates most of the spurious correlations. Another benefit of the localization technique is that the covariance matrix becomes sparse after applying the Schur product (Equation (4)), which can lead to important computational savings in the calculations.  The spread is a direct measurement that qualifies the uncertainty contained in the ensemble members. If the spread of the ensemble members is small, then all the members are in a similar state, and there is a high confidence in the background. In contrast, if the spread of the ensemble members is large, then the ensemble members are widely spaced. After the assimilation of observational data, the RMSE and spread of the ensemble members are reduced, and the spread should ideally match the RMSE if the ensembles estimate the uncertainty of the model states correctly. Figure 7 shows the RMSE and spread of the analysis field with adaptive scheme and the localization tuned manually. The RMSE and spreads of the well-tuned localization scheme and the adaptive scheme descended stably and continuously. At the stable stage, the RMSE and spread of the two experiments are close to each other. The RMSE and the spread of the adaptive scheme reach 0.3604 and 0.4267, respectively, which implies that the ensembles work well on data assimilation. The localization radii during the last 10 steps of the assimilation cycle of the adaptive scheme are represented in Table 2. The localization radius is not constant in the cycle, and the adaptive scheme works well in updating the radius values. These results indicate that the adaptive localization radius is efficient in suppressing the inferior effect of spurious correlations and has positive effects in the data assimilation.

Application in an Atmospheric Model
The performance of the adaptive localization scheme will be investigated in the Weather Research and Forecasting (WRF) model [39] and the Data Assimilation Research Testbed (DART) framework [38]. We select Typhoon Dujuan (2015) as a numerical example, which was generated in the Northwest Pacific on 20 September 2015 and weakened to a cyclone on 30 September. The center's maximum wind speed was about 79 m/s, and its pressure was about 925 hPa. The center of the simulation area is 23.0 • N, 130.0 • E at 1800 UTC 26 September 2015, which is shown in Figure 8; the grid size of the assimilation region is 500 × 400; the horizontal resolution is 10 km. The initial and lateral boundary conditions for the WRF simulation were derived from the National Centers for Environmental Prediction (NCEP) operational Global Forecast System analysis data. A pair of 40-member ensembles is generated by the random perturbations based on the initial conditions. For a more realistic environment, the observations (PREPBUFR files) from the NCEP are assimilated using the ensemble adjustment Kalman filter in DART [51]. The expectations of correlation coefficients at different distances are calculated from the ensemble states by random sampling in the simulation area, which is shown in Figure 9 that the point pair connected by the lime solid line is used to calculate the correlation coefficients at distances of 200 km. The adaptive scheme is easy to implement online in the frameworks of the traditional covariance localization and has a low computational cost that the realizations of random sampling for expectations is roughly in the order of 10 2 .

Text Correction
(1) Following the correction of the derivation of the adaptive localization radius scheme, the text of ** Section 3 **, ** Sub-section 2 ** was edited to reflect the localization radius is obtained adaptively through a certain criterion of correlations with the background ensembles in the adaptive localization scheme.
A correction has been made to ** Section 3 **, ** Sub-section 2 **: We define the are the ensemble members, ̅ ( ̅ = [ ]) is the estimate of the mean . Since ensemble members x are the random sampling using the Monte Carlo method in the EnKF, independent and uncorrelated. According to the Strong Law of Large Numbers, when the sample is infinite ( → ∞) that we can get the estimate from the ensemble is exact: Figure 9. Shows the random sampling for calculating the correlation coefficients. Figure 10a shows the expectations of the correlation coefficient calculated from random sampling. It is obvious that the characteristics of the correlation coefficient vary with the model parameter, which is illustrated by the temperature and zonal wind correlation at the surface. The correlation coefficient gradually decreases with increasing distance, drops from one to approximately zero at first within a certain distance, and then starts to oscillate around zero with increasing distance. In addition, the correlation coefficients of different model parameters have different trends; for example, the correlation coefficient of temperature gradually decreases from one to approximately zero, while the correlation coefficient of zonal wind first decreases below zero, then increases to approximately zero, and then oscillates around zero.
The localization radii of zonal wind and temperature on the vertical levels calculated by the adaptive scheme are illustrated in Figure 10b. There is a striking feature that the optimal localization radius is variable on the model vertical levels; for instance, the localization radius of temperature around the surface is 550 km, which is smaller than 930 km at 500 hPa. The trend of the radius varying with model levels is different with model parameters where the radius of zonal wind is generally larger than that of temperature at the same levels. The optimal radius of the localization function is not fixed for the model parameters, which is flow-dependent with the background ensemble weather scenario. A uniform localization radius tuned manually on the observational data of different variables may reduce the effect of localization on the spurious correlations issue and deteriorate the balance relationship between the model parameters [52]. Single observational data simulations with the adaptive scheme are implemented in the DART-WRF system. The spatiotemporal elution of the observational data increment of different model parameters and its link with the meteorological flow are investigated through Figure 11, from 18:00 UTC on 26 September to 00:00 UTC on 27 September 2015. An overview of Figure 11 outlines the general features of the adaptive localization scheme, where the adaptive localization based on the correlations is changed in accordance with the meteorological flow in the background without tuning the localization radius manually. The temporal evolution of the adaptive localization function is examined in Figure 12a for temperatures in which the localization radius at 00:00 UTC is shorter than that at 18:00 UTC.  It can be observed that the range of the zonal wind increments is greater than the range of temperature at the same time ( Figure 11) and the different increments of the different model parameters indicate that the optimal localization radius vary across the parameters, which is in agreement with the localization radius shown in Figure 12b that the localization radius of zonal wind obtained by the adaptive method is larger than that of temperature. A more detailed examination shows that the observation increment of temperature has almost no negative increment and zonal wind has a large amount of negative increment information, which corresponds to the correlations in Figure 10a that the correlation coefficient of temperature drops directly from one to near zero, while the correlation coefficient of zonal wind drops from one to below zero and then oscillates near zero. The characteristics of different variable observations with different radii are reflected reasonably by the adaptive localization scheme.

Discussion and Conclusions
The ensemble-based covariances estimated from the short-term forecast ensembles are a reduced rank representation of the background error covariance and always suffer from spurious correlation problems. In order to ameliorate the spurious correlations in the ensemble-based covariances, an adaptive localization scheme is presented in this study.
The theoretical part of the ensemble-based covariance shows that the estimated matrix from the ensembles is a positive semi-definite matrix. In addition, this supports the claim that the positive semi-definite feature of the covariance matrix should be maintained after covariance localization. For that purpose, a Gaussian-shaped taper function is used for determining the localization weighting coefficient. The investigation of the threshold value of the localization radius reveals that the adaptive radius has a strong quantitative relationship with the correlation and ensemble size. The spurious correlations generated by the finite ensembles have the same scale as the true correlation coefficient at a long distance and become weaker in the larger ensemble. Note that the accuracy of the estimated correlations of the small ensemble have been severely contaminated, but it is still accurate at the same distance in a larger ensemble, which means that the localization radius of the large ensemble size is larger than that of the small ensemble size. In a real NWP context, the localization radius obtained by the background ensemble is model parameter-and vertical level-dependent, which is reflected in the variation of observational increments. The interesting properties are also closely consistent with the change characteristics of the corresponding correlation coefficient.
Experimental studies of the adaptive scheme application on the ensemble framework present that the adaptive scheme has a positive impact on the RMSE in the analysis compared with the well-tuned localization scheme. The spurious correlations in the ensemble-based covariance matrix have been significantly suppressed, and the assimilation effect is improved with the localization radius is updated adaptively. In addition, it is easy to implement the adaptive scheme in the traditional covariance localization procedure when running an ensemble assimilations.
However, the homogeneous and uniform type of the localization taper function may weaken the anisotropic characteristics of the increment. In the next step, combining the adaptive localization and a wavelet approach may be worth considering. Second, further adaptive applications are currently being explored for the multiple domain according to the weather situation.