Evaluating the Role of the EOF Analysis in 4 DEnVar Methods

The four-dimensional variational data assimilation (4DVar) method is one of the most popular techniques used in numerical weather prediction. Nevertheless, the needs of the adjoint model and the linearization of the forecast model largely limit the wider applications of 4DVar. 4D ensemble-variational data assimilation methods (4DEnVars) exploit the strengths of the Ensemble Kalman Filter and 4DVar, and use the ensemble trajectories to directly estimate four-dimensional background error covariance. This study evaluates the role of the empirical orthogonal function (EOF) analysis in 4DEnVars. The widely-recognized 4DEnVar method DRP-4DVar (the Dimension-reduced projection 4DVar) is adopted as the representation of EOF analyses in this study. The performance of the Dimension-reduced projection 4DVar (DRP-4DVar), 4DEnVar (i.e., another traditional 4DEnVar scheme without EOF transformation), and the Ensemble Transform Kalman Filter (ETKF) was compared to demonstrate the effect of the EOF analysis in DRP-4DVar. Sensitivity experiments indicate that EOF analyses construct basis vectors in eigenvalue space and the dimension reduction in the DRP-4DVar approach helps improve computational efficiency and analysis accuracy. When compared with 4DEnVar and the ETKF, the DRP-4DVar demonstrates similar analysis root-mean-square error (RMSE) to 4DEnVar, whereas it surpasses the ETKF by 22.3%. In addition, sensitivity experiments of DRP-4DVar on the ensemble size, the assimilation window length, and the standard deviation of the initial perturbation imply that the DRP-4DVar with the optimized EOF truncation number is robust to a wide range of the parameters, but extremely low values should be avoided. The results presented here suggest the potential wide application of EOF analysis in the hybrid 4DEnVar methods.


Introduction
The four-dimensional variational data assimilation (4DVar) method, the Ensemble Kalman Filter (EnKF), and "hybrid" methods are the most promising methods to provide optimal analysis for numerical weather prediction (NWP) [1,2].The traditional 4DVAR technique could optimally fit observations at multiple times through the trajectory of the model solution while constraining the output with model dynamics and physics, but it requires the adjoint and tangent linear approximation of the forecast model due to the implicit expression of the control variables in the cost function.On the Atmosphere 2017, 8, 146; doi:10.3390/atmos8080146www.mdpi.com/journal/atmosphereother hand, the usual EnKF is easier to perform and could provide flow-dependent error estimates of the background errors by forecasting the statistical characteristics, but it lacks the dynamic constraint as in 4DVar.The 4DEnVar methods aim to retain the advantages of 4DVar and EnKF while avoiding the need of an adjoint or tangent linear model of the forecast model, and it has become increasing popular.Lorenc and Kalnay et al. suggest the encouraging achievement of the 4DEnVar methods in advancing data assimilation [3,4].Gustafsson also points out that the optimal approach is to combine the best ideas of 4DVar and EnKF [5].Previous studies have evaluated the performance of 4DVar and EnKF with comparing experiments by Caya et [6][7][8][9][10][11][12].With the development of hybrid methods, recent studies have shown favorable development of a hybrid method which combines the advantages of 4DVar and EnKF [8][9][10]13].Many hybrid methods have been proposed using 4DVar or EnKF in recent years.Similar to the extension of the three dimensional variational technique (3DVar) to the 4DVar for assimilating asynchronous observations, Evensen and Van Leeuwen, Hunt et al., and Fertig et al. also extend EnKF to the Ensemble Kalman Smoother (EnKS) and a four-dimensional Local Ensemble Transform Kalman Filter (4D-LETKF), respectively [14][15][16].With the idea of employing ensemble forecast to provide flow-dependent background error covariance matrices, Zhang et al. coupled 4DVar with EnKF to produce the Ensemble 4DVar method which runs both EnKF and 4DVar, and replaces the ensemble mean of EnKF with the 4DVar analysis [3,17,18].
Both four-dimensional ensemble-variational data assimilation methods (4DEnVars) and En4DVars use a hybrid combination of fixed background error covariance with localized covariance from an ensemble of current forecasts designed to provide flow-dependent error estimates.The fundamental difference between the methods is their modeling of the time evolution of errors: En4DVars also uses a linear model and its adjoint, while 4DEnVar uses a localized linear combination of nonlinear forecasts [19].Furthermore, the 4DEnVar methods require no adjoint model, therefore they are easy to implement and computationally economical.For example, in Qiu's 4DEnVar, the analysis increment is expressed by a linear combination of basis vectors constructed by the singular value decomposition (SVD), and then the cost function is transformed to be an explicit expression of control variable without the need of an adjoint and tangent linear model [20,21].Also, Liu et al. presented a 4DEnVar algorithm which directly uses the background ensemble perturbations as basis vectors [22,23].In addition, Tian et al. developed a POD (proper orthogonal decomposition)-based ensemble four-dimensional variational data assimilation approach (referred to as POD-4DEnVar).The POD-4DVar method exploits the strengths of the EnKF and 4DVar, and the feasibility and effectiveness of POD-4DVar has been demonstrated in an idealized model, WRF, as well as global chemistry transport model Geos-Chem [24][25][26][27].It should be noted that reduced order strategy is a promising way for efficient 4DVar assimilation [28][29][30][31][32][33], for example, the POD/DEIM developed by Stefanescu et al. [28].Moreover, Wang et al. developed a dimension-reduced projection 4DVar (DRP-4DVar) method, which employed the empirical orthogonal function (EOF) to project perturbation ensemble to a reduced space spanned by the dominant EOF modes, and then used basis vectors to obtain an analysis increment which optimally combines the observations and backgrounds [34,35].The above 4DEnVar methods share common features with previous hybrid methods (e.g., [18]), which combine the advantages of the flow-dependent background error covariance matrix (referred to as B-matrix) and simultaneous assimilated observations at multiple times.
As one of the representative 4DEnVar approaches, DRP-4DVar has been studied in a series of studies.Liu et al. demonstrated that the DRP-4DVar method reasonably generated a flow-dependent B-matrix capturing the weather trends during the assimilation window by using the fifth-generation Pennsylvania State University-National Center for Atmospheric Research Mesoscale Model (MM5) [36].Zhao and Wang noted that the performance of the DRP-4DVar method has been improved with the initial perturbation samples derived from the background error covariance of the WRF-3DVar assimilation system [37].A series of numerical experiments were also conducted by using Lorenz-96 model to investigate the sensitivity of assimilation parameters (e.g., the assimilation window length and the flow-dependent background error covariance) of DRP-4DVar in comparison with the EnKF [38].Zhao et al. developed a DRP-4DVar assimilation system based on MM5, and successfully assimilated the simulated sea level pressure observations to improve the typhoon-track forecasts in the observing system simulation experiments (OSSEs) [39].However, previous studies paid little attention to the EOF-, SVD-, or POD-based techniques and their related parameters (e.g., truncation number) in the 4DEnVar method family [20,24,25,40].Investigating and understanding the role of EOF is of great importance for the study of these 4DEnVars since the basis vectors expressing the analysis variables in the 4D space are constructed by the technique.This distinguishes the current paper from most others on hybrid data assimilation, which focus on comparing 4DVar, 4DEnVar, and 3DEnVar.
This study focuses on discussing the impact of EOF analysis in the DRP-4DVar method and the optimization of related parameters, by comparison with another traditional 4DEnVar scheme without the use of EOF transformation (referred to as 4DEnVar hereafter, [22]) and the Ensemble Transform Kalman Filter (ETKF, [41][42][43]) in theory and practical application.The ETKF and 4DEnVar methods share a common feature with the DRP-4DVar method, in that the analysis increments are expressed by expansion of the basis vectors, yet directly use the background ensemble perturbations in 3D space and 4D space, respectively, as the basis vectors.Thus, a slightly unusual presentation of the ETKF, the 4DEnVar, and the DRP-4DVar formulas is given, emphasizing the theoretical differences, particularly those related to the EOF analysis.The three methods are then applied to the Lorenz-96 model, illustrating the practical outcomes resulting from the theoretical differences.Further, the influence of the truncation number of EOF modes is investigated, since insufficient EOF modes might not represent the spatial structure and temporal evolution of the analysis variables, whereas surfeit EOF modes include noise and cost more computational resources.The analysis of DRP-4DVar, 4DEnVar, and the ETKF are compared to demonstrate the effect of the EOF analysis in DRP-4DVar.Finally, the sensitivity of the EOF analysis to parameters is examined, which are the ensemble size, the assimilation window length, and the ensemble spread of initial perturbations for each assimilation window.Clarifying the role of the EOF component and its relevant parameter optimization is helpful to improve the computational efficiency and result accuracy of the DRP-4DVar method.As the DRP-4DVar method is one of the representative approaches in the family, the discussion presented here can also facilitate the understanding and applying of other 4DEnVar hybrid methods.The nomenclature in this paper follows Recommended Nomenclature for EnVar Data Assimilation Methods by WMO's DAOS WG [38].

The DRP-4DVar and the 4DEnVar Methods
By minimizing the incremental format of the standard 4DVar cost function J(x ), an optimum increment (x α ) at the beginning time t 0 can be written as Equation ( 1) over an assimilation window of length L w (L w = t N − t 0 ).Given the background state x 0,b at t 0 (the beginning of the assimilation window) and the observations y obs,i at t i (i = 0, 1, . . ., N), the minimization of the cost function provides the analysis increment and then the analysis initial condition.
where x 0,b with a dimension of L x × 1, refers to the background or first guess at the initial time t 0 and B denote the corresponding B-matrix with a dimension of L x × L x .y obs represents the observation increment which is the difference between the observation y obs and the simulated observation y(x 0 ) of the background x 0,b through the forecast model and the observation operators.y (x 0 ) represents the simulated observation increment of x 0,b .In the assimilation window[t 0 , t N ], y (x 0 ), y obs,i and y obs,i are S i × 1 dimensional column vectors at t i and the number of observation times is N + 1.Therefore, y obs and y (x 0 ) have a dimension of L y × 1, where L y = N ∑ i=0 S i • M t 0 →t i denotes the model forecast integrated from t 0 to t i .H i denotes the observation operator at t i .O refers to the observation error covariance matrix, which include diagonal matrices O i on its diagonal line since observation errors are assumed to be uncorrelated.The control variable x 0 in the incremental format of the 4DVar cost function is connected with x i through forward model, so the gradient as well as the minimization of the cost function with respect to x 0 is difficult to compute.4DVar utilizes optimal control theory [44,45] to minimize the cost function defined over the time interval by using an adjoint model to determine its gradient, but minimizing the cost function is still computationally expensive.Additionally, coding the adjoint model for 4DVar and maintaining the update with the model upgrading require tremendous effort.
The 4DEnVar methods (e.g., DRP-4DVar and 4DEnVar in this paper) substantially reduce computational cost and need no adjoint model [23,34].The 4DEnVars use X en and Y en , ensemble perturbations generated by model integration to construct P y and P x , the basis vectors, and then express x 0 and y b by a linear combination of the basis vectors with weighting coefficients given by α: where m (1 ≤ m ≤ K) is the vector number of the basis, and K is the ensemble size of the perturbation ensemble.Meanwhile, similar to the ensemble methods, 4DEnVars estimate the B-matrix in the m-dimension space spanned by the basis vectors: where the matrix b α is defined as The 4DEnVar methods transform Equation (1) into Equation (10), which means transforming the implicit cost function of x 0 into one of the weighting coefficient matrix α, and then derives the explicit solutions of the analysis increment x 0,a and its weighting coefficients matrix α 0,a .Thus, the iterative minimization and the adjoint model are no longer needed, and the computational cost is substantially reduced.With the information of observation increment y obs , α 0,a is derived in the space spanned by the basis vectors (Equation (11)).
Finally, the analysis x 0,a at the time t 0 is obtained by Equation ( 13), and then is employed as the initial condition for the forecast of the next time.Equation (13) indicates that the analysis initial condition x 0,a is derived through a linear combination of the basis vectors with weighting coefficients given by α 0,a .
The 4DEnVar method directly uses X en and Y en as P y and P x , respectively.All information about the ensemble perturbations is kept and the analysis state is performed in the K-dimension space spanned by the ensemble perturbations.
where y b refers to y(x 0,b ), x en and y en are ensemble of the prior members and the simulated observations, with a dimension of L x × K and L y × K, respectively.The X en and Y en are generated by either historical sampling (e.g., [34]) or ensemble forecast for the assimilation window (e.g., [36,37,46]), and this study focuses on the latter scenario.Further, the background ensemble perturbations X en are assumed to be a Gaussian distribution with X en = 0 and given standard deviations.The background covariance B is approximately estimated by In the DRP-4DVar method, the basis vectors P y and P x are generated by projecting X en and Y en to the reduced dimension space spanned by U y (a K × m matrix), the leading mth EOFs of Y en .
The EOF analysis here reduces the dimension of Y en , through keeping the first mth EOF that make Y en U y have maximum variability [41].The validation of Equation ( 18) is under the tangent linear approximation.The contribution of the EOFs eigenvalues to the variance is measured by the corresponding eigenvalues in a m × m matrix Therefore, the EOF analysis in the DRP-4DVar reduces the K-dimensional space to a smaller m-dimensional space, and filters the noise or redundant components that contributing little to the variability.In addition, b α (Equation 8) is a singular matrix with a rank of m − 1, and leads to an underestimation of B. To ameliorate such problem, an improved b α as Equation ( 21) is proposed by Wang et al. [34], and is proved necessary by Liu et al. [30].

The ETKF
The ETKF [41][42][43] shares the common feature with 4DEnVars, in that the analysis is expressed with weight coefficients that linearly combine the basis vectors, though the basis vectors are simply the background ensemble in three-dimensional space and the ensemble perturbations are updated as well.In particular, matrices and vectors related to L y have L y = L x in the following equations at analysis time.
At an analysis time, the background ensemble x b is represented by where x b is a column vector containing the mean of the background ensemble and X en,b is a matrix whose columns are the background ensemble perturbations from the ensemble mean.For the mean analysis, the increments and y b are represented by the linear combination of the background ensemble perturbations with the weight coefficients given by w Since B is estimated in the ETKF as Equation ( 25), the cost function for the analysis time then becomes an explicit function of weighting coefficient matrix w.
Computing the gradient of the J(w) with respect to w for the minimization, the weight coefficients for the analysis mean w a and the transform matrix B a are obtained.It should be noted that the ETKF method transforms forecast perturbations into analysis perturbations by a transformation matrix B a and then use B a to solve the analysis state [41][42][43].Similar to Equation (11), Equation ( 27) indicates the K × 1 vector of weight w a is derived with information about observation increments, y obs − H(x b ) The mean analysis is then computed from the background ensemble mean and a linear combination of the background ensemble perturbations.
In addition, the analysis ensemble perturbations are converted from the background ensemble perturbation ensemble by the symmetric square root of (K − 1) B a .
Equations ( 29) and (31) indicate that the analysis ensemble is derived through a linear combination of the background ensemble, with weighting coefficients given by w a (a K × 1 vector) for the mean analysis, and W a (a K × K matrix) for the analysis perturbations [42].Therefore, the k-th analysis ensemble member is obtained by where X en,b,k , X en,a,k , and W a,k refer to the k-th column corresponding matrices, respectively.

Comparison among the DRP-4DVar, the 4DEnVar, and the ETKF Methods
As shown in Figure 1, the DRP-4DVar, the 4DEnVar, and the ETKF methods theoretically differ in the construction of basis vectors and temporal evolution of the ensemble perturbations.The 4DEnVar and the ETKF methods directly employ the background ensemble perturbations as the basis vectors, and the major analysis of the two methods takes place in a K-dimensional space spanned by the background ensemble perturbations.X en,b , the ensemble perturbations, can also be regarded as a linear transformation from the original K-dimensional space to one spanned by X en,b .This helps avoid the mathematics conceptual difficulty in viewing them as basis vectors, since the columns of X en,b are linearly dependent and the B-matrix has rank at most K − 1 [40].Yet constructing P y in 4DEnVar uses ensemble perturbations in four-dimensional space, whereas the ETKF uses ensemble perturbations in three dimensions.
Instead of directly using the background ensemble perturbations, DRP-4DVar employs the EOF analysis to extract the leading EOF vectors, explaining the major variance as the basis vectors.The EOF analysis allows the DRP-4DVar method to transform the basis from the K-dimensional space to a m-dimensional space (m ≤ K).When m = K, the transformation maintains all the information about the ensemble perturbations, and the DRP-4DVar is equivalent to the 4DEnVar.When m < K, the DRP-4DVar only uses the leading EOF vectors, and filters the noise components in the ensemble perturbations.The noise components are the EOFs which contribute little to the variability, from the view of the variance analysis [35,47,48].Further, since the sum of the columns of X en is zero and only the leading EOFs are retained, the sum of the columns of P x tends to be nonzero.The nonzero sum of the columns of P x also explains the difference between Equation (8) and Equation ( 16) (or Equation ( 25)) which, according to the same definition, estimate the B-matrix in the DRP-4DVar and the 4DEnVar methods (or the ETKF), respectively.Therefore, when using m < K, the EOF analysis in the DRP-4DVar not only transforms the basis to the eigenvalue space, but also filters the noise in the ensemble perturbations.
As to the temporal evolution of the ensemble perturbations, the ETKF refreshes the background ensemble perturbations at each analysis step.The B-matrix computed from the ensemble perturbations is flow-dependent.Whereas, at each assimilation window, the ensemble perturbations in the 4DEnVars are initialized independently by Gaussian random numbers, and then integrated by the model merely during the assimilation windows.The B-matrix computed from the ensemble perturbations in the 4DEnVars is only partially flow-dependent.In order to obtain ensemble perturbations representative of the model dynamical structure, ensemble size, assimilation window length, and initial Gaussian random perturbations need to be optimized.Yet, the DPR-4DVar method is expected to be relatively robust because of the positive impact of the EOF analysis.Instead of directly using the background ensemble perturbations, DRP-4DVar employs the EOF analysis to extract the leading EOF vectors, explaining the major variance as the basis vectors.The EOF analysis allows the DRP-4DVar method to transform the basis from the K-dimensional space to a m-dimensional space ( ≤ ).When = , the transformation maintains all the information about the ensemble perturbations, and the DRP-4DVar is equivalent to the 4DEnVar.When , the DRP-4DVar only uses the leading EOF vectors, and filters the noise components in the ensemble perturbations.The noise components are the EOFs which contribute little to the variability, from the view of the variance analysis [35,47,48].Further, since the sum of the columns of is zero and only the leading EOFs are retained, the sum of the columns of tends to be nonzero.The nonzero sum of the columns of also explains the difference between Equation (8) and Equation ( 16) (or Equation 25) which, according to the same definition, estimate the B-matrix in the DRP-4DVar and the 4DEnVar methods (or the ETKF), respectively.Therefore, when using m < K, the EOF analysis in the DRP-4DVar not only transforms the basis to the eigenvalue space, but also filters the noise in the ensemble perturbations.
As to the temporal evolution of the ensemble perturbations, the ETKF refreshes the background ensemble perturbations at each analysis step.The B-matrix computed from the ensemble perturbations is flow-dependent.Whereas, at each assimilation window, the ensemble perturbations in the 4DEnVars are initialized independently by Gaussian random numbers, and then integrated by the model merely during the assimilation windows.The B-matrix computed from the ensemble perturbations in the 4DEnVars is only partially flow-dependent.In order to

Experiment Design
A series of experiments were conducted using the Lorenz-96 model [49] in order to compare these three assimilation schemes and assess the impact of pertinent parameters.The ideal model simulates the simplified behavior of a meteorological variable in a periodic domain: This section may be divided by subheadings.It should provide a concise and precise description of the experimental results, their interpretation, as well as the experimental conclusions that can be drawn.
Here x j denotes the variable on the j-th grid point, 40 is the number of the grid points, and F denotes the forcing.The model solves Equation ( 33) through a fourth-order Runge-Kutta scheme with a time step of 0.05 (corresponding to 6 h).Although the Lorenz-96 model is simple, it is helpful in illustrating important features of assimilation methods and in providing instruction for practical optimization.
The true states include variables at the 40 grids of every model time step during a 1500 time-step integration with forcing F = 8.The observations are generated by adding white noise from a Gaussian distribution with zero mean and variance of 1.00 to the true states.The simulated observations consist of the variable at the 40 grid points at every model time step.The observation operators in such a situation are identity matrices.
Several sensitivity experiments are conducted in an imperfect model scenario with forcing parameter F = 9.Besides model errors, an initial bias of 2.00 is added to each grid point for examining both methods' capacity of handling error as close as possible to a realistic situation.Analysis is performed every model time step (or 6 h) for 1500 time steps.Exp-1 compares the performance of the DRP-4DVar, the 4DEnVar, and the ETKF methods.For the 4DEnVars, a single simulation is cycled in time through the forecast steps and analysis steps, and the sample ensemble at each analysis time is generated through ensemble forecasting at the beginning of each assimilation window with an ensemble size 80.The random perturbations, which are used to generate the initial ensemble of each assimilation window, are Gaussian distributed with mean of zero and standard deviation 0.10.The assimilation window has a 6 model time-step lengths (corresponding to 36 h).In the DRP-4DVar method, the leading 20 EOF vectors are extracted through the EOF analysis to form the reduced dimension space for major analysis.Meanwhile, the ETKF employs an ensemble with the ensemble size 100.
Both the localization and inflation techniques are essential to ameliorate the contaminations resulting from the spurious correlation over long spatial distances or between variables known to be uncorrelated [50].For example, Amezcu and Goodliff found that with observations at every time step 4DVar outperforms 4DEnVar, but with localization, the localized 4DEnVar has similar results to the 4DVar [51,52].To ameliorate problems related to sampling error in this work, the ETKF adopts the multiplicative covariance inflation technique [53], in which the background error covariance is multiplied by (1 + ∆), and ∆ is the inflation factor.The manually optimized value ∆ = 30% is adopted, which yields the minimum mean root-mean-square errors (RMSEs) among 10%, 15%, 20%, 25%, 30%, 35%, and 40% inflation settings.
Following the Exp-1, sensitivity experiments Exp-2, Exp-3, Exp-4, and Exp-5 (see Table 1) are conducted to further illustrate the role of the EOF analysis and assess the impact of pertinent parameters (i.e., EOF truncation number, ensemble size, assimilation window length, and standard deviation of initial random perturbation for each assimilation window).

Experimental Results
As shown in Figure 2, the ETKF, 4DEnVar, and DRP-4DVar methods all successfully assimilated the observations and substantially improved result accuracy.The mean analysis RMSEs of the DRP-4DVar, the 4DEnVar, and the ETKF methods are 0.253, 0.310, and 0.386, respectively, which are much lower than observation error (approximately 1.00).Such results indicate that all the assimilation schemes are able to handle the model error and the initial bias.Meanwhile, the ensemble spread of the DRP-4DVar and the ETKF methods provide proper estimations of the uncertainty.The background spread of the ETKF lies around the mean RMSE and within the +/− standard deviation line, and so does the ensemble spread of the DRP-4DVar method (represented by the mean ensemble spread during the assimilation window).However, the analysis space transforming and the noise filtering of the DRP-4DVar method render positive influence, as the mean analysis RMSE of the DRP-4DVar method is 0.133 (34%) and 0.057 (18%) lower than that of the ETKF and the 4DEnVar methods, respectively.To evaluate the analysis space transformation, a series test of the DRP-4DVar method with different m value is done in Exp-2.The RMSEs of the ETKF and the 4DEnVar methods in Exp-1 are used as benchmarks.Firstly, the difference among the ETKF, the 4DEnVar, and the DRP-4DVar methods with m = 75 in Figure 3 shows superiority of the 4DEnVar methods' simultaneous assimilation of asynchronous observations in this study.The DRP-4DVar method with m = 75 produces results similar to that of the 4DEnVar method in analysis RMSE (0.300 vs. 0.310), whereas it surpasses the ETKF by 0.086 (22.3%).Figure 4 illustrates that the cumulative sum of the variance explained of the first 75 EOFs is almost 100%.Further, Figure 5 shows that the 4DEnVar and the DRP-4DVar methods with m = 75 generate the almost same analysis increment during the assimilation window.Therefore, in accordance with the theoretical analysis in Section 2.3, the EOF analysis plays a role in transforming the basis from an ensemble space to an eigenvalue space.For a certain assimilation time window, Figure 5d-f shows in detail that the 4DEnVar methods tend to give more weight on the background state and smaller analysis increment than the ETKF.Considering the relatively large observation error, this inhibits improper correction of the background state and plausibly explains the better performance of the 4DEnVar method in analysis RMSE.For impact of noise filtering, Figure 3 shows that both mean background RMSEs and mean analysis RMSEs in the DRP-4DVar method at first sharply decrease with the increasing of until reaches 30, and then increase slowly when continues to increase.The DRP-4DVar method with a wide range of EOF truncation numbers surpasses the ETKF and the 4DEnVar methods in analysis RMSE.The first dramatic RMSE decrease suggests extremely low values need to be avoided.Figure 4(a) shows that the first small part of EOFs corresponds to a large variance explained, whereas the following EOFs correspond to a long tail of decreasing values.This explains the first dramatic RMSE decrease, since every EOF mode of the first small part matters.When is larger than 30, variances explained by the EOF modes are indistinguishable within their uncertainties.This means that their actual structures may not be particularly interesting, since any linear combination of these patterns is as significant as each of them [47].When = 30, filtering the noise contributes to a 0.057 analysis RMSE reduction.Figure 4(b) also shows that the cumulative variance explained increases rapidly above 90% at = 15 and to almost 100% at = 35.To further explain the results, Figure 5a-e illustrates that increasing the EOF truncation number improves the performance if is small; while increasing the EOF truncation number brings indistinct impact if just corresponds to the sufficient variance explained, whereas it causes slight deterioration if is overlarge.Therefore, choosing a relatively large EOF truncation number improves the accuracy and the optimal EOF truncation number in this study is = 30.For impact of noise filtering, Figure 3 shows that both mean background RMSEs and mean analysis RMSEs in the DRP-4DVar method at first sharply decrease with the increasing of m until m reaches 30, and then increase slowly when m continues to increase.The DRP-4DVar method with a wide range of EOF truncation numbers surpasses the ETKF and the 4DEnVar methods in analysis RMSE.The first dramatic RMSE decrease suggests extremely low m values need to be avoided.Figure 4a shows that the first small part of EOFs corresponds to a large variance explained, whereas the following EOFs correspond to a long tail of decreasing values.This explains the first dramatic RMSE decrease, since every EOF mode of the first small part matters.When m is larger than 30, variances explained by the EOF modes are indistinguishable within their uncertainties.This means that their actual structures may not be particularly interesting, since any linear combination of these patterns is as significant as each of them [47].When m = 30, filtering the noise contributes to a 0.057 analysis RMSE reduction.Figure 4b also shows that the cumulative variance explained increases rapidly above 90% at m = 15 and to almost 100% at m = 35.To further explain the results, Figure 5a-e illustrates that increasing the EOF truncation number improves the performance if m is small; while increasing the EOF truncation number brings indistinct impact if m just corresponds to the sufficient variance explained, whereas it causes slight deterioration if m is overlarge.Therefore, choosing a relatively large EOF truncation number improves the accuracy and the optimal EOF truncation number in this study is m = 30.Because of the space transforming and noise filtering role of the EOF analysis, the DPR-4DVar method is expected to be relatively robust to ensemble perturbations' representativeness of the model's dynamic structure.At first, results of Exp-3 (see Figure 6), which investigates the sensitivity to ensemble size, shows that proper EOF analysis setting is capable of ameliorating negative influence due to sampling error and generating analysis results with acceptable accuracy.Mean RMSEs of the DRP-4DVar experiments, as well as the RMSE decreasing speed, decreasing with the increasing ensemble size, and the mean RMSEs are indistinct when the ensemble size is larger than 75.The poorest performance of the DRP-4DVar in Exp-3 is still comparable with the 4DEnVar method and much better than the ETKF in Exp-1.This result indicates that the negative effect of the insufficient ensemble size is counteracted by the refinement of the EOF truncation number, since the 4DEnVar method has been proven to be equivalent to the DRP-4DVar method with = .Additionally, the stability of the performance of the DRP-4DVar method is achieved without additional covariance inflation or localization techniques used in previous studies (e.g., [22,25]).Subsequently, the mean RMSEs of the DRP-4DVar method in Exp-4 (see Figure 7) show a similar pattern to those in Exp-3.The DRP-4DVar method with an optimized EOF analysis setting is able to handle the impact of the assimilation window length, though extremely short window lengths need to be avoided.The RMSEs with assimilation windows longer than 24 h are lower than those of the 4DEnVar method in Exp-1, which is mainly because a longer assimilation window facilitates the development of ensemble perturbations representative of the model dynamics.Yet, the RMSEs' variation due to the increasing assimilation window length is more dramatic.Extremely short assimilation window lengths lead to failure of the assimilation analysis when the RMSE is Because of the space transforming and noise filtering role of the EOF analysis, the DPR-4DVar method is expected to be relatively robust to ensemble perturbations' representativeness of the model's dynamic structure.At first, results of Exp-3 (see Figure 6), which investigates the sensitivity to ensemble size, shows that proper EOF analysis setting is capable of ameliorating negative influence due to sampling error and generating analysis results with acceptable accuracy.Mean RMSEs of the DRP-4DVar experiments, as well as the RMSE decreasing speed, decreasing with the increasing ensemble size, and the mean RMSEs are indistinct when the ensemble size is larger than 75.The poorest performance of the DRP-4DVar in Exp-3 is still comparable with the 4DEnVar method and much better than the ETKF in Exp-1.This result indicates that the negative effect of the insufficient ensemble size is counteracted by the refinement of the EOF truncation number, since the 4DEnVar method has been proven to be equivalent to the DRP-4DVar method with m = K.Additionally, the stability of the performance of the DRP-4DVar method is achieved without additional covariance inflation or localization techniques used in previous studies (e.g., [22,25]).Subsequently, the mean RMSEs of the DRP-4DVar method in Exp-4 (see Figure 7) show a similar pattern to those in Exp-3.The DRP-4DVar method with an optimized EOF analysis setting is able to handle the impact of the assimilation window length, though extremely short window lengths need to be avoided.The RMSEs with assimilation windows longer than 24 h are lower than those of the 4DEnVar method in Exp-1, which is mainly because a longer assimilation window facilitates the development of ensemble perturbations representative of the model dynamics.Yet, the RMSEs' variation due to the increasing assimilation window length is more dramatic.Extremely short assimilation window lengths lead to failure of the assimilation analysis when the RMSE is larger than 1.00.This result is also in accordance with previous studies on the assimilation window length in 4DVar and the DRP-4DVar methods [7,36].
Atmosphere 2017, 8, 146 14 of 19 larger than 1.00.This result is also in accordance with previous studies on the assimilation window length in 4DVar and the DRP-4DVar methods [7,36].ensemble spreads after model integration to underestimate the background RMSE, which means background states are over-weighted and little information from observations is taken to correct the uncertainty due to the imperfect model and erroneous initial condition.High leads to the overlarge ensemble spreads after model integration, so the observations are given too much weight even when the background state is much more accurate.Such impact of resembles that of the standard deviation of the white noise used in the additive covariance inflation [22,55].uncertainty due to the imperfect model and erroneous initial condition.High leads to the overlarge ensemble spreads after model integration, so the observations are given too much weight even when the background state is much more accurate.Such impact of resembles that of the standard deviation of the white noise used in the additive covariance inflation [22,55].Exp-5 (see Figure 8) illustrates that the mean RMSEs of the DRP-4DVar method sharply decrease with the increasing of SD I at first, and then increase slowly when SD I continues to increase.Most mean RMSEs of the DRP-4DVar method are smaller than that of the 4DEnVar method in Exp-1.This result indicates the DRP-4DVar method is relatively stable to the variation of SD I .However, similar to the situation of the EOF truncation number, the dramatic RMSE decrease suggests that choosing large SD I values is relatively safe.Extremely low SD I (SD I = 0.01) values lead to the failure of assimilation (analysis RMSE = 5.046, not shown).Further, increasing mean ensemble spreads in Figure 8 might explain the pattern of RMSEs.Extremely low SD I causes the ensemble spreads after model integration to underestimate the background RMSE, which means background states are over-weighted and little information from observations is taken to correct the uncertainty due to the imperfect model and erroneous initial condition.High SD I leads to the overlarge ensemble spreads after model integration, so the observations are given too much weight even when the background state is much more accurate.Such impact of SD I resembles that of the standard deviation of the white noise used in the additive covariance inflation [22,55].

Summary and Conclusions
This study investigates the role of EOF analysis in the 4DEnVar scheme DRP-4DVar and the optimization of pertinent parameters, through comparisons among the DRP-4DVar, 4DEnVar, and the ETKF methods.The mathematical formula comparison and sensitivity experiments by using the Lorenze-96 model indicate that the EOF analysis plays an important role in space transforming and noise filtering in DRP-4DVar.The DRP-4DVar method with little dimension reduction produces results similar to that of the 4DEnVar method in analysis root-mean-square error (RMSE), whereas it surpasses the ETKF by 22.3%, and the further refinement of the EOF truncation number magnifies the superiority by 12.2% and slightly reduces the computational cost.
The space transforming and noise filtering role of the EOF analysis improves the computational efficiency and the analysis accuracy.The DRP-4DVar method with a wide range of the EOF truncation number surpasses the ETKF and the 4DEnVar methods in analysis RMSE.This explains the successful applications of the DRP-4DVar method as well as other 4DEnVar methods [20,24,34,38,56], without special optimization of the truncation number.Yet, optimizing the EOF truncation number makes sense.The DRP-4DVar method with the optimized EOF truncation number is robust to the variation of pertinent parameters, though extremely low parameter values should be avoided.Results of the sensitivity experiments on the parameters (i.e., the ensemble size, the assimilation window length, and the standard deviation of the initial perturbation for each assimilation window) also agree with previous studies on EnKF, 4DVar, 4DEnVar, and DRP-4DVar [7,22,36,55,57], Thus, refinement of the EOF truncation number is attractive for cases with a high requirement of the computational efficiency and the analysis accuracy, particularly for applications on operational numerical system (e.g., [39]).
Clarifying the role of the EOF analysis and its relevant parameter optimization facilitates improvement of the computational efficiency and the analysis accuracy in the DRP-4DVar method, as well as other 4DEnVar methods.The results presented here suggest the potential wide application of EOF analysis in the 4DEnVar methods.Such applications are attractive since the 4DEnVar methods require no tangent linear model or adjoint model, and share the advantages of both EnKF and 4DVar.Our future work will focus on the development of regional chemical data assimilation systems with the 4DEnVars scheme over East Asia, and comparing it with the current EnKF-based assimilation system [58].

Summary and Conclusions
This study investigates the role of EOF analysis in the 4DEnVar scheme DRP-4DVar and the optimization of pertinent parameters, through comparisons among the DRP-4DVar, 4DEnVar, and the ETKF methods.The mathematical formula comparison and sensitivity experiments by using the Lorenze-96 model indicate that the EOF analysis plays an important role in space transforming and noise filtering in DRP-4DVar.The DRP-4DVar method with little dimension reduction produces results similar to that of the 4DEnVar method in analysis root-mean-square error (RMSE), whereas it surpasses the ETKF by 22.3%, and the further refinement of the EOF truncation number magnifies the superiority by 12.2% and slightly reduces the computational cost.
The space transforming and noise filtering role of the EOF analysis improves the computational efficiency and the analysis accuracy.The DRP-4DVar method with a wide range of the EOF truncation number surpasses the ETKF and the 4DEnVar methods in analysis RMSE.This explains the successful applications of the DRP-4DVar method as well as other 4DEnVar methods [20,24,34,38,56], without special optimization of the truncation number.Yet, optimizing the EOF truncation number makes sense.The DRP-4DVar method with the optimized EOF truncation number is robust to the variation of pertinent parameters, though extremely low parameter values should be avoided.Results of the sensitivity experiments on the parameters (i.e., the ensemble size, the assimilation window length, and the standard deviation of the initial perturbation for each assimilation window) also agree with previous studies on EnKF, 4DVar, 4DEnVar, and DRP-4DVar [7,22,36,55,57], Thus, refinement of the EOF truncation number is attractive for cases with a high requirement of the computational efficiency and the analysis accuracy, particularly for applications on operational numerical system (e.g., [39]).
Clarifying the role of the EOF analysis and its relevant parameter optimization facilitates improvement of the computational efficiency and the analysis accuracy in the DRP-4DVar method, as well as other 4DEnVar methods.The results presented here suggest the potential wide application of EOF analysis in the 4DEnVar methods.Such applications are attractive since the 4DEnVar methods require no tangent linear model or adjoint model, and share the advantages of both EnKF and 4DVar.Our future work will focus on the development of regional chemical data assimilation systems with the 4DEnVars scheme over East Asia, and comparing it with the current EnKF-based assimilation system [58].

Figure 1 .
Figure 1.Schematic diagram of the 4DEnVar method (a) and the ETKF (b).

Figure 1 .
Figure 1.Schematic diagram of the 4DEnVar method (a) and the ETKF (b).

Figure 2 .
Figure 2. Time series of the analysis RMSE (red solid line) and background spread (blue solid line) of the ETKF experiment (a), the 4DEnVar experiment (b) and the DRP-4DVar (c) in Exp-1.Observation RMSEs (black dash line), mean analysis RMSEs (pink dash line), and Mean analysis RMSE +/− standard deviation of analysis RMSEs (green solid line) are computed over last 500 time steps.

Figure 2 .
Figure 2. Time series of the analysis RMSE (red solid line) and background spread (blue solid line) of the ETKF experiment (a), the 4DEnVar experiment (b) and the DRP-4DVar (c) in Exp-1.Observation RMSEs (black dash line), mean analysis RMSEs (pink dash line), and Mean analysis RMSE +/− standard deviation of analysis RMSEs (green solid line) are computed over last 500 time steps.

Atmosphere 2017, 8 , 146 12 of 19 Figure 3 .
Figure 3.The mean RMSEs of the DRP-4DVar with m set as a serious different value from 5 to 75.Other assimilation parameters of the DRP-4DVar experiments are the same as those of the DRP-4DVar experiment in Exp-1.The mean RMSEs of the ETKF and the 4DEnVar methods in Exp-1 are used as benchmarks, which are noted as B1 and B2, respectively.The black bars and white bars denote the mean background RMSE and the mean analysis RMSE during the last 500 time steps, respectively.

Figure 3 .
Figure 3.The mean RMSEs of the DRP-4DVar with m set as a serious different value from 5 to 75.Other assimilation parameters of the DRP-4DVar experiments are the same as those of the DRP-4DVar experiment in Exp-1.The mean RMSEs of the ETKF and the 4DEnVar methods in Exp-1 are used as benchmarks, which are noted as B1 and B2, respectively.The black bars and white bars denote the mean background RMSE and the mean analysis RMSE during the last 500 time steps, respectively.

Atmosphere 2017, 8 , 146 13 of 19 Figure 4 .
Figure 4. (a).The mean variance explained of the first 75 EOFs during the experimental period, i.e. the mean eigenvalue spectrum of the covariance matirx of the sample ensemble.Vertical bars show approximate 95% confidence limits given by the rule of thumb [54], which also indicate the uncertainty of the variance explained.(b).The cumulative variance explained by the first m modes.The data are adopted from the DRP-4DVar experiment with m = 75 in Exp-2.

Figure 4 .
Figure 4. (a).The mean variance explained of the first 75 EOFs during the experimental period, i.e., the mean eigenvalue spectrum of the covariance matirx of the sample ensemble.Vertical bars show approximate 95% confidence limits given by the rule of thumb [54], which also indicate the uncertainty of the variance explained.(b).The cumulative variance explained by the first mth modes.The data are adopted from the DRP-4DVar experiment with m = 75 in Exp-2.

Figure 5 .
Figure 5.The analysis increments (correction, color shades) and forecast (background) errors (true state minus background mean, contours) of the ETKF, the 4DEnVar, and the DRP-4DVar methods with different m at the grid points 11~20 during the assimilation window on the 1200th model time step.a-d correspond to the DRP-4DVar method with as 15, 25, 30, 75, e correspond to the 4DEnVar method, and f corresponds to the ETKF.The assimilation window length is 36 hours (corresponding to 6 time-step length).The analysis y′ of DRP-4DVar is generated by linear combination of the basis with different as Equation (7).As to the DRP-4DVar method with the different and the 4DEnVar method, the basis vectors and the background errors are computed from the ensemble perturbations from the same original background ensemble during the assimilation window.The original background ensemble is generated by the DRP-4DVar experiment with = 30 in Exp-1.Meanwhile, since the ETKF sequentially assimilates observations, f employs the analysis y′ and background error during the 1200-1206 time steps (corresponding to the 36 h assimilation window) of the ETKF experiment in EXP-1.Note that f has an order of magnitude different color scale.

Figure 5 .
Figure 5.The analysis increments (correction, color shades) and forecast (background) errors (true state minus background mean, contours) of the ETKF, the 4DEnVar, and the DRP-4DVar methods with different m at the grid points 11~20 during the assimilation window on the 1200th model time step.(a-d)correspond to the DRP-4DVar method with m as 15, 25, 30, 75, (e) correspond to the 4DEnVar method, and (f) corresponds to the ETKF.The assimilation window length is 36 h (corresponding to 6 time-step length).The analysis y b of DRP-4DVar is generated by linear combination of the basis with different m as Equation(7).As to the DRP-4DVar method with the different m and the 4DEnVar method, the basis vectors and the background errors are computed from the ensemble perturbations from the same original background ensemble during the assimilation window.The original background ensemble is generated by the DRP-4DVar experiment with m = 30 in Exp-1.Meanwhile, since the ETKF sequentially assimilates observations, f employs the analysis y b and background error during the 1200-1206 time steps (corresponding to the 36 h assimilation window) of the ETKF experiment in EXP-1.Note that f has an order of magnitude different color scale.

Figure 6 .
Figure 6.The mean RMSEs of DRP-4DVar with ensemble size of 35-100.Other assimilation parameters of the DRP-4DVar experiments are the same as those of the DRP-4DVar experiment in Exp-1.The mean RMSEs of the ETKF and the 4DEnVar methods in Exp-1 are used as benchmarks, which are noted as B1 and B2, respectively.The black bars and white bars denote the mean background RMSE and the mean analysis RMSE during the last 500 time steps, respectively.

Figure 7 .
Figure 7.The mean RMSEs of the ETKF (noted as B1), the 4DEnVar (noted as B2), and the DRP-4DVar methods with assimilation window length set as a series of different value from 0 to 54 h.Other assimilation parameters of the DRP-4DVar experiments are the same as those of the DRP-4DVar experiment in Exp-1.The mean RMSEs of the ETKF and the 4DEnVar methods in Exp-1 are used as benchmarks, which are noted as B1 and B2, respectively.The black bars and white bars denote the mean background RMSE and the mean analysis RMSE during the last 500 time steps, respectively.

Figure 6 .
Figure 6.The mean RMSEs of DRP-4DVar with ensemble size of 35-100.Other assimilation parameters of the DRP-4DVar experiments are the same as those of the DRP-4DVar experiment in Exp-1.The mean RMSEs of the ETKF and the 4DEnVar methods in Exp-1 are used as benchmarks, which are noted as B1 and B2, respectively.The black bars and white bars denote the mean background RMSE and the mean analysis RMSE during the last 500 time steps, respectively.

Figure 6 .
Figure 6.The mean RMSEs of DRP-4DVar with ensemble size of 35-100.Other assimilation parameters of the DRP-4DVar experiments are the same as those of the DRP-4DVar experiment in Exp-1.The mean RMSEs of the ETKF and the 4DEnVar methods in Exp-1 are used as benchmarks, which are noted as B1 and B2, respectively.The black bars and white bars denote the mean background RMSE and the mean analysis RMSE during the last 500 time steps, respectively.

Figure 7 .
Figure 7.The mean RMSEs of the ETKF (noted as B1), the 4DEnVar (noted as B2), and the DRP-4DVar methods with assimilation window length set as a series of different value from 0 to 54 h.Other assimilation parameters of the DRP-4DVar experiments are the same as those of the DRP-4DVar experiment in Exp-1.The mean RMSEs of the ETKF and the 4DEnVar methods in Exp-1 are used as benchmarks, which are noted as B1 and B2, respectively.The black bars and white bars denote the mean background RMSE and the mean analysis RMSE during the last 500 time steps, respectively.

Figure 7 .
Figure 7.The mean RMSEs of the ETKF (noted as B1), the 4DEnVar (noted as B2), and the DRP-4DVar methods with assimilation window length set as a series of different value from 0 to 54 h.Other assimilation parameters of the DRP-4DVar experiments are the same as those of the DRP-4DVar experiment in Exp-1.The mean RMSEs of the ETKF and the 4DEnVar methods in Exp-1 are used as benchmarks, which are noted as B1 and B2, respectively.The black bars and white bars denote the mean background RMSE and the mean analysis RMSE during the last 500 time steps, respectively.

Figure 8 .
Figure 8.The mean RMSEs and ensemble spread of the DRP-4DVar method using initial random perturbation for each assimilation window with standard deviation of 0.05~0.35.Other assimilation parameters of the DRP-4DVar experiments are the same as those of the DRP-4DVar experiment in Exp-1.The mean RMSEs and ensemble spread of the ETKF and the 4DEnVar methods in Exp-1 are used as benchmarks, which are noted as B1 and B2, respectively.The ensemble spread of the DRP-4DVar and the 4DEnVar methods refers to the average of the ensemble spreads during an assimilation window.The black bars, gray bars, and white bars denote the mean background RMSE, the mean analysis RMSE and the mean ensemble during the last 500 time steps, respectively.

Figure 8 .
Figure 8.The mean RMSEs and ensemble spread of the DRP-4DVar method using initial random perturbation for each assimilation window with standard deviation of 0.05~0.35.Other assimilation parameters of the DRP-4DVar experiments are the same as those of the DRP-4DVar experiment in Exp-1.The mean RMSEs and ensemble spread of the ETKF and the 4DEnVar methods in Exp-1 are used as benchmarks, which are noted as B1 and B2, respectively.The ensemble spread of the DRP-4DVar and the 4DEnVar methods refers to the average of the ensemble spreads during an assimilation window.The black bars, gray bars, and white bars denote the mean background RMSE, the mean analysis RMSE and the mean ensemble during the last 500 time steps, respectively.
al. in a cloud-resolving model, Yang et al. in a quasi-geostrophic model, Buehner et al. in global deterministic NWP, Poterjoy et al. in the Weather Research and Forecasting (WRF) model, Miyoshi et al. in the Japan Meteorological Agency's operational global analysis and prediction system, and Skachko et al. in a stratospheric chemical transport model

Table 1 .
Settings of sensitivity experiments for parameters related to the EOF analysis in the DRP-4DVar.