Data Assimilation System Applied to Short-Range Forecast System

Abstract: An evaluation of the 3DVAR, 3DEnVAR and 4DEnVAR methods is carried out, assimilating prepbufr and radiance data in combination within the Short-range Forecast System (SisPI), to determine which scheme is most suitable for short-term forecasting. The results suggest that hybrid schemes tend to generate more accurate forecasts than 3DVAR; however, 4DEnVAR is the most robust scheme and therefore the one that provides more realistic solutions. Forecasting the 24 h accumulated rainfall is the greatest difficulty, since all the assimilation schemes underestimate it.


Introduction
With the aim of obtaining more realistic solutions for short and very short term numerical forecasting, the SisPI project (Short-range Forecast System, by its acronym in Spanish), whose core is the WRF-ARW (Weather Research and Forecasting, Advanced Research WRF) mesoscale model, is running operationally at the Meteorology Institute of Cuba. One of the lines currently in full development within SisPI is the implementation of data assimilation techniques, seeking a universal design that meets the growing demand for forecasts and numerical products. This work is framed within the project "Development of the data assimilation module for the Short-range Forecast System", which belongs to the "Meteorology and sustainable development" program.
Efforts to operationally include a data assimilation scheme that would meet the demands in terms of forecast quality and current technological capabilities began at the very start of the project [1]. However, they were limited, since these studies involved only the 3DVAR scheme and, although experiments were carried out with different covariance matrices [2,3], the use of the generic matrix prevailed. Recently, the experiments have been extended to include the lengthening of the time window used to construct the matrices [4], the impact of applying multiple outer loops (OL), and the application of hybrid methods, in order to achieve the most robust scheme.
In this research, three assimilation schemes available in the WRFDA module are evaluated: 3DVAR, 3DEnVAR and 4DEnVAR. The impact of the combined assimilation of prepbufr and radiance data is analyzed, the latter obtained from microwave channels of polar-orbiting satellites. Three case studies are used: Hurricane ETA (a tropical storm when passing over Cuba) on 7-8 November 2020; a mesoscale convective system (SCM, by its acronym in Spanish) that affected the central region of the country on 23 May 2020; and a thunderstorm (TE) over the western north coast that occurred on 29 September 2020.

Short-Range Forecast System (SisPI)
To develop the experiments, the WRF model was used with the ARW dynamic core, version 3.8.1 [1,5]. The main objective of this system is very short and short-term forecasting. SisPI's design has two-way nested domains of 27 and 9 km grid spacing and a one-way nested domain of 3 km grid spacing (Figure 1). The model was initialized from GFS (Global Forecast System) forecast data with 0.5° horizontal resolution, the same data used in the operational design. The SisPI configuration includes 28 vertical levels, the Mellor-Yamada-Nakanishi-Niino level 2.5 PBL scheme and the RRTM longwave parameterization for all domains. The low-resolution domains use WSM5 microphysics, the Grell-Freitas cumulus parameterization and the Dudhia shortwave scheme. The high-resolution domain uses Morrison double-moment microphysics and the Goddard shortwave radiation scheme, with the cumulus parameterization deactivated. These differences are supported by sensitivity studies carried out during the development of the project [1,5].

Three-Dimensional Variational (3DVAR)
The 3DVAR method can be summarized as the iterative search for the state that minimizes the cost function (Equation (1)). This solution represents the maximum likelihood (minimum variance) estimate of the true state of the atmosphere given two a priori data sources: the background field and the observations [6].
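For reference, the cost function minimized by 3DVAR is usually written in the standard form sketched below. The notation is an assumption based on the common convention and not necessarily the symbols of Equation (1): x is the analysis state, x_b the background, y the observation vector, H the observation operator, and B and R the background and observation error covariance matrices (B here denotes a covariance matrix, not a weight).

```latex
J(\mathbf{x}) = \frac{1}{2}\,(\mathbf{x}-\mathbf{x}_b)^{\mathrm{T}}\,\mathbf{B}^{-1}\,(\mathbf{x}-\mathbf{x}_b)
              + \frac{1}{2}\,\bigl(\mathbf{y}-H(\mathbf{x})\bigr)^{\mathrm{T}}\,\mathbf{R}^{-1}\,\bigl(\mathbf{y}-H(\mathbf{x})\bigr)
```

The first term penalizes departures from the background and the second penalizes departures from the observations, each weighted by the inverse of its error covariance.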

Hybrid Assimilation Methods (3DEnVAR-4DEnVAR)
Recent studies suggest that hybrid formulations have some advantages over 3DVAR. In particular, the covariances that weight the model errors are extracted from an ensemble and, being flow-dependent, can better represent the error of the day, as opposed to the isotropic and static covariances of the 3DVAR algorithm [7]. This means that, in the purely variational case, when the error described by the covariance matrix differs from the flow of the day, the results may not be satisfactory [7].
In the 3DEnVAR formulation (Equation (2)) the ensemble is valid only for the analysis time. In this equation, β is the weight attributed to the ensemble and C is the correlation matrix used to localize the ensemble perturbations.
The algorithm used for 4DEnVAR is very similar; the difference is that it requires ensemble perturbations and observations at multiple time steps (Equation (3)) [8], where k represents the multiple perturbations used in the algorithm.

Experiment Design
To carry out the data assimilation experiments, the WRFDA version 3.9 package was used, because the 4DEnVAR method is unavailable in previous versions. The assimilation was performed only on the domain with the highest horizontal resolution (3 km). For all experiments, a domain-dependent statistical background error was used, built from the forecasts generated up to 15 days prior to the initialization on the day of interest. The background error was constructed with the gen_be_wrapper.ksh program, available in WRFDA.
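The text does not state how the statistics are derived from those forecasts. A common choice for regional domains is the NMC method, which uses differences between pairs of forecasts with different lead times, valid at the same time, as a proxy for background error. The sketch below illustrates only that idea on flattened state vectors. The function name and the toy numbers are hypothetical, and the real gen_be utility works on control variables and does considerably more than this.

```python
import numpy as np

def nmc_covariance(long_lead, short_lead):
    """Estimate a background error covariance from forecast-pair differences.

    long_lead, short_lead: arrays of shape (n_pairs, n_state) holding
    forecasts of two different lead times valid at the same times.
    """
    diffs = np.asarray(long_lead, dtype=float) - np.asarray(short_lead, dtype=float)
    if diffs.shape[0] < 2:
        raise ValueError("need at least two forecast pairs")
    diffs = diffs - diffs.mean(axis=0)  # remove the mean difference (bias)
    return diffs.T @ diffs / (diffs.shape[0] - 1)

# Toy data: three forecast pairs for a two-variable state (illustrative only).
long_lead = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]])
short_lead = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
b_static = nmc_covariance(long_lead, short_lead)
```

With only about 15 days of forecasts the sample is small, which is one reason such statistics are usually smoothed or modeled rather than used as a raw sample covariance.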
For the 3DVAR experiments, multiple outer loops (OLs) were applied, following the results shown in similar studies [7,9]. In these experiments, the multiplicative weights of the standard deviation and of the length scale were also modified empirically, reducing them in each OL (Table 1). This strategy aims to give greater influence to the improved first guess in successive OLs. For the hybrid schemes, the ensemble used to obtain the flow-dependent perturbations was built from previous SisPI runs, under the criterion that the members came from runs started at least 12 h before the initialization time, in order to mitigate the possible effects of model spin-up on the calculation of the flow-dependent perturbations. This yielded a small ensemble of 5 members.
For this research, a weight of 75% was arbitrarily assigned to the ensemble contribution and 25% to the static covariance matrix, thus giving greater relevance to flow-dependent errors in the assimilation process of the hybrid schemes. This weight ratio has also been used in other studies with satisfactory results [7,9].
To evaluate the experiments, satellite precipitation estimates from the GPM product [10] were used, as well as data from the surface meteorological stations of the Meteorology Institute.
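The effect of the 75%/25% weighting can be illustrated with a minimal sketch that blends a static covariance with a localized covariance from a 5-member ensemble. This is a conceptual illustration only: WRFDA does not build these matrices explicitly, and the function names, the Gaussian localization and the random toy ensemble are assumptions made for the example.

```python
import numpy as np

def gaussian_correlation(n_state, length_scale):
    """Correlation matrix that decays with grid-point separation (1-D toy grid)."""
    idx = np.arange(n_state)
    dist = np.abs(idx[:, None] - idx[None, :])
    return np.exp(-0.5 * (dist / length_scale) ** 2)

def ensemble_covariance(members):
    """Sample covariance of an (n_members, n_state) array of forecasts."""
    perturbations = members - members.mean(axis=0)
    return perturbations.T @ perturbations / (members.shape[0] - 1)

def hybrid_covariance(b_static, members, localization, ensemble_weight=0.75):
    """Weighted blend of the static and the localized ensemble covariances."""
    if not 0.0 <= ensemble_weight <= 1.0:
        raise ValueError("ensemble_weight must lie in [0, 1]")
    # The Schur (element-wise) product damps spurious long-range covariances
    # that arise from the small ensemble size.
    p_localized = ensemble_covariance(members) * localization
    return (1.0 - ensemble_weight) * b_static + ensemble_weight * p_localized

# Toy setup: 5 members on a 20-point grid (synthetic data, not SisPI output).
rng = np.random.default_rng(0)
n_state, n_members = 20, 5
members = rng.normal(size=(n_members, n_state))
b_static = gaussian_correlation(n_state, length_scale=3.0)
loc = gaussian_correlation(n_state, length_scale=5.0)
b_hybrid = hybrid_covariance(b_static, members, loc, ensemble_weight=0.75)
```

Setting ensemble_weight to 0 recovers the purely static (3DVAR-like) covariance, while a value of 1 relies entirely on the ensemble. With only 5 members, keeping part of the static term helps to limit the sampling noise.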
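The text does not specify which verification scores were computed. As a hedged illustration of how an underestimation of 24 h rainfall can be quantified against gridded estimates or station data, the sketch below computes the mean error and the frequency bias for a rain threshold. The function names, the threshold and the sample values are hypothetical.

```python
import numpy as np

def mean_error(forecast, observed):
    """Mean of forecast minus observed; negative values mean underestimation."""
    forecast = np.asarray(forecast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean(forecast - observed))

def frequency_bias(forecast, observed, threshold):
    """Ratio of forecast to observed events at or above the threshold.

    Values below 1 mean the event is forecast less often than it is observed.
    """
    f_event = np.asarray(forecast, dtype=float) >= threshold
    o_event = np.asarray(observed, dtype=float) >= threshold
    hits = np.sum(f_event & o_event)
    false_alarms = np.sum(f_event & ~o_event)
    misses = np.sum(~f_event & o_event)
    if hits + misses == 0:
        return float("nan")  # the event was never observed
    return float((hits + false_alarms) / (hits + misses))

# Illustrative 24 h accumulations in mm at four points (made-up numbers).
forecast_mm = [0.0, 5.0, 12.0, 30.0]
observed_mm = [2.0, 10.0, 20.0, 25.0]

me = mean_error(forecast_mm, observed_mm)
fb = frequency_bias(forecast_mm, observed_mm, threshold=10.0)
```

In this toy sample the mean error is negative and the frequency bias is below 1, the signature of a forecast that underestimates both the amount and the extent of the rainfall.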

Differences Related to the Volume of Data Assimilated by the Methods
For data in prepbufr format, 3DVAR and 3DEnVAR usually assimilate similar volumes, while 4DEnVAR assimilates a significantly higher volume. The main differences appear in the radiances, where the hybrid schemes assimilated high volumes of this type of data, in contrast to 3DVAR. These differences seem to be associated with the algorithms of the assimilation methods, because the available observations and the assimilation domain were the same in all cases.

Cost Function Reduction and Application of Multiple OLs
The use of multiple OLs increases the computational cost, since it requires a greater number of iterations. This technique does not always guarantee variational control: in some cases ΔJo > 0 is obtained, which indicates that fewer observations were rejected in the last OL than in the first, an indication that the emphasis on the guess diminishes in later OLs. Hybrid methods minimize the cost function in a similar number of iterations, starting from higher values as a result of the greater number of assimilated observations. Their reduction rate is similar to that of 3DVAR in these experiments, although 4DEnVAR achieves a more effective minimization than 3DEnVAR (Figure 2). These results are consistent with [8], who state that hybrid methods can work effectively with only one OL.

Figure 2. Cost function reduction obtained by the different methods: (a) experiment initialized on 7 November 2020 at 12 UTC, corresponding to tropical storm ETA; (b) experiment initialized on 23 May 2020 at 12 UTC, corresponding to the mesoscale convective system.

Incremental Analysis
The incremental analysis, the difference between the analysis field and the background, viewed through cross-sections, helps to understand the differences among the assimilation schemes and their impact on the numerical forecast of different meteorological fields. In all cases the contribution of the 3DVAR method to the background field is very small and local, in agreement with other results [8,10]. In the case of ETA, the incremental analysis shows a change in the latitudinal position of the storm core in the experiment of day 7 (figure omitted), expressed as a thermal dipole. There are also signs of growth of convective cells with respect to the background field, characterized by cooling at low levels and heating at high levels. These differences become more noticeable in the runs of day 8 initialized at 0000 UTC.
The incremental analysis for the SCM indicates that the contribution was characterized by an increase in humidity at low levels, which was more marked in 3DEnVAR, exceeding 4DEnVAR by between 1.5 and 5 g/kg. The latter proposes an initial condition with a lower and middle troposphere cooler than SisPI, while 3DEnVAR limits this cooling to the lower layers (where the increases in specific humidity occur) and exhibits warming zones with respect to SisPI at high levels, suggesting the early growth of convective cells, mainly towards the Guamuhaya massif region. The contributions observed in the 0000 UTC initialization support the theory of an accelerated stratification by the model, based on the additional cooling provided by the assimilation schemes (mainly the hybrids) in the layer between the 800 and 400 hPa surfaces.
In the thunderstorm (TE) experiments initialized at 1200 UTC, the hybrid methods tend to heat the layer between approximately 600 and 300 hPa. An increase in humidity is also observed at medium and high levels as a result of the contribution of the flow-dependent perturbations contained in these methods. There is also a tendency towards slight cooling in the lower layers with respect to the background field, a solution that led to the early development of convective cells in these experiments. The main difference between the two methods is that 4DEnVAR shows a slightly less humid atmosphere in some portions of the layer between 850 and 700 hPa.
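The increment itself is a simple field difference. A minimal sketch with NumPy follows, assuming (hypothetically) that the analysis and background fields are already loaded as arrays with dimensions (level, south_north, west_east). The function names and the synthetic temperature data are illustrative and not part of WRFDA.

```python
import numpy as np

def analysis_increment(analysis, background):
    """Return analysis minus background on the same grid."""
    analysis = np.asarray(analysis, dtype=float)
    background = np.asarray(background, dtype=float)
    if analysis.shape != background.shape:
        raise ValueError("analysis and background must share the same grid")
    return analysis - background

def zonal_cross_section(field, south_north_index):
    """Slice a (level, south_north, west_east) field along a fixed grid row."""
    return field[:, south_north_index, :]

# Synthetic temperature fields in K: 3 levels on a 4 x 5 grid.
background = np.full((3, 4, 5), 290.0)
analysis = background.copy()
analysis[0, 2, :] -= 1.0  # cooling at the lowest level
analysis[2, 2, :] += 0.5  # warming at the highest level

increment = analysis_increment(analysis, background)
section = zonal_cross_section(increment, south_north_index=2)
```

A section with negative values at low levels and positive values aloft is the kind of pattern the text associates with the growth of convective cells.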

Rainfall Forecast
In general, the model tends to underestimate the total 24 h precipitation. For ETA, the likely cause of the errors in spatial coverage and rainfall intensity in the hybrid schemes is the predicted interaction of ETA with an upper low, as these solutions locate the center of circulation of the upper low closer to the core of ETA. The intrusion of dry air at mid-levels, favored by the presence of this system, may have been the determining factor in the forecast distribution of the storm's rainfall areas (Figure 3).
For the SCM, the 3DVAR scheme produces a solution somewhat superior to the rest of the experiments, while the hybrid schemes lead to very similar forecasts, with errors in the spatial coverage of the precipitation areas larger than those of 3DVAR or SisPI itself. The opposite occurs in the experiments initialized at 0000 UTC: although the underestimation trend continues, the 3DVAR solution is very similar to that of SisPI and it is the hybrid schemes that produce significant improvements, especially towards the western region of the country.
Finally, in the thunderstorm (TE) case, the SisPI and 3DVAR solutions were close, although 3DVAR improved the spatial coverage of precipitation, especially towards the eastern half of the country, where convection was stimulated by the presence of a tropical wave. Towards the western half (where the case study developed) the differences between 3DVAR and SisPI were not very significant. The hybrid methods, on the other hand, increased the prediction error towards the east, significantly reducing the precipitation areas. A slight improvement is observed towards the west, although both schemes predicted the region of convergence of the mesoscale flow just over land on the north coast of Pinar del Río and Artemisa, a solution derived from forecasting a southeasterly regional flow weaker than observed. In contrast, the hybrid solutions for the 0000 UTC initialization turned out to be much superior to the SisPI and 3DVAR forecasts, improving the spatial coverage and intensity of the precipitation at numerous points of the domain.

Conclusions
The results described in the previous section indicate that the model shows a clear tendency to underestimate the accumulated rainfall in 24 h, which is only partially corrected by the assimilation schemes as they have been designed.
Regarding the purely variational approach, it modifies the background field only slightly as a result of the isotropic and static characteristics of the algorithm, which causes its results to converge quickly to the solution without assimilation. In this sense, applying multiple OLs while modifying the multiplicative coefficients of the length scale and the variance, in order to give greater weight to the corrected first guess resulting from the initial OLs, is not satisfactory and increases the computational cost of the method.
On the other hand, in the hybrid schemes the contribution of the flow-dependent errors, in combination with the static errors contained in the covariance matrix, noticeably modifies the first-guess field, so that the effect of the assimilation is prolonged for approximately 6 to 12 h. The performance of the 3DEnVAR method is unstable, as it can lead either to very realistic forecasts or to forecasts comparable with 3DVAR in the same situation. The 4DEnVAR method assimilates a significantly larger volume of conventional data; however, it does not exhibit the same superiority in relation to radiances. The 4DEnVAR scheme turns out to be the most robust of the three because, although it does not always produce the most realistic solutions, it is the only one whose forecast is always superior to the run without assimilation.