Model Error Representation Using the Stochastically Perturbed Hybrid Physical–Dynamical Tendencies in Ensemble Data Assimilation System

Ensemble data assimilation systems generally suffer from underestimated background error covariance that leads to a filter divergence problem—the analysis diverges from the natural state by ignoring the observation influence due to the diminished estimation of model uncertainty. To alleviate this problem, we have developed and implemented the stochastically perturbed hybrid physical–dynamical tendencies to the local ensemble transform Kalman filter in a global numerical weather prediction model—the Korean Integrated Model (KIM). This approach accounts for the model errors associated with computational representations of underlying partial differential equations and the imperfect physical parameterizations. The new stochastic perturbation hybrid tendencies scheme generally improved the background error covariances in regions where the ensemble spread was not sufficiently expressed by the control experiment that used an additive inflation and the relaxation to prior spread method.


Introduction
Numerical weather prediction (NWP) includes inevitable forecast errors due to uncertainties in both initial conditions and models. The model error would be observed even though the initial conditions perfectly describe the true state of atmospheric flows [1]; thus, the differences in tendencies between the model and the real world always exist. As an alternative to deterministic forecasting, which has limitations in representing this uncertainty, an ensemble prediction system (EPS) can be utilized to overcome the limitation [2,3]. In particular, the ensemble data assimilation (EDA) system is beneficial to represent the initial uncertainties and flow-dependent background error covariance (BEC), thus improving the assimilation of observations [4]. If the ensemble perturbation includes the true observation and the model error, the unperturbed analysis will represent the analysis error [5]. However, the EDA usually suffers from the underestimation of BEC due to the limited ensemble size, sampling errors, and model error [6]. This underestimation leads to a filter divergence problem: the analysis diverges from the natural state by ignoring the observation influence due to a small forecast uncertainty. Conversely, with an over-dispersive ensemble, the analysis ignores the model errors due to a large forecast uncertainty.
If plenty of ensemble sizes are available, the sampling error is expected to be moderately eliminated [7,8]; however, this would be impractical due to limited computational resources. In general, the problems related to covariance underestimation and sampling error can be solved with covariance inflation and localization, respectively [6]-the former inflates the small uncertainty while the latter reduces sampling errors due to limited ensemble size. In this study, we focus on the covariance inflation only. During the last a few decades, several inflation methods have been proposed, including multiplicative inflation (e.g., [9]), additive inflation (e.g., [10,11]), and some alternative inflation methods using relaxation (e.g., [12][13][14]) and adaptive approaches (e.g., [6,[15][16][17]). Nevertheless, it has been found that the ensemble BEC still remains under-dispersive and underestimates the true uncertainty [18,19].
In addition to imperfect initial conditions, forecast errors are attributed to model uncertainties, stemming from approximations in governing equations, truncation and round-off errors, misrepresentations of physical processes, etc. The stochastic approach with random forcing (RF) is widely used to represent the model uncertainties in various applications [20] as well as in the EPSs of many operational centers [1]. The stochastically perturbed physical tendency (SPPT) scheme has been proven to be effective in ensemble spread (ES) and ensemble mean error (EME), compared to the traditional inflation methods [4,21]. Here, the ES-a standard deviation from the ensemble mean-provides an estimate of the forecast uncertainty [22], and its average is expected to have a similar spread to the EME. Recently, several studies have focused on the dynamic part of a forecast model; for instance, the stochastic kinetic energy backscatter (SKEB) assumes that the model error is associated with inadequate scale interactions in truncated NWP models [1,23,24]. Although it showed remarkable performance in the medium-range EPS, it has been difficult to obtain a positive impact in EDA [11]. As an alternative method, Koo and Hong [25] proposed the stochastically perturbed dynamical tendency (SPDT) scheme by assuming that the model error is associated with computational representations of the underlying partial differential equations to solve the atmospheric motion. They reported a significant improvement in prediction skills in a seasonal ensemble framework by perturbing both dynamical and physical tendencies, especially in poleward transient eddy and seasonal precipitation, compared to perturbing the physical tendency only. In addition, the round-off error could be a significant source of forecast error [26]. These imply that an additional stochastic perturbation to dynamical tendency would be beneficial in representing the model uncertainties.
In this study, we develop and implement both SPDT and SPPT schemes into an EDA system. As the local change of a model variable is a sum of physical and dynamical tendencies, these perturbed tendencies can account for model uncertainties. Note that SPDT has only been confirmed in the deterministic medium-range forecasts and seasonal prediction framework [25], whereas SPPT has been widely used in EDA as well as EPS in many operational centers [4,27]. As far as we know, this study is the first attempt to apply SPDT and SPPT simultaneously in EDA to help the ES increase. The scheme performance is evaluated with ensemble BEC that the ES of a 6 h forecast started from each initial condition member, compared to the EME. In Section 2, we describe the global NWP system, the EDA system, the stochastic perturbation hybrid tendencies scheme, and the experimental designs. Section 3 discusses the impact of the new scheme on the 6 h forecast in EDA. A summary and conclusions are provided in Section 4.

Numerical Weather Prediction Model
The Korean Integrated Model (KIM) [28] is a newly developed global weather prediction model developed at the Korea Institute of Atmospheric Prediction Systems (KIAPS); as of April 2020, it has become an operational model at the Korea Meteorological Administration (KMA). It is composed of non-hydrostatic governing equations on a cubed sphere, implemented with the state-of-the-art physics parameterization packages including radiation, gravity wave drag, vertical mixing, convection, cloud physics, and so on (see [28]). For the ensemble forecast in the local ensemble transform Kalman filter (LETKF) system, the horizontal resolution was about 50 km, but recently it has changed to 32 km. The model top is set to 0.01 hPa with 91 vertical levels in the hybrid sigma-pressure vertical coordinate. In this study, the ensemble model resolution is set to 50 km due to the limitation of computational resources. Since the model resolution is coarse, this study focuses on synoptic-scale structures rather than small-scale weather events.

Data Assimilation System
We use the four-dimensional (4D) LETKF, as a method of the EDA system at KIAPS, whose analysis is obtained by assimilating the available observations within a local region [29][30][31]. The 50 ensemble members analyze the zonal wind, meridional wind, potential temperature, humidity mixing ratio, and surface pressure at a 50 km horizontal resolution. The EDA system includes the tropical cyclone initialization [32] as well as the real observations (e.g., sonde, surface, aircraft, Global Positioning System-Radio Occultation, Infrared Atmospheric Sounding Interferometer, Advanced Microwave Sounding Unit-A, Cross-track Infrared Sounder, Microwave Humidity Sounder, Advanced Technology Microwave Sounder, and Atmospheric Motion Vector) [33,34]. The initial conditions of the LETKF cycle are produced by modifying the analysis with the lagged forecast differences that are used to generate the static BEC for the hybrid 4D ensemble-variational data assimilation system of KIAPS [35].
Several inflation methods have been implemented to prevent filter divergence [31], such as the following: (1) additive inflation that adds perturbations randomly sampled from the bias-corrected lagged forecast differences used to generate the static BEC to each ensemble member after the analysis step [36]; (2) adaptive multiplicative inflation that multiplies perturbations using an adaptively estimated inflation factor [6]; and (3) relaxation to prior spread (RTPS) that relaxes the ensemble standard deviation of analysis (posterior) back to the background (prior) by defining a relaxation parameter with a range from 0 to 1 [12]. In the current system, the RTPS with the relaxation parameter of 0.95 and the additive inflation showed better performance than any other configurations that we have tried (not shown).
Regarding the covariance localization, the horizontal localization is given by a Gaussian-like piecewise fifth-order rational function [6,37] and varies from 660 to 1800 km in radius of influence (ROI) depending on vertical levels [38]. The vertical localization differs as to the observation type.
For the conventional data, it is defined by a Gaussian-like rational function as 2 10 3 σ v where σ v is dependent on pressure p: σ v is 0.2 ln(p) for wind and surface pressure and 0.1 ln(p) for mass variables. For the radiance data, the vertical weighting function is defined by the gradient of transmittance of the measured radiance [39].

Stochastic Perturbation Hybrid Tendencies Scheme
In the stochastic perturbation hybrid tendencies (SPHT) scheme (i.e., SPDT+SPPT), the dynamic tendencies are from the explicitly resolved dynamics and horizontal diffusion while the physical tendencies are from the physical parameterization schemes. The SPPT assumes that uncertainty in the parameterized physical tendencies is proportional to the net physical tendency [40]; similarly, the SPDT supposes that uncertainty in dynamical tendencies is in proportion to the net dynamical tendency. Consequently, the multiplicative perturbation is applied to each tendency at each model time step and every grid point via and where ∂x ∂t dyn and ∂x ∂t phy represent the dynamical and physical tendencies, respectively; µ ∈ {0, 1} represents the vertical tapering function, e η−1 , in the generalized vertical coordinate η, and r the RF; the state x consists of temperature and humidity mixing ratio only. Note that in KIM, physics and dynamics are coupled by a time-splitting method, thus this approach differs from the method of perturbing total model tendency, i.e., (1 + µr)[( ∂x ∂t ) dyn + ( ∂x ∂t ) phy ]. To generate RF at every grid point, we transform RF from the spectral space to the grid space on a cubed sphere [41]. The new RF is applied at every time step and time correlations are described with auto-regressive processes of the first order (AR1) in the spectral space. Furthermore, RF follows a Gaussian distribution with a zero mean and a standard deviation of perturbation (σ). Optionally, it decreases exponentially upwards through µ. In particular, these perturbations have been tapered to zero in the lowermost and the uppermost atmosphere to avoid numerical instability [4]. Finally, this RF is applied to the tendency after all the dynamical and physical forcings are computed.

Experimental Designs
We designed the experiments based on the RF tuning parameters, whose values are listed in Table 1. Each parameter is defined as follows: L, the horizontal correlation length scale, determines how much perturbed errors propagate in a horizontal direction; τ, the de-correlation time scale, determines how long the perturbed errors will be sustained; σ, the standard deviation, controls the amplitudes of RF; and µ, the tapering function, determines whether the error would exponentially decrease or remain the same in the vertical layers. With these parameter settings, each stochastic perturbation is activated simultaneously in a stable state. Under the same RF, the tendency fluctuation in SPDT is larger than that in SPPT; thus, in order to suppress spurious instability, we designed the SPDT to generate sufficiently small RF by setting smaller σ while activating µ to render the RF smaller with height [25]. Table 1. Configuration of the random forcing (RF) tuning parameters for the experiment using stochastically perturbed dynamical tendency (SPDT) and stochastically perturbed physical tendency (SPPT): L is the horizontal correlation length scale (in km), τ the de-correlation time scale (in s), σ the standard deviation, and µ the tapering function.

Experiment
L τ σ µ SPDT 500 10,800 0.5 on SPPT 500 21,600 1.0 off We conducted two sets of EDA runs-the control run (CTL) and the stochastic perturbation run (STO)-to assess the SPHT scheme. The additive inflation and the RTPS were included in both runs; however, the SPDT and the SPPT were simultaneously activated only in the STO. Thus, the differences between the results from CTL and STO are solely due to the stochastic perturbation. The experiments started at 12:00 UTC on 22 June 2018 and ended at 18:00 UTC on 7 July 2018; this period was long enough to examine the response of BEC in an EDA system. We specified the first 78 h as a spin-up period and excluded this period from the error analyses.
We assessed the proposed scheme by quantifying the ensemble features. The ES was examined with regard to the areas that were underestimated or overestimated, compared to the EME that is defined as the root-mean-square deviation (RMSD) with respect to the ensemble mean. The EME was evaluated against the Integrated Forecast System (IFS) analysis from the European Centre for Medium-Range Weather Forecast (ECMWF), having a horizontal resolution of 25 km with 25 pressure levels from 1000 to 1 hPa. Currently, the analysis of IFS/ECMWF shows the smallest RMSD compared with the analyses of other operational centers (see [42]); thus, we regard the IFS analysis as the true state against which the EME is estimated.We also checked the performance of EDA by the new SPHT scheme through the accuracy and reliability. The accuracy was measured by the reduction of EME while the reliability was evaluated by comparing the ES and the EME [43]. In particular, we define the reliability index (γ) as which depicts the balance between the ES and the EME.

Results
We first diagnosed the status of ensemble quality in CTL in terms of the zonally-averaged EME and ES (Figure 1). For temperature, large EMEs are found in the lower atmosphere over the Arctic and Antarctica regions and in the overall stratosphere ( Figure 1a); for specific humidity, the tropics and mid-latitudes have larger EMEs than the other regions, especially below 700 hPa ( Figure 1c); for zonal wind, large EMEs are found in the tropics from the middle troposphere to the lower stratosphere, Antarctica, and most of the stratosphere except the Northern Hemisphere (NH) (Figure 1e).
These model errors should be correspondingly represented in the ESs (Figure 1b,d,f); however, the ESs were relatively underestimated compared to the EMEs. In summary, all the variables have insufficient ESs in the troposphere below 700 hPa in both hemispheres and above 10 hPa in the Southern Hemisphere (SH). Although the current inflation methods are quite effective in increasing the ES, this result evidently shows that we can further augment the ES. Note that the ES of zonal wind is quite similar to the EME in terms of patterns and magnitudes (cf. Figure 1e,f). Therefore, the zonal wind is considered to have sufficient ES already, especially in the jet regions, and additional inflation may have a negative impact on the mean error. For this reason, we perturbed only temperature and humidity mixing ratios that were the prognostic variables.
Prior to the EDA runs, we compared the background ESs of SPDT, SPPT, and SPHT for temperature at the first cycle (see Figure 2). The ES was mainly affected by the SPPT below 700 hPa, with prominent increases over the subtropics and extratropics in the NH and over the subtropics and polar region in the SH; the ES was also increased by the SPDT below 700 hPa over the extratropics in the SH and in the polar stratosphere but with much smaller magnitudes. This can be expected because the RF amplitude is relatively small in the SPDT, along with vertical relaxation (see Table 1). The SPHT scheme increases the ES considerably in the overall troposphere and weakly in the stratosphere-these regions are where we desire to enlarge the ES. This supports our idea to run the SPDT and the SPPT simultaneously as a new stochastic perturbation approach.
In Figure 3, we examine the effect of the SPHT scheme on the ES by comparing differences between STO and CTL for the globally averaged vertical profiles of temperature, specific humidity, and zonal wind. Starting with zeroes at the initial time, the differences gradually increased with time for all the variables. For temperature, the ES mainly increased in the troposphere below 500 hPa while it weakly increased in the stratosphere above 10 hPa (Figure 3a). For specific humidity, primarily distributed at the middle and lower troposphere, the ES also increased below 500 hPa ( Figure 3b). However, the ES for zonal wind slightly increased as the others increased (Figure 3c), though the zonal wind was not perturbed. The increasing pattern of the zonal wind profile was quite similar to that of the temperature profile. Therefore, we confirm that the SPHT scheme leads to an increase in the ensemble BECs of the 6 h forecasts, serving as an inflation method.   In Figure 4, we show the effect of our proposed scheme on the accuracy of EDA, by showing the zonal mean differences of ES and EME between STO and CTL for the 6 h forecasts during the experimental periods. As expected, in most areas the ES increased (red) while the EME decreased (blue). It is noteworthy that, for all variables, the EME significantly decreased (i.e., the accuracy increased) over the tropics, especially below 700 hPa (Figure 4b,d,f). For temperature, the accuracy increased over higher latitudes as well in the lower atmosphere while it prominently decreased in the stratosphere over the SH (Figure 4b); for zonal wind, the accuracy also increased in the middle to upper troposphere but decreased in the stratosphere over the tropics while it generally decreased at high latitudes, especially over the SH (60 • S) (Figure 4f). The ES generally enlarged in all variables: for temperature, the ES substantially increased in most latitudes and atmospheric layers except in the mid-to upper-troposphere and over the low-level tropics (10 • S-10 • N) below 700 hPa ( Figure 4a); for specific humidity, the ES was mostly enhanced within the mid-latitudes below 700 hPa ( Figure 4c); for zonal wind, the ES was significantly augmented in most of the global atmosphere, especially in the subtropical troposphere below 700 hPa and in most of the stratosphere except over 40 • S-10 • N (Figure 4e). For the mass variable variation, the SPDT acts indirectly through the mass-wind balance and the cycling, thus having a weaker impact on the temperature and specific humidity than the SPPT, while the wind field variation is induced comparably by both SPDT and SPPT. left panels) and the EME (EME STO -EME CTL ; right panels) against the IFS analysis: (a,c) temperature (in K), (b,d) specific humidity (in g kg −1 ), and (e,f) zonal wind (in m s −1 ). Results are represented with a composite of the 49 cycles' background fields from 18:00 UTC on 25 June to 18:00 UTC on 7 July 2018. Dots indicate the composite differences with the 99% statistical significance based on the two-tail t-test.
Regarding the reliability, Figure 5 shows a time series of the reliability index (γ; see Equation (3)) for the 6 h forecasts. When the ES is properly described, γ ≈ 1. Compared to CTL, the reliability has widely improved (i.e., γ increased) in STO: for temperature, in the layer below 150 hPa; for specific humidity, in the layer below 850 hPa; for zonal wind, in the mid-troposphere to lower stratosphere (700-30 hPa). The globally averaged value of γ has increased by about 4% in temperature and specific humidity-from 0.706 to 0.734 for temperature, and from 0.386 to 0.401 for specific humidity; for zonal wind, it has increased by about 3% from 0.757 to 0.779. We also note that, for all the variables, the γ values nearest to unity appear in the middle atmosphere in both CTL and STO-around 100-700 hPa for temperature and zonal wind and around 900-300 hPa in specific humidity. Overall, by considering the model error representation, it is evident that the SPHT scheme improves both the reliability and the accuracy of the EDA system in KIAPS.

Conclusions and Discussion
Ensemble data assimilation (EDA) systems suffer from underestimated background error covariance, which leads to the filter divergence problem. Model errors stem from uncertainty sources in both dynamical and physical processes, including underlying governing equations and approximations, computational schemes, physical parameterizations, etc.; thus, we have implemented the stochastic perturbation hybrid tendencies (SPHT) scheme-by combining the stochastically perturbed dynamical tendency (SPDT) and stochastically perturbed physical tendency (SPPT)-into the local ensemble transform Kalman filter (LETKF) system of the Korea Integrated Model (KIM).
By employing the SPHT scheme we found that for temperature, the ensemble spread (ES) increased in the troposphere below 700 hPa and in the stratosphere while the ensemble mean error (EME) decreased in the troposphere below 700 hPa; for specific humidity, both ES and EME have been improved in the troposphere below 700 hPa over the tropics and the extratropics; for zonal wind, the ES increased in most of the global atmosphere, especially in the low-level subtropical troposphere and in most of the stratosphere except over 40 • S-10 • N, whereas the EME substantially decreased in the tropical troposphere.
The new SPHT scheme satisfactorily increases the ES in the areas where the model errors are underestimated. This could not be achieved with existing inflation methods. In particular, the most effective and positive impacts appear over the tropics below 700 hPa on temperature and specific humidity. We will further improve the scheme's performance to reduce the EME and generate stable and adequate perturbations. To avoid model instability due to random forcing in SPPT, we plan to include a transition zone with vertical dependency, as in SPDT. This is because the total physical tendency is known to be different between the troposphere and stratosphere-relatively larger scale and smaller error in the stratosphere-due to radiative forcing [4]. As far as we know, this study is the first attempt to increase the ES by simultaneously perturbing the physical and dynamical tendencies in EDA.