Rapid Update with EnVar Direct Radar Reﬂectivity Data Assimilation for the NOAA Regional Convection-Allowing NMMB Model over the CONUS: System Description and Initial Experiment Results

: This study ﬁrst describes the extended Grid-Point Statistical Interpolation analysis system (GSI)-based ensemble-variational data assimilation (DA) system within the North American Mesoscale Rapid Refresh (NAMRR) system for the Nonhydrostatic Multiscale Model on the B grid (NMMB). Experiments were conducted to examine three critical aspects of data assimilation conﬁguration in this system. Ten retrospective high-impact convective cases during the warm season of 2015–2016 were adopted for testing. A 10-member, 18 h ensemble forecast was launched for each experiment. Speciﬁcally, the experiment using horizontal (vertical) localization radii (L r ) of 300 km (0.55-scaled height measured in the nature log of pressure) overall had more skills than that of 500 km (1.1-scaled height) for conventional in-situ observation assimilation. Diagnostics suggest that the higher forecast skills could be attributed to applying smaller L r in the boundary with large temperature and moisture gradients. For radar DA, the experiment was more skillful with horizontal (vertical) L r of 15 km (1.1-scaled height) than that of 12 km (0.55-scaled height). Diagnostics suggest that the improved forecasts were achieved by using wider L r to spread radar observations into unobserved areas more effectively. Slight forecast skill differences between the relaxation inﬂation factors of 95% and 65% are presented. The impact of varying inﬂation magnitudes primarily occurred in the upper-level spread.

The 3DEnVar scheme implemented in the GSI is employed in the US operational NAMRR system. The first goal of this study is to document and describe the additional developments for the NAMRR system to better initialize convection-allowing forecasts through leveraging efforts from earlier studies [21,22,30]. First, ensemble forecasts from the Global Forecast System (GFS) with a coarser resolution, e.g.,~13 km, are used to estimate background-error covariances (BECs) in the current NAMRR system and other studies [24,25,27,28]. Lu et al. [21] found that such a GFS ensemble is insufficient to resolve (HWRF) in [21], and the Advanced Research WRF (WRF-ARW) in [22], the newly extended DA system for the regional NMMB model has two significant features. One is that NMMB's own cycled, EnKF-updated ensemble forecasts are used to produce the flowdependent estimate of BECs that will be fed into the EnVar. The other is that direct assimilation of many operational observations that sample different scales is allowed. In particular, radar reflectivity can be directly assimilated [22] rather than indirectly using the CA approach. Detailed efforts of these additional developments are specifically introduced below.

GSI-Based EnVar Data Assimilation System within NAMRR
As shown in Figure 1, each DA cycle comprises four steps to assimilate both conventional and radar observations. The reader can refer to Section 3.2 for more details of the assimilated observations. Conventional and radar observations are assimilated separately due to the significant difference in their optimal influence radii. If only conventional observations are available at the analysis time t, the context in the dash red box of Figure 1 is skipped, and vice versa for the dash green box when assimilating radar observations only. (1). At the analysis time t, the conventional observations are first assimilated in this step. The ensemble prior M t k is updated to generate the ensemble analysis N t k , k = 1, . . . , K, where K is the ensemble size, at analysis time t using the EnKF; meanwhile, the control prior M t C is analyzed to the control analysis N t C through the EnVar, which ingests BECs sampled from the ensemble prior M t k using the extended control variable method; (2). Radar observations are assimilated following similar procedures to step (1); the ensemble analysis and control analysis from step (1) act as the priors, and the flow-dependent BECs are estimated from the ensemble analysis N t k from step (1); the updated results through EnKF and EnVar form ensemble analysis O t k and control analysis O t C , respectively; (3). The ensemble analysis O t k from step (2) is recentered around the control analysis O t C to obtain the final ensemble analysis P t k ; (4). The final ensemble analysis P t k and control analysis O t C are advanced to the next analysis time t + 1 through the NMMB model.
Conventional and radar observations are assimilated separately due to the significant difference in their optimal influence radii. If only conventional observations are available at the analysis time t, the context in the dash red box of Figure 1 is skipped, and vice versa for the dash green box when assimilating radar observations only.

Methodology of GSI-Based EnVar DA System for Direct Assimilation of Reflectivity within NAMRR
In this section, the GSI-based EnKF and EnVar interfacing with the NMMB model are extended with the capability of direct assimilation of radar reflectivity following [22,50] for the WRF-ARW, respectively. The specific extensions for both EnVar and EnKF are described below.

GSI-Based EnVar and its Extensions of Direct Reflectivity Assimilation for NMMB
The formulation of the GSI-based hybrid DA was described in [12]. The extensions in this study absorb its EnVar flavor, where a pure ensemble covariance is utilized. A brief description of the EnVar formula is introduced following the notation of [12]. The analysis increment x is defined through modulating the normalized ensemble perturbation with the extended control variable vectors a k , k = 1, . . . K, In Equation (1), x e k denotes the k-th ensemble perturbation normalized by The symbol • is the Schur product. The optimal analysis increment x is obtained through minimizing the cost function J(a) with a preconditioned conjugate gradient algorithm [12,52], where In the first term of Equation (2) on the right-hand side, vector a is formed by concatenating the extended control vectors a k , k = 1, . . . , K. A block-diagonal matrix A is used to constrain these extended control variables and define the spatial covariance of a. In the second term, R denotes the observation error covariance, H is the linearized observation forward operator, and y o denotes the innovation vector. The gradient of J(a) with respect to the unitless control variables a is derived as where D is a diagonal matrix denoted as [diag(x e 1 ) · · · diag(x e k )] to organize x e k along its diagonal.
In [22], three direct reflectivity assimilation methods were implemented in the above EnVar and discussed for the WRF-ARW. The first method uses hydrometeor mixing ratios as state variables by assigning these hydrometeor mixing ratios in x . This approach, however, can cause cost function gradient imbalance during the variational minimization and, therefore, impede efficient convergence. In addition, its tangent linear of the reflectivity operator leads to overly small hydrometeor increments. The second method fixes this gradient imbalance problem by using logarithmic hydrometeor mixing ratios as state variables. However, this method generates spuriously large hydrometeor increments as a result of transforming to and from the logarithmic space, as well as the linearization of the operator. Wang and Wang [22] proposed the third method to directly add reflectivity as a state variable. Since the tangent linear and adjoint of the nonlinear operator become identical matrices, this method avoids the aforementioned problems in the first two methods. Another benefit of this method is a reduction in the workload of developing the tangent linear and adjoint of the nonlinear operator, especially for the complicated operators. Therefore, this study adopts the third method to directly assimilate radar reflectivity.
Consequently, four developments are implemented to extend the GSI-based EnVar for the NMMB to directly assimilate reflectivity. First, the simulated reflectivity is diagnosed within the microphysics scheme (i.e., Ferrier-Aligo scheme [53]) of the NMMB, and the value of the forward operator is, therefore, obtained by linearly interpolating the simulated reflectivity to the observation space. Second, the ensemble perturbation interface is extended to import new state variables of rainwater mixing ratio (qr), snow mixing ratio (qs), ice water mixing ratio (qi), cloud water mixing ratio (ql), reflectivity (dbz), and vertical velocity (w) from each ensemble first guess. Third, the GSI interface for the NMMB is extended to read the control first guess and update the control analysis of these extended state variables. Note that the NMMB model directly outputs the fraction variables, including total condensation (CWM), the ice fraction of CWM (fice), the rain fraction of liquid part of CWM (frain), and the ratio of rimed ice (frime), rather than the hydrometeor mixing ratios. Therefore, additional developments are implemented for the transform of the hydrometeor mixing ratios from and to the fraction variables following the formulas in the Ferrier-Aligo microphysics scheme [53], specifically, for converting the fraction variables to hydrometeor mixing ratios, and CWM = qs + qr + ql fice = min(1 .0, for the inverse transform.

GSI-Based EnKF and its Extensions of Direct Reflectivity Assimilation for NMMB
An ensemble square-root filter algorithm (EnSRF [46]) is adopted to update the ensemble priors in the GSI-based EnKF. Following the notation from [46], the model state variable from each ensemble member is partitioned to an ensemble mean x b and a deviation from the mean x b , which can be respectively updated to obtain the analyzed ensemble mean x a and deviation x a through where H is the observation forward operator (i.e., in nonlinearity), and K and K are the traditional Kalman gain and the reduced gain. Consistent with Section 2. ; thus, K is given by If R is diagonal, applying the serial observations assimilation can denote K as The extensions of the GSI-based EnKF to directly assimilate reflectivity for the NMMB include three developments. The first development is to extend the ensemble observation priors Hx b to include the radar reflectivity. Second, the new state variables of qr, qs, qi, ql, dbz, and w are included in EnKF for both first guess and analyses. Third, the EnKF interface for the NMMB is extended to input the ensemble first guess and analysis of these extended, new state variables for calculating x b and x a , respectively. The derivations between the model-diagnosed fraction variables and the hydrometeor mixing ratios in Equations (4) and (5) are also implemented here.

Model Configuration
The NMMB model is employed within the NAMRR system. A single domain ( Figure 2) is configured by covering the Continental United States (CONUS) with 1568 × 1120 horizontal grid points and 50 vertical levels. The NMMB uses a rotated latitude-longitude grid in the regional applications [54] at a grid spacing of 0.0268 • (i.e., 1.9-2.6 km in the mid-latitude) with a center point at 38.7 • N/97.0 • W. A hybrid sigma-pressure vertical coordinate is applied for the vertical levels with the hybrid interface level at 300 hPa and the model top at 50 hPa. As a convection-allowing horizontal grid spacing is applied in the NMMB model, no convection and cumulus parameterization schemes are used. The other physics parameterizations include the Ferrier-Aligo microphysics scheme [53], the Mellor-Yamada-Janjić planetary boundary layer scheme [55][56][57][58], the Noah land surface model (LSM [59]), and the Rapid Radiative Transfer Model (RRTM) longwave and shortwave radiation schemes [60]. A twice digital filter initialization (TDFI [61]) approach is used with a 2 min half-window length, consistent with [30]. vations between the model-diagnosed fraction variables and the hydrometeor mixing ratios in Equations (4) and (5) are also implemented here.

Model Configuration
The NMMB model is employed within the NAMRR system. A single domain ( Figure  2) is configured by covering the Continental United States (CONUS) with 1568 × 1120 horizontal grid points and 50 vertical levels. The NMMB uses a rotated latitude-longitude grid in the regional applications [54] at a grid spacing of 0.0268° (i.e., 1.9-2.6 km in the mid-latitude) with a center point at 38.7° N/97.0° W. A hybrid sigma-pressure vertical coordinate is applied for the vertical levels with the hybrid interface level at 300 hPa and the model top at 50 hPa. As a convection-allowing horizontal grid spacing is applied in the NMMB model, no convection and cumulus parameterization schemes are used. The other physics parameterizations include the Ferrier-Aligo microphysics scheme [53], the Mellor-Yamada-Janjić planetary boundary layer scheme [55][56][57][58], the Noah land surface model (LSM [59]), and the Rapid Radiative Transfer Model (RRTM) longwave and shortwave radiation schemes [60]. A twice digital filter initialization (TDFI [61]) approach is used with a 2 min half-window length, consistent with [30].

DA Configuration and Experimental Design
The GSI-based hybrid DA and ensemble forecast system is schematically shown in Figure 3. Beginning from T − 6 h (relative to the start of the free forecast) with a cold start, the initial conditions (ICs) valid at T − 6 h and its corresponding lateral boundary conditions (LBCs) for the EnVar control member are downscaled from the control GFS analysis and its subsequent forecasts. Perturbations for a 40-member convective-scale ensemble valid at T − 6 h are provided by the 20-member Global Ensemble Forecast System (GEFS) and the 20-member Short-Range Ensemble Forecast (SREF) system of the National Centers

DA Configuration and Experimental Design
The GSI-based hybrid DA and ensemble forecast system is schematically shown in Figure 3. Beginning from T − 6 h (relative to the start of the free forecast) with a cold start, the initial conditions (ICs) valid at T − 6 h and its corresponding lateral boundary conditions (LBCs) for the EnVar control member are downscaled from the control GFS analysis and its subsequent forecasts. Perturbations for a 40-member convective-scale ensemble valid at T − 6 h are provided by the 20-member Global Ensemble Forecast System (GEFS) and the 20-member Short-Range Ensemble Forecast (SREF) system of the National Centers for Environmental Prediction (NCEP). These perturbations adding to the GFS control analysis are used to generate the initial ensemble. The selected forecasts from GEFS and SREF provide the LBCs forecast with an 18 h leading time is launched from the T − 0 ensemble analyses. Given the computing resource constraint, one control forecast initialized from the control analysis for each ensemble member. The strategies of assimilating conventional and radar observations at each DA cycle are illustrated in Figure 1 and Section 2.1. A free ensemble and nine-member ensemble initialized from the first nine-member recentered EnKF ensemble analyses valid at T − 0 h are used with the same model configuration during the free forecast. analysis are used to generate the initial ensemble. The selected forecasts from GEFS and SREF provide the LBCs forecast with an 18 h leading time is launched from the T − 0 ensemble analyses. Given the computing resource constraint, one control forecast initialized from the control analysis for each ensemble member. The strategies of assimilating conventional and radar observations at each DA cycle are illustrated in Figure 1 and Section 2.1. A free ensemble and nine-member ensemble initialized from the first nine-member recentered EnKF ensemble analyses valid at T − 0 h are used with the same model configuration during the free forecast. The system is designed to assimilate the conventional observations hourly from T − 5 h to T − 0 h. The conventional observations are archived from the operational Rapid Refresh (RAP)/HRRR in situ data stream, including surface (i.e., horizontal u and v wind components, surface pressure ps, temperature T, and dewpoint Td), rawinsonde (i.e., u, v, T, relative humidity RH, and geopotential height z; only available at 12:00 a.m. UTC and 12:00 p.m. UTC), aircraft (i.e., u, v, T, and water-vapor mixing ratio qv), and GPS precipitable water (PW). The unified quality control in the GSI software and the default observation errors consistent with the North American Mesoscale Forecast System (NAM) are applied to the conventional observations.
Radar reflectivity assimilation is designed to enable hourly or sub-hourly cycling, and this study here assimilates the reflectivity observations every hour. The radar reflectivity observations are provided by the Multi-Radar/Multi-Sensor (MRMS) system (https://www.nssl.noaa.gov/projects/mrms/ accessed on 30 September 2021) in the National Severe Storms Laboratory (NSSL). The MRMS reflectivity data comprise a horizontal grid spacing of 0.01° and 33 vertical levels, stretching with the vertical spacings of 250 m between 500 m and 3000 m AGL, 500 m between 3000 m and 9000 m AGL, and 1000 m between 9000 m and 19,000 m AGL. The temporal resolution of the MRMS reflectivity is approximately 2 min. The reflectivity is quality-controlled within the MRMS system. The standard deviation of the reflectivity is assumed to be 5 dBZ, which is consistent with the previous storm-scale DA studies [22,23,49,50,62]. The observed reflectivity lower than 5 dBZ is reset to 0 dBZ [22,23,30,49,63]. These 0 dBZ observations are considered "no precipitation" observations and assimilated to suppress spurious convections [22,36,49]. Moreover, the observed reflectivity between 5 and 10 dBZ will not be assimilated, because it is believed to be indistinguishable from other sources, e.g., ground clutter, birds, and insects, especially near the radar locations [63]. Therefore, an observed reflectivity greater than 10 dBZ and equal to 0 dBZ is assimilated in this system.
The designed experiments are summarized in Table 1. Three experiments with different horizontal and vertical localization radii are first conducted to examine the impact of localizations for the assimilation of conventional and radar observations. The baseline experiment named REF1 uses the DA configuration as follows: the horizontal localization radii for conventional and radar DA are 300 km and 12 km, respectively, and the vertical The system is designed to assimilate the conventional observations hourly from T − 5 h to T − 0 h. The conventional observations are archived from the operational Rapid Refresh (RAP)/HRRR in situ data stream, including surface (i.e., horizontal u and v wind components, surface pressure p s , temperature T, and dewpoint T d ), rawinsonde (i.e., u, v, T, relative humidity RH, and geopotential height z; only available at 12:00 a.m. UTC and 12:00 p.m. UTC), aircraft (i.e., u, v, T, and water-vapor mixing ratio q v ), and GPS precipitable water (PW). The unified quality control in the GSI software and the default observation errors consistent with the North American Mesoscale Forecast System (NAM) are applied to the conventional observations.
Radar reflectivity assimilation is designed to enable hourly or sub-hourly cycling, and this study here assimilates the reflectivity observations every hour. The radar reflectivity observations are provided by the Multi-Radar/Multi-Sensor (MRMS) system (https:// www.nssl.noaa.gov/projects/mrms/ accessed on 30 September 2021) in the National Severe Storms Laboratory (NSSL). The MRMS reflectivity data comprise a horizontal grid spacing of 0.01 • and 33 vertical levels, stretching with the vertical spacings of 250 m between 500 m and 3000 m AGL, 500 m between 3000 m and 9000 m AGL, and 1000 m between 9000 m and 19,000 m AGL. The temporal resolution of the MRMS reflectivity is approximately 2 min. The reflectivity is quality-controlled within the MRMS system. The standard deviation of the reflectivity is assumed to be 5 dBZ, which is consistent with the previous storm-scale DA studies [22,23,49,50,62]. The observed reflectivity lower than 5 dBZ is reset to 0 dBZ [22,23,30,49,63]. These 0 dBZ observations are considered "no precipitation" observations and assimilated to suppress spurious convections [22,36,49]. Moreover, the observed reflectivity between 5 and 10 dBZ will not be assimilated, because it is believed to be indistinguishable from other sources, e.g., ground clutter, birds, and insects, especially near the radar locations [63]. Therefore, an observed reflectivity greater than 10 dBZ and equal to 0 dBZ is assimilated in this system.
The designed experiments are summarized in Table 1. Three experiments with different horizontal and vertical localization radii are first conducted to examine the impact of localizations for the assimilation of conventional and radar observations. The baseline experiment named REF1 uses the DA configuration as follows: the horizontal localization radii for conventional and radar DA are 300 km and 12 km, respectively, and the vertical localization radius is the 0.55-scaled height for both DAs. Here, the vertical localization scale is measured in the nature log of pressure. Using the previous studies as a reference, most of these parameters for localization radii are similar to those used in [49], except that the horizontal localization radius for the conventional observation assimilation is nearly half of that in [49]. The relaxation to prior spread (RTPS [64]) with a factor of 0.95, same as [50], is used in the EnKF component to all model variables to maintain ensemble spread for the assimilation of both observations. The radar DA is conducted every 60 min for a 1 h period before the free forecasts. In other words, the radar DA in both EnKF and EnVar are only conducted at T − 1 h and T − 0 h. Experiment CONV_H500_V1.1 is the same as REF1 except that wider horizontal and vertical localization radii of 500 km and 1.1-scaled height, respectively, are used to examine the impact of localization on conventional DA when compared to REF1. In terms of radar DA, RADAR_H15_V1.1 uses the localization radii of 15 km in the horizontal and 1.1-scaled height in the vertical dimensions, respectively. The comparison between RADAR_H15_V1.1 and REF1 reveals the influences of different localization radii on radar DA. Below, RADAR_H15_V1.1 is used as the second benchmark experiment REF2 to examine the impacts of the ensemble inflation magnitude. Specifically, RTPS_065 uses the same DA configuration as REF2 but a reduced inflation magnitude of 65% on the ensemble analyses. These experiments were examined with a collection of 10 cases occurring in the warm season of 2015-2016. These cases were characterized by higher convective coverages and well-organized convective storms. The reader can refer to [30,65] for more details of these selected cases.

Verification
A single domain covering the eastern two-thirds of the CONUS (110-80 • W longitude and 25-50 • N latitude) was used for the verification in this study. The fractions skill score (FSS) calculated from neighborhood-based ensemble probability [65] was applied to quantitively verify predictions of the composite reflectivity and 1 h accumulated precipitation fields. Such a neighborhood-based metric is expected to measure forecast quality more adequately than traditional verification metrics such as bias and RMSE [66,67], due to the highly structured nature of these simulated fields from the convection-allowing model. The FSS metric was applied to the composite reflectivity fields at 20, 30, and 40 dBZ thresholds, and the 1 h precipitation fields at 0.254, 2.54, and 6.35 mm thresholds (0.01, 0.1, and 0.25 inches, respectively). The chosen neighborhood radius for FSS was 48 km, similar to the size used in previous relevant studies [30,65,68]. Corresponding observation fields were obtained from the MRMS project [69].
Additionally, surface and upper-air variables were also evaluated by calculating the grid point-based ensemble mean error and spread. Specifically, the upper-air variables, including temperature, dewpoint temperature, and wind, were verified against hourly RAP analyses. Moreover, hourly 2.5 km Real-Time Mesoscale Analysis (RTMA [70]) fields were used to evaluate surface variables, including 2 m temperature, 2 m specific humidity, and 10 m wind. The RTMA fields at a comparable resolution with the forecasts permit verification of higher-resolution details that the 3 km convection-allowing forecasts can resolve.
Similar to [30,65], statistical significance tests were applied to the differences of verification metrics, FSS, and RMSE against RTMA among the experiments using the bootstrapping method [71] with 1000 resamples. The 1000 bootstrap samples were generated by resampling the verification metrices with replacement. This study utilized a two-tailed test with a significance level of 0.9. In other words, the differences between two experiments exceeding the 95th or 5th percentiles of the null distribution were considered significant.

Impact of Localization Radii for Conventional DA
The FSSs of composite reflectivity and 1 h accumulated precipitation are shown in Figure 4, calculated from the aggregated contingency table components over the 10 retrospective cases. The FSSs of both experiments showed similar patterns at all precipitation and reflectivity thresholds, where REF1 was significantly more skillful than CONV_H500_V1.1 during the first 16 h of the forecast for the reflectivity at all thresholds (20,30, and 40 dBZ) and the precipitation forecasts at the lighter thresholds (0.254 and 2.54 mm). In terms of the heavier precipitation threshold (6.35 mm), such outperformance of REF1 persisted up to 12 forecast hours. In addition, the FSSs were also aggregated into two groups of five cases separated by the strength of synoptic forcing in Figure 4. We noticed a great differences in the verification results between the two strength-of-forcing groups. The FSSs of the weakly forced group were generally lower than those of the strongly forced group throughout the almost entire forecast period, except at the forecast hour of 18 for the heavier thresholds. The lower scores may have been caused by the fact that predicting storms is more difficult for the weakly forced group than for the strongly forced group [72]. REF1 was generally superior to CONV_H500_V1.1 at most thresholds for the weakly forced group. Compared to CONV_H500_V1.1, the FSSs of REF1 were significantly more pronounced for 7, 11, and 13 out of 18 forecast hours for the 20, 30, and 40 dBZ reflectivity thresholds, respectively. For precipitation forecasts, REF1 had significantly higher scores than CONV_H500_V1.1 for 7, 11, and 9 out of 18 forecast hours for the 0.254, 2.54, and 6.35 mm thresholds, respectively. For the strongly forced group, the FSSs of REF1 were similar to but slightly higher than CONV_H500_V1.1 during the first 16 forecast hours at the lighter thresholds (20 and 30 dBZ for composite reflectivity; 0.254 and 2.54 mm for precipitation) and the first four forecast hours at the heavier thresholds (40 dBZ and 6.35 mm for composite reflectivity and precipitation, respectively). However, the FSSs of CONV_H500_V1.1 had higher scores at the later lead times (forecast hours of 5-18) for the heavier reflectivity and precipitation thresholds despite not showing statistical significance. This result can be explained as follows: for a few strongly forced cases, compared to using narrower localization radii, using wider localization radii can better analyze the synoptic scale systems through efficiently spreading the observation information into a longer distance. These synoptic systems strongly affect the convection initiation during the later forecasts (not shown).
The verification of surface variables with time series of ensemble mean RMSE against 2.5 km RTMA analyses is shown in Figure 5. The error of REF1 was about 0.05 g·kg −1 lower than CONV_H500_V1.1 in the entire 18 forecast hours for the specific humidity ( Figure 5b). A lower error of REF1 was also found for meridional and zonal winds in the forecast hours of 0-3 and 11-18, respectively, with a maximum difference up to 0.1 m·s −1 . The error of REF1 in the surface temperature was lower than CONV_H500_V1.1 only at the analysis time, and the lead times after the 9th forecast hour. These results are consistent with the higher skills of convection predictions by REF1 than CONV_H500_V1.1.
An example of forecasts initialized at 4:00 a.m. UTC 10 July 2016 was selected to explore and understand the systematic difference between REF1 and CONV_H500_V1.1 as a result of using the different localization radii on conventional DA. This case features a nocturnal mesoscale convective vortex (MCV) from South Dakota, through North Dakota and Minnesota, and into Iowa. Discrete convections were initiated ahead of a dryline located in northern North Dakota due to a strong instability from the daytime heating. A low-level jet carrying northward moist and warm air passing through Nebraska, South Dakota, and North Dakota fueled further storm development and upscale growth of MCV into Iowa. This case was chosen because its differences between REF1 and CONV_H500_V1.1 were consistent with those aggregated results in Figures 4 and 5. At the analysis time (Figure 6a,b), CONV_H500_V1.1 produced a much weaker storm in northern South Dakota than REF1, whereas these storms in REF1 were much stronger and closer to observations (Figure 6e). Single conventional DA cycle tests were also performed to understand their differences in analyses due to using the different localization radii. The first guess and conventional observations valid at 4:00 a.m. UTC 10 July 2016 are selected. It was found that REF1 produced 1-4 g·kg −1 greater moisture increments than CONV_H500_V1.1 in northern South Dakota and its surroundings (Figure 7a). Therefore, these surface drier airs in CONV_H500_V1.1 may have enhanced the convection inhibition (CIN) over the north of South Dakota and the south of North Dakota and then prohibit the corresponding storm strengthening. Conversely, the storm in northern Minnesota was better captured by CONV_H500_V1.1 than REF1. The slightly moister air of CONV_H500_V1.1 in this area was also consistent with the improved analysis compared to REF1 (Figure 7b). In addition, CONV_H500_V1.1 tended to have more spurious convections than REF1, i.e., across the border between southeastern Nebraska and northeastern Kansas. Near northern Kansas and southern Nebraska, more spurious moisture (~0.5 g·kg −1 ) was analyzed in CONV_H500_V1.1 compared to REF1. Six hours later (Figure 6c,d), REF1 maintained all the observed storms (Figure 6f), while CONV_H500_V1.1 missed the storms over southern Minnesota and eastern North Dakota. These sustained storms in REF1 could be traced back to the well-analyzed storms over northern South Dakota and southern North Dakota at the analysis time.  The results aggregated over 10 cases are shown as black lines, those over the weakly forced cases are shown as red lines, and the strongly forced cases are shown as green lines. Statistically significant differences between CONV_H500V1.1 and REF1 over the 10 cases, over the strongly forced group, and over the weakly forced group are indicated by the black, green, and red symbols, respectively, along the top axes. The 95% and 90% confidence levels are represented by dot and star symbols, respectively.
The verification of surface variables with time series of ensemble mean RMSE against 2.5 km RTMA analyses is shown in Figure 5. The error of REF1 was about 0.05 g·kg −1 lower than CONV_H500_V1.1 in the entire 18 forecast hours for the specific humidity ( Figure  5b). A lower error of REF1 was also found for meridional and zonal winds in the forecast hours of 0-3 and 11-18, respectively, with a maximum difference up to 0.1 m·s −1 . The error of REF1 in the surface temperature was lower than CONV_H500_V1.1 only at the analysis time, and the lead times after the 9th forecast hour. These results are consistent with the higher skills of convection predictions by REF1 than CONV_H500_V1.1. The results aggregated over 10 cases are shown as black lines, those over the weakly forced cases are shown as red lines, and the strongly forced cases are shown as green lines. Statistically significant differences between CONV_H500V1.1 and REF1 over the 10 cases, over the strongly forced group, and over the weakly forced group are indicated by the black, green, and red symbols, respectively, along the top axes. The 95% and 90% confidence levels are represented by dot and star symbols, respectively. to REF1 (Figure 7b). In addition, CONV_H500_V1.1 tended to have more spurious convections than REF1, i.e., across the border between southeastern Nebraska and northeastern Kansas. Near northern Kansas and southern Nebraska, more spurious moisture (~0.5 g·kg −1 ) was analyzed in CONV_H500_V1.1 compared to REF1. Six hours later ( Figure  6c,d), REF1 maintained all the observed storms (Figure 6f), while CONV_H500_V1.1 missed the storms over southern Minnesota and eastern North Dakota. These sustained storms in REF1 could be traced back to the well-analyzed storms over northern South Dakota and southern North Dakota at the analysis time.  These diagnostics for the 10 July 2016 case indicate that applying a uniform localization radius on the CONUS domain is inappropriate. For example, a better analysis of moisture in northern South Dakota requires a tighter localization radius, i.e., REF1. This means that a tighter localization radius is required near the boundary between different airmasses with large gradients, such as dry lines for moisture, and cold and warm fronts for temperature. However, a wider radius is needed for properly analyzing storms in northern Minnesota as in CONV_H500_V1.1. The adaptive localization functions [73,74]  These diagnostics for the 10 July 2016 case indicate that applying a uniform localization radius on the CONUS domain is inappropriate. For example, a better analysis of moisture in northern South Dakota requires a tighter localization radius, i.e., REF1. This means that a tighter localization radius is required near the boundary between different airmasses with large gradients, such as dry lines for moisture, and cold and warm fronts for temperature. However, a wider radius is needed for properly analyzing storms in northern Minnesota as in CONV_H500_V1.1. The adaptive localization functions [73,74] or the scale-dependent localization method [75,76] may be more appropriate to address these multiscale DA problems in the future study.

Impact of Localization Radii for Radar DA
The impact of different localization radii in radar DA on subsequent convection forecasts was explored by comparing REF1 with RADAR_H15_V1. The spectra of the temperature increments at 850 hPa from REF1 and RA-DAR_H15_V1.1 are plotted in Figure 9 to study the impact of using different localization radii on the analysis increments at different scales. The temperature increments were obtained using the same first guess and observations but different localization radii for the The spectra of the temperature increments at 850 hPa from REF1 and RADAR_H15_V1.1 are plotted in Figure 9 to study the impact of using different localization radii on the analysis increments at different scales. The temperature increments were obtained using the same first guess and observations but different localization radii for the first radar DA cycle valid at 3:00 a.m. UTC 10 July 2016. The energy of all scales in REF1 was consistently lower than that in RADAR_H15_V1.1, especially for the scales below 10 km. This result implies that increments in RADAR_H15_V1.1 were featured by larger contributions of all scales compared to REF1. These reduced contributions of all scales in REF1 are speculated to have been caused by the dampened correlation at small scales and then the weakened cross-correlations between small and large scales when applying tighter localization radii for radar DA, consistent with [76,77]. For the wind increments, similar tendencies for the spectral differences were displayed in the two experiments (not shown).   Figure 10c shows that RADAR_H15_V1.1 tended to capture all convections across South Dakota, North Dakota, and Minnesota with higher probabilities than REF1, especially for the isolated storm cells at the border between South Dakota and North Dakota. Their differences also existed in the later lead times. For example, 6 h later at 10:00 a.m. UTC, RADAR_H15_V1.1 still had more than 0.15 higher probabilities for the convections over northeastern South Dakota and the storm leading-edge areas in southern Minnesota than REF1, while REF1 had less than 0.1 higher probabilities in the stratiform area (Figure 10d).   Figure 10c shows that RADAR_H15_V1.1 tended to capture all convections across South Dakota, North Dakota, and Minnesota with higher probabilities than REF1, especially for the isolated storm cells at the border between South Dakota and North Dakota. Their differences also existed in the later lead times. For example, 6 h later at 10:00 a.m. UTC, RADAR_H15_V1.1 still had more than 0.15 higher probabilities for the convections over northeastern South Dakota and the storm leading-edge areas in southern Minnesota than REF1, while REF1 had less than 0.1 higher probabilities in the stratiform area (Figure 10d). Atmosphere 2021, 12, x FOR PEER REVIEW 16 of 24  To understand why wider localization radii in radar DA better favor analyzing and predicting convections, the vertical cross-sections of analysis increments used in Figure 9 for dewpoint temperature and temperature are shown in Figure 11. As the same first guess fields were used in both experiments, differences in the increments were caused by the  To understand why wider localization radii in radar DA better favor analyzing and predicting convections, the vertical cross-sections of analysis increments used in Figure 9 for dewpoint temperature and temperature are shown in Figure 11. As the same first guess fields were used in both experiments, differences in the increments were caused by the To understand why wider localization radii in radar DA better favor analyzing and predicting convections, the vertical cross-sections of analysis increments used in Figure 9 for dewpoint temperature and temperature are shown in Figure 11. As the same first guess fields were used in both experiments, differences in the increments were caused by the radar DA using different localization radii. It was found that their differences could be mostly attributed to the use of different vertical localization radii, as their disparity in horizontal localization radii was very small (12 km vs. 15 km). RADAR_H15_V1.1 had colder and drier air below 850 hPa and moister air aloft near 45 • and 45.5 • N, which coincided with the enhanced storms, compared with REF1. Given that the vertical distribution of the observed reflectivity started from 0.5 km as introduced in Section 3.2 and the data capable of effectively sampling complete storms were primarily located above 1.5 km (storms below the lowest elevation angle are unable to be detected by the radar echo), wider vertical localization radii could be more feasible to spread the observed information from the upper level into the lower level, especially in the unobserved areas. Therefore, the convections could be more consistently adjusted in the vertical dimension. In addition, the analyzed stronger cold pool indicated by the colder air below 850 hPa in RADAR_H15_V1.1 tended to enhance the storm propagations in the subsequent forecast [78].

Impact of Inflating Ensemble Spread through RTPS
With the reduced ensemble inflation, the FSSs of RTPS_065 were slightly smaller than those of REF2 at the earlier forecast hours for composite reflectivity and 1 h precipitation at all thresholds ( Figure 12). In the subsequent lead times (e.g., forecast hours of 6-12), larger FSSs were found in RTPS_065 than in REF2 at higher thresholds (30 and 40 dBZ for composite reflectivity and 2.54 mm for precipitation). However, only a few hours showed statistical significance. At most forecast hours and thresholds, the performance of both experiments was similar and mixed. These results imply that these predicted reflectivity and precipitation results from both experiments were insensitive to the RTPS magnitude.
Atmosphere 2021, 12, x FOR PEER REVIEW 17 of 24 radar DA using different localization radii. It was found that their differences could be mostly attributed to the use of different vertical localization radii, as their disparity in horizontal localization radii was very small (12 km vs. 15 km). RADAR_H15_V1.1 had colder and drier air below 850 hPa and moister air aloft near 45° and 45.5° N, which coincided with the enhanced storms, compared with REF1. Given that the vertical distribution of the observed reflectivity started from 0.5 km as introduced in Section 3.2 and the data capable of effectively sampling complete storms were primarily located above 1.5 km (storms below the lowest elevation angle are unable to be detected by the radar echo), wider vertical localization radii could be more feasible to spread the observed information from the upper level into the lower level, especially in the unobserved areas. Therefore, the convections could be more consistently adjusted in the vertical dimension. In addition, the analyzed stronger cold pool indicated by the colder air below 850 hPa in RA-DAR_H15_V1.1 tended to enhance the storm propagations in the subsequent forecast [78].

Impact of Inflating Ensemble Spread through RTPS
With the reduced ensemble inflation, the FSSs of RTPS_065 were slightly smaller than those of REF2 at the earlier forecast hours for composite reflectivity and 1 h precipitation at all thresholds ( Figure 12). In the subsequent lead times (e.g., forecast hours of 6-12), larger FSSs were found in RTPS_065 than in REF2 at higher thresholds (30 and 40 dBZ for composite reflectivity and 2.54 mm for precipitation). However, only a few hours showed statistical significance. At most forecast hours and thresholds, the performance of both experiments was similar and mixed. These results imply that these predicted reflectivity and precipitation results from both experiments were insensitive to the RTPS magnitude. The time series of ensemble mean RMSE verified against RAP analyses and the ensemble spread for the selected upper-air variables at forecast hours of 0, 6, and 12 are shown in Figure 13. Slight error differences were found for all variables at different levels at the analysis time (Figure 13a,d,g,j). For example, REF2 had slightly smaller errors in temperature (T) above 500 hPa than RTPS_065; the errors in dewpoint temperature (DPT) and wind for REF2 were similar to those for RTPS_065 below 700 hPa and at 500 hPa, respectively. During the later forecast hours, both experiments also showed similar errors for all variables, except that REF2 had smaller errors for T in 700 hPa and above at the forecast hour of 6 ( Figure 13b). In terms of spread, REF2 improved spread at all levels and variables for all forecast hours over RTPS_065. For example, the spread of REF2 was 0.05-0.1 • C for T, 0.2-0.5 • C for DPT, and 0.2-0.3 m·s −1 for wind greater than that of RTPS_065 at the analysis time. The spread differences became smaller with the increase in lead times. At the forecast hour of 12, the spread differences between REF2 and RTPS_065 were less than 0.05 • C for T, less than 0.2 • C for DPT, and less than 0.2 m·s −1 for wind. However, both experiments still provided under-dispersive ensemble predictions as the spread was remarkably smaller than RMSE during the first 12 forecast hours. To further enhance the ensemble spread during DA, more approaches (e.g., additive noise inflation [79]) to address model errors may be additionally required.

Summary and Discussion
In this study, we first described the extensions for the GSI-based hybrid EnVar system within the NAMRR system for better initializing the convection-allowing forecasts through leveraging the efforts from previous studies. The major extensions include two

Summary and Discussion
In this study, we first described the extensions for the GSI-based hybrid EnVar system within the NAMRR system for better initializing the convection-allowing forecasts through leveraging the efforts from previous studies. The major extensions include two aspects. One is that cycled NMMB's own EnKF ensemble is used to produce the ensemble-based, flow-dependent estimate of background error covariances (BECs) that will be fed into the EnVar. The other is to allow direct assimilation of multiscale observations, including conventional in situ observations and radar reflectivity.
The primary goal of this study was to examine three critical aspects of DA configuration that could strongly affect the effects of the ensemble-based BECs. Ten high-impact convections during the warm season of 2015-2016 with varied features were selected. Objective verifications were performed using the FSSs on the forecasts of composite reflectivity and 1 h accumulated precipitation. The subjective diagnostics of composite reflectivity analyses and predictions provide an additional physical understanding of the impact of different localization radii for conventional and radar DA, respectively. Additional verifications were performed on the ensemble mean upper-level fields and surface variables against hourly RAP and 2.5 km RTMA analyses. The main results of this study are summarized below.
(1) The first baseline experiment named REF1 with localization radii of 300 km in the horizontal and 0.55-scaled height (the nature log of pressure) in the vertical dimensions for the conventional DA generally produced more skillful short-term (up to 12-16 forecast hours) forecasts than CONV_H500V1.1 with localization radii of 500 km in the horizontal and 1.1-scaled height in the vertical dimensions. Particularly, REF1 outperformed CONV_H500V1.1 more consistently for the weakly forced cases. In contrast, REF1 had a similar but slightly better performance than CONV_H500V1.1 in the first 16 forecast hours at lighter thresholds, and CONV_H500V1.1 became superior at the later lead times (forecast hours of 5-18) at higher thresholds for the strongly forced cases. Additional verifications suggest that the improved reflectivity and precipitation forecasts in REF1 over CONV_H500V1.1 were consistent with the more accurate prediction of the surface variables. Diagnostics suggest that the outperformance of REF1 over CONV_H500V1.1 could be attributed to the use of smaller localization radii in the boundary between different airmasses with large temperature and moisture gradients.
(2) Compared with REF1 using tighter localization radii for radar DA (12 km in the horizontal and 0.55-scaled height in the vertical dimensions), RADAR_H15_V1.1, with localization radii of 15 km in the horizontal and 1.1-scaled height in the vertical dimensions, had higher FSSs for all thresholds throughout most of the forecast period. Diagnostics for the spectra of analyses increments showed that RADAR_H15_V1.1 had higher energy at all scales than REF1, especially at scales below 10 km. Subjectively, RADAR_H15_V1.1 produced higher ensemble probabilities of convection forecasts than REF1. Diagnostics suggest that the improved forecasts by RADAR_H15_V1.1 were achieved by using wider localization radii to more effectively spread observations into unobserved areas and more consistently analyze convections in the vertical dimension, compared to REF1.
(3) Experiments using different RTPS magnitudes were further compared. The RTPS factor was 0.95 for REF2 (alias as RADAR_H15_V1.1) and 0.65 for RTPS_065. Slight differences in FSSs between REF2 and RTPS_065 were produced in the verification of composite reflectivity and 1 h precipitation forecasts. For upper-level fields, there was practically no difference in errors, but consistently higher ensemble spread in REF2 than RTPS_065 over time, while these spreads for all variables became smaller with the increase in lead times.
The first goal of this study was to extend the GSI-based EnVar DA system for the NMMB model. Similar extensions can also be easily migrated to other models, e.g., the nextgeneration National Oceanic and Atmospheric Administration (NOAA) Geophysical Fluid Dynamics Laboratory (GFDL) Finite Volume Cubed-Sphere Dynamical Core (FV3 [80,81]).
This study did not attempt to examine and optimize all aspects of the DA configuration. To determine an optimal set of parameters, additional experiments are required to test a wider range of parameters. Additionally, as suggested by the results in Section 4.1, a scale-dependent localization of ensemble covariances [75,76] is ideally preferred to estimate all scales by assimilating all available observations simultaneously. Furthermore, as the quality of ensemble-based BECs may be degraded by a zero or small variance of hydrometeor mixing ratios for radar DA, an additive noise inflation [22,23,36,49] and/or properly constructed convective-scale static BECs can efficiently address the under-dispersive ensemble spread in storm scales [82]. These approaches remain to be tested in the CONUS EnVar settings in the future.
Funding: This study was supported by NA15OAR4590193, NA16OAR4590236, and NA19OAR4590231.
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: All model data produced during this study are archived locally and are available upon request to the corresponding author.