Combining vLAPS and Nudging Data Assimilation

: The combination of techniques that incorporate observational data may improve numerical weather prediction forecasts; thus, in this study, the methodology and potential value of one such combination were investigated. A series of experiments on a single case day was used to explore a 3DVAR-based technique (the variational version of the Local Analysis and Prediction System; vLAPS) in combination with Newtonian relaxation (observation and analysis nudging) for simulating moist convection in the Advanced Research version of the Weather Research and Forecasting model. Experiments were carried out with various combinations of vLAPS and nudging for a series of forecast start times. A limited subjective analysis of reﬂectivity suggested all experiments generally performed similarly in reproducing the overall convective structures. Objective veriﬁcation indicated that applying vLAPS analyses without nudging performs best during the 0–2 h forecast in terms of placement of moist convection but worst in the 3–5 h forecast and quickly develops the most substantial overforecast bias. The analyses used for analysis nudging were at much ﬁner temporal and spatial scales than usually used in pre-forecast analysis nudging, and the results suggest that further research is needed on how to best apply analysis nudging of analyses at these scales.


Introduction
A variety of techniques to incorporate observations have been used in numerical weather prediction models to allow observations to improve the model forecasts. One example is 3D variational analysis (3DVAR; e.g., [1,2]), and another is Newtonian relaxation (nudging; e.g., [3,4]).
The 3DVAR technique has the potential to determine an optimal analysis by combining observations with a background field. However, since an optimal analysis requires that the background error covariance and the observation error covariance be perfectly known, in practice, the analysis will not be optimal. Background error covariance may be estimated by using model forecasts from a series of past times to find a climatological value, from multiple model forecasts of the current time (i.e., an ensemble) to find case-dependent values, or from a combination of these two methods (e.g., [5]). However, these techniques are difficult to apply when creating on-demand forecasts for an area where equivalently configured forecasts (e.g., horizontal resolution) have not been run in the past and there are insufficient computational resources to carry out an ensemble matching the horizontal grid spacing of the on-demand forecasts (although coarser-resolution global ensembles can be used to estimate the case-dependent background error covariance; [5]).
A 3DVAR analysis is commonly applied at a single time at the beginning of a model simulation and can lack physical consistency. Applying the analysis at a single time could result in noise as the model adjusts to a solution consistent with its equations. Additionally, model fields not part of the 3DVAR analyses may not be consistent with the fields in the 3DVAR analyses and may pull the model solution away from the 3DVAR analyses. These potential issues might be mitigated by applying the increments indicated by a 3DVAR analysis over multiple model time steps using incremental analysis updating (e.g., [6,7]) or by nudging towards the 3DVAR analysis. When applying a single analysis, observations made at times other than the analysis time cannot easily be applied at their valid time. While there are methods to mitigate this issue (e.g., first guess at appropriate time, [8]), it is a challenge to know the actual state of the model at the valid time of the observation and to apply the observation at its valid time.
Newtonian relaxation [9,10], also known as nudging, adds nonphysical terms to the tendency terms of the model over a period of time to gradually nudge the model toward analyses or observations. Since the nonphysical terms added by nudging should be smaller than the physical terms in the governing equations, nudging can adjust the model's state variables while maintaining an improved physical consistency compared to inserting modified fields directly at a single time. Nudging allows observations to be applied over a time period centered on their individual valid times, and assimilating analyses valid at multiple times allows observations to be applied to the analysis closest to the valid time of the observation. Fields that are nudged towards must be prognostic variables of the numerical weather prediction model, and thus observations of non-prognostic fields must be converted to prognostic fields in order to be used with nudging. Additionally, nudging is not usually applied in a way that fully accounts for case-specific background error covariances.
Several techniques have combined nudging with another technique. Shaw et al. [11] initialized WRF with a 3DVAR analysis and then used observation nudging and found that the combination appeared to perform better than either solution individually. Liu et al. [12] nudged towards 3DVAR analyses of radar-derived fields while also applying observation nudging. Lei et al. [13][14][15] combined nudging and the ensemble Kalman filter (EnKF) and applied it first in simplified models and then in the Advanced Research version of the Weather Research and Forecasting model (WRF-ARW; [16]). Information from the ensemble was used to determine how the influence of the observations was spread and how observations of one variable affect another variable. They found the hybrid technique performed better than EnKF for wind direction and better than observation nudging in temperature and relative humidity. Lei et al. [15] also found that noise levels in the hybrid technique were much lower than the EnKF simulation. Liu et al. [17] reported on a hybrid nudging-ensemble system that uses an ensemble to determine the strength at which observation nudging is applied. Lei and Hacker [18] used a Lorenz model to test observation nudging, EnKF, and the nudging-EnKF hybrid developed by Lei et al. [13][14][15]. They found that in the Lorenz model, the nudging-EnKF hybrid they tested could not simultaneously outperform both nudging and EnKF.
This study demonstrates the combination of a scheme involving 3DVAR (the variational version of the Local Analysis and Prediction System; vLAPS; [19]) and nudging for a single case day with strong convection. This combination takes advantage of vLAPS analysis with the best fit of the observations and nudging with improvement in the physical consistency among the analysis states. Verification of the WRF reflectivity forecasts suggests that further research is needed to improve analysis nudging of high-temporospatial resolution analyses. The experiments described here were previously described in a report [20]; however, that report does not include the objective verification included here. Nonetheless, since this study reports on a subset of the experiments in Reen et al. [20], there are similarities between the text of that document and this study. This study is unique in its use of analysis nudging to incorporate vLAPS analyses, and there appears to be little past work on assimilating any 3DVAR analyses using analysis nudging. Liu et al. [12] assimilated 3DVAR analyses using analysis nudging but used solely radar observations as the basis for the 3DVAR analyses compared to the much broader set of observations included in the 3DVAR analyses used in our study. While Shaw et al. [11] used 3DVAR and observation nudging, they did not use analysis nudging to apply the 3DVAR analyses, as is done in this study. Analysis nudging is usually used for model forecasts with a horizontal grid spacing much coarser than the 1 km spacing used here (e.g., [21,22]) since analyses are not usually available to nudge towards at such a high resolution. Huo et al. [23] applied what appears to be analysis nudging to assimilate radarderived fields at a fairly high resolution (3 km grid spacing), but one that is somewhat coarser than the 1 km grid spacing in this study. As the current study is limited to a single case day and evaluation of the radar reflectivity field, it is intended as a limited demonstration of the combination of a 3DVAR-based technique (vLAPS) and nudging. One motivating factor in this investigation was to explore assimilation techniques that may be able to provide value for forward-deployed on-demand nowcasting using limited computational capability.

WRF-ARW
The Advanced Research version of the Weather Research and Forecasting model (WRF-ARW) V3.6.1 [16] was used with 56 vertical layers over a 1 km-spaced 801 × 801 horizontal grid centered over Oklahoma ( Figure 1). The Mellor-Yamada-Janjić scheme (MYJ; [24]) was used to parameterize the atmospheric boundary layer (ABL) with the background turbulent kinetic energy and atmospheric boundary layer depth calculation altered as in Lee et al. [25] and Reen et al. [26]. The Thompson microphysics parameterization [27], the Rapid Radiative Transfer Model longwave scheme [28], the Dudhia shortwave scheme [29], and the Noah land surface model [30] were used. Analyses were created every 15 min using vLAPS. The background field for the analyses was taken from the 3 km-horizontal grid spacing High-Resolution Rapid Refre (HRRR; [37]); specifically, we used the output from the HRRR 15 UTC cycle on 20 M

vLAPS
LAPS [31] uses a modified Barnes analysis [32] with some 1D variational components [33], while vLAPS changes to a multiscale 3DVAR scheme for specific fields (temperature, pressure, winds, and humidity) [19]. The vLAPS 3DVAR scheme is based on the Space and Time Multiscale Analysis System [34] and is multiscale spatially and temporally. The analysis was performed here four times with the coarsest analysis at 16 km horizontally and 50 hPa vertically, and subsequent analyses divided the resolution in half through to the final analysis, which was 2 km horizontally and 25 hPa vertically. The first two analyses used a 30 min time window, whereas the final two analyses used a 15 min time window. The multiscale technique allows observations to spread broadly in data-sparse areas while retaining fine-scale features in more data-rich areas and makes the technique less dependent on the accuracy of the background error covariance. For wind, the control variables were the u-and v-wind components [35] since Xie et al. [34,36] demonstrated that the alternative of the stream function and velocity potential introduce numerical errors and noise. The cloud analysis [33] combined Geostationary Operational Environmental Satellite (GOES) infrared and visible data, radar data, surface observations, and model first-guess fields to construct a cloud fraction. The cloud fraction was used to determine hydrometeor fields and vertical velocity, with the latter affecting the 3D wind. Additionally, the cloud analysis was used as a constraint in determining the vLAPS relative humidity field.
Analyses were created every 15 min using vLAPS. The background field for these analyses was taken from the 3 km-horizontal grid spacing High-Resolution Rapid Refresh (HRRR; [37]); specifically, we used the output from the HRRR 15 UTC cycle on 20 May 2013 (i.e., the integration of HRRR that starts at 15 UTC and provides forecasts at a 15 min temporal spacing). Only the 15 UTC cycle of HRRR was used here to approximate conditions in the application driving this research, namely, use in regions where cycles of the numerical weather prediction model used for initial conditions are available much less frequently than hourly. Each vLAPS analysis used the forecast hour of the 15 UTC HRRR cycle corresponding to the vLAPS analysis time (e.g., the 18 UTC vLAPS analysis used the 3 h forecast of the 15 UTC HRRR cycle). Sources of observations used to create the analyses included Meteorological Assimilation Data Ingest System (MADIS; https://madis.noaa.gov) surface, mesonet, profiler, radiosonde, and Aircraft Communications Addressing and Reporting System (ACARS) observations; pilot observations; WSR-88D radars; and GOES. In addition to the quality control applied by the MADIS data provider, MADIS data were also quality controlled by comparing them to the HRRR fields used as the background.

Nudging
Nudging [9,10,38] adds a term to the model's tendency equations based on the difference between the current model value and the value from an analysis (analysis nudging; e.g., [39]) or an observation (observation nudging; [40]). This difference is known as the innovation and is multiplied by weighting factors to create the nonphysical addition to the tendency equation. In this study, some experiments applied analysis nudging using vLAPS analysis, and some experiments applied observation nudging of MADIS observations. For brevity, the details of nudging are simplified in the following discussion (see Reen et al. [20] for additional details).
The vLAPS analyses were used by analysis nudging to modify the potential temperature, water vapor mixing ratio, and u-and v-wind components. Within the ABL (atmospheric boundary layer), only the surface analysis was used to calculate analysis nudging terms; at the first level above the ABL, both the surface and above-surface analyses contributed; and above this, only the above-surface analyses contributed. The innovation calculated at the surface (by comparing the vLAPS surface analysis to the WRF surface values) was applied throughout the ABL for the potential temperature and wind. For water vapor, the innovation for each level in the ABL was calculated by comparing the surface analysis to the model value at that level. Innovations above the ABL were calculated by comparing the above-surface analysis to the model value at that level. WRF-ARW analysis nudging linearly interpolates between analyses, which, in this case, were vLAPS analyses available every 15 min. The assimilation period was 3 h, and thus vLAPS analyses from 13 different times were applied. The analysis valid at the end of the assimilation period was applied with a linearly decreasing weight over the 15 min following the end of the assimilation period. The released version of WRF-ARW V3.6.1 does not properly execute this rampdown period, and thus we modified the code to fix this (the fix was provided to the WRF-ARW maintainers and is included in WRF-ARW starting with V3.8). The strength at which analysis nudging was applied was 3 × 10 −4 s −1 . Analysis (and observation) nudging is designed to gradually modify the model solution while allowing the physical tendency terms to dominate the equations so that the physical consistency of the atmosphere is maintained; this should mitigate any potential issues with overfitting.
Observation nudging used MADIS surface, mesonet, profiler, radiosonde, and ACARS observations (ACARS and profiler observations were inadvertently omitted for part of the simulation). In addition to using the quality control carried out by the data provider, to account for the data quality issues that may be more prevalent in mesonet datasets (e.g., siting issues), mesonet observations were filtered using use/reject lists designed for the Real-Time Mesoscale Analysis [41]. The Obsgrid program [42] was also used to apply quality control. It compared the observations against the 15 UTC cycle of the 3 km HRRR model on 20 May 2013 (i.e., the HRRR forecast that creates forecasts with valid times every 15 min starting at 15 UTC) and against nearby observations. A non-standard version of the WRF Preprocessing System (WPS) program Ungrib was used to vertically interpolate the HRRR field to additional levels to facilitate quality control of single-level above-surface observations (ACARS), and to allow more vertical structure to be retained in multilevel observations (the capability is available in standard Ungrib starting with WPS V3.9).
As with analysis nudging, one must specify the strength of the observation nudging (6 × 10 −4 s −1 was used here), but additional specification is needed in regard to the vertical, horizontal, and temporal weighting since, unlike the analysis, one does not have one value per model grid cell to nudge towards. Vertical weighting depends on observation type, with the innovation from surface observations being spread throughout the ABL during convective conditions, innovations from single-level above-surface observations being applied in a 100 hPa range, and innovations from multilevel observations being interpolated to model layers within the vertical range of the observation. A 30 km radius of influence was used for surface observations, but terrain differences limited the spreading further. For above-surface observations, the radius of influence increased from 60 km near the surface to 120 km at 500 hPa. Surface observations were used for a 2 h time period centered on the observation valid time, and above-surface observations were used for 3 h time periods. At the end of the assimilation period, observation nudging was ramped down to zero in 1 h (no observations valid during the rampdown were assimilated). The modification of water vapor observation nudging to prevent excessive drying described in [43] was applied. When observation nudging spreads the innovation from a location where the model is too moist to a nearby location that is much drier, this can result in unrealistic drying of the location the innovation is spread to. To mitigate this issue, the modification limits the magnitude of negative water vapor innovations being applied in locations drier than the location where the innovation is calculated.

Case Description
The day investigated here (20 May 2013) had strong convection in the southern Great Plains region. The base reflectivity (composited over relevant individual radars) of the observed radar ( Figure 2) indicated echoes <35 dBZ at 1200, 1500, and 1800 UTC, with most echoes generally along an approximately southwest-northeast-oriented line in the northwest of the domain at 1200 and 1500 UTC, but moving to the north central portion of the domain by 1800 UTC. However, after 1800 UTC, a strong line of southwest-to-northeastoriented convection developed ahead of the previous echoes and moved eastward. Multiple tornadoes were observed this day, with one that was determined to be an EF5 on the ground approximately from 1956 to 2035 UTC starting in Newcastle, Oklahoma, and traveling through Moore, Oklahoma [44][45][46]. Severe hail was also observed in multiple locations.
adding nudging. VLAPS0 represents the normal way in which vLAPS (and other 3DVAR analyses) is used, while VLAPS3 allows the effect of a 3 h pre-forecast without additional nudging assimilation to be ascertained. Both experiments used the vLAPS analysis as the initial conditions for the model integration. The other three experiments used vLAPS for the initial conditions, and during a 3 h pre-forecast, they analysis nudged toward the thirteen vLAPS analyses valid during this period (VLAPS3A), observation nudged toward observations (VLAPS3O), or both (VLAPS3AO). These experiments explore the value of applying a series of vLAPS 3D analyses rather than just one, and whether observation nudging might add value even when vLAPS analyses are used. For the purpose of comparing the experiments in the objective verification section, we broke them up into groups based upon the general method of data assimilation applied. These groups are referred to as hot start (HS; VLAPS0), analysis nudge (AN; VLAPS3A and VLAPS3AO), observation nudge (ON; VLAPS3O), and warm start (WS; HRRR0, HRRR3, and VLAPS3). The experiment that used both analysis and observation Various modeling studies have been performed on this case. Hanley et al. [47] used the United Kingdom Met Office's Unified Model with 4.4/2.2/0.5/0.2/0.1 km horizontal grid spacings without data assimilation and found convection initiating in the Oklahoma City area at approximately the right time, but the tornado-like vortices in their finest grids occurred approximately 2.5 h later than the Moore tornado. Zhang et al. [48,49] used WRF-ARW to look at predictability in this case. Zhang et al. [48], using 27/9/3/1 km horizontal grid spacings, found that temporal shifting of initial conditions generally temporally shifts convection but, in some cases, does not because lateral boundary conditions control convective initiation. Zhang et al. [49] used a 1 km ensemble with perturbations smaller than the current observational network could resolve and found that the ensemble members all produce a line of storms, but that details of individual storms differed. Snook et al. [50] used 500 m-horizontal grid spacing Advanced Regional Prediction System (ARPS) Atmosphere 2022, 13, 127 7 of 26 simulations with assimilation of radar and surface observations every 5 min using EnKF and found skill in predicting hail.

Experiment Design
The experiments in Table 1 and Figure 3 were used to investigate the potential value of combining vLAPS (2.2) and nudging (2.3) in WRF-ARW (2.1) for the case day of 20 May 2013 (2.4). The names of the experiments begin with the source of the initial conditions (HRRR or vLAPS), followed by the length of the pre-forecast in hours (0 or 3), and finally an O is added if observation nudging was applied, and an A is added if analysis nudging was applied. The pre-forecast refers to the period at the beginning of the model integration during which the model is assumed to be coming into dynamic balance, and during which observations may be assimilated. Hydrometeors are included in the initial conditions provided by both HRRR and vLAPS. Boundary conditions are based on the output from the 15 UTC HRRR cycle. Assimilation of observations or analyses valid during the pre-forecast time may extend into the beginning of the free forecast, but observations or analyses valid during the free forecast should not be assimilated. However, the time window over which observations are included in vLAPS analyses extends 15 min after the valid time of the analysis, and thus observations during the first 15 min of the free forecast are included via the vLAPS analyses. Table 1. Experimental design. Groups are HS = hot start, WS = warm start, ON = observation nudging, and AN = analysis nudging (VLAPS3AO contains both analysis and observation nudging but is assigned to group AN).

Analysis
Obs Group The experiments initialized with HRRR0 and HRRR3 served as the no-assimilation control experiments for the 0 h pre-forecast and 3 h pre-forecast experiments, respectively. HRRR0 and HRRR3 differ from one another in that, for a given cycle, the HRRR3 experiment starts integrating 3 h earlier to allow WRF to spin up. For example, the 20 UTC cycle of HRRR0 (referred to as HRRR0 20 ) starts integrating at 20 UTC using the 5 h forecast from the 15 UTC HRRR cycle, whereas HRRR3 20 starts integrating at 17 UTC using the 2 h forecast from the 15 UTC HRRR cycle, but the output between 17 and 20 UTC is not considered part of the forecast since it is during the spinup time. The experiments initialized with vLAPS that applied no additional nudging were used to determine the potential value of adding nudging. VLAPS0 represents the normal way in which vLAPS (and other 3DVAR analyses) is used, while VLAPS3 allows the effect of a 3 h pre-forecast without additional nudging assimilation to be ascertained. Both experiments used the vLAPS analysis as the initial conditions for the model integration. The other three experiments used vLAPS for the initial conditions, and during a 3 h pre-forecast, they analysis nudged toward the thirteen vLAPS analyses valid during this period (VLAPS3A), observation nudged toward observations (VLAPS3O), or both (VLAPS3AO). These experiments explore the value of applying a series of vLAPS 3D analyses rather than just one, and whether observation nudging might add value even when vLAPS analyses are used. All experiments used the 1500 UTC cycle of HRRR as the starting point to determ the initial conditions and boundary conditions, with the vLAPS analyses using this as the first-guess field. This simplifies the experimental design and is more represent of the limited frequency of updated model data available to drive finer-scale model ulations in battlefield conditions.
In order to more concretely illustrate the experimental configuration, Figure 5 sh how one specific experiment (VLAPS3AO) was configured for one specific cycle ( UTC), i.e., VLAPS3AO21. The lower portion of Figure 5 illustrates how analysis nud was applied during the 3 h pre-forecast (1800-2100 UTC for this cycle) using 15 vLAPS analyses by showing the analysis nudging details for the first hour of this pe (i.e., 1800-1900 UTC). Each analysis was applied to the model over 30 min centered o valid time of the analysis. The weight at which it was applied linearly increased in th min prior to the valid time of the analysis and linearly decreased in the 15 min follow the valid time. Observation nudging applied observations over a time window cent on each individual observation; as described in Section 2.3, a 2 h time window was for surface observations, and a 3 h time window was used for above-surface observat For the purpose of comparing the experiments in the objective verification section, we broke them up into groups based upon the general method of data assimilation applied. These groups are referred to as hot start (HS; VLAPS0), analysis nudge (AN; VLAPS3A and VLAPS3AO), observation nudge (ON; VLAPS3O), and warm start (WS; HRRR0, HRRR3, and VLAPS3). The experiment that used both analysis and observation nudging was placed within the AN group, and the WS group included all experiments that "spun up" from either a vLAPS-or HRRR-generated background state without further data assimilation application. The motivation for including the non-vLAPS experiments in the WS group was that the HRRR 15 UTC forecast cycle fields provide sufficiently "spun up" mesoscale information (including hydrometeor fields), due to the model's own high-quality data assimilation methodology and 3 km native grid spacing, meaning it is essentially a warm start. In summary (Table 1), HS included VLAPS0, AN included VLAPS3A and VLAPS3AO, ON included VLAPS3O, and WS included HRRR0, HRRR3, and VLAPS3.
For each experiment, independent cycles were carried out hourly with the integration period of all cycles ending at 0000 UTC on 21 May 2013 ( Figure 4). The 0 h forecast time is the end of the pre-forecast and the beginning of the forecast (t 0 in Figures 3 and 4), which was 1800 UTC for the first cycle and 2300 UTC for the final cycle (i.e., cycles were run with the following t 0 values: 1800, 1900, 2000, 2100, 2200, and 2300 UTC). Each cycle is referred to by its 0 h forecast time (t 0 ). Each WRF cycle includes the integration of WRF through the 3 h pre-forecast period up to t 0 (except for VLAPS0 and HRRR0, which do not have a pre-forecast period) and the integration of WRF from t 0 until the end of the forecast at 0000 UTC. For example, the 18 UTC cycle of VLAPS3AO (referred to as VLAPS3AO 18 ) starts integration at 1500 UTC, applies analysis and observation nudging data assimilation during the integration in the 3 h pre-forecast period ending at 1800 UTC (t 0 ), and then continues integrating until the end of the forecast period at 0000 UTC. Each cycle is independent in that it is not affected by any of the other cycles.
Atmosphere 2022, 12, x FOR PEER REVIEW 10 Figure 4. Illustration of the six WRF cycles used for each experiment and which are referred the 0 h forecast time (t0). As shown in Figure 3, the 3 h pre-forecast period wherein any analy observation nudging data assimilation was applied is omitted in some experiments (HRRR0 VLAPS0). The hourly forecast lead times for each cycle are labeled in red; the output was crea a 15 min temporal spacing. . As shown in Figure 3, the 3 h pre-forecast period wherein any analysis or observation nudging data assimilation was applied is omitted in some experiments (HRRR0 and VLAPS0). The hourly forecast lead times for each cycle are labeled in red; the output was created at a 15 min temporal spacing.
All experiments used the 1500 UTC cycle of HRRR as the starting point to determine the initial conditions and boundary conditions, with the vLAPS analyses using this cycle as the first-guess field. This simplifies the experimental design and is more representa-tive of the limited frequency of updated model data available to drive finer-scale model simulations in battlefield conditions.
In order to more concretely illustrate the experimental configuration, Figure 5 shows how one specific experiment (VLAPS3AO) was configured for one specific cycle (2100 UTC), i.e., VLAPS3AO 21 . The lower portion of Figure 5 illustrates how analysis nudging was applied during the 3 h pre-forecast (1800-2100 UTC for this cycle) using 15 min vLAPS analyses by showing the analysis nudging details for the first hour of this period (i.e., 1800-1900 UTC). Each analysis was applied to the model over 30 min centered on the valid time of the analysis. The weight at which it was applied linearly increased in the 15 min prior to the valid time of the analysis and linearly decreased in the 15 min following the valid time. Observation nudging applied observations over a time window centered on each individual observation; as described in Section 2.3, a 2 h time window was used for surface observations, and a 3 h time window was used for above-surface observations.

Results
Here, the evaluation of the combination of a 3DVAR-based technique (vLAPS) and nudging is limited to reflectivity. To demonstrate the variation among the experiments, a single forecast time for a single cycle is examined subjectively. However, given the number of experiments, cycles, and forecast times, the objective evaluation that follows allows for a more holistic understanding of the results.

Subjective Evaluation
The observed composite base reflectivity is compared to the WRF-ARW lowest-level reflectivity for the 18 UTC cycle of each experiment in Figure 6 for 2015 UTC 20 May 2013.

Results
Here, the evaluation of the combination of a 3DVAR-based technique (vLAPS) and nudging is limited to reflectivity. To demonstrate the variation among the experiments, a single forecast time for a single cycle is examined subjectively. However, given the number of experiments, cycles, and forecast times, the objective evaluation that follows allows for a more holistic understanding of the results.

Subjective Evaluation
The observed composite base reflectivity is compared to the WRF-ARW lowest-level reflectivity for the 18 UTC cycle of each experiment in Figure 6 for 2015 UTC 20 May 2013. This is near the time of the peak of the EF5 tornado and was chosen as a time where strong convection is present. The observed reflectivity (Figure 6c) shows convection along a southwest-to-northeast-oriented line from Texas through the southeast corner of Kansas. There are weaker echoes (<30 dBZ) in a short line in central Kansas, and scattered echoes in the northwest corner of the domain. All of the experiments forecast the main line of convection and some echoes in the northwest corner of the domain, but there are notable differences among the experiments.    Figure 6a and HRRR3 18 in Figure 6d). The use of vLAPS as the initial condition slightly strengthens convection in this area (VLAPSO 18 in Figure 6b), and the addition of the 3 h pre-forecast without nudging (VLAPS3 18 in Figure 6e) brings the strength closer to the observations. However, it appears that the experiment analysis nudging to the vLAPS analyses and applying observation nudging (VLAPS3AO 18 in Figure 6h) best reproduces the continuous strength of the convection along this line in Kansas. The southern edge of the main line of convection does not extend far enough south in VLAPS3 18 (Figure 6e), but most of the other experiments appear to extend it too far south. All experiments show echoes in the general region of the observed EF5 tornado, but only three experiments (VLAPSO 18 in Figure 6b, VLAPS3 18 in Figure 6e, and VLAPS3O 18 in Figure 6g) show echoes at the location, with two of these experiments (VLAPSO 18 in Figure 6b and VLAPS3O 18 in Figure 6g) showing echoes ≥ 40 dBZ very close to the location.

The northern portion of the main line of convection is not well simulated by the experiments initialized with HRRR (HRRR0 18 in
The area of weak echoes in central Kansas is not well forecast by any of the experiments. The HRRR-initialized experiment without a pre-forecast (HRRR0 18 in Figure 6a) shows the feature stronger and smaller than observed and, unlike the observations, does not show an area with no echoes between it and the echoes in the northwestern portion of the domain. The vLAPS-initialized experiment without a pre-forecast (VLAPS0 18 in Figure 6b) shows a pair of lines of convection in this area that are both much stronger than observed. The 3 h pre-forecast (HRRR3 18 in Figure 6d) allows the HRRR initialization to reproduce more of the separation between the feature and echoes in the northwestern portion of the domain observed but shows the feature with much less coverage than observed. For experiments with the 3 h pre-forecast, using vLAPS as the initial conditions broadens the area covered by this feature compared to the HRRR initialization (HRRR3 18 ) and removes the second line of convection seen in the vLAPS experiment without the 3 h pre-forecast (VLAPSO 18 ). The experiments with the area covered by this feature that most closely match the observations appear to be the experiments analysis nudging the vLAPS analyses (VLAPS3A 18 in Figure 6f and VLAPS3AO 18 in Figure 6h). However, all of the experiments that produce the feature show it stronger than observed. The echoes in the northwest quadrant of the domain, in general, appear to be stronger than observed in the experiments, perhaps most so for experiments analysis nudging towards vLAPS analyses or observation nudging (VLAPS3A 18 in Figure 6f, VLAPS3O 18 in Figure 6g, and VLAPS3AO 18 in Figure 6h).
All of the experiments produce model base reflectivity fields that are at least generally consistent with the observed composite radar reflectivity, but there is notable variation among the experiments. While it is difficult to subjectively determine which experiment performs best at the time shown here for this cycle, the results suggest that the experiment analysis nudging to the vLAPS analyses and applying observation nudging performed reasonably well (VLAPS3AO 18 in Figure 6h).

Objective Evaluation
The previous subsection presented subjective support that the model simulations realistically reproduced the general convective outbreak event on the afternoon of 20 May, including aspects associated with the timing, spatial location, alignment, and convective mode. Reen et al. [20] showed that the model had some ability in capturing realistic gross structures seen in supercellular storms. However, the evolution of the convective outbreak and structural details of individual convective elements differed among the experiments; this variation offers a spread of potential solutions, consistent with what might be expected based on the findings in Zhang et al. [49]. The following section provides a more objective evaluation using several forecast performance measures to compare all of the different experiments aggregated across the hourly cycles during the afternoon of May 20.
Although the results are only valid for this single unique case study event, they may provide valuable clues as to how the various techniques can perform in a more general sense. We are also interested in whether our combined 3DVAR analysis/observation nudging strategy might be of value in supporting stand-alone operational nowcasting systems run on a more modest computational hardware platform.

Metrics Used in Objective Evaluation
To quantitatively measure the performances of each cycle across the full spectrum of our experiments, a trio of well-established forecast evaluation measures was used to examine the short-range convective forecasts by comparing observed to model forecast radar reflectivity fields. The fractions skill score, or FSS [51], is a spatial verification metric often used for assessing the performance of precipitation forecasts from numerical weather prediction (NWP) models (frequently for convective precipitation). The critical success index, or CSI [52], is another common metric used to evaluate categorical forecasts, taking into account hits, misses, and false alarms. Finally, the frequency bias score (FBIAS) [53,54] is also used to assess the quality of model-derived radar reflectivity field forecasts against real radar observations [19,53]. The FBIAS is simply the ratio between the number of model grid points where the model forecasts reflectivity above a certain threshold and the number of model grid points where the observed radar reflectivity values exceed the same threshold.
The FSS was computed for different neighborhood sizes aggregated from the native grid resolution. The aggregation was also conducted across all of the relevant cycles for each lead forecast time in each experiment (not all cycles are relevant for each forecast lead time since some cycles are too short to include some forecast lead times  Figure 4 shows the relationship between forecast lead time and valid time for each cycle). Since model forecast skill often varies based on the forecast lead time, aggregating statistics by forecast lead time reveals how the temporal evolution of error varies among methodologies. The CSI and FBIAS were not computed using neighborhoods but were both calculated on the highest available resolution grid (here, our native model grid spacing of 1 km) and, as with FSS, were aggregated across cycles for each lead forecast time in each experiment. The FBIAS of reflectivity measures how much the model either overforecasts (forecast is more than the actual) or underforecasts (forecast is less than the actual) coverage of reflectivity values exceeding a threshold. No verification takes place during the pre-forecast (where data assimilation occurs), but verification is instead limited to the forecast period.
As noted previously (and shown in Figure 4), a model integration with a t 0 of each hour from 1800 UTC to 2300 UTC was launched (i.e., an hourly refresh) for each experiment and produced 15 min forecasts from t 0 out to 0000 UTC of 21 May. The latest time verified is 2300 UTC because radar data provided by vLAPS were not available after this time. The output was produced in 15 min intervals, which thus produced forecasts for verification with lead times ranging from 15 min to 5 h (resulting in statistics at 20 different forecast lead times). Due to the 0000 UTC end time of all cycles, but radar data not being available after 2300 UTC, only the 1800 UTC cycle produced a 5 h forecast lead time. Because of various pre-forecast requirements across different experiments, some experiments start their integration before their 0 h lead time (e.g., a model cycle with a base time of 1800 UTC with a 3 h pre-forecast requirement was actually initialized at 1500 UTC with lead times of forecasts based on the 1800 UTC mark). Since all simulations terminated at 0000 UTC, the model integrations which started at 2300 UTC had only a maximum 1 h lead forecast available. Because radar data were not available after 2300 UTC, all verification forecast times of the 2300 UTC cycle (other than the 0 h forecast time) are after the end of radar data availability, and thus the 2300 UTC cycle is not included in the objective verification statistics. WRF-ARW "restart" forecasts were not used to initialize from a previous cycle's forecast-all first-guess initial condition (and lateral boundary tendency) fields for all cycles leveraged the 1500 UTC HRRR cycle forecast fields available at hourly increments. A single HRRR cycle was used to more closely approximate the cycle frequency that is available outside of the United States for models such as the Global Forecast System that could be used as boundary conditions for fine-scale simulations.
Each experiment was aggregated across its simulations to create statistics, with each value representing that experiment's performance for the forecast lead time, reflectivity threshold, and (for FSS) neighborhood size being evaluated. Similar to other studies, the column maximum radar reflectivity from both the model and the National Weather Service WSR-88Ds (provided by the National Oceanic and Atmospheric Administration; NOAA) was used to evaluate the convective forecasts [55]. The ground truth column maximum radar reflectivity fields obtained from the NOAA are the same type as those used for previous studies involving the original LAPS product [33]. The model column maximum reflectivity was computed using the WRF-ARW diagnostic method [56] that uses prognostic hydrometeor fields from the model microphysics parameterization.
The FSS for two radar reflectivity thresholds (25 dBZ and 35 dBZ) using a 9 km neighborhood size is shown in Figure 7. The CSI and FBIAS metrics are shown in Figures 8  and 9. The neighborhood size of 9 km is here considered to be a reasonable estimate for the "effective resolution" of the WRF-ARW when run on a mesh with a native 1 km grid spacing [57]. The number of cycles included in the statistics varies by forecast lead time, with five cycles included at the 1 h forecast and one cycle included at the 5 h forecast.

Outcome of Objective Evaluation
For the purpose of comparing the experiments in this section, we utilized the groups described in Section 2.5 which are based upon the general method of data assimilation applied. The various FBIAS, FSS, and CSI curves throughout the lead forecast period of study (as seen in Figures 7-9) show similarity within these assimilation methodology groups, namely, hot start (HS), analysis nudge (AN), observation nudge (ON), and warm start (WS). Note that the time subscript is omitted when referring to the evaluation of the WRF forecasts in this section because each experiment is evaluated based on data that are combined across the cycles.
Although VLAPS0 is considered as a HS, it does differ from previous LAPS/vLAPS HS experiments [19,58] in that it did not directly insert the vertical velocity from the analysis into the WRF-ARW initial conditions. However, the hydrometeor fields produced in the vLAPS analyses are used in the appropriate initial WRF-ARW arrays. Additionally, since both reflectivity and radial wind information from the WSR-88D radars were incorporated into the vLAPS analyses, it is expected that the initial vertical velocity fields adjusted quickly through continuity considerations to the convective-scale information provided by the radar data. Any experiment that used a pre-forecast period and which started from an initial vLAPS analysis also filled the initial WRF-ARW arrays in the same fashion (Section 2 provides more details on the use of vLAPS analyses for model initialization). When the vLAPS analyses are used instead solely as a source for 15 min intermittent analysis nudging across a pre-forecast period by WRF-ARW, the analyses are only used to nudge the potential temperature, water vapor mixing ratio, and u-and v-wind components, as these are the variables for which analysis nudging is available in WRF-ARW.       Upon examining Figures 7-9, the first thing of immediate note is that for this case study event, the HS experiment stands out from the others, mostly within the initial lead forecast hour. The FSS and CSI scores for the HS experiment are clearly superior to all other groups during at least the first 30 min of lead forecast time (independent of whether the 25 dBZ or 35 dBZ reflectivity threshold value is used). This strong advantage for very Upon examining Figures 7-9, the first thing of immediate note is that for this case study event, the HS experiment stands out from the others, mostly within the initial lead forecast hour. The FSS and CSI scores for the HS experiment are clearly superior to all other groups during at least the first 30 min of lead forecast time (independent of whether the 25 dBZ or 35 dBZ reflectivity threshold value is used). This strong advantage for very short term nowcasting of vLAPS is most likely due to its use of a diabatic hot start methodology and has been noted previously [19,58]. The motivation with vLAPS has always been to compete with and exceed the skill of simpler nowcasting methods such as feature-based advection/extrapolation or basic persistence during the initial hour. This is a very challenging goal for convective-scale NWP modeling. In at least this study, the flip side to HS seems to be that the FSS and CSI scores quickly degrade back to those of the other methods by approximately a 2 h lead time and, by a 5 h lead time, even appear to result in somewhat worse scores. Perhaps this is some evidence of an overfitting to the stronger reflectivity echoes at time 0 h, and that some degree of imbalance could still remain between the full set of mass and momentum model fields produced by the hot start analysis itself.
The AN group of experiments exhibited the highest FSS and CSI scores during the first lead hour (outside of HS). The AN group shows that both FSS and CSI gradually declined to approximately 2 h and then remained mostly flat until approximately 4 h. During most of the period after approximately the 1 h lead forecast time, the AN experiments clump in with the other groups in terms of general FSS and CSI scores. An exception is the VLAPS3AO experiment in the AN group, which shows a gradual decrease in FSS and CSI scores after the 4 h lead forecast-all other experiments in all groups show a slight increase in FSS and CSI scores starting at approximately the 4.5 h lead forecast. This may be an artifact of the very limited number of cycles (and thus very small sample size) available to compute metrics at the 4 h and 5 h lead forecasts. It could also point to some need to fine-tune aspects of the combined analysis and observation nudging criteria used by VLAPS3AO, particularly when combining meso/synoptic-scale conventional observations via observation nudging with convective-scale observation data assimilated through analysis nudging to the 15 min updated vLAPS analyses (when the radar data are incorporated).
The ON group fell in the middle to lower half of the pack of all groups, both for the short and longer lead forecast times, in terms of FSS and CSI scores. The FSS and CSI scores for the VLAPS3O experiment generally remained fairly consistent at all lead forecast times.
The FSS and CSI scores for the WS group tended to track fairly closely to those of the ON group, although HRRR0 was a bit lower initially, while VLAPS3 beyond 2.5 h was a notable exception (higher FSS at further lead times). For the 25 dBZ threshold, the WS group (including HRRR0 and HRRR3) tended to outperform the ON group in FSS and CSI through much of the lead forecast period after approximately 3 h. VLAPS3 generally showed the highest FSS and CSI scores of all experiments from lead forecasts of 2.5 h and beyond, particularly for the 25 dBZ reflectivity threshold. Tables 2 and 3 show the FSS scores at lead forecast times of 0.25 h, 1.75 h, 3.50 h, and 5.00 h for each experiment, and for the 25 dBZ and 35 dBZ thresholds, respectively (for a 9 km neighborhood size). The tables more clearly show the data values, which can also be estimated from Figure 7.  Examining the FBIAS among the groups (Figure 9), we can determine whether the model was forecasting too much (overforecasting; FBIAS > 1.0) or too little (underforecasting; FBIAS < 1.0) areal coverage of reflectivity at or above a given reflectivity threshold. Early in the lead forecast period, nearly all the groups of experiments showed a significant tendency to overforecast the stronger convection (higher reflectivity) areas, such as at the higher 35 dBZ threshold. HRRR0 from the WS group produced the least biased FBIAS results for both thresholds early in the lead forecast time, but with increasing FBIAS through approximately the 3 h lead forecast. The VLAPS3 and HRRR3 experiments, also from the WS group, tended to also be less high biased during the early lead forecast period.  Tables 4 and 5 show the specific FBIAS scores at lead forecast times of 0.25 h, 1.75 h, 3.50 h, and 5.00 h for each experiment, and for the 25 dBZ and 35 dBZ thresholds, respectively. The overforecasting at the 25 dBZ and 35 dBZ thresholds is generally more pronounced for forecasts recently influenced by observations (via vLAPS analyses directly inserted at t 0 in VLAPS0, via vLAPS analyses being analysis nudged towards in VLAPS3A and VLAPS3AO, or via observation nudging in VLAPS3O). It may be that the methods used to incorporate observations into vLAPS analyses and through observation nudging make assumptions that do not work optimally for this specific case and the specific observations available; future work should explore the generality of this bias and seek to determine how the methodologies could be improved to lower the overforecast bias.
The variation in performance by neighborhood size and reflectivity threshold (as measured by FSS) for this case study can be seen more clearly in Table 6. In Table 6, the FSS is shown for a few different lead forecast times near the start and end of the forecast period (0.50 h and 5.00 h), and for different neighborhood sizes (1 km, 9 km, 17 km). The lower 10 dBZ threshold is focused upon in Table 6, although for the 9 km neighborhood (closest to the effective resolution of the WRF 1 km grid spacing output), additional thresholds of 25 dBZ and 35 dBZ are also shown. The tendency of FSS to improve with lower reflectivity Atmosphere 2022, 13, 127 20 of 26 thresholds and larger neighborhood sizes, as seen in most previous NWP convectionallowing precipitation studies, is repeated here.  Table 6. FSS as it varies across three different neighborhood sizes for the 10 dBZ threshold, and for two different lead forecast times (near the start and end of the forecast period). Note that additional thresholds of 25 dBZ and 35 dBZ are shown only for the 9 km neighborhood size (since 9 km is of interest due to it closely representing the effective resolution of the WRF output produced from the 1 km grid spacing nest).

Discussion
This study's main goal was to complete a preliminary investigation of how nudging and vLAPS 3DVAR compare, and how combining them could improve short-range nowcasting. This paper's focus was more upon the overall methods, and less upon the specific differences across the forecast results, because statistical evaluation of additional cases is needed to draw conclusive statements.
From a subjective standpoint, the experiments captured the main features/structures and general timing of the severe convective outbreak over Oklahoma on the afternoon of 20 May 2013 (some more realistically than others). This was a strongly forced convective event and was likely to be inherently more predictable [59][60][61][62]. The FSS and CSI metrics showed that the experiments performed similarly around the 2 h forecast, while differences among the experiments were somewhat larger at other forecast hours. VLAPS0 is an outlier with much higher scores over the initial lead forecast hour (but generally the lowest scores during the 3 to 5 h lead forecast period). The experiments that applied data assimilation across a short pre-forecast period, including the VLAPS3AO hybrid observation/analysis nudging approach introduced by the authors of this paper, appeared to improve FSS and CSI at the start of the nowcast cycles compared to not performing data assimilation during this pre-forecast period (i.e., VLAPS3AO, VLAPS3A, and VLAPS3O compared to VLAPS3). However, they could not match the FSS and CSI gained by inserting vLAPS analyses at the beginning of the simulation without a pre-forecast period (VLAPS0).
The VLAPS3 combination of starting the model integration from a 3D variational analysis (leveraging radar data), followed by a subsequent 3 h pre-forecast period, performed well relative to most of the other experiments for bias in the first portion of the forecast period (HRRR0 had better bias and HRRR3 was very similar to VLAPS3). It also performed better than all of the other experiments for CSI/FSS in the latter portion of the forecast period.
While analysis nudging toward high-resolution 3DVAR vLAPS analyses shows potential promise for improving short-term convection forecasts, the results suggest further work is needed to best leverage high-resolution analyses. The rapid increase in overforecast bias in VLAPS0 at the beginning of the forecast and CSI and FSS performing worse than any other experiment later in the forecast suggest that the VLAPS0 method of directly inserting the analysis at the beginning of the forecast may not be optimal. Analysis nudging towards a series of vLAPS analyses may be a way to improve the assimilation of vLAPS analyses. However, two factors suggest room for improvement in the methodology used to analysis nudge towards the vLAPS analyses (VLAPS3A, VLAPS3AO). One factor is that the analysis nudging experiments have a noticeably higher overforecast bias than an experiment started with the same initial conditions but no subsequent data assimilation (VLAPS3). The second factor is that during the first hour, directly inserting vLAPS analyses at the beginning of the forecast (VLAPS0) performs much better (in terms of CSI and FSS) than analysis nudging towards the vLAPS analyses. Analysis nudging is not normally used to assimilate analyses with the high temporal (15 min) and horizontal (1 km) spacing used here. Thus, informing the analysis nudging weights by previous research may have resulted in analysis nudging having been applied more weakly or more strongly than would best assimilate the analyses. Additionally, in order to fully assimilate the vLAPS analyses, nudging towards additional fields from the analyses may be needed to supplement those currently available for analysis nudging in WRF. Given the verification results, it is not clear that using observation nudging (VLAPS3A vs. VLAPS3AO) adds values in this case; this may be because the high-temporal and spatial resolution observations used in the vLAPS analysis limit the added value of assimilating individual observations at a time centered on their valid times.
Both HRRR0 and HRRR3 (also in the WS group) proved competitive in FSS and CSI scores after approximately the first 90 min of the lead forecast and out through the longer lead times (and for FBIAS throughout the forecast); this is consistent with both the high-resolution nature and the sophisticated data assimilation methodology built into the HRRR cycling model itself. Since the 15 UTC HRRR cycle forecasts were leveraged in every experiment across all cycle times, the oldest HRRR forecast used to provide a lateral boundary condition (at 00 UTC 21 May) had a 9 h lead time. Some recent studies [63][64][65] indicate that convection-resolving NWP models, using sophisticated data assimilation approaches (including radar data ingest), can show skill and predictability out to 6-12 h. This is especially true for strongly forced convection, which is the type of convection in this study. Weakly forced convection (such as the type common over the summer in the southeastern US) tends to be considerably less predictable at these same resolutions, due to less certainty in capturing the convective initiation mechanisms [66]. Two advantages our simulations had were that (1) a high-quality 3 km HRRR forecast produced at 15 UTC was available for creating lateral boundary tendencies (and either the initial conditions or the first guess for the analysis used as the initial conditions); and (2) multiple Doppler WSR-88D radars were available to the vLAPS analyses. If we consider a forward-deployed army unit running a similar modeling configuration on a laptop with restrictive communication bandwidth availability, conditions such as (1) and (2) above will likely not typically be present. However, this study shows that atmospheric data assimilation approaches not reliant on a full EnKF [67] or 4DVAR [68] approach, and not in need of high-performance computing cluster solutions, can still provide value-added short-range nowcast guidance to local human forecasters and automated forecast systems and weather decision support tools.
While this study used a single, large domain to simplify the experiment setup (made possible by the availability of the high-resolution HRRR output), in order to make this tractable for a forward-deployed system, nested grids with a much smaller innermost domain would be needed. In the case of a tactical NWP-based nowcasting system (a rapid update cycling model), the ability to collect and effectively assimilate special tactical weather observations (sensors on unmanned aerial systems, lidar, radar, etc.) along with non-traditional sources of weather observations (for example, the Air Force Global Synthetic Weather Radar product [69], which leverages satellite observations and other data sources) will be an important avenue to full success. Additionally, while this study used vLAPS analyses, the techniques investigated could be used with other high-resolution analyses.
In terms of convective forecasting, which was the focus of this paper, there is still ample work remaining towards improvement upon the bias and skill issues, especially at the higher reflectivity threshold levels. Future work should investigate additional cases to explore the generality of the results seen in this case, including exploring whether the high reflectivity bias seen in the current case is seen in other cases and determining the causes of this bias. Given that this was a strongly forced and large-scale convective outbreak, these issues are likely going to be even more challenging in weakly forced convective environments [66]. This case study describes methodologies that can be used to apply 3DVAR analyses in conjunction with nudging, demonstrates that applying these methodologies for vLAPS analyses for this case study does not show a clear overall improvement, and suggests areas for future research to explore how to improve the combination of 3DVAR analyses and nudging. In this case study, directly applying the vLAPS analyses at the 0 h forecast time without analysis nudging resulted in higher FSS and CSI than the other experiments throughout the 0-1 h forecast, but this was also the only experiment with bias increasing substantially during this time period. Similarly, Jiang et al. [19] found that in WRF forecasts initialized with vLAPS, the equitable threat scores were highest at the beginning of the forecasts At a national level, convection-permitting models are already performing with good skill and with a high degree of sophistication in data assimilation strategies and NWP physics [70]. Running limited-area mesoscale modeling configurations on hardware technology as restricted as a desktop or laptop, while maintaining a flexibility to adjust quickly for operations over diverse and often data-restricted locations around the world, will require making intelligent adjustments to the national approaches at convection-permitting scales. The 1 km (and even finer) grid spacing is important for resolving many boundary layer flow phenomena that impact both military and civilian interests near the earth's surface, and convective forecasting is just one aspect of those (although a potentially highly impactful one).