Impact of AERI Temperature and Moisture Retrievals on the Simulation of a Central Plains Severe Convective Weather Event

Abstract: In this study, bias-corrected temperature and moisture retrievals from the Atmospheric Emitted Radiance Interferometer (AERI) were assimilated using the Data Assimilation Research Testbed ensemble adjustment Kalman filter to assess their impact on Weather Research and Forecasting model analyses and forecasts of a severe convective weather (SCW) event that occurred on 18–19 May 2017. Relative to a control experiment that assimilated conventional observations only, the AERI assimilation experiment produced analyses that fit surface temperature and moisture observations more closely and that displayed a sharper depiction of surface boundaries (cold front, dry line) known to be important in the initiation and development of SCW. Forecasts initiated from the AERI analyses also exhibited improved performance compared to the control forecasts using several metrics, including neighborhood maximum ensemble probabilities (NMEP) and fractions skill scores (FSS) computed using simulated and observed radar reflectivity factor. Though model analyses were impacted in a broader area around the AERI network, forecast improvements were generally confined to the relatively small area of the computational domain located downwind of the small cluster of AERI observing sites. A larger network would increase the spatial coverage of “downwind areas” and provide increased sampling of the lower atmosphere during both active and quiescent periods. This would offer the potential for larger and more consistent improvements in model analyses and, in turn, improved short-range ensemble forecasts. Forecast improvements found during this and other recent studies provide motivation to develop a nationwide network of boundary layer profiling sensors.


Introduction
Severe thunderstorms and tornadoes (hereafter referred to as severe convective weather, SCW) are among the most spectacular of natural phenomena and have attracted the interest of observers for millennia. SCW also poses a significant risk to life and property. While advances have been made in the prediction of SCW over the past decade, due in large part to advances in numerical weather prediction (NWP) models and data assimilation (DA), significant challenges remain. Forecasters increasingly depend on guidance from NWP models, and continued improvements in the sophistication, accuracy and computational efficiency of these models will be necessary if forecasts are to continue to improve.
Progress toward these ends can be made in numerous ways, from increases in model resolution and more realistic physics [1], to the development of new DA algorithms [2], to the inclusion of new observation types in the data stream that drives the DA. More effective DA methods
In the present work, we focus on a network of four AERIs located in north-central Oklahoma as part of the extended facilities of the Atmospheric Radiation Measurement (ARM) [25] Southern Great Plains (SGP) facility. As the network is maintained in a quasi-operational state, it serves as an excellent prototype of a future operational ground-based profiling network, such as that identified as a high priority by the National Research Council [26]. To explore the potential value of a network of AERIs in operational NWP, and to build upon the work presented in Coniglio et al. [20], Hu et al. [22], and Degelia et al. [24], we examine the impact of AERI observations on the simulation of an SCW event that occurred over the central plains during 18–19 May 2017. The AERI instrument provides high-density temperature and moisture observations of the atmospheric boundary layer, and it is our hypothesis that these observations will lead to improved model depictions of surface temperature and moisture when assimilated with the WRF-DART system.
The remainder of the paper is organized as follows. Section 2 presents the methodology employed in the research (including descriptions of the NWP model, DA system, observations, and experiment design). Section 3 presents the results of the numerical experiments, and Section 4 summarizes the results and states the related conclusions.

Model and Data Assimilation System
Our forecast system employs the Weather Research and Forecasting (WRF-ARW) model V3.8.1 [10] during the forecasting step. The Data Assimilation Research Testbed (DART) Manhattan release [11] employing an ensemble adjustment Kalman filter [27,28] is used to assimilate the observations. For this study, the WRF and DART systems were installed and configured on the S4 supercomputer [29] located at the Space Science and Engineering Center at the University of Wisconsin-Madison.
Atmosphere 2020, 11, 729
As mentioned in the introduction, our modeling and DA system captures many of the essential features of the Warn-on-Forecast System (WoFS), the complete description of which is provided in Wheatley et al. [8] and Jones et al. [9]. Here, we provide a brief overview of the more important details as well as differences specific to this study. For this work, a 36-member ensemble is generated from the first 18 members of the Global Ensemble Forecast System (GEFS) using a physics diversity approach similar to that advocated by Stensrud et al. [30] and Fujita et al. [31]. The details are summarized in Table 1. Atmospheric initial conditions are provided by the GEFS analyses valid at 12:00 UTC, and boundary conditions are updated using GEFS 6-hourly forecasts valid at 18:00, 00:00, and 06:00 UTC. Soil moisture and soil temperature are initialized using NCEP's North American Mesoscale Forecast System (NAM) analysis, and their evolution is subsequently modeled using the Noah land-surface model [32]. The WRF configuration consists of a single nest using a 550 × 500 grid point domain with 3 km horizontal grid spacing and 56 vertical levels. The model top is set at 10 hPa. To accommodate the SCW event being studied (see Section 2.4), the model domain is located over the south-central United States (Figure 1). For our study, the half-width of the horizontal and vertical localization radii [33] for conventional observations (radiosonde observations, aircraft reports, surface observations) is set at 230 and 4 km, respectively, while those for the AERI observations are set at 200 and 4 km, respectively (values arrived at after a period of parameter tuning). The localization is determined by the three-dimensional distance between the observation location and the analysis grid point. In other words, the analysis increment does not fill a cylinder determined by the horizontal and vertical radii given above, but rather an ellipsoid.
The locations of the AERI sites (green triangles) and the respective full localization radii (magenta circles) are shown in Figure 1. A spatially and temporally varying adaptive inflation scheme applied to the prior state [34] is used to improve the dispersion of the ensemble (and to prevent filter divergence) for the duration of the DA cycling. The outlier parameter is set at 3.0, meaning that observations whose observation-minus-background (OMB) magnitude exceeds 3 times the total spread (the square root of the sum of the prior ensemble variance and the observation error variance) are rejected. The DART configuration is summarized in Table 2.
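The ellipsoidal localization and the outlier check described above can be illustrated with a minimal sketch. It assumes that the localization function cited as [33] is the fifth-order piecewise rational taper of Gaspari and Cohn, which is a common choice in DART but is not stated explicitly here, and that the outlier test compares the OMB magnitude with the threshold times the total spread. The function names and default values are hypothetical and chosen only for illustration.

```python
import math


def gaspari_cohn(z):
    """Fifth-order Gaspari-Cohn taper (assumed form of [33]).

    z is the separation divided by the localization half-width;
    the weight is 1 at z = 0 and reaches 0 at z = 2.
    """
    z = abs(z)
    if z <= 1.0:
        return (-0.25 * z**5 + 0.5 * z**4 + 0.625 * z**3
                - (5.0 / 3.0) * z**2 + 1.0)
    if z < 2.0:
        return (z**5 / 12.0 - 0.5 * z**4 + 0.625 * z**3
                + (5.0 / 3.0) * z**2 - 5.0 * z + 4.0 - 2.0 / (3.0 * z))
    return 0.0


def localization_weight(dh_km, dv_km, half_h_km=200.0, half_v_km=4.0):
    """Weight based on one 3-D distance, each axis scaled by its half-width.

    Scaling before combining is what turns the region of influence
    into an ellipsoid rather than a cylinder.
    """
    z = math.hypot(dh_km / half_h_km, dv_km / half_v_km)
    return gaspari_cohn(z)


def is_outlier(obs, prior_mean, prior_var, obs_err_var, threshold=3.0):
    """Reject when |OMB| exceeds threshold times the total spread."""
    total_spread = math.sqrt(prior_var + obs_err_var)
    return abs(obs - prior_mean) > threshold * total_spread
```

With the AERI half-widths (200 km and 4 km), a grid point 300 km away horizontally and 6 km away vertically receives zero weight under this sketch, even though it would lie inside a cylinder built from the full radii of 400 km and 8 km.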
Observation errors for the AERI temperature and dewpoint retrievals are obtained by summing the measurement uncertainties obtained from the AERIoe algorithm (described in detail in Section 2.2) and a specified representativeness error (here taken to be 1 K for both temperature and dewpoint). The aggregate errors are quite similar to those shown in Figure 5 of Coniglio et al. [20] though somewhat smaller given that study assumed a representativeness error of 2 K.
Our WRF-DART modeling and DA system is run from 12:00 UTC to 18:00 UTC in free forecast mode (i.e., without assimilating any observations at 12:00 UTC) to allow the ensemble to develop flow-dependent covariances at finer scales. Beginning at 18:00 UTC, continuous data assimilation using a 15-min cycling period is commenced and continued until 03:00 UTC the following day. Ensemble forecasts are initialized from the posterior analyses at hourly intervals starting at 19:00 UTC and run until 04:00 UTC.
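The cycling schedule can be written out explicitly as a small sketch. The helper `cycle_times` is hypothetical, and the list of evaluated forecast initialization times uses the 19:00 to 22:00 UTC window stated later in the experiment design rather than the full set of hourly forecasts.

```python
from datetime import datetime, timedelta


def cycle_times(start, end, step_minutes):
    """Return all times from start to end (inclusive) at a fixed step."""
    times = []
    t = start
    while t <= end:
        times.append(t)
        t += timedelta(minutes=step_minutes)
    return times


# Free forecast (spin-up) from 12:00 to 18:00 UTC 18 May 2017, no assimilation.
spinup_start = datetime(2017, 5, 18, 12, 0)

# 15-min assimilation cycles from 18:00 UTC 18 May to 03:00 UTC 19 May.
analysis_times = cycle_times(datetime(2017, 5, 18, 18, 0),
                             datetime(2017, 5, 19, 3, 0), 15)

# Hourly forecast initializations that are evaluated (19:00 to 22:00 UTC).
evaluated_init_times = cycle_times(datetime(2017, 5, 18, 19, 0),
                                   datetime(2017, 5, 18, 22, 0), 60)
```

Under these assumptions the schedule contains 37 analysis times, and each evaluated forecast starts from one of them.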

AERI Temperature and Water Vapor Retrievals
The impact experiments rely on retrievals of temperature and water vapor derived from the raw AERI measurements. The procedure for obtaining these retrievals is described here. Since the vertical distribution of temperature and moisture has a pronounced impact on the atmosphere's radiative characteristics, surface-based observations of the downwelling infrared spectrum are a function of the vertical thermodynamic profiles. The retrieval process attempts to invert that relationship by determining the most likely atmospheric state capable of producing an observed spectrum. This is an ill-posed problem because a finite number of observations must characterize the smoothly varying vertical profiles. The AERI optimal estimation (AERIoe) [35] retrieval used in the present work is a physical retrieval in that it uses a forward radiative transfer model (in this case the LBLRTM, or line-by-line radiative transfer model, as described by Clough et al. [36]) to map a first guess of the thermodynamic state into an infrared spectrum. The modeled spectrum is compared to the observed spectrum, and the first guess temperature and water vapor profiles are iteratively adjusted until they converge to a solution. The information content present within the AERI spectrum enables AERIoe to retrieve the thermodynamic profile in the lowest 3 km of the clear sky atmosphere. Since clouds are opaque in the infrared, AERIoe profiles are available only up to the cloud base. An automated precipitation-sensing hatch closes to protect the AERI optics from rain and snow; therefore, retrievals are also unavailable during active precipitation.
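The iterative adjustment described above can be written, as a sketch, in the textbook Gauss-Newton form of optimal estimation. This is the generic formulation and is offered only as an illustration; the exact update used by AERIoe [35] may include additional terms. The symbols are assumptions introduced here: x_n is the state (temperature and water vapor profile) at iteration n, x_a and S_a are the a priori mean and covariance, y is the observed spectrum, S_ε is the observation error covariance, F is the forward model (here the LBLRTM), and K_n is the Jacobian of F evaluated at x_n.

```latex
\mathbf{x}_{n+1} = \mathbf{x}_a
  + \left( \mathbf{S}_a^{-1} + \mathbf{K}_n^{T} \mathbf{S}_\epsilon^{-1} \mathbf{K}_n \right)^{-1}
    \mathbf{K}_n^{T} \mathbf{S}_\epsilon^{-1}
    \left[ \mathbf{y} - F(\mathbf{x}_n) + \mathbf{K}_n \left( \mathbf{x}_n - \mathbf{x}_a \right) \right]
```

In this form the a priori covariance supplies the constraint that makes the ill-posed inversion tractable: where the spectrum carries little information, the bracketed term contributes little and the solution stays close to x_a.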
A given retrieval is constrained by a climatology consisting of the mean atmospheric state and its covariance, which provides information about typical atmospheric behavior and thus helps identify likely states. In this case, the climatology is calculated from a 10-plus year dataset of radiosondes launched four times daily from the SGP central facility in Lamont, OK; the large number of radiosonde observations allows the climatological state to be calculated for each month of the year. A three-month span of radiosonde launches centered on the current calendar month provides sufficient background about expected atmospheric behavior while the covariance matrix accounts for variability in that state and is used as the a priori estimate (i.e., first guess) for the retrieval.
As the radiance observations are inherently noisy [13], two techniques were applied to reduce that noise. First, a principal component analysis was applied to the radiance observations to separate uncorrelated noise [37]. Second, the noise-filtered spectra were averaged together over a finite period. Care was taken during the noise filtering process in this study to produce profiles that were proper analogues of those that would be available from an operational network operating in real time. First, an independent noise filter was applied that only used spectral observations from the period leading up to a given observation (since real-time observations could not rely on a dependent noise filter that used data from a period centered on the current time) and, second, data were averaged over discrete 15-min periods. This allowed a retrieval to be completed on present-day hardware before the next 15-min averaging period was concluded, thereby satisfying a necessary criterion for future operational application.
Finally, since the information content present in the retrievals drops off with height, the retrieved profiles are weighted more and more strongly toward the a priori estimate with increasing height above the surface. Therefore, only the retrieved values of temperature and water vapor within 3 km of the surface are retained for this study. Note that, for analysis purposes, subsequent sections refer to the AERI observations with respect to their height above mean sea-level (not above ground-level).
The 2017 Land Atmosphere Feedback Experiment (LAFE) [38] brought three AERIs within 2 km of the four daily radiosonde launches at the SGP site, providing an unprecedented opportunity to evaluate the performance of multiple AERIs concurrently. This helped assess the spread in AERI observations in effectively identical conditions and provided valuable insight into the repeatability of the AERI retrievals. From comparisons with the radiosonde launches, we were able to develop a simple bias correction technique by computing the mean differences between AERI temperature and dewpoint retrievals and their rawinsonde observation system (RAOB) counterparts. These differences were then subtracted from the AERI retrievals used in the present case to produce bias-corrected observations. It should be noted that bias-correction of AERI retrievals with respect to four daily RAOBs may not be practical in all areas; however, similar techniques could be derived using other readily available observation sets (e.g., aircraft reports) and, in principle, ought to work just as well.
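The bias correction described above amounts to subtracting a mean AERI-minus-RAOB difference from each retrieval. A minimal sketch for a single variable at a single level follows; the function names and the sample values are hypothetical, and the actual procedure may compute the differences separately by level, variable, or time of day.

```python
from statistics import mean


def mean_bias(aeri_values, raob_values):
    """Mean of (AERI - RAOB) over collocated pairs."""
    return mean(a - r for a, r in zip(aeri_values, raob_values))


def bias_correct(aeri_values, bias):
    """Subtract the estimated bias from each AERI retrieval."""
    return [a - bias for a in aeri_values]


# Hypothetical collocated temperatures (K) at one level.
aeri_t = [290.5, 291.0, 289.5]
raob_t = [290.0, 290.4, 289.1]

bias = mean_bias(aeri_t, raob_t)
corrected_t = bias_correct(aeri_t, bias)
```

By construction, the corrected retrievals have zero mean difference from the RAOB sample used to estimate the bias; whether that carries over to independent data is what the cycling experiments test.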
A test was performed to compare the quality of model analyses made from cycling experiments using the original AERI dataset as well as the bias-corrected dataset. These results are presented in Section 3.

Forecast Period and Case Selection
To demonstrate the impact of the AERI retrievals, an SCW outbreak case from 2017 was chosen. Aside from being a high-impact event with widespread severe weather, the initiation and development of the associated convection occurred in two discrete areas advantageous for interpretation. The main line of convection developed between 18:00 and 19:00 UTC 18 May, along a dry line located to the west of the AERI sites described above, while a secondary area of severe weather initiated in central Kansas (just north of the AERI sites) near 20:00 UTC. Given that the synoptic flow at this time was generally southwesterly (southerly at the surface, rapidly veering to west-southwesterly at 500 hPa), these separate convective initiations occurred upwind and downwind of the AERI sites, respectively. This scenario suggests that AERI observations assimilated every 15 min would in theory have time to modify the simulated pre-storm and inflow environments of the evolving storm system and potentially alter the spatial distribution and intensity of the main line of model-simulated convection as it approached central Oklahoma. The scenario also suggests that the AERI observations might be able to impact the initiation and development of model-simulated convection in central Kansas, as it developed in an area downstream of the AERI observing sites.
The synoptic set-up for this event is depicted in Figure 2. On the morning of 18 May, a vigorous mid-level vorticity maximum associated with a 500-hPa trough and closed low was situated over northern Utah, digging east-southeastward toward Colorado (Figure 2a). A cold front sagged from northern Missouri through Kansas into eastern Colorado then southwestward into Arizona. At the same time, a well-defined dry line was situated south-north across west-central Texas (Figure 2b). The cold front and dry line served as foci for convective activity that developed later in the day as they interacted with a shallow layer of warm, moist air flowing northward from the Gulf of Mexico and the favorable shear profile provided by the approaching vorticity maximum and trough. Given this set-up, the National Oceanic and Atmospheric Administration (NOAA) Storm Prediction Center (SPC) forecast a greater than 30% probability of tornadoes within 25 miles of a given point in central Kansas and Oklahoma (Figure 3a). Severe weather reports received by SPC indicate that 48 tornadoes occurred in two distinct bands: one oriented north-south from central Kansas to central Oklahoma (in the high-risk area mentioned above) and the other oriented southwest-northeast in eastern Oklahoma and western Missouri (Figure 3b). In addition, there were numerous reports satisfying criteria for SCW (both with respect to hail size and wind gust magnitude) in this region.
Since the AERI instrument provides high-density temperature and moisture observations of the atmospheric boundary layer, it is our hypothesis that these observations, when assimilated using the WRF-DART system, will provide improved depictions of surface temperature and moisture. This in turn should lead to improved positioning of critical surface boundaries (the cold front and dry line mentioned above) which serve as foci for convective initiation and development.

Experiment Design and Evaluation Metrics
To quantify the impact of AERI observations on the analysis and forecast of the SCW event, we first conducted control (CTL) experiments using the model system described in Section 2.1. The CTL experiments assimilated only conventional observations from the National Centers for Environmental Prediction (NCEP) operational Global Data Assimilation System (GDAS). These included aircraft communications addressing and reporting system (ACARS) temperature and horizontal wind; universal RAOB temperature, specific humidity, and horizontal wind; and surface airways pressure (altimeter setting), temperature, dew point, and horizontal wind components. Next, we conducted impact experiments that were identical to CTL but also included assimilation of the AERI observations (hereafter referred to as "AERI"). Comparison of CTL and AERI thus provides an estimate of the impact of the AERI temperature and water vapor retrievals on the accuracy of WRF-DART analyses and the skill of forecasts initiated from these analyses.
To assess forecast quality, we employed two metrics: neighborhood maximum ensemble probability (NMEP) [39] and fractions skill score (FSS) [40,41]. NMEP and FSS are both spatial verification methods, meaning that they measure forecast model performance relative to the observations over discrete neighborhoods rather than point-by-point (as do more familiar metrics such as root-mean square error, or RMSE). In particular, NMEP searches within the neighborhood of a grid point to determine if the maximum value of a forecast model output exceeds a given threshold. This is done for each ensemble member, and the fraction of ensemble members which exceed the threshold gives the NMEP for that grid point. For FSS, on the other hand, the simulated and observed fraction of points within the neighborhood that exceed a given threshold are compared. Model forecasts of composite reflectivity (computed using the WRF Unified Post Processor version 3.2) were first mapped to a grid with 3-km spacing whose boundaries are indicated by the solid red rectangle in Figure 1. National Severe Storms Laboratory (NSSL) MRMS (multi-radar/multi-sensor) observations of composite WSR-88D radar reflectivity [42] were then mapped to the same 3-km grid before computation of NMEP and FSS values. In addition, a smaller subdomain (also with 3-km spacing) was constructed (dashed red rectangle in Figure 1) to assess forecast performance in that portion of the computational domain lying immediately downwind of the AERI sites (hereafter referred to as the "downwind" domain). For the NMEP calculations, a neighborhood of 24 km by 24 km (i.e., a half-width of 12 km) was used, whereas the FSS was computed from forecast aggregates at lead times of 1, 2, 3, 4, 5, and 6 h for neighborhoods ranging from 20 km by 20 km to 120 km by 120 km (i.e., radii ranging from 10 km to 60 km, respectively). 
Additionally, no attempt was made to account for model bias in the calculation of FSS, as it was assumed that any bias present (on account of microphysical parameterization, model resolution, etc.) would impact both sets of experiments equally and become negligible when differences between CTL and AERI were computed. Only forecasts initiated from WRF-DART analyses valid at 19:00–22:00 UTC were evaluated, because SCW began to impact the AERI network at 23:00 UTC and made further retrieval of temperature and dewpoint profiles impossible.
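The two neighborhood metrics can be sketched in a few lines. This is an illustrative implementation only, not the code used in the study: it uses square neighborhoods on plain Python lists to stay dependency-free, truncates neighborhoods at the domain edge (a choice made here, not taken from the text), and computes FSS for a single forecast field. On the 3-km verification grid, the 12-km half-width used for NMEP corresponds to `half_width=4` grid points. All function names are hypothetical.

```python
def window_values(field, j, i, half_width):
    """Values in the square neighborhood of (j, i), clipped to the domain."""
    ny, nx = len(field), len(field[0])
    rows = range(max(0, j - half_width), min(ny, j + half_width + 1))
    cols = range(max(0, i - half_width), min(nx, i + half_width + 1))
    return [field[jj][ii] for jj in rows for ii in cols]


def nmep(members, threshold, half_width):
    """Fraction of members whose neighborhood maximum meets the threshold."""
    ny, nx = len(members[0]), len(members[0][0])
    counts = [[0] * nx for _ in range(ny)]
    for field in members:
        for j in range(ny):
            for i in range(nx):
                if max(window_values(field, j, i, half_width)) >= threshold:
                    counts[j][i] += 1
    return [[c / len(members) for c in row] for row in counts]


def neighborhood_fraction(field, threshold, half_width):
    """Fraction of neighborhood points that meet the threshold."""
    ny, nx = len(field), len(field[0])
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            window = window_values(field, j, i, half_width)
            out[j][i] = sum(v >= threshold for v in window) / len(window)
    return out


def fss(forecast, observed, threshold, half_width):
    """Fractions skill score: 1 - MSE(fractions) / reference MSE."""
    pf = neighborhood_fraction(forecast, threshold, half_width)
    po = neighborhood_fraction(observed, threshold, half_width)
    mse = sum((f - o) ** 2 for rf, ro in zip(pf, po) for f, o in zip(rf, ro))
    ref = (sum(f ** 2 for rf in pf for f in rf)
           + sum(o ** 2 for ro in po for o in ro))
    return 1.0 - mse / ref if ref > 0 else float("nan")
```

A displaced feature illustrates why FSS is evaluated over a range of neighborhood sizes: a forecast cell that misses the observed cell by two grid points scores zero with no neighborhood, but a positive score once the neighborhood is wide enough for the two fraction fields to overlap.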

Results
Before considering the impact of the AERI observations relative to CTL, it is first instructive to establish the superiority of the bias-corrected observation set. To do so, we conducted two cycling DA experiments. The first (AERI NOBC) assimilated the original (i.e., not bias-corrected) AERI dataset, and the second (AERI) assimilated the bias-corrected dataset. Relative to AERI NOBC, the AERI experiment shows a small but noticeable improvement of the model fit to both RAOB temperature (Figure 4a) and RAOB specific humidity (Figure 4b) in terms of both RMSE and bias, especially in the lowest 150 hPa of the troposphere. For ACARS wind observations, the improvement in model fit is most notable for the meridional wind, in both RMSE and bias. These results provide confidence in the efficacy of the AERI bias-correction algorithm, and hereafter AERI refers to the simulation conducted with bias-corrected observations.
Both the AERI and CTL experiments successfully depict the broad synoptic features at play in the SCW outbreak (Figure 2). The 19:00 UTC WRF-DART ensemble-mean analyses in Figure 5a,b demonstrate that CTL and AERI possess a well-defined front evidenced by the gradient in 2-m temperatures from northwest Kansas to the Kansas-Oklahoma border. A large moisture gradient is also evident in both experiments (Figure 5d,e). The portion of the moisture gradient extending northeast-to-southwest across Kansas is associated with the cold front (mentioned above), while the second and much more pronounced gradient is associated with the dry line advancing across the Texas panhandle. Despite these superficial similarities, there are crucial differences between the two experiments. These can be revealed by taking the difference of the two analyses (AERI minus CTL) for both surface temperature (Figure 5c) and moisture (Figure 5f). It is now apparent that, after 1 h of DA using 15-min cycles, information from the AERI retrievals has already propagated some distance from the AERI observing sites themselves. This is to be expected since one of the strengths of ensemble data assimilation is that the flow-dependent error covariances are propagated forward within the ensemble to the next assimilation time.
While both experiments clearly depict the cold front stretching northeast-southwest across Kansas, the AERI temperature analysis is much cooler in south central Kansas and northern Oklahoma, intensifying the temperature gradient across the cold front and thus sharpening the boundary (Figure 5c). The AERI mixing ratio analysis is drier in the same region as well (Figure 5f), and this area extends southward across Oklahoma into the extreme northeastern portion of the Texas panhandle. A narrow strip of warmer surface temperatures (and enhanced surface water vapor mixing ratios) is apparent in the AERI analysis over central Kansas. The net effect of these changes is to sharpen the moisture and temperature gradients along the cold front over a small region in central Kansas (Figure 2b) along which convection initiated approximately 1 h after these analyses (i.e., 20:00 UTC). The area in question is along and just north of the narrow strip of warming/moistening mentioned above.
The positive impact of the AERI observations on the surface temperature and moisture analyses can be confirmed by considering analysis errors with respect to Automated Surface Observing System (ASOS) observations of 2-m temperature and dewpoint obtained from the Iowa Environmental Mesonet (https://mesonet.agron.iastate.edu). The AERI simulation produces analyses which consistently have lower RMSE and bias than their CTL counterparts (Table 3), and indeed reduces the bias of the 2-m dewpoint analyses significantly. Corresponding impacts on the distribution of Convective Available Potential Energy (CAPE) and convective inhibition (CIN) are shown in Figure 6. While the CAPE fields are grossly similar in CTL and AERI (Figure 6a,b), there are noteworthy differences upon examination of the difference field in Figure 6c. In particular, a reduction in CAPE of nearly 1000 J kg⁻¹ occurs in the AERI experiment over southern Kansas and northern Oklahoma, coincident with the cooler and drier areas identified in Figure 5c,f. At the same time, the entire domain, with the exception of areas north of the cold front, is either weakly capped or lacks any cap at all (Figure 6d,e). Of particular note is a strip of increased CAPE and lower (i.e., more negative) CIN located over central Kansas (Figure 6c,f). The combination of these surface cooling/drying patterns has implications for convective initiation in Kansas and suggests that the AERI experiment would tend to produce forecasts with less convection in central Kansas except along the narrow strip where CAPE has increased (as identified above). This possibility, and the broader question of forecast impact, will be examined next.
AERI observation forecast impact is further assessed by considering NMEPs for 1-h AERI and CTL ensemble forecasts initiated at 21:00 UTC. Figure 8 shows NMEPs for composite radar reflectivity using a threshold of 30 dBZ, with the shaded areas in Figure 8a,b representing the fraction of ensemble members for which this threshold is achieved or exceeded within 12 km of a given grid point. The differences between the CTL and AERI experiments are shown in Figure 8c. Overall, inspection of Figure 8c shows that the largest differences occur over Kansas and northern Oklahoma. Although these forecasts were initiated at a time when convection in Kansas had already formed (cf. Figure 7), the pattern of positive and negative probability differences shows that, in the downwind portion of the domain, the AERI ensemble improves the depiction of the observed convection (indicated by the solid black line) relative to CTL. This indicates that information from the AERI retrievals is entering the model analysis and impacting the forecasts in a dynamically consistent manner (it should be noted that such an impact is not obvious in the portion of the domain not downwind of the AERI sites). The spatial coherence is improved most notably in eastern Kansas, where the AERI forecasts place a bullseye of increased probability directly within the observed 30 dBZ contour on the easternmost edge of the convective line. Around this bullseye is a horseshoe-shaped area of reduced probabilities. This indicates that the AERI ensemble forecasts do a better job of focusing the convection where it actually occurred by eliminating spurious convection in extreme eastern Kansas. The results are not entirely consistent, however, as the probability differences do not coincide precisely with the observed line of convection in central Kansas, and in fact there is a large area of spurious increase to the north of the observations. Elsewhere in the domain, there is little difference between the two sets of forecasts.
Elsewhere in the domain, there is little difference between the two sets of forecasts. To examine whether CTL and AERI ensemble forecasts offer practical guidance regarding the location and timing of severe weather (and, indeed, whether AERI offers improved performance in this regard) we compared NMEPs computed using a threshold value of 50 dBZ with observed indicia of severe weather (in this case, the observed location of hail per SPC reports). Overall, neither CTL nor AERI seems particularly skillful in depicting the precise timing and location of hail threats (Figure 9). However, comparison of the two experiments does indicate that, at least in the downwind portion of the domain. AERI does show increased probabilities of intense convection (i.e., greater than 50 dBZ, known to be associated with an increased risk of hail) in the general vicinity of 7 hail reports in east central Kanas. To examine whether CTL and AERI ensemble forecasts offer practical guidance regarding the location and timing of severe weather (and, indeed, whether AERI offers improved performance in this regard) we compared NMEPs computed using a threshold value of 50 dBZ with observed indicia of severe weather (in this case, the observed location of hail per SPC reports). Overall, neither CTL nor AERI seems particularly skillful in depicting the precise timing and location of hail threats (Figure 9). However, comparison of the two experiments does indicate that, at least in the downwind portion of the domain. AERI does show increased probabilities of intense convection (i.e., greater than 50 dBZ, known to be associated with an increased risk of hail) in the general vicinity of 7 hail reports in east central Kanas. Figure 8. Neighborhood maximum ensemble probabilities (NMEP) for 1-h WRF forecasts initiated at 21:00 UTC and valid at 22:00 UTC 18 May. CTL forecasts (a) and AERI forecasts (b) are evaluated using a threshold composite reflectivity factor of 30 dBZ. 
NMEP differences (AERI minus CTL) are shown in the rightmost column (c). The heavy black line depicts the threshold contour in the observed NSSL multi-radar/multi-sensor (MRMS) composite reflectivity. Locations of AERI observing sites are depicted by green triangles.
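As an illustration of how an NMEP field of the kind shown in Figures 8 and 9 can be computed, the following Python sketch flags, for each ensemble member, every grid point that has reflectivity at or above the threshold somewhere within a given radius, and then averages those flags over the members. It is a minimal sketch rather than the code used in this study: the function name `nmep`, the array layout, and the default 3-km grid spacing are assumptions made for the example.

```python
import numpy as np


def nmep(refl, threshold_dbz=30.0, radius_km=12.0, dx_km=3.0):
    """Neighborhood maximum ensemble probability (illustrative sketch).

    refl: array of shape (n_members, ny, nx) holding simulated composite
          reflectivity in dBZ (hypothetical layout).
    dx_km: horizontal grid spacing; 3 km is an assumed default.
    Returns an (ny, nx) array with the fraction of members that meet or
    exceed the threshold anywhere within radius_km of each grid point.
    """
    exceed = np.asarray(refl, dtype=float) >= threshold_dbz
    r = int(np.floor(radius_km / dx_km))      # search radius in grid cells
    _, ny, nx = exceed.shape

    # Pad with False so that neighborhoods near the edges stay in bounds.
    padded = np.pad(exceed, ((0, 0), (r, r), (r, r)), constant_values=False)

    # Logical OR over every offset inside the circular neighborhood, which
    # is equivalent to a neighborhood maximum of the binary field.
    hit = np.zeros_like(exceed)
    for dj in range(-r, r + 1):
        for di in range(-r, r + 1):
            if (dj * dj + di * di) * dx_km ** 2 <= radius_km ** 2:
                hit |= padded[:, r + dj:r + dj + ny, r + di:r + di + nx]

    # Fraction of ensemble members with a hit at each grid point.
    return hit.mean(axis=0)
```

A difference field analogous to Figure 8c would then be `nmep(aeri_refl) - nmep(ctl_refl)`, where both array names are placeholders for the two ensembles.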
Having investigated the AERI observation impact on single forecasts, we now examine how the forecast errors vary with lead time and spatial scale. To do so, we aggregate forecasts initiated at 19:00, 20:00, 21:00 and 22:00 UTC by lead time on the domain defined by the solid red rectangle in Figure 1 and then compute FSS from the aggregates using the same composite radar reflectivity threshold used in Figure 8 (i.e., 30 dBZ). These are shown in Figure 10. The dashed gray line represents the minimum skillful (or useful) FSS value and is given by $\mathrm{FSS}_{\mathrm{useful}} = 0.5 + f_0/2$, where $f_0$ represents, in this case, the aggregate observed frequency of composite reflectivity factor exceeding the threshold value of 30 dBZ over the entire domain.
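The FSS and the useful-skill threshold can be sketched in Python as follows. The example assumes the standard fractions skill score definition, FSS = 1 - sum((P_f - P_o)^2) / (sum(P_f^2) + sum(P_o^2)), where P_f and P_o are the fractions of forecast and observed grid points exceeding the threshold within a square neighborhood, and it accumulates the numerator and denominator over all supplied forecast/observation pairs before forming the score. How the ensemble members enter the calculation and how domain edges are handled are not specified in the text; here each forecast is a single field and edges are zero-padded, both of which are assumptions. All function names are hypothetical.

```python
import numpy as np


def neighborhood_fraction(binary, width):
    """Fraction of exceedance points in a width x width box (odd width).

    Edges are zero-padded (an assumption of this sketch).
    """
    h = width // 2
    padded = np.pad(np.asarray(binary, dtype=float), h)
    # Summed-area table with a leading row and column of zeros.
    c = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    box = (c[width:, width:] - c[:-width, width:]
           - c[width:, :-width] + c[:-width, :-width])
    return box / width ** 2


def fss(forecasts, observations, threshold, width):
    """Aggregate FSS over paired 2-D forecast and observed fields."""
    num = 0.0
    den = 0.0
    for f, o in zip(forecasts, observations):
        pf = neighborhood_fraction(np.asarray(f) >= threshold, width)
        po = neighborhood_fraction(np.asarray(o) >= threshold, width)
        num += np.sum((pf - po) ** 2)
        den += np.sum(pf ** 2) + np.sum(po ** 2)
    return 1.0 - num / den if den > 0 else float("nan")


def fss_useful(observations, threshold):
    """Minimum useful FSS, 0.5 + f0/2, with f0 the aggregate observed frequency."""
    hits = sum(int(np.sum(np.asarray(o) >= threshold)) for o in observations)
    total = sum(np.asarray(o).size for o in observations)
    return 0.5 + (hits / total) / 2.0
```

Calling `fss` for a range of neighborhood widths at a fixed lead time would give one curve comparable to a panel of Figure 10, with `fss_useful` supplying the dashed reference line.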
Although both CTL and AERI are skillful at some spatial scales and lead times (neighborhood widths greater than 70 km at 1 h, for example), there is very little difference between the two experiments when assessed over the entire domain. This is emphasized by taking the difference in FSS (AERI minus CTL) and computing 95% confidence intervals with a bootstrap resampling method [43] using 1000 replicates. The blue lines in Figure 10 represent $\mathrm{FSS}_{\mathrm{AERI}} - \mathrm{FSS}_{\mathrm{CTL}}$, and the shaded gray regions depict the 95% bootstrap confidence intervals. None of the differences is significant, and we conclude that the AERI observations have no appreciable impact on FSS when computed over the entire analysis domain. However, when the FSS are aggregated over the smaller "downwind" domain (the dashed red rectangle in Figure 1), the positive impact of the AERI observations becomes apparent (Figure 11). AERI forecasts are skillful for all neighborhoods larger than 20 km at lead times up to 3 h (Figure 11a-c), and the differences between AERI and CTL at these lead times are all significant at the 95% confidence level. Beyond 3 h, the differences diminish and gradually become insignificant for lead times of 5 and 6 h. The results for the "downwind" domain demonstrate that the AERI observations contribute to significant improvement in ensemble forecasts of strong convection in Kansas. The limited geographical extent of the impact is also consistent with the fact that the observations are only available for a small area in central Oklahoma. Use of a more extensive network would likely lead to a larger impact, as suggested by the results of Otkin et al. [18] and Hartung et al. [19].
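One way to obtain a bootstrap confidence interval on an FSS difference is sketched here in Python. The resampling unit used in the study is not stated in the text, so this example makes its own assumptions: the FSS numerator and denominator terms are available per sample (for instance, per forecast case) for both experiments, the samples are resampled in pairs with replacement, and percentile limits are taken from 1000 replicates. The function name and argument layout are hypothetical, and the procedure may differ in detail from that of [43].

```python
import numpy as np


def bootstrap_fss_diff(num_aeri, den_aeri, num_ctl, den_ctl,
                       n_rep=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for FSS_AERI - FSS_CTL (sketch).

    num_*: per-sample sums of (P_f - P_o)^2.
    den_*: per-sample sums of P_f^2 + P_o^2.
    Samples are assumed to be paired between the two experiments.
    Returns (point_estimate, lower, upper).
    """
    num_a, den_a, num_c, den_c = (
        np.asarray(x, dtype=float)
        for x in (num_aeri, den_aeri, num_ctl, den_ctl)
    )
    n = num_a.size
    rng = np.random.default_rng(seed)

    # Each row holds the sample indices drawn for one bootstrap replicate.
    idx = rng.integers(0, n, size=(n_rep, n))

    fss_a = 1.0 - num_a[idx].sum(axis=1) / den_a[idx].sum(axis=1)
    fss_c = 1.0 - num_c[idx].sum(axis=1) / den_c[idx].sum(axis=1)
    diffs = fss_a - fss_c

    lower, upper = np.quantile(diffs, [alpha / 2.0, 1.0 - alpha / 2.0])
    point = ((1.0 - num_a.sum() / den_a.sum())
             - (1.0 - num_c.sum() / den_c.sum()))
    return point, lower, upper
```

Under these assumptions, a difference would be called significant at the 95% level when the returned interval excludes zero.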
[Figure caption, partial] FSS were computed for the portion of the computational domain defined by the dashed red rectangle in Figure 1 (i.e., the "downwind" domain).

To ensure that improvements are not gained in the "downwind" domain at the expense of significant degradations elsewhere, we compute FSS for the portion of the domain not defined as downwind of the AERI sites (i.e., excluding the area enclosed by the dashed red lines in Figure 1). Figure 12 shows that there is a slight (though statistically insignificant) reduction in FSS relative to CTL at the 1 h lead time, but very little difference thereafter.
Finally, to clarify the capability of CTL and AERI to provide skillful guidance regarding the location and timing of severe weather, we compute FSS using the 50-dBZ threshold employed in Figure 9. The results (Figure 13) confirm that, while neither CTL nor AERI is skillful in this regard, at earlier lead times the AERI experiment is significantly better than CTL. This indicates that the AERI observations are, indeed, improving the forecasts of severe weather, although not to a degree sufficient for the forecasts to be considered skillful per se.
[Figure 12 caption, partial] The difference between the AERI and CTL FSS is depicted by the solid blue line, and 95% bootstrap confidence intervals are indicated by the shaded gray region. The dashed gray line represents the minimum FSS for which forecasts are considered skillful (i.e., $\mathrm{FSS}_{\mathrm{useful}} = 0.5 + f_0/2$, where $f_0$ is the aggregate observed event frequency). The FSS were computed for the portion of the computational domain defined as the difference between the areas enclosed by the solid and dashed red rectangles in Figure 1.
[Figure caption, partial] ... h, over all forecasts initiated from 19:00 UTC to 22:00 UTC. The difference between the AERI and CTL FSS is depicted by the solid blue line, and 95% bootstrap confidence intervals are indicated by the shaded gray region. The dashed gray line represents the minimum FSS for which forecasts are considered skillful (i.e., $\mathrm{FSS}_{\mathrm{useful}} = 0.5 + f_0/2$, where $f_0$ is the aggregate observed event frequency). The FSS were computed for the portion of the computational domain defined by the dashed red rectangle in Figure 1 (i.e., the downwind portion of the domain).

Summary and Conclusions
Temperature and moisture retrievals were obtained from the Atmospheric Emitted Radiance Interferometer (AERI) and then subjected to a simple bias-correction procedure using a large sample of radiosonde observations from a nearby site. The bias-corrected AERI observations were assimilated using a WRF-DART modeling and DA system to demonstrate the observation impact on analyses and forecasts of an SCW event that impacted parts of Kansas and Oklahoma on 18-19 May 2017. Relative to a control simulation (CTL) that assimilated only conventional observations, the AERI experiment produced analyses that were better fit to surface temperature and moisture observations and that displayed a sharper depiction of surface boundaries (cold front, dry line) known to be important in the initiation and development of SCW. Although neither CTL nor AERI ensemble forecasts were skillful in predicting the location and timing of discrete severe weather events (as defined by SPC hail reports), the AERI experiment exhibited significantly higher fractions skill scores (FSS) than CTL for a threshold value of 50 dBZ at early forecast lead times in the downwind portion of the forecast domain. This positive impact was even more evident in the improvement of FSS for a lower threshold value of 30 dBZ, suggesting that the improved surface temperature and moisture analyses in the AERI experiment led to forecasts with more realistic distributions of convection.
While these results are encouraging, the impact of AERI observations in a fully configured, quasi-operational data assimilation system (including radar observations) would likely be somewhat smaller than presented here due to inevitable overlap of at least some information content between the AERI and radar observation sets. For this reason, future work will include a larger selection of cases designed to gauge performance in a wide range of synoptic and mesoscale regimes and will include larger observation sets (including radar and satellite observations). The role of AERI observation density will also be tested, since understanding the relationship between the density of observing sites and analysis/forecast impact could provide important insights regarding the potential benefits of a larger network.
While the impacts discussed in this study were generally confined to a relatively small area of the computational domain, they were consistent in appearing downwind of the small cluster of AERI observing sites. These results could also be interpreted as providing additional evidence that a national network of boundary layer profiling sensors has the potential to improve forecasts that are sensitive to moisture and thermodynamics in the lower troposphere. Increasing the number of profiling sites would increase the geographic scope of their impact and provide larger and more consistent improvements in model analyses and short-range ensemble forecasts that are the essence of the WoF credo. This is consonant with the goals of the National Research Council's 2009 report [26] on the wisdom of establishing a nationwide network of networks capable of collecting observations representative of mesoscale phenomena.