Impact of Adaptively Thinned GOES-16 Cloud Water Path in an Ensemble Data Assimilation System

Abstract: Assimilation of cloud properties in convective-scale ensemble data assimilation systems has been a prime research topic in recent years. Satellite-retrieved cloud properties are important sources of information about the cloud and atmospheric state. The Advanced Baseline Imager (ABI) aboard the GOES-16 geostationary satellite provides an opportunity to retrieve cloud properties, including cloud water path, at high spatiotemporal resolution over the continental United States. This study investigates the impact of assimilating adaptively thinned GOES-16 cloud water path (CWP) observations in the ensemble-based Warn-on-Forecast System on subsequent weather forecasts. Multiple CWP thinning algorithms based on the adaptive thinning method were developed and tested. Three severe weather events are considered, which occurred on 19 July 2019, 7 May 2020 and 21 June 2020. The superobbing procedure smoothed the CWP data from 5 km to 15 km or more, depending on the thinning algorithm. The overall performance of adaptively thinned CWP assimilation in the Warn-on-Forecast System is assessed using an object-based verification method. On average, more than 60% of the data were removed and therefore not used in the assimilation system. Results suggest that assimilating less than 40% of the superobbed CWP data into the Warn-on-Forecast System yields forecasts of similar quality to those obtained by assimilating all available CWP observations. The results of this study can inform the use of cloud assimilation to improve numerical simulation.


Introduction
Earth system analysis and forecasting systems rely on ensemble data assimilation (EDA) methods to convert observations into corrections for Earth system model variables [1][2][3][4][5]. EDA methods were developed as a computationally feasible way to solve the nonlinear filtering problem [3]. This led to sequential ensemble Kalman filter (EnKF) methods with improved assimilation error characteristics and decreased computational cost [6][7][8][9][10][11][12][13][14]. EnKF methods can handle large models and observation sets with relatively little effort. For a given model, running an EnKF system requires only an observation operator that maps from model space to observation space. The relative simplicity of implementing the EnKF has led to a rapid increase in the number of EDA applications. However, EnKF algorithms are challenged by nonlinear relations between model state variables and observations, nonlinear model error and errors in the observing system [15,16]. Deficiencies of the EnKF include low prior variance and spurious correlations between variables. Two algorithms are therefore required for good EnKF performance: localization of the observation impact and covariance inflation [17,18]. An adaptive covariance inflation algorithm has been used to inflate the state vector in order to maintain the ensemble spread.
In recent years, EDA of the cloud water path (CWP) from high-resolution geostationary satellites in convective-scale numerical models has been one of the prime research topics. Satellite-retrieved cloud components are valuable sources of information because of the complex nonlinear physical processes associated with cloud development [19][20][21]. Many studies have discussed cloud analysis systems and their use to create a more accurate initial state [22,23]. There are two ways to assimilate cloud properties from satellite data: direct and indirect. In direct assimilation, a radiative transfer model (RTM) is used as a forward model to simulate the satellite radiance in observation space. Well-known RTMs in operational use are the Community Radiative Transfer Model (CRTM) [24] and Radiative Transfer for TOVS (RTTOV) [25]. The advantage of the direct method is that it avoids the satellite-dependent uncertainties of retrieval algorithms [26]. In general, the direct assimilation method performs well in clear-sky situations, but it may be very resource-intensive depending on the resolution and the number of spectral bands being assimilated [27][28][29][30][31]. The assimilation of satellite data under clear skies has been extensively researched and is now well established. However, assimilation of satellite data in the presence of cloud is a highly nonlinear problem, and a variety of strategies are being investigated to address it [32,33]. Areas where clouds and precipitation occur have a significant impact on the accuracy of weather prediction. The difficulties associated with cloudy radiance assimilation are the uncertainty of hydrometeor simulation in numerical weather prediction and the inaccuracy of the RTM [34,35].
Meteorology 2022, 1 514
The second approach to assimilating cloud properties is to use satellite-retrieved products in the data assimilation system. Assimilating retrieved cloud products is more convenient because of the simplicity of the observation operator, the reliability of the data and the largely untapped information they contain [36,37]. Several research groups are actively working on indirect assimilation of cloud properties for initializing hydrometeors [19][38][39][40][41]. The satellite cloud products are retrieved from visible and infrared spectral bands. The retrieved cloud products are cloud liquid water path (LWP) and cloud ice water path (IWP). For simplicity, the cloud water path (CWP) represents the LWP and IWP together. Many studies discuss the importance of assimilating retrieved CWP for severe weather prediction [41,42]. Assimilating satellite-retrieved CWP provides a potential solution to convective initiation and can improve precipitation forecasts for severe weather events [41]. The primary drawback of using retrieved cloud products is the difficulty of estimating the uncertainties inherent in the retrieval.
Further, all assimilation systems need high-quality observations that can improve forecast skill. During the assimilation process, it is important to understand how each of the assimilated observations affects analysis and forecast skill [43,44]. This process includes fine-tuning the data assimilation system, including observation error, quality control and observation thinning [45]. In addition, because EDA is highly computationally intensive, it is important to include only those observations that improve the assimilation system in order to reduce the computing load. When dense satellite observations are incorporated into a lower-resolution data assimilation system, observation thinning is important [46,47]. Satellite data thinning methods and the importance of reducing the number of assimilated observations are discussed in many studies [48][49][50]. One of the essential roles of data thinning is to avoid the difficulty of explicitly treating observation error correlation in an assimilation system [50,51]. Too many observations might lead to sub-optimal states since the assimilation system ignores observation error correlation. Thinning and superobbing are the two common techniques used before assimilation to reduce the number of observations and the correlation between observation errors [52,53]. These two methods are static in nature, and it is unknown how much thinning is ideal for various model resolutions.
This study assesses the impact of assimilating adaptively thinned GOES-16 satellite-retrieved CWP in the ensemble-based GSI-EnKF system. Because assimilation of hydrometeors is particularly important, innovative approaches to optimize the number of CWP observations for the assimilation system merit investigation. Multiple sensitivity experiments were performed for three convective high-impact weather events that occurred on 19 July 2019, 7 May 2020 and 21 June 2020. Observation increment statistics are computed from the observations and the simulated values (model background) at the observed locations to assess quality and the impact on the mesoscale environment within the model. Following the introduction, Section 2 describes the satellite-retrieved CWP products, the assimilation and forecast system, and the overall experiment design. Results of applying the adaptively thinned, revised GOES-16 CWP to composite reflectivity (dBZ) and 2-5 km updraft helicity (UH) forecasts are provided in Section 3. Finally, a summary with discussion and conclusions is offered in Section 4.

GOES-16 Cloud Water Path
Over the continental United States, the Advanced Baseline Imager (ABI) aboard the GOES-16 satellite provides satellite-retrieved CWP at high spatiotemporal resolution. The GOES-R series of geostationary satellites is a collaborative development and acquisition effort between NOAA and NASA. With 16 spectral bands, the GOES-16 ABI provides continuous imagery and atmospheric measurements at various time intervals. More information about the GOES-16 ABI and its spectral bands can be found in [54,55].
Cloud properties are retrieved from the 1 and 2 km GOES-16 imager radiances for cloudy pixels using a multispectral retrieval algorithm [56]. The satellite-retrieved CWP is obtained using the Satellite Cloud and Radiation Property Retrieval System (SatCORPS). The retrieval algorithm was developed at the NASA Langley Research Center [57]. The daytime CWP data are retrieved using channels (2, 7) with wavelengths of 0.64 µm (visible band) and 2.25 µm (infrared band). For daytime CWP data, the Visible Infrared Solar-infrared Split-window Technique (VISST) is used by the Clouds Algorithm Working Group. The CWP observations are used to determine cloud optical and microphysical properties (COMP). The COMP measures include water/ice cloud optical depth (COD), cloud particle size (CPS), LWP and IWP for all cloudy pixels. The particle size applies only to the cloud top, which is what passive remote sensors can observe. The LWP is the cloud water path associated with only liquid-phase clouds, and the IWP includes the CWP of ice and mixed-phase clouds. LWP and IWP (in g m−2) measure the total water mass in a cloud column and are derived from COD and CPS. The distribution of COD and CPS describes the radiative properties of a cloud and characterizes the effects of clouds on solar energy and Earth's radiative budget [58]. CPS represents the ratio of the volume of the cloud particle distribution to its cross-sectional area and is an effective parameter for the size distribution. Cloud thickness is estimated with a parameterization based on COD, CWP and cloud temperature. Only daytime CWP data retrieved from the 0.64 µm and 2.25 µm spectral bands were assimilated in this study. Assimilation of nighttime CWP data is not addressed in this work. The nighttime CWP data are retrieved using the Shortwave-infrared Infrared Split-window Technique (SIST) algorithm. A separate study is needed to understand the error characteristics and biases before assimilating nighttime CWP observations.
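To illustrate how a water path can follow from COD and CPS, the sketch below uses the widely used relation for a vertically homogeneous liquid cloud, LWP ≈ (2/3) ρw τ re. It is not the SatCORPS/VISST implementation. The function name, the treatment of CPS as an effective radius in µm and the homogeneous-cloud assumption are illustrative only, and ice clouds require a different, habit-dependent relation.

```python
RHO_WATER = 1.0e6  # density of liquid water in g m^-3


def liquid_water_path(cod, r_eff_um):
    """Approximate LWP (g m^-2) from cloud optical depth and effective radius (µm).

    Assumption: vertically homogeneous liquid cloud with extinction
    efficiency of about 2, so LWP = (2/3) * rho_w * COD * r_eff.
    """
    r_eff_m = r_eff_um * 1.0e-6  # convert µm to m
    return (2.0 / 3.0) * RHO_WATER * cod * r_eff_m


# Hypothetical pixel: optical depth 10, effective radius 10 µm
lwp = liquid_water_path(10.0, 10.0)
print(f"LWP = {lwp:.1f} g m^-2")
```

For the hypothetical pixel above, the relation gives roughly 67 g m−2, i.e., about 0.07 kg m−2 in the units used for the CWP observation errors.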

Ensemble Data Assimilation and Model Configuration
Tremendous efforts have been devoted to convective-scale EDA using in situ, radar and satellite observations in NOAA's ensemble-based Warn-on-Forecast project [59][60][61][62]. The Warn-on-Forecast project has developed a probabilistic, convective-scale EDA and forecasting system, which is used to improve short-term forecasts of high-impact weather events. All observations were assimilated using the Grid-point Statistical Interpolation (GSI) and Ensemble Kalman Filter (EnKF) data assimilation techniques (GSI-EnKF). The EnKF method used in this study is the ensemble square root filter (EnSRF) [63]. Observations are assimilated serially under the assumption that observation errors are uncorrelated. As each observation is assimilated, the ensemble mean and members are updated with the EnSRF update equations, in which β is the reduced-gain factor, K is the Kalman gain, yo is the observation, H is the observation operator, N is the number of ensemble members (N = 36 in this study), n is an index for a particular ensemble member, the vector x represents the entire model state, the scalar x represents a particular model field at a grid point, superscript a indicates a posterior estimate, superscript b indicates a prior estimate, an overbar indicates an ensemble mean, σ is the observation-error standard deviation and W is a localization factor. The fifth-order piecewise rational Gaspari-Cohn function is used to reduce spurious correlations between remotely located variables [64]. The Warn-on-Forecast System uses the Advanced Research version of the Weather Research and Forecasting Model (WRF3.9) at 3 km grid spacing, similar to the real-time version (https://wof.nssl.noaa.gov/configuration.php (accessed on 22 November 2022)). All types of observations (Table 1) are assimilated using GSI as the observation operator and the EnKF analysis system [65] at 15 min intervals from 1800 UTC until 0300 UTC the next day.
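For reference, a standard single-observation EnSRF update that is consistent with the symbols defined above can be written as follows. This is a sketch of the usual serial formulation, not a verbatim reproduction of the equations used by the authors; in particular, the placement of the localization factor W is an assumption.

```latex
\begin{align}
\bar{x}^{a} &= \bar{x}^{b} + W K \left[ y^{o} - \overline{H(\mathbf{x}^{b})} \right], \\
x_{n}^{a} - \bar{x}^{a} &= \left( x_{n}^{b} - \bar{x}^{b} \right)
  - \beta W K \left[ H(\mathbf{x}_{n}^{b}) - \overline{H(\mathbf{x}^{b})} \right], \\
K &= \frac{\dfrac{1}{N-1} \displaystyle\sum_{n=1}^{N}
      \left( x_{n}^{b} - \bar{x}^{b} \right)
      \left[ H(\mathbf{x}_{n}^{b}) - \overline{H(\mathbf{x}^{b})} \right]}
     {\sigma^{2} + \dfrac{1}{N-1} \displaystyle\sum_{n=1}^{N}
      \left[ H(\mathbf{x}_{n}^{b}) - \overline{H(\mathbf{x}^{b})} \right]^{2}}, \\
\beta &= \left[ 1 + \sqrt{ \frac{\sigma^{2}}
      {\sigma^{2} + \dfrac{1}{N-1} \displaystyle\sum_{n=1}^{N}
      \left[ H(\mathbf{x}_{n}^{b}) - \overline{H(\mathbf{x}^{b})} \right]^{2}} } \right]^{-1}.
\end{align}
```

The first equation updates the ensemble mean, the second updates each member's deviation from the mean with the gain reduced by β, and the last two define the Kalman gain and the reduced-gain factor from the prior ensemble statistics in observation space.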
The initial and boundary conditions are provided by the High-Resolution Rapid Refresh ensemble (HRRRE) as part of the real-time experiments in 2019 and 2020. Multiphysics options were applied to the HRRRE members to initialize 36 ensemble members. The Warn-on-Forecast System produced 18-member short-term, real-time, probabilistic forecasts every half-hour from 1900 to 0300 UTC on the day of the event. Cloud microphysical processes are handled by the WRF NSSL two-moment scheme [66]. To maintain ensemble spread, spatially and temporally varying adaptive inflation [67] is applied to the prior ensemble estimate at the outset of each assimilation step. To help initiate convection near high-reflectivity regions and to maintain ensemble spread, smooth, random additive noise is introduced to the model variables of each ensemble member [68]. Table 1 lists the assimilated observation types, covariance localizations and related observation errors used in this study. During observation preprocessing, the high-resolution radar observations are resampled to a 5 km grid using the Cressman scheme [69]. Multiple studies discuss the importance of radar data in accurately predicting mature convection and the related high-impact supercells [59,70]. It should be noted that radar data do not capture the nonprecipitating phase of clouds during convective initiation [71]; hence, CWP data assimilation plays an important role for convective initiation. For mature convection, [72] found that assimilating all available CWP observations degraded the forecast skill. Therefore, CWP data are not assimilated in high-reflectivity regions (>40 dBZ). This thinning method is applied when both CWP and radar reflectivity are assimilated and is now part of the real-time operational setup. The covariance localizations for CWP were set to 36 km horizontally and 0.9 scale height (SH) vertically.
These localization parameters are similar to those used by [63,64]. Further details of the ensemble configuration, the observation types and the observation errors are available in previous work [62]. A parallax correction is applied to correct geolocation errors for all clouds. The CWP retrieval errors are assigned based on the uncertainty characteristics of the retrieval algorithm, ranging from high confidence in clear-sky areas to low confidence in cloudy areas. For CWP > 2.5 kg m−2, the retrieval error is higher (0.25 kg m−2) because of the high expected uncertainty of thick cloud [41,73]. For CWP = 0 or CWP < 0.025 kg m−2, the retrieval error is 0.025 kg m−2 since high confidence exists over clear-sky areas (Table 2).
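The localization factor W can be illustrated with the fifth-order piecewise rational Gaspari-Cohn function. The sketch below is a minimal version and not the GSI-EnKF code. It assumes that the 36 km horizontal localization is the distance at which the weight reaches zero; if the system instead treats it as the half-width, the `cutoff` argument would need to be doubled. The function and argument names are hypothetical.

```python
import numpy as np


def gaspari_cohn(dist, cutoff):
    """Fifth-order piecewise rational Gaspari-Cohn taper.

    dist   : separation distance(s), same units as cutoff
    cutoff : distance at which the weight reaches zero (assumed convention)
    """
    # Normalize so that the support of the function is 0 <= r < 2
    r = np.abs(np.atleast_1d(np.asarray(dist, dtype=float))) / (0.5 * cutoff)
    w = np.zeros_like(r)

    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)

    ri = r[inner]
    w[inner] = (-0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3
                - (5.0 / 3.0) * ri**2 + 1.0)

    ro = r[outer]
    w[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3
                + (5.0 / 3.0) * ro**2 - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return w


# Weights at 0, 9, 18 and 36 km from a CWP observation
weights = gaspari_cohn([0.0, 9.0, 18.0, 36.0], cutoff=36.0)
print(weights)
```

The weight is one at the observation location and decreases smoothly to zero at the cutoff, so state variables beyond that distance are not updated by the observation.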

CWP Assimilation Techniques
Before assimilation, CWP observations are optimized using an adaptive technique. The technique removes observations that minimally impact analyses and forecasts or that add noise to the model state [62]. In addition, preliminary experiments showed that CWP observations are most likely to reduce hydrometeor spread in areas where the QCLOUD and QICE variance is significant. To exploit this, we first separately calculated the ensemble standard deviation (ESD) of the column-maximum model mixing ratios of cloud liquid water (QCLOUD) and cloud ice (QICE). To minimize the effects of noise in the variance fields, the ensemble column-maximum QCLOUD and QICE ESD was then averaged at each grid point within a 15 km × 15 km (5 × 5 grid point) neighborhood. CWP observations were used in the assimilation system only if the ESD of QCLOUD or QICE within each 15 km grid box exceeded a predefined threshold value (0.05, 0.2 or 0.4 g kg−1). Thus, the superobbing procedure smoothed the CWP data from 5 km to 15 km resolution, and a CWP observation was assimilated only where the 36-member ESD of model-simulated QCLOUD or QICE within the 15 km box exceeded the threshold value.
These three threshold values (0.05, 0.2 and 0.4 g kg−1) define three separate experiments for QCLOUD, named QCLD-0.05, QCLD-0.2 and QCLD-0.4, and three for QICE, named QICE-0.05, QICE-0.2 and QICE-0.4. This adaptive CWP thinning method and the threshold values are similar to those proposed by [74] for assimilating radar radial velocity, in which the effects of noise in the variance of column-maximum updraft speed were reduced. It should be noted that the variance threshold value depends on many factors, including ensemble size, model resolution, model physics, observation error and weather conditions. The first set of CWP data assimilation experiments comprised seven experiments for the 19 July 2019 weather event. Six of these used the adaptive CWP thinning method with the three ESD threshold values for each of model-simulated QCLOUD and QICE. The remaining experiment assimilated all CWP data without any filtering (CWP-ALL). To calculate model-simulated CWP at the observed location, a forward operator integrates the model hydrometeor mixing ratios of cloud liquid water, cloud ice, rain, graupel and snow. For further details, including the CWP assimilation technique, the forward operator and the CWP observation error, the reader is referred to [72,73].
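The selection step described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the Warn-on-Forecast code: the array layout (member, level, y, x), the use of the sample standard deviation, the edge handling of the 5 × 5 neighborhood and all function names are hypothetical.

```python
import numpy as np


def neighborhood_mean(field, half_width=2):
    """Mean over a (2*half_width+1)^2 window; edges use the available points."""
    ny, nx = field.shape
    out = np.empty((ny, nx), dtype=float)
    for j in range(ny):
        for i in range(nx):
            j0, j1 = max(j - half_width, 0), min(j + half_width + 1, ny)
            i0, i1 = max(i - half_width, 0), min(i + half_width + 1, nx)
            out[j, i] = field[j0:j1, i0:i1].mean()
    return out


def adaptive_thinning_mask(q_ens, threshold):
    """Return a boolean (y, x) mask that is True where CWP observations are kept.

    q_ens     : mixing ratio in g kg^-1, shape (member, level, y, x)
    threshold : ESD threshold in g kg^-1 (e.g., 0.05, 0.2 or 0.4)
    """
    col_max = q_ens.max(axis=1)          # column maximum for each member
    esd = col_max.std(axis=0, ddof=1)    # ensemble standard deviation
    return neighborhood_mean(esd) > threshold


# Synthetic 36-member ensemble on a 10 x 10 grid with 3 levels
rng = np.random.default_rng(0)
q_ens = np.zeros((36, 3, 10, 10))
q_ens[:, :, :3, :3] = rng.uniform(0.0, 2.0, size=(36, 3, 3, 3))
mask = adaptive_thinning_mask(q_ens, threshold=0.2)
print(mask.sum(), "of", mask.size, "grid points retain CWP observations")
```

In this synthetic case, only the corner where the members disagree about the cloud field exceeds the threshold, so observations elsewhere would be discarded. Raising the threshold shrinks the retained area.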
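A forward operator of the kind described above can be sketched as a vertical integral of the total hydrometeor mixing ratio. The version below is a minimal example that assumes height-based layers with known air density and thickness; the operator documented in [72,73] may differ in its vertical coordinate and interpolation, and the function and argument names are hypothetical.

```python
import numpy as np


def model_cwp(mixing_ratios, rho_air, dz):
    """Column-integrated water path (kg m^-2) at one horizontal location.

    mixing_ratios : sequence of arrays (kg kg^-1), each shaped (level,),
                    e.g., cloud water, cloud ice, rain, graupel and snow
    rho_air       : air density per level (kg m^-3)
    dz            : layer thickness per level (m)
    """
    q_total = np.sum(mixing_ratios, axis=0)        # sum over hydrometeor species
    return float(np.sum(rho_air * q_total * dz))   # integrate over the column


# Two-layer hypothetical column: liquid below, ice above
qc = np.array([1.0e-3, 0.0])
qi = np.array([0.0, 1.0e-3])
zeros = np.zeros(2)
cwp = model_cwp([qc, qi, zeros, zeros, zeros],
                rho_air=np.array([1.0, 0.5]),
                dz=np.array([500.0, 500.0]))
print(f"model CWP = {cwp:.2f} kg m^-2")
```

The difference between the observed CWP and this model equivalent is the innovation used in the EnKF update.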

19 July 2019 Event
The 19 July 2019 event included 7 tornado, 27 severe hail and 72 high wind reports over the study domain (Figure 1b, Table 3). In this event, a large convective complex moved rapidly from Minnesota to Wisconsin. A few reports of tornadoes and severe winds had already occurred along the storm path. Table 3 shows the Storm Prediction Center reports of tornadoes, high winds and severe hail within the simulation domain of each event. To determine the optimal configuration of the adaptive CWP thinning method and the ESD threshold values for QCLOUD and QICE, we first performed six sensitivity experiments (QCLD-0.05, QCLD-0.2, QCLD-0.4, QICE-0.05, QICE-0.2, QICE-0.4) with data from the 19 July 2019 weather event. Figure 1 shows the observed MRMS composite reflectivity and the spatial distribution of assimilated CWP observations at 2100 UTC for the different data thinning methods over the study domain. Our results indicate that the percentage of assimilated CWP observations is proportional to the ESD of model-simulated QCLOUD and QICE (Table 4). For example, only 56% of CWP observations were assimilated in the QCLD-0.4 experiment, whereas that percentage decreased to 22% in the QCLD-0.05 experiment. The assimilated percentage was calculated as the ratio of the number of assimilated thinned CWP observations to the total number of available CWP observations. On average, more than 60% of the CWP data are not assimilated in the Warn-on-Forecast System.
Table 3. Total number of tornado, hail (diameter more than 1.0 inch; severe hail) and high-wind reports between 1500 UTC and 0500 UTC the following day within the simulated domain.
Figure 2a shows the number of assimilated CWP observations during each 15 min assimilation cycle from 1800 UTC to 0300 UTC the following day. The number of observations gradually decreases from 1800 to 1900 UTC, and from 1900 UTC to 2300 UTC, the number of assimilated observations is consistent across all the experiments.
However, the number of observations increased again after 2300 UTC because the cloud information varies between assimilation cycles. No CWP observations were assimilated after 0100 UTC because only daytime CWP data were assimilated. The average number of assimilated CWP observations with the adaptive data thinning method was less than half that of the CWP-ALL experiment. We calculated observation increment (innovation, O-B) statistics for all six adaptive data thinning experiments (Figure 2b,c), as well as the CWP-ALL experiment, for each 15 min assimilation cycle from 1800 UTC on 19 July 2019 until 0300 UTC the following day. The innovation is defined as the difference between the observation and the ensemble mean, either prior to the analysis (forecast, O-B) or posterior to it (analysis, O-A). The innovation is used to correct the model fields, which produces a more accurate analysis for the next forecast cycle. The prior bias (O-B) for CWP observations in all experiments ranged from 0 to 1 kg m−2, and this value decreased by more than 50% after assimilation (O-A). The positive prior bias indicates that the model background has a systematic bias and likely underestimates CWP. We also noted that O-B and O-A evolved similarly in all six adaptive data thinning experiments and in the CWP-ALL experiment (Figure 2). Similar patterns of positive bias were also observed for the other two weather events (Table 5). Average prior biases (O-B) in the CWP-TH experiment were 0.47 and 0.31 kg m−2 for the 7 May 2020 and 21 June 2020 weather events, respectively. These biases are positive, as in the CWP-ALL experiment, in which the average prior biases were 0.37 and 0.35 kg m−2 for these events (Table 5). In terms of posterior bias (O-A), the error was reduced for all three events.
Thus, despite using less than 60% of the observations used by CWP-ALL, the CWP-TH approach has a similar effect on the assimilation system.
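The prior and posterior bias statistics discussed above can be computed as in the following sketch. The array shapes, the function name and the sample numbers are hypothetical and serve only to show the sign convention: a positive O-B means that the ensemble-mean background is lower than the observation.

```python
import numpy as np


def innovation_bias(obs, hx_prior, hx_post):
    """Mean O-B and O-A for one assimilation cycle.

    obs      : observed CWP, shape (n_obs,), in kg m^-2
    hx_prior : simulated CWP from the prior members, shape (n_members, n_obs)
    hx_post  : simulated CWP from the posterior members, same shape
    """
    o_minus_b = obs - hx_prior.mean(axis=0)  # observation minus prior ensemble mean
    o_minus_a = obs - hx_post.mean(axis=0)   # observation minus posterior ensemble mean
    return float(o_minus_b.mean()), float(o_minus_a.mean())


# Made-up values purely for illustration (2 observations, 2 members)
obs = np.array([1.0, 2.0])
prior = np.array([[0.4, 1.4], [0.6, 1.6]])
post = np.array([[0.7, 1.7], [0.9, 1.9]])
bias_b, bias_a = innovation_bias(obs, prior, post)
print(f"O-B = {bias_b:.2f} kg m^-2, O-A = {bias_a:.2f} kg m^-2")
```

Averaging these cycle values over all 15 min cycles of an event would give event-mean biases of the kind reported in Table 5.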
Simulated radar reflectivity was compared with the observed MRMS composite reflectivity to assess the impact of CWP assimilation on reflectivity forecasts. Forecasts initialized at 2100 UTC showed that all seven experiments generated accurate one-hour forecasts of the eastward-moving convection in Minnesota (Figure 3). All the thinned CWP experiments performed slightly better than CWP-ALL in forecasting storm intensity and storm location. In addition, we assessed the overall performance of assimilating CWP using an object-based verification method [75,76]. To quantify the forecast performance for composite reflectivity and updraft helicity, categorical performance diagrams [77,78] were computed and aggregated over 3 h forecasts initialized from 1900 UTC to 0300 UTC. Performance diagrams were first used to summarize the impact of assimilated CWP on reflectivity and 2-5 km UH forecasts (Figure 4) at 1 h and 2 h forecast times. For the 19 July 2019 event, the statistics were computed using all forecasts initiated from 1900 to 0300 UTC for all eight experiments. Table 4 lists the total number of reflectivity and rotation objects accumulated in the adaptive CWP assimilation experiments. With respect to reflectivity and 2-5 km UH, overall 1 h to 2 h forecast skill was similar across CWP-ALL and all thinned CWP experiments (Figure 4). On average, QCLD-0.4 (CWP-TH) achieved forecast skill similar to that of CWP-ALL with respect to the probability of detection and the success ratio. CWP-TH performed even better than CWP-ALL for 2 h forecasts; the differences were smaller for 2-5 km UH verification, but always in favor of CWP-TH.
Results show that QCLD-0.4 is the optimal configuration (Table 4). We therefore validated this method with QCLD-0.4 (hereafter CWP-TH) by studying two more weather events, 7 May and 21 June 2020.
Results from the optimal CWP-TH configuration are discussed in the following section together with those from assimilating all CWP observations (CWP-ALL) for the two weather events.

7 May and 21 June 2020 Events
An isolated supercell thunderstorm produced a long swath of hail across northern Texas on 7 May 2020. This storm was one of the most interesting weather events of 2020 for calculating ice particle size using satellite images [79]. The event was also well described in several reports, including "Hail-producing supercell thunderstorm in Texas" [80], "Where will convective initiation occur?" [81] and "Supercell splits, marches along Texas-Oklahoma border" [82]. Multiple thunderstorms occurred over northwestern Oklahoma during the evening and night, and numerous hail reports were made near southwestern Oklahoma. A total of 79 hail and 76 wind events were reported over the study domain. There were no reports of tornadoes during this event (Table 3). On 21 June 2020, a higher-end severe storm affected portions of central Kansas and south-central. Numerous severe storms occurred mainly across parts of southern Kansas, Nebraska and Oklahoma from early evening into the night. Damaging winds and isolated large hail occurred across the study domain. The number of assimilated observations when using CWP-TH was similarly reduced for the two weather events in 2020 (Table 6). Specifically, only 39% and 31% of the observations were assimilated with CWP-TH relative to CWP-ALL for the 7 May 2020 and 21 June 2020 weather events, respectively. Figure 5 shows the observed MRMS composite reflectivity and the spatial distribution of assimilated CWP observations with and without the adaptive data thinning method over the study domain for the two events.
To validate the performance of CWP-TH and CWP-ALL, two more weather events, 7 May and 21 June 2020, are considered. Hourly reflectivity forecasts initialized at 0000 UTC on 8 May 2020 were generated for the CWP-TH and CWP-ALL experiments. Both experiments generated similar forecasts, including the eastward-moving convection in northern Texas (Figure 6), and produced few false alarms over western Oklahoma. No significant differences were observed between the two experiments for forecasts initialized at 2300 UTC on 21 June 2020 (Figure 7). Both the CWP-TH and CWP-ALL experiments correctly forecast the evolution of convective cells over Kansas. Qualitatively, both approaches improved forecasts of convection. An object-based verification method was used to further assess the overall performance of both the CWP-TH and CWP-ALL assimilation experiments.
The total number of reflectivity and rotation objects accumulated in the CWP-TH and CWP-ALL experiments and for each individual weather event are calculated. The CWP-TH experiment generated a higher number of reflectivity objects as compared to CWP-ALL for both events (Table 6). In case of rotation objects, CWP-ALL generated a higher object on 7 May 2020 but less on the 21 May 2020 event. This is presumably because CWP-ALL experiments generated higher biases of reflectivity and, consequently, a higher number of reflectivity objects, and hence more convection. On average, CWP-TH generated a higher number of reflectivity objects across two weather events.
On 7 May 2020, both the CWP-TH and CWP-ALL experiments performed well for the 1 h and 2 h reflectivity forecasts (Figure 8a,b). Overall forecast skill was relatively high and remained stable throughout the forecast period, with POD values of >0.7. CSI values were between 0.4 and 0.6, and the success ratio was between 0.5 and 0.7, at the 1 h and 2 h forecast times. On 21 June 2020, the POD of the CWP-TH experiment was higher (>0.7) for the 1 h reflectivity forecast and subsequently decreased to 0.5 (Figure 8c,d). This may be because CWP-TH produces higher biases for reflectivity forecasts. Overall, both CWP-TH and CWP-ALL performed poorly for this event at the 1 h and 2 h forecast times, while the differences between the experiments were generally small at the 2 h forecast time. For the 1 h 2-5 km UH forecasts on 7 May and 21 June 2020, CWP-TH performed better (Figure 9), while the differences between the CWP-TH and CWP-ALL experiments were generally small for the 2 h forecasts. In addition, forecast performance at 2 h was poor for both experiments.
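The POD, success ratio, CSI and bias values discussed above all derive from counts of matched and unmatched forecast and observed objects. The following is a minimal sketch, assuming the standard contingency-table definitions of these scores; the object-matching criteria used in this study are not reproduced here, the class and function names are hypothetical, and the counts are illustrative rather than taken from the experiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectCounts:
    """Matched-object counts (hypothetical container, not from the study)."""
    hits: int          # forecast objects matched to an observed object
    misses: int        # observed objects with no matching forecast object
    false_alarms: int  # forecast objects with no matching observed object


def contingency_scores(c: ObjectCounts) -> dict:
    """Return POD, success ratio, CSI and frequency bias for one set of counts."""
    observed = c.hits + c.misses        # all observed objects
    forecast = c.hits + c.false_alarms  # all forecast objects
    if observed == 0 or forecast == 0:
        raise ValueError("scores are undefined without observed and forecast objects")
    return {
        "pod": c.hits / observed,                  # probability of detection
        "sr": c.hits / forecast,                   # success ratio (1 - FAR)
        "csi": c.hits / (observed + c.false_alarms),  # critical success index
        "bias": forecast / observed,               # frequency bias
    }


# Illustrative counts only; they are not values from the study.
example = contingency_scores(ObjectCounts(hits=75, misses=25, false_alarms=50))
```

With these illustrative counts the POD is 0.75, the success ratio is 0.60 and the CSI is 0.50. The frequency bias of 1.25 indicates more forecast objects than observed objects, which is the sense in which a higher reflectivity bias goes together with a larger number of forecast objects.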

Summary
The main contribution of this study is to examine the importance of assimilating adaptively thinned, retrieved CWP data in the Warn-on-Forecast System. Multiple sensitivity experiments were performed for three case studies from 2019 and 2020 to assess how altering the CWP observations assimilated by the Warn-on-Forecast System affected forecast quality. On average, the thinning approach (CWP-TH) removed more than 60% of the CWP data from the assimilation, so that less than 40% of the CWP observations were assimilated. The CWP-TH approach produced forecasts of similar quality to the CWP-ALL experiments, and in a few cases the 1 h and 2 h reflectivity and UH forecasts improved.
Our study is limited to three convective high-impact severe weather events, one from 2019 and two from 2020. Additional cases are required to understand how general the improvements from assimilating adaptively thinned CWP are and what potential disadvantages the approach may have compared to assimilating the full CWP data set in various environments. In addition, it is important to identify the seasonal variation of the impact of adaptive thinning methods, including in multiple assimilation systems. The impacts of thinned CWP observations can vary depending on ensemble size, model resolution, cloud interaction, observation error and weather conditions. To make better use of high-quality, high-impact observations in high-resolution limited-area model forecasts, additional studies on the impact of adaptive thinning methods are needed using various assimilation and forecast systems.
The satellite-retrieved CWP is usually associated with relatively large uncertainty because this quantity is derived from cloud optical depth (COD) and cloud-top effective radius (CER) based on the adiabatic assumption. Many error sources stem from assumptions inherent in physical retrievals, and these reduce the accuracy of the CWP. The larger share of the retrieval error comes from uncertainties in the model input variables, which include temperature, pressure and hydrometeors. It should be noted that the CWP observation error can vary depending on the method used to retrieve CWP, the time of day and the spectral channels used. The daytime CWP observation errors used in this study are functions of the CWP value. Characterizing the nighttime CWP observation error and assimilating nighttime CWP require a separate investigation and are left for future work.
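To illustrate why uncertainties in COD and CER carry over directly into the retrieved CWP, the sketch below uses the relation commonly quoted for adiabatically stratified liquid clouds, in which the liquid water path equals 5/9 times the water density, the optical depth and the cloud-top effective radius. This is an assumption made for illustration only: the operational retrieval used for the GOES-16 product may differ (for example, for ice clouds), the function names are hypothetical, and the error propagation is a first-order estimate that treats the COD and CER errors as independent.

```python
import math

RHO_WATER = 1000.0  # density of liquid water, kg m^-3


def adiabatic_lwp(cod: float, cer_um: float) -> float:
    """Liquid water path (g m^-2) from optical depth and effective radius (um).

    Uses LWP = (5/9) * rho_w * COD * CER, the form commonly quoted for
    adiabatically stratified liquid clouds (assumed here, not taken from the study).
    """
    if cod < 0 or cer_um < 0:
        raise ValueError("COD and CER must be non-negative")
    cer_m = cer_um * 1.0e-6                      # micrometres to metres
    lwp_kg_m2 = (5.0 / 9.0) * RHO_WATER * cod * cer_m
    return lwp_kg_m2 * 1000.0                    # kg m^-2 to g m^-2


def relative_lwp_error(rel_cod_error: float, rel_cer_error: float) -> float:
    """First-order relative LWP error for independent COD and CER errors.

    Because LWP is proportional to COD * CER, the relative errors add in quadrature.
    """
    return math.hypot(rel_cod_error, rel_cer_error)


# Illustrative inputs only; they are not values from the study.
lwp = adiabatic_lwp(cod=10.0, cer_um=10.0)
rel_err = relative_lwp_error(0.3, 0.4)
```

Under these assumptions, an optical depth of 10 and an effective radius of 10 um give a liquid water path of about 56 g m^-2, and relative errors of 30% in COD and 40% in CER combine to a 50% relative error in the retrieved value.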
A previous study [62] highlighted that applying adaptive thinning methods is essential for optimizing GOES-16 all-sky radiance data and improving severe weather forecast skill. Satellite data thinning is essential because assimilating too many observations over nearly the same location can lead to a sub-optimal analysis state. Future investigations will combine all-sky radiance and CWP with appropriate data thinning and bias correction to use all the available and more accurate information. This combined approach may further improve analysis and forecast skill and is left for future work. Further studies using machine-learning methods (artificial intelligence, deep learning) [83,84] are needed to accurately derive bias-corrected cloud properties and assimilate them in the assimilation system.