This section first presents the datasets used to train the hail detection model and then the model developed to compute the probability of hail occurrence.

#### 3.1. The MWCC Training Dataset

Two independent datasets were used to quantify hail signals and compute the statistics of the hail detection method. The first, in which MHS data are used to directly estimate the impact of hail on TBs, is extracted from the hailstorms listed in Table 1. Each event is binned into the two diameter classes of Table 2, and the average TB reduction is then calculated.

The second dataset is derived from 10 years (2000–2009) of AMSU-B measurements co-located with hail reports over the US. This matchup dataset, already used to develop the prototype algorithm in [29], was built from satellite overpasses within 5 min and 25 km of the hailstorm locations and restricted to AMSU-B local zenith angles below 25° to reduce the impact of the FOV size changes caused by the scanning geometry.
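The co-location criteria above (5 min, 25 km, local zenith angle below 25°) can be expressed as a simple predicate. The following is a minimal sketch, assuming hypothetical `Overpass` and `HailReport` records and a great-circle (haversine) distance on a spherical Earth; it is not the matchup code used in [29].

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


@dataclass
class Overpass:  # hypothetical record for one satellite FOV
    time: datetime
    lat: float
    lon: float
    zenith_deg: float  # local zenith angle of the FOV


@dataclass
class HailReport:  # hypothetical record for one ground hail report
    time: datetime
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_matchup(ov, rep, max_dt=timedelta(minutes=5), max_km=25.0, max_zenith=25.0):
    """True if the overpass satisfies the time, distance, and zenith-angle limits."""
    return (
        abs(ov.time - rep.time) <= max_dt
        and haversine_km(ov.lat, ov.lon, rep.lat, rep.lon) <= max_km
        and ov.zenith_deg < max_zenith
    )
```

Pairs can then be collected with `[(o, r) for r in reports for o in overpasses if is_matchup(o, r)]`; this brute-force pairing is only for clarity, and a spatial or temporal index would scale better on a 10-year record.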

As discussed in Section 2.1, a preliminary cloud classification by the MWCC is needed to map the convective clouds in which a hail signature can be recognized. Because the focus is on severe storms, only deep convection classified as CO3 has been isolated and assumed to be associated with hail. Within this domain, only limited portions of the cloud can be related to hail; where hail is captured, its signal stands out strongly against the no-hail background, showing a very strong TB depression.

To better discriminate the signals associated with hail, the distribution of the 184 GHz TB variation, $T{B}_{184}^{var}$ (see Equation (1)), within the CO3 category has been analyzed. Three sensitivity classes of $T{B}_{184}^{var}$ have been defined on the simple premise that $T{B}_{184}^{var}$ increases with the amount of growing ice, which is in turn related to the vertical development of convection. The magnitude of $T{B}_{184}^{var}$ is therefore related to the convection depth: high values of $T{B}_{184}^{var}$ indicate severe convection (strong scattering) and hence a high probability of hail.

Table 3 shows the average TBs for each $T{B}_{184}^{var}$ bin.
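The binning behind Table 3 can be sketched as follows: each sample is assigned to a $T{B}_{184}^{var}$ class and a diameter class, and the TBs are averaged per group. This is a minimal sketch under stated assumptions: the sample fields (`tb184_var` in percent, `d_cm`, and per-channel keys such as `tb150`) are hypothetical, $T{B}_{184}^{var}$ is assumed to be precomputed (Equation (1) is not reimplemented), the class edges follow the 15%, 25%, and 35% thresholds used for Figure 5, and values lying exactly on an edge are assigned to the lower class by assumption.

```python
from collections import defaultdict
from statistics import mean


def var_bin(tb184_var):
    """Map a TB184var value (percent) to a sensitivity class, or None if below 15%."""
    if tb184_var > 35.0:
        return ">35%"
    if tb184_var > 25.0:
        return "25-35%"
    if tb184_var > 15.0:
        return "15-25%"
    return None  # not part of the classes considered in this sketch


def diameter_class(d_cm):
    """Two diameter classes split at 10 cm (d = 10 cm falls in the lower class here)."""
    return "d>10cm" if d_cm > 10.0 else "d<10cm"


def average_tb_by_class(samples, channel):
    """Average one channel's TB per (diameter class, TB184var class) group."""
    groups = defaultdict(list)
    for s in samples:
        b = var_bin(s["tb184_var"])
        if b is None:
            continue
        groups[(diameter_class(s["d_cm"]), b)].append(s[channel])
    return {key: mean(values) for key, values in groups.items()}
```

Keeping the class edges in small functions makes it easy to test alternative thresholds without touching the averaging step.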

As demonstrated by Cecil [26], TBs tend to decrease with increasing hail diameter, and the impact of hail size is modulated by the observing frequency: high frequencies sense scattering from a wide range of hail diameters, whereas lower frequencies miss most of the scattering signal from smaller ice particles, so their use is usually limited to very large hailstones.

Table 3 shows that the average TBs decrease with increasing hail diameter and increasing $T{B}_{184}^{var}$. In general, high $T{B}_{184}^{var}$ values correspond to low TBs, which reach their minimum for d > 10 cm.

A detailed analysis shows that when $T{B}_{184}^{var}$ < 25% the TBs show no noticeable reduction between d < 10 cm and d > 10 cm. In fact, a comparison of the first two rows of Table 3 shows that the average TBs are apparently higher for d > 10 cm than for smaller d. This result appears physically contradictory, but it can be explained by considering that, in this class, TB variability depends on several factors linked to the intrinsic dynamics of each diameter class. Hailstorms in the class d < 10 cm are generally characterized by a wide variety of ice aggregates coexisting at the top of convective towers. When segmented by $T{B}_{184}^{var}$, this broad particle size distribution tends to fill the first two bins, concentrating in the first bin, to which the median diameter (4.5 cm) belongs. In contrast, hailstorms with d > 10 cm often have hail cores clustered in a restricted area of the convective cloud. Large hail is then distributed in the bins where $T{B}_{184}^{var}$ > 25%, leaving very small frozen hydrometeors (possibly associated with supercooled water) in the first bin, where $T{B}_{184}^{var}$ < 25%. As a result, for $T{B}_{184}^{var}$ < 25% ice scattering mostly affects the d < 10 cm category, whereas for bins with $T{B}_{184}^{var}$ > 25% the average TBs diverge strongly, reaching their minimum values for d > 10 cm.

Figure 5 segments the CO3 domain of two hailstorms according to their $T{B}_{184}^{var}$ values. The first sequence of images (top) refers to the Colorado hailstorm (d < 3.5 cm) and shows the unfiltered CO3 pattern (Figure 5a) and the patterns filtered by 15% < $T{B}_{184}^{var}$ < 25%, 25% < $T{B}_{184}^{var}$ < 35%, and $T{B}_{184}^{var}$ > 35% (Figure 5b–d, respectively). Only the first two $T{B}_{184}^{var}$ classes capture hail signatures, outlining the distribution of the main hail cores associated with different hail sizes. Consistent with the relatively small hail diameters, no hail signals were found for $T{B}_{184}^{var}$ > 35%.

The second series of images displays the event over South Dakota (Vivian), where the hail size measured at the ground was around 11 cm. In this case, segmenting the whole CO3 domain (Figure 5e) with the $T{B}_{184}^{var}$ thresholds shows that the smaller ice particles (low scattering) fall in the class 15% < $T{B}_{184}^{var}$ < 25% (Figure 5f), whereas the signal associated with hail is sampled by the classes with $T{B}_{184}^{var}$ > 25% (Figure 5g,h). In particular, the class $T{B}_{184}^{var}$ > 35% (Figure 5h) captures the largest TB reductions, which are associated with extinction by very large hail.

Analysis of the individual channels shows that the MHS frequencies are well suited to detecting variations in bulk cloud ice and estimating hydrometeor properties during deep convection. Nevertheless, not all channels are equally significant: the 90, 150, and 190 GHz channels best describe the ice scattering regime of convective cores containing hail, which makes them primary candidates as proxies for hailstorm detection.

A closer examination of these results leads to two further considerations. First, the weak perturbations induced by relatively low scattering, as described by $T{B}_{184}^{var}$ < 25%, are very difficult to associate directly with hail signatures; hence, CO3 cloud features in this class cannot be directly related to hail-bearing clouds. Second, the moderate to high TB reductions observed for 25% < $T{B}_{184}^{var}$ < 35% can be associated with hail, but only the 150 and 190 GHz channels show high sensitivity to changes in hail size. This is probably because these wavelengths are sensitive to small variations in the ice size distribution, whereas the dynamic range at 90 GHz typically captures scattering by large to very large hail, as observed in the $T{B}_{184}^{var}$ > 35% class.

The threshold $T{B}_{184}^{var}$ > 25% is therefore applied to the whole dataset to isolate the TB values used to calibrate the hail detection model.

Next, we explore combinations of the most sensitive MHS channels to identify the frequency differences that best fit hail signatures. The TB differences (BTDs) between 90, 150, and 190 GHz are displayed in Figure 6. Both combinations that include the 90 GHz channel appear less sensitive and are therefore not recommended for inclusion in the hail detection model. For d < 10 cm the (BT90–BT150) distribution is flat, and it takes a quadratic shape when d > 10 cm. The (BT90–BT190) difference shows a decreasing exponential distribution when d < 10 cm, while the distribution becomes quadratic when d > 10 cm. High sensitivity is shown by the (BT150–BT190) difference, which exhibits the same distribution in both hail diameter classes and becomes zero for large-hail signatures detected by $T{B}_{184}^{var}$ > 35%. The skill of the (BT150–BT190) difference in highlighting hail scattering fits the conceptual model, i.e., an exponential distribution as a function of hail diameter. The hail detection model is therefore trained on this BTD.
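The channel differences compared in Figure 6, and the selection of the (BT150–BT190) difference for training after the $T{B}_{184}^{var}$ > 25% screening, can be sketched as below. The sample keys (`tb90`, `tb150`, `tb190`, `tb184_var`) are hypothetical names chosen for illustration.

```python
def btds(tb90, tb150, tb190):
    """The three brightness-temperature differences (K) examined in Figure 6."""
    return {
        "BT90-BT150": tb90 - tb150,
        "BT90-BT190": tb90 - tb190,
        "BT150-BT190": tb150 - tb190,
    }


def training_btd(samples, threshold=25.0):
    """BT150-BT190 for the samples whose TB184var (percent) exceeds the threshold."""
    return [s["tb150"] - s["tb190"] for s in samples if s["tb184_var"] > threshold]
```

Computing all three differences side by side mirrors the comparison in the text, while the training step keeps only the pair retained for the model.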

#### 3.2. AMSU-B Matchup Data

The training data for the hail model are drawn from a large database of co-located ground hail observations and AMSU-B overpasses. The matchup dataset covers March–September of 2000–2009 [29]. Each satellite overpass within 5 min and 25 km of the hailstorm location and with an AMSU-B local zenith angle < 25° is retained; the zenith-angle restriction avoids low-resolution FOVs and reduces the impact of surface type and atmospheric moisture content [42]. The $T{B}_{184}^{var}$ thresholds were then applied to the training dataset to isolate the TBs correlated with hail signatures. AMSU-B average TBs co-located with surface hail observations are displayed in Table 4.

Only data with $T{B}_{184}^{var}$ > 25%, where hail signatures are more likely, are considered. Combining Table 3 and Table 4, and taking as TB limits the averages of the two tables in the corresponding classes, yields the training dataset of the hail detection model shown in Table 5.
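The combination step above, in which the TB limits of Table 5 are the class-wise averages of Table 3 and Table 4, can be sketched as follows. The table layout, a dict keyed by hypothetical `(diameter_class, tb184var_bin, channel)` tuples, is an assumption made for illustration; only the classes with $T{B}_{184}^{var}$ > 25% are kept.

```python
HAIL_BINS = {"25-35%", ">35%"}  # TB184var classes retained for training (> 25%)


def combine_tables(table3, table4, keep_bins=HAIL_BINS):
    """Average the per-class mean TBs of two tables that share the same keys.

    Keys are (diameter_class, tb184var_bin, channel) tuples (hypothetical layout).
    Only classes present in both tables and listed in keep_bins are returned.
    """
    common = table3.keys() & table4.keys()
    return {
        key: (table3[key] + table4[key]) / 2.0
        for key in sorted(common)
        if key[1] in keep_bins
    }
```

Restricting the output to keys present in both tables avoids silently averaging a class with a missing value.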

Of the training samples, 1834 (63.6%) fall in the class $T{B}_{184}^{var}$ < 35% and 1048 (36.4%) in the class $T{B}_{184}^{var}$ > 35%. Most hail sizes are d < 10 cm, with a mode around 5 cm; only 324 occurrences (see Table 3) have d > 10 cm, with a maximum d = 14 cm. The model is thus constrained by a statistical population of hail diameters ranging from a few centimeters or less up to 14 cm, within the average TB range given in Table 6. Finally, the absolute minimum TB across the two datasets defines the maximum sensitivity to hail scattering of the 150 and 190 GHz channels used in the detection algorithm.
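As a quick arithmetic check of the class shares quoted above, the two class counts sum to the 2882 samples used in Section 3.3, and their percentages follow directly:

```python
counts = {"TB184var<35%": 1834, "TB184var>35%": 1048}
total = sum(counts.values())  # 2882 samples
shares = {k: round(100 * n / total, 1) for k, n in counts.items()}
print(total, shares)  # prints: 2882 {'TB184var<35%': 63.6, 'TB184var>35%': 36.4}
```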

#### 3.3. The Hail Detection Model

The hail detection model is based on a probability model of growth [43] that exploits the TBs at 150 and 190 GHz. The general concept relies on the inverse proportionality between the upwelling radiation and the hail cross-section. As shown above, the 150 and 190 GHz frequencies are the most sensitive to the presence of hail, since their radiation field changes as a function of hail diameter. Furthermore, although their behavior is quite similar, the 150 GHz signal tends to decrease more steeply than the 190 GHz signal. We therefore adopt a modified sigmoidal function as the hail detection model:

where x and y are the TBs at 150 and 190 GHz, respectively. The difference between these two channels has been shown to be very sensitive to the growth of ice aggregates, particularly for hail sizes on the order of centimeters. Equation (2) is bounded between 0 for clear-sky conditions and 0.99 when hail is observed.

The probability model of Equation (2) was trained on the entire dataset described in Table 5, bearing in mind that signatures of very large hail, typically d > 8–10 cm, are found when $T{B}_{184}^{var}$ > 35%.

Equation (2) is complemented by a variable inspired by the ecological concept of carrying capacity [44], i.e., the maximum population size that an environment can sustain indefinitely or, equivalently, the maximum load of the environment that governs population equilibrium. The carrying capacity thus acts as a regulator of population dynamics, typically approximated by a probability model, that controls the overshooting of survival conditions, which in the long term leads to the collapse of the species. In hail detection, the population (hail) affects the upwelling radiation (the environment) by increasing scattering (environmental sustainability): as hail grows, scattering increases and the radiation reaching the satellite decreases.

This process is well described by Equation (2), which approximates the expected probability distribution as a function of decreasing TBs. A dynamic carrying capacity was introduced to govern the dynamics of the probability model, stopping the computation when the sensitivity reaches its maximum (overshooting) and no hail signatures can be recognized (saturation condition). The carrying capacity takes values in the range [0,1] and is described by the dimensionless variable K,

where α = 104 K and x is the TB at 150 GHz, the channel chosen because it reaches saturation faster. For K ≪ 1 the population load is minimal, whereas for K = 1 the model reaches overshooting and the computation stops. Figure 7 shows realistic distributions of Equations (2) and (3) computed with the data in Table 6. Both distributions converge to 1 at the minimum TB but diverge strongly at the maximum TB.
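Equation (3) is not reproduced in this section, so the sketch below only illustrates the stopping logic. Purely as a hypothetical stand-in, it assumes K = α/x with α = 104 K, a form picked here only because it is consistent with the behavior described in the text (K close to 1 near the 103.70 K minimum and K of about 0.4 near 260–270 K); it should not be read as the authors' formula. The loop walks through decreasing TB150 values and stops once K reaches 1.

```python
ALPHA_K = 104.0  # alpha from the text, in kelvin


def carrying_capacity_assumed(tb150, alpha=ALPHA_K):
    """HYPOTHETICAL stand-in for Equation (3): K = alpha / TB150, clipped to 1."""
    if tb150 <= 0:
        raise ValueError("TB150 must be a positive temperature in kelvin")
    return min(alpha / tb150, 1.0)


def run_until_saturation(tb150_values, k_func=carrying_capacity_assumed):
    """Evaluate K along decreasing TB150 values; stop at overshooting (K = 1)."""
    trace = []
    for tb in tb150_values:
        k = k_func(tb)
        trace.append((tb, k))
        if k >= 1.0:  # saturation: no further hail signal can be recognized
            break
    return trace
```

Passing `k_func` as a parameter keeps the stopping rule independent of the assumed form of K, so the published Equation (3) could be substituted without changing the loop.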

By regressing Equations (2) and (3), we then derive a compact general model that represents the hail size distribution as a function of TB depression. The result is a new function, H(K) (Equation (4)), that maps the probability of hail detection as a function of TB150 alone.

Figure 8a shows the behavior of the hail detection model applied to the whole dataset (2882 samples). The probability distribution starts at a carrying capacity of about 0.40, which roughly corresponds to TB150 > 270 K and is far from the hail signatures listed in Table 6. To obtain values that fit the hail signal, Equation (4) is therefore adjusted using the TB150 values in Table 6 as input (Figure 8b). The hail detector starts with probability values > 0.36 (TB150 = 181.30 K), generally associated with graupel or small hail aggregates up to large hail (2 < d < 10 cm); it then reaches values around 0.53 at TB150 = 152.51 K, which marks the transition from large hail (d < 10 cm) to super hail (d > 10 cm); finally, the model stops (saturation condition) when TB150 approaches the critical value of 103.70 K, i.e., the absolute minimum found in the training dataset.
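The TB150 breakpoints quoted above can be used to label the regime in which a given TB150 falls. The sketch below is illustrative only: the model itself returns a continuous probability H(K), and this discretization, the function name `hail_regime`, and the regime labels are assumptions made here; only the three TB150 values come from the text.

```python
# TB150 breakpoints (K) quoted in the text (Table 6)
TB_ONSET = 181.30       # H > 0.36: graupel/small aggregates up to large hail
TB_TRANSITION = 152.51  # H ~ 0.53: transition from large hail to super hail
TB_SATURATION = 103.70  # absolute minimum of the training dataset: computation stops


def hail_regime(tb150):
    """Label the regime of a TB150 value (K) using the breakpoints above."""
    if tb150 <= TB_SATURATION:
        return "saturation"
    if tb150 <= TB_TRANSITION:
        return "beyond large/super-hail transition"
    if tb150 <= TB_ONSET:
        return "graupel/small to large hail"
    return "no hail signal"
```

Reading the breakpoints from named constants makes it explicit that they are calibration values tied to the training dataset rather than physical constants.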

This illustrates the advantage of using a compact formula (Equation (4)) to approximate the complex, typically nonlinear interaction between hail and the radiation field. Finally, it should be noted that, because the model was developed and tested during the "warm" months when hail typically forms (March–September), spurious signals may arise over frozen soils. Experimental applications show very few false alarms when the hail detector is integrated into the MWCC computational scheme, which is able to screen out surface effects.