Estimation of Maximum Hail Diameters from FY-4A Satellite Data with a Machine Learning Method

: The magnitude of damage caused by hail depends on its size; however, direct observation or indirect estimation of hail size remains a signiﬁcant challenge. One primary reason for estimations by proxy, such as through remote sensing methods, is that empirical relationships or statistical models established in one region may not apply to other areas. This study employs a machine learning method to build a hail size estimation model without assuming relations in advance. It uses FY-4A AGRI data to provide cloud-top information and ERA5 data to add vertical environment information. Before training the model, we conducted a principal component analysis (PCA) to analyze the highly inﬂuential factors on hail sizes. A total of 18 features, composed of four groups, namely brightness temperature (BT), the difference in BT (BTD), thermodynamics, and dynamics groups, were chosen from 29 original features. Dynamic and BTD features show superior performance in identifying large hail. Although the selected features are more closely correlated to hail sizes than unselected ones, the relationships are complicated and nonlinear. As a result, a two-layer regression back propagation neural network (BPNN) model with powerful ﬁtting ability is trained with selected features to predict maximum hail diameter (MHD). The linear ﬁtting R 2 between predicted and observed MHDs is 0.52 on the test set, which signiﬁes that our model performs well compared with other hail size estimation models. We also examine the model concerning all three hail cases in Shanghai, China, between 2019 and 2021. The model attained more satisfactory results than the radar-based maximum estimated hail size (MEHS) method, which overestimates the MHDs, thus further supporting the operational applications of our model.


Introduction
Hailstorms are severe convective weather systems commonly seen in the spring and summer. Hail is precipitation produced by hailstorms in the form of balls or irregular lumps of ice. Hailstones can be dense, minor, or giant [1,2]. By convention, hail has a diameter of 5 mm or more, and smaller particles of similar origin are classed as either ice pellets or snow pellets. Nevertheless, we do not exclude ice/snow pellets when analyzing hail, and for convenient description, we refer to particles smaller than 5 mm as "small hail" in this study. Being one of the most hazardous weather, hail causes massive damage to residential, commercial, and agricultural properties each year worldwide. Moreover, many studies show that hail damage caused by climate change is constantly increasing [3][4][5][6][7][8].
Hailstone size is closely related to the momentum of its falling, which directly affects the magnitude of damage [9][10][11]. Larger hail would cause more severe damage and significant losses. For instance, crops and plants can be severely damaged by hail with a diameter less than 1 cm; cars can be damaged if hailstones are up to 2 cm [12][13][14]; while Hail event records come from the National Meteorological Information Center (NMIC) of the China Meteorological Administration (CMA), which archives historical hail datasets from weather stations in China's surface meteorological observational network. The MHD has been measured or estimated visually by professionally trained meteorological spotters since the 1950s and has become a routine observation at most human-crewed weather stations since 1980 [70]. One record includes latitude, longitude, date, time, and MHD of hail. The environmental conditions in the Tibetan Plateau are very different from other regions owing to its complex terrain. Therefore, we only focus on the region of 15 • -55 • N, 105 • -135 • E, and the period from March to August 2018 in this study. Excluding those unmatched with satellite and reanalysis data in time and space (see step 1 in Section 2.2), 150 hail events are registered. Figure 1 shows their distribution. Figure 1a exhibits the geographical spread of those hail events in the study area. The histogram in Figure 1b demonstrates the number of hail events over different MHDs. The peak of kernel density estimation of the histogram (orange curve) is located at~5 mm, and we can observe that there are only a few giant hailstones (>20 mm). For characteristic statistics (see Section 3.2), hail events are sorted into three size bins: MHD < 5 mm, 5 ≤ MHD < 8 mm, and MHD ≥ 8 mm, hereafter referred to as S (small size, ice/snow pellets), M (medium size), and L (large size), respectively. As there is no unified classification standard, the criterion is adopted to distinguish the environmental and cloud-top parameters among different size bins. In Figure 1, the pie chart and table show the percentage and number of S-, M-, and L-size bins separately.

Data
This study utilizes three types of data: records of hail events with different MHD, ERA5 reanalysis data, and FY-4A satellite images.
Hail event records come from the National Meteorological Information Center (NMIC) of the China Meteorological Administration (CMA), which archives historical hail datasets from weather stations in China's surface meteorological observational network. The MHD has been measured or estimated visually by professionally trained meteorological spotters since the 1950s and has become a routine observation at most human-crewed weather stations since 1980 [70]. One record includes latitude, longitude, date, time, and MHD of hail. The environmental conditions in the Tibetan Plateau are very different from other regions owing to its complex terrain. Therefore, we only focus on the region of 15°-55°N, 105°-135°E, and the period from March to August 2018 in this study. Excluding those unmatched with satellite and reanalysis data in time and space (see step 1 in Section 2.2), 150 hail events are registered. Figure 1 shows their distribution. Figure 1a exhibits the geographical spread of those hail events in the study area. The histogram in Figure 1b demonstrates the number of hail events over different MHDs. The peak of kernel density estimation of the histogram (orange curve) is located at ~5 mm, and we can observe that there are only a few giant hailstones (>20 mm). For characteristic statistics (see Section 3.2), hail events are sorted into three size bins: MHD < 5 mm, 5 ≤ MHD < 8 mm, and MHD ≥ 8 mm, hereafter referred to as S (small size, ice/snow pellets), M (medium size), and L (large size), respectively. As there is no unified classification standard, the criterion is adopted to distinguish the environmental and cloud-top parameters among different size bins. In Figure 1, the pie chart and table show the percentage and number of S-, M-, and L-size bins separately.  ERA5 reanalysis data are employed to calculate the environmental parameters (see Table 1). ERA5 is the fifth generation of the European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric reanalysis designed to replace ERA-Interim [71]. The ERA5 is run by ECMWF's Integrated Forecasting System (IFS cycle 41r2) with new and reprocessed observation datasets at a much higher resolution (31 km) and provides an hourly analysis cycle. The involved physical variables in calculating environmental parameters are listed in Table 2. The environmental parameters are calculated with vertical profiles of these quantities. FY-4A satellite image data are adopted to calculate hailstorms' cloud-top parameters (see Table 3). FY-4A is a new generation of Chinese geostationary meteorological satellite launched on 11 December 2016, and positioned at 99.5 • E (it drifted to 104.7 • E on 25 May 2017) above the equator. The advanced geosynchronous radiation imager (AGRI) of FY-4A has 14 channels, containing 6 visible/near-infrared, 2 mid-infrared, and 6 infrared channels. However, only six infrared channels centered at 6.25, 7.10, 8.50, 10.8, 12.0, and 13.5 µm are accepted the data for consistency between day and night. The spatial resolution of these channels is 4 km at the sub-satellite point. Therefore, the China regional image data are adopted because they have a higher temporal resolution (~5 min) than the full-disk images (~15 min).

Method
Three steps from satellite and ERA5 data establish the machine learning model for MHDs estimation (see Figure 2  1. Sample preparation: This step calculates hailstorms' environmental and cloud-top parameters to serve as sample features. First, we identify storms on the satellite images with an approach introduced by Liu et al. [106]. Their method can effectively Step 2: feature selection; X m×n is X m×n centered and standardized, P m×n is the principal component matrix, and C m×m is the coefficient matrix; c ij is the C m×m element representing the coefficient of jth feature on ith PC. (3) Step 3: model establishment; X m ×n is the sample matrix with selected features and is divided into training (70%, n tr = 105), validation (15%, n v = 23), and test (15%, n ts = 22) sets for model training and evaluation; a two-layer BPNN regression model with 35 hidden and 1 output units is optimized.

1.
Sample preparation: This step calculates hailstorms' environmental and cloud-top parameters to serve as sample features. First, we identify storms on the satellite images with an approach introduced by Liu et al. [106]. Their method can effectively reduce the impact of the anvil clouds. Second, the closest storm to an observed hail event in time and space is marked as a hailstorm. Moreover, the time interval between the storm and hail event should be less than 5 min (satellite time resolution), and the storm should be within 20 km (~5 satellite pixels) from the hail event location. The MHD of the hail event will be assigned to the matched hailstorm. The largest MHD will be accepted if more than one hail event matches the same storm. Third, each pixel or grid's environmental and cloud-top parameters within the hailstorm boundary are calculated with FY-4A satellite or ERA5 reanalysis data. The final parameters of the hailstorm are the average of all grid or pixel values. Figure 3 demonstrates a hailstorm sample at 09:53 (UTC) on June 6th, 2018. After this step, we obtain 150 hailstorm samples, each of which has 29 features. The original sample . .  2. Feature selection: This step employed a PCA technique [107,108] to select appropriate features to train the machine learning model. The input to PCA is the matrix x , representing the mean and variance of i x . After PCA calculation, the output is the principal component matrix:

2.
Feature selection: This step employed a PCA technique [107,108] to select appropriate features to train the machine learning model. The input to PCA is the matrix var(x i ), representing the mean and variance of x i . After PCA calculation, the output is the principal component matrix: where C m×m is the coefficient matrix and c ij , i, j = 1, ..., m is the (i, j) element. The percentage of total variance explained by each principal component (PC) descends from p 1 to p m . The first two PCs explain more than half of the variance, so only p i , i ≤ 2 is used to select features. As shown in Equation (1), a PC is a linear combination of feature vectors, and the coefficient of each feature vector represents its contribution to the PC. The larger magnitude of the coefficient is, the more significant contribution of the feature vector could make. Thus, we can select features that differ among hail samples by checking whether they have a larger magnitude of coefficients or not.  Figure 2). The training set X 18×105 is input to the BPNN, and the associated target outputs are the corresponding MHDs. Levenberg-Marquardt optimization [109] updates the weight and bias values. The active functions of the hidden and output layer are sigmoid and linear, respectively. The mean square error function measures the loss. We try the hidden layer size from 20 to 50 with an interval of 5, and it proves that the size of 35 has the best performance on the validation set X 18×23 . The validation set also makes the training stop early if the network performance fails to improve for 42 epochs. After the BPNN model optimization, the test set X 18×22 is utilized to evaluate the performance of the obtained model on new data. In Section 4.1, the linear fitting contrasts the targeted MHDs and predicted MHDs. Table 4 lists the coefficients of features on the first two PCs. The bolded features are selected (18 in total). Most of them in the c 1j (c 2j ) column are cloud-top (environmental) parameters. Noticeably, this reveals that the first and the second PCs focus on two distinct aspects of hailstorm characteristics. Figure 4 illustrates these results by a biplot [108] whose x-axis (y-axis) represents the coefficient values of the first (second) PC. Each line segment from the origin depicts one feature. The x-(y-) axis projection corresponds to the feature's coefficient on the first (second) PC. In Figure 4, the line segments of selected features extend outside the dashed box, while the unselected features do not.   Moreover, the features with adjacent orientation form the natural groups in Figure 4. Therefore, there are four groups within the selected features: (1) BT group (denoted as G1) including features groups BT 8.50 , BT 10.8 , BT 12.0 , BT 6.25 , BT 7.10 , and BT 13.5 ; (2) BTD group (denoted as G2) including BTD 6.25−10.8 , BTD 6.25−7.10 , and BTD 13.5−10.8 ; (3) thermodynamic group (denoted as G3) containing CAPE, KI, PW, H 0 • C , H −20 • C , and WBZ; (4) dynamic group (denoted as G4) containing VWS 0-3 , VWS 0-6 , and SRH. Next, we demonstrate the feature changes in the four groups with hail sizes and discuss their relations.    In Figure 5a, BT features decrease from the S-to L-size bin, suggesting denser and higher clouds produce larger hailstones. In Figure 5b, the BTD6.25−10.8 and BTD6.25−7.10 of the L-size bin are much larger than the S-and M-size bins, implying that they are helpful for large hail recognition. However, for BTD13.5−10.8, the difference between the S-and M-size bins is more prominent, suggesting this BTD is beneficial for distinguishing small hail from other forms of hail. This is because the peak of the weighting function of the 13.5 μm channel is near the Earth's surface, which allows the 13.5 μm channel to detect radiation from the lower troposphere. At the same time, the upper water vapor will absorb the lowlevel radiation in the 6.25 and 7.10 μm channels. Thus, BT6.25 and BT7.10 remain unchanged unless the cloud top reaches peak weighting function levels (~350 and 490 hPa).

Feature Change with Hail Sizes
In Figure 5c, CAPE and KI are not linearly correlated to hail size, and the L-size bin does not have the maximum CAPE and KI. The residence time of embryos in the growth region of storms should be sufficiently extended to produce large hailstones. Thus, a bal- In Figure 5a, BT features decrease from the S-to L-size bin, suggesting denser and higher clouds produce larger hailstones. In Figure 5b, the BTD 6.25−10.8 and BTD 6.25−7.10 of the L-size bin are much larger than the S-and M-size bins, implying that they are helpful for large hail recognition. However, for BTD 13.5−10.8 , the difference between the S-and M-size bins is more prominent, suggesting this BTD is beneficial for distinguishing small hail from other forms of hail. This is because the peak of the weighting function of the 13.5 µm channel is near the Earth's surface, which allows the 13.5 µm channel to detect radiation from the lower troposphere. At the same time, the upper water vapor will absorb the low-level radiation in the 6.25 and 7.10 µm channels. Thus, BT 6.25 and BT 7.10 remain unchanged unless the cloud top reaches peak weighting function levels (~350 and 490 hPa).
In Figure 5c, CAPE and KI are not linearly correlated to hail size, and the L-size bin does not have the maximum CAPE and KI. The residence time of embryos in the growth region of storms should be sufficiently extended to produce large hailstones. Thus, a balance between hailstone fall speed and updraft speed is necessary [110][111][112][113]. If the updraft is too strong, the particles are lofted too quickly to grow to full size and are ejected into the anvil; however, the particles may fall out of the growing region if it is too weak. This may be a partial reason for the L-size bin not having an enormous CAPE and KI. In addition to unstable energy, freezing and melting processes also influence hail size. The L-size bin has the lowest H 0 • C , H −20 • C , and WBZ. The low melting level prevents the hailstones from losing a substantial mass before reaching the ground. On the other hand, the low H −20 • C facilitates big drops freezing. Abundant water vapor is necessary for severe convective weather, so the medians of PW for S-, M-, and L-size bins all exceed 3 cm. However, they do not increase with hail size, suggesting that water vapor is not limited to larger hailstones [57].
In Figure 5d, dynamic group features (i.e., shear features VWS 0-3 , VWS 0-6 , and SRH) show the most apparent differences between L-size and smaller sized bins compared with other groups in Figure 5a-c. The 25th percentiles of VWS 0-3 , VWS 0-6 , and SRH of the L-size bin are larger than or equal to the median values of S-and M-size bins. This suggests that a robust vertical shear environment is necessary for large hail occurrences. As Brooks [45] mentions, given their occurrence, the intensity of tornadoes and hail is entirely a function of shear and only weakly depends on thermodynamic features. The increased VWS can elongate a storm's updraft, so hail process volumes and hailstone residence times increase and create a vast embryo source region [114].
We also compared the box plots of the unselected features with Figure 5. Their medians fluctuate more gently, and boxes overlap more between S-, M-, and L-size bins. This reflects a stronger correlation of features selected with hail size. However, this relationship is somewhat complicated. For example, hail size positively correlates with cloud-top height, but overlapping BTs generating hailstones are unknown. Although the BTD 6.25−10.8 and BTD 6.25−7.10 are useful for identifying large hailstones, their ability to distinguish between middle and small hail is weak for lack of low-level information. The dynamic features show better differentiation than thermodynamic features in hail size, especially in relation to larger hail occurrences. Nevertheless, for more acceptable discrimination, it is essential to couple dynamic features with thermodynamic features, which will not be an easy job in terms of the results presented thus far. As a result, we trained a BPNN regression model to predict MHDs by its powerful fitting ability with selected features, and the next section exhibits our results.

Comparison between the Observed and Predicted MHD
After building the BPNN model, the predicted MHD is represented as a function of the observed MHD on the test set; we then perform linear fitting ( Figure 6). R 2 = 0.52 signifies that our model performs well compared with other hail size estimation models. Table 5 compares data, methods, and results between those models at length. Because a linear active function is employed on the output layer of BPNN (see Figure 2), four cases are estimated negative hail size by our model, as seen in Figure 6. The BPNN architecture can be updated to suppress those unphysical results by changing the active function to a positive linear one in future work.  Then, four plots [115] serve to analyze the residuals for a complete diagnosis of the fit. Figure 7a plots "residuals vs. fitted" and shows whether the linear fitting model is reasonable. According to the points distributed along the zero line (gray horizontal dashed line), the trend of the smoothed curve (red line) is flat and close to the zero line, suggesting that there is no distinguished pattern. Figure 7b is a typical Q-Q plot of the residuals utilized to evaluate whether the errors are distributed normally. If the residuals show accurate normal distribution, they will lie precisely on the 45° line (dashed line). It can be seen that the points lie close to the dashed line, demonstrating a slight deviation  Then, four plots [115] serve to analyze the residuals for a complete diagnosis of the fit. Figure 7a plots "residuals vs. fitted" and shows whether the linear fitting model is reasonable. According to the points distributed along the zero line (gray horizontal dashed line), the trend of the smoothed curve (red line) is flat and close to the zero line, suggesting that there is no distinguished pattern. Figure 7b is a typical Q-Q plot of the residuals utilized to evaluate whether the errors are distributed normally. If the residuals show accurate normal distribution, they will lie precisely on the 45 • line (dashed line). It can be seen that the points lie close to the dashed line, demonstrating a slight deviation and an acceptable fit. Figure 7c is a scale-location plot employed to check whether the data are homoscedastic. Its x-axis is identical to Figure 7a, and its y-axis is the standardized residuals' square root; thus, this plot eliminates the sign of residuals, with a larger magnitude of residuals (positive or negative) plotting the top and small plotting the bottom. We can see that the smoothed curve (red line) is also flat, which means the data are homoscedastic. Figure 7d is an influence plot to show data points that heavily influence the regression line. The influence is measured with Cook's distance (size of points), which is increased by the leverage (x-axis) and the larger magnitude of residuals (y-axis). The highly influential points are those with Cook's distance larger than five times the mean Cook's distance. There is only one in Figure 7d, the data point "21". This outlier does not have extreme residual but has very large leverage. The leverage measures the influence on the regression due to the location in the variable space. The points that lie far from the centroid or isolate have high leverage. The results reflect a few samples with MHD > 10 mm in Figure 6, and we will add more large MHD samples in future work. and an acceptable fit. Figure 7c is a scale-location plot employed to check whether the data are homoscedastic. Its x-axis is identical to Figure 7a, and its y-axis is the standardized residuals' square root; thus, this plot eliminates the sign of residuals, with a larger magnitude of residuals (positive or negative) plotting the top and small plotting the bottom. We can see that the smoothed curve (red line) is also flat, which means the data are homoscedastic. Figure 7d is an influence plot to show data points that heavily influence the regression line. The influence is measured with Cook's distance (size of points), which is increased by the leverage (x-axis) and the larger magnitude of residuals (y-axis). The highly influential points are those with Cook's distance larger than five times the mean Cook's distance. There is only one in Figure 7d, the data point "21". This outlier does not have extreme residual but has very large leverage. The leverage measures the influence on the regression due to the location in the variable space. The points that lie far from the centroid or isolate have high leverage. The results reflect a few samples with MHD > 10 mm in Figure 6, and we will add more large MHD samples in future work.

Case Studies
This section applies the model to all three hail cases from 2019 to 2021 in Shanghai, China. Figures 8-10

Case Studies
This section applies the model to all three hail cases from 2019 to 2021 in Shanghai, China. Figures 8-10    The Chongming hail event is mainly affected by a short-wave trough at the southern periphery of an upper-level cold vortex (Figure 8a). The strong cold advection and the positive vorticity advection at 500 hPa result in the lifting of low-level air and high convective instability in this region. The sounding near the hail event has a relatively dry midlevel layer over a warm and moist low-level layer (Figure 8b), which is conducive to large hailstones reaching the surface and severe and convective local weather [25]. The sounding is characterized by strong VWS, with northerly to northwesterly winds below 850 hPa and westerly winds within the layers above. For the Fengxian hail event, convective cells initiate in front of the 500 hPa short-wave trough and the edge of the 588 line of a subtropical high (Figure 9a), accompanied by a low-level shear line (not shown) that enhances the lowlevel convergence. The southwesterly low-level jets provide favorable dynamic conditions for this event by transporting warm and moist air to destabilize the environment. The sounding in Figure 9b indicates a conducive environment for this hail event with a strong VWS and a large CAPE. As the convective instability began to increase, the convective cells developed into a long-lasting, heavy storm under the effect of strong VWS. The synoptic patterns associated with the Songjiang hail event are shown in Figure 10a. For this event, the eastern coastal regions of China were beneath a deep upper-level trough. Shanghai is located ahead of the trough and at the southern periphery of the low-level shear line at 850 hPa (not shown), where favorable positive vorticity advection from the trough and low-level convergence near the shear line work together to destabilize the environment. The whole sounding near the hail event is relatively moist (Figure 10b), coupled with an enormous CAPE, conducive to developing organized, deep convection.  The Chongming hail event is mainly affected by a short-wave trough at the southern periphery of an upper-level cold vortex (Figure 8a). The strong cold advection and the positive vorticity advection at 500 hPa result in the lifting of low-level air and high convective instability in this region. The sounding near the hail event has a relatively dry midlevel layer over a warm and moist low-level layer (Figure 8b), which is conducive to large hailstones reaching the surface and severe and convective local weather [25]. The sounding is characterized by strong VWS, with northerly to northwesterly winds below 850 hPa and westerly winds within the layers above. For the Fengxian hail event, convective cells initiate in front of the 500 hPa short-wave trough and the edge of the 588 line of a subtropical high (Figure 9a), accompanied by a low-level shear line (not shown) that enhances the low-level convergence. The southwesterly low-level jets provide favorable dynamic conditions for this event by transporting warm and moist air to destabilize the environment. The sounding in Figure 9b indicates a conducive environment for this hail event with a strong VWS and a large CAPE. As the convective instability began to increase, the convective cells developed into a long-lasting, heavy storm under the effect of strong VWS. The synoptic patterns associated with the Songjiang hail event are shown in Figure 10a. For this event, the eastern coastal regions of China were beneath a deep upperlevel trough. Shanghai is located ahead of the trough and at the southern periphery of the low-level shear line at 850 hPa (not shown), where favorable positive vorticity advection from the trough and low-level convergence near the shear line work together to The bottoms of Figures 8c, 9c and 10c illustrate the cross-sections of radar echo from which vertical reflectivity distributions are exhibited. The 50 dBz echo tops are about 7, 12, and 10 km in the Chongming, Fengxian, and Songjiang cases, while the ≥ 60 dBz echo region in the Songjiang case is the most extensive. We calculated the MEHS from radar reflectivity with the method used in Witt et al. [24]. Being different from the machine learning method, MEHS are obtained from an empirical equation that weights radar reflectivity from 0 • C level through −20 • C level to echo top: where .
E is flux value of hail kinetic energy that is transformed from reflectivity data, H is the height above radar level (ARL), W T (H) is the temperature-based weighting function that suggests hail growth which only occurs below 0 • C and mainly occurs at temperatures near −20 • C or colder, H 0 is the height ARL of melting level, and H T is the height of storm top. Please refer to Witt et al. [24] for more details. The calculated MEHS are about 30, 53, and 61 mm for the Chongming, Fengxian, and Songjiang cases, which are much larger than the observed diameters (~11, 20-30, and 5-10 mm) and not quite reasonable according to the statistics of hail size in China where severe hail events (MHD ≥ 30 mm) are rare [57]. These indicate that MEHS estimation cannot be directly adopted outside the region developed before localization. Figures 8d, 9d and 10d demonstrate the large-scale clouds corresponding to weather systems and atmospheric physical processes displayed in Figures 8a, 9a and 10a. A vortex cloud system in the upper section of Figure 8d and a leafy cloud system at the bottom left of Figure 10d are associated with the northeastern cold vortex and trough in Figures 8a and 9a, respectively. Figure 9d shows a shear-line cloud band composed of convective clouds located at the periphery of the subtropical high. The inset images of Figures 8d, 9d and 10d illustrate hailstorms in detail and list the input features of the BPNN model. Eventually, the estimated MHDs by the BPNN model are about 12, 25, and 9 mm in the Chongming, Fengxian, and Songjiang cases, which are all close to the observations (~11, 20-30, and 5-10 mm). We also compare environmental and cloud-top features of hailstorms among three cases and find their change with hail size follows the rules obtained in Section 3.2. With the largest hail size, the Fengxian case has the highest cloud top and maximum VWS. The Chongming case has a similar VWS to the Fengxian case, but it has a lower cloud top than the latter, and its hail size is smaller. The Songjiang case has the smallest VWS and the lowest cloud top, and its hailstone size is the smallest. Additionally, the CAPE of the Songjiang case is the largest, the Fengxian case is the second largest, and the Chongming case is the smallest. These results suggest that the cloud-top and dynamic features have a stronger positive correlation with hail size. In contrast, thermodynamic features are not limited to hail size as long as the energy is sufficient for hail occurrence. In any event, hail size results from multiple factors and is difficult to judge by only one.

Discussions
Some hailstorms produce giant hailstones while others produce smaller hailstones; however, the reasons for these discrepancies are not yet fully understood [16]. Consequently, it is helpful to indirectly evaluate hailstone sizes from satellites that provide continuous, extensive, and early storm observations coupled with sounding data to add vertical atmospheric information to cloud-top characteristics. The objective selection of environmental and cloud-top features based on PCA is implemented in China and can be immediately extended to other regions conveniently. Furthermore, this method is also available when selecting new variables, such as new satellite channels or new weather phenomena recognition, e.g., heavy rainfall, strong wind.
One difficulty in hail size diagnoses is that relations in one region may not quickly transfer to another area. However, the data-driven machine learning method without necessary equation assumptions in advance proves a promising solution. Another difficulty is a lack of accurate, dense, and full hail observations. Clearly, more hail samples with a wide size distribution for training, validation, and testing can improve BPNN model robustness. Additionally, more case studies in different geographical areas and seasons are helpful before its usage. Nevertheless, with more and better meteorological observations planned for the future, the machine learning model will prove an adequate approach for predicting hailstone size.

Conclusions
This study established a hail size estimation model from satellite images based on two algorithms, PCA and BPNN.
The PCA-based technique selects features closely related to hail sizes from 29 candidate parameters. These parameters contain wind shear, temperature layer, instability energy, water vapor, cloud-top height, cloud thickness, cloud-top phase, etc. As a result, a total of 18 features were strictly and carefully selected according to PC contribution and parameter coefficients. Those features naturally congregated into four groups: BT (BT 6.25 , BT 7.10 , BT 8.50 , BT 10 groups. The statistics of the four group features for different hail-sized bins demonstrate that hailstorms with a higher cloud-top height and denser clouds produce larger hailstones. The largest hailstones occur in an environment with strong vertical wind shear and a low melting/freezing level height. Abundant moisture supply and instability energy are required for hail occurrence but are not limiting factors for the largest hailstones. Although the selected features vary more apparently with hail sizes than the unselected ones, the relations between these features and hail size are complicated and nonlinear. As a result, the machine learning model BPNN, with powerful fitting ability, is adopted to estimate hailstone size. Next, to establish the BPNN regression model, the hail samples were segregated into training (70%), validation (15%), and test (15%) sets. The training set trains the BPNN model; the validation set checks the model during training and decides when to stop training early; the test set estimates model accuracy in new data. The BPNN regression model has a two-layer architecture with 35 hidden layer neurons and a mean squared error loss function. The linear fitting R 2 between predicted and observed MHDs on the test set is 0.52. In the Shanghai case studies, the prediction of MHD is close to the observed value and is more reasonable than the MEHS results, which overestimate hail size. All of the results suggest that this model can be applied in practice.