Detection of Crop Hail Damage with a Machine Learning Algorithm Using Time Series of Remote Sensing Data

: Hailstorms usually result in total crop loss. After a hailstorm, the affected ﬁeld is inspected by an insurance claims adjuster to assess yield loss. Assessment accuracy depends largely on in situ detection of homogeneous damage sectors within the ﬁeld, using visual techniques. This paper presents an algorithm for the automatic detection of homogeneous hail damage through the application of unsupervised machine learning techniques to vegetation indices calculated from remote sensing data. Five microwave and ﬁve spectral indices were evaluated before and after a hailstorm in zones with different degrees of damage. Dual Polarization SAR Vegetation Index and Normalized Pigment Chlorophyll Ratio Index were the most sensitive to hail-induced changes. The time series and rates of change of these indices were used as input variables in the K-means method for clustering pixels into homogeneous damage zones. Validation of the algorithm with data from 91 soybean, wheat, and corn plots showed that in 87.01% of cases there was signiﬁcant evidence of differences in average damage between zones determined by the algorithm within the plot. Thus, the algorithm presented in this paper allowed efﬁcient detection of homogeneous hail damage zones, which is expected to improve accuracy and transparency in the characterization of hailstorm events.


Introduction
In the last decades, both the frequency and intensity of extreme weather and climate events have increased worldwide, leading to huge economic losses [1]. In Argentina, the number of intense storms due to climate change has already increased in the last few years [2]. The frequency of hail events is higher in midlatitudes during the warm season [2]. The Pampas region of Argentina, the most important crop production area in the country, is located in midlatitudes, and the time of higher frequency of hail events overlaps both the stage of the highest vulnerability of winter crops (such as wheat, which reaches the fruiting stage in the warm season) and the whole growth period of summer crops such as soybeans and corn.
Farmers aim to mitigate the consequences of natural catastrophes through management strategies within their farming operations, but certain risks, such as hail events, are beyond their financial means. For such cases, farmers usually transfer the risk to specialized risk management companies by taking out agricultural insurance policies, with hail damage policies being very widespread in the country.
Following a hailstorm, an insurance claims adjuster goes to the field and estimates the yield losses due to hail damage. The accuracy of estimates relies heavily on the detection of Homogeneous Damage Zones (HDZs), which allows extrapolating the data from field samples to the whole area of the damaged field. Currently, estimates of the surface area of HDZs are performed visually, and the adjuster has no information about either the affected area within the field or the degree of damage before getting to the site for inspection. Since unsupervised technique. The technique was applied to identify damage patterns in data from different crops affected by hailstorms at different phenological stages.
Although Earth satellite data have been available for more than 30 years, to the best of our knowledge no research has been conducted on the use of SAR data for hail damage to crops. Multispectral data have been used to analyze some local and sporadic cases [15][16][17] which, however, pose practical limitations for regional and operational application [18]. Thus, the purpose of this work was to develop an algorithm for determining HDZs in crops affected by hailstorms, following a standardized and accurate methodology for processing satellite data. Such a development will confer objectivity and transparency to crop damage assessment.
The following section describes the study area and the importance of agricultural insurance in the region. The characteristics and pre-processing of satellite data are also included in this section. Next, we explain the methodology used to select the indices with the highest sensitivity to hail-induced changes in the crop, as well as the steps followed by the algorithm to classify pixels into HDZs and the algorithm validation methodology. Section 3 details the results of the sensitivity analysis for the different vegetation indices and identifies the most sensitive for each signal type. The results and discussion of the algorithm steps are also included in this section, which finishes with the results and discussion of algorithm validation with in situ data. Conclusions are presented in Section 4.

Study Area
The study was conducted in the Pampas, a huge sedimentary basin formed on top of the Brasília Massif in South America. The climate in this region is subtropical humid in the north and temperate in the south, with warm summers and cool and variable winters with frequent frosts but no snowfall. The average annual temperature is 17 • C. It presents a monsoon rainfall regime, with annual averages ranging from 1200 mm in the east, an area under the influence of Atlantic winds, to 300 mm in the west, where the dry, hot winds blowing across the Andes Mountains predominate. The Pampas is an extensive plain extending over the Argentine provinces of Buenos Aires, Entre Ríos, Santa Fe, Córdoba, La Pampa, and San Luis ( Figure 1). Formerly characterized by rangelands and the absence of trees, today it is the most industrialized region in the country. The industrialization process began with cattle raising and the manufacturing of animal products for export continued with agricultural development and is currently fueled by soybeans, corn, wheat, and their by-products. In Argentina, the regions with the highest hailstorm frequencies are located near the Córdoba, Tandil, and La Ventana Mountains, where orographic storms occur, and along the Atlantic coast, where the disturbances caused by cold air currents reaching the continent contribute to the formation of hailstorms [19]. Historically, the damage is greater in winter crops such as wheat since intense storms occur mainly in late spring and early  In Argentina, the regions with the highest hailstorm frequencies are located near the Córdoba, Tandil, and La Ventana Mountains, where orographic storms occur, and along the Atlantic coast, where the disturbances caused by cold air currents reaching the continent contribute to the formation of hailstorms [19]. Historically, the damage is greater in winter crops such as wheat since intense storms occur mainly in late spring and early summer, when the filled grains are exposed [20].
The area allocated to crop production shows small variations between years. In the 2019-2020 growing season, the areas sown with soybeans, corn, and wheat were 16.9, 9.5, and 6.9 million hectares respectively (http://datosestimaciones.magyp.gob.ar/ (accessed on 10 March 2020)). In the same period, the policies issued by all the insurance market for Agricultural and Forestry Risks amounted to US$ 250 million (https://www.argentina.gob. ar/superintendencia-de-seguros (accessed on 25 July 2020)), 91.5% of which was taken out for hail damage, as follows: 37.6% for extensive soybean crops, 37.1% for corn, and 16.8% for wheat.

Data
Data were obtained from Sentinel-1 and -2 Missions of the Copernicus Program (European Space Agency and European Environment Agency, EU), and from in situ damage assessments in 91 plots affected by hailstorms between 2017 and 2020. All plots are located within the study area, 44 planted with soybeans, 32 with wheat, and 15 with corn ( Figure 1). The evaluated plots had an average surface of 137.97 hectares, with a minimum value of 13 and a maximum of 590 hectares; the values of the quartiles Q1, Q2, and Q3 were 69.75, 97.5, and 162.75 hectares respectively. Table S1 shows the central point of each plot, the crop name, and the date of the hailstorm which caused the damage.
Sentinel-1 is equipped with C-band, dual-polarization Synthetic Aperture Radar (SAR) sensors operating at a center frequency of 5.405 GHz. Sensors are mounted on two satellite platforms (A and B), each of which registers images of the same area every 12 days. In most of the Argentinian territory, images overlap, and their incidence angles differ considerably between the platforms. To avoid this problem, when data from platforms A and B overlapped, the first to capture images after a given hailstorm was selected.
The images used were Ground Range Detected, processed from Single Look Complex products with Sentinel-1 Toolbox (http://step.esa.int/main/toolboxes/snap (accessed on 12 December 2018)). This process generates a calibrated, ortho-rectified product, with thermal noise removal, and radiometric calibration, and terrain correction using the SRTM 30 digital elevation model, generating a product with a 10 m spatial resolution.
Sentinel-2 is a remote sensing system that uses the sensors MSI (MultiSpectral Instrument) capturing images in the visible light spectrum, near-infrared, and short-wavelength infrared bands, to record solar radiation reflected by objects on Earth in thirteen 16-bit spectral ranges. Similarly to Sentinel 1, two MSI sensors are mounted on two S2 platforms (A and B). The wavelength ranges used were blue (B2), red (B4), and near-infrared (B8), all of them with 10 m spatial resolution and obtained from a product with Level-1C Processing. The data thus obtained represent the reflectance in the upper layer of the atmosphere. The time resolution of Sentinel-2 is 5 days, or 2 and 3 days in sectors with orbit overlap.
For each assessed plot, data were obtained from satellite images from both missions, within the plot boundaries, with a time frame of 60 days prior to and 60 days after the hailstorm. When the time frame included any date prior to sowing or after harvest of each crop, the series was interrupted to avoid entering data from other crops. Satellite data were retrieved from and processed in the Google Earth Engine platform.
In situ damage assessment for each crop was performed following the United States Department of Agriculture and Federal Crop Insurance Corporation guidelines [21][22][23]. At least ten sampling stations were established in each of the 91 plots analyzed, totaling 1174 observations (646 in soybean plots, 153 in corn, and 375 in wheat). Latitude, longitude, and estimated damage percentage were recorded in each sampling station.

Satellite Data Pre-Processing
SAR images are formed through coherent processing of the return of successive RADAR pulses. When a RADAR sheds light upon a rough surface, the distances between the elementary scatterers and the receiver vary due to surface roughness. As a result, the backscattered waves are coherent in frequency but not in phase. The signal received is strong if the waves add together in a relatively constructive way, whereas it is weak if phase shifts lead to intensity variations between adjacent pixels. These variations appear as a granular pattern, called speckle [24], which makes visual and digital image interpretation more difficult. The technique most widely used is to remove speckle noise in spatial averaging. This technique, however, lowers image resolution significantly, resulting in blurry images. Morphological filters reduce speckle noise without losing image focus [25] and also have an excellent computational performance. These filters transform images by applying successive convolutions. To do this, a kernel function (e.g., maximum, minimum, mean, median) is assigned to a moving window. The kernel moves over the original image performing the calculations for each pixel. The value obtained as output is assigned as the new value of the central pixel in the image. Thus, the pixel values in the new image are modified according to the values of its neighboring pixels in the original image. Kernels can be circular, square, cross-or diamond-shaped, or octagonal, and their size is defined by the user. To remove speckle noise, the median for each pixel was calculated on a circular kernel with a 15-pixel radius. A median rather than an average was used because speckles appear in random positions and their values vary greatly from other pixels representing the same object, which could lead to distortions in the transformation of speckled areas of an image if an average were used. Kernel size was defined according to plot size and the areas to be differentiated within the plot, whereas kernel shape was defined according to the shape of the areas to be differentiated. The crop changes to be detected are determined by soil variables such as salinity, texture, and topography before the hailstorm, and by hail after the storm. A variation in any of these variables will be strongly correlated with the spatial variable, generating irregular patterns [26]. The circular kernel best fitted this pattern.
Although there are numerous methodologies to correct atmospheric interferences in multispectral data, not all can be applied to Sentinel-2 data because it does not record data in the thermal infrared wavelength range. Since atmospheric interferences appear as unexpected and occasional changes along with successive images in a collection, and crop changes manifest themselves gradually along time, temporal methods such as Tmask [27] and MAJA [28] are considered most suitable to perform atmospheric corrections. However, Tmask and MAJA have a high computational cost. Their application to HDZs would slow the algorithm, making it unsuitable for the quick responses required for crop damage insurance claims. Thus, a quick-running time method was selected to filter clouds and their shadows and correct other atmospheric interferences, namely the mobile linear regressions method, which is applied separately to each pixel time series [29]. A 15-day bandwidth was used to ensure interference removal without adversely affecting the values by data degradation due to excessive smoothing.

Selection of Vegetation Indices
Vegetation indices allow identifying and describing structural, phenological, and biophysical parameters, which is a temporal line basis help detecting changes in vegetation [30]. To define the variables to be used for pixel classification into HDZs, we considered the vegetation indices which are most informative about crop condition and can be calculated from Sentinel-1 and -2 satellite missions' data. For this purpose, five microwave indices (calculated with the S1 data set) and five multispectral indices (calculated with the S2 data set) were evaluated. Subsequently, one index was selected for each signal type and named I S1 and I S2 .
The microwave indices considered were: • Dual-Pol Diagonal Distance (DPDD) [6]: where σ vv(i) is the value of pixel i in the co-polarized signal, and σ vh(i) the value of pixel i in the signal with cross-polarization. A low DPDD value represents bare soil, increasing gradually as biomass increases, and reaching maximum values with thick vegetation.
• Inverse Dual-Pol Diagonal Distance (IDPDD) [6]: where σ vv(i) and σ vh(i) are defined as in (1) and σ vv(max) is the highest value of the copolarized signal observed in each image within the plot. IDPDD is better than DPDD to discriminate between vegetated pixels and pixels with bare soil but presents some degree of inaccuracy to differentiate vegetation from water bodies.
• Vertical Dual Depolarization Index (VDDPI) [6]: where σ vv(i) and σ vh(i) are defined as in (1). This index quantifies biomass by measuring the ratio between total polarized power and co-polarized power.
• Microwave Polarization Difference Index (MPDI) [31]: where σ vv(i) and σ vh(i) are defined as in (1). This index is particularly sensitive to surface roughness and soil moisture, both being considerably altered after a hailstorm.
• Dual Polarization SAR Vegetation Index (DPSVI) [6]: where σ vv(i) and σ vh(i) are defined as in (1), and σ vv(max) as in (2). DPSVI is a combination of VDDPI and IDPDD which keeps the advantages of both indices while allowing differentiating vegetation from water bodies. The spectral indices considered were: • Normalized Difference Vegetation Index (NDVI) [32]: where NIR is the pixel value in the near-infrared range (B8); Red is the pixel value in the red range (B4). NDVI is associated with the amount of chlorophyll in the vegetation. High NDVI values correspond to areas with higher reflectance in the near-infrared spectrum, which correspond to thicker, healthier vegetation. This index reduces noises such as differing amounts of light, cloud shadows, atmospheric attenuation, or certain topographic variations [27]. However, it saturates with high biomass and is very sensitive to variations in soil underlying the canopy [33].
• Enhanced Vegetation Index (EVI) [34]: where NIR and Red are defined as in (6), and Blue is the pixel value in the blue range (B2). EVI performs better than NDVI in high biomass situations. It also has a better response to canopy structure variations such as leaf area index, canopy type, plant shape, and canopy architecture [35]. EVI requires a thorough previous removal of both external noise and sensor noise to estimate surface reflectances [34].
• Soil Adjusted Vegetation Index (SAVI) [33]: where NIR and Red are defined as in (6). SAVI is derived from the NDVI formula, to which a coefficient is added for the correction of soil brightness in areas with low vegetative cover. This correction is important in open canopies where the underlying soil affects the reflectance recorded.
• Advanced Vegetation Index (AVI) [36]: where NIR and Red are defined as in (6). Like NDVI, AVI uses the red and near-infrared reflectances. However, since AVI uses a power degree of the infrared response, it is more sensitive to slight differences in canopy density [36].
• Normalized Pigment Chlorophyll Ratio Index (NPCRI) [37]: where Red and Blue are defined as in (6) and (7). This index is sensitive to carotenoid: chlorophyll content ratios. Being especially useful for characterizing leaf senescence and fruit ripening, it also differentiates senescent crops from bare soil. Indices I S1 and I S2 were selected by analyzing data from the hailstorm on February 10th, 2019, near Lozada town, province of Cordoba, Argentina. The analysis was performed in a 90-hectare soybean field (plot 79 in Table S1, hereafter referred to as model plot) in the R3 phenological stage according to the Fehr scale [38]. At the time of the hailstorm, the plot presented total and homogeneous crop cover. After the storm, damage ranged from null to total crop loss, with intermediate degrees of damage in some sectors.
All Sentinel-1 and -2 images containing the model plot were selected from 12 December 2018 (60 days before the storm) to 11 April 2019 (60 days after the storm). Each image was cut out to fit the plot shape, and two datasets were built: S1 with images from mission Sentinel-1, and S2 with images from Sentinel-2. Datasets were pre-processed as described in Section 2.3. In each dataset, the five vegetation indices for each signal were calculated for each image pixel. A graph was built for each index showing the time series of the model plot pixels. Standard deviations of index values were calculated for each image. Next, the maximum values of the standard deviations pre-storm (Max S j ant ) and post-storm (Max S j post ) were calculated for each of the ten vegetation indices I j (j = 1, . . . , 5 with S1 dataset, and j = 6, . . . , 10 with S2 dataset). Then we selected the lowest quotient among the vegetation indices for each dataset S1 and S2, defined as I S1 and I S2 respectively and calculated as: Thus, the microwave index and the spectral index with the lowest pre-storm backscatter compared to the post-storm backscatter were selected.

Algorithm for Identification of HDZs
An unsupervised learning algorithm is proposed to identify HDZs within a plot using Sentinel-1 and-2 satellite images. The steps of the algorithm are: (1) selection and pre-processing of satellite data to remove speckle noise, presence of clouds, and other atmospheric phenomena; (2) calculation of vegetation indices with the highest sensitivity to hail-caused damage to constructing the data matrix for pixel classification; (3) preparation of the variables for classification and data matrix construction; (4) unsupervised pixel classification into HDZs.
(1) Selection and pre-processing of satellite data For each plot, satellite images within the 60-day pre-and post-storm time window, as described in Section 2.2, were selected. Each image was cut out to fit the plot shapes and datasets S1 and S2 were assembled. Both datasets were pre-processed by correcting speckle on microwave signals and atmospheric interferences on the multispectral signal, as described in Section 2.3.

(2) Calculation of vegetation indices
Indices I S1 and I S2 were calculated for each pixel of datasets S1 and S2 respectively, as described in Section 2.4.

(3) Preparation of classification variables and construction of data matrix
To determine HDZs in a field with hail-damaged crops, pixels were classified using the data in Sentinel-1 and -2 images. The number of pixels depended on the size of the affected field, with analyzed fields having an average area of 60 hectares. Thus, for the average field size, 6000 pixels of 10 m resolution (100 m 2 ) each were analyzed.
The algorithm was expected to classify pixels into HDZs. Since the pixels in a given data matrix always corresponded to a single plot, they invariably represented a single crop and variety, sown on the same date. Classification into HDZs considered the pre-and post-hailstorm crop evolution. The resulting HDZs were expected to cluster plants with a similar response to the biophysical variables estimated by the indices, both prior to and after the hailstorm.
In order to capture the diversity and complexity of possible hail-induced crop changes, both microwave and multispectral signals were used. As mentioned in Section 1, microwave signals are more sensitive to changes in structure and geometry, while the multispectral signal is more sensitive to physiological changes manifesting themselves in canopy coloration.
Since the revisit time of the Sentinel 2 mission is shorter than that of Sentinel 1, Sentinel 2 will capture a larger number of images than Sentinel 1 (RADAR) for a given time window. For this reason, a linear interpolation of spectral index values was performed to estimate a value for each of the dates of SAR image capture. This allowed having the same amount of data for both microwave and spectral signals in the matrix, thus balancing the influence of each dataset. The procedure also provided multispectral images with temporal equidistance from each other, which allowed calculating the rates of index value changes.
The rates of change (derivative functions) were calculated by differentiating each of the time series obtained when calculating the different indices of vegetation in the plots. Differentiating an equispaced time series consists of subtracting the immediately preceding value from each value and, therefore, it served us to calculate the derivative of the function. Exploration of both series, the original and the derivative, is interesting since in certain cases the different groups may differ due to the indices observed in a given period, but in other cases due to their change ratios [39]. It was not possible to calculate the derivative of the first image due to a lack of previous data, so it was removed from the matrix after calculating the derivatives. Thus, the data matrix for each plot had as many rows as pixels, and the variables I S1 , I S2 and their corresponding derivatives (∆I S1 , ∆I S2 ) calculated for each observation within the time window. Thus, if the sowing (harvesting) date was prior to (following) the time window (60 days pre-and post-storm date), 9 observations were obtained (4 pre-storm and 5 post-storm) for each variable (I S1 , I S2 , ∆I S1 , ∆I S2 ), resulting in a maximum of 36 values in the vector for each pixel.
(4) Unsupervised Pixel Classification HDZs will be defined using unsupervised machine learning. Unsupervised learning aims to find the underlying structure of a dataset and cluster its data by similarity. The methods most widely used include K-means [40], hierarchical clustering [41], and mixture modeling [42]. Hierarchical clustering requires calculating the distance matrix between all data, which increases processing costs considerably in the case of big datasets. Mixture modeling requires assuming restrictive hypotheses on the probability distribution of the clusters defined. We used K-means [40] because it is simple, easy to implement, and quick and cost-effective for computing big datasets [43]. K-means is an iterative clustering algorithm used for partitioning n observations (x 1 , x 2 , . . . , x n ) into k groups (S = {S 1 , S 2 , . . . , S k }). Each observation is allocated to the cluster with the nearest mean value in order to minimize within-cluster variances, based on the following equation: where µ i is the mean of observations in S i . This method, which has been proven to have an optimal performance in scientific research for analyzing data clusters, requires specifying the number of groups into which data will be classified. For HDZ detection, plot pixels were grouped into three categories (k = 3). In this sense, three classes were defined per plot because it is the number of homogeneous damage zones most frequently detected in the field using visual techniques; thus, it is validated by the expertise of claims adjusters seeking to simplify and standardize field surveys. The Euclidean distance was used to measure proximity to centroids. For different applications of the algorithm, this parameter can be modified to include other classification objectives.

Algorithm Validation with In Situ Data
The quality of clusters provided by the algorithm proposed in this work was validated by external data, different from the data used to determine HDZs in each plot. Even though no information was available about the actual HDZ to which each pixel belonged for any of the plots analyzed, we had in situ damage measurements for some spots.
Pixels for which in situ data were identified in each plot. These pixels constituted a sub-sample of the three HDZs identified with the algorithm, for which actual damage data was available from in situ measurements. To validate the quality of pixel clusters into HDZs, we compared the means of the damage percentages calculated with in situ data in the three HDZs identified by the algorithm. If the algorithm identified HDZs correctly, the mean actual damage percentage had to differ between the zones compared. Homogeneity between HDZs was tested using one-way ANOVA tests. Differences were considered statistically significant if p-value < 0.05.

Selection of Vegetation Indices
Hail damage detection using satellite data is based on the premise that strong winds, rainfall, and hail impacts on crop architecture may alter the physical characteristics of the canopy and underlying soil, thus changing the energy backscattered from the crop to active and passive sensors [44]. The most sensitive indices were selected using the model plot described in Section 2.4. Figure 2 shows a timeline with image capture dates from both missions and some representative images of the plot. Storm damage was heterogeneous, with in situ assessed yield losses ranging from 0 to 100% along a latitudinal gradient with increasing intensity from north to south. Total (100%) damage is defined as areas with the complete destruction of plant tissue and in situ observation of senescent plant remains and bare soil.

Selection of Vegetation Indices
Hail damage detection using satellite data is based on the premise that strong winds, rainfall, and hail impacts on crop architecture may alter the physical characteristics of the canopy and underlying soil, thus changing the energy backscattered from the crop to active and passive sensors [44]. The most sensitive indices were selected using the model plot described in Section 2.4. Figure 2 shows a timeline with image capture dates from both missions and some representative images of the plot. Storm damage was heterogeneous, with in situ assessed yield losses ranging from 0 to 100% along a latitudinal gradient with increasing intensity from north to south. Total (100%) damage is defined as areas with the complete destruction of plant tissue and in situ observation of senescent plant remains and bare soil. The five microwave indices were sensitive to hail-caused damage in field pixels (Figure 3). The graphs highlight the behavior of pixels with no damage, the values of which The five microwave indices were sensitive to hail-caused damage in field pixels ( Figure 3). The graphs highlight the behavior of pixels with no damage, the values of which were obtained by averaging 100 pixels from the north part of the field, where the crop was unaffected by the storm. Curves for the indices evaluated showed a homogeneous pre-storm behavior and split after the storm. were obtained by averaging 100 pixels from the north part of the field, where the crop was unaffected by the storm. Curves for the indices evaluated showed a homogeneous prestorm behavior and split after the storm. Maximum pre-and post-storm standard deviations for each microwave index and their quotients are shown in Table 1. The microwave index selected for classification ( 1 ) was DPSVI, the one with the lowest quotient value ( / ), 0.1535 (Table 1). According to these results and considerations, it can be stated that DPSVI is of crucial importance to identify the scattering effects caused by hail on the observed crops. Maximum pre-and post-storm standard deviations for each microwave index and their quotients are shown in Table 1. The microwave index selected for classification (I S1 ) was DPSVI, the one with the lowest quotient value (Max ant /Max post ), 0.1535 (Table 1). According to these results and considerations, it can be stated that DPSVI is of crucial importance to identify the scattering effects caused by hail on the observed crops. The DPSVI model is based on the scatter diagram pattern built by the backscatter coefficient of VV and VH polarizations and highlights the contribution of cross-polarization. The three parameters (σ vv(i) , σ vh(i) , and σ vv(max) ) in the equation for the DPSVI model (5) have a great influence on biomass characterization [6]. When the biomass production of a single crop increases, the cross-polarized backscatter values are higher [45]. The transmitted vertical polarization signal changes its polarization state partially or totally due to a change in the orientation angle caused by several factors such as the nature, topography, orientation, and distribution of the target. Thus, both rough surfaces and the presence of vegetation cause a significant depolarization of the illuminated signal, whereas smooth surfaces depolarize the signal to a lesser extent. Since C-band microwave signals penetrate through leaves and undergo multiple dispersion within small stalks of the crop, the signal is useful for assessing crop biomass [46]. C-band has a surface penetration ability of about 1 m, reaching saturation at around 60-70 tons/ha [47]. This might account for its good performance to detect hail-induced changes in the model plot.
Analogously, all spectral indices were sensitive to hail damage ( Figure 4). The wavelength ranges used in the indices are sensitive to both healthy vegetation and bare soil. The values obtained with indices NDVI, EVI, SAVI, and AVI were positively correlated with the amount of healthy vegetation in the pixel -the higher the soil cover by vegetation growth, the higher the index value. NPCRI, on the other hand, had a negative correlation with healthy vegetation, showing lower and lower negative values as crop growth increased soil cover. It is worth pointing out that the highlighted curve for the sector with no damage had intermediate values in all indices before the hailstorm, becoming the most extreme curve after the storm with positive values in indices NDVI, EVI, SAVI, and AVI, and negative in NPCRI (Figure 4). This shows the sensitivity of spectral indices to changes in biomass amount in the pixel.
A comparison of the quotient between the pre-and post-storm maximum standard deviation for each spectral index (Table 2) showed that NPCRI had a remarkably better performance than all other indices (Max ant /Max post = 0.1483). growth, the higher the index value. NPCRI, on the other hand, had a negative correlation with healthy vegetation, showing lower and lower negative values as crop growth increased soil cover. It is worth pointing out that the highlighted curve for the sector with no damage had intermediate values in all indices before the hailstorm, becoming the most extreme curve after the storm with positive values in indices NDVI, EVI, SAVI, and AVI, and negative in NPCRI (Figure 4). This shows the sensitivity of spectral indices to changes in biomass amount in the pixel. A comparison of the quotient between the pre-and post-storm maximum standard deviation for each spectral index (Table 2) showed that NPCRI had a remarkably better performance than all other indices ( / = 0.1483). This better performance might be due to the ability of NPCRI to differentiate between bare soil and healthy or senescent vegetation. NDVI is a normalized quotient of the nearinfrared and red reflectance values, while EVI, SAVI, and AVI are equations also based on the vegetation response to these wavelengths. NPCRI is a normalized quotient of the This better performance might be due to the ability of NPCRI to differentiate between bare soil and healthy or senescent vegetation. NDVI is a normalized quotient of the nearinfrared and red reflectance values, while EVI, SAVI, and AVI are equations also based on the vegetation response to these wavelengths. NPCRI is a normalized quotient of the red and blue reflectance values and takes positive values when soil predominates in the pixel (beginning of the curve) and close-to-zero negative values when senescent vegetation is predominant (end of curves). The other four indices show similar values at the beginning and end of curves, without discriminating bare soil predominance from senescent vegetation predominance. When hitting upon the canopy, hail destroys leaves and stems, leading to their senescence. Plant tissues undergo color changes due to modifications in content and relative proportions of pigments [48]. Changes in tissue color in senescent plants are mainly related to increasing chlorophyll degradation and a relative increase of carotenoids. Externally, color shifts from green to yellow, followed by brown when the tissue becomes necrotic. Carotenoids and chlorophylls exhibit strong absorption in the blue spectral range, the reflectance values used in NPCRI [37]. After a storm, when the crop begins to manifest hail-caused senescence, each pixel shows different fractions of bare soil and healthy or senescent vegetation. The ability of NPCRI to differentiate these components might explain its higher performance.

Algorithm Application for Identification of HDZs in the Model Plot
This section details the results of the steps followed by the algorithm applied to the model plot satellite data for post-storm HDZs identification. To corroborate how data preprocessing improves storm effect visualization, the vegetation indices were calculated with and without speckle filtering and atmospheric correction for each type of signal ( Figure 5). Storm effects were difficult to observe in indices without the pre-processing step (Figure 5a) but easily observed in graphs of indices with pre-processed data (Figure 5b). Pre-processing completely removed speckle in the microwave signal and atmospheric interferences in the multispectral signal, without altering index sensitivity to hail-caused changes.
with and without speckle filtering and atmospheric correction for each type of signal (Figure 5). Storm effects were difficult to observe in indices without the pre-processing step (Figure 5a) but easily observed in graphs of indices with pre-processed data (Figure 5b). Pre-processing completely removed speckle in the microwave signal and atmospheric interferences in the multispectral signal, without altering index sensitivity to hail-caused changes. Figure 5. Pre-processing of satellite data for speckle removal in the microwave signal, and atmospheric interference removal in the multispectral signal. The time series for indices applied to model plot pixels without data pre-processing are plotted in (a), those with data pre-processing in (b).
The data matrix for the model plot was constructed with the four variables (indices DPSVI, NPCRI, and their corresponding derivatives), and the images recorded at 9 dates during each crop growing season. Figure 6a maps in grayscale the values of indices and their derivatives, and the time series of indices for each pixel are shown in Figure 6b. DSPVI had an immediate response to hail damage since the microwave signal is sensitive to changes in the physical properties, roughness, and geometry of a crop caused by hailstorms. NPCRI, on the other hand, had a gradual response along with the series. Changes in crop color, detected by the multispectral signal, occur gradually, as relative pigment The data matrix for the model plot was constructed with the four variables (indices DPSVI, NPCRI, and their corresponding derivatives), and the images recorded at 9 dates during each crop growing season. Figure 6a maps in grayscale the values of indices and their derivatives, and the time series of indices for each pixel are shown in Figure 6b. DSPVI had an immediate response to hail damage since the microwave signal is sensitive to changes in the physical properties, roughness, and geometry of a crop caused by hailstorms. NPCRI, on the other hand, had a gradual response along with the series. Changes in crop color, detected by the multispectral signal, occur gradually, as relative pigment amounts are modified by the destruction of plant structures. The DSPVI change rates confirmed these changes in the first post-storm image, whereas NPCRI change rates showed alterations in the pre-storm image due to the linear regression applied to the S2 collection as a smoothing procedure. This resulted in the visualization of storm-induced changes before the storm date. Since the K-means method does not take into account image date for image processing, this effect did not alter the result of classifications.
Next, the pixels in the model plot data matrix were subjected to unsupervised classification into HDZs using the K-Means method. The in situ damage assessments confirmed the good performance of the algorithm for detecting HDZs (Figure 6c).
To the best of our knowledge, the k-means method has not been applied to define HDZs in crops, but this technique has been widely used to identify homogeneous zone for site-specific management in precision agriculture [49]. A management zone is a sub-region of a field with relatively homogeneous soil and landscape attributes and responds in a similar way to certain management practices. The problems posed for management zone delimitation are comparable to those addressed in our study to detect HDZs. In the field of site-specific management, several authors have obtained satisfactory results using K-means to delimit homogeneous management zones [50][51][52].

Algorithm Validation with In Situ Data
The results of algorithm application in each of the 91 plots for which in situ data were available are shown in Table 3. The p-values obtained with the one-way ANOVA test in each plot are shown in Table S1. In situ data of hail damage in corn showed that in 66.67% of plots (15 out of 5) there was significant evidence of differences in mean damage percentage between the three HDZs determined by the algorithm. For wheat, mean damage differed between HDZs in 78.13% of the plots (25 out of 32 analyzed), while in soybeans, damage differences between algorithm-determined HDZs were confirmed in 72.70% of plots (32 out of 44). To the best of our knowledge, the k-means method has not been applied to define HDZs in crops, but this technique has been widely used to identify homogeneous zone for site-specific management in precision agriculture [49]. A management zone is a subregion of a field with relatively homogeneous soil and landscape attributes and responds in a similar way to certain management practices. The problems posed for management  Table 3. Algorithm validation comparing the means of damage percentages obtained from in situ assessments for each HDZ. Plots with correct HDZ determination (those with significant differences between HDZs) and plots without significant differences between HDZs are shown in columns 2 and 3. Differences between HDZs were calculated using one-way ANOVA with a p-value < 0.05.

Crop
Plots The ANOVA test is frequently used to evaluate the performance of clustering methods [53] and previous researchers [49,54] have reported similar results in validating homogeneous management zones determination. Cluster validation by analysis of variance is demanding since the presence of outliers might hide significant differences between groups; however, the method is highly reliable when it identifies significant differences.
We have identified some particular situations in plots with no significant evidence of differences in degrees of crop damage between the three HDZs defined by the algorithm. In six plots, damage values were highly similar over the whole surface area. In such situations, the k-means algorithm is forced to perform artificial discrimination between pixels, since it works on the assumption of the existence of three different pixel clusters. Thus, no differences in mean damage values were found between HDZs, not due to a wrong classification but to the fact that all the plot pixels belonged to a single HDZ category. Also, no significant evidence of differences in crop damage between HDZs was found in 8 plots which presented highly heterogeneous vegetation conditions before the hailstorm. Plots with excess surface water during the growing season were a clear example of this anomaly. In this case, hail damage differences between HDZs were less conspicuous than the pre-storm crop differences.
If these 14 plots presenting particular situations are excluded, the overall result of algorithm validation amounts to 87.01%, making it very reliable for the detection of areas with similar degrees of damage. The tool is expected to increase the accuracy and fairness of damage adjustments for all the parties involved in the insurance contract: insurance and reinsurance companies, and policyholders.

Conclusions
In this research, we have proposed an algorithm for the automatic determination of crop damage by hail, using satellite data from Sentinel-1 and -2 missions. The algorithm selected the satellite data to be used based on the information recorded for each plot (sowing and harvest dates, storm date, and plot geometry), then performed a pre-processing step by applying morphological filters to microwave data, and linear regressions for atmospheric correction of multispectral data time series. The efficiency of data pre-processing was confirmed in the model plot. The next step consisted in calculating indices with sensitivity to hail damage: DPSVI for the microwave signal, and NPCRI for the multispectral signal. After setting the classification variables, a data matrix was constructed by calculating the change rates of index values to enhance the classifications. HDZ determination with the K-means method had excellent results in the model plot, in which in situ assessments were intensified. Overall validation showed significant evidence of differences in mean damage values between algorithm-determined HDZs in 87.01% of the plots analyzed. The proposed methodology provides transparency to HDZ determination and improves the scope of damage assessment since remote sensors allow sampling of entire plots. Thus, the method avoids the need to extrapolate the scarce in situ observations performed by insurance claims adjusters. Future extensions of the algorithm should adapt index selection to each particular crop and its phenological stages, and also include the different soil types of the region where the crop is grown.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/agronomy11102078/s1, Table S1: plot coordinates, crop, hailstorm dates, and p-values of classification validation with ANOVA for each plot.