Performance Evaluation of IMERG GPM Products during Tropical Storm Imelda

Abstract: Tropical Storm Imelda struck the southeast coastal regions of Texas from 17–19 September 2019 and delivered precipitation above 500 mm over about 6000 km². The performance of the three IMERG (Early-, Late-, and Final-run) GPM satellite-based precipitation products was evaluated against Stage-IV radar precipitation estimates. Basic and probabilistic statistical metrics, such as CC, RMSE, RBIAS, POD, FAR, CSI, and PSS, were employed to assess the performance of the IMERG products. The products captured the event adequately, with a fairly high POD value of 0.9. The best product (Early-run) showed an average correlation coefficient of 0.60. The algorithm used to produce the Final-run improved the quality of the data by removing systematic errors that occurred in the near-real-time products. An RMSE of less than 5 mm was found over about three-quarters (73% to 76%) of the area for all three IMERG products in estimating Tropical Storm Imelda. The Early-run product showed a much better RBIAS relative to the Final-run product. The overall performance was poor, as areas with an acceptable range of RBIAS (i.e., between −10% and 10%) in all three IMERG products were only 16% to 17% of the total area. Overall, the Early-run product was found to be better than the Late- and Final-run products.


Introduction
Since 1980, the U.S. has faced around 258 weather and climate disasters, each of which caused overall damage exceeding one billion dollars. The National Centers for Environmental Information (NCEI), a subsidiary of the National Oceanic and Atmospheric Administration (NOAA), records weather and climate events with significant economic and societal impacts in the United States [1]. The total estimated cost of these disasters exceeded 1.75 trillion dollars. During 2019 alone, the U.S. experienced 14 separate billion-dollar disasters, and 2019 was the fifth consecutive year (2015-2019) in recorded history with 10 or more individual billion-dollar disasters [1]. The natural disasters in 2019 included three major inland floods, eight severe storms, two tropical cyclones or depressions (Dorian and Imelda), and one major wildfire event, with an overall damage cost exceeding 14 billion dollars [1]. The state of Texas currently leads the nation in the total number of billion-dollar disasters since 1980, with a total estimated damage cost of around 250 billion dollars [2]. In recent history, Texas has experienced some of the nation's deadliest and most expensive natural disasters, such as the Memorial Day Flood (2015), Tax Day Flood (2016), Hurricane Harvey (2017), the Great June Flood (2018), and Tropical Storm Imelda (2019) [3]. The major drivers of these natural disasters were extreme precipitation and heavy downpours, which have left scientists and researchers looking for solutions to better model and predict such extreme scenarios. About three billion people, or nearly half of the world's population, live in coastal areas, and in the United States the southern coastal states like Louisiana, Florida, and Texas face recurring extreme [...] Integrated Multi-Satellite Retrievals for the Global Precipitation Measurement mission (IMERG) products in 2014, the use of satellite-based precipitation estimates in hydrologic applications has seen great interest from researchers.
Researchers have validated the IMERG products using ground-based observations (radar and rain gauges) and found an overall improvement in precipitation estimates over previous similar products, such as TRMM, PERSIANN, CMORPH, GSMAP, and MSWEP [34,35]. Stage-IV radar data have been widely used as a validation dataset to evaluate the performance of the IMERG products, as researchers aim to improve the algorithm of the IMERG products [36][37][38][39]. Data acquisition from the satellite-based measurements can be almost real-time, with only a 4-hr latency between capture and acquisition and 3.5 months latency for the availability of post-real-time research data. IMERG GPM precipitation product estimates overcome the limitations of both gauge and radar systems by warranting real-time data and global spatial coverage [29]. These products are less affected by localized weather or terrain conditions during weather events such as storms, tornadoes, and flooding, and by other technical issues, than measurements obtained with ground-based radar instruments. Although these products provide precipitation estimates with near-global spatial coverage, especially valuable for mountainous regions, the spatial resolution is still rather coarse [40][41][42][43]. Hence, many researchers have tried to evaluate and validate the performance of IMERG GPM precipitation estimates in different hydrological scenarios and applications [29]. Detailed analysis of the IMERG GPM precipitation estimates and quantification of the errors associated with these products will contribute towards both hydrology and climatology applications if IMERG is used as reference data for poorly instrumented or remote regions [44].
Although IMERG products provide almost global spatial coverage and high temporal resolution, earlier studies have shown unsatisfactory performance of their estimates in Northern China and the Indian subcontinent based on seasonal variations [45][46][47]. The inconsistency of the IMERG products' performance over various regions can be attributed to various sources of errors, which were investigated in multiple error modeling studies [48][49][50][51][52]. Owing to their limitations in terms of spatial resolution and performance inconsistencies, the need remains for the evaluation of IMERG GPM products at regional scales and under major precipitation and flood events, such as hurricanes and tropical storms [53].
Accurate precipitation estimates of major natural disasters are immensely important for many hydrological applications. Better prediction and modeling of hydrological events will provide key insights for environmental design and will offer opportunities to influence policy and planning for resilient infrastructure networks during crisis events [54][55][56][57][58][59]. Therefore, researchers need to assess the performance of the IMERG GPM products during severe natural disasters and evaluate their potential for filling crucial precipitation measurement gaps [40,47,48,60,61]. The objective of this study was to evaluate the performance of the IMERG GPM products at a local scale in southeast Texas, which is vulnerable to hurricane and storm events. The study also focused on assessing the accuracy of GPM satellite-based precipitation products during Tropical Storm Imelda, which made landfall on 17 September 2019. Section 2 provides a detailed description of the study area, the datasets used, and the methodology, while Section 3 elaborates on the results and discusses the findings. Section 4 presents the study conclusions.

Study Area
The current study area focuses on the southeast region of the state of Texas, the largest state in the contiguous United States, with three climatic regions. The regions differ from each other in hydrometeorological conditions, topography, and land cover. The western region is typically arid and dry, the central and eastern regions are under humid climatic conditions, and the southeastern region is predominantly wet, with subtropical weather conditions [15,62,63]. The southeast region is one of the most vulnerable regions in the United States to extreme precipitation events such as hurricanes, tropical storms, and flash floods. Tropical Storm Imelda struck the southeast coastal regions of Texas and caused massive flooding in the Houston and Galveston areas, extending along the I-10 corridor from Winnie eastward to Fannett, Beaumont, Vidor, and Orange, Texas. It was the second most expensive natural disaster in the contiguous United States in 2019, with a total cost of 3.5 billion dollars. Our current study area ranges from 93.4° to 106.7° West in longitude and 25.7° to 36.6° North in latitude. Within this study area, data from radar and GPM IMERG products were used for precipitation estimate analysis. Figure 1 shows the track of the storm as it made landfall in Freeport, TX on 17 September 2019 and continued inland over the following days.

Precipitation Data
The IMERG GPM data (algorithm version V06B) were downloaded from the Precipitation Measurement Missions website (http://pmm.nasa.gov/data-access/downloads/gpm/ accessed on 22 March 2020). IMERG is the unified U.S. algorithm that provides the multi-satellite precipitation product for the U.S. GPM team. IMERG GPM is available in three products, namely Early, Late, and Final. All of the IMERG products have a temporal resolution of half an hour and a spatial resolution of 0.1° × 0.1°. The Early-run product is produced with the first run of the IMERG algorithm, which has a latency of about 4 h from the observation time. Another product, called Late-run, is produced when the algorithm is run for the second time, with more information from the satellites, at a latency of about 14 h. The Final-run product is developed when the monthly gauge analysis data are obtained, and it has a latency of about 3.5 months from the observation time. The baseline for the post-real-time Final-run half-hour estimates is calibrated so that they sum to the monthly satellite-gauge Final-run combination. This product is developed mainly for research purposes. The Final-run product is adjusted with a gauge network, which makes it a more accurate product relative to the Early- and Late-run products. Since the radar products have an hourly temporal resolution, the IMERG products used in this study were also converted to an hourly format at the end of each hour.
The National Weather Service/National Centers for Environmental Prediction (NWS/NCEP) Stage-IV Quantitative Precipitation Estimates (QPEs) gridded radar-rainfall product (http://data.eol.ucar.edu/ accessed on 22 March 2020) serves as the source of radar precipitation measurements. Operational production of the Stage-IV radar precipitation data started in 2001 at the NWS River Forecast Centers (RFCs) and has continued since then [64]. The product contains bias-adjusted, quality-controlled hourly rain-gauge precipitation records with a 4 × 4-km spatial resolution. The measurements are in a cumulative hourly format at the end of each hour.
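The conversion of IMERG estimates to the hourly format of the radar product can be illustrated with a minimal sketch. It assumes, for illustration only, that each half-hourly IMERG value is a rain rate in mm hr⁻¹ valid for 30 minutes; the function name and the sample values are hypothetical and are not taken from the study.

```python
def half_hourly_rates_to_hourly_totals(rates_mm_per_hr):
    """Convert consecutive half-hourly rain rates (mm/hr) to hourly totals (mm).

    Assumption: every value is a rate that applies to one 30-minute window,
    so the depth it contributes is rate * 0.5 h.
    """
    if len(rates_mm_per_hr) % 2 != 0:
        raise ValueError("expected an even number of half-hourly values")
    totals = []
    for i in range(0, len(rates_mm_per_hr), 2):
        first, second = rates_mm_per_hr[i], rates_mm_per_hr[i + 1]
        totals.append(0.5 * first + 0.5 * second)
    return totals


# Hypothetical rates for two hours (four half-hour windows).
rates = [2.0, 4.0, 0.0, 1.0]
hourly = half_hourly_rates_to_hourly_totals(rates)
print(hourly)
```

If the source files store accumulations rather than rates, the two half-hour values would simply be summed instead.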

Methodology
Tropical Storm Imelda predominantly affected the southeastern counties of Texas. To better track the devastation of the event, this study focuses on the precipitation downpour around the Houston and Galveston areas. IMERG products at Early-, Late-, and Final-run stages of the event were compared to the NCEP Stage-IV gridded radar product at an hourly temporal resolution for eight consecutive days, starting from 14 September and ending on 21 September, during Tropical Storm Imelda. Three days before and after landfall (17 September 2019) were included in the analysis to reduce the impact of the presence of the extreme storm event on the probabilistic statistical indices.
Euclidean distance was used to identify the nearest radar grid to each satellite grid. Because the grid spacing of the satellite product is about three times larger than that of the radar product, a nearest radar grid could be found for every satellite grid. Subsequently, statistical metrics were calculated to compare the precipitation estimates of the IMERG grids and their nearest radar grids at an hourly temporal resolution. There are 40,434 NWS/NCEP radar grids versus 6577 IMERG satellite grids in Texas. However, only the 2570 Stage-IV radar grids and 418 IMERG grids that cover the analysis area, which spans 17 counties in southeast Texas, were used.
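The grid-matching step can be sketched as a brute-force nearest-neighbor search. This is only an illustration: it assumes that the Euclidean distance is taken between cell centers expressed in degrees of longitude and latitude, and all coordinates and names below are hypothetical.

```python
import math


def nearest_radar_index(sat_centre, radar_centres):
    """Return the index of the radar cell centre closest to sat_centre.

    Distances are plain Euclidean distances in (lon, lat) degrees, which is
    an assumption of this sketch rather than a detail given in the study.
    """
    best_index, best_dist = -1, math.inf
    for i, (lon, lat) in enumerate(radar_centres):
        dist = math.hypot(lon - sat_centre[0], lat - sat_centre[1])
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


# Hypothetical cell centres (lon, lat) near the upper Texas coast.
radar_centres = [(-95.00, 29.50), (-95.04, 29.52), (-95.00, 29.54), (-94.96, 29.46)]
sat_centres = [(-95.05, 29.55), (-94.95, 29.45)]

matches = [nearest_radar_index(s, radar_centres) for s in sat_centres]
print(matches)
```

For the few hundred satellite grids used here a linear scan is sufficient; a spatial index would only matter for much larger domains.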

Statistical Indices
The IMERG products were evaluated using two types of statistical comparative metrics to assess their performance (Table 1), using the radar product as reference. The basic statistical indices included Pearson's correlation coefficient (CC), relative bias (RBIAS), and root mean squared error (RMSE). These statistical parameters were calculated to examine the consistency of the satellite data with the precipitation radar data. The probabilistic statistical indices, namely probability of (rainfall) detection (POD), false alarm ratio (FAR), critical success index (CSI), and Peirce skill score (PSS), were calculated to obtain information about the probability and accuracy of precipitation detection by the IMERG products compared to the reference radar product. Probabilistic statistical indices are widely used to assess the probabilistic quality of satellite products in this category [9,15]. POD evaluates the probability that IMERG detects rainfall when the radar data report precipitation. The FAR index quantifies how often IMERG falsely detected rainfall while the radar product reported none. The CSI aggregates all reports captured by both satellite and radar. It provides a more balanced view of the precipitation reporting of both radar and satellite products over the study area, while also providing more insight into the satellite data at various spatiotemporal resolutions [15]. The system's overall accuracy in capturing rainfall events was indicated by PSS, which is the difference between the probability of detection and the probability of false detection. The Kling-Gupta model efficiency coefficient is another powerful statistical metric that measures the goodness-of-fit of a model. Gupta et al. [65] developed this method as an alternative to the mean squared error (MSE) and Nash-Sutcliffe efficiency (NSE) in the context of hydrological modeling.
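The indices named above can be sketched in a few lines. The formulas below are the commonly used textbook definitions and are assumed, not confirmed, to match Table 1; the rain/no-rain threshold of 0.1 mm hr⁻¹, the function names, and the sample series are hypothetical. The Kling-Gupta variant shown is the modified form that uses the ratio of coefficients of variation.

```python
import math


def mean(values):
    return sum(values) / len(values)


def std(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def basic_indices(sat, rad):
    """Return (CC, RBIAS in %, RMSE) with the radar series as reference."""
    n = len(sat)
    ms, mr = mean(sat), mean(rad)
    cov = sum((s - ms) * (r - mr) for s, r in zip(sat, rad)) / n
    cc = cov / (std(sat) * std(rad))  # undefined if either series is constant
    rbias = 100.0 * sum(s - r for s, r in zip(sat, rad)) / sum(rad)
    rmse = math.sqrt(sum((s - r) ** 2 for s, r in zip(sat, rad)) / n)
    return cc, rbias, rmse


def kge_modified(sat, rad):
    """Modified Kling-Gupta efficiency: correlation, bias ratio, CV ratio."""
    cc, _, _ = basic_indices(sat, rad)
    beta = mean(sat) / mean(rad)
    gamma = (std(sat) / mean(sat)) / (std(rad) / mean(rad))
    return 1.0 - math.sqrt((cc - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)


def detection_indices(sat, rad, threshold=0.1):
    """Return (POD, FAR, CSI, PSS) from hourly rain/no-rain counts."""
    c_sr = c_rms = c_smr = m_sr = 0
    for s, r in zip(sat, rad):
        sat_wet, rad_wet = s >= threshold, r >= threshold
        if sat_wet and rad_wet:
            c_sr += 1    # rain in both satellite and radar
        elif rad_wet:
            c_rms += 1   # rain in radar, missed by satellite
        elif sat_wet:
            c_smr += 1   # rain in satellite, absent in radar
        else:
            m_sr += 1    # no rain in either product
    pod = c_sr / (c_sr + c_rms)
    far = c_smr / (c_sr + c_smr)
    csi = c_sr / (c_sr + c_rms + c_smr)
    pofd = c_smr / (c_smr + m_sr)  # probability of false detection
    return pod, far, csi, pod - pofd


# Hypothetical hourly series (mm) for one satellite grid and its radar grid.
sat = [0.0, 2.0, 5.0, 0.0, 1.0, 0.0]
rad = [0.0, 4.0, 6.0, 1.0, 0.0, 0.0]
cc, rbias, rmse = basic_indices(sat, rad)
kge = kge_modified(sat, rad)
pod, far, csi, pss = detection_indices(sat, rad)
print(round(rbias, 1), round(rmse, 2), round(pod, 2), round(csi, 2))
```

The sketch does not guard against empty categories (for example, a grid with no radar rainfall), which would need explicit handling in practice.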
The Kling-Gupta efficiency has three main components: correlation, bias, and variability. In this study, the updated version of Kling et al. [66] was used to ensure that the variability and bias components were not cross-correlated. This means the coefficient of variation was used as the measure of variability instead of the standard deviation. Table 1 lists all the parameters with their formulae, ranges, and perfect values.
[Table 1: statistical indices with their formulae, ranges, and perfect values; rows include Correlation and Probability of detection (POD).]
Table 1 notes: (1) $n$: sample size; $\mathrm{Sat}_n$: IMERG rainfall estimate; $\mathrm{Rad}_n$: radar rainfall estimate; $\overline{\mathrm{Sat}}$: IMERG average rainfall estimate; $\overline{\mathrm{Rad}}$: radar average rainfall estimate; $SD_{Sat}$: standard deviation of IMERG rainfall estimates; $SD_{Rad}$: standard deviation of radar rainfall estimates; $C_{SR}$: number of detected rainfall hours recorded by both GPM satellite and radar; $C_{RMS}$: number of detected rainfall hours recorded by radar but absent in GPM satellite estimates; $C_{SMR}$: number of detected rainfall hours recorded by GPM satellite but absent in radar estimates; $M_{SR}$: number of rainfall hours missed by both IMERG and radar products (no estimate). (2) In this study, the negative correlation coefficients were also examined to better understand the discrepancies.
Figure 2 depicts the spatial distribution of the cumulative precipitation captured by both the radar and GPM IMERG products during Tropical Storm Imelda. Cumulative rainfall estimates show variability in spatial patterns between the Stage-IV gridded product and the three IMERG satellite products for the same grids during the study period (Figure 2). Both the Early- and Late-run IMERG products showed similar spatial patterns, with the coastal and inland areas of the southeast coastal region receiving a high amount of precipitation during the study period (7 days). These products showed the extent of the hardest-hit areas (areas receiving more than 250 mm) as ~15,000 km² and ~14,000 km² for the Early- and Late-run, respectively.
The Final-run product showed a different pattern, with a more extensive distribution of the tropical storm: an area of ~21,500 km² received more than 250 mm during the seven-day event (Figure 2C). The Stage-IV radar product (Figure 2D) showed a different spatial distribution of cumulative rainfall, with two hot spots along the southeast coastal region. However, when considering the extreme peak, which is about the limit of the NWS rain gauges (~500 mm), the pattern reverses. The Early- and Late-run products reported areas of about 3700 km² and 3500 km², respectively, as having received more than 500 mm of precipitation, while the Final-run showed only a 900 km² area with precipitation higher than 500 mm. The most extreme rainfall area was observed in the Stage-IV radar product, with almost 6000 km² receiving more than 500 mm.
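Areas such as those quoted above can be obtained by summing the area of every grid cell whose event total exceeds a threshold. The sketch below is a hedged illustration: the 111.32 km length of one degree, the cosine-latitude approximation for the cell area, and the sample totals are assumptions, not values from the study.

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude (assumption)


def approx_cell_area_km2(lat_deg, res_deg=0.1):
    """Approximate area of a res_deg x res_deg cell centred at lat_deg."""
    north_south_km = res_deg * KM_PER_DEGREE
    east_west_km = res_deg * KM_PER_DEGREE * math.cos(math.radians(lat_deg))
    return north_south_km * east_west_km


def area_above(totals_mm, cell_areas_km2, threshold_mm):
    """Sum the areas of all cells whose event total exceeds threshold_mm."""
    return sum(a for t, a in zip(totals_mm, cell_areas_km2) if t > threshold_mm)


# Hypothetical event totals (mm) for five cells, each assigned 100 km2
# so that the result is easy to check by hand.
totals = [120.0, 260.0, 510.0, 740.0, 90.0]
areas = [100.0] * len(totals)
area_250 = area_above(totals, areas, 250.0)  # three cells exceed 250 mm
area_500 = area_above(totals, areas, 500.0)  # two cells exceed 500 mm
print(area_250, area_500, round(approx_cell_area_km2(30.0), 1))
```

With real data, `approx_cell_area_km2` (or an exact geodesic area) would supply one area per cell instead of the constant used here.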
When it comes to the maximum grid totals recorded by the products, all three IMERG products reported similar totals, ranging from 740 mm to 775 mm. However, the maximum precipitation from the radar product, with a peak grid value of 1150 mm, was about 1.5 times that of the IMERG products. The maps in Figure 2 show that the IMERG satellite products generally overestimated the precipitation in places that received lower precipitation but underestimated the storm in places where there was very high precipitation. The satellite products also did not detect precipitation in some coastal areas where the radar product recorded values. All products successfully recorded the center of the storm near the southeast side of the Houston area. Overall, the radar product estimates were more concentrated over the storm center, and the satellite products covered a larger spatial extent of the southeast Texas region (Houston, Galveston, and Beaumont areas).
Figure 3 compares the IMERG Final-run and Stage-IV radar products day by day. On the day of landfall (17 September), the Final-run and Stage-IV products showed a similar pattern. On the next day (18 September), the two products reported different spatial distributions of the storm.
The satellite product missed the hotspot in the southern part of the region, as shown in Figure 3. On the day of the heavy precipitation event (19 September), the satellite-based product failed to capture the intensity of the tropical storm, with a relatively smaller area (4200 km²) receiving more than 250 mm. The radar product, however, estimated almost twice that area (7700 km²) to have received more than 250 mm of precipitation on that day. Moreover, the maximum daily rainfall observed on that day by radar was 2.3 times the maximum daily rainfall observed by IMERG, which was 380 mm. Lastly, the dissipation of the tropical storm was observed on 20 September in both products, with quite different spatial patterns.
A time series plot of the area-averaged precipitation (Figure 4) shows the temporal evolution of the tropical storm at a higher temporal resolution. The time series showed that the precipitation during Tropical Storm Imelda had a bimodal distribution. The first peak was reached at about 11:00 AM UTC on 18 September. The second and largest peak came more than 24 h later, at 2:00 PM UTC on 19 September (Figure 4). The second peak was captured by all the products with reasonable accuracy in its rise and fall. However, the first peak was overestimated significantly by the IMERG Final-run product, with an average of 1.5 mm hr⁻¹ over the study area, which covers around 41,100 km². This is equivalent to approximately 62 million cubic meters of water in an hour spread across the entire study area.
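The volume quoted for the first peak follows from a unit conversion of the area-averaged depth: 1.5 mm over 41,100 km² in one hour. A minimal sketch of that arithmetic, with a hypothetical function name, is:

```python
def rain_volume_m3(depth_mm, area_km2):
    """Volume of water (m3) for a uniform rainfall depth over an area."""
    depth_m = depth_mm / 1000.0        # mm -> m
    area_m2 = area_km2 * 1_000_000.0   # km2 -> m2
    return depth_m * area_m2


# One hour at an area-averaged rate of 1.5 mm/hr over about 41,100 km2.
volume = rain_volume_m3(1.5, 41_100)
print(f"{volume / 1e6:.2f} million m3")
```

The result, about 61.65 million cubic meters, rounds to the 62 million cubic meters stated in the text.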
The Early-run product showed the best fit among the IMERG products, matching both of the peaks of the radar product. The Late-run also showed a similar underestimation in both of the peaks. However, the highest correlation coefficient of the time series was found between the radar and the Late-run, with a value of 0.98. The other two products also showed high correlation coefficients of 0.97 and 0.96 for the Early- and Final-run, respectively. A recent study conducted over the Sichuan Basin of China revealed that the precipitation deviation of the Final-run products was mainly observed during moderate precipitation events (1-10 mm hr⁻¹) [67]. Final-run products showed better detection than Early- and Late-run products for light precipitation (<1 mm hr⁻¹) events [67]. This might explain the slight underestimations on 17 September and 18 September, when precipitation was lower than 1 mm hr⁻¹, and the overestimations on the following days, when precipitation ranged between 1.25 and 10 mm hr⁻¹.
Figure 5 shows the CC spatial patterns between Stage-IV radar data and each of the three IMERG-GPM satellite products during Tropical Storm Imelda. The CC spatial pattern shows the consistency of the IMERG-GPM products in more detail. As there is high spatial variability during extreme events, the CC pattern helps to evaluate the performance of the satellite products and illustrate their characteristics. The spatial patterns of CC show a higher consistency in the coastal areas and in the places hit hardest by the storm (Figure 5). The average CC was almost the same for the three IMERG products, at 0.574, 0.600, and 0.595 for the Early-, Late-, and Final-run products, respectively. The spatial distribution also shows similar behavior for all of the IMERG products. Higher CC values were observed for all comparisons over the counties of Jefferson, Liberty, and Montgomery in the northeastern part of the study area (Figure 5). Generally, coastal areas with a high amount of precipitation had relatively higher correlation coefficients than the areas further inland.
Two-thirds of the study area (above 67%) showed a CC higher than 0.5 for all IMERG products. Though the Late-run product showed the highest consistency of CC throughout the study area, the Final-run product showed a better consistency in the regions near the center of the storm. An earlier study [53] on the same region during Hurricane Harvey found that the complex structure of the hurricane might have significantly affected the performance of the satellite products near its center.
Figure 6 shows scatterplots of the IMERG Early-, Late-, and Final-run satellite products against the Stage-IV radar product during Tropical Storm Imelda. The scatterplots help to visually inspect the alignment of the individual pixels of the two products during the event. Similar to the CC values, all products followed similar patterns, with almost the same linear model fit (R²) and minor variations in the slopes (0.46, 0.43, and 0.44 for the Early-, Late-, and Final-run, respectively). All the slopes were found to be significantly different from zero at a 0.01 significance level. There were significant differences between the radar estimates and those of the satellite at the pixel scale. Moreover, overestimation of light precipitation and underestimation of heavy precipitation are evident from the scatterplots. However, there was a clear improvement in the quality of the data from the Early-run to the Final-run. In Figure 6, the systematic error is seen as an apparent ceiling in the top part of the point cloud (Figure 6A). That ceiling is much lower in the Late-run and disappears in the Final-run. The coefficient of determination (R²), with a value around 0.4, shows a poor fit for the IMERG products. The data visualization and analysis in Figure 6 show that the IMERG algorithm needs further improvements to minimize systematic errors in the Early- and Late-run products.
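The slope and R² reported for the scatterplots can be obtained from an ordinary least-squares fit. The sketch below assumes, for illustration, that the radar estimate is on the x-axis and the satellite estimate on the y-axis; the data are hypothetical and deliberately noise-free.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, plus R2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r2


# Hypothetical pixel pairs (mm): radar as reference, satellite as estimate.
radar = [0.0, 1.0, 2.0, 3.0, 4.0]
satellite = [1.0, 1.5, 2.0, 2.5, 3.0]
slope, intercept, r2 = linear_fit(radar, satellite)
print(slope, intercept, r2)
```

A slope below one combined with a positive intercept, as in this toy example, is the signature of the behavior described in the text: light rainfall is overestimated and heavy rainfall is underestimated.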

Basic Statistical Indices
[53] on the same region during Hurricane Harvey found that the complex structure of the hurricane might have significantly affected the performance of the satellite products near its center. Figure 6 shows scatterplots of the IMERG Early-, Late-, and Final-run satellite products against the Stage-IV radar product during Tropical Storm Imelda. The scatterplots allow visual inspection of how well the individual pixels of the two products align during the event. As with the CC values, all products followed similar patterns, with almost the same linear model fit (R²) and minor variations in the slopes (0.46, 0.43, and 0.44 for the Early-, Late-, and Final-run, respectively). All the slopes were significantly different from zero at the 0.01 significance level. There were significant differences between the radar and satellite estimates at the pixel scale. Moreover, overestimation of light precipitation and underestimation of heavy precipitation are evident from the scatterplots. However, there was a clear improvement in the quality of the data from the Early-run to the Final-run. In Figure 6, the systematic error appears as a ceiling in the upper part of the point cloud (Figure 6A). That ceiling is much lower in the Late-run and disappears in the Final-run. The coefficient of determination (R²), with a value of around 0.4, indicates a poor fit for the IMERG products. The data visualization and analysis in Figure 6 show that the IMERG algorithm needs further improvement to minimize systematic errors in the Early- and Late-run products.
The spatial distribution of the RMSE (Figure 7) shows a nearly identical pattern in all the IMERG products. The areas most affected by Tropical Storm Imelda (the southeast coastal regions stretching up to the border with Louisiana) show a higher RMSE, with two hot spots in all three products. The RMSE values suggest that the products have improved in minimizing the average error compared with a similar study conducted earlier in the same area [53]. The mean and median RMSE were 3.74 mm and 3.67 mm, 3.88 mm and 3.27 mm, and 3.15 mm and 3.42 mm for the Early-, Late-, and Final-run, respectively. In all three IMERG products, about three-quarters (73% to 76%) of the area experienced an RMSE of less than 5 mm in estimating Tropical Storm Imelda. Cui et al. [44] compared IMERG products with ground-based observations over the central and eastern United States and found that the IMERG products showed the largest RMSE values during the summer and fall months, caused by severe underestimation. They attributed this error to dry biases of the IMERG algorithm in estimating precipitation from convective systems.
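The pixel-scale quantities discussed above (CC, regression slope, R², and RMSE) can be illustrated with a minimal sketch. It assumes the standard textbook definitions, assumes that the radar reference is placed on the x-axis of the scatterplot, and uses made-up values; the function names and the sample series are hypothetical and are not taken from the study.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def pearson_cc(sat, ref):
    """Pearson correlation coefficient between satellite and reference series."""
    ms, mr = _mean(sat), _mean(ref)
    cov = sum((s - ms) * (r - mr) for s, r in zip(sat, ref))
    var_s = sum((s - ms) ** 2 for s in sat)
    var_r = sum((r - mr) ** 2 for r in ref)
    return cov / math.sqrt(var_s * var_r)


def ols_slope(sat, ref):
    """Slope b of the least-squares line sat = a + b * ref (reference on the x-axis)."""
    ms, mr = _mean(sat), _mean(ref)
    cov = sum((s - ms) * (r - mr) for s, r in zip(sat, ref))
    var_r = sum((r - mr) ** 2 for r in ref)
    return cov / var_r


def rmse(sat, ref):
    """Root mean square error of the satellite estimates against the reference."""
    return math.sqrt(sum((s - r) ** 2 for s, r in zip(sat, ref)) / len(sat))


# Hypothetical hourly precipitation (mm) for one pixel; not data from the study.
radar = [0.0, 2.0, 4.0, 6.0, 8.0]
satellite = [1.0, 2.0, 3.0, 4.0, 5.0]

cc = pearson_cc(satellite, radar)
slope = ols_slope(satellite, radar)
r_squared = cc ** 2  # for a one-predictor least-squares fit, R² equals CC²
error = rmse(satellite, radar)
```

In this toy series the satellite values are too high for light rain and too low for heavy rain, so the slope falls below one and the RMSE is nonzero even though the correlation is perfect. This is why the slope and RMSE are worth reading alongside the CC.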
Figure 8 illustrates the spatial variability of the relative bias (RBIAS) over the study area during Tropical Storm Imelda. The spatial distribution of the RBIAS shows similar patterns for the near-real-time IMERG products (Early and Late). However, the spatial pattern was different for the Final-run product, which has a latency of about 3.5 months (Figure 8). The inland areas show a much higher positive RBIAS (overestimation) in the Final-run product than in the two earlier products, especially in counties such as San Jacinto, Polk, and Fort Bend, where the positive RBIAS exceeded 75%. This might be attributed to the tendency of IMERG Final-run products to overestimate precipitation in inland areas during extreme storm events [68,69]. With the Early- and Late-run products, the coastal areas show a higher positive RBIAS and the inland areas show a lower negative RBIAS. Overall, the Early- and Late-run products underestimated the tropical storm, with mean RBIAS values of −4% and −11% and medians of −15% and −18%, respectively. By contrast, the Final-run product had a mean RBIAS of 41% and a median RBIAS of 20%, which suggests that it overestimated the tropical storm by a large margin. However, the spatial distribution of the RBIAS (Figure 8C) makes it clear that the Final-run product significantly overestimated precipitation in the less-hit areas while underestimating it in the hard-hit areas. For all three products, about 50% of the area experienced an RBIAS between −30% and 30%. Hence, the IMERG Final-run product did not perform as well as the Early- and Late-run products in capturing the storm event. In more than 30% of the study area, the Final-run product overestimated the storm event by more than 50%, whereas the Late- and Early-run products did so in 8% and 12% of the study area, respectively. This suggests a significant overestimation problem in the Final-run product, especially on the outskirts of the study area (Figure 8C). The areas with an acceptable range of RBIAS (i.e., between −10% and 10%) in all three IMERG products were about 16% to 17% of the total area. This outcome is consistent with earlier studies, in which GPM products underestimated precipitation during extreme events [35,53,68,69]. Table 2 summarizes the performance of the three IMERG-GPM products based on basic statistical indices for the study period.
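As a rough illustration of how a per-pixel RBIAS and the "share of area within a range" figures quoted above could be computed, the following sketch assumes the common definition RBIAS = 100 × Σ(S − R) / Σ(R), with S the satellite estimate and R the radar reference, and treats every pixel as having equal area. The function names and all sample values are hypothetical and are not results from the study.

```python
def rbias_percent(sat, ref):
    """Relative bias in percent; NaN when the reference total is zero."""
    total_ref = sum(ref)
    if total_ref == 0:
        return float("nan")
    return 100.0 * sum(s - r for s, r in zip(sat, ref)) / total_ref


def area_fraction_within(values, low, high):
    """Fraction of valid (non-NaN) pixels whose value lies in [low, high]."""
    valid = [v for v in values if v == v]  # NaN is the only value not equal to itself
    inside = sum(1 for v in valid if low <= v <= high)
    return inside / len(valid)


# Hypothetical two-hour series (mm) for one pixel: a miss in the heavy hour
# and a small excess in the light hour.
pixel_bias = rbias_percent(sat=[6.0, 3.0], ref=[10.0, 2.0])

# Hypothetical per-pixel RBIAS values (%) over a small grid.
rbias_map = [-15.0, -4.0, 8.0, 41.0, 80.0, -35.0, 5.0, 20.0]
share_acceptable = area_fraction_within(rbias_map, -10.0, 10.0)
share_moderate = area_fraction_within(rbias_map, -30.0, 30.0)
```

Because positive and negative errors partly cancel in a sum, a single mean RBIAS can hide strong location-dependent over- and underestimation, which is why the area fractions are reported alongside the mean and median.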
The Kling-Gupta model efficiency coefficient (KGE) shows a spatial distribution pattern similar to the other basic statistical indices above. The Early and Late products outperformed the Final product, especially in the places that were less affected by the storm (low precipitation), as shown in Figure 9. The average KGE values show that the Late product was the best at capturing the event, with an average of 0.39, followed by the Early and Final products, with KGE values of 0.36 and 0.16, respectively. Around one-third of the study area showed a KGE of more than 0.5 in the Early and Late products, compared with only one-quarter in the Final product.
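For readers unfamiliar with the KGE, a minimal sketch follows. It assumes the original 2009 formulation, KGE = 1 − sqrt((r − 1)² + (α − 1)² + (β − 1)²), where r is the linear correlation, α the ratio of standard deviations, and β the ratio of means (satellite over reference). Whether the study used this form or a later variant is not stated in this section, so the choice is an assumption, and the sample values are hypothetical.

```python
import math


def kge(sat, ref):
    """Kling-Gupta efficiency (2009 form) of satellite estimates against a reference."""
    n = len(sat)
    mean_s = sum(sat) / n
    mean_r = sum(ref) / n
    std_s = math.sqrt(sum((s - mean_s) ** 2 for s in sat) / n)
    std_r = math.sqrt(sum((r - mean_r) ** 2 for r in ref) / n)
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(sat, ref)) / n

    r_corr = cov / (std_s * std_r)  # correlation component
    alpha = std_s / std_r           # variability ratio
    beta = mean_s / mean_r          # bias ratio
    return 1.0 - math.sqrt((r_corr - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical hourly precipitation (mm); not data from the study.
radar = [2.0, 4.0, 6.0, 8.0]
halved = [1.0, 2.0, 3.0, 4.0]  # perfectly correlated but half the magnitude

kge_perfect = kge(radar, radar)
kge_halved = kge(halved, radar)
```

A perfect match gives a KGE of 1. The halved series keeps a perfect correlation but is penalized on both the variability and the bias terms, which shows how a product can correlate well with the radar and still score a low KGE.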

Probabilistic Statistical Indices
In the POD analysis (Figure 10), the Final-run had the best results, with the highest POD values of the three GPM products. The spatial distribution of the POD in all three products reveals that, as expected, the POD was higher in counties that registered higher precipitation. The average POD for all three products was around 0.90, with a higher median of 0.92. For the Final-run product, the POD was greater than 0.80 in about 95% of the study area. However, the short study period and small study area (8 days with 192 observations) could have skewed the POD values. Moreover, the timeframe of the study covers an extreme event that can be easily detected by both radar and satellite products. Hence, to normalize the presence of the extreme event, three days before landfall and three days after dissipation were included in the analysis.
In Figure 11, the false alarm ratio (FAR) followed similar patterns to those of the POD. Lower FAR values were registered in the areas heavily affected by the storm, and higher FAR values were seen on the periphery of the study area. All three IMERG products showed a similar spatial pattern. The Early- and Late-run products showed a relatively better FAR, with an average value of 0.40, whereas the Final-run product registered a FAR of 0.44. The Early-run product showed its advantage in FAR by registering a lower median, of about 0.35, compared with 0.38 and 0.42 in the Late- and Final-run products, respectively. Nearly two-thirds (67%) of the study area showed a FAR of less than 0.4 for the Early-run product, whereas only 40% of the area did so for the Final-run. Generally, the FAR of all the IMERG products was very high, especially considering that the analysis was performed for a short period (8 days) and under extreme event conditions.
The spatial distribution of the critical success index (CSI) again indicated that the near-real-time IMERG products (Early and Late) outperformed the Final-run, especially in the areas where the storm was most severe (Figure 12). The Early-run product showed the best performance in terms of CSI, with an average of 0.57 and a median of 0.60. The Final-run product had the lowest CSI, with an average of 0.53 and a median of 0.55. To put this into perspective, about 20% of the study area showed a CSI of more than 0.7 in the Early-run product, but only 5% of the area recorded a similar CSI in the Final-run product.
The Peirce skill score (PSS) provides insight into the ability of the satellite products to separate actual occurrences of precipitation from no-occurrence instances. PSS shows the overall robustness of the system in accurately predicting the tropical storm event. The Early- and Late-run products again outperformed the Final-run in the PSS analysis (Figure 13), in a similar way as for FAR and CSI. The spatial distribution of PSS showed values of more than 0.70 for the Early and Late products; however, only 36% of the area had a PSS higher than 0.70 in the Final-run product. Table 3 summarizes the probabilistic statistical indices of the performance of the Early-, Late-, and Final-run products. A comparison of the summarized data with an earlier study [44]
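The four probabilistic indices in this section (POD, FAR, CSI, and PSS) all derive from the same rain/no-rain contingency table, which the sketch below builds for a single pixel. It assumes the standard definitions POD = H/(H + M), FAR = F/(H + F), CSI = H/(H + M + F), and PSS = H/(H + M) − F/(F + C), where H, M, F, and C are hits, misses, false alarms, and correct negatives. The 0.1 mm rain/no-rain threshold, the function names, and the sample series are assumptions for illustration only and are not taken from the study.

```python
def contingency(sat, ref, threshold=0.1):
    """Count hits, misses, false alarms, and correct negatives.

    `threshold` (mm) is a hypothetical rain/no-rain cutoff.
    """
    hits = misses = false_alarms = correct_negatives = 0
    for s, r in zip(sat, ref):
        sat_rain = s >= threshold
        ref_rain = r >= threshold
        if sat_rain and ref_rain:
            hits += 1
        elif ref_rain:
            misses += 1
        elif sat_rain:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def _ratio(num, den):
    """Safe division that returns NaN for an empty category."""
    return num / den if den else float("nan")


def categorical_scores(hits, misses, false_alarms, correct_negatives):
    """POD, FAR, CSI, and PSS from the contingency counts."""
    pod = _ratio(hits, hits + misses)
    far = _ratio(false_alarms, hits + false_alarms)
    csi = _ratio(hits, hits + misses + false_alarms)
    pofd = _ratio(false_alarms, false_alarms + correct_negatives)
    return {"POD": pod, "FAR": far, "CSI": csi, "PSS": pod - pofd}


# Hypothetical hourly precipitation (mm) for one pixel; not data from the study.
radar = [0.0, 0.0, 0.0, 0.0, 1.2, 3.0, 5.5, 0.4, 2.0, 0.0]
satellite = [0.0, 0.3, 0.0, 0.0, 0.8, 2.1, 6.0, 0.0, 1.5, 0.2]

counts = contingency(satellite, radar)
scores = categorical_scores(*counts)
```

In this toy series the POD is high (0.8) while the FAR is also substantial (about 0.33), a combination similar in kind to the one reported above. CSI and PSS penalize the false alarms that the POD alone ignores.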

Conclusions
This study focused on the performance of the GPM IMERG Early-, Late-, and Final-run satellite precipitation products during Tropical Storm Imelda. Categorized as a tropical storm at midday on 17 September by the National Hurricane Center (NHC), Imelda made landfall with massive downpours during 17-19 September 2019. The spatial distribution and temporal evolution of the storm during 17-20 September showed two major precipitation peaks: one on 18 September and one on 19 September.
The quality of the IMERG products at hourly temporal and 0.1° × 0.1° spatial resolution during the storm was evaluated using Stage-IV radar measurements as a reference. The hourly Stage-IV radar product was calibrated and adjusted using rain gauge networks in the surrounding study area. The timeframe of the data was extended to include three days before landfall and three days after dissipation to better capture the temporal and spatial variability of the event.
Generally, areas that received more precipitation had relatively good statistical metrics in all three IMERG GPM products. Maps of the CC showed that the agreement between IMERG and radar was not as good as other studies have shown, with an average of 0.60 for the best product [53,62,70]. Moreover, the coefficient of determination (R²) for all three IMERG products was just 0.4. In terms of RMSE, about three-quarters (73% to 76%) of the area experienced an RMSE of less than 5 mm in all three IMERG products. The IMERG GPM Final-run product showed a large overestimation of the storm, which was confirmed by the basic statistical indices. The spatial distribution of the RBIAS indicated similar patterns in the near-real-time IMERG products (Early and Late). However, the gauge-adjusted IMERG product (Final-run) showed very high overestimation and underestimation across the affected region, varying by location (inland and coastal areas). Significant underestimation was found in the hard-hit coastal areas, whereas significant overestimation was recorded in the relatively less-hit inland areas. This reflects a known weakness of satellite-based precipitation products: it has long been reported in the literature that the IMERG products underestimate precipitation during extreme (heavy) events and overestimate average to low events [70][71][72][73][74][75]. It is also notable that, during the summer and fall months, IMERG products have been found to show dry biases for convective systems [44]. The Early-run showed a much better RBIAS than the Final-run product. However, the overall performance of the IMERG products was poor, because areas with an acceptable range of RBIAS (i.e., between −10% and 10%) in all three IMERG products were only about 16% to 17% of the total area.
Probabilistic statistical indices showed that all three products were successful in capturing the precipitation event with adequate accuracy (average POD ≈ 0.9). The FAR results were weaker, with the Early- and Late-run products having almost 67% of the area with a FAR < 0.4, whereas only 40% of the area in the Final-run product showed a FAR < 0.4. PSS and CSI scores showed that the IMERG GPM Final-run product was outperformed by the near-real-time IMERG products (Early and Late) in capturing the entire storm event.
Overall, among the products, the Early-run was found to be better than the Late- and Final-run products at capturing the storm precipitation. Thus, it appears that the processing algorithm that uses the monthly gauge analysis to develop the Final-run failed to capture extreme events that occurred for more than a few days. Furthermore, better precipitation estimates for all stages of the GPM products will be crucial in hydrological analysis and modeling during extreme events, such as hurricanes, tornadoes, and tropical storms. This conclusion is supported by earlier studies on IMERG-GPM products during extreme precipitation events [35,53,68,69].