Evaluation of Satellite Precipitation Estimates over Australia

Abstract: This study evaluates the U.S. National Oceanic and Atmospheric Administration’s (NOAA) Climate Prediction Center morphing technique (CMORPH) and the Japan Aerospace Exploration Agency’s (JAXA) Global Satellite Mapping of Precipitation (GSMaP) satellite precipitation estimates over Australia across an 18-year period from 2001 to 2018. The evaluation was performed on a monthly time scale and used both point and gridded rain gauge data as the reference dataset. Overall statistics demonstrated that satellite precipitation estimates did exhibit skill over Australia and that gauge-blending yielded a notable increase in performance. Dependencies of performance on geography, season, and rainfall intensity were also investigated. The skill of satellite precipitation detection was reduced in areas of elevated topography and where cold frontal rainfall was the main precipitation source. Areas where rain gauge coverage was sparse also exhibited reduced skill. In terms of seasons, the performance was relatively similar across the year, with austral summer (DJF) exhibiting slightly better performance. The skill of the satellite precipitation estimates was highly dependent on rainfall intensity. The highest skill was obtained for moderate rainfall amounts (2–4 mm/day). There was an overestimation of low-end rainfall amounts and an underestimation in both the frequency and amount for high-end rainfall. Overall, CMORPH and GSMaP datasets were evaluated as useful sources of satellite precipitation estimates over Australia.


Introduction
Precipitation is an essential climate variable and is one of the most important climate variables affecting human activities [1]. Variations in the intensity, duration, and frequency of precipitation directly impact water availability for many millions of people and industries. Measuring rainfall over broad areas enables efficient water management and disaster response and recovery.
The conventional method of using rain gauges to estimate spatial patterns of rainfall provides a direct measurement of surface rainfall but spatial density can be an issue across many parts of the world, including over the oceans, where the installation of an adequate rain gauge network is economically or physically unfeasible [2]. This greatly affects the ability to accurately assess rainfall across a region as it is a variable that exhibits a high degree of spatial variation and a point-based measurement may not provide an ideal representation of an area. Rain gauge estimates are subject to instrumental errors with many relying on manual sampling methods. Clock synchronization and mechanical faults are examples of potential issues [3]. Furthermore, they are also affected by localised effects including wind (precipitation can be prevented from entering the gauge), evaporation (some April 2014 to March 2016 and showed that GSMaP overestimated light precipitation (<16 mm/day) while underestimating heavier precipitation (>32 mm/day) [20]. Hit bias rather than false or missed event bias was noted as the major error with false event bias also being more significant than missed event bias. The introduction of a bias-correction scheme is largely able to correct a positive bias by scaling down the magnitudes, but the inability to correct missed events means there has been much less success in correcting the negative bias [13].
Previous studies have also indicated that a significant degradation of performance occurred over orography, with satellites underestimating rainfall over higher elevations [18,21]. The bias can be worse during winter where the poor detection of snowfall, as well as rainfall, over cold surfaces leads to both missed events and an underestimation of intensity [22]. Derin et al. (2016) performed an evaluation over the western Black Sea region of Turkey, an area featuring complex topography in the form of a mountain range, from 2007 to 2011, and found that CMORPH exhibited a bias of −54% for the windward side of the region during the warm season, increasing to −82% during the cold season [21]. Kubota et al. (2009) found that the greatest biases in GSMaP were over coastal areas with frequent orographic rainfall and that estimates were generally better over the ocean than over the land [23]. Coastal regions are likely to present difficulties as the retrieval algorithm struggles to account for both ocean and land surfaces in a single grid point.
This study aims to contribute to the validation of satellite rainfall data. It differs from earlier studies by evaluating satellite precipitation estimates over a relatively long period of record (18 years) with a focus on Australia, which has a relatively dense rain gauge network over a large area when compared to other world regions [15]. The use of a percentile-based verification statistic is an innovative feature of this study, while the use of both gridded and point gauge data as a reference adds additional insight compared to using just one. The CMORPH and GSMaP datasets were chosen due to their provision as part of the World Meteorological Organization (WMO) Space-based Weather and Climate Extremes Monitoring Demonstration Project (SEMDP) [24]. This project aims to introduce operational satellite rainfall monitoring products based on these two datasets, to East Asia and Western Pacific countries, of which many lack adequate rainfall monitoring capabilities due to the absence of an extensive and accurate rain gauge network. The verification of these datasets is thus an important step for the creation of these products. Moreover, the cloud-motion advection method used to blend PMW and IR data ranks amongst the best in terms of performance across various satellite methods used to estimate precipitation [25]. The variance of the errors in the satellite precipitation estimates with location, season, and rainfall intensity was investigated.
The paper is organised as follows. Section 2 describes the study area, datasets, and methods used in the study. Section 3 presents the results, while Section 4 discusses the findings. Section 5 summarises the major findings and provides directions for future work.

Study Area
Australia has a land area of around 7.7 million km², making it the sixth-largest country in the world by land size. Its large geographical size means that it experiences a variety of climates, including temperate zones to the south east and south west, tropical zones to the north, and deserts or semi-arid areas across much of the interior [26]. The main orographic feature is the Great Dividing Range (GDR), a mountain range along the eastern side of the country that extends more than 3500 km from the north-eastern tip of Queensland, towards and along the coast of New South Wales, and into the eastern and central parts of Victoria. The width of the GDR ranges from about 160 to 300 km with a maximum elevation of 2228 m, though the typical elevation range for the highlands is from 300 to 1600 m [27]. Figure 1 shows the domain of analysis and the stations used in the study.
Remote Sens. 2020, 12, 678

Datasets
As part of the WMO SEMDP, access to GSMaP and CMORPH data was provided by JAXA and NOAA, respectively. Both datasets of satellite precipitation estimates employ the cloud-advection technique introduced in Section 1. The GSMaP version used was GSMaP Gauge-adjusted Near-Real-Time (GNRT) Version 6. To allow for a faster data latency, gauge adjustment over land was performed against the gauge-calibrated version (GSMaP gauge) from the past period, which is itself calibrated by matching daily satellite rainfall estimates to a global gauge analysis, the CPC Unified Gauge-Based Analysis of Global Daily Precipitation (CPC Unified) [28]. Further details can be found in the GSMaP technical documentation [28].
Two versions of CMORPH were used. These were the bias-corrected CMORPH (CMORPH CRT) and the gauge-blended CMORPH (CMORPH BLD) datasets. Bias correction over land was also performed using the CPC Unified analysis but using a different algorithm that involves matching to probability distribution function (PDF) tables from the past 30 days. The gauge-blended version uses the bias-corrected version as a first guess and then incorporates the gauge data based on the density of the observations; further details can be found in [10,13].
Consequently, this study used two gauge-corrected sets (GSMaP and CMORPH CRT) and one that had been further processed by combining CMORPH CRT with gauge data (CMORPH BLD).
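The PDF-matching idea behind the bias correction can be illustrated with a minimal empirical quantile-mapping sketch. This is not the operational CMORPH algorithm: the function name `pdf_match`, the sample values, and the simple rank-based empirical distribution are all assumptions made purely for illustration.

```python
from bisect import bisect_right

def pdf_match(value, sat_sample, gauge_sample):
    """Map a satellite value onto the gauge distribution by matching
    empirical cumulative probabilities (illustrative assumption only,
    not the operational CMORPH scheme)."""
    sat_sorted = sorted(sat_sample)
    gauge_sorted = sorted(gauge_sample)
    # Number of satellite samples less than or equal to the value.
    rank = bisect_right(sat_sorted, value)
    # Gauge index at the same cumulative probability (ceiling division),
    # clamped to the valid index range.
    n_sat, n_gauge = len(sat_sorted), len(gauge_sorted)
    idx = (rank * n_gauge + n_sat - 1) // n_sat - 1
    idx = min(n_gauge - 1, max(0, idx))
    return gauge_sorted[idx]

# Hypothetical co-located samples (mm/day) from a recent training window.
sat = [0.0, 1.0, 2.0, 4.0, 8.0]
gauge = [0.0, 0.5, 1.5, 3.0, 10.0]
corrected = pdf_match(4.0, sat, gauge)
```

In this toy case a satellite value of 4.0 mm/day sits at the fourth of five ranks, so it is replaced by the gauge value at the same rank.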
The reference datasets used were both based on the Bureau of Meteorology (BoM) rain gauges, with the Australian Water Availability Project (AWAP) analysis being used as the reference dataset for the gridded comparison and the values from the stations themselves being used for the point comparison. The AWAP rainfall analysis is generated by decomposing the field into a climatology component and an anomaly component based on the ratio of the observed rainfall value to the climatology [29]. The Barnes successive-correction technique is applied to the anomaly component and added to the monthly climatological averages, which were derived using a three-dimensional smoothing spline approach [29]. The climatological averages were generated from 30 years of monthly totals [29]. For the point comparison, only 'Series 0' stations were chosen as these stations are Bureau-maintained and conform to International Civil Aviation Organization (ICAO) standards. The minimum number of stations used across the period was 4764. As discussed earlier, even though rain gauge network measurements can be taken as 'truth', they still contain errors, which will artificially inflate the errors attributed to satellite measurements.
Details on the spatial and temporal resolutions of the gridded datasets along with their domains are shown in Table 1.

Method
The satellite datasets were compared against the gauge-based datasets. Both a gridded comparison and a point comparison were performed. When performing the comparisons, all the datasets were linearly interpolated to the same spatial resolution. An interpolation to the coarsest resolution was chosen (i.e., 0.25°). Values at each grid box from these interpolated grids could then be compared against each other for the gridded comparison.
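The regridding step can be sketched as bilinear interpolation on a regular latitude/longitude grid. This is a minimal illustration under stated assumptions: the function names `bilinear` and `regrid`, the tiny 2 × 2 field, and the axis ordering are hypothetical and do not reproduce the study's own processing code.

```python
def bilinear(grid, lats, lons, lat, lon):
    """Linearly interpolate a regular grid to one location.
    grid[i][j] is the value at (lats[i], lons[j]); both axes ascending.
    Locations outside the grid are linearly extrapolated."""
    def bracket(axis, x):
        # Index of the lower neighbour, clamped so that i + 1 stays valid.
        i = 0
        while i < len(axis) - 2 and axis[i + 1] <= x:
            i += 1
        return i, (x - axis[i]) / (axis[i + 1] - axis[i])

    i, ty = bracket(lats, lat)
    j, tx = bracket(lons, lon)
    south = grid[i][j] * (1 - tx) + grid[i][j + 1] * tx
    north = grid[i + 1][j] * (1 - tx) + grid[i + 1][j + 1] * tx
    return south * (1 - ty) + north * ty

def regrid(grid, lats, lons, new_lats, new_lons):
    """Interpolate the whole field onto a new set of grid points."""
    return [[bilinear(grid, lats, lons, la, lo) for lo in new_lons]
            for la in new_lats]

# Hypothetical field equal to 4*lat + 2*lon on a unit cell.
lats, lons = [0.0, 1.0], [0.0, 1.0]
field = [[0.0, 2.0], [4.0, 6.0]]
resampled = regrid(field, lats, lons, [0.25, 0.75], [0.25, 0.75])
```

The same `bilinear` helper can be called with a station's latitude and longitude to extract a grid value at a gauge location.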
For the point comparison, values corresponding to the location of a station were linearly interpolated from each grid. These values could then be compared to the actual station value. Inclusion of the AWAP dataset was done to provide an additional reference. A complication arose from the fact that the gauge-based data values were 24 h accumulated values to 0900 local standard time (LST), while the satellite data values were values to 00 UTC. As this study is focused on monthly comparisons, the longer period greatly reduces the impact of this timing inconsistency. An elementary remedy would be to have shifted the gauge and AWAP values one day ahead of their satellite counterparts, reducing the inconsistency to two hours or less. Doing this adjustment resulted in improvements of less than 2% and so the unadjusted datasets were used for simplicity.
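The size of the remaining timing mismatch can be checked with simple arithmetic. The sketch below computes how far the 0900 LST gauge cut-off sits from the nearest 00 UTC for the three Australian standard time zones; the function name and dictionary are hypothetical, and daylight saving is ignored on the assumption that gauge readings follow local standard time.

```python
# UTC offsets (hours) of the Australian standard time zones.
UTC_OFFSETS = {"AWST": 8.0, "ACST": 9.5, "AEST": 10.0}

def residual_offset_hours(utc_offset, gauge_hour_lst=9.0):
    """Hours between the gauge cut-off (0900 LST) and the nearest 00 UTC."""
    end_utc = (gauge_hour_lst - utc_offset) % 24.0  # gauge cut-off in UTC
    return min(end_utc, 24.0 - end_utc)

offsets = {zone: residual_offset_hours(off) for zone, off in UTC_OFFSETS.items()}
```

Under these assumptions, the residual offset is at most one hour in every zone, which is within the two-hour bound quoted above.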
Both continuous and percentile-based statistics were calculated. The continuous statistics calculated were the mean bias (MB), root-mean-square error (RMSE), mean absolute error (MAE), and the Pearson correlation coefficient (R). The MB is the average difference between the estimated and observed values, which gives an indicator of the overall bias. The MAE measures the average magnitude of the error. To remove the effect of higher rainfall averages leading to larger errors, the MAE was also normalised through division by the average rainfall, producing the normalised mean absolute error. The RMSE also measures the average error magnitude but is weighted towards larger errors. R is commonly known as the linear correlation coefficient as it measures the linear association between the estimated and observed datasets.
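A minimal sketch of how these continuous statistics might be computed from paired estimated and observed values is given below. The function name, the sample values, and the choice to normalise by the mean observed rainfall are assumptions for illustration.

```python
from math import sqrt

def continuous_stats(est, obs):
    """Return MB, MAE, normalised MAE, RMSE, and Pearson R for paired samples."""
    n = len(obs)
    mean_e = sum(est) / n
    mean_o = sum(obs) / n
    errors = [e - o for e, o in zip(est, obs)]
    mb = sum(errors) / n
    mae = sum(abs(d) for d in errors) / n
    rmse = sqrt(sum(d * d for d in errors) / n)
    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(est, obs))
    var_e = sum((e - mean_e) ** 2 for e in est)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    r = cov / sqrt(var_e * var_o)
    # Normalisation by the mean observed rainfall is an assumption here.
    return {"MB": mb, "MAE": mae, "NMAE": mae / mean_o, "RMSE": rmse, "R": r}

# Hypothetical monthly-mean rainfall values (mm/day).
obs = [1.0, 2.0, 3.0, 4.0]
est = [2.0, 2.0, 2.0, 6.0]
stats = continuous_stats(est, obs)
```

In this example the positive and negative errors partly cancel in the MB but not in the MAE, which is why both are reported.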
In addition to continuous verification statistics, a percentile-based verification can also be performed to measure how well the datasets reproduce the occurrence of low- and high-end values. This is a novel verification metric that the authors have deemed useful to assess because, even if the satellites perform poorly in terms of absolute values, they may still produce accurate values relative to their own climatology, meaning there is the potential to produce percentile-based products. Such products have already been produced (e.g., both NOAA and JAXA have generated satellite-derived versions of the Standardized Precipitation Index, as well as rainfall values expressed as high-end percentiles). The quintile for an observed month at a location could be derived by ranking that value against the same month but for different years across the verification period. The ranking can then be converted to a percentile through linear interpolation. If a bottom or top quintile was observed, the value from the satellite dataset was then investigated. If it was also registered in the same quintile, this was recorded as a success; otherwise, it was recorded as a failure. The number of successes was then converted to a hit rate. This hit rate was only calculated for the gridded comparison as the varying number of stations across the verification period made a point-based comparison more difficult. The use of quintiles provided greater differentiation of extreme values than terciles or quartiles, while the record length was considered too short for the use of deciles.
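The quintile hit rate described above can be sketched for a single grid box and calendar month as follows. The function names, the tie-breaking rule, the linear rank-to-percentile mapping, and the 11-year sample are assumptions for illustration rather than the study's exact procedure.

```python
def percentile_ranks(values):
    """Percentile (0-100) of each value within its own series, with ranks
    mapped linearly so the smallest is 0 and the largest is 100.
    Ties are broken by order of appearance (a simplifying assumption)."""
    n = len(values)
    order = sorted(range(n), key=lambda k: values[k])
    pct = [0.0] * n
    for rank, k in enumerate(order):
        pct[k] = 100.0 * rank / (n - 1)
    return pct

def quintile_hit_rates(obs, sat):
    """obs, sat: one value per year for the same month and grid box.
    Returns (bottom-quintile hit rate, top-quintile hit rate)."""
    obs_pct, sat_pct = percentile_ranks(obs), percentile_ranks(sat)
    # A hit is a satellite value in the same quintile as the observation.
    low = [s <= 20.0 for o, s in zip(obs_pct, sat_pct) if o <= 20.0]
    high = [s >= 80.0 for o, s in zip(obs_pct, sat_pct) if o >= 80.0]
    return sum(low) / len(low), sum(high) / len(high)

# Hypothetical monthly totals for 11 years (mm).
obs = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110]
sat = [12, 50, 15, 40, 45, 55, 60, 65, 90, 100, 95]
low_rate, high_rate = quintile_hit_rates(obs, sat)
```

Each dataset is ranked against its own record, so a satellite product with a systematic bias in absolute amounts can still score well on this measure.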
The equations for the metrics are summarised in Table 2, with E_i representing the estimated value at a point or grid box i, O_i being the observed value, and N being the number of samples (across the whole domain and period) for the continuous metrics.
Table 2. Summary of metrics used.

Metric    Equation    Range    Perfect Value    Unit
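For reference, the continuous metrics are conventionally written as follows, using the symbols E_i, O_i, and N defined above. These are the standard textbook forms consistent with the descriptions in the text; the overbars (means over all N samples) and the abbreviation NMAE are notation introduced here.

```latex
\begin{align}
\mathrm{MB}   &= \frac{1}{N}\sum_{i=1}^{N}\left(E_i - O_i\right) \\
\mathrm{MAE}  &= \frac{1}{N}\sum_{i=1}^{N}\left|E_i - O_i\right| \\
\mathrm{NMAE} &= \frac{\mathrm{MAE}}{\bar{O}}, \qquad
                 \bar{O} = \frac{1}{N}\sum_{i=1}^{N} O_i \\
\mathrm{RMSE} &= \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(E_i - O_i\right)^{2}} \\
R &= \frac{\sum_{i=1}^{N}\left(E_i-\bar{E}\right)\left(O_i-\bar{O}\right)}
          {\sqrt{\sum_{i=1}^{N}\left(E_i-\bar{E}\right)^{2}}\,
           \sqrt{\sum_{i=1}^{N}\left(O_i-\bar{O}\right)^{2}}}
\end{align}
```

MB, MAE, and RMSE carry the units of the data (mm/day) and have a perfect value of 0, while R is dimensionless with a perfect value of 1.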

Results
The results of the gridded continuous comparison against AWAP data are presented in Figure 2. The linear correlation of the satellite rainfall estimates ranges from 0.77 to 0.88, while the MAE ranges from 0.61 to 0.43 mm/day. The trend amongst all the metrics is the same with performance being the best for CMORPH BLD, then CMORPH CRT, and lastly GSMaP. CMORPH CRT and GSMaP display similar performances, while there is a clear increase in performance for CMORPH BLD.
The gridded percentile-based comparison against AWAP data is shown in Figure 3. The satellite datasets obtain around a 70–80% hit rate for the bottom quintile whilst scoring around 10% less for the top quintile. This suggests the rainfall values produced by the satellites are relatively accurate in terms of climatological occurrence, with better performance exhibited for low-end extremes. There appears to be potential in generating percentile-based products from satellite data.
The ranking of performance between the satellite datasets remains the same for both the continuous and the percentile-based statistics. The benefit of blending in gauge data is again displayed, with CMORPH BLD showing significant improvement over the unblended datasets and skill comparable to AWAP. As the trend between MB, MAE, RMSE, and R is the same, future references to continuous statistics will refer to just MAE, normalised MAE, and R for brevity.
The values and residuals of the datasets against point gauge data are shown in Figure 5. There appears to be a tendency towards an overestimation for low rainfall months and an underestimation for high rainfall months. AWAP and, to a lesser extent, CMORPH BLD captured the high-end rainfall months more accurately, with months in which more than 40 mm/day was recorded being represented distinctly better. All datasets appear to struggle with very high-end rainfall months (>60 mm/day). These gauge totals sit along the lower boundary, which indicates that the datasets observed little rainfall while the gauges observed a significant amount. The fact that even AWAP does not depict these totals well suggests that gridded datasets systematically struggle with these very high-end values. A likely reason is that the gridded datasets smooth down point values as part of their objective analysis process, so high-end totals are expected to be underrepresented by the grids. The impact of this effect would be worse if there were nearby gauges with low totals.

Variation with Geography
A gridded comparison was performed over the Australian domain with the geographical representations of the MB and MAE shown in Figure 6. The CMORPH CRT and CMORPH BLD datasets were chosen to allow an investigation into the effects of gauge correction. Generally, the satellite-derived data overestimate rainfall, except over western Tasmania where there is a significant underestimation. The effects of normalisation are indicated along the northern coast of Australia and in western Tasmania where the unnormalised errors were previously the greatest but improve to about average after the adjustment, at least for the CMORPH BLD dataset.
The effect of gauge correction is especially evident around western Tasmania, as well as around western parts of Western Australia, the southern Australian coastline, the northern coastline of New South Wales, the Australian Alps, and the southwestern coast of Western Australia. In these areas, there are significant improvements in the normalised errors from the uncorrected dataset to the corrected one, indicating that there is a problem with satellite rainfall detection that cannot be accounted for by higher rainfall averages. Possible reasons will be discussed in the next section.
A point-based comparison using rain gauges categorised by states supported the gridded comparison with the results shown in Figure 7. The unnormalised MAE values suggest that performance is decreased in the tropical regions and in Tasmania, but after normalisation, the performance is much more even across the states. The performance is slightly worse in Queensland and South Australia, while gauge correction appears to have the greatest effect in Tasmania.

Variation with Seasons
A seasonal analysis was completed by categorising the data into four seasons with December, January, and February (DJF); March, April, and May (MAM); June, July, and August (JJA); and September, October, and November (SON) representing austral summer, autumn, winter, and spring respectively.
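The seasonal grouping can be sketched as a simple month-to-season lookup followed by a per-season normalised MAE. The function name and sample values are hypothetical, and normalising by the mean observed rainfall within each season is an assumption.

```python
# Austral seasons keyed by calendar month number.
SEASONS = {12: "DJF", 1: "DJF", 2: "DJF",
           3: "MAM", 4: "MAM", 5: "MAM",
           6: "JJA", 7: "JJA", 8: "JJA",
           9: "SON", 10: "SON", 11: "SON"}

def seasonal_nmae(months, est, obs):
    """Normalised MAE per season from monthly estimate/observation pairs."""
    abs_err, total_obs = {}, {}
    for m, e, o in zip(months, est, obs):
        s = SEASONS[m]
        abs_err[s] = abs_err.get(s, 0.0) + abs(e - o)
        total_obs[s] = total_obs.get(s, 0.0) + o
    # MAE divided by mean rainfall equals summed |error| over summed rainfall.
    return {s: abs_err[s] / total_obs[s] for s in abs_err}

# Hypothetical monthly-mean rainfall values (mm/day).
months = [1, 2, 7, 8]
est = [5.0, 3.0, 1.0, 3.0]
obs = [4.0, 4.0, 2.0, 2.0]
nmae = seasonal_nmae(months, est, obs)
```

Here both seasons have the same absolute error, but the drier JJA months yield the larger normalised error, which illustrates why normalisation can reorder the seasons.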
A gridded comparison showing the normalised MAE from the CMORPH BLD dataset is displayed in Figure 8. The greatest seasonal variation of the error is observed towards the interior and around the northern coastline with winter possessing the worst performance and summer having the best. An analysis using point gauge data was also performed with the results shown in Figure 9. The MAE is the smallest in SON and largest in DJF where the error is approximately 50% greater. Normalisation of the errors results in the smallest relative error occurring in DJF and the largest in MAM and JJA, supporting the gridded comparison. The linear correlation coefficients across the seasons also suggest that DJF has the best performance across the seasons. The performance increase is more prominent in the non-gauge-blended datasets, where the improvement is at least 10%. Overall, the performance appears to be relatively similar across the seasons with the exception of DJF, which shows a somewhat superior performance to the rest.

Variations with Rainfall Intensity
The effect of rainfall intensity on the accuracy of the data was also analysed. The data were categorised into the following bins: 0-0.2, 0.2-1, 1-2, 2-3, 3-4, 4-6, 6-9, and >9 mm/day. These ranges were chosen to ensure a reasonable number of values in each bin, with the values of 0.2 and 1 mm specifically chosen because they correspond to the rainy-day threshold for BoM and a commonly used value in contingency statistics studies, respectively [15]. Continuous statistics for these bins were calculated along with a comparison of occurrence frequencies and cumulative volumes. These are shown in Figure 10. The datasets appear to capture the correct frequency best for rainfall amounts between 3 and 6 mm/day. For higher amounts, the satellite-derived data underestimate the frequency, while for lower amounts, the frequency is underestimated for very low values (<0.2 mm/day) but overestimated between 0.2 and 3 mm/day. The change in sign of the bias from 0-0.2 to 0.2-1 mm/day may indicate that very-low-rainfall events are being incorrectly attributed to the higher ranges. For values above 1 mm/day, the frequency matches the gauge data quite well.
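As a rough illustration of the binning and the frequency and volume comparison, the following Python sketch assigns each value to one of the eight intensity bins and tallies occurrence counts and cumulative volumes. The function names are hypothetical, and the edge convention (lower edge inclusive, upper edge exclusive) is an assumption, since the text does not state how values falling exactly on a bin edge were treated.

```python
from bisect import bisect_right

# Interior bin edges in mm/day, taken from the bins listed in the text.
BIN_EDGES = [0.2, 1.0, 2.0, 3.0, 4.0, 6.0, 9.0]
BIN_LABELS = ["0-0.2", "0.2-1", "1-2", "2-3", "3-4", "4-6", "6-9", ">9"]


def bin_index(rain_mm_day):
    """Index of the intensity bin (assumed: lower edge inclusive)."""
    if rain_mm_day < 0:
        raise ValueError("rainfall cannot be negative")
    return bisect_right(BIN_EDGES, rain_mm_day)


def frequency_and_volume(values):
    """Occurrence count and cumulative volume per bin for one dataset."""
    counts = [0] * len(BIN_LABELS)
    volumes = [0.0] * len(BIN_LABELS)
    for v in values:
        i = bin_index(v)
        counts[i] += 1
        volumes[i] += v
    return counts, volumes


def frequency_bias(satellite, gauge):
    """Satellite count minus gauge count per bin (positive = overestimated)."""
    sat_counts, _ = frequency_and_volume(satellite)
    ref_counts, _ = frequency_and_volume(gauge)
    return {
        label: s - r
        for label, s, r in zip(BIN_LABELS, sat_counts, ref_counts)
    }
```

In this sketch each dataset is binned by its own values, so a negative bias in the 0-0.2 bin paired with a positive bias in the 0.2-1 bin corresponds to the sign change discussed above.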
Analysis of the cumulative volumes demonstrates that below 1-3 mm/day, the satellite-derived data overestimate the gauge amount, while above this range, they underestimate the amount. Combining this result with the frequency analysis suggests that although the frequency of very-low-rainfall events is underestimated, the amount recorded for each event is overestimated.
The MAE suggests decreasing skill as the rainfall rate increases. The normalised MAE was calculated by normalising the MAE by the mean rainfall amount for each bin. It indicates that the relative error was the largest for very small values (<0.2 mm/day).
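The per-bin normalisation can be sketched as follows. This is a minimal illustration under stated assumptions: pairs are assigned to bins by the gauge (reference) value, the lower bin edge is inclusive, and the function name `normalised_mae_by_bin` is hypothetical.

```python
from bisect import bisect_right

# Interior bin edges in mm/day, taken from the bins listed in the text.
BIN_EDGES = [0.2, 1.0, 2.0, 3.0, 4.0, 6.0, 9.0]


def normalised_mae_by_bin(pairs):
    """Return a list of (MAE, normalised MAE) per bin, or None if undefined.

    pairs: iterable of (satellite, gauge) rain rates in mm/day.
    Assumption: each pair is binned by its gauge value.
    """
    n_bins = len(BIN_EDGES) + 1
    abs_err = [0.0] * n_bins
    ref_sum = [0.0] * n_bins
    count = [0] * n_bins
    for sat, ref in pairs:
        i = bisect_right(BIN_EDGES, ref)
        abs_err[i] += abs(sat - ref)
        ref_sum[i] += ref
        count[i] += 1
    result = []
    for i in range(n_bins):
        if count[i] == 0 or ref_sum[i] == 0:
            result.append(None)  # empty bin or zero mean rainfall
        else:
            mae = abs_err[i] / count[i]
            mean_ref = ref_sum[i] / count[i]
            result.append((mae, mae / mean_ref))
    return result
```

Because the denominator is the mean rainfall of the bin, a small absolute error in the lowest bin can still produce the largest relative error, which is the behaviour reported for values below 0.2 mm/day.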
Overall, satellite-derived data appear to be most reliable for low-to-moderate rainfall totals (2-4 mm/day), with a significant underestimation of amounts for high-end totals and, for very low totals, an underestimation of frequency and an overestimation of amounts.

Discussion
It is important to acknowledge the effect of the errors in the reference datasets. The errors in the quality-controlled gauge network used for the point comparison are minor; however, the same cannot be said for the AWAP dataset used for the gridded comparison. Jones et al. (2009) performed a cross-validation of AWAP against station observations and found the monthly rainfall mean bias, RMSE, MAE, and normalised MAE to be 0.016, 0.7, 0.38, and 0.21 mm/day, respectively [29]. The RMSE and MAE for the satellite datasets ranged from 0.79 to 1.08 and from 0.43 to 0.61 mm/day, respectively, indicating that the errors in AWAP are comparable to those in the satellite datasets when AWAP is used as truth. To gain a better idea of the true error of the datasets, the satellite datasets along with AWAP were compared to a climate reanalysis (ERA5). A climate reanalysis is a numerical representation of meteorological fields created by combining meteorological observations with climate models. A gridded comparison using ERA5 as the reference was completed and is presented in Figures 11 and 12. The results demonstrate comparable performance across the datasets. CMORPH BLD and AWAP displayed remarkably similar performances. The results of the error analysis of the gridded comparison are supported by the point-based comparison, where both the satellite datasets and AWAP were compared to station gauges, with the errors in AWAP being smaller but still within the same order of magnitude as those from the satellite datasets. This calls for caution: the gridded comparison results are unlikely to be a proper depiction of the true error of the satellite datasets.
There are certain regions where the performance of satellite rainfall detection is decreased. Past studies have indicated that the detection of cold frontal-based rainfall is poor [15,17]. The absence of ice crystals in the relatively low precipitating clouds typically associated with frontal rainfall hinders the ability of satellites to detect rainfall via scattering [15]. This is a likely factor behind the large errors over Tasmania, Western Australia, South Australia, and central Australia, areas where the prevalent rainfall generation mechanism is cold frontal systems. Errors are pronounced over the western half of Tasmania and the southwestern coast of Western Australia, areas of relatively high rainfall due to increased exposure to westerly flow and associated cold fronts.
Performance is also known to be decreased over topography [18,21]. Decreased performance is observed along the eastern coastline near the Great Dividing Range. The errors are greatest along the northern NSW coastline and the Australian Alps where the Great Dividing Range is at its highest elevations, leading to a strong orographic influence on rainfall.
A high-quality rain gauge network is extremely valuable for improving the accuracy of satellite-derived rainfall estimates as satellite estimates rely on gauges to calibrate or correct their raw values. The significantly greater number of gauges towards the coastline where most of Australia's population resides allows for a much greater improvement from gauge correction in contrast to the interior of the continent. Consequently, areas towards the coastlines that experience problematic regimes such as cold-frontal rainfall and orographically influenced rainfall greatly benefit from gauge correction, resulting in a performance similar to unproblematic regimes. However, the lack of rain gauges towards the interior means there are still large normalised errors in this region, even in the gauge-corrected dataset. This is compounded by the tendency of rainfall to be lighter towards the interior compared to the coast as light rainfall has been shown to be a problematic regime as well [17,18].
Low mean rainfall is another factor that would contribute to a large normalised MAE. Some areas of large normalised MAE around the interior of the continent can be seen to generally align with areas of low mean rainfall values, as seen in Figure 13, which depicts the seasonal mean rainfall across Australia. This is especially true during the austral winter 'dry' season for central Australia and northwards towards the Northern Territory coast.
The importance of gauge correction is reduced for unproblematic regimes. For example, the normalised errors for the corrected and uncorrected datasets around the northern coastline of Australia are relatively similar, highlighting how the raw satellite algorithms exhibit decent performance in these areas, leading to gauge correction being less crucial. Tropical-based rainfall has been noted to be one of the better-performing regimes for satellite rainfall detection [15].
Satellite-derived precipitation estimates for austral winter demonstrate the worst performance, a result that agrees with past studies [15]. The difficulty of detecting cold-frontal rainfall, which is more frequent during winter, is most likely a key factor. The introduction of snow is another challenge for satellite detection of precipitation and is likely a contributing factor to the poor performance observed in western Tasmania and the Australian Alps. By contrast, the greater prevalence of convective-based rainfall in summer is a reason for this season performing the best [15,17].
An overestimation (underestimation) of low (high) rainfall rates was observed and is consistent with past literature [19,20].
It is natural to expect that CMORPH BLD (a gauge-blended dataset) should perform at least as well as AWAP (a gauge-based analysis), as it relies on gauges where the data exist whilst depending more heavily on satellites where there is little to no gauge data. However, the key assumption here is that satellite depiction of rainfall is superior to interpolation methods in areas with little to no data. This is not necessarily true: even though satellites source their data through a physical sensor, this process still relies heavily on calibration to rain gauge data. For locations where there is little to no gauge data, calibration and, subsequently, performance will be severely hindered. Furthermore, AWAP used a minimum number of stations exceeding 3000, while the satellite datasets are calibrated to the CPC Unified gauge analysis, which has a minimum number of stations across Australia at least an order of magnitude smaller than that of AWAP [29,30]. The ingestion of less data is likely to contribute to the discrepancies observed.

Conclusions
The high spatial variation of rainfall, along with the difficulty of installing a sufficiently dense network of rain gauges in many areas around the world, makes satellites an attractive option in terms of their ability to provide a continuous estimate of near-surface rainfall. Numerous verifications of satellite-estimated rainfall have been performed in the past, but few studies have focused on Australia using a relatively long data record. This study aimed to fill that gap by performing a validation over Australia using monthly CMORPH (both the bias-corrected CMORPH CRT and the gauge-blended CMORPH BLD) and GSMaP (gauge-corrected) data across an 18-year period from 2001 to 2018.
Station data were used as the reference, in the form of both the AWAP analysis and individual stations, enabling a gridded and a point-based comparison, respectively. Both continuous statistics (MB, MAE, RMSE, and R) and percentile-based statistics (hit rate for bottom and top quintiles) were chosen. General performance along with the geographical, seasonal, and intensity dependencies were subsequently investigated.
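For readers unfamiliar with the continuous statistics, the sketch below computes MB, MAE, RMSE, and the linear (Pearson) correlation coefficient R from paired satellite and reference values using their standard textbook definitions. The sign convention for MB (satellite minus reference) and the function name `continuous_stats` are assumptions made for illustration, not details confirmed by the study.

```python
import math


def continuous_stats(sat, ref):
    """MB, MAE, RMSE and Pearson R for paired satellite/reference values.

    Assumption: the bias is defined as satellite minus reference.
    """
    if len(sat) != len(ref) or len(sat) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    n = len(sat)
    diffs = [s - r for s, r in zip(sat, ref)]
    mb = sum(diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)

    mean_s = sum(sat) / n
    mean_r = sum(ref) / n
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(sat, ref))
    var_s = sum((s - mean_s) ** 2 for s in sat)
    var_r = sum((r - mean_r) ** 2 for r in ref)
    if var_s == 0 or var_r == 0:
        corr = float("nan")  # correlation is undefined for a constant series
    else:
        corr = cov / math.sqrt(var_s * var_r)
    return {"MB": mb, "MAE": mae, "RMSE": rmse, "R": corr}
```

A negative MB under this convention indicates that the satellite product underestimates the reference on average.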
Overall statistics showed that satellite performance was decent and, in the case of CMORPH BLD, somewhat comparable to the AWAP analysis used as truth. CMORPH BLD performed best followed by CMORPH CRT and then GSMaP. Linear correlations from 0.71 to 0.90 and a bottom quintile hit rate from 70% to 80% were especially encouraging.
A geographical analysis of the error dependency was completed by plotting the gridded errors over Australia, as well as by breaking down the point comparison into states. Western Australia, western Tasmania, central Australia, and the Australian Alps displayed large errors in the uncorrected datasets. Orographically influenced rainfall and cold frontal rainfall have been identified as problematic regimes by past studies and are applicable to these regions. The blending of gauge data was beneficial, especially for regions that had problematic rainfall regimes. However, a dense rain gauge network is also needed for accurate calibration, and the lack of rain gauges towards the interior of the continent was probably the reason why little to no improvement was seen in the gauge-blended dataset over these areas.
Categorising the results by seasons demonstrated that the performance was relatively similar across the seasons, with satellite-derived precipitation estimates in austral summer performing best and those in austral winter performing worst. A categorisation by rainfall intensity suggested that the performance was best for moderate rainfall amounts (2-4 mm/day). The frequency of high-end rainfall was captured well but the amount was severely underestimated while low-end rainfall amounts were overestimated.
The main results from this study agree with past literature, reconciling the performance of satellite-derived precipitation estimates over Australia with that seen in other regions of the world. The results obtained in this study are generally better than those of past studies. For example, Jiang et al. (2016) evaluated CMORPH CRT and CMORPH BLD over China on a monthly time scale from 2000 to 2012 and obtained slightly lower correlation coefficients of 0.72 and 0.83, respectively [16]. Possible reasons are that satellite technology has continued to improve over the years and that Australia has a relatively high-quality and dense rain gauge network that allows for improved performance of gauge correction and blending.
The study supported the finding that orographically influenced rainfall and cold frontal rainfall are problematic regimes for satellite rainfall detection. Advancement in the detection of these regimes would be very beneficial. Gauge-blending was shown to be a worthwhile process; however, its performance is strongly tied to the availability of high-quality rain gauge network data, which do not exist in many regions. Considering that one of the most valuable uses of satellite rainfall monitoring, if not the most valuable, is in areas without rain gauges, an accuracy that is dependent on gauge-blending should not be relied on. The unblended datasets do demonstrate skilful performance, which would be useful for areas that lack a rain gauge network, but there is still a considerable amount of progress needed to bring unblended datasets to a level comparable to that of rain gauges.
To conclude, evaluation of satellite precipitation estimates (CMORPH and GSMaP) is an essential scientific contribution to WMO activities in assisting countries in Asia and the Pacific with improving precipitation monitoring (including accumulated heavy precipitation and drought monitoring) which WMO provides through its flagship initiatives such as the Space-based Weather and Climate Extremes Monitoring [24] and the Climate Risk and Early Warning Systems [31], among others.