3.1. Synoptic Stations vs. IMERG
As indicated earlier, synoptic stations provide the most accurate rainfall measurements in the Philippines because they are manned and regularly calibrated, and their measurement procedures follow the standards set by the WMO. However, these stations are few and sparsely distributed in the country. Synoptic stations are maintained and operated mainly to monitor synoptic-scale systems such as tropical cyclones. Thus, these stations are spaced strategically to capture most of the weather and climatic events in locations that represent the general conditions in the country. However, such events are often unpredictable, depending on the location, especially with the advent of climate change and periodic or natural phenomena, such as the El Niño–Southern Oscillation (ENSO) and the Pacific Decadal Oscillation. This makes the denser coverage provided by ARGs and satellite data critically important.
The scatter plots of IMERG LR data versus synoptic station data, presented in Figure 3a,b, show a generally good relationship, with correlation coefficients of 0.73 and 0.75 for the 10 day and monthly rainfall, respectively. Better statistics were obtained by comparing the final product, IMERG FR, with synoptic station data, with correlation coefficients of 0.81 and 0.87 for 10 day and monthly rainfall, respectively. IMERG FR also improved on the MAE, RMSE, and bias of IMERG LR. The results of regression analysis, however, show disagreements, with the slope and offset (blue line) far from the line that corresponds to perfect agreement (red line). With the synoptic station data being used as the baseline, the disagreement cannot be attributed to problems with the IMERG data alone because there was also likely a mismatch in the values that were compared. As indicated earlier, the synoptic data are point measurements, whereas the IMERG data represent an average of the rain conditions over a much larger area. This is likely the cause of reported disagreements in previous studies comparing synoptic station data with satellite rainfall products [6,13,14,15].
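The agreement statistics quoted above (correlation coefficient, MAE, RMSE, bias, and the regression slope and offset) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `agreement_stats` is hypothetical, the bias is taken as reference minus estimate (an inference from the later statement that negative biases correspond to higher IMERG FR estimates), and the regression is assumed to be of the estimate on the reference.

```python
import numpy as np


def agreement_stats(reference, estimate):
    """Correlation, MAE, RMSE, bias, and regression line of estimate on reference."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    valid = np.isfinite(ref) & np.isfinite(est)  # drop periods missing in either series
    ref, est = ref[valid], est[valid]

    # Assumed sign convention: reference minus estimate, so a negative bias
    # means the estimate is higher than the reference on average.
    diff = ref - est
    # Assumed regression direction: estimate = slope * reference + intercept.
    slope, intercept = np.polyfit(ref, est, 1)

    return {
        "r": float(np.corrcoef(ref, est)[0, 1]),
        "mae": float(np.mean(np.abs(diff))),
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
        "bias": float(np.mean(diff)),
        "slope": float(slope),
        "intercept": float(intercept),
    }
```

Calling `agreement_stats(synoptic_series, imerg_series)` on matched 10 day or monthly series returns one dictionary per comparison; a slope near 1 and an intercept near 0 correspond to the perfect-agreement line.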
3.2. ARGs vs. Synoptic Stations
The relatively dense network of rain gauges provides the means to extend the limited coverage provided by the synoptic stations. However, it is important that the unmanned rain gauges provide rainfall data that are consistent with those from the more accurate synoptic station measurements. To assess the consistency, scatter plots for synoptic rainfall data versus ARG data from various sites surrounding the synoptic station are presented in Figure 4 for both 10 day and monthly averages. The data points in the scatter plots are color coded to indicate the distance between the synoptic station and the location of the ARG. As expected, the plots indicate that the discrepancies are larger when the distance between the two measurement locations is greater. Qualitative analysis indicates that most of the data points that show general agreement are those within an optimal distance of about 1 km.
To further illustrate the distance-dependent relationship of rainfall measurements from the ground stations, spatial autocorrelation analysis of the data was conducted. Figure 5 shows the spatial autocorrelation, in terms of the Global Moran’s I value, for all 10 day and monthly rainfall obtained from all ARGs and synoptic stations in the country. The distance (d) used to define the neighbors of each ground station was tested using intervals of d ≤ 1 km, 1 km < d ≤ 10 km, and 10 km < d ≤ 50 km. It can be observed that the majority of the Global Moran’s I values throughout the study period are significantly positive (p < 0.05), revealing potential spatial clustering of both high and low rainfall values obtained from the ground stations. The magnitude of the Global Moran’s I values increases as the distance between neighboring stations is reduced. This suggests that for distances within 1 km, ARGs and synoptic stations exhibit the greatest spatial autocorrelation. This result again suggests the need for short distances when ARGs and synoptic stations are being compared. In this study, only data within distances of 1 km were selected for estimating the correction parameters needed to make ARG data consistent with synoptic data. As confirmed later, such a distance provides a high likelihood that the same rain is being observed by both the ARG and the synoptic station.
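For readers unfamiliar with the statistic, Global Moran's I for one 10 day period or month can be sketched with binary distance-band weights as below. The binary weighting, the haversine distance, and the function names are assumptions made for illustration; the text does not state which weighting scheme or software was used, and the significance test (p-value) is not covered here.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def distance_band_weights(lat, lon, d_min_km, d_max_km):
    """Binary weights: 1 where d_min_km < distance <= d_max_km, else 0."""
    lat = np.radians(np.asarray(lat, dtype=float))
    lon = np.radians(np.asarray(lon, dtype=float))
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    dist = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    w = ((dist > d_min_km) & (dist <= d_max_km)).astype(float)
    np.fill_diagonal(w, 0.0)  # a station is never its own neighbor
    return w


def morans_i(values, w):
    """Global Moran's I; NaN if no station pairs fall in the band or values are constant."""
    x = np.asarray(values, dtype=float)
    z = x - x.mean()
    s0 = w.sum()
    denom = z @ z
    if s0 == 0 or denom == 0:
        return float("nan")
    return float((x.size / s0) * (z @ w @ z) / denom)
```

The three intervals in the text would correspond to `distance_band_weights(lat, lon, 0, 1)`, `(lat, lon, 1, 10)`, and `(lat, lon, 10, 50)`. With this lower bound, exactly co-located stations (distance 0) are excluded from the first band.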
Synoptic station and ARG pairs that are separated by 1 km or less, and were used in the analysis to obtain the correction parameters, are listed in Table 1. Each ARG was identified by an ID previously set based on installation number. To minimize the influence of topography, only data pairs with a difference in elevation of less than 50 m were considered. Results of this comparative analysis are shown in the scatter plots of Figure 6. The results of regression analysis of the raw data points (in black) indicate a much stronger relationship, with the correlation coefficients being 0.94 and 0.97 for the 10 day and monthly averages, respectively. However, the corresponding biases of 1.93 and 1.69 mm/day, and slopes that are significantly less than 1.0 (red line), indicate a mismatch that might be caused by instrumentation differences associated with calibration and maintenance. After the generalized reduced gradient algorithm was used to correct the bias, the corrected values (green data points) show much better consistency between the two sets of data. Optimized values of correction factors, a = 1.282 and b = 1.013, were obtained for 10 day rainfall, and a = 1.349 and b = 0.996 for monthly rainfall. Bias correction of the ARG rainfall reduced the errors (MAE, RMSE) and biases relative to the synoptic measurements. A summary of the improvements is presented in Table 2.
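The fitting of the two correction factors can be illustrated with a small sketch. It assumes that Equation (1) has the power-transform form corrected = a · raw^b (the text later calls the correction a power transform, but the equation is not reproduced in this section) and that the objective is the sum of squared differences from the synoptic values. It also replaces the generalized reduced gradient solver with a plain one-dimensional search over b, using the closed-form least-squares a for each b; the function names are hypothetical.

```python
import numpy as np


def fit_power_transform(raw, reference, b_bounds=(0.5, 2.0), n_grid=41, n_refine=4):
    """Fit corrected = a * raw**b by least squares (a stand-in for the GRG solver)."""
    x = np.asarray(raw, dtype=float)
    y = np.asarray(reference, dtype=float)
    valid = np.isfinite(x) & np.isfinite(y) & (x > 0)
    x, y = x[valid], y[valid]

    def best_a_and_sse(b):
        xb = x ** b
        a = (xb @ y) / (xb @ xb)  # closed-form least-squares a for a fixed b
        return a, float(np.sum((a * xb - y) ** 2))

    lo, hi = b_bounds
    for _ in range(n_refine):  # coarse-to-fine grid search over b
        grid = np.linspace(lo, hi, n_grid)
        sse = [best_a_and_sse(b)[1] for b in grid]
        k = int(np.argmin(sse))
        step = grid[1] - grid[0]
        lo = max(grid[k] - step, b_bounds[0])
        hi = min(grid[k] + step, b_bounds[1])

    b = float(grid[k])
    a, _ = best_a_and_sse(b)
    return float(a), b


def apply_power_transform(raw, a, b):
    """Apply the fitted correction; negative inputs are clipped to zero first."""
    x = np.clip(np.asarray(raw, dtype=float), 0.0, None)
    return a * np.power(x, b)
```

Under these assumptions, applying `apply_power_transform(raw_arg, 1.282, 1.013)` to raw 10 day ARG values would reproduce the reported correction; a different objective function or solver could give slightly different factors.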
To further illustrate the value of making corrections to the ARG data, time series of 10 day and monthly rainfall are presented in Figure 7a,b, respectively. The plots show a comparison of raw ARG values, corrected ARG values, and synoptic station data in Catbalogan, Samar. It is apparent from the plots that the corrected ARG values, represented by the green line, are in much better agreement with the synoptic data (in red) than the raw ARG data (in black). However, some disagreements remain: the synoptic data are sometimes higher than the corrected ARG data, and vice versa. This is expected because there are factors other than distance that may affect the difference in rainfall values. By comparison, the raw ARG data are shown to be consistently lower than the synoptic station data. Assuming that the differences between the synoptic data and raw ARG values are caused primarily by the same technical differences, including calibration, the same set of correction parameters was applied to all raw ARG data to generate an improved ARG dataset that is consistent with synoptic data.
After performing bias correction for the ARGs, the absolute differences between rainfall measurements from each ARG and its nearest neighboring synoptic station were obtained for each individual 10 day period and month. The absolute differences from all ARG–synoptic station pairs within specific distances were averaged, as shown in Figure 8. It can be observed that the discrepancy between the ARGs and synoptic stations decreases as the distance between them decreases. In this case, for distances of less than 1 km, the average discrepancy for both 10 day and monthly rainfall is usually less than 3 mm/day. It should also be noted that the discrepancies between ARGs and synoptic stations do not necessarily decrease monotonically with distance within the optimal range of 1 km because of the other factors indicated earlier. Moreover, there appears to be a seasonality in the differences, with peak rainfall discrepancies occurring during the Summer Monsoon months from June until September.
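The distance-binned comparison described above can be sketched for a single period as follows. The haversine distance, the bin edges (borrowed from the intervals used for Moran's I), and all function names are assumptions for illustration only.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in degrees, broadcastable arrays."""
    lat1, lon1, lat2, lon2 = (np.radians(np.asarray(v, dtype=float))
                              for v in (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def nearest_synoptic(arg_lat, arg_lon, syn_lat, syn_lon):
    """Index of, and distance (km) to, the nearest synoptic station for each ARG."""
    arg_lat = np.asarray(arg_lat, dtype=float)[:, None]
    arg_lon = np.asarray(arg_lon, dtype=float)[:, None]
    syn_lat = np.asarray(syn_lat, dtype=float)[None, :]
    syn_lon = np.asarray(syn_lon, dtype=float)[None, :]
    dist = haversine_km(arg_lat, arg_lon, syn_lat, syn_lon)
    idx = dist.argmin(axis=1)
    return idx, dist[np.arange(idx.size), idx]


def mean_abs_diff_by_distance(arg_rain, syn_rain, dist_km,
                              edges_km=(0.0, 1.0, 10.0, 50.0)):
    """Average |ARG - synoptic| over pairs in each distance bin (one period)."""
    arg_rain = np.asarray(arg_rain, dtype=float)
    syn_rain = np.asarray(syn_rain, dtype=float)
    dist_km = np.asarray(dist_km, dtype=float)
    abs_diff = np.abs(arg_rain - syn_rain)
    out = {}
    for k, (lo, hi) in enumerate(zip(edges_km[:-1], edges_km[1:])):
        # The first bin includes its lower edge so co-located pairs are kept.
        in_bin = (dist_km >= lo if k == 0 else dist_km > lo) & (dist_km <= hi)
        in_bin &= np.isfinite(abs_diff)
        out[(lo, hi)] = float(abs_diff[in_bin].mean()) if in_bin.any() else float("nan")
    return out
```

Here `syn_rain` is expected to be already matched to each ARG, for example `syn_values[idx]` using the index returned by `nearest_synoptic`. Repeating the call for every 10 day period or month gives one time series per distance bin.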
3.3. ARGs vs. IMERG
The next step was to derive an IMERG dataset that is consistent with both synoptic data and corrected ARG measurements. Following the better statistics obtained from IMERG FR in Figure 3, the GPM product used from this section onwards is IMERG FR. The challenge is overcoming the difference between the point measurements made at the synoptic stations and ARGs and the satellite measurements, which cover a much larger footprint. This difference produced the inconsistencies reported in previous studies that directly compared point values from synoptic stations with satellite gridded values that covered the same location as the synoptic station [6,13,14,15]. To overcome this problem, we used the larger number of ARG measurements for comparison with IMERG FR. The strategy was to find IMERG FR pixels that contain a number of ARG measurements, and to use these data to derive the correction factor that makes IMERG consistent with corrected ARG and synoptic data.
Comparisons of raw IMERG FR data with corrected ARG measurements within the footprint of the IMERG FR are presented in the scatter plots shown in Figure 9. The scatter plots are color coded to indicate the number of ARG measurements within the IMERG FR footprint that were used. The averaging is either over 10 days as presented in Figure 9a, or over a month as shown in Figure 9b. It is apparent that the higher the number of ARGs, the closer the data points to the red line, which represents perfect agreement. This indicates that within the IMERG FR footprint, the ARG values are not necessarily uniform and an average of a few ARG measurements, at least, is required to compare well with IMERG FR values.
Considering the effect of the number of ARGs on the agreement of ARG rainfall with IMERG FR values, the analysis proceeded by comparing only IMERG FR pixels with the average of an ample number of ARGs within their footprints. A buffer of 0.5° outside the IMERG FR pixel was applied to include nearby ARGs, as their measurements still have a direct effect on the average rainfall within the pixel. Only IMERG FR pixels that initially have 10 ARGs or more within their footprint were considered in further analysis. Because not all ARGs can provide rainfall data all of the time, a minimum of five stations (out of 10 or more stations) must have 10 day and monthly measurements for a pixel to be considered in the time series comparison.
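The selection rule above can be sketched for one pixel and one averaging period as follows. The sketch assumes a 0.1° IMERG grid cell (half-width 0.05°), treats the buffer as a square box around the pixel, and uses NaN to mark ARGs that did not report; the function name and these geometric choices are illustrative assumptions rather than details given in the text.

```python
import numpy as np


def pixel_arg_average(pix_lat, pix_lon, arg_lat, arg_lon, arg_rain,
                      half_width_deg=0.05, buffer_deg=0.5,
                      min_installed=10, min_reporting=5):
    """Mean ARG rainfall for one pixel and period, plus the number of reporting ARGs.

    Returns (NaN, n_reporting) when the pixel fails either criterion.
    """
    arg_lat = np.asarray(arg_lat, dtype=float)
    arg_lon = np.asarray(arg_lon, dtype=float)
    arg_rain = np.asarray(arg_rain, dtype=float)

    # ARGs inside the pixel or within the buffer around it (square box assumed).
    reach = half_width_deg + buffer_deg
    inside = ((np.abs(arg_lat - pix_lat) <= reach)
              & (np.abs(arg_lon - pix_lon) <= reach))

    values = arg_rain[inside]
    reporting = np.isfinite(values)  # NaN marks a gauge with no data this period
    n_installed = int(inside.sum())
    n_reporting = int(reporting.sum())

    if n_installed < min_installed or n_reporting < min_reporting:
        return float("nan"), n_reporting
    return float(values[reporting].mean()), n_reporting
```

The returned count can be kept alongside the mean so that later comparisons can be color coded or filtered by the number of contributing ARGs.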
Scatter plots of raw IMERG FR versus averages of corrected ARGs within the IMERG FR footprints are presented in Figure 10a,b for 10 day and monthly data, respectively. Only pixels with a minimum of 10 ARGs within their footprints are presented, and linear regression analysis yielded a correlation coefficient of 0.68 for the 10 day and 0.83 for the monthly data. In addition, rainfall biases were −0.66 mm/day and −0.55 mm/day for 10 day and monthly rainfall, respectively. Negative biases indicate that, on average, IMERG FR rainfall estimates are higher than the ARG values. However, the slope values of less than one indicate a linear trend in which ARG values are greater than IMERG FR rainfall estimates. Similar comparisons were also performed using IMERG LR and the results were similar to those from the use of IMERG FR.
To better understand the relationship between IMERG FR and ARGs, as depicted in Figure 10, data during light to moderate rain days (0.1 to 10.0 mm/day) were plotted separately from those during heavy to severe rain days (10.0 to 30.0 mm/day). As shown in Figure 11, IMERG FR tends to overestimate ARG rainfall during light to moderate rain days, which is depicted by the majority of the scattered points lying above the perfect agreement (red) line. The opposite can be observed during heavy to severe rain days, in which IMERG FR usually underestimates ARG rainfall. Because the slopes are different, these discrepancies per rainfall interval can be used to optimize the accuracy of the bias correction technique applied to IMERG FR.
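The split by rainfall intensity can be sketched as below. The class limits come from the text, but two points are assumptions: that the class is assigned from the ARG average rather than from IMERG FR, and that the shared 10.0 mm/day boundary belongs to the heavy class. The function and key names are hypothetical.

```python
import numpy as np

# Class limits in mm/day taken from the text; which class owns the shared
# 10.0 boundary is not stated, so the choice below is an assumption.
CLASSES = {
    "light_to_moderate": (0.1, 10.0),
    "heavy_to_severe": (10.0, 30.0),
}


def summarize_by_intensity(arg_mean, imerg):
    """Count and mean (IMERG minus ARG) per intensity class, classed by ARG average."""
    arg = np.asarray(arg_mean, dtype=float)
    sat = np.asarray(imerg, dtype=float)
    out = {}
    for name, (lo, hi) in CLASSES.items():
        # Upper edge is inclusive only for the heavy class.
        upper_ok = arg <= hi if name == "heavy_to_severe" else arg < hi
        sel = (arg >= lo) & upper_ok & np.isfinite(sat)
        n = int(sel.sum())
        mean_diff = float(np.mean(sat[sel] - arg[sel])) if n else float("nan")
        out[name] = {"n": n, "mean_imerg_minus_arg": mean_diff}
    return out
```

A positive `mean_imerg_minus_arg` in the light to moderate class together with a negative value in the heavy to severe class would reproduce the pattern described for Figure 11.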
Further correction to the IMERG FR values was performed in two steps. The first step was performed by employing the generalized reduced gradient algorithm in Equation (1) for different rainfall intervals. Optimized correction factors were obtained and applied to IMERG FR depending on the average value of ARGs within the IMERG FR footprint. Because the monthly data is more uniform than the 10 day data, correction factors obtained from the monthly data were also used for the 10 day rainfall. The correction procedure of the first step is as follows:
For 8 mm/day < ARGs < 12 mm/day:
The second step is to use the resulting linear regression parameters (slope = 0.81 and intercept = 1.12) obtained by comparing IMERG FR′ with the average ARGs. The correction procedure of the second step is as follows:
For IMERG FR′ < 1.12 mm/day:
For IMERG FR′ ≥ 1.12 mm/day:
Equation (5) was used to avoid having negative values for IMERG FR″, which serves as the final bias-corrected IMERG FR product.
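The second step can be illustrated with a short sketch, with an important caveat: the equations themselves are not reproduced in this text, so the code below is only one reading that is consistent with the stated slope (0.81), intercept (1.12), and threshold (1.12 mm/day). It assumes the regression was fitted as IMERG FR′ = slope × ARG + intercept and then inverted, and that values below the threshold are set to zero so that no negative rainfall results. Both assumptions, and the function name, are hypothetical.

```python
import numpy as np

SLOPE = 0.81      # regression slope quoted in the text
INTERCEPT = 1.12  # regression intercept quoted in the text (mm/day)


def second_step_correction(imerg_fr_step1):
    """Invert the assumed regression line and clamp sub-threshold values to zero.

    ASSUMPTION: this functional form is inferred, not taken from the source equations.
    """
    x = np.asarray(imerg_fr_step1, dtype=float)
    corrected = (x - INTERCEPT) / SLOPE
    # Below the intercept the inverted line would be negative; zero is assumed here.
    return np.where(x < INTERCEPT, 0.0, corrected)
```

Under this reading the correction is continuous at the threshold, since the inverted line passes through zero at 1.12 mm/day. Missing values (NaN) propagate unchanged.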
The bias-corrected IMERG FR was compared with the average ARG rainfall, as shown in Figure 12. Relative to the raw IMERG FR, correlation coefficients improved from 0.68 to 0.88 for 10 day rainfall, and from 0.83 to 0.93 for monthly rainfall. The error statistics MAE and RMSE were reduced. Rainfall biases also decreased in magnitude to 0.04 and −0.09 mm/day for 10 day and monthly data, respectively. Finally, the slopes became closer to one and the intercepts were reduced. These results suggest that performing correction on IMERG FR using both the generalized reduced gradient algorithm (power transform) and linear regression provides better consistency with ground rainfall data in terms of reduction in rainfall errors and biases. Moreover, employing the two-step correction is a direct method to concurrently use rainfall data from ARGs and IMERG FR, which are made consistent with synoptic station data. Through this technique, the final product was obtained from comparisons of similar rainfall events that generated optimized correction factors that can be used to construct a long-term gridded rainfall dataset for the Philippines.
The number of ARG stations needed for an effective comparative study of IMERG FR data and ARG data depends on various factors, including the uniformity of rain over a large area. To assess the optimal number, correlation coefficients as a function of the minimum number of ARGs within the satellite footprint are plotted in Figure 13. It is apparent that the correlation coefficient increases with the minimum number of ARGs. The plot for monthly data shows a slight increase in the correlation coefficients from 0.83 for ≥5 ARGs to about 0.89 for ≥10 ARGs. The plot for 10 day data shows a greater increase from 0.67 for ≥5 ARGs to about 0.77 for ≥10 ARGs. It appears that the use of 10 ARGs in the aforementioned analysis is desirable and, although using more would be preferred, the results at ≥11 ARGs indicate declines in correlation coefficients, suggesting statistical limits due to the relatively small number of ARG measurements that meet this criterion.
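A sensitivity check of this kind can be sketched as follows, assuming each sample is one pixel-period with an IMERG FR value, an ARG average, and the number of ARGs that contributed to that average. The function name and the rule of requiring at least three samples per threshold are illustrative assumptions.

```python
import numpy as np


def correlation_by_min_args(imerg, arg_mean, n_args, thresholds=range(5, 12)):
    """Pearson r between IMERG and ARG averages for each minimum-ARG threshold."""
    sat = np.asarray(imerg, dtype=float)
    gauge = np.asarray(arg_mean, dtype=float)
    count = np.asarray(n_args)
    finite = np.isfinite(sat) & np.isfinite(gauge)
    out = {}
    for n_min in thresholds:
        keep = finite & (count >= n_min)
        # Too few samples or a constant series gives an undefined correlation.
        if keep.sum() < 3 or np.ptp(sat[keep]) == 0 or np.ptp(gauge[keep]) == 0:
            out[n_min] = float("nan")
        else:
            out[n_min] = float(np.corrcoef(sat[keep], gauge[keep])[0, 1])
    return out
```

Reporting the sample size next to each coefficient would help judge whether a decline at high thresholds reflects a real effect or simply the small number of qualifying pixels.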
To illustrate the merit of having a rainfall dataset that consists of synoptic station data, rain gauge data, and IMERG FR data, sample color-coded maps of raw, interpolated, and corrected 10 day data for the period 6–15 March 2017 are presented in Figure 14. Figure 14a,b shows the point measurements provided by synoptic stations and ARGs, with the size of the data points considerably increased for improved visibility. The images show relatively low values in Luzon and other areas, but considerably higher values towards the south, and particularly on the island of Mindanao. The more comprehensive measurements provided by IMERG FR and depicted in Figure 14c show a significantly better defined spatial distribution, with the northern regions exhibiting the dry conditions expected for this time of the year and rainfall rates increasing towards the southern region. The rainfall pattern depicted is influenced by the lack of tropical cyclones and the behavior of the monsoons during this period.
If satellite data are not available, the only means of obtaining regional patterns is through spatial interpolation of the synoptic and ARG data. Spatial interpolation was undertaken by applying inverse distance weighting [24], with the power parameter p = 5, to the synoptic and ARG rainfall data, as illustrated in Figure 14d,e, respectively. These two images show generally similar patterns but there are distinct differences in some regions. For example, the patterns over the southern island of Mindanao and the island of Palawan are quite different. Spatially interpolated ARGs were used as a basis for the ARG threshold values required for the two-step bias correction of the IMERG FR product shown in Figure 14f. However, interpolated synoptic stations may also be used when ARGs are not available because similar spatial patterns are observed between the two. When the bias-corrected IMERG FR is compared with the raw IMERG FR data shown in Figure 14c, distinct differences are observed, highlighting the significant improvements in the IMERG FR product. However, when compared with the spatially interpolated synoptic and ARG data, large differences are apparent, especially in areas where there is a paucity of measurements, or no measurements, from the ground. These results show the weakness of having just ground observations as provided by the synoptic and ARG data.
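Inverse distance weighting with p = 5 can be sketched as follows. This is a generic form of the method using all stations and great-circle distances; whether the original analysis limited the search radius or the number of neighbors is not stated, so those choices, and the function name, are assumptions.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def idw_interpolate(obs_lat, obs_lon, obs_val, tgt_lat, tgt_lon, power=5.0):
    """Inverse-distance-weighted estimate at each target point from all stations."""
    obs_lat, obs_lon, obs_val = (np.asarray(v, dtype=float)
                                 for v in (obs_lat, obs_lon, obs_val))
    tgt_lat = np.atleast_1d(np.asarray(tgt_lat, dtype=float))
    tgt_lon = np.atleast_1d(np.asarray(tgt_lon, dtype=float))

    keep = np.isfinite(obs_val)  # ignore stations with no data
    obs_lat, obs_lon, obs_val = obs_lat[keep], obs_lon[keep], obs_val[keep]
    if obs_val.size == 0:
        raise ValueError("no valid observations to interpolate from")

    # Great-circle distances in km, shape (n_targets, n_stations).
    p1, l1 = np.radians(tgt_lat)[:, None], np.radians(tgt_lon)[:, None]
    p2, l2 = np.radians(obs_lat)[None, :], np.radians(obs_lon)[None, :]
    a = (np.sin((p2 - p1) / 2) ** 2
         + np.cos(p1) * np.cos(p2) * np.sin((l2 - l1) / 2) ** 2)
    dist = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

    tiny = 1e-6  # km; avoids division by zero at station locations
    weights = 1.0 / np.maximum(dist, tiny) ** power
    est = (weights @ obs_val) / weights.sum(axis=1)

    # A target that coincides with a station takes that station's value exactly.
    hit = dist.min(axis=1) < tiny
    est[hit] = obs_val[dist[hit].argmin(axis=1)]
    return est
```

With a power as high as 5, the nearest station dominates each estimate, so the interpolated field resembles a smoothed nearest-neighbor map; this is consistent with the large differences from the satellite field noted in data-sparse areas.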
Similar sets of color-coded maps are presented in Figure 15, but for the monthly data in March 2017. The monthly data show similar patterns, but somewhat different values when compared with the 10 day data because they represent average measurements for a longer period, with the non-overlapping data likely representing a different rain event than the rest of the monthly data. Again, the spatially interpolated synoptic and ARG data shown in Figure 15d,e depict approximately similar patterns but different values, with the ARG data providing more regional details, thereby confirming the value of having measurements in more locations. The maps for raw IMERG FR and the corrected IMERG FR are again shown to have slight but significant differences, but are nonetheless consistent with the spatial patterns of rainfall. The subtle differences in various locations indicate that for monthly averages, the raw IMERG FR data already provide a good representation of the rainfall patterns in the Philippines, which is further improved upon introduction of bias correction.