Improving the Regional Applicability of Satellite Precipitation Products by Ensemble Algorithm

Satellite-based precipitation products (e.g., Integrated Multi-Satellite Retrievals for Global Precipitation Measurement (IMERG) and its predecessor, the Tropical Rainfall Measuring Mission (TRMM)) are a critical source of precipitation estimates, particularly for regions with sparse or no hydrometric networks. However, the performance of these products has been found to be inconsistent across climatically and topographically diverse regions, timescales, and precipitation intensities, and there is still room for improvement. Hence, using a proposed ensemble algorithm, the regional precipitation estimate (RP) is introduced here. The RP concept is based mainly on regional performance weights derived from the Mean Square Error (MSE) and on the precipitation estimates from the TRMM product, that is, TRMM 3B42 (TR), and the real-time (late) (IT) and research (post-real-time) (IR) products of IMERG. The overall results of the selected contingency table indices (e.g., Probability of Detection (POD)) and statistical indices (e.g., Correlation Coefficient (CC)) indicated that the proposed RP product captures the gauge observations better overall than the TR, IR, and IT in five different climatic regions of Pakistan from January 2015 to December 2016, at a daily time scale. The current study could be the first to provide preliminary feedback from Pakistan for global precipitation measurement researchers by highlighting the need for refinement in the IMERG.


Introduction
The accurate quantification of precipitation is critical for flood projection, drought assessment and water resources management practices. Furthermore, in the context of current climate change, it is necessary to advance our understanding of temporal and spatial precipitation dynamics, both regionally and globally. The accuracy of the precipitation input also substantially influences the efficacy of hydrological models. Obtaining an accurate regional precipitation database for hydrological predictions is therefore essential; however, it is a challenging task in developing countries like Pakistan, which have complex terrain and no, or an insufficient, temporal database.
Traditionally, precipitation is measured through ground-based observation using rain gauges, weather radar and/or interpolation of rainfall measured at rain gauges. However, many parts of the developing, and even the developed, world are characterized by absent or sparse hydrometric networks with an inconvenient spatial distribution, and therefore spatial rainfall observations are subject to even larger uncertainties.
Based on a comprehensive evaluation, these studies concluded that: (i) Version 7 of the TRMM product outperforms the preceding version 6 through substantial improvements in the bias; (ii) Based on the mean error, the correlation with ground-based observations and the false alarm ratio, the TRMM 3B42 performed significantly better than other satellite products across many regions [34]. This may be because of the bias adjustment integrated into the 3B42 algorithm. However, the 3B42 could be strongly biased in regions with a sparse and inconveniently distributed hydrometric network, and the product showed a low probability of detection in Australia and some parts of South America [31]; (iii) Compared to the TRMM, the IMERG performed appreciably better across mainland China; however, no significant improvement was noted in the data-sparse mountainous watershed of Myanmar. Although satellite products, for example, the IMERG and its predecessor the TRMM version 3B42, can potentially capture the spatiotemporal variability of precipitation, they contain considerable error, and there is still room to improve the capturing capability of both products in dry climates and high-altitude areas [33,34]; (iv) In complex terrain and in regions with rapid precipitation gradients, the rain detection efficacy of most satellite products was weak, and mean errors depended mainly on magnitude. This may be because of the poor ability to distinguish between raining and non-raining clouds. Even though Global Satellite Mapping of Precipitation moving vector with Kalman filter (GSMaP-MVK) showed high correlations with in situ data over Japan, there is still no universally accepted product whose performance could be considered the best in these types of regions.
Hence, over precipitous regions, further validation research could provide a better understanding of the limitations of using satellite products in flash flood applications [31]; (v) In addition to topographic diversity, climatological features and seasonality also play a vital role in the performance of these products, specifically in terms of mean errors and the probability of detection [32]. Performance was significantly better in equatorial and tropical regions than in semi-arid regions. Moreover, the summer (warm) season associated with convective structure and cold seasons with light rainfall played a decisive part in the performance evaluation of these products [32]; (vi) In Pakistan, the TRMM products perform better in plain and medium-elevation areas but overestimate precipitation in the glaciated and mountainous areas in the north, and their use in coastal areas and arid regions is questionable [38,41]. The error associated with the satellite products showed significant geo-topographic dependence [37][38][39][40][41]; (vii) More conclusively, the performance of these products depends mainly on the variability of many factors, for example, climatic and topographic diversity, timescales (annual, seasonal, monthly, and daily), precipitation intensities, and season (monsoon, winter etc.) [31,38,42]; irrespective of significant improvements in most satellite precipitation products, a region-specific assessment is still necessary before any hydrological simulation, assessment, projection or outlook studies [42].
The current study introduces an ensemble-algorithm-based quantification of precipitation estimates using a blended precipitation estimator. The basic idea is to use the relative regional performance weights of several well-known satellite precipitation products (here, IMERG research (IR), IMERG real-time (IT) and TRMM 3B42 (TR)) to provide final regional precipitation estimates that exploit the advantages and minimize the disadvantages of these satellite products. More specifically, the main objectives of this study were: (i) To evaluate the performance of satellite products across the divergent climatic regions of Pakistan; (ii) To quantify blended precipitation estimates aiming at a better performance than the individual satellite observations in all climatically and topographically diverse regions of Pakistan. Additionally, the study provides preliminary feedback from Pakistan for global precipitation measurement researchers by highlighting the need for refinement in the IMERG.

Study Area
Pakistan is a developing country that extends from approximately 61-77° E (longitude) and 23.5-37° N (latitude), covering an area of 79.6 million ha, with elevation ranging from 0 m (Arabian Sea) to 8611 m (K2, Mount Godwin-Austen). The total cultivated land is about 24% of the total country area, of which about 80% is irrigated; forests and grazing land cover about 4%, 31% is unfit for agriculture and 2% is under cover. The landscape ranges from the coastline alongside the Arabian Sea in the south, through plains, deserts, and plateaus in the middle, to the snow-covered mountainous region in the north (Figure 1). The geographical and water-resources-based division includes the Indus riverine area covering a major part of the Punjab, Khyber Pakhtunkhwa, and Sind provinces of Pakistan, the Himalayan mountainous region (north-east), the Northern Highland, and the drought-prone arid region in the southwest. Owing to these diverse climatic regimes, significant spatiotemporal variation in precipitation (i.e., from 300 mm in the south to about 1500 mm in the north) occurs throughout the country. A ground-based hydrometric network is installed to measure precipitation; however, it is characterized by absent or inadequate gauge density and by temporal and spatial resolutions that are inconvenient for hydrologic applications.

Rain Gauges
The climate of Pakistan is in general characterized by warm summers and cold winters, with a wide range of variation between extremes in different regions. To monitor this variation, the Pakistan Meteorological Department (PMD) has established weather stations (including those from 1950, seasonal ones and new ones) all over Pakistan. Based on a 30-year precipitation trend and a climatic analysis of different regions, Chen et al. [43] divided Pakistan into five different climatic regions (detailed below). Using this classification, we selected several stations from each climatic region (Figure 1) for the evaluation of the selected satellite products and the quantification of the proposed precipitation estimates. The thirty-five stations were selected to ensure the quality of the reference data, independence from gauge-adjusted satellite products, confirmation by the PMD (the source agency), and a continuous record of observed data with no missing values. The salient features of the considered regions are as follows. Region 1 (G-1): G1 has a very cold climate in winter and mild temperatures in summer, with a high green mountain range, and is situated in the north of Pakistan (between 34° N and 38° N). Region 2 (G-2): G2 has a mildly cold climate in winter and a relatively hot summer, consists of sub-mountainous terrain, and is located between 31° N and 34° N. Region 3 (G-3): G3 has relatively cold winters and hot summers. Most of the stations lie in dry mountainous areas between 27° N and 32° N. Region 4 (G-4): G4 is the hottest and driest region of the country, where the highest maximum temperatures (53 °C) are generally recorded in summer. The area is almost entirely plain, with some desert areas such as Thal, Cholistan, and Thar. Region 5 (G-5): G5 is a large region with a coastal range and is mostly arid to hyper-arid.

Satellite Based Estimates
Integrated Multi-Satellite Retrievals for Global Precipitation Measurement (IMERG)
The IMERG V.04, level 3 is a quasi-global (60° N to 60° S) multi-satellite precipitation product of Global Precipitation Measurement (GPM) (https://pmm.nasa.gov/GPM), which provides estimates based on the combined use of passive microwave (PMW) sensors, infrared (IR) satellites, and ground-based precipitation data. The GPM-IMERG mission was launched in February 2014, aiming at continued and improved satellite precipitation and snowfall products on a global scale. The level 3 products comprise a gridded precipitation and snowfall database, with a spatial resolution of 0.1° × 0.1° and a temporal resolution of 30 min, estimated from the combined GPM constellation satellites and calibrated by the Global Precipitation Climatology Centre (GPCC)'s gauge analysis [33,34,44].
The IMERG is generally run twice: first to generate the IMERG-Early database (about 6 h after the nominal observation time) for warnings of probable floods or landslides, and then to produce the IMERG-Late observations, with a latency of about 18 h, for drought monitoring or agricultural forecasting studies. After the monthly gauge observations are received, the IMERG-Final cycle is run to generate the database, with a latency of approximately three months after the observation month. In this study, we use the calibrated GIS version of the daily real-time (IMERG-Late) (IT) and research (IMERG post-real-time) (IR) products to accomplish the proposed objectives.

Tropical Rainfall Measuring Mission (TRMM)
The TRMM_3B42v7 Multi-satellite Precipitation Analysis (TMPA) (https://pmm.nasa.gov/dataaccess/downloads/trmm), from the TRMM mission launched in 1997, is intended to provide the best satellite estimate of precipitation on a quasi-global scale between 50° S and 50° N, with a spatial resolution of 0.25° × 0.25° and a temporal resolution of 3 h, accumulated daily and monthly using combined satellite precipitation-related sensors. The TRMM combines IR data with four sensors, namely the Microwave Imager (TMI), Precipitation Radar (PR), Special Sensor Microwave Imager (SSM/I) and Advanced Microwave Scanning Radiometer (AMSR). Here, the TRMM-TMPA 3B42 Real Time (hereafter TR) was used in order to evaluate its performance and as a candidate for the ensemble used to quantify regional precipitation estimates in Pakistan.

Evaluation of Precipitation Products
As the selected satellite precipitation products differ in spatial resolution, an aggregation method was first used so that further analysis could be performed at the same spatial resolution (i.e., 0.25°) and on a daily timescale. Using this method, the daily IMERG precipitation products were aggregated from a 0.1° to a 0.25° spatial resolution. For the aggregation, we assigned areal weights of 0.16, 0.08 and 0.04, respectively, to the four IMERG grid cells falling completely inside a 0.25° TRMM grid cell, the four cells lying halfway inside it, and the ninth cell lying one-fourth inside it [34]. For better data quality, approximate agreement between the observation times of the satellite (09.30 a.m. PST, +5 in case of UTC 1430) and gauge sources (08 a.m. PST, +5 in case of UTC 1300) was ensured. Furthermore, to avoid any mismatch, the satellite precipitation in the pixel containing each rain gauge was extracted using the latitude/longitude of the gauge.
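The areal weights quoted above follow from the fraction of a 0.25° cell covered by each overlapping 0.1° cell. The sketch below illustrates this under stated assumptions: the function names, the grid layout (cell `[r][c]` has its south-west corner at `(r * 0.1, c * 0.1)` degrees) and the sample grid are hypothetical and are not taken from the study.

```python
import math


def overlap_weights_1d(coarse_start, coarse_size=0.25, fine_size=0.1):
    """Fraction of one coarse cell covered by each overlapping fine cell (1-D).

    Returns {fine_index: fraction}; the fractions sum to one.
    """
    coarse_end = coarse_start + coarse_size
    index = math.floor(coarse_start / fine_size + 1e-9)
    weights = {}
    while index * fine_size < coarse_end - 1e-9:
        low = max(coarse_start, index * fine_size)
        high = min(coarse_end, (index + 1) * fine_size)
        weights[index] = (high - low) / coarse_size
        index += 1
    return weights


def aggregate_cell(fine_grid, lat_start, lon_start, coarse_size=0.25, fine_size=0.1):
    """Area-weighted mean of the fine cells that overlap one coarse cell.

    ASSUMPTION: fine_grid[r][c] is the value of the fine cell whose
    south-west corner is at (r * fine_size, c * fine_size) degrees.
    """
    w_lat = overlap_weights_1d(lat_start, coarse_size, fine_size)
    w_lon = overlap_weights_1d(lon_start, coarse_size, fine_size)
    return sum(w_lat[r] * w_lon[c] * fine_grid[r][c] for r in w_lat for c in w_lon)


if __name__ == "__main__":
    w = overlap_weights_1d(0.0)
    # The 2-D weights are products of the 1-D fractions (0.4, 0.4, 0.2).
    print(sorted(round(a * b, 2) for a in w.values() for b in w.values()))
    grid = [[2.0] * 5 for _ in range(5)]  # hypothetical uniform 0.1-degree field (mm)
    print(aggregate_cell(grid, 0.0, 0.0))
```

In one dimension a 0.25° cell overlaps 0.1° cells by 0.4, 0.4 and 0.2 of its width, so the two-dimensional products give four weights of 0.16, four of 0.08 and one of 0.04, which sum to one.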
Moreover, following the point-to-pixel analysis [45], the IMERG and TRMM V7 precipitation estimates at each grid cell (box) containing at least one weather station were compared with the corresponding gauge observations from January 2015 to December 2016. To systematically assess the efficacy of the IMERG and TRMM products, several statistical indices were used, including: (a) the correlation coefficient (CC) (Equation (1)), to evaluate the agreement between the satellite product estimates and gauge-based precipitation; and (b) the Mean Square Error (MSE) (Equation (2)), to assess the average squared error.
In addition to the above two indices, the contingency of the satellite products was represented using: (c) the probability of detection (POD), or hit rate (Equation (3)), representing the fraction of gauge-based precipitation events correctly detected by the satellite products, with 1 mm used as the rain/no-rain threshold [44]; (d) the false alarm ratio (FAR) (Equation (4)), denoting the fraction of events detected by the satellite products for which the gauges recorded no rainfall; and (e) the critical success index (CSI) (Equation (5)), showing the overall fraction of precipitation events correctly detected by the satellite products. The ideal value of CC, POD, and CSI is one and that of FAR is zero, whereas the product with the minimum MSE is considered comparatively reliable.
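In their standard forms, which are assumed here to correspond to Equations (1)-(5), these indices read:

```latex
\begin{align}
\mathrm{CC}  &= \frac{\sum_{i=1}^{N}\left(S_i-\bar{S}\right)\left(G_i-\bar{G}\right)}
                     {\sqrt{\sum_{i=1}^{N}\left(S_i-\bar{S}\right)^{2}}\,
                      \sqrt{\sum_{i=1}^{N}\left(G_i-\bar{G}\right)^{2}}} \tag{1}\\
\mathrm{MSE} &= \frac{1}{N}\sum_{i=1}^{N}\left(S_i-G_i\right)^{2} \tag{2}\\
\mathrm{POD} &= \frac{\mathrm{hits}}{\mathrm{hits}+\mathrm{miss}} \tag{3}\\
\mathrm{FAR} &= \frac{\mathrm{false\ alarms}}{\mathrm{hits}+\mathrm{false\ alarms}} \tag{4}\\
\mathrm{CSI} &= \frac{\mathrm{hits}}{\mathrm{hits}+\mathrm{miss}+\mathrm{false\ alarms}} \tag{5}
\end{align}
```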
where S_i and G_i represent the daily satellite precipitation estimate and the gauge-based observation (mm) for the ith time step, respectively; S̄ and Ḡ are their average values (mm); N is the sample size of the selected time series; hits is the number of events in which both the satellite product and the gauge record precipitation; miss is the number of events recorded by the gauge but missed by the satellite product; and false alarms is the number of events in which the satellite product records precipitation while the gauge shows none.
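As an illustration, the five indices can be computed from paired daily series as sketched below. This is a minimal sketch rather than the code used in the study: the function name `evaluate`, the choice of `>=` when comparing against the 1 mm threshold, and the sample values are assumptions.

```python
import math


def evaluate(sat, gauge, threshold=1.0):
    """Return CC, MSE, POD, FAR and CSI for paired daily series (mm).

    ASSUMPTION: a day counts as rainy when the value is >= threshold.
    Ratios with a zero denominator are returned as NaN.
    """
    n = len(gauge)
    mean_s = sum(sat) / n
    mean_g = sum(gauge) / n
    cov = sum((s - mean_s) * (g - mean_g) for s, g in zip(sat, gauge))
    var_s = sum((s - mean_s) ** 2 for s in sat)
    var_g = sum((g - mean_g) ** 2 for g in gauge)
    nan = float("nan")

    pairs = list(zip(sat, gauge))
    hits = sum(1 for s, g in pairs if s >= threshold and g >= threshold)
    misses = sum(1 for s, g in pairs if s < threshold and g >= threshold)
    false_alarms = sum(1 for s, g in pairs if s >= threshold and g < threshold)

    return {
        "CC": cov / math.sqrt(var_s * var_g) if var_s > 0 and var_g > 0 else nan,
        "MSE": sum((s - g) ** 2 for s, g in pairs) / n,
        "POD": hits / (hits + misses) if hits + misses else nan,
        "FAR": false_alarms / (hits + false_alarms) if hits + false_alarms else nan,
        "CSI": hits / (hits + misses + false_alarms)
        if hits + misses + false_alarms
        else nan,
    }


if __name__ == "__main__":
    satellite = [0.0, 2.0, 5.0, 0.0, 3.0]  # hypothetical daily estimates (mm)
    gauge = [0.0, 1.0, 4.0, 2.0, 0.0]      # hypothetical gauge observations (mm)
    print(evaluate(satellite, gauge))
```

With the sample values there are two hits, one miss and one false alarm, so POD is 2/3, FAR is 1/3 and CSI is 0.5, while the MSE is 3.0 mm².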

Proposed Algorithm for Regional Satellite Precipitation Estimate Quantification
The proposed regional satellite precipitation estimate quantification framework is based mainly on leave-one-out cross-validation (LOOCV), the regional performance weights of the considered satellite products, and the ensemble algorithm. In LOOCV, the data record of one station (the assumed ungauged station) in a specific region was held out from the calibration database. The regional weights associated with each product were then estimated using the calibration results of the remaining stations (the assumed donors). The LOOCV was repeated for all stations considered in the current study. The estimation process for the regional weights and the regional satellite precipitation estimate is as follows.
i. To apply the RP algorithm in the five selected diverse regions, for a specific gauge in any selected region, let S_j (where j = 1, 2, 3, ..., n) be any available satellite product to be considered; the MSE_j (Equation (6)) associated with each jth satellite product was computed.
ii. Using the estimated MSE_j of all donor gauged stations of a specific region (i.e., G1, or G2, ..., or G5), the regional MSE_j^r associated with each satellite product in that climatic region was estimated by simply averaging the MSE_j over all donor stations.
iii. A product that does not agree well with the observed data tends to produce high inaccuracy (high MSE). Hence, as a preliminary step, instead of blindly combining the products, the S_j with the comparatively highest MSE_j^r was eliminated from further processing through an MSE filter. The intention behind the elimination was to minimize the subsequent quantification overhead while enhancing accuracy. More specifically, the subset of m satellite products with the lowest MSE_j^r values (indexed by k = 1, 2, 3, ..., m) was retained by eliminating the n − m (here, one) products with the highest MSE_j^r.
iv. Further, based on performance (the MSE_j^r value), weights were assigned to each retained kth satellite product using Equation (7).
v. After the respective regional weights of the best subset of satellite products (k) were calculated (Figure 2), the precipitation time series of the k satellite products were combined using Equation (8) to quantify the regional precipitation estimate (RP) for the assumed ungauged station.
vi. Lastly, the developed RP_i ← [RP_1, RP_2, RP_3, ..., RP_N] time series for a specific assumed ungauged station was evaluated against the corresponding observations G_i ← [G_1, G_2, G_3, ..., G_N] using Equations (1)-(6) to validate the efficacy and the expected improvement compared to the individual selected j satellite products.
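The steps above can be sketched as follows. This is an illustrative sketch under loudly stated assumptions, not the authors' implementation: Equations (7) and (8) are not reproduced in this text, so the weights are assumed here to be inverse-MSE weights normalized to sum to one, and the RP is assumed to be the weighted sum of the retained products. All function names and the data layout are hypothetical.

```python
def mse(sat, gauge):
    """Mean square error between paired satellite and gauge series (mm^2)."""
    return sum((s - g) ** 2 for s, g in zip(sat, gauge)) / len(gauge)


def regional_weights(donor_mse, n_drop=1):
    """Steps ii-iv: average donor MSEs, drop the worst product(s), weight the rest.

    donor_mse maps product name -> list of MSE values, one per donor station.
    ASSUMPTION: weights are inverse-MSE, normalized to sum to one; this stands
    in for Equation (7), whose exact form is not given in this text.
    """
    regional = {p: sum(v) / len(v) for p, v in donor_mse.items()}
    kept = sorted(regional, key=regional.get)[: len(regional) - n_drop]
    inverse = {p: 1.0 / regional[p] for p in kept}  # requires regional MSE > 0
    total = sum(inverse.values())
    return {p: inverse[p] / total for p in kept}


def regional_estimate(series, weights):
    """Step v: weighted sum of the retained products' daily series (assumed form)."""
    n_days = len(next(iter(series.values())))
    return [sum(w * series[p][i] for p, w in weights.items()) for i in range(n_days)]


def loocv_rp(stations):
    """Leave-one-out loop over the stations of one climatic region.

    stations maps station name -> {"gauge": [...], "sat": {product: [...]}}.
    Returns station name -> RP series built only from the donor stations.
    """
    rp = {}
    for target, data in stations.items():
        donors = [name for name in stations if name != target]
        donor_mse = {
            p: [mse(stations[d]["sat"][p], stations[d]["gauge"]) for d in donors]
            for p in data["sat"]
        }
        rp[target] = regional_estimate(data["sat"], regional_weights(donor_mse))
    return rp


if __name__ == "__main__":
    # Hypothetical three-station region with three candidate products.
    region = {
        "A": {"gauge": [0.0, 5.0, 2.0],
              "sat": {"TR": [0.0, 4.0, 2.0], "IR": [1.0, 6.0, 2.0], "IT": [5.0, 0.0, 9.0]}},
        "B": {"gauge": [1.0, 0.0, 3.0],
              "sat": {"TR": [1.0, 1.0, 3.0], "IR": [0.0, 0.0, 4.0], "IT": [6.0, 4.0, 0.0]}},
        "C": {"gauge": [2.0, 2.0, 0.0],
              "sat": {"TR": [2.0, 3.0, 0.0], "IR": [3.0, 2.0, 1.0], "IT": [0.0, 9.0, 5.0]}},
    }
    print(loocv_rp(region)["A"])
```

Because the held-out station contributes nothing to its own weights, the resulting RP series can be scored against that station's gauge record as if the station were ungauged.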

Figure 2. Flow diagram of the proposed algorithm for the regional satellite precipitation estimate (RP).

Spatial Variability of Product Performance at Grid-Scale
All daily satellite-based precipitation time series (i.e., TR, IR, and IT) were compared against the rain-gauge data at grid cells containing at least one station. The initial emphasis was on evaluating the performance of all satellite products in the different climatic regions. The performance evaluation was carried out using various metrics, i.e., CC, POD, FAR, CSI and MSE. The spatial variability of these metrics for the daily TR, IR, and IT products over the five selected climatic regions is shown in Figure 3. The CC-based evaluation showed that, compared to IR and TR, IT agreed better with the majority of the gauge observations. However, the CC was rather low in most parts of the selected regions and even below 0.2 at a few gauge stations. The POD of IT and TR was comparable, with similar spatial characteristics in most cases, whereas that of IR was slightly lower at most stations. The FAR distribution differed from that of the preceding metrics: the TR gave a lower FAR than IR and IT in all regions. The MSE showed a quite similar distribution in most cases. The MSE-based comparison showed that although these three satellite products produced good results at specific gauges, none performed consistently well at all gauges. Among the three selected products, the TR and IT performed better at the majority of the stations in G1 (average MSE of 81 and 97 mm², respectively), G3 (10 and 10 mm²), and G4 (38 and 41 mm²), whereas the IR and TR performed better in G2 (average MSE of 108 and 76 mm²) and G5 (14 and 17 mm²) (see Figures 4 and 5). There was no particular reason to prefer, a priori, one of these for all stations.
Hence, to assess consistency and reliability throughout the entire study region, the proposed RP, based on the ensemble algorithm, was also analyzed here.
The same performance evaluation metrics were adopted to validate the hypothesized framework. Table 1 shows the complete evaluation statistics for the RP algorithm at each selected gauge. The results (Table 1 and Figure 3 (RP section)) show that the CC and POD increased significantly compared to the TR, IR, and IT throughout the study area. A very few stations (with fewer than five rainfall events) still showed low values of CC and POD. Furthermore, to test whether the differences in the evaluation measures (CC, POD etc.) are statistically significant, we performed a nonparametric test (i.e., the Wilcoxon signed-rank test) with a sample size of n = 36. The test assessed whether our finding that the RP performed better than the other satellite products (alternative hypothesis H1) is meaningful or whether there is no significant difference in outcomes (null hypothesis H0), at a significance level of α = 0.05.
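For readers unfamiliar with the test, the statistic w and its normal-approximation z-score for paired per-station metric values can be sketched as below. This is a simplified illustration and not necessarily the exact procedure used in the study: the function name is hypothetical, zero differences are dropped, tied absolute differences receive average ranks, and no tie or continuity correction is applied to z.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (w, z, n_used) for paired samples x and y.

    w is the smaller of the positive and negative rank sums; z uses the
    normal approximation without tie or continuity correction (ASSUMPTION).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w, (w - mean_w) / sd_w, n


if __name__ == "__main__":
    rp_cc = [0.40, 0.35, 0.50, 0.28, 0.45]     # hypothetical per-station CC of RP
    other_cc = [0.30, 0.36, 0.41, 0.20, 0.33]  # hypothetical per-station CC of one product
    print(wilcoxon_signed_rank(rp_cc, other_cc))
```

The null hypothesis is rejected when w does not exceed the tabulated critical value for the number of non-zero differences, or equivalently when |z| exceeds the critical z.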
The Wilcoxon signed-rank test showed a significant difference between the RP and all selected satellite products (i.e., TR, IR and IT) in terms of POD, with a p value less than α, a Wilcoxon test statistic (w) less than the critical value (i.e., w_critical (for n = 36) = 227) and an absolute z-statistic |z| > z_critical. Moreover, in the case of CC, a statistically significant difference was also observed between the RP and IR [|z|(4.861) > z_critical (1.96) and w(0) ≤ w_critical (227)], RP and TR [w(39.5) ≤ w_critical (227)], and RP and IT [w(64.5) ≤ w_critical (227)]. Figure 4 shows the comparative performance evaluation of the IR, IT, TR, and RP over all the selected stations in Pakistan. The CC, POD, MSE and CSI based evaluation showed that the RP product outperformed the other products at a majority of stations (as shown with the blue circles) and provided a consistent performance in all the diverse regions of Pakistan. It is worth mentioning that a low performance value of RP can result when all of the TR, IR and IT perform poorly at a specific station.
The box plot (Figure 5) provides a graphical summary of the performance of each satellite product. The accuracy of the products can be read from the minimum (lower line), first and third quartiles (central rectangle), median (segment inside the rectangle) and maximum (top small line) of the box plots drawn for each product over the whole study area. The statistical summary indicates that the RP provides a comparatively better performance based on all the metrics.

Spatial Distribution of Product Performance at Regional Scale
To further explore the continuity and consistency of the TR, IR, IT and RP precipitation estimates, the products were evaluated and compared at the regional scale (Figure 6) using the evaluation metrics over the five different climatic regions. The evaluation showed that, in most cases, the RP outperformed the other products in terms of CC, CSI, and FAR, and more specifically in terms of POD and MSE. The POD- and MSE-based evaluations show that performance improved greatly compared to any individual product. The higher POD and lower MSE values in all regions indicate that, irrespective of the region, the RP could perform better than the other selected satellite products. This was expected, as the RP exploits the advantages and minimizes the disadvantages of these satellite products. Overall, reliable agreement was observed between the RP and the gauge observations. Hence, such a blended approach could be applied to pixels without gauges and considered a step forward in improving the efficacy of satellite-based observations at the regional scale in areas with sparse or no hydrometric networks.

Discussion
Satellite-based precipitation products are promising alternative sources of observation in regions with sparse hydrometric networks. Several studies have been carried out to develop and evaluate these products in Pakistan [37][38][39][40][41] and all over the world. However, no specific product has been found to be superior across regions with diverse climatic conditions [31,42]. In the current study, a quantitative comparison between the estimates of three satellite products and the proposed precipitation estimates was carried out using gauge-based observations. The analysis outcomes presented in Figures 3-6 support the findings of earlier evaluation studies [37][38][39][40][41] in Pakistan. The results demonstrated that the selected satellite products (TR, IR, and IT) had significant potential to capture the gauge observations; however, their performance was inconsistent across the different regions of Pakistan. Similar to the findings of [38,41], the TMPA product performed well in terms of CC, FAR, POD, CSI and MSE in low-altitude regions; however, its performance declined in high-altitude and glaciated regions, dry areas and regions with fewer rainfall events. Anjum et al. [39] evaluated real-time and post-real-time TRMM precipitation estimates for a single heavy precipitation event (28-30 July) over a watershed and reported low correlation and underestimation. Furthermore, Cheema et al. [40] attempted to calibrate the TRMM product at the basin scale (Indus basin) and at different temporal scales using regression and geographical differential analysis. More specifically, Anjum et al. and Cheema et al. [39,40] also acknowledged the geo-topographic and climatic dependence and the low performance of these products in Pakistan, which supports the outcomes of the current study.
There could be a number of reasons for this, for example, uncertainty in the verification of results in developing countries like Pakistan (e.g., station density, the impact of wind flow, random static error), complex topography and climate over a specific region, seasonality (monsoon, pre-monsoon, post-monsoon, winter, summer etc.) [31,38,42], and the scarce use of gauge data in producing the GPM estimates.
To reduce this kind of non-negligible error in these satellite products, several efforts have been made, for example, improving calibration algorithms, moving from TRMM to IMERG, and reducing sampling issues; however, there is still room for improvement in providing consistent results in diverse regions [33,34]. This study demonstrated that merging different satellite products can minimize the inconsistency issue and the error associated with these products. The comparative outcomes support the hypothesis: despite the low performance (i.e., CC_avg = 0.33), the proposed RP framework provided better agreement (i.e., low MSE_avg: 37.79 and high POD_avg: 0.61) with the gauge observations throughout the study area. This may be because the ensemble algorithm used in the proposed framework selects a suitable subset of products based on performance and uses it to form the final product. In addition, the low performance plausibly stems from the framework's total dependence on the performance of the candidate products.
Moreover, to the authors' knowledge, the IMERG is yet to be evaluated in the selected study area. Hence, besides providing a proposed framework, the current study could be the first to provide useful feedback from Pakistan for GPM researchers by highlighting the need for refinement in the IMERG products in Pakistan. In addition, it accentuates the need for denser and more reliable long-term precipitation observation networks to facilitate effective evaluation in the selected study area.

Conclusions
In this study, the quality of different satellite products, that is, TRMM 3B42 (TR) and the real-time (late) (IT) and research (post-real-time) (IR) products of IMERG, was evaluated over five different climatic and topographic regions of Pakistan. In addition to the evaluation, regional precipitation estimates (RP) based on an ensemble algorithm were introduced, aiming at a better performance throughout the regions. The TR, IR, IT, and RP products were compared using five distinct performance evaluation metrics (i.e., correlation coefficient (CC), Mean Square Error (MSE), probability of detection (POD), false alarm ratio (FAR) and critical success index (CSI)) on the daily time scale. The main conclusions are as follows: (i) Despite inconsistency across climatic regions, the TR, IR, and IT showed significant potential to capture the gauge observations. Among these products, the IT outperformed the others at the majority of the stations; however, slightly lower performance was observed in G3 and G5; (ii) Although these products performed well, they still contain considerable errors, and there is room for further improvement in their ability to provide better agreement across different regions; (iii) The comparison showed that the proposed RP framework provided significant agreement with the gauge observations throughout the study area and better outcomes than the individual products. In addition, the proposed method has the advantage of selecting the more suitable products based on performance and using them to form the final estimate; it can therefore be considered a step forward in improving the efficacy of satellite-based observations; (iv) The proposed framework depends entirely on the performance of the candidate products, and this dependence should be considered its major drawback.