Machine Learning-Based Error Modeling to Improve GPM IMERG Precipitation Product over the Brahmaputra River Basin

The Integrated Multisatellite Retrievals for Global Precipitation Measurement (GPM) (IMERG) Level 3 estimates rainfall from passive microwave sensors onboard satellites that are associated with several uncertainty sources such as sensor calibration, retrieval errors, and orographic effects. This study aims to provide a comprehensive investigation of multiple machine learning (ML) techniques (Random Forest, and Neural Networks), to stochastically generate an error-corrected improved IMERG precipitation product at a daily time scale and 0.1°-degree spatial resolution over the Brahmaputra river basin. In this study, we used the operational IMERG-Late Run version 06 product along with several meteorological and land surface parameters (elevation, soil type, land type, soil moisture, and daily maximum and minimum temperature) to produce an improved precipitation product in the Brahmaputra basin. We trained, tested, and optimized ML algorithms using 4 years (from 2015 through 2019) of reference rainfall data derived from the rain gauge. The ML generated precipitation product exhibited improved systematic and random error statistics for the study area, which is a strong indication for using the proposed algorithms in retrieving precipitation across the globe. We conclude that the proposed ML-based ensemble framework has the potential to quantify and correct the error sources for improving and promoting the use of satellite-based precipitation estimates for water resources applications.


Introduction
The accurate estimation of precipitation incorporates a significant impact on the hydrology, vegetation, natural life, and ecology of any water resource system [1,2]. Despite the fact that precipitation is one of the most imperative parameters for water resource management, documenting precise precipitation information on a global scale is a challenge for climate experts and the scientific community [3][4][5]. While in situ rain gauge station and weather radar data are the most common sources to obtain precipitation information, satellite-based precipitation products have been recognized

Study Area
The GBM basin consists of the Ganges, Brahmaputra, and Meghna rivers originating from the Himalayas and Vindhya ranges flowing through China, Bhutan, Nepal, India, and Bangladesh and ultimately connecting with the inlet at the Bay of Bengal [68]. In terms of hydrologic vulnerability assessment, GBM deserves special attention for some reasons. For instance, the GBM basin dominates annual flooding cycles, the region of inundation, and the withdrawal of floodwaters based on the hydrologic settings of the adjacent countries surrounded by it [46,69].
Moreover, under changing climate scenarios, dry season rainfall pattern is projected to further decrease for elevated temperature while monsoon rainfall is expected to be more intense resulting from the glacier or early snowmelt. Therefore, a basin level assessment with a more comprehensive evaluation of climate change impacts is mandatory for flood regions such as Nepal, India and Bangladesh [70][71][72]. Improved IMERG data with limited error might have the utility for the large-scale water resources modeling in GBM to assess climate impact [21,29,39].
Inside the GBM basin, the Brahmaputra River (alternatively known as Yarlung Tsangpo river in China) Basin in Southeast Asia is the fourth largest fluvial system in the world [73]. This basin has a drainage area of about 570,000 km 2 with rugged terrain, accommodating a population of 130 million which is spread over China, India, Bhutan, and Bangladesh [74]. Hydrological modeling for this region is crucial and complex due to its intense seasonal rainfall, unevenly distributed and poorly maintained real-time rain gauge data, and convoluted transboundary issues [75,76].
The study area has a varied topographic gradient from around 8500 m MSL at the origin to about 2 m MSL at the outlet where it meets the Ganges. The upper Brahmaputra river basin lies in the temperate climate zone with mostly unpopulated area whereas the lower Brahmaputra river basin is in a tropical climate that is densely populated and vulnerable to monsoon flooding [77,78]. Hence, this region has a higher number of in situ stations. The study area and the corresponding in situ gauge networks consisting of 120 stations are shown in Figure 1.

Datasets
The datasets used in this study span from March 2015 to March 2019. The daily accumulated precipitation datasets from these stations were collected from the Central Water Commission, India; Bangladesh Meteorological Department, Bangladesh, and the Department of Hydrometeorology, Nepal. The gauge measurement were averaged at 0.1 × 0.1 degree grid resolution. For this study, we used the operational IMERG-Late Run version 06 product [79] which was the latest available late run product (https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGDL_06/summary). IMERG precipitation has three different versions, (a) early run; (b) late run and (c) final run. In this study we used late run product which uses a climatological adjustment that incorporates gauge data. The IMERG late run product has both backward and forward morphing and retrieved from different passive microwave (PMW) and infrared (IR) sensors. We used the latest available product (V06) late run version of IMERG because the objective of this study was to focus on the operational use of this precipitation in short-term decision making, cropping, drought management, and water resources planning for the resource-limited stakeholder organizations. Moreover, the latest version of IMERG (V6) has a number of advantages over previous versions such as upgraded GPROF-TMI V05 incorporation into the dataset (https://gpm.nasa.gov/sites/default/files/document_files/IMERG_V06_release_notes_190503.pdf). IMERG (V6) is a high-resolution product available with 0.1 × 0.1 degree spatial and 30 min temporal resolution [6,79]. The dataset's spatial coverage is from 90° N-90° S and temporal coverage is from April 2014 to the present (https://gpm.nasa.gov/data-access/downloads/gpm).
For model forcing, elevation data were extracted from a 30 × 30 DEM (Shuttle Radar Topography Mission) 1 Arc-Second Global (Digital Object Identifier (DOI) number:/10.5066/F7PR7TFT). Soil Moisture Active Passive (SMAP) Level-4 soil moisture data were collected from the National Snow and Ice Data Center for spatial coverage: N: 85.044, S: −85.044, E: 180, W: −180. This dataset has a 9 km Equal-Area Scalable Earth (EASE)-Grid spatial and 3-hourly temporal resolution [80]. The SMAP soil moisture data from March 2015 to March 2019 was used in this study. Daily maximum and minimum land surface temperature was collected from NASA Land Processes Distributed Active Archive Center (LP DAAC). The MOD11C1 version 06 products provide land surface temperature value in a 0.05° by 0.05° Climate Modeling Grid on a daily temporal scale with a latency of approximately 1 day [81]. For land type, USGS Global Land Cover Characterization (GLCC) product

Datasets
The datasets used in this study span from March 2015 to March 2019. The daily accumulated precipitation datasets from these stations were collected from the Central Water Commission, India; Bangladesh Meteorological Department, Bangladesh, and the Department of Hydrometeorology, Nepal. The gauge measurement were averaged at 0.1 × 0.1 degree grid resolution. For this study, we used the operational IMERG-Late Run version 06 product [79] which was the latest available late run product (https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGDL_06/summary). IMERG precipitation has three different versions, (a) early run; (b) late run and (c) final run. In this study we used late run product which uses a climatological adjustment that incorporates gauge data. The IMERG late run product has both backward and forward morphing and retrieved from different passive microwave (PMW) and infrared (IR) sensors. We used the latest available product (V06) late run version of IMERG because the objective of this study was to focus on the operational use of this precipitation in short-term decision making, cropping, drought management, and water resources planning for the resource-limited stakeholder organizations. Moreover, the latest version of IMERG (V6) has a number of advantages over previous versions such as upgraded GPROF-TMI V05 incorporation into the dataset (https: //gpm.nasa.gov/sites/default/files/document_files/IMERG_V06_release_notes_190503.pdf). IMERG (V6) is a high-resolution product available with 0.1 × 0.1 degree spatial and 30 min temporal resolution [6,79]. The dataset's spatial coverage is from 90 • N-90 • S and temporal coverage is from April 2014 to the present (https://gpm.nasa.gov/data-access/downloads/gpm).
For model forcing, elevation data were extracted from a 30 × 30 DEM (Shuttle Radar Topography Mission) 1 Arc-Second Global (Digital Object Identifier (DOI) number:/10.5066/F7PR7TFT). Soil Moisture Active Passive (SMAP) Level-4 soil moisture data were collected from the National Snow and Ice Data Center for spatial coverage: N: 85.044, S: −85.044, E: 180, W: −180. This dataset has a 9 km Equal-Area Scalable Earth (EASE)-Grid spatial and 3-hourly temporal resolution [80]. The SMAP soil moisture data from March 2015 to March 2019 was used in this study. Daily maximum and minimum land surface temperature was collected from NASA Land Processes Distributed Active Archive Center (LP DAAC). The MOD11C1 version 06 products provide land surface temperature value in a 0.05 • by 0.05 • Climate Modeling Grid on a daily temporal scale with a latency of approximately 1 day [81]. For land type, USGS Global Land Cover Characterization (GLCC) product with 1 km spatial resolution was used [82]. Soil type was extracted from the FAO Harmonized World Soil Database [83]. Table 1 summarizes land surface and meteorological datasets. All the meteorological datasets were resampled to 0.1 • by 0.1 • using the cubic spline method. The original soil moisture dataset was of 9 km EASE grid with 3 hourly temporal resolution. These datasets were averaged into the daily scale and resampled to 0.1 • by 0.1 • using the cubic spline method. Daily maximum and minimum temperature data were 0.05 • by 0.05 • . These data were resampled to the same resolution of the principal forcing data (precipitation) using the cubic spline method. Regarding land surface variables, Shuttle Radar Topographic Mission (SRTM) based Digital Elevation Model (DEM) elevation was extracted for each of the grid locations of the precipitation. Similarly, USGS land cover type and FAO soil type were extracted in each of the grid locations of precipitation. The in-situ precipitation was station-based (in discrete locations). For each day of the study period, all the available in-situ rainfall data were converted into gridded rainfall using Inverse Distance Weightage (IDW) method as mentioned in [84]. Finally, all daily data were mapped to the 0.1 • grid chosen to be the final spatial grid for the error model to generate the error-corrected IMERG product. For the error models, the response variable is the rainfall estimate from rain gauge.

Precipitation Error Modeling
To develop the error models for the study area we used two machine learning techniques: Random Forests (RF) and Neural Network (NN), to improve GPM IMERG precipitation product. A schematic diagram of the error modeling process is shown in Figure 2. This study devised a randomized and out-of-sample validation experiment to quantify the uncertainty of IMERG precipitation product. Specifically, the two error models, (Neural Network and Random Forest) were developed on the training dataset and were used to predict the holdout dataset, which was applied for testing.
We used "sample" function in R programming to shuffle the row indices of all the dataset to reorder the rows of the dataset randomly. Then, we split 80% of the dataset into the training set and remaining 20% of them into the testing set. Specifically, randomly divided 121,046 rows of data were treated as training and 30,259 rows of the data as testing. To avoid overfitting, we used these independent test data to check the method's accuracy on training data after training which adjusted network structure as well as optimization algorithm parameters of network weight. By using these data and according to the magnitude of mean squared residuals, the network parameters were adjusted. The performance of the error models was evaluated by comparing the error metrics described in Section 3.2.
treated as training and 30,259 rows of the data as testing. To avoid overfitting, we used these independent test data to check the method's accuracy on training data after training which adjusted network structure as well as optimization algorithm parameters of network weight. By using these data and according to the magnitude of mean squared residuals, the network parameters were adjusted. The performance of the error models was evaluated by comparing the error metrics described in Section 3.2.

Random Forests (RF)
To develop the precipitation error model, we used non-parametric Random Forest (RF) algorithm which uses ensembles ("forests") of classification or regression trees [52]. Each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges as the number of trees in the forest becomes large [85,86]. We used R package of "randomForest" and number of trees 1000 for the RF model, and the error for the model was converged before number of trees 1000 as shown in Figure 3.
Initially, to optimize the model, datasets were randomly divided into a training dataset, validation dataset, and test dataset based on the 8:1:1 split rule and the parameters were adjusted for this algorithm. One of the challenges in a data-driven machine learning algorithm is overfitting. In the RF algorithm, each tree chooses and permutes random subsets of input variables at each splitting node, which reduces overfitting and improves the strength of predictions [52]. Therefore, RF utilizes the optimal number "mtry" (size of the random subset of input variables) for split point selection at each node, which introduces randomness in the forests to reduce the correlation between trees [14]. After finalizing the model, we split 80% of the dataset into the training set and remaining 20% of them into the testing set. A schematic diagram is presented in Figure 4. The model testing results are described in Section 4.2.
algorithm which uses ensembles ("forests") of classification or regression trees [52]. Each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges as the number of trees in the forest becomes large [85,86]. We used R package of "randomForest" and number of trees 1000 for the RF model, and the error for the model was converged before number of trees 1000 as shown in Figure 3. Initially, to optimize the model, datasets were randomly divided into a training dataset, validation dataset, and test dataset based on the 8:1:1 split rule and the parameters were adjusted for this algorithm. One of the challenges in a data-driven machine learning algorithm is overfitting. In the RF algorithm, each tree chooses and permutes random subsets of input variables at each splitting node, which reduces overfitting and improves the strength of predictions [52]. Therefore, RF utilizes the optimal number "mtry" (size of the random subset of input variables) for split point selection at each node, which introduces randomness in the forests to reduce the correlation between trees [14]. After finalizing the model, we split 80% of the dataset into the training set and remaining 20% of them into the testing set. A schematic diagram is presented in Figure 4. The model testing results are described in Section 4.2.

Neural Network (NN)
The neural network replicates the function of clusters of biological neurons that constitute an animal brain. The fundamental building blocks are called nodes and are used as information processing elements [55][56][57]. Through a training process, neural networks learn algorithms that can be fitted to the information for detailed data analysis. A schematic representation of Neural Network (NN) is shown in Figure 5. Such learning algorithms are defined by the utilization of a given output that is comparable to the predicted output and by adjusting the parameters as per the comparison.

Neural Network (NN)
The neural network replicates the function of clusters of biological neurons that constitute an animal brain. The fundamental building blocks are called nodes and are used as information processing elements [55][56][57]. Through a training process, neural networks learn algorithms that can be fitted to the information for detailed data analysis. A schematic representation of Neural Network (NN) is shown in Figure 5. Such learning algorithms are defined by the utilization of a given output that is comparable to the predicted output and by adjusting the parameters as per the comparison.

Neural Network (NN)
The neural network replicates the function of clusters of biological neurons that constitute an animal brain. The fundamental building blocks are called nodes and are used as information processing elements [55][56][57]. Through a training process, neural networks learn algorithms that can be fitted to the information for detailed data analysis. A schematic representation of Neural Network (NN) is shown in Figure 5. Such learning algorithms are defined by the utilization of a given output that is comparable to the predicted output and by adjusting the parameters as per the comparison.  These predicted outputs are usually transformed through the hidden layers of the neural network from the input data by weights in the parameter [87,88]. When an input enters the node, it gets multiplied by a weight value and the resulting value is either observer or passed to the next layer of the network, which can be helpful to understand the mechanism of such a concept. Suppose a neural network calculates an output y = f(x), for a given input x and weight, w. However, the training process is not completed yet. As such, the predicted output y will be different from the observed output x.
To identify such discrepancies, error function E, like the sum of squared errors, SSE = n i=1 (y i − x i ) 2 can be used; where y i is the ith value of the variable to be predicted. All weights keep getting adapted based on the rule of the learning algorithm.
The process stops when all the partial derivative, dE dw of the error function with respect to the weights are smaller than the defined threshold. Such a methodology was implemented in order to reduce errors for satellite weather data propagation [89]. In this study, NN used backpropagation, namely the resilient backpropagation (RPROP) algorithm [90] which allowed for flexible settings through custom-choice of error and activation function. Finally, to optimize the error model, the calculation of generalized weights [91] was implemented and generated error corrected IMERG prediction. Table 2 summarizes the tuned hyperparameters for the algorithms used in this study.

Performance Evaluation Error Metrics
We tasked various error metrics to assess the error model performances. We evaluated the random error component based on the normalized centered root mean square error (NCRMSE), and defined as: Here, y i is the reference rainfall,ŷ i is model predicted rainfall, and n is the quantity of samples used in the calculation. NCRMSE ranges from 0 (an optimal value) to positive infinity.
To measure the systematic error, we used mean relative error (MRE) which is the mean of the relative percentage error, calculated by the normalized average: MRE represents the magnitude and direction of error with positive value referring to overestimation while negative value referring to underestimation.
We applied Theil's 'coefficient of inequality' for the model performances. Theil's inequality coefficient U 1 and U 2 are expressed as below [92]: Here, the variable of interested is denoted by y t and the forecast is denoted by f t . The magnitude of U 1 ranges from 0 and 1 with U 1 = 0 suggesting perfect forecast (y t = f t ). Similarly, U 2 , a value of zero indicates perfect forecast (y t+1 = f t+1 ). U 2 value of 1 indicates how the model performance compares with naïve forecast (f t+1 = y t ) (Details in [62]).
To assess the relative error metric difference (∆ error ) (in %) between model corrected IMERG and original IMERG products we devised the following equation: where ∆ m indicates the error metric for model corrected IMERG, and ∆ i represents the error metric for original IMERG product. To calculate the relative reduction of random error, NCRMSE error metric is used in Equation (5). Similarly, we used MRE in Equation (5) to calculate the relative reduction of systematic error (Details in [15]).

Variable Importance
To construct the error model, the selection of features was based on past research, which demonstrated that several meteorological and land surface features such as satellite-based precipitation, elevation, soil type, land type, soil moisture, and temperature are crucial input features that contribute to the uncertainty of the ML-based error model [14][15][16]. After choosing these input features, p-value experiment and the variable importance methodology [52] were applied to quantify the impact of change from one feature to another.
To assess the impact of the sample size and variability of each variable, p-value experiments were examined for this study area. A variable's low p-value (<0.05) indicates the rejection of the null hypothesis which means there is a trend in the time series. In this study, p-values were determined for all the variables, i.e., IMERG, temperature, soil moisture, elevation, land cover, and soil type. The p-values were found close to 0 which are less than the significance level α (alpha) = 0.05 for all the input variables. The result is considered statistically significant by rejecting the null hypothesis. Therefore, the predictor variables are significant in machine learning-based error modeling for this study area.
A variable importance experiment was conducted by calculating the magnitude of the percentage increase in mean square error (%IncMSE) of the model [52,93]. Higher magnitudes of %IncMSE show higher importance of the input features for the error model. The result from the variable importance experiment is displayed in Figure 6

Evaluation of Error Model Corrected Rainfall Rates
In this section, we used Quantile-vs.-Quantile (Q-Q) plot and the error metrics described in Section 3.2 (NCRMSE, MRE, U1, and U2) to compare machine learning-based error model performances. To compare the error-corrected IMERG(V6) precipitation estimates using the two different error models (NN and RF), the Q-Q plots of the original IMERG(V6), error-corrected IMERG(V6), and reference rain rates were produced for the test datasets, as shown in Figure 7. The figure displayed that the NN model corrections exhibited a slight improvement compared to the RF model.

Evaluation of Error Model Corrected Rainfall Rates
In this section, we used Quantile-vs.-Quantile (Q-Q) plot and the error metrics described in Section 3.2 (NCRMSE, MRE, U 1 , and U 2 ) to compare machine learning-based error model performances.
To compare the error-corrected IMERG(V6) precipitation estimates using the two different error models (NN and RF), the Q-Q plots of the original IMERG(V6), error-corrected IMERG(V6), and reference rain rates were produced for the test datasets, as shown in Figure 7. The figure displayed that the NN model corrections exhibited a slight improvement compared to the RF model. Section 3.2 (NCRMSE, MRE, U1, and U2) to compare machine learning-based error model performances. To compare the error-corrected IMERG(V6) precipitation estimates using the two different error models (NN and RF), the Q-Q plots of the original IMERG(V6), error-corrected IMERG(V6), and reference rain rates were produced for the test datasets, as shown in Figure 7. The figure displayed that the NN model corrections exhibited a slight improvement compared to the RF model. The performances of the models were also evaluated in terms of the mean relative error (MRE), as shown in Figure 8. MRE is calculated for five reference precipitation ranges: rainfall values in the range of <25th, 25th-75th, 75th-90th, 90th-95th, and >95th percentile. The results indicated that NN and RF were able to significantly reduce the systematic error for the >25th percentile. We found low systematic error values for both models corrected IMERG(V6) compared to original IMERG(V6) estimates, which indicated acceptable characterization of estimation uncertainty. The error-corrected IMERG(V6) product exhibited a slightly higher improvement in the NN technique by reducing systematic error compared to the RF model for all five reference precipitation ranges. For the >75th percentile, all individual rainfall datasets (original IMERG(V6), RF-corrected IMERG(V6), NNcorrected IMERG(V6)) showed underestimation. For the low rainfall (<25th), the systematic error (3.2-3.3) slightly reduced for both models, compared to the systematic error (3.7) of the original IMERG(V6). Moreover, the error metric difference (∆ ) considering MRE was estimated for the The performances of the models were also evaluated in terms of the mean relative error (MRE), as shown in Figure 8. MRE is calculated for five reference precipitation ranges: rainfall values in the range of <25th, 25th-75th, 75th-90th, 90th-95th, and >95th percentile. The results indicated that NN and RF were able to significantly reduce the systematic error for the >25th percentile. We found low systematic error values for both models corrected IMERG(V6) compared to original IMERG(V6) estimates, which indicated acceptable characterization of estimation uncertainty. The error-corrected IMERG(V6) product exhibited a slightly higher improvement in the NN technique by reducing systematic error compared to the RF model for all five reference precipitation ranges. For the >75th percentile, all individual rainfall datasets (original IMERG(V6), RF-corrected IMERG(V6), NN-corrected IMERG(V6)) showed underestimation. For the low rainfall (<25th), the systematic error (3.2-3.3) slightly reduced for both models, compared to the systematic error (3.7) of the original IMERG(V6). Moreover, the error metric difference (∆ error ) considering MRE was estimated for the different reference precipitation ranges to show the performances of RF-corrected IMERG(V6), and NN-corrected IMERG(V6), ( Table 3). The relative reduction of the systematic error for both model corrected IMERG(V6) is substantial (9-42%) with respect to original IMERG(V6).
The normalized centered root mean square error (NCRMSE) metric was examined precisely for the quantification of the performances of the estimates of IMERG. The results are summarized in Figure 8. The results showed that the random error reduced consistently in all products (original IMERG(V6), RF-corrected IMERG(V6), NN-corrected IMERG (V6) as the rainfall rate increased. Both models corrected IMERG(V6) and exhibited lower random error in comparison to the original IMERG(V6) for all precipitation ranges. The NN-corrected IMERG(V6) results exhibited a substantially higher improvement by producing lower random error compared to the RF model. Specifically, for the high rain rates (>95th percentile), the NN error model reported considerably reduced NCRMSE values (~0.05) compared to the RF error model. Similarly, for reference rainfall values in the moderate rainfall ranges (>25th percentile to <95th percentile), the results show that the NN-based corrections (NCRMSE: 0.15-0.20) bring IMERG estimates closer to the reference precipitation. Furthermore, the error metric difference (∆ error ) considering NCRMSE for the different models are presented in Table 3. The NN-corrected IMERG(V6) showed substantial relative reduction of random error (37-65%) with respect to the original IMERG(V6) precipitation datasets over the study area. Similarly, RF-corrected IMERG(V6) produced a reasonable relative reduction of random error (~<21%) which also demonstrated the satisfactory performance of RF-corrected IMERG (V6) in comparison to original IMERG (V6). Overall, this study revealed a machine learning-based error model that leads to an advanced error characterization of IMERG precipitation estimation by the significant improvement of random and systematic error.
Forecasting 2020, 2 FOR PEER REVIEW 12 different reference precipitation ranges to show the performances of RF-corrected IMERG(V6), and NN-corrected IMERG(V6), ( Table 3). The relative reduction of the systematic error for both model corrected IMERG(V6) is substantial (9-42%) with respect to original IMERG(V6). The normalized centered root mean square error (NCRMSE) metric was examined precisely for the quantification of the performances of the estimates of IMERG. The results are summarized in Figure 8. The results showed that the random error reduced consistently in all products (original IMERG(V6), RF-corrected IMERG(V6), NN-corrected IMERG (V6) as the rainfall rate increased. Both models corrected IMERG(V6) and exhibited lower random error in comparison to the original IMERG(V6) for all precipitation ranges. The NN-corrected IMERG(V6) results exhibited a substantially higher improvement by producing lower random error compared to the RF model. Specifically, for the high rain rates (>95th percentile), the NN error model reported considerably reduced NCRMSE values (~0.05) compared to the RF error model. Similarly, for reference rainfall values in the moderate rainfall ranges (>25th percentile to <95th percentile), the results show that the NN-based corrections (NCRMSE: 0.15-0.20) bring IMERG estimates closer to the reference precipitation. Furthermore, the error metric difference (∆ ) considering NCRMSE for the different  In addition, Theil's inequality co-efficient (U 1 and U 2 ) values for the different models are shown in Figure 9. Theil's inequality co-efficient, U 1 marginally varied 0.15 to 0.17 for the two models, values which are less than the original IMERG(V6) produced U 1 (0.21), as a function of magnitude which showed prominent performances in both models. The magnitude of U 2 greater (lower) than 1 indicates less (more) accurate performance compared to the naïve approach. Moreover, both models also showed similar performances by producing U 2 values close to~1. Theil's inequality co-efficient, U 2 (1.15) for original IMERG(V6) is greater than both models. These results indicated that the model-corrected IMERG(V6) exhibited a slightly further improvement compared to the original IMERG(V6). Overall, the Theil's inequality co-efficient results showed small relative improvements for the model-corrected IMERG(V6) compared to the original IMERG(V6).

>95th
32% 32% 65% 21% In addition, Theil's inequality co-efficient (U1 and U2) values for the different models are shown in Figure 9. Theil's inequality co-efficient, U1 marginally varied 0.15 to 0.17 for the two models, values which are less than the original IMERG(V6) produced U1 (0.21), as a function of magnitude which showed prominent performances in both models. The magnitude of U2 greater (lower) than 1 indicates less (more) accurate performance compared to the naïve approach. Moreover, both models also showed similar performances by producing U2 values close to ~1. Theil's inequality co-efficient, U2 (1.15) for original IMERG(V6) is greater than both models. These results indicated that the modelcorrected IMERG(V6) exhibited a slightly further improvement compared to the original IMERG(V6). Overall, the Theil's inequality co-efficient results showed small relative improvements for the modelcorrected IMERG(V6) compared to the original IMERG(V6).

Discussion
Two error models, the random forest (RF) and the neural network (NN), were evaluated based on quantitative error statistics (i.e., NCRMSE and MRE), and Theil's "coefficient of inequality"

Discussion
Two error models, the random forest (RF) and the neural network (NN), were evaluated based on quantitative error statistics (i.e., NCRMSE and MRE), and Theil's "coefficient of inequality" statistics (U 1 and U 2 ). The systematic error for the models varied from overestimation to underestimation as the rain rate increased which is coherent to the findings of [3,14,19,20]. Moreover, the NN model showed promising performance for the moderate and high precipitation rate by displaying a high relative reduction of systematic error (23-37%).
Additionally, the NCRMSE metric was also assessed to quantify the effect of precipitation error modeling in reducing the random error component. The study showed that machine learning-based error models significantly reduced random error, and this reduction exhibited rainfall magnitude dependence. Specifically, NCRMSE for NN reduced (65%) considerably for the high rain rates (>95th percentile), showing a very high degree of agreement with reference precipitation which was consistent with findings by previous studies [14,15,33]. In addition, the RF and NN techniques considered elevation as a significant feature, which demonstrated the ability of the models to reduce the systematic and random error considerably in the study area. Overall, the performance of the machine learning-based precipitation estimates was consistent with findings by previous studies [14,15,33].
We would like to note that the results shown in Section 4.2 are based on the test dataset and showed improved results by significantly reducing random and systematic errors, which indicates that our model is successfully calibrated and could potentially be useful to predict the independent hydrometeorological dataset. The machine learning-based error models can manipulate the training data in such a way that the actual results expected from the untrained dataset can be quite different from the evaluated results using the training dataset [51,52,94]. Therefore, we considered the representation of extreme (>95th and <25th) precipitation values in the training and testing dataset to make sure that it covered the entire range of the dataset. Applying such a validation approach, the model has good skill on the independent test data in this analysis, which prevents overfitting by producing reliable results.

Conclusions
In this study, we investigated machine learning-based precipitation error modeling algorithms to improve the GPM IMERG precipitation product utilizing meteorological and land surface features (elevation, soil type, land type, soil moisture, and daily maximum and minimum temperature) with high-resolution in-situ precipitation rainfall data over the Brahmaputra river basin.
The comparison of NN and RF corrected rainfall values and the reference rainfall values were performed using Q-Q plots and showing satisfactory alignment along the 45-degree line. The error corrected IMERG(V6) results exhibited a slightly higher improvement by NN compared to the RF model. To investigate the accuracy of the error models, validation experiments based on the out-of-sample data approach were used. In terms of systematic and random error metrics, no significant differences were exhibited between two models (RF and NN). Generally, the machine learning-based model is expected not to capture very low and extremely high values successfully [14,93]. This is because the model accuracy is sensitive to sample size and the data representativeness in the training dataset [95,96]. Therefore, very large sample sizes are required for low and extremely high values to quantify the rate of convergence to the underlying cumulative distribution function. Results from quantitative error statistics are consistent in terms of the reduction of the random and systematic error for all the precipitation percentile ranges. This is an indication of how we successfully trained our model instead of overfitting.
The accurate estimation of rainfall in ungauged areas is an essential component to understand water resource systems efficiently. Therefore, extending this machine learning-based error modeling algorithm to the global scale and for other PMW precipitation estimates can be potentially useful. The improvements demonstrated by the error models with independent cross-validation approach indicate the transferability of the error model among complex terrains. Another possible extension of this study is to investigate uses of the PMW ensemble-based error predictions in integrated precipitation algorithms such as NOAA's CMORPH techniques.