Enhancing Precipitation Estimates Through the Fusion of Weather Radar, Satellite Retrievals, and Surface Parameters

Abstract: Accurate and timely monitoring of precipitation remains a challenge, particularly in hyper-arid regions such as the United Arab Emirates (UAE). The aim of this study is to improve the accuracy of the Integrated Multi-satellitE Retrievals for the Global Precipitation Measurement (GPM) mission's latest product release (IMERG V06B) locally over the UAE. Two distinct approaches, namely geographically weighted regression (GWR) and artificial neural networks (ANNs), are tested. Daily soil moisture retrievals from the Soil Moisture Active Passive (SMAP) mission (9 km), terrain elevations from the Advanced Spaceborne Thermal Emission and Reflection digital elevation model (ASTER DEM, 30 m) and precipitation estimates (0.5 km) from a weather radar network are incorporated as explanatory variables in the proposed GWR and ANN model frameworks. First, the performances of the daily GPM and weather radar estimates are assessed using a network of 65 rain gauges from 1 January 2015 to 31 December 2018. Next, the GWR and ANN models are developed with 52 gauges used for training and 13 gauges reserved for model testing and seasonal inter-comparisons. GPM estimates record higher Pearson correlation coefficients (PCC) at rain gauges with increasing elevation (z) and higher rainfall amounts ($\mathrm{PCC} = 0.29\,z^{0.12}$), while weather radar estimates perform better for lower elevations and light rain conditions ($\mathrm{PCC} = 0.81\,z^{-0.18}$). Taylor diagrams indicate that both the GWR- and the ANN-adjusted precipitation products outperform the original GPM and radar estimates, with the poorest correction obtained by GWR during the summer period. The incorporation of soil moisture resulted in improved corrections by the ANN model compared to the GWR, with relative increases in Nash–Sutcliffe efficiency (NSE) coefficients of 56% (and 25%) for GPM estimates, and 34% (and 53%) for radar estimates during summer (and winter) periods.
The ANN-derived precipitation estimates can be used to force hydrological models over ungauged areas across the UAE. The methodology is expandable to other arid and hyper-arid regions requiring improved precipitation monitoring.


Introduction
Despite the widely reported inconsistencies of precipitation products over the Arabian Peninsula [1][2][3][4], only a limited number of studies have attempted to improve precipitation monitoring over the progressively water-stressed region. Existing attempts are limited to gauge-based bivariate linear regression approaches [5,6]. Sources of precipitation estimates can be broadly grouped into three classes, namely: (i) ground-based rain gauge and radar observations, (ii) satellite precipitation retrievals, and (iii) reanalysis products fused from numerical weather prediction (NWP) models and observations. Despite the ongoing leaps in computational power, several key processes like convection, phase change, and collision-coalescence occur at the microscale, i.e., nine orders of magnitude less than current weather or climate model resolutions [7].
Remotely sensed precipitation estimates from ground-based radar and satellite platforms offer an attractive alternative to reanalysis products due to their higher spatiotemporal resolutions and coverage. Weather radars generate high-resolution real-time estimates of rainfall above the surface by emitting electromagnetic signals and analyzing backscatters from intercepted hydrometeors [8]. However, the reliability of radar rainfall estimates is diminished by several factors, such as terrain blockage, different sources of clutter and signal attenuation [9,10]. Additionally, the high maintenance costs associated with weather radars limit their deployment at the global scale. With their global coverage, satellite products continue to be the most widely used precipitation data sources. These include products from the Tropical Rainfall Measuring Mission (TRMM) [11] and its successor the Global Precipitation Measurement (GPM) mission [12], the Global Precipitation Climatology Centre (GPCC) [13], the Climatic Research Unit (CRU) [14], and the Climate Prediction Center morphing (CMORPH) technique [15], among others. Despite their widespread applications, their uncertainties remain high, especially over arid regions with absolute and relative biases reaching 100 mm and 300%, respectively [16,17]. The sparse distribution of rain gauges and inhomogeneity of observations hamper the calibration of such products for improved water resource management with rapidly expanding urbanization across the Arabian Peninsula [5].
To ameliorate the uncertainties, both precipitation correction and multi-source estimation approaches have been explored and applied for different regions. Here, we distinguish between (1) the conventional approach of exclusively relying on rain gauge observations [6,18,19] and (2) the more recent approach of incorporating additional explanatory variables [20][21][22][23] to correct precipitation estimates. The latter approach is the focus of the current study. A physically-based selection of explanatory variables is expected to preserve process dynamics and interlinkages within datasets which remain unresolved in conventional statistical correction methods. For example, water content in the uppermost soil layer exhibits an instantaneous response to collocated precipitation and is widely used as a proxy for precipitation occurrence. In fact, most currently used soil moisture retrieval algorithms are corrected by precipitation flags (rain/no rain) from available precipitation sources [24][25][26]. This soil moisture-precipitation dependency is particularly relevant for arid regions and desert environments, where background/residual soil moisture prior to a rain event is relatively uniform as a result of negligible surface flow. Therefore, any soil moisture perturbations are controlled by the spatiotemporal distribution of rainfall events and provide a sustained surface signature beyond the satellite overpass time. Using the Weather Research and Forecasting (WRF) model, Weston, et al. [27] studied the sensitivity of the heat exchange coefficient to surface conditions, including soil moisture, and demonstrated a strong impact on heat fluxes and local meteorological conditions within the United Arab Emirates (UAE). 
Elevation is another explanatory variable that has been widely used for precipitation correction [28][29][30][31] and is especially relevant to the current study area, given the frequently occurring local orographic rainfall events over the northeastern UAE [32][33][34]. Additional surface and atmospheric variable inputs, such as slope, air temperature, vegetation indices, surface energy fluxes and cloud characteristics have been investigated [35]. The significance of the selected inputs varies based on the geographic and climatic attributes of each study domain and, more importantly, based on the methodology followed.
Several studies report spatial correlations between precipitation and vegetation indices [36], topography [37], and land surface temperature [38]. It is crucial to account for all possible explanatory variables in the estimation of precipitation. In this regard, the geographically weighted regression (GWR) method has proven to be reliable, especially for precipitation product correction and downscaling [29,30,39]. Initially proposed by Brunsdon, et al. [40], GWR was developed to infer spatially varying dependencies between datasets beyond the simplifying assumption of constant relationships in space imposed by linear regression [41]. Using a GWR model, Kamarianakis, et al. [42] tested the hypothesis of null spatial non-stationarity in the relationship between rain gauge observations and collocated satellite estimates over the Mediterranean. Rejecting the null hypothesis, they found statistically significant spatial non-stationary components, with the satellite algorithm performing better in geographical locations with specific terrain attributes. Chao et al. [35] used a GWR-based approach to merge daily CMORPH precipitation with gauge records over the Ziwuhe Basin of China. They incorporated additional surface inputs, namely, slope, aspect, surface roughness, and distance to coastline in their model. Compared to the original CMORPH estimates, their merged product improved the gauge-based correlation from 0.208 to 0.724, and RMSE from 1.208 to 0.706 mm/hr. Relevant to the current study area, Wehbe et al. [3] conducted the first attempt to assess the consistency of different precipitation products over the Arabian Peninsula. They employed geographically-temporally weighted regression to infer water storage variations from inputs of soil moisture, terrain elevation and four different precipitation datasets. The TRMM Multi-Satellite Precipitation Analysis (TMPA V7) product showed the best predictive performance with a goodness-of-fit coefficient ($R^2$) of 0.84.
Blending explanatory variables to enhance precipitation estimates has also been addressed using Artificial Neural Networks (ANNs), a subset of machine learning (ML) techniques that have been increasingly applied in climate studies for their abilities to perform adaptive, efficient, and holistic mappings of nonlinearities between large datasets [43,44]. Maier, et al. [45] and Gopal [46] give a detailed overview on the development and application of ANNs and their most compatible configurations for geospatial analyses. While several types of ANNs have been developed for different applications, the feedforward multilayer perceptron (MLP) architecture remains the most commonly used framework for modeling precipitation [47][48][49][50][51]. In addition to model- and satellite-based precipitation correction attempts, ANNs have also been successfully applied to improve weather radar rainfall estimates [52][53][54][55][56]. Moghim et al. [18] applied a three-layer feedforward neural network to correct precipitation and temperature model outputs over northern South America. For precipitation correction, they obtained consistent improvements of 8%, 8.5%, and 15.7% in mean square error, bias, and correlation metrics, respectively, from the ANN configuration compared to linear regression. On the other hand, without incorporating precipitation inputs, Fereidoon and Koch [22] trained an ANN with daily inputs from the Advanced Microwave Scanning Radiometer-Earth Observing System (AMSR-E) soil moisture product and air temperature measurements against rainfall records at five weather stations. Nevertheless, the ANN performed reasonably well with $R^2$ values reaching 0.65 during testing. Importantly, despite using separate time periods, they locally tested their model at the same stations used for training without attempting to verify the generalized spatial performance of the ANN-based estimates. They also highlighted the need for further case studies to be conducted over other regions with different soil moisture products.
This study provides the first attempt at multivariate nonlinear precipitation estimation over the UAE by correcting the Integrated Multi-satellitE Retrievals for the Global Precipitation Measurement (GPM) mission's latest daily product release (IMERG V06B) over land using ancillary data and explanatory variables. Two techniques are tested, namely, the GWR and the ANN. First, to assess multi-collinearity of the datasets, the individual performances of both GPM and ground-based radar precipitation estimates are compared against 65 rain gauge records from 1 January 2015 to 31 December 2018. Next, the proposed configuration and development of the GWR and ANN models using 52 out of the 65 available rain gauges is outlined. In addition to the GPM and radar estimates, terrain elevation and satellite soil moisture estimates are used as explanatory variables to incorporate surface wetting signatures. Finally, both models are inter-compared to the original GPM and radar estimates at 13 gauges left out during the training process. The developed models are expected to outperform both the GPM and radar estimates by overcoming their individual biases.

Rain Gauge Data
Remote Sens. 2020, 12, x FOR PEER REVIEW

Figure 1 shows the UAE study area and topography derived from the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) digital elevation model (DEM), described in Toutin [57]. Ground-based rainfall observations are recorded from a network of 72 rain gauges (7 offshore and 65 overland) operated by the UAE National Center of Meteorology (NCM). The training and testing stations used for the model development are also indicated. While rainfall amounts are logged at 15-min intervals by the gauges, the quality-controlled daily accumulations were made available for this study. The seven offshore gauges are not used in the current work since the correction would be exclusively gauge-based due to the limited extent of the radar estimates and their additional uncertainties from sea clutter. The offshore univariate correction approach would require a simpler model configuration (e.g., ordinary least squares regression), as pursued in [5] for the TMPA V7 product over the same study area. More importantly, the limited number of offshore gauges requires a longer study period to ensure representative results, which is reserved for future work.

Radar-Based Rainfall Estimates
The Thunderstorm Identification Tracking and Analyses (TITAN) software [58], which is included in the Lidar Radar Open Software Environment (LROSE), is used for the operational radar data processing. Default algorithms and correction factors are used for de-cluttering, noise filtering and attenuation correction. A fuzzy logic classifier is applied for de-cluttering using the following features: radial velocity, texture of reflectivity, texture of differential reflectivity, and correlation coefficient. This is followed by noise filtering with a moving average window. Next, a standard C-band attenuation correction factor (ACF) of 0.014 dB per degree is applied based on the approximated linear relationship between specific (and differential) attenuation and differential phase [59]. Finally, the merged plan position indicator (PPI) is used to merge multiple radar overlaps based on a maximum reflectivity value approach. The radars are subject to annually scheduled calibrations by the manufacturer using the dual-pol measurements, as well as routine maintenance to maintain a ±1 dB error margin.
The Z−R relation used for rainfall estimation is set by the manufacturer as $Z = 200\,R^{1.455}$ (adapted from [60]) for mixed-phase cloud processes typical to the UAE. At a range limit of 100 km (outlined in Figure 1), the rainfall intensity R (mm/hr) is estimated for each 6-min, 100-m (range gate) elemental volume scan using vertical levels between 1 and 3 km. The rainfall amounts are then accumulated to the daily timescale and re-gridded to the 0.5 km resolution provided to the authors. It is important to note the range-dependent variations in the elemental volume scan resolution, where beam widths sampled at ranges beyond ~30 km exceed the 0.5 km resolution used here. Evaporative loss below the 1 km level is not corrected for, and no gauge data are used for calibration/validation. Apart from the aforementioned quality control steps for the radar data, bias-correction using the gauge observations would prevent the use of the radar data in the multivariate approach sought here. Data pre-processing (Section 3) involves further steps to reduce the impact of remaining data quality issues on the training and model performance. Uncertainties from the aforementioned standard quality control steps remain, but may favor the generalization of model correction performance during the training stage [61]. On the other hand, pronounced errors may exist over the northeastern highlands due to terrain blockage and merging uncertainties. The authors intend to assess different gap-filling methods to improve coverage for this area in separate work.
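As a minimal sketch of the Z−R inversion described above, the following Python snippet converts reflectivity in dBZ to rain rate with the stated relation and accumulates 6-min scans into a rain depth. The function names, and the assumption that each scan's rate holds for the full 6 min, are illustrative and are not taken from the operational TITAN processing.

```python
import math

def rain_rate_from_dbz(dbz, a=200.0, b=1.455):
    """Invert Z = a * R**b for R (mm/hr), with reflectivity given in dBZ."""
    z_linear = 10.0 ** (dbz / 10.0)        # dBZ -> linear Z (mm^6 m^-3)
    return (z_linear / a) ** (1.0 / b)

def accumulate_scans(dbz_scans, scan_minutes=6.0):
    """Rain depth (mm) from successive scans; each rate is assumed
    constant over its scan interval (an assumption of this sketch)."""
    hours_per_scan = scan_minutes / 60.0
    return sum(rain_rate_from_dbz(d) * hours_per_scan for d in dbz_scans)

# A reflectivity of Z = 200 (about 23 dBZ) corresponds to R = 1 mm/hr.
dbz_unit_rate = 10.0 * math.log10(200.0)
one_hour_depth = accumulate_scans([dbz_unit_rate] * 10)
```

Ten consecutive 6-min scans at that reflectivity accumulate to roughly 1 mm, which is a convenient sanity check on units.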

GPM IMERG (Version 06B) Precipitation Product
The GPM mission, launched in February 2014, provides higher resolution (30-min, 0.1°) precipitation estimates through the IMERG product, compared to its TRMM TMPA (3-hourly, 0.25°) predecessor. The IMERG algorithm inter-calibrates, merges and interpolates GPM constellation satellite precipitation estimates with microwave-calibrated infrared estimates and rain gauge analyses to produce a higher resolution and more accurate product [12]. The GPM core satellite estimates precipitation from two instruments, the GPM microwave imager (GMI) and the dual-frequency precipitation radar (DPR). More importantly for this study, the DPR adds sensitivity to light precipitation, compared to that of TRMM's single-frequency radar. The latest release V6 uses an improved morphing scheme with a model-based propagation from the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2), compared to the V5 satellite-based propagation vectors of IR cloud-top temperature.
The GPM IMERG V06B Level-3 (L3) daily product without gauge correction is used to ensure no prior dependencies on the rain gauge data as ground truth. In any case, the gauges used here are not included in the World Meteorological Organization's Global Precipitation Climatology Network which is used for the final IMERG calibration [62].

SMAP Enhanced L3 (Version 2) Soil Moisture Product
On 31 January 2015, NASA launched the SMAP mission as the first attempt to collect coincident measurements of active (radar) and passive (microwave) soil moisture retrievals [63,64]. Up to 5 cm depth of soil moisture is estimated on a 685-km, near-polar, sun-synchronous orbit, with equator crossings at 6:00 a.m. (descending) and 6:00 p.m. (ascending) local time. However, a permanent fault in the radar instrument on 7 July 2015 left only the radiometer-derived and assimilated soil moisture estimates. To compensate for the active retrieval loss, the European Space Agency's Sentinel-1A and -1B C-band radar backscatter coefficients were incorporated to derive the L2 SMAP soil moisture product. The enhanced L3 soil moisture product used here is a daily composite of the L2 soil moisture gridded on a 9-km Equal-Area Scalable Earth Grid, Version 2.0 (EASE-Grid 2.0) in a global cylindrical projection. Both the ascending and descending overpasses are used here for the daily estimates, with the higher pixel values retained in case of overlaps.
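The daily compositing rule described above (keep the higher value where the ascending and descending overpasses overlap) can be sketched as follows. This is an illustration under the assumption that both overpasses are already on the same grid with missing retrievals stored as NaN; the function name is hypothetical and not part of any SMAP toolkit.

```python
import numpy as np

def daily_composite(sm_descending, sm_ascending):
    """Combine two overpass grids, keeping the higher soil moisture
    where both are valid and the only valid value otherwise."""
    desc = np.asarray(sm_descending, dtype=float)
    asc = np.asarray(sm_ascending, dtype=float)
    # np.fmax ignores NaN when only one operand is NaN
    return np.fmax(desc, asc)

composite = daily_composite([0.10, np.nan, np.nan],
                            [0.20, 0.05, np.nan])
```

Pixels missed by both overpasses remain NaN and would need to be excluded or gap-filled before model training.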

Methods
In this section, the proposed GWR model configuration and ANN architecture, along with their respective training approaches are presented. Then, the k-fold cross-validation method [65] used for model calibration is outlined. Finally, the statistical metrics and testing approach used for inter-comparing model performances are presented.
The daily GPM estimates are available at 0.1° × 0.1° grid scales. Consequently, data pre-processing involved aggregating the weather radar (0.5 km), SMAP (9 km) and ASTER (30 m) datasets to consistent 0.1° (see Figure A1 in Appendix A) and daily resolutions for model training and testing. The statistical significance of each of the considered input predictors is assessed using ordinary least squares regression. The t-test [66] hypothesis testing is adopted as a widely used method to identify and sort predictors among a pool of independent variables [28]. Additionally, the Pearson correlation coefficient (PCC) is used to test independent variables for multi-collinearity [67]. Removing covariates that are highly correlated is suggested to avoid inflated standard errors and biases in a regressive model [68]. All four selected predictors were shown to be statistically significant, with p-values less than 0.001, and to have low multi-collinearity potential, with all pair-wise PCCs < 0.5.
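The pair-wise multi-collinearity screening can be illustrated with a short Python sketch. The 0.5 threshold follows the text, whereas the function name, the predictor labels and the toy data are assumptions introduced only for this example.

```python
import numpy as np

def collinearity_screen(predictors, names, threshold=0.5):
    """Return the pair-wise PCC matrix and the predictor pairs whose
    absolute PCC meets or exceeds the threshold."""
    x = np.asarray(predictors, dtype=float)    # shape (n_samples, n_predictors)
    pcc = np.corrcoef(x, rowvar=False)
    flagged = [(names[i], names[j], float(pcc[i, j]))
               for i in range(len(names))
               for j in range(i + 1, len(names))
               if abs(pcc[i, j]) >= threshold]
    return pcc, flagged

# Toy data: "b" is a scaled copy of "a"; "c" is uncorrelated with "a".
toy = np.array([[1, 2, 1], [2, 4, -1], [3, 6, 1], [4, 8, -1], [5, 10, 1]])
pcc_matrix, flagged_pairs = collinearity_screen(toy, ["a", "b", "c"])
```

In this toy case only the pair ("a", "b") is flagged; an empty list would correspond to the outcome reported for the four selected predictors.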
A detailed sensitivity analysis of the impact of input data quality on predictive accuracy for an ANN with a single hidden layer is reported in [69]. A significant decrease in model performance is recorded for data error rates beyond 20% during the training stage, compared to the base case scenario with unperturbed training data. However, the model performance slightly improves as the input data error rate varies between 5% and 15%. This is consistent with other findings showing that the involved arithmetic operations can dampen random and systematic errors in input data. Pre-processing involved normalizing all datasets to zero mean and unity standard deviation distributions (i.e., with most values falling between −1 and 1) for faster convergence [70,71]. The model outputs are then de-normalized and returned to the original form. Details on the normalization and de-normalization steps can be found in Appendix A.
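A minimal sketch of the normalization and de-normalization steps is given below, assuming a standard z-score transform with statistics computed per variable. The exact procedure is documented in Appendix A and may differ; the function names are illustrative.

```python
import numpy as np

def fit_scaler(x):
    """Per-column mean and standard deviation of the training data."""
    x = np.asarray(x, dtype=float)
    return x.mean(axis=0), x.std(axis=0)

def normalize(x, mean, std):
    """Map data to zero mean and unit standard deviation."""
    return (np.asarray(x, dtype=float) - mean) / std

def denormalize(z, mean, std):
    """Return normalized values to their original units."""
    return np.asarray(z, dtype=float) * std + mean

train = np.array([[0.0, 10.0], [2.0, 30.0], [4.0, 50.0]])
mean, std = fit_scaler(train)
z = normalize(train, mean, std)
restored = denormalize(z, mean, std)
```

Fitting the mean and standard deviation on the training subset only, and reusing them for the testing gauges, is a common precaution against information leaking into the test set.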

GWR Model Configuration
Precipitation is typically characterized by large spatial variability, which is especially the case for the UAE's rainfall regime. As such, inferring weighted relationships irrespective of spatial information (using all pixels) through global regression methods introduces significant bias. Local methods such as GWR are proposed to account for spatial non-stationarity by assigning variable weights at selected locations (pixel-per-pixel). Equation (1) illustrates the generalized form of the GWR model proposed by Brunsdon et al. [40].
$$Y_i = \beta_0(u_i, v_i) + \sum_{k} \beta_k(u_i, v_i)\,X_{ik} + \varepsilon_i \qquad (1)$$
where $Y_i$ denotes the $i$th observation of the dependent variable, $\beta_k(u_i, v_i)$ is the set of coefficient weights at each location $(u_i, v_i)$ for the $k$ independent variable (predictor) values $X_{ik}$, and $\varepsilon_i$ is the aggregated residual term. The detailed derivation of Equation (1) and the GWR approach in general is provided by Brunsdon, et al. [72].
For the special case of spatially constant coefficients, Equation (1) reduces to a simple linear regression equation. The coefficient weights for the $i$th observation can be expressed (without the spatial coordinates $u_i$ and $v_i$) as
$$\hat{\beta}_i = \left(X^{T} W_i X\right)^{-1} X^{T} W_i Y$$
where $W_i$ is an $n \times n$ matrix with a diagonal of coefficient weight elements. A Gaussian function is used between observations $i$ and regression point $j$ to provide a continuous, exponentially decaying relationship between the distance function $d_{ij}$ and the weighting matrix, where $b$ is the Gaussian kernel bandwidth. Given the uneven distribution of stations, an adaptive bandwidth $b$ is automatically assigned based on cross-validation [73,74]. The developed GWR model can be expressed as
$$CP_i = \beta_0(u_i, v_i) + \beta_1(u_i, v_i)\,RP_i + \beta_2(u_i, v_i)\,SP_i + \beta_3(u_i, v_i)\,SM_i + \beta_4(u_i, v_i)\,Z_i + \varepsilon_i \qquad (5)$$
where $CP_i$ is the corrected precipitation output, $RP_i$ and $SP_i$ are the ground-based radar and satellite precipitation, respectively, $SM_i$ is the SMAP soil moisture estimate and $Z_i$ is the ASTER DEM value, each at any point $(u_i, v_i)$ across the domain at 0.1° resolution. When time-dependent relationships are expected between input and/or output variables, time-varying weights must be derived. For example, geographically-temporally weighted regression was used by the authors to investigate rainfall-groundwater recharge mechanisms in previous work [3]. However, in the current study, a same-day response is expected between the input and output datasets (i.e., any change in observed rainfall will reflect on both radar and satellite estimates, as well as on soil moisture at the daily scale). Therefore, spatially distributed weights from GWR are used here without temporal variation.
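The local weighted least squares estimate at a single regression point can be sketched in a few lines of Python. This is an illustration only: the kernel is assumed here to take the form exp(−(d/b)²), which is one common Gaussian variant and may differ from the one used in the study, and a fixed bandwidth replaces the cross-validated adaptive bandwidth. Function names are hypothetical.

```python
import numpy as np

def gaussian_weights(dist, bandwidth):
    """Gaussian distance decay; the exponent form is an assumption."""
    return np.exp(-(dist / bandwidth) ** 2)

def gwr_fit_point(coords, X, y, point, bandwidth):
    """Local coefficients (X^T W X)^-1 X^T W y at one regression point."""
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    dist = np.linalg.norm(coords - np.asarray(point, dtype=float), axis=1)
    w = gaussian_weights(dist, bandwidth)
    XtW = X1.T * w                  # equivalent to X1.T @ diag(w)
    return np.linalg.solve(XtW @ X1, XtW @ y)

# Synthetic check: with noise-free linear data, any weighting
# recovers the generating coefficients.
rng = np.random.default_rng(0)
coords = rng.random((20, 2))
X = rng.random((20, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1]
beta = gwr_fit_point(coords, X, y, point=coords[0], bandwidth=1.0)
```

Repeating the fit at every 0.1° pixel, with the radar, satellite, soil moisture and elevation values as the predictor columns, would yield spatially varying coefficient fields analogous to those in Equation (5).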

Feedforward MLP Configuration
ANNs can simulate complex nonlinear relationships between variables and resolve higher-order dependencies overlooked by conventional linear regression methods. ANNs were formulated to replicate the functionality and learning ability of biological neural networks, with neurons being their basic functional units. Each neuron is bounded by input and output variables, with intermediary weighting coefficients and activation functions embedded in one or more hidden layers. The widely used feedforward MLP architecture is a type of supervised ANN that requires output information (targets) to be specified. The configuration of a feedforward MLP is defined by the number of hidden layers and hidden neurons as well as the selected activation functions and training algorithms. Table 1 gives an overview of the proposed MLP configuration and reasoning for each selection. One hidden layer is chosen according to the widely recommended three-layer feedforward network configuration [44,46], particularly for precipitation bias correction studies as sought here [18,47,75].
Table 1. Overview of the proposed MLP configuration. Training algorithm: Levenberg-Marquardt algorithm (trainlm); see [18,71].
Activation functions, also known as transfer functions, provide sequential connections between neurons in all three layers. First, the input data is weighted and forwarded to the hidden layer where the weighted summations are then converted to output fields. Sigmoid-based functions are reported to be the most applied functions between the input and hidden layers [79,80]. Depending on the application, selected types of activation functions (including sigmoid subtypes) are known to improve the performance of ANNs, but do not constrain the networks' mapping power [81]. The hyperbolic tangent (tansig) and linear (purelin) transfer functions are selected here for the hidden and output layers, respectively, due to their reported success when used for precipitation bias correction [18,77].
Following the same terminology used for Equation (5) and without explicit listing of spatial coordinates $(u_i, v_i)$, the general form of the proposed MLP can be expressed as
$$CP_i = f_{out}\left(\beta_o + \sum_{j=1}^{n} \lambda_j\, f_{hid}\left(\beta_{jo} + \beta_{j1} RP_i + \beta_{j2} SP_i + \beta_{j3} SM_i + \beta_{j4} Z_i\right)\right)$$
where $n$ is the number of hidden neurons, $\lambda_j$ are the connection weights between the $j$th neuron in the hidden layer and the output neuron, $\beta_{j1}, \beta_{j2}, \ldots, \beta_{j4}$ are the connection weights between the $j$th neuron of the hidden layer and each of the four neurons of the input layer, $\beta_{jo}$ and $\beta_o$ are the bias parameters, and $f_{out}$ and $f_{hid}$ are the activation functions for the output and hidden layers, respectively.
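A forward pass through a network of this shape can be sketched as follows, with a hyperbolic tangent hidden layer and a linear output layer. The weights below are arbitrary placeholders rather than trained values, and the array layout (inputs as rows, one column per hidden neuron) is a convention chosen for this example.

```python
import numpy as np

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Feedforward pass: tanh hidden layer (tansig), linear output (purelin)."""
    x = np.atleast_2d(np.asarray(x, dtype=float))    # (n_samples, 4)
    hidden = np.tanh(x @ w_hidden + b_hidden)         # (n_samples, 16)
    return (hidden @ w_out + b_out).ravel()           # (n_samples,)

# Placeholder parameters for a 4-16-1 network (not trained values).
w_hidden = np.full((4, 16), 0.25)
b_hidden = np.zeros(16)
w_out = np.full((16, 1), 1.0 / 16.0)
b_out = 0.1

# One normalized sample ordered as (radar, satellite, soil moisture, elevation).
output = mlp_forward([1.0, 1.0, 1.0, 1.0], w_hidden, b_hidden, w_out, b_out)
```

With these placeholder weights every hidden neuron receives a summed input of 1, so the output equals tanh(1) plus the output bias, which makes the arithmetic easy to verify by hand.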

Training Algorithm
As in the case of GWR, weights are the key parameters of the MLP determined by a selected training algorithm. A training algorithm continuously modifies the network's weights and biases with the aim of minimizing a predefined error function (mean squared error used here) between the gauge observations and network output. The choice of the training algorithm dictates the computation time and, consequently, the memory required for training, especially with a large number of inputs. The Levenberg-Marquardt (LM) algorithm [82] combines the advantages of both the Gauss-Newton (GN) [83] and the gradient descent (GD) methods [84] in terms of fast convergence with randomly assigned initial weights. This explains its widespread use for training moderate-sized networks with up to several hundred weights [78,85–87].
The detailed derivation of the LM method can be found in Marquardt [88]. Similar to quasi-Newton methods, the LM algorithm avoids the costly computation of the Hessian matrix, which is instead approximated as $H = J^{T} J$, with the gradient computed as $g = J^{T} e$, where $J$ is the Jacobian matrix that contains first derivatives of the network errors with respect to the weights and biases, and $e$ is a vector of the network errors. A standard backpropagation technique is used to compute the Jacobian matrix in place of the Hessian matrix, and the LM algorithm can be expressed as
$$w_{k+1} = w_k - \left(J^{T} J + \mu I\right)^{-1} J^{T} e$$
where $w_k$ is the vector of weights for the $k$th iteration, $I$ is the identity matrix, and $\mu$ is a nonzero combination coefficient that ensures the approximated Hessian matrix is invertible. For larger and smaller values of $\mu$, the LM method approaches the GD and GN methods, respectively. The dimension of the hidden weight matrix is 4 × 16, where each input variable is associated with 16 weights (one per hidden neuron), followed by a 16 × 1 output weight matrix. The neurons are adaptively activated depending on the data fed to the network. This complexity is expected to preserve hidden information (including spatial) when trained with the gauge observations and collocated variables [46]. Figure 2 illustrates the proposed configuration of the feedforward MLP with an input layer consisting of 4 neurons, a hidden layer with 16 neurons, and an output layer consisting of 1 neuron, as well as the selected activation functions. Details on the MLP calibration by k-fold cross-validation can be found in Appendix A.
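A single LM update can be sketched as below. For brevity the example applies the step to a linear least squares problem, where the Jacobian of the errors is simply the design matrix; in the MLP case the Jacobian would come from backpropagation. The function name and the test problem are assumptions made for illustration.

```python
import numpy as np

def lm_step(w, jacobian, errors, mu):
    """One Levenberg-Marquardt update: w - (J^T J + mu I)^-1 J^T e."""
    J = np.asarray(jacobian, dtype=float)
    e = np.asarray(errors, dtype=float)
    H = J.T @ J                     # Gauss-Newton approximation of the Hessian
    g = J.T @ e                     # gradient of the squared-error cost
    return w - np.linalg.solve(H + mu * np.eye(len(w)), g)

# Linear test problem: errors e = A w - y, so the Jacobian is A.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([2.0, -1.0, 1.0])      # generated by w_true = [2, -1]
w0 = np.zeros(2)
e0 = A @ w0 - y

w_small_mu = lm_step(w0, A, e0, mu=1e-9)   # behaves like Gauss-Newton
w_large_mu = lm_step(w0, A, e0, mu=1e6)    # behaves like a short GD step
```

With a very small μ the step jumps almost directly to the least squares solution, while a very large μ produces a small move along the negative gradient, matching the two limiting cases described above.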

Model Testing and Skill Scores
The 4-year (2015-2018) annual average rainfall was computed for each of the 65 gauges and ascendingly ranked. Then, a verification (testing) station was sequentially selected for every 5 ranks, amounting to 13 stations. The remaining 52 stations were used for training. This approach captures the domain's full precipitation range [89]. The number of testing gauges (13) was determined as 20% of the total 65 gauges. This is in line with the commonly used 80/20 ratio for training/testing samples to ensure proper verification without compromising the training quality [90]. Figure 1 shows the spatial distribution of the training and testing gauges. An alternative approach is temporal sub-setting (TS) by using the full network of stations for training during 2015-2017, and testing during 2018.
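The rank-based station split can be sketched as follows. The gauge identifiers and rainfall values are invented for the example, and holding out ranks 5, 10, ..., 65 (rather than another offset within each group of five) is an assumption of this sketch.

```python
def split_stations(mean_annual_rain, step=5):
    """Rank stations by mean annual rainfall (ascending) and hold out
    every `step`-th rank for testing."""
    ranked = sorted(mean_annual_rain, key=mean_annual_rain.get)
    test = ranked[step - 1::step]
    held_out = set(test)
    train = [s for s in ranked if s not in held_out]
    return train, test

# Hypothetical gauges G01..G65 with increasing mean annual rainfall (mm).
rain_by_gauge = {f"G{i:02d}": float(i) for i in range(1, 66)}
train_gauges, test_gauges = split_stations(rain_by_gauge)
```

For 65 gauges this yields 52 training and 13 testing stations, with the testing stations spread evenly across the ranked rainfall range.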
After training and calibration, the GWR and ANN models are tested over an independent subsample using both spatial and temporal divisions. The error measures used are listed below and include the root mean squared error (RMSE), relative BIAS (rBIAS), probability of detection (POD) and false alarm ratio (FAR). A threshold of 3 mm was used for computing the POD and FAR values as recommended in [5].
$$\mathrm{POD} = \frac{\text{events detected by both rain gauge and estimate source}}{\text{events detected by rain gauge alone}} \quad (9)$$

$$\mathrm{FAR} = \frac{\text{events detected by estimate source}}{\text{total events detected by estimate source, including those detected by rain gauge}} \quad (10)$$

where $y_{\mathrm{est},i}$ and $y_{o,i}$ are the estimated (model) and observed (gauge) precipitation, respectively, at gauge $i$, and $n$ is the sample size.
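A minimal sketch of how the two detection scores could be computed from paired daily series is given below. It reads Equations (9) and (10) as the usual contingency-table scores (hits, misses, false alarms), which is an assumption, as is counting a day as an event when the amount is at or above the 3 mm threshold. The function name `pod_far` and the sample values are hypothetical.

```python
def pod_far(gauge_mm, estimate_mm, threshold_mm=3.0):
    """Return (POD, FAR) for paired daily gauge and estimate series."""
    hits = misses = false_alarms = 0
    for g, e in zip(gauge_mm, estimate_mm):
        gauge_event = g >= threshold_mm      # assumption: ">=" defines an event
        estimate_event = e >= threshold_mm
        if gauge_event and estimate_event:
            hits += 1
        elif gauge_event:
            misses += 1
        elif estimate_event:
            false_alarms += 1
    # POD: share of gauge-observed events that the estimate also detects.
    pod = hits / (hits + misses) if (hits + misses) else float("nan")
    # FAR: share of estimate-detected events with no matching gauge event.
    far = false_alarms / (hits + false_alarms) if (hits + false_alarms) else float("nan")
    return pod, far


# Hypothetical five-day sample (mm/day).
pod, far = pod_far([0.0, 5.0, 10.0, 0.0, 4.0], [4.0, 6.0, 0.0, 0.0, 3.5])
print(round(pod, 3), round(far, 3))
```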
The model performance is also assessed using the PCC and Nash-Sutcliffe Efficiency (NSE) coefficients defined by Equations (11) and (12), respectively [91,92]. The PCC records the statistical association between the model and observational datasets and ranges between −1 and 1, where 0 indicates no association and positive/negative values indicate increasing/decreasing relationships between the two variables.
The NSE records the absolute difference between observed values and corresponding estimates, normalized by the observational variance to reduce bias. It ranges between −∞ and 1, where values closer to 1 indicate higher model accuracy. A threshold value of 0.5 is generally used to imply an adequate model performance [93,94].
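For reference, the continuous skill scores named in this section can be sketched with their standard textbook definitions, using $y_{\mathrm{est},i}$ as `est` and $y_{o,i}$ as `obs`. Since the paper's own equations are not reproduced here, these forms are assumptions, in particular the rBIAS expressed as the summed error relative to the summed observations, in percent. The function names and the sample values are hypothetical.

```python
import math


def rmse(est, obs):
    """Root mean squared error."""
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(est, obs)) / len(obs))


def rbias(est, obs):
    """Relative bias in percent (assumed form: summed error over summed observations)."""
    return 100.0 * sum(e - o for e, o in zip(est, obs)) / sum(obs)


def pcc(est, obs):
    """Pearson correlation coefficient."""
    me, mo = sum(est) / len(est), sum(obs) / len(obs)
    cov = sum((e - me) * (o - mo) for e, o in zip(est, obs))
    var_e = sum((e - me) ** 2 for e in est)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(var_e * var_o)


def nse(est, obs):
    """Nash-Sutcliffe efficiency: 1 minus squared error over observed variance."""
    mo = sum(obs) / len(obs)
    return 1.0 - sum((o - e) ** 2 for e, o in zip(est, obs)) / sum((o - mo) ** 2 for o in obs)


# Hypothetical sample (mm/day).
obs = [0.0, 2.0, 4.0, 6.0]
est = [1.0, 2.0, 3.0, 6.0]
print(rmse(est, obs), rbias(est, obs), pcc(est, obs), nse(est, obs))
```

An NSE of 1 corresponds to a perfect match, while an NSE of 0 means the estimate is no better than the mean of the observations.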

Inter-Comparison of Spatial Distributions
First, the individual performance of the daily GPM and radar estimates is evaluated against the overland rain gauge network. Figure 3 shows the spatial distribution of annual rainfall amounts accumulated from the daily values of the radar, GPM, and rain gauge data. Annual accumulations are derived for each of the four years (2015-2018) and gridded at their native resolutions. The gauge records indicate that most of the country's rainfall events occur around Al Ain and the northeastern highlands, with 2017 being the wettest (max. observed >300 mm) and 2015 being the driest (max. observed 121 mm). Relatively low rainfall amounts (<50 mm) are consistently observed in the gauge records over the western region of Abu Dhabi. The GPM estimates exhibit a similar spatial organization to the gauge records, with the exception of 2017 (Figure 3e) where the highest precipitation amounts (196 mm) are retrieved over the western coastline. The GPM product captures most events in the northeastern highlands but with consistent underestimations compared to the gauge records, which is mainly attributed to the difference in scale and to missing the short, small-scale local (orographic) convective events. More importantly, the GPM product severely underestimates rainfall around Al Ain each year. This is more clearly depicted in the seasonal accumulations shown in Figure 4, where inland gauges around Al Ain record heavy winter precipitation events (Figure 4f), which are missed in the GPM product (Figure 4e). On the other hand, large overestimations (over 100 mm) from GPM are shown over the coastal areas.
The radar-based precipitation pattern agrees with the observed records in terms of the spatial organization, with higher amounts localized in the northeastern highlands and Al Ain, and lower amounts to the west. Due to their higher spatial resolution (0.5 km), the radar estimates match the spatial pattern of observed gauge rainfall more closely than the GPM retrievals (10 km). Contrary to GPM, overestimation in the radar product is pronounced in the summer accumulations (Figure 4a) compared to the gauge amounts (Figure 4c).
The results thus far suggest the importance of accounting for elevation and land cover attributes to address the discrepancies between the satellite, radar, and gauge-based precipitation estimates. The impact of elevation on the performance of the two precipitation estimates is discussed in the following subsection.

Effect of Topography on Precipitation Estimates
The PCC value at each of the 65 gauges is computed between the daily gauge observations and each of the corresponding GPM and radar estimates. Figure 5a,b show boxplots of the obtained PCC values and their variation as a function of gauge elevation for the GPM and the radar products, respectively. The GPM-derived PCCs varied from 0.21 to 0.76 with a median value of 0.53, whereas those obtained from the radar estimates showed a larger variance from 0.03 to 0.82, but with a comparable median of 0.48. The larger interquartile range observed in the radar data reflects the larger variation observed in the PCCs at lower elevations compared to GPM. The power law relation provided the best fit to the PCC-elevation dependency and is shown for each case. Figure 5a indicates better agreement for GPM estimates at higher elevations. Conversely, Figure 5b shows a degradation in the radar performance with increasing elevation as a result of orography and mountain blockage. This is in line with the annual-scale results (Figure 3) with the northeastern highlands associated with higher rainfall amounts. Figure 5c shows the boxplot of the SMAP-derived soil moisture estimates at each gauge location along with the PCC-elevation scatter plot. Correlations with observed rainfall record an interquartile range of 0.38 to 0.58 with upper and lower bounds of 0.78 and 0.13, respectively. However, the fitted power law curve shows a statistically insignificant decreasing relationship (R² = 0.24). To further elicit the spatial dependencies of the SMAP-rainfall agreement, Figure 6 shows the spatial distribution of the PCC recorded at each rain gauge.
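A power law of the form PCC = a·z^b, such as the GPM relation PCC = 0.29 z^0.12 quoted in the abstract, can be fitted by linear regression in log-log space. The sketch below illustrates that approach; it is not necessarily the fitting procedure used by the authors, and the function name `fit_power_law` and the elevation values are hypothetical. A log-log fit requires strictly positive elevations and PCC values, so gauges at 0 m would need separate handling.

```python
import math


def fit_power_law(z, pcc):
    """Least-squares fit of pcc = a * z**b in log-log space; returns (a, b)."""
    lx = [math.log(v) for v in z]      # requires z > 0
    ly = [math.log(v) for v in pcc]    # requires pcc > 0
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((x - mx) ** 2 for x in lx)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    b = sxy / sxx                      # slope in log-log space = exponent
    a = math.exp(my - b * mx)          # intercept in log-log space = log(a)
    return a, b


# Hypothetical elevations (m) with PCC values generated from the quoted GPM relation.
z = [5.0, 50.0, 200.0, 800.0, 1500.0]
pcc_values = [0.29 * v ** 0.12 for v in z]
a, b = fit_power_law(z, pcc_values)
print(round(a, 2), round(b, 2))
```

A positive exponent b indicates agreement that improves with elevation, and a negative exponent indicates degradation with elevation.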
The results indicate the complementary performance of the satellite and radar-based precipitation datasets, with GPM recommended for the northeastern highlands and radar estimates for inland and coastal areas, which justifies blending them into one model framework. SMAP soil moisture estimates record statistically significant correlations (PCC > 0.5) with observed rainfall at more than 70% of the gauges, showing that the daily overpasses (6 am/pm LST) of SMAP soil moisture retrievals preserve the surface signature of observed rainfall events. This is particularly true for the inland and low topography areas, whereas less agreement is observed in the northern highlands. Nevertheless, the recorded agreement corroborates the use of the SMAP soil moisture estimates as proxies for daily observed rainfall events.

Evaluation of Model Performances
In this section, the results of the fully trained GWR and ANN models are presented. For the same arbitrary training gauge used in the previous section, Figure 7 shows the time series of daily SMAP soil moisture and rainfall records from the gauges and GPM product, in addition to the corrected rainfall estimates from the ANN and GWR models. The GPM product shows consistent underestimation (rBIAS = −24.4%) of observed gauge rainfall, except for large overestimations for three events in the last quarter of the study period. This is in line with the previous work reporting the biases in GPM estimates over the UAE, attributed to ice-scattering microwave retrieval deficiencies over desert land cover [5] and difference in spatial scales [95]. Both models significantly reduce the bias of the uncorrected GPM product compared to the rain gauge record. The GWR model reduced the bias to −14.6%, while the ANN recorded a more significant reduction to 0.7%. For a selected weather event on 3 January 2016, Figure 8 depicts the precipitation amounts (mm/day) retrieved by radar and GPM data, generated by the ANN and GWR models, recorded by the rain gauges, as well as the corresponding soil moisture retrievals from the SMAP product. The rain gauges record between 30 and 50 mm/day with the event predominantly impacting the northeastern UAE, while lighter rainfall between 10 and 15 mm/day is recorded inland near Al Ain and parts of the northern coastline.
In line with the previous results from the SMAP-rain gauge comparison (Figure 6), the soil moisture conditions (Figure 8f) capture the spatial extent of the weather event observed by the rain gauge distribution (Figure 8e). Higher soil moisture values between 0.2 and 0.3 cm³/cm³ exist within areas of observed rainfall, while lower values (residual moisture as low as 0.09 cm³/cm³) are recorded in areas not impacted by the event. The GPM (Figure 8b) estimates capture the event pattern, with lower underestimations (5-15 mm) over the northeastern areas and higher underestimations (20-25 mm) inland and around Al Ain. More importantly, the GPM product shows erroneous rainfall estimates coincident with residual soil moisture values and null gauge rainfall, which is also evident during the fourth quarter of 2018 in Figure 7b.
Both the ANN and GWR precipitation outputs in Figure 8c,d, respectively, exhibit an intermediary pattern between the radar and GPM representations. Both models increase the event extent and resulting rainfall over the northeastern domain and more closely match the gauge and soil moisture distributions. The major differences between the ANN and GWR results exist over the poorly gauged western region. Compared to the GWR pattern, the ANN pattern more closely matches the soil moisture fields from SMAP. This suggests the ANN model's capability to integrate soil moisture response into the precipitation correction process more effectively than the GWR process.

Model Testing: Spatial (SS) and Temporal (TS) Sub-Setting
To further diagnose the models' inter-comparison, Figure 9 shows the Taylor diagrams [96] of the radar, GPM, GWR, and ANN daily rainfall estimates using both SS and TS approaches during summer (JJAS) and winter (DJFM) periods. Taylor diagrams depict the relative skill of each precipitation source (0.1° grid) while simultaneously accounting for the PCC, RMSE and standard deviation with respect to the gauge values. The Euclidean distance between each of the four precipitation sources and the rain gauge-labeled markers gives the pooled test result, with the smallest distance indicating the best performance.
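The geometry behind a Taylor diagram can be illustrated with a short sketch. In the standard construction, the distance between a source marker and the reference marker equals the centered root mean square difference E', which satisfies E'² = σ_m² + σ_r² − 2σ_mσ_rR (the law of cosines). The use of the centered RMS difference and of population standard deviations follows that standard construction and is an assumption about how Figure 9 was produced; the function name `taylor_stats` and the sample values are hypothetical.

```python
import math


def taylor_stats(reference, model):
    """Return (std_ref, std_model, correlation, centered RMS difference)."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_m = sum(model) / n
    dev_r = [r - mean_r for r in reference]
    dev_m = [m - mean_m for m in model]
    std_r = math.sqrt(sum(d * d for d in dev_r) / n)
    std_m = math.sqrt(sum(d * d for d in dev_m) / n)
    corr = sum(a * b for a, b in zip(dev_r, dev_m)) / (n * std_r * std_m)
    # Centered RMS difference: RMS of the difference between the two anomaly series.
    crmsd = math.sqrt(sum((a - b) ** 2 for a, b in zip(dev_m, dev_r)) / n)
    return std_r, std_m, corr, crmsd


# Hypothetical daily rainfall (mm) at the testing gauges and from one estimate source.
gauge = [0.0, 5.0, 12.0, 3.0, 20.0]
source = [1.0, 4.0, 9.0, 6.0, 15.0]
std_r, std_m, corr, crmsd = taylor_stats(gauge, source)

# Law-of-cosines identity that places each source on the diagram.
identity = std_m ** 2 + std_r ** 2 - 2.0 * std_m * std_r * corr
print(abs(crmsd ** 2 - identity) < 1e-9)
```

Because the three statistics are tied together by this identity, a single point on the diagram conveys all of them, and the source closest to the gauge marker has the smallest centered RMS difference.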
During the summer period (Figure 9a), the radar estimates outperform the GPM estimates and vice versa for the winter period (Figure 9b), which further corroborates the results in Section 4.2. The ANN records the best performance during both periods, with the highest agreement during the summer period. The poorest correction is recorded by the GWR model during summer, with a comparable performance to the original radar estimates, which captured summertime precipitation better than the original GPM estimates (Section 4.2). Any improvement in the model corrections of summertime precipitation over the poorly gauged western region is mainly attributed to the soil moisture representation. Hence, the ANN outperforms the GWR model in terms of addressing the precipitation-soil moisture dependencies during the summer period.
On the other hand, the winter correction is largely controlled by the initial GPM estimates rather than the soil moisture conditions, with the intense orographic events localized over the northeastern highlands. The GWR captures the variance of the gauge data with a standard deviation of 18.3 mm, almost matching that of the gauges. However, the ANN records a slightly lower RMSE (10 mm) compared to that obtained from the GWR (11.6 mm). Overall, both models perform comparatively well with small differences in PCC, RMSE and standard deviation of 0.04, 1.6 mm and 4.37 mm, respectively, in slight favor of the ANN model. Figure 9c,d show the results from the TS approach using all stations for the summer and winter periods of 2018. The ANN continues to outperform the GWR results, as well as the original radar and GPM estimates during both periods. The GPM and radar estimates record the most comparable performance during winter TS, suggesting input collinearity when using the full network of gauges, which results in the ANN's lowest corrective performance (shortest Euclidean distance to the gauge marker).
To further analyze the differences between the TS and SS approaches and their impact on the models' performance, Table 2 lists the rBIAS, NSE, POD and FAR measures obtained from both approaches. For summer SS, the radar estimates outperform the GPM and both model estimates in terms of the POD (0.83) and FAR (0.28) measures, while the ANN leads in terms of rBIAS (2.42%) and NSE (0.56) values. Similarly, for summer TS, the radar estimates outperform the other sources with a comparable POD (0.76) and FAR (0.31) to those obtained from summer SS. Furthermore, the ANN leads again with further comparable measures of rBIAS (2.81%) and NSE (0.51). On the other hand, for both winter SS and TS, the ANN consistently outperforms the remaining three sources across all four metrics, with slightly lower improvements from TS (as reported in the Taylor analysis). However, when considering relative rates of improvement in NSE compared to the original GPM estimates in each case, the winter TS records a 65% increase compared to a 24% increase from the winter SS. This suggests the robustness of the ANN model in both training approaches, as well as the value of performing a fully distributed spatiotemporal division as future work with longer dataset coverages.

Discussion
The overall spatial distributions of the GPM- and radar-based estimates are consistent with the 10-year rainfall regime reported by the authors in a previous study [5]. The coastal contamination observed in the GPM estimates is likely attributed to uncertainties in the land mask used in IMERG, which assigns sea pixels to recent coastal expansions and significantly impacts the microwave (passive) signal during rainfall events. Most of the radar-intercepted rainfall aloft (>1 km) is evaporated before reaching the surface, which explains the pronounced overestimation in the radar-based estimates during summer periods (Figure 4a). Conversely, during the colder winter seasons (Figure 4d), evaporative loss is overridden by attenuation errors, which affect both the transmitted and reflected radar waves. Intercepted precipitation within the volume scan weakens the signal, particularly when intense convective cells are situated near the radar, which is the case for the Al Ain radar. Also, gaps and merging uncertainties are evident in the radar precipitation field over the far northeastern highlands due to terrain blockage (Figure 4d), where four gauges are situated on the leeward slopes.
Limited agreement between SMAP soil moisture and observed rainfall over the northeastern highlands is noted. This is explained by the rapid lateral propagation of surface moisture with gravity-driven runoff reported for the same study area [32], characterized by short lag times (<1 h) for surface moisture transport between upstream and downstream locations (1 km apart). Topography also largely contributes to macro roughness, particularly in the absence of vegetation [97]. This leads to additional emissions from mountain sites which are still not accounted for in satellite-based soil moisture products, including SMAP. Also, water vapor accumulated from the high daytime evaporation and land/sea breeze is subject to condensation during nighttime cooling. With low wind speeds and low dew point temperatures, such condensation leads to frequent fog events in desert environments, as well as surface condensation (i.e., dew) in cases with high mixing ratios [33,98,99]. Hence, surface condensation causes spikes in the coincident SMAP morning overpass (6:00 a.m. LST), but rapidly evaporates as temperatures warm up. Moreover, the retrieval of soil moisture from passive microwave data should correspond to the depth of the effective temperature, which varies depending on soil properties [100,101]. Furthermore, the discrepancy between the soil temperature used in the retrieval of soil moisture and the actual effective temperature may lead to some uncertainty in soil moisture estimates [102].
To evaluate the model performance, an independent set of testing data left out during training is commonly used. Ideally, sub-setting is carried out both spatially and temporally, i.e., using separate rain gauges over time periods beyond the training temporal coverage. However, for the relatively small dataset size (24,245) used here, a spatiotemporal sub-set further limits the available training sample size, which increases the risk of under-fitting. For limited rain gauge data, sub-setting can be done either spatially [89,103] or temporally, with the latter being more relevant to assess the performance of forecast models. In the current work, the GWR and ANN are not developed for prediction, but instead for post-processing with a focus on spatial interdependency between the explanatory variables over ungauged areas. Nevertheless, both models were tested using the SS and TS approaches, with the ANN recording the highest agreement with the testing rain gauge samples. This finding motivates future work using a fully independent spatiotemporal testing approach with longer dataset coverages, which are currently limited by the radar dataset.
A sensitivity analysis is carried out to investigate the contribution of each input variable within both GWR and ANN model configurations. The relative contribution of input variables is assessed by recording the change in performance obtained from their individual exclusion, in turn, compared to the base case (all variables included), during the testing stage. The largest increase in RMSE indicates the most influential input variable. The results are summarized in Figure 10. For both models, the GPM product is ranked as the most influential variable with the largest increases in RMSE. Conversely, elevation is ranked as the least important variable with the smallest increases in RMSE. Radar rainfall and SMAP soil moisture show contradicting levels of importance between the two models. The ANN shows a slightly higher increase in RMSE from the exclusion of SMAP (4.82) compared to that of radar data (4.63), while the GWR shows a larger increase in RMSE from the exclusion of radar data (3.91) compared to that of SMAP (2.96). This corroborates the results of Section 4.4, suggesting that the ANN outperforms the GWR model in better resolving precipitation-soil moisture dependencies.
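The exclusion-based sensitivity analysis can be sketched as a loop that retrains the model without one input at a time and records the change in testing RMSE relative to the base case. In the sketch below, a simple k-nearest-neighbour regressor stands in for the GWR and ANN models, which is an assumption made only to keep the example self-contained. The data are synthetic, and the function names are hypothetical.

```python
import math
import random


def knn_predict(x_train, y_train, x_test, cols, k=5):
    """Stand-in model: mean target of the k nearest training samples,
    using only the input columns listed in `cols`."""
    preds = []
    for row in x_test:
        dist = sorted(
            (sum((row[c] - tr[c]) ** 2 for c in cols), y)
            for tr, y in zip(x_train, y_train)
        )
        preds.append(sum(y for _, y in dist[:k]) / k)
    return preds


def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def exclusion_sensitivity(x_train, y_train, x_test, y_test, names):
    """Return (base RMSE, {variable: RMSE increase when that variable is excluded})."""
    all_cols = list(range(len(names)))
    base = rmse(knn_predict(x_train, y_train, x_test, all_cols), y_test)
    increase = {}
    for c, name in enumerate(names):
        kept = [j for j in all_cols if j != c]
        increase[name] = rmse(knn_predict(x_train, y_train, x_test, kept), y_test) - base
    return base, increase


# Synthetic inputs in [0, 1]; the target depends mostly on the first column.
names = ["gpm", "radar", "smap", "elevation"]
rng = random.Random(0)
x = [[rng.random() for _ in names] for _ in range(400)]
y = [2.0 * r[0] + 0.5 * r[1] + 0.5 * r[2] for r in x]
base, increase = exclusion_sensitivity(x[:300], y[:300], x[300:], y[300:], names)
ranking = sorted(increase, key=increase.get, reverse=True)
print(ranking[0])
```

The variable whose exclusion produces the largest RMSE increase is ranked as the most influential, mirroring the ranking logic described for Figure 10.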

Conclusions
The objective of this study is to derive a multi-source precipitation product with local gauge adjustment over the hyper-arid region of the UAE by implementing two widely used approaches, namely, GWR and ANNs.
Elevation-dependent biases are widely reported in the literature with larger biases in GPM-derived estimates at elevations exceeding 4000 m [104]. However, the current work shows that for the narrower range between 0 and 1800 m in the UAE, the GPM IMERG daily product performs better over the northeastern highlands. Conversely, the weather radar-derived precipitation estimates show better agreement over flat inland and coastal locations with lower rainfall amounts, where mountain blockage is avoided. GWR and ANN-based merging of the radar and GPM estimates is employed to complementarily preserve the performance of both sources. Uncertainties from the standard de-cluttering, attenuation correction, and merging approaches remain evident in the radar-based rainfall estimates. Similarly, the GPM and SMAP input datasets exhibit their own sources of spatiotemporal uncertainties. However, errors in the training data are demonstrated to favor the generalization of regression models and improve their corrective performance against an output target [61]. This is particularly expected from the more robust ANN architecture, which is capable of resolving nonlinear uncertainties compared to GWR [105].
SMAP soil moisture shows adequate agreement with the gauge observations with PCC values reaching 0.78 around Al Ain, while lower consistency over the northeastern highlands is attributed to rapid surface drainage. Nevertheless, the SMAP product is shown to preserve surface signatures of actual rain events that may be missed during the GPM constellation overpass times. The incorporation of soil moisture resulted in improved corrections by the ANN model compared to the GWR during summer. Taylor diagrams show that both GWR and ANN models outperform the individual GPM and radar estimates, with the poorest correction obtained by GWR during the summer period. Higher agreement is consistently obtained by the ANN compared to GWR with NSE improvement rates of 56% (and 25%) for GPM estimates and 34% (and 53%) for radar estimates during summer (and winter) periods. Therefore, multiple linear regression approaches, including GWR, still fail to map important processes due to the complex spatiotemporal nonlinearities between precipitation and other land/atmospheric variables, especially over heterogeneous domains [38].
The overland ANN-based correction framework proposed here can be used to generate more reliable inputs for hydrological studies over ungauged areas across the UAE. These include hydrological assessments at the catchment scale and beyond (e.g., the macro- and regional-scale) [106]. While the developed ANN configuration is set up locally for the UAE, the methodology followed is applicable to other arid and hyper-arid regions requiring improved precipitation monitoring. Future work with additional surface variables, particularly soil texture and land cover, is suggested to account for soil moisture drawdown and its spatial variation to further improve the physically-based ANN representation.

Funding:
This research was funded by Khalifa University of Science and Technology, grant number KUX-8434000101.

Acknowledgments:
The authors acknowledge the support received from the UAE National Center of Meteorology (NCM) in providing the quality-controlled rain gauge and radar-derived precipitation datasets. The authors also thank Dr. George Huffman, Research Scientist at NASA Goddard Space Flight Center (GSFC), and the three anonymous reviewers for their comments that improved the analyses. The GPM IMERG products can be downloaded from the Goddard Space Flight Center, Precipitation Measurement Missions at the National Aeronautics and Space Administration (NASA) portal: https://pmm.nasa.gov/data-access/downloads/gpm. The SMAP data can be accessed through the National Snow and Ice Data Center at http://nsidc.org/data/smap.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in the manuscript:

A.1. MLP Calibration: k-Fold Cross Validation
Having specified the number of hidden layers, activation functions and training algorithms, a careful selection of the number of neurons (nodes) in the hidden layer is critical for the network performance. The hidden nodes are the units that establish nonlinear parallel mapping of inputs to the output target. A small number of hidden nodes may cause under-fitting, while larger numbers favor increasing accuracy but with a larger risk of over-fitting. Instead of a trial-and-error approach to optimize this tradeoff, k-fold cross-validation (CV) is pursued here for selecting an optimal number of neurons [107,108]. The CV approach sequentially partitions the entire dataset into pre-defined folds. The model is re-run for every fold to obtain a generalized result for the optimal number of neurons as explained below.
A 10-fold CV is shown to generate the best performance for network sizes similar to that of the present study [76,109,110]. Hence, the original sample is randomly partitioned into 10 equally sized subsamples, with 9 subsamples used for training and a single (unseen) subsample, varied in turn, used to compute the mean squared error (MSE). The CV error (CVE) is then computed as the mean of the obtained MSE values as

$$\mathrm{CVE} = \frac{1}{10}\sum_{k=1}^{10}\mathrm{MSE}_k, \qquad \mathrm{MSE}_k = \frac{1}{N}\sum_{i=1}^{N}\left(y_i^{est} - y_i^{o}\right)^2 \qquad \text{(AE1)}$$

where $y_i^{est}$ and $y_i^{o}$ are the estimated and observed values for location $i$, respectively, $k$ indexes the held-out subsample, and $N = n/10$ is the subsample size, with $n$ the total sample size. This was repeated for 1 to 50 hidden neurons and the lowest CVE value (best fit) was recorded for 16 neurons (Table 1).
Because the training subsamples overlap (and are therefore not independent) during this CV, the CVE estimates are inherently biased [111]. Therefore, the CV approach used here is limited to network parameter selection (number of hidden neurons) and is not used for evaluating generalized model performance (testing), which is presented in Section 3.3.
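The selection procedure described in this appendix can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the data are synthetic, and a simple piecewise-constant regressor (whose number of bins plays the role of the number of hidden neurons) stands in for the MLP so that the example stays self-contained. Only the CV logic follows the text: 10 random folds, one held-out MSE per fold, and the CVE taken as their mean.

```python
import math
import random


def k_fold_indices(n, k=10, seed=0):
    """Randomly partition the indices 0..n-1 into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[j::k] for j in range(k)]


def mse(y_est, y_obs):
    """Mean squared error between estimated and observed values."""
    return sum((e - o) ** 2 for e, o in zip(y_est, y_obs)) / len(y_obs)


def cross_validation_error(fit, x, y, k=10, seed=0):
    """CVE: mean of the k held-out MSE values."""
    errors = []
    for held_out in k_fold_indices(len(x), k, seed):
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        # Train on k-1 folds, then score on the single unseen fold.
        predict = fit([x[i] for i in train], [y[i] for i in train])
        y_est = [predict(x[i]) for i in held_out]
        errors.append(mse(y_est, [y[i] for i in held_out]))
    return sum(errors) / k


def select_size(make_fit, sizes, x, y, k=10, seed=0):
    """Return the model size with the lowest CVE, plus all CVE values."""
    cve = {s: cross_validation_error(make_fit(s), x, y, k, seed) for s in sizes}
    return min(cve, key=cve.get), cve


def make_binned_mean_fit(n_bins):
    """Stand-in model (assumption): piecewise-constant fit on [0, 1)."""
    def fit(x_train, y_train):
        sums, counts = [0.0] * n_bins, [0] * n_bins
        for xi, yi in zip(x_train, y_train):
            b = min(int(xi * n_bins), n_bins - 1)
            sums[b] += yi
            counts[b] += 1
        overall = sum(y_train) / len(y_train)  # fallback for empty bins
        means = [s / c if c else overall for s, c in zip(sums, counts)]
        return lambda xi: means[min(int(xi * n_bins), n_bins - 1)]
    return fit


# Synthetic data (assumption): noisy sine curve sampled on [0, 1).
rng = random.Random(42)
x = [rng.random() for _ in range(200)]
y = [math.sin(2 * math.pi * xi) + rng.gauss(0, 0.2) for xi in x]

best, cve = select_size(make_binned_mean_fit, range(1, 51), x, y)
print("size with lowest CVE:", best, "CVE:", round(cve[best], 4))
```

As in the text, the resulting CVE is used here only to rank candidate sizes; because the training folds overlap, it should not be read as an estimate of generalized model performance.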

A.2. Data Pre-Processing: Normalization and De-Normalization
Preprocessing involved normalizing all datasets to zero mean and unity standard deviation distributions (i.e., ranging between −1 and 1) for faster convergence [70,71] and compatibility with the tansig activation function range in the case of the ANN model. The output of the models is then de-normalized and returned to the original form. Details on the normalization and de-normalization steps can be found in [71] and are summarized below

$$Y = \frac{(Y_{max} - Y_{min})(X - X_{min})}{X_{max} - X_{min}} + Y_{min} \qquad \text{(AE2)}$$

where $X$ is the initial (actual) value, $Y$ is the respective normalized value, and $X_{max}$ and $X_{min}$ are the maximum and minimum of the actual values. By definition, $Y_{max}$ and $Y_{min}$ are 1 and −1, respectively, which reduces Equation (AE2) to

$$Y = \frac{2(X - X_{min})}{X_{max} - X_{min}} - 1 \qquad \text{(AE3)}$$

The output data is then de-normalized as

$$X = \frac{(1 + Y)(X_{max} - X_{min})}{2} + X_{min} \qquad \text{(AE4)}$$
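The scaling and its inverse can be checked with a short sketch. The function names and the sample values are hypothetical; the forward mapping assumes the standard min-max form with a target range of −1 to 1, and the inverse follows Equation (AE4).

```python
def normalize(x, x_min, x_max, y_min=-1.0, y_max=1.0):
    """Min-max scale x from [x_min, x_max] to [y_min, y_max]."""
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min


def denormalize(y, x_min, x_max):
    """Invert the [-1, 1] scaling, as in Equation (AE4)."""
    return (1.0 + y) * (x_max - x_min) / 2.0 + x_min


# Illustrative daily rainfall values in mm (made up for this example).
rain = [0.0, 2.5, 10.0, 40.0]
lo, hi = min(rain), max(rain)

scaled = [normalize(r, lo, hi) for r in rain]
restored = [denormalize(s, lo, hi) for s in scaled]

print(scaled)    # [-1.0, -0.875, -0.5, 1.0]
print(restored)  # [0.0, 2.5, 10.0, 40.0]
```

The minimum and maximum used for de-normalization must be the same ones used for normalization; otherwise the model output is not returned to the original units.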