High-Resolution Precipitation Datasets in South America and West Africa based on Satellite-Derived Rainfall, Enhanced Vegetation Index and Digital Elevation Model

Mean Annual Precipitation is one of the most important variables used in water resource management. However, quantifying Mean Annual Precipitation at the high spatial resolution needed for advanced hydrological analysis is challenging in developing countries, which often have a sparse gauge network and a highly variable climate. In this work, we present a methodology to quantify Mean Annual Precipitation at 1 km spatial resolution using different precipitation products from satellite estimates and gauge observations at coarse spatial resolution (i.e., ranging from 4 km to 25 km). Examples of this methodology are given for South America and West Africa. We develop a downscaling method that exploits the relationship among satellite-derived rainfall, Digital Elevation Model and Enhanced Vegetation Index. Finally, we validate its performance using rain gauge measurements: comparable annual precipitation estimates for both South America and West Africa are retrieved. Validation indicates that the high-resolution Mean Annual Precipitation downscaled from the CHIRP (Climate Hazards Group Infrared Precipitation) and GPCC (Global Precipitation Climatology Centre) datasets presents the best ensemble of performance statistics for both South America and West Africa. Results also highlight the potential of the presented technique to downscale satellite-derived rainfall worldwide.
OPEN ACCESS. Remote Sens. 2015, 7, 6455


Introduction
Quantifying the spatial distribution of annual rainfall is crucial for water resource management in developing countries such as those of South America and West Africa. Precipitation regulates hydrological and agricultural processes, and an accurate estimate of the Mean Annual Precipitation (hereafter MAP or annual precipitation) is necessary for different disciplines including, among others, hydrological modeling, agricultural resources management, ecology and meteorology.
Besides its importance for an improved knowledge of the water cycle, annual precipitation is also crucial for natural hazard mitigation strategies and Disaster Risk Reduction: the work by Hosking et al. [1] on L-moments and regional frequency analysis identified a statistically significant relationship between rainfall extremes and annual precipitation (for further reference see Schaefer [2]).
However, the main issue is the difficulty of obtaining precipitation data with an adequate spatial resolution. Many watersheds, especially in developing countries, are poorly instrumented, and it becomes difficult to generate maps of annual precipitation using only data from rain-gauge stations. The sparse observational data coverage also precludes the use of advanced hydrological analysis.
In the last 30 years, precipitation estimates have also become available at global scale from satellite sensors. Satellite-derived precipitation products generally exploit the Thermal Infrared (TIR), the microwave channel and ground measurements (for further reading see Tapiador et al. [3]). The launch in 2014 of the NASA Global Precipitation Measurement mission (GPM, [4]) is evidence of the central importance of satellite rainfall estimation for hydrological and geophysical research.
Nevertheless, satellite-derived precipitation has a coarse spatial resolution (see Table 1 in Section 2), while for many hydrological applications it is recommended to use a finer spatial scale. For example, the Tropical Rainfall Measuring Mission (TRMM, see Section 2.2.2) has a spatial resolution of ~25 km, which may not capture the spatial heterogeneity of precipitation fields. Generally, the finer the spatial resolution, the better the estimation of runoff volumes and other hydrological fluxes [5]. This issue is amplified at low latitudes, which are dominated by convective rain events with high spatial variability [6].
A common answer within the hydrological community consists in the spatial downscaling/disaggregation of low-resolution rainfall from satellite sensors [7]. Precipitation fields vary depending on altitude and wind direction, and they influence vegetation greenness. Many studies therefore hypothesize a linear relationship between precipitation and satellite-based products such as the Digital Elevation Model (hereafter DEM) and vegetation indices such as the Normalized Difference Vegetation Index (NDVI) or the Enhanced Vegetation Index (EVI) [8]. For example, Immerzeel et al. [9] assumed that NDVI is a proxy for cumulated precipitation over the Iberian Peninsula, and consequently regressed the NDVI against satellite-derived precipitation. Hunink et al. developed linear regression models employing the DEM and the NDVI to downscale rainfall, considering also the topographical effects. Most approaches use global regression analysis, assuming that the relationship between precipitation, elevation and vegetation greenness does not vary spatially [13]. However, this assumption is inconsistent with many studies that contradict the spatial stationarity of the relationship between precipitation and NDVI [14]. Especially for large areas (i.e., analysis at regional/continental scale), modeling the non-stationary relationship among rainfall and other environmental variables with stationary models may lead to wrong conclusions.
A recent and original approach is the one implemented by Chen et al. [15], where satellite-derived precipitation is downscaled at 1 km spatial resolution using a Geographically Weighted Regression (GWR), a local form of linear regression used to model spatially varying relationships (for further details on GWR see Section 2.2).
The goal of this study is to develop a downscaling scheme to obtain annual precipitation at 1 km spatial resolution across South America and West Africa. To this end, we created datasets of MAP at a spatial resolution of 1 km in South America and West Africa by downscaling the estimates of annual precipitation acquired by satellites and gauge observations.
In the present study we exclude "local rain gauge" datasets since we want to employ only well-established and long-standing precipitation datasets. Precipitation information is mainly derived from satellite instruments, and to a lesser extent from gauge observations (see Section 2.1): henceforward, the estimates of annual precipitation acquired by satellites or gauge observations that we want to downscale will be referred to as "satellite-derived MAP".
The originality of this study lies in the high resolution datasets of annual precipitation, rather than in the methodology per se. As previously mentioned, the downscaling scheme implemented in this study has been proposed by Chen et al. [15] across North China. However, the proposed methodology offers some elements of novelty compared to the models that have inspired it, such as the "global" character of the methodology and the use of the EVI instead of the NDVI (for further discussion, see Discussion section).
The downscaling method presented below proceeds as follows, once the country and the rainfall dataset have been chosen: (1) a Geographically Weighted Regression (GWR) is performed among the satellite-derived MAP, DEM and average annual EVI at coarse spatial resolution; (2) the residual of the regression (i.e., the amount of satellite-derived precipitation that could not be explained by the model) is interpolated at 1 km resolution; and (3) the GWR is applied to DEM and average annual EVI at the original resolution of 1 km, and the high-resolution residual is added.
This technique is applied to each of the seven satellite-derived precipitation products available (see Sections 2.1.1 to 2.1.7). For each product, a validation analysis against ground weather stations has been carried out to determine the reliability of rainfall estimates at high spatial resolution.
By combining these datasets and techniques, the aim of the study is to: (1) improve retrievals of rainfall at high spatial resolution, and (2) provide constraints on the uncertainty in precipitation estimates from remote sensing across West Africa and South America. This paper describes the methodology and presents results over South America and West Africa, although it can be applied worldwide.

Materials
In this section we present the rainfall datasets used in the present study, the Enhanced Vegetation Index, the Digital Elevation Model and the validation dataset. Table 1 summarizes the rainfall datasets used. These are described in Subsections 2.1.1 to 2.1.7, whereas the Enhanced Vegetation Index, the DEM and the validation dataset are described in Subsections 2.1.8 to 2.1.10. For all practical purposes, (1) all datasets are first corrected by removing obvious outliers, (2) the daily/weekly/monthly products are then aggregated on an annual basis, and (3) the annual precipitation is finally calculated for the entire reference period of the dataset.
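The aggregation steps (2) and (3) can be sketched as follows. This is a minimal illustration rather than the routine used in the study: it assumes monthly totals stacked as a NumPy array, that outliers have already been replaced by NaN, and the function name `mean_annual_precipitation` is hypothetical.

```python
import numpy as np

def mean_annual_precipitation(monthly, months_per_year=12):
    """Collapse a (n_months, rows, cols) stack of monthly totals into MAP.

    Assumption: removed outliers are stored as NaN. A year containing a NaN
    month becomes NaN for that cell and is skipped in the final average.
    An incomplete trailing year is dropped.
    """
    monthly = np.asarray(monthly, dtype=float)
    n_years = monthly.shape[0] // months_per_year
    if n_years == 0:
        raise ValueError("need at least one complete year")
    full = monthly[: n_years * months_per_year]
    # Step 2: sum the monthly totals of each year -> (n_years, rows, cols).
    annual = full.reshape(n_years, months_per_year, *full.shape[1:]).sum(axis=1)
    # Step 3: average the annual totals over the whole record.
    return np.nanmean(annual, axis=0)

# Two synthetic years on a 2 x 2 grid: 10 mm/month, then 20 mm/month.
stack = np.concatenate([np.full((12, 2, 2), 10.0), np.full((12, 2, 2), 20.0)])
map_mm = mean_annual_precipitation(stack)
```

Daily or weekly products would follow the same pattern with a different grouping of time steps per year.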
This study has been carried out using the well-established and long-standing precipitation datasets available in the literature. However, for the sake of simplicity, not all the available global precipitation datasets have been used. For example, Brocca et al. [16] recently proposed a promising "bottom-up" approach that uses inverse modeling of soil moisture estimates from microwave satellites to infer the preceding rainfall amounts. However, its coarse spatial resolution (~100 km) makes it unsuitable for this study. Precipitation outputs from Global Climatic Models (GCM) [17] are also excluded because of their coarse spatial resolution (again, in the region of 100 km).
Where possible, we consider the longest available time series of precipitation, even if their timespans are not consistent with each other, in order to obtain a more reliable estimate of the MAP. As a result, we assume that temporal trends in annual precipitation are negligible. This is a strong assumption, but trend analysis and trend removal at continental scale, along with the spatial interpolation of low-resolution datasets, would only make the final outcome more inaccurate. Moreover, trend analysis is beyond the scope of the study.

GPCC
The Global Precipitation Climatology Centre (GPCC) provides the mean monthly global land-surface precipitation for every month of the year and the annual total [18]. This climatology is available on a regular latitude/longitude grid with different spatial resolutions: 0.25°, 0.5°, 1.0°, and 2.5° (i.e., ~25, 50, 100 and 250 km, respectively). The GPCC dataset at 0.25° is the best for complete spatial coverage, whereas the GPCC dataset at 0.50° has the highest accuracy. We decided to use the 0.25° product in our study, preferring spatial coverage and spatial resolution to accuracy. The GPCC rainfall product at 0.25° spatial resolution (hereafter GPCC) is based on the 67,200 stations worldwide that feature record durations of 10 years or longer, for the target reference period January 1951 to December 2000 [19].

TRMM 3B43
The TRMM 3B43 dataset is based on the Tropical Rainfall Measuring Mission (TRMM) and the monthly accumulated GPCC rain gauge dataset. TRMM is a joint U.S.-Japan satellite mission to monitor tropical and subtropical precipitation, which employs radars and passive microwave sensors along with visible/infrared radiometers. The purpose of TRMM 3B43 is to produce the "TRMM and Other Data" best estimate of precipitation. Basically, TRMM rainfall data are summed for the calendar month, and then the GPCC rain gauge data are used to perform a large-scale bias correction [20]. The final gridded precipitation data have a monthly temporal resolution and a 0.25° spatial resolution. The TRMM 3B43 dataset (hereafter TRMM) is available for the period January 1998 to April 2015. Spatial coverage extends from 50° S to 50° N latitude (for further information see [21]).

PERSIANN CDR
The PERSIANN CDR (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks Climate Data Record) consists of daily precipitation estimates derived from long wave infrared satellite data [22]. The PERSIANN CDR dataset (hereafter PERSIANN) provides precipitation data beginning in 1983 at 0.25° spatial resolution. Satellite-derived rainfall estimates are adjusted using the GPCC monthly product to maintain consistency of the two datasets throughout the entire record [23].

CMORPH
CMORPH refers to the name of a technique (MORPHing technique) used to produce global precipitation maps from microwave satellite sources at high spatial and temporal resolution. The National Oceanic and Atmospheric Administration (NOAA) has released a CMORPH product, called CMORPH Version 1.0 [24], incorporating precipitation estimates derived from the passive microwave sensors aboard NASA's AQUA and TRMM spacecraft (for further information see Joyce et al. [25]). Please note that, although the dataset is also available at 8 km spatial resolution, the resolution of the individual satellite estimates is coarser than that. Therefore the finer resolution is obtained via interpolation. In the present study we used the CMORPH gauge-satellite blended precipitation product from 1998 onwards, at 0.25° spatial resolution (hereafter CMORPH). A routine has been implemented to: (1) aggregate the daily rainfall to an annual time step and (2) calculate the annual precipitation throughout the entire record period.

CHIRP
Climate Hazards Group Infrared Precipitation with Station data (hereafter CHIRP) is a 30-year quasi-global rainfall dataset [26]. Spanning 50° S-50° N (and all longitudes) from 1981 to present, CHIRP incorporates 0.05° resolution satellite imagery with in-situ station data to create gridded monthly time series for seasonal drought monitoring and water resource management [27].

RFE
The Rainfall Estimate (hereafter RFE) is an operational product created by the Climate Prediction Center for the United States Agency for International Development Famine Early Warning System project to assist in drought monitoring and flood forecasting over Africa [28]. The spatial resolution is 0.1° and the spatial domain spans 40° S to 40° N in latitude and 20° W to 55° E in longitude, thus encompassing the whole African continent. Daily precipitation maps are created by merging gauge observations and three kinds of satellite estimates (i.e., GPI, SSM/I and AMSU) using the method described by Xie and Xiong [29]. Data are available from 2000 onwards [30].

TAMSAT
TAMSAT stands for Tropical Applications of Meteorology using SATellite data and ground-based observations, and it is developed at the Department of Meteorology of the University of Reading, UK.
The TAMSAT group provides a 30-year rainfall climatology (from 1983 to 2012) covering Africa with a spatial resolution of 0.0375° [31].Ten-daily and monthly rainfall estimates, along with the corresponding anomalies, are derived from archived Meteosat thermal infrared imagery, calibrated against rain gauge records collected from numerous African agencies [32].

2.1.8. Vegetation Index: EVI
Vegetation indices are dimensionless variables, generally varying between zero and one, which can be monitored from space and are useful for many applications, such as agriculture and hydrological modeling. Vegetation indices are attractive because they can provide information on catchment water balance, and because their spatial patterns are driven by water availability [33].
The most widely used vegetation index is the Normalized Difference Vegetation Index (NDVI), which consists of a normalized ratio of the near infra-red and red spectral bands. However, NDVI often saturates, it is affected by atmospheric and soil conditions, and it might no longer be sensitive to changes in vegetation [34]. To overcome these limitations, new vegetation indices, such as the Enhanced Vegetation Index (EVI), have been created. EVI is a vegetation index that effectively characterizes bio-physical/biochemical states and processes from vegetated surfaces [8]. EVI minimizes canopy background variations and maintains sensitivity over dense vegetation conditions. The EVI also employs the blue spectral band, which is particularly sensitive to aerosols, to account for atmospheric effects. EVI is calculated from these measurements as follows:
$$\mathrm{EVI} = 2.5 \, \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + C_1 \, \mathrm{RED} - C_2 \, \mathrm{BLUE} + L} \qquad (1)$$
where RED, BLUE and NIR stand for the surface reflectance measurements acquired in the visible (red and blue) and near-infrared regions, respectively; whereas C1, C2 are the coefficients describing the aerosol resistance, and L is the canopy background adjustment parameter.
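As an illustration of the EVI computation, the sketch below evaluates the index from surface reflectances. The coefficient values (gain 2.5, C1 = 6, C2 = 7.5, L = 1) are the ones commonly quoted for the MODIS EVI product and are an assumption here, since the text does not state them; the function name `evi` is hypothetical.

```python
import numpy as np

def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, canopy_l=1.0):
    """Enhanced Vegetation Index from surface reflectances (scalars or arrays).

    Default coefficients are assumed MODIS values, not taken from the text.
    Cells with a zero denominator are returned as NaN.
    """
    nir, red, blue = (np.asarray(a, dtype=float) for a in (nir, red, blue))
    denom = nir + c1 * red - c2 * blue + canopy_l
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom != 0, gain * (nir - red) / denom, np.nan)

# Reflectances loosely typical of a vegetated pixel (illustrative numbers only).
evi_value = float(evi(0.40, 0.05, 0.03))
```

The same function applies unchanged to whole rasters, because every operation is element-wise.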
The most successful applications of EVI are reported in areas having high biomass such as the Amazon forest [8,35] and the African rainforest [36]. The EVI, in the same manner as the NDVI, is sensitive to the cumulated rainfall, and its average annual value can be used as a proxy for the mean annual precipitation [9,15].
The EVI dataset employed in this study is derived from monthly observations captured by the Moderate Resolution Imaging Spectroradiometer (MODIS) on board the Terra satellite at a spatial resolution of 1 km (MODIS product MOD13A3) from 2000 to present [37]. The EVI dataset has been averaged at annual scale, and then we have calculated the average annual EVI for the entire reference period of the dataset. This average annual EVI (a kind of "climatology" of annual EVI with 14 years of observations) is the product that we subsequently use to downscale satellite-derived precipitation.
We are aware of the issue of temporal consistency between the EVI and rainfall datasets, which often span different or successive periods (e.g., GPCC refers to the 1951-2000 period). However, we assume that ~10 years of the EVI dataset provide a good estimate of the climatology of vegetation, filtering out anomalies and outliers. To obtain a consistent EVI measurement over a longer time, we would have had to process the GIMMS (Global Inventory Modeling and Mapping Studies) dataset [38] instead. However, the GIMMS dataset has been discarded since it has a coarse resolution (i.e., ~8 km), combines different sensors spanning successive periods, and does not employ the blue spectral band, which is particularly sensitive to aerosols, and thus does not account for atmospheric correction.
One of the main problems with EVI products from MODIS is represented by rivers, lakes and other water bodies, where the EVI has negative and/or low values. These areas have to be identified, filtered out and interpolated to avoid problems with the subsequent regression. Identification of water bodies is not straightforward [39], since we cannot use an a priori threshold to mask the EVI values corresponding to water bodies. In order to have a global and portable method, we use the SWBD dataset (SRTM Water Bodies Data, [40]) to mask water bodies and to remove the corresponding EVI values. The masked EVI values are subsequently filled with a simple bilinear interpolation.

2.1.9. DEM
Information on the elevation (i.e., DEM) is available from the HydroSHEDS product (Hydrological data and maps based on SHuttle Elevation Derivatives at multiple Scales [41]), derived from the Shuttle Radar Topography Mission (SRTM, [42]). The spatial resolution of the DEM is 1 km. The HydroSHEDS product provides hydrographic information in a consistent and comprehensive format for regional and global-scale hydrological applications. The HydroSHEDS dataset includes stream networks, watershed boundaries and drainage directions. The original SRTM data have been hydrologically conditioned to fill voids and remove spurious sinks using a sequence of automated and manual procedures (for further details see Lehner et al. [43]).

Validation Datasets
Rainfall data from ground weather stations across South America and West Africa have been used for validation purposes. Specifically, in-situ annual precipitation data are provided by national meteorological institutions participating in the EUROCLIMA project [44] (for further information see acknowledgment section).
Rainfall products have been merged and have undergone quality tests, including the removal of duplicates (e.g., rainfall data available in different source datasets), outliers, and stations having less than 15 years of records.
It is important to underline that the validation dataset is independent of satellite-derived precipitation.The use of EUROCLIMA dataset allows a relatively homogeneous spatial coverage across South America and West Africa in order to assess the reliability of rainfall estimates at high spatial resolution.
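All the high resolution rainfall products are validated with the independent meteorological rainfall dataset (hereafter observed precipitation). Three statistical indicators are used for the evaluation of the performance of the high resolution rainfall products, according to the formulae given below: (1) the root-mean-square error (RMSE); (2) the absolute error (Abs.Err.); and (3) the mean percentage error (MPE), i.e., the mean error computed as percentage.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( P_{downscaled,i} - P_{observed,i} \right)^2} \qquad (2)$$
$$\mathrm{Abs.Err.} = \frac{1}{n} \sum_{i=1}^{n} \left| P_{downscaled,i} - P_{observed,i} \right| \qquad (3)$$
$$\mathrm{MPE} = \frac{100}{n} \sum_{i=1}^{n} \frac{P_{downscaled,i} - P_{observed,i}}{P_{observed,i}} \qquad (4)$$
where Pdownscaled and Pobserved represent respectively downscaled and observed precipitation, while n is the total number of validation points.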
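The three indicators can be sketched in a few lines of code. The sketch assumes the usual textbook definitions (mean absolute error for Abs.Err., and a signed MPE of downscaled minus observed relative to observed); the function name `validation_scores` is hypothetical.

```python
import math

def validation_scores(downscaled, observed):
    """Return RMSE, mean absolute error and mean percentage error.

    Assumed definitions: errors are downscaled minus observed, and the
    percentage error is taken relative to the observed value.
    """
    if len(downscaled) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [d - o for d, o in zip(downscaled, observed)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    abs_err = sum(abs(e) for e in errors) / n
    mpe = 100.0 * sum(e / o for e, o in zip(errors, observed)) / n
    return {"rmse": rmse, "abs_err": abs_err, "mpe": mpe}

# Two gauges, one overestimated and one underestimated by 100 mm.
scores = validation_scores([1100.0, 900.0], [1000.0, 1000.0])
```

In this toy case the signed MPE cancels to zero while RMSE and the absolute error both equal 100 mm, which illustrates why the three indicators are read together.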

Methods
The downscaling scheme presented in this paper has been developed on the basis of the study of Chen et al. [15]. The basic assumption of this study is the existence of a linear and spatially variable relationship among MAP, DEM and EVI.
The downscaling scheme, shown in Figure 1, is organized through the following steps:
(1) We select the satellite-based rainfall dataset at coarse resolution to downscale, hereafter MAPsat@CR (where CR stands for "Coarse Resolution");
(2) DEM and average annual EVI are upscaled by pixel averaging from the original fine-scale resolution of 1 km (i.e., DEM@1km and EVI@1km) to the spatial resolution of the satellite-based rainfall dataset (e.g., 25 km for GPCC, 5 km for CHIRP), hereafter DEM@CR and EVI@CR;
(3) A Geographically Weighted Regression (GWR, see next subsection for further details) is performed to establish a relationship among MAPsat@CR, DEM@CR and EVI@CR. In essence, the GWR is a local form of linear regression described in Equation (5):
$$\mathrm{MAP}_{\mathrm{sat@CR}} = \beta_0(u) + \beta_1(u) \, \mathrm{DEM}_{\mathrm{@CR}} + \beta_2(u) \, \mathrm{EVI}_{\mathrm{@CR}} + \Delta_{\mathrm{@CR}} \qquad (5)$$
where β0(u), β1(u), and β2(u) are the intercept and the slope parameters, varying with location (u), while ∆@CR is the residual, representing the amount of precipitation that cannot be explained by the model. If we write the regression without the residual component we obtain:
$$\mathrm{MAP}_{\mathrm{pred@CR}} = \beta_0(u) + \beta_1(u) \, \mathrm{DEM}_{\mathrm{@CR}} + \beta_2(u) \, \mathrm{EVI}_{\mathrm{@CR}} \qquad (6)$$
where MAPpred@CR is the annual precipitation at coarse resolution predicted/explained using the regression model of DEM and EVI. The closer the value of the predicted rainfall to the satellite-based rainfall, the better the performance of the regression and the lower the value of the residual. The GWR also supplies a per-pixel flag documenting the coefficient of determination of the regression (i.e., r²);
(4) GWR is repeated excluding EVI@CR: in this way we re-compute MAPpred@CR, along with the GWR parameters and a map of coefficients of determination:
$$\mathrm{MAP}_{\mathrm{pred@CR}} = \beta_0(u) + \beta_1(u) \, \mathrm{DEM}_{\mathrm{@CR}} \qquad (7)$$
Step (4) is performed because in some cases there is not a strong relationship between precipitation, DEM and EVI, hence it is beneficial to exclude EVI;
(5) For each cell of the spatial domain, we select the MAPpred@CR, the intercept and the slope parameters of the GWR associated with the highest coefficient of determination r², as shown in Figure 2;
(6) We subtract MAPpred@CR, obtained in step (5), from MAPsat@CR, obtaining the residual of the regression model at coarse resolution, ∆@CR, which represents the amount of precipitation that cannot be explained by the regression based on DEM and EVI;
(7) We downscale ∆@CR to 1 km resolution by applying a cubic spline interpolation, and we obtain ∆@1km;
(8) We downscale the model parameters to 1 km by nearest neighbour resampling, thus obtaining β0@1km, β1@1km, and β2@1km;
(9) We apply the GWR model with the EVI and DEM at the original spatial resolution of 1 km, thus obtaining the predictive value of annual precipitation with 1 km resolution, hereafter MAPpred@1km:
$$\mathrm{MAP}_{\mathrm{pred@1km}} = \beta_{0\mathrm{@1km}} + \beta_{1\mathrm{@1km}} \, \mathrm{DEM}_{\mathrm{@1km}} + \beta_{2\mathrm{@1km}} \, \mathrm{EVI}_{\mathrm{@1km}} \qquad (8)$$
For the cells where EVI values are masked, the corresponding values of the coefficient β2@1km are set equal to zero;
(10) By adding this high resolution predictive precipitation data to the high-resolution residual obtained in step (7), we attain the final downscaled MAP with a 1 km resolution, hereafter MAP@1km, as described below in Equation (9):
$$\mathrm{MAP}_{\mathrm{@1km}} = \mathrm{MAP}_{\mathrm{pred@1km}} + \Delta_{\mathrm{@1km}} \qquad (9)$$
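The resampling part of the scheme (upscaling, residual interpolation, parameter resampling and recombination) can be sketched as below. This is an illustration under several assumptions: the GWR coefficient grids `b0`, `b1`, `b2` are taken as already fitted, the fine grid dimensions are exact multiples of the integer `factor`, and `scipy.ndimage.zoom` with `order=3` stands in for the cubic spline interpolation. The names `block_mean` and `downscale` are hypothetical.

```python
import numpy as np
from scipy.ndimage import zoom

def block_mean(fine, factor):
    """Upscale by averaging non-overlapping factor x factor pixel blocks."""
    rows, cols = fine.shape
    return fine.reshape(rows // factor, factor, cols // factor, factor).mean(axis=(1, 3))

def downscale(map_sat_cr, dem_1km, evi_1km, b0, b1, b2, factor):
    """Combine coarse rainfall, fine DEM/EVI and coarse coefficients into MAP@1km."""
    # Coarse-resolution prediction and residual.
    pred_cr = b0 + b1 * block_mean(dem_1km, factor) + b2 * block_mean(evi_1km, factor)
    resid_cr = map_sat_cr - pred_cr
    # Residual: cubic spline interpolation to the fine grid.
    resid_1km = zoom(resid_cr, factor, order=3, mode="nearest")
    # Coefficients: nearest-neighbour resampling (each value is simply repeated).
    def repeat(a):
        return np.repeat(np.repeat(a, factor, axis=0), factor, axis=1)
    pred_1km = repeat(b0) + repeat(b1) * dem_1km + repeat(b2) * evi_1km
    # Fine-resolution prediction plus fine-resolution residual.
    return pred_1km + resid_1km

# Synthetic check: constant coefficients and a constant 50 mm residual.
rng = np.random.default_rng(0)
factor = 5
dem_1km = rng.uniform(0.0, 3000.0, (20, 30))
evi_1km = rng.uniform(0.0, 1.0, (20, 30))
b0, b1, b2 = (np.full((4, 6), v) for v in (400.0, 0.1, 900.0))
map_sat_cr = b0 + b1 * block_mean(dem_1km, factor) + b2 * block_mean(evi_1km, factor) + 50.0
map_1km = downscale(map_sat_cr, dem_1km, evi_1km, b0, b1, b2, factor)
```

With spatially constant coefficients, averaging `map_1km` back to the coarse grid recovers the coarse input, which is a convenient sanity check for any implementation of the scheme.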
Steps 4 and 5 (i.e., the modifications of the GWR) are introduced to improve the performance of the regression by excluding EVI where appropriate. For example, where the land use is fragmented, it may happen that the relationship between EVI and MAP is not robust: the vegetation index is tied to land use while its relationship with rainfall is weak. In this case, the regression between annual precipitation and DEM could explain the data better than the regression that also includes EVI. The downscaling methodology is implemented for each country of both South America and West Africa, and for each satellite-derived dataset. For South America, we run the downscaling algorithm 65 times (13 countries multiplied by 5 rainfall datasets), and for West Africa 105 times (15 countries multiplied by 7 rainfall datasets). Please note that for Africa we have two more datasets (i.e., TAMSAT and RFE) that are Africa-specific.
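The per-cell choice between the regression with DEM and EVI and the regression with DEM only can be sketched as follows. It assumes both GWR runs have already produced coefficient and r² grids; ties are resolved in favour of the full model, which is an arbitrary choice made for this illustration, and the name `select_model` is hypothetical.

```python
import numpy as np

def select_model(r2_full, r2_dem, coef_full, coef_dem):
    """Keep, cell by cell, the coefficients of the regression with higher r2.

    coef_full has shape (3, rows, cols): intercept, DEM slope, EVI slope.
    coef_dem has shape (2, rows, cols): intercept, DEM slope.
    Where the DEM-only model wins, the EVI slope is set to zero.
    """
    use_full = r2_full >= r2_dem
    dem_padded = np.concatenate([coef_dem, np.zeros_like(coef_dem[:1])])
    coef = np.where(use_full, coef_full, dem_padded)
    r2 = np.where(use_full, r2_full, r2_dem)
    return coef, r2, use_full

# One row of two cells: the full model wins in the first, DEM-only in the second.
r2_full = np.array([[0.9, 0.2]])
r2_dem = np.array([[0.5, 0.6]])
coef_full = np.array([[[1.0, 1.0]], [[2.0, 2.0]], [[3.0, 3.0]]])
coef_dem = np.array([[[10.0, 10.0]], [[20.0, 20.0]]])
coef, r2, use_full = select_model(r2_full, r2_dem, coef_full, coef_dem)
```

The boolean grid `use_full` plays the role of a per-pixel flag recording which variables entered the regression.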
It is worth noting that the downscaling schemes employed for the residual and for the parameters of the regression model (i.e., steps 7 and 8) are different, as stated in Immerzeel et al., Jia et al. and Chen et al. [9,11,15]. The residual is de facto a rainfall amount, thus it makes sense to employ a downscaling scheme, such as spline interpolation, that considers the values of neighbouring points. Conversely, the parameters of the model cannot be modified freely, since a slight variation might cause large and unexplained variations in the output of the model. Thus it makes sense to use a piecewise-constant interpolator such as nearest neighbour to disaggregate the parameters of the regression model.
Also, note that we use the acronym CR for coarse resolution instead of a fixed number throughout the paper: the cell size of the low-resolution dataset depends on the choice of the original rainfall dataset, ranging from 4 km to 25 km (e.g., TAMSAT and GPCC, respectively). The proposed methodology supplies a per-pixel flag, stored in a separate layer. Each flag documents the goodness of fit of the regression (i.e., r²) and the variables employed in the regression (i.e., both EVI and DEM, only EVI, only DEM).
The Geographically Weighted Regression is described more fully in the next sub-section.

Geographically Weighted Regression (GWR)
The relationship among MAP, elevation and vegetation is intrinsically different across space: in some areas, precipitation is higher than in others having the same values of EVI and DEM. The degree to which annual precipitation varies may relate to other neglected effects such as terrain aspect, wind direction, distance from the sea or land cover, which exert a local downward/upward influence on precipitation.
As pointed out in Chen et al. [15], even if we find a statistically significant relationship between annual precipitation, DEM and EVI, applying this relationship uniformly over the whole country leaves a large amount of variance unexplained by the regression. An insight into the need to characterize spatial heterogeneity is given in Figure 2.
In this example, we establish a global linear regression for the whole of Colombia among TRMM annual precipitation, DEM and EVI, assuming that the parameters of the model (i.e., slopes and intercept) are constant over space. The p-values computed for the global linear regression are less than 0.01, thus indicating that it is unlikely that no relationship exists between annual precipitation, DEM and EVI. The first panel shows the precipitation from the TRMM satellite, whereas the second panel represents the annual rainfall explained by DEM and EVI with the global model. The third panel represents their difference, i.e., the difference between TRMM and modeled rainfall. The spatially homogeneous global regression cannot capture spatial variations in the relationship between MAP, DEM and EVI. As a result, the spatial patterns of annual precipitation are captured by the residual instead of the model itself.
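The behaviour of a global regression can be reproduced on synthetic data. The sketch below is not the Colombia experiment: it builds an artificial rainfall field with an east-west gradient that DEM and EVI cannot explain, fits one set of coefficients for the whole domain by ordinary least squares, and shows that the gradient ends up in the residual. All values and the name `global_fit` are illustrative assumptions.

```python
import numpy as np

def global_fit(dem, evi, precip):
    """Ordinary least squares with one intercept and two slopes for the whole grid."""
    design = np.column_stack([np.ones(dem.size), dem.ravel(), evi.ravel()])
    beta, *_ = np.linalg.lstsq(design, precip.ravel(), rcond=None)
    pred = (design @ beta).reshape(precip.shape)
    return beta, pred, precip - pred

rng = np.random.default_rng(1)
rows, cols = 20, 20
dem = rng.uniform(0.0, 2000.0, (rows, cols))
evi = rng.uniform(0.1, 0.8, (rows, cols))
# Spatial pattern unrelated to DEM and EVI: -300 mm in the west, +300 mm in the east.
gradient = np.tile(np.linspace(-300.0, 300.0, cols), (rows, 1))
precip = 500.0 + 0.2 * dem + 1000.0 * evi + gradient

beta, pred, resid = global_fit(dem, evi, precip)
pattern_corr = np.corrcoef(resid.ravel(), gradient.ravel())[0, 1]
```

A `pattern_corr` close to one means the residual map, rather than the fitted model, carries the spatial structure of the rainfall field.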
The spatial non-stationarity in this study is modeled by means of the Geographically Weighted Regression (GWR) [13], a linear regression where the relationship between variables varies spatially. GWR was initially used as a graphical tool for data exploration in social science, economics and urban economics. Only in the last decade has it been broadly applied in geoscience disciplines. The general form of the GWR for one independent variable is expressed in Equation (10):
$$y = \beta_0(u) + \beta_1(u) \, x + \varepsilon(u) \qquad (10)$$
where x and y are the independent and the dependent variable, β0 and β1 are the intercept and the slope of the regression, respectively, ε(u) is the residual and (u) are the spatial coordinates. Instead of remaining the same everywhere, as in the case of the global linear model, the intercept and the slope of the GWR vary with location. In other words, the spatial non-stationarity of GWR implies that the same combination of EVI and DEM may correspond to different annual precipitation values in different parts of the study region.
To achieve this, a separate regression is run for each point of the spatial domain, using a spatial kernel that is centered on the given point and weights observations according to a distance decay function whose extent is called the bandwidth, as shown in Figure 3. Separate equations are built for every kernel in the spatial domain, incorporating the dependent and explanatory variables falling within the bandwidth of the kernel. As the bandwidth gets larger, the GWR model approaches the global model. Conversely, as the bandwidth decreases, the GWR will increasingly rely on observations closer to the regression point. Basically, the main assumption is that nearer observations have more influence in estimating the local set of regression coefficients than observations farther away.
Many different approaches have been proposed in the literature to define the shape and extent of this spatial kernel [13]. The most popular for gridded datasets, as in our case, is the fixed Gaussian kernel. The fixed kernel suits regular gridded datasets, whereas the adaptive kernel is suitable for sparse and irregular configurations. However, whichever weighting function is used, the result is highly sensitive to the bandwidth, whose choice is more important than the shape of the kernel [45].
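A GWR with a fixed Gaussian kernel can be sketched as a weighted least-squares fit repeated at every regression point. The kernel form w = exp(-0.5 (d/h)²), with d the distance to the regression point and h the bandwidth, is one common convention and is assumed here; the name `gwr_fit` is hypothetical, and the brute-force loop is meant for small examples only.

```python
import numpy as np

def gwr_fit(coords, X, y, bandwidth):
    """Local coefficients (intercept first) at every observation location.

    coords: (n, 2) positions; X: (n,) or (n, k) predictors; y: (n,) response.
    Assumed kernel: w = exp(-0.5 * (distance / bandwidth) ** 2).
    """
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    betas = np.empty((len(y), design.shape[1]))
    for i, point in enumerate(coords):
        dist2 = np.sum((coords - point) ** 2, axis=1)
        weights = np.exp(-0.5 * dist2 / bandwidth**2)
        # Weighted least squares via square-root weights on rows.
        root = np.sqrt(weights)
        betas[i] = np.linalg.lstsq(design * root[:, None], y * root, rcond=None)[0]
    return betas

# 6 x 6 grid with a spatially constant relationship: GWR should recover it everywhere.
rng = np.random.default_rng(2)
yy, xx = np.mgrid[0:6, 0:6]
coords = np.column_stack([xx.ravel(), yy.ravel()]).astype(float)
dem = rng.uniform(0.0, 2000.0, 36)
evi = rng.uniform(0.1, 0.8, 36)
precip = 100.0 + 0.5 * dem + 800.0 * evi
betas = gwr_fit(coords, np.column_stack([dem, evi]), precip, bandwidth=2.0)
```

When the true relationship does vary in space, the rows of `betas` differ from point to point, and mapping them shows where each predictor matters most.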
Bandwidth optimization provides a way of choosing a bandwidth that makes an optimal trade-off between bias and variance [46]: the smaller the bandwidth, the larger the variance but the lower the bias; conversely, the larger the bandwidth, the larger the bias but the smaller the variance. The optimum kernel bandwidth for the GWR can be found by minimizing a model fit indicator, such as the leave-one-out cross-validation (CV) score [47] or the Akaike Information Criterion (AIC) [48]. However, the latter is excluded since it significantly increases the computational burden of the fitting process.
In our study we employ a fixed Gaussian kernel, and the CV criterion is used to determine the "optimal" value of the bandwidth for each country: in practice we employ the bandwidth associated with the lowest CV. However, a test study over two small countries, French Guiana and Benin (representative of South America and West Africa, respectively), indicates that the bandwidths obtained by minimizing AIC and CV are identical.
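Bandwidth selection by leave-one-out cross-validation can be sketched as follows. The sketch assumes the Gaussian kernel w = exp(-0.5 (d/h)²), a short list of candidate bandwidths instead of a continuous optimizer, and the hypothetical names `loo_cv_score` and `best_bandwidth`.

```python
import numpy as np

def loo_cv_score(coords, X, y, bandwidth):
    """Sum of squared leave-one-out prediction errors for one bandwidth."""
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    total = 0.0
    for i in range(len(y)):
        dist2 = np.sum((coords - coords[i]) ** 2, axis=1)
        weights = np.exp(-0.5 * dist2 / bandwidth**2)
        weights[i] = 0.0  # leave the regression point itself out of the fit
        root = np.sqrt(weights)
        beta = np.linalg.lstsq(design * root[:, None], y * root, rcond=None)[0]
        total += (y[i] - design[i] @ beta) ** 2
    return total

def best_bandwidth(coords, X, y, candidates):
    """Return the candidate with the lowest CV score, plus all scores."""
    scores = [loo_cv_score(coords, X, y, h) for h in candidates]
    return candidates[int(np.argmin(scores))], scores

# Transect of 30 cells where the slope of y against x grows steadily eastwards.
position = np.arange(30, dtype=float)
coords = np.column_stack([position, np.zeros(30)])
x = np.array([0.2, 0.8, 0.5] * 10)
y = (1.0 + position) * x
h_best, cv_scores = best_bandwidth(coords, x, y, [1.5, 3.0, 50.0])
```

Because the relationship changes along the transect, the near-global bandwidth of 50 cells scores far worse than the local ones, which is the situation in which GWR is expected to help.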
In practice, there is a computational limitation: we cannot always apply the GWR to each country as a whole, because the computational burden of this implementation becomes a significant barrier. For this reason, each country with more than 250,000 cells (1,250,000 km² with CHIRP dataset at 5 km resolution, e.g., Brazil or Argentina) is partitioned into smaller sub-regions to which we apply the GWR. In practice we divide the spatial domain into blocks of adjacent cells, and we sequentially run the GWR over these blocks of contiguous pixels. This partition is subjective, but it avoids the computational limitation and cuts down calculation times.
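One simple way to partition a large grid into blocks of contiguous cells is sketched below. The text describes the partition as subjective, so the square tiling used here is only one possible choice; the name `iter_blocks` and the grid size in the example are hypothetical.

```python
import math

def iter_blocks(shape, max_cells):
    """Yield (row_slice, col_slice) tiles that cover the grid without overlap.

    Each tile is at most side x side cells, with side = floor(sqrt(max_cells)),
    so no tile exceeds max_cells.
    """
    rows, cols = shape
    side = max(1, math.isqrt(max_cells))
    for r0 in range(0, rows, side):
        for c0 in range(0, cols, side):
            yield slice(r0, min(r0 + side, rows)), slice(c0, min(c0 + side, cols))

# A 700 x 900 grid split so that no block exceeds 250,000 cells.
blocks = list(iter_blocks((700, 900), 250_000))
sizes = [(rs.stop - rs.start) * (cs.stop - cs.start) for rs, cs in blocks]
```

Each pair of slices can be used to index the rainfall, DEM and EVI arrays directly before running the regression on that block.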
In order to quantify the effect of this partition, which may introduce some errors, a test case was designed over French Guiana with the CHIRP precipitation. This country was chosen because its size (i.e., 83,534 km²) makes it possible to: (1) perform the GWR on the entire country, (2) perform the GWR on a partition into two sub-regions that are each statistically significant, and (3) evaluate the difference between the rainfall modeled with the two methods. Given the elongated shape of French Guiana, the partition was carried out by dividing the country into two blocks along an imaginary horizontal line (a parallel).
The differences between the annual precipitation predicted with a GWR performed (1) on the entire country and (2) on the two subsets indicate that no significant bias is introduced: the absolute error is about 0.6 mm. A possible explanation is that the area of influence of the GWR is strictly limited to the neighborhood. The GWR is therefore "local": increasing the size of the spatial domain mainly affects computation times, while the difference in the results is negligible. The benefits of the proposed methodology can thus be considered to far outweigh its potential drawbacks.
Clearly, GWR is a powerful spatial downscaling technique, but other methods may fit certain situations better. Specifically, since information on rainfall, elevation and vegetation is available at all grid nodes (i.e., all raster nodes or pixels in the maps), we could have used Kriging with External Drift [49] or Regression-Kriging [50] instead. An example of Regression-Kriging of precipitation across China is provided by Teng et al. [51]. However, Kriging techniques are computationally demanding and often cannot be used [52]. In our case, computing predictions and errors with Kriging for entire countries would require large amounts of memory and processing time, and is not feasible. Furthermore, GWR presents appealing and unique features, such as the quantification of the impact of each independent variable of the model over space, in addition to feasible computation times. In other words, with the proposed methodology it is always possible to quantify the independent variable coefficients over space and to map the changes of their influence on the dependent variable.
In order to quantify the performance of the GWR, Figure 4 shows the residual of the GWR method, i.e., the difference between remotely sensed and explained precipitation across Colombia using the CHIRP dataset.
The residual of the model ranges from −160 to 150 mm, with an average value of 3.5 mm. The highest values are generally located where annual precipitation is higher and where the orography is more complex, e.g., the Andean range. The residual also shows a tiled texture that depends on the bandwidth and on the number of independent variables used in the regression. In fact, if the regression for a given pixel uses only the elevation, while EVI is also employed in the neighborhood, the residual associated with that pixel will "stand out" against the background. The Moran test excludes autocorrelation of the residual across the entire country, except for the upper-right zone (i.e., the square-shaped feature) and the slopes of the Andean range (i.e., the bluish and reddish areas in the central-left zone).
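As a minimal sketch of the statistic behind such a test, Moran's I can be computed on a gridded residual as follows. The binary weights between cells sharing an edge (rook contiguity) are an assumption, since the weighting scheme is not stated here, and the sketch returns only the index, without the significance test.

```python
def morans_i(grid):
    """Moran's I of a 2D grid (list of rows) with binary rook-contiguity weights."""
    n_rows, n_cols = len(grid), len(grid[0])
    n = n_rows * n_cols
    mean = sum(map(sum, grid)) / n
    z = [[v - mean for v in row] for row in grid]  # deviations from the mean

    cross = 0.0   # sum over i, j of w_ij * z_i * z_j
    n_links = 0   # sum of all weights w_ij
    for r in range(n_rows):
        for c in range(n_cols):
            # Visit the right and lower neighbours; count each pair in both directions
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < n_rows and cc < n_cols:
                    cross += 2.0 * z[r][c] * z[rr][cc]
                    n_links += 2

    denom = sum(v * v for row in z for v in row)
    return (n / n_links) * cross / denom
```

Values close to zero suggest no spatial autocorrelation, positive values indicate clustering of similar residuals, and negative values indicate alternating patterns.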

South America
Figure 5 shows, by way of example, the output of the downscaling scheme across South America for the CHIRP dataset. The number of available weather ground stations used for validation, shown with green dots, is 3353. The extreme values of the color bars at country level (i.e., Figure 6) differ from those at continental level (i.e., Figure 5) since the upper values depend on the statistical distribution (i.e., percentiles) of the rainfall. In the upper-right zone of Figure 6, a square-shaped region is noticeable, due to the same square-shaped feature in the coarse resolution CHIRP map. Also, note that the map of South America is cut at 50° S due to the spatial extent of the original CHIRP dataset.
Other than that, the map presents a smooth texture. Edges and high gradients depend on the original low-resolution dataset used in the downscaling procedure: if there is an abrupt change of precipitation, our method has to preserve this spatial pattern in the high resolution dataset.
Figure 7 shows the scatterplot of observed versus downscaled precipitation across the whole of South America. For each panel, the insets show the difference in annual precipitation for each satellite-derived product versus rain gauge measurements. A perfect match would collapse all points onto the 1:1 line. The color scale on each plot indicates the density of sample points: blue corresponds to a low density, whereas red indicates the highest density. Each panel in this figure also reports the root mean squared error (RMSE) and the absolute error (Abs. Err.) between satellite-derived and in situ records. CHIRP exhibits the best correlation, whereas CMORPH and PERSIANN provide the highest RMSE and Abs. Err. The RMSE ranges from 388 mm (CHIRP) to 685 mm (PERSIANN). TRMM yields less than outstanding results, falling between PERSIANN and CHIRP. GPCC shows good results, especially in terms of Abs. Err.
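A minimal sketch of the two indicators reported in each panel follows. It assumes that "Abs. Err." denotes the mean of the absolute differences between downscaled and gauge values, which is an interpretation rather than a definition given in this section. The sample values are invented for illustration only.

```python
import math

def rmse(estimated, observed):
    """Root mean squared error between paired estimates and observations."""
    n = len(observed)
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(estimated, observed)) / n)

def mean_abs_error(estimated, observed):
    """Mean absolute error between paired estimates and observations."""
    n = len(observed)
    return sum(abs(e - o) for e, o in zip(estimated, observed)) / n

# Hypothetical annual totals (mm) at three gauges, for illustration only
downscaled = [1100.0, 900.0, 1500.0]
gauges = [1000.0, 1000.0, 1200.0]
print(rmse(downscaled, gauges), mean_abs_error(downscaled, gauges))
```

Because the RMSE squares the differences, it is never smaller than the mean absolute error and is more sensitive to a few large mismatches.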
The Box-and-Whisker plot of Figure 8 summarizes the difference between downscaled and observed precipitation across South America for different satellite-derived products, showing the statistical distribution of the difference.
Results confirm our previous findings: CHIRP and GPCC exhibit the best correlation. Good results also come from TRMM and CMORPH, whereas PERSIANN displays the highest disagreement. CMORPH tends to underestimate, while PERSIANN tends to overestimate precipitation. Figure 9 provides a summary of the difference between downscaled and observed precipitation with respect to different elevation classes (i.e., the elevation of rain gauges).
Since most rain gauges are located below 1000 m asl, good results for that class (especially for CHIRP, CMORPH, TRMM and GPCC) lend confidence to the satellite-derived products. Rain gauges located between 1000 and 2000 m asl present a large spread. The "above 2000 m" class also presents notably poor statistics, with the exception of CHIRP and CMORPH. The interquartile range (i.e., the "width" of the box) of this elevation class indicates a general overestimation of RS-derived rainfall. It is also possible to observe that CHIRP achieves better results than GPCC for the "above 2000 m" and the "1000-2000 m" classes.
Figure 10 shows the difference between downscaled and observed precipitation with respect to different climate classes, using the Köppen climate classification [53]. Here as well, CHIRP and GPCC exhibit the best correlation, whereas PERSIANN exhibits the highest difference. CHIRP achieves slightly better results than GPCC for Tropical and Polar/Alpine climate classes, and vice versa for Dry climate. CMORPH shows notably good results for Dry climate, unremarkable results for Tropical and Temperate climates, and poor results for Polar/Alpine climate (its interquartile range does not contain the zero line). Both PERSIANN and TRMM present poor results and tend to underestimate rainfall. However, TRMM achieves better results than PERSIANN in terms of interquartile and extreme ranges, with the exception of the Temperate climate, and the zero line is always inside its interquartile range.
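The criterion used above, whether the zero line falls inside the interquartile range, can be sketched as follows. The sign convention (downscaled minus observed, so that positive values mean overestimation) and the quartile method are assumptions made for this illustration.

```python
import statistics

def box_summary(differences):
    """Quartiles of (downscaled minus observed) and whether zero lies inside the box."""
    q1, median, q3 = statistics.quantiles(differences, n=4, method="inclusive")
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,                  # the "width" of the box
        "zero_in_iqr": q1 <= 0.0 <= q3,  # True when the box straddles the zero line
    }
```

A box that does not contain zero indicates a systematic bias for that class: below zero for underestimation, above zero for overestimation, under the sign convention assumed here.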
The scatterplot analysis performed at continental scale has also been carried out at country scale: Figure 11 shows the scatterplot of observed versus downscaled precipitation for Venezuela. CHIRP exhibits the best correlation, whereas PERSIANN displays the highest disagreement. The RMSE ranges from 317 mm (CHIRP) to 568 mm (PERSIANN).
Table 2 summarizes the main statistical indicators. The numbers in parentheses in the first row give the number of rain gauge stations employed for each country. Brazil, Colombia and Venezuela have dense weather networks.
Generally, results pertaining to CHIRP and GPCC are consistent with the previous findings of Figure 7 and Figure 11. The highest agreement is reached by CHIRP and GPCC for all the test countries but Uruguay. GPCC outperforms CHIRP for Peru. CMORPH gets reasonably good results for Uruguay but notably poor results for the other countries. Across Venezuela, a country with a high number of rain gauges, CHIRP exhibits the highest correlation and CMORPH the lowest. Note that the worst validation results come from Colombia, which exhibits the worst set of statistical indicators and thus increases the RMSE and the absolute error at continental scale. This is probably due to: (1) the highest complexity, (2) the highest values (i.e., up to 10,000 mm of annual precipitation), and (3) the spatial variability of rainfall fields across Colombia, also visible from the Köppen-Geiger climate classification [53]. Conversely, Brazil, the country with the highest number of rain gauges, maintains reasonably good results, with an RMSE around 200 mm and the lowest MPE.
Figure 12 provides a "local scale" example of the performance of the RS-derived rainfall products across the Bogotá river basin, where the capital city of Colombia is located.
The Bogotá river basin was chosen because of: (1) its complex orography (the altitude ranges from 50 m to 3000 m asl), (2) the presence of around 50 rain gauges with precipitation records from 1975 onward, and (3) the high spatial variability of annual precipitation, ranging from 500 to 1300 mm. Figure 12 shows the absolute value of the difference between each satellite-derived product and observed rainfall. The latter was interpolated with the Ordinary Kriging technique. Annotations in the maps indicate the mean and the standard deviation of the absolute value of the difference. Isolated spots with a difference greater than 1500 mm can be observed, mainly located in the most elevated areas of the basin. These errors are probably due to the complex orography and the high spatial variability of rainfall. CHIRP exhibits the best performance, followed by CMORPH and GPCC. CMORPH confirms our previous findings, achieving good results for the "above 2000 m" elevation class. TRMM and PERSIANN show the highest divergence, with average errors up to 650 and 850 mm, respectively. Squared structures in the texture of the maps, mainly noticeable for CHIRP, TRMM and CMORPH, depend on the coarse resolution of the original rainfall products. Note that the interpolation technique used for the rain gauges (i.e., Ordinary Kriging) does not affect the results: the spatial patterns of the difference between satellite-derived and observed precipitation, and the corresponding statistics, do not depend on the interpolation method because of the high density of gauge stations within the basin.

West Africa
Figure 13 shows, by way of example, annual precipitation over West Africa using GPCC and TRMM, respectively. Clearly, the spatial patterns of MAP differ from each other, in the same way as the original low resolution datasets. In these figures, especially across the Sahel region, a tiled texture is noticeable. The small squared structures are areas where there is a statistical relationship between MAP and both DEM and EVI, surrounded by areas where there is a relationship only between MAP and DEM (see points 4 and 5 in Subsection 2.2). These irregularities in the texture of the high resolution maps are a drawback of the proposed methodology.
Whilst for South America we have several national weather services/networks, West Africa has one of the least developed weather networks [54]. The number of available weather ground stations, shown with green dots in Figure 13, is 344. Their spatial distribution is quite uneven: they are mainly concentrated across Senegal and are rare across desert regions and the Gulf of Guinea. For this reason, and in order to have robust statistics, the country-level validation was conducted only for the five countries with more than 10 rain gauge stations.
As for South America, all the high resolution rainfall products are evaluated against the in-house national meteorological rainfall products. Figure 14 shows the scatterplot of observed versus downscaled precipitation for the whole of West Africa. The best performance statistics refer to CHIRP, whereas GPCC, as for South America, presents comparable performance statistics. PERSIANN goes from poor performance across South America to good performance across West Africa. TRMM gets reasonably good results; conversely, RFE, TAMSAT and CMORPH have among the highest errors. RMSE values range from 135 mm to 386 mm, whereas the absolute error ranges from 85 mm to 238 mm, for GPCC and CMORPH, respectively. Interestingly, the density plots exhibit a trace of a bimodal distribution, noticeable in the reddish and greenish clusters around 500 and 1200 mm of annual precipitation.
Just as for South America, the Box-and-Whisker plot of Figure 15 summarizes the difference between downscaled and observed precipitation across West Africa for different RS products. Results support our previous findings. GPCC's median is closest to zero; however, CHIRP has an interquartile range and an extreme range narrower than those of GPCC. PERSIANN and TRMM show comparable performances. RFE and CMORPH present notably poor results, whereas TAMSAT displays the highest disagreement. Also, note that the interquartile range of the latter does not contain zero.
The elevation-dependent analysis has not been carried out because of: (1) the reduced size of the validation dataset, and (2) the distribution of the elevation. In fact, around 60% of the rain gauges across West Africa are located below 100 m asl.
Figure 16 provides a summary of the difference between downscaled and observed precipitation with respect to the two principal climate classes of the rain gauge network available across West Africa, namely Tropical and Dry climates. The Box-and-Whisker plot analysis shows that precipitation across Dry zones is reasonably well reproduced by all the satellite products but TAMSAT and TRMM, whereas across Tropical zones it is well reproduced only by GPCC, PERSIANN and TRMM. GPCC presents the best ensemble statistics, RFE and CMORPH show good results for Dry climate, and TAMSAT shows notably poor statistics. Again, for both Tropical and Dry climates, GPCC's median is closest to zero, even if CHIRP exhibits an interquartile range and an extreme range narrower than those of GPCC.
Just as for South America, Table 3 summarizes the main statistical indicators used for the evaluation of the rainfall products.
For Senegal, which includes around 50% of the rain gauge stations of West Africa, CHIRP is the best dataset, followed by PERSIANN and GPCC, while TAMSAT and CMORPH present the highest discrepancies. Also, CMORPH exhibits contrasting results in RMSE, Abs. Err. and MPE: the percentage error is fairly low, whereas both RMSE and Abs. Err. are not the lowest ones.
Generally, results are consistent with the previous findings of Figure 14, except for Mauritania, where PERSIANN outperforms both CHIRP and GPCC. Also, Table 3 suggests that CMORPH presents a dual behavior: on one hand it gets reasonably good results for Burkina Faso and Niger, on the other hand it gets notably poor results for the other countries.

Discussion
The scheme presented here must be considered as a first-order attempt towards rainfall downscaling across regions with sparse gauge data, and it consequently has a number of limitations that prevent it from reproducing precipitation patterns with complete fidelity.
Firstly, as pointed out in the Materials section, the temporal consistency among satellite-derived rainfall, EVI and rain gauge time series is a critical aspect. We always consider the longest time series of precipitation from both satellite and rain gauge (even if the corresponding timespans are not consistent with each other) in order to have a more reliable estimate of the MAP. On one hand, this assumption might oversimplify the real situation and generate errors. On the other hand, if we use a consistent time window (e.g., 2000-2012) between EVI and satellite-derived precipitation: (1) GPCC is excluded, since there is no temporal overlap with the EVI dataset, and (2) we lose the temporal consistency with the validation dataset. Note that the validation dataset is a climatology that mainly consists of 30-year precipitation records.
Secondly, the validation of the high resolution precipitation product with ground-based measurements (the core element of our analysis [55]) is fraught with difficulties. Even if ground observations are available, they are limited to a few points in space and time, and they often need thorough quality control. Therefore, neither high-resolution satellite-derived data nor rain gauge data can be considered as the "truth", and caution must be exercised when handling these datasets. This is particularly true in the case of West Africa, where in situ measurements are not extensive enough to properly validate the spatial patterns of MAP. In other words, the volume of validation data over West Africa is limited, especially in northern Mali, northern Niger, Liberia, and across the Gulf of Guinea. Therefore, low values of RMSE could be misleading.
Thirdly, statistical indicators used in the validation phase, such as RMSE, are scale-dependent and sensitive to the magnitude of the rainfall regime. This implies that the low RMSE across West Africa is also due to the fact that rainfall is generally lower than in South America. Conversely, across Colombia, the high values of RMSE are associated with the tropical climate and the displacement of the intertropical convergence zone, which brings wet weather throughout the year (up to 5000 mm each year).
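The scale dependence can be made explicit with a short worked equation. The notation is introduced here for illustration only: P_i and O_i denote the downscaled and observed annual precipitation at gauge i, n the number of gauges, and c > 0 a factor by which the whole rainfall regime is scaled.

```latex
\mathrm{RMSE}(P, O) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^2},
\qquad
\mathrm{RMSE}(cP, cO) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} c^2\left(P_i - O_i\right)^2}
= c\,\mathrm{RMSE}(P, O).
```

Two regions with the same relative agreement therefore show RMSE values in proportion to their rainfall totals, whereas a normalized quantity such as RMSE divided by the mean observed precipitation is unchanged by the factor c.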
These limitations do not allow us to directly select the "most suitable" rainfall dataset, and they call for particular attention and supplementary investigations.
That being said, our downscaling scheme has the capability to capture the complex and spatially varying relationship between precipitation, elevation and vegetation index, and as such it can serve to quantify MAP across regions with a sparse gauge network at a spatial resolution adequate for advanced hydrological analysis. Moreover, the statistics presented in the Results section give an indication of the performance of the downscaled rainfall product. Country-level, elevation-class and climate-class statistics complement the accuracy assessment of the downscaled data. These analyses, in spite of the scarce rain gauge network across South America and West Africa, provide a general indication to users of which dataset is better to use across South America or West Africa.
Also, note that the aim of this study is to provide high resolution precipitation maps, which are useful to increase the effectiveness of water resource management, rather than to propose a new methodology. The originality of this study lies in the high resolution dataset at continental scale, not in the methodology. However, the proposed methodology offers some elements of novelty compared to the models that have inspired it (i.e., Chen et al. [15]). Firstly, the proposed methodology makes it possible to select the best regression, with the option to exclude EVI from the regression. Secondly, we have a "global point of view" (i.e., we run our downscaling scheme over two continents), while Chen et al. [15] only address local scale needs. Thirdly, we use the EVI instead of the NDVI, and this accurate index is more advantageous to the regression. In fact, EVI is able to capture the relationship between precipitation and vegetation greenness while being influenced to a lesser degree by soil or atmospheric effects.
Applications of these datasets are manifold; one example is the long-term average water balance. Maps of annual precipitation, along with evapotranspiration data, could be useful to quantify the Budyko curve for mean annual water balance [56] at continental level.

Conclusions
Although precise precipitation information can be derived from ground "truth", that is, rain gauge measurements, few rain gauges are generally available in developing countries: quantifying the spatial distribution of rainfall presents a considerable challenge for these regions, which often have sparse gauge data and a highly variable climate. In principle, satellites are accurate and versatile instruments for evaluating precipitation, but their coarse spatial resolution (often in the region of 20 km) does not allow direct use without appropriate downscaling.
Robust assessments of precipitation distribution at high spatial resolution for these regions could provide essential information for hydrological modeling, water resource planning and management, and ecosystem services, amongst others.
This research outlines a simple and portable approach to downscale Mean Annual Precipitation (MAP) to 1 km spatial resolution from different satellite-based products, following the scheme proposed by Chen et al. [15]. The results are specific to South America and West Africa, but the method is designed to operate worldwide.
The proposed methodology includes: (1) the Geographically Weighted Regression (GWR) among satellite-derived annual precipitation, elevation (DEM) and a Vegetation Index (EVI) at coarse spatial resolution, (2) the interpolation at 1 km resolution of the residual of the model (i.e., the amount of satellite-derived precipitation that could not be explained by the model), and (3) the application of the GWR to the dataset of DEM and EVI at the resolution of 1 km, adding up the high-resolution residual. A pixel-wise flagging strategy documents each step of the procedure, providing us with the statistics of the regression and information about whether both EVI and DEM or only DEM are employed in the regression.
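The three-step structure can be sketched in a few lines, assuming NumPy. To keep the example short, a single global least-squares fit stands in for the GWR, and the residual is interpolated bilinearly; both are simplifying assumptions, as are all function names, and the sketch omits the flagging strategy.

```python
import numpy as np

def fit_global(map_cr, dem_cr, evi_cr):
    """Step 1 (simplified): least-squares fit MAP ~ b0 + b1*DEM + b2*EVI at coarse resolution."""
    A = np.column_stack([np.ones(map_cr.size), dem_cr.ravel(), evi_cr.ravel()])
    beta, *_ = np.linalg.lstsq(A, map_cr.ravel(), rcond=None)
    return beta

def predict(beta, dem, evi):
    """Precipitation explained by the regression for given DEM and EVI grids."""
    return beta[0] + beta[1] * dem + beta[2] * evi

def upsample_bilinear(grid, factor):
    """Step 2: bilinear interpolation of cell-centre values onto a grid `factor` times finer."""
    n_rows, n_cols = grid.shape
    # Fine-cell centres expressed in coarse-cell index units (edges are clamped by np.interp)
    rows = (np.arange(n_rows * factor) + 0.5) / factor - 0.5
    cols = (np.arange(n_cols * factor) + 0.5) / factor - 0.5
    along_cols = np.array([np.interp(cols, np.arange(n_cols), r) for r in grid])
    return np.array([np.interp(rows, np.arange(n_rows), c) for c in along_cols.T]).T

def downscale(map_cr, dem_cr, evi_cr, dem_hr, evi_hr, factor):
    """Coarse-resolution fit, residual interpolation, high-resolution prediction plus residual."""
    beta = fit_global(map_cr, dem_cr, evi_cr)
    residual_cr = map_cr - predict(beta, dem_cr, evi_cr)   # unexplained precipitation
    residual_hr = upsample_bilinear(residual_cr, factor)
    return predict(beta, dem_hr, evi_hr) + residual_hr     # step 3
```

In the scheme described above the regression coefficients vary from cell to cell, so they would also have to be brought to the fine grid before step 3; the global fit used here avoids that step at the cost of ignoring spatial variation.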
Validation indicates that high resolution MAP from CHIRP and GPCC presents the best ensemble of performance statistics for South America. CHIRP exhibits the best correlation for the elevation class above 1000 m. Also, CHIRP achieves better results than GPCC across Temperate, Tropical and Polar/Alpine climates, and vice versa across Dry climate. GPCC and CHIRP also outperform the other products across West Africa, where CHIRP achieves better results than GPCC across Dry climate zones, and vice versa across Tropical climate.
The high resolution annual precipitation across South America using the CHIRP dataset presents an RMSE of 388 mm and an absolute error of 211 mm when compared to ground stations. For West Africa, the RMSE between high resolution annual precipitation from CHIRP and ground measurements is equal to 126 mm, whereas the absolute error is equal to 85 mm.
High resolution datasets of precipitation across South America and West Africa are crucial. Applications of this analysis are manifold: to name only a few, MAP can be used to: (1) develop scenarios for land and river management, (2) manage the "water-energy-food" nexus [57], and (3) develop frequency analysis of rainfall extremes using jointly MAP values and L-moments [58].
The major advantage of the approach presented lies in its numerical simplicity. However, it should be noted that numerous assumptions lie beneath this methodology, and caution must be exercised when handling the validation results because of the limited volume of ground measurements.
High resolution datasets are openly available and are provided as "Supplementary Files and Materials" alongside this article. High resolution datasets are also available in AquaKnow [59], a platform managed by the Joint Research Centre of the European Commission, dedicated to technical and scientific knowledge for the sustainable development of the water sector.
The next steps of this work include: (1) the implementation of a downscaling scheme at monthly time step, exploiting better the vegetation response to precipitation; and (2) the computation of the Budyko curve for mean annual and monthly water balance.

Figure 1. Flowchart of the scheme used to downscale different satellite-based precipitation datasets. CR on the left bar stands for Coarse Resolution.

Figure 2. Example of global regression of annual precipitation for Colombia, where we assume that the relationship among Mean Annual Precipitation (MAP), Digital Elevation Model (DEM) and Enhanced Vegetation Index (EVI) is spatially homogeneous. (a) represents the MAP from the Tropical Rainfall Measuring Mission (TRMM), (b) represents the MAP explained by a global model of DEM and EVI, (c) shows the residual of the model. Note that (a) and (c) exhibit the same spatial patterns.

Figure 3. Geographically Weighted Regression scheme. Geographical weighting is achieved by a fixed Gaussian kernel function with a given bandwidth moving across the spatial domain. Bandwidth determines the rate at which the weights decay around each cell, and reflects the degree of spatial variation: if the bandwidth gets larger, the model will tend to a global regression model.

Figure 4. Residual of the Geographically Weighted Regression (GWR) method, i.e., the difference between remotely sensed and explained precipitation across Colombia with the CHIRP dataset.

Figure 5. Mean Annual Precipitation across South America using the CHIRP dataset. Green points represent the rain gauge stations against which the high resolution MAP product has been validated.

Figure 6 shows the output of the downscaling scheme for the CHIRP dataset across Colombia.

Figure 6. Mean Annual Precipitation across Colombia using the CHIRP dataset.

Figure 7. Scatterplot of rain gauge versus satellite-derived (RS-derived) precipitation for the entire South America. Reddish (bluish) areas in the plots indicate a higher (lower) density of sample points.

Figure 8. Box-and-Whisker plot of the difference between downscaled and observed precipitation across the entire South America for different RS products.

Figure 9. Box-and-Whisker plot of the difference between downscaled and observed precipitation for different elevation classes across the entire South America.

Figure 10. Box-and-Whisker plot of the difference between downscaled and observed precipitation for different climate classes across the entire South America.

Figure 11. Scatterplot of rain gauge vs. satellite-derived precipitation for Venezuela.

Figure 12. Example of the difference between RS-derived and observed rainfall for the Bogotá river basin in Colombia. Maps represent the absolute value of the difference between RS-derived product and observed rainfall, whereas the text annotations indicate average (mean) and standard deviation (std) values of the latter.

Figure 13. MAP of West Africa at 1 km using the GPCC dataset. Green points represent the rain gauge stations against which the high resolution MAP product has been validated.

Figure 14. Scatterplot of rain gauge versus satellite-derived precipitation for West Africa.

Figure 15. Box-and-Whisker plot of the difference between downscaled and observed precipitation across West Africa for different RS products.

Figure 16. Box-and-Whisker plot of the difference between downscaled and observed precipitation across West Africa for different climate classes.

Table 1. Overview of widely known precipitation products from satellite estimates (Sat.) and ground rain gauge observations (Gr.).

Table 2. Statistical indicators of satellite-derived datasets versus rain gauge stations across nine validation countries. Numbers in parentheses in the first row refer to the number of rain gauge stations employed for each country.

Table 3. Statistical indicators of satellite-derived datasets versus rain gauge stations across nine validation countries. Numbers in parentheses in the first row refer to the number of rain gauge stations employed for each country.