Performance of TRMM TMPA 3B42 V7 in Replicating Daily Rainfall and Regional Rainfall Regimes in the Amazon Basin (1998–2013)

Knowledge of precipitation in the Amazon Basin (AB) is essential for environmental concerns such as hydrology and ecology, as well as for social concerns such as agriculture, food security, and health. The availability of rainfall data at high spatio-temporal resolution is thus crucial. Remote sensing techniques provide more extensive spatial coverage than ground-based rainfall measurements, but the quality of the estimates must be assessed. Previous studies, conducted at regional scale in the AB and for selected years, underline the ability of the Tropical Rainfall Measuring Mission (TRMM) 3B42 Version 7 (V7) daily product (hereafter 3B42) to provide a good view of rainfall time variability, which is important for understanding the impacts of the El Niño Southern Oscillation. Our study aims to extend knowledge of the quality of this product to the entire AB and to assess its capacity to reproduce the annual rainfall regimes. For that purpose, we compared 3B42 against 205 quality-controlled rain gauge records for the period from March 1998 to July 2013, in order to determine whether 3B42 is reliable for climate studies. Analysis of quantitative (Bias, Relative RMSE) and categorical (POD, FAR) statistics for the whole period shows a more accurate spatial distribution of mean daily rainfall estimates in the lowlands than in the Andean regions. In the latter, the location of a rain gauge and its exposure seem to explain mismatches with 3B42 better than its elevation does. In general, good agreement is observed between rain gauge derived regimes and those from 3B42; however, performance is better in the rainy period. Finally, an original way to validate the estimates is to take into account the interannual variability of rainfall regimes (i.e., the presence of sub-regimes): four sub-regimes in the northeast AB defined from rain gauges and 3B42 were found to be in good agreement.
Furthermore, this work examined whether TRMM 3B42 V7 rainfall estimates for all the grid points in the AB, outgoing longwave radiation (OLR) and water vapor flux patterns are consistent in the northeast of AB.


Study Area
The AB is located in the intertropical zone, between 6°N and 20.5°S and between 48.5°W and 80.5°W (Figures 1 and 2). The AB presents strong topographic contrasts between the highlands of the Andean mountains in its western and southwestern parts and the lowlands in its central, northern, and eastern parts. The South American Monsoon System (SAMS) [23][24][25][26] is the most important climatic feature affecting the region. During the rainy season, typically spanning from October to March, the South American continent warms up, which drives the advection of humidity by the low level jet (LLJ; Figure 1) from the northern tropical Atlantic to the AB and as far as the La Plata basin. This increase in humidity in the La Plata basin helps support the South Atlantic Convergence Zone (SACZ) and its associated heavy rainfall. Conversely, from April to September the SAMS moves into its dry phase in most of the AB. The large latitudinal extent and the variety of topographical characteristics give rise to a number of rainfall regimes in the AB ([27][28][29], among others): an equatorial regime in the west and along the Negro Basin (Figure 1), and regimes with a dry and a rainy season in the tropical regions of the basin. Over the Atlantic Ocean, the convergence of northern and southern trade winds forms the Intertropical Convergence Zone (ITCZ; Figure 1), characterized by strong convection. The equatorial and tropical Atlantic also warm up at the end of the austral summer, leading to the southward migration of the ITCZ and the triggering of the rainy season in the northeast AB.

Observed Precipitations: Rain Gauge Data
The network of observed daily data used as a reference to validate the 3B42 estimates consists of 205 rain gauges (geographically distributed as shown in Figure 2) that have less than 20% missing values and have been quality controlled [30,31]. However, the lack of rain gauge metadata prevents us from documenting the type of device, and hence its accuracy, which would be of great interest when comparing observed and estimated data.

The study covers the period from March 1998 to July 2013. These data were obtained from the National Water Agency (ANA) and the National Meteorological Institute (INMET) in Brazil, the National Meteorological and Hydrological Institute (INAMHI) in Ecuador, the Hydrological Meteorological and Environmental Studies Institute (IDEAM) in Colombia, and the National Hydrological and Meteorological Service (SENAMHI) in Peru and Bolivia. Unfortunately, no such data could be collected in Venezuela.

Estimated Precipitations: TRMM TMPA 3B42 Version 7 Daily Product
The 3B42 daily product is computed using the TRMM Multisatellite Precipitation Analysis (TMPA) algorithm, developed by the NASA Goddard Space Flight Center [32]. It is a gridded precipitation product with a spatial resolution of 0.25° × 0.25° (approximately 25 × 25 km near the equator), available between 50°N and 50°S. This product is gauge-adjusted, but independence between estimated and observed data is assumed because the gauge adjustment in 3B42 relies on monthly estimates from the Global Precipitation Climatology Centre product, which includes only a few gauges in the AB. The TMPA algorithm combines independent precipitation estimates from multiple spaceborne instruments: the Advanced Microwave Scanning Radiometer for Earth Observing Systems (AMSR-E), Special Sensor Microwave Imager (SSMI), Special Sensor Microwave Imager/Sounder (SSMIS), Advanced Microwave Sounding Unit (AMSU), and Microwave Humidity Sounder (MHS), each calibrated to the TRMM Combined Instrument (TCI). Coverage gaps in space and time are filled with merged, microwave-adjusted geo-infrared (IR) data.

Outgoing Longwave Radiation and Water Vapor Flux
The satellite measurements of OLR were obtained from the National Center for Atmospheric Research of the National Oceanic and Atmospheric Administration (NCAR/NOAA) [33], and the water vapor flux reanalysis [34] is from the National Centers for Environmental Prediction (NCEP). The water vapor flux is computed from the specific humidity and the horizontal wind (zonal component u) from the ground to 300 hPa [35] as:

$$\mathbf{Q} = \frac{1}{g}\int_{300\,\mathrm{hPa}}^{p_s} q\,\mathbf{V}\,dp$$

where $\mathbf{Q}$ is the vertically integrated water vapor flux, $q$ is the specific humidity, $\mathbf{V}$ is the horizontal wind vector, $p$ is the pressure, $p_s$ is the pressure at the ground, and $g$ is the acceleration due to gravity.
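The vertical integration described above can be sketched numerically as follows. This is a minimal illustration, not the processing chain used in the study: it assumes pressure levels given in hPa and ordered from the ground upward, a trapezoidal rule between levels, and a hypothetical function name `integrated_vapor_flux`.

```python
G = 9.81  # acceleration due to gravity (m s^-2)


def integrated_vapor_flux(p_hpa, q, u, v, g=G):
    """Trapezoidal estimate of (1/g) * integral of q*V dp over the given levels.

    p_hpa : pressure levels in hPa, ordered from the ground upward (decreasing)
    q     : specific humidity at each level (kg/kg)
    u, v  : zonal and meridional wind at each level (m/s)
    Returns the (zonal, meridional) flux components in kg m^-1 s^-1.
    """
    flux_u = 0.0
    flux_v = 0.0
    for k in range(len(p_hpa) - 1):
        dp = (p_hpa[k] - p_hpa[k + 1]) * 100.0  # layer thickness in Pa
        flux_u += 0.5 * (q[k] * u[k] + q[k + 1] * u[k + 1]) * dp
        flux_v += 0.5 * (q[k] * v[k] + q[k + 1] * v[k + 1]) * dp
    return flux_u / g, flux_v / g


# Illustrative (made-up) profile from 1000 hPa up to 300 hPa.
levels = [1000.0, 850.0, 700.0, 500.0, 300.0]
humidity = [0.016, 0.012, 0.007, 0.003, 0.0005]
zonal = [-5.0, -8.0, -6.0, -3.0, 2.0]
meridional = [-2.0, -4.0, -3.0, -1.0, 0.0]
qu, qv = integrated_vapor_flux(levels, humidity, zonal, meridional)
```

Negative components in this made-up profile correspond to westward and southward transport, the direction of the low level jet inflow described in the Study Area section.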

Daily Rainfall Values
Because the sparse rain gauge network cannot be reliably extrapolated to a grid, the comparison between observed and estimated data is based on a point-to-pixel approach, using the pixel that contains the rain gauge (hereafter the 3B42 pixel). This approach has already been used in the AB in previous studies and gave accurate results [21,36]. Furthermore, Demirtas et al. showed that quantitative precipitation forecast verification with either a grid-to-grid or a grid-to-point approach gives similar results.
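As an illustration of the point-to-pixel matching, the sketch below locates the 0.25° cell that contains a gauge. The grid origin (cell edges aligned on 50°S and 180°W) and the function names are assumptions made for this example; the actual 3B42 file layout should be checked against the product documentation.

```python
import math

RES = 0.25          # grid resolution in degrees (from the text)
LAT_MIN = -50.0     # assumed southern edge of the grid
LAT_MAX = 50.0      # assumed northern edge of the grid
LON_MIN = -180.0    # assumed western edge of the grid


def pixel_index(lat, lon):
    """Return (row, col) of the cell containing the point (lat, lon)."""
    if not (LAT_MIN <= lat < LAT_MAX):
        raise ValueError("latitude outside the assumed 50S-50N coverage")
    row = int(math.floor((lat - LAT_MIN) / RES))
    col = int(math.floor((lon - LON_MIN) / RES))
    return row, col


def pixel_center(row, col):
    """Return the (lat, lon) of the center of cell (row, col)."""
    return LAT_MIN + (row + 0.5) * RES, LON_MIN + (col + 0.5) * RES


# Hypothetical gauge at 3.1 S, 60.0 W.
row, col = pixel_index(-3.1, -60.0)
center = pixel_center(row, col)
```

The gauge series is then compared with the time series of that single cell, without any interpolation of either dataset.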
The quantitative statistics used in this work are the mean daily rainfall, the Bias (Equation (1)), and the relative root-mean-square error (relative RMSE, Equation (2)), while the categorical statistics are the probability of detection (POD, Equation (3)) and the False Alarm Ratio (FAR, Equation (4)). In these equations, Pe,i is the estimate and Po,i the observation for the i-th day; N is the total number of days; P̄o is the mean observed precipitation over the whole period; H (hit) is a precipitation event observed by the rain gauge and also detected by 3B42; M (miss) is a precipitation event observed by the rain gauge but not detected by 3B42; and F (false alarm) is a precipitation event detected by 3B42 but not observed by the rain gauge. The bias indicates the overestimation or underestimation by 3B42, in percent. The relative RMSE gives an average of the error of 3B42, in millimeters (mm). The POD measures the fraction of observed rainfall events correctly detected by the estimated product, with values ranging between 0 and 1 (a perfect score). The FAR measures the fraction of events detected by 3B42 that were not observed, with values ranging between 0 and 1 (the worst score). The threshold for the occurrence of a rain event is set here at 0.1 mm to account for the difference in measurement resolution between the two datasets: the rain gauge data have a resolution of one decimal place, while 3B42 has several. Taking this into account is necessary to avoid biasing the FAR.
In order to assess how well the estimated product reproduces the diversity of rainfall regimes within the AB, the performance of 3B42 is assessed using the standard statistical methods mentioned above, at regional scale and for each month of the year. Results of the comparisons are examined, whenever relevant, in the context of the geographical characteristics of the stations (elevation, windward or leeward position).
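The four statistics can be illustrated with the sketch below. It uses the conventional forms of these scores, which is an assumption rather than a transcription of Equations (1)–(4): Bias = 100 · Σ(Pe,i − Po,i) / ΣPo,i, RMSE = sqrt(Σ(Pe,i − Po,i)² / N), POD = H / (H + M), and FAR = F / (H + F). Both the RMSE in mm and its ratio to the mean observed rainfall are returned, and a day is counted as rainy when the value is at least 0.1 mm (the choice of "at least" rather than "strictly above" is also an assumption).

```python
import math


def validation_stats(obs, est, threshold=0.1):
    """Bias (%), RMSE (mm), RMSE / mean(obs), POD and FAR for paired daily series.

    Days where either value is None (missing) are skipped.
    """
    pairs = [(o, e) for o, e in zip(obs, est) if o is not None and e is not None]
    n = len(pairs)
    mean_obs = sum(o for o, _ in pairs) / n
    bias = 100.0 * sum(e - o for o, e in pairs) / (n * mean_obs)
    rmse = math.sqrt(sum((e - o) ** 2 for o, e in pairs) / n)

    hits = sum(1 for o, e in pairs if o >= threshold and e >= threshold)
    misses = sum(1 for o, e in pairs if o >= threshold and e < threshold)
    false_alarms = sum(1 for o, e in pairs if o < threshold and e >= threshold)

    # POD and FAR are undefined when their denominators are zero.
    pod = hits / (hits + misses) if hits + misses else None
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else None
    return {
        "bias_pct": bias,
        "rmse_mm": rmse,
        "relative_rmse": rmse / mean_obs,
        "pod": pod,
        "far": far,
    }


# Made-up five-day example (mm/day): gauge versus satellite pixel.
gauge = [0.0, 5.0, 10.0, 0.0, 2.0]
pixel = [1.0, 6.0, 12.0, 0.0, 0.0]
stats = validation_stats(gauge, pixel)
```

In this made-up example the pixel has two hits, one miss, and one false alarm, so POD is 2/3 and FAR is 1/3.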

Rainfall Regimes
In addition to the time-averaged, global 3B42 quality assessment, observed and estimated regional mean annual rainfall and regimes have been computed for five regions within the AB (Figure 2), defined using a spectral clustering method [37,38]. The main principle of spectral clustering is to represent all rain gauges as separate nodes of a connected graph whose edges express the similarity between two nodes. The spectral analysis of this graph makes it possible to isolate its main consistent groups. To compute the connection between two nodes in this graph, the basic solution consists in computing a simple Euclidean distance. However, in order to estimate clusters separated in a non-linear way, we exploit the kernel trick. The idea consists in projecting the data from the usual space (multivariate vectors in which each component is a precipitation value) into another space where the separation between clusters is linear. Under some specific properties (see Camps-Valls and Bruzzone [38] for a complete theory of kernels), this projection can be done simply by changing the way the connection between nodes is computed. In practice, this is done using a Gaussian kernel, where the connection between two rain gauges $x_1$ and $x_2$ is:

$$k(x_1, x_2) = \exp\left(-\frac{\lVert x_1 - x_2 \rVert^2}{2\sigma^2}\right)$$

where $\sigma$ is a parameter to be fixed. It has been shown that this kernel efficiently separates highly non-linear clusters.
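The graph connections used by the spectral clustering can be illustrated by building the Gaussian affinity matrix between gauges. The sketch below assumes the common form exp(−‖x1 − x2‖² / (2σ²)) and a hypothetical function name; it does not include the spectral decomposition itself, and the value of σ is arbitrary.

```python
import math


def gaussian_affinity(series, sigma):
    """Return the matrix W with W[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    series : list of equal-length rainfall vectors, one per rain gauge
    sigma  : kernel width (must be chosen by the user)
    """
    n = len(series)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(series[i], series[j]))
            w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


# Made-up monthly totals (mm) for three gauges: two similar, one different.
gauges = [
    [300.0, 280.0, 120.0, 40.0],
    [310.0, 270.0, 130.0, 50.0],
    [150.0, 160.0, 170.0, 165.0],
]
W = gaussian_affinity(gauges, sigma=50.0)
```

Gauges with similar rainfall vectors get a connection close to 1, while dissimilar gauges get a connection close to 0; σ controls how fast the connection decays with distance.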
The determination of the optimal number of clusters is an open problem for which no generally accepted solution currently exists. In this study, we rely on the ratio of inter- to intra-cluster inertia. More precisely, a reliable clustering should reveal both homogeneity inside clusters (all stations of the same group are similar) and heterogeneity between averaged clusters (all clusters represent different groups). Therefore, the ratio between the inertia among (averaged) clusters and the internal inertia (the sum of the inertia inside all groups) should be maximal. Here, this maximum is reached with 12 clusters. Afterward, small clusters (fewer than five rain gauges) are merged with the nearest cluster (taking into account the spatial and interclass variance). The resulting regionalization in Figure 2 is consistent with former studies, showing for example the separation between tropical and equatorial regions and between highlands and lowlands [27,29,39,40]. Finally, these regions, which group rain gauges with similar rainfall regimes, were named according to their relative position within the AB (northeast, south, north, center, and west), as shown in Figure 2.
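The inertia criterion can be sketched as follows. The exact definition used in the study is not given in the text, so this example assumes the usual one: the between-cluster inertia is the size-weighted squared distance of each cluster centroid to the global centroid, and the within-cluster inertia is the sum of squared distances of each point to its own cluster centroid. The function name is hypothetical.

```python
def inertia_ratio(points, labels):
    """Ratio of between-cluster to within-cluster inertia for a labelling."""
    dim = len(points[0])

    def centroid(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Group the points by cluster label.
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)

    global_centroid = centroid(points)
    between = sum(
        len(rows) * sqdist(centroid(rows), global_centroid)
        for rows in groups.values()
    )
    within = sum(
        sqdist(point, centroid(rows))
        for rows in groups.values()
        for point in rows
    )
    return float("inf") if within == 0 else between / within


# Made-up one-dimensional example: two well-separated pairs of stations.
stations = [[0.0], [2.0], [10.0], [12.0]]
good = inertia_ratio(stations, [0, 0, 1, 1])
bad = inertia_ratio(stations, [0, 1, 0, 1])
```

A labelling that groups similar stations together yields a much larger ratio than one that mixes them, which is why the number of clusters maximizing this ratio is retained.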
Then, sub-regimes for each region, also based on the spectral clustering approach, were defined in order to assess the interannual variability of regional rainfall patterns. We obtained, for each region, several groups of years (clusters) presenting the same type of rainfall sub-regime. Next, based on these clusters, rainfall sub-regimes were computed using 3B42. Note that different clusters do not necessarily include the same number of years. Finally, water vapor flux and outgoing longwave radiation (OLR) anomalies were compared to precipitation patterns derived from all 3B42 grid points in order (1) to confirm the precipitation patterns and (2) to help explain the observed anomalies. Indeed, OLR is commonly used as a proxy for convection in the tropics, and water vapor fluxes play an important role in precipitation variability in the AB.

Comparison between Points and Pixels at Annual and Monthly Time Scales
At large scale, for the whole AB, 3B42 reproduces well the spatial distribution of mean daily rainfall (Figure 3a,b), with the highest rates of rainfall near the equator while rainfall decreases southward and northward at tropical latitudes. The lowest values are in the Andes, as shown previously. However, 3B42 tends to overestimate rainfall in the lowlands. In the mountains the results vary from one region to another. Indeed, differences can be observed between the northern, central and southern parts of the Andes. In the central part of the Andes, the underestimation is stronger than in the northern part, and rainfall is overestimated in the southern part of the Andes.
The other indicators also depict these differences between lowlands and highlands and within the highlands. The most relevant is the bias (Figure 4a). In the Andes, the bias is positive (up to +161%) at most stations of Ecuador (northern Andes), which is consistent with Zubieta et al. [22] and Zulkafli et al. [12], and of Bolivia (southern Andes), while it is generally negative (down to −45.5%) in Peru (central Andes). This contrast between northern and central Andes was also observed by Zulkafli et al. [12], while the heterogeneity of the bias was found by Salio et al. [13] for the Brazilian and Bolivian Amazon. Two main hypotheses can be proposed to explain these biases. The first is the location and the leeward or windward exposure of the rain gauges in each region; this point is discussed later. However, as it is unlikely that all the stations are located leeward or windward, another possibility is that the type of rain gauge, and hence its calibration, differs between countries and/or over the time series. Indeed, tipping bucket rain gauges can be less precise during light rainfall because of the time needed to fill the bucket, make it tip, and register the amount of precipitation. Furthermore, high evaporation can also reduce the accuracy of the rain gauge by preventing proper tipping of the bucket [41,42] when the precipitation and evaporation rates are of the same order of magnitude. Unfortunately, we do not have such information about the rain gauge stations.
Remote Sens. 2018, 10, 1879
The RMSE (Figure 4b) at most stations was found to be smaller than 3 mm/day, but at some Andean stations and in the southern tropics it reached values as high as 7 mm/day.
Better results are also observed in the lowlands with the categorical statistics. Indeed, the POD values indicate good detection (POD > 0.7) of rainy events by the estimated product along a diagonal from the northwestern to the southeastern Amazon, i.e., along the mean position of the South Atlantic Convergence Zone, which produces heavy rainfall in summer (Figure 5a). Rainy events are correctly detected in the central and southern Andes. The worst values are observed in the northern part of the Andes, in the Bolivian lowlands, and in the northeast AB.
Furthermore, as rainfall changes during the year, we computed the Bias, the relative RMSE, the POD, and the FAR (Figure 6) at monthly time scale. The results show a contrast between the rainy and the dry seasons, with a higher occurrence of low POD (down to 0) and high FAR (up to 1), and a higher bias (up to 300%) and RMSE (up to 24 mm) during the driest months. However, since there is a large range of rainfall regimes in the AB, these differences in time could represent only a part of the intra-annual variability; a further spatial analysis is therefore needed to determine whether that variability is regionally dependent.

Regional and Time Analysis of TRMM 3B42 V7 Performance
At large scale, for the whole AB, 3B42 reproduces the spatial distribution of mean daily rainfall well (Figure 3a,b), with the highest rainfall rates near the equator and decreasing rainfall toward the tropics. Since the performance of 3B42 varies across the AB, the statistics are computed at regional and annual time scales. Moreover, as in the previous section, they are also computed month by month to examine whether there are any seasonal differences. The regions resulting from the spectral clustering analysis (Figure 2) may be divided into two groups: one where rainy and dry seasons alternate (Figure 7, northeast, south, and north regions), and the other without a pronounced dry or rainy season (Figure 7, center and west regions).

Annual Rainfall in the Regions
In addition to the time-averaged, global 3B42 quality assessment, observed and estimated regional mean annual rainfall and regimes have been computed for five regions within the AB. For all regions, the mean annual precipitation was overestimated by 3B42 (Table 1). However, this overestimation is weak in the north and center regions, while the south and west regions have the worst results, with an average overestimation of 6%.
Table 1. Comparison between rain gauge and 3B42 mean annual rainfall (in mm), in each region of the AB. Differences are computed with respect to rain gauge mean.

Regional Rainfall Regimes
Figure 7 shows that 3B42 closely reproduces the mean annual rainfall regimes of all regions (despite the overestimation of the mean total), with an accurate distinction between rainy and dry seasons. The overestimation depends on the month and also varies regionally: in the south, north, and northeast regions, rainfall is overestimated during the rainiest months, whereas in the center and west regions, the strongest overestimation is observed from March to July.
Figure 7. [...] Figure 2 for the AB. The x-axis represents each month of the year, and the y-axis represents the rainfall in millimeters per month. Blue bars represent the observed rainfall and red ones the estimated rainfall.
Furthermore, the statistical parameters show, as for the whole Amazon (Figure 6), a contrast between rainy and dry seasons (Figure 8), with a higher occurrence of low POD and high FAR, and a higher bias and RMSE during the driest months. However, that contrast is spatially dependent: during their respective driest months, the worst biases are close to +5% for the north, below 10% for the center and the northeast, and between +20 and +25% for the south and west regions. In the latter region, a higher bias during the dry season is consistent with the observation of Zubieta et al. [22] in Ecuador (up to +25%) from 2003 to 2009. However, in each region the bias takes a large range of values during the driest months. By contrast, during the rainiest months, the bias has a narrower range of values and the results are better in all regions. These poorer and more variable results during the regional driest period, as well as the contrast with the rainiest period (which has better results), are observed for all the statistical parameters used here (Figure 8).
Once again, the type of rain gauge used could, in part, explain the errors and differences between the regions. Indeed, during the dry season, in the case of tipping bucket rain gauges, if the rainfall is heavy and the bucket takes time to fill and tilt, part of the rainfall can evaporate [41,42] and not be recorded, while 3B42 registers this rainfall amount. Unfortunately, because of a lack of information, we were not able to take this into account and understand how it may influence the results. However, the type of cloud organization in winter and the difference in spatial resolution between the two datasets [13,16] also appear to explain part of the results. During the dry season, convection is local and sparse, and rainy events are scarcer than during the rainiest months, especially in the southern Amazon (south of 6°S) and in the eastern equatorial part of the Amazon [43]. A plausible explanation would then be that, because of its spatial resolution, 3B42 is able to detect a rainy event that the rain gauge cannot, but conversely 3B42 may not detect an event if the convection is too shallow. Additionally, because of the low number of rainy events during the driest months, errors are relatively higher than in other seasons. Conversely, the good results in the central region may be due to the low seasonal rainfall contrast and deep convection. Future work will further investigate this aspect using a cloud cover product, as did Durieux et al. [44,45], who examined 10 years of three-hourly infrared data from the International Satellite Cloud Climatology Project.
In mountainous regions (the western part of the south region and the west region), the accuracy of the estimates tends to decrease with elevation [12,16]. To investigate this issue in more detail, the correlation between each statistical parameter and elevation was examined.
The results show that only POD is slightly but significantly correlated to the elevation (Figure 9c). A possible interpretation would be that at high elevation rainy events are rarer but convective systems are well organized during the summer, which allows a good detection of the events by the satellites and POD scores higher than 0.4. The other parameters do not show a significant relationship between the elevation and the quality of 3B42 estimates, in agreement with Turko who found similar results for 3B42 estimates in the Bolivian Andes.
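The relationship between a score and station elevation can be examined with a simple correlation, as sketched below. The text does not state which correlation coefficient or significance test was used, so the Pearson coefficient and its associated t statistic (with N − 2 degrees of freedom) are assumptions, and the station values are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero (requires |r| < 1 and n > 2)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Made-up elevations (m) and POD values for six hypothetical stations.
elevation = [250.0, 800.0, 1500.0, 2300.0, 3100.0, 3900.0]
pod = [0.45, 0.52, 0.50, 0.61, 0.58, 0.66]
r = pearson_r(elevation, pod)
t = t_statistic(r, len(elevation))
```

The t statistic is then compared with the critical value of the Student distribution for the chosen significance level to decide whether the correlation is significant.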
Another explanation of the poor results in the Andean region could be the location of the rain gauge itself and its exposure. Indeed, in the western region and in the western part of the southern region, the stations where 3B42 overestimates rainfall are located either at the bottom of a valley or on the leeward side (Figure 10a,b, respectively). One potential interpretation is the alteration of measurements due to evaporation. Indeed, satellites sense rainfall at a higher elevation than rain gauges; if evaporation is strong, it can prevent the precipitation from reaching the ground, as in the case of virga, and thus cause a difference between observed and estimated values [46]. Conversely, due to the orographic effect, some precipitation can form close to the ground through the advection of humid air onto slopes well exposed to the wind; in that case, the station is able to catch such precipitation, while satellites are not. Another possibility is that strong winds deflect the rainfall and prevent it from entering rain gauges that are exposed to them. However, we do not have information on the wind exposure of the stations. At local topographic scale, the orography also seems to explain some of the 3B42 overestimation. Figure 10a presents, for instance, a case of stations sheltered by a mountain range: the prevailing winds from the northwest are blocked by this range and form a rainfall hotspot on the northern slope [18].

Regional Rainfall Sub-Regimes in the Northeastern Region of the AB
The interannual variability of the regional rainfall regimes (sub-regimes) was first analyzed using the rain gauge network. For each region, between two and four groups of years presenting the same type of rainfall sub-regime were defined using the spectral clustering method. These sub-regimes were assumed to depict the interannual variability of regional precipitation. Hereafter we present the results for the northeastern region (Figure 2), as it contains a relatively large number of rain gauges and shows good 3B42 performance. Table 2 shows the clusters and the years belonging to each cluster, and Table 3 gives the rain gauge and 3B42 mean annual rainfall (in mm) for each rainfall sub-regime of the northeastern region. For each rainfall sub-regime, 3B42 tends to overestimate the annual precipitation by 2 to 4% (Table 3), which is consistent with the analysis of the mean regional rainfall regime (Figure 5).
Table 2. Years included in each rainfall sub-regime (Cl1 to Cl4) detected on the rain time series, by spectral clustering method, in the northeastern region of the AB.
Table 3. Rain gauge and 3B42 mean annual rainfall (in mm) for each rainfall sub-regime of the northeastern region.
Sub-regime | Rain gauge (mm) | 3B42 (mm) | Difference (%)
Cl1 | 2181 | 2234 | 2
Cl2 | 2170 | 2261 | 4
Cl3 | 2545 | 2607 | 2
Cl4 | 2369 | 2419 | 2
Despite the overestimation in the mean annual totals, Figure 11 shows that the annual cycles of observed and estimated precipitation are similar for the four rainfall sub-regimes in the northeastern region. The seasonal and intra-seasonal inflections of the curve are well reproduced by 3B42. In comparison with a traditional analysis of annual precipitation, this sub-regime approach reveals how irregularly the errors are distributed throughout the year. Furthermore, and in contrast with Figure 7a, the mismatches depend on the rainfall sub-regime and not on a specific month or period of the year. The highest overestimation of annual precipitation, in Cl2 (Table 3), appears to be caused by a slight but persistent overestimation all year long (Figure 11), while the strongest overestimation, between May and July in Cl3, is offset by underestimation during the other months of the year. Despite these differences, the reproduction of precipitation patterns by 3B42 is quite accurate.
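The percentage differences reported in Table 3 can be checked with a short calculation. The sketch below assumes that the percentage is the difference between the 3B42 and rain gauge annual totals relative to the rain gauge total; the function and variable names are illustrative and do not come from the original study.

```python
# Mean annual rainfall (mm) per sub-regime, copied from Table 3:
# (rain gauge, 3B42)
table3 = {
    "Cl1": (2181, 2234),
    "Cl2": (2170, 2261),
    "Cl3": (2545, 2607),
    "Cl4": (2369, 2419),
}


def relative_difference_pct(gauge_mm, satellite_mm):
    """Difference of the satellite total relative to the gauge total, in percent.

    Assumed definition: 100 * (satellite - gauge) / gauge.
    """
    return 100.0 * (satellite_mm - gauge_mm) / gauge_mm


relative_diff = {
    name: relative_difference_pct(gauge, sat)
    for name, (gauge, sat) in table3.items()
}

for name, pct in relative_diff.items():
    print(f"{name}: {pct:+.1f}%")
```

Rounded to the nearest integer, these values match the 2 to 4% overestimation quoted in the text, with Cl2 showing the largest difference.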

Figure 11. Rainfall sub-regimes Cl1 to Cl4, in the northeastern region of AB from rain gauge (blue line), 3B42 pixel (red line), and mean annual regime of the region from rain gauge (grey line).

Comparisons between 3B42, Water Vapor Flux, and OLR Spatial Pattern Anomalies
Up to now, a point-to-pixel approach has been used to validate 3B42 estimates for the AB. However, the question remains whether 3B42 delivers spatially coherent patterns when all grid points are considered (i.e., including grid points beyond the pixels used for the comparisons with ground stations).
To investigate this issue, the monthly anomalies derived from the full grid of 3B42 (hereafter 3B42 grid) were analyzed for the northeastern region. OLR anomalies were then used to confirm the spatial pattern of the 3B42 grid anomalies, and water vapor flux was used to try to explain these anomalies. The monthly anomalies of these variables correspond to the difference between the mean value over the years of the rainfall sub-regime and the mean value over the period 1998–2013; only the significant anomalies (higher than 1 standard deviation) are represented in Figures 12 and 13. The precipitation anomalies are normalized, that is, divided by the standard deviation of the whole study period.
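The composite anomaly described above can be sketched for a single grid cell and a single calendar month. This is a minimal illustration under stated assumptions, not the processing chain of the study: the standard deviation is assumed to be the sample standard deviation across the years of the study period, the name `composite_anomaly` is hypothetical, and the rainfall values are invented for the example.

```python
from statistics import mean, stdev


def composite_anomaly(values_by_year, cluster_years, threshold=1.0):
    """Normalized composite anomaly for one grid cell and one calendar month.

    values_by_year: {year: monthly value} over the whole study period.
    cluster_years:  years belonging to the rainfall sub-regime.
    Returns the normalized anomaly, or None when its magnitude does not
    exceed `threshold` standard deviations (i.e., it would not be mapped).
    """
    all_values = list(values_by_year.values())
    climatology = mean(all_values)   # mean over the whole period
    spread = stdev(all_values)       # assumption: sample standard deviation
    composite = mean(values_by_year[y] for y in cluster_years)
    normalized = (composite - climatology) / spread
    return normalized if abs(normalized) > threshold else None


# Invented monthly rainfall totals (mm) for six years.
january = {2000: 100, 2001: 120, 2002: 80, 2003: 110, 2004: 90, 2005: 100}

dry_composite = composite_anomaly(january, [2002, 2004])
weak_composite = composite_anomaly(january, [2003, 2005])
print(dry_composite, weak_composite)
```

With these invented values, the first group of years gives a negative anomaly slightly larger than one standard deviation and is kept, while the second group falls below the threshold and is masked.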
The analysis was performed for all the cluster sub-regimes [37], but we focus on sub-regimes Cl1 and Cl4, which contrast with each other. Indeed, Cl1 has a shorter peak period than the average cycle of the region (grey line), whereas Cl4 has a longer rainfall peak (Figure 11). Furthermore, the explanations of the anomalies by the atmospheric circulation are very clear for these two clusters, which are chosen as examples. Cl3 contains only one year, which can confuse the results.
Figures 12 and 13 show, for sub-regimes Cl1 and Cl4, composite maps of the normalized monthly anomalies of the 3B42 grid and of the OLR and water vapor flux monthly anomalies. The blue polygon indicates the northeastern region within the AB. A positive (negative) OLR anomaly in warm (cold) color indicates less (more) convection and, consequently, a lower (higher) probability of positive rainfall anomalies. The water vapor flux anomalies, represented by vectors, indicate whether more or less humidity is advected to the AB; a lower (higher) transport of humidity is consistent with lower (higher) convection.
Composites for Cl1 show that the 3B42 grid rainfall anomalies are in good agreement with the rainfall sub-regimes previously found from the rain gauges and the 3B42 pixels. The shorter rainfall peak observed in Figure 11 is consistent with negative precipitation anomalies in the 3B42 grid in the northeastern region. These anomalies may be homogeneously distributed in the region, as in January and June, or concentrated in certain areas (e.g., in December and May). This highlights the high spatial variability of rainfall and the difficulty of capturing it with the rain gauge network alone. The anomalies of the 3B42 and OLR composites also agree (Figure 12) and show strong negative anomalies of convection in the northeastern region in December–January and May–June, which is in line with a reduced rainfall peak in this region, which usually occurs from December to May. The precipitation deficit and the positive OLR anomalies are consistent with the reduced and divergent water vapor flux from the North Tropical Atlantic (NATL). See, for instance, the lack of convergent water vapor fluxes over the region in December–January and May–June, while convergent fluxes are observed from February to April, when enhanced precipitation is observed. The excess of rainfall in February–March, during the rainfall peak (Figure 11), appears to be linked with an area of positive rainfall anomalies (Figure 11) close to the coast that is strongly influenced by the water vapor flux from the tropical ocean. These fluxes are slightly reinforced in March and may explain the enhanced rainfall during this period of the year.
Another feature that appears when using the 3B42 grid is the opposition between inland and coastal anomalies inside the region (e.g., in March and April, Figure 12).
The precipitation anomalies in the 3B42 grid composites of the rainfall sub-regime Cl4 also highlight the importance of the local scale compared to the regional scale. Indeed, during the longer and higher than usual rainy peak of Cl4, with strong surpluses of precipitation in January and from March to June (Figure 11), rainfall anomalies appear in the 3B42 grid composites but with an inhomogeneous spatial distribution (Figure 13). In contrast, in February the slight rainfall deficit seen in the 3B42 pixel sub-regime (Figure 11) also appears in the 3B42 grid, but with a more spatially homogeneous pattern. The positive rainfall anomaly is connected to negative OLR anomalies (Figure 12) over the northeast of the AB and over the Atlantic ITCZ in January and from March to June. The increase in convection is related to the convergence, in the northeastern region, of water vapor flux anomalies from the west, the north and the southeast, as well as to a weaker than usual export of humidity by the low-level jet to the La Plata Basin. On the contrary, in February, even though strong convection is present over the northeastern AB, the water vapor fluxes are reduced, which is consistent with the rainfall deficits of the 3B42 grid in this region in years where Cl4 dominates.

Conclusions
The 3B42 daily precipitation product is highly valuable, as it provides spatial rainfall information for the whole AB, where access to ground-observed data is particularly complicated. However, as the values are estimates, they contain biases and errors that need to be assessed. The quality of 3B42 was evaluated by means of quantitative (mean daily rainfall, bias, relative RMSE) and categorical statistics (POD and FAR), considering daily data over the whole basin. These statistical parameters showed monthly differences, with a contrast between the dry and rainy periods; the rainy period yields the better results. In relation to these intra-annual contrasts, we focused our evaluation on regional rainfall regimes and sub-regimes, which have not been shown in any previous studies.
The average bias is +7% (overestimation), but large errors ranging from −45% to +161% are observed in the Andean regions. 3B42 performs better in the detection of rainy events than in the quantitative estimation of rainfall. Also, as observed in previous studies and as shown by all the statistics, 3B42 performs better in the lowlands of the AB than in the mountainous regions of the Andes. However, this work highlights differences within the high areas: in the northern (Ecuador) and southern Andes (Bolivia), 3B42 tends to overestimate precipitation, whereas in the central Andes (Peru) rainfall is underestimated. Furthermore, no correlation is found between the elevation of the stations and the results of the statistical parameters. Warm clouds are frequently assumed to be a factor in the difference measured between observed and estimated precipitation. However, other phenomena in the AB can also cause differences between observed and estimated data. Indeed, overestimation by 3B42 can result from the capacity of the satellite to measure precipitation at high elevation, while evaporation prevents this same precipitation from reaching the ground and, therefore, the rain gauges.
Underestimation by 3B42 could depend on orography, with a low capacity to detect orographic rainfall forming close to the ground on hillsides exposed to the main wind, while the rain gauge succeeds in catching it. Furthermore, we suggest that further evaluation could be done by considering several threshold values to determine the occurrence of a precipitation event (here, 0.1 mm), to check whether the underestimation depends on this threshold.
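The threshold sensitivity suggested above can be sketched with the usual contingency-table definitions of the categorical statistics: POD is the number of hits divided by the sum of hits and misses, and FAR is the number of false alarms divided by the sum of hits and false alarms. The sketch below is an illustration only: it assumes that an event occurs when the daily value is greater than or equal to the threshold, the function name `pod_far` is hypothetical, and the daily series are invented.

```python
def pod_far(gauge, satellite, threshold=0.1):
    """Probability of detection and false alarm ratio for paired daily series.

    Assumption: a rain event is a daily value >= threshold (in mm).
    Returns (POD, FAR); a score is NaN when its denominator is zero.
    """
    hits = misses = false_alarms = 0
    for g, s in zip(gauge, satellite):
        gauge_rain = g >= threshold
        sat_rain = s >= threshold
        if gauge_rain and sat_rain:
            hits += 1
        elif gauge_rain:
            misses += 1
        elif sat_rain:
            false_alarms += 1
    nan = float("nan")
    pod = hits / (hits + misses) if hits + misses else nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    return pod, far


# Invented daily rainfall (mm) at one gauge and its collocated pixel.
gauge_mm = [0.0, 0.2, 5.0, 0.0, 12.0, 0.5, 0.0, 3.0]
pixel_mm = [0.3, 0.0, 4.0, 0.0, 15.0, 1.2, 0.0, 0.0]

scores = {t: pod_far(gauge_mm, pixel_mm, threshold=t) for t in (0.1, 1.0)}
for t, (pod, far) in scores.items():
    print(f"threshold {t} mm: POD={pod:.2f}, FAR={far:.2f}")
```

Repeating the computation over a range of thresholds, station by station, would show whether the detection scores and the sign of the mismatch are stable or depend on the chosen event definition.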
The ability of 3B42 to accurately replicate annual rainfall regimes in the different AB regions was also used as a tool to assess its quality. There is generally a good agreement between the regional regimes derived from rain gauges and those derived from 3B42. However, in each region, the performance of 3B42 is lower during the drier period than during the rainy period. This could be partially explained by the difference in spatial resolution between the two datasets (sparse rainfall that is not measured by the rain gauges can still be captured by 3B42). Also, during the drier period, rainy events are scarcer and the relative error may be larger.
Rainfall sub-regimes from the rain gauges and those from the 3B42 pixels were compared and are in good agreement in the northeastern region, despite an overestimation of 2 to 4% of annual rainfall. The seasonal and intra-seasonal behavior is reproduced well. It is worth noting that the over- or underestimation by the 3B42 pixels does not depend on the month or the season, but differs between the rainfall sub-regimes.
Finally, the agreement between observed and estimated rainfall sub-regimes was examined over the whole grid of the northeastern region. Generally, both are in good agreement. However, precipitation anomalies are sometimes not homogeneous in the northeastern region, for instance with differences between inland and coastal sectors. This underlines a high spatial rainfall variability that is difficult to capture with a rain gauge network alone, and highlights the importance of spatially refined data such as 3B42. The spatial and temporal patterns of the 3B42 grid anomalies are consistent with the water vapor flux and OLR anomalies, despite their different resolutions.
At a daily scale, in most parts of the AB, 3B42 provides an extensive precipitation dataset of fairly good quality. This larger quantity of information, with satisfactory quality, brings out the intraregional distribution of precipitation, especially between coastal (often influenced by the sea breeze front) and inland sectors. Nevertheless, 3B42 data for the AB should be used with caution in the dry season and in mountainous regions. Beyond these aspects, the upcoming data homogenization between 3B42 and the recent Global Precipitation Measurement Mission will provide a longer and improved time series of estimated rainfall.