Assessment of SM2RAIN-Derived and State-of-the-Art Satellite Rainfall Products over Northeastern Brazil

Microwave-based satellite rainfall products offer an opportunity to assess rainfall-related events in regions where rain-gauge stations are sparse, such as Northeast Brazil (NEB). Accurate measurement of rainfall is vital for water resource managers in this semiarid region. In this work, the SM2RAIN-CCI rainfall data, obtained from the inversion of the microwave-based satellite soil moisture (SM) observations derived from the European Space Agency (ESA) Climate Change Initiative (CCI), and those from three state-of-the-art rainfall products (Climate Hazards Group InfraRed Precipitation with Stations (CHIRPS), Climate Prediction Center Morphing Technique (CMORPH), and Multi-Source Weighted-Ensemble Precipitation (MSWEP)) were evaluated against in situ rainfall observations under different bioclimatic conditions in the NEB (AMZ, Amazônia; CER, Cerrado; MAT, Mata Atlântica; and CAAT, Caatinga). Comparisons were made at daily and 5-day time scales and at 0.25° spatial resolution over the period 1998 to 2015. It was found that 5-day SM2RAIN-CCI has a reasonably good performance in terms of the correlation coefficient over the CER biome (R median: 0.75). In terms of the root mean square error (RMSE), it exhibits better performance in the CAAT biome (RMSE median: 12.57 mm). In terms of bias (B), the MSWEP, SM2RAIN-CCI, and CHIRPS datasets show the best performance in MAT (B median: −8.50%), AMZ (B median: −0.65%), and CER (B median: 0.30%), respectively. Conversely, CMORPH poorly represents the rainfall variability in all biomes, particularly in the MAT biome (R median: 0.43; B median: −67.50%). In terms of detection of rainfall events, all products show good performance (probability of detection (POD) median > 0.90). The performance of SM2RAIN-CCI suggests that the SM2RAIN algorithm fails to estimate the amount of rainfall under very dry or very wet conditions. Overall, the results highlight the feasibility of using SM2RAIN-CCI in the poorly gauged areas of the semiarid region of NEB.


Introduction
Climate variability and extreme weather events threaten many populations around the world [1]. Recent evidence has revealed that extreme events, such as droughts and flash floods, are increasing in incidence, causing thousands of casualties and significant damage worldwide every year [2,3]. Droughts and floods have received special attention in Northeast Brazil (NEB), because they have been experienced with a higher frequency, spatial extent, severity, and duration [4-6]. Therefore, the accurate estimation of rainfall is of paramount importance for analyzing the spatial and temporal patterns of rainfall at various scales [7-9], and advancing our understanding of the effect of droughts and floods in Brazil.
Rainfall data from ground stations have been conventionally used to provide local estimates of rainfall amounts [10,11] and are considered the most accurate source of rainfall data for operational drought monitoring [12,13]. In NEB, conventional rain gauges are the main source of rainfall data [14,15]. Nevertheless, despite the efforts of the National Center for Monitoring and Early Warning of Natural Disasters (CEMADEN) and other state climate agencies, most of the rain-gauge networks currently available are inadequate to produce reliable rainfall analysis, due to their scarce spatial coverage, high proportion of missing data, and short records [16-18].
Alternative sources of rainfall information are: (i) ground-based radar remote sensing, (ii) satellite remote sensing, and (iii) atmospheric reanalysis models [19]. In general, weather radars provide high temporal and spatial resolution, but they are affected by issues such as beam blockage and frozen hydrometeors, which degrade the quality of the rainfall estimates [20,21]. Satellites are also capable of sensing large regions with a high temporal and spatial resolution, but satellite retrieval approaches are prone to systematic biases, insensitive to light rainfall, and tend to fail over snow- and ice-covered surfaces [22-24]. Finally, atmospheric reanalysis models are adequate for simulating the dynamics of large-scale weather systems, but poorly represent the spatiotemporal variability linked with local convection, mainly due to their low spatial resolution and deficiencies in the parameterizations of sub-grid processes [23,25,26]. On this basis, the above-mentioned rainfall products are subject to errors and must be validated against rain gauge data in order to assess their uncertainties before being used [27].
Recent studies have validated rainfall estimates obtained from some satellite rainfall products over different zones of the NEB. By way of example, Reference [28] evaluated the quality of the Tropical Rainfall Measuring Mission (TRMM) Multi-satellite Precipitation Analysis (TMPA) 3B42V6 and 3B42V7 products on a daily and monthly basis by comparing them with gridded ground-based rainfall data distributed in Brazil. TMPA combines rainfall estimates retrieved from Passive Microwave (PMW) and thermal infrared (IR) observations from multiple satellite sensors with radar data from TRMM [29,30]. They found that TRMM TMPA performed poorly in coastal areas of the NEB, whereas it performed well in its semiarid zone. In this same region, the monthly rainfall estimates from the CHIRPS (Climate Hazards Group InfraRed Precipitation with Stations) product were compared with rain gauge data [18]. CHIRPS blends satellite and gauge rainfall estimates using inverse-error weighted averaging to produce an unbiased estimate [31,32]. According to these authors, the CHIRPS data correlate well with observations, but tend to overestimate low and underestimate high rainfall values. Moreover, CHIRPS achieves better results during the wet season, but its rain-detection ability is poor. Similar results are reported by Reference [33], who compared and validated rainfall estimates derived from the Eta/CPTEC (Centro de Previsão de Tempo e Estudos Climáticos) model and the 3B42V7-TRMM and CHIRPS products against rainfall data from the INMET (National Institute of Meteorology) meteorological stations located in the south-southeast sub-region of the NEB. The authors point out that the CHIRPS data show a pattern similar to the rain gauge data, though some statistical metrics were lower than those of the 3B42V7-TRMM data. Such studies reveal that estimating rainfall with good accuracy is still a challenge, and hence, the exploration of new approaches utilizing new datasets and/or algorithms is an active research topic for the scientific community.
A novel approach for rainfall estimation through in situ soil moisture (SM) measurements and satellite-based SM estimates was proposed by Reference [34], who used SM data to obtain a direct quantitative estimate of rainfall by inverting the soil-water balance equation. For this purpose, rainfall is computed from the knowledge of the SM state and its variation in time by means of an algorithm called SM2RAIN [35]. This approach has been shown to be particularly suitable for estimating accumulated rainfall amounts, and has been tested against in situ SM data and single-sensor SM products [36,37] with successful results at the regional scale [38-40].
In early 2017, the ESA CCI SM v03.2 dataset was released by the European Space Agency Climate Change Initiative (ESA-CCI) project. This dataset was obtained by merging SM retrievals from both active and passive microwave instruments carried by various satellite platforms and provides daily SM estimates on a global scale, with a spatial resolution of 0.25°, from 1978 to 2015 [41-44]. This then allowed Reference [21] to develop a global-scale rainfall dataset by applying the SM2RAIN algorithm to the ESA-CCI SM products to obtain rainfall estimates at 0.25° spatial and daily temporal resolution for the period 1998-2015 (hereinafter referred to as SM2RAIN-CCI). They also conducted several statistical analyses and found that SM2RAIN-CCI showed relatively good results in terms of the correlation coefficient, root mean square difference, and bias on a global scale when 5-day accumulated rainfall was considered and the Global Precipitation Climatology Centre Full Data Daily (GPCC-FDD) product [45] was used as an independent benchmark. Since then, other SM2RAIN-CCI evaluation studies have been carried out with similar conclusions at a regional scale, such as in the Upper Blue Nile basin [46] and the Mediterranean region [47]. These results are promising, but a detailed study investigating the performance of the SM2RAIN algorithm, its range of applicability, and its limitations in the NEB is still needed.
The objective of this study is to evaluate the quality of the SM2RAIN-CCI rainfall dataset for the first time in the NEB. In addition, its performance was compared with that of three state-of-the-art rainfall products: CHIRPS, CMORPH (Climate Prediction Center Morphing Technique) [48], and MSWEP (Multi-Source Weighted-Ensemble Precipitation) [23,49], using high-quality ground-based observations as a benchmark. The analysis was performed in four bioclimatic regions using 0.25° spatial sampling and both 5-day accumulated rainfall and daily rainfall, during the period 1998-2015.

Study Area
The NEB is located between 1.3-18.2° S and 34.4-48.4° W, occupying an area of about 1,555,000 km² (nearly 19% of Brazilian territory). It has more than 53 million inhabitants (about 30% of the Brazilian population), and a human population density of about 34 inhabitants per square kilometer [5]. It is characterized by a dominant semiarid climate, with the mean annual rainfall ranging from ≈400 to ≈700 mm/yr [50]. The weather system in most of the NEB is controlled by the Inter-Tropical Convergence Zone (ITCZ), defining the rainy season between April and June [15,50,51]. Higher rainfall amounts occur along the eastern coastal strip due to the contrast in air temperature over the ocean and continent, but the rainfall regime is mainly influenced by the orographic effect [16,28]. Over southern NEB, rainfall is influenced by frontal systems and the South Atlantic Convergence Zone [52].
The main biomes of NEB are Amazônia (Amazonia), Cerrado, Mata Atlântica (Atlantic Forest), and Caatinga; hereinafter AMZ, CER, MAT, and CAAT, respectively. The spatial distribution of these biomes is strongly linked to the spatiotemporal variability of rainfall [53,54]. The CAAT biome is characterized by a mosaic of seasonally dry tropical forests and thorn scrubs [55,56], with the lowest rainfall regime [57]. The CER biome is a vast tropical savanna, whose main habitats are forest savanna, wooded savanna, park savanna, gramineous-woody savanna, savanna wetlands, and gallery forests [58]. The MAT biome extends along the Atlantic coast of the NEB, and groups the seasonal moist and dry broad-leaf tropical and subtropical grasslands, savannas, shrublands, and mangrove forests [59]. The AMZ biome is a moist broadleaf forest, which comprises the largest and most biodiverse tract of tropical rainforest in the NEB and is characterized by the highest rainfall regime [60]. These biomes are selected as benchmark areas in order to investigate the performance of the rainfall products under different bioclimatic conditions. The biome map of the NEB is shown in Figure 1, whereas the main features of the biomes are listed in Table 1.

Ground Station Data
For this study, we used a daily rainfall grid with a spatial resolution of 0.25° × 0.25° covering all of Brazil, developed by Reference [61] from in situ rainfall observations provided by the Brazilian Water Agency (ANA), the National Institute of Meteorology (INMET), and the Water and Electric Energy Department of São Paulo state (DAEE) (version 2.1, released in February 2018 and available at http://careyking.com/data-download); hereinafter GBGR. The procedure to generate the GBGR dataset involved a quality control check, in which rainfall values above 450 mm/day or below 0 mm/day, as well as duplicated rain gauge records, were eliminated [61]. To create this dataset, Reference [61] tested six different methods to interpolate precipitation throughout the period 1980-2015: angular distance weighting (ADW), inverse-distance weighting (IDW), averaging inside the area of each 0.25° × 0.25° grid cell, thin plate spline, natural neighbor, and ordinary point kriging. Using cross-validation analysis, they verified that the ADW interpolation scheme is superior to the others (e.g., R = 0.65; bias = 0.027 mm; root mean square error, RMSE = 8.54 mm for all of Brazil). The ADW method uses two weights: one based on the correlation decay distance (CDD) and the other based on the position of the rain gauges in relation to the query point at which the estimate is made [61,62]. For more details about the GBGR product, the reader is referred to Reference [61]. As can be seen in Figure 2, there are large areas without any rain gauge stations in NEB. In this context, the GBGR dataset is used here as a high-quality reference rainfall dataset, as in [28].
Remote Sens. 2017, 9, x FOR PEER REVIEW

SM2RAIN-CCI Rainfall Dataset
The SM2RAIN algorithm proposed by Reference [34] is based on the inversion of the soil-water balance equation for retrieving rainfall from soil moisture data. It assumes that during rainfall, the evapotranspiration rate and the surface runoff are negligible [38,63]. In this context, a simplified version of the soil-water balance equation is formulated as follows:

$$p(t) = Z^{*}\,\frac{ds(t)}{dt} + a\,s(t)^{b}$$

where p(t) is the estimated rainfall between two successive SM retrievals for the time step dt [L/T], Z* represents the water capacity of the soil layer [L], s(t) denotes the relative soil saturation [-], t is the time [T], and a and b are two parameters describing the nonlinearity between soil saturation and drainage. The parameters a, b, and Z* are estimated through calibration [21]. The main limitation of the SM2RAIN algorithm is that it cannot estimate rainfall when the soil is close to saturation, since in this condition SM remains constant and rainfall cannot drive any SM variation [36]. More information about the SM2RAIN algorithm can be found in Reference [38].
The SM2RAIN-CCI product is obtained by applying the SM2RAIN algorithm [34] to the ESA CCI soil moisture Active and Passive products separately [42,43]. Then, an integration procedure based on a weighted average is applied to obtain the accumulated rainfall between 00:00 and 23:59 UTC of the indicated day. In addition, the quality flag provided within the raw soil moisture observations (i.e., ESA CCI SM v03.2) is used to mask out low-quality data and observations characterized by issues in the retrieval (e.g., frozen soil, snow-dominated regions, dense vegetation, and high topographic complexity). For this product, the SM2RAIN algorithm is calibrated during the 1998-2001, 2002-2006, and 2007-2013 periods.
CHIRPS provides daily rainfall estimates by taking advantage of three types of information: global climatology, satellite estimates, and in situ observations. The development of CHIRPS products entails three stages. First, Infrared Precipitation (IRP) pentad (5-day) rainfall estimates are created from satellite data using cold cloud durations (CCD) and calibrated in relation to the TMPA 3B42-based precipitation pentads. Then, the IRP pentads are divided by their long-term IRP mean values to express them as a percent of normal. Second, the percent-of-normal IRP pentad is multiplied by the corresponding Climate Hazards Precipitation Climatology (CHPclim) pentad to generate an unbiased gridded estimate, with units of millimeters per pentad, called the CHG IR Precipitation (CHIRP). Finally, CHIRPS is produced by blending station data with the CHIRP dataset [64,65]. This product is freely available at http://chg.geog.ucsb.edu/data/chirps/. For more details about the CHIRPS product, the reader is referred to Reference [31].
CMORPH uses rainfall estimates exclusively derived from satellite microwave sensors, but when microwave data are not available at a particular spatial location, the geostationary satellite infrared data are used as a medium to propagate microwave-derived precipitation features by executing a time-weighting interpolation between two successive satellite overpasses [48]. In this study, CMORPH v0.x is considered at a 0.25° spatial resolution on a daily basis. It was downloaded from the National Center for Atmospheric Research-University Corporation for Atmospheric Research (NCAR/UCAR) at https://rda.ucar.edu/ (it covers only satellite microwave sensors in operation since December 2002). More details about the CMORPH product can be found in Reference [66].
MSWEP v2.1 is a non-operational product that combines rainfall information from satellites (CMORPH, GridSat, GSMaP, and TMPA 3B42RT), reanalysis (ERA-Interim and JRA-55), and rain gauge (GPCC v7) data. The estimates obtained through satellite sensors, reanalysis, and in situ stations are merged by the use of integration weights. The product is freely available at http://data.princetonclimate.com/, from 1979 to 2016, with a 3-hourly temporal and 0.1° spatial resolution. Full details about the main steps carried out to produce MSWEP v2.1 can be found in Reference [23].
The CHIRPS, CMORPH, and MSWEP datasets were selected because of their long available data record, relatively high spatial resolution, high temporal resolution, and data sources employed. The main features of all rainfall datasets are shown in Table 2.
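As a rough illustration of the SM2RAIN inversion described at the start of this section, the following minimal Python sketch applies the simplified soil-water balance equation to a short series of relative saturation values. It is not the calibrated procedure of Reference [21]: the function name, the finite-difference scheme, the use of the mid-interval saturation in the drainage term, the clipping of negative values to zero, and all parameter values are assumptions made only for this example.

```python
import numpy as np

def sm2rain(s, dt, z_star, a, b):
    """Illustrative SM2RAIN-style inversion (hypothetical helper).

    s      : 1-D sequence of relative soil saturation values in [0, 1]
    dt     : time step between successive retrievals (days)
    z_star : soil water capacity Z* (mm)
    a, b   : drainage parameters (a in mm/day, b dimensionless)
    Returns the rainfall rate (mm/day) for each interval between retrievals.
    """
    s = np.asarray(s, dtype=float)
    ds_dt = np.diff(s) / dt               # finite-difference estimate of ds/dt
    s_mid = 0.5 * (s[1:] + s[:-1])        # assumption: saturation at interval midpoint
    p = z_star * ds_dt + a * s_mid ** b   # p(t) = Z* ds/dt + a s^b
    return np.clip(p, 0.0, None)          # assumption: negative rainfall set to zero

# Hypothetical parameter values, not the calibrated ones from the source
rain = sm2rain([0.30, 0.30, 0.50, 0.45], dt=1.0, z_star=80.0, a=10.0, b=2.0)
print(rain)
```

The sketch also makes the saturation limitation visible: once s stays constant near 1, the ds/dt term vanishes and only the drainage term remains, so additional rainfall cannot be recovered.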

Datasets Pre-Processing
All datasets were clipped using a shapefile of the NEB as a mask. To harmonize the datasets, products whose spatial resolution differs from 0.25°, such as CHIRPS and MSWEP, were resampled to the 0.25° GBGR grid by using the nearest neighbor algorithm [36]. Both 5-day accumulated rainfall and daily rainfall were considered in this study. The rationale for the 5-day aggregation is that the SM2RAIN parameters used in the generation of the SM2RAIN-CCI product were obtained by minimizing the root mean square difference (RMSD) between the 5-day estimated rainfall and the GPCC-FDD data during three calibration periods: 1998-2001, 2002-2006, and 2007-2013 [21]. Furthermore, the use of 5-day values minimizes the impact of any remaining temporal mismatches in the 24-hour accumulation period between the datasets and the rain gauge data [23]. This temporal scale is also typically used for irrigation schedules, where farmers commonly use a 5-day or weekly period [67,68]. Hence, 5-day accumulated rainfall was obtained for each grid cell during the period 1998-2015 (the aggregated SM2RAIN-CCI product is hereinafter referred to as 5-day SM2RAIN-CCI). When fewer than five daily values were available, the 5-day accumulated rainfall was treated as a missing value and excluded from the analysis.
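The 5-day aggregation rule described above can be sketched for a single grid cell as follows. This is a minimal example under stated assumptions: the function name is hypothetical, missing days are encoded as NaN, and windows are taken as non-overlapping blocks aligned to the start of the series, which the text does not specify.

```python
import numpy as np

def accumulate_5day(daily, window=5):
    """Sum daily rainfall (mm) into non-overlapping 5-day totals.

    Any window containing a missing day (NaN) yields NaN, so that
    incomplete windows can be excluded from later statistics.
    """
    daily = np.asarray(daily, dtype=float)
    n_windows = daily.size // window      # a trailing partial window is dropped
    blocks = daily[: n_windows * window].reshape(n_windows, window)
    return blocks.sum(axis=1)             # plain sum propagates NaN

# Eleven days of hypothetical data with one missing value in the second window
totals = accumulate_5day([0, 2, 0, 5, 1, 3, np.nan, 0, 0, 4, 7])
print(totals)
```

Using a plain sum rather than a NaN-ignoring sum is deliberate here, because a partial total would understate the 5-day rainfall.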

Performance Evaluation Methods
Figure 3 summarizes the methods of analysis applied in this study. First, an intercomparison of 5-day accumulated rainfall estimates derived from SM2RAIN-CCI with those from the GBGR dataset was carried out in order to assess the quality of the rainfall estimates during the calibration periods used in the generation of the SM2RAIN-CCI product (i.e., 1998-2001, 2002-2006, and 2007-2013). Second, to examine the performance of the 5-day SM2RAIN-CCI product, a comparative analysis based on continuous metrics, with the GBGR dataset as a benchmark, was made over different bioclimatic conditions. Finally, we compared the performances of the SM2RAIN-CCI, CHIRPS, CMORPH, and MSWEP products at a daily scale under different bioclimatic conditions in order to determine whether the effects of bioclimatic conditions on performance varied regionally.
To measure how the estimates from the SM2RAIN-CCI, CHIRPS, CMORPH, and MSWEP products differed from the values of the GBGR dataset, five continuous metrics were used. These metrics were based on a pair-wise comparison to evaluate the performance of each product in estimating rainfall amounts derived from the GBGR dataset on a pixel-to-pixel basis. The Pearson correlation coefficient (R), root mean square error (RMSE), mean error (ME), mean absolute error (MAE), and percent bias (B) were considered in this study; their equations are summarized in Table 3. R measures the strength of the linear relationship between estimations and observations, varying from −1 to 1, with the best score equal to 1. RMSE, ME, MAE, and B measure how the estimates differ from the observed values. RMSE and MAE take only non-negative values, with lower values corresponding to better performance. ME and B can take any negative or positive value, with a perfect score equal to 0. Positive ME or B values indicate an overestimation, while negative ones indicate an underestimation [27,69].
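The continuous metrics described above can be computed for one pixel as in the following minimal sketch. The formulas are the standard textbook definitions and are assumed, not confirmed, to match those of Table 3; in particular, percent bias is taken here as the summed error divided by the summed reference rainfall. The function name and the sample values are hypothetical.

```python
import numpy as np

def continuous_metrics(g, c):
    """Return R, RMSE, ME, MAE and percent bias B for paired series.

    g : reference (GBGR-like) rainfall in mm
    c : product-based rainfall estimate in mm
    """
    g = np.asarray(g, dtype=float)
    c = np.asarray(c, dtype=float)
    valid = ~(np.isnan(g) | np.isnan(c))   # keep complete pairs only
    g, c = g[valid], c[valid]
    err = c - g                            # positive error means overestimation
    return {
        "R": np.corrcoef(g, c)[0, 1],
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "ME": np.mean(err),
        "MAE": np.mean(np.abs(err)),
        "B": 100.0 * err.sum() / g.sum(),  # assumed percent-bias definition
    }

# Hypothetical pixel time series (mm)
scores = continuous_metrics([0, 10, 20, 30], [2, 8, 22, 32])
print(scores)
```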

Table 3. Formulas of continuous metrics, where G: GBGR-based rainfall measurement, Ḡ: average GBGR-based rainfall measurement, C: product-based rainfall estimate, C̄: average product-based rainfall estimate, and N: number of data pairs. The products are SM2RAIN-CCI, CHIRPS, CMORPH, and MSWEP.

Columns: Name, Formula, Perfect Score

To examine the rain-detection capability of the SM2RAIN-CCI, CHIRPS, CMORPH, and MSWEP products, four categorical metrics were used on a pixel-to-pixel basis. Rain-detection capability refers to the skill of a rainfall product in detecting observed rainfall events, taking into account a threshold to differentiate rainfall events from non-rainfall events at any time scale (e.g., daily) [18]. These metrics were derived from a contingency table (not shown here) in which the letters A, B, C, and D represent, respectively, hits (event forecast to occur, and did occur), false alarms (event forecast to occur, but did not occur), misses (event forecast not to occur, but did occur), and correct negatives (event forecast not to occur, and did not occur), with a rainfall threshold of 5 mm/day [8,18]. The equations for these metrics are listed in Table 4. Probability of Detection (POD) and False Alarm Ratio (FAR) indicate the fraction of the observed events that were correctly forecasted and the fraction of the predicted events that did not occur, respectively. POD and FAR vary from 0 to 1, with a perfect score equal to 1 and 0, respectively [28]. The accuracy (ACC) is the fraction of all product-based estimates (events and non-events) that were correct. ACC ranges from 0 to 1, and the best score is 1. The Critical Success Index (CSI) is the ratio of hits to the total number of events that were either observed or estimated (hits, false alarms, and misses). CSI ranges from 0 to 1, with the best score equal to 1 [70]. Given the limited data availability of the CMORPH product (see Table 2), its analysis covered the period from 2002 to 2015.
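The categorical metrics can be sketched for one pixel as follows. Because Table 4 is not reproduced in this text, the formulas below are the standard contingency-table definitions and are an assumption, as are the function name, the use of "greater than or equal to" for the 5 mm/day threshold, and the sample values. The sketch also assumes the inputs contain no missing values.

```python
import numpy as np

def _ratio(num, den):
    """Divide safely, returning NaN when the denominator is zero."""
    return num / den if den > 0 else float("nan")

def categorical_metrics(g, c, threshold=5.0):
    """Return POD, FAR, ACC and CSI for paired daily series (mm/day)."""
    obs = np.asarray(g, dtype=float) >= threshold   # observed rain events
    est = np.asarray(c, dtype=float) >= threshold   # estimated rain events
    hits = np.sum(obs & est)                 # A
    false_alarms = np.sum(~obs & est)        # B
    misses = np.sum(obs & ~est)              # C
    correct_neg = np.sum(~obs & ~est)        # D
    total = hits + false_alarms + misses + correct_neg
    return {
        "POD": _ratio(hits, hits + misses),
        "FAR": _ratio(false_alarms, hits + false_alarms),
        "ACC": _ratio(hits + correct_neg, total),
        "CSI": _ratio(hits, hits + false_alarms + misses),
    }

# Hypothetical daily values (mm/day) for the reference and one product
scores = categorical_metrics([0, 6, 10, 2, 7, 0], [0, 8, 3, 6, 9, 1])
print(scores)
```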

Evaluation Using 5-Day SM2RAIN-CCI Rainfall Estimates during the Calibration Periods
An intercomparison of 5-day accumulated rainfall estimates derived from SM2RAIN-CCI with those from the GBGR dataset was carried out in order to assess the quality of the rainfall estimates. The analysis was performed on a pixel-by-pixel basis by considering R and RMSE as performance metrics during the three calibration periods (1998-2001, 2002-2006, and 2007-2013) used for the generation of the 5-day SM2RAIN-CCI product [21]. As can be seen in Table 5, there is a moderate increase in performance with respect to the ground measurements over time. The R median values range between 0.56 and 0.76, whereas the RMSE median values vary from 14.91 mm to 12.64 mm.
The data gaps in Figure 4 are due to the use of a static mask to mask out periods with high frozen soil and snow probabilities, rainforest areas, and areas with high topographic complexity before applying the SM2RAIN algorithm to the ESA CCI SM dataset [21]. Five-day SM2RAIN-CCI rainfall estimates exhibited reasonably good agreement with GBGR in terms of R and RMSE, especially over the inland areas of Bahia (BA), Pernambuco (PE), and Ceará (CE). The RMSE and R spatial patterns seemed to be related to the rainfall regimes. The lowest RMSE values are observed in semiarid regions characterized by a low rainfall regime and the presence of open flatland, which might favor the accuracy of satellite retrievals [71]. In agreement with the results reported in Table 5, the visual comparison also revealed better performance during the period 2007-2013. On the other hand, 5-day SM2RAIN-CCI showed lower performance along the NEB coastline, which is characterized by a wetter rainfall regime than inland.

5-Day SM2RAIN-CCI Performance Evaluation under Different Bioclimatic Conditions
The main biomes of NEB are selected as benchmark regions to assess the capability of the 5-day SM2RAIN-CCI product in estimating accumulated rainfall under different bioclimatic conditions during the period 1998-2015. Figure 5 shows the rainfall regime for each biome during this period. The rainy season occurs at different times of the year for different biomes: (i) the AMZ biome, from January to May (mean annual rainfall, MAR = 1659 mm); (ii) the MAT biome, from April to July (MAR = 1131 mm); (iii) the CER biome, from December to April (MAR = 1123 mm); and (iv) the CAAT biome, with a short rainy season from January to April (MAR = 687 mm). Mean annual rainfall in NEB showed a high spatial variability, ranging from less than 700 mm to above 1600 mm on average. In brief, the highest values are located in the AMZ biome, and the lowest values are observed in the CAAT biome.
We further evaluated the performance of 5-day SM2RAIN-CCI at a monthly scale to investigate the seasonal variation within these biomes. From Table 6, one can see that the performance of 5-day SM2RAIN-CCI was relatively comparable in the CAAT and CER biomes. In terms of correlation, 5-day SM2RAIN-CCI showed R values greater than 0.80 during the transition from the dry to wet season. Table 6 also reveals higher underestimation (overestimation) of rainfall during the transition from dry to wet season (the driest months), particularly between November and February (July to September). In addition, 5-day SM2RAIN-CCI showed lower performance in the AMZ and MAT biomes, characterized by a general underestimation (overestimation) within the MAT (AMZ) biome, particularly from November to June (July to October), coinciding with the transition from the dry to wet season (the driest months over AMZ).
The temporal pattern of the performance of 5-day SM2RAIN-CCI at a monthly timescale is shown in Figure 6.Although a general agreement between observed and estimated monthly rainfall was clearly perceptible after 2010, 5-day SM2RAIN-CCI tend to overestimate the amount of rainfall during the dry season, especially in the AMZ, CER, and CAAT biomes.Based on the abovementioned results, we could infer that 5-day SM2RAIN-CCI reproduces the seasonality of rainfall reasonably well (poorly) in the CAAT and CER (AMZ and MAT) biomes.We further evaluated the performance of 5-day SM2RAIN-CCI at a monthly scale to investigate the seasonal variation within these biomes.From Table 6, one can see that the performance of 5-day SM2RAIN-CCI was relatively comparable in the CAAT and CER biomes.In terms of correlation, 5-day SM2RAIN-CCI showed R values greater than 0.80 during the transition from the dry to wet season.Table 6 also reveals higher underestimation (overestimation) of rainfall during the transition from dry to wet season (the driest months), particularly between November and February (July to September).In addition, 5-day SM2RAIN-CCI showed lower performance in the AMZ and MAT biomes, characterized by a general underestimation (overestimation) within the MAT (AMZ) biome, particularly from November to June (July to October), coinciding with the transition from the dry to wet season (the driest months over AMZ).

Daily Performance of SM2RAIN-CCI and the State-of-the-Art Rainfall Datasets in the NEB Biomes
To evaluate the reliability of the daily rainfall estimates from the SM2RAIN-CCI, CHIRPS, and MSWEP datasets, a comparison analysis using the GBGR dataset as benchmark was implemented for each biome during the period 1998-2015. As already mentioned, the evaluation of CMORPH was limited to the period 2002-2015 due to its temporal coverage (see Table 2). In this context, the continuous metric verification is a statistical quantification of the differences between the amount of rainfall from the SM2RAIN-CCI, CHIRPS, and MSWEP products and that from the GBGR dataset at the pixel scale. This included the R and B metrics. In addition, the categorical metrics described in Section 2.4 (i.e., POD, FAR, CSI, and ACC) determined how well the estimates can detect rain events.
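The continuous and categorical scores named above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, B is assumed to be the sum-based percent bias, and the 1 mm/day rain/no-rain threshold is an assumed value (the actual definitions are those of Section 2.4).

```python
import numpy as np

def continuous_scores(est, obs):
    """Return Pearson R, percent bias B (%), and RMSE (mm) for paired series."""
    est = np.asarray(est, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(est, obs)[0, 1]
    # Assumed definition: relative difference of the totals, in percent.
    bias = 100.0 * (est.sum() - obs.sum()) / obs.sum()
    rmse = np.sqrt(np.mean((est - obs) ** 2))
    return r, bias, rmse

def categorical_scores(est, obs, threshold=1.0):
    """Return POD, FAR, CSI, and ACC from a 2x2 contingency table.

    threshold (mm/day) is an assumed cutoff separating rain from no rain.
    """
    est_rain = np.asarray(est, dtype=float) >= threshold
    obs_rain = np.asarray(obs, dtype=float) >= threshold
    hits = np.sum(est_rain & obs_rain)
    misses = np.sum(~est_rain & obs_rain)
    false_alarms = np.sum(est_rain & ~obs_rain)
    correct_negatives = np.sum(~est_rain & ~obs_rain)
    pod = hits / (hits + misses)                  # probability of detection
    far = false_alarms / (hits + false_alarms)    # false alarm ratio
    csi = hits / (hits + misses + false_alarms)   # critical success index
    acc = (hits + correct_negatives) / est_rain.size  # accuracy
    return pod, far, csi, acc
```

Applied per pixel and then summarized by the median over each biome, such functions would yield scores of the kind reported in the tables of this section.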

Seasonal and Regional Daily Analysis for SM2RAIN-CCI
Boxplots of R and B values between SM2RAIN-CCI and GBGR grouped by month and biome are shown in Figures 7 and 8, respectively. In terms of correlation, the SM2RAIN-CCI product showed relatively poor performance, particularly over the AMZ and MAT biomes (median R: 0.17 and 0.32). Furthermore, results disclose a negative bias in AMZ (median B: −0.65%) and MAT (median B: −20.70%), whereas it is positive in CER (median B: 12.80%) and CAAT (median B: 9.15%), suggesting an overestimation (underestimation) of the rainfall amount in much of the semiarid (wet) region.

Figure 8 shows that the SM2RAIN-CCI product tended to underestimate (overestimate) daily rainfall relative to ground-based estimates during the rainy (dry) season in all biomes, although this feature was most clearly noticeable in AMZ and CER. It also revealed that R is sensitive to the daily amount of rainfall, especially in the CER biome. Note that the R values followed a decreasing trend when a decrease in the amount of rainfall was observed, implying that its quantitative performance varies seasonally, independent of the biome. On the other hand, SM2RAIN-CCI showed global median values for POD, FAR, CSI, and ACC equal to 0.89, 0.09, 0.83, and 0.87, respectively. In terms of POD, SM2RAIN-CCI exhibited good performance over the CAAT and MAT biomes (median POD: 0.95 and 0.90, respectively). The ACC and CSI were also higher in these biomes, with median values of 0.90 and 0.87 for CAAT, and 0.81 and 0.75 for MAT, respectively (Table 7). The ability to detect rainy events using the SM2RAIN-CCI product showed low sensitivity to the amount of rainfall, except in the AMZ biome, where the POD and the amount of rainfall vary linearly through the year.
Table 7. Probability of detection (POD), false alarm ratio (FAR), critical success index (CSI), and accuracy (ACC) for the daily rainfall estimates derived from SM2RAIN-CCI against observations from the GBGR data over the main biomes of NEB during the period 1998-2015. For each score, the median value is reported.
The performance of CHIRPS based on continuous metrics, grouped by biome and month, is summarized in Figures 9 and 10. The R values were relatively comparable in all biomes, with median values of 0.52, 0.53, 0.53, and 0.41 for AMZ, CAAT, CER, and MAT, respectively. A similar pattern was found for the B score, but it revealed a slight overestimation (underestimation) of the amount of rainfall in the AMZ and CER (CAAT and MAT) biomes, with median values of B equal to 5.30% and 0.30% (−1.60% and −2.50%), respectively. Moreover, results also indicated that R and B were very sensitive to the amount of rainfall. As can be seen in Figure 9, the R values are persistently low for low rainfall observations, whereas B tended to exhibit significant negative values during the dry season, particularly in the CAAT and CER biomes from June to September (Figure 10).
The scores listed in Table 8 revealed good agreement between the benchmark and CHIRPS datasets in terms of the detection of rain events. Overall, CHIRPS provided POD, CSI, and ACC median values higher than 0.80, and FAR median values lower than 0.06. Indeed, it produced POD (FAR) median values of 0.86, 0.94, 0.90, and 0.90 (0.11, 0.05, 0.08, and 0.11) for AMZ, CAAT, CER, and MAT, respectively. Note that a decreasing trend in the performance based on the POD, CSI, and ACC scores could be observed in the AMZ and CER biomes during the dry season (Table 8), highlighting the impact of the amount of rainfall on the ability to discriminate between rainfall events and non-rainfall events over those bioregions.
The MSWEP product showed relatively good performance in the estimation of daily rainfall amounts in all biomes of NEB (Figures 11 and 12). It yielded R (B) median values of 0.56, 0.66, 0.62, and 0.66 (−5.10%, −13.20%, −3.80%, and −8.50%) for the AMZ, CAAT, CER, and MAT biomes, respectively. Similar to SM2RAIN-CCI and CHIRPS, R and B exhibited moderate sensitivity to daily rainfall values. Visual inspection of Figure 11 shows that R median values are generally higher (lower) during the rainy (dry) season in all biomes. Furthermore, Figure 12 discloses a moderate underestimation (slight overestimation) of the amount of rainfall during the dry (rainy) season in all biomes, revealing a strong seasonal signal in the performance of the MSWEP product.
When the biomes were compared over the course of the year in terms of POD, FAR, CSI, and ACC (Table 9), MSWEP exhibited a similar performance in all biomes. In fact, it produced POD (FAR) median values of 0.92, 0.98, 0.94, and 0.94 (0.12, 0.04, 0.08, and 0.09) for the AMZ, CAAT, CER, and MAT biomes, respectively. The results from Table 9 reveal that MSWEP shows low sensitivity to daily rainfall values, except for the CER and AMZ biomes, where the detection of rainfall events was associated with a moderate uncertainty during the rainy season (i.e., from January to May).

Remote Sens. 2017, 9, x FOR PEER REVIEW 19 of 29

Seasonal and Regional Daily Analysis for CMORPH
The CMORPH product showed poor performance in the estimation of daily rainfall amounts in all biomes of NEB. From Figure 13 (Figure 14), we can see that it shows R (B) median values of 0.49, 0.55, 0.52, and 0.43 (−4.35%, −23.10%, 4.30%, and −67.50%) for the AMZ, CAAT, CER, and MAT biomes, respectively. CMORPH exhibited high sensitivity to daily rainfall values, particularly in the CER and AMZ biomes, where R (B) median values were generally higher (negative) during their rainy (dry) seasons. CMORPH showed B values that were persistently negative during the year in the MAT and CAAT biomes, revealing a strong underestimation of the amount of rainfall for both bioregions.
When the biomes were compared over the course of the year in terms of POD, FAR, CSI, and ACC, CMORPH exhibited a similar performance in all biomes (Table 10). Indeed, it yielded POD (FAR) median values of 0.93, 0.98, 0.94, and 0.99 (0.13, 0.05, 0.10, and 0.13) for the AMZ, CAAT, CER, and MAT biomes, respectively. The metrics listed in Table 10 suggest that the ability of CMORPH to detect rainfall events was not affected by the amount of daily rainfall, except for the CER and AMZ biomes, where the detection of rainfall events at the beginning of the year (i.e., from January to April) was often deficient.
To examine the effect of topography on the performance of the rainfall products, a linear correlation analysis was applied between the scores (i.e., R, B, and POD) and elevation for each product at the pixel scale (Table 11). Results from this analysis suggested that the performance of all rainfall products was sensitive to elevation. In brief, the performance of SM2RAIN-CCI in terms of R and B was partially affected by orographic effects, particularly in the MAT biome, where B reflected moderate overestimation of the rain amount in regions with complex topography (R = 0.30). A similar behavior could be seen in the CER biome for CHIRPS. In this case, B revealed underestimation of the rain amount (R = −0.13). MSWEP also tended to underestimate the rain amount in mountainous regions in all biomes of NEB. Conversely, CMORPH underestimated the rain amount in the plains, whereas it overestimated it towards the mountains, particularly in the AMZ and MAT biomes. Finally, results showed that the rain detection ability of SM2RAIN-CCI, CHIRPS, and MSWEP tended to decrease in regions with complex topography.
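The score-versus-elevation analysis can be illustrated with a short sketch. It assumes that each pixel carries one score value and one elevation value, and that pixels with missing values (for example, masked cells) are dropped before computing the Pearson correlation. The function name and the masking step are assumptions for illustration, not details taken from the study.

```python
import numpy as np

def score_elevation_correlation(scores, elevation):
    """Pearson correlation between a per-pixel score and pixel elevation.

    Pixels where either value is NaN are excluded (assumed handling of gaps).
    """
    scores = np.asarray(scores, dtype=float)
    elevation = np.asarray(elevation, dtype=float)
    valid = np.isfinite(scores) & np.isfinite(elevation)
    return np.corrcoef(scores[valid], elevation[valid])[0, 1]
```

A positive coefficient computed on B, for instance, would indicate that the percent bias increases with elevation within the biome.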
For the sake of simplicity, Table 12 summarizes performance metrics considering all datasets grouped by biomes. In terms of R, MSWEP showed the best performance, whereas CMORPH and CHIRPS were relatively comparable. CHIRPS (CMORPH) provided very good (poor) performance in terms of B. All rainfall products showed a similar performance in terms of POD, FAR, CSI, and ACC. In fact, they provided a median POD higher than 0.90 in almost all biomes. With regard to the rainfall amount and the skill of detection of rainfall events, SM2RAIN-CCI performed quite well in the CAAT biome.
Finally, to evaluate the performance of the SM2RAIN-CCI, CHIRPS, MSWEP, and CMORPH products at a local scale, their rainfall estimates were compared to in situ rainfall measurements at the benchmark stations shown in Figure 1a. The bias, RMSE, and R statistics for each rainfall product are computed for each station. This comparison can be seen in Figure 15. For AMZ (Zé Doca), results showed that the CHIRPS, CMORPH, and MSWEP products tended to overestimate the in situ daily rainfall values (B values: 7.60%, 1.70%, and 1.30%, respectively), whereas SM2RAIN-CCI tended to underestimate these measurements (B value: −3.70%). The RMSE on daily rainfall for the four rainfall products was close to 11 mm. R was higher for MSWEP and CMORPH (≈0.50) than for CHIRPS (0.48) and SM2RAIN-CCI (0.31).
The comparison between the rainfall estimates and the in situ soil moisture (SM) measurements at Barbalha (located in CAAT; Figure 15m-p) revealed RMSE values close to 9 mm. In terms of R, a value of 0.31 was obtained for SM2RAIN-CCI, while CHIRPS, CMORPH, and MSWEP provided values of 0.57, 0.62, and 0.74, respectively. The four rainfall products underestimated the in situ rainfall (B values: −44%, −44%, −25%, and −3% for SM2RAIN-CCI, CMORPH, MSWEP, and CHIRPS, respectively). Regarding the MAT biome (Itiruçu; Figure 15e-h), results revealed that the four rainfall products underestimated the in situ daily rainfall. This underestimation was higher in the case of CMORPH and MSWEP (B values < −35%) than in the case of SM2RAIN-CCI (B value: −24%) and CHIRPS (B value: −15%). All rainfall products showed a low variability in terms of RMSE, ranging between 6.12 mm and 7.17 mm. Similar to Zé Doca (located in AMZ), R was higher for MSWEP and CMORPH (≈0.40) than for CHIRPS (0.38) and SM2RAIN-CCI (0.32).
For the CER biome (Barreiras; Figure 15i-l), the RMSE was similar for all rainfall products (≈7.00 mm). Results disclosed that the MSWEP, CMORPH, and CHIRPS products slightly overestimated the in situ rainfall (B values: 3.60%, 6.50%, and 7.70%, respectively), whereas SM2RAIN-CCI tended to underestimate these measurements (B value: −8.20%). All rainfall products captured the temporal dynamics of the in situ daily rainfall relatively well (R between 0.40 and 0.61).

Discussion
Several statistical metrics were used to evaluate the SM2RAIN-CCI, CHIRPS, MSWEP, and CMORPH products with respect to gridded rainfall observations (i.e., the GBGR dataset) and in situ rainfall measurements (i.e., those provided by INMET) in NEB. In general, these metrics reveal that 5-day SM2RAIN-CCI provided more precise rainfall estimates after 2002 (Table 5, Figure 4). This can be explained by the increase in available data after 2002. For instance, the Advanced Microwave Scanning Radiometer-Earth Observing System (AMSR-E) and the Advanced Scatterometer (ASCAT), among other passive/active microwave satellites, were put into orbit during this period, leading to more frequent satellite overpasses during a day [43,72], thus reducing the errors associated with the retrievals [73]. As expected, the most significant improvements could be noted after 2010 (Figure 6), corresponding to the start of operation of SMOS, which was launched in early November 2009 by the European Space Agency (ESA) [74]. This result agreed reasonably well with those shown by Reference [21], who obtained median values of R and RMSE from 0.52 to 0.63 and from 10.47 mm to 9.48 mm, respectively, for 5 days of accumulated rainfall and the same periods analyzed (i.e., 1998-2001, 2002-2006, and 2007-2013), but at a global scale.
The poor performance of 5-day SM2RAIN-CCI along the NEB coastline in comparison to the inland of NEB can be linked to the rainfall regime and its high topographic complexity (Table 1 and Figure 1b). Although further research is needed to ascertain the exact cause of this spatial discrepancy, results show that in terms of R, the 5-day SM2RAIN-CCI product provided relatively good performance in all biomes, but tended to underestimate the 5-day accumulated rainfall, particularly in the MAT and AMZ biomes (see Table 6), which are characterized by a wetter rainfall regime than inland (Figure 5). In this context, it is interesting to mention that References [75,76] found that the accuracy of SM retrievals depended on many physical factors such as, but not limited to, soil type, rainfall regime, presence of noise (Radio-Frequency Interference in the case of SMOS and SMAP), land cover, and complex topography. Moreover, Reference [63] reported that the SM2RAIN algorithm cannot adequately estimate the rainfall when the soil is close to saturation because the rainfall is not able to drive any SM variation as SM remains constant [37]. Thus, the lower performance obtained here for 5-day SM2RAIN-CCI could be due to the higher frequency of rainfall events, dense vegetation, and mountainous terrain in the NEB coastline (Figure 4).
A less evident factor that influenced the numerical performance of the 5-day SM2RAIN-CCI product was the seasonal variation of rainfall (Table 6). This product reproduced the rainfall dynamics relatively well during the transition from the dry to wet season in the CAAT and CER biomes (R > 0.80). Nevertheless, it showed higher underestimation (overestimation) of the rainfall amount during the transition from the dry to rainy season (the driest months). Regarding the AMZ and MAT biomes, it shows a general underestimation (overestimation) within the MAT (AMZ) biome, coinciding with the transition from the dry to wet season (the driest months over AMZ), confirming that it tended to fail under very dry environmental conditions, which was reflected as rainfall estimates with a positive bias. On this point, Reference [77] found that the SM retrieval algorithms tended to fail under dry environmental conditions, leading to SM values that were almost constant over time. As can be seen in Equation (1), the SM2RAIN algorithm is based on the temporal variation of the relative soil saturation; therefore, if this variable tends to be constant over time, the algorithm fails to estimate rainfall from SM values. Evidence of this is the overestimation observed during the driest months (see Table 6). As already mentioned, the poor performance obtained for 5-day SM2RAIN-CCI in AMZ and MAT during the wettest months is attributed to: (1) the difficulties of the SM2RAIN algorithm in reproducing the rainfall when the soil is close to saturation, and (2) the dominant presence of mountains and densely vegetated regions within these bioregions. Indeed, according to Reference [21], these physiographic conditions affected the SM data quality (and, hence, the SM2RAIN-derived rainfall).
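The dependence on the temporal variation of soil saturation can be made concrete with a small sketch of the commonly cited SM2RAIN inversion, p(t) ≈ Z* ds(t)/dt + a s(t)^b, which matches the variables defined for Equation (1). The finite-difference discretization, the midpoint saturation, the clipping of negative values, and the function name are assumptions made here for illustration; they are not taken from the SM2RAIN-CCI processing chain.

```python
import numpy as np

def sm2rain(s, dt, z_star, a, b):
    """Estimate rainfall between successive relative-saturation retrievals.

    s      : relative soil saturation series [-], values in [0, 1]
    dt     : time step between retrievals (e.g., days)
    z_star : soil water capacity [L] (calibrated in practice)
    a, b   : drainage parameters (calibrated in practice)
    """
    s = np.asarray(s, dtype=float)
    ds_dt = np.diff(s) / dt              # storage change term
    s_mid = 0.5 * (s[1:] + s[:-1])       # saturation at the interval midpoint
    p = z_star * ds_dt + a * s_mid ** b  # storage change plus drainage
    return np.clip(p, 0.0, None)         # negative rainfall is not physical
```

When s is constant, the storage term vanishes and the estimate collapses to the drainage term a s^b regardless of how much rain actually fell, which illustrates why the inversion loses skill when the retrieved soil moisture stops varying, whether near saturation or under very dry conditions.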
In light of the daily comparisons, we found that CMORPH performed rather poorly (Table 10, and Figures 13 and 14), coinciding with findings from previous studies [48,78], possibly because it was only based on rainfall estimates from microwave sensors and geostationary satellite infrared sensors. Interestingly, CMORPH performed relatively similarly to CHIRPS, particularly in terms of R and POD over the CER and AMZ biomes (Table 12), which is somewhat unexpected given the sophisticated interpolation scheme employed in generating the CHIRPS dataset [65,79]. A possible reason for this is that CHIRPS uses a low number of anchor stations during the stage of bias correction of CHPclim for these regions of the NEB [18].
It is worth mentioning that SM2RAIN-CCI produced a better performance in terms of R and B for the 5-day timescale than for the daily timescale in all biomes of NEB (Tables 6 and 7, and Figures 7 and 8), implying that daily estimates were more affected by systematic errors than 5-day estimates [28]. Thus, better results can potentially be obtained by applying a temporal aggregation to the SM2RAIN-CCI product. Similar to the 5-day SM2RAIN-CCI product, daily SM2RAIN-CCI should be used with caution during the dry season in the CER, AMZ, and CAAT biomes (see the percent bias from June to August in Figure 8) because the SM2RAIN algorithm failed to estimate the amount of rainfall under very dry conditions.
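The temporal aggregation mentioned above can be sketched as a simple accumulation of daily values. The use of non-overlapping windows and the dropping of an incomplete trailing window are assumptions for illustration, and the function name is hypothetical.

```python
import numpy as np

def accumulate(daily, window=5):
    """Sum a daily rainfall series over consecutive non-overlapping windows.

    Days left over after the last complete window are dropped.
    """
    daily = np.asarray(daily, dtype=float)
    n = (daily.size // window) * window  # length covered by complete windows
    return daily[:n].reshape(-1, window).sum(axis=1)
```

Applying the same accumulation to both the satellite estimates and the gauge-based reference before computing R and B would give scores that are directly comparable with the 5-day results.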
Moreover, CHIRPS performed reasonably well in all biomes (Table 8, and Figures 9 and 10) and its performance metrics were similar to those obtained by References [18,33] for the entire NEB. As expected, MSWEP provided good performance in almost all biomes (except CAAT) (Table 9, and Figures 11 and 12) because it took full advantage of the complementary nature of satellite and reanalysis data. One of the advantages of CHIRPS and MSWEP in comparison to SM2RAIN-CCI and CMORPH was that both have a gauge-based correction scheme [23,31], which significantly improved the estimation of the rainfall amount derived from those rainfall products. This fact would explain the relatively good performance of CHIRPS and MSWEP for most of the biomes (Table 12).
In terms of the detection of rainfall events, our results showed that the four rainfall products have a similar ability (median POD > 0.90). Nevertheless, this ability tended to decrease in those regions with a complex topography (Table 11), where a low density of rainfall gauges was observed (see Figure 2). This was in line with the findings from some previous studies [18,33]. In fact, it is a common limitation of satellite-based rainfall products, except those that include orographic corrections [23,28]. In addition, for the local scale, it is interesting to remark that the four rainfall products showed overestimation of lower daily rainfall values and underestimation of higher values, indicating that they fail to estimate the amount of rainfall under light rainfall or heavy rainfall conditions (Figure 15). This result is consistent with the above results and once again confirms that all rainfall products fail to estimate the amount of rainfall under very dry or very wet conditions.
The assessment of MSWEP, CHIRPS, and SM2RAIN-CCI described above was not completely independent, because the rainfall observations from the GBGR dataset, or those provided by INMET, could have been partially used in the development of MSWEP and CHIRPS for the bias correction of CHPclim [23,31] and in the calibration step of SM2RAIN-CCI [21]. In contrast, CMORPH is exclusively a satellite-based rainfall product [66]; therefore, this feature may partly explain its poor performance. Overall, results suggested that SM2RAIN-CCI could be useful for some operational purposes on a weekly scale within the CAAT biome, e.g., water irrigation planning [80].

Conclusions
Management of rainfall-related extreme events is critical to reducing vulnerability. Microwave-based satellite rainfall products offer an opportunity to assess rainfall-related extreme events for regions where rain-gauge stations are sparse, such as NEB. NEB is subject to frequent droughts; hence, accurate measurement of rainfall is vital for operational applications and water resource managers. The new SM2RAIN-CCI product provides rainfall data obtained from the inversion of the microwave-based satellite SM observations derived from the ESA-CCI project. This study set out to evaluate the performance of SM2RAIN-CCI satellite rainfall estimates against rain gauge observations under different bioclimatic conditions for the first time in NEB. The performance of SM2RAIN-CCI was also compared with that of three other state-of-the-art rainfall products (CHIRPS, CMORPH, and MSWEP) in order to assess its capability in rainfall amount estimation and the detection of rainfall events. The analysis was performed on a sub-regional scale at 0.5° spatial sampling with 5-day accumulated rainfall and daily rainfall during the period 1998-2015. Concerning the obtained results, the following conclusions could be drawn:
• 5-day SM2RAIN-CCI showed relatively good performance in rainfall estimation, especially after 2007 in all biomes (see Table 5, Figures 4 and 6).
• The reliability of rainfall products (i.e., CHIRPS, CMORPH, MSWEP, and SM2RAIN-CCI) was dependent on the topography (see Table 11) and bioclimatic conditions (see Table 12) of NEB. They provided R values higher than 0.40, with median values of B, POD, FAR, CSI, and ACC equal to 0.95, 0.06, 0.84, and 0.89, respectively.
• MSWEP performed substantially better than the other datasets for all biomes in terms of R, but CHIRPS performed better in terms of the estimation of rain amount (see Table 12).
• Despite the simplicity of the SM2RAIN algorithm, SM2RAIN-CCI performed quite well in the CER and CAAT biomes (see Figures 7 and 8, and Table 7).
• The performance of SM2RAIN-CCI suggested that the SM2RAIN algorithm tended to fail in very dry or very wet conditions (see Table 12).
This validation study revealed that 5-day SM2RAIN-CCI was reasonably suitable for some operational purposes on a weekly scale (e.g., water irrigation planning) in those poorly gauged regions in the semiarid areas of northeast Brazil. Future work will involve validation of the product at different spatial and temporal scales, as well as during drought and wet periods, for a complete understanding of its potential as soon as new updates and improvements of SM2RAIN-CCI are available.

Figure 2. The spatial distribution of rain gauges used in the generation of the GBGR dataset provided by the Brazilian Water Agency (ANA), the National Institute of Meteorology (INMET), and the Water and Electric Energy Department of São Paulo state (DAEE) within the NEB [61]. The number of rain gauges depicts the median value for each grid during the period 1998-2015. The area of each grid is equal to 0.25° × 0.25°. White cells depict grids without rain gauges.
p(t) is the estimated rainfall between two successive SM retrievals for the time step dt [L/T], Z* represents the water capacity of the soil layer [L], s(t) denotes the relative soil saturation [-], t is the time [T], and a and b are two parameters describing the nonlinearity between soil saturation and drainage. The parameters a, b, and Z* are estimated through calibration [21]. The SM2RAIN algorithm has the main limitation of not being able to estimate rainfall when the soil is close to


Figure 3. Flowchart of the summarized research design and method.


Figure 4 displays the spatiotemporal distribution of R and RMSE between 5-day SM2RAIN-CCI and the GBGR dataset during the periods 1998-2001, 2002-2006, and 2007-2013. Note that the blank area in Figure 4 is due to the use of a static mask to mask out periods with high frozen soil and snow probabilities, rainforest areas, and areas with high topographic complexity before applying the SM2RAIN algorithm to the ESA CCI SM dataset [21]. Five-day SM2RAIN-CCI rainfall estimates exhibited reasonably good agreement with GBGR, especially over the inland of Bahia (BA), Pernambuco (PE), and Ceará (CE) in terms of R and RMSE. The RMSE and R spatial patterns seemed to be related to the rainfall regimes. The lowest RMSE values are observed in semiarid regions characterized by a low rainfall regime and the presence of open flatland that might favor the accuracy of the satellite retrievals [71]. In agreement with the results reported in Table 5, the visual comparison also revealed better performance during the period 2007-2013. On the other hand, 5-day SM2RAIN-CCI showed lower performance along the NEB coastline, characterized by a wetter rainfall regime than inland.

Figure 4. Pearson correlation (top row) and root mean square error (bottom row) maps between the 5-day SM2RAIN-CCI and GBGR datasets for accumulated rainfall during the periods: (a,d) 1998-2001; (b,e) 2002-2006; and (c,f) 2007-2013. The NEB states are shown in Figure 1d. White cells over Maranhão (MA) depict gaps due to the application of a static mask used by the SM2RAIN-CCI product.

Figure 5. Boxplots for the mean monthly rainfall estimated by GBGR over the biomes: (a) AMZ; (b) MAT; (c) CER; and (d) CAAT during the period 1998-2015. The center line of each boxplot depicts the median value (50th percentile), and the box encompasses the 25th and 75th percentiles of the sample data. The whiskers extend from q1 − 1.5 × (q3 − q1) to q3 + 1.5 × (q3 − q1), where q1 and q3 are the 25th and 75th percentiles of the sample data, respectively.
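The boxplot components described in the Figure 5 caption can be illustrated with a minimal sketch. The function name `boxplot_summary`, the sample values, and the choice of the "inclusive" quartile interpolation are assumptions made for illustration; plotting tools differ in how they interpolate percentiles, so the numbers need not match those behind the published figures.

```python
from statistics import quantiles


def boxplot_summary(values):
    """Summarize a sample using the boxplot components named in the caption."""
    # Quartiles by linear interpolation between sorted data points
    # (the "inclusive" method is an assumption, not the paper's stated choice).
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Whisker limits as given in the caption: q1 - 1.5*(q3 - q1), q3 + 1.5*(q3 - q1).
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    # Values beyond the whisker limits are flagged as outliers in this sketch.
    outliers = [v for v in values if v < lower_limit or v > upper_limit]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_limit": lower_limit,
        "upper_limit": upper_limit,
        "outliers": outliers,
    }


# Hypothetical mean monthly rainfall values (mm), for illustration only.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
summary = boxplot_summary(sample)
print(summary)
```

In this hypothetical sample, the single very wet month falls outside the upper whisker limit and would be drawn as an outlier rather than stretching the whisker.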

Figure 7. Boxplots for the correlation coefficient obtained by comparing the daily rainfall estimates from SM2RAIN-CCI against those from the GBGR dataset, grouped per biome and month, during the period 1998-2015. The components of a boxplot are described in Figure 5.

Figure 8. As per Figure 7, but for the percent bias. The components of a boxplot are described in Figure 5.

Figure 9. As per Figure 7, but for the CHIRPS rainfall product.

Figure 10. As per Figure 8, but for the CHIRPS rainfall product.

Figure 11. As per Figure 7, but for the MSWEP rainfall product.

Figure 12. As per Figure 8, but for the MSWEP rainfall product.

Figure 13. As per Figure 7, but for the CMORPH rainfall product during the period 2002-2015.

Figure 14. As per Figure 8, but for the CMORPH rainfall product during the period 2002-2015.

Figure 15. Daily rainfall estimates from rainfall products against in situ daily rainfall for each benchmark station shown in Figure 1a during the period 1998-2015: (a-d) Site: Zé Doca, Biome: AMZ; (e-h) Site: Itiruçu, Biome: MAT; (i-l) Site: Barreiras, Biome: CER; and (m-p) Site: Barbalha, Biome: CAAT. The correlation coefficient (R), bias percent (B), and root mean square error (RMSE) for each comparison are shown. The orange line indicates 1:1 correspondence and the red line gives the linear regression best fit. The benchmark station data were provided by the National Institute of Meteorology (INMET).
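The three scores annotated in each panel of Figure 15 can be sketched as follows. This is a minimal illustration that assumes the commonly used definitions (Pearson correlation, percent bias as the sum of differences relative to the sum of observations, and RMSE over paired values); the paper's own formulas are not reproduced in this section, and the function name `continuous_metrics` and the sample values are hypothetical.

```python
import math


def continuous_metrics(estimated, observed):
    """Return (R, percent bias, RMSE) for paired rainfall series."""
    n = len(observed)
    mean_est = sum(estimated) / n
    mean_obs = sum(observed) / n
    # Pearson correlation coefficient R.
    cov = sum((e - mean_est) * (o - mean_obs) for e, o in zip(estimated, observed))
    var_est = sum((e - mean_est) ** 2 for e in estimated)
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    r = cov / math.sqrt(var_est * var_obs)
    # Percent bias B (assumed definition): total difference relative to total observed.
    bias_percent = 100.0 * sum(e - o for e, o in zip(estimated, observed)) / sum(observed)
    # Root mean square error, in the units of the input (e.g., mm).
    rmse = math.sqrt(sum((e - o) ** 2 for e, o in zip(estimated, observed)) / n)
    return r, bias_percent, rmse


# Hypothetical daily rainfall (mm): satellite estimate vs. gauge observation.
est = [2.0, 4.0, 6.0]
obs = [1.0, 2.0, 3.0]
r, b, rmse = continuous_metrics(est, obs)
print(r, b, rmse)
```

The hypothetical series is perfectly correlated yet strongly biased, which is why R and B are reported side by side: a product can track rainfall variability well while still overestimating or underestimating the amounts.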

periods against the GPCC-FDD gauge-based product [21]. Different calibration periods are used because different data and sensors were used to build the Active and Passive SM datasets. The SM2RAIN-CCI data cover land areas from 1 January 1998 to 31 December 2015 at a 0.25° spatial resolution on a daily basis (available online at https://zenodo.org/record/846260). The same SM2RAIN-CCI product as considered in Reference [21] is employed here.

Table 2. Main features of the datasets considered.

Table 4. Formulas of categorical metrics, where A: number of hits, B: number of false alarms, C: number of misses, D: number of correct negatives, and N: number of events.
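The categorical metrics built from these counts can be sketched as below. The formulas used are the standard contingency-table definitions of POD, FAR, CSI, and ACC, assumed here because the table body is not reproduced in this section. The function names and the rain/no-rain threshold of 1.0 mm/day are hypothetical choices for illustration, not values taken from the paper.

```python
def contingency_counts(estimated, observed, threshold=1.0):
    """Count hits (A), false alarms (B), misses (C), and correct negatives (D).

    The threshold separating rain from no-rain days is a hypothetical value.
    """
    a = b = c = d = 0
    for est, obs in zip(estimated, observed):
        est_rain = est >= threshold
        obs_rain = obs >= threshold
        if est_rain and obs_rain:
            a += 1  # hit
        elif est_rain and not obs_rain:
            b += 1  # false alarm
        elif obs_rain:
            c += 1  # miss
        else:
            d += 1  # correct negative
    return a, b, c, d


def categorical_metrics(a, b, c, d):
    """Standard categorical scores; NaN is returned when a denominator is zero."""
    n = a + b + c + d
    nan = float("nan")
    return {
        "POD": a / (a + c) if (a + c) else nan,          # probability of detection
        "FAR": b / (a + b) if (a + b) else nan,          # false alarm ratio
        "CSI": a / (a + b + c) if (a + b + c) else nan,  # critical success index
        "ACC": (a + d) / n if n else nan,                # accuracy
    }


# Hypothetical daily rainfall (mm): product estimate vs. gauge observation.
est = [0.0, 5.0, 2.0, 0.0, 3.0, 0.0]
obs = [0.0, 4.0, 0.0, 1.5, 6.0, 0.0]
counts = contingency_counts(est, obs)
scores = categorical_metrics(*counts)
print(counts, scores)
```

POD and FAR are best read together: a product that reports rain nearly every day would reach a high POD at the cost of a high FAR, which CSI penalizes.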

Table 5. Correlation coefficients (R) and root mean square error (RMSE) for the accumulated rainfall estimates derived from 5-day SM2RAIN-CCI against the GBGR dataset during the three calibration periods 1998-2001, 2002-2006, and 2007-2013. The median, mean, minimum, maximum, and standard deviation values for each performance metric are shown.

Table 6. Correlation coefficients (R), mean error (ME), mean absolute error (MAE), and percent bias (B) for the 5-day accumulated rainfall estimates derived from SM2RAIN-CCI against observations from the GBGR data over the main biomes of NEB during the period 1998-2015. The monthly rainfall average (MRA) is based on the GBGR data. For each score, the median value is reported.

Table 7. Probability of detection (POD), false alarm ratio (FAR), critical success index (CSI), and accuracy (ACC) for the daily rainfall estimates derived from SM2RAIN-CCI against observations from the GBGR data over the main biomes of NEB during the period 1998-2015. For each score, the median value is reported.

Table 8. As per Table 7, but for the CHIRPS rainfall product.

Table 9. As per Table 7, but for the MSWEP rainfall product.

Table 10. As per Table 7, but for the CMORPH rainfall product during the period 2002-2015.

Table 11. Correlation coefficients for the R, B, and POD scores against elevation at the pixel scale for each rainfall dataset. * Not significant at the 95% level of significance based on the t-test statistic.

Table 12. Comparison among correlation coefficient (R), percent bias (B), probability of detection (POD), false alarm ratio (FAR), critical success index (CSI), and accuracy (ACC) for the daily rainfall estimates derived from SM2RAIN-CCI, CHIRPS, MSWEP, and CMORPH against observations from the GBGR data over the biomes of NEB. For each score, the median value is reported.