Assessment of Gridded CRU TS Data for Long-Term Climatic Water Balance Monitoring over the São Francisco Watershed, Brazil

Understanding the long-term behavior of rainfall and potential evapotranspiration (PET) over watersheds is crucial for the monitoring of hydrometeorological processes and climate change at the regional scale. The São Francisco watershed (SFW) in Brazil is an important hydrological system that transports water from humid regions throughout the Brazilian semiarid region. However, long-term, gapless meteorological data with good spatial coverage in the region are not available. Thus, gridded datasets, such as the Climatic Research Unit Time-Series (CRU TS), can be used as alternative sources of information, if carefully validated beforehand. The objective of this study was to assess CRU TS (v4.02) rainfall and PET data over the SFW, and to evaluate their long-term (1942–2016) climatological aspects. Point-based measurements retrieved from rain gauges and meteorological stations of national agencies were used for validation. Overall, rainfall and PET gridded data correlated well with point-based observations (r = 0.87 and r = 0.89), with a poorer performance in the lower (semiarid) portion of the SFW (r ranging from 0.50 to 0.70 in individual stations). Increasing PET trends throughout the entire SFW and decreasing rainfall trends in areas surrounding the semiarid SFW were detected in both gridded (smoother slopes) and observational (steeper slopes) datasets. This study provides users with prior information on the accuracy of long-term CRU TS rainfall and PET estimates over the SFW.


Introduction
Precipitation and potential evapotranspiration (PET) are among the most important meteorological variables related to the water and energy cycles that regulate climate worldwide. In a global climate change context, evidence indicates that long-term changes in these variables are leading to important impacts on agriculture, water resources management, and ecosystem dynamics in general [1][2][3][4]. Therefore, their consistent and reliable monitoring is crucial to understand past, present, and future climate behavior. This is particularly important in regions that are more vulnerable to climate change, such as arid and semiarid regions, which are sensitive to small shifts in precipitation and PET patterns [5].
Gridded monthly precipitation and PET data from the CRU TS version 4.02 were compared with observational point-based data measured in rain gauges and meteorological stations located in the SFW. The source, characteristics and quality control of the data will be described in the following sections. Data were compared over the period from 1942 to 2016 (75 years), which was defined based on observed data availability. A longer period could have been considered, but with less spatial representativeness and less data for validation.

CRU TS v4.02 Data
The CRU TS v4.02 dataset comprises the fourth version of the gridded product developed by the Climatic Research Unit of the University of East Anglia. It includes a number of variables (precipitation, mean temperature, vapor pressure, daily temperature range, PET, among others) arranged on a global (excluding Antarctica) 0.5° × 0.5° grid (~56 km) over the period from 1901 to 2017, obtained through the interpolation of monthly data retrieved from the archives of the World Meteorological Organization [10,13]. The CRU TS grid for the SFW is shown in Figure 1.
It is worth mentioning that the CRU TS dataset provides PET data calculated through the Penman-Monteith equation, which takes into consideration elements of the energy balance. Because of this, the proper evaluation of the product would be difficult to carry out, due to the lack of the necessary measured data (vapor pressure, cloud cover and static wind field) in the SFW. Thus, we decided to estimate Thornthwaite's PET from mean monthly temperature data.
It should be noted that we do not intend to discuss the pros and cons of Thornthwaite's method to estimate PET, as it has been thoroughly discussed in several studies [39][40][41]. Rather, we decided to use it as representation of potential atmospheric demand for water (in opposition to precipitated water) in broader climate terms. Indeed, these studies showed that the Penman-Monteith method undoubtedly retrieves better PET estimates, but temperature-derived PET (such as Thornthwaite's method) provides reliable estimates for long-term annual cycle studies or drought assessment studies based on the climatic water balance.

Point-Based Measurement Data
Monthly rainfall and mean air temperature data were obtained from rain gauges monitored by the National Waters Agency (ANA-Agência Nacional das Águas), meteorological stations of the National Institute of Meteorology (INMET-Instituto Nacional de Meteorologia), and surface stations compiled in the Global Historical Climatology Network (GHCN) database. All stations and gauges located within the boundaries of the SFW plus an additional buffer of up to 60 km (approximately equivalent to one CRU TS grid cell) were selected. A total of 171 point-based series were selected for the evaluation of rainfall data, and 57 series were selected for the evaluation of PET (Figure 1).


Thornthwaite's Potential Evapotranspiration
PET (mm) was derived from temperature values (°C) through Thornthwaite's classical method. Monthly PET was estimated based on temperature as follows [42,43]:

$$ET_p = 16\left(\frac{10\,T_m}{I}\right)^{a} \qquad (1)$$

where $ET_p$ is the noncorrected standard PET, $T_m$ is mean monthly air temperature (°C), and the parameters $I$ and $a$ are calculated as:

$$I = \sum_{i=1}^{12}\left(\frac{T_{m,i}}{5}\right)^{1.514} \qquad (2)$$

and,

$$a = 6.75\times10^{-7}\,I^{3} - 7.71\times10^{-5}\,I^{2} + 1.79\times10^{-2}\,I + 0.49 \qquad (3)$$

Atmosphere 2020, 11, 1207

Since $ET_p$ represents PET that would occur in the thermal conditions of a standard month with 30 days and a photoperiod of 12 h, it needs to be corrected for each month, as follows:

$$PET = ET_p \times \frac{ND}{30} \times \frac{N}{12} \qquad (4)$$

where $ND$ is the number of days in the corresponding month, and $N$ is the mean photoperiod (h) in that month. These correction factors can be found in tables presented in several studies, such as Thornthwaite's classical work [42].
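The calculation in Equations (1) to (4) can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names are hypothetical, the heat index $I$ is assumed to be computed from the same twelve monthly means passed to the function, and months with non-positive temperature are assumed to contribute zero PET (a case that does not arise in the SFW).

```python
import math


def heat_index(monthly_temps_c):
    """Annual heat index I (Eq. 2): sum of (T/5)^1.514 over the 12 monthly means."""
    # Assumption: months with T <= 0 contribute nothing to the index.
    return sum((t / 5.0) ** 1.514 for t in monthly_temps_c if t > 0)


def exponent_a(heat_idx):
    """Exponent a (Eq. 3), using the coefficients quoted in the text."""
    return (6.75e-7 * heat_idx ** 3 - 7.71e-5 * heat_idx ** 2
            + 1.79e-2 * heat_idx + 0.49)


def thornthwaite_pet(monthly_temps_c, days_in_month, photoperiod_h):
    """Corrected monthly PET in mm (Eqs. 1 and 4) for twelve months."""
    heat_idx = heat_index(monthly_temps_c)
    if heat_idx == 0:
        return [0.0] * len(monthly_temps_c)
    a = exponent_a(heat_idx)
    pet = []
    for t, nd, n in zip(monthly_temps_c, days_in_month, photoperiod_h):
        # Eq. 1: standard-month PET (30 days, 12 h photoperiod).
        et_p = 16.0 * (10.0 * t / heat_idx) ** a if t > 0 else 0.0
        # Eq. 4: correction for month length and mean photoperiod.
        pet.append(et_p * (nd / 30.0) * (n / 12.0))
    return pet
```

For a standard month (30 days, 12 h photoperiod) the correction factors equal one, so the function returns $ET_p$ unchanged; a 31-day month scales the value by 31/30.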

Observed Data Quality Control
Observed data went through careful quality control before being used as reference for the evaluation of the gridded dataset. Each time series was first screened by removing all monthly data outside the ±3.5 standard deviation threshold and all duplicated values in adjacent months [23,44]. Then, the homogeneity of the time series was assessed by means of the Standard Normal Homogeneity Test (SNHT) developed by Alexandersson [45]. The SNHT was used in order to identify potential rupture points in the series, which could indicate, for example, that the station has been displaced. For each observation $i$ two averages are calculated: one for the $k$ months before the observation $i$, $\mu_{1k,i}$; one for the $k$ months after the observation $i$, $\mu_{2k,i}$. The statistic of the SNHT ($T_i$) is then calculated as follows:

$$T_i = \frac{k}{\sigma_i^{2}}\left[\left(\mu_{1k,i} - \mu_i\right)^{2} + \left(\mu_{2k,i} - \mu_i\right)^{2}\right]$$

where $\mu_i$ is the mean between $\mu_{1k,i}$ and $\mu_{2k,i}$, and $\sigma_i$ is the standard deviation in the period of $k$ months before and after the observation $i$. If the peak values of $T_i$ for a given observation are higher than a threshold, these points are marked as potential rupture points in the time series.
Although the threshold value of $T_i$ should depend on the number of observations in each evaluated time series, studies show that it varies from 9 to 10.6 for series with 36 to 900 observations with a 95% confidence level [46]. Since monthly data in the present study comprise a period of 75 years (maximum of 900 observations), we adopted a $T_i$ threshold value of 10 for each time series. The value of $k$ was set to 60 (5 years) for temperature time series and 120 (10 years) for rainfall time series, since the latter are more heterogeneous and more variable.
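The moving-window SNHT screening described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the study's implementation: the function names are hypothetical, the input series is assumed to be gap-free, and $\sigma_i$ is assumed to be the population standard deviation of the $2k$ values pooled from both windows.

```python
import statistics


def snht_statistic(series, k):
    """Return T_i for every index with k values on each side (None elsewhere)."""
    n = len(series)
    t_values = [None] * n
    for i in range(k, n - k):
        before = series[i - k:i]          # k values preceding observation i
        after = series[i + 1:i + 1 + k]   # k values following observation i
        mu1 = statistics.fmean(before)
        mu2 = statistics.fmean(after)
        mu = (mu1 + mu2) / 2.0
        # Assumption: sigma_i is the pooled standard deviation of both windows.
        sigma = statistics.pstdev(before + after)
        if sigma == 0:
            t_values[i] = 0.0
            continue
        t_values[i] = k * ((mu1 - mu) ** 2 + (mu2 - mu) ** 2) / sigma ** 2
    return t_values


def rupture_candidates(t_values, threshold=10.0):
    """Indices whose T_i exceeds the threshold (10 in the text)."""
    return [i for i, t in enumerate(t_values) if t is not None and t > threshold]
```

With monthly data, `k=60` would correspond to the 5-year window used for temperature and `k=120` to the 10-year window used for rainfall. The flagged indices are only candidates; in the text, the final decision is made by visual inspection.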
Finally, each time series was visually analyzed with the aid of the $T_i$ statistic, removing nonhomogeneous portions of the series. Overall, we retained the most recent portion of the series, i.e., values measured after the detection of a rupture point, as is conventionally done in similar studies [47]. Figure 2 shows an example of the analysis using the SNHT for one rainfall time series.

Percentage of gaps was not considered as an exclusion criterion, and therefore, no gap filling procedure was carried out. The objective was to compare CRU TS data with as many reliable and consistent available point-based data as possible. The description and characterization of all observed data used in the study are fully presented in the Supplementary Materials (Tables S1-S3; Figures S1 and S2), including source, latitude, longitude, elevation, total number of observations, percentage of gaps, and data removed in each step of the quality control.

Spatial and Temporal Units of Analysis
The entire study period comprised 75 years from 1942 to 2016, as previously mentioned. However, CRU TS data were also evaluated in three distinct 25-year periods: 1942-1966, 1967-1991, and 1992-2016. Such a procedure was carried out in order to measure the accuracy of the gridded dataset throughout the whole time series, verifying if data were better or worse represented in a particular period. Intervals of 25 years were chosen because they represent the closest interval to climatological normals, into which the full 75-year time series could be divided.
The evaluation was carried out for the entire SFW considering the four previously mentioned subregions: USF, MSF, LMSF, and LSF (Figure 1). By doing so, it was possible to identify whether there is a particular region under specific climate conditions where CRU TS data are more or less accurate. To that end, Table 1 shows the main climate features of each subregion and also the number and density of stations selected for evaluation. Previous studies showed that station elevation can be an important source of uncertainty in CRU TS rainfall data [7], particularly above 3800 m. Since the maximum altitude in the SFW is approximately 1500 m (Figure 1) and the topography is generally homogeneous, no specific evaluation regarding altitude was carried out.

Accuracy Measurement
The Pearson's correlation coefficient (r) was calculated between each CRU TS grid and each station within that grid, if any. In some cases, multiple stations were located within the same grid (as seen in Figure 1). In these situations, r was calculated for each station individually in order to identify how well CRU TS grids capture small scale nuances and variability in observed data due to spatial heterogeneity. The correlation between all available data was also computed for each subregion and for each period.
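The station-to-grid pairing and the correlation described above can be sketched as follows. This is a minimal sketch with hypothetical function names. It assumes the CRU TS 0.5° cells are centered at ±0.25° and ±0.75° offsets, that missing months are encoded as `None`, and that only months available in both series enter the calculation.

```python
import math


def cru_cell_center(lat, lon, res=0.5):
    """Center (lat, lon) of the res-degree grid cell containing a station."""
    return (math.floor(lat / res) * res + res / 2.0,
            math.floor(lon / res) * res + res / 2.0)


def pearson_r(grid_series, station_series):
    """Pearson's r over months where both series have a value (None = gap)."""
    pairs = [(g, s) for g, s in zip(grid_series, station_series)
             if g is not None and s is not None]
    n = len(pairs)
    if n < 3:
        return None  # too few common months for a meaningful correlation
    mean_g = sum(g for g, _ in pairs) / n
    mean_s = sum(s for _, s in pairs) / n
    s_gg = sum((g - mean_g) ** 2 for g, _ in pairs)
    s_ss = sum((s - mean_s) ** 2 for _, s in pairs)
    s_gs = sum((g - mean_g) * (s - mean_s) for g, s in pairs)
    if s_gg == 0 or s_ss == 0:
        return None  # a constant series has no defined correlation
    return s_gs / math.sqrt(s_gg * s_ss)
```

When several stations fall in the same cell, `pearson_r` would simply be called once per station against the same grid series, which matches the station-by-station evaluation described in the text.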
Furthermore, the monthly root mean square error (RMSE) and percent bias (PBIAS) were calculated considering the mean CRU TS and point-based rainfall and PET time series in each subregion (mean between all grids or stations within each subregion). The monthly reliability of the CRU TS datasets in each 25-year period was also evaluated based on the relative RMSE value. When RMSE is less than 50% of the mean observed value in a given month, the CRU TS estimates are considered reliable in relative terms [48,49].
The RMSE and the PBIAS were computed following [50], where $E_{CRU}$ refers to CRU TS data, $O$ refers to point-based observed values, and $n$ is the total number of compared pairs.
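The accuracy metrics and the relative-RMSE reliability rule can be sketched as follows. This is an illustrative sketch with hypothetical function names. The sign convention for PBIAS is an assumption (positive values indicate overestimation by CRU TS); the convention in [50] may differ.

```python
import math


def rmse(estimated, observed):
    """Root mean square error between paired CRU TS and observed values."""
    n = len(observed)
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(estimated, observed)) / n)


def pbias(estimated, observed):
    """Percent bias; assumed convention: positive means CRU TS overestimates."""
    return 100.0 * sum(e - o for e, o in zip(estimated, observed)) / sum(observed)


def is_reliable(estimated, observed, max_fraction=0.5):
    """Relative criterion from the text: RMSE below 50% of the observed mean."""
    mean_observed = sum(observed) / len(observed)
    return rmse(estimated, observed) < max_fraction * mean_observed
```

Because the criterion is relative, months with a very small observed mean can be flagged as unreliable even when the absolute error is only a few millimetres.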

Trend Test and Change Point Detection
Trends in CRU TS rainfall and PET data were compared with trends in point-based data, considering the mean time series in each subregion. Series were pre-whitened (12-month moving average) in order to remove autocorrelation, and then linear trends were estimated through the least squares method [51]. The significance of the trend slopes was calculated through Student's t-test (p < 0.01). Furthermore, the Pettitt change point detection test [52] was used to identify the point in time at which a significant shift in the central tendency of the time series occurred, which was also compared between gridded and point-based measurements. When applied to climate data, Pettitt's test indicates the approximate period when a significant change is observed in the mean behavior of a given meteorological variable. We also compared the slope of the trend in each grid cell of the CRU TS dataset with point-based trends in observational data. In this case, we selected only one station per grid point with a significant trend (p < 0.01), and only stations with consistent data throughout the entire studied period.

Overall Spatial and Temporal Performance
Figure 3 shows an overall strong correlation between CRU rainfall data and surface observations during the entire 75 years of the time series (r = 0.87). Individual correlations were mostly higher than 0.80 throughout the USF, MSF, and part of the LMSF. Only the transition zone between the LMSF (semiarid region) and the LSF (tropical coastal region) presented a few stations with which CRU data were weakly or moderately correlated (between 0.50 and 0.70).
Regarding the three 25-year periods, a noticeable increase in available stations can be observed from 1967 onwards. Overall, rainfall was best represented by the CRU TS dataset during the period from 1967-1991, when correlations below 0.80 were observed with only 12 out of the 168 stations. On the other hand, the most recent period (1992-2016) was when CRU TS data presented the weakest correlation (between 0.50 and 0.80) with almost all stations located in the LMSF and the LSF, although overall correlation was high (r = 0.87). Throughout the three periods, CRU TS data was mostly strongly correlated with observed data in the USF and MSF.
These results are further detailed in Figure 4, which shows the scatter plot of CRU TS rainfall data and observed data in each subregion of the SFW and for each 25-year period. It can be noted that in all plots CRU data seem to frequently underestimate higher monthly rainfall values. Another overall observation is that increasing the sample size did not necessarily lead to a better fit of the linear relationship between datasets.

Figure 3. [...] (1942-1966; 1967-1991; 1992-2016). Overall correlation (r) and total number of stations (n) in each period are shown at the top of each map. USF: upper; MSF: middle; LMSF: lower-middle; LSF: lower São Francisco watershed.
As previously hinted in Figure 3, the highest slope coefficients of the scatter plot fit lines were found for the period from 1967-1991 (0.85; 0.90; 0.78; 0.91 for the USF, MSF, LMSF, and LSF, respectively). This means that in this period, the CRU TS rainfall estimates were less biased, and more correlated with observed data, as evidenced by the correlation coefficients. The lowest slope coefficients (more frequent underestimations), however, were found in the most recent period, 1992-2016 (0.83; 0.81; 0.60; 0.72 for the USF, MSF, LMSF, and LSF, respectively), despite a larger available sample during these years.
It is also worth mentioning that the differences between periods for the USF and MSF are much less prominent than for the LMSF and LSF.
Regarding PET values calculated using the CRU TS temperature dataset, results appear to be more consistent both spatially and temporally, despite a few isolated poorly correlated stations in 1942-1966 and 1967-1991. Figure 5 shows that both in the 75-year period and in the three 25-year subperiods, CRU TS data were strongly correlated with most available stations (r > 0.80). Furthermore, good relationships between datasets are found throughout the entire SFW, with no particular subregion presenting a weaker or stronger correlation pattern. This result was expected, since temperature, and therefore Thornthwaite's PET, is less variable in the region.
The scatter plot between CRU TS temperature-derived PET and observed data ( Figure 6) shows that the gridded dataset tends to more frequently underestimate high PET values, while low PET values are slightly overestimated. This pattern is observed across all analyzed 25-year periods and all subregions. It is also evident that low PET values in the LMSF (semiarid region) are more overestimated than in other regions. For example, the slope coefficient in the LMSF during 1992-2016 was 0.66, while in the USF, MSF, and LSF it was 0.74, 0.68, and 0.72, respectively.
Figure 5. [...] (1942-1966; 1967-1991; 1992-2016). Overall correlation (r) and total number of stations (n) in each period are shown at the top of each map. USF: upper; MSF: middle; LMSF: lower-middle; LSF: lower São Francisco watershed.

In contrast to what was observed with rainfall data, correlation is higher during the first 25-year period in the USF and LSF (r = 0.90), with the regression line fitting the 1:1 ratio better. During the two following periods, the slope of the scatter plot fit line does not seem to have changed much in any subregion, even with the increase in sample size. Nevertheless, the correlation coefficient in the last period (1992-2016) is similar to or higher than in the first period in all subregions, except for the LMSF.

Seasonal Performance
The seasonal performance of the CRU TS datasets was assessed based on the analysis of monthly RMSE and PBIAS. Regarding rainfall estimates, Figure 7 shows that the RMSE generally improves in the most recent periods of the 75-year time series in all subregions. Since the RMSE is scale-dependent, higher values are found during the wet months than during dry months. In the LMSF and LSF, RMSE was lower than 68 mm in all months during the period from 1967 until 2016. During dry months, RMSE varied from 4 mm (August 1942-1966 in the MSF) to 34 mm (December 1942-1966 in the LSF).

Regarding PET estimates, the behavior of monthly RMSE and PBIAS was more consistent and smaller in magnitude in all subregions and during all periods. RMSE ranged from 6.4 mm month⁻¹ in the LSF (August 1942-1966) to 26.8 mm month⁻¹ also in the LSF (October 1992-2016). PBIAS, on the other hand, was higher in the LMSF, with overestimations of up to 11.4% in July.
Finally, Table A1 (Appendix A, presented at the end of the manuscript before the references) shows the monthly observed and CRU TS rainfall mean (± standard deviation) for each 25-year period and each subregion. Furthermore, it indicates whether CRU TS estimates are reliable or not in each period based on the relative proportion of the RMSE in relation to the observed mean values. Overall, CRU TS rainfall estimates proved to be reliable in the wet season months of the USF and the MSF (approximately from October to March) in all periods, except for 1942-1966 in the MSF. For the LMSF and the LSF, the RMSE analysis indicated that the monthly rainfall estimates of the CRU TS dataset are unreliable, since the RMSE was higher than 50% of the observed mean values. This result, however, should be taken with care, since relative RMSE values in dry periods are expected to be high even when absolute differences are small. For example, in June 1942-1966, CRU TS rainfall was 3 ± 7 mm in the MSF, while observed rainfall was 2 ± 6 mm, which is clearly a close estimate in absolute terms despite a relative RMSE higher than 50%.
Similarly, Table A2 (Appendix A, presented at the end of the manuscript before the references) shows the monthly observed and CRU TS Thornthwaite's PET mean (± standard deviation) for each 25-year period and each subregion. In this case, the analysis of the RMSE indicated that the CRU TS product is reliable in all months, during the entire 75-year period and in all subregions.


Trends and Change-Point Comparisons
Firstly, we compared the trends and change-points detected in the smoothed monthly mean time series of rainfall and PET derived from the CRU TS dataset and observational data (Figure 8). For the USF, the same change-points were identified in both datasets: December 2012 and November 1993 for rainfall and PET, respectively. Furthermore, significant decreasing trends were found (Table 2) for both the entire rainfall time series (−1 mm decade⁻¹) and from December 2012 to 2016 (−14 and −17 mm decade⁻¹ for observed and CRU TS data, respectively). Regarding PET data, significant increasing trends were found in both datasets and in all portions of the time series (Figure 8 and Table 2). Indeed, the change rate in PET increased from 1 mm decade⁻¹ until 1993 to 3 mm decade⁻¹ in the last 25 years of the studied period. The mean PET in the two sections of the time series increased from 1092 mm (observational) and 1036 mm (CRU TS) to 1165 mm (observational) and 1122 mm (CRU TS).

In the MSF, an important change in behavior was identified by Pettitt's test in the observed time series in March 1964 (Figure 8 and Table 2). Indeed, rainfall values increased after this year but subsequently presented a significant decreasing trend of −2 mm decade⁻¹. For CRU TS rainfall data, a significant change-point was detected in October 1986, while the decreasing trend was significant only when considering the entire time series (−1 mm decade⁻¹). Regarding PET in the MSF, similar change-points were detected in the two datasets: July 1987 (observations) and July 1986 (CRU TS). Furthermore, significant increasing trends were found in both datasets, although in the observational data the overall change rate was 8 mm decade⁻¹ compared to only 2 mm decade⁻¹ for the CRU TS time series.

Still regarding Figure 8 and Table 2, similar change-points were also found in the LMSF rainfall time series: January 1964 and December 1963, for the observations and CRU TS estimates, respectively. Both portions of the time series presented significant decreasing trends: −4 and −3 mm decade⁻¹ for the observational data and −1 and −2 mm decade⁻¹ for CRU TS data. Therefore, rainfall derived from the CRU TS dataset appears to decrease more gradually than observed rainfall. Regarding PET data in the LMSF, no significant change-points were found, while a significant overall increasing trend was found in both datasets.
Finally, significant change-points at different periods were detected in the LSF (Figure 8 and Table 2): July 1979 and May 1995, for observed and CRU TS rainfall data, respectively. In these series, significant decreasing trends of −2 and −1 mm decade⁻¹ were found with a mean annual rainfall of 785 mm and 878 mm for observational and CRU TS data, respectively. At the same time, a marked significant increasing trend of 6 mm decade⁻¹ was found for observed PET data. This increase reached up to 11 mm decade⁻¹ in the period from May 1995 until the end of the series. CRU TS derived PET also presented significant increasing trends, but with a much smaller change rate (1 mm decade⁻¹).
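The smoothing, least-squares slope, and Pettitt change-point steps behind the values reported for each subregion can be sketched as follows. This is a minimal sketch with hypothetical function names, not the study's code. The p-value uses the standard approximation for Pettitt's test, the series is assumed to be gap-free, and the t-test on the slope is omitted for brevity.

```python
import math


def moving_average(series, window=12):
    """Simple moving average; returns len(series) - window + 1 values."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def ols_slope(values):
    """Least-squares slope per time step (multiply by 120 for a per-decade
    rate when the steps are months)."""
    n = len(values)
    t_mean = (n - 1) / 2.0
    v_mean = sum(values) / n
    s_tv = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    return s_tv / s_tt


def pettitt(series):
    """Pettitt test: (index of last value before the shift, K, approximate p)."""
    n = len(series)
    u = 0
    best_k, best_t = 0, 0
    for t in range(n - 1):
        # Incremental update: U_t = U_{t-1} + sum_j sign(x_t - x_j).
        u += sum((series[t] > v) - (series[t] < v) for v in series)
        if abs(u) > best_k:
            best_k, best_t = abs(u), t
    p_value = min(1.0, 2.0 * math.exp(-6.0 * best_k ** 2 / (n ** 3 + n ** 2)))
    return best_t, best_k, p_value
```

Splitting the series at the returned index and applying `ols_slope` to each portion would give portion-wise change rates of the kind compared between datasets in Table 2.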
The spatial distribution of the significant trend slopes (p < 0.01) of CRU TS data is shown in Figure 9. For rainfall, negative trends are found in most of the MSF, LSF and the northern portion of the USF. Additionally, positive trends are found in the southern portion of the USF. Overall, these results are similar to what was found with observed data, which also indicated negative trends in the same regions of the SFW, although with higher change rates. Furthermore, CRU TS data featured significant rainfall trends in large areas of the basin, while point-based significant trends are more heterogeneous and sparser.

Regarding Thornthwaite's PET derived from the CRU TS dataset, significant trends were found in the entire SFW, indicating an increase in the atmospheric demand for water. Indeed, trends in observational data were also mostly significant, mainly in the LMSF and LSF, while some stations in the MSF and USF presented non-significant trends. One station in the LMSF, near the Sobradinho reservoir, presented a negative PET trend, which was not captured by the CRU TS dataset. The magnitude of the change rate is also stronger in observed data than in CRU TS data, particularly in the LMSF and the LSF.

Discussion
In the first part of the study we assessed the overall spatial and temporal performance of the CRU TS dataset in representing monthly rainfall and PET (estimated from temperature data) over the SFW.
CRU TS rainfall data presented weaker correlation with observed data in the LMSF and the LSF, which is probably associated with the high spatial and temporal variability of precipitation in these regions. The LMSF is entirely located in the semiarid region of Brazil, while the LSF comprises a transition zone between a semiarid environment and the coastal zone, with a more humid climate [30]. Thus, interpolated datasets with a relatively coarse spatial resolution, such as the CRU TS (~57 km), are expected to perform worse in these regions. Previous studies have already reported a poorer performance of gridded datasets in regions of high rainfall variability, such as semiarid zones in Pakistan [53]. PET data, on the other hand, presented more consistent results in spatial terms. Indeed, the spatial and temporal patterns of temperature, and consequently of Thornthwaite's PET, are less variable [54,55].
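As an illustration of why temperature-derived PET inherits the smooth behavior of temperature, the following minimal sketch computes Thornthwaite's PET from monthly mean temperature alone. It assumes the standard Thornthwaite (1948) formulation with day-length and month-length corrections; the function name, its arguments, and the input values are hypothetical, and the exact variant applied to the CRU TS and station data may differ.

```python
def thornthwaite_pet(monthly_temp_c, day_length_h, days_in_month):
    """Monthly PET (mm) from 12 monthly mean temperatures (deg C).

    Assumed formulation: standard Thornthwaite (1948), without the
    separate high-temperature branch sometimes used above 26.5 deg C.
    """
    # Annual heat index; months at or below 0 deg C contribute nothing.
    heat_index = sum((max(t, 0.0) / 5.0) ** 1.514 for t in monthly_temp_c)
    if heat_index == 0.0:
        return [0.0] * len(monthly_temp_c)

    # Empirical exponent as a cubic function of the heat index.
    a = (6.75e-7 * heat_index ** 3
         - 7.71e-5 * heat_index ** 2
         + 1.792e-2 * heat_index
         + 0.49239)

    pet = []
    for t, hours, days in zip(monthly_temp_c, day_length_h, days_in_month):
        # Unadjusted PET for a standard month (30 days, 12 h of daylight).
        unadjusted = 16.0 * (10.0 * max(t, 0.0) / heat_index) ** a
        # Correct for actual day length and number of days in the month.
        pet.append(unadjusted * (hours / 12.0) * (days / 30.0))
    return pet


# Hypothetical inputs: a constant 25 deg C year and a 1 deg C warmer year.
base = thornthwaite_pet([25.0] * 12, [12.0] * 12, [30] * 12)
warm = thornthwaite_pet([26.0] * 12, [12.0] * 12, [30] * 12)
```

Because temperature is the only meteorological input, any smoothing or bias in the gridded temperature field propagates directly to the PET estimate.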
The scatter plots showed that CRU TS data generally underestimate high rainfall and PET values, while low PET values are usually overestimated. This is probably associated with the smoothing of data, which is inherent to the interpolation techniques used to produce gridded datasets [6,56].
Results showed that the best correlations between datasets were found in the period from 1967 to 1991, while the most recent period (1992-2016) presented slightly worse correlations. There are two potential explanations for this fact. Firstly, the density of the contributing stations with at least 75% of observations per decade for the development of the CRU TS product was higher in the second half of the 20th century than in the first decades of the 21st century [13]. A higher density of stations is expected to provide better interpolated estimates when compared with point-based observations. Secondly, as mentioned above, gridded datasets usually perform worse over semiarid and drier regions. The period from 1992 to 2016 comprised the worst drought event (2012-2016) registered in the semiarid region of Brazil [57,58], which may have also contributed to the poorer performance of the algorithm in this 25-year period, especially in the LMSF and LSF.
The seasonal performance of the CRU TS datasets was assessed based on the monthly RMSE and PBIAS. Since the RMSE is scale-dependent, it was proportional to the overall magnitude of the assessed variables. Thus, higher rainfall RMSE values are found during the wet season, while for PET the RMSE is rather stable throughout the year. The PBIAS, on the other hand, is much more sensitive to lower values (dry season) but indicates the direction of the monthly bias (underestimation or overestimation of observed values). Thus, high PBIAS values should be interpreted with caution for the dry season, while high RMSE values should be interpreted with caution for the wet season.
Results showed that CRU TS data generally overestimate rainfall during the dry season in the USF and MSF, while during the wet season estimations are reliable (RMSE < 50% of observed mean value). For Thornthwaite's PET, CRU TS slightly underestimates (maximum PBIAS of −10%) observations during all months in the USF, MSF, and LSF, although the estimates are reliable. Therefore, when using CRU TS data to monitor temperature and temperature-derived PET, one should note that actual atmospheric demand for water is slightly higher in these regions. In the LMSF, however, results are the opposite, with actual PET being slightly lower than CRU TS derived PET.
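The contrasting behavior of the two metrics can be sketched with a short example. The code below assumes the usual definitions of RMSE and PBIAS, with the sign convention implied above (negative PBIAS means underestimation), and applies the reliability criterion of RMSE below 50% of the observed mean. The function names and all rainfall values are hypothetical and serve only to illustrate the scale effects.

```python
import math


def rmse(est, obs):
    """Root mean square error, in the units of the variable."""
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(est, obs)) / len(obs))


def pbias(est, obs):
    """Percent bias; negative values mean the estimates are too low."""
    return 100.0 * sum(e - o for e, o in zip(est, obs)) / sum(obs)


def is_reliable(est, obs, threshold=0.5):
    """Reliability criterion: RMSE below a fraction of the observed mean."""
    mean_obs = sum(obs) / len(obs)
    return rmse(est, obs) < threshold * mean_obs


# Hypothetical monthly rainfall totals (mm) for one wet and one dry month
# across four years; these are not values from the study.
obs_wet = [180.0, 220.0, 150.0, 200.0]
est_wet = [170.0, 200.0, 160.0, 190.0]
obs_dry = [2.0, 5.0, 1.0, 4.0]
est_dry = [6.0, 8.0, 5.0, 7.0]

for label, est, obs in (("wet", est_wet, obs_wet), ("dry", est_dry, obs_dry)):
    print(label, round(rmse(est, obs), 1), round(pbias(est, obs), 1),
          is_reliable(est, obs))
```

In this synthetic case the dry month has the smaller RMSE in absolute terms but fails the reliability criterion and shows a very large PBIAS, whereas the wet month has the larger RMSE but a small relative bias.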
Monthly mean values of rainfall and PET derived from the CRU TS were comparable to those observed in surface stations. In the USF, higher rainfall rates were observed in the austral spring and summer (October to March), mainly due to the development of the South Atlantic Convergence Zone in the central portion of South America, encompassing the western and southern SFW [30,59,60]. In the MSF, a similar pattern was observed, but with smaller rainfall rates, since the region is larger and comprises the transition zone between the humid subtropical climate and the semiarid climate.
In the northern portion of the SFW, the LMSF is under the influence of a semiarid climate, where rainfall is modulated mainly by the position of the Intertropical Convergence Zone (ITCZ) from February to May [30,61], producing extreme rainfall events in that region [24]. The LSF, on the other hand, presents a rainfall regime typical of coastal zones in Northeast Brazil (Köppen's As, tropical with dry summer), with important rain events occurring from March to July, mainly due to the ITCZ, sea breezes, and the propagation of easterly wave disturbances [30,62,63]. Regarding PET, seasonal variations are similar in all subregions of the SFW, although lower values are found in the higher latitudes of the basin (USF and MSF).
For the assessment of long-term monitoring of the two components of the climatic water balance, trends and change-points of CRU TS data were compared to observed data. Regarding regional mean time series, a general decreasing trend in rainfall and an increasing trend in PET were observed in all subregions. Trends in both datasets presented the same sign, although observed data presented steeper slopes, indicating that the actual balance between precipitation and PET may favor higher atmospheric demand for water in the future. In fact, the study by Marengo et al. [34] already projected an increase of up to 1.5 °C in temperature and a decrease of up to 20% in rainfall over the SFW until 2040. These results were corroborated by De Jong et al. [64]. The smaller slopes in CRU TS data trends may also be related to the smoothing of data due to the interpolation procedure, which can be clearly visualized in Figure 8.
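The comparison of trend sign, significance, and slope between datasets can be sketched as follows. This passage does not name the trend estimator, so the example assumes a Mann-Kendall test (without tie correction) combined with Sen's slope, a common choice for hydroclimatic series. The function names are hypothetical, and the two annual PET series are synthetic, noise-free series built only to show a steep and a smoothed trend of the same sign.

```python
import math
from itertools import combinations
from statistics import median


def sens_slope(values):
    """Sen's slope: median of all pairwise slopes (units per time step)."""
    pairs = combinations(range(len(values)), 2)
    return median((values[j] - values[i]) / (j - i) for i, j in pairs)


def mann_kendall_z(values):
    """Mann-Kendall Z statistic, assuming no tied values in the series."""
    n = len(values)
    s = sum((values[j] > values[i]) - (values[j] < values[i])
            for i, j in combinations(range(n), 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def two_sided_p(z):
    """Two-sided p-value of Z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Synthetic annual PET series (mm), 30 years each; not data from the study.
pet_station = [1400.0 + 0.6 * year for year in range(30)]
pet_gridded = [1400.0 + 0.1 * year for year in range(30)]

for name, series in (("station", pet_station), ("gridded", pet_gridded)):
    slope_per_decade = 10.0 * sens_slope(series)  # mm per decade
    p_value = two_sided_p(mann_kendall_z(series))
    print(name, round(slope_per_decade, 2), p_value < 0.01)
```

Both synthetic series yield a significant increasing trend, but the gridded one has a much smaller slope, which mirrors the qualitative difference between observed and CRU TS trends discussed above.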
Remarkable positive trend slopes for PET time series were found in the MSF and LSF (6 mm decade⁻¹). Curiously, these are the regions surrounding the semiarid portion of the basin (LMSF), which may indicate an expansion of arid zones towards regions under more humid climates. Indeed, Dubreuil et al. [37] recently detected an expansion of the areas under semiarid climate in Brazil.
These results are further confirmed by the spatial distribution of CRU TS and point-based trend slopes presented in Figure 9. CRU TS data show clear significant decreasing trends in rainfall over the MSF and LSF and an increase in the atmospheric demand for water throughout the entire watershed. The significance of point-based measurements is more heterogeneous, which has been previously evidenced by Bezerra et al. [30]. It is also important to note that the negative rainfall trends observed in the LSF would not necessarily impact water availability in the São Francisco river, since this subregion is located near the river's mouth. These changes, however, may influence the occurrence of extreme rainfall events in the region, which has been detected in previous studies [60].
Regarding point-based PET trends, a single isolated station presented a decreasing trend. In fact, this station is located downstream of the Sobradinho reservoir and the many irrigated perimeters that developed after the construction of the dam in the early 1980s. Previous studies have already assessed PET trends in this particular region, finding similar results and suggesting that the massive extension of the dam's lake and the irrigated perimeters might have influenced local climate [33].

Conclusions
This study presented a thorough assessment of rainfall and PET (calculated through temperature data) derived from the CRU TS dataset in the SFW, an extremely important basin in Brazil that lacks long-term consistent observed data with good spatial coverage for these variables. The study comprised time series of 75 years, and the evaluation was also carried out considering three 25-year periods (1942-1966, 1967-1991, and 1992-2016) and four subregions of the basin (USF, MSF, LMSF, LSF) with different climate characteristics.
Overall, CRU TS data was strongly correlated with observational data, with maximum r equal to 0.88 in 1967-1991 for rainfall data and 0.89 in 1992-2016 for PET data. The spatial distribution of correlations indicated that the poorest performances (r ranging from 0.50 to 0.80) occurred in the LMSF and LSF regions, which are characterized by a semiarid climate and the transition to a humid coastal zone.
The evaluation of monthly estimates showed that estimations of Thornthwaite's PET obtained through CRU TS temperature data are reliable (RMSE < 50% of observed mean) in all months and in all subregions. Regarding rainfall data, CRU TS data was reliable for estimations derived during the wet months in the USF and MSF. In either case, results derived from analysis using these products must always be interpreted with caution. In fact, the present study strongly advocates that more in-depth assessment procedures should be carried out before using any gridded product for meteorological variables. Understanding the strengths and limitations of these datasets should be a first step in every study proposing their use.
The trends detected in both datasets pointed in the same direction: increasing PET in all subregions and decreasing rainfall mostly in the MSF and the LSF, the areas surrounding the semiarid portion of the SFW. However, the slope of the trends in observed data was steeper than in CRU TS data.
In general, CRU TS data are consistent with observed data and their use should be encouraged, given that the current condition of available observed data in the region (as evidenced in Supplementary Figures S1 and S2) seriously limits consistent long-term hydroclimatological studies. However, the assessment provided in the present study should always be taken into consideration when using CRU TS rainfall and temperature data over the SFW.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4433/11/11/1207/s1, Table S1: Summary of the main characteristics of the measuring stations selected in the study, Figure S1: Visual representation of available data and gaps in each time series of observational rainfall data after quality control was applied, Figure S2: Visual representation of available data and gaps in each time series of observational temperature (Thornthwaite's potential evapotranspiration) data after quality control was applied, Table S2: Summary of observational rainfall data availability after each step of the quality control procedure at the seasonal scale, Table S3: Summary of observational temperature (Thornthwaite's potential evapotranspiration) data availability after each step of the quality control procedure at the seasonal scale.

Acknowledgments: The authors are thankful to the French Ministry of Foreign Affairs for the "Eiffel Scholarship of Excellence" also granted to the first author. The authors are also thankful to the CRU of the University of East Anglia for developing and providing the gridded dataset, and to the INMET, ANA, and GHCN for providing station data.

Conflicts of Interest: The authors declare no conflict of interest.

Table A1. Monthly mean (µ) ± standard deviation (σ) of CRU TS rainfall estimations and observed data in the: USF-upper; MSF-middle; LMSF-lower-middle; LSF-lower São Francisco watershed. The reliability of CRU TS estimates in each month is also shown based on whether monthly root mean square error is less than 50% of the mean observed value (where indicates reliable results and indicates nonreliable results).