Evaluating Precipitation Estimates from Eta, TRMM and CHIRPS Data in the South-Southeast Region of Minas Gerais State, Brazil

Precipitation estimates derived from the Eta model and from TRMM (Tropical Rainfall Measuring Mission) and CHIRPS (Climate Hazards Group InfraRed Precipitation with Station data) remotely sensed data were compared to the precipitation data of the INMET (National Institute of Meteorology) meteorological stations in the south-southeast region of Minas Gerais state, Brazil, in the period between July 2009 and June 2015. Then, information about evapotranspiration (ETR), water deficit (DEF), and water surplus (EXC) was obtained from the precipitation data, using the sequential water balance (SWB) separately for each type of precipitation data (INMET, TRMM, Eta, and CHIRPS). Subsequently, the components of the SWB were comparatively analyzed. The results indicate that all three products overestimate rainfall. The strongest relationships between the INMET data and the estimated data were observed for the TRMM, in terms of precipitation estimates as well as the DEF, EXC, and ETR components. The Eta precipitation estimates are overestimated relative to those from INMET, resulting in underestimation of the water deficit (DEF_ETA) and overestimation of evapotranspiration (ETR_ETA). In general, the CHIRPS data presented a pattern similar to the station data, although its statistical performance was lower than that of the TRMM data.


Introduction
Precipitation is one of the main meteorological variables affecting soil preparation and the harvest, transportation, growth, development, and productivity of agricultural crops, in addition to having a large impact on water and energy resources [1]. In this context, the information on meteorological conditions provided by ground stations is of utmost importance for planning agricultural activities, monitoring crops, and making decisions. However, Brazil has a low density and uneven distribution of meteorological stations. The data are generally difficult to access and can contain irregularities, which limits both the available records and the characterization of the spatial and temporal patterns of environmental variables needed for agricultural planning and research. Such limitations consequently inhibit the application of the data to agricultural productivity modeling [2][3][4].
At large scales, remotely sensed data and numerical predictions have helped to overcome the lack of meteorological stations, serving as an alternative source of time series data on global and/or regional scales and enabling event detection and decision-making. However, these resources are rarely used in the agricultural sector [5,6].
There are currently many precipitation products derived from numerical models, such as Eta [7] and the Global Forecast System [8], as well as data based on remote sensing and meteorological station observations, such as the Tropical Rainfall Measuring Mission (TRMM) [9]; Global Precipitation Climatology Project (GPCP) [3,10]; Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN) [11]; and Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) [12,13].
However, these data are subject to errors and must be evaluated and validated before being used, for example, to determine the water balance [14][15][16]. In an agricultural suitability study, the water balance is indispensable for estimating water deficits and surpluses and for detecting changes during the crop cycle of interest; it supports the application of management practices, erosion simulation studies, and even agricultural crop yield models based on evapotranspiration and water stress [17][18][19][20][21][22]. Precipitation and evapotranspiration values are determining factors for agroeconomic purposes, and their quality must be analyzed and validated before use [23][24][25].
Vieira Junior et al. [5] pointed out not only that the Eta model usually overestimates rainfall in Minas Gerais, but also that random and systematic errors are related to regional characteristics. Franchito et al. and Pereira et al. [26,27] validated the accuracy of TRMM data over Brazil. They highlighted that TRMM data tend to overestimate rainfall despite their high correlation with ground station observations. Additionally, Franchito et al. [26] stressed the relation between errors (random and systematic) and seasonal and regional characteristics. Paredes-Trejo et al. [28] compared monthly rainfall estimates of CHIRPS v.2 with rain gauges in Northeast Brazil and found a tendency to overestimate low and underestimate high rainfall values. Furthermore, they reported that CHIRPS not only achieves better results during the wet season (March to May), but also correlates well with observations for all stations and can be a useful substitute for rain-gauge precipitation data.
In this context, the objective of this work is to evaluate the relationships of the precipitation estimates from Eta/CPTEC, TRMM, and CHIRPS with the data from the meteorological stations of the National Institute of Meteorology (INMET, for Instituto Nacional de Meteorologia in Portuguese) in the south-southeast region of Minas Gerais, Brazil, to assess their applicability in areas where the density of stations is low. We also assessed the use of these data to determine evapotranspiration, water surplus, and water deficit through the sequential water balance (SWB) method.

Terrestrial Data
In this study, ground-truth data were collected at fourteen meteorological stations distributed within and around the south-southeast region of Minas Gerais, Brazil, located between latitudes 20° S and 23° S and longitudes 44° W and 47° W (Figure 1). A mountainous region, the south-southeast mesoregion of Minas is the principal coffee producer in Minas Gerais, concentrating ≈21% of Brazilian Arabica coffee production. The region's climate is subtropical with well-defined seasons: a dry season in the winter and a rainy one in the summer. According to the Köppen climate classification system [29], the predominant climates in the area are Cwb and Cwa. The former is a subtropical highland climate with dry winters and mild summers, in which the mean temperature of the hottest month is below 22 °C; the latter is a subtropical climate with dry winters and hot, rainy summers, with temperatures below 18 °C in winter and above 22 °C in summer. The area is covered by Cerrado vegetation and Atlantic Forest [30].

Database
In this study, we used environmental variables sourced from fourteen INMET meteorological stations, precipitation forecast data of the Eta/CPTEC model, and the satellite-based TRMM and CHIRPS precipitation data. Strictly speaking, we sought to investigate the relationship between the latter three data sources and the data collected from the stations. These data are briefly described below.

Meteorological Station Data
To assess the precipitation estimates obtained using the Eta, TRMM, and CHIRPS, we used historical daily rainfall (mm) series from fourteen INMET stations. Each station was identified by the name of the city where it is located, its altitude, latitude, and longitude (Table 1). The meteorological data are part of a research and teaching database of INMET. This database holds time series collected by conventional meteorological stations. The database was validated, and duplicated or inconsistent data were rejected in order to ensure data consistency [26,28].

Eta Precipitation Data
A numerical model developed at the Hydrometeorological Institute of the former Yugoslavia and the University of Belgrade, the Eta model was made operational by the National Centers for Environmental Prediction (NCEP). In Brazil, the data are provided by the Center for Weather Forecasting and Climate Studies (CPTEC, for Centro de Previsão de Tempo e Estudos Climáticos in Portuguese) [31,32]. With a 15-km resolution, the model covers part of South America and provides data every three hours, with runs at 0:00 UTC and 12:00 UTC [33].

TRMM Precipitation Data
A partnership between the National Aeronautics and Space Administration (NASA) and the Japan Aerospace Exploration Agency (JAXA), the TRMM produces global precipitation estimates based on remotely sensed data [34,35].
The daily 3B42 product (Version 7, TRMM Multi-Satellite Precipitation Analysis) used in this study is available at https://pmm.nasa.gov/data-access/downloads/TRMM. It covers an area between 50° N and 50° S and has a spatial resolution of 0.25° (~25 km). The final product is derived from the combination of precipitation estimates based on observations in the microwave and infrared channels obtained by satellites. Furthermore, these multi-satellite estimates are computed using re-analyzed precipitation data from the Global Precipitation Climatology Centre (GPCC), which comprises approximately 67,200 gauge stations worldwide, including the rain gauges from INMET [35,36]. Thus, the TRMM data partially and indirectly depend on the INMET information.

CHIRPS Precipitation Data
The daily CHIRPS data (version CHIRPS v.2) were obtained from the site ftp://ftp.chg.ucsb.edu/pub/org/chg/products/CHIRPS-2.0/global_daily/tifs/p05. According to [12], the main sources of data used in CHIRPS are (i) the monthly accumulated climatological precipitation, from CHPClim; (ii) geostationary satellite observations in the infrared (IR) channel from the NOAA data sources, product of the Climate Prediction Center (CPC) and the B1 IR of the National Climatic Data Center (NCDC); (iii) precipitation estimated by the TRMM; (iv) the rainfall field of the NOAA atmospheric model, Climate Forecast System version 2 (CFSv2); and (v) in situ observations of precipitation, obtained from national and regional meteorological services. The data have a spatial resolution of approximately 5.3 km (0.05°), with coverage between 50° S and 50° N and between 180° W and 180° E.

Determination of Reference and Evapotranspiration
The water balance comprises fluxes of precipitation, evapotranspiration, and water surplus, as well as storage. These components are related through Equation (1), which shows that the fluxes of precipitation (P), evapotranspiration (ETR), and water surplus (EXC) are balanced by the change in water storage (S) at the Earth's surface:
$$\frac{dS}{dt} = P - ETR - EXC \qquad (1)$$
A simple method of determining water availability, named the sequential water balance (SWB), was proposed by [37]. Through it, the water deficiency (DEF), water surplus (EXC), and evapotranspiration (ETR) of a crop can be determined from data on precipitation, potential evapotranspiration (ETP), and water storage capacity (WSC).
Potential evapotranspiration (ETP) of a crop was determined using the Penman-Monteith equation described in [38]. In this step, we used the following data: air temperatures (minimum, average, and maximum, in °C); wind velocity at a height of 2 m (m·s⁻¹); relative air humidity (%); and hours of insolation. These data were obtained from the INMET meteorological stations between January 2009 and June 2015. Subsequently, the components DEF, EXC, and ETR were estimated using the method of [37].
The SWB was determined separately for each type of precipitation data (meteorological station, Eta model, TRMM, and CHIRPS) for the considered period. Beginning the study period in January 2009 was necessary to determine the point at which precipitation (P) was greater than potential evapotranspiration (ETP), a condition needed to initialize the water balance for the chosen method. However, only the period between July 2009 and June 2015 was described in this study.
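For orientation, a widely used form of the Penman-Monteith equation is the FAO-56 expression for reference evapotranspiration, which takes the same kind of inputs listed above (temperature, wind speed at 2 m, humidity, and sunshine hours). Whether [38] uses exactly this parameterization is an assumption here, and the symbols below are those of FAO-56 rather than of this paper.

```latex
ET_0 = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,u_2\,(e_s - e_a)}
            {\Delta + \gamma\,(1 + 0.34\,u_2)}
% ET_0      reference evapotranspiration (mm day^-1)
% R_n, G    net radiation and soil heat flux (MJ m^-2 day^-1)
% T         mean daily air temperature at 2 m (deg C)
% u_2       wind speed at 2 m (m s^-1)
% e_s - e_a saturation vapour pressure deficit (kPa)
% \Delta    slope of the vapour pressure curve (kPa degC^-1)
% \gamma    psychrometric constant (kPa degC^-1)
```

In this form, hours of insolation enter through the estimate of net radiation R_n, and relative humidity through the actual vapour pressure e_a.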
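The bookkeeping behind a sequential water balance can be sketched as below. This is a minimal illustration, not the procedure of [37] itself: it assumes a single soil reservoir that starts full, exponential depletion of storage in periods where P < ETP, and a hypothetical water storage capacity of 100 mm. The function and argument names are ours.

```python
import math

def sequential_water_balance(precip, etp, wsc=100.0, storage0=None):
    """Run a simple sequential water balance over consecutive periods.

    precip, etp : sequences of P and ETP per period (mm)
    wsc         : water storage capacity (mm); 100 mm is a hypothetical value
    storage0    : initial storage (mm); assumed full when not given
    Returns one dict per period with ETR, DEF, EXC and end-of-period storage S.
    """
    storage = wsc if storage0 is None else storage0
    rows = []
    for p, e in zip(precip, etp):
        diff = p - e
        if diff >= 0:
            # Wet period: demand is met, storage refills, the rest is surplus.
            filled = storage + diff
            new_storage = min(filled, wsc)
            exc = filled - new_storage
            etr = e
        else:
            # Dry period: storage is depleted exponentially (assumed scheme).
            new_storage = storage * math.exp(diff / wsc)
            exc = 0.0
            etr = p + (storage - new_storage)
        rows.append({"ETR": etr, "DEF": e - etr, "EXC": exc, "S": new_storage})
        storage = new_storage
    return rows

# Three illustrative periods (values are made up, in mm per period).
for row in sequential_water_balance([120.0, 10.0, 80.0], [60.0, 60.0, 60.0]):
    print(row)
```

In every period the sketch satisfies P − ETR − EXC = ΔS, the balance that Equation (1) describes, so an overestimated P can only end up as extra surplus, extra storage, or extra evapotranspiration.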

Comparative Analysis
The daily precipitation data were grouped into 16-day periods, totaling 23 periods per year, with the first period of each year starting on the 1st of January. When comparing the average accumulation data from INMET with the estimated data, we used a point-to-pixel analysis; i.e., the value of the point was compared to the value of the pixel corresponding to the point's location. To avoid incorporating uncertainties in the values, we opted not to interpolate the point values to the grid.
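The grouping and the point-to-pixel lookup can be sketched as follows. The text does not say how the last period of the year is handled, so this sketch assumes consecutive 16-day windows counted from 1 January, which leaves a shorter 23rd period. The grid convention (north-up, origin at the upper-left corner) and the station coordinates are also assumptions made for illustration.

```python
from datetime import date

def period_index(day):
    """0-based index (0..22) of the 16-day period containing `day`."""
    return (day.timetuple().tm_yday - 1) // 16

def accumulate_by_period(daily):
    """Sum daily rainfall (mm) into {(year, period): total}.

    daily : iterable of (date, mm) pairs
    """
    totals = {}
    for day, mm in daily:
        key = (day.year, period_index(day))
        totals[key] = totals.get(key, 0.0) + mm
    return totals

def pixel_index(lat, lon, lat_north, lon_west, cell_deg):
    """Row and column of the grid cell that contains the point.

    Assumes a regular north-up grid whose upper-left corner is
    (lat_north, lon_west) and whose cells are cell_deg degrees wide.
    """
    row = int((lat_north - lat) // cell_deg)
    col = int((lon - lon_west) // cell_deg)
    return row, col

# Hypothetical station at 21.23 S, 45.01 W on a 0.25-degree grid
# spanning 50 N to 50 S and starting at 180 W.
print(pixel_index(-21.23, -45.01, lat_north=50.0, lon_west=-180.0, cell_deg=0.25))
print(accumulate_by_period([(date(2010, 1, 16), 4.0), (date(2010, 1, 17), 6.0)]))
```

Because the station value is compared with the single cell returned by `pixel_index`, no interpolation error is introduced, at the cost of comparing a point measurement with an areal average.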

Assessment of Errors Associated with Meteorological Variables
To quantify the differences between the precipitation, ETR, and DEF measures, we used the average of errors (MBE or bias), root mean squared error (RMSE), and percent bias (PB), as described in Table 2. We also used the coefficient of determination (R²) and the Nash-Sutcliffe efficiency coefficient (EFF). These values were submitted to a test of significance using the p-value of the regression analysis, for α = 5%, considering the null hypothesis that a linear correlation does not exist between the observed and estimated data.
Table 2. Formulas of the performance measures: coefficient of determination (R²), average of errors (MBE), root mean squared error (RMSE), relative root mean squared error (rRMSE), percent bias (PB), and Nash-Sutcliffe efficiency (EFF).
The MBE indicates whether the modeled data are underestimated (negative value) or overestimated (positive value) relative to the observed data. The RMSE is a measure of the average magnitude of errors and is always non-negative. The nearer the MBE and RMSE values are to zero, the more accurate the model. EFF values may vary from minus infinity to 1, a value of 1 indicating perfect agreement. In relation to the rRMSE value, Franchito et al. [26] consider reliable estimates to be those that present rRMSE below 50%.
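The measures named above can be computed as in the sketch below. The rRMSE, PB, and EFF expressions follow the definitions given for Table 2; the MBE, RMSE, and R² expressions are the standard textbook forms and are an assumption here, as is reporting rRMSE in percent. The function name is ours.

```python
import math

def performance_metrics(estimated, observed):
    """Return MBE, RMSE, rRMSE (%), PB (%), EFF and R2 for paired series."""
    n = len(observed)
    if n == 0 or n != len(estimated):
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / n
    mean_est = sum(estimated) / n
    errors = [est - obs for est, obs in zip(estimated, observed)]
    sq_err = sum(err ** 2 for err in errors)
    ss_obs = sum((obs - mean_obs) ** 2 for obs in observed)
    ss_est = sum((est - mean_est) ** 2 for est in estimated)
    cov = sum((est - mean_est) * (obs - mean_obs)
              for est, obs in zip(estimated, observed))
    rmse = math.sqrt(sq_err / n)
    return {
        "MBE": sum(errors) / n,                      # sign gives over/underestimation
        "RMSE": rmse,
        "rRMSE": 100.0 * rmse / mean_obs,            # relative to the observed mean
        "PB": 100.0 * sum(errors) / sum(observed),
        "EFF": 1.0 - sq_err / ss_obs,                # Nash-Sutcliffe efficiency
        "R2": cov ** 2 / (ss_est * ss_obs),          # squared Pearson correlation
    }

# Illustrative series: the estimate tracks the observations with a +15 mm offset.
observed = [10.0, 20.0, 30.0]
biased = [value + 15.0 for value in observed]
print(performance_metrics(biased, observed))
```

In this made-up example a constant offset leaves R² at 1 while EFF falls below zero, which illustrates why a high coefficient of determination on its own does not establish agreement between estimated and observed values.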

In addition, the three products (Eta, TRMM, and CHIRPS) were validated regarding their capacity to detect rainfall. The Probability of Detection (POD), False Alarm Ratio (FAR), Critical Success Index (CSI), and Probability of False Detection (POFD) (see Table 3) were computed using a contingency table (not shown here). POD and CSI are rates of correctly forecast events; FAR is the rate of forecast events that did not occur; POFD is the rate of no-rain events identified as rain [39,40]. The perfect score for POFD and FAR is 0, while for POD and CSI it is 1. The rain-day threshold was set to 5 mm [41].
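The four detection scores can be derived from a 2 × 2 contingency table as sketched below. Since the formulas of Table 3 are not reproduced in the text, the usual forecast-verification definitions are assumed, and treating a value equal to the 5 mm threshold as rain is also an assumption. The function name and the sample values are ours.

```python
def detection_scores(estimated, observed, threshold=5.0):
    """Compute POD, FAR, CSI and POFD from paired rainfall series (mm)."""
    hits = misses = false_alarms = correct_negatives = 0
    for est, obs in zip(estimated, observed):
        est_rain = est >= threshold
        obs_rain = obs >= threshold
        if est_rain and obs_rain:
            hits += 1
        elif obs_rain:
            misses += 1
        elif est_rain:
            false_alarms += 1
        else:
            correct_negatives += 1

    def ratio(numerator, denominator):
        # Undefined scores (empty denominator) are returned as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "POD": ratio(hits, hits + misses),
        "FAR": ratio(false_alarms, hits + false_alarms),
        "CSI": ratio(hits, hits + misses + false_alarms),
        "POFD": ratio(false_alarms, false_alarms + correct_negatives),
    }

# Made-up values (mm) for six periods.
observed = [0.0, 6.0, 10.0, 2.0, 7.0, 0.0]
estimated = [1.0, 8.0, 3.0, 9.0, 5.0, 0.0]
print(detection_scores(estimated, observed))
```

A product that reports rain in nearly every period reaches a POD close to 1 but pays for it with a high POFD, which is why the scores are read together rather than individually.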

Correlation between Observed and Estimated Precipitation Data
Figure 2 shows the patterns of the average precipitation accumulations obtained from meteorological stations and from Eta, TRMM, and CHIRPS estimates for the periods between July 2009 and June 2015. Note that the rainy period generally began in September-October, with precipitation concentrated in the summer months (December to February), while the dry periods generally occurred between March and August. Contrary to the Eta data, the distribution of precipitation according to TRMM and CHIRPS data followed the variations in the station data for all of the analyzed periods.
In Figure 3, the accumulated precipitation averages from INMET are plotted against the accumulated precipitation averages estimated from the Eta, TRMM, and CHIRPS, separately, for each crop year (the period from July to June of the following year). For the Eta data, the coefficients of determination varied between 0.69 (2013/2014) and 0.94, with the best result in 2010/2011, the year of highest average precipitation. For the TRMM data, the coefficients of determination were higher than 0.90 in all of the crop years, with a minimum of 0.93 (2012/2013) and a maximum of 0.96 (2011/2012). For the CHIRPS data, R² varied between 0.88 and 0.
Based on the paired t-test, we found a significant difference between the station data and the data estimated by the Eta model (α = 0.05). Despite the high R² values (0.69 to 0.94), the EFF values were low (−1.66 to 0.74), indicating low correspondence between the observed and estimated data. As shown, the rRMSE values exceeded 50% (except for the 2010/2011 crop year), which indicates that the Eta estimates are not very reliable for the analyzed period. Similar results showing overestimation in the Eta data for the Minas Gerais region were reported in [5]. According to the authors, the modeling errors may be associated with deficiencies in modeling elements of atmospheric circulation, elements of topography, surface coverage, and possible interactions between these elements. Furthermore, according to [33], the model generates forecasts based on average grid data and is therefore not directly applicable to a specific point.
In relation to the TRMM data (Figure 4), RMSE varied between 9.51 mm (2013/2014) and 17.33 mm (2012/2013), and MBE stayed below 9 mm. Regression analysis between the observed and estimated data showed a coefficient of determination higher than 0.90. The EFF value varied from 0.90 to 0.96, which indicates that the estimated data and observed data are in strong agreement. We found a positive PB value for the TRMM values relative to the measured data; however, the values were lower than those of the Eta model estimates. We found no significant difference between the station data and the TRMM estimates, except for the 2009/2010 crop year (α = 0.05).
For the CHIRPS data, high EFF values indicate high correspondence between the observed and estimated values. However, we found a significant difference between the station data and the CHIRPS estimates for the 2014/2015 crop year (α = 0.05). Discrepancies between the observed and the CHIRPS data were also reported by [28]; according to the authors, this may be associated with a reduction over the years in the number of stations used to generate the CHIRPS data [42].
Analyses of the performance of Eta, TRMM, and CHIRPS precipitation products in relation to the INMET precipitation for the entire period from July 2009 to June 2015 are presented in Tables 4 and 5.
In Table 4, the MBE for the period reveals an overestimation of 42.8 mm, 4.21 mm, and 3.45 mm by Eta, TRMM, and CHIRPS, respectively. High agreement of the TRMM and CHIRPS with the INMET data is indicated by high EFF and R² values, both above 0.90. Note that the low error associated with the TRMM data is likely caused by the use of the GPCC reanalysis data. Similar findings were reported by [9].
The three products display good performance for detecting rainfall events (see Table 5). The best one is Eta (POD = 1), followed by CHIRPS (POD = 0.95) and TRMM (POD = 0.94). As far as FAR is concerned, the best performance was achieved by TRMM (FAR = 0.01) and CHIRPS (0.03), while the FAR of Eta is 0.08. Regarding CSI, the three products displayed similar rates (Eta 0.92, TRMM 0.94, and CHIRPS 0.93). Finally, POFD is high for Eta (0.69), while for TRMM and CHIRPS it is less than 20%.
Figure 5 shows the seasonal variations (DJF, MAM, JJA, SON) of the MBE, RMSE, PB (%), and rRMSE (%), and of the R² and EFF coefficients between the Eta, TRMM, and CHIRPS data and the INMET data. In relation to the Eta data (Figure 4), we found low correspondence between the estimated and observed values, with low EFF metrics (EFF < 0) for all periods, as well as rRMSE and PB greater than 50%; furthermore, the R² values are not significant (paired t-test; α = 0.05) for any of the four analyzed periods.
As shown in Figure 5, the coefficients of determination between the INMET data and the TRMM and CHIRPS data are high for most seasons. The coefficients of determination for the TRMM data are greater than 0.87 in all of the periods, the highest R² values being observed in the periods Sept-Oct-Nov (SON; 0.94) and Dec-Jan-Feb (DJF; 0.94). The percent bias is negative only for the period Jun-Jul-Aug (JJA; −0.87%) and lower than 11% in all seasons; the highest PB values occur in the periods DJF (6.49%) and Mar-Apr-May (MAM; 10.42%). Figure 5.b.2 shows the seasonal variation of the rRMSE. We found that the rRMSE value is lower than 50% in MAM (35.38%), SON (17.65%), and DJF (15.33%). However, rRMSE is high in JJA (101.85%), agreeing with the EFF value (0.71), which indicates that the estimate for that period is not reliable, since the error exceeds 50% of the amount of observed precipitation. Similar results were reported by [26].
Note in Figure 5 that the CHIRPS values tend to overestimate precipitation relative to the INMET data (positive PB). The high R² (> 0.85) and EFF values (> 0.80) indicate good correspondence between the estimated values and the observed INMET values for the periods between September and May (SON, DJF, MAM); however, the rRMSE value of 74% in JJA indicates that the data for that period are not very reliable.

Analyses of Water Deficiency and Evapotranspiration Values
In this part, we evaluated the components of the sequential water balance: the observed data of INMET and the estimates from the Eta model and remotely sensed TRMM and CHIRPS data were used to determine water surplus, water deficit, and evapotranspiration. The evaluations were made by comparing the average accumulation data of the stations (Table 2, column WB) with the average data of the Eta, TRMM, and CHIRPS, separately.
Figure 6 presents the water deficit and water surplus components estimated through the sequential water balance, following [37]. In general, the DEF and EXC data obtained with TRMM and CHIRPS are similar to the INMET data. The Eta data not only overestimate the precipitation values, but also obscure the water deficit, due to high water surplus values.
Table 6 shows the performance metrics of the Eta, TRMM, and CHIRPS products in determining the water surplus and deficit in the period from July 2009 to June 2015. In general, negative MBE and PB (%) values indicate that the products tend to underestimate the water deficit. As the high EFF and R values show, there is strong correspondence of DEF_TRMM and DEF_CHIRPS with DEF_INMET. The performance of the Eta data, however, is inferior to that of TRMM and CHIRPS. The relative errors (rRMSE) are 83.27%, 30.63%, and 45.28% for DEF_ETA, DEF_TRMM, and DEF_CHIRPS, respectively. In the joint analysis of metrics, DEF_TRMM shows greater correspondence with DEF_INMET, presenting higher R and EFF values and lower MBE and RMSE values in comparison to DEF_ETA and DEF_CHIRPS.
Positive PB (Table 6) indicates that EXC_ETA, EXC_TRMM, and EXC_CHIRPS were overestimated relative to EXC_INMET; however, the errors associated with EXC_ETA were greater than those of the other analyzed data, with high rRMSE and PB: ≈400% and 225%, respectively. Meanwhile, EXC_TRMM and EXC_CHIRPS present rRMSE lower than 70% and PB lower than 13%. The high errors of EXC_ETA agree with the low R² (0.55) and EFF (<0), indicating low agreement between EXC_ETA and EXC_INMET. In this case, the water surplus is due to overestimation of precipitation by the Eta model. As shown in Table 6, EXC_TRMM and EXC_CHIRPS present similar performance metrics, with R² of 0.92 and EFF close to 1; furthermore, both show MBE close to 2 mm and RMSE of about 12 mm. The observed results agree with the findings of [14][15][16], which rely on rainfall data from remote sensing and report overestimations of water surplus due to overestimations in rainfall.
Table 7 shows the results of the statistical analyses of each product for evapotranspiration. Positive MBE values indicate that, relative to ETR_INMET, the three products overestimate evapotranspiration. However, high EFF and R² suggest strong agreement of ETR_TRMM and ETR_CHIRPS with ETR_INMET. Evapotranspiration of TRMM presented the strongest relationship with that of INMET, represented by a high R² of 0.94 and a low RMSE of 3.97 mm·period⁻¹; it also showed overestimation, represented by an MBE of 0.80 mm·period⁻¹ and a PB of 1.82%. The CHIRPS product presented an intermediate pattern between the two products, with an overestimation of 3.57% and strong agreement with INMET, represented by a low rRMSE of 13.4% and a high EFF of 0.93. Regarding evapotranspiration, the errors found in the three products (Eta, TRMM, and CHIRPS) fall within the confidence interval described by [43,44].
The analysis of water deficit/surplus and evapotranspiration shows that Eta's overestimation of precipitation (Figure 8) in the dry period (JJA) is not reflected in the water surplus, which presents high EFF values and low MBE for that period; it is reflected, however, in the water deficit, which is underestimated, and thus in the evapotranspiration, which is overestimated (high MBE) and in low agreement with the INMET data (low EFF). We found that the water deficits in the other periods were masked by the overestimated precipitation, which resulted in EXC_ETA and ETR_ETA estimates with larger errors relative to EXC_INMET and ETR_INMET.
Figure 8 shows that the strong agreement of the TRMM and CHIRPS estimates with the INMET estimates is maintained throughout the four analyzed periods (DJF, MAM, JJA, and SON). However, we found the best performance in the WB simulation with TRMM data, characterized by high EFF and low MBE values for DEF_TRMM, EXC_TRMM, and ETR_TRMM in the four seasons.

Figure 1. Spatial locations of meteorological stations (left) and relative pixel sizes of the rain data products (right).

Relative root mean squared error: $\mathrm{rRMSE} = \dfrac{\mathrm{RMSE}}{\bar{P}_o}$
Percent bias: $\mathrm{PB} = 100\,\dfrac{\sum_{i=1}^{N} (P_i - P_o)}{\sum_{i=1}^{N} P_o}$
Nash-Sutcliffe efficiency coefficient: $\mathrm{EFF} = 1 - \dfrac{\sum_{i=1}^{N} (P_i - P_o)^2}{\sum_{i=1}^{N} (P_o - \bar{P}_o)^2}$
where: $P_i$ = estimated variable; $P_o$ = variable observed at meteorological station; $\bar{P}_o$ = average of the values observed at meteorological stations; and $N$ = total number of observations.

Figure 2. Distribution of the average accumulated precipitation estimated from Eta, TRMM, and CHIRPS and observed at fourteen INMET meteorological stations between July 2009 and June 2015, for the south-southeast region of Minas Gerais (MG).

Figure 3. Precipitation (mm) from INMET versus the precipitation estimates from Eta, TRMM, and CHIRPS, per crop year.

Regarding Figure 4, the RMSE of the CHIRPS data varied from 10.25 mm (2011/2012) to 21.85 mm (2014/2015). When compared to the other estimated data, the annual RMSE values are lower than those presented by the Eta, and lower than those presented by the TRMM for the years between 2009/2010 and 2011/2012. However, in the subsequent years, the errors in the CHIRPS increased relative to the TRMM. Analysis of the MBE indicates underestimated CHIRPS values for the years 2009/2010 and 2010/2011 and overestimated data in the following years.

Figure 6. Average values of water deficit (DEF) and water surplus (EXC) estimated through the sequential water balance, using the observed data of INMET meteorological stations and the Eta, TRMM, and CHIRPS data for the period July 2009 to June 2015.

Figure 7. Average values of evapotranspiration (ETR) estimated through the sequential water balance, using INMET, Eta, TRMM, and CHIRPS data, for the period July 2009 to June 2015.

Table 1. Location of the meteorological stations. Availability of data is indicated by ( ).

Table 3. Formulas for performance measurements based on Probability of Detection (POD), False Alarm Ratio (FAR), Critical Success Index (CSI), and Probability of False Detection (POFD).

Table 4. Coefficients of correlation (R), determination (R²), and Nash-Sutcliffe efficiency (EFF), average of errors (MBE), root mean squared error (RMSE), relative root mean squared error (rRMSE), and percent bias (PB) for precipitation during the period from July 2009 to June 2015.

Table 5. Probability of Detection (POD), False Alarm Ratio (FAR), Critical Success Index (CSI), and Probability of False Detection (POFD) for precipitation during the period from July 2009 to June 2015.

Table 6. Coefficients of correlation (R), determination (R²), and Nash-Sutcliffe efficiency (EFF), average of errors (MBE), root mean squared error (RMSE), relative root mean squared error (rRMSE), and percent bias (PB) for the water deficit (DEF) and water surplus (EXC) in the period from July 2009 to June 2015.

Table 7. Coefficients of correlation (R), determination (R²), and Nash-Sutcliffe efficiency (EFF), average of errors (MBE), root mean squared error (RMSE), relative root mean squared error (rRMSE), and percent bias (PB) for evapotranspiration in the period from July 2009 to June 2015.