Reference Evapotranspiration Variation Analysis and Its Approaches Evaluation of 13 Empirical Models in Sub-Humid and Humid Regions: A Case Study of the Huai River Basin, Eastern China

: Accurate and reliable estimations of reference evapotranspiration ( ET 0 ) are imperative in irrigation scheduling and water resource planning. This study aims to analyze the spatiotemporal trends of the monthly ET 0 calculated by the Penman–Monteith FAO-56 (PMF-56) model in the Huai River Basin (HRB), eastern China. However, the use of the PMF-56 model is limited by the insufﬁciency of climatic input parameters in various sites, and the alternative is to employ simple empirical models. In this study, the performances of 13 empirical models were evaluated against the PMF-56 model by using three common statistical approaches: relative root-mean-square error (RRMSE), mean absolute error (MAE), and the Nash–Sutcliffe coefﬁcient (NS). Additionally, a linear regression model was adopted to calibrate and validate the performances of the empirical models during the 1961–2000 and 2001–2014 time periods, respectively. The results showed that the ET PMF increased initially and then decreased on a monthly timescale. On a daily timescale, the Valiantzas3 (VA3) was the best alternative model for estimating the ET 0 , while the Penman (PEN), WMO, Trabert (TRA), and Jensen-Haise (JH) models showed poor results with large errors. Before calibration, the determination coefﬁcients of the temperature-based, radiation-based, and combined models showed the opposite changing trends compared to the mass transfer-based models. After calibration, the performance of each empirical model in each month improved greatly except for the PEN model. If the comprehensive climatic datasets were available, the VA3 would be the recommended model because it had a simple computation procedure and was also very well correlated linearly to the PMF-56 model. Given the data availability, the temperature-based, radiation-based, Valiantzas1 (VA1) and Valiantzas2 (VA2) models were recommended during April–October in the HRB and other similar regions, and also, the mass transfer-based models were applicable in other months.

irrigation scheduling and management, which plays a vital role in the atmosphere, hydrosphere, and biosphere. From the agricultural perspective, its estimation is greatly significant in humid and sub-humid regions like eastern China, where it is vital for determining high crop water requirements and subsequently, for planning and managing irrigation practices [1]. Accurate and reliable estimations of ET 0 are essential for improving water resource planning and management [2,3], farm irrigation demand, and also environmental assessment [4].
Numerous models based on climatic variables have already been employed to calculate the ET 0 in many climatic and hydrogeographic settings [2,5]. For example, the Penman-Monteith FAO-56 (PMF-56) model recommended by the Food and Agricultural Organization (FAO) has been recognized as the most accurate model for estimating the ET 0 over the past few decades [6]. The PMF-56 model has two advantages compared with other models [2,7]. Firstly, it is used globally without any calibrations because of its biophysical basis. Secondly, this model has been well-documented in the existing literature, in which it has been evaluated using a variety of conditions [5]. The main shortcoming of the PMF-56 model is the requirement of large datasets, including the air mean, maximum and minimum temperature, relative humidity, wind speed, and solar radiation. Records of these meteorological input parameters are often with debatable quality or are unavailable for a specific site, especially in some developing countries [1]. In addition, the installation and maintenance of meteorological station instruments can be expensive and complicated [8]. Furthermore, some researchers or institutions may not have access to complete meteorological datasets in some study areas. In the areas where the observed large meteorological data are difficult to obtain, the PMF-56 model is not the best option. To solve this problem, ET 0 estimation models with a fewer errors and a simple computation procedure are preferably applied. Therefore, there is an urgent need to find an accurate, suitable, and simple alternative model to estimate the ET 0 relative to the PMF-56 model when the meteorological datasets are limited or missing.
During recent decades, the empirical models have been developed for estimating the ET 0 , which required fewer meteorological parameters or simplified expressions. There are four main climatic models: mass transfer-based, temperature-based, radiation-based, and the combined models. Although a significant number of studies have been performed to evaluate these empirical models in various climatic regions throughout China (e.g., Chen et al. [9]; Cai et al. [7]; Huo et al. [10]; Wen et al. [3]; Liu et al. [11]; and Feng et al. [12]), few such studies have been conducted in humid and sub-humid climatic regions of eastern China [1,13]. Indeed, the performances of the empirical models may vary in various environmental conditions, and local evaluation and calibration are needed [5]. Feng et al. [12] calibrated the Hargreaves model using Bayesian theory in the Sichuan basin of southwestern China. Liu et al. [11] determined the decisive meteorological variables using path analysis to establish the specific models for estimating the ET 0 . The above studies mainly adopted temperature-based and radiation-based models to estimate the ET 0 ; however, the mass transfer-based and combined models have rarely been used in China, and their applicability remains to be tested thoroughly. Hence, it is imperative to carry out research evaluating the performance of the empirical models to determine the best, or a relatively appropriate model for estimating the ET 0 in a humid and sub-humid region.
Due to the simple operability of regression models, practitioners have widely used the simple linear regression model in many studies, as can be found in Mallikarjuna et al. [14], Wen et al. [3], Peng et al. [15], Cobaner et al. [16], Citakoglu et al. [17], and Huo et al. [10]. The regression model expresses the dependence of a response parameter on many independent parameters and is used in modeling a varied range of hydrologic process studies. In addition, Rahimikhoob et al. [4] adopted the regression model to evaluate the performance and characteristics of four empirical models for ET 0 estimation in a subtropical climate in Iran and found that all the performances of the models were improved after the calibration of regional specific coefficients. Similarly, we employ the linear regression model in the present study to assess the performances of 13 empirical models for ET 0 estimation as an alternative approach to the PMF-56.
Owing to the complex relationship between the ET 0 and climatic factors, many practitioners and researchers have adopted computer algorithms to estimate the ET 0 with higher accuracy. Huo et al. [10] trained and tested the artificial neural network (ANN) model for estimating the ET 0 in northwest China and found that the maximum and minimum temperature and relative humidity were the most crucial inputs for this model in arid and semi-arid areas. Tabari et al. [2] evaluated the performance of the support vector machine (SVM), an adaptive neuro-fuzzy inference system (ANFIS), multiple linear regression (MLR), and multiple non-linear regression (MNLR) for ET 0 estimation in a semi-arid highland environment in Iran. They found that the SVM and ANFIS models with inputs of mean temperature, relative humidity, wind speed, and solar radiation showed the best performance. After evaluation of the trained extreme learning machine (ELM), back propagation neural networks optimized by the genetic algorithm (GANN), and wavelet neural networks (WNN), two temperature-based and three radiation-based models were developed in a humid area of southwest China by Feng et al. [18]. Feng et al. [18] recommended the ELM and GANN models as the best alternatives with limited meteorological data. Mehdizadeh et al. [19] tested the performance of the SVM and multivariate adaptive regression splines (MARS) and found that they were better than the empirical models in Iran. Chauhan and Shrivastava [20] investigated the performance of reference evapotranspiration in India using climate-based models and ANNs and found that the ANN models performed better than the climatic-based models in all performance indices. Despite the higher accuracy of the computer algorithms, these algorithms must be implemented through specific software, and the models with particular inputs of climatic factors could not be expressed in straightforward mathematical expressions like the empirical models [16]. Thus, the empirical model as an alternative option to estimate the ET 0 has been recommended in many other studies [20][21][22][23][24].
Except for the aforementioned examples, most of the previous studies have been carried out using a low-precision timescale when there is a need for calibrating the empirical models. Furthermore, Bourletsikas et al. [25] recommended that calibration at a seasonal or even monthly time-step could obtain more accurate daily estimates. To the best of our knowledge, no comprehensive studies have been undertaken to analyze the spatiotemporal trends of monthly ET 0 and evaluate the performances of empirical models in the HRB, eastern China, especially on a monthly timescale, which in itself is the novelty of our research work. Although the performance of 10 empirical models were compared for different sub-regions of mainland China [15], a thorough and detailed study for choosing the best alternative empirical model for the PMF-56 model in the HRB has not been conducted. To fill this research gap, in this study, we chose 13 extensively applied empirical models-including one temperature-based model (Hargreaves-Samani), three mass transfer-based models (Penman, WMO, and Trabert), six radiation-based models (Makkink, Priestly-Taylor, Jensen-Haise, Abtew, Irmak, and Tabari), and three combined models (Valiantzas1, Valiantzas2, and Valiantzas3)-based on their meteorological input parameters and applicability worldwide. Consequently, we propose two hypotheses: (1) the different empirical models will produce significantly different results for the estimation of reference evapotranspiration on a monthly timescale; and (2) the linear regression model can effectively calibrate the 13 empirical models against the PMF-56 model in the HRB. Ultimately, the main objectives of this study are (1) to analyze the spatiotemporal trends of the ET PMF in the HRB during 1961-2014 on a monthly timescale; (2) to evaluate the performances of 13 empirical models against the PMF-56 model for ET 0 estimation on a daily timescale; (3) to calibrate the 13 empirical models using the daily datasets from 1961-2000 by adopting the linear regression model on a monthly timescale; and (4) to validate the calibrated empirical models by using three statistical approaches during the period of 2001-2014 on a monthly timescale. The outcomes of the study will provide meaningful guidance for agricultural production and hydrological planning and management in this vital region, as well as other regions with the similar climates.

Study Area and Datasets
The Huai River Basin (HRB) is located in a climate transition zone (111 • 55 -121 • 25 E, 30 • 55 -36 • 36 N) between northern and southern China, with a catchment area of about 270,000 km 2 . The western, southwestern, and northeastern parts of the HRB are mainly mountainous and hilly areas, with the remaining area occupied by broad plains for about two-thirds of the total basin area ( Figure 1). The annual average temperature ranges from 11 • C to 16 • C, increasing from north to south, as well as from coast to island areas. The annual mean water surface evaporation ranges from 900 mm to 1500 mm, and relative humidity ranges from 40 to 70%. Annual average precipitation is about 970 mm; more than 50% of rainfall is concentrated in the monsoon season from June to September. With an uneven distribution pattern of precipitation and complex weather systems, the HRB is extremely vulnerable to floods during the rainy season and drought during the dry season.
Daily meteorological data from 137 meteorological stations in the HRB during 1961-2014, including mean temperature (T, • C), maximum temperature (T max , • C), minimum temperature (T min , • C), relative humidity (RH, %), wind speed at 2 m height (u 2 , m·s −1 ), sunshine duration (SD, h), and precipitation (P r , mm), were obtained from the National Meteorological Information Center (NMIC) of the China Meteorological Administration (CMA). Quality control had already been applied to the meteorological datasets by the staff members of the NMIC. Detailed data descriptions can be found at the website http://data.cma.cn/. Specific information for each station is listed in Table A1. All stations were divided into two climate zones, namely humid and sub-humid regions. The subdivision of these two climate zones is mainly according to the global aridity index (AI) adopted by the United Nations Convention to Combat Desertification [26][27][28]. The AI is defined as the ratio of annual average precipitation to reference evapotranspiration, and the classification standards of humid and sub-humid regions are as follows: AI > 1 for humid regions and 0.5 < AI < 1 for sub-humid regions. The demarcation line between humid and sub-humid regions is displayed in Figure 1. The description of the main climatic factors in each region of the HRB during the study period is presented in Table 1.  Note: R s is the solar radiation (MJ·m −2 ·d −1 ).

Penman-Monteith FAO-56 Model (PMF-56 Model)
Due to the absence of the observation of lysimeters, the Penman-Monteith FAO-56 model (PMF-56), which was proposed by Allen et al. [6], was adopted as standard all over the world and is considered the best model for estimating reference evapotranspiration [1,[29][30][31]. The exact expression is shown in the following Equation (1): where ET PMF is the daily reference evapotranspiration (mm·d −1 ), ∆ is the slope of the vapor pressure curve (kPa· • C −1 ), R n is the net solar radiation (MJ·m −2 ·d −1 ), G is the soil heat flux density (MJ·m −2 ·d −1 ), γ is the psychrometric constant (kPa· • C −1 ), T is the daily mean air temperature at 2 m height ( • C), u 2 is the wind speed at 2 m height (m·s −1 ), e s is the saturation vapor pressure (kPa), and e a is the actual vapor pressure (kPa). The detailed calculations of the parameters in Equation (1) can be found in the literature [1,6].

Empirical Models
Our preliminary evaluation of the 13 empirical models was based on the acceptance of their meteorological input parameters and the applicability of the models worldwide. These 13 empirical ET 0 models, which commonly performed well in various regions of the world [24,[32][33][34][35][36][37][38][39][40], were selected to compare to the PMF-56 model. The combined models included the three Valiantzas equations [41,42], which were proposed to simplify the PMF-56 equation. The three Valiantzas [41,42] equations were comparatively new, and their performances had not been validated in eastern China. The Hargreaves-Samani equation (HS) was adopted in the present study because the PMF-56 manual recommended the use of the HS as a less complex model mainly requiring data on temperature and extraterrestrial radiation. Thus, the 13 empirical models employed in this study can be divided into the following four categories: one temperature-based model (Hargreaves-Samani), three mass transfer-based models (Penman, WMO, and Trabert), six radiation-based models (Makkink, Priestly-Taylor, Jensen-Haise, Abtew, Irmak, and Tabari), and three combined models (Valiantzas1, Valiantzas2, Valiantzas3). The specific calculation equations, main input variables, and references are presented in Table 2. Table 2. Original forms of the 13 empirical models.

NO. Models Models Input Equations References
Temperature-based 1 Hargreaves Mass transfer-based 2 Penman Trabert u 2 , e s − e a ET TRA = 3.075u 2 0.5 (e s − e a ) [35] Radiation-based 5 Makkink [41,42] Note: R a is the extraterrestrial radiation (MJ·m −2 ·d −1 ), R s is the solar radiation (MJ·m −2 ·d −1 ), R n is the net solar radiation (MJ·m −2 ·d −1 ), T, T max , and T min are mean, maximum, and minimum temperature ( • C), respectively, u 2 is the wind speed at 2 m height (The unit of u 2 is in m·s −1 in all equations except the Penman model, where u 2 is in miles·d −1 ), e s and e a are saturation and actual vapor pressure, respectively (The units of e s and e a are in hPa in all equations except the Penman model, where e s and e a are in mmHg.), RH is the relative humidity (%), ∆ is the slope of the vapor pressure curve (kPa· • C −1 ), γ is the psychrometric constant (kPa· • C −1 ), λ is the latent heat of vaporization (≈2.45 MJ·kg −1 ), G is the soil heat flux density (MJ·m −2 ·d −1 ), and ϕ is the latitude (rad). The abbreviations of the 13 empirical models are arranged in order that the models appear in Table 2: HS, PEN, WMO, TRA, MAK, PT, JH, ABT, IRM, TAB, VA1, VA2, and VA3.

Performance Evaluation Approaches
In this study, the performance of the 13 empirical models was evaluated by adopting three statistical approaches: relative root-mean-square error (RRMSE), mean absolute error (MAE) and the Nash-Sutcliffe coefficient (NS) [12,43]. The following Equations (2)-(4) are used to evaluate the performances of the 13 empirical models: where ET i PMF and ET i EMP are daily reference evapotranspiration estimated by the PMF-56 model and the 13 empirical models, respectively, n is number of the sample size, and ET PMF,mean is the mean value of ET PMF . The RRMSE is dimensionless, with the value ranging from 0 to ∞. The MAE is in mm·d −1 . The closer the value of the RRMSE or the MAE to 0, the better the performance of empirical equations. The NS is dimensionless, with the value ranging from 1 to −∞, the closer the value of the NS to 1, the better the performance of empirical models.

Calibration and Validation of the Empirical Models
As recommended by Allen et al. [6], the linear regression model was employed to calibrate and validate the empirical models against the PMF-56 model. The specific expression is shown in the below Equation (5): where ET PMF and ET EMP represent the daily reference evapotranspiration estimated by the PMF-56 model and the 13 empirical models, respectively, and a and b are calibrated empirical coefficients.

Trend Test
The nonparametric Mann-Kendall (MK) test [44,45] was applied to identify the trend of the ET 0 . The MK test statistic (Z) follows the standard normal distribution with a mean of 0 and variance of 1 under the null hypothesis of no trend in the ET 0 . The null hypothesis is rejected if |Z| ≥ Z 1−β/2 at a significance level of β, where Z 1−β/2 is the (1 − β/2)-quantile. If the Z value is positive (or negative), then the ET 0 has an increasing (or decreasing) trend. As β = 0.05, if |Z| > 1.96, the trend is significant. In addition, Theil-Sen's slope estimator (β) [46,47] is used to determine the extent of a trend. This method has been usually used to detect the slope of a trend in a hydrometeorological time series dataset, which can be found in the literature [1,13,48]. The spatial distributions of the monthly ET 0 and its trends are mapped by the inverse distance weighted (IDW) interpolation model in ArcGIS (version 9.3) software.

Monthly Variations of the ET PMF
As shown in Figure 2, the ET PMF increased initially and then decreased on a monthly timescale, with the highest value appearing in June and the lowest value in January. Meanwhile, the ET PMF also revealed large regional differences in April, May, and June, which were especially obvious in June, with the value ranging from 112 mm to 165 mm. Similar monthly trends of the ET PMF could also be detected in other regions of China, such as the Yellow River Basin [49], northwest China [50], and the Sanjiang Plain in northeast China [51]. The spatial distribution of the ET PMF results showed a similar tendency of temporal distribution in the HRB (Figure 3). From January to June, the highest-value region of the ET PMF shifted from the northwest to the northern parts of the HRB, generally demonstrating a gradually increasing trend from south to north. However, from July to December, the highest ET PMF existed in the central and western parts of the HRB, then shifted to the southeastern parts in August, September, and October, and then shifted back to the southern and northwestern parts in November and December. Different distribution patterns of the ET PMF were found on a seasonal timescale in the HRB [13]. The ET PMF represented an obvious spatial evolution pattern on a monthly timescale.
In this study, for a comprehensive understanding of the ET PMF trends on a monthly timescale in the HRB, the MK test and Sen's slope estimator were employed. As shown in Table 3 and Figure 4, on a temporal scale, the ET PMF exhibited significant decreasing trends in January, June, July, and August, with the values of −0.108 mm·a −2 , −0.628 mm·a −2 , −0.330 mm·a −2 , and −0.460 mm·a −2 , respectively. However, in March and April, the ET PMF demonstrated slightly non-significant increasing trends. Similar to the results in Figure 2, the magnitude of the ET PMF trends on a spatial scale were higher in June. Particularly, in January, the ET PMF showed significant decreasing trends in the central and northwestern parts of the HRB, while a non-significant increasing trend of the ET PMF was detected in some parts of the southwest and southeast of the HRB. Similar trends were identified in September, November, and December, while the magnitude of the ET PMF trends was much smaller than that in January. In February and October, the stations with significant decreasing ET PMF trends mainly existed in the northwest of the HRB, and the stations with increasing ET PMF trends were found in the northeast and southeast of the HRB. In March, about 56% of stations exhibited increasing ET PMF trends, with a few stations showing significant increasing trends distributed in the southwest and southeast of the HRB. In April, about 62% of stations showed increasing ET PMF trends, with about 34% of stations distributed in the southern parts of the HRB accounting for the significant increasing trends. Meanwhile, only about 12% of stations located in the north of the HRB exhibited significant decreasing trends in the ET PMF . However, in May, the proportion of stations with significant decreasing ET PMF trends in the northern part of the HRB increased to 33%, while the proportion of stations with significant increasing ET PMF trends decreased to 4% of all stations that were mainly distributed in the southeast of the HRB. Unlike in other months, the ET PMF showed significant decreasing trends in almost the whole HRB in June and August, accounting for about 89% and 91% of stations, respectively, except for a few stations that exhibited non-significant decreasing trends in the southeast region. Nevertheless, significant decreasing trends of the ET PMF in July were mainly distributed in the northwestern part of the HRB, with about 54% of stations occupied.     Figure 5 shows the scatter plots of reference evapotranspiration estimated by the PMF-56 model (ET PMF ) and the 13 empirical models (ET EMP ) in the HRB. As seen in Figure 5, climatic-based models, such as the temperature-based model (HS), radiation-based models (MAK, PT, JH, ABT, IRM, and TAB), and the combined models (VA1, VA2, and VA3) performed better than the mass transfer-based models (PEN, WMO, and TRA), with all the scatters of the ET EMP distributed along the 1:1 line and the R-squared greater than 0.9. Among the well-performing models, the HS, PT, JH, ABT, VA1, VA2, and VA3 models overestimated the ET 0 in the whole HRB, with the slope of the linear fit line greater than 1, and the MAK, IRM, and TAB models underestimated the ET 0 , with the slope of the linear fit line less than 1. However, the performance of the mass transfer-based models was not satisfactory, with the values of the determination coefficients at less than 0.801. Furthermore, the scatters of the mass transfer-based models are more dispersed, generally underestimating the ET 0 with large errors. For better evaluation of each empirical model, the statistical analysis, including the RRMSE, the MAE, and the NS, are shown in Table 4. In the temperature-based model, the HS model showed average performance, with the values of the RRMSE, the MAE, and the NS were 0.222, 0.475, and 0.853, respectively. The simulation accuracy is higher than that in the Sichuan Basin of southwestern China [12] (where the cloud cover is relatively large, and the solar radiation is reduced, ultimately influencing the estimation results of the ET 0 ) and also different sub-regions of mainland China [15]. Of the mass transfer-based models, the performance of each model followed the sequence of TRA > WMO > PEN. Considering the results of large errors shown in Figure 5 and values of the RRMSE, the MAE, and the NS, these three models cannot be used as appropriate alternatives, especially the PEN model (for which the values of the RRMSE, the MAE, and the NS were 0.580, 1.301, and −0.006, respectively). Of the radiation-based models, all the models performed well except the JH, for which the values of the RRMSE, the MAE, and the NS were 0.417, 0.870, and 0.419, respectively. The overall performances followed the order of IRM > TAB > PT > ABT > MAK > JH. The poor performance of JH agreed with the results of Ahooghalandari et al. [52] in Australia. In the combined models, all the statistical parameters of each model performed better in estimating the ET 0 than the others, especially the VA3 model, for which the values of the RRMSE, the MAE, and the NS were 0.126, 0.267, and 0.953, respectively, followed closely by VA1 and VA2. Based on the aforementioned discussion, on a daily timescale, the VA3 model is the best alternative model to the PMF-56 model for estimating the ET 0 in the HRB. In earlier studies, Mehdizadeh et al. [19] reported that the mass transfer-based models showed the worst performance, while the combined models exhibited the best accuracy. The higher precision of the combined models might be due to the combination of the most suitable and important meteorological parameters incorporated therein (e.g., T from the temperature-based model, u 2 and RH from the mass transfer-based models, and R s from the radiation-based models). Tabari et al. [24] also tested 10 mass transfer-based models in the humid conditions of Iran and found that some of their performances were poor and had underestimated the results. Our results are in good agreement with the findings of Tabari et al. [24]. In this study, the VA3 model showed the best accuracy because of the combination of the T, RH, u 2 , and R s parameters, which play a vital role in ET 0 estimation. Similar findings were also found in Tanzania and southwestern Kenya in eastern Africa [53], India [20], and western Australia [54]. In addition, the better performance of the radiation-based models was mainly due to the important role of the R s parameter in ET 0 estimation in humid climates [55].

Calibration of the Empirical Models
As seen in Figure 5 and Table 4, it has been found that each empirical model has more or less errors when estimating reference evapotranspiration compared with that estimated by the PMF-56 model to some extent. In Figure 6, the monthly ET EMP calculated by the 13 empirical models were compared with the ET PMF on a monthly timescale. Evidently, the mass transfer-based models (PEN, WMO, and TRA) have different monthly change trends compared to other models, which might be responsible for their poor performance. The underestimation of these three mass transfer-based models was in accordance with the results of Tabari et al. [24] in Iran, although with similar humid conditions, some differences also existed in the monthly average trend. Except for the mass transfer-based models, despite the analogous monthly change trends between the other empirical models and the PMF-56 model, there was still some obvious overestimation (JH, HS, and PT) and underestimation (TAB and MAK) on the monthly timescale. Taking JH model as an example, it overestimated the ET 0 greatly from April to October, while underestimating the ET 0 in other months. Jensen et al. [56], Tabari et al. [24], and Tegos et al. [57] also reported that the JH model tends to overestimate the ET 0 in humid climate conditions. This difference can also be confirmed in Table 4. Due to the large differences estimated by the empirical models on a monthly timescale, we decided to calibrate the empirical models from month to month. In this study, the 13 empirical models were calibrated using the meteorological datasets from 1961 to 2000 by establishing the linear regression model between the ET PMF and the ET EMP . The specific calibration coefficients can be found in Table 5. As seen in Table 5, we found that the determination coefficients of the temperature-based, radiation-based, and combined models all presented change trends that increased primarily and then decreased from January to December. The high values of the determination coefficients for these models mainly existed between April and October, especially in July, August, and September, in which the highest values were found. At this point, these models could be recommended as appropriate alternatives for estimating the ET 0 during these months. However, in other months, especially in January and December, the performances of these models were poor, with the PT model showing the worst performance (R 2 = 0.114). According to the earlier study of Li et al. [13], the R s parameter was the most dominant factor influencing the ET 0 trends in the growing season (April-October) and summer (June-August) in the HRB, and it might be responsible for the good performance of the radiation-based models in these periods. In addition, the determination coefficient of the VA3 model ranged from 0.882 in December to 0.993 in July, which indicated that it was the ideal alternative over the other models. This phenomenon can also be verified in Figure 5 and Table 4. Despite the fact that the VA3 model possessed a relatively simple calculation formula in comparison with the PMF-56 model, its applicability still needed more consideration due to the numerous meteorological input parameters (e.g., T, R s , RH, and u 2 ) required. On the contrary, the mass transfer-based models revealed an opposite change trend that decreased initially and then increased from January to December, with the lowest determination coefficient appearing in September. Despite the fact that the mass transfer-based models showed the poorest performance in daily scatter plots fit ( Figure 5 and Table 4), the performance of these models in January, February, March, November, and December should gain more recognition, with the values of the determination coefficients at greater than 0.8. Furthermore, especially in January and December, the determination coefficients of the WMO and TRA models were greater than 0.9 and also greater than that of the VA3 model; these results illustrated that these two mass transfer-based models could be recommended as the best alternative to estimate the ET 0 in the corresponding time periods. The WMO and TRA models shared the same meteorological input parameter of u 2 . Then, this phenomenon could reasonable be explained by the previous study, which found that u 2 was the dominant contributing factor to the ET 0 trends in the HRB in these periods [13]. Together with the performances of the VA1, VA2, and VA3 models, the distinctive differences between VA1/VA2 and VA3 indicated that the combination of these four meteorological parameters, namely R s , T, RH, and u 2 , could provide more accurate ET 0 estimations in winter. These findings are in disagreement with the results of Peng et al. [15]. Similar findings could also be verified by Tegos et al. [58], who proposed that because of the absence of relative humidity and wind speed in the parametric model, the estimated values of the parametric model deviated from the PMF-56 model estimation in some locations.

Validation of the Calibrated Empirical Models
In this study, in order to validate the calibrated empirical models, the meteorological datasets from 2001 to 2014 were used. The radar charts of Figures 7-9 show the RRMSE, MAE and NS values between the original reference evapotranspiration and the calibrated reference evapotranspiration using the 13 empirical models in the HRB. In general, after the calibration, the performance of each empirical models in each month has been improved greatly compared with that of original. After the calibration, the RRMSE and MAE values are closer to 0 and the NS values are closer to 1. Moreover, the RRMSE, MAE, and NS values became smaller and tended to be stable. As given in Figure 7, before the calibration, the RRMSEs in January, February, and December were larger in each empirical model, especially in the ABT and JH models and the PEN and WMO models in February. Even after the calibration, the RRMSE values were still imperfect compared with those in other periods, except for the WMO, TRA, and VA3 models. The performance of these three models was good, especially the WMO and TRA models in January and December, which could also be confirmed in Table 4. However, in other months, despite large RRMSE values that could be detected in the PEN, WMO, and JH models before the calibration, the performances of these models have been greatly enhanced after calibration. Similar results could also be verified from the MAE and NS values in Figures 8 and 9.
After the calibration, it must be emphasized that the temperature-based HS model showed the simplest expression and only needed three easily acquired meteorological variables, namely T, T max , and T min . However, the performance of the HS model was not very good compared with the other types of models. Similar findings could also be found in Feng et al. [12], despite the fact that the performance of the HS model was enhanced by adopting the Bayesian theory, the model still overestimated the results. This might be due to the fact that no physical mechanism was taken into account. Of the mass transfer-based models, the performances of the three models followed the order of TRA > WMO > PEN from January to July and WMO > TRA > PEN from August to December except for September. In September, the performances were in the sequence of PEN > TRA > WMO. Of the radiation-based models, the performances of these six models (MAK, PT, JH, ABT, IRM, and TAB) were good, and the differences between the groups were small from April to October. In other months, distinctive differences could be found, with the IRM and TAB models showing the best performances and the PT and JH models showing the worst performances. That the JH model performed the worst in our study is consistent with the outcomes found in the humid climates of Serbia [59] and Florida, USA [40]. In addition, of the combined models, the VA3 model showed the optimum performance, followed closely by the VA1 and VA2 models, except in September and October, when the performance of these three models followed the order of VA3 > VA2 > VA1.
To more specific, this study also compared the reference evapotranspiration before and after the calibration on monthly and annual timescales using the 13 empirical models in comparison with the PMF-56 model. As shown in Figure 10, after the calibration, the reference evapotranspiration estimated by each of the 13 empirical models on monthly and annual timescales were very close to that estimated by the PMF-56 model. Although the performance of the PEN model was improved after the calibration, as shown in Figures 7-9, it still overestimated the ET 0 , especially from March to October (Figure 10b), and ultimately lead to an overestimated ET 0 on an annual timescale (Figure 10d) as well. This implied that the linear regression model could not properly calibrate the PEN model, and thus, the PEN model is not recommended for estimating the ET 0 in the HRB when compared with other empirical models.     Based on the above discussion, the VA3 model exhibited the best accuracy compared with the other 12 empirical models in the HRB, eastern China. If the complete meteorological datasets were accessible, the VA3 model would be the best alternative to the PMF-56 model because of its satisfactory accuracy and simple algorithm. However, based on data availability, the temperature-based, radiation-based, VA1, and VA2 models are recommended for use during April-October if the corresponding meteorological input parameters in Table 2 are accessible in the HRB and other similar regions, whereas the mass transfer-based models are preferable for other months.
Although the most accurate PMF-56 model was used as the benchmark in the current research to evaluate the performance of the empirical models, the outcomes obtained from this study are still inexplicable to some extent and further verification should be carried out for the experimental ET 0 data (e.g., by eddy covariance systems, lysimeters, etc.) if the conditions permit. In addition, despite the fact that the linear regression model usually improved the performance of the empirical models, large errors could still be found in certain months and models. More calibration is needed to enhance the performance of the empirical models based on the mathematical and physical theories. It is also necessary to evaluate the models used here in other climatic conditions for testing of similar climate-type impacts.

Conclusions
Based on the daily climatic dataset collected from 137 meteorological stations during 1961-2014 across two sub-regions of the HRB, eastern China, this study aims to identify the spatiotemporal trends of the ET PMF on a monthly timescale and to compare the performances of 13 original empirical models with the PMF-56 model. The main results are summarized as follows: (1) The ET PMF increased initially and then decreased on a monthly timescale, with the peak value appearing in June and the lowest value appearing in January. The ET PMF exhibited significant decreasing trends in January, June, July, and August; however, in March and April, the ET PMF demonstrated slightly non-significant increasing trends. (2) On a daily timescale, before the calibration, the VA3 model could be regarded as the best alternative model for estimating reference evapotranspiration in the HRB. However, the PEN, WMO, TRA, and JH models could not be considered appropriate alternative models, because of large errors in their estimations. In particular, the PEN model performed the worst with values of the RRMSE, the MAE, and the NS at 0.580, 1.301, and −0.006, respectively. (3) During the calibration, the determination coefficients of the temperature-based, radiation-based, and combined models presented change trends that increased primarily and then decreased from January to December. High determination coefficients of these models mainly existed between April and October. On the contrary, the mass transfer-based models revealed opposite change trends from January to December. Despite the fact that the mass transfer-based models showed poor performances in daily scatter plots, the performances of these models in January and December were better, with the determination coefficients of the WMO and TRA models at greater than 0.9 and also greater than that of the VA3 model. (4) After the calibration, the reference evapotranspiration calculated by each of the 13 empirical models on monthly and annual timescales were very close to that estimated by the PMF-56 model, except for the PEN model, which overestimated the reference evapotranspiration from March to October and also on an annual timescale. (5) If the comprehensive meteorological datasets were available, the VA3 model would be the best alternative empirical model for the PMF-56 model, because it had an easy computation procedure and generated fewer errors compared to the other 12 empirical models, and it was also highly correlated with the PMF-56 model. After accurate validation for the VA3 model using Equation (5), the calibrated parameters of a and b for each site in the HRB were obtained. Based on data availability, the temperature-based, radiation-based, VA1, and VA2 models are recommended during April-October if corresponding input parameters in Table 2 are accessible  in the HRB and other similar regions, whereas the mass transfer-based models are preferable in  other months. This study is a crucial contribution to the estimation of the ET 0 in the HRB, eastern China, when the large requirements of climate data cannot be met fully. The outcomes of this study will provide guidance to irrigation managers and agrometeorologists for the planning of water resources and irrigation scheduling in the HRB and other regions with similar climates, because the more accurate combination model will give an accurate and reliable estimation of the ET 0 based on the available comprehensive data. Author Contributions: Meng Li and Ronghao Chu analyzed the data and wrote the paper; Abu Reza Md. Towfiqul Islam reviewed the paper and modified the language; and Shuanghe Shen guided the entire research.

Conflicts of Interest:
The authors declare no conflict of interest.