Article

Splitting and Length of Years for Improving Tree-Based Models to Predict Reference Crop Evapotranspiration in the Humid Regions of China

1 Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas of the Ministry of Education, Northwest A&F University, Yangling, Xianyang 712100, China
2 School of Hydraulic and Ecological Engineering, Nanchang Institute of Technology, Nanchang 330099, China
3 State Key Laboratory of Simulation and Regulation of Water Cycle in River Basin, China Institute of Water Resources and Hydropower Research, Beijing 100038, China
* Authors to whom correspondence should be addressed.
Water 2021, 13(23), 3478; https://doi.org/10.3390/w13233478
Submission received: 1 November 2021 / Revised: 23 November 2021 / Accepted: 24 November 2021 / Published: 6 December 2021
(This article belongs to the Section Hydrology)

Abstract

To improve the accuracy of estimating reference crop evapotranspiration (ET0) for the efficient management of water resources and the optimal design of irrigation scheduling, the drawback of the traditional FAO-56 Penman–Monteith method, which requires complete meteorological input variables, needs to be overcome. This study evaluates the effects of five data splitting strategies and three different time lengths of input datasets on predicting ET0. The random forest (RF) and extreme gradient boosting (XGB) models coupled with a K-fold cross-validation approach were applied to accomplish this objective. The results showed that the accuracy of the RF (R2 = 0.862, RMSE = 0.528, MAE = 0.383, NSE = 0.854) was overall better than that of the XGB (R2 = 0.867, RMSE = 0.517, MAE = 0.377, NSE = 0.860) across the different input combinations. Both the RF and XGB models with the combination of Tmax, Tmin, and Rs as inputs provided better accuracy in daily ET0 estimation than the corresponding models with other input combinations. Among all the data splitting strategies, S5 (with a 9:1 proportion) showed the optimal performance. Compared with the 30-year length, the estimation accuracy of the 50-year length with limited data was reduced, while a 10-year length of meteorological data improved the accuracy in southern China. Nevertheless, the performance of the 10-year data was the worst among the three time spans when considering the independent test. Therefore, to improve the daily ET0 predicting performance of tree-based models in the humid regions of China, the random forest model with datasets of 30 years and the 9:1 data splitting strategy is recommended.

1. Introduction

Evapotranspiration (ET), the total water consumption of soil evaporation and crop transpiration, is of great significance for water resources planning and management, irrigation systems, land drainage implementation, groundwater research, drought assessment, analysis of farmland environments, and agricultural water management in water-shortage areas [1,2,3,4]. The precise prediction of ET is critical at the global level because it has an impact on the hydrological cycle [5,6]. In the context of climate change, agricultural water resources are decreasing on temporal and spatial scales across the world [7]. Crop water use is the key factor of soil water circulation in farmland and is exceedingly important for the optimal allocation of water resources and the formulation of irrigation systems, and the key to calculating crop water demand is to determine crop evapotranspiration [8,9,10]. However, methods for determining ET, such as the water balance method [11], the mass-transfer theory of water vapor [12], or lysimeter devices, are extremely time-consuming and expensive in practice, which limits their applicability. Hence, to determine the actual ET value over a wide range of conditions, the reference evapotranspiration (ET0) was developed as an alternative basis for calculating ET and has been widely used [13].
Plenty of nonlinear mathematical models with meteorological variables have been established for ET0 prediction [14,15,16], among which the FAO-56 Penman–Monteith model is the most widely accepted standard model across different regions and climates. However, the FAO-56 Penman–Monteith model needs a mass of meteorological variables for its calculation, e.g., maximum and minimum ambient temperatures, wind speed, relative humidity, and solar radiation [17,18,19], which is the major weakness for its application across the world. Therefore, models with fewer meteorological parameters as inputs, e.g., temperature-based, mass transfer-based, and radiation-based models, have been developed and applied widely in regions where only incomplete meteorological data are available [6,20,21,22,23]. In spite of this wide application, there are still many inconveniences in the estimation of evapotranspiration with empirical models, as most of them are linear functions, while evapotranspiration in reality is a highly complicated nonlinear process.
Over the past few decades, machine learning models have been successfully applied in various fields (i.e., pan evaporation, dew point temperature, global solar radiation, streamflow, water quality, drought events, etc.) due to their excellence in dealing with complex and nonlinear relationships [23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39], including the field of ET0 estimation [40,41]. For example, various algorithms, including artificial neural networks [42,43,44], extreme learning machines [45,46,47], support vector machines [48,49,50], gene expression programming [51,52,53], extreme gradient boosting [54,55,56], the M5 model tree [57,58], and deep learning [59,60,61], have been evaluated for their capability in estimating the ET0.
The random forest (RF) is an ensemble-based method. Because it can handle extremely large datasets, RF has been commonly used for predicting ET0 in recent years [62,63,64,65]. For example, Feng et al. studied the capabilities of the RF and GRNN models for estimating the daily ET0 with meteorological parameters from two weather stations in southwest China and discovered that both RF and GRNN performed well, while RF was a little better than GRNN in general [62]. Wang et al. reported that the derived generalized ET0 model based on the RF could be successfully applied to ET0 estimation with both complete and incomplete meteorological variables, and recommended it for application in water balance research [65]. Junior et al. predicted ET0 with the inverse distance weighting (IDW), ordinary kriging (OK), random forest (RF), and a random forest variation for spatial predictions (RFsp) based on maximum and minimum temperature data from 136 climatological stations located in Brazil, and found that the RF obtained better results than the conventional approaches [63]. Karimi et al. used 10-year daily data from Iran and considered the impact of replacing missing meteorological variables with calculated meteorological variables based on the standard FAO-56 PM, some commonly used empirical equations, and the random forest model [5]. According to their results, when the calculated value was used to replace the missing variable, the RF model based on the combination with wind speed had higher accuracy than the RF model based on the combination with solar radiation. In addition, the random forest has also been widely used in flood probability mapping [65,66], and there are relevant reports using remote sensing data [67,68]. Meanwhile, the random forest has also been well applied in water quality studies [69].
In recent years, Chen and Guestrin proposed the tree-based extreme gradient boosting (XGB) algorithm [70], which reduces error, improves prediction accuracy, and lowers computational cost; it has been widely applied in various fields [71,72,73]. In addition, the model has also been used to predict ET0. For example, Wu et al. explored the performance of the XGB model in estimating the monthly mean daily ET0 using temperature data and found that the XGB model exhibited better estimation accuracy than the other methods [74]. Fan et al. evaluated the capability of the XGB model in estimating daily reference evapotranspiration using the Global Ensemble Reforecast v2 data in different climatic zones of China [75]. The results indicated that the XGB model can be satisfactory for estimating the daily ET0. Furthermore, optimization algorithms for the XGB model have received more and more attention because of their ability to enhance artificial intelligence methods in the modeling of engineering problems, and they have been used to estimate ET0 [76,77,78]. Therefore, the XGB model is suited to estimating the daily ET0 in data-limited regions.
Machine learning models with different heuristic agrometeorological variables have shown high accuracy in ET0 estimation based on finite data. However, the soundness of a model in overcoming real-world complexity and obtaining high-precision simulation results is highly dependent on the data management strategy during model development and evaluation, especially the strategy for splitting data between the model training and testing stages. Therefore, the key to ensuring that a model obtains the best simulation accuracy from a data series is to find a suitable standard for appropriately splitting the data into the model training and testing stages. For instance, Wu et al. established an RF model with a 2:1 data splitting for training and testing and found that the RF had higher simulation accuracy than the other intelligent models [74]. To find an alternative to mass transfer-based methods, Shiri et al. established a random forest using cross-validation at the local and cross-station scales with a single data splitting for the training and testing series [79]. It was found that the simulation accuracy of the random forest model was better than that of the mass transfer-based models.
In the context of climate change, both meteorological factors and ET0 have changed a great deal [80,81]. This poses a challenge to model establishment and evaluation for estimation with long-term data, and the efficiency of a model in estimating ET0 is related to the time length of the input datasets [4,82,83,84]. Yassen et al. divided a 35-year historical record (1983–2017) into four groups (i.e., 17 years (long-term), 10 years and 7 years (middle-term), and 5 years (short-term)) to study the temporal and spatial changes of Egypt’s annual reference evapotranspiration [85]. The results indicated that the short-term group showed the most significant differences in all the studied areas of Egypt, while the long-term and medium-term differences were only significant in certain areas of Egypt. Ning et al. studied the interaction of three factors (i.e., vegetation, climate, and topography) and their corresponding impacts on ET modelling at six different time spans in the Loess Plateau of China [86]. The results showed that long-term spans produced stronger relationships between the three factors than short-term spans in most catchments. Therefore, it can be concluded that the time length of the input datasets has an important influence on the accuracy of models evaluating ET0.
To our knowledge, the trend of ET0 has been found to change in different regions of the world. Increasing ET0 trends have been reported in Iran in the Middle East [87] and in Spain on the Iberian Peninsula in southwestern Europe [88], whereas a decreasing ET0 trend has been reported in Northern China [89,90]. In the context of climate change, and given the large population, vast land area, and frequent floods of the humid area of southern China studied here, climatic uncertainty is expected to intensify the variability of the ET0 in this area [80,81]. However, relevant reports are still lacking for southern China. Therefore, it is of great significance to study how to improve the accuracy of ET0 modeling for alleviating the pressure on water resources in the region. Meanwhile, the application of the relatively simple tree-based RF and extreme gradient boosting models to ET0 estimation under various data splitting strategies (i.e., different splitting proportions) has not been evaluated. In addition, there is no corresponding report on the applicability of the random forest and extreme gradient boosting in estimating the ET0 under limited meteorological data and various time lengths of input datasets (i.e., data obtained from different time ranges). Accordingly, the performance of the RF and XGB on daily ET0 estimation under various conditions consisting of different model input combinations, data splitting strategies, and time lengths was evaluated in this study with meteorological records from twenty-one climatological stations in the humid areas of southern China. Overall, the aims of this research are to: (1) discuss the influence of different meteorological variable input combinations on model performance; (2) evaluate the effectiveness of various data splitting strategies in estimating the ET0 under different input combinations; and (3) evaluate the effectiveness of different time lengths of data on ET0 estimation under various input combinations and splitting strategies.

2. Materials and Methods

2.1. Study Areas

In this research, daily meteorological data from 21 representative meteorological stations across the humid region of China (Figure 1) were used to build the RF and XGB models to estimate ET0. This area is rich in water and heat resources, and its geographic range includes two river basins (the Yangtze River Basin and the Pearl River Basin). Due to the effects of El Niño and typhoons, the frequency of floods and waterlogging disasters in this region is generally high, often bringing huge impacts to the natural environment and society of the region. For example, a summer flood that occurred in the Poyang Lake area of the Yangtze River Basin affected over 2.531 million people and 190.4 thousand hectares of crops, resulting in an economic loss of 2.39 billion RMB. Therefore, this area has become an area of widespread concern for many scholars who study hydrological phenomena and climate [55,91].

2.2. Meteorological Data Used

Continuous and long-term series of observed daily maximum (Tmax) and minimum (Tmin) temperatures, relative humidity (RH), global solar radiation (Rs), extra-terrestrial solar radiation (Ra), and wind speed (U2) from 1966 to 2019 were gathered from 21 representative climatological stations in the humid region of China (Figure 1). Among them, the 1966–2015 records were used for training and testing the models, and the 2016–2019 records were used for independent testing. The quality-controlled meteorological records were obtained from the National Meteorological Information Center (NMIC) of the China Meteorological Administration (URL: http://data.cma.cn accessed on 5 March 2020). A detailed description of the 21 studied weather stations is listed in Table 1. Among these stations, the mean daily maximum ambient temperatures were 7.75–29.75 °C, and the mean daily minimum ambient temperatures were 0.55–21.65 °C. The daily average wind speed varied from 0.49 to 2.37 m·s−1, while the daily average relative humidity ranged between 85.51% at Emeishan and 62.43% at Lijiang. The daily average global solar radiation varied between 16.94 MJ·m−2·d−1 at Lijiang and 10.15 MJ·m−2·d−1 at Guiyang. The highest daily average ET0 (3.44 mm·d−1) was monitored at Mengzi, while the lowest value (1.72 mm·d−1) appeared at Emeishan. In general, the plateau sites are more variable than the sites in plains and hilly areas.

2.3. Estimation of Reference Evapotranspiration Using the FAO-56 Penman–Monteith Equation

The Penman–Monteith equation advocated by Allen et al. was used to compute daily ET0 [3] and provide the reference evapotranspiration for the machine learning models in this study [62,88,92]:
$$ET_0 = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T_{mean} + 273}\,U_2\,(e_s - e_a)}{\Delta + \gamma\,(1 + 0.34\,U_2)}$$
where Rn is the net radiation (MJ·m−2·d−1); G is the soil heat flux (MJ·m−2·d−1); Tmean is the average ambient temperature (°C), i.e., Tmean = (Tmax + Tmin)/2; U2 is the wind speed at 2 m height (m·s−1); es is the saturation vapor pressure (kPa); ea is the actual vapor pressure (kPa); Δ is the slope of the vapor pressure curve (kPa·°C−1); and γ is the psychrometric constant (kPa·°C−1). For more details on how the Penman–Monteith equation is constructed, please refer to the literature of Allen et al. [3].
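As an illustration of how the equation above maps onto code, the following minimal Python sketch computes daily ET0 from the listed quantities. It assumes Rn, G, Tmean, U2, es, ea, Δ, and γ are already available in the units stated above; the function name, interface, and example values are ours for illustration, not taken from the paper.

```python
def fao56_penman_monteith(Rn, G, Tmean, u2, es, ea, delta, gamma):
    """Daily reference evapotranspiration ET0 (mm d-1) from the FAO-56 PM equation.

    Units as in the text: Rn, G in MJ m-2 d-1; Tmean in deg C; u2 in m s-1;
    es, ea in kPa; delta, gamma in kPa deg C-1.
    """
    numerator = 0.408 * delta * (Rn - G) + gamma * (900.0 / (Tmean + 273.0)) * u2 * (es - ea)
    denominator = delta + gamma * (1.0 + 0.34 * u2)
    return numerator / denominator


# Example with plausible mid-latitude summer values (illustrative only).
print(round(fao56_penman_monteith(Rn=14.5, G=0.1, Tmean=25.0, u2=1.5,
                                  es=3.17, ea=2.30, delta=0.189, gamma=0.0665), 2))
```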

2.4. Random Forest (RF)

Random forest (RF) is used for both classification and regression [7]; in this study it is mainly applied to regression problems [55,91,93]. The RF algorithm builds decision trees on bootstrap samples of the data, obtains a prediction from each tree, reduces overfitting by averaging the results, and thereby improves the prediction performance.
The random forest model is built from decision-tree base learners. To establish an RF model, the first step is to draw sub-training sets from the original data. Suppose there are M samples in the initial dataset D; the probability that a particular sample is not selected after M draws with replacement is (1 − 1/M)^M. This means that, when the training sets are generated by sampling, each training set contains about 63.2% of the original dataset, and the unselected records (about 36.8% of the original dataset) form the out-of-bag dataset.
The main difference between random forest and bagging is that, when constructing each tree, n features are randomly selected from all M features. When optimizing each split node, the principle of the minimum Gini coefficient is adopted. The Gini coefficient can be expressed as follows:
$$Gini(p) = 2p(1 - p)$$
For the classification problem, RF develops trees on the basis of random vectors [7]. The prediction ability of the random forest model is evaluated by the margin function, which is defined as follows:
$$mg(X, Y) = \mathrm{av}_k\,I\big(h_k(X) = Y\big) - \max_{j \neq Y} \mathrm{av}_k\,I\big(h_k(X) = j\big)$$
Generalization error is used to measure the accuracy of the random forest model. The generalization error of random forest is:
$$PE^{*} = P_{X,Y}\big(mg(X, Y) < 0\big)$$
For the meaning of the parameters in the above formulas and the details of random forest model establishment, please refer to the literature of Breiman [7]. The structure of the RF algorithm is shown in Figure 2.
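To make the bootstrap reasoning above concrete, the short Python sketch below checks the (1 − 1/M)^M out-of-bag fraction and fits a scikit-learn RandomForestRegressor on stand-in data; the synthetic arrays and hyperparameter values are illustrative assumptions, not settings reported in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Bootstrap check: with M samples drawn with replacement, the chance a given
# record is never selected is (1 - 1/M)**M, approaching e^-1 (about 36.8%).
M = 10_000
print(f"out-of-bag fraction: {(1 - 1 / M) ** M:.3f}")

# Minimal regression sketch with stand-in data for one station's daily records.
rng = np.random.default_rng(0)
X = rng.random((3650, 3))      # placeholder columns for Tmax, Tmin, Rs
y = rng.random(3650)           # placeholder FAO-56 PM ET0 targets
rf = RandomForestRegressor(n_estimators=500, max_features="sqrt",
                           oob_score=True, random_state=0).fit(X, y)
print(f"OOB R2: {rf.oob_score_:.3f}")  # scored on the ~36.8% out-of-bag records
```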

2.5. Extreme Gradient Boosting

Extreme gradient boosting (XGB) is a new gradient boosting machine (GBM) algorithm proposed by Chen and Guestrin [9]. The XGB model is designed to prevent over-fitting while reducing the computational cost, keeping the predictions at the best computational efficiency through simplification and regularization. The XGB algorithm is derived from the concept of “boosting”: it combines the predictions of a group of weak learners to train a strong learner. The calculation formula is as follows:
$$\hat{y}_i^{(t)} = \sum_{k=1}^{t} f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i)$$
where t is the number of trees, fk(xi) is the prediction of the k-th tree for input xi, and xi is the input variable.
In order to prevent the over-fitting problem without affecting the calculation speed of the model, the XGB model minimizes the following objective function:
$$Obj^{(t)} = \sum_{i=1}^{n} l\left(\bar{y}_i, y_i\right) + \sum_{k=1}^{t} \Omega\left(f_k\right)$$
where l is the loss function, n is the number of observations, $\sum_{i=1}^{n} l(\bar{y}_i, y_i)$ is the training error, $\bar{y}_i$ is the predicted value, $y_i$ is the actual value, and Ω is the regularization term, defined as:
$$\Omega(f) = \gamma T + \frac{1}{2}\lambda\,\|\omega\|^{2}$$
where ω is the vector of leaf scores, T is the number of leaves, λ is a regularization parameter, and γ is the parameter that controls the penalty on the number of leaves.
The XGB algorithm is based on a gradient boosting strategy. It does not build all the trees at once but adds a new tree at each step to correct the results of the previous step. Assuming that the predicted value at step t is $\hat{y}_i^{(t)}$, the following derivation process can be obtained:
$$\begin{aligned}
\hat{y}_i^{(0)} &= 0\\
\hat{y}_i^{(1)} &= f_1(x_i) = \hat{y}_i^{(0)} + f_1(x_i)\\
\hat{y}_i^{(2)} &= f_1(x_i) + f_2(x_i) = \hat{y}_i^{(1)} + f_2(x_i)\\
&\;\;\vdots\\
\hat{y}_i^{(t)} &= \sum_{k=1}^{t} f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i)
\end{aligned}$$
Details of the XGB model can be found in Song et al. [94].
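A corresponding minimal XGBoost sketch is given below; reg_lambda and gamma map onto the λ and γ terms of the regularization formula above, while the remaining hyperparameters and the synthetic data are illustrative assumptions rather than values from the paper.

```python
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.random((3650, 3))      # placeholder columns for Tmax, Tmin, Rs
y = rng.random(3650)           # placeholder FAO-56 PM ET0 targets

# n_estimators is the number of boosted trees t; each new tree f_t corrects the
# previous prediction, as in the additive derivation above.
xgb = XGBRegressor(n_estimators=500, learning_rate=0.1, max_depth=6,
                   reg_lambda=1.0, gamma=0.0, objective="reg:squarederror")
xgb.fit(X, y)
print(xgb.predict(X[:3]))      # predicted daily ET0 for the first three records
```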

2.6. Input Combinations

Four input combinations of meteorological variables were applied in the present research to discuss the influences of different climatic factors on daily ET0 estimation. Therefore, utilizing various combinations of Tmax, Tmin, Ra, Rs, RH, and U2, a total of four input combinations were considered (Table 2). The flowchart of this study is described in Figure 3.
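For reference when reading the results, a plain mapping of the four combinations is sketched below; the exact membership of each combination is inferred from the variables discussed in Sections 3.1 and 4.1, so Table 2 remains the authoritative list.

```python
# Assumed composition of the four input combinations (see Table 2 for the
# authoritative definitions); combination 2 is the best performer in Section 3.1.
INPUT_COMBINATIONS = {
    1: ["Tmax", "Tmin", "Ra"],
    2: ["Tmax", "Tmin", "Rs"],
    3: ["Tmax", "Tmin", "RH", "Ra"],
    4: ["Tmax", "Tmin", "U2", "Ra"],
}

for k, variables in INPUT_COMBINATIONS.items():
    print(f"combination {k}: {', '.join(variables)}")
```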

2.7. Data Splitting Strategies and Time Lengths of Input Data

In this study, five data splitting strategies with different proportions of datasets allocated for model training and testing were applied. Specifically, the proportions of data allocated to the training and testing stages were set as 5:5 (S1), 6:4 (S2), 7:3 (S3), 8:2 (S4), and 9:1 (S5), respectively (Figure 4). Within each of the splitting strategies, three levels of data with different time ranges (spanning 10, 30, and 50 years, respectively) were used for model development and evaluation, defined as the 10-year span (2006–2015), the 30-year span (1986–2015), and the 50-year span (1966–2015), respectively (Figure 4). Details of the data splitting strategy, the selection of specific years, and the cross-validation procedure for the establishment and evaluation of each model are shown in Figure 4. Furthermore, this paper used a fixed test dataset from 2016 to 2019 for independent testing, varying only the training dataset. Based on the above data manipulation, the machine learning models coupled with a K-fold cross-validation approach were then applied to estimate ET0 under each of the input combinations.
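The following sketch shows one way the five splitting proportions and three time spans could be enumerated in code; whether the paper assigns years chronologically or by random sampling within each span is not specified, so the chronological assignment here is an assumption.

```python
import numpy as np

SPLIT_RATIOS = {"S1": 0.5, "S2": 0.6, "S3": 0.7, "S4": 0.8, "S5": 0.9}   # train share
TIME_SPANS = {"10-year": (2006, 2015), "30-year": (1986, 2015), "50-year": (1966, 2015)}

def split_years(first, last, train_share):
    """Allocate the leading share of years to training and the rest to testing
    (chronological assignment is an assumption, not stated in the paper)."""
    years = np.arange(first, last + 1)
    n_train = int(round(len(years) * train_share))
    return years[:n_train], years[n_train:]

train_years, test_years = split_years(*TIME_SPANS["30-year"], SPLIT_RATIOS["S5"])
print(len(train_years), "training years; testing years:", test_years)
# A fixed 2016-2019 dataset is then held out for the independent test.
```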

2.8. Statistical Performance Analysis

The accuracy of the models for estimating daily ET0 was evaluated with four generally used statistical indicators [64,91]: the root mean square error (RMSE), mean absolute error (MAE) [95], coefficient of determination (R2), and Nash–Sutcliffe efficiency coefficient (NSE) [96]. The statistical indices are expressed as follows:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\big(X_{i,P} - X_{i,R}\big)^{2}}$$
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\big|X_{i,P} - X_{i,R}\big|$$
$$R^{2} = \frac{\left[\sum_{i=1}^{n}\big(X_{i,P} - \bar{X}_{P}\big)\big(X_{i,R} - \bar{X}_{R}\big)\right]^{2}}{\sum_{i=1}^{n}\big(X_{i,P} - \bar{X}_{P}\big)^{2}\,\sum_{i=1}^{n}\big(X_{i,R} - \bar{X}_{R}\big)^{2}}$$
$$NSE = 1 - \frac{\sum_{i=1}^{n}\big(X_{i,P} - X_{i,R}\big)^{2}}{\sum_{i=1}^{n}\big(X_{i,P} - \bar{X}_{P}\big)^{2}}$$
where $X_{i,P}$, $X_{i,R}$, $\bar{X}_{P}$, and n are the FAO-56 Penman–Monteith ET0, the predicted ET0, the mean of the FAO-56 Penman–Monteith ET0, and the number of observed meteorological data, respectively. The closer the value of R2 is to 1, the better the model performance and data fitting; conversely, the closer the values of RMSE and MAE are to 0, the higher the prediction accuracy. The Nash–Sutcliffe efficiency coefficient (NSE) is a commonly used indicator for evaluating the performance of a model: the higher the value of NSE, the better the performance of the model, and vice versa. A perfect agreement between the estimated and target ET0 produces NSE = 1.0 [97].
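A direct transcription of the four indicators into Python is sketched below, following the variable convention of the text (et0_pm for the FAO-56 PM values X_P and et0_pred for the model estimates X_R); the function names are ours for illustration.

```python
import numpy as np

def rmse(et0_pm, et0_pred):
    """Root mean square error, as defined above."""
    return np.sqrt(np.mean((et0_pm - et0_pred) ** 2))

def mae(et0_pm, et0_pred):
    """Mean absolute error, as defined above."""
    return np.mean(np.abs(et0_pm - et0_pred))

def r2(et0_pm, et0_pred):
    """Coefficient of determination (squared correlation), as defined above."""
    num = np.sum((et0_pm - et0_pm.mean()) * (et0_pred - et0_pred.mean())) ** 2
    den = np.sum((et0_pm - et0_pm.mean()) ** 2) * np.sum((et0_pred - et0_pred.mean()) ** 2)
    return num / den

def nse(et0_pm, et0_pred):
    """Nash-Sutcliffe efficiency relative to the mean of the FAO-56 PM ET0."""
    return 1.0 - np.sum((et0_pm - et0_pred) ** 2) / np.sum((et0_pm - et0_pm.mean()) ** 2)
```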

3. Results

3.1. Comparisons of XGB and RF Predicting Daily ET0 with Various Input Combinations

The predicting capability of the machine learning models for reference evapotranspiration at the three levels of time length (2006–2015, 1986–2015, and 1966–2015) was evaluated by the R2, RMSE, MAE, and NSE, taking the ET0 computed with the FAO-56 Penman–Monteith model from the meteorological data as the benchmark. The statistical results of the four different input combinations for predicting the daily ET0 at the twenty-one climatological stations in the humid areas of China are provided in Table 3.
Taking the 50-year span as an example, the RF and XGB models with input combination 2 (i.e., the RF2 and XGB2 models, with input variables consisting of Tmax, Tmin, and Rs) had better predicting accuracy than those with the other input combinations (Table 3); the range of the mean RMSE values of the RF models across the combinations (inputs with Tmax, Tmin, and Rs; inputs with Tmax, Tmin, and Ra, respectively) was 0.324–0.688 mm d−1 during the testing phase, and the corresponding values of the XGB models were 0.328–0.689 mm d−1. The input combination of Tmax, Tmin, RH, and Ra also produced satisfactory daily ET0 predictions, with mean RMSE values of 0.516 mm d−1 and 0.526 mm d−1 for the RF and XGB, respectively. The models with input combination 4 (i.e., input variables consisting of Tmax, Tmin, U2, and Ra) were also capable of estimating the daily ET0 with respectable precision, with mean RMSE values of 0.607 mm d−1 and 0.620 mm d−1 for the RF and XGB, respectively. These results show that a reasonable combination of input parameters is beneficial to the improvement of model accuracy. On the basis of the temperature variables, the importance of each of the other three meteorological variables (i.e., Rs, RH, and U2) in improving model accuracy can be ranked as Rs > RH > U2. Although the input combination with Rs produced better model accuracy than input combinations with any other variable, it should be noted that radiation records are not universally available across the world, especially in less developed regions. In comparison, RH is a variable that can be easily obtained in most regions on Earth while still providing a decent contribution to improving model accuracy. Therefore, RH is recommended as an alternative input for ET0 estimation in regions where Rs is not available. In terms of the machine learning models’ performance under different input combinations, patterns similar to the 50-year span were observed in the other two levels of time range (i.e., the 10-year span and the 30-year span, respectively; see Table 3), and the random forest model performed better than the extreme gradient boosting model.

3.2. Comparisons of XGB and RF Predicting Daily ET0 with Data Splitting Proportions

Table 4, Table 5 and Table 6 present the statistical results of the machine learning models with the five data splitting strategies (i.e., splitting into proportions of 5:5 (S1), 6:4 (S2), 7:3 (S3), 8:2 (S4), and 9:1 (S5), respectively) under the four input combinations during the testing phases. As shown in the tables, the models’ predicting accuracy differs among data splitting strategies under the same input combination. Using the 50-year span (Table 6) as an example, for the four input combinations in the two machine learning models, the S5 proportion yielded values of R2 and NSE closest to 1 and values of RMSE and MAE closest to 0 in the testing phase, compared to S4, S3, S2, and S1. The estimation precision of the two machine learning models under the five proportions in the testing phase was ranked as S5 > S4 > S3 > S2 > S1. In other words, the S5 proportion had a slightly better capability than the S4 and S3 proportions, while showing a greater edge over the S2 and S1 proportions. The S5 and S4 proportions had almost equivalent performance (difference in RMSE < 2%) in predicting the daily ET0 for the four input combinations, and both outperformed the other three data splitting proportions. In contrast, the S1 proportion of the XGB and RF gave the worst estimates of the daily ET0 relative to the S5 proportion, with RMSE increasing by 7.5–7.6% and 7.1–7.2% for the input combination of Tmax, Tmin, Ra, and RH, and by only 3.5–5.9% and 2.6–5.0% for the other three input combinations, respectively. In general, for the five data splitting proportions, the statistical performance of the RF is better than that of the XGB (Table 6), indicating that the random forest models produced high-precision estimates in the testing phase. Compared with the 50-year span, similar patterns of model performance with different data splitting proportions were observed in the 10-year span (Table 4) and the 30-year span (Table 5).
The box diagrams of the FAO-56 Penman–Monteith ET0 values and the ET0 predicted by the RF model with the S5 proportion during the ten cross-validation periods, using the best input combination (i.e., the combination of Tmax, Tmin, and Rs) in the testing phase, are shown in Figure 5. The diagrams clearly show that the ranges of the ET0 values estimated in the ten cross-validation stages were close to the FAO-56 Penman–Monteith ET0 values of their corresponding stages, further highlighting the model accuracy in estimating daily ET0. Overall, the accuracy of the ten cross-validation periods for the four selected sites was high, suggesting that the RF model can be utilized for estimating ET0 in this area. In particular, the medians, inter-quartile ranges, and extreme values of the fifth and sixth cross-validation periods were closer to their corresponding FAO-56 Penman–Monteith values than those of the other cross-validation periods, indicating a better daily ET0 predicting performance for these two periods. Among the four selected sites, the distribution of the maximum, minimum, and interquartile range values of the ET0 at Guiyang station (inland plateau) was the closest to the corresponding values of the FAO-56 PM estimated ET0 during the ten cross-validation stages.

3.3. Comparisons of XGB and RF Predicting Daily ET0 with Various Time Lengths of Input Data

The average and local RMSE values of the RF and XGB models for estimating daily ET0 using the available lengths of years in the testing stage at the meteorological stations in the humid regions of southern China are presented in Figure 6. Similar to the previous results (Table 3), the machine learning models with input combination 2 (i.e., the RF2 and XGB2 models, with input variables consisting of Tmax, Tmin, and Rs) and the data splitting proportion of S5 had more promising accuracy than the other models and proportions. Specifically, under the different data splitting strategies in the testing stage, compared to the 10-year dataset, the average RMSE of the RF2 model with the 50-year dataset increased by 2.81–3.21%, while that with the 30-year dataset increased by 0.39–0.74%. In addition, relative to the 10-year dataset, the increases in the average RMSE of the RF1, RF3, and RF4 models were 3.16–3.56%, 6.21–7.07%, and 1.01–1.24% for the 50-year dataset, and 0.58–0.79%, 0.45–1.89%, and 0.46–0.84% for the 30-year dataset, respectively. Moreover, the extreme gradient boosting model was consistent with the results shown by the random forest model: compared to the 30-year dataset, the average RMSE of the XGB2 model with the 50-year dataset increased by 3.45–3.78%, while that of the 10-year dataset decreased by 2.85–3.30%. Among the three levels of time lengths of input data, the XGB and RF models with the 50-year span performed worst (RMSE = 0.276–0.612 mm·d−1 and 0.259–0.572 mm·d−1, respectively), followed by the 30-year span (RMSE = 0.266–0.593 mm·d−1 and 0.252–0.557 mm·d−1, respectively); the 10-year span (2006–2015) showed satisfying daily ET0 estimates in southern China (RMSE = 0.257–0.579 mm·d−1 and 0.250–0.554 mm·d−1, respectively). Overall, under the same splitting proportions and input combinations, reducing the amount of modeling data improved the accuracy of the XGB and RF models in the testing stage (Figure 6).

3.4. Comparisons of XGB and RF Predicting Daily ET0 with a Fixed Testing Dataset

To effectively assess the impacts of different data splitting proportions and various time lengths of input data on model performance, a fixed testing dataset consisting of records from 2016 to 2019 was used for testing all the types of models constructed in this study. Meanwhile, the training datasets remained varied among the different models, as stated previously. The average statistical indicators of the models with the fixed testing dataset (2016–2019) were calculated for the different time lengths of input data (Table 7, Table 8 and Table 9). As shown in the tables, under the same time length of input data, both the RF and XGB models with input combination 2 (i.e., the RF2 and XGB2 models, with input variables consisting of Tmax, Tmin, and Rs) had better predicting accuracy than the other input combinations, and this pattern did not vary among the different time lengths. Furthermore, for any of the three time lengths, the estimating accuracies of the two groups of machine learning models with different data splitting proportions were ranked as S5 > S4 > S3 > S2 > S1. Specifically, compared with the other splitting proportions, the values of R2 and NSE were closer to 1, while the values of RMSE and MAE were closer to 0, in the S5 proportion during the testing phase for any of the four input combinations, and these trends did not differ between the RF and XGB models. The results with the fixed testing dataset were consistent with the results of the above testing datasets (Table 4, Table 5 and Table 6).
To evaluate the impacts of different time lengths of input data on model accuracy, the statistical indicators of the models with the fixed testing dataset (2016–2019) under input combination 2 and the S5 proportion were analyzed (Figure 7). Generally, the RF showed higher accuracy than the XGB. Under each of the three time lengths, the RF model consistently had higher values of R2 and NSE and lower RMSE and MAE values than the XGB model (Figure 7). Among the three time lengths, the models with the 30-year span data showed the best estimating accuracy, followed by the models with the 50-year span data and then those with the 10-year span data, respectively. Taking the RF model as an example, the values of R2 (0.951) and NSE (0.946) for the models with the 30-year span data were higher than those of the models with the 50-year span data (R2 = 0.950; NSE = 0.944) and the same as those of the models with the 10-year span data (R2 = 0.951; NSE = 0.946). Meanwhile, the 30-year-span models had lower error values (RMSE = 0.312 mm·d−1; MAE = 0.234 mm·d−1) than the models with the other time spans (RMSE = 0.313 mm·d−1 and MAE = 0.237 mm·d−1 for the 50-year span; RMSE = 0.317 mm·d−1 and MAE = 0.238 mm·d−1 for the 10-year span). The results for the other input combinations and data splitting proportions (see Tables S1 and S2 for details) were consistent with the above results.

4. Discussion

4.1. Effects of Input Combination Strategy on Daily ET0 Estimation

The category of the input parameters was a crucial factor for the estimation precision of the machine learning models in estimating the daily ET0. The models generally performed worst when only Tmax/Tmin and Ra were available in southern China. Since the model prediction accuracy generally increases with more meteorological input parameters [57,98,99], models with only temperature data as inputs generate non-ideal daily ET0 estimates, despite the fact that temperature data are widely available around the world [20,100]. Therefore, extreme gradient boosting and random forest models with wind speed, relative humidity, and global solar radiation (instead of extra-terrestrial radiation) data would produce acceptable ET0 values. In this study, the machine learning models with the input combination of Tmax, Tmin, and Rs presented better prediction accuracy than the other combinations. The results indicate that, with the global solar radiation (Rs) as an input, the ET0 values estimated by the XGB and RF models agree well with the corresponding FAO-56 Penman–Monteith values in the humid regions of China. Feng et al., Fan et al., and Huang et al. also demonstrated that random forest models with Tmax/Tmin and Rs attained extremely pleasing ET0 estimation in southern China [54,55,62]. The XGB and RF models with Tmax/Tmin, Ra, and RH outperformed the XGB and RF models with Tmax/Tmin, Ra, and U2 in the humid region. These results indicate that relative humidity is a more important factor than wind speed when estimating the ET0 with the XGB and RF models in the humid region. Among the three single factors other than temperature, the significance of the meteorological parameters for estimating daily ET0 was ranked as Rs > RH > U2 in the humid area of southern China. This result is consistent with the research of Yan et al. [78], who concluded that Rs is more influential than RH and U2 for estimating the daily ET0 in the humid region.

4.2. Effects of Data Splitting Proportions on Daily ET0 Estimation

Previous studies have shown that high-precision simulations of machine learning models for ET0 prediction can be obtained with a single ratio for allocating data into training and testing [56,61]. However, for the same total dataset, there has been no report on whether varying the ratio between the training and testing data improves the precision of machine learning models. As mentioned above (see Table 4, Table 5 and Table 6, respectively), the extreme gradient boosting and random forest models with the data splitting proportion of S5 showed excellent capability in predicting the daily ET0 for all the input combinations, exceeding the other four data splitting proportions at the twenty-one meteorological stations during the testing phase. Moreover, as the number of years in the testing phase decreases, the accuracy of the model increases. This is a promising strategy for improving the accuracy of machine learning models for estimating daily ET0, especially when many historical years of data are available for the training phase. Consequently, to improve the accuracy of machine learning models, the models should be established with appropriate data segments. In this research, five splitting proportions of the dataset were examined, and the accuracy increased as the proportion allocated to training increased across the five ratios. In the split rule cases of Rezaabad et al. [101], three splitting proportions differing by ten percent of the dataset were also examined, and the smallest training segment gave the poorest accuracy of the three ratios. However, the accuracy of the largest proportion in this study is still not perfect. Therefore, how to precisely select a satisfying proportion needs further study. Shiri et al. established a GEP model utilizing data splitting strategies at sub-humid stations for estimating the daily ET0 and procured good results in sub-humid regions [102]. However, in this study, the XGB and RF models were evaluated in humid areas. Future studies will need to use coupled data from arid and humid stations to evaluate the machine learning models.

4.3. Effects of Available Length of Years on Daily ET0 Estimation

The average RMSE calculated for the 10-year dataset was much lower than those of the corresponding two other periods under the various combinations and proportions, while that of the 50-year dataset was the highest (Figure 6). The results indicated that reducing the amount of modeling data can improve the precision of the random forest models under various input parameters and data segmentations. This shows that the 50-year length was particularly inaccurate in dealing with the complex non-linear relationship between the ET0 and its parameters in the XGB and RF models. The reason for this phenomenon may be that climate change has caused changes in meteorological factors, resulting in a corresponding increase in the ET0 values as the length of years grows. Related phenomena have also been reported in the literature [85,86,103]. However, the results of the independent testing data show that the model with the 30-year span has the highest accuracy and the model with the 10-year span has the lowest (Figure 7), which is inconsistent with the results shown for the test dataset. The reason for this phenomenon may be the over-fitting caused by the smaller dataset of the 10-year span model [104]. In this study, the results showed that appropriately reducing the year span of the dataset is beneficial for improving the model accuracy; however, the specific causes remain to be further studied. In addition, the merits of datasets of different lengths for predicting ET0 have been widely researched [105]. Yin et al. coupled a bi-directional model with datasets of different lengths for predicting the ET0 and discovered that the shortest dataset provided the best forecast performance among the three lengths of datasets [106]. In the present study, the three different lengths of years were used to build extreme gradient boosting and random forest models for the first time. By using input variables of different lengths of years, the prediction precision of the random forest and extreme gradient boosting models was enhanced (Figure 6 and Figure 7). Although the 10-year meteorological data obtained high accuracy on the test dataset, its performance was the worst in the independent testing. Therefore, the 30-year data span model is a promising method for predicting the ET0 in the humid southern regions of China, and it may also apply to regions with similar climates.

5. Conclusions

Extreme gradient boosting and random forest models with different data splitting strategies and different lengths of years were put forward to predict the daily ET0 at twenty-one weather stations in the humid regions of China. The results revealed that the accuracy of the random forest model is better than that of the extreme gradient boosting model, and that Rs was more crucial than RH, U2, and Ra in predicting the daily ET0 in southern China. The data splitting proportion of S5 showed excellent performance for all the input combinations, and the data splitting proportions for predicting the daily ET0 were ranked as follows: S5 > S4 > S3 > S2 > S1. Compared with the 30-year length, the estimation accuracy of the 50-year length with limited data is reduced, while a 10-year length of meteorological data improves the accuracy for southern China. However, the 10-year performance was worse when considering the independent test. Considering that the 30-year data span has high accuracy and a stable performance, it is recommended that the random forest model with a dataset of 30-year length be used to produce the daily ET0. In the absence of continuous and complete meteorological records, this promising strategy can be used as an alternative to the FAO-56 P-M model to calculate ET0. Consequently, the random forest model is proposed as a promising alternative approach to improving the accuracy of estimating the daily ET0 under conditions of insufficient climatic data in the humid area of southern China. However, further research is required to estimate the performance of the suggested random forest model in the arid and humid climate areas of China or similar climates around the world.

Supplementary Materials

The following are available online at https://www.mdpi.com/article/10.3390/w13233478/s1, Table S1: Average statistical values of different input parameters for three lengths of years of the machine learning models in the testing process of the 21 stations under the fixed test dataset (2016–2019). Table S2: Average statistical values of the five proportions for three lengths of years of the machine learning models in the testing process of the 21 stations under the fixed test dataset (2016–2019).

Author Contributions

X.L.: Data curation, Formal analysis, Software, Validation, Funding acquisition, Writing—original draft, Writing—review & editing. F.Z.: Formal analysis, Writing-review & editing. L.W.: Conceptualization, Methodology, Software, Writing-review & editing. G.H.: Writing—review & editing. F.Y.: Data curation, Formal analysis. W.B.: Data curation, Formal analysis. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by “National Key Research and Development Program of China, grant number 2017YFC1502701” and “Science and technology Cooperation Project in Jiangxi of China, grant number 20212BDH80016”.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data will be made available on request to the corresponding author’s email with appropriate justification.

Acknowledgments

Thanks to the National Meteorological Information Center of China Meteorological Administration for offering the meteorological data.

Conflicts of Interest

The authors declare no conflicts of interest that could have influenced the work reported in this paper.

References

  1. Abdullah, S.S.A.; Malek, M.A.; Abdullah, N.S.; Kisi, O.; Yap, K.S. Extreme learning machines: A new approach for prediction of reference evapotranspiration. J. Hydrol. 2015, 527, 184–195. [Google Scholar] [CrossRef]
  2. Fan, J.; Oestergaard, K.T.; Guyot, A.; Lockington, D.A. Estimating groundwater recharge and evapotranspiration from water table fluctuations under three vegetation covers in a coastal sandy aquifer of subtropical Australia. J. Hydrol. 2014, 519, 1120–1129. [Google Scholar] [CrossRef] [Green Version]
  3. Feng, Y.; Peng, Y.; Cui, N.; Gong, D.; Zhang, K. Modeling reference evapotranspiration using extreme learning machine and generalized regression neural network only with temperature data. Comput. Electron. Agric. 2017, 136, 71–78. [Google Scholar] [CrossRef]
  4. Traore, S.; Luo, Y.; Fipps, G. Deployment of artificial neural network for short-term forecasting of evapotranspiration using public weather forecast restricted messages. Agric. Water Manag. 2016, 163, 363–379. [Google Scholar] [CrossRef]
  5. Karimi, S.; Shiri, J.; Mart, P. Supplanting missing climatic inputs in classical and random forest models for estimating reference evapotranspiration in humid coastal areas of Iran. Comput. Electron. Agric. 2020, 176, 105633. [Google Scholar] [CrossRef]
  6. Priestley, C.H.B.; Taylor, R.J. On the assessment of surface heat flux and evaporation using large-scale parameters. Mon. Weather Rev. 1972, 100, 81–92. [Google Scholar] [CrossRef]
  7. Djaman, K.; Tabari, H.; Balde, A.B.; Diop, L.; Futakuchi, K.; Irmak, K. Analyses, calibration and validation of evapotranspiration models to predict grass-reference evapotranspiration in the Senegal river delta. J. Hydrol. Reg. Stud. 2016, 8, 82–94. [Google Scholar] [CrossRef] [Green Version]
  8. Feng, Y.; Cui, N.; Zhao, L.; Hu, X.; Gong, D. Comparison of ELM, GANN, WNN and empirical models for estimating reference evapotranspiration in humid region of Southwest China. J. Hydrol. 2016, 536, 376–383. [Google Scholar] [CrossRef]
  9. Karimi, S.; Kisi, O.; Kim, S.; Kim, S.; Nazemi, A.; Shiri, J. Modelling daily reference evapotranspiration in humid locations of South Korea using local and cross-station data management scenarios. Int. J. Climatol. 2017, 37, 3238–3246. [Google Scholar] [CrossRef]
  10. Yan, S.; Wu, Y.; Fan, J.; Zhang, F.; Qiang, S.; Zheng, J.; Xiang, Y.; Guo, J.; Zou, H. Effects of water and fertilizer management on grain filling characteristics, grain weight and productivity of drip-fertigated winter wheat. Agric. Water Manage. 2019, 213, 983–995. [Google Scholar] [CrossRef]
  11. Guitjens, J.C. Models of Alfalfa yield and evapotranspiration. J. Irrig. Drain. Div. Proc. Am. Soc. Civ. Eng. 1982, 108, 212–222. [Google Scholar] [CrossRef]
  12. Harbeck, G.E., Jr. A Practical Field Technique for Measuring Reservoir Evaporation Utilizing Mass-Transfer Theory; Paper 272-E; US Government Printing Office: Washington, DC, USA, 1962; pp. 101–105.
  13. Allen, R.G.; Pereira, L.S.; Raes, D.; Smith, M. Crop Evapotranspirationguidelines for Computing Crop Water requirements-FAO Irrigation and Drainage Paper 56. Fao Rome 1998, 300, D05109. [Google Scholar]
  14. Doorenbos, J.; Pruitt, W.O. Guidelines for predicting crop water requirements. In FAO Irrigation and Drainage Paper 24; FAO: Rome, Italy, 1977. [Google Scholar]
  15. Monteith, J.L. Evaporation and environment. In Symposia of the Society for Experimental Biology; Society for Experimental Biology: London, UK, 1965; Volume 19, pp. 205–234. [Google Scholar]
  16. Penman, H.L. Natural evaporation from open water, hare soil and grass. Proc. R. Soc. Lond. 1948, 193, 120–145. [Google Scholar]
  17. Fan, J.; Wang, X.; Wu, L. New combined models for estimating daily global solar radiation based on sunshine duration in humid regions: A case study in South China. Energy Convers. Manage. 2018, 156, 618–625. [Google Scholar] [CrossRef]
  18. Fan, J.; Chen, B.; Wu, L. Evaluation and development of temperature-based empirical models for estimating daily global solar radiation in humid regions. Energy 2018, 144, 903–914. [Google Scholar] [CrossRef]
  19. Shiri, J.; Nazemi, A.H.; Sadraddini, A.A.; Landeras, G.; Kisi, O.; Fard, A.F.; Marti, P. Comparison of heuristic and empirical approaches for estimating reference evapotranspiration from limited inputs in Iran. Comput. Electron. Agric. 2014, 108, 230–241. [Google Scholar] [CrossRef]
  20. Feng, Y.; Jia, Y.; Cui, N.; Zhao, L.; Li, C.; Gong, D. Calibration of Hargreaves model for reference evapotranspiration estimation in Sichuan basin of south-west China. Agric. Water Manage. 2017, 181, 1–9. [Google Scholar] [CrossRef]
  21. Jensen, D.T.; Hargreaves, G.H.; Temesgen, B.; Allen, R.G. Computation of ET0 under non ideal conditions. J. Irrig. Drain. Eng. 1997, 123, 394–400. [Google Scholar] [CrossRef]
  22. Martí, P.; Zarzo, M.; Vanderlinden, K.; Girona, J. Parametric expressions for the adjusted Hargreaves coefficient in Eastern Spain. J. Hydrol. 2015, 529, 1713–1724. [Google Scholar] [CrossRef]
  23. Mendicino, G.; Senatore, A. Regionalization of the Hargreaves coefficient for the assessment of distributed reference evapotranspiration in Southern Italy. J. Irrig. Drain Eng. 2013, 139, 349–362. [Google Scholar] [CrossRef]
  24. Barzkar, A.; Najafzadeh, M.; Homaei, F. Evaluation of drought events in various climatic conditions using data-driven models and a reliability-based probabilistic model. Nat. Hazards 2021, 1–22. [Google Scholar] [CrossRef]
  25. Dong, J.; Wu, L.; Liu, X.; Li, Z.; Gao, Y.; Zhang, Y.; Yang, Q. Estimation of daily dew point temperature by using bat algorithm optimization based extreme learning machine. Appl. Therm. Eng. 2020, 165, 114569. [Google Scholar] [CrossRef]
  26. Fan, J.; Wang, X.; Wu, L. Comparison of support vector machine and extreme gradient boostinging for predicting daily global solar radiation using temperature and precipitation in humid subtropical climates: A case study in China. Energy Convers. Manage. 2018, 164, 102–111. [Google Scholar] [CrossRef]
  27. Fan, J.; Wu, L.; Zhang, F. Evaluating the effect of air pollution on global and diffuse solar radiation prediction using support vector machine modeling based on sunshine duration and air temperature. Renew. Sustain. Energy Rev. 2018, 94, 732–747. [Google Scholar] [CrossRef]
  28. Kaba, K.; Sarıgül, M.; Avcı, M.; Kandırmaz, H.M. Estimation of daily global solar radiation using deep learning model. Energy 2018, 162, 126–135. [Google Scholar] [CrossRef]
  29. Keshtegar, B.; Mert, C.; Kisi, O. Comparison of four heuristic regression techniques in solar radiation modeling: Kriging method vs RSM, MARS and M5 model tree. Renew. Sustain. Energy Rev. 2018, 81, 330–341. [Google Scholar] [CrossRef]
  30. Kim, S.; Singh, V.; Lee, C.; Seo, Y. Modeling the physical dynamics of daily dew point temperature using soft computing techniques. KSCE J. Civ. Eng. 2015, 19, 1930–1940. [Google Scholar] [CrossRef]
  31. Mehdizadeh, S.; Behmanesh, J.; Khalili, K. Application of gene expression programming to predict daily dew point temperature, Appl. Therm. Eng. 2017, 112, 1097–1107. [Google Scholar] [CrossRef]
  32. Movahed, S.F.; Najafzadeh, M.; Mehrpooya, A. Receiving More Accurate Predictions for Longitudinal Dispersion Coefficients in Water Pipelines: Training Group Method of Data Handling Using Extreme Learning Machine Conceptions. Water Resour. Manag. 2020, 34, 529–561. [Google Scholar] [CrossRef]
  33. Najafzadeh, M.; Niazmardi, S. A Novel Multiple-Kernel Support Vector Regression Algorithm for Estimation of Water Quality Parameters. Nat. Resour. Res. 2021, 5, 3761–3775. [Google Scholar] [CrossRef]
  34. Singh, K.P.; Basant, N.; Gupta, S. Support vector machines in water quality management. Anal. Chim. Acta 2011, 703, 152–162. [Google Scholar] [CrossRef] [PubMed]
  35. Sun, D.; Li, Y.; Wang, Q. A unified model for remotely estimating chlorophyll a in Lake Taihu, China, based on SVM and in situ hyperspectral data. IEEE Trans. Geosci. Rem. Sens. 2009, 47, 2957–2965. [Google Scholar]
  36. Wang, L.; Niu, Z.; Kisi, O.; Kisi, O.; Li, C.; Yu, D. Pan evaporation modeling using four different heuristic approaches. Comput. Electron. Agric. 2017, 140, 203–213. [Google Scholar] [CrossRef]
  37. Wang, L.; Kisi, O.; Hu, B.; Bilal, M.; Kermani, M.; Li, H. Evaporation modelling using different machine learning techniques. Int. J. Climatol. 2017, 37, 1076–1092. [Google Scholar] [CrossRef]
  38. Wu, L.; Huang, G.; Fan, J.; Zhang, F.; Wang, X.; Zeng, W. Potential of kernel-based nonlinear extension of Arps decline model and gradient boostinging with categorical features support for predicting daily global solar radiation in humid regions. Energy Convers. Manage. 2019, 183, 280–295. [Google Scholar] [CrossRef]
  39. Yaseen, Z.M.; Awadh, S.M.; Sharafati, A.; Shahid, S. Complementary data-intelligence model for river flow simulation. J. Hydrol. 2018, 567, 180–190. [Google Scholar] [CrossRef]
  40. Ahmadi, F.; Mehdizadeh, S.; Mohammadi, B.; Pham, Q.B.; DOAN, T.N.C.; Vo, N.D. Application of an artificial intelligence technique enhanced with intelligent water drops for monthly reference evapotranspiration estimation. Agric. Water Manage. 2021, 244, 106622. [Google Scholar] [CrossRef]
  41. Pandey, P.; Pandey, V. Development of reference evapotranspiration equations using an artificial intelligence-based function discovery method under the humid climate of Northeast India. Comput. Electron. Agric. 2020, 179, 105838. [Google Scholar] [CrossRef]
  42. Kim, S.; Kim, H.S. Neural networks and genetic algorithm approach for nonlinear evaporation and evapotranspiration modeling. J. Hydrol. 2008, 351, 299–317. [Google Scholar] [CrossRef]
  43. Kisi, O.; Alizamir, M. Modelling reference evapotranspiration using a new wavelet conjunction heuristic method: Wavelet extreme learning machine vs wavelet neural networks. Agric. For. Meteorol. 2018, 263, 41–48. [Google Scholar] [CrossRef]
  44. Kumar, M.; Raghuwanshi, N.S.; Singh, R.; Wallender, W.W.; Pruitt, W.O. Estimating evapotranspiration using artificial neural network. J. Irrig. Drain. Eng. 2002, 128, 224–233. [Google Scholar] [CrossRef]
  45. Chia, M.; Huang, Y.; Koo, C. Swarm-based optimization as stochastic training strategy for estimation of reference evapotranspiration using extreme learning machine. Agric. Water Manage. 2021, 243, 106447. [Google Scholar] [CrossRef]
  46. Wu, L.; Peng, Y.; Fan, J.; Wang, Y.; Huang, G. A novel kernel extreme learning machine model coupled with K-means clustering and firefly algorithm for estimating monthly reference evapotranspiration in parallel computation. Agric. Water Manage. 2020, 245, 106624. [Google Scholar] [CrossRef]
  47. Zhu, B.; Feng, Y.; Gong, D.; Jiang, S.; Zhao, L.; Cui, N. Hybrid particle swarm optimization with extreme learning machine for daily reference evapotranspiration prediction from limited climatic data. Comput. Electron. Agric. 2020, 173, 105430. [Google Scholar] [CrossRef]
  48. Chia, M.; Huang, Y.; Koo, C. Support vector machine enhanced empirical reference evapotranspiration estimation with limited meteorological parameters. Comput. Electron. Agric. 2020, 175, 105577. [Google Scholar] [CrossRef]
  49. Ferreira, L.B.; da Cunha, F.F.; de Oliveira, R.A.; Fernandes Filho, E.I. Estimation of reference evapotranspiration in Brazil with limited meteorological data using ANN and SVM—A new approach. J. Hydrol. 2019, 572, 556–570. [Google Scholar] [CrossRef]
  50. Moazenzadeh, R.; Mohammadi, B.; Shamshirband, S.; Chau, K.-W. Coupling a firefly algorithm with support vector regression to predict evaporation in northern Iran. Eng. Appl. Comput. Fluid Mech. 2018, 12, 584–597. [Google Scholar] [CrossRef] [Green Version]
  51. Kiafar, H.; Babazadeh, H.; Marti, P.; Kisi, O.; Landeras, G.; Karimi, S.; Shiri, J. Evaluating the generalizability of GEP models for estimating reference evapotranspiration in distant humid and arid locations. Theor. Appl. Climatol. 2017, 130, 377–389. [Google Scholar] [CrossRef]
  52. Mattar, M. Using gene expression programming in monthly reference evapotranspiration modeling: A case study in Egypt. Agric. Water Manage. 2018, 198, 28–38. [Google Scholar] [CrossRef]
  53. Shiri, J. Evaluation of FAO56-PM, empirical, semi-empirical and gene expression programming approaches for estimating daily reference evapotranspiration in hyper-arid regions of Iran. Agric. Water Manage. 2017, 188, 101–114. [Google Scholar] [CrossRef]
  54. Fan, J.; Yue, W.; Wu, L.; Zhang, F.; Cai, H.; Wang, X.; Lu, X.; Xiang, Y. Evaluation of SVM, ELM and four tree-based ensemble models for predicting daily reference evapotranspiration using limited meteorological data in different climates of China. Agric. For. Meteorol. 2018, 263, 225–241. [Google Scholar] [CrossRef]
55. Huang, G.; Wu, L.; Ma, X.; Zhang, W.; Fan, J.; Yu, X.; Zeng, W.; Zhou, H. Evaluation of CatBoost method for prediction of reference evapotranspiration in humid regions. J. Hydrol. 2019, 574, 1029–1041. [Google Scholar] [CrossRef]
56. Zhang, Y.; Zhao, Z.; Zheng, J. CatBoost: A new approach for estimating daily reference crop evapotranspiration in arid and semi-arid regions of Northern China. J. Hydrol. 2020, 588, 125087. [Google Scholar] [CrossRef]
57. Fan, J.; Ma, X.; Wu, L.; Zhang, F.; Yu, X.; Zeng, W. Light Gradient Boosting Machine: An efficient soft computing model for estimating daily reference evapotranspiration with local and external meteorological data. Agric. Water Manage. 2019, 225, 105758. [Google Scholar] [CrossRef]
  58. Kisi, O.; Kilic, Y. An investigation on generalization ability of artificial neural networks and M5 model tree in modeling reference evapotranspiration. Theor. Appl. Climatol. 2016, 126, 413–425. [Google Scholar] [CrossRef]
  59. Chen, Z.; Zhu, Z.; Jiang, H.; Sun, S. Estimating daily reference evapotranspiration based on limited meteorological data using deep learning and classical machine learning methods. J. Hydrol. 2020, 591, 125286. [Google Scholar] [CrossRef]
  60. Ferreira, L.; Cunha, F. Multi-step ahead forecasting of daily reference evapotranspiration using deep learning. Comput. Electron. Agric. 2020, 178, 105728. [Google Scholar] [CrossRef]
  61. Ferreira, L.; Cunha, F. New approach to estimate daily reference evapotranspiration based on hourly temperature and relative humidity using machine learning and deep learning. Agric. Water Manage. 2020, 234, 106113. [Google Scholar] [CrossRef]
  62. Feng, Y.; Cui, N.; Gong, N.; Zhang, Q.; Zhao, L. Evaluation of random forests and generalized regression neural networks for daily reference evapotranspiration modelling. Agric. Water Manag. 2017, 193, 163–173. [Google Scholar] [CrossRef]
63. Júnior, J.; Medeiros, V.; Garrozi, C.; Montenegro, A.; Gonçalves, G. Random forest techniques for spatial interpolation of evapotranspiration data from Brazilian's Northeast. Comput. Electron. Agric. 2019, 166, 105017. [Google Scholar] [CrossRef]
  64. Wang, S.; Lian, J.; Peng, Y.; Hu, B.; Chen, H. Generalized reference evapotranspiration models with limited climatic data based on random forest and gene expression programming in Guangxi, China. Agric. Water Manag. 2019, 221, 220–230. [Google Scholar] [CrossRef]
  65. Avand, M.; Janizadeh, S.; Bui, D.T.; Pham, V.H.; Ngo, P.T.T.; Nhu, V. A tree-based intelligence ensemble approach for spatial prediction of potential groundwater. Int. J. Digit. Earth 2020, 13, 1408–1429. [Google Scholar] [CrossRef]
  66. Chen, W.; Li, Y.; Xue, W.; Shahabi, H.; Li, S.; Hong, H.; Wang, X.; Bian, H.; Zhang, S.; Pradhan, B.; et al. Modeling flood susceptibility using data-driven approaches of naïve Bayes tree, alternating decision tree, and random forest methods. Sci. Total Environ. 2020, 701, 134979. [Google Scholar] [CrossRef]
  67. Avand, M.; Janizadeh, S.; Bui, D.T. Using machine learning models, remote sensing, and GIS to investigate the effects of changing climates and land uses on flood probability. J. Hydrol. 2021, 595, 125663. [Google Scholar] [CrossRef]
  68. Najafzadeh, M.; Homaei, F.; Farhadi, H. Reliability assessment of water quality index based on guidelines of national sanitation foundation in natural streams: Integration of remote sensing and data-driven models. Artif. Intell. Rev. 2021, 54, 4619–4651. [Google Scholar] [CrossRef]
  69. Wang, F.; Wang, Y.; Zhang, K.; Gamane, D.; Kisi, O. Spatial heterogeneity modeling of water quality based on random forest regression and model interpretation. Environ. Res. 2021, 202, 111660. [Google Scholar] [CrossRef]
70. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, San Francisco, CA, USA, 13 August 2016; pp. 785–794. [Google Scholar]
  71. Fan, J.; Zheng, J.; Wu, L.; Zhang, F. Estimation of daily maize transpiration using support vector machines, extreme gradient boosting, artificial and deep neural networks models. Agric. Water Manag. 2021, 244, 106547. [Google Scholar] [CrossRef]
  72. Ni, L.; Wang, D.; Wu, J.; Wang, Y.; Tao, Y.; Zhang, J.; Liu, J. Streamflow forecasting using extreme gradient boosting model coupled with Gaussian mixture model. J. Hydrol. 2020, 586, 124901. [Google Scholar] [CrossRef]
  73. Yu, S.; Chen, Z.; Yu, B.; Wang, L.; Wu, B.; Wu, J.; Zhao, F. Exploring the relationship between 2D/3D landscape pattern and land surface temperature based on explainable extreme Gradient Boosting tree: A case study of Shanghai, China. Sci. Total Environ. 2020, 725, 138229. [Google Scholar] [CrossRef]
  74. Wu, L.; Peng, Y.; Fan, J.; Wang, Y. Machine learning models for the estimation of monthly mean daily reference evapotranspiration based on cross-station and synthetic data. Hydrol. Res. 2019, 50, 1730–1750. [Google Scholar] [CrossRef]
  75. Fan, J.; Wu, L.; Zheng, J.; Zhang, F. Medium-range forecasting of daily reference evapotranspiration across China using numerical weather prediction outputs downscaled by extreme gradient boosting. J. Hydrol. 2021, 601, 126664. [Google Scholar] [CrossRef]
76. Han, Y.; Wu, J.; Zhai, B.; Pan, Y.; Zeng, W. Coupling a bat algorithm with XGBoost to estimate reference evapotranspiration in the arid and semiarid regions of China. Adv. Meteorol. 2019, 2019, 9575782. [Google Scholar] [CrossRef]
77. Lu, X.; Fan, J.; Wu, L.; Dong, J. Forecasting Multi-Step Ahead Monthly Reference Evapotranspiration Using Hybrid Extreme Gradient Boosting with Grey Wolf Optimization Algorithm. Comput. Model. Eng. Sci. 2020, 125, 699–723. [Google Scholar]
  78. Yan, S.; Wu, L.; Fan, J.; Zhang, F.; Zou, Y.; Wu, Y. A novel hybrid WOA-XGB model for estimating daily reference evapotranspiration using local and external meteorological data: Applications in arid and humid regions of China. Agric. Water Manag. 2021, 244, 106594. [Google Scholar] [CrossRef]
  79. Shiri, J. Improving the performance of the mass transfer-based reference evapotranspiration estimation approaches through a coupled wavelet-random forest methodology. J. Hydrol. 2018, 561, 737–750. [Google Scholar] [CrossRef]
  80. Huo, Z.; Dai, X.; Feng, S.; Kang, S.; Huang, G. Effect of climate change on reference evapotranspiration and aridity index in arid region of China. J. Hydrol. 2013, 492, 24–34. [Google Scholar] [CrossRef]
81. Li, Y.; Yao, N.; Chau, H.W. Influences of removing linear and nonlinear trends from climatic variables on temporal variations of annual reference crop evapotranspiration in Xinjiang, China. Sci. Total. Environ. 2017, 592, 680–692. [Google Scholar] [CrossRef]
  82. Luo, Y.; Traore, S.; Lyu, X.; Wang, W.; Wang, Y. Medium range daily reference evapotranspiration forecasting by using ANN and public weather forecasts. Water Resour. Manag. 2015, 29, 3863–3876. [Google Scholar] [CrossRef]
  83. Luo, Y.; Chang, X.; Peng, S.; Khan, S.; Wang, W.; Zheng, Q.; Cai, X. Short-term forecasting of daily reference evapotranspiration using the Hargreaves—Samani model and temperature forecasts. Agric. Water Manag. 2014, 136, 42–51. [Google Scholar] [CrossRef]
84. Tikhamarine, Y.; Malik, A.; Kumar, A.; Souag-Gamane, D.; Kisi, O. Estimation of monthly reference evapotranspiration using novel hybrid machine learning approaches. Hydrol. Sci. J. 2019, 64, 1824–1842. [Google Scholar] [CrossRef]
  85. Yassen, A.N.; Nam, W.H.; Hong, E.M. Impact of climate change on reference evapotranspiration in Egypt. Catena 2020, 194, 104711. [Google Scholar] [CrossRef]
  86. Ning, T.; Zhou, S.; Chang, F.; Shen, H.; Li, Z.; Liu, W. Interaction of vegetation, climate and topography on evapotranspiration modelling at different time scales within the Budyko framework. Agric. For. Meteorol. 2019, 275, 59–68. [Google Scholar] [CrossRef]
  87. Tabari, H.; Marofi, S.; Aeini, A.; Talaee, P.H.; Mohammadi, K. Trend analysis of reference evapotranspiration in the western half of Iran. Agric. For. Meteorol. 2011, 151, 128–136. [Google Scholar] [CrossRef]
  88. Espadafor, M.; Lorite, I.; Gavilán, P.; Berengena, J. An analysis of the tendency of reference evapotranspiration estimates and other climate variables during the last 45 years in southern Spain. Agric. Water Manag. 2011, 98, 1045–1061. [Google Scholar] [CrossRef]
  89. Liu, Q.; Yang, Z. Quantitative estimation of the impact of climate change on actual evapotranspiration in the Yellow River Basin, China. J. Hydrol. 2010, 395, 226–234. [Google Scholar] [CrossRef]
  90. Tang, B.; Tong, L.; Kang, S.; Zhang, L. Impacts of climate variability on reference evapotranspiration over 58 years in the Haihe river basin of north China. Agric. Water Manag. 2011, 98, 1660–1670. [Google Scholar] [CrossRef]
91. Lu, X.; Ju, Y.; Wu, L.; Fan, J.; Zhang, F.; Li, Z. Daily pan evaporation modeling from local and cross-station data using three tree-based machine learning models. J. Hydrol. 2018, 566, 668–684. [Google Scholar] [CrossRef]
92. Saggi, M.K.; Jain, S. Application of fuzzy-genetic and regularization random forest (FG-RRF): Estimation of crop evapotranspiration (ETc) for maize and wheat crops. Agric. Water Manage. 2020, 229, 105907. [Google Scholar] [CrossRef]
93. Karimi, S.; Shiri, J.; Kisi, O.; Xu, T. Forecasting daily streamflow values: Assessing heuristic models. Hydrol. Res. 2018, 49, 658–669. [Google Scholar] [CrossRef]
94. Song, R.; Chen, S.; Deng, B.; Li, L. Extreme Gradient Boosting for Identifying Individual Users Across Different Digital Devices. In International Conference on Web-Age Information Management; Springer International Publishing: Berlin/Heidelberg, Germany, 2016. [Google Scholar]
95. Najafzadeh, M.; Oliveto, G. More reliable predictions of clear-water scour depth at pile groups by robust artificial intelligence techniques while preserving physical consistency. Soft Comput. 2021, 25, 5723–5746. [Google Scholar] [CrossRef]
  96. Nash, J.E.; Sutcliffe, J.V. River flow forecasting through conceptual models part I—A discussion of principles. J. Hydrol. 1970, 10, 282–290. [Google Scholar] [CrossRef]
  97. Despotovic, M.; Nedic, V.; Despotovic, D.; Cvetanovic, S. Review and statistical analysis of different global solar radiation sunshine models. Renew. Sustain. Energy Rev. 2015, 52, 1869–1880. [Google Scholar] [CrossRef]
  98. Antonopoulos, V.Z.; Antonopoulos, A.V. Daily reference evapotranspiration estimates by artificial neural networks technique and empirical equations using limited input climate variables. Comput. Electron. Agric. 2017, 132, 86–96. [Google Scholar] [CrossRef]
  99. Tabari, H.; Kisi, O.; Ezani, A.; Talaee, P.H. SVM, ANFIS, regression and climate based models for reference evapotranspiration modeling using limited climatic data in a semi-arid highland environment. J. Hydrol. 2012, 444, 78–89. [Google Scholar] [CrossRef]
  100. Sanikhani, H.; Kisi, O.; Maroufpoor, E.; Yaseen, Z.M. Temperature-based modeling of reference evapotranspiration using several artificial intelligence models: Application of different modeling scenarios. Theor. Appl. Climatol. 2019, 135, 449–462. [Google Scholar] [CrossRef]
101. Rezaabad, Z.; Salajegheh, M. ANFIS Modeling with ICA, BBO, TLBO, and IWO Optimization Algorithms and Sensitivity Analysis for Predicting Daily Reference Evapotranspiration. J. Hydrol. Eng. 2020, 25, 4020038. [Google Scholar] [CrossRef]
  102. Shiri, J.; Marti, P.; Landeras, G. Data splitting strategies for improving data driven models for reference evapotranspiration estimation among similar stations. Comput. Electron. Agric. 2019, 162, 70–81. [Google Scholar] [CrossRef]
  103. Pandey, B.K.; Khare, D. Identification of trend in long term precipitation and reference evapotranspiration over Narmada river basin (India). Global Planet. Change. 2018, 161, 172–182. [Google Scholar] [CrossRef]
  104. Mutasa, S.; Sun, S.; Ha, R. Understanding artificial intelligence based radiology studies: What is overfitting? Clin. Imaging 2020, 65, 96–99. [Google Scholar] [CrossRef]
  105. Laaboudi, A.; Mouhouche, B.; Draoui, B. Conceptual reference evapotranspiration models for different time steps. J. Pet. Environ. Biotechnol. 2012, 3, 1000123. [Google Scholar]
  106. Yin, J.; Deng, Z.; Amor, V.; Wu, J.; Rasu, E. Forecast of short-term daily reference evapotranspiration under limited meteorological variables using a hybrid bi-directional long short-term memory model (Bi-LSTM). Agric. Water Manag. 2020, 242, 106386. [Google Scholar] [CrossRef]
Figure 1. The geographical locations of the twenty-one weather stations in the humid areas of China in the present study.
Figure 2. General architecture of the random forest model.
Figure 3. Simple flowchart of the proposed methodology in this study.
Figure 4. The data splitting strategies, lengths of years, and various cross-validation stages involved in this study.
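Figure 4 summarizes the S1–S5 train/test proportions (5:5 to 9:1) applied to the 10-, 30-, and 50-year spans. As a rough illustration only (the authors' own implementation is not reproduced in this section), the sketch below builds such splits from a daily station record; the file name, column names, and the assumption of a purely chronological split are hypothetical.

```python
# A minimal sketch (assumed, not the authors' code) of building the S1-S5
# train/test splits from a chronologically ordered daily station record.
import pandas as pd

SPLITS = {"S1": 0.5, "S2": 0.6, "S3": 0.7, "S4": 0.8, "S5": 0.9}  # training fractions

def split_by_proportion(df: pd.DataFrame, train_fraction: float):
    """Return (train, test) blocks of a chronologically sorted daily dataset."""
    df = df.sort_values("date")
    n_train = int(len(df) * train_fraction)
    return df.iloc[:n_train], df.iloc[n_train:]

# Hypothetical usage for the 9:1 strategy (S5) on a 30-year record:
# station = pd.read_csv("station_daily.csv", parse_dates=["date"])
# record_30yr = station[(station["date"].dt.year >= 1986) & (station["date"].dt.year <= 2015)]
# train, test = split_by_proportion(record_30yr, SPLITS["S5"])
```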
Figure 5. Box plots of daily FAO56-PM ET0 values and ET0 values predicted with the S5 proportion during the ten cross-validation stages, using the perfect dataset in the testing stage (1966–2015) at the four weather stations. The numbers on the horizontal axis denote the ten cross-validation periods under the S5 proportion.
Figure 6. Bar plots of average RMSE values of the models for estimating daily ET0 for various lengths of years using the different proportions in the testing stage at the 21 weather stations. Panels (a,b) stand for RF and XGBoost, respectively; S1, S2, S3, S4, and S5 represent data splitting proportions of 5:5, 6:4, 7:3, 8:2, and 9:1, respectively. RF1, RF2, RF3, and RF4 represent the four input combinations of the random forest model; XGB1, XGB2, XGB3, and XGB4 represent the four input combinations of the extreme gradient boosting model.
Figure 7. Bar chart of each average statistical indicator in the fixed test dataset (2016–2019) with the S5 proportion under input combination 2 of the random forest and extreme gradient boosting models.
Table 1. The geographical locations and daily mean values of meteorological data for each of the twenty-one weather stations in the present study.
| Station Name | Altitude (m) | Latitude (° N) | Longitude (° E) | Rs (MJ·m−2·d−1) | Tmax (°C) | Tmin (°C) | RH (%) | U2 (m·s−1) | ET0 (mm·d−1) |
|---|---|---|---|---|---|---|---|---|---|
| Emeishan | 3048.6 | 29.31 | 103.21 | 12.60 (0.59) | 7.75 (0.93) | 0.55 (12.95) | 85.51 (0.20) | 2.27 (0.57) | 1.72 (0.66) |
| Lijiang | 2394.40 | 26.51 | 100.13 | 16.94 (0.36) | 19.52 (0.23) | 8.07 (0.72) | 62.43 (0.30) | 2.37 (0.49) | 3.36 (0.40) |
| Tengchong | 1648.70 | 25.07 | 98.29 | 15.22 (0.38) | 21.61 (0.17) | 10.73 (0.57) | 77.14 (0.16) | 1.24 (0.42) | 2.68 (0.38) |
| Kunming | 1896.80 | 25.01 | 102.41 | 14.95 (0.45) | 21.16 (0.22) | 10.77 (0.53) | 71.20 (0.19) | 1.62 (0.49) | 2.92 (0.45) |
| Jinghong | 553.60 | 21.55 | 100.45 | 15.60 (0.34) | 29.75 (0.13) | 18.05 (0.25) | 79.28 (0.13) | 0.49 (0.77) | 3.12 (0.37) |
| Mengzi | 1301.70 | 23.20 | 103.23 | 15.55 (0.41) | 24.70 (0.20) | 15.07 (0.33) | 70.45 (0.17) | 2.21 (0.53) | 3.44 (0.42) |
| Yichang | 134.30 | 30.42 | 111.05 | 10.79 (0.70) | 21.56 (0.43) | 13.59 (0.61) | 75.04 (0.16) | 0.98 (0.51) | 2.28 (0.68) |
| Wuhan | 27.00 | 30.38 | 114.17 | 12.05 (0.65) | 21.41 (0.45) | 13.28 (0.71) | 76.66 (0.15) | 1.38 (0.63) | 2.45 (0.68) |
| Guiyang | 1074.30 | 26.34 | 106.42 | 10.15 (0.70) | 19.58 (0.42) | 12.07 (0.59) | 77.40 (0.14) | 1.67 (0.45) | 2.26 (0.62) |
| Guilin | 166.20 | 25.20 | 110.18 | 11.21 (0.65) | 23.29 (0.37) | 16.06 (0.47) | 74.82 (0.18) | 1.79 (0.70) | 2.66 (0.56) |
| Ganxian | 124.70 | 25.50 | 114.50 | 12.26 (0.60) | 24.20 (0.37) | 16.26 (0.49) | 74.86 (0.15) | 1.18 (0.57) | 2.71 (0.60) |
| Gushi | 57.90 | 32.10 | 115.4 | 12.86 (0.61) | 20.31 (0.48) | 11.89 (0.79) | 76.01 (0.18) | 2.00 (0.47) | 2.57 (0.66) |
| Nanjing | 12.50 | 32.00 | 118.48 | 12.48 (0.59) | 20.54 (0.47) | 11.93 (0.81) | 74.92 (0.16) | 1.86 (0.55) | 2.51 (0.64) |
| Hefei | 36.50 | 31.53 | 117.15 | 12.04 (0.62) | 20.63 (0.47) | 12.47 (0.76) | 75.20 (0.17) | 1.96 (0.47) | 2.52 (0.65) |
| Hangzhou | 43.20 | 30.19 | 120.12 | 11.69 (0.67) | 21.22 (0.45) | 13.47 (0.66) | 75.84 (0.18) | 1.66 (0.50) | 2.48 (0.68) |
| Nanchang | 45.70 | 28.40 | 115.58 | 12.11 (0.65) | 21.84 (0.43) | 14.88 (0.59) | 75.95 (0.17) | 1.77 (0.65) | 2.63 (0.64) |
| Fuzhou | 85.40 | 26.05 | 119.17 | 12.11 (0.62) | 24.66 (0.31) | 17.05 (0.40) | 75.13 (0.16) | 1.92 (0.43) | 2.90 (0.55) |
| Guangzhou | 4.20 | 23.08 | 113.19 | 11.62 (0.53) | 26.56 (0.24) | 19.01 (0.33) | 76.70 (0.17) | 1.32 (0.61) | 2.65 (0.47) |
| Shantou | 7.30 | 23.21 | 116.40 | 13.71 (0.48) | 25.57 (0.23) | 19.01 (0.32) | 79.25 (0.12) | 1.81 (0.50) | 2.96 (0.45) |
| Nanning | 73.70 | 22.51 | 108.19 | 12.50 (0.56) | 26.34 (0.27) | 18.56 (0.35) | 79.24 (0.12) | 1.07 (0.62) | 2.73 (0.52) |
| Haikou | 18.00 | 19.59 | 110.20 | 13.89 (0.52) | 28.14 (0.19) | 21.65 (0.20) | 83.06 (0.10) | 1.97 (0.50) | 3.16 (0.47) |
| Maximum value | 3048.60 | 32.10 | 120.12 | 16.94 | 29.75 | 21.65 | 85.51 | 2.37 | 3.44 |
| Minimum value | 4.20 | 19.59 | 98.29 | 10.15 | 7.75 | 0.55 | 62.43 | 0.49 | 1.72 |
| Average value | 485.31 | 26.39 | 110.38 | 12.97 | 22.40 | 14.02 | 76.00 | 1.64 | 2.70 |
Note: data outside the brackets are daily averages from 1966 to 2015, while data inside the brackets are daily coefficients of variation from 1966 to 2015.
Table 2. Input combinations for the machine learning models.
| Input Combination | RF Model | XGB Model | Meteorological Variables |
|---|---|---|---|
| 1 | RF1 | XGB1 | Tmax, Tmin, Ra |
| 2 | RF2 | XGB2 | Tmax, Tmin, Rs |
| 3 | RF3 | XGB3 | Tmax, Tmin, Ra, RH |
| 4 | RF4 | XGB4 | Tmax, Tmin, Ra, U2 |
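To make Table 2 concrete, the sketch below shows one possible way the four input combinations could be fed to the two tree-based learners using scikit-learn's RandomForestRegressor and the xgboost package's XGBRegressor; the column names, hyperparameter values, and helper function are illustrative assumptions, not the authors' configuration.

```python
# Hedged sketch: fitting RF1-RF4 and XGB1-XGB4 on the input combinations of Table 2.
# Column names (Tmax, Tmin, Ra, Rs, RH, U2, ET0) and hyperparameters are assumed.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

COMBINATIONS = {
    1: ["Tmax", "Tmin", "Ra"],        # RF1 / XGB1
    2: ["Tmax", "Tmin", "Rs"],        # RF2 / XGB2
    3: ["Tmax", "Tmin", "Ra", "RH"],  # RF3 / XGB3
    4: ["Tmax", "Tmin", "Ra", "U2"],  # RF4 / XGB4
}

def fit_pair(train: pd.DataFrame, combo: int):
    """Fit one random forest and one XGBoost regressor on a given input combination."""
    X, y = train[COMBINATIONS[combo]], train["ET0"]
    rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)
    xgb = XGBRegressor(n_estimators=500, learning_rate=0.1, random_state=0).fit(X, y)
    return rf, xgb
```

With this layout, a single training table per station is enough to obtain all eight models of Table 2 by looping over the four combinations.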
Table 3. Average statistical values for the different input combinations of the two machine learning models in the testing stages at the 21 stations for the various time lengths of input data.
| Length of Years / Input Combination | Meteorological Variables | XGB R2 | XGB RMSE (mm·d−1) | XGB MAE (mm·d−1) | XGB NSE | RF R2 | RF RMSE (mm·d−1) | RF MAE (mm·d−1) | RF NSE |
|---|---|---|---|---|---|---|---|---|---|
| 10-span | | | | | | | | | |
| 1 | Tmax, Tmin, Ra | 0.792 | 0.673 | 0.494 | 0.783 | 0.801 | 0.657 | 0.484 | 0.792 |
| 2 | Tmax, Tmin, Rs | 0.951 | 0.320 | 0.231 | 0.948 | 0.954 | 0.311 | 0.225 | 0.951 |
| 3 | Tmax, Tmin, Ra, RH | 0.889 | 0.502 | 0.360 | 0.876 | 0.896 | 0.485 | 0.350 | 0.884 |
| 4 | Tmax, Tmin, Ra, U2 | 0.843 | 0.587 | 0.424 | 0.836 | 0.853 | 0.567 | 0.412 | 0.847 |
| 30-span | | | | | | | | | |
| 1 | Tmax, Tmin, Ra | 0.786 | 0.672 | 0.495 | 0.777 | 0.789 | 0.666 | 0.492 | 0.780 |
| 2 | Tmax, Tmin, Rs | 0.950 | 0.323 | 0.232 | 0.947 | 0.952 | 0.314 | 0.227 | 0.949 |
| 3 | Tmax, Tmin, Ra, RH | 0.882 | 0.503 | 0.362 | 0.873 | 0.888 | 0.491 | 0.355 | 0.879 |
| 4 | Tmax, Tmin, Ra, U2 | 0.832 | 0.597 | 0.431 | 0.825 | 0.840 | 0.583 | 0.423 | 0.833 |
| 50-span | | | | | | | | | |
| 1 | Tmax, Tmin, Ra | 0.777 | 0.689 | 0.509 | 0.768 | 0.776 | 0.688 | 0.509 | 0.768 |
| 2 | Tmax, Tmin, Rs | 0.947 | 0.328 | 0.234 | 0.945 | 0.948 | 0.324 | 0.232 | 0.946 |
| 3 | Tmax, Tmin, Ra, RH | 0.875 | 0.526 | 0.379 | 0.862 | 0.880 | 0.516 | 0.372 | 0.868 |
| 4 | Tmax, Tmin, Ra, U2 | 0.820 | 0.620 | 0.448 | 0.812 | 0.827 | 0.607 | 0.440 | 0.819 |
Note: The best statistical indicators are highlighted in bold during the testing period.
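For readers interpreting Tables 3–9, the four reported statistics are taken here to follow their standard formulations (the notation below is assumed; the paper's own equations are not reproduced in this section): O_i denotes the FAO-56 PM ET0 value on day i, P_i the corresponding model prediction, overbars denote means, and n is the number of daily records; NSE follows Nash and Sutcliffe [96].

$$
\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(O_i-P_i)^2},\qquad
\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\lvert O_i-P_i\rvert,
$$
$$
\mathrm{NSE}=1-\frac{\sum_{i=1}^{n}(O_i-P_i)^2}{\sum_{i=1}^{n}(O_i-\bar{O})^2},\qquad
R^2=\left[\frac{\sum_{i=1}^{n}(O_i-\bar{O})(P_i-\bar{P})}{\sqrt{\sum_{i=1}^{n}(O_i-\bar{O})^2}\,\sqrt{\sum_{i=1}^{n}(P_i-\bar{P})^2}}\right]^2.
$$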
Table 4. Average statistical values of the five data splitting proportions for the different input combinations of the two machine learning models in the testing process at the 21 stations (2006–2015).
| Input / Proportion | XGB R2 | XGB RMSE (mm·d−1) | XGB MAE (mm·d−1) | XGB NSE | RF R2 | RF RMSE (mm·d−1) | RF MAE (mm·d−1) | RF NSE |
|---|---|---|---|---|---|---|---|---|
| Tmax, Tmin, Ra | | | | | | | | |
| S1 | 0.784 | 0.688 | 0.506 | 0.776 | 0.795 | 0.669 | 0.493 | 0.788 |
| S2 | 0.788 | 0.680 | 0.499 | 0.781 | 0.798 | 0.662 | 0.487 | 0.792 |
| S3 | 0.790 | 0.675 | 0.495 | 0.783 | 0.800 | 0.658 | 0.484 | 0.794 |
| S4 | 0.794 | 0.670 | 0.492 | 0.784 | 0.802 | 0.656 | 0.482 | 0.794 |
| S5 | 0.798 | 0.665 | 0.489 | 0.784 | 0.805 | 0.652 | 0.480 | 0.792 |
| Tmax, Tmin, Rs | | | | | | | | |
| S1 | 0.948 | 0.332 | 0.241 | 0.945 | 0.951 | 0.322 | 0.233 | 0.949 |
| S2 | 0.950 | 0.326 | 0.236 | 0.947 | 0.952 | 0.317 | 0.230 | 0.950 |
| S3 | 0.950 | 0.322 | 0.232 | 0.948 | 0.953 | 0.313 | 0.226 | 0.951 |
| S4 | 0.952 | 0.318 | 0.230 | 0.949 | 0.954 | 0.310 | 0.224 | 0.951 |
| S5 | 0.953 | 0.313 | 0.227 | 0.949 | 0.955 | 0.305 | 0.221 | 0.951 |
| Tmax, Tmin, Ra, RH | | | | | | | | |
| S1 | 0.882 | 0.533 | 0.385 | 0.860 | 0.890 | 0.514 | 0.375 | 0.869 |
| S2 | 0.884 | 0.511 | 0.366 | 0.874 | 0.892 | 0.493 | 0.356 | 0.882 |
| S3 | 0.887 | 0.504 | 0.361 | 0.877 | 0.894 | 0.487 | 0.351 | 0.885 |
| S4 | 0.889 | 0.500 | 0.358 | 0.878 | 0.897 | 0.482 | 0.348 | 0.886 |
| S5 | 0.894 | 0.490 | 0.352 | 0.880 | 0.901 | 0.473 | 0.343 | 0.887 |
| Tmax, Tmin, Ra, U2 | | | | | | | | |
| S1 | 0.835 | 0.604 | 0.438 | 0.828 | 0.845 | 0.583 | 0.426 | 0.840 |
| S2 | 0.839 | 0.595 | 0.430 | 0.833 | 0.850 | 0.574 | 0.417 | 0.845 |
| S3 | 0.842 | 0.590 | 0.425 | 0.836 | 0.852 | 0.569 | 0.413 | 0.847 |
| S4 | 0.845 | 0.584 | 0.421 | 0.838 | 0.854 | 0.564 | 0.410 | 0.848 |
| S5 | 0.848 | 0.578 | 0.418 | 0.837 | 0.858 | 0.558 | 0.406 | 0.848 |
Note: The best statistical indicators are highlighted in bold during the testing period.
Table 5. Average statistical values of the five data splitting proportions for the different input combinations of the two machine learning models in the testing process at the 21 stations (1986–2015).
| Input / Proportion | XGB R2 | XGB RMSE (mm·d−1) | XGB MAE (mm·d−1) | XGB NSE | RF R2 | RF RMSE (mm·d−1) | RF MAE (mm·d−1) | RF NSE |
|---|---|---|---|---|---|---|---|---|
| Tmax, Tmin, Ra | | | | | | | | |
| S1 | 0.776 | 0.689 | 0.508 | 0.766 | 0.782 | 0.679 | 0.501 | 0.773 |
| S2 | 0.781 | 0.679 | 0.501 | 0.774 | 0.785 | 0.671 | 0.496 | 0.779 |
| S3 | 0.784 | 0.673 | 0.497 | 0.777 | 0.787 | 0.668 | 0.493 | 0.781 |
| S4 | 0.787 | 0.670 | 0.494 | 0.778 | 0.790 | 0.665 | 0.491 | 0.781 |
| S5 | 0.791 | 0.664 | 0.489 | 0.781 | 0.793 | 0.661 | 0.488 | 0.782 |
| Tmax, Tmin, Rs | | | | | | | | |
| S1 | 0.946 | 0.333 | 0.240 | 0.942 | 0.949 | 0.326 | 0.236 | 0.945 |
| S2 | 0.948 | 0.326 | 0.234 | 0.945 | 0.950 | 0.320 | 0.231 | 0.947 |
| S3 | 0.950 | 0.321 | 0.231 | 0.947 | 0.951 | 0.315 | 0.228 | 0.948 |
| S4 | 0.950 | 0.318 | 0.229 | 0.947 | 0.952 | 0.313 | 0.226 | 0.949 |
| S5 | 0.952 | 0.312 | 0.225 | 0.949 | 0.953 | 0.308 | 0.222 | 0.950 |
| Tmax, Tmin, Ra, RH | | | | | | | | |
| S1 | 0.874 | 0.523 | 0.376 | 0.864 | 0.881 | 0.508 | 0.367 | 0.871 |
| S2 | 0.878 | 0.512 | 0.369 | 0.870 | 0.884 | 0.498 | 0.360 | 0.876 |
| S3 | 0.880 | 0.506 | 0.364 | 0.872 | 0.886 | 0.493 | 0.356 | 0.879 |
| S4 | 0.884 | 0.501 | 0.361 | 0.874 | 0.889 | 0.489 | 0.354 | 0.879 |
| S5 | 0.887 | 0.493 | 0.355 | 0.876 | 0.892 | 0.482 | 0.348 | 0.882 |
| Tmax, Tmin, Ra, U2 | | | | | | | | |
| S1 | 0.822 | 0.621 | 0.450 | 0.811 | 0.831 | 0.603 | 0.439 | 0.822 |
| S2 | 0.827 | 0.608 | 0.439 | 0.820 | 0.836 | 0.591 | 0.429 | 0.830 |
| S3 | 0.830 | 0.599 | 0.432 | 0.825 | 0.838 | 0.584 | 0.424 | 0.833 |
| S4 | 0.834 | 0.594 | 0.429 | 0.826 | 0.841 | 0.581 | 0.421 | 0.834 |
| S5 | 0.837 | 0.587 | 0.423 | 0.829 | 0.844 | 0.575 | 0.416 | 0.836 |
Note: The best statistical indicators are highlighted in bold during the testing period.
Table 6. Average statistical values of the five data splitting proportions for the different input combinations of the two machine learning models in the testing process at the 21 stations (1966–2015).
| Input / Proportion | XGB R2 | XGB RMSE (mm·d−1) | XGB MAE (mm·d−1) | XGB NSE | RF R2 | RF RMSE (mm·d−1) | RF MAE (mm·d−1) | RF NSE |
|---|---|---|---|---|---|---|---|---|
| Tmax, Tmin, Ra | | | | | | | | |
| S1 | 0.766 | 0.705 | 0.522 | 0.758 | 0.769 | 0.701 | 0.519 | 0.761 |
| S2 | 0.772 | 0.697 | 0.514 | 0.764 | 0.773 | 0.694 | 0.513 | 0.765 |
| S3 | 0.775 | 0.691 | 0.510 | 0.767 | 0.775 | 0.691 | 0.510 | 0.767 |
| S4 | 0.778 | 0.687 | 0.507 | 0.769 | 0.776 | 0.682 | 0.504 | 0.769 |
| S5 | 0.782 | 0.681 | 0.503 | 0.772 | 0.780 | 0.683 | 0.505 | 0.770 |
| Tmax, Tmin, Rs | | | | | | | | |
| S1 | 0.943 | 0.341 | 0.243 | 0.941 | 0.945 | 0.335 | 0.240 | 0.943 |
| S2 | 0.945 | 0.334 | 0.237 | 0.943 | 0.947 | 0.329 | 0.235 | 0.945 |
| S3 | 0.946 | 0.330 | 0.234 | 0.944 | 0.948 | 0.326 | 0.233 | 0.946 |
| S4 | 0.948 | 0.327 | 0.233 | 0.945 | 0.949 | 0.318 | 0.229 | 0.947 |
| S5 | 0.949 | 0.322 | 0.229 | 0.946 | 0.950 | 0.319 | 0.229 | 0.947 |
| Tmax, Tmin, Ra, RH | | | | | | | | |
| S1 | 0.866 | 0.554 | 0.400 | 0.849 | 0.872 | 0.542 | 0.392 | 0.856 |
| S2 | 0.870 | 0.536 | 0.386 | 0.858 | 0.875 | 0.525 | 0.379 | 0.864 |
| S3 | 0.873 | 0.529 | 0.381 | 0.862 | 0.878 | 0.519 | 0.375 | 0.867 |
| S4 | 0.876 | 0.524 | 0.377 | 0.864 | 0.881 | 0.507 | 0.366 | 0.870 |
| S5 | 0.880 | 0.515 | 0.371 | 0.867 | 0.884 | 0.506 | 0.365 | 0.872 |
| Tmax, Tmin, Ra, U2 | | | | | | | | |
| S1 | 0.809 | 0.643 | 0.468 | 0.798 | 0.818 | 0.626 | 0.457 | 0.809 |
| S2 | 0.815 | 0.630 | 0.456 | 0.807 | 0.823 | 0.616 | 0.447 | 0.816 |
| S3 | 0.819 | 0.622 | 0.450 | 0.811 | 0.826 | 0.609 | 0.442 | 0.819 |
| S4 | 0.822 | 0.617 | 0.446 | 0.814 | 0.827 | 0.601 | 0.435 | 0.821 |
| S5 | 0.826 | 0.609 | 0.439 | 0.818 | 0.831 | 0.599 | 0.434 | 0.823 |
Note: The best statistical indicators are highlighted in bold during the testing period.
Table 7. Average statistical values of the five data splitting proportions for the different input combinations of the two machine learning models at the 21 stations for the 10-year span model under the fixed test dataset (2016–2019).
| Input / Proportion | XGB R2 | XGB RMSE (mm·d−1) | XGB MAE (mm·d−1) | XGB NSE | RF R2 | RF RMSE (mm·d−1) | RF MAE (mm·d−1) | RF NSE |
|---|---|---|---|---|---|---|---|---|
| Tmax, Tmin, Ra | | | | | | | | |
| S1 | 0.762 | 0.727 | 0.536 | 0.718 | 0.774 | 0.707 | 0.523 | 0.732 |
| S2 | 0.766 | 0.721 | 0.531 | 0.722 | 0.777 | 0.702 | 0.519 | 0.736 |
| S3 | 0.767 | 0.718 | 0.529 | 0.724 | 0.777 | 0.700 | 0.517 | 0.737 |
| S4 | 0.770 | 0.714 | 0.526 | 0.727 | 0.779 | 0.699 | 0.515 | 0.738 |
| S5 | 0.771 | 0.713 | 0.524 | 0.728 | 0.780 | 0.697 | 0.514 | 0.739 |
| Tmax, Tmin, Rs | | | | | | | | |
| S1 | 0.945 | 0.332 | 0.250 | 0.939 | 0.949 | 0.320 | 0.243 | 0.944 |
| S2 | 0.945 | 0.328 | 0.247 | 0.941 | 0.950 | 0.317 | 0.242 | 0.944 |
| S3 | 0.946 | 0.326 | 0.246 | 0.941 | 0.950 | 0.316 | 0.240 | 0.945 |
| S4 | 0.947 | 0.323 | 0.243 | 0.942 | 0.950 | 0.314 | 0.239 | 0.945 |
| S5 | 0.947 | 0.322 | 0.242 | 0.943 | 0.951 | 0.313 | 0.238 | 0.946 |
| Tmax, Tmin, Ra, RH | | | | | | | | |
| S1 | 0.870 | 0.536 | 0.383 | 0.844 | 0.878 | 0.517 | 0.370 | 0.854 |
| S2 | 0.871 | 0.528 | 0.377 | 0.849 | 0.880 | 0.508 | 0.363 | 0.860 |
| S3 | 0.872 | 0.526 | 0.375 | 0.851 | 0.881 | 0.505 | 0.361 | 0.861 |
| S4 | 0.873 | 0.522 | 0.372 | 0.852 | 0.881 | 0.502 | 0.358 | 0.863 |
| S5 | 0.873 | 0.520 | 0.370 | 0.854 | 0.882 | 0.501 | 0.357 | 0.864 |
| Tmax, Tmin, Ra, U2 | | | | | | | | |
| S1 | 0.798 | 0.763 | 0.570 | 0.676 | 0.810 | 0.722 | 0.542 | 0.711 |
| S2 | 0.801 | 0.758 | 0.564 | 0.681 | 0.814 | 0.719 | 0.538 | 0.714 |
| S3 | 0.804 | 0.757 | 0.563 | 0.682 | 0.815 | 0.718 | 0.537 | 0.714 |
| S4 | 0.805 | 0.755 | 0.561 | 0.684 | 0.817 | 0.718 | 0.536 | 0.714 |
| S5 | 0.807 | 0.754 | 0.559 | 0.684 | 0.819 | 0.718 | 0.535 | 0.714 |
Note: The best statistical indicators are highlighted in bold during the testing period.
Table 8. Average statistical values of the five data splitting proportions for the different input combinations of the two machine learning models at the 21 stations for the 30-year span model under the fixed test dataset (2016–2019).
| Input / Proportion | XGB R2 | XGB RMSE (mm·d−1) | XGB MAE (mm·d−1) | XGB NSE | RF R2 | RF RMSE (mm·d−1) | RF MAE (mm·d−1) | RF NSE |
|---|---|---|---|---|---|---|---|---|
| Tmax, Tmin, Ra | | | | | | | | |
| S1 | 0.770 | 0.706 | 0.522 | 0.733 | 0.776 | 0.696 | 0.515 | 0.740 |
| S2 | 0.773 | 0.700 | 0.517 | 0.737 | 0.777 | 0.693 | 0.513 | 0.743 |
| S3 | 0.775 | 0.697 | 0.514 | 0.740 | 0.778 | 0.690 | 0.511 | 0.744 |
| S4 | 0.776 | 0.695 | 0.513 | 0.742 | 0.779 | 0.689 | 0.510 | 0.745 |
| S5 | 0.777 | 0.693 | 0.511 | 0.743 | 0.779 | 0.688 | 0.509 | 0.746 |
| Tmax, Tmin, Rs | | | | | | | | |
| S1 | 0.946 | 0.326 | 0.242 | 0.941 | 0.950 | 0.317 | 0.238 | 0.944 |
| S2 | 0.947 | 0.323 | 0.239 | 0.942 | 0.950 | 0.315 | 0.236 | 0.945 |
| S3 | 0.948 | 0.321 | 0.238 | 0.943 | 0.950 | 0.314 | 0.235 | 0.945 |
| S4 | 0.948 | 0.319 | 0.237 | 0.943 | 0.951 | 0.313 | 0.234 | 0.946 |
| S5 | 0.949 | 0.318 | 0.236 | 0.944 | 0.951 | 0.312 | 0.234 | 0.946 |
| Tmax, Tmin, Ra, RH | | | | | | | | |
| S1 | 0.871 | 0.536 | 0.384 | 0.844 | 0.877 | 0.520 | 0.374 | 0.853 |
| S2 | 0.872 | 0.530 | 0.380 | 0.848 | 0.879 | 0.515 | 0.369 | 0.856 |
| S3 | 0.873 | 0.526 | 0.377 | 0.850 | 0.879 | 0.512 | 0.367 | 0.857 |
| S4 | 0.874 | 0.523 | 0.374 | 0.851 | 0.880 | 0.510 | 0.366 | 0.858 |
| S5 | 0.875 | 0.521 | 0.372 | 0.853 | 0.881 | 0.508 | 0.364 | 0.859 |
| Tmax, Tmin, Ra, U2 | | | | | | | | |
| S1 | 0.808 | 0.711 | 0.525 | 0.723 | 0.816 | 0.684 | 0.506 | 0.744 |
| S2 | 0.810 | 0.710 | 0.523 | 0.726 | 0.819 | 0.680 | 0.503 | 0.748 |
| S3 | 0.812 | 0.709 | 0.522 | 0.726 | 0.821 | 0.679 | 0.501 | 0.749 |
| S4 | 0.813 | 0.707 | 0.521 | 0.727 | 0.822 | 0.678 | 0.500 | 0.750 |
| S5 | 0.814 | 0.706 | 0.520 | 0.728 | 0.823 | 0.677 | 0.499 | 0.750 |
Note: The best statistical indicators are highlighted in bold during the testing period.
Table 9. Average statistical values of the five data splitting proportions for the different input combinations of the two machine learning models at the 21 stations for the 50-year span model under the fixed test dataset (2016–2019).
| Input / Proportion | XGB R2 | XGB RMSE (mm·d−1) | XGB MAE (mm·d−1) | XGB NSE | RF R2 | RF RMSE (mm·d−1) | RF MAE (mm·d−1) | RF NSE |
|---|---|---|---|---|---|---|---|---|
| Tmax, Tmin, Ra | | | | | | | | |
| S1 | 0.771 | 0.710 | 0.526 | 0.867 | 0.774 | 0.703 | 0.522 | 0.735 |
| S2 | 0.773 | 0.706 | 0.523 | 0.866 | 0.774 | 0.702 | 0.521 | 0.736 |
| S3 | 0.774 | 0.704 | 0.521 | 0.865 | 0.775 | 0.700 | 0.519 | 0.738 |
| S4 | 0.776 | 0.701 | 0.519 | 0.867 | 0.776 | 0.699 | 0.517 | 0.739 |
| S5 | 0.777 | 0.699 | 0.517 | 0.867 | 0.776 | 0.699 | 0.518 | 0.739 |
| Tmax, Tmin, Rs | | | | | | | | |
| S1 | 0.945 | 0.328 | 0.243 | 0.986 | 0.948 | 0.320 | 0.239 | 0.943 |
| S2 | 0.947 | 0.325 | 0.241 | 0.987 | 0.949 | 0.319 | 0.238 | 0.943 |
| S3 | 0.947 | 0.324 | 0.240 | 0.987 | 0.949 | 0.318 | 0.238 | 0.944 |
| S4 | 0.948 | 0.322 | 0.239 | 0.987 | 0.950 | 0.317 | 0.237 | 0.944 |
| S5 | 0.948 | 0.321 | 0.238 | 0.987 | 0.950 | 0.317 | 0.237 | 0.944 |
| Tmax, Tmin, Ra, RH | | | | | | | | |
| S1 | 0.868 | 0.563 | 0.407 | 0.925 | 0.874 | 0.547 | 0.396 | 0.838 |
| S2 | 0.869 | 0.554 | 0.399 | 0.927 | 0.875 | 0.541 | 0.390 | 0.842 |
| S3 | 0.870 | 0.550 | 0.395 | 0.925 | 0.875 | 0.537 | 0.387 | 0.844 |
| S4 | 0.871 | 0.547 | 0.393 | 0.925 | 0.876 | 0.533 | 0.383 | 0.847 |
| S5 | 0.872 | 0.544 | 0.391 | 0.924 | 0.876 | 0.534 | 0.384 | 0.847 |
| Tmax, Tmin, Ra, U2 | | | | | | | | |
| S1 | 0.811 | 0.700 | 0.515 | 0.864 | 0.819 | 0.674 | 0.496 | 0.753 |
| S2 | 0.812 | 0.699 | 0.514 | 0.865 | 0.820 | 0.673 | 0.495 | 0.754 |
| S3 | 0.814 | 0.699 | 0.513 | 0.868 | 0.821 | 0.673 | 0.494 | 0.754 |
| S4 | 0.815 | 0.698 | 0.513 | 0.866 | 0.824 | 0.673 | 0.494 | 0.754 |
| S5 | 0.816 | 0.698 | 0.513 | 0.868 | 0.823 | 0.672 | 0.493 | 0.755 |
Note: The best statistical indicators are highlighted in bold during the testing period.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
