Splitting and Length of Years for Improving Tree-Based Models to Predict Reference Crop Evapotranspiration in the Humid Regions of China

Abstract: To improve the accuracy of estimating reference crop evapotranspiration (ET 0 ) for the efficient management of water resources and the optimal design of irrigation scheduling, the drawback of the traditional FAO-56 Penman–Monteith method, which requires complete meteorological input variables, needs to be overcome. This study evaluates the effects of five data splitting strategies and three time lengths of input datasets on predicting ET 0 . The random forest (RF) and extreme gradient boosting (XGB) models coupled with a K-fold cross-validation approach were applied to accomplish this objective. The results showed that the accuracy of the RF (R 2 = 0.862, RMSE = 0.528, MAE = 0.383, NSE = 0.854) was overall better than that of XGB (R 2 = 0.867, RMSE = 0.517, MAE = 0.377, NSE = 0.860) in different input parameters. Both the RF and XGB models with the combination of T max , T min , and Rs as inputs provided better accuracy on daily ET 0 estimation than the corresponding models with other input combinations. Among all the data splitting strategies, S5 (with a 9:1 proportion) showed the optimal performance. Compared with the 30-year length, the estimation accuracy with the 50-year length under limited data was reduced, while the 10-year length of meteorological data improved the accuracy in southern China. Nevertheless, the performance of the 10-year data was the worst among the three time spans when considering the independent test. Therefore, to improve the daily ET 0 predicting performance of the tree-based models in humid regions of China, the random forest model with datasets of 30 years and the 9:1 data splitting strategy is recommended.


Introduction
Evapotranspiration (ET), the total water consumption of soil evaporation and crop transpiration, is of great significance for water resources planning and management, irrigation systems, land drainage implementation, groundwater research, drought assessment, analysis of farmland environments, and agricultural water management in water shortage areas [1][2][3][4]. The precise prediction of ET is critical at the global level because it has an impact on the hydrological cycle [5,6]. In the context of climate change, agricultural water resources are decreasing on a temporal and spatial scale across the world [7]. Crop water use is the key factor of soil water circulation in farmland, which is exceedingly significant regarding the optimal allocation of water resources and the formulation of irrigation systems, and the key to calculate the crop water demand is to determine the evapotranspiration of crops [8][9][10]. However, methods for calculating the ET, such as the water balance method [11], the conduction theory of aqueous vapor [12], or using the lysimeter device, are extremely time-consuming and expensive in practice, which limits their applicability. Hence, to determine the actual ET value in a wide range, the reference evapotranspiration (ET 0 ) was developed as an alternative method for calculating the ET and has been widely used [13].
Many nonlinear mathematical models based on meteorological variables have been established for ET 0 prediction [14][15][16], among which the FAO-56 Penman-Monteith model is the most widely accepted standard model in different regions and climates. However, the FAO-56 Penman-Monteith model requires a large number of meteorological variables for its calculation, e.g., maximum and minimum ambient temperatures, wind speed, relative humidity, and solar radiation [17][18][19], which is a major weakness for its application across the world. Therefore, models with fewer meteorological parameters as inputs, e.g., temperature-based, mass transfer-based, and radiation-based models, have been developed and applied widely in regions where only incomplete meteorological data are available [6,[20][21][22][23]. In spite of their wide application, empirical models remain inconvenient for estimating evapotranspiration because most of them are linear functions, while evapotranspiration in reality is a highly complicated nonlinear process.
The random forest (RF) is an ensemble-based method. Due to the random forest being able to handle extremely large datasets, RF has been commonly used for predicting ET 0 in recent years [62][63][64][65]. For example, Feng et al. studied the capabilities of the RF and GRNN models for estimating the daily ET 0 with meteorological parameters from two weather stations in southwest China and discovered that both RF and GRNN performed well, while RF was a little better than GRNN in general [62]. Wang et al. reported that the derived generalized ET 0 model based on the RF could be successfully applied to ET 0 estimation with both complete and incomplete meteorological variables, which was recommended for application in water balance research [65]. Junior et al. predicted ET 0 with the inverse distance weighting (IDW), ordinary kriging (OK), random forest (RF), and a random forest variation for spatial predictions (RFsp) based on maximum and minimum temperature data from 136 climatological stations located in Brazil, in which they found that the RF obtained better results than conventional approaches [63]. Karimi et al. used 10-year daily data from Iran and considered the impact of replacing missing meteorological variables with calculated meteorological variables based on the standard FAO-56 PM, some commonly used empirical equations, and the random forest model [5]. According to their results, when the calculated value was used to replace the missing variable, the RF model based on the combination of wind speed has higher accuracy than the RF model based on the combination of solar radiation. In addition, the random forest was also widely used in flood probability mapping [65,66], and there are relevant reports using remote sensing data [67,68]. Meanwhile, the random forest has also been well applied in water quality [69].
In recent years, Chen and Guestrin proposed the tree-based extreme gradient boosting (XGB) [70], which reduces error, improves prediction accuracy, and lowers calculation costs, and which has been widely applied in various fields [71][72][73]. In addition, the model has also been used to predict ET 0 . For example, Wu et al. explored the performance of the XGB model in estimating the monthly mean daily ET 0 using temperature data and found that the XGB model exhibited better estimation accuracy than the other methods [74]. Fan et al. evaluated the capability of the XGB model in estimating daily reference evapotranspiration using the Global Ensemble Reforecast v2 data in different climatic zones of China [75]. The results indicated that the XGB model can be satisfactory for estimating the daily ET 0 . Furthermore, the optimization algorithm of the XGB model has received more and more attention because it can enhance the capability of artificial intelligence methods in modeling engineering problems, and it has been used to estimate ET 0 [76][77][78]. Therefore, the XGB model is suited to estimate the daily ET 0 in data-limited regions.
Machine learning models with different heuristic agrometeorological variables have shown high accuracy in ET 0 estimation based on limited data. However, the ability of a model to cope with real-world complexity and to obtain high-precision simulation results is highly dependent on the data management strategy during model development and evaluation, especially the strategy for splitting the data between the model training and testing stages. Therefore, the key to ensuring that a model obtains the best simulation accuracy with a given data series is to find a suitable standard for splitting the data into the training and testing stages. For instance, Wu et al. established an RF model with a 2:1 data splitting for training and testing and found that the RF had higher simulation accuracy than the other intelligent models [74]. To find an alternative to mass transfer-based methods, Shiri et al. established a random forest using cross-validation at the local and cross-station scales with a single data splitting for the training and testing series [79]. It was found that the simulation accuracy of the random forest model was better than that of the mass transfer-based models.
In the context of climate change, both meteorological factors and ET 0 have changed a great deal [80,81]. This poses a challenge to model establishment and evaluation for estimation with a long-term period of data, and the efficiency for the model estimating ET 0 is related to the time length of the input datasets [4,[82][83][84]. Yassen et al. divided a 35-year historical record (1983-2017) into four groups (i.e., 17 years (long-term), 10 years and 7 years (middle-term), and 5 years (short-term)) to study the temporal and spatial changes of Egypt's annual reference evapotranspiration [85]. The results indicated that the short-term group showed the most significant differences in all the studied areas of Egypt, while the long-term and medium-term differences were only significantly different in a certain area of Egypt. Ning et al. studied the interaction of the three factors (i.e., vegetation, climate, and topography) and their corresponding impacts on ET modelling at six different time spans in the Loess Plateau of China [86]. The results showed that the long-term spans showed stronger relationships between the three factors than short-term spans in most catchments. Therefore, it can be concluded that the time length of input datasets has an important influence on the model accuracy of evaluating ET 0 .
To our knowledge, the trend of ET 0 was found to have changed in different regions of the world. Both Iran in the Middle East [87] and Spain in the Iberian Peninsula in southwestern Europe [88] found an increasing trend in ET 0 . However, a decreasing ET 0 trend had been reported in Northern China [89,90]. In the context of climate change, due to the large population, vast land area, and frequent floods in the humid area of southern China in this study, the uncertainty of the climate is expected to intensify the variability of the ET 0 in this area [80,81]. However, relevant reports to date are still lacking in southern China. Therefore, it is of great significance to study how to improve the accuracy of the ET 0 modeling for alleviating the pressure on the water resources in the region. Meanwhile, the application of the relatively simple tree-based RF and extreme gradient boosting model in ET 0 estimation under various data splitting strategies (i.e., different proportions of splitting) has not been evaluated. In addition, there is no corresponding report on the applicability of the random forest and extreme gradient boosting in estimating the ET 0 under limited meteorological data and various time lengths of input datasets (i.e., data obtained from different time ranges). Accordingly, the performance of the RF and XGB on daily ET 0 estimation under various conditions consisting of different model input combinations, data splitting strategies, and time lengths was evaluated in this study with meteorological records from twenty-one climatological stations in the humid areas of southern China. 
Overall, the aims of this research are to: (1) discuss the influence of different meteorological variable input combinations on model performance; (2) evaluate the effectiveness of various data splitting strategies in estimating the ET 0 under different input combinations; and (3) evaluate the effectiveness of different time lengths of data on ET 0 estimation under various input combinations and splitting strategies.

Study Areas
In this research, daily meteorological data from 21 representative meteorological stations across the humid region of China ( Figure 1) were used to build the RF and XGB models to estimate ET 0 . This area is rich in water and heat resources, and its geographic range includes two river basins (the Yangtze River Basin and the Pearl River Basin). Due to the effect of El Niño and typhoons, floods and waterlogging disasters occur frequently in this region, often with huge impacts on nature and society. For example, a summer flood that occurred in the Poyang Lake of the Yangtze River Basin affected over 2.531 million people and 190.4 thousand hectares of crops, resulting in an economic loss of 2.39 billion RMB. Therefore, this area has attracted widespread concern from many scholars who study hydrological phenomena and climate [55,91].

Used Temperature Data
Continuous and long-term series of observed daily maximum (T max ) and minimum (T min ) ambient temperatures, relative humidity (RH), global solar radiation (Rs), extra-terrestrial solar radiation (Ra), and wind speed (U 2 ) from 1966 to 2019 were gathered from 21 representative climatological stations in the humid region of China ( Figure 1). Among them, the records from 1966-2015 were used for training and testing the models, and the records from 2016-2019 were used for independent testing. The quality-controlled meteorological records were obtained from the National Meteorological Information Center (NMIC) of China Meteorological Administration (URL: http://data.cma.cn accessed on 5 March 2020). The detailed description of the 21 studied weather stations is listed in Table 1. Among these stations, the mean daily maximum ambient temperatures were 7.75-29.75 °C, and the mean daily minimum ambient temperatures were 0.55-21.65 °C. The daily average wind speed varied from 0.49 to 2.37 m·s⁻¹, while the daily average relative humidity ranged between 85.51% at Emeishan and 62.43% at Lijiang. The daily average global solar radiation varied between 16.94 MJ·m⁻²·d⁻¹ at Lijiang and 10.15 MJ·m⁻²·d⁻¹ at Guiyang. The highest daily average ET 0 (3.44 mm·d⁻¹) was observed at Mengzi, while the lowest value (1.72 mm·d⁻¹) appeared at Emeishan. In general, the plateau site is more variable than sites in plains and hilly areas.

Random Forest (RF)
Random forest (RF) is used for both classification and regression [7], and mainly for regression problems [55,91,93]. The RF algorithm builds decision trees on data samples, obtains a prediction from each tree, and averages the results to reduce overfitting, thereby improving the prediction performance.
The random forest model is built from decision-tree base learners. To establish an RF model, the first step is to draw the sub-training sets from the original data by sampling with replacement. Suppose there are M samples in the initial dataset D; the probability that a particular sample is not selected in M draws is $(1 - 1/M)^{M}$, which approaches $1/e \approx 0.368$ as M grows. This means that, when the training sets are generated by sampling, each training set contains about 63.2% of the original dataset, and the unselected samples (about 36.8% of the original dataset) become the out-of-bag dataset.
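The out-of-bag share quoted above can be checked numerically. The following is a minimal sketch, not the authors' code: the function names `oob_probability` and `simulated_oob_fraction` are hypothetical, and the simulation simply draws M indices with replacement and counts how many of the M original samples were never picked.

```python
import math
import random


def oob_probability(m: int) -> float:
    """Probability that one given sample is never drawn in m draws with replacement."""
    return (1.0 - 1.0 / m) ** m


def simulated_oob_fraction(m: int, seed: int = 0) -> float:
    """Empirical out-of-bag fraction of a single bootstrap sample of size m."""
    rng = random.Random(seed)
    drawn = {rng.randrange(m) for _ in range(m)}  # distinct indices that were selected
    return 1.0 - len(drawn) / m


if __name__ == "__main__":
    for m in (10, 100, 10_000):
        print(m, round(oob_probability(m), 4), round(simulated_oob_fraction(m), 4))
    print("limit 1/e =", round(math.exp(-1), 4))
```

For small M the analytical value is noticeably below 1/e, so the 63.2%/36.8% figures are best read as large-sample approximations.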
The main difference between random forest and bagging is that, when constructing each tree, n features are randomly selected from all M features. When optimizing each split node, the principle of minimum Gini coefficient is adopted. The Gini coefficient can be expressed as follows:

$$\mathrm{Gini}(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^{2}$$

where $p_k$ is the proportion of samples of class k at the node and K is the number of classes. For the classification problem, RF begins by growing trees on the basis of random vectors [7]. The prediction ability of the random forest model is evaluated by the margin function:

$$mg(X, Y) = \mathrm{av}_k\, I\left(h_k(X) = Y\right) - \max_{j \neq Y} \mathrm{av}_k\, I\left(h_k(X) = j\right)$$

The generalization error, which measures the accuracy of the random forest model, is:

$$PE^{*} = P_{X,Y}\left(mg(X, Y) < 0\right)$$

For the meaning of the parameters in the above formulas and the details of the random forest model establishment, please refer to Breiman [7]. The structure of the RF algorithm is shown in Figure 2.
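To make the minimum-Gini split criterion concrete, the sketch below assumes the standard definition of Gini impurity, 1 minus the sum of squared class proportions, and scores a candidate split by the size-weighted impurity of its two children. The helper names are hypothetical. Note that this criterion applies to classification trees; regression forests such as those used for ET 0 typically score splits by variance reduction instead.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """Gini impurity of a node: 1 - sum_k p_k^2 (0.0 for an empty or pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gini(left: Sequence[Hashable], right: Sequence[Hashable]) -> float:
    """Size-weighted Gini impurity of the two child nodes of a candidate split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


if __name__ == "__main__":
    parent = ["wet", "wet", "dry", "dry"]
    print(gini_impurity(parent))                       # impurity before splitting
    print(split_gini(["wet", "wet"], ["dry", "dry"]))  # impurity after a clean split
```

A tree grower would evaluate `split_gini` for every candidate threshold on the n randomly selected features and keep the split with the smallest value.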

Extreme Gradient Boosting
Extreme gradient boosting (XGB) is a recent implementation of gradient boosting machines (GBMs) proposed by Chen and Guestrin [9]. The XGB model is designed to prevent over-fitting while keeping the computational cost low through simplification and regularization. The XGB algorithm is derived from the concept of "boosting". It combines the predictions of a group of weak learners into a strong learner through additive training. The calculation formula is as follows:

$$\hat{y}_i^{(t)} = \sum_{k=1}^{t} f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i)$$

where t is the number of trees, $f_t(x_i)$ is the function (tree) added at step t, and $x_i$ is the input variable.
In order to prevent the over-fitting problem without affecting the calculation speed of the model, the XGB model uses the following objective:

$$Obj^{(t)} = \sum_{i=1}^{n} l\left(\hat{y}_i, y_i\right) + \sum_{k=1}^{t} \Omega(f_k)$$

where l is the loss function, n is the number of observations, $\sum_{i=1}^{n} l(\hat{y}_i, y_i)$ is the training error, $\hat{y}_i$ is the predicted value, $y_i$ is the actual value, and Ω is the regularization term, given by:

$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert \omega \rVert^{2}$$

where T is the number of leaves, ω is the vector of leaf scores, λ is a regularization parameter, and γ is the parameter that controls the weight of the number of leaves.
The XGB algorithm is based on a gradient boosting strategy. It does not fit all the trees at once but adds one new tree at each step to correct the errors of the previous prediction. Assuming that the predicted value at step t is $\hat{y}_i^{(t)}$, the following derivation can be obtained:

$$Obj^{(t)} = \sum_{i=1}^{n} l\left(\hat{y}_i^{(t-1)} + f_t(x_i),\, y_i\right) + \Omega(f_t) + \mathrm{constant}$$

Details of the XGB model can be found in Song et al. [94].
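The pieces of the XGB formulation described in this section (additive prediction, training error, and a penalty on the number and size of leaf scores) can be illustrated with a small sketch. It is an illustration under stated assumptions rather than the XGBoost library's implementation: the loss is assumed to be squared error, each tree is represented only by a list of its leaf scores, and all function names are hypothetical.

```python
from typing import Callable, Sequence


def regularization(leaf_scores: Sequence[float], gamma: float, lam: float) -> float:
    """Omega(f) = gamma * T + 0.5 * lambda * sum(w^2), with T the number of leaves."""
    return gamma * len(leaf_scores) + 0.5 * lam * sum(w * w for w in leaf_scores)


def objective(
    preds: Sequence[float],
    actuals: Sequence[float],
    trees_leaf_scores: Sequence[Sequence[float]],
    gamma: float,
    lam: float,
) -> float:
    """Training error (assumed squared-error loss) plus the penalty of every tree."""
    training_error = sum((p - a) ** 2 for p, a in zip(preds, actuals))
    penalty = sum(regularization(ls, gamma, lam) for ls in trees_leaf_scores)
    return training_error + penalty


def additive_prediction(trees: Sequence[Callable[[float], float]], x: float) -> float:
    """Prediction after t boosting steps: the sum of the outputs of all t trees."""
    return sum(f(x) for f in trees)


if __name__ == "__main__":
    print(regularization([1.0, -2.0], gamma=0.5, lam=2.0))
    print(objective([1.0, 2.0], [1.5, 2.0], [[1.0, -2.0]], gamma=0.5, lam=2.0))
    print(additive_prediction([lambda x: x, lambda x: 2 * x], 3.0))
```

Larger values of the penalty weights favour trees with fewer leaves and smaller leaf scores, which is how the regularization term limits over-fitting.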

Input Combinations
Four input combinations of meteorological variables, built from T max , T min , Ra, Rs, RH, and U 2 , were applied in the present research to examine the influences of different climatic factors on daily ET 0 estimation ( Table 2). The flowchart of this study is shown in Figure 3.

Data Splitting Strategies and Time Lengths of Input Data
In this study, five data splitting strategies with different proportions of the dataset allocated to model training and testing were applied. Specifically, the proportions of data allocated to the training and testing stages were set as 5:5 (S1), 6:4 (S2), 7:3 (S3), 8:2 (S4), and 9:1 (S5) ( Figure 4). Furthermore, a fixed test dataset from 2016 to 2019 was used for independent testing, with only the training dataset varied. Based on the above data manipulation, the machine learning models coupled with a K-fold cross-validation approach were then applied to estimate ET 0 under each of the input combinations.
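The data handling described above can be sketched as follows. This is an illustration only: the text does not state whether records were shuffled or exactly how the K-fold procedure was combined with the splitting proportions, so the sketch assumes a chronological (unshuffled) split and contiguous folds, and every function name is hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Training:testing proportions of the five splitting strategies.
STRATEGIES = {"S1": (5, 5), "S2": (6, 4), "S3": (7, 3), "S4": (8, 2), "S5": (9, 1)}


def hold_out_independent(years: Iterable[int], first_test_year: int = 2016) -> Tuple[List[int], List[int]]:
    """Separate the modelling years from the fixed independent test years."""
    years = list(years)
    modelling = [y for y in years if y < first_test_year]
    independent = [y for y in years if y >= first_test_year]
    return modelling, independent


def split_by_ratio(records: Sequence[T], strategy: str) -> Tuple[List[T], List[T]]:
    """Chronological train/test split according to one of the strategies S1 to S5."""
    train_part, test_part = STRATEGIES[strategy]
    cut = len(records) * train_part // (train_part + test_part)
    return list(records[:cut]), list(records[cut:])


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Contiguous K-fold partition of n samples as (train indices, test indices) pairs."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        folds.append((train_idx, test_idx))
        start += size
    return folds


if __name__ == "__main__":
    modelling, independent = hold_out_independent(range(1966, 2020))
    for name in STRATEGIES:
        train, test = split_by_ratio(modelling, name)
        print(name, len(train), len(test))
```

With the 50 modelling years, S5 leaves five years for testing, and a ten-fold partition rotates a block of that size through the series.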

Statistical Performance Analysis
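The accuracy of the models for estimating daily ET 0 was evaluated with four commonly used statistical indicators [64,91]: root mean square error (RMSE), mean absolute error (MAE) [95], coefficient of determination (R 2 ), and Nash-Sutcliffe coefficient (NSE) [96]. The statistical indices are expressed as follows:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(X_{i,P} - X_{i,R}\right)^{2}}$$

$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left|X_{i,P} - X_{i,R}\right|$$

$$R^{2} = \frac{\left[\sum_{i=1}^{n} \left(X_{i,P} - \bar{X}_{P}\right)\left(X_{i,R} - \bar{X}_{R}\right)\right]^{2}}{\sum_{i=1}^{n} \left(X_{i,P} - \bar{X}_{P}\right)^{2} \sum_{i=1}^{n} \left(X_{i,R} - \bar{X}_{R}\right)^{2}}$$

$$NSE = 1 - \frac{\sum_{i=1}^{n} \left(X_{i,P} - X_{i,R}\right)^{2}}{\sum_{i=1}^{n} \left(X_{i,P} - \bar{X}_{P}\right)^{2}}$$

where $X_{i,P}$, $X_{i,R}$, $\bar{X}_{P}$, and n are the FAO-56 Penman-Monteith ET 0 , the predicted ET 0 , the mean of the FAO-56 Penman-Monteith ET 0 , and the number of observed meteorological data, respectively, and $\bar{X}_{R}$ is the mean of the predicted ET 0 . The closer the value of R 2 is to 1, the better the model performance and data fitting. Conversely, the closer the values of RMSE and MAE are to 0, the higher the prediction accuracy. The Nash-Sutcliffe coefficient (NSE) is a commonly used indicator when evaluating the performance of a model. The higher the value of NSE, the better the performance of the model and vice versa. A perfect match between the estimated and the target ET 0 will produce NSE = 1.0 [97].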
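The four indicators can be computed with a few lines of code. The sketch below uses the standard definitions of RMSE, MAE, and NSE and assumes that R 2 is the squared Pearson correlation between the reference and predicted series, which is one common convention and is consistent with R 2 and NSE taking different values. The function names are hypothetical.

```python
import math
from typing import Sequence


def rmse(ref: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error between reference (FAO-56 PM) and predicted ET0."""
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(ref, pred)) / len(ref))


def mae(ref: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error between reference and predicted ET0."""
    return sum(abs(r - p) for r, p in zip(ref, pred)) / len(ref)


def nse(ref: Sequence[float], pred: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - SSE / variation of the reference about its mean."""
    mean_ref = sum(ref) / len(ref)
    sse = sum((r - p) ** 2 for r, p in zip(ref, pred))
    sst = sum((r - mean_ref) ** 2 for r in ref)
    return 1.0 - sse / sst


def r2(ref: Sequence[float], pred: Sequence[float]) -> float:
    """Squared Pearson correlation (assumed definition of R2 in this sketch)."""
    mean_ref = sum(ref) / len(ref)
    mean_pred = sum(pred) / len(pred)
    cov = sum((r - mean_ref) * (p - mean_pred) for r, p in zip(ref, pred))
    var_ref = sum((r - mean_ref) ** 2 for r in ref)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov ** 2 / (var_ref * var_pred)


if __name__ == "__main__":
    reference = [1.0, 2.0, 3.0]   # hypothetical FAO-56 PM values, mm per day
    predicted = [2.0, 3.0, 4.0]   # hypothetical model output with a constant bias
    print(rmse(reference, predicted), mae(reference, predicted))
    print(r2(reference, predicted), nse(reference, predicted))
```

The constant-bias example shows why the two goodness-of-fit indicators are reported together: a correlation-based R 2 is insensitive to a systematic offset, whereas NSE penalizes it.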

Comparisons of XGB and RF Predicting Daily ET 0 with Various Input Combinations
The predicting capability of the machine learning models for reference evapotranspiration at three time lengths (2006-2015, 1986-2015, and 1966-2015) was evaluated by the R 2 , RMSE, MAE, and NSE. Model performance depends largely on the meteorological input data, and the target ET 0 values were derived from the FAO-56 Penman-Monteith model. The statistical results of the four different input combinations for predicting the daily ET 0 at the twenty-one climatological stations in the humid areas of China are provided in Table 3. Taking the 50-year span as an example, the RF and XGB models with input combination 2 (i.e., the RF2 and XGB2 models, input variables consisting of T max , T min , and Rs) had better predicting accuracy than the other input combinations (Table 3). For the RF models, the mean RMSE values of two combinations (inputs with T max , T min , and Rs, and inputs with T max , T min , and Ra) ranged from 0.324 to 0.688 mm·d⁻¹ during the testing phase, and the corresponding values of the XGB models ranged from 0.328 to 0.689 mm·d⁻¹. The input combination of T max , T min , RH, and Ra produced a satisfactory daily ET 0 prediction, with mean RMSE values of 0.516 mm·d⁻¹ and 0.526 mm·d⁻¹ for the RF and XGB, respectively. The models with input combination 4 (i.e., input variables consisting of T max , T min , U 2 , and Ra) were also capable of estimating the daily ET 0 with respectable precision, with mean RMSE values of 0.607 mm·d⁻¹ and 0.620 mm·d⁻¹ for the RF and XGB, respectively. These results show that a reasonable combination of parameters is beneficial to the improvement of model accuracy. On the basis of temperature variables, the importance of each of the other three meteorological variables (i.e., Rs, RH, and U 2 ) in improving model accuracy can be ranked as Rs > RH > U 2 .
Although the input combination with Rs can produce better model accuracy than input combinations with any other variable, it should be noted that radiation records are not universally available across the world, especially in less developed regions. In comparison, RH is a variable that can be easily obtained in most regions on Earth, while, at the same time, it provides a decent contribution to improving model accuracy. Therefore, RH is recommended as an alternative for ET 0 estimation with the model in regions where Rs is not available. In terms of the machine learning models' performance under different input combinations, patterns similar to those of the 50-year span were observed in the other two time ranges (i.e., the 10-year span and the 30-year span; see Table 3), and the random forest model performed better than the extreme gradient boosting model.

Comparisons of XGB and RF Predicting Daily ET 0 with Data Splitting Proportions
Tables 4-6 present the statistical results of the machine learning models with the five data splitting strategies (i.e., splitting into proportions of 5:5 (S1), 6:4 (S2), 7:3 (S3), 8:2 (S4), and 9:1 (S5), respectively) under the four input combinations during the testing phase. As shown in the tables, the predicting accuracy of the models differs among data splitting strategies under the same input combination. Using the 50-year span ( Table 6) as an example, the S5 proportion gave values of R 2 and NSE closest to 1 and values of RMSE and MAE closest to 0 in the testing phase for the four input combinations in both machine learning models, compared to S4, S3, S2, and S1. The ranking of the five proportions by estimation precision in the testing phase was the same for the two machine learning models: S5 > S4 > S3 > S2 > S1. In other words, the S5 proportion performed slightly better than the S4 and S3 proportions and had a greater edge over the S2 and S1 proportions. The S5 and S4 proportions had almost equivalent performance (difference in RMSE < 2%) in predicting the daily ET 0 for the four input combinations, and both surpassed the other three data splitting proportions. By contrast, the S1 proportion gave the worst estimates of the daily ET 0 for both XGB and RF; relative to the S5 proportion, the RMSE increased by 7.5-7.6% and 7.1-7.2%, respectively, for the input combination of T max , T min , Ra, and RH, and by only 3.5-5.9% and 2.6-5.0%, respectively, for the other three input combinations. In general, across the five data splitting proportions, the statistical performance of the RF is better than that of the XGB ( Table 6), indicating that the random forest models produced high-precision estimates in the testing phase.
Compared with the 50-year span, similar patterns of model performance with different data splitting proportions were observed in the 10-year span (Table 4) and the 30-year span ( Table 5).
The box plots of the FAO-56 Penman-Monteith ET 0 values and the ET 0 predicted by the RF model with the S5 proportion during ten cross-validation periods, using the best input combination (i.e., the combination of T max , T min , and Rs) in the testing phase, are shown in Figure 5. The plots clearly show that the ranges of the ET 0 values estimated in the ten cross-validation stages were close to the FAO-56 Penman-Monteith ET 0 values of their corresponding stages, further highlighting the model accuracy in estimating daily ET 0 . Overall, the accuracy of the ten cross-validation periods for the four selected sites was high, suggesting that the RF model can be utilized for estimating ET 0 in this area. In particular, the medians, inter-quartile ranges, and extreme values of the fifth and sixth cross-validation periods were closer to the corresponding FAO-56 Penman-Monteith values than those of the other cross-validation periods, indicating a better daily ET 0 predicting performance for these two periods. Among the four selected sites, the distribution of the maximum, minimum, and interquartile range values of the ET 0 at Guiyang station (inland plateau) was the closest to the corresponding values of the FAO-56 PM estimated ET 0 during the ten cross-validation stages.

Comparisons of XGB and RF Predicting Daily ET 0 with Various Time Lengths of Input Data
The average and local RMSE values of the RF and XGB models for estimating daily ET 0 with the different available lengths of years in the testing stage at the meteorological stations in the humid regions of southern China are presented in Figure 6. Similar to previous results (Table 3), the machine learning models with input combination 2 (i.e., the RF2 and XGB2 models, input variables consisting of T max , T min , and Rs) and the data splitting proportion of S5 had more promising accuracy than other models and proportions. Specifically, under the different data splitting strategies in the testing stage, compared to the 10-year dataset, the average RMSE of the RF2 model increased by 2.811 to 3.21% with the 50-year dataset and by 0.39 to 0.74% with the 30-year dataset. Likewise, relative to the 10-year dataset, the average RMSE of the RF1, RF3, and RF4 models increased by 3.16-3.56%, 6.21-7.07%, and 1.01-1.24%, respectively, with the 50-year dataset, and by 0.58-0.79%, 0.45-1.89%, and 0.46-0.84%, respectively, with the 30-year dataset. Moreover, the results of the extreme gradient boosting model were consistent with those of the random forest model.
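The percentages above are relative increases in average RMSE with respect to the 10-year dataset. A minimal sketch of that calculation is given below; the function name and the RMSE values in the usage example are hypothetical and are not taken from the study's tables.

```python
def percent_increase(baseline_rmse: float, rmse: float) -> float:
    """Relative change of an RMSE with respect to a baseline RMSE, in percent."""
    if baseline_rmse <= 0:
        raise ValueError("baseline RMSE must be positive")
    return (rmse - baseline_rmse) / baseline_rmse * 100.0


if __name__ == "__main__":
    rmse_10yr = 0.500  # hypothetical average RMSE with the 10-year dataset, mm per day
    rmse_50yr = 0.515  # hypothetical average RMSE with the 50-year dataset, mm per day
    print(f"{percent_increase(rmse_10yr, rmse_50yr):.2f}%")
```

A positive value means the longer dataset gave a larger error than the 10-year baseline.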

Comparisons of XGB and RF Predicting Daily ET 0 with a Fixed Testing Dataset
To effectively assess the impacts of different data splitting proportions and various time lengths of input data on model performance, a fixed testing dataset consisting of records from 2016 to 2019 was used for the model testing of all the types of models constructed in this study. Meanwhile, the training datasets still varied among the different models, as stated previously. The average statistical indicators of models with the fixed testing dataset (2016-2019) were calculated for different time lengths of input data (Tables 7-9). As shown in the tables, under the same time length of input data, both the RF and XGB models with input combination 2 (i.e., the RF2 and XGB2 models, input variables consisting of T max , T min , and Rs) had better predicting accuracy than other input combinations, and this pattern did not vary among different time lengths. Furthermore, for any of the three time lengths, the estimating accuracies of the two groups of machine learning models with different data splitting proportions were ranked as S5 > S4 > S3 > S2 > S1. Specifically, compared with other splitting proportions, the values of R 2 and NSE were closer to 1, while the values of RMSE and MAE were closer to 0 in the S5 proportion during the testing phase for any of the four input combinations, and these trends did not differ between the RF and XGB models. The results with the fixed testing dataset were consistent with the results of the above testing datasets (Tables 4-6).
To evaluate the impacts of different time lengths of input data on model accuracy, the statistical indicators of models with the fixed testing dataset (2016-2019) under the input combination 2 and the S5 proportion were analyzed ( Figure 7). Generally, RF showed higher accuracy than XGB. Under each of the three time lengths, the RF model consistently had higher values of R 2 and NSE and lower RMSE and MAE values than the XGB model ( Figure 7).
Among the three time lengths, the models with the 30-year span data showed the best estimating accuracy, followed by models with the 50-year span data and then with the 10-year span data, respectively. Taking the RF model as an example, the values of R 2 (0.951) and NSE (0.946) for the models with the 30-year span data were higher than models with the 50-year span data (R 2 = 0.950; NSE = 0.944), or the same as models with  Tables S1 and S2 for details) were consistent with the above results.

Effects of Input Combination Strategy on Daily ET 0 Estimation
The type of input parameters was a crucial factor for the precision of the machine learning models in estimating the daily ET 0 . The models generally performed the worst when only T max /T min and Ra were available in southern China. Since model prediction accuracy generally increases with more meteorological input parameters [57,98,99], models with only temperature data as inputs would generate non-ideal daily ET 0 estimates, despite the fact that temperature data are widely available around the world [20,100]. Therefore, the extreme gradient boosting and random forest models with wind speed, relative humidity, and global solar radiation (instead of extra-terrestrial radiation) data would produce acceptable ET 0 values. In this study, the machine learning models with the input combination of T max , T min , and Rs presented better prediction accuracy than other combinations. The results indicate that, with the global solar radiation (Rs) as an input, the ET 0 values estimated by the XGB and RF models show favorable agreement with the corresponding FAO-56 Penman-Monteith values [54,55,62]. The XGB and RF models with T max /T min , Ra, and RH outperformed the XGB and RF models with T max /T min , Ra, and U 2 in the humid region. These results indicate that relative humidity is a more important factor than wind speed when estimating the ET 0 with the XGB and RF models in the humid region. Among the three single factors other than temperature, the significance of meteorological parameters for estimating daily ET 0 was ranked as Rs > RH > U 2 in the humid area of southern China. This result is consistent with the research of Yan et al. [78], who concluded that Rs is more influential than RH and U 2 for estimating the daily ET 0 in the humid region.

Effects of Data Splitting Proportions on Daily ET 0 Estimation
Previous studies have shown that high-precision simulations of machine learning models on ET 0 prediction can be obtained with a single ratio for allocating data into training and testing [56,61]. However, for the same total dataset, there is no report on whether using multiple ratios between the training data and testing data will improve the precision of the machine learning models. As mentioned above (see Tables 4-6), the extreme gradient boosting and random forest models with the data splitting proportion of S5 showed excellent capability in predicting the daily ET 0 for all the input combinations, exceeding the other four data splitting proportions at the twenty-one meteorological stations during the testing phase. Moreover, as the number of years in the testing phase decreases, the accuracy of the model increases. This is a promising strategy for improving the accuracy of machine learning models to estimate daily ET 0 , especially when there are plenty of historical years of data in the training phase. Consequently, to improve the accuracy of machine learning models, the models should be established with appropriate data segments. In this research, five splitting proportions of the dataset were examined, and the accuracy increased as the proportion of training data increased across the five ratios. In the split-rule cases of Rezaabad et al. [101], three adjacent proportions within ten percent of the dataset were also examined, and the smallest data segment was found to give the lowest accuracy of the three ratios. However, even the largest training proportion in this study did not give perfect accuracy. Therefore, how to precisely select a satisfying proportion needs further study. Shiri et al. established the GEP model, utilizing data splitting strategies in sub-humid stations for estimating the daily ET 0 , and obtained good results in sub-humid regions [102].
However, in this study, the XGB and RF models were evaluated in humid areas. Future studies will be needed to use coupled data from arid and humid stations for evaluating the machine learning models.
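The splitting proportions discussed above can be illustrated with a minimal sketch, assuming the split is made chronologically over whole years. Only the 9:1 proportion of S5 is stated in this paper; the function name `split_years`, the 1986-2015 record, and the other ratios in the loop are hypothetical placeholders, not the configuration used in this study.

```python
def split_years(years, train_parts, test_parts):
    """Split years chronologically by a train:test proportion.

    Assumption: whole years are assigned to either training or testing,
    with the earliest years used for training.
    """
    ordered = sorted(set(years))
    share = train_parts / (train_parts + test_parts)
    n_train = round(len(ordered) * share)
    # Keep at least one year on each side of the split.
    n_train = max(1, min(len(ordered) - 1, n_train))
    return ordered[:n_train], ordered[n_train:]


# Hypothetical 30-year record; the actual study period may differ.
years = range(1986, 2016)

# 9:1 corresponds to S5; the other ratios are illustrative only.
for train_parts, test_parts in [(5, 5), (7, 3), (9, 1)]:
    train, test = split_years(years, train_parts, test_parts)
    print(f"{train_parts}:{test_parts} -> "
          f"train {train[0]}-{train[-1]}, test {test[0]}-{test[-1]}")
```

With a 30-year record, the 9:1 case leaves three years for testing, which is consistent with the observation that fewer testing years coincided with higher accuracy.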

Effects of Available Length of Years on Daily ET 0 Estimation
The average RMSE obtained with the 10-year dataset was much lower than those of the other two periods under the various input combinations and splitting proportions, while that of the 50-year dataset was the highest (Figure 6). These results indicate that using less modeling data can improve the accuracy of the random forest models under various input parameters and data segmentations. They also show that the 50-year length was the least accurate in capturing the complex non-linear relationship between the ET 0 and its parameters in the XGB and RF models. A possible reason is that climate change has altered meteorological factors, resulting in a corresponding increase in the value of the ET 0 as the length of years grows. Related phenomena have also been reported in the literature [85,86,103]. However, the independent testing data show that the model with a 30-year span had the highest accuracy and the model with a 10-year span the lowest (Figure 7), which is inconsistent with the results on the test dataset. This may be due to over-fitting caused by the smaller dataset of the 10-year span model [104]. In this study, the results showed that appropriately reducing the year span of the dataset benefits model accuracy; however, the specific causes remain to be studied further. In addition, the merits of datasets of different lengths for predicting ET 0 have been widely researched [105]. Yin et al. combined a bi-directional approach with datasets of different lengths to predict the ET 0 and found that the shortest of the three dataset lengths gave the best forecast performance [106]. In the present study, three different lengths of years were used to build extreme gradient boosting and random forest models for the first time. Varying the length of years improved the prediction precision of the random forest and extreme gradient boosting models (Figures 6 and 7). Although the 10-year meteorological data achieved high accuracy on the test dataset, its performance was the worst in independent testing. Therefore, the 30-year data span model is a promising method for predicting the ET 0 in the humid southern regions of China, and it may also apply to regions with similar climates.
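The comparison of the three data lengths can be sketched as follows, assuming each span is taken as the most recent years ending at a common final year and that accuracy is summarized by the RMSE. The helper names `window` and `rmse`, the 1966-2015 record, and the constant ET 0 values are hypothetical and serve only to show the bookkeeping; they are not data from this study.

```python
import math


def window(records, end_year, n_years):
    """Keep (year, value) records within the n_years ending at end_year."""
    start_year = end_year - n_years + 1
    return [r for r in records if start_year <= r[0] <= end_year]


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical record: one placeholder ET0 value per year, 1966-2015.
records = [(year, 3.0) for year in range(1966, 2016)]

# The three spans compared in the text: 10, 30, and 50 years.
for span in (10, 30, 50):
    subset = window(records, 2015, span)
    print(f"{span}-year span: {subset[0][0]}-{subset[-1][0]}, "
          f"{len(subset)} yearly records")
```

In practice, a model trained on each span would be scored with `rmse` on both the held-out part of that span and an independent dataset, since the two evaluations ranked the 10-year span differently here.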

Conclusions
Extreme gradient boosting and random forest models with different data splitting strategies and lengths of years were developed to predict the daily ET 0 at twenty-one weather stations in the humid regions of China. The results revealed that the accuracy of the random forest model was better than that of the extreme gradient boosting model, and that Rs was more crucial than RH, U 2 , and Ra in predicting the daily ET 0 in southern China. The S5 data splitting proportion performed best for every input combination, and the data splitting strategies ranked as follows for predicting the daily ET 0 : S5 > S4 > S3 > S2 > S1. Compared with the length of 30 years, the estimation accuracy of the 50-year length with limited data was reduced, while the 10-year length of meteorological data improved the accuracy for southern China. However, the 10-year performance was the worst in the independent test. Because the 30-year data span offers high accuracy and stable performance, the random forest model with a 30-year dataset is recommended for estimating the daily ET 0 . In the absence of continuous and complete meteorological records, this strategy can be used as an alternative to the FAO-56 P-M model to calculate ET 0 . Consequently, the random forest model is proposed as a promising alternative approach for improving the accuracy of daily ET 0 estimation under conditions of insufficient climatic data in the humid area of southern China. However, further research is required to evaluate the performance of the suggested random forest model in the arid and humid climate areas of China or in similar climates around the world.

Data Availability Statement: All data will be made available on request to the corresponding author's email with appropriate justification.