Performance Evaluation of Deep Learning-Based Gated Recurrent Units (GRUs) and Tree-Based Models for Estimating ETo by Using Limited Meteorological Variables

The amount of water allocated to irrigation systems is significantly greater than the amount allocated to other sectors. Thus, irrigation water demand management is a central concern of the Ministry of Agriculture and Forestry in Turkey. To plan more effective irrigation systems in agriculture, plant water requirements must be calculated accurately. In this study, daily reference evapotranspiration (ETo) values were estimated using tree-based regression and deep learning-based gated recurrent unit (GRU) models. For this purpose, 15 input scenarios were considered, consisting of meteorological variables including maximum and minimum temperature, wind speed, maximum and minimum relative humidity, dew point temperature, and sunshine duration. ETo values calculated with the United Nations Food and Agriculture Organization (FAO) Penman-Monteith method were used as model outputs. The results indicate that the random forest model, with a correlation coefficient of 0.9926, performs better than the other tree-based models. In addition, the GRU model, with R = 0.9837, performs well relative to the other models. Maximum temperature was found to be more effective in estimating ETo than the other variables.


Introduction
Today, with a growing population, reliable food production and supply are among the main policy concerns of many countries, highlighting the need to use renewable water efficiently to prevent future water shortages. In Turkey, the amount of water allocated to agriculture is significantly greater than that allocated to other sectors; thus, careful use of water is important. To plan irrigation networks and save water, the water needs of different agricultural plants should be determined correctly [1]. There are various methods to determine plant water requirements, but because other methods give different results, the Penman-Monteith (ETo-PM) method presented by the United Nations Food and Agriculture Organization (FAO) has been accepted as the standard [2]. This method calculates reference evapotranspiration values from several meteorological variables. [...] of the study, models based on artificial intelligence methods were found to be successful for ETo estimation. Srivastava et al. [17] successfully estimated ETo using NASA/POWER and National Centers for Environmental Prediction (NCEP) global data through the Weather Research and Forecasting (WRF) model for an agricultural field in northern India.
This study aims to estimate daily reference evapotranspiration values with the GRU method, a deep learning technique, and with well-known tree regression-based models, namely M5P, random forest (RF), random tree (RT), and REPTree, using data measured at the meteorological station in Tekirdağ, Turkey, a province surrounded by sea on three sides. A further aim was to compare the results to identify the model that performs best and the meteorological variables that are most effective in the modeling, so that predictions can be made with the fewest parameters.

Material and Methods
The Tekirdağ region, surrounded by seas, is an important agricultural area of Turkey. A map of the region is given in Figure 1. The most widely cultivated field crops in the province include wheat, sunflower, barley, corn, and alfalfa, grown on a total area of 3,846,960 da (decares). Fruit production covers 109,135 da, mostly grapes, apples, olives, pears, and cherries. Vegetables such as watermelon, melon, tomato, onion, and cucumber are produced on a total area of 43,873 da. Tekirdağ province produces 10% of the total textiles, 25% of the total margarine, and 20% of the total sunflower oil of the entire country. It ranks 9th according to the socio-economic development index [18,19].
The daily meteorological data used in the study were obtained from the State Meteorology Service. The data set consists of daily minimum and maximum temperatures, wind speeds, dew point temperatures, sunshine durations, and minimum and maximum relative humidities from 1 January 1993 to 31 December 2018. The long-term monthly changes of some meteorological parameters are given in Figure 2, and the statistical features of the meteorological parameters of the Tekirdağ synoptic station are given in Table 1. A correlation matrix is given in Table 2 to identify the meteorological variables that have a statistically significant effect on ETo. The correlation matrix shows a strong and statistically significant relationship between ETo and the Tmax and Tmin variables. The combinations of variables used as model inputs were determined in light of these correlation values.
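As a rough illustration of the correlation screening described above, the sketch below computes Pearson's r between each candidate input and ETo and ranks the inputs by |r|. The function names and the toy values are hypothetical and are not the Tekirdağ data; a full analysis would also test the significance of each coefficient, as in Table 2.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_inputs(candidates, target):
    """Rank candidate input variables by |r| with the target, strongest first."""
    scores = {name: pearson_r(values, target) for name, values in candidates.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical daily values, for illustration only.
eto = [1.2, 2.0, 3.1, 4.5, 5.2, 4.0, 2.4]
data = {
    "Tmax":  [8.0, 13.5, 19.0, 27.0, 30.5, 24.0, 15.0],
    "Tmin":  [1.0, 5.0, 9.5, 15.0, 18.0, 13.0, 7.0],
    "RHmin": [70.0, 62.0, 55.0, 41.0, 38.0, 50.0, 66.0],
    "u2":    [3.1, 2.2, 2.9, 2.5, 3.0, 2.1, 2.8],
}
for name, r in rank_inputs(data, eto):
    print(f"{name:6s} r = {r:+.3f}")
```

Inputs near the top of such a ranking are natural candidates for the reduced-input scenarios.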
Figure 3 shows the distribution of the meteorological data. Tmin, Tmax, and ETo values have a distribution close to normal. Although data mining and artificial intelligence methods do not require normally distributed data, this is favorable for the models to yield successful results.
Mathematics 2019, 7, x 4 of 18

ETo Calculation
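FAO has defined the term ETo [21,22]. Although the Penman-Monteith (PM) formula is much more complex than other formulas, it has been formally documented and recommended by FAO. In previous studies, the FAO Penman-Monteith equation (ETo-PM, Equation (1)) was used as the base model. The equation has two main features: (1) it can be used in any weather conditions without regional calibration, and (2) its accuracy has been validated against lysimeter data over a wide range of climates. However, in many countries there is still no equipment to measure the required variables correctly, or data are not recorded regularly [23]. Equation (1) is:

$$ET_o = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\frac{900}{T + 273}\,u_2\,(e_s - e_a)}{\Delta + \gamma\,(1 + 0.34\,u_2)} \qquad (1)$$

where $ET_o$ is the reference evapotranspiration (mm day⁻¹), $R_n$ is the net radiation at the crop surface (MJ m⁻² day⁻¹), $G$ is the soil heat flux density (MJ m⁻² day⁻¹), $T$ is the mean daily air temperature at 2 m height (°C), $u_2$ is the wind speed at 2 m height (m s⁻¹), $e_s$ and $e_a$ are the saturation and actual vapour pressures (kPa), $\Delta$ is the slope of the saturation vapour pressure curve (kPa °C⁻¹), and $\gamma$ is the psychrometric constant (kPa °C⁻¹).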
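The ETo-PM target values can be reproduced with a short function. The sketch below is a minimal daily implementation of the FAO-56 Penman-Monteith procedure, assuming: wind speed measured (or already converted) at 2 m, actual vapour pressure derived from RHmax and RHmin (dew point temperature could be used instead), soil heat flux G = 0 at the daily step, and the default Angstrom coefficients a_s = 0.25 and b_s = 0.50. The function and parameter names are hypothetical; this is not the code used in the study.

```python
import math

def sat_vp(t):
    """Saturation vapour pressure e°(T) in kPa for air temperature t in °C."""
    return 0.6108 * math.exp(17.27 * t / (t + 237.3))

def eto_fao56_daily(tmax, tmin, rhmax, rhmin, u2, n_sun, lat_deg, elev_m, doy,
                    albedo=0.23, a_s=0.25, b_s=0.50):
    """Daily FAO-56 Penman-Monteith ETo in mm/day (hypothetical helper).

    Temperatures in °C, RH in %, u2 in m/s at 2 m, n_sun = actual sunshine
    hours, doy = day of the year (1-365).
    """
    tmean = (tmax + tmin) / 2.0
    delta = 4098.0 * sat_vp(tmean) / (tmean + 237.3) ** 2          # kPa/°C
    pressure = 101.3 * ((293.0 - 0.0065 * elev_m) / 293.0) ** 5.26  # kPa
    gamma = 0.000665 * pressure                                     # kPa/°C
    es = (sat_vp(tmax) + sat_vp(tmin)) / 2.0
    # Actual vapour pressure from RHmax/RHmin (with dew point: ea = e°(Tdew))
    ea = (sat_vp(tmin) * rhmax / 100.0 + sat_vp(tmax) * rhmin / 100.0) / 2.0

    # Extraterrestrial radiation Ra (MJ m-2 day-1) and daylight hours N
    phi = math.radians(lat_deg)
    dr = 1.0 + 0.033 * math.cos(2.0 * math.pi * doy / 365.0)
    decl = 0.409 * math.sin(2.0 * math.pi * doy / 365.0 - 1.39)
    ws = math.acos(max(-1.0, min(1.0, -math.tan(phi) * math.tan(decl))))
    ra = (24.0 * 60.0 / math.pi) * 0.0820 * dr * (
        ws * math.sin(phi) * math.sin(decl)
        + math.cos(phi) * math.cos(decl) * math.sin(ws))
    daylight = 24.0 / math.pi * ws

    # Net radiation from sunshine duration (Angstrom formula)
    rs = (a_s + b_s * n_sun / daylight) * ra
    rso = (0.75 + 2e-5 * elev_m) * ra
    rns = (1.0 - albedo) * rs
    tk4 = ((tmax + 273.16) ** 4 + (tmin + 273.16) ** 4) / 2.0
    rnl = (4.903e-9 * tk4 * (0.34 - 0.14 * math.sqrt(ea))
           * (1.35 * min(rs / rso, 1.0) - 0.35))
    rn = rns - rnl
    g = 0.0  # soil heat flux, usually neglected at the daily time step

    num = 0.408 * delta * (rn - g) + gamma * 900.0 / (tmean + 273.0) * u2 * (es - ea)
    return num / (delta + gamma * (1.0 + 0.34 * u2))

# FAO-56 daily worked example (Brussels, 6 July); about 3.9 mm/day is expected.
print(round(eto_fao56_daily(21.5, 12.3, 84, 63, 2.078, 9.25, 50.8, 100, 187), 2))
```

The FAO-56 worked example for daily data (Brussels, 6 July, latitude 50°48'N, elevation 100 m) reports ETo ≈ 3.9 mm/day, which makes it a convenient sanity check before the function is applied to station records.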

Tree Models
Classification and regression tree methodology consists of three parts: growing a maximal tree, selecting the appropriate tree size, and classifying new data with the resulting tree [24,25]. An algorithm used for classification is known as a classifier; the term also refers to the mathematical function, implemented by a classification algorithm, that maps input data to a category. In machine learning terminology, classification is an instance of supervised learning, that is, learning from a training set of correctly labeled observations. A classification algorithm then applies a step-by-step procedure to estimate the output for new samples.

M5 Decision Tree (M5T)
The decision tree approach is a binary (two-way split) model that indicates how the amount of a dependent variable can be estimated from the independent variable values. There are two types of decision trees: (1) classification trees are the most common, and (2) regression trees are used for estimation purposes based on numerical variables [26].
If each leaf of the tree contains a linear regression model for predicting the target variable in that leaf, the tree is called a model tree. The M5 model tree algorithm was developed by Quinlan [27]. At each node, M5 chooses the test on a single attribute that maximizes the expected reduction in the standard deviation of the target values, building the tree by recursively splitting the sample space. The standard deviation reduction (SDR) is calculated as:

$$SDR = sd(T) - \sum_{i} \frac{|T_i|}{|T|}\, sd(T_i)$$

where $T$ is the set of examples that reaches the node, $T_i$ is the subset of examples that have the ith outcome of the potential test, and $sd$ is the standard deviation. After the tree is grown, a multiple linear regression model is fitted at each internal node using the data at that node and all the attributes involved in the tests in that node's subtree. Each subtree is then considered for pruning to avoid overfitting: a subtree is pruned when the estimated error of the linear model at its root is less than or equal to the expected error of the subtree. Finally, smoothing is applied to compensate for sharp discontinuities between adjacent linear models at the leaves of the pruned tree.
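To make the split criterion concrete, here is a minimal sketch of SDR and of an exhaustive search for the best binary threshold on one numeric attribute. It assumes the population standard deviation and midpoint thresholds; Weka's M5P adds further details (minimum leaf size, linear models, smoothing) that are omitted here. The function names are hypothetical.

```python
import statistics

def sd(values):
    """Population standard deviation; 0.0 for fewer than two values."""
    return statistics.pstdev(values) if len(values) > 1 else 0.0

def sdr(target, subsets):
    """Standard deviation reduction: sd(T) - sum(|Ti| / |T| * sd(Ti))."""
    n = len(target)
    return sd(target) - sum(len(s) / n * sd(s) for s in subsets)

def best_split(x, y):
    """Return (threshold, SDR) of the binary split x <= t that maximizes SDR."""
    pairs = sorted(zip(x, y))
    best = (None, float("-inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal attribute values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        score = sdr(y, [left, right])
        if score > best[1]:
            best = (t, score)
    return best

print(best_split([1, 2, 3, 10, 11, 12], [1, 1, 1, 5, 5, 5]))
```

In this toy call the two groups are separated at 6.5, and because both subsets are constant the SDR equals the full standard deviation of the target (2.0).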

Reduced Error Pruning (REP) Tree Classifier
REPTree is a fast decision tree learner that builds a tree using information gain (entropy) or, for regression, variance reduction [26]. It generates multiple regression trees over successive iterations and selects the best of them. The tree is then pruned with reduced-error pruning, using backfitting to lower the error rate of the pruned tree. The measure used when pruning is the mean squared error of the tree's predictions. The values of numeric attributes are sorted only once, at the beginning of the modeling process. As in the C4.5 algorithm, missing values are handled by splitting the corresponding instances into pieces [27].
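The core of reduced-error pruning can be sketched as follows: internal nodes are visited bottom-up, and a subtree is replaced by a leaf whenever the leaf's squared error on a held-out pruning set is no larger than the subtree's. The dictionary-based tree format and the function names are hypothetical, and the sketch omits REPTree's tree growing and backfitting; it is not Weka's implementation.

```python
def predict(node, x):
    """Route x down the tree and return the value of the node it ends in."""
    while "feature" in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["value"]

def sse(node, rows):
    """Sum of squared errors on (x, y) rows; on the same rows this ranks like MSE."""
    return sum((predict(node, x) - y) ** 2 for x, y in rows)

def reduced_error_prune(node, rows):
    """Prune bottom-up with held-out (x, y) rows and return the pruned tree.

    Each internal node stores "value", the mean training target that reached it,
    used as the prediction if the node is collapsed. A node reached by no
    pruning rows is collapsed, because both errors are then zero.
    """
    if "feature" not in node:
        return node
    j, t = node["feature"], node["threshold"]
    left_rows = [(x, y) for x, y in rows if x[j] <= t]
    right_rows = [(x, y) for x, y in rows if x[j] > t]
    pruned = dict(node,
                  left=reduced_error_prune(node["left"], left_rows),
                  right=reduced_error_prune(node["right"], right_rows))
    leaf = {"value": node["value"]}
    return leaf if sse(leaf, rows) <= sse(pruned, rows) else pruned

# Hypothetical tree: the root split is useful, the split under "left" fits noise.
tree = {"feature": 0, "threshold": 5.0, "value": 3.1,
        "left": {"feature": 0, "threshold": 2.0, "value": 1.2,
                 "left": {"value": 1.0}, "right": {"value": 1.4}},
        "right": {"value": 5.0}}
holdout = [([1.0], 1.2), ([3.0], 1.2), ([7.0], 5.0), ([8.0], 5.0)]
pruned_tree = reduced_error_prune(tree, holdout)
print(pruned_tree)
```

Here the noisy left split is collapsed into a leaf with value 1.2, while the root split survives because removing it would raise the hold-out error.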

The Random Tree
The random tree algorithm chooses the test at each node from a specified number of randomly selected features and does not prune the tree. Note that, in general usage, the term "random tree" often refers to trees generated from random data, which is unrelated to machine learning [3].
RF is a supervised classifier, i.e., an ensemble learning algorithm that produces many individual learners. It uses bagging to generate random training sets, from each of which a decision tree is grown. Each node in a random forest is split using the best split among a subset of predictors chosen randomly at that node. The algorithm handles both classification and regression problems. Random trees are a collection of tree estimators called a forest. Classification works as follows: the random trees classifier takes the input feature vector, classifies it with each tree in the forest, and outputs the class label that receives the majority of votes. In the case of regression, the response is the average of the responses of all trees in the forest [28]. RTs are fundamentally a combination of two machine learning ideas: single model trees and RF. Model trees are decision trees in which each leaf holds a linear model optimized for the local subdomain described by that leaf. RFs have been shown to significantly improve on the performance of single trees; tree diversity is generated by two randomization methods. First, the training data for each tree are sampled with replacement, as in bagging. Second, when a tree is grown, instead of always computing the best possible split at each node, only a random subset of all attributes is considered at each node, and the best split is computed for that subset. Such trees are used for classification. Random model trees combine random forests and model trees for the first time. RTs use this product for split selection and therefore induce reasonably balanced trees in which a single global setting of the ridge value works for all leaves, thus simplifying the optimization procedure [29].
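The two sources of randomness described above, bootstrap sampling of the training data and a random subset of attributes at each node, can be illustrated with the simplified regression forest below. It is a sketch, not Weka's RandomForest or RandomTree: the trees use a variance-based split criterion and a depth limit, predictions are averaged as in regression, and the function names, `mtry`, and the toy data are hypothetical. Growing one tree with `build_tree` on the full data roughly corresponds to the unpruned random tree described earlier.

```python
import random
import statistics

def build_tree(rows, depth, mtry, rng, min_size=2):
    """Grow an unpruned regression tree; each node tests only `mtry` random attributes."""
    node = {"value": statistics.fmean(y for _, y in rows)}
    if depth == 0 or len(rows) < 2 * min_size:
        return node
    best = None
    for j in rng.sample(range(len(rows[0][0])), mtry):
        for t in sorted({x[j] for x, _ in rows})[:-1]:
            left = [r for r in rows if r[0][j] <= t]
            right = [r for r in rows if r[0][j] > t]
            if len(left) < min_size or len(right) < min_size:
                continue
            # Size-weighted child variance: minimizing it maximizes variance reduction.
            cost = sum(len(part) * statistics.pvariance([y for _, y in part])
                       for part in (left, right))
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:
        return node
    _, j, t, left, right = best
    node.update(feature=j, threshold=t,
                left=build_tree(left, depth - 1, mtry, rng, min_size),
                right=build_tree(right, depth - 1, mtry, rng, min_size))
    return node

def predict_tree(node, x):
    while "feature" in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["value"]

def random_forest(rows, n_trees=25, depth=4, mtry=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        boot = [rng.choice(rows) for _ in rows]  # bootstrap sample, with replacement
        forest.append(build_tree(boot, depth, mtry, rng))
    return forest

def predict_forest(forest, x):
    """Regression: average the trees (classification would take a majority vote)."""
    return statistics.fmean(predict_tree(tree, x) for tree in forest)

# Hypothetical data: feature 0 drives the target, feature 1 is noise.
rows = [((float(t), float((7 * t) % 5)), 0.2 * t) for t in range(40)]
forest = random_forest(rows, n_trees=25, depth=4, mtry=1, seed=1)
print(predict_forest(forest, (5.0, 2.0)), predict_forest(forest, (35.0, 2.0)))
```

Averaging many decorrelated trees is what smooths out the errors of individual unpruned trees.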

The Random Forest
The RF algorithm is a supervised classification algorithm. There is a direct relationship between the number of trees in the forest and the results it can achieve: as the number of trees increases, the results become more stable. The difference between RF and a single decision tree is that, in RF, finding the root node and splitting the nodes are done randomly. A first advantage of RF is that it can be used for both classification and regression tasks. Second, overfitting is a critical problem that adversely affects results, but if there are enough trees in the forest, the likelihood of overfitting is reduced. The third advantage is that the RF classifier can handle missing values, and finally, it can model categorical values.
The RF algorithm has two stages: the first creates the forest, and the second makes predictions with the RF created in the first stage. RF can also be used to identify the most important features in the training data set.
The RF method consists of an ensemble of classification trees or regression trees, as appropriate, and is one of the most commonly used ensemble methods. Rerunning the random forest algorithm can help to find the best model setup [30]. The underlying idea of the method is to build an ensemble from a large number of predictor trees, each formed with the help of a randomly selected subset [31,32]. The RF method handles both categorical and continuous variables and can be used on large or small data sets. A disadvantage of the method is that, unlike the classification tree method, it does not give a single tree as output [33]. The advantage of selecting random predictors in this way is that the resulting model is more accurate, because the trees in the ensemble are less correlated [34]. In this method, as in classification and regression trees, the Gini index (GI) is used as the splitting criterion.
A decrease in the Gini index is desirable because it indicates an increase in purity; a Gini index of zero means maximum purity, i.e., all samples in the node belong to the same class [35].
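For a node whose samples fall into classes k with proportions p_k, the Gini index is G = 1 − Σ_k p_k², and a candidate split is judged by the decrease from the parent's G to the size-weighted G of its children. The sketch below shows both; the function names are hypothetical. Note that the Gini index applies to class labels; for a continuous target such as ETo, regression trees in RF implementations typically use a variance (squared-error) reduction criterion instead.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum(p_k^2); 0.0 means a pure node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(parent, children):
    """Impurity decrease of a split: G(parent) - size-weighted G(children)."""
    n = len(parent)
    return gini(parent) - sum(len(c) / n * gini(c) for c in children)

print(gini(["wet", "wet", "dry", "dry"]))
print(gini_decrease(["wet", "wet", "dry", "dry"], [["wet", "wet"], ["dry", "dry"]]))
```

A split that separates the two classes perfectly removes all impurity, so its decrease equals the parent's Gini index.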

Gated Recurrent Units (GRU)
The GRU is a type of recurrent neural network (RNN) that is efficient for prediction; in this study it was implemented in Python. Among recurrent architectures, GRUs are relatively quick to train. Before training, Pearson correlation coefficients were used to identify the main features affecting the prediction. The GRU's gating mechanism was introduced mainly to overcome the vanishing gradient problem of standard RNNs. For this purpose, it uses two gates: a reset gate and an update gate. The main structure of the GRU network is shown in Figure 4. GRUs have been shown to perform even better on certain smaller datasets [36,37]. The GRU model used in this study had two hidden layers with 200 and 150 neurons, the ReLU activation function, and the Adam optimizer. Learning rates from 1 × 10⁻¹ to 1 × 10⁻⁹, decay values from 1 × 10⁻¹ to 1 × 10⁻⁹, and 250 to 500 epochs were tried.
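The gating described above can be written as a single time step. The pure-Python sketch below follows a common formulation: z_t = σ(W_z x_t + U_z h_{t−1} + b_z), r_t = σ(W_r x_t + U_r h_{t−1} + b_r), h̃_t = φ(W_h x_t + U_h (r_t ⊙ h_{t−1}) + b_h), and h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t, where φ is tanh by default and can be swapped for ReLU to mirror the activation reported here. Conventions differ between papers and libraries (some swap the roles of z_t and 1 − z_t, or apply the reset gate after the recurrent product), and the parameter names are hypothetical; this is not the study's code, and in practice a deep learning library with trained weights would be used.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def relu(v):
    return max(0.0, v)

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def affine(W, x, U, h, b):
    """Compute W x + U h + b for every hidden unit."""
    return [a + c + d for a, c, d in zip(matvec(W, x), matvec(U, h), b)]

def gru_step(x, h_prev, p, act=math.tanh):
    """One GRU time step; p maps the hypothetical names W_*, U_*, b_* to weights."""
    z = [sigmoid(v) for v in affine(p["W_z"], x, p["U_z"], h_prev, p["b_z"])]  # update gate
    r = [sigmoid(v) for v in affine(p["W_r"], x, p["U_r"], h_prev, p["b_r"])]  # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]
    cand = [act(v) for v in affine(p["W_h"], x, p["U_h"], reset_h, p["b_h"])]  # candidate
    # The update gate interpolates between the previous state and the candidate.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]

def run_gru(sequence, p, n_hidden, act=math.tanh):
    """Feed a sequence of input vectors through the cell and return the final state."""
    h = [0.0] * n_hidden
    for x in sequence:
        h = gru_step(x, h, p, act)
    return h

def zero_params(n_in, n_hidden):
    """All-zero parameters, useful only for checking shapes."""
    def mat(rows, cols):
        return [[0.0] * cols for _ in range(rows)]
    p = {}
    for gate in ("z", "r", "h"):
        p["W_" + gate] = mat(n_hidden, n_in)
        p["U_" + gate] = mat(n_hidden, n_hidden)
        p["b_" + gate] = [0.0] * n_hidden
    return p

params = zero_params(2, 3)
print(gru_step([0.3, -1.0], [1.0, -2.0, 0.5], params, act=relu))
```

With all-zero weights both gates output 0.5 and the candidate is 0, so the step simply halves the previous state; this makes the interpolation role of the update gate easy to see.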

Weka and Python
The Weka software was developed at the University of Waikato in New Zealand. It is written in Java and distributed under the terms of the GNU General Public License. Weka supports many standard data mining tasks, such as data preprocessing, clustering, regression, classification, visualization, and feature selection. It provides a uniform interface to many different learning algorithms, together with methods for pre- and post-processing and for evaluating the results of learning schemes on any given data set. Weka is a collection of state-of-the-art machine learning algorithms and data preprocessing tools [26,39]. Python is a high-level, interpreted, open-source, general-purpose programming language. Created by Guido van Rossum and first released in 1991, it is used for data science, machine learning, system automation, web and API development, and more [40].

Results and Discussion
In this section, the results obtained according to different data mining methods and input combinations are given and compared with the deep learning technique.
Measuring some meteorological variables is difficult or costly, and this was taken into account when creating the scenarios. It is desirable to estimate ETo from fewer variables, ideally one or two easily measured ones, rather than from all or many meteorological variables. The scenarios and their input variables are given in Table 3.
In this study, the input variables used in the formation of scenarios are based on two important factors. The first factor was based on variables affecting ETo within the framework of the theoretical approach, while the second was based on the correlation coefficient between ETo and other independent meteorological variables. The effect of a single input variable on the ETo estimation was examined in scenarios 8, 9 and 11 (Table 3).

M5P Model
The results obtained with the M5P model are given in Table 4. The best result was obtained in the first scenario (R = 0.9925, MAE = 0.1566 mm/day, RMSE = 0.2135 mm/day), but this scenario uses 8 input variables, and the model produced 63 different linear models for ETo calculation, which makes scenario 1 difficult to apply in practice. Scenario 15, which uses only Tmax as a measured input, performs relatively well (R = 0.9694, MAE = 0.319 mm/day, RMSE = 0.4277 mm/day) and is more practical than scenario 1 because of its small number of parameters and the 20 linear equations it generates. Although scenarios 6 and 10 produce fewer linear equations, their accuracy is lower than that of scenarios 1 and 15, which makes them unsuitable for use. The results of scenarios 1 and 15 from the M5P model are close to those obtained from the ETo-PM model. As can be seen from Figure 5a,b, there is a high level of agreement between the values obtained from M5P and ETo-PM. In this study, the running time of the models was affected by the processor of the computer used, the number of input variables, and the data in each scenario. As seen in Table 4, the running times were very short; they are reported only to emphasize this advantage of the proposed method.
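The three statistics reported for each model and scenario (R, MAE, and RMSE) can be computed as below. This is a minimal sketch that assumes R is Pearson's correlation between the ETo-PM values and the model estimates, and that MAE and RMSE are plain means over all test days; the function name is hypothetical.

```python
import math

def evaluate(obs, pred):
    """Return R, MAE, and RMSE of predictions against observed (ETo-PM) values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    r = cov / math.sqrt(sum((o - mo) ** 2 for o in obs) * sum((p - mp) ** 2 for p in pred))
    mae = sum(abs(p - o) for o, p in zip(obs, pred)) / n
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / n)
    return {"R": r, "MAE": mae, "RMSE": rmse}

# Hypothetical values in mm/day
print(evaluate([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0]))
```

Because R measures only linear association, a model with a constant offset can still reach R = 1; MAE and RMSE are needed alongside it, and RMSE weights large daily errors more heavily than MAE.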

The Random Forest
The results of the RF model for the different scenarios are provided in Table 5. The best results were obtained in the first scenario (R = 0.9926, MAE = 0.1533 mm/day, RMSE = 0.2122 mm/day), but it uses 8 input variables, and a small number of inputs is desirable so that the model does not become too complicated. Therefore, scenario 1 is not suitable for implementation. Scenario 15, which uses Tmax as a measured input together with a monthly time index, yielded a relatively good result (R = 0.963, MAE = 0.3486 mm/day, RMSE = 0.4697 mm/day) compared with scenario 1. The distributions of the scenario 1 and 15 results from the RF model are very similar to those of the ETo-PM model; as can be seen from Figure 6a,b, there is a high level of agreement between the values obtained from RF and ETo-PM. Because the RF model builds multiple trees from randomly selected input variables and combines them by voting (averaging, for regression), its model selection process is better than that of the single M5 model tree.

The Random Tree
As shown in Table 6, the best results were obtained in the first scenario (R = 0.9798, MAE = 0.2472 mm/day, RMSE = 0.3502 mm/day). In the 1st scenario, there were 8 input variables, but the 15th scenario yielded a result close to the 1st scenario using only Tmax as measured input with a monthly time index (R = 0.9599, MAE = 0.3591 mm/day, RMSE = 0.4895 mm/day). Figures 7a,b show the results obtained from the 1st and 15th scenarios from the RT model and the distribution of the results from the ETo-PM model. As can be seen from the graphs, there was agreement between the values obtained from RT and ETo-PM.

REPTree
The results of the REPTree model for the different scenarios are given in Table 7. The best results were obtained in the first scenario (R = 0.982, MAE = 0.2366 mm/day, RMSE = 0.3288 mm/day). The results of scenario 2 were almost the same as those of scenario 1, indicating that the effect of using Tdew in the model was negligible. In scenario 15, the REPTree model gave relatively good results (R = 0.967, MAE = 0.3284 mm/day, RMSE = 0.4435 mm/day). Figure 8a,b shows the scatter plots of scenarios 1 and 15 for the REPTree model. As can be seen from the graphs, the REPTree values agree less closely with ETo-PM than those of the M5P and RF methods.

GRU
The RNN-GRU model was implemented in Python. It consists of 2 hidden layers with 200 and 150 neurons, respectively. ReLU was used as the activation function, and the Adam algorithm was used for optimization.
The test-period results obtained for the 15 scenarios with the GRU model are provided in Table 8. Scenario 1, which uses all meteorological variables as inputs, gave the best result (R = 0.9931, MAE = 0.1953 mm/day, RMSE = 0.2556 mm/day). The GRU model's use of only Tmax and n (sunshine duration) as input variables in scenario 12 is also important for in situ applications, since Tmax and n are easier and cheaper to measure. The GRU model took 204 s to complete scenario 12 because, as a deep learning model, it has two large hidden layers to train. GRU trains more slowly than the other machine learning methods, which can be a disadvantage of deep learning; however, with fast computers, this training time need not be considered a disadvantage. Scenario 15 yielded a relatively good result when only the measured Tmax variable was used with the monthly time index (MTI) (R = 0.9837, MAE = 0.2433 mm/day, RMSE = 0.3292 mm/day); it required 223 s. Scatter plots of scenarios 1 and 15 are presented in Figure 9a,b. As can be seen from the graphs, the results of the ETo-PM method and the GRU model are highly compatible.

GRU
The Python language was used to model the study with RNN-GRU. The model consists of 2 hidden layers with 200 and 150 neurons respectively. Relu was used as an activation function, and the Adam algorithm was used for optimization.
The results of the test period obtained in 15 different scenarios using the GRU model are provided in Table 8. As shown, scenario 1 using all meteorological variable as inputs gave the best result (R = 0.9931, MAE = 0.1953 mm/day, RMSE = 0.2556 mm/day). The fact that the GRU Model uses only Tmax and n as the input variable in scenario 12 is also very important for in situ applications, since the Tmax and n variables are easier and cheaper to measure. The GRU model took 204 s to complete scenario 12, as it is a deep learning mechanism and has many hidden layers. GRU trains the model more slowly than other machine learning methods, which may be a disadvantage for deep learning. However, with fast computers, it may not make sense to view this duration as a disadvantage. The 15th scenario yielded a relatively good result when only the measured Tmax variable was used with MTI (R = 0.9837, MAE = 0.2433 mm/day, RMSE = 0.3292 mm/day). For the 15th scenario, 223 s were needed. Scatter plots of scenarios 1 and 15 are presented in Figure 9a,b. As can be seen from the graphs, the results obtained from the ETo-PM method and GRU model are highly compatible.  For the evaluation of overfitting/underfitting problems of the GRU model, a loss vs epoch plot is given in Figure 10. As expected, the training and test loss values are parallel and decreasing. The performance of different input combinations with the models used in this study was compared according to the R values, and is summarized in Figure 11. As can be understood from the figure, scenarios with the Tmax variable as one of the meteorological parameters gave better results For the evaluation of overfitting/underfitting problems of the GRU model, a loss vs epoch plot is given in Figure 10. As expected, the training and test loss values are parallel and decreasing. For the evaluation of overfitting/underfitting problems of the GRU model, a loss vs epoch plot is given in Figure 10. 
As expected, the training and test loss values decrease in parallel, suggesting that the model neither overfits nor underfits markedly. The performance of the different input combinations for all models used in this study was compared in terms of R and is summarized in Figure 11. As the figure shows, scenarios that include Tmax among the meteorological inputs gave better results than the others. Scenario 1, which includes all meteorological variables, was the best for all models. Among the five models used in the study, the deep learning-based GRU model (R = 0.9931) gave the best result. The statistical properties of the ETo values obtained from the different models and from ETo-PM are given in Table 9. As shown, the statistical properties of all methods except the GRU model were quite similar to those of the ETo-PM method. In this respect, the GRU model was less successful than the other methods, particularly with respect to the standard deviation values.
To further evaluate the performance of the models, Taylor diagrams were drawn for scenarios 1 and 15 (Figure 12). As the Taylor diagram for scenario 1 shows, the M5P method yielded results closer to the observed values than the other methods, followed by RF and GRU. The RT and REPTree results lie farther from the observed values in terms of the correlation coefficient, so these models appear less successful than the others. The Taylor diagram for the 15th scenario shows that the GRU model was closest to ETo-PM. This diagram also shows that the other four models were located close to each other.
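A Taylor diagram places each model using three statistics computed against the reference (ETo-PM) series:

- its standard deviation, which sets the radial distance;
- the correlation coefficient R, which sets the angle, arccos R;
- the centered root-mean-square difference E', which equals the distance from the model point to the reference point because E'^2 = sigma_f^2 + sigma_r^2 - 2 sigma_f sigma_r R.

The sketch below computes these quantities with population (1/N) statistics. It is an illustration, not the plotting code used in the study, and the function name `taylor_statistics` is hypothetical.

```python
import math
from typing import Dict, Sequence


def taylor_statistics(reference: Sequence[float], model: Sequence[float]) -> Dict[str, float]:
    """Statistics that locate a model on a Taylor diagram relative to a reference series."""
    n = len(reference)
    if n != len(model) or n < 2:
        raise ValueError("need two equal-length series with at least two values")
    mean_r = sum(reference) / n
    mean_f = sum(model) / n
    dr = [v - mean_r for v in reference]  # centered reference anomalies
    df = [v - mean_f for v in model]      # centered model anomalies
    sigma_r = math.sqrt(sum(d * d for d in dr) / n)
    sigma_f = math.sqrt(sum(d * d for d in df) / n)
    corr = sum(a * b for a, b in zip(dr, df)) / (n * sigma_r * sigma_f)
    # Centered RMS difference: mean bias removed, so only pattern and amplitude errors remain.
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(dr, df)) / n)
    return {
        "sigma_ref": sigma_r,
        "sigma_model": sigma_f,
        "sigma_ratio": sigma_f / sigma_r,  # radial position on a normalized diagram
        "R": corr,
        "angle_rad": math.acos(max(-1.0, min(1.0, corr))),  # azimuthal position
        "centered_rmsd": crmsd,
    }


stats = taylor_statistics([2.0, 3.0, 4.0, 5.0], [2.5, 2.5, 4.5, 4.5])  # illustrative values
print(stats)
```

Because the diagram encodes both R and the standard deviation, a model with a high correlation can still plot away from the reference point if its spread differs from that of ETo-PM.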
In this study, as in similar studies, ETo was estimated using artificial intelligence methods. The accuracy obtained was similar to that of previous studies [8–10,14,15], with R in the range of 0.8 to 0.99. This confirms that artificial intelligence models can predict ETo successfully. Compared with other studies [5–7,11,23], the most important contribution of this study is that deep learning techniques were tested for the first time using data from Turkey's coast. These techniques achieved good accuracy with temperature-based input parameters only.

Conclusions
Accurate estimation of ETo is essential for the efficient use of irrigation water in agricultural economies. ETo can be calculated with many nonlinear, multiparameter empirical equations, which are difficult and time-consuming to use. However, various phenomena with nonlinear structures have been modeled with data-driven and machine learning methods without much difficulty. Evapotranspiration, an important element of the hydrological cycle, has a complex and nonlinear structure. In this study, the feasibility of different data mining methods and deep learning models for ETo estimation was evaluated. When different combinations of meteorological variables were used as inputs, the RF and GRU methods estimated ETo more successfully than the other methods. The comparison of input scenarios showed that maximum temperature is the most important input variable for the ETo prediction models. This indicates that temperature is an important variable in ETo formation in this humid region. Temperature is also much easier and cheaper to measure than other meteorological variables. These findings are very close to those of previous studies [14,15]. Thus, where meteorological data are limited in humid areas, ETo can be predicted easily and with a high level of accuracy using only maximum temperature data. The most important contributions of this study are (1) its evaluation of the GRU deep learning method for ETo estimation and (2) its highly successful ETo predictions using only one meteorological variable.
It is therefore recommended that the proposed method be used to calculate ETo without requiring other meteorological parameters.

Conflicts of Interest:
The authors declare no conflict of interest.