Development of Boosted Machine Learning Models for Estimating Daily Reference Evapotranspiration and Comparison with Empirical Approaches

Proper irrigation scheduling and agricultural water management require a precise estimation of crop water requirements. In practice, reference evapotranspiration (ETo) is estimated first and then used to calculate the evapotranspiration of each crop. In this study, two new coupled models were developed for estimating daily ETo. Two optimization algorithms, the shuffled frog-leaping algorithm (SFLA) and invasive weed optimization (IWO), were coupled with an adaptive neuro-fuzzy inference system (ANFIS) to develop and implement the two novel hybrid models (ANFIS-SFLA and ANFIS-IWO). Additionally, four empirical models with varying complexities, namely Hargreaves–Samani, Romanenko, Priestley–Taylor, and Valiantzas, were used and compared with the developed hybrid models. The performance of all investigated models was evaluated using the ETo estimates from the FAO-56 recommended method as a benchmark, together with multiple statistical indicators: root-mean-square error (RMSE), relative RMSE (RRMSE), mean absolute error (MAE), coefficient of determination (R²), and Nash–Sutcliffe efficiency (NSE). All models were tested at two study sites in Iran, Tabriz and Shiraz. Evaluation results showed that the developed coupled models yielded better results than the classic ANFIS, with the ANFIS-SFLA outperforming the ANFIS-IWO. Among the empirical models, the Valiantzas model in its original and calibrated versions generally presented the best performance. In terms of model complexity (the number of predictors), model performance improved as the number of predictors increased. The most accurate estimates of the daily ETo for the study sites were achieved via the hybrid ANFIS-SFLA models using full predictors, with RMSE within 0.15 mm day⁻¹, RRMSE within 4%, MAE within 0.11 mm day⁻¹, and both R² and NSE of 0.99 in the test phase at the two studied sites.


Introduction
Evapotranspiration (ET) is an important component of the hydrologic cycle. An accurate estimation of ET is required for many applications, such as optimal water resources management, irrigation planning, determination of irrigation intervals, design of irrigation systems, agricultural water management, and water balance studies in each area [1][2][3][4][5][6]. Lysimeters are commonly used to measure ET directly; however, this method is costly and time-consuming, which makes it difficult to use in many areas. Eddy covariance and the Bowen ratio energy balance are other direct techniques for determining ET, but they are rarely applied in practice because of their complexity and cost [7][8][9]. Hence, indirect techniques are often used to estimate ET.
A hybrid of an artificial neural network (ANN) and a genetic algorithm (GA) was used by Eslamian et al. [28] for estimating the monthly ETo of selected stations in Isfahan, Central Iran. The monthly ETo estimates of the hybrid ANN-GA were closer to the FAO-56 Penman-Monteith (PM) data than those of the ANN. Yin et al. [29] investigated the accuracy of a hybrid of a support vector machine (SVM) and the GA, as well as a single SVM and feed-forward neural networks (FFNNs), in daily ETo modeling of a semiarid region in China. It was concluded that the hybrid SVM-GA was more accurate than the ANN and SVM. A hybrid model was proposed by Tao et al. [30] by combining a firefly algorithm (FA) with ANFIS for estimating the daily ETo of three stations located in Burkina Faso. The proposed coupled ANFIS-FA was found to outperform the classic ANFIS. In another work, Wu et al. [31] integrated bio-inspired optimization algorithms, including the GA, ant colony optimization (ACO), cuckoo search algorithm (CSA), and flower pollination algorithm (FPA), with an extreme learning machine (ELM) for forecasting the daily ETo at eight locations in China. The ELM coupled with the FPA (i.e., ELM-FPA) outperformed the other hybrid models that were developed.
Other types of coupled models developed via hybridizing the ML and optimization algorithms have been recently proposed to improve ETo modeling. For example, interested readers can refer to Ahmadi et al. [13], Roy et al. [32], Chia et al. [33], Yan et al. [34], Gong et al. [35], Gao et al. [36], and Dong et al. [37].
Considering the importance of ETo in optimal agricultural water management and in planning for available water resources systems, estimating its values in each region with suitable approaches is essential. This study attempted to propose novel models for daily ETo modeling. To this end, two optimization algorithms, the shuffled frog-leaping algorithm (SFLA) and invasive weed optimization (IWO), were coupled with an adaptive neuro-fuzzy inference system (ANFIS) as the predictor, which is the novelty of this study. The resulting coupled models are referred to as ANFIS-SFLA and ANFIS-IWO. Additionally, four empirical models, namely Hargreaves-Samani, Romanenko, Priestley-Taylor, and Valiantzas, were utilized in both their original and calibrated forms. The performances of the applied models (i.e., the classic ANFIS, the hybrid ANFIS-SFLA and ANFIS-IWO, and the original and calibrated forms of the empirical models) were compared with each other by means of multiple error indicators. We focused on two stations in Iran (i.e., Tabriz and Shiraz) as the study sites. To the best of our knowledge, this is the first attempt in the literature to develop hybrid ANFIS-SFLA and ANFIS-IWO models for modeling daily ETo.

Study Sites and Data Used
This study considered two sites in Iran, namely Tabriz and Shiraz, as the study locations. The Tabriz station is located in the northwest of Iran, and its latitude, longitude, and altitude are 38°05′ N, 46°17′ E, and 1361.0 m, respectively. The long-term mean annual precipitation at this location is 283.6 mm. The Shiraz station, in the south of Iran, is located at a latitude of 29°32′ N, a longitude of 52°36′ E, and an altitude of 1484 m. The mean annual precipitation of Shiraz is 328.0 mm. The location of the studied sites is shown in Figure 1.

Empirical Models Used
Different empirical models, including ones based on temperature, mass transfer, radiation, and various meteorological parameters, have been proposed for estimating the ETo time series. As noted, the FAO-56 PM is accepted as a reliable method to estimate ETo. Hence, ETo estimated by FAO-56 PM was utilized as a benchmark to evaluate the performance of the other empirical models applied, the classic ANFIS, and the two developed hybrid models (ANFIS-SFLA and ANFIS-IWO). Additionally, four other empirical models with varying complexities (different numbers of input variables) were selected and used. They were the Hargreaves-Samani (temperature-based), Romanenko (mass-transfer-based), Priestley-Taylor (radiation-based), and Valiantzas (based on various meteorological parameters) models. The mathematical equations of these models in their original forms are presented in Table 2.
Table 2. The original forms of empirical models applied in this study.
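As an illustration of the simplest (temperature-based) model, the sketch below implements the widely cited Hargreaves-Samani equation. It is an assumption that this matches the form listed in Table 2, which is not reproduced in this text; the function name and argument names are hypothetical.

```python
import math


def hargreaves_samani(t_max, t_min, ra_mj):
    """Daily ETo (mm/day) from the commonly cited Hargreaves-Samani form.

    t_max, t_min: daily maximum and minimum air temperature (deg C)
    ra_mj: extraterrestrial radiation (MJ m-2 day-1)
    Assumption: coefficients 0.0023 and 17.8 are the usual published values,
    not values taken from the paper's Table 2.
    """
    t_mean = (t_max + t_min) / 2.0
    ra_mm = 0.408 * ra_mj  # convert MJ m-2 day-1 to equivalent mm/day
    temp_range = max(t_max - t_min, 0.0)  # guard against inverted inputs
    return 0.0023 * ra_mm * (t_mean + 17.8) * math.sqrt(temp_range)
```

For example, `hargreaves_samani(30.0, 15.0, 40.0)` gives roughly 5.9 mm/day. Only temperature and a radiation term derived from date and latitude are needed, which is why this model has the lowest data requirement of the four.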

Adaptive Neuro-Fuzzy Inference System (ANFIS)
The ANFIS model was first introduced by Jang in 1993 [43]. This model is similar to a multilayered artificial neural network, except that it uses fuzzy logic in addition to neural network learning algorithms [44]. An ANFIS model consists of five layers: the data entry (fuzzification) layer, the fuzzy rule weight calculation layer, the weight normalization layer, the rule calculation layer, and the summation layer that produces the network output. The distinguishing feature of ANFIS is its hybrid learning algorithm, which combines the backpropagation gradient descent method with the least-squares method to adjust the parameters [45]. In this research, the hybrid method was used to train the ANFIS model. Figure 3 shows a scheme of the ANFIS model used.
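The five-layer structure can be sketched as a single forward pass. This is a minimal illustration only: it assumes Gaussian membership functions, a product rule for firing strengths, and first-order (linear) rule consequents, none of which are specified in the text, and all names are hypothetical.

```python
import math


def anfis_forward(x, centers, sigmas, consequents):
    """Forward pass of a first-order Sugeno-type ANFIS for one sample.

    x: list of input values
    centers, sigmas: per-rule lists of Gaussian parameters, one per input
    consequents: per-rule lists [p_1, ..., p_n, bias] of linear coefficients
    """
    # Layers 1 and 2: membership grades and their product (rule firing strength)
    firing = []
    for c_row, s_row in zip(centers, sigmas):
        weight = 1.0
        for xi, c, s in zip(x, c_row, s_row):
            weight *= math.exp(-0.5 * ((xi - c) / s) ** 2)
        firing.append(weight)
    # Layer 3: normalize the firing strengths
    total = sum(firing)
    normalized = [w / total for w in firing]
    # Layer 4: linear consequent of each rule
    rule_out = [sum(p * xi for p, xi in zip(row[:-1], x)) + row[-1]
                for row in consequents]
    # Layer 5: weighted sum gives the network output
    return sum(n * f for n, f in zip(normalized, rule_out))
```

The membership parameters (`centers`, `sigmas`) and the consequent coefficients are the quantities that a training algorithm, or an external optimizer, adjusts.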

Shuffled Frog-Leaping Algorithm (SFLA)
The SFLA is a bio-inspired optimization algorithm that is based on the social behavior of frogs, and it belongs to the category of memetic algorithms [46]. The SFLA is a swarm intelligence-based metaheuristic that is used for solving large and complex optimization problems. The main idea behind this algorithm is to apply a local search method within the structure of the genetic algorithm to make the search process more effective. The algorithm first encodes a set of initial solutions, then it evaluates the desirability of each solution with a fitness function and generates new ones [47].
The algorithm divides the original population into a number of smaller subsets (memeplexes), improves them with the competitive complex evolution (CCE) technique, and then merges the improved subsets back into the original set; repeating this process makes the population better ordered at each step until the most optimal solutions are obtained [48]. In the SFLA, the procedure that operates on the subsets is called the frog-leaping algorithm (FLA), which is an optimized and refined version of CCE. In CCE, the evolution within a complex is carried out by subpopulations of the main population, whereas in the FLA it is carried out first on one memeplex and then on all memeplexes, so that the best solution found is always retained [46].
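The partition, local search, and shuffle cycle described above can be sketched as follows. This is a simplified illustration under stated assumptions: the worst frog of each memeplex leaps toward the memeplex best, then toward the global best, and is otherwise replaced at random; submemeplex sampling and step-size limits used in fuller SFLA descriptions are omitted. The function name and default settings are hypothetical and are not the settings of Table 3.

```python
import random


def sfla_minimize(fitness, dim, bounds, n_memeplexes=5, frogs_per_memeplex=6,
                  local_steps=5, n_shuffles=50, seed=0):
    """Minimize `fitness` over a box using a simplified SFLA."""
    rng = random.Random(seed)
    lo, hi = bounds

    def random_frog():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    frogs = [random_frog() for _ in range(n_memeplexes * frogs_per_memeplex)]
    for _ in range(n_shuffles):
        frogs.sort(key=fitness)  # best frog first
        global_best = frogs[0]
        # Deal the sorted frogs into memeplexes in round-robin order
        memeplexes = [frogs[i::n_memeplexes] for i in range(n_memeplexes)]
        for memeplex in memeplexes:
            for _ in range(local_steps):
                memeplex.sort(key=fitness)
                best, worst = memeplex[0], memeplex[-1]
                # Leap toward the memeplex best, then toward the global best
                for leader in (best, global_best):
                    step = rng.random()
                    candidate = [w + step * (b - w)
                                 for w, b in zip(worst, leader)]
                    if fitness(candidate) < fitness(worst):
                        break
                else:
                    candidate = random_frog()  # no improvement: random restart
                memeplex[-1] = candidate
        # Shuffle: merge all memeplexes back into one population
        frogs = [frog for memeplex in memeplexes for frog in memeplex]
    return min(frogs, key=fitness)
```

Because only the worst frog of a memeplex is ever replaced, the best solution found so far is never lost between shuffles.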

Invasive Weed Optimization (IWO)
The IWO is an intelligent evolutionary optimization algorithm that was first introduced by Mehrabian and Lucas [49] and was inspired by the proliferation, survival, and adaptability of weeds. In the IWO context, a weed is a plant that grows in unwanted places, is a serious pest for other plants, and hinders their growth. Although simple, this algorithm is very effective and fast in finding optimal solutions, and it operates on the basic natural traits of weeds: seed production, growth, and competition for survival in a colony.
First, a limited initial population is randomly generated and scattered in the problem-solving space. In the reproduction step, every member of the population produces seeds according to its fitness [50]: the number of seeds varies linearly from the smallest to the largest permitted number, so that better-adapted weeds produce more seeds. The seeds are then dispersed around their parent with a normal distribution whose mean is zero and whose standard deviation varies at different stages, ensuring that the randomly distributed seeds stay very close to their parent plant. In the competitive elimination step, after several iterations the number of plants reaches its maximum due to reproduction, and a mechanism is then used to remove the weak ones. Once the maximum number is reached, every plant can still produce new seeds by the method noted in the previous steps, and these seeds are scattered in the search space. When all the seeds have been distributed, every seed is given a rating, and in the last step the seeds with lower ratings are removed so that the population remains at the maximum. These steps are repeated until the seeds converge step by step to the optimal seed [49].
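The initialization, reproduction, dispersal, and elimination steps can be sketched as below. This is a minimal illustration, not the configuration used in the study: the shrinking schedule for the standard deviation and all default parameter values are assumptions, and the function name is hypothetical.

```python
import random


def iwo_minimize(fitness, dim, bounds, pop_init=10, pop_max=25, seeds_min=1,
                 seeds_max=5, sigma_init=1.0, sigma_final=0.01, n_iter=100,
                 exponent=2, seed=0):
    """Minimize `fitness` over a box using a basic invasive weed optimization."""
    rng = random.Random(seed)
    lo, hi = bounds
    weeds = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_init)]
    for it in range(n_iter):
        # Assumed schedule: dispersal std shrinks from sigma_init to sigma_final
        sigma = (((n_iter - it) / n_iter) ** exponent
                 * (sigma_init - sigma_final) + sigma_final)
        costs = [fitness(w) for w in weeds]
        best, worst = min(costs), max(costs)
        offspring = []
        for weed, cost in zip(weeds, costs):
            # Seed count grows linearly with fitness rank between min and max
            ratio = 1.0 if worst == best else (worst - cost) / (worst - best)
            n_seeds = int(seeds_min + ratio * (seeds_max - seeds_min))
            for _ in range(n_seeds):
                # Zero-mean normal dispersal around the parent, kept in bounds
                offspring.append([min(max(x + rng.gauss(0.0, sigma), lo), hi)
                                  for x in weed])
        # Competitive elimination: keep only the fittest pop_max plants
        weeds = sorted(weeds + offspring, key=fitness)[:pop_max]
    return weeds[0]
```

Parents compete with their offspring in the elimination step, so the best plant found so far always survives to the next iteration.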

Hybrid Models (ANFIS-SFLA and ANFIS-IWO)
The main objective of the current study was to apply two new hybrid models, ANFIS-SFLA and ANFIS-IWO, to a hydrological problem and to compare them with the ordinary ANFIS in order to obtain an accurate and capable approach for simulating ETo at the study sites. In the hybrid models, the weights of the ordinary ANFIS were optimized by the shuffled frog-leaping and invasive weed optimization algorithms. The hybrid models (i.e., ANFIS-SFLA and ANFIS-IWO) stop either when the fitness function evaluated on the ANFIS weights proposed by the SFLA or IWO reaches a prescribed level, or when the number of iterations reaches its maximum. This approach allows the models to reach their maximum capabilities, so that the new hybrid model combines the advantages of both the ANFIS and the optimization algorithm [48]. Previous studies have shown that such coupled optimized techniques can provide better results in hydrological modeling [50][51][52]. Table 3 provides the optimal parameters related to the machine learning models used. Figure 4 demonstrates a schematic flowchart of the modeling process of the current study.
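The coupling can be pictured as a fitness function over a flat vector of ANFIS parameters plus the two stopping criteria. The sketch below is illustrative only: it uses a one-input Sugeno model, takes RMSE as the fitness (an assumption), and uses a plain random-perturbation search as a stand-in where the SFLA or IWO would propose candidates. All names and default values are hypothetical.

```python
import math
import random


def decode_and_predict(params, x):
    """One-input first-order Sugeno model; params holds [c, s, p, q] per rule."""
    num = den = 0.0
    for i in range(0, len(params), 4):
        c, s, p, q = params[i:i + 4]
        w = math.exp(-0.5 * ((x - c) / (abs(s) + 1e-6)) ** 2)
        num += w * (p * x + q)
        den += w
    return num / (den + 1e-12)


def rmse_fitness(params, xs, ys):
    """Training RMSE of the model encoded by `params` (assumed fitness)."""
    squared = [(decode_and_predict(params, x) - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(xs))


def optimize(fitness, start, target=0.05, max_iter=2000, step=0.1, seed=0):
    """Placeholder search loop showing the two stopping criteria."""
    rng = random.Random(seed)
    best, best_fit = list(start), fitness(start)
    for _ in range(max_iter):  # criterion 2: maximum number of iterations
        if best_fit <= target:  # criterion 1: fitness reached the target
            break
        # Stand-in proposal step; SFLA or IWO would generate candidates here
        candidate = [v + rng.gauss(0.0, step) for v in best]
        candidate_fit = fitness(candidate)
        if candidate_fit < best_fit:
            best, best_fit = candidate, candidate_fit
    return best, best_fit
```

Swapping the proposal step for a population-based optimizer leaves the fitness function and the stopping logic unchanged, which is what makes the ANFIS and the optimizer separable components.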

Evaluation of the Model Performance
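The present study proposed two novel hybrid models, ANFIS-SFLA and ANFIS-IWO, for modeling the daily ETo by combining ANFIS with the SFLA and IWO. In addition, four empirical models, namely Hargreaves-Samani, Romanenko, Priestley-Taylor, and Valiantzas, were utilized. The modeling accuracies of the classic ANFIS, the hybrid ANFIS-SFLA and ANFIS-IWO, and the original and calibrated forms of the empirical models were compared with each other utilizing five evaluation metrics: the root-mean-square error (RMSE), relative RMSE (RRMSE), mean absolute error (MAE), coefficient of determination (R²), and Nash-Sutcliffe efficiency (NSE). In their standard forms, the mentioned metrics can be defined as follows:

\[ \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(E_i - O_i\right)^2} \]

\[ \mathrm{RRMSE} = \frac{\mathrm{RMSE}}{\bar{O}} \times 100 \]

\[ \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|E_i - O_i\right| \]

\[ R^2 = \left[\frac{\sum_{i=1}^{N}\left(O_i - \bar{O}\right)\left(E_i - \bar{E}\right)}{\sqrt{\sum_{i=1}^{N}\left(O_i - \bar{O}\right)^2 \sum_{i=1}^{N}\left(E_i - \bar{E}\right)^2}}\right]^2 \]

\[ \mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N}\left(O_i - E_i\right)^2}{\sum_{i=1}^{N}\left(O_i - \bar{O}\right)^2} \]

where O_i and E_i are the FAO-56 PM (benchmark) and modeled daily ETo values, respectively, \bar{O} and \bar{E} are their means, and N is the number of data points.

Water 2021, 13, 3489

Among the aforementioned error statistics, the RRMSE and NSE can be used to classify the accuracy of any modeling approach. For the RRMSE: excellent (RRMSE < 10%); good (10% < RRMSE < 20%); fair (20% < RRMSE < 30%); and poor (RRMSE > 30%) [53].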
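A compact way to compute the five indicators and the RRMSE class is sketched below. It assumes the standard definitions (RRMSE as RMSE divided by the mean benchmark value, R² as the squared Pearson correlation); the function names are hypothetical, and values falling exactly on a class boundary are assigned to the lower-accuracy class as an assumption, since the stated ranges leave the boundaries open.

```python
import math


def evaluation_metrics(observed, estimated):
    """Return RMSE, RRMSE (%), MAE, R2, and NSE for paired daily ETo series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    sq_err = sum((e - o) ** 2 for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    rmse = math.sqrt(sq_err / n)
    return {
        "RMSE": rmse,
        "RRMSE": 100.0 * rmse / mean_o,
        "MAE": sum(abs(e - o) for o, e in zip(observed, estimated)) / n,
        "R2": cov ** 2 / (var_o * var_e),
        "NSE": 1.0 - sq_err / var_o,
    }


def classify_rrmse(rrmse_percent):
    """Map an RRMSE value (%) to the accuracy classes quoted in the text."""
    if rrmse_percent < 10.0:
        return "excellent"
    if rrmse_percent < 20.0:
        return "good"
    if rrmse_percent < 30.0:
        return "fair"
    return "poor"
```

Here `observed` stands for the FAO-56 PM benchmark series and `estimated` for the output of any of the compared models.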

Results and Discussion
First, the classic ANFIS was implemented at the study sites based on the input combinations defined in Table 4. As seen in the table, seven scenarios (i.e., M1-M7) were considered in the modeling procedure. The RMSE, RRMSE, MAE, R², and NSE values obtained by the classic ANFIS at Tabriz and Shiraz stations are presented in the first sections of Tables 5 and 6, respectively, for both training and test periods. It can clearly be seen that the M1-based ANFIS model with minimal inputs (i.e., temperature components) presented the worst performance at both study stations. Generally, the accuracy of the classic ANFIS improved as the number of input predictors increased; however, the difference between the performances of the M1 and M2 models was negligible at the studied regions. On the other hand, adding the wind speed (U2) to the M2 model (i.e., the M3 model) led to further improvement of the performance of the classic ANFIS. This outcome confirmed the results of previous works [55,56] in that, although wind speed alone showed the lowest accuracy in ETo modeling, considering this parameter along with the other meteorological parameters improved the ETo modeling performance. In addition, only slight discrepancies were observed among the accuracies of the M3-M6 models of the classic ANFIS. The M7 models developed at the study locations outperformed the M1-M6 models. This model utilized the full set of inputs for modeling the ETo. The difference between the M6 and M7 models was that two radiation components (i.e., Rn and Ra) were considered in the M7 model. The Ra was calculated from the Julian day and the latitude of the location. Moreover, minimum temperature, maximum temperature, and sunshine duration ratio were required to compute the Rn. The parameters required for calculating the Ra and Rn are generally available at all locations.
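The calculation of Ra from the day of the year and the latitude can be sketched with the FAO-56 daily formulation. It is an assumption that the study used exactly this formulation; the function name is hypothetical.

```python
import math


def extraterrestrial_radiation(latitude_deg, day_of_year):
    """Daily extraterrestrial radiation Ra (MJ m-2 day-1), FAO-56 formulation.

    latitude_deg: latitude in decimal degrees (negative in the southern hemisphere)
    day_of_year: Julian day, 1 to 365 (or 366)
    """
    solar_constant = 0.0820  # MJ m-2 min-1
    phi = math.radians(latitude_deg)
    angle = 2.0 * math.pi * day_of_year / 365.0
    inverse_distance = 1.0 + 0.033 * math.cos(angle)  # Earth-Sun distance factor
    declination = 0.409 * math.sin(angle - 1.39)      # solar declination (rad)
    sunset_angle = math.acos(-math.tan(phi) * math.tan(declination))
    return (24.0 * 60.0 / math.pi) * solar_constant * inverse_distance * (
        sunset_angle * math.sin(phi) * math.sin(declination)
        + math.cos(phi) * math.cos(declination) * math.sin(sunset_angle)
    )
```

For the Tabriz station, the latitude of 38°05′ N would be passed as approximately 38.08 decimal degrees. No measured variable is involved, which is why Ra can be added as a predictor at any site.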
Therefore, we concluded that considering them could be of use in enhancing the accuracy of modeling techniques in estimating the ETo.
Novel hybrid models were then developed and proposed to improve the performance of the classic ANFIS. For this purpose, two optimization algorithms, the SFLA and IWO, were coupled with the classic ANFIS. The statistical results of the novel hybrid models at Tabriz and Shiraz stations are shown in the second and third sections of Tables 5 and 6, respectively. The patterns observed for the classic ANFIS also held for the hybrid ANFIS-SFLA and ANFIS-IWO models: the M1 models performed worst; the performance of the hybrid models improved as the number of input variables increased; and the M7 models were superior to the M1-M6 models. Hybridizing the ANFIS with the SFLA and IWO algorithms remarkably improved the forecasting efficacy of the classic ANFIS. For instance, the RMSE, RRMSE, and MAE of the M7-based classic ANFIS during the test phase were 0.44 mm day⁻¹, 11.26%, and 0.35 mm day⁻¹, respectively. This improvement confirmed the outcomes of previous works [27][28][29][30]37,57,58] in that coupling optimization algorithms with ML techniques could improve modeling of the ETo time series in comparison with the standalone ML techniques.
Figures 5 and 6 show a comparison of the superior hybrid models (i.e., M7 models of ANFIS-SFLA and ANFIS-IWO) and the corresponding M7 model of the classic ANFIS at Tabriz and Shiraz stations, respectively. The smaller dispersion of the data points around the exact line (i.e., 1:1) in the proposed hybrid models, particularly for the ANFIS-SFLA, confirmed the higher accuracy of the coupled models over the single ANFIS.
In addition, the time series plots of the hybrid models showed that the proposed models captured the daily FAO-56 PM ETo values more reliably than the classic ANFIS. This was particularly clear for the peak points, where the daily ETo values modeled by the hybrid models were much closer to the daily FAO-56 PM ETo data. In this regard, the classic ANFIS presented poor performances. Furthermore, hydrograph plots clearly showed the efficiencies of the classic ANFIS. The hybrid ANFIS-SFLA and ANFIS-IWO performed better in modeling the low and medium values than the peak points.
In addition to applying ANFIS and proposing the novel hybrid models, four types of empirical models (Table 2) were used in this study. The values of error criteria obtained by the original versions of the equations at the Tabriz and Shiraz stations are tabulated in Tables 7 and 8, respectively. The Romanenko equation yielded the weakest performance among the empirical equations, especially at Shiraz station. In contrast, the Valiantzas equation was the best-performing empirical model at the study sites. It was evident that the empirical models should be calibrated in the study areas to provide the best performance. Table 9 reports the calibrated versions of the empirical models used. According to the statistical indicators obtained for the calibrated empirical models in Tables 7 and 8, it can be observed that the calibration procedure significantly improved the performances.
Note: bold values denote the error criteria for the best-performing empirical model at the training and test phases.
Table 9. The calibrated forms of empirical models applied in this study.
Here, the modeling performances of the classic ANFIS, hybrid ANFIS-SFLA and ANFIS-IWO models, and empirical models in the original and calibrated forms were compared with each other. It is evident from Tables 5 and 6 that the hybrid ANFIS-SFLA and ANFIS-IWO models developed at the study sites presented superior results compared with the classic ANFIS; however, the ANFIS-SFLA outperformed the ANFIS-IWO. Moreover, as mentioned previously, the calibrated empirical models yielded better estimates of the daily ETo than the original forms of the empirical models. A performance assessment of the classic and proposed coupled models against the empirical models in their original and calibrated forms revealed that the original empirical models generally provided the weakest performances. Among the empirical models applied, the calibrated Valiantzas model was more accurate than the M1-M6 models of the classic ANFIS at both stations. At the Tabriz station, it also outperformed the M1 and M2 models of ANFIS-SFLA and the M1-M5 models of ANFIS-IWO in both phases. At the Shiraz station, it outperformed the M1-M2 (training phase) and M1-M5 (test phase) models of ANFIS-SFLA, as well as the M1, M2, and M4 models of ANFIS-IWO at the training stage and the M1-M6 models of ANFIS-IWO at the test stage. We concluded that the calibrated version of the Valiantzas model could be of use in modeling the daily ETo with a high degree of precision; therefore, this model could be comparable to the hybrid models proposed in the present study. In general, the M7 models of ANFIS, ANFIS-SFLA, and ANFIS-IWO performed much better than the calibrated empirical models; however, the M7 models of ANFIS-SFLA developed at the Tabriz and Shiraz stations were the best-performing techniques for modeling the daily ETo with dependable accuracy.
Figures 7 and 8 show the comparative graphs of the daily ETo estimates from the FAO-56 PM method against the data modeled by the best empirical model (i.e., calibrated Valiantzas) and its original form.
Less dispersion was visible around the exact line (1:1) in the calibrated Valiantzas model compared with the original Valiantzas. Moreover, overestimation (i.e., red lines in the hydrograph plots) can be seen at many data points in the original Valiantzas at the study locations, especially at Shiraz station; however, this overestimation was corrected by the calibrated version of this model.
In this section, the modeling accuracy of all applied models is qualitatively described based on the various ranges for the RRMSE and NSE criteria. In terms of the NSE statistic, the accuracy of the classic ANFIS, the hybrid ANFIS-SFLA and ANFIS-IWO, and the calibrated forms of the empirical models was located in the "very good" class, since the values of NSE for the mentioned models were in the range of 0.75-1.0. Similarly, the performance of the original Hargreaves-Samani and Valiantzas at Tabriz and Shiraz stations, as well as the Priestley-Taylor equation at Shiraz station, belonged to the "very good" category. The accuracy class of the original Priestley-Taylor model at Tabriz station was "good", since 0.65 < NSE ≤ 0.75. Finally, the original Romanenko belonged to the "unsatisfactory" class at both locations. The performance of the applied models was then classified based on the RRMSE criterion. The M1 and M2 models of the classic ANFIS at Tabriz station, the M1 model of the classic ANFIS at Shiraz station, as well as the M1 models of ANFIS-SFLA and ANFIS-IWO at both stations fell in the "fair" class (20% < RRMSE < 30%). In addition, at Tabriz station, the M6 and M7 models of ANFIS-SFLA during the training stage, the M4-M7 models of ANFIS-SFLA during the test stage, the M7 model of ANFIS-IWO in training, and the M6-M7 models of ANFIS-IWO in the test period were classified in the "excellent" class.
At Shiraz station, the M7 model of the classic ANFIS, the M3-M7 models of ANFIS-SFLA in the training stage and the M4-M7 models of this technique in the test stage, as well as the M6-M7 models of ANFIS-IWO, were found to belong to the "excellent" class. Regarding the classes of the empirical models, it can be clearly seen in Tables 7 and 8 that the accuracy class of the original Romanenko model was poor, especially at Shiraz station. The "excellent" class was not observed for the empirical models in their original and calibrated forms, with the exception of the calibrated Valiantzas in the test stage at Shiraz station. The best class among the empirical models was obtained by the original and calibrated versions of the Valiantzas model at Tabriz and Shiraz stations, and by the calibrated Priestley-Taylor model at Shiraz station.
Given that the ETo equations were developed in specific areas and under the climatic conditions of those locations, it is necessary to calibrate these equations with reliable reference data before using them in other areas.
In other words, none of the empirical equations meets the needs of all climatic conditions; each meets only the specific conditions under which it was developed. Besides, evapotranspiration as a climatic variable is affected by regional and climatic characteristics. For this reason, calibration of empirical models is a basic requirement for achieving better performance. The better accuracy of the calibrated equations over their original forms supports this point.

Moreover, the same procedure should be followed when assessing the overall performance of the models used. In this context, the empirical models should be calibrated on a training dataset, as is done for the machine learning models, so that all models can then be evaluated on the same test data. Machine learning models, by finding and learning from patterns in a dataset, can capture the relationship between ETo and other meteorological variables, and they can be used as strong tools for prediction. Especially when data availability is limited, machine learning models can provide a satisfactory simulation, even with a minimum dataset. In addition, optimization algorithms can be used as a boosting tool for improving the ability of the ordinary ANFIS model in ETo prediction. We recommend that other researchers apply different types of newly developed machine learning models to evaluate and reach the highest accuracy in ETo prediction.
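Calibrating an empirical model on the training set can be illustrated with a simple linear correction fitted by least squares against the FAO-56 PM benchmark. This is only one possible calibration scheme and is an assumption here: the text does not state the procedure behind Table 9, and the function names are hypothetical.

```python
def fit_linear_calibration(eto_empirical, eto_reference):
    """Least-squares slope and intercept mapping empirical ETo to the benchmark."""
    n = len(eto_empirical)
    mean_x = sum(eto_empirical) / n
    mean_y = sum(eto_reference) / n
    sxx = sum((x - mean_x) ** 2 for x in eto_empirical)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(eto_empirical, eto_reference))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_calibration(eto_empirical, slope, intercept):
    """Apply coefficients fitted on the training period to any period."""
    return [slope * x + intercept for x in eto_empirical]
```

The coefficients are fitted on the training period only and then applied unchanged to the test period, mirroring how the machine learning models are trained and tested.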

Conclusions
An attempt was made in the present study to improve the modeling accuracy of the ANFIS in daily ETo estimation. The Tabriz and Shiraz stations in Iran were selected as the two studied sites. The classic ANFIS was coupled with two optimization algorithms, the SFLA and IWO, and the resulting novel hybrid ANFIS-SFLA and ANFIS-IWO models were proposed and implemented. The classic and hybrid models were developed under seven input scenarios (M1-M7) utilizing different numbers of climatic variables as inputs. Evaluation results showed that the developed novel models were superior to the classic ANFIS; however, the hybrid ANFIS-SFLA provided better performances than ANFIS-IWO. Generally, the performances of the classic and coupled models improved with an increasing number of predictors. The M1 models with minimal inputs and the M7 models with full predictors were the worst and best models in modeling the daily ETo, respectively. The M7 models of the hybrid ANFIS-SFLA were the best-performing models for precise modeling of the daily ETo time series at the studied sites. Four empirical models were also applied in this study, and their performances were assessed in their original and calibrated forms. It was found that calibrating the empirical equations could improve the accuracy of the estimated daily ETo over their original forms. Among the empirical models, the Romanenko model showed the weakest results in both its original and calibrated versions. In contrast, the Valiantzas was the best model. A performance assessment of the classic ANFIS, hybrid ANFIS-SFLA, and ANFIS-IWO, as well as the original and calibrated empirical models, demonstrated that the implemented hybrid models, followed by the classic ANFIS (M3-M7 models), generally outperformed the empirical models. In addition, different empirical methods had different complexities, and some of them required more input data, which might be difficult to obtain.
Therefore, there is a need to develop/improve methods with varying inputs to adapt to the real situation considering the availability of the data. This study hybridized an ML-based model (i.e., ANFIS) with the optimization algorithms, including the SFLA and IWO. Future research works could implement a variety of hybrid models for ETo modeling through coupling the ANFIS and SVM with the other types of bio-inspired optimizers, including the firefly algorithm (FA), whale optimization algorithm (WOA), krill herd algorithm (KHA), dragonfly algorithm (DFA), grasshopper optimization algorithm (GOA), etc. Similar to the case studies considered in the current work, the climate of a large part of Iran is arid or semiarid. Therefore, the estimation performance of implemented models in capturing the ETo time series can be evaluated in climates similar to that of Iran (and other parts of the world), and the obtained results could be compared with our findings in this study.