Recent Advances in Evapotranspiration Estimation Using Artificial Intelligence Approaches with a Focus on Hybridization Techniques—A Review

Abstract: Difficulties are faced when formulating hydrological processes, including that of evapotranspiration (ET). Conventional empirical methods for formulating these possess some shortcomings. The artificial intelligence approach emerges as the best possible solution to map the relationships between climatic parameters and ET, even with limited knowledge of the interactions between variables. This review presents the state-of-the-art application of artificial intelligence models in ET estimation, along with different types and sources of data. This paper identifies the most significant climatic parameters for different climate patterns. The characteristics of the basic artificial intelligence models are also explored in this review. To overcome the pitfalls of the individual models, hybrid models, which use techniques such as data fusion and ensemble modeling, data decomposition, and remote sensing-based hybridization, are introduced. In particular, the principles and applications of the hybridization techniques, as well as their combinations with basic models, are explained. The review covers most of the related and excellent papers published from 2011 to 2019 to maintain its relevancy in terms of time frame and field of study. Guidelines for the future prospects of ET estimation research are advocated. It is anticipated that such work could contribute to the development of an agriculture-based economy.


Introduction
In 2019, the United Nations [1] reported that the world population had reached 7.7 billion. In the same report, it was predicted that the world population would continue to grow, with forecasted figures of 8.5 billion, 9.7 billion, and 10.9 billion for the years 2030, 2050, and 2100, respectively. Consequently, agricultural activity, which contributes to the population's food supply, is increasing progressively and becoming more important. Agricultural activity is regarded as the anthropic activity that depletes the highest amount of water [2]. Therefore, a good estimation of the water cycle can assist in efficient agricultural planning, water catchment, and irrigation strategy, and thus optimize the utilization of water. Evapotranspiration (ET), as suggested by the term itself, is the combination of evaporation of water from land and plant surfaces and transpiration from vegetation through the leaves' stomata [3]. ET is a natural process that affects the hydrological cycle and is believed to be highly complex, involving several nonlinear processes [4]. Numerous factors govern the rate of evapotranspiration, including temperature, solar radiation, air humidity, and wind speed [5].
ET is classified as a physical phenomenon. Hence, the rate of ET can be measured and represented by a numerical value. Traditionally, lysimeters are used to measure ET directly without any assumptions [6]. A lysimeter works by measuring the rate of water percolation through soil on the basis of mass transfer [7]. Non-weighable lysimeters are normally used for long-term observation, whereas weighable lysimeters can provide readings with greater temporal resolution [8]. It has been claimed that lysimeters provide the most accurate measurement of ET. In fact, several early studies involving ET estimation utilized lysimeter readings as their calibration standards [9,10]. Unfortunately, the construction, maintenance, and use of lysimeters involve a high financial burden and ecological footprint. The limited number of lysimeters also hinders the measurement of ET at distinct locations [3]. In view of this situation, the development of other, more convenient tools to estimate ET with higher accuracy and lower cost has become the mode of choice.
Before the age of artificial intelligence, empirical equations were continually developed to cater to the need for accurate ET estimation. However, in the absence of point measurement, direct acquisition of the ET value is virtually impossible. According to Pereira et al. [11], the term reference evapotranspiration (ET0) was introduced to overcome this problem. ET0 is an estimate of the amount of water loss or consumption based primarily on weather effects. In the coefficient-reference system, a crop coefficient (Kc) is multiplied with ET0 to obtain the potential evapotranspiration (PET) for that particular growth period.
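The coefficient-reference calculation described above reduces to a single multiplication. A minimal sketch (the Kc value shown is a hypothetical mid-season coefficient, chosen for illustration only):

```python
def potential_et(et0_mm_day: float, kc: float) -> float:
    """Scale reference ET (ET0) by a crop coefficient Kc to obtain PET (mm/day)."""
    return et0_mm_day * kc

# Hypothetical mid-season coefficient for a cereal crop
print(potential_et(5.0, 1.15))  # ≈ 5.75 mm/day
```

In practice, Kc varies with the crop and its growth stage, which is why published Kc tables accompany the coefficient-reference system.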
Over the years, numerous efforts have been pursued to obtain ET0 with higher accuracy and lower computational complexity. Among the vast number of conventional models for ET0 estimation, some of the most notable empirical approaches are discussed in this review. The Penman-Monteith (PM) model [12], which had its early beginnings in 1948, is regarded as one of the most widely employed models for the estimation of ET0. Furthermore, the Food and Agriculture Organisation (FAO) of the United Nations, in its publication "Crop evapotranspiration—Guidelines for computing crop water requirements—FAO Irrigation and Drainage Paper 56" (FAO56 in short), followed up and revised the calculation of ET0 and PET based on the PM model [13]. This indirectly made the PM model a standard for estimating ET0, and it has been used in a number of research works as a standard for comparison [14][15][16]. However, the sheer number of parameters and the complexity of their investigation, derivation, and calculation limit its use in an era of rapid knowledge advancement and digitalization.
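For reference, the daily FAO56 form of the PM equation can be sketched in a few lines. This is a simplified illustration: the psychrometric constant is fixed at its approximate sea-level value, and the function name and signature are this sketch's assumptions, not a standard API:

```python
import math

def fao56_pm_et0(t_mean, rn, u2, rh_mean, g=0.0, gamma=0.0665):
    """Daily FAO56 Penman-Monteith reference evapotranspiration (mm/day).

    t_mean  : mean air temperature (deg C)
    rn      : net radiation at the crop surface (MJ m-2 day-1)
    u2      : wind speed at 2 m height (m s-1)
    rh_mean : mean relative humidity (%)
    g       : soil heat flux (MJ m-2 day-1), ~0 at the daily time step
    gamma   : psychrometric constant (kPa/deg C), ~0.0665 near sea level
    """
    es = 0.6108 * math.exp(17.27 * t_mean / (t_mean + 237.3))  # saturation vapour pressure
    ea = es * rh_mean / 100.0                                  # actual vapour pressure
    delta = 4098.0 * es / (t_mean + 237.3) ** 2                # slope of the es curve
    num = 0.408 * delta * (rn - g) + gamma * 900.0 / (t_mean + 273.0) * u2 * (es - ea)
    den = delta + gamma * (1.0 + 0.34 * u2)
    return num / den

print(round(fao56_pm_et0(25.0, 15.0, 2.0, 60.0), 2))  # a typical warm, breezy day
```

Even this compact version requires four measured inputs, which illustrates the data intensity noted above.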
Ever since, the effort to provide simpler solutions than the PM model has continued. The Hargreaves-Samani (HS) model is another method to estimate ET0, proposed by Hargreaves and Samani [17]. The HS model has been employed on various occasions to estimate ET0. However, the constants and coefficients involved in the HS model can be site-specific. Hence, efforts have also been made to calibrate the coefficients of the HS model to suit local needs, which can be laborious. Luo et al. [18] validated the utilization of the calibrated HS model in Guilin, Kaifeng, Ganyu, and Yinchuan to predict ET0 using forecasted temperature. The results showed that, although the prediction accuracy was sufficiently high (87.54% to 96.90%), when ET0 is relatively high or relatively low, the Hargreaves model fails, as it does not consider the effects of wind speed and relative humidity. In Veneto, Italy (sub-humid climate), a study was conducted to compare the performance of the calibrated and uncalibrated HS models [19]. It was found that the standard HS model overestimated the ET0 value, which led to a tendency toward excess water requirements. Calibrating the empirical parameters of the HS model successfully reduced the overestimation from 18.9% to 2.6%, thus justifying the importance of calibration.
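The HS model itself is compact, which explains its appeal when only temperature data are available. A sketch (the 0.0023 coefficient is the original published value and, as discussed above, often needs local recalibration):

```python
def hargreaves_samani_et0(t_max, t_min, ra_mj, c=0.0023):
    """Hargreaves-Samani reference evapotranspiration (mm/day).

    t_max, t_min : daily maximum/minimum air temperature (deg C)
    ra_mj        : extraterrestrial radiation (MJ m-2 day-1)
    c            : empirical coefficient; 0.0023 is the original value and
                   is often recalibrated to local conditions
    """
    t_mean = (t_max + t_min) / 2.0
    ra_mm = 0.408 * ra_mj  # radiation expressed as evaporation-equivalent mm/day
    return c * ra_mm * (t_mean + 17.8) * (t_max - t_min) ** 0.5

print(round(hargreaves_samani_et0(30.0, 18.0, 35.0), 2))
```

Because wind speed and humidity never enter the equation, the failure mode reported by Luo et al. at extreme ET0 values follows directly from this form.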
The Priestley-Taylor (PT) model was proposed by Priestley and Taylor [20]. Like the HS model, the PT model ignores the sensitivity of ET0 to vapor pressure and air movement; it simplifies the PM model by dropping the aerodynamic component entirely. Because it neglects these aerodynamic effects, the PT model tends to underestimate ET0 when compared with the PM model [21]. This suggestion was further supported by the poor performance of the PT model in dry climate regions. Moreover, the missing aerodynamic component in the PT model also limits its spatial applicability.
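In its commonly used form, the PT model retains only the radiative term of the PM equation, scaled by the empirical coefficient α ≈ 1.26 (the slope Δ is still evaluated from air temperature). A minimal sketch under these assumptions:

```python
import math

ALPHA_PT = 1.26   # Priestley-Taylor coefficient
LAMBDA = 2.45     # latent heat of vaporisation (MJ kg-1)

def priestley_taylor_et0(t_mean, rn, g=0.0, gamma=0.0665):
    """Priestley-Taylor reference ET (mm/day): the radiation term of the PM
    equation scaled by ALPHA_PT, with no aerodynamic (wind/vapour) term."""
    es = 0.6108 * math.exp(17.27 * t_mean / (t_mean + 237.3))
    delta = 4098.0 * es / (t_mean + 237.3) ** 2
    return ALPHA_PT * (delta / (delta + gamma)) * (rn - g) / LAMBDA

print(round(priestley_taylor_et0(25.0, 15.0), 2))
```

The absence of any wind or humidity input is visible in the signature, which is precisely why the model struggles in advective, dry-climate conditions.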
Apart from the three models discussed previously, there are some other, less popular models to estimate ET0. These include the radiation-based Turc model [22] and the temperature-based Thornthwaite model [23]. Unlike the previous three models, they are seldom used as standards for comparison. As shown in the literature, conventional models of ET0 estimation generally have two main shortcomings: they are highly data intensive and strongly dependent on geographical location, i.e., not spatially robust [24,25]. Researchers have attempted to solve these problems by modifying or calibrating available models to suit their needs. In addition, researchers have also tried to produce a more powerful prediction tool that can handle highly complex and nonlinear processes like ET. This review provides an outlook on the emerging prediction tool, namely the artificial intelligence model.
The "black-box" nature of the artificial intelligence model makes it very useful for mapping the relationship between inputs and outputs, even in the absence of relevant scientific knowledge. This has led to its application in ET estimation to replace the data-intensive and relatively less adaptive empirical models. The increasing popularity of artificial intelligence applications in ET estimation can be seen in Figure 1, which was compiled from Scopus. From 2011 to 2015, the number of publications related to ET prediction using artificial intelligence models appears to be quite stagnant. However, from 2016 onwards (except for 2017), there is a steady increase in the number of publications, especially in 2018 and 2019. This shows the wide acceptance of such techniques by global researchers and affirms the usefulness of artificial intelligence modeling. That said, certain flaws exist in the basic artificial intelligence models. Fortunately, these can be overcome by using hybridization techniques to combine different models in order to produce predictions with better accuracy as well as consistency.
Nonetheless, a review article focusing on the application of artificial intelligence modeling to ET estimation is absent from the current literature. This situation has attracted the concern and focus of the authors of this review paper. It is of paramount importance to produce a compilation of related articles, together with critical review, as a reference and guideline for future researchers in this field. This shall be regarded as a valuable contribution and a starting point to facilitate the progress of relevant scientists and researchers, whether at the start of or in continuation of their efforts. For agricultural water and water resources practitioners, the outcomes of such research can support decision-making processes, including the design of irrigation schedules and the allocation and management of water resources [26]. Careful, precise, and appropriate decisions are ultimately important for the sustainability of anthropic activities, especially in the context of an agricultural economy. This can also generate more data for research in the years to follow. On top of that, published research articles are archived in scientific journals for future reference. Figure 2 shows the virtuous cycle between review papers, new research, and decision-making processes.

Therefore, this paper aims to serve as comprehensive guidance for future research from several aspects. In Section 2, different types of data and their sources are discussed thoroughly, and the advantages and disadvantages of each data type are summarized. Section 3 focuses on the basic artificial intelligence models, in terms of their characteristics and application in ET estimation. Hybridization techniques for artificial intelligence models are reviewed in Section 4, where the details of each technique are explained with respect to its principle and suitability in different situations. Finally, the future prospects of the use of artificial intelligence in ET estimation are presented in Section 5, as a prelude to the concluding remarks of this review paper.


Data Types
In order to proceed to the training of artificial intelligence models, the selection and acquisition of data are inseparable processes. Suitable data sets or parameters vary according to region as well as climate pattern. To aid future researchers during the process of data acquisition, significant parameters (though not exhaustive) for estimating ET under different climate patterns are summarized in Table 1 [27][28][29][30][31][32][33][34][35][36][37]. From Table 1, it can be seen that temperature and radiation data are indispensable for ET estimation. This is strongly in agreement with the background theory, whereby temperature (indicating heat energy) and radiation are the two main driving forces of the energy-consuming ET process [7]. In general, raw data can be obtained from two main sources, namely ground observation data and remote sensing data [38]. In essence, these two types of datasets act as complementary data to one another. To obtain ground observation data, meteorological and weather stations are set up for continuous collection. FLUXNET, which consists of micrometeorological tower sites, was set up as an initiative to cope with the large demand for ground observation data. The micrometeorological tower sites are mostly concentrated in North America, Europe, and Asia, while some tower sites are also available in South America, Oceania, and Africa. Various types of data can be obtained from the FLUXNET database, covering different locations, durations, and time scales. It was claimed that FLUXNET would continue its effort to expand in order to increase its geographical coverage [39]. The main advantage of using ground observation data is that it provides direct measurement, which does not need further imputation or processing to retrieve the intrinsic information. Explicitly, the data acquired from weather stations or flux towers are ready to be used without any pre-processing steps. However, the measurements of a weather station only represent the conditions at the tower's location and its close proximity. In other words, deducing the weather conditions for a larger region can be a challenging task [40].
Remote sensing emerges as a solution to the problem stated above. ET can be estimated from satellite observations either by subtracting the sensible heat flux from net radiation (the residual method) or by computing the surface resistance with a vegetation index-surface temperature scatterplot [41]. Successful estimation of ET from satellite images opened the door to its forecasting using artificial intelligence models. The major sources of satellite images are Landsat, the Moderate Resolution Imaging Spectroradiometer (MODIS), and the Global Land Surface Satellite (GLASS). Remote sensing allows more information to be captured from satellite images. In fact, the remote sensing method can be used to derive vegetation information as well as different types of radiation, which are useful for ET estimation. Nonetheless, estimation using remote sensing data has to be well calibrated in order to reflect accurate readings. This can be done through the integration of a land data assimilation system (LDAS). An LDAS forces land surface models with the observed fields and removes biases in forcing based on atmospheric models. In this way, unrealistic model states can be corrected. By merging measurements (from satellite and ground observations) with model estimations, imperfections in observations and errors in model prediction can be minimized [42]. Table 2 summarizes the types of data for ET estimation and their sources, along with the obtainable parameters, advantages, and disadvantages.
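The residual method mentioned above amounts to closing the surface energy balance for the latent heat flux. A minimal sketch, with the units and latent-heat constant as illustrative assumptions:

```python
LAMBDA = 2.45  # latent heat of vaporisation (MJ kg-1)

def residual_et(rn, g, h):
    """Residual of the surface energy balance: the latent heat flux is
    LE = Rn - G - H (all in MJ m-2 day-1), converted to mm/day of ET."""
    le = rn - g - h
    return le / LAMBDA

# Net radiation 15, soil heat flux 1, sensible heat flux 6 (MJ m-2 day-1)
print(round(residual_et(15.0, 1.0, 6.0), 2))  # ≈ 3.27 mm/day
```

The practical difficulty lies not in this arithmetic but in retrieving Rn, G, and especially H from satellite imagery, which is where calibration and data assimilation become essential.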
Careful selection of data is important for an excellent model training process. Identification and determination of data sources, parameters, and data points are key to developing an outstanding artificial intelligence model. Hence, prior to model selection, one should always first consider the data available. In the next section, this review will lead readers to discover some of the more common artificial intelligence models used in ET estimation.

Artificial Neural Network (ANN)
An artificial neural network (ANN), as suggested by its name, is a machine learning model that resembles the neural network of the human brain. In the latter, neurons are connected to each other via synapses. In an ANN, synapses are replaced by weighted connections and biases. This helps to map the relationship between inputs and outputs [43]. The multilayer perceptron (MLP) is one of the earliest types of ANN, introduced by Rosenblatt [44]. However, it was not until 1989 that the MLP was proven able to approximate functions after training [45]. In 1992, in tandem with advances in computing, the MLP outperformed traditional statistical methods for the first time [46].
The application of the MLP to the estimation of ET0 was initiated by Kumar et al. [47]. In the study, the authors collected the six essential parameters for estimating ET0 using the PM model in Davis, California. This set of data was dated from 1 January 1990 to 30 June 2000. At the same time, a second set of data, dated from 1 January 1960 to 31 December 1963, was obtained together with the corresponding lysimeter readings. The authors intended to compare the performance of MLPs with different architectures trained with different target data. The outcome of the study showed that, when all six parameters were fed as inputs to the MLP model, a single layer of seven hidden neurons with 5000 learning cycles was ample to represent the nonlinear process of ET. Training the model using lysimeter measurements as the target produced slightly more accurate estimations than using the PM model. This not only justified the ability of the MLP to map inputs to outputs in the absence of a clear relationship, but also established that PM model estimates are sufficiently good to be used as targets for model training.
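A single-hidden-layer MLP of the kind used by Kumar et al. [47] can be sketched in plain NumPy. The data below are synthetic stand-ins for climatic inputs and targets, not the Davis, California records, and the training loop is ordinary batch gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a climatic dataset: columns are (temperature, radiation);
# the target mimics a smooth nonlinear ET0 response, not a real PM computation
X = rng.uniform([10.0, 5.0], [35.0, 30.0], size=(200, 2))
y = 0.1 * X[:, 0] + 0.15 * X[:, 1] + 0.002 * X[:, 0] * X[:, 1]

Xn = (X - X.mean(0)) / X.std(0)  # normalise inputs before training

# One hidden layer of 7 tanh neurons, as found adequate by Kumar et al.
W1 = rng.normal(0.0, 0.5, (2, 7)); b1 = np.zeros(7)
W2 = rng.normal(0.0, 0.5, (7, 1)); b2 = np.zeros(1)
lr, n = 0.01, len(y)
for _ in range(5000):  # 5000 learning cycles, mirroring the study
    h = np.tanh(Xn @ W1 + b1)
    err = (h @ W2 + b2).ravel() - y
    # Backpropagate the mean-squared error
    gW2 = h.T @ err[:, None] / n
    gb2 = err.mean(keepdims=True)
    dh = (err[:, None] @ W2.T) * (1.0 - h ** 2)
    gW1 = Xn.T @ dh / n
    gb1 = dh.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

pred = (np.tanh(Xn @ W1 + b1) @ W2 + b2).ravel()
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
print(round(rmse, 3))
```

In real studies, the targets in `y` would be PM model estimates or lysimeter readings, and the fitted network would then be evaluated on held-out data.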
The success achieved by Kumar et al. [47] attracted the attention of researchers to further study the capability of the MLP in estimating ET0. Attempts to reduce the number of required parameters were continuously made. Rahimikhoob [48] trained the MLP with only temperature and radiation data at eight different stations on the southern coast of the Caspian Sea in northern Iran (humid subtropical climate). Using only maximum temperature, minimum temperature, and global radiation, the trained models were compared with the HS model and the calibrated HS model. The author wished to compare the MLP and the empirical models under limited climatic data (both were temperature and radiation based). The study proved that, even where the climatic dataset was incomplete, the prediction of the MLP model was still promising, as the lack of flexibility in the empirical model led to a tendency to either underestimate or overestimate. Antonopoulos and Antonopoulos [49] conducted their study in a mountainous area of West Macedonia, Greece. The authors removed variables one by one from the MLP model in order to investigate its ability to estimate ET0 with limited climatic parameters. The performance of the MLP was contrasted with that of the PT model, the Makkink model, and the HS model. The study showed that, even in the case of two parameters (temperature and radiation), the MLP still outperformed the Makkink and HS models while having comparable performance with the PT model. Other studies reported in the literature also showed that the MLP could give better estimations than equivalent conventional empirical models in the case of limited climatic parameters, across the four main climate classes: semi-arid [50,51], arid [31,52,53], humid, and semi-humid regions [54].
The introduction of the MLP also encouraged the establishment of other forms of the ANN model. Some examples are the radial basis function network (RBF) [55], the generalized regression neural network (GRNN) [56], the back-propagation neural network (BPNN) [45], and the extreme learning machine (ELM) [57]. These algorithms achieved promising performance in ET0 estimation. The characteristics of each ANN model are provided in Table 3. The RBF was first used to convert pan evaporation data into ET0 [58]. This study proved that the major obstacle of empirical estimation, namely its data dependency, can be resolved. The RBF network used in the study required only pan evaporation and radiation data; however, it was able to achieve higher accuracy than both the Christen model and the PM model.

Artificial Neural Network Models Characteristics
Ladlani et al. [36] conducted a comparative study on the performance of the RBF and the GRNN in predicting ET0 in Algiers, Algeria. Concurrently, the two ANN models were contrasted with empirical models (the PT and HS models). In comparison, the GRNN model had the best performance in terms of low error and high correlation. The GRNN, which evolved from the RBF, was shown for the first time to be superior to the RBF. This could be due to the inclusion of a summation layer in the GRNN, which could enhance the estimation of the RBF. The performance of the GRNN in computing ET0 is constantly compared with other machine learning models [59,60]. From the results in the literature, the GRNN did not possess any prominent advantage over the ELM, but was deemed a good alternative to conventional models.
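A GRNN prediction is, in essence, a Gaussian-kernel-weighted average of the training targets, with the summation layer forming the numerator and denominator of that average. A sketch on synthetic data (the bandwidth `sigma` is an illustrative choice, not a value from the studies cited):

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma=0.1):
    """GRNN prediction: the pattern layer holds one node per training sample,
    and the summation layer forms a Gaussian-kernel-weighted average."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    return (K @ y_train) / K.sum(axis=1)

rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, (200, 2))   # e.g. scaled temperature and radiation
y = np.sin(3.0 * X[:, 0]) + X[:, 1]   # smooth synthetic target
Xq = rng.uniform(0.0, 1.0, (50, 2))
pred = grnn_predict(X, y, Xq)
rmse = float(np.sqrt(np.mean((pred - (np.sin(3.0 * Xq[:, 0]) + Xq[:, 1])) ** 2)))
print(round(rmse, 3))
```

Because the GRNN stores every training sample, only the single smoothing parameter needs tuning, which is part of its appeal against iteratively trained networks.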
Traore et al. [61] took a different approach to applying a machine learning model to ET0 estimation, where the BPNN model was used. In their study, meteorological data were collected from the Sudano-Sahelian zone. The HS model, which is a temperature-based empirical model, was compared with a BPNN model trained with only temperature data. The results showed that the artificial intelligence model outperformed the conventional HS model. On top of that, the authors also revealed that the inclusion of wind speed data could enhance accuracy more effectively than radiation or relative humidity data. Other research works related to the BPNN include a comparison of the BPNN with gene programming in an arid region [62], a comparison with tree models [63], and the reduction of input parameters [48].
The latest development of artificial intelligence models resulted in the introduction of the ELM as a variant of the ANN. In 2015, this variation of the ANN was first used to estimate ET0 in Iraq, where the authors claimed that the region represented general atmospheric and geographical conditions [30]. Similar to most of the available literature, the authors trained their ELM model using the PM model estimation as the target. Several combinations of input parameters, consisting of temperature, wind speed, relative humidity, and radiation, were tested to identify the most favorable combination. Although the ELM and the BPNN showed comparable results, the authors opined that the ELM was preferable due to its efficient computation and great generalization ability. The fast iteration of the ELM is due to the fact that only the number of hidden layer nodes has to be tuned, which in turn reduces the risk of overfitting. Their work was followed up by Gocic et al. [64], who trained the ELM using empirical models with fewer input parameters. In their study, it was found that the ELM trained with the HS model was superior to those trained with the PT model and the Turc model. That being said, the differences between the individual ELM models were marginal. In comparison with the PM-model-estimated ET0, the ELM predictions had good correlation, which justified that the ELM was feasible for such a purpose. Subsequently, another study was carried out to reduce the required inputs for ELM training [65].
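The computational efficiency noted above comes from the ELM training procedure: hidden-layer weights are drawn at random and only the output weights are solved, in closed form, by least squares. A sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for scaled climatic inputs and an ET0-like target
X = rng.uniform(0.0, 1.0, size=(300, 4))  # e.g. temperature, wind, humidity, radiation
y = np.sin(2.0 * X[:, 0]) + X[:, 1] * X[:, 3] + 0.5 * X[:, 2]

def elm_fit(X, y, n_hidden=50):
    """ELM training: random hidden layer, closed-form least-squares output layer.
    Only n_hidden needs tuning; no iterative backpropagation is involved."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                        # random nonlinear feature map
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # output weights in one solve
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

W, b, beta = elm_fit(X, y)
rmse = float(np.sqrt(np.mean((elm_predict(X, W, b, beta) - y) ** 2)))
print(round(rmse, 4))
```

The single linear solve replaces the thousands of gradient-descent cycles a BPNN would need, which is the basis of the efficiency claim in the studies above.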
Besides reducing the number of required parameters, researchers have also worked towards identifying the optimum training algorithm of the ANN for ET0 prediction [28]. The study compared six learning algorithms for the MLP, namely Levenberg-Marquardt, Delta-Bar-Delta, Step, Momentum, ConjugateGradient, and QuickProp. For each algorithm, different combinations of input parameters were tested. At the same time, different activation functions, such as the hyperbolic tangent, sigmoid, and linear functions, were evaluated. The investigation revealed that, irrespective of the input parameters, the Levenberg-Marquardt learning algorithm coupled with the hyperbolic tangent function was the optimum setting for ET0 estimation using the MLP. The major distinction of the Levenberg-Marquardt algorithm is that it incorporates the Gauss-Newton algorithm in its iterative process, which aids the search for a global minimum, unlike the other algorithms, which carry a higher risk of converging to local minima.
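The damped Gauss-Newton iteration at the heart of the Levenberg-Marquardt algorithm can be sketched on a toy curve-fitting problem with a hyperbolic tangent model (the simple halving/doubling damping schedule below is a textbook variant, not the exact scheme used in the study):

```python
import numpy as np

# Toy regression problem: fit a single scaled tanh unit to noiseless data
rng = np.random.default_rng(2)
x = np.linspace(-2.0, 2.0, 60)
true_p = np.array([1.5, 0.8, -0.3])
y = true_p[0] * np.tanh(true_p[1] * x + true_p[2])

def residuals(p):
    return p[0] * np.tanh(p[1] * x + p[2]) - y

def jacobian(p, eps=1e-6):
    """Forward-difference Jacobian of the residual vector."""
    r0 = residuals(p)
    J = np.empty((x.size, p.size))
    for j in range(p.size):
        q = p.copy()
        q[j] += eps
        J[:, j] = (residuals(q) - r0) / eps
    return J

def levenberg_marquardt(p, n_iter=50, mu=1e-2):
    """Gauss-Newton step damped by mu: shrink mu on success, grow it on failure."""
    for _ in range(n_iter):
        J, r = jacobian(p), residuals(p)
        step = np.linalg.solve(J.T @ J + mu * np.eye(p.size), -J.T @ r)
        if np.sum(residuals(p + step) ** 2) < np.sum(r ** 2):
            p, mu = p + step, mu * 0.5   # accept the step, trust the model more
        else:
            mu *= 2.0                    # reject the step, increase damping
    return p

p_fit = levenberg_marquardt(np.array([1.0, 1.0, 0.0]))
print(float(np.sum(residuals(p_fit) ** 2)))  # sum of squared residuals, near zero
```

The damping term interpolates between gradient descent (large mu) and Gauss-Newton (small mu), which is what gives the algorithm its robust yet fast convergence in MLP training.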
Recently, investigations related to the ANN prediction of ET have focused on specific case studies. Instead of estimating ET 0 , researchers began to utilize the MLP to predict PET directly. Since PET relies heavily on the type of crop (different K c ), such studies are very specific in terms of regions and plantations. For instance, Hashemi and Sepaskhah [66] obtained lysimeter readings from Kooshak Agricultural Research Station in southwestern Iran. They compared the performance of the MLP with the PM model and the radial basis function (RBF) model in estimating PET at a barley plantation. By only feeding sunshine hours, mean humidity, mean temperature, and wind speed as input, both the MLP and the RBF achieved better performance than the PM model. This breakthrough reduced the need to collect data for K c computation, which could be tedious. Similar work was also carried out on wheat and maize plantations, which demonstrated the advantages of the MLP over conventional methods [67]. However, such studies required lysimeter readings and perhaps the leaf area index for training purposes. These data are not widely or easily available, and thus, suffice to say, this would affect the premise of utilizing the ANN to forecast PET directly for plantations.
An assessment of the various papers that have been reviewed in this subsection reveals the following:

1. The evolution of the ANN from the MLP to the ELM was due to the constant need to improve training methods and algorithms in order to obtain effective predictions with greater accuracy, better generalization, and less dependency on input parameters.

2. Within each and every variation of the ANN model, one could safely deduce that the trend and focus of study would not deviate much from the following four aspects: (a) minimization of the required input parameters, (b) generalization of the ANN for wider spatial application, (c) introduction of new input parameters, and (d) enhancement of the ANN prediction ability. It is believed that these four aspects could revolutionise the prediction of ET 0 with a more general model, without the need for much climatic data.

3. A longer forecasting horizon could provide a good prerequisite for an effective water allocation strategy. The use of the ANN alone could sometimes be insufficient to provide a solution for the above aspects.

4. The black-box operation of the ANN could not offer an explanation of the complex ET process.
Therefore, the upcoming subsections will continue to review other artificial intelligence models used to estimate ET 0 in order to provide a complementary solution to the shortcomings of the ANN.

Support Vector Machine (SVM)
The support vector machine (SVM) is another popular algorithm used in machine learning modeling, and it is claimed to be powerful and robust in regression and classification tasks [68]. Cortes and Vapnik [69] laid down the foundation of the current SVM model. Instead of involving a large number of neurons and iterations to infer the relationship between inputs and outputs, the SVM plots the datasets into a feature space. The relationship between inputs and outputs is predicted using a kernel function, whereby problem complexity and accuracy can be optimized concurrently.
Since ET 0 prediction is more likely a regression problem than a classification problem, a variation of the SVM, namely support vector regression (SVR), is normally used. In the SVR, a loss function is used to define the deviation allowance as well as the function that approximates the targeted output [70]. The working principle of the SVM is shown in Figure 3.
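The "deviation allowance" mentioned above is usually the epsilon-insensitive loss: deviations inside a tube of half-width epsilon cost nothing, while larger deviations are penalized linearly. A minimal sketch (with invented error values) follows:

```python
import numpy as np

# Epsilon-insensitive loss used by support vector regression: errors inside
# the tube of half-width eps are free; outside, the cost grows linearly.
def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    return np.maximum(np.abs(y_true - y_pred) - eps, 0.0)

# Illustrative prediction errors around a zero target.
errors = np.array([-0.25, -0.08, 0.0, 0.05, 0.3])
loss = eps_insensitive_loss(np.zeros(5), errors, eps=0.1)
```

The flat region of this loss is what gives the SVR its sparse set of support vectors: only points outside the tube influence the fitted function.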
According to Raghavendra and Deka [70], the SVM has been widely used in hydrology applications, including the estimation of ET 0 . The advantages and strengths of the SVM include high robustness, the capability to solve complex problems, and a lower susceptibility to overfitting, and it can provide a compact description of the model [71]. The network structure of the SVM is illustrated in Figure 4.

The utilization of the SVM in predicting ET 0 with ground observation data started as early as 2010 [72]. The case study was done in California, which was said to be representative of the Köppen-Geiger climate system. The predictions of the SVM were compared with the CIMIS Penman, HS, Ritchie, and Turc models. The authors discovered that when all climatic parameters were available, the SVM outperformed all other models in all the stations studied. When wind speed and relative humidity were removed during the training of the SVM, the model underestimated ET 0 but still had satisfying outcomes, performing only slightly worse than the conventional HS model. In fact, the authors also claimed that the SVM had better performance than the ANN, where the former incurred lower error and higher correlation with the standard PM model.
As stated earlier, the performance of the SVM relies greatly on the type of kernel function chosen to transform the datasets before plotting them into the feature space. The selection of kernel functions can be done using a trial-and-error method. Mehdizadeh et al. [73] carried out a comprehensive study to compare the performance of the SVM using the RBF and polynomial kernel functions. The study revealed that the RBF kernel function could obtain more accurate results but did not provide further explanation. A simple deduction that can be made is that the RBF function, which represents a Gaussian distribution, fitted well to the ET problem and datasets of that particular study. This suggestion is supported by the results of Mohammadrezapour et al. [74], who showed that the selection of kernel functions to estimate ET 0 varied from case to case. In other words, there is no universal kernel function that suits all problems. Researchers who wish to estimate ET using the SVM should be prepared to include a tuning stage in order to identify suitable kernel functions.
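For reference, the two kernels compared in the studies above can be written out explicitly; the hyperparameters (gamma, degree, coef0) and the sample points below are illustrative choices that a tuning stage would have to search over:

```python
import numpy as np

# RBF (Gaussian) kernel: similarity decays with squared distance.
def rbf_kernel(a, b, gamma=1.0):
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dist)

# Polynomial kernel: similarity based on (shifted) inner products.
def poly_kernel(a, b, degree=3, coef0=1.0):
    return (a @ b.T + coef0) ** degree

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K_rbf = rbf_kernel(X, X)    # Gram matrix under the RBF kernel
K_poly = poly_kernel(X, X)  # Gram matrix under the polynomial kernel
```

The RBF Gram matrix always has ones on its diagonal (each point is maximally similar to itself), whereas the polynomial kernel's diagonal grows with the norm of each point, which is one reason the two kernels can behave very differently on the same dataset.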
Continuous efforts were made to study the limits of the SVM as well as to compare the SVM with other artificial intelligence models. While the pioneers who applied the SVM to estimate ET 0 reported that the SVM performed better than the ANN, some other literature opined otherwise [75,76]. This is due to the nature of the SVM, where a global optimum has to be located instead of converging to local optima as in the ANN. This makes the SVM a very generalizable model, but it may incur higher residuals. More often, the strengths of the SVM become apparent in the case of limited climatic parameters. In a work done by Fan et al. [77], when only temperature and radiation data were available, the performance of the SVM was on par with the ELM. In fact, in terms of accuracy and correlation, the SVM achieved better scores than most of the hybrid models, such as the extreme gradient boosting model, random forest, and gradient boosted decision tree. Further discussion of hybrid models is given in a later part of this review.
Similar to the ANN, the SVM has been used to predict PET directly, as this could reduce the extra effort of measuring or estimating K c . This attempt was made by Shrestha and Shukla [78], who trained their SVM models against lysimeter readings for pepper and watermelon crops. Instead of using conventional climatic parameters, the authors opted for some interesting features, which included days after transplant, irrigation frequency, water table depth, soil moisture, rainfall amount and rainfall events, as well as drainage and runoff frequency. According to the authors, the trained SVM model should be able to predict K c and ET 0 , thereby making the computation of PET possible. It was observed that the SVM model was robust and well generalized, as it could be successfully applied to both vine and erect plantations and worked well in distinct seasons (spring and fall) as well as different irrigation systems (drip and sub-irrigation). The SVM not only produced closer estimations to lysimeter readings compared to the standard estimation procedure [13]; it also beat the ANN and the relevance vector machine (RVM). The performance of the SVM was stable and consistent at each growth stage of the plantations. As a side note, the authors suggested that their SVM models identified that the evaporation and transpiration partition of the plantations' PET could be represented by days after transplant, water table depth, rainfall events, and soil surface moisture.
From the reviewed literature, it can be inferred that the SVM has the potential to be reliable for the accurate estimation of both ET 0 and PET. However, the literature also revealed that the performance of the SVM could be strongly affected by the selection of kernel functions and the quality of input data [70]. This could also be justified by the contradicting findings of researchers comparing the SVM and the ANN. Computational cost is another concern of SVM application, particularly when high dimensionality is involved.

Fuzzy Models
Introduced by Zadeh [79], fuzzy logic allows the description of data in such a way that a "degree of likeliness" can be given. In other words, by using fuzzy logic, instead of describing data as "either A or B", one can produce a membership degree between 0 and 1 so that the description reads "partly A and partly B". The application of fuzzy logic requires an initial setup by experts to determine the type of distribution by selecting a membership function (usually a Gaussian function is chosen). In addition, three major ingredients should be fed to the fuzzy inference system (FIS), namely a set of fuzzy rules, a database containing the membership functions, and a mechanism (either Sugeno or Mamdani) to apply the fuzzy rules to the input and output [80]. The main difference between the Sugeno and Mamdani fuzzy logic is the approach used to compute the final output. The overall flow of an FIS is illustrated in Figure 5.
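As an illustration of these ingredients, a minimal zero-order Sugeno FIS is sketched below: Gaussian membership functions assign each rule a firing strength, and the output is the firing-strength-weighted average of the rules' constants. The two rules, their constants, and the membership parameters are invented purely for illustration and are not drawn from any reviewed study.

```python
import numpy as np

# Gaussian membership function: degree of membership of x in a fuzzy set.
def gauss_mf(x, center, sigma):
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

def sugeno_et(temp_c):
    # Rule 1: IF temperature is "cool" THEN ET0 contribution = 1.0 mm/day
    # Rule 2: IF temperature is "hot"  THEN ET0 contribution = 6.0 mm/day
    w_cool = gauss_mf(temp_c, center=10.0, sigma=8.0)
    w_hot = gauss_mf(temp_c, center=35.0, sigma=8.0)
    # Sugeno output: weighted average of rule consequents.
    return (w_cool * 1.0 + w_hot * 6.0) / (w_cool + w_hot)

low, high = sugeno_et(10.0), sugeno_et(35.0)
```

A Mamdani system would instead aggregate fuzzy output sets and defuzzify them; the Sugeno form shown here computes a crisp weighted average directly, which is what makes it convenient to train inside an ANFIS.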

The history of applying fuzzy logic to estimate ET 0 began in 2009. Keskin et al. [81] forecasted the pan evaporation of Lake Eğirdir in Turkey using ground observation climatic data. A comparable study was done in the Karso watershed of India [82]. The authors did not only study the feasibility of fuzzy logic in predicting pan evaporation; the performance of fuzzy logic compared to the ANN, the least-squares SVR, and the adaptive neuro-fuzzy inference system (ANFIS) was also evaluated. The authors remarked that the fuzzy logic model emerged as one of the best models for pan evaporation estimation. This study stressed the importance of fuzzy rules in producing good estimations. The successful application of fuzzy logic requires good membership functions as its foundation. The tuning of membership functions not only requires expert knowledge but is also time-consuming, especially for a complex phenomenon like ET that can be affected by a number of parameters. Hence, the ANN acts as a complement to fuzzy logic to form the ANFIS [83]. The application of the ANFIS for ET 0 estimation was first done by Kisi and Öztürk [84], and there are a number of recent works showing promising results [85][86][87].
Since the ANFIS is a product of an enhancement based on the ANN, its performance is frequently compared with that of the ANN in terms of ET 0 estimation. Pour-Ali Baba et al. [85] conducted their experiment in Gwangju and Haenam in South Korea. They realized that the performance of the ANFIS and the ANN could vary when the input datasets were different. The ANFIS produced better estimations when solar radiation was fed as input, whereas the ANN had better performance when sunshine hours were used. Similarly, the performance of the ANFIS and the ANN could be affected by geographical location [88]. However, some literature claimed that the ANN had slightly better performance than the ANFIS, which could be due to the ANN's flexibility (not being bound by any rules) [89,90].
One interesting study on the ANFIS model is the comparison between two methods of setting up fuzzy rules, namely the grid partitioning method and the subtractive clustering method [91]. The former divides the input space in a grid-like manner, and each region is fuzzy. For subtractive clustering, rules are set up based on the number of clusters found in the input space. It was claimed that subtractive clustering had a computational advantage over grid partitioning. The investigation done by Cobaner [91] showed that both approaches had similar performances. However, the ANFIS model using the subtractive clustering method could be affected by the quality of the training data, especially when data are missing [92].
The review of publications in this subsection revealed that, unlike the nonlinear learning in the ANN and the kernel tricks applied in the SVM, fuzzy logic provides another way for a machine to learn the rather complex phenomenon of evapotranspiration. The main advantage of the fuzzy logic-based models over the ANN and the SVM is that they allow for a more linguistic way of describing the data. In other words, based on the fuzzy rules and membership functions, one can more or less deduce the relationship that maps the inputs to the outputs.

Tree Based Models
Breiman [93] was the first to organize decision trees into two main categories, namely the classification tree and the regression tree. However, it was Quinlan [94] who provided a better understanding of the operation of tree models. In Quinlan's work, it was stated that the decision tree would continue to split and grow as long as the data within the nodes of the tree were still considered impure. In the case of ET, using a tree model for regression analysis is favored over classification. Within this context, Pal and Deswal [95] introduced a widely accepted splitting criterion for the M5 tree model. They claimed that, in order to produce better splits with the highest computational efficiency, the data within any node should be split in such a way that the standard deviation reduction is maximized. It was observed that the M5 tree model could produce results with high correlation to the ET 0 value, although the errors of the estimations gradually increased when the input climatic parameters were reduced.
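The standard deviation reduction (SDR) criterion described above can be sketched in a few lines: a candidate split is scored by how much the size-weighted standard deviations of the two child nodes fall below that of the parent node. The toy data below are invented to show that a split between two clusters scores far higher than a split inside one.

```python
import numpy as np

# SDR for one candidate split, in the spirit of the M5 criterion:
# SDR = sd(node) - sum over children of (|child| / |node|) * sd(child).
def sdr(y, mask):
    left, right = y[mask], y[~mask]
    return np.std(y) \
        - (len(left) / len(y)) * np.std(left) \
        - (len(right) / len(y)) * np.std(right)

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([1.0, 1.1, 0.9, 5.0, 5.2, 4.8])

good_split = sdr(y, x < 5.0)  # separates the two target clusters
bad_split = sdr(y, x < 2.5)   # cuts inside a cluster
```

An M5-style builder evaluates such scores over all features and thresholds at each node and keeps the split with the largest SDR.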
Several published research works followed up the study of Pal and Deswal [95]. Rahimikhoob et al. [96] attempted to convert pan evaporation data into ET 0 while using other climatic parameters as complementary data. Subsequently, the performance of the M5 tree model in predicting ET 0 was compared with the ANN [52]. The study was done in Iran, where wind speed and radiation data were found to be absent. The author concluded that, under such circumstances, the M5 tree model could achieve similar performance to the ANN. It was also suggested that the M5 tree model should be favored over the ANN due to its computational simplicity.
Elsewhere, Kisi and Kilic [97] also studied the difference in the prediction performance of the M5 tree model and the ANN. In their concluding remarks, the authors revealed that both the M5 tree model and the ANN could produce outstanding ET 0 estimations when trained and tested locally. This was, however, not true when the machine learning models were trained and tested at different stations. The M5 tree model had the worst performance, especially where fewer climatic parameters were available. In fact, the performance of the M5 tree model was worse than that of the empirical models. However, another study disagreed with these results, claiming that the M5 tree model had better forecasting accuracy when trained locally as well as when using external data [98]. In other words, the M5 tree model could be very dependent on the quality of the training data to determine its spatial robustness and generalisability.
According to the papers reviewed regarding tree-based modeling, it is clear that tree-based models exhibit the advantage of simple and fast computation. In spite of that, the pitfall of such models is also obvious. As the tree in the model has to grow until there are no other possible splits (the data are deemed to be pure by then), there is a risk of overgrowing the tree. In such circumstances, overfitting could occur, which is undesirable for regression analysis. To overcome this problem, a strategy known as pruning is needed to remove unnecessary parts of the tree and replace them with linear functions. Moreover, the sequence of the tree's splitting could end up with different results even with the same set of training data. To compensate for the effect of randomness, trees are sometimes bundled to form a random forest. This will be discussed in detail in later parts of this review.
Basic artificial intelligence models have their own advantages as well as disadvantages. The ANN can be efficient in fitting nonlinear relationships, but it is less explanatory and prone to overfitting. The SVM has good generalizability at the expense of costly computation, especially for high-dimensionality problems. Fuzzy logic provides interpretable rules but requires an initial setup with expert knowledge. Tree models are computationally efficient but may incur high errors. Hence, using basic artificial intelligence models alone is insufficient to accommodate the increasing expectations of their performance. In the next section, different hybridization techniques of artificial intelligence models are explored as an effective solution to overcome the problems encountered above.

Hybrid Models
Hybrid modeling, which combines two or more models, can improve model performance by merging the individual strengths of its members [99,100]. As demonstrated in the research works mentioned previously, researchers are ambitious for artificial intelligence models that can work in harsher conditions, for example, in environments with limited climatic parameters, a wide region of interest, or a longer prediction horizon. Therefore, this section of the review will focus on uncovering some of the more commonly used techniques for developing hybrid artificial intelligence models.

Averaging
The idea of ensemble modeling was suggested in 2005, when it was used in weather forecasting to overlay the predictions of multiple models [101]. The simplest possible ensemble model is produced by plainly averaging the outputs of the members of the ensemble. Simple averaging obtains the mean of the models' outputs. In such a way, all involved models are treated as though they have equal performance. In order to correct this crude assumption of simple averaging, some studies preferred to use weighted averaging. The weight values assigned to the models are ranked based on a certain performance measure. For example, Nourani et al. [33] proposed using the coefficient of determination as a ranking reference. However, these two methods were not comprehensive enough to provide accurate insights into the individual models in an ensemble.
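The contrast between simple and weighted averaging can be sketched as follows, with weights derived from each member's coefficient of determination in the spirit of Nourani et al. [33]; the observations and member predictions are synthetic illustrations, not data from that study.

```python
import numpy as np

# Coefficient of determination (R^2) against the reference series.
def r2(obs, pred):
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

obs = np.array([2.0, 3.5, 5.0, 4.0, 3.0])
preds = {
    "model_a": np.array([2.1, 3.4, 4.9, 4.2, 3.1]),  # accurate member
    "model_b": np.array([2.8, 3.0, 4.0, 4.5, 3.6]),  # weaker member
}

# Weighted averaging: normalize the (non-negative) R^2 scores into weights.
scores = {name: max(r2(obs, p), 0.0) for name, p in preds.items()}
total = sum(scores.values())
weights = {name: s / total for name, s in scores.items()}

simple_avg = np.mean(list(preds.values()), axis=0)              # equal weights
weighted_avg = sum(weights[n] * p for n, p in preds.items())    # skill-based weights
```

Because the accurate member receives the larger weight, the weighted ensemble tracks the observations more closely than the simple mean does.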
Taylor [102] proposed an alternative measure known as the simple Taylor skill. For each individual model, a Taylor skill score is assigned as the weight value. The Taylor skill score is deemed more comprehensive, as it takes the correlation coefficient and the relative standard deviation into consideration. This approach was used by Yao et al. [103], where it was proven that the ensemble model produced from the simple Taylor skill fusion could produce spatial estimations comparable to the remote sensing technique. Nonetheless, the authors raised the concern that the simple Taylor skill fusion lacks the ability to describe the ET phenomenon physically. This has led to the rather low popularity of this method among researchers worldwide.
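One common form of the Taylor skill score, following Taylor [102], is sketched below; the assumption of a maximum attainable correlation R0 = 1 and the example series are illustrative choices, and variants of the formula exist.

```python
import numpy as np

# Taylor skill score combining the correlation coefficient R and the ratio of
# standard deviations sigma_hat = std(pred) / std(obs). A perfect model
# (R = 1, sigma_hat = 1) scores 1; the score decays as either statistic degrades.
def taylor_skill(obs, pred, r0=1.0):
    r = np.corrcoef(obs, pred)[0, 1]
    sigma_hat = np.std(pred) / np.std(obs)
    return 4.0 * (1.0 + r) / ((sigma_hat + 1.0 / sigma_hat) ** 2 * (1.0 + r0))

obs = np.array([2.0, 3.5, 5.0, 4.0, 3.0])
perfect = taylor_skill(obs, obs.copy())
noisy = taylor_skill(obs, obs + np.array([0.5, -0.6, 0.4, -0.5, 0.3]))
```

Because the score penalizes both a low correlation and a variance ratio away from one, it captures more of a member's behaviour than R^2 alone, which is why it was preferred as an ensemble weight.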

One of the most common techniques to hybridize artificial intelligence models is the data fusion technique (ensemble modeling).
There are various strategies that can lead to the desired output. The first method to be reviewed is the bootstrap aggregating (bagging) method. Generally, bootstrap aggregating involves two main parts: resampling and aggregation. Bootstrap aggregating is especially useful when one has a smaller sample size. During the resampling stage, the collected samples are treated as an "apparent population". Bags of "samples" are produced from the "apparent population" by resampling with replacement. Each bag of "samples" has a size equivalent to that of its "apparent population" [104]. The application of bootstrap aggregating in estimating ET 0 is common. Kim et al. [105] applied bootstrap aggregating to the GRNN to study the performance of soft computing in forecasting ET 0 . The study showed that using bootstrapping alone to solely extend the size of the training data was insufficient to produce a significant improvement in the GRNN models. Instead, the authors suggested training multiple models in order to obtain their aggregated output. It was opined that the latter could effectively reduce the generalization error. This study pioneered the use of bootstrap aggregating for the improvement of artificial intelligence models when calculating ET 0 .
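The two stages, resampling and aggregation, can be sketched as follows. For brevity, the base learner here is a simple linear fit on synthetic data; any base model (GRNN, regression tree, and so on) could take its place.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 40)
y = 2.0 * x + 1.0 + 0.2 * rng.standard_normal(40)  # noisy linear "population"

# Stage 1: resampling with replacement into bags of the same size as the
# "apparent population"; one weak model is fitted per bag.
n_bags = 25
fits = []
for _ in range(n_bags):
    idx = rng.integers(0, len(x), size=len(x))
    fits.append(np.polyfit(x[idx], y[idx], deg=1))

# Stage 2: aggregation, averaging the member predictions at new points.
x_new = np.array([0.25, 0.75])
bagged_pred = np.mean([np.polyval(f, x_new) for f in fits], axis=0)
```

Averaging across bags damps the variance contributed by any single resampled fit, which is the mechanism behind the generalization-error reduction reported by Kim et al. [105].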
The success of Kim et al. [105] attracted the attention of global researchers to conduct similar studies. Besides the GRNN, bootstrap aggregating can be applied to other machine learning models, such as tree models. In fact, performing the tree model analysis using bootstrapped samples leads to the formation of the random forest that was mentioned in passing in the previous section. Feng et al. [59] reported that the random forest model could perform better than the GRNN. In the study of Granata [5], the author compared the results of bagged random forests with individual regression tree models. However, this study reported another finding, which claimed that bagging did not significantly improve the performance of a single regression tree. Although the author did not provide an explanation for this discovery, it is strongly believed that the contradictions between the works of Kim et al. [105], Feng et al. [59], and Granata [5] originated from differences in the datasets. The former two opted to use monthly and annual data, respectively, whereas the latter used daily time step data. Bootstrap aggregating clearly provides positive effects when the sample size is smaller.
The unique characteristic of bootstrap aggregating is that it does not only perform data pre-processing on the raw datasets; at the same time, it offers an algorithm to aggregate and average out the outputs of the individual models. This is especially useful as an approach to enlarge the limited collected data while offsetting the bias and variance arising from the randomness of model training.

Bayesian Modeling Approaches
Apart from averaging and bootstrap aggregating, another very useful technique for creating an ensemble model is the Bayesian modeling approach, which utilizes the Bayes rules in statistical studies. There are two main strategies when applying Bayesian modeling approaches to hydrological processes, namely Bayesian model selection and Bayesian model averaging [106]. Although both approaches originate from the same fundamentals, their intuitions show remarkable differences. In Bayesian model averaging (the "team-of-rivals" approach), the underlying assumption is that each member model holds part of the truth; however, the degree of correctness is strongly dependent on the uncertainties incurred.
Bayesian model averaging works on the basis that it considers truthfulness of the members in an ensemble as their weights.Explicitly, this is realized through the computation of the posterior probability of each model [107].In this way, the ensemble model would not take excessive risk to exclude models that could be true as well.In the case when data and observations are massive enough to confidently deduce a conclusion, or when there is a particular model that considered virtually able to be true, this model will be promoted to Bayesian model selection ("winner-takes-all" approach).
In other words, Bayesian modeling approaches will keep updating weight of models by imputing their posterior probabilities, until an exceptional model emerges as a "winner" to end the search.
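The weight-updating idea can be illustrated with a small sketch. Assuming a Gaussian likelihood for the residuals of each candidate model and a uniform prior (the model names and numbers below are hypothetical), the posterior weights follow directly from Bayes' rule:

```python
import numpy as np

# Hypothetical held-out observations and three candidate models' predictions.
obs = np.array([3.1, 4.0, 2.7, 3.6])
preds = {
    "PM-like": np.array([3.0, 4.1, 2.8, 3.5]),
    "PT-like": np.array([3.4, 4.4, 2.2, 3.9]),
    "HS-like": np.array([2.5, 3.2, 3.5, 4.3]),
}

def posterior_weights(obs, preds, sigma=0.3):
    """Posterior weight of each model: prior x Gaussian likelihood of residuals,
    normalized over the ensemble (uniform prior assumed)."""
    logliks = {}
    for name, p in preds.items():
        resid = obs - p
        logliks[name] = -0.5 * np.sum((resid / sigma) ** 2)  # up to a constant
    m = max(logliks.values())
    unnorm = {k: np.exp(v - m) for k, v in logliks.items()}  # stable exponentiation
    z = sum(unnorm.values())
    return {k: v / z for k, v in unnorm.items()}

w = posterior_weights(obs, preds)
# Bayesian model averaging: weighted combination of all members.
bma = sum(w[k] * preds[k] for k in preds)
# Bayesian model selection: "winner-takes-all".
winner = max(w, key=w.get)
```

If one model's posterior weight approaches one, averaging and selection coincide, which is the "winner" scenario described above.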
Bayesian modeling approaches have been widely used in ET0 estimation research. Zhu et al. [108] studied the posterior distributions of factors affecting ET0 for different periods, which varied in terms of leaf area index. In a later study, Zhu et al. [109] produced an ET0 estimation ensemble that included the PM model, the advection-aridity model, the Shuttleworth-Wallace model, and the modified PT model. The outcome of the study showed that, compared to simple averaging, Bayesian model averaging had a more positive influence during the development of the ensemble model. The authors are of the opinion that the probability density functions used in the Bayesian framework were well suited to ET phenomena. Despite its good performance, the authors also stressed that the output of Bayesian model averaging was strongly linked to the selection of input parameters.
Chen et al. [110] took a more aggressive approach whereby they used Bayesian model averaging to combine empirical and artificial intelligence models. The research team suggested two different schemes to create the ensemble model. The authors observed that including all models in the ensemble resulted in poorer performance, as Bayesian model averaging assigned some weight to poorly performing models. In view of this, another ensemble was created that included only the well-performing models. The authors' hypothesis was verified to be correct.
The usefulness of Bayesian modeling approaches has led to the introduction of various related algorithms such as the Bayesian joint probability [111] and Bayesian regression [112]. Bayesian regression is thought to provide insight into the selection of input parameters as well as their relationships to ET0. This could help policy makers prioritize data collection in the near future.

Boosting Algorithm
Boosting is a technique whereby prediction accuracy is improved by compounding the estimations of several weak learners [113]. Unlike Bayesian model averaging, the boosting algorithm works in a stage-wise manner, where one learner is added at a time to minimize the loss function. The first learner searches for an optimal value of the loss function; subsequent models are then fitted into the ensemble and work on the residuals of their predecessors. Over the years, many versions of boosting algorithms have been established, each with its own novel distinction. Some commonly known boosting methods include gradient boosting [114], adaptive boosting [115], extreme gradient boosting [116], and categorical boosting [117].
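The stage-wise principle, in which each new weak learner fits the residuals of its predecessors, can be sketched with regression stumps as the weak learners (toy data; this is the generic gradient boosting recipe for squared loss, not any particular library's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = np.sin(x) + rng.normal(0.0, 0.1, x.size)  # hypothetical nonlinear target

def fit_stump(x, y):
    """Weak learner: a one-split regression stump chosen by least squares."""
    best = None
    for t in np.linspace(x.min(), x.max(), 25):
        left, right = y[x <= t], y[x > t]
        if left.size == 0 or right.size == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lv, rv = best
    return lambda xn: np.where(xn <= t, lv, rv)

def gradient_boost(x, y, n_rounds=50, lr=0.3):
    """Each new stump fits the residuals left by its predecessors."""
    pred = np.full_like(y, y.mean())
    for _ in range(n_rounds):
        residual = y - pred          # negative gradient of the squared loss
        stump = fit_stump(x, residual)
        pred = pred + lr * stump(x)  # shrunken update, as in gradient boosting
    return pred

fitted = gradient_boost(x, y)
```

Although each stump alone is a very crude predictor, fifty shrunken residual-fitting rounds reduce the error far below that of any single member, which is exactly the "weak learners into a strong learner" effect.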
In recent years, the use of boosting algorithms for estimating ET0 has become increasingly popular. Fan et al. [77], in particular, provided a comparison of two types of boosting applied to tree models, namely gradient boosting and extreme gradient boosting. The two algorithms differ in that gradient boosting uses the nodes of tree models as weak learners, whereas extreme gradient boosting uses sets of trees as weak learners. The authors found that, generally, extreme gradient boosting overpowered gradient boosting. This could be because extreme gradient boosting combines the averaged-out results of the trees in a set, which reduces the variance in the output. On top of that, the design of extreme gradient boosting allows for parallel computation, which reduces the time taken for analysis.
Ponraj and Vigneswaran [118] proposed the use of gradient boosting regression to estimate ET0 at Borrego Springs, California. In the same study, the authors compared the performance of gradient boosting regression with conventional multiple linear regression and random forest methods. The results showed that gradient boosting regression had a higher correlation with the standard PM model. The authors also suggested the gradient boosting machine as an alternative for future investigations.
Recently, Fan et al. [119] used another variation of the boosting algorithm, the light gradient boosting machine, to estimate ET0. Its operating principle integrates the essence of gradient boosting and extreme gradient boosting, but it performs leaf-wise rather than level-wise optimization, which effectively reduces the memory and time taken for computation. However, as shown by the results, the light gradient boosting machine required sufficient data in order to be trained well. During the training stage, its performance was generally weaker than that of the random forest as well as the M5 tree model. The situation was reversed during the testing phase, where the light gradient boosting machine performed better once it was well trained.
The core of the boosting algorithm is to assemble several weak learners into a strong learner. By doing so, the strengths and experience of the weak learners can be well utilized by the hybrid artificial intelligence model. Most important of all, the boosting algorithm can be used as a strategy to reduce the risk of overfitting. Nevertheless, the development of boosting algorithms is still at an early stage, and more advanced methods can be anticipated in the near future.

Nonlinear Neural Ensemble
The data fusion techniques discussed so far are developed from certain statistical logics. There is, however, a data fusion technique that depends on the black-box theory, known as the nonlinear neural ensemble. In short, the outputs of individual artificial intelligence models are fed into a secondary neural network to be trained once more; in other words, an ANN is used to assemble the individual artificial intelligence models. This method was applied by Nourani et al. [33], who combined the ANN, SVM, ANFIS, and multiple linear regression. When compared with simple averaging and weighted averaging, the nonlinear neural ensemble yielded better performance. Similar observations were obtained when they used the nonlinear neural ensemble to combine empirical models. This showed that, for a highly nonlinear process like ET, averaging might be insufficient to capture the complexity.
In another study, individual ANNs were added one at a time to produce an ensemble [120]. The addition of ANNs continued until the termination condition (a tolerable error) was met. In this way, the architecture and activation function of each individual ANN could be modified continually until it was considered acceptable for the ensemble. This approach was also used by El-Shafie et al. [121]. The resulting ensemble consisted of only excellent models, which in turn led to accurate prediction of seasonal ET0.
The nonlinear neural ensemble is very useful in scenarios where statistical data fusion methods cannot produce improvements over the original artificial intelligence models. It hybridizes the artificial intelligence models by mapping another black-box relationship between inputs and outputs. However, the results are less interpretable, as the intrinsic relationships within the black box cannot be observed.
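A minimal sketch of the idea follows, using synthetic stand-ins for the base-model outputs (the member models, biases, and noise levels are hypothetical) and a small hand-written one-hidden-layer network as the secondary learner:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical base-model outputs (columns) for 200 samples, and the target ET0.
truth = rng.uniform(1, 6, 200)
base_preds = np.column_stack([
    truth + rng.normal(0, 0.4, 200),        # e.g. a noisy ANN member
    0.7 * truth + rng.normal(0, 0.5, 200),  # e.g. a scaled, noisier member
    truth + 1.0 + rng.normal(0, 0.2, 200),  # e.g. an offset but precise member
])

def train_neural_ensemble(X, y, hidden=8, lr=0.05, epochs=3000):
    """Secondary one-hidden-layer tanh network that learns to combine the
    base-model outputs, trained by full-batch gradient descent."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mu) / sd                       # standardize the inputs
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, hidden); b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        h = np.tanh(Xs @ W1 + b1)
        out = h @ W2 + b2
        err = out - y
        # Back-propagate squared-error gradients through both layers.
        gW2 = h.T @ err / n; gb2 = err.mean()
        gh = np.outer(err, W2) * (1.0 - h ** 2)
        gW1 = Xs.T @ gh / n; gb1 = gh.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
    return lambda Xn: np.tanh(((Xn - mu) / sd) @ W1 + b1) @ W2 + b2

predict = train_neural_ensemble(base_preds, truth)
combined = predict(base_preds)
```

Because the secondary network can learn to rescale the biased member and down-weight the noisy ones, it can outperform a plain average of the columns, which is the advantage reported above for nonlinear neural ensembles.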

Ensemble Models for Remote Sensing
One of the first attempts to use machine learning models to estimate ET0 with remote sensing data was made in the United States, where AmeriFlux sites were available [40]. In that study, land surface temperature, enhanced vegetation index, shortwave radiation, and land cover data were retrieved from satellite images with 1 km by 1 km coverage and an eight-day time step. ET0 was estimated using the SVM, ANN, and multiple regression. A similar approach was taken by Zhang et al. [122] in China, where they extended the application of remote sensing data to the BPNN and the ANFIS for estimating ET0. Further studies included more artificial intelligence models such as the M5 tree model, bagging, random forest [5], ELM [123], and boosted tree [124]. However, the accuracy of these studies was constrained by the quality of the images used for retrieving the estimated meteorological data. It has been claimed that the images should be within the microwave band, and cloud-free conditions are preferred [125,126].
The major advantage of remote sensing data over conventional ground observation data is the wide selection of spatiotemporal ranges, as stated earlier. Furthermore, satellite images provide a massive variety of parameters to be used for ET0 estimation. As a result, multiple models can be used to train the artificial intelligence models, such as land surface models, energy balance models (based on eddy covariance and the Bowen ratio), and equations for ET0 prediction. In spite of these advantages, the shortcoming of remote sensing is that the estimation of ground data can be inaccurate. Moreover, the homogeneity of the satellite images can also affect the estimation process. Hence, numerous efforts have been made to apply data assimilation techniques to remote sensing-based artificial intelligence models for ET0 estimation. This can be done by merging several satellite images that capture different information and feeding them to the artificial intelligence models during the training stage.
The spatial and temporal adaptive reflectance fusion model (STARFM) is regarded as one of the most commonly used data assimilation techniques for remote sensing data. The basic idea of the STARFM is that, if a Landsat-MODIS image pair is available, the algorithm can calculate the systematic error at each pixel of the MODIS image in order to retrieve a Landsat-like image [127]. However, this method assumes that the Landsat and MODIS images observe the same reflectance and differ only by a constant bias error. The enhanced STARFM (ESTARFM) was introduced later to overcome some problems of the STARFM [128]. Unlike the STARFM, ESTARFM can handle heterogeneous regions. In other words, when the pixel resolutions of the satellite images are not uniform, ESTARFM is favored over STARFM [129].
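The constant-bias idea behind the STARFM can be illustrated with a deliberately simplified sketch (synthetic 4x4 scenes; the real algorithm additionally weights spectrally and spatially similar neighbouring pixels, which is omitted here):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical 4x4 "fine" (Landsat-like) and "coarse" (MODIS-like) scenes on the
# base date t1, plus a coarse scene on the prediction date t2.
fine_t1 = rng.uniform(0.1, 0.4, (4, 4))
bias = 0.05                     # assumed constant sensor bias between the two
coarse_t1 = fine_t1 + bias
coarse_t2 = coarse_t1 + 0.02    # surface change between t1 and t2

# Per-pixel systematic difference learned from the base image pair...
delta = fine_t1 - coarse_t1
# ...applied to the new coarse scene to retrieve a Landsat-like prediction.
fine_t2_pred = coarse_t2 + delta
```

Under the constant-bias assumption, the predicted fine scene reproduces the true surface change exactly; when the bias varies across heterogeneous pixels, this simple correction breaks down, which is the situation ESTARFM was designed to handle.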
The operating principles of the STARFM and ESTARFM are similar. Available MODIS images are matched with Landsat images of different overpass dates, and the optimum base pairs are used for training the model. Given a new MODIS image, the two data fusion techniques retrieve a predicted Landsat image so that the computation of ET0 becomes possible. Cammalleri et al. [130] proposed applying the data fusion technique to remote sensing-based ET0 prediction so that images carrying different information could be combined. In their study, Landsat images (30 m spatial resolution, 16-day temporal resolution) and MODIS images (1 km spatial resolution, 1-day temporal resolution) were used to estimate ET0 with the ALEXI and DisALEXI land surface models. By using the STARFM, ET0 could be estimated from both sets of data. Landsat-based ET0 was compared with Landsat-MODIS-based ET0, and it was found that the latter had higher accuracy, especially in the presence of discriminant factors such as rainfall events.
A similar method was applied by Cammalleri et al. [131] at the field scale, where they studied the ET0 of corn and cotton crops. Semmens et al. [132] extended the application to a viticulture system, where Landsat, MODIS, and multi-sensor data were fused by the STARFM method. Recently, the ALEXI and DisALEXI models fused by the STARFM were also used by Knipper et al. [126]. Their study built on that of Semmens et al. [132] by expanding the study to multiple years in order to gain insight into the seasonal dynamics of ET0. Ma et al. [133] deployed the ESTARFM for ET0 estimation from satellite imaging for the first time. They used three sets of MODIS data and two sets of Landsat data with dissimilar spatial and temporal resolutions. Instead of the ALEXI and DisALEXI models, the surface energy balance system (SEBS) model was used to calculate ET0. The authors claimed that their results produced high-resolution ET0 estimations with good accuracy: the estimated ET0 followed trends similar to the observed ET0, with slight underestimation. Nevertheless, a direct comparison between the STARFM and ESTARFM has not been studied. Hence, the upper hand of the ESTARFM arguably lies in allowing the inclusion of more satellite images with different resolutions.
Besides the STARFM and ESTARFM, another very popular data assimilation technique used by researchers worldwide is the Kalman-based ensemble. Definitions of the observation model and state model are essential for using the Kalman algorithm. Alavi et al. [134] demonstrated the usefulness of Kalman filter-based ET0 estimation. To be exact, the work estimated missing heat fluxes by treating them as functions of time and temperature, which acted as the observation models. The Kalman filter-based algorithm was compared with the conventional mean diurnal variation, multiple regression, two-week average PT coefficient, and multiple imputation methods. It was found that, although the Kalman filter-based algorithm did not show outstanding accuracy relative to the other methods, its slight edge provided better estimations of ET0, especially during short gap periods with volatile ET0 fluctuations, where sensitivity was a decisive factor.
Ever since then, the ensemble Kalman filter approach has been widely used in estimating ET0. For instance, Peters-Lidard et al. [135] evaluated several data assimilation systems that employed the Kalman filter approach to predict the latent heat flux. The prediction was done using FLUXNET as well as MODIS data as inputs. It was reported that data assimilation provided more accurate results, indicating the wide applicability of the ensemble Kalman filter across data structures. In the Shahe River Basin of China, Yin et al. [136] assimilated a hydrological model with remote sensing-based evapotranspiration. The outcome showed the promising potential of the ensemble Kalman filter as a predictor when a state model is available. The advantages of the ensemble Kalman filter are that it permits the use of multisource data and increases the precision of estimation by incorporating suitable models. However, determining the observation and state models can be challenging, and this hampers the popularity of the ensemble Kalman filter.
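The predict/update cycle of a scalar Kalman filter, including how it bridges short observation gaps such as those in flux records, can be sketched as follows (synthetic series; the random-walk state model and the noise variances are assumptions made for the sketch):

```python
import numpy as np

def kalman_filter_1d(obs, q=0.01, r=0.09, x0=0.0, p0=1.0):
    """Scalar Kalman filter: random-walk state model, noisy observations.
    q = process-noise variance, r = observation-noise variance."""
    x, p = x0, p0
    est = []
    for z in obs:
        # Predict step: the random-walk state model only inflates uncertainty.
        p = p + q
        if np.isnan(z):          # gap: keep the prediction, skip the update
            est.append(x)
            continue
        # Update step: blend prediction and observation via the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1 - k) * p
        est.append(x)
    return np.array(est)

rng = np.random.default_rng(3)
truth = 3.0 + 0.5 * np.sin(np.linspace(0, 4 * np.pi, 80))
obs = truth + rng.normal(0, 0.3, truth.size)
obs[30:34] = np.nan              # a short gap, as in real flux records
smoothed = kalman_filter_1d(obs, x0=obs[0])
```

During the gap the filter coasts on its state prediction, and over the rest of the record the gain trades off observation noise against responsiveness, which is the sensitivity behaviour noted for short volatile gaps above.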
Utilization of the remote sensing approach in estimating ET0 removes the constraint of spatial coverage. Satellite images of varying resolutions can be processed to recover valuable information for the prediction. The remote sensing method also makes the provision of real-time data possible and allows continuous monitoring of ET over certain regions. The development of data fusion algorithms successfully combines different satellite images, which in turn yields more information for ET0 prediction. Despite all this, the use of remote sensing is still at an early stage, and more robust and powerful tools can be expected in the near future.

Data Decomposition
The previous discussions focused mainly on the exploitation of historical data as ingredients for creating an estimation model. However, the temporal trends and variations of ET are of utmost importance, as they can serve as a predictive tool to assist the decision-making of stakeholders. Therefore, a good artificial intelligence model should be able to provide such information. Data related to ET can be highly dynamic and contain unnecessary noise. Decomposition of the data is needed to filter out the noise in order to retrieve useful information.
According to Partal [137], wavelet transformation has been successfully applied in research on many hydrological processes. In fact, the combination of an ANN with wavelet transformation was proved feasible in many other studies. Therefore, Partal [137] performed wavelet transformations of several climate data series at different temporal resolutions to obtain useful decompositions. These sub-series were reconstructed and then fed into the BPNN, multiple linear regression, and the HS model. The resultant wavelet neural network (WNN) performed better than the other two models, proving that the wavelet-transformed data retained only the useful information and trends. The application of wavelet transformation is not constrained to the BPNN; it has also been applied to other ANNs such as the RBF [138], ELM [29], GRNN [139,140], and ANFIS [141].
Cobaner [142] converted Class A pan evaporation data into ET0 by using wavelet decomposition. The study focused only on the effect of the wavelet transformation; therefore, instead of an ANN, the author selected a regression model for the analysis. Using the Mallat discrete wavelet transformation, the complex time series was broken down into several sub-series that exhibited the daily, monthly, and annual features of the process. Each sub-series was weighted based on the strength of its correlation. It was concluded that, although the wavelet regression model had slightly lower accuracy than the standard FAO-24 model for pan evaporation conversion, the drastic reduction in the number of required parameters was sufficient proof of the success of the study.
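One level of the Mallat algorithm splits a series into a smooth approximation and a detail part; applying it recursively to the approximation yields the multilevel decomposition used in the studies above. A numpy-only sketch with the Haar wavelet follows (synthetic series; published studies typically use library implementations and other mother wavelets):

```python
import numpy as np

def haar_dwt(signal):
    """One level of the Mallat algorithm with the Haar wavelet: split a series
    into a smooth approximation and a detail (fast-fluctuation) part."""
    s = np.asarray(signal, dtype=float)
    approx = (s[0::2] + s[1::2]) / np.sqrt(2)   # low-pass: trends
    detail = (s[0::2] - s[1::2]) / np.sqrt(2)   # high-pass: noise/fluctuations
    return approx, detail

def multilevel(signal, levels):
    """Repeatedly decompose the approximation, as in a multilevel DWT."""
    details, a = [], signal
    for _ in range(levels):
        a, d = haar_dwt(a)
        details.append(d)
    return a, details

# Hypothetical daily series: slow seasonal trend plus high-frequency noise.
t = np.arange(256)
series = 3 + np.sin(2 * np.pi * t / 128) \
    + 0.2 * np.random.default_rng(5).normal(size=256)
a, details = multilevel(series, levels=3)
```

Because the Haar transform is orthonormal, the sub-series jointly preserve the energy of the original signal; the coarse approximation carries the seasonal trend while the detail bands isolate the noise, which is what makes the sub-series useful inputs for a wavelet neural network.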
Apart from wavelet transformation, there are other variations of data decomposition. For instance, Adarsh et al. [143] used multivariate empirical mode decomposition to pre-treat the raw data (temperature, solar radiation, relative humidity, and wind speed). In this method, intrinsic mode functions were generated after the decomposition of the data at varying temporal scales. Using multivariate empirical mode decomposition on the data did not provide significant improvement in the obtained predictions. Future investigations could study the effect of such a decomposition method when climatic parameters are scarce.
Misaghian et al. [144] provided another form of data decomposition prior to estimating ET0. The ET0 data were represented in a multi-dimensional (tensor) vector space. By using the Tucker decomposition (a variation of the singular value decomposition), the three-way relationship of month, year, and ET0 was unfolded. The core tensor was processed by the prediction machine, and the predicted original tensor was then reconstructed. The authors compared the values of ET0 computed with empirical models against those generated by the tensor decomposition prediction. The predicted outcome of the tensor decomposition model was close to the estimations of the PM model, PT model, HS model, Blaney-Criddle model, and the Jensen-Haise model.
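One standard way to compute a Tucker decomposition is the higher-order SVD. The sketch below (hypothetical year x month x station ET0 tensor) unfolds the tensor along each mode, extracts factor matrices from the left singular vectors, projects out the core tensor, and reconstructs:

```python
import numpy as np

def unfold(T, mode):
    """Matricize a 3-way tensor along one mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def tucker_hosvd(T, ranks):
    """Tucker decomposition via higher-order SVD (HOSVD)."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):
        # Mode-wise product with U.T: project the tensor onto the factor basis.
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core, factors

def reconstruct(core, factors):
    """Multiply the core tensor back by each factor matrix."""
    T = core
    for mode, U in enumerate(factors):
        T = np.moveaxis(np.tensordot(U, np.moveaxis(T, mode, 0), axes=1), 0, mode)
    return T

# Hypothetical year x month x station ET0 tensor.
rng = np.random.default_rng(9)
T = rng.uniform(1, 6, (10, 12, 5))
core, factors = tucker_hosvd(T, ranks=(10, 12, 5))  # full ranks: exact recovery
T_hat = reconstruct(core, factors)
```

With full ranks the reconstruction is exact; in practice the ranks are truncated so that the small core tensor summarizes the dominant year-month-station structure, and a prediction machine can operate on the core before reconstructing the full tensor, as in the study above.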
Data decomposition offers another perspective on ET0 prediction, whereby future ET0 can be forecasted based on historical trends. To obtain a clearer picture of how ET0 behaves at different time scales, data decomposition filters the noise and generates profiles of ET0 trends to be analyzed by artificial intelligence models. It works as a pre-processing technique that reduces the redundant data fed to the artificial intelligence model in order to produce more meaningful and useful estimations.
Figure 6 outlines the pathways to develop hybrid models using different modeling approaches. In addition, Table 4 provides an overview of the different hybridization methods. The variations of the hybridization models are discussed in detail in terms of their background principles and suitable applications.

Future Prospects
Shifting from conventional empirical models to artificial intelligence models for ET estimation should be regarded as an indubitable trend. This is in line with the Fourth Industrial Revolution, in which artificial intelligence will take over non-value-added activities such as forecasting and estimation. This would reduce errors or mistakes when policy makers make decisions based on highly precise, accurate, and effective predictions. In addition, it is important for researchers worldwide to seek solutions that reduce the number of meteorological parameters needed for ET prediction, for reasons of cost, time savings, and efficiency. The black-box operating nature of artificial intelligence models is currently the solution to this problem. That being said, ET data from ground observation or physical measurement will remain imperative during this transition, while robust artificial intelligence models are being developed concurrently. On the other hand, advancements in satellite technologies allow the use of remote sensing in ET monitoring; in other words, they provide a form of data that could not have been collected from ground weather stations. Application of remote sensing technology reduces the dependency of ET estimation on ground observation data, as it offers a new basis for computing ET. Nevertheless, ground observation data remain important for calibrating raw satellite images for better predictions in the coming years.
In short, the future prospects of this field of study can be summarized as follows:
1. Effective data assimilation.
Data fusion techniques should be well utilized to accurately map ground observation data to remote sensing data. This can make the satellite images more informative in terms of accuracy as well as temporal and spatial resolution, if well calibrated.

2. Creation of new hybrid models.
This can be done by changing the combinations of currently available artificial intelligence models and hybridization techniques. Meanwhile, development of new algorithms or enhancement of present algorithms can be attempted in the future. It is anticipated that the "committee of decision" formed from hybrid models can produce predictions with greater accuracy and shorter computation times.

3. Be cautious of climate change.
Artificial intelligence models are highly dependent on the training (historical) data. A volatile climate poses a serious challenge, as past trends might not be applicable in the future. Studies in the coming years can focus on retrieving information that takes climate change into consideration. For example, data selection and sampling should be done with care in order to ensure the homogeneity of the data, so that the effect of climate change is minimal. In addition, models should be kept as up to date as possible. Dynamic modeling can cater to this need, while artificial intelligence supports it with fast calculation and real-time data.

4. Relationship discovery from new association rules.
Making use of "Big Data" allows us to explore various possibilities that associate input parameters with ET. By using the developed artificial intelligence models, one can explore a vast number of variables or parameters and study their association with ET within a short period of time. Parameters that are highly correlated with ET can be further studied to reveal their relationships and scientific interactions.

5. Widening of forecasting horizons.
Related studies are still in their infancy, and the forecasting windows remain too narrow. Increasing the forecasting lead time can assist in the design of efficient water resource management plans. This would be important especially in crop plantations, which require a longer lead time to schedule their irrigation plans.

Conclusions
The estimation of ET is of paramount importance, especially for agricultural activities. This review has outlined the pitfalls of conventional models based on energy balance, which include a high dependency on climatic parameters. In addition, empirical models can be specific to certain regions, which in turn requires further calibration before the models can be used. The emergence of artificial intelligence models, which operate on a black-box principle, aims to overcome these problems. The integration of artificial intelligence opens the possibility of reducing the number of climatic parameters needed for the estimation of ET. Since artificial intelligence models are data-driven, this review has pointed out some sources of data, as well as the significance of different parameters in various climate patterns. The ANN, SVM, fuzzy models, and tree-based models have been studied extensively in the past, and their feasibility has been tried and tested. Nevertheless, studies have revealed that, in the case of limited meteorological parameters or data, the performance of these artificial intelligence models deteriorates.
In view of this, data fusion techniques have been developed as a solution. Bootstrap aggregating is useful when the available data size is too small to train a good model. Bayesian modeling approaches rely on the computation of posterior probabilities to weigh the correctness of individual models. The boosting algorithm works by combining several weak learners to form a strong learner. Moreover, the nonlinear neural ensemble relies on a black-box operation to create an ensemble that produces better results than its constituent models. Data decomposition has the distinct characteristic of extracting useful information at different resolutions via certain forms of transformation, which assists in the removal of unwanted noise prior to analysis. The purpose of performing data decomposition, especially wavelet transformation, is to draw trends from historical data in order to predict the future behaviour of ET. At the end of this review, a compilation of suggested hybridization techniques for each base artificial intelligence model is provided in Table 5. This could serve as a guideline in terms of parameter selection and ensemble strategies for future researchers who wish to have a fresh start on ET estimation using hybrid artificial intelligence models.
Remote sensing technology appears to remove the limitation of spatial coverage when estimating ET. It also provides real-time data, increasing the dynamicity of the analysis. Remote sensing-based ET estimation is always integrated with a land surface model, where energy balance and radiation play important roles. Data assimilation can also be performed on remote sensing data, whereby satellite images from different sources are combined in artificial intelligence models. This enables the combination of the different information carried by different sources of satellite images (including resolution and band range), and can be realized with commonly used techniques such as the STARFM, ESTARFM, and the Kalman filter-based ensemble.
Besides providing the chronological development of, and guidelines for selecting, methods or algorithms for ET estimation using artificial intelligence models, this review has also suggested future trends in the development of artificial intelligence for ET prediction. In upcoming studies, it is anticipated that data fusion or assimilation will be a major subject, alongside the development of more robust artificial intelligence models. It is in our interest that ground observation data can be merged with remote sensing data. New hybrid models are also anticipated, in order to increase prediction accuracy and speed. In the near future, climate change will be a major environmental issue, and researchers should be cautious about its effects. With mature and well-developed models, we can expect that more parameters associated with ET will be explored to discover their relationships with ET, for a more profound understanding of the processes. Finally, forecasting horizons are to be lengthened to achieve more efficient water resource allocation. This can be a useful tool during key steps of the decision-making process for policy makers, especially in water resource management, for successful economic growth and development in the agricultural sector.

Figure 1. Number of publications related to evapotranspiration estimation using artificial intelligence from 2011 to 2019.

Figure 2. Virtuous cycle between review papers, new research and the decision-making process.

3 . 3 . 32 -
Table 3. Characteristics of artificial neural network models.

Feed-Forward Neural Network
- Consists of one input layer, one or more hidden layers and one output layer
- Signals are passed from the input layer to the output layer in the forward direction
- Normally uses a sigmoid activation function to map input to output

Radial Basis Function
- Consists of one input layer, one hidden layer and one output layer
- A Gaussian activation function is computed for every node in the hidden layer

Generalised Regression Neural Network
- A probabilistic-based model
- Consists of one input layer, one pattern layer, one summation layer and one output layer
- The pattern layer is used to cluster the data and train the model
- Results of the summation layer nodes are normalised in the output layer

Back-Propagation Neural Network
- Consists of one input layer, one or more hidden layers and one output layer
- Includes a back-propagation algorithm to feed back the output error in order to optimise model performance by adjusting weights and biases

Extreme Learning Machine
- Consists of only one input layer, one hidden layer and one output layer
- Nodes in the hidden layer are randomly generated
- Only the number of nodes in the hidden layer has to be tuned to optimise model performance

Reported directions for improving ANN-based estimation include:
a. Minimization of the required input parameters;
b. Generalization of the ANN for wider spatial application;
c. Introduction of new input parameters;
d. Enhancement of ANN prediction ability.
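Of the listed models, the extreme learning machine is the simplest to sketch in code: the input-to-hidden weights are fixed at random and only the output weights are trained, in a single closed-form least-squares step. The sketch below is an illustrative toy (the class name, layer sizes and the sine target are all hypothetical), not an implementation from any reviewed study.

```python
import numpy as np

class ELM:
    """Toy Extreme Learning Machine: random hidden layer, least-squares output."""

    def __init__(self, n_hidden=30, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_features = X.shape[1]
        # Random, untrained input-to-hidden weights and biases
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)  # hidden-layer activations
        # Only the output weights are fitted, via the Moore-Penrose pseudoinverse
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta

# Toy usage: fit a smooth nonlinear target
X = np.linspace(0, 1, 50).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel()
model = ELM(n_hidden=40).fit(X, y)
```

Because only the output layer is solved for, training is a single linear-algebra step, which is why only the number of hidden nodes remains to be tuned.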

Figure 3. Working principle of support vector machine.

Figure 4. Network structure of support vector machine.
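The working principle of the support vector machine for regression rests on two ingredients that can be written down compactly: the epsilon-insensitive loss, under which prediction errors inside the epsilon tube are not penalised, and a kernel, commonly the Gaussian RBF, that measures sample similarity in a higher-dimensional feature space. A minimal sketch (function names and default parameter values are illustrative, not from the reviewed studies):

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    # Zero penalty inside the epsilon tube, linear penalty outside it
    return np.maximum(np.abs(y_true - y_pred) - epsilon, 0.0)

def rbf_kernel(x1, x2, gamma=1.0):
    # Similarity decays with the squared distance between two samples
    return np.exp(-gamma * np.sum((np.asarray(x1) - np.asarray(x2)) ** 2))
```

The tube width epsilon controls how many training samples become support vectors, while gamma controls how locally the RBF kernel responds.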

Figure 5. Overall flow of fuzzy inference system.
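The overall flow of a fuzzy inference system — fuzzification, rule evaluation and defuzzification — can be illustrated with a toy Sugeno-style system mapping air temperature to an ET rate. All membership breakpoints and rule consequents below are invented for illustration only:

```python
import numpy as np

def tri_mf(x, a, b, c):
    # Triangular membership function rising from a, peaking at b, falling to c
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def sugeno_et(temp):
    # Fuzzification: degree to which the temperature (degrees C) is "cool" or "hot"
    mu_cool = tri_mf(temp, 0.0, 10.0, 25.0)
    mu_hot = tri_mf(temp, 15.0, 30.0, 45.0)
    # Rule consequents: crisp ET rates (mm/day) for each condition (made up)
    et_cool, et_hot = 1.0, 6.0
    # Defuzzification: firing-strength-weighted average of the rule outputs
    return (mu_cool * et_cool + mu_hot * et_hot) / (mu_cool + mu_hot)
```

Between the two membership peaks, the output blends smoothly from the "cool" consequent toward the "hot" one, which is the behaviour the rule base encodes.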

Figure 6. Pathways for hybrid model development. ANN: Artificial Neural Network; SVM: Support Vector Machine.
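One of the simplest hybridization pathways, ensemble modeling by data fusion, combines the outputs of base learners such as an ANN and an SVM. A hypothetical sketch using inverse-error weighting (the function name and the weighting scheme are illustrative choices, not a method prescribed by the reviewed papers):

```python
import numpy as np

def fuse(pred_a, pred_b, err_a, err_b):
    # Weight each base model's prediction by the inverse of its validation error,
    # so the more accurate model contributes more to the fused estimate
    w_a, w_b = 1.0 / err_a, 1.0 / err_b
    return (w_a * np.asarray(pred_a) + w_b * np.asarray(pred_b)) / (w_a + w_b)
```

With equal errors this reduces to a plain average; as one model's error shrinks, the fused output converges toward that model's prediction.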

Table 1. Significant parameters for different climate patterns to estimate evapotranspiration.

Table 2. Types of data for evapotranspiration estimation.

Table 3. Characteristics of artificial neural network models.

Table 4. Overview of different hybridization techniques.