Estimating FAO Blaney-Criddle b-Factor Using Soft Computing Models

: FAO Blaney-Criddle has been generally an accepted method for estimating reference crop evapotranspiration. In this regard, it is inevitable to estimate the b-factor provided by the Food and Agriculture Organization (FAO) of the United Nations Irrigation and Drainage Paper number 24. In this study, ﬁve soft computing methods, namely random forest (RF), M5 model tree (M5), support vector regression with the polynomial function (SVR-poly), support vector regression with radial basis function kernel (SVR-rbf), and random tree (RT), were adapted to estimate the b-factor. And Their performances were also compared. The suitable hyper-parameters for each soft computing method were investigated. Five statistical indices were deployed to evaluate their performance, i.e., the coefﬁcient of determination ( r 2 ), the mean absolute relative error (MARE), the maximum absolute relative error (MXARE), the standard deviation of the absolute relative error (DEV), and the number of samples with an error greater than 2% (NE > 2%). Findings reveal that SVR-rbf gave the highest performance among ﬁve soft computing models, followed by the M5, RF, SVR-poly, and RT. The M5 also derived a new explicit equation for b estimation. SVR-rbf provided a bit lower efﬁcacy than the radial basis function network but outperformed the regression equations. Models’ Applicability for estimating monthly reference evapotranspiration (ETo) was demonstrated.


Introduction
Reference evapotranspiration (ETo) estimation is imperative information to serve water resources planning, management, and operation [1]. The Blaney-Criddle method, as proposed by the Food and Agricultural Organization (FAO) of the United Nations, is a well-known temperature-based reference crop evapotranspiration. This method gives more advantage when having limitations of the measured data than the Penman-Monteith method, which requires many meteorological data [2][3][4]. Many attempts [5][6][7][8][9] have been made to evaluate the efficiency of the FAO Blaney-Criddle method in estimating reference crop evapotranspiration for many regions. The study results by Jhajharia, Ali, DebBarma, Durbude, and Kumar [6] revealed that in humid locations, the Blaney-Criddle method was superior to other temperature-based methods, such as Hargreaves and Thornth-Waite. This is because it offered an approximate solution of the reference crop evapotranspiration closest to the FAO Penman-Monteith. However, the calibration of FAO-Blaney-Criddle

FAO Blaney-Criddle B Factor
The original Blaney-Criddle equation requires information on the average daily percentage of total daily hours and mean daily air temperature for predicting reference crop evapotranspiration. Its formula is expressed as follows, by [21].
where ET 0 is reference crop evapotranspiration (mm/d); a and b are calibrated constants; p is the average daily percentage of total annual daytime hours; and T is the average daily air temperature ( • C). The a factor can be derived by: where RH min is the lowest daily relative humidity (%); and n/N is the average ratio of actual to possible sunshine hours. The p and N values can be received from tables when specifying latitudes and months [21,22]. They can be obtained using formulas as proposed by [23,24]. For determination of the value of b factor, Doorenbos, Pruitt, and Agl [21] proposed it in tabular form. It depends on the lowest daily relative humidity (RH min ), daytime wind speed (U d ), and the average ratio of actual to possible sunshine hours (n/N) (see detailed information in Table 1). The authors can simply utilize the technique of table interpolation to obtain the b value. However, it needs seven interpolation times for getting that value, leading to lead to considerable error [25]. To defeat such drawback, Frevert et al. [26] first proposed a regression equation (see Equation (3)) and, later, it was improved by Allen and Pruitt [27] (see Equation (4)). Nevertheless, it was still an error in estimating the b value of approximately 10% compared to the tabular values. Ambas and Evanggelos [28] used weighted least squares to estimate b factor of the FAO24 Blaney-Criddle method as shown in Equation (5). It gave close results as compared to the previous studies. Equations (3)- (5) still have an error in estimating the b value as compared to the tabular values. Hence, it requires other techniques to decrease the error. b = 0.81917 − 0.0040922(RH min ) + 1.0705 n N + 0.065649(U d ) − 0.0059684(RH min )(n/N) −0.0005967(RH min )(U d )

Soft Computing Models
Soft computing models refer to a data analysis of a complex system in order to discover the relationship between system state variables, i.e., independent and dependent variables, without explicit knowledge of the physical nature of the system [29]. In this section, four data-driven models, e.g., random forest (RF), M5 model tree (M5), support vector regression (SVR), and random tree (RT) are briefly explained, as follows.

Random Forest (RF)
The Random Forest (RF) was first introduced by Breiman [30] and has been a common modification of decision trees, which is one of the collections of techniques for data classification and regression [31]. There are two major phases of model construction. In the first step, RF generates a number of individual trees based on the decision tree process. Each tree is created by randomly selecting different sampled training data sets from the entire training data set (also known as the bagging method or bootstrap aggregation) and sub-attributes (or features) from all attributes in the training data set. Second, the voting method is applied, that is, the model prediction is finally achieved by voting for the classification problem or by using the mean value for the regression problem from the predictive performance of each tree generated. In comparison to the M5, full-grown RF trees are not pruned back. This is one of the key benefits of the regression of RF over the M5. As the number of trees increases, the error of speculation still converges even without pruning the tree, and over-fitting is not a matter of concern in light of the Strong Law of Large Numbers [32]. The RF model was adapted based on regression models in this study.

M5 Model Tree (M5)
The M5 Model Tree (M5) model tree was first implemented by Quinlan [33]. It applies a divide-and-conquer method to the creation of a relationship between independent and dependent variables and can be applied to both qualitative (categorical) and quantitative variables. Building M5 involves three stages. The first stage involves the development of a decision tree by dividing the data set into subsets (or leaves). Second, the overgrown tree is reduced, and linear regression functions substitute the plucked sub-trees to avoid overfitting the structure or a weak generalizer. The merging of certain lower sub-trees into one node is processed as part of the pruning approach. The smoothing procedure is finally employed to reduce the serious discontinuities between the linear models in the leaves of the trimmed trees, especially for models created from a small number of training samples.

Support Vector Regression (SVR)
Support Vector Regression (SVR) was developed by Vapnik [34] and his colleagues. This is the adaptation of the support vector machine (SVM) for regression. The basic idea of SVR learning is to solve the separation hyperplane that can correctly divide the training data set and has the largest geometric interval [35,36]. Using the automated conversion of nominal values to numerical values, SVM may be both numerical and nominal. Normalization or standardization shall be processed for all input data prior to the corresponding step. Unlike Support Vector Machine (SVM) for a classifier, which finds a line that best divides training data into classes, SVR processes the best line that separates the training data set by having a minimal error in the cost function. For this reason, an optimization algorithm is used to consider those data instances in the training dataset that are nearest to the minimum cost line. These instances are then referred to as support vectors, which is the name of this technique. In the event that a line that matches the data cannot be identified, a margin is inserted along the line to loosen the constraint. This margin helps the overall outcome to be better, but it does offer some poor predictions to be tolerated. Adequate determination of the complexity parameter C is important. Giving a low C value gives a broad minimum margin, otherwise, it gives a smaller minimum margin. In several real-world problems, it has been found that the use of a straight line is not sufficient for separating data sets. It is also more fitting to use curves or even polygonal regions. By converting data into higher dimensional areas, the kernel functions have been meant to draw lines and predict.

Random Tree (RT)
RT is a fundamental decision tree algorithm collaborating with Quinlan C4.5 or Classification and Regression Trees (CART). It chooses a random subset of attributes for each split from the available attributes before it is implemented with a subset size determined by the part ratio parameter. This method constructs a decision tree and chooses the feature to maximize the information gained using a portion of the data as training data. It is strong and straightforward to use, producing extremely accurate forecasts [29,30]. For a regression tree, a dataset is divided into sub-spaces, and fitting a constant is proceeded for each sub-space [32]. Consequently, A single-tree model exhibits a low level of prediction accuracy and a propensity to be very unstable. However, it can produce extremely accurate results via bagging RT as a decision tree method. It is highly flexible and has quick learning.

Weka Machine Learning Tool
WEKA (Waikato Environment for Knowledge Analysis) is a Java-based open-source machine learning platform released under the GPL (GNU). It was subsequently established by the University of Waikato in New Zealand. WEKA can impose pre-processing, classification, clustering, association rules, and selection of attributes for data. It also has a graphical representation visualization tool. WEKA has four main applications: Explorer, Experimenter, Knowledgeflow, and SimpleCLI. We can use the Explorer environment to explore the data. If we want to conduct experiments and conduct statistical tests between learning methods, the authors can use an experimenter. Knowledgeflow essentially supports the same features as an explorer, but it is a drag-and-drop interface that supports progressive learning. The authors can work on the WEKA command-line interface in a simpleCLI environment.

Tuning Hyper-Parameters
Developing a soft computing model or machine learning model considers two parameters, i.e., model parameters and hyper-parameters. Unlike model parameters obtained during the training process, hyper-parameters are the pre-setting parameters by the user to determine model structure before training the models. The control of a machine learning model's behavior requires hyperparameter adjustment. Therefore, our predicted model parameters will yield less performance if our hyper-parameters aren't properly tuned to minimize the loss function. In general, the process for tuning hyper-parameters includes defining a model, defining the range of possible values for all hyper-parameters, defining a method for sampling hyper-parameter values, defining evaluative criteria to judge the model, and defining a cross-validation method. In this experiment, a WEKA experimenter was utilized to do a systematic trial and error, that is, varying one interesting parameter and fixing the remained parameters, and repeating this step until covering all parameters. The Root Relative Squared Error (RRSE) with ten-fold cross-validation, given in WEKA, was used as a criterion for selecting the best parameter value for all 216 data sets.

Data Used
In this study, 216 data sets taken from the b factor tabular of FAO Blaney-Criddle [21] were utilized, coincident with the study purpose. For evaluating the models' performance with the previous studies, the training and testing process data sets were the same as those used in Trajkovic, Stankovic, and Todorovic [25]. They randomly selected 186 of 216 data sets for training models and used all 216 data sets for testing models. Table 1 summarizes the statistical analysis of relevant parameters of FAO Blaney-Criddle b for the training and testing processes. In overall statistic values, they were very similar for both training and testing data sets. However, when considering the Kurtosis value, it indicated all parameters (U d , n/N, RH min , and b) for both training and testing datasets had platykurtic distributions. Also, the skewness value showed that U d , n/N, and RH min for both training and testing datasets were approximately symmetric ("-" sign means skewed left and "+" sign means skewed right), while b for both training and testing datasets were moderately skewed right. The correlation analysis was conducted to individually evaluate the strength of the relationship between each input parameter (U d , n/N, and RH min ) and an output parameter (b). A low degree correlation was found for U d (r = 0.27, and 0.26 for training and testing stages, respectively), and a high degree correlation was obtained for the rest parameters. The n/N gave the correlation coefficient (r) of 0.57 and 0.58 for the training and testing stages, respectively. Additionally, RH min provided a strong negative relationship by giving the correlation coefficient (r) of −0.74 for both the training and testing stages, respectively.

Statistical Model Performance Indices
Five statistical indices were deployed to evaluate model performance, i.e., the coefficient of determination (r 2 ) (Equation (6)), the mean absolute relative error (MARE) (Equation (7)), the maximum absolute relative error (MXARE) (Equation (8)), the standard deviation of the absolute relative error (DEV) (Equation (9)), and the number of samples with an error greater than 2% (NE > 2%). All of these statistical indices were used by Trajkovic, Stankovic, and Todorovic [25] on this particular issue. The r 2 calculates the level of linearity of two variables and its maximum is 1.00. MARE and MXARE determine the difference between the real and the expected b factor and should be as small as possible. The perfect model should have an NE of zero. Finally, a Taylor diagram was proposed to comparatively elaborate and evaluate the efficacy of the developed models. This diagram can simultaneously show three statistic parameters, i.e., correlation, root mean square error, and standard deviation. The equations of statistical indices are given below, where b ai is the actual b-factor, is the estimated b-factor, and n is the number of samples in a data set. Table 2 shows the results of tuning hyper-parameters. Their explanation for each soft computing model is as follows.  In the process of tuning hyper-parameters for RF, some default parameters were selected as default in WEKA software, i.e., (1) infinite maximum tree depth, and (2) int(log 2(#predictors) + 1) function used to set the number of randomly selected attributes. However, three parameters, namely: (1) numIteration, which is the number of trees in the random forest; (2) batchSize, which is the optimum number of instances to be processed when predicting batch; and (3) numExecutionSlots, which is the number of threads available for execution to be used to create the collection, were investigated in our experiment. Findings revealed that numIteration of 300, batchSize of 100 (default value), and numExe-cutionSlots of 1 (default value) were the suitable hyper-parameters for RF with the testing data set. All cases gave an RRSE value of 12.14. The numIteration was a sensitive parameter, while batchSize and numExecutionSlots were not sensitive.

M5 Model Tree (M5)
The authors experimented tuning hyper-parameters of the M5 model tree using the default parameters in WEKA software of unpruned to be false, and use Unsmoothed to be false. The four parameters, i.e., batchSize, minNumInstances, numDecimalPLaces, and buildRegressionTree were investigated. If the batch prediction is utilized, the bathcSize option specifies the recommended number of instances to process. More or fewer instances are conceivable, but this allows implementations to select the batch size they want. The minimal number of instances to allow at a leaf node is specified by minNumInstances. The number of decimal places to utilize for the model's output is numDecimalPLaces. It can be decided whether to construct a regression tree/rule instead of a model tree/rule using the buildRegressionTree method.
Findings revealed that batchSize of 100 (default value), minNumInstances of 4 (default value), and numDecimalPLaces of 4 (default value), were the suitable hyper-parameters for M5 with the testing data set. All best cases gave an RRSE value of 11.46. By using batchSize of 100, minNumInstances of 4, and numDecimalPLaces of 4, the authors investigated the effect of selecting or not selecting a regression tree/rule. There was no need to generate a regression tree/rule due to giving an RRSE value of 11.46 compared to creating a regression tree/rule, which gave an RRSE value of 56.26. The minNumInstances and buildRegressionTree were sensitive parameters, while batchSize and numDecimalPLaces were not sensitive.

Support Vector Regression (SVR)
For SVR, two kernel functions, namely the polynomial kernel with variable exponent value and the radial basis function kernel with varying gamma value, were investigated. Also, the complexity parameter (C) was varied between 0.0 and 1.0 to determine the optimum value. The gamma parameter represents the influence of a single training reach, with low values indicating 'far' and large values indicating 'close.' The inverse of the impact radius of the samples chosen by the model as support vectors are gamma parameters. The modified sequential minimal optimization (SMO) as an iterative algorithm was used to solve the regression problem for SVR [34]. The authors found the optimal hyper-parameters for SVR with polynomial kernel function were the complexity parameter (C) of 0.8 and the exponent (n) of 1.0. By fixing the complexity parameter (C) value of 0.8 and varying the exponent (n) value from 1.0 to 4.0, it was sensitive to the exponent value for SVR with a polynomial kernel function. The best case gave an RRSE value of 24.21.
Furthermore, the optimal hyper-parameters for the radial basis function kernel were the complexity parameter (C) of 1.0 and the gamma parameter (γ) of 1.0. By fixing the complexity parameter (C) value of 1.0 and varying the gamma parameter (γ) value from 1.0 to 4.0, it was sensitive to the gamma parameter (γ) value for the radial basis function kernel. Additionally, the gamma parameter (γ) value of 1.0 gave the least RRSE. For both cases, the suitable C parameters were equal to or more than 0.8. It indicated that these data sets required a smaller minimum margin to separate the data. From those suitable exponents of the polynomial kernel function and gamma parameter of the radial basis function kernel were equal to 1.0, it also manifested that these data sets are not conglomerate data sets and do not require projecting the data into a higher-dimensional space for data separation. The best case gave an RRSE value of 2.37.

Random Tree (RT)
RT was conducted to determine the suitable hyper-parameters and some default parameters were selected as suggested by WEKA software, i.e., (1) unlimited maximum depth of the tree and (2) int(log 2(#predictors) + 1) function used to set the number of randomly selected attributes. However, five parameters, namely batchSize, numDecimalPlaces, min-Num, numFolds, and minVarianceProp, were investigated. The batchSize refers to the preferred number of instances to be processed when batch predictions are made. The numberDecimalPlaces is the number of decimal places to be utilized in model output. The minNum means the minimum total weight of the instances in a leaf. The numFolds is configured to determine the quantity of data used. For backfitting, one fold is utilized, and the other is applied for building the tree. The minVarianceProp represents the smallest variance of all data present at a node in regression trees to be divided.
Findings revealed that batchSize of 100 (default value), numDecimalPlaces of 2 (default value), minNum of 1 (default value), numFolds of 0 (default value), and minVarianceProp of 0.001 (default value), were the suitable hyper-parameters for RT with testing data set. All best cases gave an RRSE value of 24.23. The minNum, numFolds, and minVarianceProp were sensitive parameters, while batchSize and numDecimalPLaces were not sensitive.

Model's Performance Comparison
After getting the most suitable hyper-parameters for each soft computing model, the authors proceeded to assess their performance in estimating the FAO Blaney-Criddle b factor. As explained earlier, to compare the model's performance, five statistical indices were used, i.e., the coefficient of determination (r 2 ), the mean absolute relative error (MARE), the maximum absolute relative error (MXARE), the standard deviation of the absolute relative error (DEV), and the number of samples with an error greater than 2% (NE > 2%). This evaluation was only conducted for the testing stage following the study by Trajkovic, Stankovic, and Todorovic [25]. Table 3 shows the comparative results of statistical indices getting from the present and previous studies' testing stages. By ranking the model with each statistical index and counting the frequency for five soft computing models, it was found that SVR-rbf outperformed the other methods, followed by RF, RT, M5, and SVR-poly. This is because SVR-rbf has the lowest values of MARE (%), MXARE (%), NE > 2%, and DEV (%) and the highest value of r 2 . By doing the same thing, SVR-rbf, RF, and RT gave better results than the regression-based approach proposed by Frevert, Hill, and Braaten [26], Allen and Pruitt [27], and Ambas and Evanggelos [28], while M5 and SVR-poly gave the lower performance. However, SVR-rbf's performance as compared to the RBF network was comparable due to providing a bit lower performance.  Figure 1 shows the performance of eight models in the testing stage. The left-hand side shows plotting the actual b-factor and estimated b-factor (y-axis) with the data set order (xaxis), and a scatter plot is displayed on the right-hand side. The data set order was received from the b factor tabular of FAO Blaney-Criddle [21] with 216 data sets. The authors could not plot the graph herein for the RBF network due to having no raw predicted data shown in the literature. Figure 2 presents a Taylor diagram to compare the performance of eight models, except for the RBF network, due to the same reason mentioned. Estimating b factor by Frevert, Hill, and Braaten [26], Allen and Pruitt [27], and Ambas and Evanggelos [28] were calculated by Equations (3)-(5), respectively. A Taylor diagram pointed out that SVR-rbf provided the results closest to FAO Blaney-Criddle b parameters obtained from the table as proposed by Doorenbos, Pruitt, and Agl [21], followed by Frevert, Hill, and Braaten [26], RF, RT, M5, Allen and Pruitt [27], Ambas and Evanggelos [28], and SVRpoly. Using the equation proposed by Ambas and Evanggelos [28] and SVR-poly, it gave overestimation and underestimation for the b-factor, respectively, since they have more and less standard deviation (see Figure 2). Consequently, it indicates that these two models gave more uncertainty in estimating the b-factor than other models. Figure 3 shows a set of linear equations obtained from M5. It includes six rule sets.

Models' Applicability for Estimating Monthly Reference evapotranspiration (ETo)
The monthly climatological variables at Nis, Yugoslavia, given by Trajkovic, Stankovic, and Todorovic [25], were used to demonstrate the model's applicability, as shown in Table  4. Reference evapotranspiration (ETo) in January and December is equal to zero. That is why any climatological variables of January and December do not appear in Table 4. Table  5 shows the results of applying the developed soft computing models and compares their performance to a table interpolation method [25] and the regression-based models for estimating the b-factor. Table 6 shows the difference between the b-factor obtained from various methods and a table interpolation method. The positive value means overestimation, and the negative value represents underestimation. The b-factor based on the regression-based models was mainly underestimated by 1.12-6.00% compared to those values obtained by the table interpolation method [25], except for estimating the b-factor in June using the equation developed by Frevert et al. [26]. It gave an overestimation of 1.11%. However, most of the soft computing models overestimated by 0.57-3.92% in the estimation of the b-factor. Some of them, for example, M5 provided underestimated by 0.3% in March, 1.57% in April, 3.02% in October, and 2.21% in November. Table 7 presents the estimated monthly reference evapotranspiration (ETo). Using a table interpolation method as a baseline, it is also pointed out that the soft computing models outperformed the regression-based models in ETo estimation due to giving a lower percentage of yearly difference. All three regression-based models gave underestimation by 3.2-5.1% in estimating ETo, while all six soft computing models provided some overestimation by 0.4-0.9%. Based on the data used in this study, the RBF network and RT models gave the highest performance in estimating ETo due to having the lowest percentage of yearly difference.

Models' Applicability for Estimating Monthly Reference evapotranspiration (ETo)
The monthly climatological variables at Nis, Yugoslavia, given by Trajkovic, Stankovic, and Todorovic [25], were used to demonstrate the model's applicability, as shown in Table 4. Reference evapotranspiration (ETo) in January and December is equal to zero. That is why any climatological variables of January and December do not appear in Table 4. Table 5 shows the results of applying the developed soft computing models and compares their performance to a table interpolation method [25] and the regressionbased models for estimating the b-factor. Table 6 shows the difference between the b-factor obtained from various methods and a table interpolation method. The positive value means overestimation, and the negative value represents underestimation. The b-factor based on the regression-based models was mainly underestimated by 1.12-6.00% compared to those values obtained by the table interpolation method [25], except for estimating the b-factor in June using the equation developed by Frevert et al. [26]. It gave an overestimation of 1.11%. However, most of the soft computing models overestimated by 0.57-3.92% in the estimation of the b-factor. Some of them, for example, M5 provided underestimated by 0.3% in March, 1.57% in April, 3.02% in October, and 2.21% in November.    Table 7 presents the estimated monthly reference evapotranspiration (ETo). Using a table interpolation method as a baseline, it is also pointed out that the soft computing models outperformed the regression-based models in ETo estimation due to giving a lower percentage of yearly difference. All three regression-based models gave underestimation by 3.2-5.1% in estimating ETo, while all six soft computing models provided some overestimation by 0.4-0.9%. Based on the data used in this study, the RBF network and RT models gave the highest performance in estimating ETo due to having the lowest percentage of yearly difference.

Conclusions
Accuracy of reference evapotranspiration (ETo) estimation is of importance for agricultural water management. In this study, five soft computing models, namely RF, M5, SVR-poly, SVR-rbf, and RT, were evaluated and compared their performance for estimating FAO Blaney-Criddle b-factor among themselves and the previous studies conducted by the RBF network and three regression equations (Richard G Allen et al., 1991;Frevert et al., 1983;Ambas & Evanggelos (2010). In addition, tuning hyper-parameters for each soft computing model were experimented with to receive its suitable architecture before applying them. The main findings results revealed the following.
(1) Among five soft computing models, it was found that SVR-rbf gave the highest performance in reference evapotranspiration (ETo) estimation, followed by M5, RF, SVR-poly, and RT, respectively. (2) The new explicit equations for FAO Blaney-Criddle b-factor estimation were proposed herein using the M5 model. It is a rule set, including six linear equations. (3) Compared to the RBF network [25], SVR-rbf provided a bit lower performance but outperformed three previous regression equations. (4) The soft computing models outperformed the regression-based models in the b-factor estimation since they gave the lower values of MARE (%), MXARE (%), NE > 2%, and DEV (%) and the higher value of r 2 . (5) Models' Applicability for estimating monthly reference evapotranspiration (ETo) revealed that the soft computing models outperformed the regression-based models in ETo estimation owing to the lower percentage of yearly difference. All three regression-based models underestimated ETo, while all six soft computing models slightly overestimated it. (6) This work's usefulness is to support a more accurate and convenient evaluation of reference crop evapotranspiration with a temperature-based approach. It leads to agricultural water demand estimation accuracy as necessary data for water resources planning and management.

Conflicts of Interest:
The author declares that there is no conflict of interests regarding the publication of this manuscript. In addition, the ethical issues, including plagiarism, informed consent, misconduct, data fabrication and/or falsification, double publication and/or submission, and redundancies have been completely observed by the authors.