Evaluation of Empirical Reference Evapotranspiration Models Using Compromise Programming: A Case Study of Peninsular Malaysia

Abstract: Selection of appropriate empirical reference evapotranspiration (ETo) estimation models is very important for the management of agriculture, water resources, and the environment. The statistical metrics generally used to assess the performance of empirical ETo models at the station level often give contradictory results, which makes ranking the methods a challenging task. In addition, ranking ETo estimation methods for a whole study area from their ranks at different stations is also difficult. In this study, compromise programming and group decision-making methods are proposed for ranking 31 empirical ETo models for Peninsular Malaysia based on four standard statistical metrics. The results revealed Penman-Monteith as the most suitable method for estimating ETo, followed by the radiation-based Priestley and Taylor and the mass transfer-based Dalton and Meyer methods. Among the temperature-based methods, Ivanov was found to be the best. The methodology suggested in this study can be adopted in any other region for an easy but robust evaluation of empirical ETo models.


Introduction
Evapotranspiration (ET), the process by which water is released to the atmosphere, plays a crucial role in irrigation management [1], water balance estimation [2], surface runoff estimation [3], groundwater level prediction [4], water stress assessment [5], reservoir management [6], daily flux modelling [7], and climate change impact assessment [8]. It determines crop irrigation requirements and thus informs irrigation management, the introduction of new crops, and crop scheduling to adapt to climate change [9][10][11]. It is a major component of the water balance that influences surface runoff and is therefore important for designing drainage and hydraulic structures [12,13]. In addition, it is the major component of ecological (environmental) water demand and is therefore central to assessing environmental sustainability and ecological balance [14]. It quantifies the water released from surface water bodies and reservoirs to the atmosphere and therefore informs the operation and management of water resources [15,16]. Hence, ET is considered a vital component of any hydrological and climatic study [17]. Atmospheric water vapor is an important driver of precipitation [18], and it significantly affects the retention of solar radiation and thus the air temperature of a region [19]. Assessing ET is therefore even more crucial in the context of climate change.
The most widely accepted approach to ET estimation is based on the reference evapotranspiration (ETo) [20]. In-situ measurement of ETo is expensive, time-consuming, and subject to significant uncertainties. Because of these limitations, many empirical models have been developed to estimate ETo over the last 70 years, since the introduction of the Thornthwaite method in 1944 [21]. ETo depends on the atmospheric energy balance and on the release of water from vegetation to the atmosphere [22,23]. Accordingly, ETo estimation methods are categorized by the meteorological parameters they use, although the categories differ among studies. Most commonly, they are classified into four groups: (i) water balance/mass transfer; (ii) radiation; (iii) temperature; and (iv) combinations of these. Each method has its own perspective and concept, and many were developed for a particular climatic region; some were developed by modifying other established methods. The main challenge in estimating ETo, however, is the skill of the method used [15,24]. Because most ET estimation methods were developed for a particular region with a specific viewpoint, they often perform poorly in other climatic zones. Some methods, however, were developed without reference to any particular climatic region and have been found applicable over a wide range of climates. The key challenge is therefore to select, for a given area, the model with the least error relative to in-situ measurements.
ET is a crucial element in defining the water budget and physical processes in the tropics. Condensation of the vast volume of water vapor in the tropics releases latent heat to the atmosphere, which is very important for the climate of the region. Tropical regions, particularly Southeast Asia, are rich in biodiversity, which is promoted by high rainfall and high ET, among other factors. Changes in ET can therefore have a severe impact on tropical biodiversity, and monitoring ET is very important for the region. This is particularly crucial for Peninsular Malaysia, where about 60% of the land is covered by forest with rich biodiversity.
A large number of studies have been conducted to select the most suitable ETo model in different parts of the globe [15,20,[25][26][27][28], including Peninsular Malaysia [13,[29][30][31][32]. Ali et al. [30] and Ali et al. [31] found strong agreement between the monthly average Class A pan evaporation and the FAO Penman-Monteith [33] estimates for the Muda irrigation project, the largest paddy-growing area in Malaysia, in the north of the Peninsula. Tukimat et al. [13] compared a number of temperature- and radiation-based methods with the FAO Penman-Monteith model for estimating ETo in the Muda irrigation project and found that the radiation-based models gave better estimates. Lee et al. [29] compared pan evaporation with the estimates of eight empirical models and found good agreement between pan-based ETo and the estimates of the FAO Penman-Monteith and FAO Blaney-Criddle [34] models on the west coast of the Peninsula. Muniandy et al. [32] compared pan evaporation with the estimates of 26 empirical models at a station in the south of the Peninsula and reported that the mass transfer-based Penman model provided better estimates of ETo than the other methods.
Different statistics have been used in previous studies to assess the performance of ETo models, including the root mean square error (RMSE), mean absolute error (MAE), Nash-Sutcliffe efficiency (NSE), and bias ratio. Selecting an ETo method based on a single statistic such as RMSE or NSE is questionable, as each statistic measures only one particular property. For example, RMSE measures the average distance between two time series, whereas correlation measures how closely one series follows the variation of the other. The coefficient of determination (R²) can be excellent even when the distance between the two series is large, while RMSE can be small even when one series fails to follow the variation of the other. Thus, several statistical metrics are generally used to assess the performance of different ETo methods [13,15,[25][26][27][28][29]. The major problem with using several metrics, however, is that they often provide contradictory results [35][36][37]. For example, a model may show good agreement in terms of RMSE but poor agreement in terms of R². This makes it challenging to reach a decision based on different statistics. Compromise programming (CP) [38] can be used to find the most suitable solution through a judicious compromise among different, often conflicting, objectives. CP identifies the solution that lies closest to an ideal point at which every considered objective attains its best value [39,40]. CP has been found more efficient than conventional multi-criteria decision analysis (MCDA) methods in finding the most suitable solution [38,[40][41][42][43][44].
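To make the RMSE-versus-correlation point concrete, the short Python sketch below compares two hypothetical model series against a made-up observed ETo record (values in mm/day, not data from this study). Model A is the observed series plus a constant 2 mm/day; model B fluctuates slightly around the observed mean but in the opposite direction. The helper names `rmse` and `pearson_r` are illustrative.

```python
import numpy as np

# Hypothetical daily ETo series (mm/day); not data from this study.
obs = np.array([3.0, 4.0, 5.0, 4.0, 3.0])

# Model A: follows every rise and fall, but with a constant +2 mm/day offset.
model_a = obs + 2.0
# Model B: stays close to the observed mean but moves in the opposite direction.
model_b = obs.mean() - 0.1 * (obs - obs.mean())


def rmse(sim, ref):
    """Root mean square error between two equal-length series."""
    return float(np.sqrt(np.mean((sim - ref) ** 2)))


def pearson_r(sim, ref):
    """Pearson correlation coefficient between two series."""
    return float(np.corrcoef(sim, ref)[0, 1])


for name, sim in [("A", model_a), ("B", model_b)]:
    print(f"model {name}: RMSE = {rmse(sim, obs):.2f}, r = {pearson_r(sim, obs):.2f}")
```

Model A correlates perfectly (r = 1) yet has an RMSE of 2 mm/day, whereas model B has a smaller RMSE (about 0.82 mm/day) despite r = −1, so the two metrics rank the models in opposite order.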
At a single station, ETo estimation methods can be ranked by their ability to replicate the observed ET using CP applied to a matrix of statistical indices. However, it is often necessary to recommend the best ETo model for a region based on its performance at several stations across the region. Ranking ETo models from multi-station performance is challenging, as a model often obtains different ranks at different stations. Group decision making (GDM) can be employed in such cases, whereby each ETo model is assigned an overall position based on how frequently it obtains each rank across the stations.
The objective of the present study is to use CP to rank empirical ETo models for Peninsular Malaysia. Four statistical metrics were used to assess the performance of 31 ETo models at 10 locations distributed over the Peninsula, and CP was used to rank the empirical models at each of the 10 stations. Finally, an information aggregation approach was used to rank the empirical models for the whole of Peninsular Malaysia based on the results obtained at the different stations. This is the first attempt to rank empirical ETo models using CP combined with information aggregation. The proposed method offers a judicious way to rank empirical models.

Geography and Climate of Peninsular Malaysia
Situated in the tropics, Peninsular Malaysia covers an area of 130,598 km² (Figure 1). Undulating mountains in the interior and relatively flat coasts on three sides (east, west, and south) are the major topographic features of the peninsula. About 60% of the land is covered by forest. Year-round rainfall, uniformly high temperature, and high humidity are the major characteristics of the climate of Peninsular Malaysia, and the climate is more or less homogeneous throughout the Peninsula [45,46]. Because of its geographical location, the weather is influenced by both the northeast and the southwest monsoons, and the region therefore receives a significant amount of rainfall even in the driest month. The annual average rainfall in Peninsular Malaysia varies between 1950 and 4000 mm [47], and the number of rainy days ranges between 150 and 200. The weather is hot throughout the year because of the tropical location, and humid because of the high rainfall. The mean temperature varies between 23 °C in the central highlands and 32 °C in the coastal region [46,48]. Seasonal deviations of the mean temperature from the overall mean of 27 °C are always less than 2.0 °C. Being located near the equator, the study area receives long daylight hours (about 12 h) throughout the year and thus sufficient solar radiation. The wind in Peninsular Malaysia is mostly light (0.9 to 2.3 m/s). Sunshine hours and temperature play an important role in ETo in the study area. ETo is lower in the rainy season because of fewer sunshine hours. It is also lower in the central mountainous areas (2.5 mm/day), owing to relatively higher humidity, than in the less humid coastal region (4-5 mm/day).

Data and Sources of Information
The observed daily meteorological data of temperature (mean, maximum, and minimum), relative humidity, solar radiation, wind speed, and pan evaporation from 10 meteorological stations well distributed over Peninsular Malaysia were collected from the Malaysian Meteorological Department. The locations of the meteorological stations are shown in Figure 1. A summary of the climatic variables used in the present study is given in Table 1, where RH is the relative humidity, u is the wind speed, Rs is the solar radiation, and ETpan is the pan evaporation.
Pan evaporation is an indirect and inexpensive way of estimating ET and is therefore the most widely used. Pan evaporation is multiplied by a pan coefficient to obtain ETo. The pan coefficient varies between 0.35 and 0.85, depending on the nature of the evaporating surface (land use), altitude, average humidity, and average wind speed of the site [49]. Considering the existing setup, the Department of Irrigation and Drainage of Malaysia [50] suggested a pan coefficient of 0.75 for estimating ETo from pan evaporation in Malaysia. Therefore, the observed ETo was calculated by multiplying the pan evaporation data by the pan coefficient of 0.75.
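The conversion is a single multiplication. A minimal Python sketch follows; the function name `pan_to_eto` is hypothetical, and the range check simply reflects the 0.35 to 0.85 span quoted above.

```python
def pan_to_eto(epan_mm_day, kp=0.75):
    """Convert daily pan evaporation (mm/day) to observed ETo (mm/day) as ETo = kp * Epan.

    kp defaults to 0.75, the pan coefficient suggested for Malaysia; values outside
    the 0.35-0.85 range quoted in the text are rejected.
    """
    if not 0.35 <= kp <= 0.85:
        raise ValueError("pan coefficient outside the typical 0.35-0.85 range")
    return [kp * e for e in epan_mm_day]


# Hypothetical pan readings (mm/day):
print(pan_to_eto([4.0, 6.0, 5.2]))
```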

Methodology
The performance of the empirical ETo models was assessed and ranked by comparing their estimates with the observed data. The methodology adopted in this study is summarized below.

1. ETo was estimated by the empirical models using the meteorological variables.
2. Four statistical metrics were used to evaluate how well each empirical ETo model reproduced different properties of the observed ETo at each station.
3. CP was used to integrate the results of the statistical metrics and rank the ETo models at each station.
4. GDM, an information aggregation method, was used to rank the empirical models for the entire Peninsula.

Empirical ETo Models
In this study, 31 empirical ETo models were evaluated by comparing their estimates with the observed ETo derived from pan evaporation. They were selected based on their worldwide applicability and the availability of the required input data. The models were classified into four groups according to their input parameters: 10 are temperature-based, 10 are radiation-based, 10 are mass transfer-based, and one is a combination model. ETo was calculated from the meteorological inputs at each station location without any calibration. Table 2 lists the input parameters and the equation of each of the 31 empirical models.
Table 2. List of the empirical reference evapotranspiration (ETo) models evaluated in this study, along with their input parameters and equations. They are classified into four groups: temperature-based, radiation-based, mass transfer-based, and combination.

Model | Inputs | Equation
Radiation-based
Jensen et al. [62] | Tmean, Rs | $ET_o = \frac{R_s}{\lambda}\left(0.025\,T_{mean} + 0.08\right)$
Priestley et al. […]
Mass transfer-based
Meyer [70] | Tmean, RH, u
Rohwer [71] | Tmean, RH, u
Penman [72] | Tmean, RH, u
Albrecht [73] | Tmean, RH, u
Brockamp et al. [74] | Tmean, RH, u
Mahringer [76] | Tmean, RH, u
Szasz [77] | Tmean, RH, u

ETo is the reference evapotranspiration in mm/day in all equations except the Ritchie model and the McGuinness and Bordne model, where ETo is in cm/day. Rn is the net radiation (MJ/m²/day). G is the soil heat flux (MJ/m²/day). Ra is the extraterrestrial radiation (MJ/m²/day). γ is the psychrometric constant (kPa/°C). es is the saturation vapor pressure (hPa). ea is the actual vapor pressure (hPa). es and ea are in hPa in all equations except the Papadakis, Rohwer, Penman, and FAO Penman-Monteith models, where they are in kPa. Δ is the slope of the saturation vapor pressure-temperature curve (kPa/°C). λ is the latent heat of evaporation (MJ/kg). Tmean is the average daily air temperature (°C) in all equations except the McGuinness and Bordne model, where it is in °F. u is the mean daily wind speed at 2 m (m/s). f(u) is a function of wind speed. Z is the elevation (m). L is the local latitude (degrees). Td is the dew point temperature (°C). Tmin is the minimum air temperature (°C). Tmax is the maximum air temperature (°C). TD is the difference between the maximum and minimum temperatures (°C). RH is the average relative humidity (%). Rs is the solar radiation, in MJ/m²/day in all equations except the Turc, Makkink, Ritchie, and McGuinness and Bordne models, where it is in Cal/m²/day, and the Caprio model, where it is in kJ/m²/day. ema is the saturation vapor pressure at the monthly mean daily maximum temperature (kPa). p is the mean annual percentage of daytime hours for different latitudes, which can be obtained from Doorenbos et al. [34]; Muniandy et al. [32] expressed p as a constant (0.274). Ld is the daytime length in multiples of 12 h.
RHOSAT is the saturated vapor density (g/m³). ESAT is the saturated vapor pressure (mbar). KPEC is the calibration coefficient (1.2). α is a constant (1.26). α1 is a constant (1.1).
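As an illustration of how the tabulated equations are applied, the sketch below codes the Jensen et al. equation listed above and, for comparison, the daily Penman-Monteith combination equation. Two assumptions should be noted: λ is fixed at 2.45 MJ/kg (the usual FAO simplification), and the Penman-Monteith expression is written in the standard FAO-56 grass-reference form [33], which is assumed rather than confirmed to match the exact form used here. Function names and input values are hypothetical; inputs use the units given in the notes above (es and ea in kPa for Penman-Monteith).

```python
def jensen_haise(rs, t_mean, lam=2.45):
    """Radiation-based Jensen et al. model: ETo = Rs / lambda * (0.025 * Tmean + 0.08).

    rs: solar radiation (MJ/m2/day); t_mean: mean air temperature (deg C);
    lam: latent heat of vaporization (MJ/kg), 2.45 assumed. Returns ETo in mm/day.
    """
    return rs / lam * (0.025 * t_mean + 0.08)


def fao56_penman_monteith(rn, g, t_mean, u2, es, ea, delta, gamma):
    """Daily FAO-56 Penman-Monteith grass-reference ETo (mm/day).

    rn, g: net radiation and soil heat flux (MJ/m2/day)
    t_mean: mean daily air temperature (deg C); u2: wind speed at 2 m (m/s)
    es, ea: saturation and actual vapor pressure (kPa)
    delta: slope of the saturation vapor pressure curve (kPa/deg C)
    gamma: psychrometric constant (kPa/deg C)
    """
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * (900.0 / (t_mean + 273.0)) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))


# Illustrative inputs only:
print(round(jensen_haise(rs=20.0, t_mean=27.0), 2))
print(round(fao56_penman_monteith(rn=13.28, g=0.0, t_mean=16.9, u2=2.078,
                                  es=1.997, ea=1.409, delta=0.122, gamma=0.0666), 2))
```

With Rs = 20 MJ/m²/day and Tmean = 27 °C, for instance, the Jensen et al. equation gives about 6.2 mm/day.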

Statistical Indices
Four statistical metrics were used to measure how well each empirical model estimated the observed ETo at each station: the normalized root mean square error (NRMSE), percentage of bias (%BIAS), modified index of agreement (md), and Kling-Gupta efficiency (KGE). NRMSE is a measure of accuracy, as it quantifies the magnitude of the errors in the modeled data [78]. %BIAS quantifies the tendency of the empirical models to under- or overestimate the observed data [36]. The md summarizes the additive and proportional differences between the observed and modeled ETo means and variances. KGE integrates the linear correlation (r), bias ratio (β), and variability ratio (γ) of the observed and modeled data [35,79]. Table 3 presents the equation, range, and optimum value of each metric.
Table 3. The metric equations, ranges, and optimum values.

Metric | Equation | Range | Optimum Value
ETo,m,i and ETo,obs,i are the i-th modeled and observed ETo values; n is the number of observations; j is an arbitrary positive power; r is the Pearson correlation coefficient; β is the bias ratio; and γ is the variability ratio of the modeled and observed data.
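The four metrics can be computed in a few lines of Python. The sketch below uses their common textbook forms; three details are assumptions rather than choices stated in the text: NRMSE is normalized by the sample standard deviation of the observations and expressed in percent, %BIAS is positive when a model overestimates, and γ in KGE is taken as the ratio of the coefficients of variation (the modified KGE form).

```python
import numpy as np


def _as_arrays(sim, obs):
    """Convert inputs to float arrays of modeled and observed ETo."""
    return np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)


def nrmse(sim, obs):
    """RMSE as a percentage of the sample standard deviation of obs (normalization assumed)."""
    sim, obs = _as_arrays(sim, obs)
    rmse = np.sqrt(np.mean((sim - obs) ** 2))
    return 100.0 * rmse / np.std(obs, ddof=1)


def pbias(sim, obs):
    """Percent bias; positive values mean overestimation (sign convention assumed)."""
    sim, obs = _as_arrays(sim, obs)
    return 100.0 * np.sum(sim - obs) / np.sum(obs)


def md(sim, obs, j=1):
    """Modified index of agreement with an arbitrary positive power j (optimum 1)."""
    sim, obs = _as_arrays(sim, obs)
    obs_mean = obs.mean()
    num = np.sum(np.abs(obs - sim) ** j)
    den = np.sum((np.abs(sim - obs_mean) + np.abs(obs - obs_mean)) ** j)
    return 1.0 - num / den


def kge(sim, obs):
    """Kling-Gupta efficiency from r, beta (mean ratio) and gamma (CV ratio); optimum 1."""
    sim, obs = _as_arrays(sim, obs)
    r = np.corrcoef(sim, obs)[0, 1]
    beta = sim.mean() / obs.mean()
    gamma = (sim.std() / sim.mean()) / (obs.std() / obs.mean())
    return 1.0 - np.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)
```

For a model that is 10% too high at every time step, %BIAS is +10 and KGE is 0.9, because r and γ remain 1 while β = 1.1.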

Compromise Programming
Compromise programming (CP) was used to integrate the results of the statistical metrics described above and thereby select the most accurate empirical ETo model. CP ranks the empirical methods by the distance of each method from an ideal point [42,80]. The CP index (CPI) can be calculated as follows:

$$\mathrm{CPI} = \left( \sum_{i=1}^{4} \left| x_i - x_i^{*} \right|^{p} \right)^{1/p}$$
where i indexes the statistical metrics, $x_i$ is the normalized value of metric i for the empirical model, and $x_i^{*}$ is the normalized ideal value of metric i. The parameter p determines how the distance of a solution from the ideal point is measured. It can take any value from 1 to ∞, but 1, 2, and ∞ are the most commonly used in CP [81,82] and were therefore used in this study to estimate the CPI. When p = 1, each deviation from $x_i^{*}$ contributes in direct proportion to its magnitude. When p = 2, larger deviations receive greater influence. When p = ∞, the CPI reduces to the largest single deviation, so the preferred model is the one whose maximum deviation is smallest. Details of the method can be found in [37,80].
In this study, all four statistical metrics were considered equally important; therefore, the criterion weights of the CP method proposed by Zeleny [38] were not applied. The CPI ranges from zero to positive infinity, with zero being the most preferable value.
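A minimal Python sketch of the CPI formula for p = 1, 2, and ∞ follows. The two models X and Y and their normalized metric values are hypothetical (the ideal point is set to zero for every metric), chosen to show how the choice of p can change which model is preferred.

```python
import math


def cpi(x, x_ideal, p):
    """Compromise-programming distance of one model from the ideal point.

    x, x_ideal: normalized metric values of the model and of the ideal point.
    p: distance parameter; math.inf gives the largest single deviation.
    """
    deviations = [abs(a - b) for a, b in zip(x, x_ideal)]
    if math.isinf(p):
        return max(deviations)
    return sum(d ** p for d in deviations) ** (1.0 / p)


# Hypothetical normalized metrics (NRMSE, %BIAS, md, KGE) for two models.
ideal = [0.0, 0.0, 0.0, 0.0]
models = {"X": [0.4, 0.4, 0.0, 0.0], "Y": [0.7, 0.0, 0.0, 0.0]}

for p in (1, 2, math.inf):
    scores = {name: cpi(values, ideal, p) for name, values in models.items()}
    best = min(scores, key=scores.get)  # lowest CPI is preferred
    print(p, {name: round(s, 3) for name, s in scores.items()}, "best:", best)
```

Model Y is closer to the ideal for p = 1 (0.7 versus 0.8), but model X is preferred for p = 2 and p = ∞ because Y's single large deviation is penalized more heavily.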

Ranking the Empirical ETo Models
Ranking the empirical models from their performance at several stations is challenging, because a model may show different degrees of accuracy at different locations.
To overcome this challenge, information aggregation methods, such as mean ranking, majority of ranks, and frequency of occurrence, are useful [42,83]. They integrate information from different sources to support decision-making [84]. In this study, the empirical models were ranked using GDM. The ranking procedure is outlined below.

1. The empirical models were ranked at the station level using their CPI (from 1 to 31; the model with the lowest CPI was ranked 1st).
2. The frequency of occurrence (F) of each model at each rank across all stations was calculated as a 31 × 31 matrix.
3. Each rank position r was given a weight equal to the inverse of the rank, $w_r = 1/r$.
4. The frequency of occurrence of a model at a given rank, obtained in Step 2, was multiplied by the weight of that rank, obtained in Step 3.
5. The overall score of each ETo model (Wm) was obtained by summing the products from Step 4, as presented in Equation (6):
$$W_m = \sum_{r=1}^{31} F_{m,r}\, w_r \qquad (6)$$
where $F_{m,r}$ is the number of stations at which model m obtained rank r.
6. The empirical models were ranked according to their overall score, with the highest-scoring model ranked 1st.
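The six steps above can be sketched in Python as follows. The station names, model names, and ranks in the example are hypothetical; the function names `gdm_scores` and `overall_ranking` are illustrative, and the frequency matrix F is stored sparsely as one counter per model rather than as a full 31 × 31 array.

```python
from collections import Counter


def gdm_scores(station_ranks):
    """Aggregate station-level ranks into an overall score W_m per model.

    station_ranks maps each station to a dict {model: rank}, where rank 1 is best.
    Each occurrence of rank r contributes the weight w_r = 1 / r, so
    W_m = sum over r of F[m, r] / r.
    """
    freq = {}  # model -> Counter of rank frequencies F[m, r]
    for ranks in station_ranks.values():
        for model, rank in ranks.items():
            freq.setdefault(model, Counter())[rank] += 1
    return {m: sum(count / r for r, count in f.items()) for m, f in freq.items()}


def overall_ranking(station_ranks):
    """Return the models sorted by descending W_m (highest score ranked 1st)."""
    scores = gdm_scores(station_ranks)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical ranks of three models at three stations.
example_ranks = {
    "S1": {"A": 1, "B": 2, "C": 3},
    "S2": {"A": 2, "B": 1, "C": 3},
    "S3": {"A": 1, "B": 3, "C": 2},
}
print(gdm_scores(example_ranks))
print(overall_ranking(example_ranks))
```

In this example, model A scores 1 + 1/2 + 1 = 2.5 and is ranked first. Applied to the rank pattern reported for FAO Penman-Monteith (1st at four stations, 2nd at one, 3rd at four, and 4th at one), the same function gives Wm ≈ 6.08.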

Results
ETo was estimated with all 31 empirical models at each station using the meteorological variables. Figure 2 shows heat-scatter plots of observed ETo against the estimates of each empirical model for all stations. As Figure 2a-t shows, all temperature- and radiation-based models tend to overestimate the observed ETo, except the Ivanov and Makkink models. The overestimation was generally smaller for the radiation-based models (Figure 2k-t) than for the temperature-based models, which suggests that the overestimation may be due to the exclusion of other factors influencing ETo in the study area. The Ritchie model heavily overestimated the observed ETo (Figure 2q). Overall, the estimates of the mass transfer-based models (Figure 2u-ad) were more closely aligned with the 1:1 line than those of the temperature- and radiation-based methods. The Penman, WMO, and Mahringer models underestimated the observations.

Evaluation Using Statistical Metrics
The statistical metrics obtained at all 10 stations by comparing the observed ETo with the estimates of the different empirical models are presented as box plots in Figure 3. The blue, green, gold, and pink box plots represent the temperature-, radiation-, mass transfer-, and combination-based methods, respectively, and the red vertical lines mark the optimum value of each metric. Overall, most of the temperature-based methods performed poorly in estimating ETo. Among them, the Ivanov model was preferable, with a median NRMSE of 108.8, median %BIAS of 0.70%, median md of 0.51, and median KGE of 0.44.

Radiation-based methods performed better than temperature-based methods in estimating ETo; among them, the Makkink model performed best and the Ritchie model worst. As shown in Figure 3, the FAO Penman-Monteith model had the lowest median NRMSE and the highest median md and KGE. However, the Rohwer model had a lower median %BIAS than the FAO Penman-Monteith model. The Rohwer, Meyer, and Makkink models had similar NRMSE, but of these three, Rohwer had the lowest %BIAS and Makkink the highest md and KGE medians. Because the metrics thus favor different models, it is important to use CP to integrate them and reach a concrete evaluation decision.

Compromise Programming
CP was employed to integrate the statistical metrics and rank the empirical models by their capability to estimate the observed ETo in Peninsular Malaysia. It was used to measure the distance of each empirical model from an ideal point at each station separately. For example, the ideal values at Kuantan station were the lowest NRMSE (93.10), the %BIAS nearest to zero (3.70), and the highest md and KGE (0.58 for both). The CPI of each model was calculated by summing the absolute differences between each of its metrics and the corresponding ideal value. The following equation presents an example of the CPI calculation for the Ivanov model at Kuantan.
The same procedure was used to calculate the CPI of the remaining models. Figure 4 shows a level plot of the CPI for each empirical model at Kuantan station.
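The station-level procedure can be sketched in Python as below. The three models and their metric values are hypothetical, chosen only to be of a similar order to the Kuantan figures. Following the Kuantan description, the distance uses p = 1 on unnormalized metrics; as an assumption, the %BIAS term compares absolute biases, because the ideal is the bias nearest to zero.

```python
# Hypothetical metric values at one station: (NRMSE, %BIAS, md, KGE).
metrics = {
    "model_1": (95.0, 8.0, 0.58, 0.55),
    "model_2": (110.0, 3.7, 0.50, 0.58),
    "model_3": (140.0, -25.0, 0.40, 0.20),
}


def ideal_point(table):
    """Best value of each metric: lowest NRMSE, |%BIAS| nearest 0, highest md and KGE."""
    values = list(table.values())
    return (
        min(v[0] for v in values),
        min(abs(v[1]) for v in values),
        max(v[2] for v in values),
        max(v[3] for v in values),
    )


def cpi_p1(v, ideal):
    """p = 1 distance: sum of absolute differences from the ideal (|%BIAS| for bias)."""
    return (abs(v[0] - ideal[0]) + abs(abs(v[1]) - ideal[1])
            + abs(v[2] - ideal[2]) + abs(v[3] - ideal[3]))


ideal = ideal_point(metrics)
cpis = {m: cpi_p1(v, ideal) for m, v in metrics.items()}
# Lowest CPI receives rank 1 at this station.
station_rank = {m: i + 1 for i, m in enumerate(sorted(cpis, key=cpis.get))}
print(ideal)
print(cpis)
print(station_rank)
```

Because the metrics are on different scales, the NRMSE term dominates this unnormalized sum; normalizing the metrics, as in the Methodology, would reduce that imbalance.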

Ranking the Empirical ETo Models
Ranking of the empirical models was done through the six-step procedure stated in Section 3.4. First, the CPI values at each station were used to rank the empirical models, with the model having the lowest CPI at a station ranked 1st and the model having the highest CPI ranked last. For example, the FAO Penman-Monteith model had the lowest CPI (7.60) at Kuantan station (Figure 4) and was therefore ranked 1st, followed by the Dalton model, which had the second-lowest CPI (20.08). The rank of each model at each station in Peninsular Malaysia is illustrated as a level plot in Figure 5.
Next, the frequency with which each model achieved each rank across the stations was calculated. For example, the FAO Penman-Monteith model had the lowest CPI at the Kuala Terengganu, Kuantan, Melaka, and Muadzam Shah stations and was therefore ranked 1st at these stations (Figure 5), giving a frequency of four for the 1st rank. Figure 6 shows, as a level plot, the complete frequency of occurrence of each empirical model at each rank. For example, the FAO Penman-Monteith model was ranked 1st four times, 2nd once, 3rd four times, and 4th once. In contrast, the FAO Blaney-Criddle and Kharuffa models had the highest frequency (10 times) of obtaining the 31st and 30th ranks, respectively.

The frequencies of occurrence of the rank positions were then multiplied by the rank weights, and the overall score of each empirical model (Wm) was calculated by summing the products.
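For example, the FAO Penman-Monteith model obtained the 1st, 2nd, 3rd, and 4th ranks 4, 1, 4, and 1 times, respectively. Its Wm was therefore calculated as shown in Equation (8):

$$W_m = 4 \times \frac{1}{1} + 1 \times \frac{1}{2} + 4 \times \frac{1}{3} + 1 \times \frac{1}{4} \approx 6.08 \qquad (8)$$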
Based on Wm, the models were finally ranked as shown in Table 4. The FAO Penman-Monteith model was the top-ranked model in this study, followed by the Priestley-Taylor and Dalton models. The Hamon, Kharuffa, and FAO Blaney-Criddle models were ranked as the last three. The ranking of the ETo models for different values of p is presented in Table 5. The results revealed slight variation in the ranks of a few models. For example, Makkink was ranked 5th for p = 1 and p = ∞ but 3rd for p = 2. However, FAO Penman-Monteith was the most suitable method for all values of p, Priestley-Taylor was the best radiation-based model, and Ivanov was the best temperature-based model in all cases. The best mass transfer-based model, however, was not consistent across values of p: Dalton was best for p = 1, while Meyer was best for p = 2 and p = ∞. Therefore, both can be considered the most suitable mass transfer-based ETo models for Peninsular Malaysia.

Table 5. Ranks of the empirical ETo models for different values of p.
Model | p = 1 | p = 2 | p = ∞
… | 7 | 9 | 7
Trabert | 8 | 7 | 6
Mahringer | 9 | 8 | 9
Ivanov | 10 | 11 | 12
Albrecht | 11 | 10 | 10
Brockamp and Wenner | 12 | 12 | 11
Irmak-Rn | 13 | 20 | 15
Abtew | 14 | 14 | 13
Turc | 15 | 15 | 14
Szasz | 16 | 16 | …
Penman | 23 | 22 | 23
Linacre | 24 | 25 | 24
Ritchie | 25 | 24 | 25
Schendel | 26 | 27 | 27
Ravazzani | 26 | 24 | 26
Trajkovic | 28 | 28 | 28
Hamon | 29 | 29 | 29
Kharuffa | 30 | 30 | 30
FAO Blaney-Criddle | 31 | 31 | 31

Discussion
The FAO Penman-Monteith method has been found to be the most efficient for the estimation of ETo in different climatic regions. The FAO Penman-Monteith model was developed based on physiological and aerodynamic theories of water release from the surface to the atmosphere. Therefore, it can be used as a standard model for the estimation of ETo in any region without any adjustment of parameters [24], based on the FAO recommendation [13,15]. The present study also found FAO Penman-Monteith to be the best model for Peninsular Malaysia. However, the Penman-Monteith model needs a large number of meteorological variables, including air temperature, wind speed, relative humidity, and solar radiation, for the estimation of ETo. It is often very difficult to obtain data for all of these variables, and many meteorological stations in developing countries do not measure them all. Therefore, a large number of alternative models have been developed based on the availability of data. The success of these models in a particular area often depends on the climate of the region. Selecting an appropriate model based on both data availability and model performance is therefore a difficult task. The performance of 31 empirical ETo models, which differ in their input requirements, has been assessed in this study.
The performance of empirical ETo models was often found to vary from station to station within the same climate zone. This may be due to differences in the period and quality of the data used, and to uncertainty in the coefficient values used for the estimation of ETo. Moreover, recommending different models for different stations complicates the practical application of ETo estimation models. Therefore, a single model is often recommended at the regional level. Accordingly, this study used an information aggregation approach to combine the station-level rankings of the ETo estimation models into a single ranking for the whole of Peninsular Malaysia.
In the present study, the radiation-based Priestley and Taylor model was found to perform best after the FAO Penman-Monteith model, followed by the mass transfer-based Dalton and Meyer models. The Priestley and Taylor model needs three meteorological variables (mean air temperature, solar radiation, and relative humidity), compared to the five variables required by the Penman-Monteith model (temperature, solar radiation, relative humidity, wind speed, and saturated vapor pressure). Dalton and Meyer need three meteorological variables (mean temperature, relative humidity, and wind speed). Depending on the availability of meteorological data, an appropriate model can therefore be selected for the estimation of ETo in Peninsular Malaysia with broadly similar accuracy.
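Selecting a model by data availability amounts to walking down the preference order and taking the first model whose inputs are all measured. The Python sketch below shows this. It is a hypothetical helper, not part of the study. The preference order follows the ranking reported in this section (Penman-Monteith, Priestley-Taylor, Dalton or Meyer, and Ivanov as the best temperature-based model, whose inputs are given in the Conclusions). The variable labels are illustrative names.

```python
# Hypothetical helper: candidate models in the order of preference reported in
# this section, each with the meteorological inputs listed in the text.
# Saturation vapour pressure, also listed for Penman-Monteith, is computed from
# air temperature, so it is not treated as a separate measured input here.
MODEL_INPUTS = [
    ("FAO Penman-Monteith",
     {"temperature", "solar_radiation", "relative_humidity", "wind_speed"}),
    ("Priestley-Taylor", {"temperature", "solar_radiation", "relative_humidity"}),
    ("Dalton or Meyer", {"temperature", "relative_humidity", "wind_speed"}),
    ("Ivanov", {"temperature", "relative_humidity"}),
]


def select_model(available):
    """Return the most preferred model whose inputs are all available, else None."""
    available = set(available)
    for name, required in MODEL_INPUTS:
        if required <= available:  # every required variable is measured
            return name
    return None


print(select_model({"temperature", "relative_humidity", "wind_speed"}))  # Dalton or Meyer
print(select_model(["temperature", "relative_humidity"]))                 # Ivanov
```

Dalton and Meyer share the same inputs and swap places depending on p, so the sketch treats them as a single option.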
Among the temperature-based models, only the Ivanov model was found to perform satisfactorily. It was ranked 10th among the 31 models compared in the present study. The other temperature-based models performed the worst and occupied most of the lowest ranks. Many of the models were developed for a particular climate. For example, the Priestley and Taylor and the Makkink models were developed for the estimation of ETo in humid climates, whereas the Turc model was found suitable for ETo estimation in cold, humid, and arid climates [26]. Accordingly, the Priestley and Taylor and Makkink models performed very well among the radiation-based models, while Turc performed worse than the simple temperature-based Ivanov model in tropical Malaysia.
The findings of the present study contradict those of earlier studies. Ali and Lee [31] found Blaney-Criddle to be the most suitable model after Penman-Monteith for the estimation of ETo at Alor Setar station in Peninsular Malaysia. They used only the relative error to assess the performance of the empirical ETo models. Tukimat et al. [13] assessed the performance of seven empirical ET models for the same station using three statistical metrics, namely absolute error, relative error, and correlation coefficient. They found the lowest absolute and relative errors for Hargreaves-Samani but the highest correlation for Makkink, followed by Priestley-Taylor and Turc. They concluded overall that radiation-based models are the most suitable for the estimation of ETo in the region, which also supports the findings of the present study. However, they could not identify the best model because the statistical metrics contradicted one another. Lee et al. [29] compared the performance of eight empirical models using the mean absolute error. They reported FAO Blaney-Criddle as the most suitable model after FAO Penman-Monteith for the estimation of ETo on the west coast of the peninsula. Muniandy et al. [32] compared the performance of 26 empirical models at Senai station using eight statistical metrics. They also obtained contradictory results across the different statistics. They ranked the models by the arithmetic mean of the statistics. On this basis, they found Penman to be the best among the mass transfer-based models, McGuinness and Bordne among the radiation-based models, and Szasz among the temperature-based models. The models reported as the best at different stations in Peninsular Malaysia in the above studies do not match the findings of the present study. This discrepancy can be attributed partly to the use of a single statistic for decision-making, as in the studies of Lee et al. [29] and Ali and Lee [31]. Tukimat et al. [13] and Muniandy et al. [32] used multiple statistics. However, Tukimat et al. [13] did not attempt to identify the best ET model from them, and Muniandy et al. [32] ranked the models by the average of the statistics. Such an average does not provide an optimum solution because the ranges of the statistical metrics vary widely.
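The sketch below shows why a plain average of statistics can mislead. It uses three hypothetical models and two hypothetical error statistics on very different scales. The numbers are invented for illustration and are not taken from this study or from Muniandy et al. [32]. Min-max rescaling is used here only as one simple way to put the statistics on a common scale.

```python
# Hypothetical error statistics for three hypothetical models (lower is better
# for both): RMSE in mm/day and absolute percent bias in %.
stats = {
    "A": {"rmse": 0.40, "abs_pbias": 12.0},
    "B": {"rmse": 0.90, "abs_pbias": 11.0},
    "C": {"rmse": 0.60, "abs_pbias": 30.0},
}


def rank_by_raw_mean(stats):
    """Rank models by the plain arithmetic mean of their statistics."""
    means = {m: sum(s.values()) / len(s) for m, s in stats.items()}
    return sorted(means, key=means.get)


def rank_by_scaled_mean(stats):
    """Rank models after rescaling each statistic to [0, 1] by its range."""
    metrics = list(next(iter(stats.values())))
    lo = {k: min(s[k] for s in stats.values()) for k in metrics}
    hi = {k: max(s[k] for s in stats.values()) for k in metrics}

    def scaled(s, k):
        span = hi[k] - lo[k]
        return (s[k] - lo[k]) / span if span else 0.0

    means = {m: sum(scaled(s, k) for k in metrics) / len(metrics)
             for m, s in stats.items()}
    return sorted(means, key=means.get)


print(rank_by_raw_mean(stats))     # ['B', 'A', 'C']: dominated by the % bias
print(rank_by_scaled_mean(stats))  # ['A', 'B', 'C']
```

Model A has a much lower RMSE than B and only a slightly larger bias. The raw average still prefers B, because the percent-bias values are numerically much larger than the RMSE values.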
CP was used in this study to identify the most suitable empirical models based on four statistical metrics that together measure the similarity between two time series in a robust way. CP has been shown to be more robust than many other MCDA methods in finding a reliable solution when multiple objectives conflict. Therefore, the best empirical models identified in this study using CP can be considered more reliable. In addition, the empirical models were ranked for the entire Peninsular Malaysia, given that the whole region has the same tropical humid climate. For this purpose, the study used an information aggregation method, which ranks the models by the frequency of the ranks they obtained at different stations. As a result, the models ranked highest at individual stations also achieved the top ranks for the entire peninsula. This indicates that the rankings obtained in this study for the entire peninsula can be used to select the most suitable model for reliable estimation of ETo, given the available data.
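The compromise programming step can be sketched as a distance from an ideal point. This is a minimal sketch under stated assumptions, not the study's exact CPI formulation, which is not restated in this section. It assumes equal metric weights. It takes the ideal and anti-ideal value of each metric as the best and worst values among the candidate models. It uses hypothetical metric values for four hypothetical models. The parameter p selects the distance: p = 1 sums the scaled deviations, p = 2 gives a Euclidean distance, and p = ∞ keeps only the largest deviation.

```python
import math


def cp_index(values, higher_is_better, p):
    """Compromise-programming distance of each model from the ideal point.

    values: {model: [metric_1, metric_2, ...]} (same metric order for all)
    higher_is_better: one bool per metric
    p: 1, 2, ... or math.inf

    Each metric is scaled by its best-to-worst range across the models, so
    metrics on different scales contribute comparably. Equal weights assumed.
    """
    models = list(values)
    best, worst = [], []
    for k, higher in enumerate(higher_is_better):
        column = [values[m][k] for m in models]
        best.append(max(column) if higher else min(column))
        worst.append(min(column) if higher else max(column))

    cpi = {}
    for m in models:
        d = []
        for k in range(len(higher_is_better)):
            span = abs(best[k] - worst[k])
            d.append(abs(best[k] - values[m][k]) / span if span else 0.0)
        cpi[m] = max(d) if math.isinf(p) else sum(x ** p for x in d) ** (1 / p)
    return cpi


# Hypothetical values: [RMSE (lower is better), R (higher is better)]
values = {
    "A": [0.40, 0.77],
    "B": [0.60, 0.85],
    "C": [0.80, 0.95],
    "D": [0.72, 0.75],
}
for p in (1, 2, math.inf):
    cpi = cp_index(values, [False, True], p)
    print(p, min(cpi, key=cpi.get))  # A for p = 1; B for p = 2 and p = inf
```

The model with the lowest CPI is ranked first. In this toy case the top-ranked model changes between p = 1 and p = 2 or ∞, which is the same kind of sensitivity observed for the Dalton and Meyer models.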

Conclusions
The CP and GDM methods were used in this study to rank 31 empirical ETo models for Peninsular Malaysia, based on four statistical metrics applied at 10 locations distributed over the study area. The results revealed Priestley and Taylor as the most suitable among the radiation-based models, Dalton and Meyer among the mass transfer-based models, and Ivanov among the temperature-based models for the region. Although the mass transfer-based models were found to be more reliable than the radiation-based models, Priestley and Taylor was found to be the most suitable after Penman-Monteith, which is globally considered the standard model for ETo estimation. The Priestley and Taylor model needs only mean air temperature, solar radiation, and relative humidity, whereas Penman-Monteith requires a larger number of meteorological variables. Therefore, Priestley and Taylor can be used in place of Penman-Monteith when the available data are limited. The present study suggests that the Ivanov model, which requires only mean temperature and relative humidity, can be used when data availability is most limited.
ETo in this study was estimated using a pan coefficient of 0.75, as suggested by the Department of Irrigation and Drainage of Malaysia. Future work could test how sensitive the ranking of ETo estimation methods is to different pan coefficients. CP and GDM were used in this study to decide among the ETo models. Other decision-making and information aggregation methods could also be applied in the future, and their results compared with the findings of the present study. In addition, the parameters of the empirical models could be calibrated for Peninsular Malaysia before the models are compared and ranked.
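The pan-based reference values follow the standard pan relation used in FAO-56, ETo = K_p × E_pan. As a worked example, assume a hypothetical pan evaporation of 5.0 mm/day (not a value from this study). The reference ETo then scales linearly with the pan coefficient, so any change in K_p propagates proportionally into the reference series against which the models are ranked:

```latex
ET_o = K_p \, E_{\mathrm{pan}}, \qquad
E_{\mathrm{pan}} = 5.0~\mathrm{mm\,d^{-1}}:\quad
K_p = 0.70 \Rightarrow 3.50, \quad
K_p = 0.75 \Rightarrow 3.75, \quad
K_p = 0.80 \Rightarrow 4.00~\mathrm{mm\,d^{-1}}
```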