## 1. Introduction

Evapotranspiration (ET), the process of water release to the atmosphere, plays a crucial role in irrigation management [1], water balance estimation [2], surface runoff estimation [3], groundwater level prediction [4], water stress assessment [5], reservoir management [6], daily flux modelling [7], and climate change impact assessment [8]. It determines the crop irrigation requirement and thus informs irrigation management, the introduction of new crops, and crop scheduling to adapt to climate change [9,10,11]. It is a major component that defines surface runoff and is therefore important for designing drainage and hydraulic structures [12,13]. In addition, it is the major component that determines the ecological or environmental water demand and thus underpins the assessment of environmental sustainability or ecological balance [14]. It provides an assessment of water release from surface water bodies and reservoirs to the atmosphere and therefore supports the operation and management of water resources [15,16]. Hence, ET is considered a vital component of any hydrological and climatic study [17]. Atmospheric water is an important driving factor of precipitation [18]. It has a significant effect on the retention of solar radiation and thus on the air temperature of a region [19]. Therefore, the assessment of ET becomes even more crucial in the context of climate change.

The most widely accepted approach to ET estimation is to measure the reference evapotranspiration (ET_{o}) [20]. In-situ measurement of ET_{o} is expensive, time-consuming, and subject to significant uncertainties. Because of these limitations, many empirical models have been developed to estimate ET_{o} in the last 70 years, since the introduction of the Thornthwaite method in 1944 [21]. ET_{o} depends on the atmospheric energy balance and the release of water to the atmosphere from vegetation [22,23]. Therefore, ET_{o} estimation methods are categorized according to the meteorological parameters they use. ET_{o} methods have been divided into different categories in different studies. Most widely, they are classified into four groups: (i) water balance/mass transfer; (ii) radiation; (iii) temperature; and (iv) a combination of the aforementioned. Each method has its own perspective and concepts, and was often developed for a particular climatic region. A few of them were developed through modification of other established methods. However, the main challenge in the estimation of ET_{o} is the skill of the method used [15,24]. Most ET estimation methods were developed for a particular region with a specific viewpoint, and therefore they are often found inefficient in estimating ET_{o} in other climatic zones. However, some methods were developed without focusing on any climatic region and have been found applicable over a wide range of climates. The major challenge lies in selecting, for a given area, the model with the least error compared to in-situ measurements.

ET is a crucial element in defining the water budget and physical processes in the tropics. The condensation of the vast volume of water vapor in the tropical region releases latent heat energy to the atmosphere, which is very important for the climatology of the region. Tropical regions, particularly the Southeast Asian tropical region, are rich in biodiversity. This rich biodiversity is promoted by high rainfall and high ET, among other factors. Changes in ET can have a severe impact on tropical biodiversity, and therefore monitoring of ET is very important for the region. It is particularly crucial for Peninsular Malaysia, where about 60% of the land is covered with forest with dense biodiversity.

A large number of studies have been conducted to select the most suitable ET_{o} model in different parts of the globe [15,20,25,26,27,28], including Peninsular Malaysia [13,29,30,31,32]. Ali et al. [30] and Ali et al. [31] found a strong agreement of the monthly average of class A pan evaporation with the FAO Penman-Monteith [33] estimation for the Muda irrigation project, the largest paddy field in Malaysia, in the north of the Peninsula. Tukimat et al. [13] compared a number of temperature- and radiation-based methods with the FAO Penman-Monteith model to estimate ET_{o} in the Muda irrigation project, and found that the radiation-based models give better estimates of ET_{o}. Lee et al. [29] compared pan evaporation with the estimates of eight empirical models and found a good agreement between the pan evaporation estimates of ET_{o} and the estimates of the FAO Penman-Monteith and FAO Blaney-Criddle [34] models on the west coast of the Peninsula. Muniandy et al. [32] compared the pan evaporation estimates with 26 empirical model estimations at a station located in the south of the Peninsula, and reported that the mass transfer-based Penman model can provide better estimates of ET_{o} compared to other methods.

Different statistics have been used in previous studies to assess the performance of ET_{o} models, including root mean square error (RMSE), mean absolute error (MAE), Nash-Sutcliffe efficiency (NSE), and bias ratio. Selecting an ET_{o} method based on a single statistic like RMSE or NSE is questionable, as each statistic measures only a particular property. For example, RMSE provides a measure of the mean distance between two time series, while correlation indicates how closely two time series follow each other in their variation. The coefficient of determination (R^{2}) can be excellent even if the distance between the two series is large, while RMSE can be small even if one time series fails to follow the variation of the other. Thus, a number of statistical metrics are generally used to assess the performance of different ET_{o} methods [13,15,25,26,27,28,29]. However, the major problem with using several statistical metrics is that different metrics often provide contradictory results [35,36,37]. For example, a model may show good agreement in terms of RMSE but poor agreement in terms of R^{2}. Thus, it often becomes challenging to make a decision based on different statistics.
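The disagreement between RMSE and R^{2} described above can be illustrated with a minimal sketch. The series below are hypothetical values chosen for illustration, not data from the study, and the helper names `rmse` and `r_squared` are assumptions of this example.

```python
import numpy as np

def rmse(obs, sim):
    """Root mean square error between observed and simulated series."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def r_squared(obs, sim):
    """Squared Pearson correlation between the two series."""
    r = np.corrcoef(obs, sim)[0, 1]
    return float(r ** 2)

# Hypothetical daily ET_o values (mm/day), for illustration only.
obs = np.array([3.0, 4.0, 5.0, 4.0, 3.0, 4.0, 5.0, 4.0])

# Model A follows the observed variation exactly but with a constant +2 offset.
sim_a = obs + 2.0
# Model B stays close to the observed mean but does not follow the variation.
sim_b = np.array([4.2, 3.8, 4.2, 3.8, 4.2, 3.8, 4.2, 3.8])

for name, sim in [("A", sim_a), ("B", sim_b)]:
    print(name, "RMSE =", round(rmse(obs, sim), 3), "R2 =", round(r_squared(obs, sim), 3))
```

Model A would be preferred on R^{2} and model B on RMSE, which is the kind of conflict that motivates combining several metrics.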

Compromise programming (CP) [38] can be used to find the most suitable solution through judicious compromising of different objectives, among which many may be conflicting. CP attempts to identify a solution where all the considered objectives achieve the most suitable value [39,40]. CP has been found more efficient compared to conventional multi-criteria decision analysis (MCDA) methods in finding the most suitable solution [38,40,41,42,43,44].

ET_{o} estimation methods can be ranked at a single station, based on their ability to replicate the observed ET, using CP and a matrix of statistical indices. However, it is often required to suggest the best ET_{o} model for a region based on the performance at different stations over the region. Ranking ET_{o} models based on their performance at multiple stations becomes challenging, as different ET_{o} models often show different ranks at different stations. Group decision making (GDM) can be employed for such cases, where each ET_{o} model is given a position based on the frequency of the ranks it obtained at different stations.

The objective of the present study is to use CP for the ranking of empirical ET_{o} models for Peninsular Malaysia. Four statistical metrics were used to assess the performance of 31 ET_{o} models at 10 locations distributed over the Peninsula. CP was used for ranking the empirical models at each of the 10 stations. Finally, an information aggregation approach was used to rank the empirical models for the entire Peninsular Malaysia based on the results obtained at the different stations. This is the first attempt to rank empirical ET_{o} models using CP and the information aggregation approach. The method proposed in this study can be used for the ranking of empirical models in a prudent way.

## 4. Results

ET_{o} was estimated using all 31 empirical models at each station from the meteorological variables. Figure 2 shows heat-scatter plots of observed ET_{o} against each empirical model's estimate of ET_{o} for all the stations. As can be seen from Figure 2a–t, all the temperature- and radiation-based models tend to overestimate the observed ET_{o}, except for the Ivanov and Makkink models. The overestimation was generally lower for the radiation-based models (Figure 2k–t) than for the temperature-based models, which indicates that the overestimation may be due to the exclusion of other factors influencing ET_{o} in the study area. The Ritchie model was found to heavily overestimate the observed ET_{o}, as seen in Figure 2q. Overall, the mass transfer-based models' estimates (Figure 2u–ad) were found to be more closely aligned with the 1:1 diagonal line than those of the temperature- and radiation-based methods. The Penman, WMO, and Mahringer models underestimated the observations. The FAO Penman-Monteith model estimates of ET_{o} were aligned with the 1:1 line (Figure 2ae).

#### 4.1. Evaluation Using Statistical Metrics

The statistical metrics obtained at all 10 station locations by comparing the observed ET_{o} with the different empirical model estimations are presented as box plots in Figure 3. The blue, green, gold, and pink box plots represent the temperature-, radiation-, mass transfer-, and combination-based methods, respectively. The red vertical lines represent the optimum value of each metric. Overall, most of the temperature-based methods were found to be poor at estimating ET_{o}. Among the temperature-based methods, the Ivanov model was found preferable, with a median NRMSE of 108.8, median %BIAS of 0.70%, median md of 0.51, and median KGE of 0.44.

The performance of radiation-based methods was found superior to that of temperature-based models in estimating ET_{o}. The Makkink model was the best performing model among them, and the Ritchie model was the worst. As shown in Figure 3, Makkink had the lowest median %BIAS (−1.80%) and the highest median md (0.55). However, the Priestley and Taylor model had a slightly better median NRMSE (94.35) than Makkink (102.10). The McGuinness and Bordne model had a better median KGE (0.57) than Makkink (0.55).

Among the mass transfer-based models, the Rohwer and Meyer methods performed best. Rohwer had median NRMSE, %BIAS, md, and KGE of 104.25, −0.40%, 0.51, and 0.45, respectively. The Meyer model had median NRMSE, %BIAS, md, and KGE of 103.45, 3.10%, 0.51, and 0.45, respectively. The FAO Penman-Monteith had a median NRMSE of 85, %BIAS of −2.90%, md of 0.57, and KGE of 0.60.

The FAO Penman-Monteith model had the lowest NRMSE median, and the highest md and KGE medians. However, the Rohwer model had a lower %BIAS median than the FAO Penman-Monteith model. The Rohwer, Meyer, and Makkink models had similar NRMSE, but Rohwer had the lowest %BIAS, and Makkink had the highest md and KGE medians. Therefore, it is important to use CP to integrate the results of the statistical metrics to make a concrete evaluation decision.
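For readers who want to reproduce this kind of comparison, the four metrics can be sketched as below. The exact definitions used in the study are given in its methods section, which is not part of this excerpt, so the formulas here are assumptions based on common hydrological conventions: NRMSE as RMSE normalized by the standard deviation of the observations (in percent), %BIAS as the relative sum of errors, md as the modified index of agreement with exponent 1, and KGE in its original three-component form. The sample data are hypothetical.

```python
import numpy as np

def nrmse_pct(obs, sim):
    """RMSE normalized by the observed standard deviation, in percent (assumed normalization)."""
    return float(100.0 * np.sqrt(np.mean((sim - obs) ** 2)) / np.std(obs, ddof=1))

def pbias(obs, sim):
    """Percent bias: positive values indicate overestimation."""
    return float(100.0 * np.sum(sim - obs) / np.sum(obs))

def md(obs, sim):
    """Modified index of agreement (absolute differences, exponent 1)."""
    num = np.sum(np.abs(obs - sim))
    den = np.sum(np.abs(sim - obs.mean()) + np.abs(obs - obs.mean()))
    return float(1.0 - num / den)

def kge(obs, sim):
    """Kling-Gupta efficiency from correlation, variability ratio, and mean ratio."""
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = np.std(sim) / np.std(obs)
    beta = np.mean(sim) / np.mean(obs)
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))

# Hypothetical ET_o values (mm/day); the "model" overestimates by 10%.
obs = np.array([3.1, 4.2, 5.0, 4.4, 3.6])
sim = 1.1 * obs

scores = {"NRMSE": nrmse_pct(obs, sim), "%BIAS": pbias(obs, sim),
          "md": md(obs, sim), "KGE": kge(obs, sim)}
print(scores)
```

Under these definitions the optimum values are 0 for NRMSE and %BIAS and 1 for md and KGE.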

#### 4.2. Compromise Programming

CP was employed to integrate the statistical metrics and rank the empirical models based on their capability in estimating the observed ET_{o} in Peninsular Malaysia. It was used to measure the distance of each empirical model from an ideal point at each station separately. As an example, the ideal results obtained at Kuantan station were the lowest NRMSE (93.10), the %BIAS nearest to zero (3.70), and the highest md and KGE (0.58 for both). The CPI was calculated for each model by summing the differences between each metric and its ideal value. The following equation presents an example of the CPI calculation of the Ivanov model at Kuantan.

The same procedure was used to calculate the CPI of the remaining models. Figure 4 shows a level plot of the CPI for each empirical model at Kuantan station.
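The distance-to-ideal calculation described in this section can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function `cp_index`, the use of unweighted and unnormalized absolute differences, and the treatment of %BIAS by its absolute value are assumptions. The ideal values are those quoted above for Kuantan, while the two rows of model metrics are hypothetical.

```python
import numpy as np

def cp_index(metrics, ideal, p=1.0):
    """L_p distance of each model (row) from the ideal point.

    Assumes unweighted, unnormalized absolute differences; p may be
    a positive number or float('inf') for the maximum deviation.
    """
    diff = np.abs(np.asarray(metrics, float) - np.asarray(ideal, float))
    if np.isinf(p):
        return diff.max(axis=1)
    return (diff ** p).sum(axis=1) ** (1.0 / p)

# Columns: NRMSE, |%BIAS|, md, KGE.
ideal = np.array([93.10, 3.70, 0.58, 0.58])  # ideal values quoted for Kuantan

# Hypothetical metric values for two unnamed models (not from the study).
models = np.array([
    [95.10, 4.70, 0.55, 0.56],
    [110.00, 12.00, 0.48, 0.40],
])

cpi_p1 = cp_index(models, ideal, p=1)
cpi_inf = cp_index(models, ideal, p=float("inf"))
print(cpi_p1, cpi_inf)
```

A lower index means the model lies closer to the ideal point. Because the metrics are on different scales, an unnormalized sum is dominated by NRMSE; many CP formulations normalize or weight each criterion, and whether the study does so is not stated in this excerpt.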

#### 4.3. Ranking the Empirical ET_{o} Models

Ranking of the empirical models was done through a six-step procedure, as stated in Section 3.4. First, the CPI values at each station were used to rank the empirical models, where the model with the lowest CPI was ranked 1st at each station and the model with the highest CPI was ranked last. For example, the FAO Penman-Monteith model had the lowest CPI (7.60) at Kuantan station (refer to Figure 4) and was therefore ranked 1st, followed by the Dalton model, which had the second lowest CPI of 20.08. The rank of each model at each station in Peninsular Malaysia is illustrated in Figure 5 as a level plot.

The frequency with which a model achieved a certain rank at different stations was then calculated. For example, the FAO Penman-Monteith model was found to have the lowest CPI at Kuala Terengganu, Kuantan, Melaka, and Muadzam Shah stations, so it was ranked 1st at these stations (refer to Figure 5). Therefore, the frequency with which the FAO Penman-Monteith model was ranked number one was four. The levels of Figure 6 show the complete frequency with which each empirical model received a certain rank. For example, it can be seen that the FAO Penman-Monteith model was ranked 1st four times, 2nd once, 3rd four times, and 4th once. In contrast, the FAO Blaney-Criddle and Kharuffa models were found to have the highest frequency (10 times) for the 31st and 30th ranks, respectively.
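The two steps above, ranking by CPI at each station and counting how often each model receives each rank, can be sketched as below. The CPI matrix and the model labels are hypothetical and much smaller than the 31 models and 10 stations of the study; ties are not handled in this sketch.

```python
import numpy as np

# Hypothetical CPI values: rows are models, columns are stations.
models = ["A", "B", "C"]
cpi = np.array([
    [7.6, 12.0, 9.0, 8.0],
    [20.1, 10.0, 15.0, 9.5],
    [30.0, 25.0, 8.0, 40.0],
])

# Rank the models at each station: the lowest CPI gets rank 1.
# A double argsort converts values to ranks when there are no ties.
ranks = cpi.argsort(axis=0).argsort(axis=0) + 1

# freq[m, r - 1] counts the stations at which model m obtained rank r.
n_models = len(models)
freq = np.zeros((n_models, n_models), dtype=int)
for m in range(n_models):
    for r in ranks[m]:
        freq[m, r - 1] += 1

for name, row in zip(models, freq):
    print(name, row.tolist())
```

Each row of `freq` sums to the number of stations, in the same way that the frequencies reported for the FAO Penman-Monteith model (4, 1, 4, and 1) sum to the 10 stations.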

The frequencies of occurrence of rank positions were multiplied by the rank weights and the overall score of each empirical model (${W}_{m}$) was calculated by summing the output of the multiplication. For example, the frequencies of the FAO Penman-Monteith model having the 1st, 2nd, 3rd, and 4th rank were 4, 1, 4, and 1. So, the ${W}_{m}$ was calculated as shown in Equation (8).

Based on the ${W}_{m}$, the models were finally ranked as shown in Table 4. The FAO Penman-Monteith model was the top-ranked model in this study, followed by the Priestley-Taylor and Dalton models. The Hamon, Kharuffa, and FAO Blaney-Criddle models were ranked as the last three models.
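The aggregation of rank frequencies into an overall score can be sketched as follows. The rank frequencies of the FAO Penman-Monteith model are taken from the text, but the rank weights are defined in a section not included in this excerpt, so the linearly decreasing weights used here (31 for the 1st rank down to 1 for the 31st) are purely an assumption for illustration, as is the function name `overall_score`.

```python
def overall_score(freq_by_rank, n_models):
    """Sum of rank frequency times rank weight.

    Assumed weights: n_models for rank 1, decreasing by 1 per rank.
    """
    return sum(count * (n_models - rank + 1) for rank, count in freq_by_rank.items())

# Rank frequencies of the FAO Penman-Monteith model reported in the text.
fao_pm_freq = {1: 4, 2: 1, 3: 4, 4: 1}

# Score under the assumed weights (not the value reported by the study).
w_m = overall_score(fao_pm_freq, n_models=31)
print(w_m)
```

Sorting the models by this score in descending order gives the regional ranking; a different weighting scheme would change the scores but follows the same pattern.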

The ranking of the ET models for different values of p is presented in Table 5. The results revealed a slight variation in the ranks of a few models. For example, Makkink was ranked 5th for p = 1 and p = ∞, and 3rd for p = 2. However, FAO Penman-Monteith was found to be the most suitable method for all values of p. Priestley-Taylor was found best among the radiation-based models and Ivanov among the temperature-based models in all cases. However, the best mass transfer-based model was not consistent across values of p. Dalton was found best for p = 1, while Meyer was best for p = 2 and p = ∞. Therefore, both can be considered the most suitable mass transfer-based ET_{o} models for Peninsular Malaysia.

To show the efficacy of the top-ranked empirical model identified in this study, the FAO Penman-Monteith estimated and observed ET_{o} were compared. The heat scatter plots of observed against FAO Penman-Monteith estimated ET_{o} at different stations are presented in Figure 7. The figure shows that most of the points are aligned along the diagonal line, which indicates close agreement between the FAO Penman-Monteith estimates and the observed ET_{o}. The method overestimated ET_{o} at a few stations, such as Alor Star, Bayan Lepas, Kota Bahru, and Muadzam Shah.

## 5. Discussion

The FAO Penman-Monteith method has been found to be the most efficient for the estimation of ET_{o} in different climatic regions. The FAO Penman-Monteith model was developed based on physiological and aerodynamic theories of surface water release to the atmosphere. Therefore, it can be used as a standard model for estimation of ET_{o} in any region without any adjustment of parameters [24] based on the FAO recommendation [13,15]. The present study also found that FAO Penman-Monteith is the best model for Peninsular Malaysia. However, the Penman-Monteith model needs a large number of meteorological variables, including air temperature, wind speed, relative humidity, and solar radiation, for the estimation of ET_{o}. It is often very difficult to obtain data for all the meteorological variables. Many meteorological stations in developing countries do not measure all these variables. Therefore, a large number of alternative models have been developed based on the availability of data. The success of those models in a particular area often depends on the climate of the region. Therefore, the selection of an appropriate model based on the availability of data and the performance of the ET_{o} estimation model is a difficult task. The performance of 31 empirical ET_{o} models, which differ in their input requirements, has been assessed in this study.

The performance of empirical ET_{o} models was often found to vary from station to station within the same climate zone, which may be due to the period and quality of data used, and uncertainty in the coefficient values used for the estimation of ET_{o}. Besides, suggesting different models for different stations often makes the practical application of ET_{o} estimation models complex. Therefore, a single model is often suggested at the regional level for the estimation of ET_{o}. Thus, the rankings of ET_{o} estimation models at different stations were used in this study to rank ET_{o} models for the entire Peninsular Malaysia using an information aggregation approach.

In the present study, radiation-based Priestley and Taylor was found to perform best after the FAO Penman-Monteith model. It is followed by the mass transfer-based Dalton and Meyer models. The Priestley and Taylor model needs three meteorological variables (mean air temperature, solar radiation, and relative humidity) compared to the five variables required by the Penman-Monteith model (temperature, solar radiation, relative humidity, wind speed, and saturated vapor pressure), while Dalton and Meyer need three meteorological variables (mean temperature, relative humidity, and wind speed). Based on the availability of meteorological data, an appropriate model can be selected for the estimation of ET_{o} in Peninsular Malaysia with more or less similar accuracy.

Among the temperature-based models, only the Ivanov model was found to perform satisfactorily, and it was ranked 10th among the 31 models compared in the present study. The other temperature-based models performed the worst and were ranked at the bottom of all the models. Many of the models were developed for a particular climate. For example, the Priestley and Taylor and the Makkink models were developed for the estimation of ET_{o} in a humid climate. On the other hand, the Turc model was found suitable for ET_{o} estimation in a cold, humid and arid climate [26]. Therefore, the Priestley and Taylor and Makkink models were found to perform very well among the radiation-based models, while Turc was found to perform worse than the simple temperature-based Ivanov model in tropical Malaysia.

The findings of the present study contradict earlier studies. Ali and Lee [31] found Blaney-Criddle to be the most suitable model after Penman-Monteith for the estimation of ET_{o} at Alor Setar station in Peninsular Malaysia. They used only relative error to assess the performance of empirical ET_{o} models. Tukimat et al. [13] assessed the performance of seven empirical ET models for the same station using three statistical metrics, namely absolute error, relative error, and correlation coefficient. They found the least absolute and relative errors for Hargreaves-Samani but the highest correlation for Makkink, followed by Priestley-Taylor and Turc. They came to the overall conclusion that radiation-based models are most suitable for the estimation of ET_{o} in the region, which supports the findings of the present study. However, they could not identify the best model because of contradictions among the statistical metrics. Lee et al. [29] compared the performance of eight empirical models using mean absolute error, and reported FAO Blaney-Criddle as the most suitable model after FAO Penman-Monteith for estimation of ET_{o} on the west coast of the peninsula. Muniandy et al. [32] compared the performance of 26 empirical models at Senai station using eight statistical metrics. They also obtained contradictory results in terms of different statistics. They took the arithmetic mean of the statistics to rank the models and found Penman to be the best among the mass transfer-based models, McGuinness and Bordne among the radiation-based, and Szasz among the temperature-based models. Different models have been reported as the best at different stations in Peninsular Malaysia in the above studies, which do not match the findings of the present study. This is due to the use of a single statistic for making a decision, as in the studies of Lee et al. [29] and Ali et al. [31]. Tukimat et al. [13] and Muniandy et al. [32] used multiple statistics, but did not attempt to find the best ET model based on the statistics. Muniandy et al. [32] attempted to rank the models based on the average of multiple statistics, but the average of statistics does not provide an optimum solution, as the ranges of the statistical metrics vary widely.

CP was used in this study to find the most suitable empirical models based on four statistical metrics, which together measure the similarity between two time series in a robust way. CP has proved more robust than many MCDA models for finding a reliable solution based on multiple contradictory objectives. Therefore, the best empirical models identified in this study based on CP can be considered more reliable. Besides, the empirical models were ranked for the entire Peninsular Malaysia, given that the whole region shares the same tropical humid climate. The information aggregation model was used in this study for this purpose, which ranks the models based on the frequency of the ranks obtained by the models at different stations. Therefore, the top-ranked models at different stations were also found to achieve the top rank for the entire peninsula. This indicates that the ranking of the models obtained in this study for the entire peninsula can be used to find the most suitable model, based on the availability of data, for reliable estimation of ET_{o}.