Optimal Calibration of Evaporation Models against Penman–Monteith Equation

: We present an approach for the calibration of simpliﬁed evaporation model parameters based on the optimization of parameters against the most complex model for evaporation estimation, i.e., the Penman–Monteith equation. This model computes the evaporation from several input quantities, such as air temperature, wind speed, heat storage, net radiation etc. However, sometimes all these values are not available, therefore we must use simpliﬁed models. Our interest in free water surface evaporation is given by the need for ongoing hydric reclamation of the former Ležáky–Most quarry, i.e., the ongoing restoration of the land that has been mined to a natural and economically usable state. For emerging pit lakes, the prediction of evaporation and the level of water plays a crucial role. We examine the methodology on several popular models and standard statistical measures. The presented approach can be applied in a general model calibration process subject to any theoretical or measured evaporation. tested on Results from the validation are analyzed and the best model is chosen concerning results from all cross-validation splittings.


Introduction
Evaporation and evapotranspiration play a crucial role in water management in a wide range of human activities and thus there is a strong need for accurate estimates. This need leads to a considerable number of papers and studies that are offering new methods of such estimates or comparison of methods already used in hydrologic engineering applications. The results of estimation could be compared with reference evaporation/evapotranspiration calculated by FAO Penman-Monteith equation E FAO , which is recommended as the standard method [1,2]. This equation is considered to be an etalon to which the results of the other methods can be related and compared. The papers dealing with evaporation or evapotranspiration present the comparison of the FAO Penman-Monteith method results to other methods proposing the relations between input data less complicated and computationally less demanding. For instance, such a procedure was performed in the study [3] to find the best estimation of water lost from a covered reservoir. In [4] it is stated that E FAO provides good agreement with evaporation measured on 120 ha dam. In [5], E FAO is used not only as a reference method and but also a foundation of new numerical models derived by multiple linear regression and design of experiment method, followed by the simplified methodology for the quantification of the evaporation rate of a basin with a photovoltaic system. Similarly, the possibility of reduction of Lake Nasser evaporation using a floating photovoltaic system is described in [6]. Our research is not focused only on the area of Lake Most, but also on pit lakes that are only planned, therefore it is impossible to use limnological and bathymetric data, such as temperature profile or water depth. Such a situation is considered in [7], where authors used the E FAO for the computation of the open water evaporation estimation.
FAO Penman-Monteith method is characterized by a strong likelihood of correctly predicting evapotranspiration in a wide range of locations and climates with differing local conditions, e.g., solar radiation, sunshine duration, wind speed, air humidity, air temperature [8][9][10].
Our interest in free water surface evaporation is given by the need for ongoing hydric recultivation of the former Ležáky-Most quarry (Czech Republic), i.e., Lake Most, as well as another planned hydric recultivation in the region. One of the key components of hydric reclamation planning is the securitization of long-term sustainability, which is based on the capability of keeping the stable level of a dimension of the final water level.
Hydric recultivation was proposed to be the best way to deal with the residual of opencut coal mines in the north-western region of the Czech Republic. After the mine is closed, the void could be filled by surface water runoff and groundwater. In the case that these resources are not strong enough, the pit lake has to be filled artificially and that is the case of the former Most-Ležáky mine and Lake Most. The level of the new lake has been proposed to be stable with a water level at 199 m above sea level assuming the tabulated values of precipitation and evaporation between 500 mm and 600 mm annually [11]. However, this assumption fails to be true [12].
The evolution of artificial pit lakes is affected by a wide range of chemical, physical, and namely hydrological processes such as saturation of the coastal lines, leakage through the bottom, and free surface evaporation. As the lake bottom was sealed before filling, evaporation was supposed to be the main cause of the observed water loss.
Together with the precipitation, the open water evaporation and vegetation evapotranspiration form the main components of the water cycle in nature, and it is said that the evaporation over the land surface amounts to about two thirds of the average precipitation, see [13]. However, these estimates differ from location to location and evaporation measurement or calculation procedures are complicated and burdened with a high degree of uncertainty. This happens due to the complexity of evaporation as a physical phenomenon and several factors that affect this process. The rate of evaporation could be measured or calculated, however, because of the simplification of the evaporation process description, both measurement and computational methods provide only the approximation of actual evaporation. For further discussion about the historical development of evaporation and evapotranspiration, see [14] mentioning 166 models and equations obtained during the last three centuries. The description and characterization of all physical processes affecting evaporation could be found in a classical book by Brutsaert [15], or Maidment [16] ( Shuttleworth's chapter).
For computational methods, there are two ways to handle evaporation, described as mass transfer or energy budget methods, see [13]. Furthermore, the models could be viewed as temperature-based, radiation-based, mass transfer-based, and combined methods based on the inputs used to calculate the rate of evaporation, for additional details see [2,[16][17][18].
Additionally, within each group, there are several equations, which are widely cited in the technical literature. During the study of various evaporation models in the literature, one can observe several difficulties. One of the main difficulties is the inconsistencies in the used physical units. For instance, according to the time and place of publication of articles or books, the same equation can be encountered with the pressure given in kPa, mbar, Torr or millimeters of mercury column. This variability of units can cause different shapes of the same equations in different sources, even when the same units are used. Furthermore, different types of methods require different type of data. However, in practical applications, we are not able to measure all types of input parameters, therefore the choice of the model depends not only on the modelling quality but mainly on the ability to measure the required input physical quantities. This is one of the main reasons why the development of new simplified models is still active. The FAO equation is a solid standard, but sometimes too complex to be handled in practice. For instance, in the article [19], eighteen temperature-, radiation-, mass transfer-based and combined methods are studied under the condition of climatic change in Germany. The FAO equation is compared with 31 methods under the humid climate condition in Iran in the paper [20]. In [21], five temperature-based methods and three radiation methods are considered.
In this paper, we are looking for the simplification of the FAO equation in terms of the number of input quantities. Our goal is to use less complex models to model the evaporation in the area of Lake Most by calibrating the parameters of the models in the fitting optimization process against the evaporation estimation by FAO using selected statistical measures. The motivation came from the lack of measuring devices in the direct area of the lake. In this paper, we consider models which require only the air temperature, wind speed, and relative humidity. Besides the Lake Most, we are interested also in the planned pit lakes. Therefore, the types of models are limited to those which requires only these basic meteorological data. In this case, it is not possible to measure, for instance, the temperature of the water.
The performance of the considered methods can be evaluated statistically, and several statistical measures could be used. The most commonly used are the Root Mean Square Error (RMSE), Mean absolute error (MAE), and the Mean Bias Error (MBE), see, for instance, [22]. Additionally, the RMSE and MBE could be combined to calculate the so-called t-statistic test, which expresses the level of confidence between the models, see [23]. A quite unusual measure could be found in [24], it is mean ratio MR, which is computed as the average of ratios between the predicted and observed values. Another way to compare two given models is to compute Pearson's correlation coefficient (PCC) r or to compute R 2 coefficient of determination. These two measures are closely connected since R 2 is square of r. The simplest measure of the prediction quality is the percentage expression of the difference between models. This is a statistical measure that is well known as percent bias (PBIAS). The last class of statistical measures that describe the correspondence of observed and simulated data are agreement indices, such as Nash-Sutcliffe efficiency (NSE) and Willmot's agreement index. These are used, for example, in [21,25,26]. For further discussion of statistical model evaluations, see [27]. In this paper, we compare several selected statistical measures, namely NSE, RMSE, MAE, and PBI AS. However, we suppose that our methodology can be applied to any chosen distance function. The applicability depends on the ability to solve the corresponding regression problem.
The choice of temperature-based methods in the case of the Lake Most study is not due to the current lack of meteorological data since Kopisty meteostation is located only 1 km from the lake. The simplification of the equation is motivated by the further planned hydric recultivations in the region. Planned pit lakes are more distant from Kopisty and therefore the meteorological data provided to the models from Kopisty would not be sufficient. The data provided to the models on new lakes will be measured directly on the area of new lakes. In the case of Lake Most, we can identify the appropriate simplified model because of advantageous location of Kopisty with respect to Lake Most. Therefore, we are interested in the identification of the simplest suitable model with the low demand on input data. The construction of the new weather station in the area closer to the new lake does not make any sense from the financial point of view (because of the presence of Kopisty weather station). On the other hand, to provide better estimations, we should measure the input data as close as possible to the area of interest. The measurement of, for instance, the temperature is relatively cheap. The only question is if the temperature is a sufficient amount of input meteorological data for providing a sufficient estimation. The accuracy of the water loss due to evaporation is crucial for those planned artificial lakes. To provide the best possible estimate, the experiences from the Lake Most will be used. In this paper, we compare several simplified models with respect to different statistical measures, namely NSE, RMSE, MAE, and PBI AS.
Additionally, to avoid the overfitting of the calibrated model, we adopt the crossvalidation methodology [28]. We randomly split the data into calibration and validation parts. The parameters of the model are optimized on the calibration set and tested on the validation part. Results from the validation part are further analyzed and the best model is chosen concerning results from all cross-validation splittings.
The paper is organized as follows. Section 2.4 introduces the methods and materials used in our computation. To be more specific, we start with the presentation of the Lake Most in Section 2.1, and the data provided to the models in Section 2.2. Afterwards, we review the FAO equation in Section 2.3 and the simplified models in Section 2.4. During the calibration process, we use the statistical measures presented in Section 2.5. The whole methodology is implemented in R programming language, see Section 2.6 for details. This section also includes the description of the used cross-validation process. The results are presented in Section 3 and discussed in Section 4. Finally, Section 5 concludes the paper.
The paper can be considered to be an extension of our previously published work [29].

Study Area
The Lake Most is situated in the North of the Czech Republic near the city of Most 50 • C310 N, 13 • C360 E, see Figure 1. It was created by the hydric recultivation of the Most-Ležáky quarry in the central part of the North Bohemian brown coal basin. The former mine heavily affected the area of 1254 ha and the pit lake, as a part of its revitalization, was planned to have a surface area of about 300 ha. The project of the revitalization is secured by the state enterprise Palivový kombinát Ústí (PKU) [30]. Before the flooding, it was necessary to take technical arrangements such as sealing the bottom of the future lake, construction of an underground sealing wall, and strengthening the shoreline. All these arrangements allow viewing the Lake Most as a closed system without natural inflow or outflow. Due to the absent natural inflow, the residual pit of the lake was filled through an artificial feeder during the period from 2008 to 2014. In the final phase of lake filling, i.e., in the year 2014, the surface level rose from 197.74 m to the required level of 199 m above sea level.
After finishing the filling process, Lake Most has an actual surface area of 309.4 ha, a coastal line length of 8.9 km, a total water volume of 70.5 million m 3 , and a maximum depth of 75 m. Throughout the filling of the lake, both operational and basic meteorological data were monitored. The operational data contain data on the achieved altitude of the lake level, its surface area, and especially on the volume of water admitted. The filling of the lake has been finished in 2014 achieving the required surface level of 199 m.

Data and Data Sources
In our research, we are using the meteorological data collected during the years 2015-2019. The collection includes all data necessary for the calculation of the Penman-Monteith equation (see Section 2.3). These meteorological measurements were performed at the Kopisty weather station situated approximately 1 km from the lake. The station is operated by CHMI-Czech Hydrometeorological Institute and the data are recorded at ten-minute intervals. The dataset obtained from CHMI was statistically processed to be used in the equations to model the evaporation. We present the data basic statistics in Figure 2. In Kopisty weather station, the wind speed is measured at 10 m above the ground to avoid the influence of the ground. The air temperature and humidity are measured at 2 m above the ground.
We also included precipitation frequency for the demonstration of the hydrological balance in the area of interest. In comparison with the average temperature and precipitation in the Czech Republic, the area of the planned hydric reclamation is in the area with the temperature strongly above the average and precipitation strongly below normal precipitation, and with the number of hours of sunshine below the typical value in the Czech Republic [11]. average day temperature T a , atmospheric pressure P, daylight hours per day n, relative humidity RH, wind speed u 2 , and precipitation pr. These data are used in the equations for modelling the evaporation.

Penman-Monteith Equation
The E FAO equation is of the form Please see Section Abbreviations at the end of this paper for the description and physical units of the used variables.
According to Linacre paper [31] for daily estimates of the evaporation rate of free water level, the term G can be neglected, i.e., we set G = 0.
The psychrometric constant γ depends on the atmospheric pressure P in [kPa] and on above-mentioned constants C p = 1013 J kg −1°C−1 , λ = 2.45 MJ kg −1 and ε = 0.622[−], with ε being ratio molecular weight of water vapor to dry air. To compute its value, the following formula is used Using this formula, the computed value of γ depends only on one measured quantity and that is atmospheric pressure P and is given in kPa°C −1 .
The slope ∆ describes the relationship between saturation vapor pressure and temperature. For a given temperature T a , the corresponding ∆ is given by The resulting unit of ∆ is kPa°C −1 .
The net radiation at the surface R n in MJ m −2 day −1 is, by [1], given as the difference incoming net short wave radiation R ns and outgoing net long wave radiation R nl , i.e., To evaluate R nl , the knowledge of solar radiation R s and global extraterrestrial radiation, R a is required.
Extraterrestrial radiation is the amount of radiation incident on a unit of the horizontal surface at the outer boundary of the atmosphere. For places of similar latitude, it is approximately the same, changing only during the year. There is no influence of cloud turbidity or air pollution over the Earth's atmosphere, and therefore, the dose of solar energy is the highest at any given time. In addition to the solar constant, the angle of incidence of the sun's rays at a given location of the atmosphere boundary must also be taken into account. Therefore, the value of R a is expressed depending on these quantities as The terms included in Equation (5) are where a s , b s are Angström coefficients, n[h] and N[h] are actual and maximum possible duration of daylight, respectively. Finally, α denotes albedo, i.e., the coefficient of reflection.
As the Angström coefficients are not calculated based on the actual solar radiation measurements here, the FAO paper [1] recommendation a s = 0.25 and b s = 0.5 in Equation (6) is used. Furthermore, the free water surface albedo is set as α = 0.08 based on [1].
The maximum daylight duration N is computed as To determine net longwave radiation R nl , the following formula is used The above formula (9) uses the Stefan-Boltzmann constant The value of the clear-sky radiation R so is calculated as where z is the site altitude in [m] above the sea level. However, despite its complexity of the FAO Penman-Monteith Equation (1), it is not possible to consider E FAO results to be accurate, since the number of input data to be measured or calculated by empirical formulae based on measured input data. Such an estimation process is affected by measurement and calculation errors. For example, in [32,33], one could find sensitivity analysis of the FAO Penman-Monteith equation in different climate conditions.
It should be pointed out that Equation (1) was derived as a method to determine the reference rate of evapotranspiration, i.e., the reference rate of evaporation from growing plants with the characteristics of hypothetical reference crops such as height, aerodynamic resistance of their surface, and albedo. For real crops, the rate of evapotranspiration is determined from With the proper coefficient, the E FAO formula could be used to estimate open water evaporation. The values of coefficient K c , i.e., K c,mid and K c,end for mid and end season respectively are tabulated in [1]. Specifically, K c,mid = K c,end = 1.05 for shallow lakes, i.e., for those with a depth of up to 2 m. For deep lakes, i.e., with a depth exceeding 5 m, the values K c,mid = 0.65 and K c,end = 1.25 are indicated. Therefore, it should be borne in mind that (especially in the case of deep lakes) the result of E FAO could lead to the underestimation of up to 35% or the overestimation up to 25% during the season.
Since our research is not focused only on the area of Lake Most, but also on lakes that are only planned and does not exist at present, it is impossible to use limnological and bathymetric data, such as temperature profile or water depth. This leads us to the considerations of the article [7], which states that in the case of missing limnological data, the lake coefficient K c = 1 can be selected and E FAO result itself could be considered to be an open water evaporation estimate. Hence, in all our calibration and validation processes, the equation E FAO serves us as an etalon and all our results are compared against it.

Evaporation Estimation Methods
Since the FAO Penman-Monteith equation E FAO (see (1) in Section 2.3) is very inputintensive and complex in its calculation procedure, many other methods have been derived to determine the rate of evaporation. Depending on the inputs of the method, we divide them into temperature-, radiation-, mass transfer-based, and combined methods.
Temperature-based method equations can be considered the simplest type of equations. They primarily work with a single variable, namely the average mean air temperature T a . Quite often these equations have a linear form E = p T a + q, but they also occur in the form E = k T m a , or the form of exponential formula E = 10 p T a +q or E = e p T a +q . However, the group also includes relations in which the temperature occurs in combination with a member comprising, for instance, relative humidity RH or theoretical length of the solar day N.
In this section, we selected 7 simple (in comparison to the complexity of FAO Penman-Monteith) evaporation models for the demonstration of our calibration approach.

Regression Derived Relations-Czech Republic
The following three relations are used in the Czech Republic. They are derived by regression between the observed evaporation and mean daily air temperature, using statistical regression to find both linear and exponential models. The model relations presented in this section are compared in the paper [34] with measurements on 20 m 2 evaporation pan placed in the meteorological station Hlasivo near the city of Tábor 49 • C290 N, 14 • C450 E) in the South Bohemian Region. This station is operated by Výzkumný ústav vodohospodářský T. G. Masaryka (VUV, T. G. Masaryk Water Research Institute) and was built in 1957 and has a 20 m 2 evaporation tank, GGI-3000 pan and Class-A pan. The pan evaporation measurements here are carried out from May to October, which is due to the temperatures below the freezing point in the winter months. The models are given by where E S is the equation according to Šermer [35], E BV according to Beran and Vizina [36], and E VUV according to Adam Beran from VUV published in the official report Model průběhu meteorologických veličin pro oblast jezera Most do roku 2050 (The modelling of the course of meteorological quantities for Lake Most area until 2050). In all equations, the evaporation rate is determined in [mm day −1 ]. Equation (10) has the form of an exponential function and therefore its results can never be negative. However, it must be mentioned that there are limitations of Equations (11) and (12): if the equation produces the negative evaporation estimation, we set the value equal to zero. For instance, E BV = 0 is set on days with the mean temperature below −0.526°C, since E BV would be negative in such cases.
To calibrate the models, we present a parametric formulation of the Equations (10)-(12) by where θ are unknown parameters, which will be calibrated. We extended models by projection to nonnegative numbers (using the outer max function) to enforce the computed nonnegative evaporation.

Kharrufa
The equation presented by Kharrufa in [37] is an example of a nonlinear temperature formula. It is mostly written in the literature in the form and results in the evaporation rate in [mm day −1 ]. In (16), variable p denotes the percentage of total daytime hours for the daily period out of total daytime hours of the year. This form is used, for instance, in [38,39]. The coefficient 0.34 was found empirically and it is possible to refine it with respect to the site-specific conditions. For example, in [34], the form E K = 0.25 p T 1.3 a is given with the formula being calibrated for the conditions of the Hlasivo weather station in the South Bohemian Region. In this study, the form (16) with coefficient 0.34 is used.
In this paper, we the calibrate model (16) introducing the parametric version and calibrate the unknown parameters θ ∈ R 2 .

Hargreaves-Samani
Another method was introduced by Hargreaves in article [40] and further modified to the form which can be found in article [41]. Usually, the Hargreaves-Samani equation is given in its basic form In this equation, variable T r denotes the difference between daily maximum and minimum air temperatures [°C].
Although the formula contains a radiation term R a , it is ranked among the temperaturebased formulae since the term R a here is just a theoretical value calculated according to Formula (5). Using this computation of R a , the Hargreaves-Samani equation takes any of the following equivalent forms The difference of these forms is only in the usage of division by the latent heat of vaporization of water λ = 2.45 MJ kg −1 , which is performed to obtain the results in millimetres per day.
In this paper, we consider the parametric form of Hargreaves-Samani Equation (18) with parameters θ ∈ R 2 . These parameters will be optimized during the calibration process.

Schendel
In contrast to the formulae mentioned above, in which the air temperature is sufficient to calculate the evaporation rate, the air relative humidity RH measurement is required in the following Schendel equation. The formula has a simple form The equation can be found in the original Schendel paper [42]. It is used by many authors, for example, see [19,20].

To calibrate this model, we consider the Schendel Equation (22) in the parametric form
with unknown parameter θ ∈ R.

Priestley-Taylor Equation
A potential evapotranspiration based on the Priestley-Taylor can be considered to be a combined method, which is developed as a combination of the turbulent diffusion method and the method of energy balance [43]. The basic equation for the computation of the potential evapotranspiration by the Priestley-Taylor method is given by Please see Section Abbreviations in the end of this paper for the description of the used variables.
In this paper, we optimize the constant parameter in Equation (22), i.e., we consider a parametric model where θ ∈ R is unknown parameter.

Turc Equation
The Turc method was developed for the climatic conditions of western Europe [44]. Several forms of Turc equation could be found in the literature. In this paper, we are considering the one from [45], i.e., The term R s total solar radiation is computed using Equation (6). In this paper, we calibrate the Turc Equation (24) with respect to two parameters, namely we work with the parametric model where the term C RH and its conditions remains the same as in (24).

Statistical Measures
To compare the performance and predictive power of each evaporation estimate method E defined in the previous section against the FAO Penman-Monteith equation, the following statistical measures are used: Nash-Sutcliffe efficiency (NSE), root mean square error (RMSE), mean absolute error (MAE), and percent bias (PBIAS). Each of these measures offers a description of the difference between observed or measured and predicted or calculated values.
One of the most popular measures that assesses model predictive power is Nash-Sutcliffe efficiency (NSE), see [46], which tries to capture the extent of errors and their degree of variability. It is given by where T is several observations andĒ FAO is the mean value of FAO Penman-Monteith model given byĒ The NSE index indicates the relative magnitude of the residual variance ("noise") compared to the measured data variance ("information"), see [27]. The index shows, among other things, how well the scatterplot of the observed and modelled data corresponds to a 1:1 straight line. NSE takes the value −∞ ≤ NSE ≤ 1 and NSE = 1 indicates a perfect match. The value NSE = 0 means that the model predicts with the same accuracy as the observation mean. Negative values of NSE indicate an unacceptable model.
To capture the size of individual errors E t − E FAO t , root mean square error (RMSE) and the mean absolute error (MAE) could be used. They are defined by Both RMSE and MAE also express the model error in units corresponding to the units of the value observed. The MAE value captures the mean error size and similar information is provided by RMSE. However, RMSE attaches more weight to larger errors and thus suggests the presence of "extreme error". For both MAE and RMSE, it should be noted that their results are always greater than zero, which results in the loss of overvaluation or undervaluation information. For both, their lower value indicates a better model. Concerning the MAE and RMSE indicators, the "half standard deviation" rule is also mentioned; i.e., in a good model, MAE, as well as RMSE should be less than half the standard deviation of the observed variable.
The simplest measure of the prediction quality is the difference E FAO − E or the percentage expression of this difference. This value is called percent bias and it is computed as where E t is the value of the model in time (day) t = 1, . . . , T. We used the presented statistical measures to calibrate the parameters of the models, i.e., for each model and measure, we solve the minimization problem in the case of ρ ∈ {RMSE, MAE, PBIAS} (given by Equations (27)- (29)), or the maximization problem θ * = arg max ρ(E FAO , E(θ)) = arg min −ρ(E FAO , E(θ)) (31) in the case of ρ = NSE (given by (26)). In both problems, E(θ) represents the considered parametric model which has to be calibrated, i.e., one of Equations (13)- (15), (17), (19), (21), or (23).
During our experiments on model calibration using different statistical measures, we observed that the maximization of NSE and minimization of RMSE produce the same optimizer. The following theorem supports this observation with theoretical proof.
Proof. From the definition of the optimality point θ * of the optimization problem on the left side of (32), we can write for all possible parameters θ i.e., using the definition (26) This inequality can be modified to an equivalent form subtracting 1 from both sides and multiplying by constant negative term − We divide this inequality by a positive number of data points T and apply the square root to both sides. Since both sides of the original inequality are nonnegative values and the square root is an increasing function, the following inequality is equivalent to the previous one i.e., using the definition (27) RMSE(E FAO , E(θ)) ≤ RMSE(E FAO , E(θ * )).
This is the optimality condition for the solution of the optimization problem on the right side of (32).

Implementation
We have implemented the calibration process in R programming language [47]. This software provides us with an easy way how to load the data, manipulate them, solve the corresponding optimization problems, and present the results.
The type of the optimization problem depends on the considered model and the selected statistical measure, see (30) and (31). We compared several solvers provided by R programming language (both in computational time and the accuracy of the results) and we observed that the nlminb algorithm from optim package is the most efficient option for solving the optimization problems. This algorithm is using PORT routines [48] and our numerical experiments proved the suitability and stability during the solution of both linear and nonlinear models.
To generalize our calibration process, we adopt the cross-validation methodology [28]. The modelling with a random subset of data generalizes the results and avoids overfit-ting. In our case, we perform 10 random permutations of the data (please notice that the statistical measures which we are using are independent of the order in time). We split each permutation into 10 parts-9 of them is used for calibrating the model (i.e., solution of (30) or (31)) and the remaining part is used for validation. See Figure 3, where we demonstrate performed 10 calibration-validation processes on one specific data permutation. We repeat this permutation 100 times and for each of this permutation, we repeat 10 calibration-validation splittings. In total, we obtain 1000 results of the calibration process, which are further analyzed. This method is well known as K-fold cross-validation.  . The cross-validation: one random permutation of input data is divided into 10 parts, 9 of them are used for calibration, the remaining one is used for validation. On one specific data permutation, we obtain 10 different calibration results. In this figure, we demonstrate the process on the data of length T = 20; however, our real dataset is of length T = 2191.

Results
We calibrate the models (Section 2.4) against the FAO Penman-Monteith equation (Section 2.3) using the criteria (Section 2.5) and cross-validation process (Section 2.6). In the comparison of the original models, the calibration process always results in better statistical values, see Table 1. The value has been obtained using the parameters presented in Table 2 on the whole data set. Table 1. The comparison of statistical measures before and after the calibration process. These values have been computed on the whole data set and correspond to the models presented in Table 2 The parameters of the optimal models have been chosen as a mean value of the statistical measure on testing data. As was described in Section 2.6, we are calibrating models on the random part of a given data set and since the optimal parameters depend on this choice, the calibrated parameters are also random variables. Consequently, the new value of the statistical measure is random as well. See Figure 4, where we demonstrate the randomness of the statistical value-we plot the statistical measure of the original model on different training data and compare them with the statistical measures obtained by the calibration process on these training data. The final calibrated model has been chosen as a model, which corresponds to the mean value of the obtained calibrations. Each of the presented graphs corresponds to different statistical measures in the calibration. We calibrated the models with respect to all considered statistical measures and we track the change of other measures. As an example, see Tables 3-5 for the results in the case of the Kharrufa model, Hargreaves-Samani model, and Turc model. As we mentioned above, the random choice of calibration data in the calibration process causes the randomness of the optimal parameters. In Figures 5-7, we present the different optimal values of calibrated models with respect to various statistical measures. We can observe that using the cross-validation approach, the final optimal parameters are a random variable as well.
The       The final result is presented in Figure 11. Here, we demonstrate the monthly evaporation computed by Hargreaves-Samani equation in the form of a time-line and histogram of evaporation in months. The parameters of the calibrated models can be found in Table 2 and the improvement of the used statistical measure on the whole data set in Table 1.

Discussion
In our results presented in the previous section, we processed the results of calibration on random data from the cross-validation process (see Section 2.6). Before the selection of the mean model, we removed the outliers based on the quartile thresholding. For instance, in the case of the Kharrufa equation calibrated with respect to NSE, we removed 11 outliers more than 1.5 interquartile ranges (IQRs) below the first quartile or above the third quartile.
Our results show that the calibrated parameters depend on the chosen statistical measure, see Table 2. However, all of them are improving the objective value in comparison with the original equations, see Table 1.
From the obtained results, we observed that the calibration with respect to one selected measure improves not only this objective measure but also improves the remaining measures. See Tables 3-5, where we examined the Kharrufa, Hargreaves-Samani, and Turc model.
The measures presented in Section 2.5 are defined as the sum of local differences. Our results of cumulative evaporation presented in Figures 8-10 show the consequences of the formulation of the objective function in this form-the cumulative evaporation computed by the optimal calibrated model fits the cumulative evaporation computed by the FAO equation. We observed this property in the case of all measures. However, in the case of daily evaporation (or monthly evaporation), the local difference can be large, see Figure 11. Evaporation in some months has been underestimated and in other months has been overestimated. In any case, this underestimation and overestimation are always better than in the case of the original equation.
The obtained results follow the equivalency of the calibration process based on NSE maximization and RMSE minimization, i.e., Theorem 1. Please see Figures 5-7, where we demonstrate the density of optimal parameters of the calibrated Kharrufa, Hargreaves-Samani, and Turc equation with respect to the random data split in the cross-validation process. The small difference between NSE and RMSE is caused by the error of the iterative algorithm: the optimization algorithm has a stopping criterium based on the change of the function value. Since the NSE and RMSE have different objective functions, the iterative algorithm stops the optimization prematurely (sufficiently approximately) in different optimizers. Especially in the case of Figure 6, the difference is clearly observable. However, in this case, we are dealing with the Hargreaves-Samani model (19). We suppose that this difference is caused by the non-linearity of the model (and the non-linearity of used statistical measures). The difference between objective functions in the solutions computed by NSE and RMSE is approximately 10 −2 (see Table 4), which is the value used in the stopping criteria of the iterative optimization algorithm. The situation is similar for RMSE.
The results obtained by our analysis show that the calibrated Hargreaves-Samani and Turc models seem to be the most suitable simplification of the FAO Penman-Monteith equation in the area of Lake Most. However, it is necessary to mention that the final choice of the most suitable calibrated equation for evaporation modelling depends not only on the final value of the statistical measures but also on the input data requirements. Therefore, we suggest using the Hargreaves-Samani equation since this equation requires only the input of the extraterrestrial radiation and the air temperature, see Equation (18). Figure 11 presents the final improved evaporation estimation.

Conclusions
In this paper, we presented the methodology for the calibration of evaporation models with the FAO Penman-Monteith equation and demonstrated it on selected simplified models using the most common statistical measures. Additionally, we implemented a crossvalidation process to remove the overfitting of the calibrated model. This approach can be easily applied to any model of interest and any sufficiently reasonable statistical measure. In the paper, we presented a calibration with respect to theoretical values computed by FAO Penman-Monteith equation; however, the methodology can be used for calibration with any theoretical or measured reference values of evaporation.
From the presented results, we suggest using the Hargreaves-Samani equation to model the evaporation on Lake Most. This equation reported the sufficient approximation of the FAO Penman-Monteith equation and additionally, it requires only a few input parameters, which can be easily (and cheaply) measured.
During our research, we observed the global fitting property of common statistical measures-the evaporation during cold days is underestimated and the evaporation during sunny days is overestimated. To deal with this issue, we focus our future work on the division of days into groups with different optimal models.  Data Availability Statement: The data are not publicly available due to the requirements of confidentiality and security. Data was obtained from The Institute of Atmospheric Physics CAS and are available from the authors with the permission of The Institute of Atmospheric Physics CAS.

Acknowledgments:
The authors would like to express further thanks to The Institute of Atmospheric Physics CAS for the provided meteorological data from their measuring station Kopisty.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: