Machine Learning for Pan Evaporation Modeling in Different Agroclimatic Zones of the Slovak Republic (Macro-Regions)

: Global climate change is likely to inﬂuence evapotranspiration (ET); as a result, many ET calculation methods may not give accurate results under different climatic conditions. The main objective of this study is to verify the suitability of machine learning (ML) models as calculation methods for pan evaporation modeling on the macro-regional scale. The most signiﬁcant PE changes in the different agroclimatic zones of the Slovak Republic were compared, and their considerable impacts were analyzed. On the basis of the agroclimatic zones, 35 meteorological stations distributed across Slovakia were classiﬁed into six macro-regions. For each of the meteorological stations, 11 variables were applied during the vegetation period in the years from 2010 to 2020 with a daily time step. The performance of eight different ML models—the neural network (NN) model, the autoneural network (AN) model, the decision tree (DT) model, the Dmine regression (DR) model, the DM neural network (DM NN) model, the gradient boosting (GB) model, the least angle regression (LARS) model, and the ensemble model (EM)—was employed to predict PE. It was found that the different models had diverse prediction accuracies in various geographical locations. In this study, the results of the values predicted by the individual models are compared.


Introduction
Changes in climatic factors, such as in temperature, wind speed, sunshine hours, humidity, and solar radiation, can have a significant impact on the evapotranspiration (ET) process [1]. ET is the combination of two separate water loss processes: water's evaporation from the soil and plant surfaces and plant transpiration, through which water escapes from a plant's body into the ambient air through its stomata in the form of steam [2,3]. Evaporation is an important component of the hydrological cycle [4]. Pan evaporation (PE) is extensively used in developing irrigation projects and provincial water resources [5,6]. PE is an important climatic variable for developing efficient water resource management strategies [7]. Generally, two approaches-(i) direct (i.e., pan evaporimeter) and (ii) indirect (i.e., empirical or semi-empirical equations)-are used to measure evaporation [8,9]. The direct estimation of PE using a class A pan evaporimeter has limited spatial coverage because of practical and instrumental problems [9][10][11]. In contrast, the application of the indirect PE estimation method based on the relationship of ET with various climatic parameters is often restricted due to data availability and climate variability [9,12,13]. Considering the limitations of both the methods, machine learning (ML) techniques have ; reference evapotranspiration (Etr); average, maximum, minimum, morning, and afternoon relative humidity (RH, Rh max , Rh min , RH1, and RH11); average, maximum, and minimum air temperature (T aver , T min , and T max ); wind speed (Us); relative sunshine duration (S); solar radiation (SR); soil heat flux (G); saturation vapor pressure (es); actual vapor pressure (ea); slope of the saturation vapor pressure function (∆); psychometric constant (y); precipitation (P); global radiation (Rs). ** Root mean square error (RMSE); mean absolute error (MAE); efficiency factor (EF); Willmott index (d); absolute error (MSE); determination coefficient (R2); regression value (R); relative absolute error (RAE); relative squared error (RSE); percentage bias (PBias); unpaired two-sample t-test (t-test); Nash-Sutcliffe efficiency index (NSE); scatter index (SI).
The North Atlantic Stream, as part of the Gulf Stream, has a favorable effect on the climatic conditions of Western and Northwestern Europe. These effects are mainly reflected in the reduction in temperature fluctuations throughout the year; this means that, for example, during the winter period, the air temperatures do not decrease below the freezing point [46]. However, the weather in Slovakia also affects the flows of air from the Atlantic Ocean and from the Adriatic Sea. The individual weather forecasting models used in Slovakia have also contradicted each other in certain time periods. An example is the year 2021, when weather differed from most of the years of the second decade of the 20th century. Frequent changes in weather were typical for most of the year 2021. This has occurred since the end of summer and during autumn, and it was also characteristic for the month of December. At the beginning of this month, it warmed up, then there was a cooling period; however, this only occurred in some parts of Slovakia, for example in the west and southwest, and it was accompanied by heavy snowfall. In the middle of the month of December, it warmed noticeably, and it lasted until the end of the year [47]. The climate of a particular area is also affected by microclimatic factors, especially the shape of the relief (convex or concave), the orientation of the relief towards the world and the prevailing flow, relative altitude fragmentation, vegetation, and anthropogenic influences. The aim of this study is to obtain information about the areas that are the most affected by PE changes, which is closely related to the manifestations of global climate change. For the purpose of this study, eight newly explored ML models for estimating PE losses in six macro-regional scales were assessed. The macro-regions were classified according to individual agroclimatic zones. Using available climatic data from 35 different meteorological stations across the Slovak Republic, we performed the estimation of the PE in the vegetation period applying the datasets selected for the years 2010 to 2020, including their different elevations above sea level and geographical locations.
As a result of the above-mentioned literature resources, ML is an important tool for the goal of leveraging technologies around artificial intelligence. ML is, in part, based on a model of brain cell interaction. Although the model was created in 1949 by Donald Hebb (1949) [48] in a book titled The Organization of Behavior (Keith, 2021) [49], the applications of ML techniques to evapotranspiration estimation problems are currently limited and the knowledge on the topic is still partial and fragmented [50].
The results of this study are unique because: (1) there is no comparative study for the whole of the Slovak Republic, where all available climate data for the last climate decade for PE estimation are applied; (2) the new progressive ML method was used for PE assessment; and (3) besides the selection of the applicability and predictability of eight different ML models, the macro-regions with the most pronounced manifestations of global climate change are identified.

Study Location and Climatic Data Collection
In connection with global climate change, the gradual modification of the energy balance globally, and especially locally, was observed. Therefore, the assessment of PE changes and their trends in the Slovak Republic has its justification. The climatic data from the climatological network of Slovak stations were provided by the Slovak Hydrometeorological Institute (SHMI) in Bratislava. The available PE dataset for the vegetation periods of the years from 2010 to 2020 was analyzed according to individual regions of the Slovak Republic.
The daily data of PE during the main vegetation period (from April to October) from 35 meteorological stations distributed across Slovakia (see Figure 1) were used in this study. There were evaluated data mostly from the ten consecutive years (2010-2020). All available PE data measured in the Slovak Republic by SHMI during the observed periods were applied in this study. The reason for this is not only the availability of the data, but also the application of the same instrument for their measurement, i.e., a GGI-3000 evaporator (see principle in [51]).

Slovak Republic Zoning Criteria
The zoning criteria are based on the agroclimatic division of Slovakia [52]. For the agroclimatic division of Slovakia, three basic agroclimatic indicators were used, according to the area divided into agroclimatic macro-areas, sub-areas, and districts.
(a) The agroclimatic temperature indicator (TS10) is the sum of average daily air temperatures during the period with an average daily temperature of ≥10.0 • C. According to this indicator, the territory of Slovakia is divided into three agroclimatic macro-areas and eight agroclimatic areas: 1.
(b) The agroclimatic moisture indicator (K VI-VIII ) is given by the difference of potential evaporation (Eo) and precipitation (P) in the summer months (June-August (VI-VIII)): E o values were calculated for the territory of Slovakia by [53]. The climatic indicator "K" expresses the moisture balance of the area well (expressed in millimeters). Positive also the application of the same instrument for their measurement, i.e., a GGI-3000 evaporator (see principle in [51]).

Slovak Republic Zoning Criteria
The zoning criteria are based on the agroclimatic division of Slovakia [52]. For the agroclimatic division of Slovakia, three basic agroclimatic indicators were used, according to the area divided into agroclimatic macro-areas, sub-areas, and districts.
(a) The agroclimatic temperature indicator (TS10) is the sum of average daily air temperatures during the period with an average daily temperature of 10.0 °C. According to this indicator, the territory of Slovakia is divided into three agroclimatic macro-areas and eight agroclimatic areas: 1. Warm agroclimatic macro-area with TS10 from 3100 to 2400 °C ; 2. Slightly warm agroclimatic macro-region with TS10 from 2400 to 2000 °C ; and 3. Cold agroclimatic macro-region with TS10 from 2000 to 1600 °C.
(b) The agroclimatic moisture indicator (KVI-VIII) is given by the difference of potential evaporation (Eo) and precipitation (P) in the summer months (June-August (VI-VIII)): Eo values were calculated for the territory of Slovakia by [53]. The climatic indicator "K" expresses the moisture balance of the area well (expressed in millimeters). Positive values Eo − P are characterized by the water deficiency and negative values by the moisture excess.
According to the climatic indicator (KVI-VIII), there are seven sub-areas in Slovakia, in which agricultural crops have differently ensured their moisture requirements: 1. Subarea with KVI-VIII 150 mm-very dry 2. Subarea with KVI-VIII 150 to 100 mm-mostly dry 3. Subarea with KVI-VIII 100 to 50 mm-slightly dry 4. Subarea with KVI-VIII 50 to 0 mm-slightly humid 5. Subarea with KVI-VIII 0 to −50 mm-mostly humid 6. Subarea with KVI-VIII −50 to −100 mm-humid According to the climatic indicator (K VI-VIII ), there are seven sub-areas in Slovakia, in which agricultural crops have differently ensured their moisture requirements:
Subarea with K VI-VIII 150 to 100 mm-mostly dry 3.
Subarea with K VI-VIII 100 to 50 mm-slightly dry 4.
Subarea with K VI-VIII −100 mm-very humid (c) The agroclimatic wintering indicator (T min ) represents the average of the annual absolute temperature minima. This characteristic effectively describes the climatic conditions during the winter. Absolute temperature minima are important factors for the cultivation of winter crops and fruit trees. They effectively express the critical freezing temperatures. According to the conditions of wintering, there are five districts:
Agroclimatic district of relatively mild winters with T min from −18.0 • C to −20.0 • C; 3.
Agroclimatic district of mildly cold winter with T min from −20.0 • C to −22.0 • C; 4.
Agroclimatic district of mostly cold winter with T min from −22.0 to −24.0 • C; 5.
Agroclimatic district of cold winter with T min ≤ −24.0 • C. The average annual vertical temperature gradient in Slovakia is 0.61 • C per 100 m of height. During the summer months, however, its value increases to 0.76 • C and, in the winter months, it decreases to 0.33 • C. The territory of the Slovak Republic was divided into eight regions (see Figure 1). The classification of the eight Slovak regions into six macro-regions with 35 individual meteorological stations across Slovakia is shown in Table 2.
According to Figure 1 and Table 2, the division of eight regions into six macro-regions according to the different agroclimatic zones is as follows: The characteristics of the agroclimatic zones were applied to classify eight regions into six macro-regions, which are: (1) Northwestern Slovakia (NW): Trenčín region-From an agroclimatic point of view, the NW area is assigned to the macro-area of a mildly warm, agroclimatic area of a lightly warm subarea that is moderately humid to mostly humid. Moreover, this area is assigned to agroclimatic precincts with a mild/cold winter to a mostly cold winter. This area also transitions north to the macro-area with a cold, agroclimatic area with a moderately cold sub-area that is mostly humid to humid, and a precinct with mostly cold winters. (2) Southwestern Slovakia (SW): Bratislava region, Trnava region, and Nitra region-From an agroclimatic point of view, the SW area is assigned to the macro-area of a warm and agroclimatic area that is very warm, a sub-area that is very dry, and a predominantly dry and agroclimatic precinct that has mainly mild winters. (3) North-central Slovakia (NC): Žilina region-From an agroclimatic point of view, the NC area is assigned to the macro-area of a slightly warm to cold agroclimatic area from a slightly to moderate/mildly warm sub-area up to a slightly cold sub-area. This area is also assigned to slightly dry, moderately humid, mostly humid, and agroclimatic precincts that are mildly cold to mostly cold in the winter. This area transitions north to a macro-area of a cold, agroclimatic area that is mostly cold, a sub-area that is mostly humid to humid, and a precinct that is mostly cold/cold in the winter. (4) South-central Slovakia (SC): Banská Bystrica region-From an agroclimatic point of view, the SC area at the southernmost part of the state border is assigned to a warm macro-area, a very warm agroclimatic area, a very dry and predominantly dry sub-area, and an agroclimatic precinct with a predominantly mild winter. This area transitions to a macro-area with a moderately warm, an agroclimatic area that is relatively mild/warm, a sub-area that is slightly humid to mostly humid, and a precinct that is slightly cold in the winter. (5) Northeastern Slovakia (NE): Prešov region-From an agroclimatic point of view, this area is the most diverse. In the southern part, it is considered a warm macro-area, with an agroclimatic area that is sufficiently warm to relatively/moderately warm, sub-areas that are predominantly dry to moderately dry, and agroclimatic precincts that have relatively mild winters to mild cold winters. This area transitions north to a macro-area that is warm or moderately warm to cold, an agroclimatic area that is relatively mild/warm to slightly cold, sub-areas that are moderately dry or slightly humid, and a precinct that is slightly cold in the winter to mostly cold in the winter. (6) Southeastern Slovakia (SE): Košice region-From an agroclimatic point of view, the SE area is assigned as the macro-area of a warm and agroclimatic area that is very warm, a sub-area that is very dry and a predominantly dry, and a agroclimatic precinct with a predominantly mild winter. This area transitions to an agroclimatic area that is mostly warm, a sub-area that is very dry, and a precinct that has a relatively mild winter.   Extreme temperatures are considered a limitation for the production process of field and garden crops and often for all agricultural production. The absolute maximum temperature in Slovakia was measured on 20 July 2017 in Hurbanovo at 40.3 • C and the absolute minimum temperature was measured on 11 February 1929 in Vígl'aš-Pstruša at −41.0 • C.

Case Study and Data Description
In this study, the following measured meteorological data were applied on a daily basis: (1) daily pan evaporation in millimeters (PE); (2) minimum temperature in degrees Celsius (T min ); (3) maximum temperature in degrees Celsius (T max ); (4) average temperature in degrees Celsius (T aver ); (5) relative humidity as a percentage (RH); (6) average wind speed in meters per second (Us) with (7) wind speed measured at 7, 14, and 21 h, as well as (8) prevailing wind directions measured at 7, 14, and 21 h; (9) the sum of precipitation in millimeters (P); and (10) vapor pressure in hectopascals (E). For each station, the (11) elevation above sea level in meters (elevation) and (12) geographical location (coordinates) were applied.

ML Models and Evaluation Criteria
The data collected daily from 35 meteorological stations were processed into an extensive database (including about 176,000 observation records) in the form of panel data. Each observation was labelled by time and cross-sectional characteristics. The database was divided into a ratio of 40:30:30 into data for model estimation (train), for model validation (Validate) and for model testing (Test). The analysis was performed in SAS Enterprise Miner 15.1. As part of the workflow, the most used modeling techniques for large-scale data analysis were applied. As the target variable, pan evaporation was selected. All the other variables were used as the input variables, including geographical location. Based on the input data, the following models were estimated, validated, and tested.

Neural Network (NN)
The neural network with multilayer perception (MLP) model architecture was tested. Artificial neurons are similar to biological neurons in terms of learning based on experience and generalizations of previous experiences and applying acquired experience to new data. MLP is a neural network inspired by the structure of the real brain [54]. It consists of layers that contain neurons as processing units. Each unit in a layer is associated with all units in the previous layer. The knowledge of the whole network is encoded in the weights of these connections. Such a neural network can accurately predict the outputs of any unknown function. The network uses three nodes (input layer, hidden layer, and output layer), which are neurons with a nonlinear activation function [55] (see Figure 2).

AutoNeural Network (AN)
An autoneural network is a modeling tool in which the neural network architecture is automatically configured by the software to achieve the best result. The main advantage of a neural network is its ability to detect complex nonlinear relationship between dependent and independent variables and all possible interactions between them [57]. Another advantage of neural networks is the availability of multiple training algorithms which may be applied without deeper statistical knowledge. As disadvantages could be considered a higher computational burden, it has the empirical nature of model development and the tendency towards overfitting. Another disadvantage is the "black box" nature of

AutoNeural Network (AN)
An autoneural network is a modeling tool in which the neural network architecture is automatically configured by the software to achieve the best result. The main advantage of a neural network is its ability to detect complex nonlinear relationship between dependent and independent variables and all possible interactions between them [57]. Another advantage of neural networks is the availability of multiple training algorithms which may be applied without deeper statistical knowledge. As disadvantages could be considered a higher computational burden, it has the empirical nature of model development and the tendency towards overfitting. Another disadvantage is the "black box" nature of neural networks, which complicates the further modification and deeper interpretation of model results compared to other types of models.

Decision Tree (DT)
An empirical decision tree represents a segmentation of data that is created based on a series of simple decision rules. The tree is created based on an algorithm that identifies different ways of dividing the dataset into individual segments similar to branches. They can be used for the analysis of categorical as well as quantitative variables. The rule for forming the branches of the decision tree is based on the method of extracting the relationship between the analysis object and one or more inputs to create the segments [58].

Dmine Regression (DR)
Dmine regression is an estimation method using a sequentially applied least squares method. In each step, an independent variable is selected that contributes most to the explained model's variability. In the estimation, bidirectional interactions between classification variables are considered, a possible non-linear relationship between the input variables and the target variable is identified, and group variables are used to reduce the number of the classification variable levels.

DM Neural Network (DM NN)
A DM neural network is the method used to estimate the additive nonlinear model, whereby the main components are used to predict the target variable. In contrast to the neural network, it overcomes the problem of nonlinear estimation and the identification of the global optimal solution with a significantly shorter computational time. In the first step, the method of the main components is applied to the training data, from which a smaller number of components is subsequently selected for use in the next modeling process. At each stage of the modeling process, eight training functions are estimated separately in the training dataset, from which the variant that achieves the best results is selected.

Gradient Boosting (GB)
Gradient boosting is the application of the algorithm published by [59]. The algorithm searches for the optimal partition of data defined in terms of the values of one variable. The optimality of the criterion depends on how the target variable is divided into the individually defined partitions. The more similar the values of the target variable within a segment, the higher the value of the partition. The majority of the similar algorithms work through a process called recursive partitioning. The partitions are then combined to create a predictive model. The accuracy of the model is evaluated by standard indicators based on a comparison with the values of the target variable. As with decision trees, there is no assumption about the distribution of variables that need to comply with the gradient boosting. For interval input variables, the model only works with variable sequences. For analyzed data, where decision trees have achieved good results, gradient boosting can often further improve this result.

Least Angle Regression (LARS)
This type of regression was first published by Efron et al. (2004) [60]. Similar to the forward selection that is often used in regression models, the algorithm produces a sequence of regression models. One additional parameter is added at each step. The sequential procedure is completed after all parameters have entered the model. The algorithm begins by centering the input and output variables. The input variable scales are adjusted to the same number of corrected squares. The initial level of the coefficients is zero, as is the predicted effect on the output variable. Subsequently, the predictor that is most correlated with the residues is identified and a step in the direction of this predictor follows. The step length determines the predictor coefficient. The step length is determined so that any of the other predictors and the current predicted effect have the same correlation with the residues. At this point, the predicted effect moves in a direction that is rectangular between the two predictors. This will ensure that already mentioned predictors have the same correlation with residues. The predicted effect moves in the same direction until the third predictor also has the same correlation with residues as the previous two predictors previously in the model. The new direction is designed to be rectangular between the three predictors. The predicted effect moves again in this direction until a fourth predictor that has the same correlation with residues enters the model. In this way, the process continues until all predictors enter the model.

Least Ensemble Model (EM)
This type of model is one of the methods that combines posterior probabilities and two or more predictive models to create a potentially more accurate model [61]. In the case of the analysis performed, the ensemble model was based on the results of all the above models.
The modeling process is shown in Figure 3. In the initial step, data were collected and processed into the input database. The database was divided into train, validate, and test datasets. In the next step, all the modeling techniques described above in the methodology were applied in the train dataset. Estimated models were validated and tested using validate and test datasets. In the final step, the best model was selected based on the smallest value of average squared error achieved in validation and testing.   Table 3 shows the basic characteristics of PE for individual regions. On average, the lowest PE values were recorded in the NC area, and the highest were recorded in the SE and SW regions. As for the largest change over the period, it could be expressed using the variation range in the last column. It follows that the largest change in terms of PE occurred in the SE region, followed by SC and SW. The smallest change in terms of evaporation was in the NC and NW regions. Similar conclusions can be taken from Figure 4.   Table 3 shows the basic characteristics of PE for individual regions. On average, the lowest PE values were recorded in the NC area, and the highest were recorded in the SE and SW regions. As for the largest change over the period, it could be expressed using the variation range in the last column. It follows that the largest change in terms of PE occurred in the SE region, followed by SC and SW. The smallest change in terms of evaporation was in the NC and NW regions. Similar conclusions can be taken from Figure 4.    Table 4, where the lowest value on average is in NC, and the highest values are in SW and SE. The displayed values represent the measurements taken during the vegetation season. In terms of the graphical range of values, similar results were also achieved, either with or without the consideration of extreme values. The highest margin is in SE and SW, or SC, where the highest PE changes were found. If we assess the differences from a statistical point of view, Table 4 shows the differences between areas that can be considered statistically significant.    Table 4, where the lowest value on average is in NC, and the highest values are in SW and SE. The displayed values represent the measurements taken during the vegetation season. In terms of the graphical range of values, similar results were also achieved, either with or without the consideration of extreme values. The highest margin is in SE and SW, or SC, where the highest PE changes were found. If we assess the differences from a statistical point of view, Table 4 shows the differences between areas that can be considered statistically significant.

ML Models' Accuracy Evaluation
The indicator for comparing the accuracy of forecasts of individual models was the average squared error (ASE) indicator. The model with the lowest value of this indicator was evaluated as the most suitable model for prediction. Table 5 compares the prediction accuracy results of all estimated models, according to which the ASE chooses the model with the smallest average squared error value. The comparison is made using the average squared error of the estimate, which is calculated as the square of the difference between the actual and predicted value of the target variable. The lowest value of the criterion is considered as the best one, because it is the most accurate estimation. The second column shows the accuracy that was achieved when estimating the model (Train). In this case, higher accuracy is usually always achieved in the case of model validation and training. The first column shows the value of the average squared error achieved when predicting the validation sample, and the last column of Table 5 shows the error values achieved when training the model on another data sample as large as the validation sample. These values are slightly higher than in the model estimate. The model that achieved the lowest average square error in PE prediction was Dmine regression. Both models of the neural network were evaluated as the least suitable models for the prediction of PE values.  Figure 5 compares the results of the average predicted values according to the individual models. With the three best models (Dmine regression, gradient boosting, and decision tree), a very similar development of predicted values can be observed. The most significant differences in the predicted values were recorded at higher values; on the contrary, at low predicted values, the resulting predictions of all models were very similar.     For comparison, the results according to the second best model are also presented (gradient boosting). The order of the importance of the variables in this case was almost identical to the results obtained with the decision tree. Even in this case, humidity and temperature have the greatest influence (although there is a difference in the importance of the individual variables related to temperature, compared to the first model). In the third case, the elevation above sea level is in third place. Significant variables selected in the Dmine regression model, which was considered superior, are shown in Table 6. The table includes information about the explanatory ability of each variable, the sum of squares, and its significance expressed by the F-value and p-value. Variables denoted as AOV16 represent the categorical expression of quantitative variables. They were used in cases in which this kind of variable achieved better results than the original one. According to the results in Table 6, it can be concluded that the most significant variables influencing pan evaporation were average humidity and minimum temperature. Other variables were also significant, but with much smaller influence on the target variable.   For comparison, the results according to the second best model are also presented (gradient boosting). The order of the importance of the variables in this case was almost identical to the results obtained with the decision tree. Even in this case, humidity and temperature have the greatest influence (although there is a difference in the importance of the individual variables related to temperature, compared to the first model). In the third case, the elevation above sea level is in third place. Significant variables selected in the Dmine regression model, which was considered superior, are shown in Table 6. The table includes information about the explanatory ability of each variable, the sum of squares, and its significance expressed by the F-value and p-value. Variables denoted as AOV16 represent the categorical expression of quantitative variables. They were used in  This study's results are in accordance with other studies, e.g., [39,[62][63][64], which concluded that ML may be a powerful tool for the prediction of actual evapotranspiration when a time series of a few years is available. Starting from the measurements of a sufficient number of climatic parameters, it is possible to obtain forecasting models characterized by very high accuracy and precision.
The final model was selected as the best one, from all common data mining techniques usually used in the modeling of large-scale data. The dataset was, in this case, compiled from measurements in different places and included a large number of observations. The overall explanatory ability of this model measured by R-squared analysis is slightly smaller than the value presented by other authors; however, this could be explained by a very large number of observations used in the presented analysis, which was unique especially due to the extent of the data (both in time and space). The presented model can be used as the universal model for the prediction of pan evaporation in the Slovak Republic.

Relationship between PE and Variables
Concerning the relationship between PE, temperature and elevation above sea level, the correlation analysis was simplified; only T min and T max were included instead of T aver . Table 7 shows that there is a significant yet weak relationship between PE and elevation above sea level. On the other hand, there is a moderately strong relationship between PE and T aver , which is a moderate relationship. Comparing T min and T max clearly shows that T max is a slightly more important factor.

Discussion
The results of this study are built on existing evidence of ML models and statistical techniques as useful frameworks for making predictions of complex climatological indices, such as the hydrological PE, as concluded by numerous authors (see Table 1). Investigating outcomes indicate that the ML models perform well in predicting PE at different climatical regions [1,9,13,16,[40][41][42], which is also consistent with this case study. The results according to [11] showed that the ML models have different accuracies in different climates.
ML models based on multiple regression, such as DR in this study, have the best accuracy of results, which is in line with the hypotheses of the previous studies [11,40,42,45]. However, the results do not fit with the theory according to Zounemat-Kermani (2021) [40], where the kriging model, as well as the support vector regression (SVR), radial basis function neural network (RBFNN), and = Levenberg-Marquardt (MLP-ML) models, performed better compared to the RSM (the modified response surface method) and M5Tree (M5 model tree).
Contrary to this study's hypothesized association, in which AN and NN were evaluated as the least suitable models for the prediction of PE, the study of Sudheer (2002) [50] proved that PE values could be reasonably estimated using temperature data only through the ANN technique. The results of this study also contradict the claims of the research [13], in which the estimation results obtained with the functional link artificial neural network (FLANN) model are compared with those obtained by multi-layer artificial neural net-works (MLANN) and two empirical methods using the same raw data and corresponding features. In summary, the proposed FLANN models provide improved estimates, and they are able to model the daily evaporation process more accurately compared to MLANN. Out of MLANN and FLANN, the FLANN is less complex and better in terms of accuracy. However, among the three agroclimatic zones, the improvement in the plains zone (Raipur) is more visible, as compared to the plateau (Jagdalpur) and hill (Ambikapur) zones of the study area [13].
The study [13] concludes that appropriate FLANN models can also be selected depending upon the availability of climatic parameters. A comparison according to [40] indicates that the models, without local climatic data, provide more scattered estimates. The studies that used more input climate data variables [9,40,44] have low predicted values; on the contrary, the most significant relationships in the predicted values were recorded in studies with less input climatic data variables [11,40,43]. As confirmed by [11], there is a slight difference between RH and T aver , as they were much more effective at modeling PE than the other variables in their study. A different study [42] concluded that besides T aver , SR had the best performance, followed by models that used RH and, finally, Us. However, the temperature-based models showed the worst performance in their study. From a study conducted by Al-Mukhtar et al. [1], it is evident that the temperatures (T max , T min , and T aver ), RH and Us were all significantly associated at the 0.05 level with the PE, justifying that the PE rates at these stations can be nicely modelled by these elements. The results of this study are consistent with the above-mentioned studies and proved the strong relationship between PE and T aver .
The results of this study match with the study of Majhi and Naidu (2021) [13], where they stated that the evaporation rate increases with an increase in the radiation and temperature of the evaporating zone. Similarly, wind speed also helps to remove the water from water bodies to some extent and triggers the evaporation process. Bright sunshine and low humidity and atmospheric pressure are the other climatic variables that significantly contribute to the evaporation process [13]. The study used the long-term climatic data on T max ( • C), T min ( • C), and RH for morning and afternoon hours (RH1 and RH11 (%)), and Up, SR, and PE were collected from the certified observatories located in selected stations in India [13].
The different studies used different numbers of meteorological stations for their calculations, from one [44,45] or two [9,16,40] up to eight [11,42]. However, in this case study, 35 meteorological stations were used to improve the calculation accuracy, which provides new insight and is an advantage of this study.
In a study from the same country as this case study (Košice, Slovak Republic) [35], the soft computing techniques for estimating daily PE were investigated, using daily SR, RH, T, and Us as the meteorological variables for modeling. These model results show that the ML model performs better than another study that used soft computing techniques [35]; this was the main motivation of this study. However, in this case study, the applicability of the newly explored ML models was investigated, with the best accuracy obtained by DR. For future research, it can be compared with the best performing models of ML from Table 1. The macro-regions with the greatest change in PE will be used for further research applying PE with other variables (SE and SW regions of the Slovak Republic) with the uniform climatic characteristics.
In general terms, no one ML algorithm is the best for all problems [50]. The performance of different machine learning algorithms strongly depends on the size and structure of the available data. As concluded in the study by Granata (2019) [50], this study also infers that the individual ML models have been chosen because they usually achieve high performance and they are very good at learning complex, highly non-linear relationships. However, due to the large dispersion of climatic, geographical, and local conditions, it is necessary to test individual ML models independently for each site.

Conclusions
This study explored the abilities of eight different ML models-NN, AN, DT, DR, DM NN, GB, LARS, and EM-in modeling PE, utilizing the meteorological dataset combinations of the variables T min , T max , T aver , RH, and Us with wind speed in the directions at 7, 14, and 21 degrees; prevailing wind directions at 7, 14, and 21 degrees; PE; elevation above sea level; and geographical location. The climatic datasets obtained from 35 stations in different geographical locations were used as inputs for training and testing. The datasets were taken from eight regions in the Slovak Republic during the vegetation periods of the years from 2010 to 2020.
The eight regions were classified into six macro-regions-NW, SW, NC, SC, NE, and SE-according to the agroclimatic zones of the Slovak Republic. The results of this study could be applied practically in the field of regional PE estimation. According to results of the performed analysis, the best method for the modeling of pan evaporation was Dmine regression. Both neural network models were the worst performing models from all applied methods. This suggests that the character of the modelled relationships could be explained better by other methods, despite the ability of neural networks to explain complex relationships. On the other hand, Dmine regression allows us to use a standard modeling approach and offers better insight into the modelled relationships, which is not the case of the "black box" neural network.
From this study, we conclude the following: (a) The lowest PE values were recorded in the NC area, and the highest were recorded in the SE and SW regions. The largest PE change over the observed period (expressed by using the variation range) occurred in the SE region, followed by SC and SW. The smallest PE change was in the NC and NW regions. (b) The best accuracy of the ML models was obtained by DR (TASE = 0.78819), followed by GB (TASE = 0.77826) and DT (TASE = 0.78094), though it is possible to see very similar results of the predicted values. Both neural network models, AN and NN, were evaluated as the least suitable models for the prediction of PE. (c) There is a significant but weak relationship between PE and elevation above sea level.
However, there is a moderately strong relationship between PE and T aver . A comparison between T min and T max shows that T max is a slightly more important factor.
Because ET is one of the main parts of the hydrology cycle, it is crucial for estimating PE, especially when there are limited data available. Based on the results, the use of the best three evaluated ML models is recommended (DR, GB, and DT) in the region with the largest PE change and with uniform climatic characteristics. The previous studies that used more input climate data variables have low predicted values; on the contrary, the most significant relationships in the predicted values were recorded in studies with less input climatic data variables.
The results of this study can be used in water resources management to make sustainable irrigation plans, design sustainable water supply systems, or carry out sustainable reservoir management.