Combination of Limited Meteorological Data for Predicting Reference Crop Evapotranspiration Using Artificial Neural Network Method

Reference crop evapotranspiration (ETo) is an important component of the hydrological cycle that is used for water resource planning, irrigation, and agricultural management, as well as in other hydrological processes. The aim of this study was to estimate the ETo based on limited meteorological data using an artificial neural network (ANN) method. The daily data of minimum temperature (Tmin), maximum temperature (Tmax), mean temperature (Tmean), solar radiation (SR), humidity (H), wind speed (WS), sunshine hours (Ssh), maximum global radiation (gradmax), minimum global radiation (gradmin), day length, and ETo data were obtained over the long-term period from 1969 to 2019. The analysed data were divided into two parts from 1969 to 2007 and from 2008 to 2019 for model training and testing, respectively. The optimal ANN for forecasting ETo included Tmax, Tmin, H, and SR at hidden layers (4, 3); gradmin, SR, and WS at (6, 4); SR, day length, Ssh, and Tmean at (3, 2); all collected parameters at hidden layer (5, 4). The results showed different alternative methods for estimation of ETo in case of a lack of climate data with high performance. Models using ANN can help promote the decision-making for water managers, designers, and development planners.


Introduction
Increased water consumption due to rapid population growth has necessitated expanding food production through irrigation and industrial output to meet basic human needs [1]. The fundamental goal of irrigation is to apply water to maintain crop evapotranspiration when rainfall is insufficient [2,3]. Evapotranspiration (ET) is a connection between the energy and water cycles, and climate is the primary source of water vapor in the atmosphere [4][5][6][7]. It regulates the quantity of water needed for vegetation development, surface runoff, water loss from water bodies, and water requirements for ecological sustainability. As a result, it is one of the most important hydro-meteorological factors for irrigation and water resource management, as well as ecological monitoring [8][9][10][11]. Evapotranspiration, like other hydro-meteorological variables, has changed all over the world because of global warming. Changes in ET must be assessed to plan adaptation and mitigation to combat climate change's effects on water supplies, agriculture, and ecology [12]. Evapotranspiration is one of the most essential components of hydrological and climatological processes, accounting for roughly 70% of precipitation falling on land and consuming more than 50% of the solar energy collected by the earth [13,14]. As a result, ET is a particularly valuable indicator for analysing the hydrologic regime's changing behaviour [15].
Evapotranspiration, on the other hand, is difficult to quantify. Although lysimeters are commonly used to measure ET directly, they are sparsely distributed across the world because of their high cost and time-consuming management [16,17]. Therefore, ET is generally estimated by various empirical models that assess either potential evapotranspiration (ETp) or reference crop evapotranspiration (ETo) [18,19]. ETo is the highest rate of evapotranspiration that a well-watered grass surface can produce [5,20,21]. It is thought to be the most accurate way to quantify evaporative demand from real-world land surfaces under certain meteorological conditions [22]. In fact, ETo (rather than actual ET) has long been a key input into numerous hydrological models [23,24]. ETo is a key factor in the hydrological process, used for calculating the water requirement from meteorological parameters [25] and for quantification in agriculture and precision farming [26]. In addition to empirical ETo models, newly emerging artificial intelligence and machine learning techniques have recently been used to estimate ETo [27][28][29][30].
Intelligent computer models, such as the artificial neural network (ANN) methodology, have been developed over the last decade as alternative ways of calculating ET [31]. Because of their broad use in a variety of scientific fields, ANNs have shown significant progress in studies on hydrology and water resources [27,28]. Artificial neural networks are large, parallel-distributed processors made up of basic processing units that have a natural propensity for storing experimental information and making it available. They are useful tools for modelling nonlinear processes because they require minimal inputs and can map input-output connections without requiring any prior knowledge of the process [32]. As an efficient soft computing technique for estimating ETo from accessible and measured climatic variables, artificial intelligence (AI) models have been widely applied [33,34].
Neural Networks are frequently used in anticipating lumpy demand, which is characterized by periods of no demand and times of high demand. Traditional time-series approaches may be unable to capture data with a nonlinear pattern. To get over these restrictions, NN modelling is a potential option. Without requiring a priori function forms of models, neural networks can be used to model time series data. Many neural network algorithms have been suggested, explored, and successfully applied to time series prediction and causal prediction, including Multilayer Feed-forward NN, Recurrent NN, Time delay NN, and Nonlinear Autoregressive exogenous NN [35].
The corrosion rate and residual strength of un-stiffened aluminium panels with corrosion thinning and MSD were predicted using ANN by Pidaparti et al. [36,37]. When the ANN residual strength forecasts were compared to the experimental findings, the mean absolute error was found to be around 12%. Even when compared to some of the other very simple engineering models presented in the literature, such an error level is considered to be relatively high.
Hijazi et al. [38] recently used ANN modelling to forecast the residual strength of panels with MSD. A dataset of 147 alternative configurations was compiled from a variety of literature sources, with 97 data points utilized for training, 25 for validation, and 25 for testing. Three different aluminium alloys (2024-T3, 2524-T3, and 7075-T6), four different test panel configurations (un-stiffened, stiffened, stiffened with broken middle stiffener, and lap-joints), and several panel and crack geometries were included in the dataset.
An advantage of ANN models is that problems can be represented by attribute-value pairs. The target function, and therefore the output of an ANN, can be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes. ANN learning algorithms are robust to noise in the training data: the training samples may contain errors without substantially affecting the final output. ANNs are also suitable when a quick evaluation of the learnt target function is necessary. However, the number of weights in the network, the number of training instances considered, and the settings of various learning algorithm parameters can all contribute to long training durations [39].
De and Debnath [40] used an ANN model to compute the maximum and minimum temperatures over India during the summer. Using data for the years 1901 to 2003, their program took three months of data to predict the mean monthly surface temperature in the monsoon months (June, July, and August) over India with a Multilayer Perceptron Neural Network (MLPNN). As a result, three models for the maximum and minimum temperature, one for each monsoon month, were developed for the period 1901-2003. In all cases, the estimation error was less than 5%. Hasmat Malik and Savita [41], for example, proposed an artificial neural network (ANN) for long-term wind velocity prediction.
In this research, we tested an artificial neural network (ANN) technique as a first-choice tool for predicting ETo with partial climate data and assessed its accuracy against reference information [42]. The fundamental advantage of this network is that no mathematical model is needed: an ANN is trained on examples and trends using a large number of independent and dependent data, without prior hypotheses about their distribution and interactions [43]. ANNs lack the limitations of other prediction methods because they derive the required relationships from the observed data. When using ANN techniques, it is essential to have adequate input and output data that are symmetrical [44]. Previous studies have used various AI algorithms and models for computing ETo [45]. For example, Yin et al. [46] used various routine meteorological variables, including Tmax, Tmin, wind speed, relative humidity, and solar radiation, to model daily evapotranspiration in China. Their results relied on a machine learning technique that combines a genetic algorithm with a kernel model, the support vector machine (GA-SVM). The ANN and SVM models were compared, and the researchers applied the GA-SVM models for the prediction of ETo. An estimate of ETo using eight climatic variables was proposed by Jovic et al. [47] using a hybrid method called genetic programming (GP).
The main aim of this study was to develop estimation models for ETo using ANN techniques with limited available climatic data to support farmers, water managers, designers, and development planners. Furthermore, to the best of our knowledge, no studies in the literature have used artificial intelligence methods to predict ETo from limited climate data in the research area. Given the availability of data and the complexity of computing ETo using the Penman-Monteith technique, which requires multiple parameters, the steps of this research were: (1) to model reference crop evapotranspiration (ETo) in a particular area using some of the variables in the climate data, and to compute the performance of the ANN model results, and (2) to select the best developed ANN model for estimation of ETo based on performance metrics. The study outcomes are described, analysed, and discussed, and the performance of the ANN in the projection of ETo is compared to observation data.

Study Site
The study area is located in the eastern part of Hungary (Hajdú-Bihar County), at approximately 47.5° N and 21.5° E (Figure 1) [48]. In this part of Hungary, the climate is classified as continental, with a hot summer and a cold rainy winter [49]. The total sunshine duration is 2000 h year−1, with 810 h in summer and 175-180 h in winter. The average annual temperature is 9.6-9.8 °C. Homogenized data for the average, minimum, and maximum daily air temperature at 2 m above the ground, and the daily sum of precipitation for the period of 1901-2010, were used. The average daily sum of sunshine hours from 1920 to 2010 was used for the calculations with no data gaps. Regarding wind conditions, north-eastern winds are dominant in Hungary, and based on the 110-year dataset, the average wind velocity was 3 m s−1. As a yearly average, winds higher than 10 m s−1 occurred on 122 days per year. The most variable climate element at the plain site was precipitation. The annual precipitation was 560-590 mm; however, its spatial distribution varies widely. In the growing season from 1 April to 30 September, the average is around 350 mm, but differences between years and in the seasonal distribution are extreme. For example, based on the dataset for Debrecen, which is in the middle of the lowland of the Carpathian basin, the minimum and maximum annual precipitation between 1901 and 2010 were 321 mm and 953 mm, respectively. The mean annual potential evapotranspiration was 918 mm year−1, though in the first decade of the 1900s it was 883 mm year−1, while in the first decade of the 2000s it was 953 mm year−1. In terms of land use, arable land accounted for about 50% of the total area, with maize and wheat among the main crops [50].

Data Sources
In this research, the weather station of Debrecen city, which is situated in the central part of the study area, was chosen as the source of historical meteorological data for modelling. The weather station is one of the official stations of the Hungarian Weather Service and is located at a reference site, as required by Allen et al. [5]. The daily data of minimum temperature (Tmin), maximum temperature (Tmax), mean temperature (Tmean), solar radiation (SR), relative humidity (H), wind speed (WS), and sunshine hours (Ssh) were obtained from the open access meteorological database of the Hungarian Meteorological Service (https://odp.met.hu/climate/station_data_series/ accessed: 03 December 2021) for the long-term period from 1969 to 2019. The downloaded data were already homogenized and adjusted by the National Weather Service. Day length and the minimum and maximum global radiation values were modelled using the latitude and the calendar day. Maximum global radiation (gradmax), also known as clear-sky solar radiation, was calculated using Equation 37 of the FAO-56 paper [5]. The fraction of extraterrestrial radiation that reaches the earth's surface ranges from 0.25 on a day with dense cloud cover to about 0.75 on a cloudless day with clear sky [5]. Therefore, minimum global radiation (gradmin), referring to solar radiation under dense cloud cover, was calculated as follows:

$$grad_{min} = 0.25\,R_a$$

where gradmin is the minimum global radiation and Ra is the extraterrestrial radiation (MJ m−2 d−1). The FAO-endorsed Penman-Monteith parameterization [5] was used for estimating reference crop evapotranspiration. The albedo used was 0.23, whilst the surface resistance was 70 s m−1.
$$ET_o = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,u_2\,(e_s - e_a)}{\Delta + \gamma\,(1 + 0.34\,u_2)}$$

where ETo is the reference crop evapotranspiration (mm day−1), Rn is the net radiation at the crop surface (MJ m−2 day−1), G is the soil heat flux density (MJ m−2 day−1), T is the mean daily air temperature at 2 m height (°C), u2 is the wind speed at 2 m height (m s−1), es is the saturation vapour pressure (kPa), ea is the actual vapour pressure (kPa), es − ea is the saturation vapour pressure deficit (kPa), Δ is the slope of the vapour pressure curve (kPa °C−1), and γ is the psychrometric constant (kPa °C−1).
The analysed data were divided into two parts, from 1969 to 2007 and from 2008 to 2019, for model training and testing, respectively.
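The calculations described in this section can be sketched in Python as follows. This is a minimal illustration that assumes the standard FAO-56 expressions for extraterrestrial radiation, day length, and clear-sky radiation [5]; the function names, the station elevation of 110 m, and the numerical inputs in the example are illustrative assumptions and are not taken from the dataset.

```python
import math

GSC = 0.0820          # solar constant, MJ m-2 min-1 (FAO-56)
ELEVATION_M = 110.0   # assumed station elevation; not given in the text


def extraterrestrial_radiation(lat_deg, day_of_year):
    """Return (Ra in MJ m-2 day-1, day length in hours) from latitude and calendar day."""
    phi = math.radians(lat_deg)
    angle = 2 * math.pi * day_of_year / 365
    dr = 1 + 0.033 * math.cos(angle)                  # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(angle - 1.39)            # solar declination (rad)
    ws = math.acos(-math.tan(phi) * math.tan(delta))  # sunset hour angle (rad)
    ra = (24 * 60 / math.pi) * GSC * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )
    day_length = 24 / math.pi * ws
    return ra, day_length


def radiation_bounds(ra, elevation_m):
    """Return (gradmax, gradmin): clear-sky and dense-cloud global radiation."""
    grad_max = (0.75 + 2e-5 * elevation_m) * ra  # clear-sky radiation (FAO-56 Eq. 37)
    grad_min = 0.25 * ra                         # dense cloud cover
    return grad_max, grad_min


def penman_monteith(rn, g, t_mean, u2, es, ea, delta, gamma):
    """FAO-56 Penman-Monteith reference crop evapotranspiration (mm day-1)."""
    numerator = 0.408 * delta * (rn - g) + gamma * 900 / (t_mean + 273) * u2 * (es - ea)
    denominator = delta + gamma * (1 + 0.34 * u2)
    return numerator / denominator


# Example for the latitude of the study area on calendar day 172 (around the summer solstice).
ra, day_length = extraterrestrial_radiation(47.5, 172)
grad_max, grad_min = radiation_bounds(ra, ELEVATION_M)
print(f"Ra={ra:.1f}, day length={day_length:.2f} h, gradmax={grad_max:.1f}, gradmin={grad_min:.1f}")

# Illustrative inputs only (not values from the Debrecen record).
eto = penman_monteith(rn=13.28, g=0.0, t_mean=16.9, u2=2.078,
                      es=1.997, ea=1.409, delta=0.122, gamma=0.0666)
print(f"ETo={eto:.2f} mm/day")
```

With the illustrative inputs above, the Penman-Monteith function evaluates to roughly 3.9 mm day−1. In a full workflow, Rn, es, ea, Δ, and γ would themselves be derived from the measured temperature, humidity, and radiation data.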

Application of the Artificial Neural Network method (ANN)
Artificial neural networks (ANN) provide a modelling technique that shares features with the information processing of biological neural networks [51]. A multi-layered structure is among the most common models for feed-forward ANN. The basic network architecture of an ANN comprises the input, hidden, and output layers. The input data are presented to the network at the input layer, and the number of nodes in this layer equals the number of inputs. Every input variable is represented by one neuron, and each neuron is connected to the neurons of the next layer by weights. The hidden layer performs the essential function of the network: it extracts the most essential information from the input layer and passes it on to the next layer. If more neurons are used than necessary, difficulties may arise when the same network processes new data sets. The number of hidden layers and of neurons in each is most commonly determined by trial and error, and choosing them well is very useful for increasing the accuracy of the results.
Recently, artificial intelligence has been successfully applied to predicting ETo and other climate factors. As a result, many researchers have achieved highly accurate results that accorded with observed data and provided prediction data valuable for future planning and development [44][45][46].
Our data sets were studied using an artificial intelligence technique. The ANN has three types of layers, as illustrated in Figure 2. The input layer receives the initial data of the neural network; the hidden layers are the intermediate layers, where all the computations are made; and the output layer produces the result (output) for each input. To improve the neural network's performance, multiple combinations of input variables, each with several numbers of hidden neurons per layer, were tested using an automatic loop that adjusted the ANN design. The optimal combination of input factors and architecture for forecasting ETo was then chosen based on statistical indicators.
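To make the layered structure concrete, the following Python sketch builds a fully connected feed-forward network with two hidden layers and evaluates it for one input vector. The logistic activation in the hidden layers, the linear output neuron, the random initial weights, and the scaled input values are assumptions made for illustration; the text does not specify them.

```python
import math
import random


def init_network(n_inputs, hidden=(4, 3), seed=0):
    """Create random weights for a network: inputs -> hidden layers -> one output."""
    rng = random.Random(seed)
    sizes = [n_inputs, *hidden, 1]
    # One list per layer; each row is a neuron: its bias followed by its input weights.
    return [
        [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_out)]
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(network, inputs):
    """Propagate one input vector through the network and return the single output."""
    activations = list(inputs)
    last = len(network) - 1
    for idx, layer in enumerate(network):
        weighted = [row[0] + sum(w * a for w, a in zip(row[1:], activations)) for row in layer]
        # Hidden layers apply the (assumed) logistic activation; the output stays linear.
        activations = weighted if idx == last else [sigmoid(v) for v in weighted]
    return activations[0]


# Four inputs (for example Tmax, Tmin, H, SR scaled to 0-1) and hidden layers (4, 3).
network = init_network(n_inputs=4, hidden=(4, 3))
n_weights = sum(len(row) for layer in network for row in layer)
print("number of weights and biases:", n_weights)
print("untrained output:", forward(network, [0.6, 0.3, 0.7, 0.5]))
```

A network with four inputs and hidden layers (4, 3) has 39 adjustable weights and biases, which is why the choice of architecture affects both training time and the risk of over-fitting.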

Performance Evaluation
The actual ETo and the modelled values were compared throughout the study period. To evaluate the accuracy of the models, the following statistical indicators were selected: root mean square error (RMSE), normalized root mean square error (NRMSE), mean absolute percentage error (MAPE), accuracy (ACC), and the coefficient of determination (R²) [3,52,53]. All the parameters are defined as follows: $ETo_i^A$ is the observed or actual value, $ETo_i^P$ is the simulated value, $\overline{ETo}$ is the mean value, and N is the total number of data points.
RMSE:

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(ETo_i^A - ETo_i^P\right)^2}$$

NRMSE:

$$NRMSE = \frac{RMSE}{SD}$$

where SD is the standard deviation of the observed or actual ETo values.
Mean absolute percentage error:
Accuracy:
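The indicators can be computed as in the following Python sketch. RMSE and NRMSE follow the definitions given here; the forms used for MAPE (mean absolute relative error times 100), ACC (one minus the mean absolute relative error), and R² (one minus the ratio of residual to total sum of squares), as well as the use of the sample standard deviation, are common conventions assumed for illustration and may differ from the definitions used in the study.

```python
import math
import statistics


def evaluate(actual, predicted):
    """Return RMSE, NRMSE, MAPE, ACC and R2 for paired actual and predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)

    rmse = math.sqrt(ss_res / n)
    nrmse = rmse / statistics.stdev(actual)  # sample SD of the actual values (assumed)

    # Relative errors require non-zero actual values.
    mean_rel_error = sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mape = 100.0 * mean_rel_error            # assumed form, in percent
    acc = 1.0 - mean_rel_error               # assumed form, as a fraction

    mean_actual = statistics.fmean(actual)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot               # assumed form

    return {"rmse": rmse, "nrmse": nrmse, "mape": mape, "acc": acc, "r2": r2}


# Made-up numbers, only to show the call.
print(evaluate(actual=[2.0, 4.0, 6.0], predicted=[3.0, 4.0, 5.0]))
```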

Calculated ETo
There was a considerable increasing trend in ETo values over the 50 years of data (Figure 3). The increase corresponds with the increasing trend of temperature at the study site, explored by Juhász et al. [54].

Data Fusion of Climatic Factors for Modelling the ETo
There are two approaches to choosing the appropriate climatic variables in ANNs for forecasting ETo. The first approach relies on training and testing various variable combinations in the ANN to arrive at the combination with the highest accuracy and performance and the lowest error. The second strategy employs dimensionality reduction algorithms, such as principal component analysis (PCA), which find patterns in data based on the correlation between features and pick the best variables as inputs to the ANNs. Five hidden-layer configurations, (4, 3), (6, 4), (3, 2), (4, 2), and (5, 4), were used for the development of the ANN models. The input variables were daily minimum temperature (Tmin), maximum temperature (Tmax), mean temperature (Tmean), solar radiation (SR), humidity (H), wind speed (WS), sunshine hours (Ssh), maximum global radiation (gradmax), minimum global radiation (gradmin), and day length. Different scenarios were formed from these input variables, and each was examined to find the most suitable hidden-layer configuration.
Table 1 lists the top five hidden configurations from the different combinations of input variables based on values of the root mean square error (RMSE), mean absolute percentage error (MAPE), the normalized root mean square error (NRMSE), the accuracy (ACC), and the coefficient of determination (R²) during the testing phase. The first combination (C1), consisting of Tmean + sunshine hours + CWB, gave values of RMSE, MAPE, NRMSE, ACC, and R² in the ranges of 0.071-0.257 mm, 0.005-0.072%, 0.032-0.425%, 0.986-0.997%, and 0.85-0.98%, respectively. The most accurate model is the one with low RMSE, MAPE, and NRMSE values and high ACC and R² values [52,53]. For the C1 combination, the most accurate model was the ANN model with hidden layers (5, 4).
The last combination, C8, included all parameters, and values in the ranges of 0.0170-0.330, 0.0006-0.0019, and 0.001-0.004 were found for RMSE (mm), MAPE (%), and NRMSE (%), respectively. The values of ACC (0.999%) and R² (0.99) were the same for all developed ANN models. From Table 1, it is concluded that the most accurate model for C8 was obtained with hidden layers (5, 4). Table 2 shows the results of the most accurate developed models for all the given combinations. It is clear from Table 2 that the most accurate results were obtained for the combination C2 (4, 3). Its values for RMSE, MAPE, and NRMSE were the lowest of all the combinations at 0.008 mm, 0.000%, and 0.000%, respectively, whereas its values for ACC and R² of 0.999% and 0.99, respectively, were the highest. The next best result was obtained for C4 (6, 4), followed by C7 (3, 2), C8 (5, 4), C1 (5, 4), and C6 (6, 4). Combination C3 had the worst performance, followed by C5. Thus, the C2 combination (Tmax + Tmin + humidity + solar radiation) with hidden neurons (4, 3) can be used to develop the models.
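The first selection approach, looping over input combinations and hidden-layer configurations and ranking the results, can be sketched in Python as below. The callback `train_and_score` is hypothetical and stands in for training an ANN and computing its test metrics; the placeholder scorer returns dummy numbers for demonstration only, and the lexicographic ranking key is one possible way to encode "low RMSE, MAPE, NRMSE and high ACC, R²", not necessarily the ranking used in the study. Only the two combinations spelled out in this section are listed.

```python
from itertools import product

# Hidden-layer configurations named in the text.
HIDDEN_CONFIGS = [(4, 3), (6, 4), (3, 2), (4, 2), (5, 4)]

# Input combinations spelled out in this section; the others are omitted here.
COMBINATIONS = {
    "C1": ["Tmean", "Ssh", "CWB"],
    "C2": ["Tmax", "Tmin", "H", "SR"],
}


def grid_search(combinations, hidden_configs, train_and_score):
    """Score every (combination, hidden configuration) pair, best result first."""
    results = []
    for (name, inputs), hidden in product(combinations.items(), hidden_configs):
        metrics = train_and_score(inputs, hidden)  # hypothetical user-supplied callback
        results.append({"combination": name, "hidden": hidden, **metrics})
    # Lower errors first, then higher ACC and R2 (assumed tie-breaking order).
    results.sort(key=lambda r: (r["rmse"], r["mape"], r["nrmse"], -r["acc"], -r["r2"]))
    return results


def placeholder_scorer(inputs, hidden):
    """Dummy metrics for demonstration only; a real callback would train and test an ANN."""
    rmse = 1.0 / (len(inputs) * sum(hidden))
    return {"rmse": rmse, "mape": rmse, "nrmse": rmse, "acc": 1 - rmse, "r2": 1 - rmse}


ranking = grid_search(COMBINATIONS, HIDDEN_CONFIGS, placeholder_scorer)
for row in ranking[:3]:
    print(row["combination"], row["hidden"], round(row["rmse"], 4))
```

Replacing the placeholder with a function that trains a network on the 1969-2007 data and evaluates it on 2008-2019 would reproduce the structure of the comparison summarized in Tables 1 and 2.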

Structural Network Design of the Superior Models
The results of the training data from 1969 to 2007 are presented in Figure 4, showing the optimum structure of the trained neural network. Each network topology conveys basic information such as the trained synaptic weights, the number of hidden neuron layers, the convergence steps, and the overall error. The threshold for the partial derivatives of the error function was set at 0.01 for the purposes of this research. All the superior models had different combinations of input variables with a variable number of neurons in the hidden layers. The first three superior models contained limited meteorological data (A: Tmax, Tmin, humidity, solar radiation; B: gradmin, solar radiation, wind speed; C: solar radiation, day length, sunshine hours, Tmean). The fourth model (D) contained all the inputs: CWB, gradmax, gradmin, Tmax, Tmin, humidity, solar radiation, day length, sunshine hours, and wind speed. Model A had hidden neuron layers (4, 3), and its training process needed 38,852 steps to achieve the least error, with an overall error of about 0.003. Model B had hidden neuron layers (6, 4), and its training process needed 5288 steps, with an overall error of about 0.001. In Model C, with hidden neuron layers (3, 2), the training process needed 55,640 steps until all absolute partial derivatives of the error function were smaller than the threshold of 0.01, with an overall error of about 0.004. Model D required 18,725 steps until all absolute partial derivatives of the error function were smaller than the threshold of 0.01 and had an overall error of 0.004.
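The stopping rule described above, training until every absolute partial derivative of the error function falls below 0.01, can be illustrated with a short Python sketch. For brevity it uses a single linear neuron trained by plain gradient descent on a sum-of-squares error, which is a deliberate simplification and not the multi-layer networks or the training algorithm of the study; the learning rate and the toy data are assumptions.

```python
def train_until_threshold(xs, ys, learning_rate=0.1, threshold=0.01, max_steps=100_000):
    """Fit y = w*x + b by gradient descent; stop when all |dE/dparam| < threshold.

    Returns (w, b, step at which the rule was satisfied, final error E = 0.5 * sum(residual^2)).
    """
    w, b = 0.0, 0.0
    for step in range(1, max_steps + 1):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = sum(r * x for r, x in zip(residuals, xs))  # dE/dw
        grad_b = sum(residuals)                             # dE/db
        if max(abs(grad_w), abs(grad_b)) < threshold:
            error = 0.5 * sum(r * r for r in residuals)
            return w, b, step, error
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    raise RuntimeError("threshold not reached within max_steps")


# Toy data generated from y = 2x + 1 (illustrative only).
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [1.0, 1.5, 2.0, 2.5, 3.0]
w, b, steps, error = train_until_threshold(xs, ys)
print(f"w={w:.3f}, b={b:.3f}, steps={steps}, error={error:.6f}")
```

The same criterion explains why the reported step counts differ so much between models: the number of steps depends on how quickly the gradient of each network's error surface shrinks, not only on the final error.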

ETo Prediction and Validation for Superlative ANN Models
After training the neural networks with the meteorological dataset and ETo from the 39 years from 1969 to 2007 and selecting the best hidden-layer configuration for each model, ETo was predicted with each model for the testing period (2008-2019). The predicted ETo values were then compared with the reserved data (the testing years), which had not been used to train the machine learning model. The results for each model are presented in Figure 5. The coefficient of determination varies between models, but in each case it was greater than 0.998. Although the differences (0.07%) in the coefficients of determination amongst the models containing limited meteorological data are negligible, the best performing model was the one with hidden layers (3, 2) and solar radiation, day length, sunshine hours, and Tmean as input parameters. Based on the above results, scenario C2 had the highest accuracy (ACC), the best correlation (R²) and, at the same time, the least error of all the scenarios. According to the parameters used in this scenario, the minimum and maximum temperature and solar radiation were the most influential parameters for evapotranspiration. The results of scenario C2 were almost consistent with those of scenario C4, with very slight differences. Comparing scenario C2 with scenario C1 showed that the model using Tmax and Tmin was more accurate than the one using Tmean. Also, the poor performance of scenario C3 showed that the gradmax parameter had the least impact on this process, because eliminating this factor in other scenarios improved the accuracy of the models compared to this scenario. In scenarios C3 and C5, the accuracy of the model was reduced by removing each of the Tmax, Tmin, and solar radiation parameters.
In the C8 scenario, although the accuracy of the model was high, using many parameters made the model more complex. In other words, a model with fewer parameters and high accuracy (such as scenario C2) is preferable. The values of R² also show the correlation between the actual and estimated evapotranspiration: in all scenarios except C3 and C5, a high correlation was observed between these two values. In general, based on the high accuracy of scenarios C2 and C4, evapotranspiration can be predicted with very high accuracy from the minimum dataset (Tmax, Tmin, and solar radiation), and the results can be used where data collection (field survey or measurement) is not possible.
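The chronological hold-out used for validation (training on 1969-2007, testing on 2008-2019) can be expressed as a short Python sketch. The record structure, a list of dictionaries with a `year` key, is a hypothetical representation of the daily data chosen for illustration.

```python
def split_by_year(records, last_training_year=2007):
    """Split records chronologically so that no testing year is seen during training."""
    train = [r for r in records if r["year"] <= last_training_year]
    test = [r for r in records if r["year"] > last_training_year]
    return train, test


# One placeholder record per year; real records would hold the daily inputs and ETo.
records = [{"year": year} for year in range(1969, 2020)]
train, test = split_by_year(records)
print(len(train), "training years,", len(test), "testing years")
```

Splitting by year rather than at random keeps the testing period strictly after the training period, so the reported test metrics reflect prediction for years the model has not seen.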

Discussion
Rapid climate change around the world poses major challenges to the sustainability of ecosystems and to human lives and welfare [55][56][57]. Recently, the earth's surface temperature has increased significantly [58,59], and it is projected to increase further in the future [60,61]. These rapid changes have altered the global hydrological cycle and its components [59,62]. Moreover, these changes, along with extreme drought cycles, have badly affected soil moisture regimes and inhibited different functions of the soil system [63,64]. In this sense, evapotranspiration is one of the main components of the hydrological cycle that is directly influenced by climate change, due to the dynamic interaction between different climate components (temperature, radiation, rainfall) and water storage. ET is the main link between the availability of water for agricultural systems and atmospheric demand. Thus, tracking changes in ET is essential for ensuring the sustainability and profitability of agricultural systems.
Although many kinds of equipment, models, and equations have been developed for measuring or simulating ET, all have their own drawbacks. Thus, new trends in applying different ANN models to ET calculation could produce reliable output, which can help promote decision-making by water managers, designers, and development planners. In this context, the main aim of this research was to evaluate the performance of different scenarios with ANN models in ET prediction for eastern Hungary. In the training stage of the ANN models, the superior models contained varied components of meteorological data. The first model used Tmax, Tmin, humidity, and solar radiation; the second adopted gradmin, solar radiation, and wind speed; while the third employed solar radiation, day length, sunshine hours, and Tmean. The fourth model contained all the inputs: CWB, gradmax, gradmin, Tmax, Tmin, humidity, solar radiation, day length, sunshine hours, and wind speed. For the prediction phase, the best performing model was the one with hidden layers (3, 2) and solar radiation, day length, sunshine hours, and Tmean as input parameters. Remarkably, in both steps, temperature and solar radiation were important factors that influenced the simulation process via the ANN models. In this sense, the C2 scenario indicated that Tmax, Tmin, and solar radiation were the most influential parameters for evapotranspiration.
Previously, Alsafadi et al. [65] pointed out an increase in drought events across Hungary, and Mohammed et al. [48] detected more than 103 events of severe agricultural drought (1950-2010) in Debrecen (eastern Hungary). In addition, Matyasovszky et al. [66] reported warmer climate conditions across Hungary. Lockwood [67] indicated the influence of solar activity on winter temperature across the Eurasian region, while Mares et al. [68] showed a good concordance between drought and solar/geomagnetic activities in central Europe. All these conclusions, directly or indirectly, support our findings, especially for the C2 scenario, where frequent drought events (i.e., less propagation of soil moisture) in Debrecen, warmer climate conditions (increase of Tmax and Tmin), and an increase in solar/geomagnetic activity (i.e., solar radiation) enhanced the ET process and increased the atmospheric water demand.
Agriculture, especially maize production, is the main economic activity in the study area [48,49]. Thus, modelling and reduction of ET is vital for the sustainability of agribusiness, considering the sensitivity of maize to water deficit and drought events. Climate change, along with decreasing rainfall and increasing ET, creates new unsuitable conditions for crop and fruit production [69]. This vulnerability is also explained by the difference between annual precipitation and annual evapotranspiration. It is well known that in mid-season the potential evapotranspiration is high and precipitation does not meet it, so there is a shortage of soil moisture for crops. Furthermore, the high clay content can also be a major problem for the readily available water content of soils [70]. Unfortunately, future climate change is predicted to cause more serious drought and extreme events in the Tisza River basin, which includes Debrecen [54,71]. According to statistical data, droughts occur every second or third year during summer, especially in July and August. Therefore, summer crops such as maize are more affected by drought than crops harvested in early summer, like wheat [72]. The vital interaction between the different climate parameters (Tmax, Tmin, humidity, solar radiation) and the ecosystem components accelerates the development of research on AI and ML algorithms to support decision makers. Adoption of ANN methods could support policy makers and stakeholders in future planning and development.

Conclusions
Although the accurate determination of ETo is of great importance for water resources management, its estimation is a challenge. The objectives of this study were achieved by estimating ETo from limited meteorological data using an artificial neural network method for a particular area. Alternative methods with high accuracy were found for the estimation of ETo where climate data are lacking, and the best developed ANN model for ETo estimation was selected based on performance metrics. We established that the optimal ANN for forecasting ETo included Tmax, Tmin, H, and SR at hidden layers (4, 3); gradmin, SR, and WS at (6, 4); SR, day length, Ssh, and Tmean at (3, 2); and all collected parameters at (5, 4). The simulated values agreed with the calculated ETo values with high statistical significance and the lowest distributional variations, and the accuracy and coefficients of determination were close to 1. The ANN models produced relatively satisfactory outputs and, after the development of an easy-to-use modelling platform, could be advanced to promote decision-making by farmers, water managers, designers, and development planners at the study site. The application of ANNs proved to be an appropriate tool for ETo estimation; therefore, there is potential to develop models for a larger climatic region (i.e., the Pannonian region) that involve more accurate climate data for prediction.