Article

Prediction of Reference Crop Evapotranspiration in China’s Climatic Regions Using Optimized Machine Learning Models

1 College of Water Resources and Hydropower, Sichuan Agricultural University, Ya'an 625014, China
2 State Key Laboratory of Hydraulics and Mountain River Engineering & College of Water Resource and Hydropower, Sichuan University, Chengdu 610065, China
3 Jianyang Xintianfu Agricultural Technology Co., Ltd., Jianyang 641400, China
* Author to whom correspondence should be addressed.
Water 2024, 16(23), 3349; https://doi.org/10.3390/w16233349
Submission received: 2 October 2024 / Revised: 31 October 2024 / Accepted: 15 November 2024 / Published: 21 November 2024

Abstract:
The accurate estimation of reference crop evapotranspiration (ET0) is essential for crop water consumption modeling and agricultural water resource management. In the present study, three bionic algorithms (aquila optimizer (AO), tuna swarm optimization (TSO), and sparrow search algorithm (SSA)) were combined with an extreme learning machine (ELM) model to form three hybrid models (AO-ELM, TSO-ELM, and SSA-ELM). The accuracy of the ET0 estimates for five climate regions in China from 1970 to 2019 was evaluated against the FAO-56 Penman–Monteith (P-M) equation. The results showed that the predicted values of the three hybrid models and the ELM model fitted the P-M calculated values well. R2 and RMSE were 0.7654–0.9864 and 0.1271–0.7842 mm·d−1, respectively, and the AO-ELM model achieved the highest prediction accuracy. The performance of AO-ELM with combination 5 (maximum temperature (Tmax), minimum temperature (Tmin), total solar radiation (Rs), and sunshine duration (n)) showed the greatest improvement over the ELM model. The prediction accuracy for the stations in the plateau mountain climate (PMC) region was the best, while the prediction accuracy for the stations in the tropical monsoon climate (TPMC) region was the worst. Except in the temperate continental climate (TCC) region, where wind speed (U2) was the variable with the greatest influence on ET0, n, Ra, and Rs were more important than relative humidity (RH) and U2 in predicting ET0. Therefore, when determining the best model for each climate region with limited meteorological data, AO-ELM4 (with Tmax, Tmin, Rs, and U2 as inputs) was selected for the TCC region and AO-ELM5 (with Tmax, Tmin, Rs, and n as inputs) was selected for the TMC, PMC, SMC, and TPMC regions.

1. Introduction

Reference crop evapotranspiration (ET0) is the sum of crop transpiration and soil evaporation under conditions of uniform, vigorous growth, complete ground coverage, and adequate water supply. It represents the exchange of gas, energy, and water between the soil, crops, and the atmosphere [1]. The accurate measurement and estimation of regional ET0 are crucial for simulating crop water consumption, scheduling irrigation, and managing agricultural water resources [2].
There are several methods for calculating ET0, including the Hargreaves, FAO-56 Penman–Monteith (P-M), and Priestley–Taylor models. The most widely accepted method is the FAO-56 P-M model, recommended by the Food and Agriculture Organization of the United Nations (FAO). This model combines radiation and aerodynamic terms of meteorological factors [3] and has been extensively validated globally [4,5,6]. However, the P-M model requires comprehensive meteorological data, which are often unavailable due to the limited development and insufficient investment in many regions. As a result, empirical models that can accurately estimate ET0 with minimal input data, such as temperature, radiation, and mass transfer models, have been developed and applied [7].
With the advancement in computing power and the integration of machine learning algorithms in agriculture, researchers have increasingly used various machine learning models to predict ET0. Examples include the artificial neural network model (ANN) [8,9], support vector machine model (SVM) [10,11,12,13], generalized regression neural network model (GRNN) [14,15], and random forest model (RF) [16,17]. These models have demonstrated high accuracy in predicting ET0. For instance, Wen et al. [13] used an SVM model to predict ET0 in arid regions with limited data and found it to outperform three empirical models. Studies have consistently shown that machine learning models provide more accurate ET0 predictions compared to empirical models [18,19,20,21,22].
Recently, the extreme learning machine model (ELM) has gained popularity due to its fast computation speed and high performance [23,24,25,26,27]. Abdullah et al. [28] found the ELM model to be more efficient and faster than the feedforward backpropagation (FFBP) model and the P-M equation in predicting ET0 in Iraq. Feng et al. [15,24] and Tejada et al. [12] also reported that the ELM model outperformed other machine learning models.
As research has advanced, scholars have observed that while the extreme learning machine (ELM) model can achieve high accuracy in predicting ET0, it suffers from issues such as local optimal solutions and slow convergence rates during simulations. These problems reduce the model’s portability across different regions. To address these limitations, researchers have employed various bio-inspired algorithms to optimize the initial weights and thresholds of the ELM model. This optimization aims to mitigate existing model issues, further enhance prediction accuracy, and improve overall predictive performance [8,29]. For instance, Liu et al. [30] employed a hybrid model combining ELM with a genetic algorithm (GA-ELM) to forecast ET0 in southwestern China. Their findings indicated that the GA-ELM model outperformed both the standalone ELM and empirical models during both training and testing phases. Similarly, Zhu et al. [31] compared the PSO-ELM model with the ELM, ANN, and RF models and six empirical models based on three input modes of radiation, temperature, and mass transfer and found that the machine learning model had more accurate ET0 predictions than the empirical models, with the PSO-ELM model surpassing other machine learning approaches in performance.
Although many studies have demonstrated that combining the ELM model with bio-inspired algorithms can yield excellent predictions of ET0, these applications have largely been limited to traditional optimization algorithms such as genetic algorithm (GA), particle swarm optimization (PSO), and whale optimization algorithm (WOA). Over time, several novel bio-inspired algorithms have emerged, including tuna swarm optimization (TSO) [32], aquila optimizer (AO) [33], and sparrow search algorithm (SSA) [34]. However, due to the scarcity of research on the application of these new algorithms in various climate regions across China, their performance remains largely unexplored. Consequently, there are insufficient data to assess the advantages and limitations of these algorithms. To address this gap, the current study proposes the integration of three novel bio-inspired algorithms—TSO, AO, and SSA—with the ELM model. The resulting hybrid algorithms, TSO-ELM, AO-ELM, and SSA-ELM, are applied to simulate and predict the ET0 at 20 meteorological stations across five distinct climate regions in China, covering the period from 1970 to 2019.
This study aims to combine three new bionic algorithms (TSO, AO, and SSA) with the ELM model to simulate and predict ET0 across 20 meteorological stations in five Chinese climate regions from 1970 to 2019. The primary objectives are as follows: (1) to analyze the performance of these models under different meteorological factors; (2) to compare the prediction accuracy of the TSO-ELM, AO-ELM, and SSA-ELM models with the standard ELM model; and (3) to recommend reliable forecasting models and meteorological factor input combinations for different Chinese climate regions.

2. Materials and Methods

2.1. Experimental Site

China’s diverse climate is categorized into five distinct climatic regions based on variations in temperature, altitude, and precipitation (Figure 1). These regions include the temperate continental climate region (TCC), temperate monsoon climate region (TMC), plateau mountain climate region (PMC), subtropical monsoon climate region (SMC), and tropical monsoon climate region (TPMC) [35]. The TCC and PMC regions are characterized as arid and semi-arid, with average annual precipitation levels of 285 mm and 382 mm, respectively. The average annual evaporation in these regions is 2148 mm for the TCC region and 1883 mm for the PMC region. In contrast, the TMC region is classified as sub-humid, with an average annual precipitation of 648 mm and an average annual evaporation of 1475 mm. The SMC and TPMC regions are considered humid, with annual precipitation levels of 1538 mm and 1964 mm, respectively. The average annual evaporation in these humid regions is 1545 mm for the SMC region and 1175 mm for the TPMC region.

2.2. Data Collection and Analysis

This study selected 20 representative weather stations with comprehensive and easily accessible meteorological data to capture the climatic characteristics of each region. The selected data spanned from 1970 to 2019 and included the following meteorological variables as input factors for analysis: daily maximum/minimum temperature (Tmax/Tmin), atmospheric relative humidity (RH), wind speed at a height of 2 m (U2), sunshine hours (n), total solar radiation (Rs), and atmospheric radiation (Ra). The performance of the hybrid machine learning models, optimized through various bionic algorithms, was evaluated using these input factors.
Table 1 presents the geographical coordinates (longitude and latitude), altitude, and multi-year average meteorological data for the selected weather stations. The wind speed at a height of 10 m (U10) was converted to a 2 m height equivalent (U2) using established wind profile relationships. The average temperature (Tmean) was calculated as the average of Tmax and Tmin [1]. The meteorological data utilized in this study were sourced from the National Meteorological Information Center–China Meteorological Data Network “https://data.cma.cn/ (accessed on 15 July 2023)”, ensuring data authenticity and validity.
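As a minimal illustration of this preprocessing step, the sketch below converts archived 10 m wind speeds to the 2 m equivalent using the standard FAO-56 logarithmic wind profile relationship and computes Tmean as the average of Tmax and Tmin; the function names are illustrative and not taken from the authors' code.

```python
import numpy as np

def wind_speed_2m(u_z: np.ndarray, z: float = 10.0) -> np.ndarray:
    """Convert wind speed measured at height z (m) to the 2 m equivalent U2
    using the FAO-56 logarithmic wind profile relationship (Allen et al. [1])."""
    return u_z * 4.87 / np.log(67.8 * z - 5.42)

def mean_temperature(t_max: np.ndarray, t_min: np.ndarray) -> np.ndarray:
    """Daily mean temperature as the average of the daily maximum and minimum."""
    return (t_max + t_min) / 2.0

# Example: a 10 m wind speed of 2.0 m/s corresponds to roughly 1.5 m/s at 2 m.
print(wind_speed_2m(np.array([2.0])), mean_temperature(np.array([18.3]), np.array([6.1])))
```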
In cases where meteorological data were missing, an interpolation method was employed to estimate and fill in the missing records, thereby maintaining data integrity. Given the extensive time span and volume of the data, the meteorological records from 1970 to 2019 were divided into training and test datasets. The training dataset comprised 40 years of meteorological data, whereas the test dataset included 10 years of data. This division allowed for a robust assessment of the model’s performance across different temporal scales.

2.3. Extreme Learning Machine Model (ELM)

To address the challenges of slow convergence and lengthy iteration times associated with traditional machine learning models, the extreme learning machine (ELM) was proposed as a learning algorithm based on a single-hidden-layer feedforward neural network [36]. The user only needs to specify the number of hidden-layer nodes; all the hidden-layer parameters are generated randomly, and the weights of the output layer are determined by the least-squares method [37]. The ELM algorithm is favored by many researchers due to its short computation times, strong nonlinear approximation capability, and robust learning performance.
Figure 2 illustrates the topology of the ELM, which consists of an input layer, a hidden layer, and an output layer. In this network model, the input sample is represented by x, and the output of the hidden layer is denoted as H (x). The calculation formula for the output H (x) of the hidden layer is expressed as follows:
H(x) = [h_1(x), \ldots, h_L(x)]
The input value is multiplied by the corresponding weight, adjusted by a bias term, and subsequently processed through a nonlinear function node to obtain the hidden layer’s output. Here, hi (x) denotes the output of the i-th hidden node, which can be formulated as follows:
h_i(x) = g(w_i, b_i, x) = g(w_i \cdot x + b_i), \quad w_i \in \mathbb{R}^D, \; b_i \in \mathbb{R}
In this expression, g(w_i, b_i, x) serves as the activation function. For this study, the Sigmoid function is employed, defined as:
g(x) = \dfrac{1}{1 + e^{-x}} = \dfrac{e^x}{e^x + 1}
Finally, the output from the output layer is given by:
f_L(x) = \sum_{i=1}^{L} \beta_i h_i(x) = H(x)\beta
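The following minimal Python sketch illustrates the ELM procedure described above (random hidden-layer parameters, Sigmoid activation, least-squares output weights). It is illustrative only; the class and parameter names are ours and not taken from the authors' implementation.

```python
import numpy as np

class ELM:
    """Minimal extreme learning machine sketch: random hidden layer, least-squares output weights."""

    def __init__(self, n_hidden: int = 20, seed: int = 0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        # h_i(x) = g(w_i . x + b_i) with the Sigmoid activation g
        return 1.0 / (1.0 + np.exp(-(X @ self.W.T + self.b)))

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ELM":
        # Randomly generate all hidden-layer parameters (w_i, b_i) ...
        self.W = self.rng.normal(size=(self.n_hidden, X.shape[1]))
        self.b = self.rng.normal(size=self.n_hidden)
        # ... then solve for the output weights beta by least squares: beta = H^+ y
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self._hidden(X) @ self.beta
```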

2.4. Novel Bionic Algorithm

In this study, the sparrow search algorithm, the tuna swarm optimization algorithm, and the aquila optimizer were employed to optimize the initial weights and thresholds of the ELM model. The primary aim of this optimization is to enhance the overall performance of the ELM model.
Figure 3 illustrates the entire process of the ELM model, which includes the input of meteorological factors, the optimization stage utilizing the three bionic algorithms, and the final model output. The schematic principles of the three bionic algorithms are further detailed in Figure 4.
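As a hedged sketch of how such a coupling can work, the function below defines the objective that a bionic algorithm would minimize: it decodes candidate hidden-layer weights and biases (the "initial weights and thresholds") from a flat vector, solves the ELM output weights by least squares, and returns the validation RMSE. The encoding and variable names are illustrative assumptions, since the paper does not specify them.

```python
import numpy as np

def elm_fitness(theta, X_tr, y_tr, X_val, y_val, n_hidden: int, n_feat: int) -> float:
    """Fitness minimised by the bionic algorithm: validation RMSE of an ELM whose
    hidden-layer weights and biases are decoded from the candidate vector theta.
    (Hypothetical encoding used for illustration only.)"""
    W = theta[: n_hidden * n_feat].reshape(n_hidden, n_feat)
    b = theta[n_hidden * n_feat:]
    H_tr = 1.0 / (1.0 + np.exp(-(X_tr @ W.T + b)))       # hidden outputs, Sigmoid activation
    beta = np.linalg.pinv(H_tr) @ y_tr                    # least-squares output weights
    H_val = 1.0 / (1.0 + np.exp(-(X_val @ W.T + b)))
    pred = H_val @ beta
    return float(np.sqrt(np.mean((pred - y_val) ** 2)))   # RMSE to be minimised
```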

2.4.1. Sparrow Search Algorithm (SSA)

The sparrow search algorithm was first proposed by Xue Jiankai in 2020 [34]. Inspired by the foraging and anti-predation behaviors of sparrows, the individuals within the population are categorized into two roles: discoverers and followers. This categorization mimics the real foraging scenario to search for the optimal value. The search process is conducted through the following steps:
1. Initialization of sparrow population:
Suppose there are n sparrows in the population; the fitness values of the initial population X are represented by the matrix F_X:
F_X = f(X) = \begin{bmatrix} f([x_{1,1}, x_{1,2}, \ldots, x_{1,d}]) \\ f([x_{2,1}, x_{2,2}, \ldots, x_{2,d}]) \\ \vdots \\ f([x_{n,1}, x_{n,2}, \ldots, x_{n,d}]) \end{bmatrix}
The fitness values of the sparrow population are then sorted from largest to smallest to obtain the optimal fitness value of the sparrow. Subsequently, the initially generated sparrow population is updated according to the position of the discoverer, follower, and danger warning. The formulas for these updates are as follows:
2. Discoverer Update:
X_{i,j}^{t+1} = \begin{cases} X_{i,j}^{t} \cdot \exp\!\left(\dfrac{-i}{\alpha \cdot iter_{\max}}\right), & R_2 < ST \\ X_{i,j}^{t} + Q \cdot L, & R_2 \ge ST \end{cases}
3. Follower Update:
X_{i,j}^{t+1} = \begin{cases} Q \cdot \exp\!\left(\dfrac{X_{worst}^{t} - X_{i,j}^{t}}{i^{2}}\right), & i > n/2 \\ X_{best}^{t} + \left| X_{i,j}^{t} - X_{best}^{t} \right| \cdot \mathrm{rand}(\{-1,1\}) \cdot L, & \text{otherwise} \end{cases}
4. Danger warning:
X_{i,j}^{t+1} = \begin{cases} X_{best}^{t} + \beta \cdot \left| X_{i,j}^{t} - X_{best}^{t} \right|, & f_i > f_g \\ X_{i,j}^{t} + K \cdot \dfrac{\left| X_{i,j}^{t} - X_{worst}^{t} \right|}{(f_i - f_w) + \varepsilon}, & f_i = f_g \end{cases}
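A simplified, illustrative Python sketch of one SSA iteration based on the three update rules above is given below. The discoverer fraction and safety threshold are assumed values, bound handling is omitted for brevity, and this is not intended as the reference implementation.

```python
import numpy as np

def ssa_step(X, fitness, max_iter, pd_frac=0.2, st=0.8, rng=np.random.default_rng(0)):
    """One simplified SSA iteration for a minimisation problem: discoverers explore,
    followers track the best discoverer, and a random subset reacts to danger."""
    n, d = X.shape
    f = np.array([fitness(x) for x in X])
    order = np.argsort(f)                              # best (lowest fitness) first
    X, f = X[order].copy(), f[order]
    best, worst = X[0].copy(), X[-1].copy()
    n_disc = max(1, int(pd_frac * n))

    # Discoverer update: wide search when safe, random relocation otherwise
    if rng.random() < st:
        X[:n_disc] *= np.exp(-np.arange(1, n_disc + 1)[:, None] / (rng.random() * max_iter + 1e-9))
    else:
        X[:n_disc] += rng.normal(size=(n_disc, d))

    # Follower update: hungry followers fly elsewhere, the rest forage near the best position
    for i in range(n_disc, n):
        if i > n / 2:
            X[i] = rng.normal() * np.exp((worst - X[i]) / (i ** 2))
        else:
            X[i] = best + np.abs(X[i] - best) * rng.choice([-1.0, 1.0], size=d)

    # Danger-awareness update for a random 10% of the population
    for i in rng.choice(n, size=max(1, n // 10), replace=False):
        if f[i] > f[0]:
            X[i] = best + rng.normal() * np.abs(X[i] - best)
        else:
            X[i] = X[i] + rng.uniform(-1, 1) * np.abs(X[i] - worst) / (f[i] - f[-1] + 1e-9)
    return X
```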

2.4.2. Tuna Swarm Optimization (TSO)

The tuna swarm optimization algorithm is a novel optimization algorithm proposed in 2021 based on the predation strategy of tuna [32]. Its primary features include simple operation, few adjustable parameters, and a strong ability to escape local optima.
1. Population initialization:
X_i^{init} = \mathrm{rand} \cdot (ub - lb) + lb, \quad i = 1, 2, \ldots, NP
2. Spiral predation:
The tuna group forms a spiral state to control the prey within a certain range, thereby achieving efficient predation. According to the spiral predation strategy, its mathematical model is formulated as follows:
X_i^{t+1} = \begin{cases} \alpha_1 \left( X_{best}^{t} + \beta \left| X_{best}^{t} - X_i^{t} \right| \right) + \alpha_2 X_i^{t}, & i = 1 \\ \alpha_1 \left( X_{best}^{t} + \beta \left| X_{best}^{t} - X_i^{t} \right| \right) + \alpha_2 X_{i-1}^{t}, & i = 2, 3, \ldots, NP \end{cases}
A random coordinate may also be generated within the search space to serve as the reference point for the spiral search. The mathematical model is expressed as:
X_i^{t+1} = \begin{cases} \alpha_1 \left( X_{rand}^{t} + \beta \left| X_{rand}^{t} - X_i^{t} \right| \right) + \alpha_2 X_i^{t}, & i = 1 \\ \alpha_1 \left( X_{rand}^{t} + \beta \left| X_{rand}^{t} - X_i^{t} \right| \right) + \alpha_2 X_{i-1}^{t}, & i = 2, 3, \ldots, NP \end{cases}
Heuristic algorithms typically begin with a global search and gradually narrow the search scope to a local search. Consequently, as the number of iterations increases, the search of the tuna population shifts from random reference points to precise optimal individuals.
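The spiral update can be sketched in a few lines, as below; the schedules of the weighting coefficients α1 and α2 and the simplified spiral coefficient β are illustrative assumptions rather than the authors' settings.

```python
import numpy as np

def tso_spiral_step(X, X_best, t, max_iter, a=0.7, rng=np.random.default_rng(0)):
    """Simplified spiral-foraging update sketched from the equations above: each tuna
    moves toward the current best (or a random reference) while also following the
    individual in front of it."""
    n, _ = X.shape
    alpha1 = a + (1 - a) * t / max_iter          # weight on the spiral target (assumed schedule)
    alpha2 = (1 - a) * (1 - t / max_iter)        # weight on the preceding individual
    b = rng.random()
    beta = np.exp(b) * np.cos(2 * np.pi * b)     # simplified spiral coefficient
    X_new = np.empty_like(X)
    X_new[0] = alpha1 * (X_best + beta * np.abs(X_best - X[0])) + alpha2 * X[0]
    for i in range(1, n):
        X_new[i] = alpha1 * (X_best + beta * np.abs(X_best - X[i])) + alpha2 * X[i - 1]
    return X_new
```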

2.4.3. Aquila Optimizer (AO)

The aquila optimizer is a new biological optimization algorithm developed in 2021 based on the hunting behavior of the aquila in nature [33]. It possesses strong optimization capabilities and rapid convergence speed. The algorithm explores the optimal solution through a four-stage optimization process, updating the location in real time, and the search mechanism automatically stops when the desired result is achieved.
First, the position of the population within the search range is initialized:
X_{ij} = \mathrm{rand} \times (UB_j - LB_j) + LB_j, \quad i = 1, 2, \ldots, N; \; j = 1, 2, \ldots, Dim
1. Extended Search (X1):
The aquila soars high in search of prey areas and selects the best hunting area with a vertical stoop. Its mathematical model is as follows:
X_1(t+1) = X_{best}(t) \times \left( 1 - \dfrac{t}{T} \right) + \left( X_M(t) - X_{best}(t) \right) \cdot \mathrm{rand}
2. Narrowing Down the Search (X2):
Once the first stage has identified the prey area, the AO narrows the search within the selected hunting area and prepares to attack the target prey. Its mathematical model is expressed as:
X_2(t+1) = X_{best}(t) \times \mathrm{Levy}(D) + X_R(t) + (y - x) \cdot \mathrm{rand}
3. Expansion Development (X3):
When the prey is located, the aquila descends vertically for a preliminary attack, using the chosen target area to approach the prey. Its mathematical model is formulated as:
X_3(t+1) = \left( X_{best}(t) - X_M(t) \right) \times \alpha - \mathrm{rand} + \left( (UB - LB) \times \mathrm{rand} + LB \right) \times \delta
4. Scaling Down Development (X4):
When the aquila finally attacks its prey, it strikes at random locations following the prey's movement. Its mathematical model is expressed as:
X_4(t+1) = QF(t) \times X_{best}(t) - \left( G_1 \times X(t) \times \mathrm{rand} \right) - G_2 \times \mathrm{Levy}(D) + \mathrm{rand} \times G_1
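As an illustration, the first phase (expanded exploration, X1) can be written directly from its update rule; the remaining three phases follow the same pattern with their respective equations. This is a sketch under the population-mean interpretation of X_M, not the authors' implementation.

```python
import numpy as np

def ao_expanded_search(X, X_best, t, T, rng=np.random.default_rng(0)):
    """First AO phase (expanded exploration, X1): each candidate moves toward the best
    solution found so far, tempered by the population mean position X_M."""
    X_M = X.mean(axis=0)                                   # population mean position
    return X_best * (1 - t / T) + (X_M - X_best) * rng.random(size=X.shape)
```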

2.5. Reference Crop Evapotranspiration Calculation Model

2.5.1. FAO-56 Penman–Monteith Equation

In this study, the FAO-recommended Penman–Monteith formula was employed to calculate the evapotranspiration of 20 sites across various climatic regions in China, providing a benchmark for assessing the model’s accuracy [1].
ET_0 = \dfrac{0.408 \Delta (R_n - G) + \gamma \dfrac{900}{T + 273} U_2 (e_s - e_a)}{\Delta + \gamma (1 + 0.34 U_2)}
where ET0 is the reference crop evapotranspiration (mm·d−1); Rn is the net radiation (MJ m−2·d−1); G is the soil heat flux (MJ m−2·d−1); T is the average temperature (℃); es is the saturation vapour pressure (kPa); ea is the actual vapour pressure (kPa); Δ is the slope of the saturation vapour pressure–temperature curve (kPa·℃−1); γ is the psychrometric constant (kPa·℃−1); and U2 is the wind speed at 2 m above the ground (m·s−1). The detailed calculation of these parameters is described in FAO-56 [1].
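A direct Python transcription of the equation is shown below; the example values are illustrative only, and the intermediate variables (Δ, γ, es, ea, Rn) are assumed to have been computed beforehand following FAO-56.

```python
def et0_penman_monteith(Rn, G, T, u2, es, ea, delta, gamma):
    """FAO-56 Penman-Monteith reference evapotranspiration (mm/day), using the
    variable definitions given above."""
    num = 0.408 * delta * (Rn - G) + gamma * (900.0 / (T + 273.0)) * u2 * (es - ea)
    den = delta + gamma * (1.0 + 0.34 * u2)
    return num / den

# Example with plausible mid-latitude values (illustrative numbers only): about 4.8 mm/day.
print(et0_penman_monteith(Rn=13.3, G=0.0, T=21.5, u2=2.1, es=2.56, ea=1.53,
                          delta=0.157, gamma=0.067))
```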

2.5.2. Model Accuracy Verification

To more accurately evaluate and quantify the performance and accuracy of the ET0 model, four evaluation indices—the coefficient of determination (R2), root mean square error (RMSE), normalized root mean square error (NRMSE), and mean absolute error (MAE)—were employed in this study. Their mathematical expressions are as follows:
R^2 = \dfrac{\left[ \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) \right]^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}
RMSE = \sqrt{\dfrac{1}{n} \sum_{i=1}^{n} (Y_i - X_i)^2}
NRMSE = \dfrac{\sqrt{\dfrac{1}{n} \sum_{i=1}^{n} (Y_i - X_i)^2}}{\bar{X}} \times 100\%
MAE = \dfrac{1}{n} \sum_{i=1}^{n} \left| X_i - Y_i \right|
where Xi and Yi are the measured (P-M) and predicted values, respectively, and X̄ and Ȳ are the means of Xi and Yi. When extreme ET0 values are present during model assessment, the following strategies help to ensure accurate and reliable results. First, priority should be given to metrics that are less sensitive to outliers, such as MAE. Second, to obtain a more comprehensive picture of model performance, no single metric should be relied upon; instead, a combination of metrics such as R2, MAE, RMSE, and NSE should be used. Finally, appropriate preprocessing and outlier handling before the assessment help to minimize the influence of outliers on the results.
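The four indices can be computed directly from the definitions above, for example (NRMSE is returned as a fraction, matching the values reported later):

```python
import numpy as np

def evaluation_metrics(x_obs: np.ndarray, y_pred: np.ndarray) -> dict:
    """R2, RMSE, NRMSE and MAE as defined above, with x_obs the P-M values
    and y_pred the model predictions."""
    x_bar, y_bar = x_obs.mean(), y_pred.mean()
    r2 = (np.sum((x_obs - x_bar) * (y_pred - y_bar)) ** 2 /
          (np.sum((x_obs - x_bar) ** 2) * np.sum((y_pred - y_bar) ** 2)))
    rmse = np.sqrt(np.mean((y_pred - x_obs) ** 2))
    return {"R2": r2, "RMSE": rmse, "NRMSE": rmse / x_bar, "MAE": np.mean(np.abs(x_obs - y_pred))}
```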

2.6. The Importance of Meteorological Factors for ET0 as Determined by the Path Analysis Method

Figure 5 illustrates the importance of meteorological factors on ET0 based on the path coefficient method. The study assessed the impact of seven meteorological factors on ET0 at 20 stations using the path coefficient method. The results indicate that total solar radiation (Rs) and maximum temperature (Tmax) are the most influential factors on ET0, with importance ranges of 0.832–0.947 and 0.766–0.906, respectively, which are crucial across most meteorological stations. The U2 factor shows significant importance only at the Kuerle and Kashi stations, while it is less influential at other stations. Additionally, the relative humidity (RH) factor exhibits an inverse effect on ET0 at most stations, meaning that an increase in humidity leads to a decrease in ET0. Therefore, the importance of each factor in influencing ET0 can be ranked as follows: Rs > Tmax > Ra > Tmin > n > U2 > RH.
In the present study, seven meteorological factors were combined based on their importance and ease of acquisition. The basic combination included Tmax, Tmin, and Rs. Additional factors such as Ra, RH, U2, and n were then added individually to form four combinations. According to the principle of mass transfer, RH and U2 were further added to the temperature factors to form a total of six input combinations. The detailed meteorological input combinations are presented in Table 2.
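For reference, the six combinations of Table 2 (shared by the ELM model and each hybrid model) can be encoded as a simple mapping:

```python
# The six meteorological input combinations of Table 2.
INPUT_COMBINATIONS = {
    1: ["Tmax", "Tmin", "Rs"],
    2: ["Tmax", "Tmin", "Rs", "Ra"],
    3: ["Tmax", "Tmin", "Rs", "RH"],
    4: ["Tmax", "Tmin", "Rs", "U2"],
    5: ["Tmax", "Tmin", "Rs", "n"],
    6: ["Tmax", "Tmin", "RH", "U2"],
}
```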

3. Results and Analysis

3.1. Comparison of Performance Differences of Machine Learning Models in Different Climate Regions

In this study, four evaluation indicators were used to assess the ET0 prediction capability of the ELM model and the three hybrid machine learning models (TSO-ELM, SSA-ELM, and AO-ELM) across five climate regions in China under various combinations of inputs. The evaluation indicators are shown in Figure 6. It was observed that all four models achieved satisfactory prediction accuracy, with the hybrid models demonstrating higher accuracy than the ELM model under the same meteorological input combinations. Notably, combination 5 yielded better predictions than the other combinations, while combination 1 exhibited the worst performance in terms of R2, indicating significant differences in model accuracy across different meteorological input scenarios.
To further evaluate the performance of each model, the statistical indicators from each climate region station during the training stage and the test stage were averaged, with the results presented in Table 3, Table 4, Table 5, Table 6 and Table 7. The R2, RMSE, NRMSE, and MAE values ranged from 0.7654 to 0.9864, 0.1271 to 0.7842 mm∙d−1, 0.0409 to 0.2647, and 0.0886 to 0.5757 mm∙d−1, respectively. Based on the test stage data, the AO-ELM model exhibited the best performance, with R2, RMSE, NRMSE, and MAE values of 0.9139, 0.4333 mm∙d−1, 0.1532, and 0.3114 mm∙d−1, respectively. The TSO-ELM and SSA-ELM models followed, while the ELM model showed the poorest performance, with R2, RMSE, NRMSE, and MAE values of 0.9089, 0.4429 mm∙d−1, 0.1574, and 0.3227 mm∙d−1, respectively.
The AO, TSO, and SSA algorithms all possess strong global search capabilities, rapid convergence rates, and good robustness, characterized by stable performance across diverse optimization problems and datasets, along with concise algorithmic structures and straightforward implementation. The findings indicate that the three bionic algorithms enhance the ELM model to varying degrees, suggesting that optimizing the ELM model with bionic algorithms is a reasonable approach.
Despite the superior performance of all three bionic algorithms, there are notable differences. This is attributed to the AO algorithm’s ability to incorporate global information during the optimization process, allowing the AO-ELM model to more effectively avoid overfitting and thereby exhibit stronger generalization abilities. Consequently, the AO-ELM model maintains higher prediction accuracy when faced with new, unseen data. Although the SSA algorithm is also an effective global optimization method, it may become trapped in local optima when dealing with complex problems, limiting the model’s prediction accuracy. Additionally, the convergence speed of the SSA algorithm may be influenced by the initial population and the number of iterations, impacting training efficiency and prediction performance. In practice, the TSO algorithm may be constrained by problem size and complexity. These advantages enable the AO-ELM model to demonstrate higher prediction accuracy and stability in addressing complex prediction tasks.
In summary, the machine learning models in this study achieved commendable accuracy in simulating and predicting ET0 across the five climate regions, indicating a strong correlation between the predicted and measured values. Furthermore, the performance of the hybrid models varied, suggesting that the simulation performance of the ELM model was improved to varying extents after optimization by the bionic algorithms.
In the five distinct climate regions, the PMC stations exhibited a strong performance during the testing phase (mean R2 = 0.9281, RMSE = 0.3653 mm∙d−1, NRMSE = 0.1424, MAE = 0.2668 mm∙d−1), while the TPMC stations demonstrated the weakest performance (mean R2 = 0.8815, RMSE = 0.4461 mm∙d−1, NRMSE = 0.1246, MAE = 0.3427 mm∙d−1). PMC regions are typically characterized by intense solar radiation and significant diurnal temperature variations, whereas TPMC regions are generally marked by high temperatures, elevated humidity levels, and concentrated, heavy rainfall events. Given that the primary meteorological inputs are temperature and solar radiation, factors such as solar radiation (Rs), temperature (T), and sunshine duration (n) have a more pronounced effect on ET0 prediction in PMC regions. In TPMC regions, on the other hand, rainfall and humidity play a more critical role, and their substantial fluctuations can lead to overfitting or underfitting of the model during training, thereby affecting prediction accuracy.
For instance, Dong et al. [38] reported that four hybrid models based on KNEA (K-Nearest Neighbor Evolutionary Algorithm) performed worst at PMC sites when predicting ET0 across various climate regions in China. Similarly, Wu et al. [23] found that the prediction accuracy of their optimized model was significantly higher in TMC and SMC regions compared to TCC and PMC regions. The discrepancy between our results and those of previous studies may be attributed to differences in the overall performance of the various hybrid machine learning models, the specific combinations of input factors, and the selection of sites within the same climate regions. It is evident that the prediction accuracy of the same hybrid model can vary depending on the climate region of the station, indicating that local climate and environmental conditions influence model performance.
During the testing phase, scatter plots were generated to compare the predicted values and the P-M (Penman–Monteith) calculated values for the four machine learning models under different input combinations, using Linxia Station as a representative example (Figure 7). Most data points are closely aligned with the 1:1 line, suggesting that the models' predicted values are in close agreement with the P-M values and indicating a strong correlation. However, it is apparent that varying input combinations affect the accuracy of the ET0 predictions. The R2 values for combinations 2 and 5 are consistently higher than those of the other combinations, and the scatter plots exhibit less dispersion, indicating a better fit between the predicted and calculated values and higher prediction accuracy. In contrast, the scatter plots for combination 6 show a high degree of dispersion and less ideal prediction outcomes, suggesting that while introducing solar radiation (Ra) and sunshine duration (n) as input parameters yields reliable prediction performance, incorporating relative humidity (RH) and wind speed (U2) tends to degrade model performance.
These findings are consistent with earlier research, confirming that Ra and n are critical parameters for accurate ET0 prediction [8,39]. The variation in model performance across different climate regions is closely linked to local climate conditions. Sunshine duration indirectly reflects solar radiation, which provides the necessary energy for evapotranspiration, converting liquid water into water vapor [31]. Additionally, some studies have highlighted solar radiation as the primary factor influencing ET0 variations in China [40]. This further supports the notion that Ra and n are more critical than RH and U2 for ET0 prediction.

3.2. Comparison of the Stability of Each Machine Learning Model

In general, the four models demonstrated good accuracy across the five climate regions, as shown in Table 3, Table 4, Table 5, Table 6 and Table 7. Figure 8 presents the percentage increase in the average RMSE values of the ELM model and the mixed models from the training stage to the testing stage. The results indicate that the average RMSE values in the testing stage are higher than those in the training stage. Fan et al. [25] and Wu et al. [41] also reported similar findings when predicting ET0. In terms of growth rate, the RMSE growth rates of the three mixed models are higher than that of the ELM model; among them, the AO-ELM model exhibits the largest growth rate (1.5–7.2%), while the ELM model shows the lowest growth rate (0.3–5.9%). However, the mixed models all exhibit lower RMSE values than the ELM model in the testing phase. Although the ELM model has a smaller RMSE increase and thus more stable performance, the RMSE value of the AO-ELM model is more satisfactory. Therefore, the AO-ELM model is recommended for simulating and predicting ET0. Compared with the training stage, the model performance at the stations in each climate region declined during the testing stage. The primary reason for this difference is the distribution discrepancy between the climatic environment and meteorological data samples of the training and testing stages. McVicar et al. [42] previously noted that the environment and climate of stations in each region change over time, resulting in differences in ET0 prediction accuracy between the two stages.
Figure 9 illustrates the changes in the statistical indicators of the mixed models relative to the ELM model. As shown in the figure, across all the climate regions, R2 increases by 0.0022–0.5188%, RMSE changes by −0.0079% to 3.9794% (a decrease in nearly all cases), NRMSE decreases by 1.3139–4.0016%, and MAE decreases by 1.3139–4.0016%. This suggests that the performance of the mixed models is improved compared to that of the ELM model, consistent with the conclusion that mixed models can achieve better performance in predicting ET0 than the original model, as noted by Muhammad et al. [43], Zhu et al. [31], and Reham et al. [44]. In particular, the performance improvement is most obvious in the PMC region (R2 increased by 0.3229–0.5188%, RMSE decreased by 3.9605–3.9794%, NRMSE decreased by 2.9798–4.0016%, and MAE decreased by 2.9798–4.0016%). The TSO-ELM model exhibits the best improvement effect (R2 increased by 0.5188%, RMSE decreased by 3.9605%, NRMSE decreased by 4.0016%, and MAE decreased by 4.5507%). However, the performance improvement at the TCC sites is the poorest, and the RMSE of the SSA-ELM model even increases (by 0.0079%), indicating that the optimization effect of the SSA algorithm is not ideal in this area.

3.3. Comparison of Computational Costs of Various Machine Learning Models

Figure 10 describes the average computational runtime for the four models across combinations of six weather factors. From the figure, it is evident that the ELM model boasts the shortest runtime, which ranges from 0.91 to 1.32 s. The other three hybrid algorithms, however, exhibit longer runtimes, approximately 5.41 to 14.86 times those of the ELM model. This discrepancy is attributed to the larger number of parameters optimized by the hybrid models, which are run using bionic algorithms that necessitate more time to identify the optimal solution. The computational times of the models for various combinations of meteorological factors do not exhibit significant variations. Notably, the TSO-ELM model’s runtime shows slight fluctuations under different combinations. Among the three hybrid algorithms, the AO-ELM model records the shortest runtime. Despite the AO-ELM model’s runtime being increased by 5.41 to 7.34 times compared to the ELM model, its prediction accuracy surpasses that of the ELM model. Given the primary importance of prediction accuracy in estimating ET0 by a machine learning model, the computational time of the AO-ELM model is deemed acceptable within this context.

3.4. Selection of the Best Model for Each Climate Region

Table 8 presents the optimal statistical indicator values for the best-performing models derived from meteorological stations across the five climate regions in China, using different meteorological factor inputs (using the testing phase as an example). As indicated in the table, the statistical indices R2, RMSE, NRMSE, and MAE range between 0.9291 and 0.9864, 0.1271 and 0.4550 mm·d−1, 0.0886 and 0.3373, and 0.0409 and 0.1825 mm·d−1, respectively, across the climate regions. The AO-ELM model excelled in predictive performance at the most sites (13 sites) among the five climate regions, suggesting that the AO-ELM model can achieve superior prediction performance across a broad range of applications. Our analysis revealed that the hybrid model in the TCC region exhibited the highest prediction accuracy under combination 4. Wind speed directly influences the velocity and direction of water molecule movement on the ground. In conditions of high wind speed, water molecules are more likely to diffuse from the evaporating surface into the atmosphere, thereby accelerating the evaporation rate. Additionally, when constructing the evapotranspiration model, wind speed serves as one of the critical input parameters, and the accuracy and reliability of the wind speed data significantly affect the model's prediction outcomes. Furthermore, wind speed indirectly impacts the model's evapotranspiration prediction accuracy by adjusting the model parameters. Given that the meteorological stations in this climatic region are situated inland, near deserts and grasslands, and are characterized by year-round dry, less rainy, and windy conditions, the impact of wind speed on evapotranspiration becomes more pronounced. This indicates that in this climatic region, ET0 is more influenced by the U2 factor, and thus combination 4, which includes the U2 factor, is more suitable for the TCC region. The other four climate regions exhibited the best performance when using combination 2 and combination 5 as input factors, with combination 5 notably achieving better prediction accuracy, underscoring the pivotal role of the meteorological factor n in predicting ET0. This finding aligns with Yan et al.'s [39] conclusion regarding the prediction of ET0 in arid and humid areas of China, where they observed that wind speed had a greater impact in arid areas, while sunshine duration was more critical in humid areas. In summary, when assessing the predictive performance of different models across various climate regions, the best results are achieved by selecting the most appropriate meteorological input parameters for the local stations.

3.5. Improving ET0 Predictions More Effectively

Although the hybrid models in this study achieved good accuracy, the performance gains over the ELM model were modest; the prediction of ET0 can therefore be further improved in the following ways. First, before introducing advanced algorithms, the data should be preprocessed, including the handling of missing values and the detection of outliers, which helps to improve the stability and accuracy of the algorithm. Second, an appropriate algorithm should be selected according to the characteristics of the data and regional variability, and its parameters should be tuned. This can be achieved by introducing optimization algorithms to tune the model parameters; by combining multiple machine learning algorithms and exploiting the advantages of each model to improve the accuracy of the prediction model; or by constructing the ET0 prediction model with deep learning, ensemble learning, and related methods. After the model is constructed, it needs to be evaluated and validated to ensure its performance in practical applications.

4. Conclusions

In this study, the performance of the ELM model was optimized using three algorithms: the aquila optimizer (AO), tuna swarm optimization (TSO), and the sparrow search algorithm (SSA). The optimized hybrid models, namely AO-ELM, TSO-ELM, and SSA-ELM, along with the standalone ELM model, were used to simulate and predict the reference crop evapotranspiration (ET0) at stations located in various climatic regions across China. The models were trained and tested using daily meteorological data (Tmax, Tmin, Rs, Ra, RH, U2, and n) collected from 20 meteorological stations spanning five distinct climatic regions. The models were evaluated with six different combinations of meteorological input factors, and the predicted ET0 values were compared against those calculated using the FAO-56 Penman–Monteith (P-M) equation to assess the degree of fit. The results indicate the following:
(1)
During the testing phase, the three hybrid models demonstrated satisfactory prediction accuracy across different climatic regions. Among them, the AO-ELM model exhibited superior predictive performance compared to the SSA-ELM and TSO-ELM models.
(2)
In scenarios where complete meteorological data are unavailable, combining the Tmax, Tmin, and Rs parameters with U2 as an input parameter yields better ET0 predictions in the temperate continental climate (TCC) region. Conversely, using n as the additional input parameter provided satisfactory ET0 predictions in the other climate regions.
(3)
Stations located in the plateau mountain climate (PMC) region exhibited excellent simulation performance, while those in the tropical monsoon climate (TPMC) region showed the poorest performance. This suggests that local climate conditions significantly influence the overall model performance.
(4)
For model selection, the AO-ELM model demonstrated superior predictive performance when applied on a large scale. Regarding the optimal combination of input parameters, apart from the superior prediction accuracy of combination 4 in the temperate continental climate (TCC) region, combination 5 performed better in the remaining four climatic regions. Therefore, when determining the most suitable model for each climatic region with limited meteorological data, AO-ELM4 (utilizing Tmax, Tmin, Rs, and U2 as inputs) was chosen for the TCC region, and AO-ELM5 (utilizing Tmax, Tmin, Rs, and n as inputs) was chosen for the temperate monsoon climate (TMC), plateau mountain climate (PMC), subtropical monsoon climate (SMC), and tropical monsoon climate (TPMC) regions.
The findings of this study provide valuable insights for predicting ET0 in diverse climatic regions. However, due to the limited number of bio-inspired algorithms employed in this research, the study has certain limitations. For more in-depth and rigorous research, it is recommended to explore a broader range of meteorological factors, input combinations, or advanced bio-inspired algorithms to develop more reliable methods for ET0 prediction.

Author Contributions

Conceptualization, R.M. and Y.L.; methodology, R.M.; software, R.M. and S.J.; validation, R.M., S.J., Y.L. and J.H.; formal analysis, R.M. and S.J.; investigation, H.M.; resources, J.H. and S.J.; data curation, R.M. and S.J.; writing—original draft preparation, J.H. and R.M.; writing—review and editing, J.H. and R.M.; visualization, H.M.; supervision, S.J.; project administration, J.H.; funding acquisition, J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Sichuan Agricultural University Professional Development Support Program (2221998094), the Sichuan Science and Technology Program (2023YFN0024), and the Chengdu Eastern New Area Technological Innovation Research Program (2024-DBXQ-KJYF008).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are openly available in National Meteorological Information Center-China Meteorological Data Network “https://data.cma.cn/ (Accessed on 15 July 2023)”.

Acknowledgments

We would like to express our gratitude to everyone who provided support and advice throughout the research process. We also acknowledge the financial support from the Sichuan Agricultural University Professional Development Support Program (2221998094), the Sichuan Science and Technology Program (2023YFN0024), and the Chengdu Eastern New Area Technological Innovation Research Program (2024-DBXQ-KJYF008).

Conflicts of Interest

The authors declare no conflicts of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.

References

  1. Allen, R.G. Crop Evapotranspiration-Guidelines for computing crop water requirements. FAO Irrig. Drain. Pap. 1998, 56, 147–151. [Google Scholar]
  2. Shiri, J.; Kisi, Ö.; Landeras, G.; López, J.J.; Nazemi, A.H.; Stuyt, L. Daily reference evapotranspiration modeling by using genetic programming approach in the Basque Country (Northern Spain). J. Hydrol. 2012, 414, 302–316. [Google Scholar] [CrossRef]
  3. Zhang, Q.; Cui, N.; Feng, Y.; Gong, D.; Hu, X. Improvement of Makkink model for reference evapotranspiration estimation using temperature data in Northwest China. J. Hydrol. 2018, 566, 264–273. [Google Scholar] [CrossRef]
  4. Feng, Y.; Jia, Y.; Cui, N.; Zhao, L.; Li, C.; Gong, D. Calibration of Hargreaves model for reference evapotranspiration estimation in Sichuan basin of southwest China. Agric. Water Manag. 2017, 181, 1–9. [Google Scholar] [CrossRef]
  5. Yang, Y.; Chen, R.S.; Han, C.T.; Liu, Z.W. Evaluation of 18 models for calculating potential evapotranspiration in different climatic zones of China. Agric. Water Manag. 2021, 244, 106545. [Google Scholar] [CrossRef]
  6. Guo, X.H.; Sun, X.H.; Ma, J.J. Prediction of daily crop reference evapotranspiration (ET0) values through a least-squares support vector machine model. Hydrol. Res. 2011, 42, 268–274. [Google Scholar] [CrossRef]
  7. Feng, Y.; Jia, Y.; Zhang, Q.; Gong, D.; Cui, N. National-scale assessment of pan evaporation models across different climatic zones of China. J. Hydrol. 2018, 564, 314–328. [Google Scholar] [CrossRef]
  8. Gao, L.L.; Gong, D.Z.; Cui, N.B.; Lv, M.; Feng, Y. Evaluation of bio-inspired optimization algorithms hybrid with artificial neural network for reference crop evapotranspiration estimation. Comput. Electron. Agric. 2021, 190, 106466. [Google Scholar] [CrossRef]
  9. Ferreira, L.B.; Da Cunha, F.F.; de Oliveira, R.A.; Fernandes, E.I. Estimation of reference evapotranspiration in Brazil with limited meteorological data using ANN and SVM—A new approach. J. Hydrol. 2019, 572, 556–570. [Google Scholar] [CrossRef]
  10. Fan, J.L.; Zheng, J.; Wu, L.F.; Zhang, F.C. Estimation of daily maize transpiration using support vector machines, extreme gradient boosting, artificial and deep neural networks models. Agric. Water Manag. 2021, 245, 106547. [Google Scholar] [CrossRef]
  11. Mohammadi, B.; Mehdizadeh, S. Modeling daily reference evapotranspiration via a novel approach based on support vector regression coupled with whale optimization algorithm. Agric. Water Manag. 2020, 237, 106145. [Google Scholar] [CrossRef]
  12. Tejada, A.T.; Ella, V.B.; Lampayan, R.M.; Reaño, C.E. Modeling Reference Crop Evapotranspiration Using Support Vector Machine (SVM) and Extreme Learning Machine (ELM) in Region IV-A, Philippines. Water 2022, 14, 754. [Google Scholar] [CrossRef]
  13. Wen, X.H.; Si, J.H.; He, Z.B.; Wu, J.; Shao, H.B.; Yu, H.J. Support-Vector-Machine-Based Models for Modeling Daily Reference Evapotranspiration With Limited Climatic Data in Extreme Arid Regions. Water Resour. Manag. 2015, 29, 3195–3209. [Google Scholar] [CrossRef]
  14. Ladlani, I.; Houichi, L.; Djemili, L.; Heddam, S.; Belouz, K. Modeling daily reference evapotranspiration (ET0) in the north of Algeria using generalized regression neural networks (GRNN) and radial basis function neural networks (RBFNN): A comparative study. Meteorol. Atmos. Phys. 2012, 118, 163–178. [Google Scholar] [CrossRef]
  15. Feng, Y.; Peng, Y.; Cui, N.B.; Gong, D.Z.; Zhang, K.D. Modeling reference evapotranspiration using extreme learning machine and generalized regression neural network only with temperature data. Comput. Electron. Agric. 2017, 136, 71–78. [Google Scholar] [CrossRef]
  16. Feng, Y.; Cui, N.B.; Gong, D.Z.; Zhang, Q.W.; Zhao, L. Evaluation of random forests and generalized regression neural networks for daily reference evapotranspiration modelling. Agric. Water Manag. 2017, 193, 163–173. [Google Scholar] [CrossRef]
  17. Wang, S.; Lian, J.J.; Peng, Y.Z.; Hu, B.Q.; Chen, H.S. Generalized reference evapotranspiration models with limited climatic data based on random forest and gene expression programming in Guangxi, China. Agric. Water Manag. 2019, 221, 220–230. [Google Scholar] [CrossRef]
  18. Yang, Y.; Sun, H.W.; Xue, J.; Liu, Y.; Liu, L.G.; Yan, D.; Gui, D.W. Estimating evapotranspiration by coupling Bayesian model averaging methods with machine learning algorithms. Environ. Monit. Assess. 2021, 193, 156. [Google Scholar] [CrossRef]
  19. Gul, S.; Ren, J.; Wang, K.; Guo, X. Estimation of reference evapotranspiration via machine learning algorithms in humid and semiarid environments in Khyber Pakhtunkhwa, Pakistan. Int. J. Environ. Sci. Technol. 2023, 20, 5091–5108. [Google Scholar] [CrossRef]
  20. Spontoni, T.A.; Ventura, T.M.; Palacios, R.S.; Curado, L.; Fernandes, W.A.; Capistrano, V.B.; Fritzen, C.L.; Pavao, H.G.; Rodrigues, T.R. Evaluation and Modelling of Reference Evapotranspiration Using Different Machine Learning Techniques for a Brazilian Tropical Savanna. Agronomy 2023, 13, 2056. [Google Scholar] [CrossRef]
  21. Agrawal, Y.; Kumar, M.; Ananthakrishnan, S.; Kumarapuram, G. Evapotranspiration Modeling Using Different Tree Based Ensembled Machine Learning Algorithm. Water Resour. Manag. 2022, 36, 1025–1042. [Google Scholar] [CrossRef]
  22. Nagappan, M.; Gopalakrishnan, V.; Alagappan, M. Prediction of reference evapotranspiration for irrigation scheduling using machine learning. Hydrol. Sci. J. 2020, 65, 2669–2677. [Google Scholar] [CrossRef]
  23. Wu, L.F.; Peng, Y.W.; Fan, J.L.; Wang, Y.C.; Huang, G.M. A novel kernel extreme learning machine model coupled with K-means clustering and firefly algorithm for estimating monthly reference evapotranspiration in parallel computation. Agric. Water Manag. 2021, 245, 106624. [Google Scholar] [CrossRef]
  24. Feng, Y.; Cui, N.B.; Zhao, L.; Hu, X.T.; Gong, D.Z. Comparison of ELM, GANN, WNN and empirical models for estimating reference evapotranspiration in humid region of Southwest China. J. Hydrol. 2016, 536, 376–383. [Google Scholar] [CrossRef]
  25. Fan, J.L.; Yue, W.J.; Wu, L.F.; Zhang, F.C.; Cai, H.J.; Wang, X.K.; Lu, X.H.; Xiang, Y.Z. Evaluation of SVM, ELM and four tree-based ensemble models for predicting daily reference evapotranspiration using limited meteorological data in different climates of China. Agric. For. Meteorol. 2018, 263, 225–241. [Google Scholar] [CrossRef]
  26. Feng, Y.; Gong, D.Z.; Mei, X.R.; Cui, N.B. Estimation of maize evapotranspiration using extreme learning machine and generalized regression neural network on the China Loess Plateau. Hydrol. Res. 2017, 48, 1156–1168. [Google Scholar] [CrossRef]
  27. Yin, Z.L.; Feng, Q.; Yang, L.S.; Deo, R.C.; Wen, X.H.; Si, J.H.; Xiao, S.C. Future Projection with an Extreme-Learning Machine and Support Vector Regression of Reference Evapotranspiration in a Mountainous Inland Watershed in North-West China. Water 2017, 9, 880. [Google Scholar] [CrossRef]
  28. Abdullah, S.S.; Malek, M.A.; Abdullah, N.S.; Kisi, O.; Yap, K.S. Extreme Learning Machines: A new approach for prediction of reference evapotranspiration. J. Hydrol. 2015, 527, 184–195. [Google Scholar] [CrossRef]
  29. Chia, M.Y.; Huang, Y.F.; Koo, C.H. Swarm-based optimization as stochastic training strategy for estimation of reference evapotranspiration using extreme learning machine. Agric. Water Manag. 2021, 243, 106447. [Google Scholar] [CrossRef]
  30. Liu, Q.S.; Wu, Z.J.; Cui, N.B.; Zhang, W.J.; Wang, Y.S.; Hu, X.T.; Gong, D.Z.; Zheng, S.S. Genetic Algorithm-Optimized Extreme Learning Machine Model for Estimating Daily Reference Evapotranspiration in Southwest China. Atmosphere 2022, 13, 971. [Google Scholar] [CrossRef]
  31. Zhu, B.; Feng, Y.; Gong, D.Z.; Jiang, S.Z.; Zhao, L.; Cui, N.B. Hybrid particle swarm optimization with extreme learning machine for daily reference evapotranspiration prediction from limited climatic data. Comput. Electron. Agric. 2020, 173, 105430. [Google Scholar] [CrossRef]
  32. Xie, L.; Han, T.; Zhou, H.; Zhang, Z.R.; Han, B.; Tang, A.D. Tuna Swarm Optimization: A Novel Swarm-Based Metaheuristic Algorithm for Global Optimization. Comput. Intell. Neurosci. 2021, 2021, 9210050. [Google Scholar] [CrossRef] [PubMed]
  33. Abualigah, L.; Yousri, D.; Abd Elaziz, M.; Ewees, A.A.; Al-qaness, M.; Gandomi, A.H. Aquila Optimizer: A novel meta-heuristic optimization algorithm. Comput. Ind. Eng. 2021, 157, 107250. [Google Scholar] [CrossRef]
  34. Xue, J.; Shen, B. A novel swarm intelligence optimization approach: Sparrow search algorithm. Syst. Sci. Control Eng. 2020, 8, 22–34. [Google Scholar] [CrossRef]
  35. Fan, J.L.; Wu, L.F.; Zhang, F.C.; Xiang, Y.Z.; Zheng, J. Climate change effects on reference crop evapotranspiration across different climatic zones of China during 1956–2015. J. Hydrol. 2016, 542, 923–937. [Google Scholar] [CrossRef]
  36. Huang, G.; Zhu, Q.; Siew, C. Extreme learning machine: A new learning scheme of feedforward neural networks. In Proceedings of the 2004 IEEE International Joint Conference on Neural Networks, Budapest, Hungary, 25–29 July 2004. [Google Scholar]
  37. Ding, S.F.; Zhao, H.; Zhang, Y.N.; Xu, X.Z.; Nie, R. Extreme learning machine: Algorithm, theory and applications. Artif. Intell. Rev. 2015, 44, 103–115. [Google Scholar] [CrossRef]
  38. Dong, J.H.; Liu, X.G.; Huang, G.M.; Fan, J.L.; Wu, L.F.; Wu, J. Comparison of four bio-inspired algorithms to optimize KNEA for predicting monthly reference evapotranspiration in different climate zones of China. Comput. Electron. Agric. 2021, 186, 106211. [Google Scholar] [CrossRef]
  39. Yan, S.C.; Wu, L.F.; Fan, J.L.; Zhang, F.C.; Zou, Y.F.; Wu, Y. A novel hybrid WOA-XGB model for estimating daily reference evapotranspiration using local and external meteorological data: Applications in arid and humid regions of China. Agric. Water Manag. 2021, 244, 106594. [Google Scholar] [CrossRef]
  40. Jiang, S.Z.; Liang, C.; Cui, N.B.; Zhao, L.; Du, T.S.; Hu, X.T.; Feng, Y.; Guan, J.; Feng, Y. Impacts of climatic variables on reference evapotranspiration during growing season in Southwest China. Agric. Water Manag. 2019, 216, 365–378. [Google Scholar] [CrossRef]
  41. Wu, L.F.; Fan, J.L. Comparison of neuron-based, kernel-based, tree-based and curve-based machine learning models for predicting daily reference evapotranspiration. PLoS ONE 2019, 14, e0217520. [Google Scholar] [CrossRef]
  42. McVicar, T.R.; Roderick, M.L.; Donohue, R.J.; Li, L.T.; Van Niel, T.G.; Thomas, A.; Grieser, J.; Jhajharia, D.; Himri, Y.; Mahowald, N.M.; et al. Global review and synthesis of trends in observed terrestrial near-surface wind speeds: Implications for evaporation. J. Hydrol. 2012, 416, 182–205. [Google Scholar] [CrossRef]
  43. Adnan, R.M.; Dai, H.L.; Mostafa, R.R.; Islam, A.; Kisi, O.; Elbeltagi, A.; Zounemat-Kermani, M. Application of novel binary optimized machine learning models for monthly streamflow prediction. Appl. Water Sci. 2023, 13, 110. [Google Scholar] [CrossRef]
  44. Mostafa, R.R.; Kisi, O.; Adnan, R.M.; Sadeghifar, T.; Kuriqi, A. Modeling Potential Evapotranspiration by Improved Machine Learning Methods Using Limited Climatic Data. Water 2023, 15, 486. [Google Scholar] [CrossRef]
Figure 1. Geographical distribution of meteorological stations in different climatic regions of China.
Figure 2. Topology structure of ELM.
Figure 3. Input, optimization, and output flow of optimized ELM model.
Figure 4. Flow chart of bionic optimization algorithm.
Figure 5. Importance of meteorological factors to ET0 based on the path coefficient method.
Figure 6. Statistical indicators of each model under different input combinations.
Figure 7. Scatter plot of ET0 prediction and corresponding FAO-56 P-M values of four machine learning models in Linxia Station under six different input combinations (Note: thin lines represent 1:1 lines).
Figure 8. Percentage increase in RMSE values in the test phase of four machine learning models compared to the RMSE values in the training phase (average of the sites in the five climate regions).
Figure 9. Changes in statistical index values (average values of four weather stations in each climate region) of the mixed models compared with those of the ELM model in different climate regions.
Figure 10. Computational cost (model runtime) of four machine learning models with different input combinations (Combination 1: Tmax, Tmin, Rs; Combination 2: Tmax, Tmin, Rs, Ra; Combination 3: Tmax, Tmin, Rs, RH; Combination 4: Tmax, Tmin, Rs, U2; Combination 5: Tmax, Tmin, Rs, n; Combination 6: Tmax, Tmin, RH, U2).
Table 1. Geographical locations of selected weather stations and daily mean values of meteorological data from 1970 to 2019.
Climate RegionsStationLatitude
(N)
Longitude
(E)
Elevation
(m)
Tmax
(℃)
Tmin
(℃)
N
(h)
RH
(%)
U2
(m·s−1)
Rs
(MJ m−2·d−1)
Ra
(MJ m−2·d−1)
TCCKuerle41.7586.1793718.316.097.8945.211.3015.9727.61
Kashi39.4775.99128118.496.027.7149.931.0116.3228.41
Jiuquan39.7598.51147615.071.248.4247.101.2516.9128.31
Huhehaote40.81111.62107413.460.937.7252.161.0615.9627.93
TMCChangchun43.83125.2921511.340.837.1462.762.0714.5626.78
Zhengzhou34.72113.6410720.409.895.8064.291.3914.7730.02
Linxia35.60103.21188214.491.696.6666.330.7115.6029.75
Luochuan35.76109.43116615.634.836.8961.761.1915.8529.69
PMCXining36.65101.77224914.050.087.2756.240.8216.1429.42
Linzhi29.6494.36310016.294.055.4863.190.9414.9031.57
Naqu31.4892.0545007.13−7.737.5851.761.4817.3831.04
Changdu31.1497.18324416.820.936.5650.310.6416.1531.14
SMCWuhan30.60114.034821.4613.225.2677.021.0814.7631.30
Guangzhou23.16113.272126.5618.994.5776.981.0214.5733.23
Guiyang26.68106.62110019.6312.123.1577.531.2612.4732.41
Dujiangyan30.99103.65101919.2812.652.5279.680.6611.1531.18
TPMCHaikou20.03110.331528.1121.555.6183.181.5316.5033.93
Dongfang19.10108.657328.7122.267.0778.782.4218.6134.10
Lancang22.5699.93105427.4514.695.9177.410.4816.4533.38
Zhanjiang21.27110.372326.8420.765.2481.891.6215.8033.69
Table 2. Input combinations of meteorological variables for various machine learning models.

| ELM | TSO-ELM | SSA-ELM | AO-ELM | Input Combination |
|---|---|---|---|---|
| ELM1 | TSO-ELM1 | SSA-ELM1 | AO-ELM1 | Tmax, Tmin, Rs |
| ELM2 | TSO-ELM2 | SSA-ELM2 | AO-ELM2 | Tmax, Tmin, Rs, Ra |
| ELM3 | TSO-ELM3 | SSA-ELM3 | AO-ELM3 | Tmax, Tmin, Rs, RH |
| ELM4 | TSO-ELM4 | SSA-ELM4 | AO-ELM4 | Tmax, Tmin, Rs, U2 |
| ELM5 | TSO-ELM5 | SSA-ELM5 | AO-ELM5 | Tmax, Tmin, Rs, n |
| ELM6 | TSO-ELM6 | SSA-ELM6 | AO-ELM6 | Tmax, Tmin, RH, U2 |
Table 3. Average statistical values of the mixed machine learning models with six different input combinations in the training and testing phases for the TCC region of China.

| Model | Training R2 | Training RMSE | Training NRMSE | Training MAE | Testing R2 | Testing RMSE | Testing NRMSE | Testing MAE |
|---|---|---|---|---|---|---|---|---|
| ELM1 | 0.9022 | 0.6259 | 0.2178 | 0.4219 | 0.9046 | 0.6251 | 0.2060 | 0.4265 |
| TSO-ELM1 | 0.9041 | 0.6197 | 0.2155 | 0.4155 | 0.9016 | 0.6402 | 0.2118 | 0.4151 |
| SSA-ELM1 | 0.9038 | 0.6208 | 0.2160 | 0.4163 | 0.9010 | 0.6721 | 0.2123 | 0.4162 |
| AO-ELM1 | 0.9043 | 0.6191 | 0.2153 | 0.4144 | 0.9015 | 0.6402 | 0.2117 | 0.4150 |
| ELM2 | 0.9304 | 0.5233 | 0.1730 | 0.3592 | 0.9282 | 0.5475 | 0.1807 | 0.3659 |
| TSO-ELM2 | 0.9337 | 0.5136 | 0.1771 | 0.3519 | 0.9280 | 0.5481 | 0.1807 | 0.3628 |
| SSA-ELM2 | 0.9332 | 0.5157 | 0.1778 | 0.3546 | 0.9281 | 0.5478 | 0.1806 | 0.3637 |
| AO-ELM2 | 0.9340 | 0.5127 | 0.1768 | 0.3522 | 0.9278 | 0.5489 | 0.1810 | 0.3644 |
| ELM3 | 0.9349 | 0.5107 | 0.1773 | 0.3540 | 0.9238 | 0.5632 | 0.1924 | 0.3957 |
| TSO-ELM3 | 0.9370 | 0.5021 | 0.1743 | 0.3466 | 0.9235 | 0.5642 | 0.1851 | 0.3929 |
| SSA-ELM3 | 0.9372 | 0.5016 | 0.1742 | 0.3450 | 0.9150 | 0.5675 | 0.1862 | 0.3952 |
| AO-ELM3 | 0.9375 | 0.5002 | 0.1736 | 0.3433 | 0.9230 | 0.5657 | 0.1856 | 0.3916 |
| ELM4 | 0.9612 | 0.3794 | 0.1353 | 0.2527 | 0.9607 | 0.3948 | 0.1321 | 0.2720 |
| TSO-ELM4 | 0.9667 | 0.3542 | 0.1259 | 0.2225 | 0.9658 | 0.3724 | 0.1242 | 0.2450 |
| SSA-ELM4 | 0.9669 | 0.3537 | 0.1257 | 0.2214 | 0.9656 | 0.3734 | 0.1245 | 0.2466 |
| AO-ELM4 | 0.9671 | 0.3524 | 0.1241 | 0.2205 | 0.9659 | 0.3717 | 0.1240 | 0.2445 |
| ELM5 | 0.9317 | 0.5214 | 0.1797 | 0.3596 | 0.9277 | 0.5494 | 0.1812 | 0.3671 |
| TSO-ELM5 | 0.9332 | 0.5154 | 0.1776 | 0.3531 | 0.9284 | 0.5467 | 0.1803 | 0.3620 |
| SSA-ELM5 | 0.9333 | 0.5146 | 0.1777 | 0.3532 | 0.9279 | 0.5504 | 0.1815 | 0.3631 |
| AO-ELM5 | 0.9336 | 0.5143 | 0.1773 | 0.3533 | 0.9280 | 0.5481 | 0.1807 | 0.3639 |
| ELM6 | 0.9425 | 0.4715 | 0.1667 | 0.3535 | 0.9440 | 0.4818 | 0.1597 | 0.3635 |
| TSO-ELM6 | 0.9463 | 0.4560 | 0.1612 | 0.3430 | 0.9467 | 0.4701 | 0.1560 | 0.3567 |
| SSA-ELM6 | 0.9459 | 0.4578 | 0.1618 | 0.3448 | 0.9461 | 0.4728 | 0.1568 | 0.3583 |
| AO-ELM6 | 0.9468 | 0.4538 | 0.1604 | 0.3408 | 0.9472 | 0.4677 | 0.1552 | 0.3545 |

Note: The best indicator values under each input combination are highlighted in bold.
Table 4. Average statistical values of the mixed machine learning models with six different input combinations in the training and testing phases for the TMC region of China.

| Model | Training R2 | Training RMSE | Training NRMSE | Training MAE | Testing R2 | Testing RMSE | Testing NRMSE | Testing MAE |
|---|---|---|---|---|---|---|---|---|
| ELM1 | 0.8833 | 0.5658 | 0.2173 | 0.4026 | 0.8866 | 0.5426 | 0.2107 | 0.3924 |
| TSO-ELM1 | 0.8869 | 0.5570 | 0.2140 | 0.3940 | 0.8901 | 0.5345 | 0.2076 | 0.3833 |
| SSA-ELM1 | 0.8868 | 0.5573 | 0.2142 | 0.3945 | 0.8905 | 0.5338 | 0.2075 | 0.3838 |
| AO-ELM1 | 0.8876 | 0.5555 | 0.2135 | 0.3930 | 0.8903 | 0.5340 | 0.2075 | 0.3844 |
| ELM2 | 0.9289 | 0.4387 | 0.1676 | 0.3032 | 0.9458 | 0.3749 | 0.1462 | 0.2671 |
| TSO-ELM2 | 0.9312 | 0.4312 | 0.1647 | 0.2988 | 0.9481 | 0.3666 | 0.1429 | 0.2625 |
| SSA-ELM2 | 0.9311 | 0.4317 | 0.1648 | 0.2988 | 0.9479 | 0.3673 | 0.1432 | 0.2616 |
| AO-ELM2 | 0.9319 | 0.4293 | 0.1639 | 0.2966 | 0.9482 | 0.3661 | 0.1426 | 0.2605 |
| ELM3 | 0.9254 | 0.4561 | 0.1739 | 0.3259 | 0.9097 | 0.4769 | 0.1836 | 0.3509 |
| TSO-ELM3 | 0.9299 | 0.4370 | 0.1684 | 0.3109 | 0.9092 | 0.4700 | 0.1786 | 0.3423 |
| SSA-ELM3 | 0.9301 | 0.4366 | 0.1682 | 0.3105 | 0.9094 | 0.4799 | 0.1774 | 0.3517 |
| AO-ELM3 | 0.9306 | 0.4351 | 0.1677 | 0.3105 | 0.9097 | 0.4743 | 0.1721 | 0.3412 |
| ELM4 | 0.9324 | 0.4310 | 0.1690 | 0.3012 | 0.9195 | 0.4589 | 0.1802 | 0.3223 |
| TSO-ELM4 | 0.9341 | 0.4104 | 0.1610 | 0.2763 | 0.9243 | 0.4447 | 0.1746 | 0.3001 |
| SSA-ELM4 | 0.9378 | 0.4139 | 0.1623 | 0.2789 | 0.9241 | 0.4455 | 0.1749 | 0.3026 |
| AO-ELM4 | 0.9394 | 0.4083 | 0.1601 | 0.2735 | 0.9249 | 0.4431 | 0.1738 | 0.2987 |
| ELM5 | 0.9288 | 0.4391 | 0.1678 | 0.3042 | 0.9455 | 0.3762 | 0.1467 | 0.2676 |
| TSO-ELM5 | 0.9313 | 0.4315 | 0.1649 | 0.2985 | 0.9473 | 0.3695 | 0.1440 | 0.2629 |
| SSA-ELM5 | 0.9310 | 0.4323 | 0.1871 | 0.2992 | 0.9476 | 0.3682 | 0.1436 | 0.2628 |
| AO-ELM5 | 0.9322 | 0.4286 | 0.1637 | 0.2962 | 0.9486 | 0.3649 | 0.1423 | 0.2590 |
| ELM6 | 0.9215 | 0.4566 | 0.1865 | 0.3522 | 0.9089 | 0.4874 | 0.1919 | 0.3731 |
| TSO-ELM6 | 0.9290 | 0.4352 | 0.1714 | 0.3305 | 0.9181 | 0.4631 | 0.1825 | 0.3505 |
| SSA-ELM6 | 0.9284 | 0.4378 | 0.1720 | 0.3325 | 0.9176 | 0.4644 | 0.1830 | 0.3509 |
| AO-ELM6 | 0.9294 | 0.4338 | 0.1708 | 0.3292 | 0.9194 | 0.4595 | 0.1810 | 0.3468 |

Note: The best indicator values under each input combination are highlighted in bold.
Table 5. Average statistical values of the mixed machine learning models with six different input combinations in the training and testing phases for the PMC region of China.

| Model | Training R2 | Training RMSE | Training NRMSE | Training MAE | Testing R2 | Testing RMSE | Testing NRMSE | Testing MAE |
|---|---|---|---|---|---|---|---|---|
| ELM1 | 0.9018 | 0.3669 | 0.1505 | 0.2734 | 0.8935 | 0.4895 | 0.1658 | 0.3187 |
| TSO-ELM1 | 0.9032 | 0.3638 | 0.1492 | 0.2699 | 0.9017 | 0.4483 | 0.1522 | 0.2950 |
| SSA-ELM1 | 0.9034 | 0.3635 | 0.1491 | 0.2698 | 0.8990 | 0.4503 | 0.1564 | 0.2979 |
| AO-ELM1 | 0.9036 | 0.3630 | 0.1489 | 0.2693 | 0.9020 | 0.4440 | 0.1516 | 0.2943 |
| ELM2 | 0.9570 | 0.2368 | 0.0973 | 0.1744 | 0.9608 | 0.2594 | 0.1122 | 0.2216 |
| TSO-ELM2 | 0.9619 | 0.2128 | 0.0880 | 0.1717 | 0.9614 | 0.2580 | 0.1117 | 0.2158 |
| SSA-ELM2 | 0.9582 | 0.2336 | 0.0961 | 0.1724 | 0.9613 | 0.2580 | 0.1115 | 0.2155 |
| AO-ELM2 | 0.9587 | 0.2321 | 0.0954 | 0.1710 | 0.9618 | 0.2575 | 0.1111 | 0.2149 |
| ELM3 | 0.9265 | 0.3182 | 0.1304 | 0.2399 | 0.9219 | 0.3672 | 0.1689 | 0.2721 |
| TSO-ELM3 | 0.9311 | 0.3083 | 0.1260 | 0.2300 | 0.9284 | 0.3507 | 0.1598 | 0.2517 |
| SSA-ELM3 | 0.9306 | 0.3085 | 0.1264 | 0.2309 | 0.9268 | 0.3512 | 0.1608 | 0.2520 |
| AO-ELM3 | 0.9315 | 0.3065 | 0.1256 | 0.2287 | 0.9245 | 0.3522 | 0.1614 | 0.2541 |
| ELM4 | 0.9344 | 0.3018 | 0.1237 | 0.2127 | 0.9317 | 0.3649 | 0.1589 | 0.3018 |
| TSO-ELM4 | 0.9387 | 0.2920 | 0.1196 | 0.2015 | 0.9358 | 0.3592 | 0.1561 | 0.2954 |
| SSA-ELM4 | 0.9383 | 0.2932 | 0.1201 | 0.2031 | 0.9338 | 0.3619 | 0.1571 | 0.3004 |
| AO-ELM4 | 0.9390 | 0.2912 | 0.1193 | 0.2003 | 0.9346 | 0.3619 | 0.1562 | 0.2955 |
| ELM5 | 0.9559 | 0.2396 | 0.0985 | 0.1774 | 0.9615 | 0.3313 | 0.1070 | 0.2149 |
| TSO-ELM5 | 0.9580 | 0.2338 | 0.0961 | 0.1722 | 0.9652 | 0.3155 | 0.1017 | 0.2032 |
| SSA-ELM5 | 0.9580 | 0.2342 | 0.0963 | 0.1725 | 0.9638 | 0.3180 | 0.1029 | 0.2049 |
| AO-ELM5 | 0.9582 | 0.2125 | 0.0959 | 0.1724 | 0.9650 | 0.3148 | 0.1014 | 0.2028 |
| ELM6 | 0.8839 | 0.4036 | 0.1649 | 0.3172 | 0.8825 | 0.4436 | 0.1658 | 0.3248 |
| TSO-ELM6 | 0.8917 | 0.3887 | 0.1588 | 0.3020 | 0.8867 | 0.4354 | 0.1621 | 0.3180 |
| SSA-ELM6 | 0.8903 | 0.3914 | 0.1600 | 0.3037 | 0.8852 | 0.4387 | 0.1635 | 0.3201 |
| AO-ELM6 | 0.8921 | 0.3882 | 0.1587 | 0.3009 | 0.8868 | 0.4352 | 0.1617 | 0.3171 |

Note: The best indicator values under each input combination are highlighted in bold.
Table 6. Average statistical values of the mixed machine learning models with six different input combinations in the training and testing phases for the SMC region of China.

| Model | Training R2 | Training RMSE | Training NRMSE | Training MAE | Testing R2 | Testing RMSE | Testing NRMSE | Testing MAE |
|---|---|---|---|---|---|---|---|---|
| ELM1 | 0.8486 | 0.5191 | 0.2049 | 0.3722 | 0.8326 | 0.5707 | 0.2181 | 0.4119 |
| TSO-ELM1 | 0.8529 | 0.5116 | 0.2019 | 0.3622 | 0.8344 | 0.5659 | 0.2168 | 0.4009 |
| SSA-ELM1 | 0.8533 | 0.5111 | 0.2017 | 0.3628 | 0.8363 | 0.5627 | 0.2152 | 0.4005 |
| AO-ELM1 | 0.8534 | 0.5109 | 0.2016 | 0.3619 | 0.8361 | 0.5629 | 0.2159 | 0.3882 |
| ELM2 | 0.9678 | 0.2386 | 0.0953 | 0.1639 | 0.9671 | 0.2360 | 0.0891 | 0.1605 |
| TSO-ELM2 | 0.9690 | 0.2346 | 0.0938 | 0.1606 | 0.9678 | 0.2335 | 0.0881 | 0.1608 |
| SSA-ELM2 | 0.9686 | 0.2349 | 0.0939 | 0.1609 | 0.9676 | 0.2344 | 0.0884 | 0.1611 |
| AO-ELM2 | 0.9692 | 0.2340 | 0.0936 | 0.1601 | 0.9678 | 0.2341 | 0.0882 | 0.1606 |
| ELM3 | 0.8853 | 0.4521 | 0.1778 | 0.3254 | 0.8770 | 0.4450 | 0.1830 | 0.3335 |
| TSO-ELM3 | 0.8941 | 0.4361 | 0.1713 | 0.2818 | 0.8858 | 0.4318 | 0.1744 | 0.3146 |
| SSA-ELM3 | 0.8941 | 0.4359 | 0.1717 | 0.3081 | 0.8866 | 0.4298 | 0.1745 | 0.3082 |
| AO-ELM3 | 0.8939 | 0.4365 | 0.1717 | 0.3072 | 0.8851 | 0.4329 | 0.1753 | 0.3076 |
| ELM4 | 0.8753 | 0.4754 | 0.1868 | 0.3476 | 0.8632 | 0.5013 | 0.1982 | 0.3546 |
| TSO-ELM4 | 0.8851 | 0.4548 | 0.1788 | 0.3238 | 0.8701 | 0.4853 | 0.1945 | 0.3421 |
| SSA-ELM4 | 0.8845 | 0.4564 | 0.1794 | 0.3254 | 0.8684 | 0.4884 | 0.1960 | 0.3456 |
| AO-ELM4 | 0.8859 | 0.4536 | 0.1784 | 0.3222 | 0.8693 | 0.4872 | 0.1956 | 0.3421 |
| ELM5 | 0.9679 | 0.2392 | 0.0957 | 0.1649 | 0.9675 | 0.2281 | 0.0937 | 0.1586 |
| TSO-ELM5 | 0.9689 | 0.2347 | 0.0938 | 0.1609 | 0.9705 | 0.2329 | 0.0896 | 0.1539 |
| SSA-ELM5 | 0.9689 | 0.2352 | 0.0948 | 0.1609 | 0.9705 | 0.2320 | 0.0918 | 0.1565 |
| AO-ELM5 | 0.9691 | 0.2343 | 0.0937 | 0.1605 | 0.9707 | 0.2304 | 0.0882 | 0.1533 |
| ELM6 | 0.8758 | 0.4789 | 0.1889 | 0.3621 | 0.8705 | 0.4425 | 0.1870 | 0.3585 |
| TSO-ELM6 | 0.8850 | 0.4595 | 0.1813 | 0.3477 | 0.8754 | 0.4328 | 0.1775 | 0.3477 |
| SSA-ELM6 | 0.8849 | 0.4596 | 0.1813 | 0.3431 | 0.8719 | 0.4354 | 0.1817 | 0.3460 |
| AO-ELM6 | 0.8857 | 0.4583 | 0.1807 | 0.3426 | 0.8764 | 0.4289 | 0.1762 | 0.3435 |

Note: The best indicator values under each input combination are highlighted in bold.
Table 7. Average statistical values of the mixed machine learning models with six different input combinations in the training and testing phases for the TPMC region of China.

| Model | Training R2 | Training RMSE | Training NRMSE | Training MAE | Testing R2 | Testing RMSE | Testing NRMSE | Testing MAE |
|---|---|---|---|---|---|---|---|---|
| ELM1 | 0.8047 | 0.5948 | 0.1697 | 0.4685 | 0.8097 | 0.5892 | 0.1647 | 0.4656 |
| TSO-ELM1 | 0.8056 | 0.5881 | 0.1679 | 0.4607 | 0.8140 | 0.5825 | 0.1629 | 0.4578 |
| SSA-ELM1 | 0.8050 | 0.5889 | 0.1681 | 0.4620 | 0.8129 | 0.5843 | 0.1634 | 0.4601 |
| AO-ELM1 | 0.8058 | 0.5877 | 0.1678 | 0.4599 | 0.8137 | 0.5830 | 0.1631 | 0.4577 |
| ELM2 | 0.9626 | 0.2524 | 0.0712 | 0.1807 | 0.9579 | 0.2765 | 0.0762 | 0.2006 |
| TSO-ELM2 | 0.9653 | 0.2438 | 0.0689 | 0.1726 | 0.9599 | 0.2700 | 0.0745 | 0.1952 |
| SSA-ELM2 | 0.9651 | 0.2444 | 0.0691 | 0.1731 | 0.9598 | 0.2703 | 0.0745 | 0.1954 |
| AO-ELM2 | 0.9656 | 0.2427 | 0.0684 | 0.1715 | 0.9601 | 0.2691 | 0.0743 | 0.1942 |
| ELM3 | 0.8573 | 0.5025 | 0.1446 | 0.3897 | 0.8637 | 0.4969 | 0.1392 | 0.3823 |
| TSO-ELM3 | 0.8632 | 0.4918 | 0.1416 | 0.3859 | 0.8664 | 0.4920 | 0.1380 | 0.3749 |
| SSA-ELM3 | 0.8625 | 0.4930 | 0.1419 | 0.3799 | 0.8643 | 0.4960 | 0.1391 | 0.3796 |
| AO-ELM3 | 0.8639 | 0.4905 | 0.1412 | 0.3767 | 0.8653 | 0.4938 | 0.1386 | 0.3778 |
| ELM4 | 0.8148 | 0.5742 | 0.1638 | 0.4511 | 0.8218 | 0.5708 | 0.1594 | 0.4532 |
| TSO-ELM4 | 0.8244 | 0.5593 | 0.1596 | 0.4380 | 0.8311 | 0.5555 | 0.1552 | 0.4397 |
| SSA-ELM4 | 0.8243 | 0.5593 | 0.1423 | 0.4379 | 0.8308 | 0.5561 | 0.1554 | 0.4362 |
| AO-ELM4 | 0.8257 | 0.5572 | 0.1591 | 0.4358 | 0.8295 | 0.5584 | 0.1560 | 0.4353 |
| ELM5 | 0.9638 | 0.2491 | 0.0704 | 0.1779 | 0.9592 | 0.2725 | 0.0752 | 0.1973 |
| TSO-ELM5 | 0.9649 | 0.2475 | 0.0692 | 0.1735 | 0.9596 | 0.2707 | 0.0747 | 0.1961 |
| SSA-ELM5 | 0.9650 | 0.2449 | 0.0692 | 0.1733 | 0.9600 | 0.2697 | 0.0744 | 0.1946 |
| AO-ELM5 | 0.9653 | 0.2427 | 0.0688 | 0.1717 | 0.9601 | 0.2693 | 0.0743 | 0.1948 |
| ELM6 | 0.8531 | 0.5098 | 0.1467 | 0.3957 | 0.8609 | 0.5003 | 0.1408 | 0.3905 |
| TSO-ELM6 | 0.8573 | 0.5023 | 0.1446 | 0.3883 | 0.8647 | 0.4936 | 0.1388 | 0.3830 |
| SSA-ELM6 | 0.8578 | 0.5014 | 0.1444 | 0.3876 | 0.8653 | 0.4925 | 0.1386 | 0.3820 |
| AO-ELM6 | 0.8587 | 0.4999 | 0.1439 | 0.3859 | 0.8645 | 0.4936 | 0.1389 | 0.3819 |

Note: The best indicator values under each input combination are highlighted in bold.
Table 8. Error statistics of the best-performing machine learning model and input combination for each weather station in the different climate regions of China.

| Region | Station | Model ID | R2 | RMSE | MAE | NRMSE |
|---|---|---|---|---|---|---|
| TCC | Kuerle | TSO-ELM4 | 0.9752 | 0.3453 | 0.2202 | 0.1055 |
| TCC | Kashi | AO-ELM4 | 0.9725 | 0.3717 | 0.2516 | 0.1163 |
| TCC | Jiuquan | AO-ELM4 | 0.9628 | 0.3651 | 0.2351 | 0.1267 |
| TCC | Huhehaote | TSO-ELM4 | 0.9537 | 0.4003 | 0.2648 | 0.1460 |
| TMC | Changchun | TSO-ELM3 | 0.9307 | 0.4550 | 0.3199 | 0.1852 |
| TMC | Zhengzhou | AO-ELM5 | 0.9291 | 0.4485 | 0.3373 | 0.1488 |
| TMC | Linxia | AO-ELM5 | 0.9809 | 0.1942 | 0.1290 | 0.0869 |
| TMC | Luochuan | AO-ELM5 | 0.9569 | 0.3311 | 0.2311 | 0.1285 |
| PMC | Xining | AO-ELM5 | 0.9864 | 0.1655 | 0.1306 | 0.0704 |
| PMC | Linzhi | AO-ELM5 | 0.9471 | 0.4417 | 0.2909 | 0.1231 |
| PMC | Naqu | AO-ELM2 | 0.9736 | 0.2219 | 0.2391 | 0.1022 |
| PMC | Changdu | AO-ELM5 | 0.9765 | 0.2098 | 0.0929 | 0.1201 |
| SMC | Wuhan | TSO-ELM2 | 0.9803 | 0.2265 | 0.1542 | 0.0862 |
| SMC | Guangzhou | TSO-ELM5 | 0.9567 | 0.2617 | 0.1777 | 0.0865 |
| SMC | Guiyang | AO-ELM5 | 0.9733 | 0.2077 | 0.1365 | 0.0918 |
| SMC | Dujiangyan | SSA-ELM2 | 0.9739 | 0.1921 | 0.1286 | 0.0927 |
| TPMC | Haikou | AO-ELM5 | 0.9532 | 0.3192 | 0.2167 | 0.0898 |
| TPMC | Dongfang | SSA-ELM5 | 0.9387 | 0.3731 | 0.2883 | 0.0886 |
| TPMC | Lancang | AO-ELM2 | 0.9834 | 0.1271 | 0.0886 | 0.0409 |
| TPMC | Zhanjiang | AO-ELM5 | 0.9662 | 0.2536 | 0.1825 | 0.0768 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
