# Estimation of Spring Maize Evapotranspiration in Semi-Arid Regions of Northeast China Using Machine Learning: An Improved SVR Model Based on PSO and RF Algorithms

## Abstract

Accurate estimation of crop evapotranspiration (ET$_{c}$) is crucial for effective irrigation and water management. To this end, support vector regression (SVR) was applied to estimate the daily ET$_{c}$ of spring maize. Random forest (RF) was used as a data pre-processing technique to determine the optimal input variables for the SVR model, and particle swarm optimization (PSO) was employed to optimize the SVR model. The study used data from field experiments conducted between 2017 and 2019, including crop coefficient and daily meteorological data. The performance of the hybrid RF–SVR–PSO model was evaluated against a standalone SVR model, a back-propagation neural network (BPNN) model and an RF model, using different input meteorological variables. ET$_{c}$ values calculated using the FAO-recommended Penman–Monteith equation served as the reference for the models’ estimates. The hybrid RF–SVR–PSO model performed better than all three standalone models for ET$_{c}$ estimation of spring maize, with Nash–Sutcliffe efficiency coefficient (NSE), root mean square error (RMSE), mean absolute error (MAE) and coefficient of determination (R$^{2}$) ranges of 0.956–0.958, 0.275–0.282 mm d$^{-1}$, 0.221–0.231 mm d$^{-1}$ and 0.957–0.961, respectively. These results indicate that the hybrid RF–SVR–PSO model is appropriate for estimating daily spring maize ET$_{c}$ in semi-arid regions.

## 1. Introduction

Crop evapotranspiration (ET$_{c}$), which mainly consists of soil surface evaporation and vegetation transpiration, is an integral part of the farmland water balance and hydrological cycle. Because ET$_{c}$ is a critical indicator for setting irrigation regimes, determining crop water requirements is of utmost importance. Therefore, research on ET$_{c}$ is crucial for improving agricultural water productivity, conserving irrigation water resources and ensuring food security [5,6].

[…] ET$_{c}$ and optimize irrigation water utilization to guarantee the quality and yield of spring maize.

[…] ET$_{c}$, the crop coefficient (K$_{c}$) is used in conjunction with reference crop evapotranspiration (ET$_{0}$) [11]. However, determining ET$_{c}$ is challenging because it depends on various meteorological variables, soil conditions and crop growth indicators [12,13]. Several empirical models have been developed over the years to estimate daily ET$_{0}$. They can be broadly categorized as temperature-based, such as the Hargreaves–Samani (H–S) model [14]; radiation-based, such as the Priestley–Taylor (P–T) and Jensen–Haise models [15,16]; and combination methods based on the principles of energy balance and water vapor diffusion, such as the Penman–Monteith (PM) equation [11]. Among these empirical models, the PM equation has a wider range of applications and higher estimation accuracy, and it is recommended by the Food and Agriculture Organization (FAO) for daily ET$_{0}$ estimation in various regions [10,17,18,19]. However, its use is restricted in areas where complete meteorological data are unavailable. Although the H–S and P–T models, which rely on temperature or radiation data, can be useful, their estimation accuracy is suboptimal. It is therefore essential to develop an ET$_{c}$ estimation model that requires minimal meteorological input while still achieving high estimation accuracy.

[…] ET$_{c}$ estimation owing to their ability to model complex nonlinear relationships. For example, Saggi and Jain [6] developed an ensemble model consisting of a regularization random forest and a hybrid fuzzy–genetic model to estimate ET$_{c}$ for maize and wheat, and the ensemble model showed superior performance. Yamaç [20] compared four machine learning models, namely support vector machine (SVM), k-nearest neighbor, random forest (RF) and adaptive boosting, for sugar beet evapotranspiration estimation under different weather data input conditions. The SVM model outperformed the other three models under various conditions. Han et al. [21] applied a back-propagation neural network (BPNN) model to predict the ET$_{c}$ of wheat, maize and soybean. The BPNN model was verified against eddy correlation measurements of ET$_{c}$, and its results were found to be satisfactory.

[…] ET$_{0}$. The results showed that the prediction accuracy of the model established with meteorological input variables determined by the RF method was higher than that of the other three methods. In another study, Pinos et al. [28] used the RF method to determine the most important variables of ET$_{0}$ as input to an artificial neural network (ANN) model and found that selecting the input variables on a quantitative basis not only reduced the complexity of the model but also improved its accuracy.

[…] ET$_{0}$ and optimized the model using particle swarm optimization (PSO). Their results demonstrated that the optimized model outperformed the standalone SVM model in terms of prediction accuracy. Wu et al. [30] optimized the extreme learning machine (ELM) model for estimating daily ET$_{0}$ in various climatic regions of China using three bio-inspired optimization algorithms: the genetic algorithm (GA), PSO and the artificial bee colony (ABC) algorithm. The results demonstrated the effectiveness of bio-inspired heuristic optimization algorithms, particularly PSO, in optimizing machine learning models for hydrological applications. In another study, Zhang et al. [31] used the PSO algorithm to optimize a BPNN model for the prediction of total daily solar radiation and found that the accuracy of the model was significantly enhanced. Several other studies have used PSO to optimize machine learning models for other applications, such as flood forecasting, water quality modeling and materials science and engineering [32,33,34]. These studies further highlight the potential of PSO as an optimization technique for improving the performance of machine learning models in various applications.

[…] ET$_{c}$ calculated by the FAO method using complete meteorological data and empirical values of K$_{c}$, we established a hybrid RF–SVR–PSO model and compared it with a standalone SVR model, RF model and BPNN model in order to verify the optimization effect of PSO and RF and the estimation accuracy of the hybrid model. The main purposes of this study were to: (1) apply a hybrid RF–SVR–PSO model to estimate the daily ET$_{c}$ of spring maize; (2) compare the estimation performance of the hybrid model with the standalone SVR, BPNN and RF models under the input parameters determined by the RF method; and (3) recommend the optimal ET$_{c}$ estimation model and meteorological input variables for spring maize in the semi-arid region of Northeast China.

## 2. Materials and Methods

#### 2.1. Experimental Site and Data Source

[…] temperature (T$_{ave}$, T$_{min}$ and T$_{max}$, °C), precipitation (mm), wind speed at 2 m height (U, m s$^{-1}$), daily duration of sunshine (n, h), average relative humidity (RH, %) and average vapor pressure (hP, hPa). The Fumeng County meteorological station is a national ecological and agricultural meteorological observation station. Its ground observation system monitors conditions around the clock and automatically uploads station meteorological data at 10 min intervals for global sharing.

[…]$^{-1}$, and shallow-buried drip irrigation tape was installed at sowing using an integrated machine. The irrigation method was shallow-buried drip irrigation, with the drip irrigation tape laid only in the center of the narrow row, approximately 3–5 cm beneath the surface. For further information on the field experiment, please refer to Wang et al. [36].

#### 2.2. Maize Crop Evapotranspiration Calculation

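ET$_{0}$ was calculated by the FAO56 Penman–Monteith equation [11]:

$$ET_{0}=\frac{0.408\,\Delta \left({R}_{n}-G\right)+\gamma \dfrac{900}{T+273}\,{u}_{2}\left({e}_{s}-{e}_{a}\right)}{\Delta +\gamma \left(1+0.34\,{u}_{2}\right)} \qquad (1)$$

where ET$_{0}$ is the reference crop evapotranspiration (mm d$^{-1}$), ${R}_{n}$ is the net radiation at the crop surface (MJ m$^{-2}$ d$^{-1}$), $G$ is the soil heat flux density (MJ m$^{-2}$ d$^{-1}$), $\gamma $ is the psychrometric constant, $T$ is the mean air temperature (°C), ${u}_{2}$ is the daily wind speed at 2 m height (m s$^{-1}$), ${e}_{s}$ is the saturation vapor pressure (kPa), ${e}_{a}$ is the actual vapor pressure (kPa) and $\Delta $ is the slope of the saturation vapor pressure–temperature curve (kPa °C$^{-1}$).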
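The FAO56 Penman–Monteith calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation (the study itself used R): the function and argument names are hypothetical, the helper formulas for saturation vapor pressure and the slope $\Delta $ are the standard FAO56 expressions, and the input values are illustrative and not taken from the study.

```python
import math


def saturation_vapor_pressure(temp_c):
    """Saturation vapor pressure (kPa) at air temperature temp_c (deg C), FAO56 form."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def slope_vapor_pressure_curve(temp_c):
    """Slope of the saturation vapor pressure curve (kPa per deg C), FAO56 form."""
    return 4098.0 * saturation_vapor_pressure(temp_c) / (temp_c + 237.3) ** 2


def fao56_pm_et0(rn, g, temp_c, u2, es, ea, gamma, delta=None):
    """Daily reference evapotranspiration ET0 (mm per day).

    rn, g  : net radiation and soil heat flux density (MJ m^-2 d^-1)
    temp_c : mean air temperature (deg C)
    u2     : wind speed at 2 m height (m s^-1)
    es, ea : saturation and actual vapor pressure (kPa)
    gamma  : psychrometric constant (kPa per deg C)
    delta  : slope of the vapor pressure curve; computed from temp_c if omitted
    """
    if delta is None:
        delta = slope_vapor_pressure_curve(temp_c)
    numerator = (0.408 * delta * (rn - g)
                 + gamma * 900.0 / (temp_c + 273.0) * u2 * (es - ea))
    denominator = delta + gamma * (1.0 + 0.34 * u2)
    return numerator / denominator


# Illustrative inputs only (not data from this study); G = 0 for a daily step.
et0 = fao56_pm_et0(rn=13.28, g=0.0, temp_c=16.9, u2=2.078,
                   es=1.997, ea=1.409, gamma=0.0666)
print(f"ET0 = {et0:.2f} mm/day")
```

Setting `g=0.0` mirrors the daily-step simplification recommended by FAO56; for shorter time steps a measured or estimated soil heat flux would be passed instead.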

[…] R$_{n}$. Therefore, the soil heat flux was considered negligible, following the recommendation of FAO56 [11]. Note, however, that under certain conditions, such as bare surfaces or shorter computation periods, the soil heat flux can become more significant and should be measured or estimated using appropriate methods. According to the Ångström–Prescott (A–P) formula (Equation (2)), the daily solar radiation (R$_{s}$) can be estimated using extraterrestrial radiation (R$_{a}$), the actual daily duration of sunshine (n) and the maximum possible duration of sunshine (N) [11,37].

[…] m$^{-2}$; $J$ is the day of the year; ${d}_{r}$ is the inverse relative earth–sun distance; $\delta $ is the solar declination (rad); and $\phi $ is the latitude (rad) [30].
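As a rough illustration of the radiation terms defined above, the following Python sketch computes extraterrestrial radiation, the maximum possible sunshine duration and an Ångström–Prescott estimate of R$_{s}$ using the standard FAO56 expressions. The coefficients a = 0.25 and b = 0.50 are the FAO56 default values and are an assumption here, since the coefficients used in this study are not stated in the surviving text. The function names, latitude, day of year and sunshine hours are hypothetical.

```python
import math

GSC = 0.0820  # solar constant in MJ m^-2 min^-1 (value used in FAO56)


def extraterrestrial_radiation(lat_rad, day_of_year):
    """Return (Ra, N): daily extraterrestrial radiation (MJ m^-2 d^-1)
    and maximum possible sunshine duration (h) for a latitude in radians."""
    angle = 2.0 * math.pi * day_of_year / 365.0
    dr = 1.0 + 0.033 * math.cos(angle)            # inverse relative earth-sun distance
    decl = 0.409 * math.sin(angle - 1.39)         # solar declination (rad)
    ws = math.acos(-math.tan(lat_rad) * math.tan(decl))  # sunset hour angle (rad)
    ra = (24.0 * 60.0 / math.pi) * GSC * dr * (
        ws * math.sin(lat_rad) * math.sin(decl)
        + math.cos(lat_rad) * math.cos(decl) * math.sin(ws)
    )
    n_max = 24.0 / math.pi * ws
    return ra, n_max


def angstrom_prescott(ra, n, n_max, a=0.25, b=0.50):
    """Solar radiation Rs from sunshine duration; a and b are assumed defaults."""
    return (a + b * n / n_max) * ra


# Hypothetical site at 20 degrees south on day 246 with 8 h of sunshine.
ra, n_max = extraterrestrial_radiation(math.radians(-20.0), 246)
rs = angstrom_prescott(ra, n=8.0, n_max=n_max)
print(f"Ra = {ra:.1f}, N = {n_max:.1f} h, Rs = {rs:.1f} MJ m^-2 d^-1")
```

Locally calibrated a and b values, where available, would replace the defaults in `angstrom_prescott`.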

[…] ET$_{0}$ is the reference crop evapotranspiration (mm d$^{-1}$) calculated by Equation (1) and ${K}_{c}$ is the crop coefficient.

The K$_{c}$ values for the initial, mid-season and late-season stages suggested by FAO56 are 0.3, 1.2 and 0.6, respectively. The K$_{c}$ values of the mid-season and late-season stages were adjusted to the actual conditions of the experimental site according to the following equations:

[…] (m s$^{-1}$) at 2 m above ground during the growth stage at the experimental site [11].
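The adjustment equations themselves did not survive in this text. The sketch below therefore assumes the standard FAO56 climatic adjustment for mid-season and late-season K$_{c}$, which uses the mean wind speed at 2 m, the mean daily minimum relative humidity and the mean plant height, followed by the single crop coefficient relation ET$_{c}$ = K$_{c}$ × ET$_{0}$. The minimum relative humidity and plant height inputs are assumptions (only the wind speed term is mentioned above), and all names and numeric inputs are hypothetical.

```python
def adjust_kc(kc_table, u2, rh_min, height_m):
    """FAO56-style climatic adjustment of a tabulated Kc value.

    kc_table : tabulated Kc for the stage (e.g. 1.2 for mid-season)
    u2       : mean wind speed at 2 m during the stage (m s^-1)
    rh_min   : mean daily minimum relative humidity during the stage (%)
    height_m : mean plant height during the stage (m)
    """
    correction = (0.04 * (u2 - 2.0) - 0.004 * (rh_min - 45.0)) * (height_m / 3.0) ** 0.3
    return kc_table + correction


def crop_evapotranspiration(kc, et0):
    """Single crop coefficient approach: ETc = Kc * ET0 (mm per day)."""
    return kc * et0


# Hypothetical stage conditions: windier and drier than the FAO56 standard climate.
kc_mid = adjust_kc(1.2, u2=3.0, rh_min=30.0, height_m=2.0)
etc = crop_evapotranspiration(kc_mid, et0=5.0)
print(f"Kc,mid = {kc_mid:.3f}, ETc = {etc:.2f} mm/day")
```

Under the standard climate (u2 = 2 m s$^{-1}$, minimum relative humidity 45%), the correction term vanishes and the tabulated value is returned unchanged.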

#### 2.3. Support Vector Regression

[…] ET$_{c}$. Details of the mathematical and statistical theory of the SVR model can be found in the Supplementary Material.
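For readers without access to the Supplementary Material, the basic ingredients of SVR can be sketched as follows. This is a generic illustration, not the authors' model: it assumes a radial basis function (RBF) kernel, which the surviving text does not confirm, and the function names, support vectors and coefficients are hypothetical. The three hyperparameters tuned later in the paper appear here as the kernel width γ (`gamma`), the tube width ε (`eps`) and, implicitly, the penalty C that bounds the dual coefficients during training.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel exp(-gamma * ||x - z||^2) between two feature vectors."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


def eps_insensitive_loss(y_true, y_pred, eps):
    """Errors smaller than eps cost nothing; larger errors grow linearly."""
    return max(0.0, abs(y_true - y_pred) - eps)


def svr_predict(x, support_vectors, dual_coefs, bias, gamma):
    """Prediction of a trained SVR: weighted kernel sum over support vectors plus bias."""
    return sum(a * rbf_kernel(sv, x, gamma)
               for sv, a in zip(support_vectors, dual_coefs)) + bias


# Hypothetical trained model with two support vectors in a 2-D input space.
svs = [[0.0, 0.0], [1.0, 1.0]]
coefs = [1.5, -0.5]
prediction = svr_predict([0.0, 0.0], svs, coefs, bias=2.0, gamma=0.5)
print(prediction, eps_insensitive_loss(3.0, prediction, eps=0.1))
```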

#### 2.4. Particle Swarm Optimization Algorithm

#### 2.5. Random Forest

[…] ET$_{c}$ of spring maize. The random forest (RF) method is a widely used tree-based machine learning algorithm for constructing classification and regression models [26]. It is an extended variant of Bagging [48] that employs decision trees as base learners and introduces random attribute selection into the training process. The RF model has been applied to the estimation of ET$_{0}$ in many studies [23,27,49]. In addition, the RF method is well suited to determining the importance of variables [28]. To ascertain the importance of the input variables, all meteorological variables and crop coefficient values from the training period were used as the input training data to construct each tree. During tree generation, the input training data were bootstrap-sampled with replacement, so that approximately 37% of the training data were excluded from each tree and treated as out-of-bag (OOB) observations [50]. The RF model determined the importance of each input variable by measuring the mean decrease in prediction accuracy when the values of that variable in the OOB dataset were randomly permuted [51]. In this study, the RF was constructed using the R package “rfPermute” [52].
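Two ideas in the paragraph above can be checked with a short Python sketch: the out-of-bag share of roughly 37% (the limit of $(1-1/n)^{n}$, i.e. $e^{-1}$) and permutation importance as the increase in error after shuffling one input. The sketch is model-agnostic and uses a toy predictor, so it is not the per-tree OOB procedure implemented in rfPermute; all function names and data are hypothetical.

```python
import random


def oob_fraction(n_samples, rng):
    """Share of samples never drawn in one bootstrap sample of size n_samples."""
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
    return 1.0 - len(drawn) / n_samples


def mse(predict, rows, targets):
    """Mean squared error of predict() over the given rows."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(predict, rows, targets, col, rng):
    """Increase in MSE after randomly permuting column col."""
    baseline = mse(predict, rows, targets)
    shuffled = [r[col] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
    return mse(predict, permuted, targets) - baseline


rng = random.Random(0)
mean_oob = sum(oob_fraction(2000, rng) for _ in range(20)) / 20

# Toy data: the target depends on column 0 only, column 1 is irrelevant.
rows = [[float(i), float(i % 7)] for i in range(50)]
targets = [2.0 * r[0] for r in rows]


def toy_model(row):
    """Stand-in for a fitted model; it ignores column 1 by construction."""
    return 2.0 * row[0]


imp_used = permutation_importance(toy_model, rows, targets, 0, rng)
imp_unused = permutation_importance(toy_model, rows, targets, 1, rng)
print(f"mean OOB share = {mean_oob:.3f}")
print(f"importance: used = {imp_used:.1f}, unused = {imp_unused:.1f}")
```

Permuting a variable the model relies on raises the error, whereas permuting an irrelevant variable leaves it unchanged, which is the logic behind the importance ranking.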

#### 2.6. Back-Propagation Neural Network

[…] ET$_{c}$ of spring maize, and its estimation accuracy was compared with that of the hybrid model.

#### 2.7. Hybrid Model Building

[…] ET$_{c}$ and computes the corresponding error.

[…] ET$_{c}$ estimation. R version 4.1.1 [56] was used to build and run the crop evapotranspiration estimation models, and the structure of the hybrid RF–SVR–PSO model is illustrated in Figure 1.
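The optimization step of such a hybrid model can be sketched with a basic global-best PSO. The sketch below is an assumption-laden illustration in Python, not the authors' R workflow: the inertia and acceleration settings, swarm size, search bounds and function names are hypothetical, and `toy_validation_error` is a stand-in for the error an SVR would produce for a given set of hyperparameters (C, γ, ε).

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective(params) inside box bounds with global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    best = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[best][:], pbest_val[best]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # keep in bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_validation_error(params):
    """Hypothetical stand-in for an SVR validation error with optimum (10, 0.5, 0.1)."""
    c, gamma, eps = params
    return (c - 10.0) ** 2 + (gamma - 0.5) ** 2 + (eps - 0.1) ** 2


bounds = [(0.1, 100.0), (0.001, 5.0), (0.001, 1.0)]  # assumed ranges for C, gamma, eps
best_params, best_error = pso_minimize(toy_validation_error, bounds)
print(best_params, best_error)
```

In a real workflow the objective would train an SVR on the RF-selected inputs and return its error on held-out data, so each objective call is far more expensive than in this toy example.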

#### 2.8. Evaluation Criteria of Model Performance

[…] coefficient of determination (R$^{2}$) and Nash–Sutcliffe efficiency coefficient (NSE) [57]. These statistical criteria were calculated as follows:
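The defining equations are missing from this text, so the sketch below uses the conventional definitions of the four criteria. Treating R$^{2}$ as the squared Pearson correlation between reference and estimated values is an assumption, chosen because the reported R$^{2}$ and NSE values differ. The function names and sample values are hypothetical.

```python
import math


def rmse(obs, est):
    """Root mean square error."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))


def mae(obs, est):
    """Mean absolute error."""
    return sum(abs(o - e) for o, e in zip(obs, est)) / len(obs)


def nse(obs, est):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - e) ** 2 for o, e in zip(obs, est))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / spread


def r_squared(obs, est):
    """Squared Pearson correlation between observed and estimated values."""
    mean_obs = sum(obs) / len(obs)
    mean_est = sum(est) / len(est)
    cov = sum((o - mean_obs) * (e - mean_est) for o, e in zip(obs, est))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_est = sum((e - mean_est) ** 2 for e in est)
    return cov ** 2 / (var_obs * var_est)


# Hypothetical reference (FAO-PM) and model-estimated ETc values in mm per day.
reference = [1.0, 2.0, 3.0, 4.0]
estimated = [1.1, 1.9, 3.2, 3.8]
scores = {
    "RMSE": rmse(reference, estimated),
    "MAE": mae(reference, estimated),
    "NSE": nse(reference, estimated),
    "R2": r_squared(reference, estimated),
}
print(scores)
```

NSE penalizes systematic bias while the squared correlation does not, which is why a model can show a higher R$^{2}$ than NSE.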

## 3. Results

#### 3.1. The Variables for Determining Crop Evapotranspiration

[…] ET$_{c}$, we fitted the model using all nine variables (eight meteorological variables and K$_{c}$) based on RF analysis. The importance ranking showed that K$_{c}$ had the greatest impact on ET$_{c}$, reaching 95.23%, followed by n, T$_{ave}$, RH, T$_{max}$, T$_{min}$, U, hP and precipitation (Figure 2). This result shows that, in addition to the crop coefficient, the most important factors affecting ET$_{c}$ are sunshine hours, temperature and relative humidity, which is similar to the findings of Pinos, Chacón and Feyen [28]. Therefore, we included K$_{c}$ in all machine learning models. We then added the three, four and five most important variables other than K$_{c}$ as input variables to the machine learning models.
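The resulting input combinations follow mechanically from the importance ranking, as the short Python sketch below shows. The variable labels and the function name are illustrative; the ranking order is the one reported above.

```python
# Importance ranking of the meteorological variables reported above (K_c excluded).
ranking = ["n", "T_ave", "RH", "T_max", "T_min", "U", "hP", "Precipitation"]


def input_combinations(ranked, always=("K_c",), sizes=(3, 4, 5)):
    """Build input sets: the always-included variables plus the top-k ranked ones."""
    return [list(always) + ranked[:k] for k in sizes]


combos = input_combinations(ranking)
for combo in combos:
    print(len(combo), combo)
```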

#### 3.2. Performance Assessment

[…] ET$_{c}$, with NSE, RMSE, MAE and R$^{2}$ ranges of 0.858–0.986, 0.206–0.508 mm d$^{-1}$, 0.152–0.426 mm d$^{-1}$ and 0.915–0.989, respectively.

[…] R$^{2}$ being slightly smaller than that of the BPNN model for four and five input variables, but significantly higher than that of the RF model. The RF model performed the poorest, with R$^{2}$, RMSE, MAE and NSE ranges of 0.915–0.939, 0.434–0.508 mm d$^{-1}$, 0.369–0.409 mm d$^{-1}$ and 0.858–0.897, respectively. The SVR model performed best when the input variables were K$_{c}$, T$_{ave}$, T$_{min}$, n and RH, with R$^{2}$, RMSE, MAE and NSE values of 0.957, 0.320 mm d$^{-1}$, 0.263 mm d$^{-1}$ and 0.944, respectively.

[…] ET$_{c}$ for all three combinations of input variables compared to the standalone SVR model, with R$^{2}$, RMSE, MAE and NSE values of 0.957–0.961, 0.275–0.282 mm d$^{-1}$, 0.221–0.231 mm d$^{-1}$ and 0.956–0.958, respectively. As the number of input variables increased, the model performance and the accuracy of the estimated ET$_{c}$ values improved only slightly.

[…] ET$_{c}$ values using the FAO–PM equation as the target values, the estimated ET$_{c}$ values of the hybrid RF–SVR–PSO model, SVR model, BPNN model and RF model under different combinations of input variables were compared in the form of scatter and hydrograph plots, as shown in Figures 3–8. The scatter plots show that the points of the hybrid RF–SVR–PSO model are concentrated more closely around the ideal 1:1 line than those of the three standalone models, and that the fit of the hybrid model improves slightly as the number of input variables increases. The trend lines of the three standalone models generally lie above or below the 1:1 line, indicating varying degrees of overestimation or underestimation of ET$_{c}$ relative to the FAO–PM calculations. Further, the accuracy of the standalone models decreased when the input variables were K$_{c}$, T$_{ave}$, T$_{max}$, T$_{min}$, n and RH. As shown in the hydrograph plots, the hybrid RF–SVR–PSO model outperformed the standalone models, both in capturing the peaks and in the individual values estimated overall.

## 4. Discussion

[…] ET$_{c}$ estimation accuracy of the standalone SVR model was higher than that of the standalone BPNN and RF models. The study demonstrated the superiority of the SVR model in handling the complex nonlinear relationship between ET$_{c}$ and meteorological variables, and its high accuracy and computational efficiency in estimating ET$_{c}$ [22,27,37].

[…] ET$_{0}$, which showed a 13% lower RMSE than the standalone ELM model. Jia et al. [60] used the sparrow search algorithm (SSA) to optimize the ELM model, resulting in a significant improvement in the model’s performance. Wen and Yuan [61] optimized a BPNN model for CO$_{2}$ emissions forecasting using the PSO algorithm, and the results indicated that the optimization was beneficial. Given the simplicity of the PSO algorithm and its good optimization results, this study used PSO to determine and optimize the hyperparameters (C, γ and ε) of the SVR model, resulting in a hybrid RF–SVR–PSO model for spring maize daily ET$_{c}$ estimation. Across the three combinations of meteorological input variables, the RMSE and MAE of the hybrid RF–SVR–PSO model decreased by 13.2% to 22.8% and 14.6% to 21.2%, respectively, and NSE improved by 1.5% to 3.1% compared to the standalone SVR model (testing period). Although all machine learning models may appear to perform well, the PSO algorithm led to substantial improvements in the accuracy of ET$_{c}$ estimates compared to the standalone machine learning models. Specifically, the PSO algorithm significantly reduced the overestimation and underestimation of ET$_{c}$ by the standalone machine learning models, which is critical for guiding maize production in practice. Spring maize is particularly sensitive to water stress. When the model underestimates ET$_{c}$, the recommended irrigation amount may be lower than the amount of water required for maize production, resulting in reduced maize yields and affecting food security. Conversely, if the model overestimates ET$_{c}$, the recommended irrigation amount may be higher than the amount required, resulting in wasted water and reduced water productivity.

[…] ET$_{c}$ [25]. The selection of suitable inputs for the ML models can effectively improve the accuracy of the results. This study used the RF method as a data pre-processing step to determine the importance of the meteorological input variables and to identify suitable inputs for the ML models. The RF method ranked the importance of the variables for estimating spring maize daily ET$_{c}$ in the study area. By taking the four, five and six most important variables as the inputs of the ML models, we determined the optimal model input. The accuracy of the hybrid RF–SVR–PSO model in estimating spring maize daily ET$_{c}$ improved only slightly as the number of input meteorological variables increased. Considering both computational efficiency and estimation accuracy, the input variables K$_{c}$, n, T$_{ave}$ and RH were therefore determined to be the best combination for the hybrid RF–SVR–PSO model, with R$^{2}$, RMSE, MAE and NSE values of 0.957, 0.282 mm d$^{-1}$, 0.231 mm d$^{-1}$ and 0.956, respectively.

[…] ET$_{c}$ were compared with those of other approaches and are presented in Table 4 for the testing period. Jia et al. [60] proposed a hybrid SSA–ELM model using T$_{max}$, T$_{min}$, n, maize leaf area index (GLAI) and plant height (h) as the input variables to estimate spring maize ET$_{c}$, and obtained RMSE, MAE and R$^{2}$ values of 0.433 mm d$^{-1}$, 0.342 mm d$^{-1}$ and 0.895, respectively. Using K$_{c}$, T$_{max}$, T$_{min}$, RH and U as the input variables, Yamaç [20] reported RMSE, MAE and R$^{2}$ values of 0.954 mm d$^{-1}$, 0.688 mm d$^{-1}$ and 0.856 for the adaptive boosting (AB) model and 0.699 mm d$^{-1}$, 0.557 mm d$^{-1}$ and 0.923 for the SVM model. Thus, the hybrid RF–SVR–PSO model achieves high accuracy for spring maize ET$_{c}$ estimation. Furthermore, the hybrid model can be used in semi-arid regions as an alternative to the widely used FAO56-recommended approach, the PM equation, to obtain satisfactory ET$_{c}$ estimates. However, as a machine learning model, the hybrid RF–SVR–PSO model operates as a black box, and its parameters must be re-determined for use in different locations with varying meteorological conditions. Additionally, in areas where eddy covariance or lysimeter data are obtainable, using such measurements as the reference could avoid the “double bias” caused by using the FAO–PM approach as a reference.

[…] ET$_{c}$ estimation and irrigation management in semi-arid regions. In this study, the training and testing datasets were divided only by a simple hold-out method, which may reduce the generalization ability of the model. Therefore, in future work, the hybrid RF–SVR–PSO model can be combined with K-fold cross-validation to improve its estimation accuracy, generalization and robustness.
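The K-fold procedure mentioned above can be sketched as follows. This is a generic Python illustration of how the folds would be formed, with hypothetical function and parameter names; it does not reflect any split actually used in this study.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


splits = list(k_fold_indices(n_samples=10, k=5))
for train, test in splits:
    print(sorted(test), len(train))
```

Each sample is used for testing exactly once, so the averaged error is less dependent on one particular hold-out split. For daily series, grouping folds by season or year instead of shuffling individual days may be preferable, since neighbouring days are strongly correlated.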

## 5. Conclusions

[…] R$^{2}$, RMSE, MAE and NSE. The results demonstrated that, using the same input variables, the RF–SVR–PSO model estimated spring maize daily ET$_{c}$ more accurately than the standalone models. The RF–SVR–PSO model with K$_{c}$, T$_{ave}$, n and RH as the input variables can be used to estimate spring maize daily ET$_{c}$ and to provide an accurate basis for agricultural water resource management and decision making. This conclusion can promote the development of water-saving agriculture and the efficient use of agricultural water in arid and semi-arid areas of Northeast China, and it may also be valuable for regions with different climatic conditions. In future studies, the model will be applied in regions with different climatic conditions to improve its effectiveness at different stations.

## References

1. Kang, S. Towards water and food security in China. Chin. J. Eco-Agric. **2014**, 22, 880–885.
2. Godfray, H.C.J.; Beddington, J.R.; Crute, I.R.; Haddad, L.; Lawrence, D.; Muir, J.F.; Pretty, J.; Robinson, S.; Thomas, S.M.; Toulmin, C. Food Security: The Challenge of Feeding 9 Billion People. Science **2010**, 327, 812–818.
3. Wang, D.; Li, G.; Mo, Y.; Zhang, D.; Xu, X.; Wilkerson, C.J.; Hoogenboom, G. Evaluation of subsurface, mulched and non-mulched surface drip irrigation for maize production and economic benefits in northeast China. Irrig. Sci. **2021**, 39, 159–171.
4. Zou, H.; Fan, J.; Zhang, F.; Xiang, Y.; Wu, L.; Yan, S. Optimization of drip irrigation and fertilization regimes for high grain yield, crop water productivity and economic benefits of spring maize in Northwest China. Agric. Water Manag. **2020**, 230, 105986.
5. Tuo, Y.; Wang, Q.; Zhang, L.; Shen, F.; Wang, F.; Zheng, Y.; Wang, Z. Establishment of a crop evapotranspiration calculation model and its validation. J. Agron. Crop. Sci. **2022**, 209, 251–260.
6. Saggi, M.K.; Jain, S. Application of fuzzy-genetic and regularization random forest (FG-RRF): Estimation of crop evapotranspiration (ET) for maize and wheat crops. Agric. Water Manag. **2020**, 229, 105907.
7. Liu, X.; Fu, B. Drought impacts on crop yield: Progress, challenges and prospect. Acta Geogr. Sin. **2021**, 76, 2632–2646.
8. FAOSTAT. Food and Agricultural Organization of the United Nations: Major Food and Agricultural Commodities and Producers. 2020. Available online: http://www.fao.org/faostat/en/#data/QC/visualize (accessed on 18 August 2022).
9. Hou, Y.; Kong, L.; Cai, H.; Liu, H.; Gao, Y.; Wang, Y.; Wang, L. The Accumulation and Distribution Characteristics on Dry Matter and Nutrients of High-Yielding Maize Under Drip Irrigation and Fertilization Conditions in Semi-Arid Region of Northeastern China. Sci. Agric. Sin. **2019**, 52, 3559–3572.
10. Yang, X.; Ming, B.; Tao, H.; Wang, P. Spatial distribution characteristics and impact on spring maize yield of drought in Northeast China. Chin. J. Eco-Agric. **2015**, 23, 758–767.
11. Allen, R.G.; Pereira, L.S.; Raes, D.; Smith, M. Crop Evapotranspiration: Guidelines for Computing Crop Water Requirements; FAO—Food and Agriculture Organization of the United Nations: Rome, Italy, 1998; Available online: https://www.fao.org/3/X0490E/x0490e00.htm (accessed on 18 August 2022).
12. Pereira, L.S.; Allen, R.G.; Smith, M.; Raes, D. Crop evapotranspiration estimation with FAO56: Past and future. Agric. Water Manag. **2015**, 147, 4–20.
13. Kumar, R.; Jat, M.K.; Shankar, V. Methods to estimate irrigated reference crop evapotranspiration—A review. Water Sci. Technol. **2012**, 66, 525–535.
14. Najafi, P.; Tabatabaei, S.H. Comparison of different Hargreaves-Samani methods for estimating potential evapotranspiration in arid and semi-arid regions of Iran. Res. Crops **2009**, 10, 441–447.
15. Ai, Z.; Yang, Y. Modification and Validation of Priestley-Taylor Model for Estimating Cotton Evapotranspiration under Plastic Mulch Condition. J. Hydrometeorol. **2016**, 17, 1281–1293.
16. Al-Ghobari, H.M. Estimation of reference evapotranspiration for southern region of Saudi Arabia. Irrig. Sci. **2000**, 19, 81–86.
17. Xu, Z.; Yi, L.I.; Liu, J. Application of stochastic model to simulation of reference crop evapotranspiration in grassland of arid region. J. Hydraul. Eng. **2008**, 39, 1267–1272, 1278.
18. Wang, W.; Peng, S.Z.; Luo, Y.F. Chaotic behavior analysis and prediction of reference crop evapotranspiration. J. Hydraul. Eng. **2008**, 39, 1030–1036.
19. Pinos, J. Estimation methods to define reference evapotranspiration: A comparative perspective. Water Pract. Technol. **2022**, 17, 940–948.
20. Yamaç, S.S. Artificial intelligence methods reliably predict crop evapotranspiration with different combinations of meteorological data for sugar beet in a semiarid area. Agric. Water Manag. **2021**, 254, 106968.
21. Han, X.; Wei, Z.; Zhang, B.; Li, Y.; Du, T.; Chen, H. Crop evapotranspiration prediction by considering dynamic change of crop coefficient and the precipitation effect in back-propagation neural network model. J. Hydrol. **2021**, 596, 126104.
22. Yin, Z.; Wen, X.; Feng, Q.; He, Z.; Zou, S.; Yang, L. Integrating genetic algorithm and support vector machine for modeling daily reference evapotranspiration in a semi-arid mountain area. Hydrol. Res. **2017**, 48, 1177–1191.
23. Fan, J.; Yue, W.; Wu, L.; Zhang, F.; Cai, H.; Wang, X.; Lu, X.; Xiang, Y. Evaluation of SVM, ELM and four tree-based ensemble models for predicting daily reference evapotranspiration using limited meteorological data in different climates of China. Agric. For. Meteorol. **2018**, 263, 225–241.
24. Xing, X.; Ma, X.; Yu, M.; Liu, Y. Estimating models for reference evapotranspiration with core meteorological parameters via path analysis. Hydrol. Res. **2017**, 48, 340–354.
25. Zhao, L.; Zhao, X.; Zhou, H.; Wang, X.; Xing, X. Prediction model for daily reference crop evapotranspiration based on hybrid algorithm and principal components analysis in Southwest China. Comput. Electron. Agric. **2021**, 190, 106424.
26. Breiman, L. Random forests. Mach. Learn. **2001**, 45, 5–32.
27. Mohammadi, B.; Mehdizadeh, S. Modeling daily reference evapotranspiration via a novel approach based on support vector regression coupled with whale optimization algorithm. Agric. Water Manag. **2020**, 237, 5–32.
28. Pinos, J.; Chacón, G.; Feyen, J. Comparative analysis of reference evapotranspiration models with application to the wet Andean páramo ecosystem in southern Ecuador. Meteorologica **2020**, 45, 25–45.
29. Petković, D.; Gocic, M.; Shamshirband, S.; Qasem, S.N.; Trajkovic, S. Particle swarm optimization-based radial basis function network for estimation of reference evapotranspiration. Theor. Appl. Climatol. **2016**, 125, 555–563.
30. Wu, Z.; Cui, N.; Hu, X.; Gong, D.; Wang, Y.; Feng, Y.; Jiang, S.; Lv, M.; Han, L.; Xing, L.; et al. Optimization of extreme learning machine model with biological heuristic algorithms to estimate daily reference crop evapotranspiration in different climatic regions of China. J. Hydrol. **2021**, 603, 127028.
31. Zhang, Y.; Cui, N.; Feng, Y.; Gong, D.; Hu, X. Comparison of BP, PSO-BP and statistical models for predicting daily global solar radiation in arid Northwest China. Comput. Electron. Agric. **2019**, 164, 104905.
32. Singh, A.; Sharma, A.; Rajput, S.; Bose, A.; Hu, X. An Investigation on Hybrid Particle Swarm Optimization Algorithms for Parameter Optimization of PV Cells. Electronics **2022**, 11, 909.
33. Xu, Y.; Hu, C.; Wu, Q.; Jian, S.; Li, Z.; Chen, Y.; Zhang, G.; Zhang, Z.; Wang, S. Research on particle swarm optimization in LSTM neural networks for rainfall-runoff simulation. J. Hydrol. **2022**, 608, 127553.
34. Li, W.; Zhang, L.; Chen, X.; Wu, C.; Cui, Z.; Niu, C. Predicting the evolution of sheet metal surface scratching by the technique of artificial intelligence. Int. J. Adv. Manuf. Technol. **2020**, 112, 853–865.
35. Baydaroğlu, Ö.; Koçak, K. SVR-based prediction of evaporation combined with chaotic approach. J. Hydrol. **2014**, 508, 356–363.
36. Wang, Z.; Yin, G.; Gu, J.; Wang, S.; Ma, N.; Zhou, X.; Liu, Y.; Zhao, W. Effects of Water, Nitrogen and Potassium Interaction on Water Use Efficiency of Spring Maize Under Shallow-buried Drip Irrigation. J. Soil Water Conserv. **2022**, 36, 316–324.
37. Chen, S.; He, C.; Huang, Z.; Xu, X.; Jiang, T.; He, Z.; Liu, J.; Su, B.; Feng, H.; Yu, Q.; et al. Using support vector machine to deal with the missing of solar radiation data in daily reference evapotranspiration estimation in China. Agric. For. Meteorol. **2022**, 316, 108864.
38. Vapnik, V.N. The Nature of Statistical Learning Theory; Vapnik, V.N., Ed.; Springer: Berlin/Heidelberg, Germany, 1995.
39. Drucker, H.; Burges, C.J.C.; Kaufman, L.; Smola, A.; Vapnik, V. Support vector regression machines. In Proceedings of the 10th Annual Conference on Neural Information Processing Systems (NIPS), Denver, CO, USA, 1996; pp. 155–161.
40. Shrestha, N.K.; Shukla, S. Support vector machine based modeling of evapotranspiration using hydro-climatic variables in a sub-tropical environment. Agric. For. Meteorol. **2015**, 200, 172–184.
41. Zuo, R.; Carranza, E.J.M. Support vector machine: A tool for mapping mineral prospectivity. Comput. Geosci. **2011**, 37, 1967–1975.
42. Abdollahi, S.; Pourghasemi, H.R.; Ghanbarian, G.A.; Safaeian, R. Prioritization of effective factors in the occurrence of land subsidence and its susceptibility mapping using an SVM model and their different kernel functions. Bull. Eng. Geol. Environ. **2019**, 78, 4017–4034.
43. Pal, M.; Maxwell, A.E.; Warner, T.A. Kernel-based extreme learning machine for remote-sensing image classification. Remote Sens. Lett. **2013**, 4, 853–862.
44. Meyer, D.; Dimitriadou, E.; Hornik, K.; Weingessel, A.; Leisch, F. E1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. 2022. Available online: https://CRAN.R-project.org/package=e1071 (accessed on 24 August 2022).
45. Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the 1995 IEEE International Conference on Neural Networks (ICNN 95), Perth, WA, Australia, 27 November–1 December 1995; pp. 1942–1948.
46. Gu, W.; Chai, B.; Teng, Y. Research on Support Vector Machine Based on Particle Swarm Optimization. Trans. Beijing Inst. Technol. **2014**, 34, 705–709.
47. Bendtsen, C. pso: Particle Swarm Optimization. 2022. Available online: https://CRAN.R-project.org/package=pso (accessed on 24 August 2022).
48. Breiman, L. Bagging predictors. Mach. Learn. **1996**, 24, 123–140.
49. Karimi, S.; Shiri, J.; Marti, P. Supplanting missing climatic inputs in classical and random forest models for estimating reference evapotranspiration in humid coastal areas of Iran. Comput. Electron. Agric. **2020**, 176, 105633.
50. Heung, B.; Bulmer, C.E.; Schmidt, M.G. Predictive soil parent material mapping at a regional-scale: A Random Forest approach. Geoderma **2014**, 214–215, 141–154.
51. Wang, Z.; Zhang, X.; Chhin, S.; Zhang, J.; Duan, A. Disentangling the effects of stand and climatic variables on forest productivity of Chinese fir plantations in subtropical China using a random forest algorithm. Agric. For. Meteorol. **2021**, 304–305, 108412.
52. Archer, E. rfPermute: Estimate Permutation p-Values for Random Forest Importance Metrics. 2022. Available online: https://CRAN.R-project.org/package=rfPermute (accessed on 24 August 2022).
53. Zhang, D.; Lin, J.; Peng, Q.; Wang, D.; Yang, T.; Sorooshian, S.; Liu, X.; Zhuang, J. Modeling and simulating of reservoir operation using the artificial neural network, support vector regression, deep learning algorithm. J. Hydrol. **2018**, 565, 720–736.
54. Hornik, K.; Stinchcombe, M.; White, H. Multilayer feedforward networks are universal approximators. Neural Netw. **1989**, 2, 359–366.
55. Fritsch, S.; Guenther, F.; Wright, M.N. neuralnet: Training of Neural Networks. 2019. Available online: https://github.com/bips-hb/neuralnet (accessed on 24 August 2022).
56. R Core Team. R: A Language and Environment for Statistical Computing. 2021. Available online: https://www.R-project.org/ (accessed on 20 August 2022).
57. Nash, J.E.; Sutcliffe, J.V. River flow forecasting through conceptual models part I—A discussion of principles. J. Hydrol. **1970**, 10, 282–290.
58. Ji, R.; Ban, X.; Zhang, S. Ascertainment of Crop Coefficients of Maize in Liaoning Area. Chin. Agric. Sci. Bull. **2004**, 20, 246–248+268.
59. Zhu, B.; Feng, Y.; Gong, D.; Jiang, S.; Zhao, L.; Cui, N. Hybrid particle swarm optimization with extreme learning machine for daily reference evapotranspiration prediction from limited climatic data. Comput. Electron. Agric.
**2020**, 173, 105430. [Google Scholar] [CrossRef] - Jia, Y.; Su, Y.; Zhang, R.; Zhang, Z.; Lu, Y.; Shi, D.; Xu, C.; Huang, D. Optimization of an extreme learning machine model with the sparrow search algorithm to estimate spring maize evapotranspiration with film mulching in the semiarid regions of China. Comput. Electron. Agric.
**2022**, 201, 107298. [Google Scholar] [CrossRef] - Wen, L.; Yuan, X. Forecasting CO2 emissions in Chinas commercial department, through BP neural network based on random forest and PSO. Sci. Total Environ.
**2020**, 718, 137194. [Google Scholar] [CrossRef]

**Figure 2.** Plot of variable importance ranking, where variable importance is expressed as the percentage increase in mean squared error (%IncMSE). Each value represents the increase in prediction error of the same model after a variable is omitted.
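As a rough illustration of the idea behind Figure 2, the sketch below scores each input by how much the mean squared error grows when that input's information is destroyed. It shuffles a column rather than dropping it, which is one common way to approximate omitting a variable; the result is a plain relative increase, not the exact scaling used by any particular random forest package. The data, the `toy_model` function and the helper names are all hypothetical and are not taken from the study.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def percent_increase_mse(predict, rows, targets, col, seed=0):
    """Relative MSE increase (%) after shuffling the values in column `col`."""
    rng = random.Random(seed)
    baseline = mse(targets, [predict(r) for r in rows])
    shuffled = [r[col] for r in rows]
    rng.shuffle(shuffled)
    permuted_rows = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
    permuted = mse(targets, [predict(r) for r in permuted_rows])
    return 100.0 * (permuted - baseline) / baseline


# Synthetic data (hypothetical): a crop-coefficient-like column, a
# temperature-like column and an irrelevant third column.
data_rng = random.Random(42)
rows = [
    [data_rng.uniform(0.3, 1.2), data_rng.uniform(10.0, 30.0), data_rng.uniform(0.0, 1.0)]
    for _ in range(50)
]
# Targets follow the toy relationship plus a small deterministic residual,
# so the baseline error is not zero.
targets = [5.0 * r[0] + 0.1 * r[1] + 0.05 * ((i % 3) - 1) for i, r in enumerate(rows)]


def toy_model(row):
    """Stand-in for a fitted model; it ignores the third column."""
    return 5.0 * row[0] + 0.1 * row[1]


importance = [percent_increase_mse(toy_model, rows, targets, c) for c in range(3)]
```

Columns the model relies on receive a positive score, while the ignored third column scores exactly zero because shuffling it leaves every prediction unchanged.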

**Figure 3.** Scatter plots of calculated daily spring maize ET_{c} values by FAO–PM method compared with ML model-estimated values with K_{c}, T_{ave}, n and RH as the input: RF–SVR–PSO model (**a**), SVR model (**b**), BPNN model (**c**) and RF model (**d**).

**Figure 4.** Scatter plots of calculated daily spring maize ET_{c} values by FAO–PM method compared with ML model-estimated values with K_{c}, T_{ave}, T_{max}, n and RH as the input: RF–SVR–PSO model (**a**), SVR model (**b**), BPNN model (**c**) and RF model (**d**).

**Figure 5.** Scatter plots of calculated daily spring maize ET_{c} values by FAO–PM method compared with ML model-estimated values with K_{c}, T_{ave}, T_{max}, T_{min}, n and RH as the input: RF–SVR–PSO model (**a**), SVR model (**b**), BPNN model (**c**) and RF model (**d**).

**Figure 6.** Comparison of simulated and calculated values of spring maize ET_{c} in 2019 of different models with K_{c}, T_{ave}, n and RH as the input: RF–SVR–PSO model (**a**), SVR model (**b**), BPNN model (**c**) and RF model (**d**). The areas labeled I, II, III and IV indicate the initial stage, crop development stage, mid-season stage and late-season stage, respectively.

**Figure 7.** Comparison of simulated and calculated values of spring maize ET_{c} in 2019 of different models with K_{c}, T_{ave}, T_{max}, n and RH as the input: RF–SVR–PSO model (**a**), SVR model (**b**), BPNN model (**c**) and RF model (**d**). The areas labeled I, II, III and IV indicate the initial stage, crop development stage, mid-season stage and late-season stage, respectively.

**Figure 8.** Comparison of simulated and calculated values of spring maize ET_{c} in 2019 of different models with K_{c}, T_{ave}, T_{max}, T_{min}, n and RH as the input: RF–SVR–PSO model (**a**), SVR model (**b**), BPNN model (**c**) and RF model (**d**). The areas labeled I, II, III and IV indicate the initial stage, crop development stage, mid-season stage and late-season stage, respectively.

**Table 1.** Daily meteorological data for the experimental site during the spring maize growth periods from 2017 to 2019.

Years | Variables | Max | Min | Average | SD |
---|---|---|---|---|---|
2017 | T_{ave}, °C | 30.4 | 9.6 | 22.2 | 4.4 |
2017 | T_{max}, °C | 40.0 | 14.9 | 28.6 | 4.4 |
2017 | T_{min}, °C | 26.2 | 0.0 | 16.0 | 5.9 |
2017 | n, h d^{−1} | 13.6 | 0.0 | 8.2 | 3.8 |
2017 | RH, % | 92.0 | 18.0 | 59.2 | 17.0 |
2017 | U, m s^{−1} | 6.6 | 1.1 | 2.9 | 1.2 |
2017 | hP, hPa | 1001.8 | 979.0 | 989.0 | 4.4 |
2017 | Precipitation, mm d^{−1} | 62.8 | 9.0 | 2.0 | 7.1 |
2018 | T_{ave}, °C | 31.2 | 9.7 | 21.7 | 4.7 |
2018 | T_{max}, °C | 38.1 | 15.6 | 27.7 | 4.4 |
2018 | T_{min}, °C | 27.6 | 1.7 | 16.2 | 6.1 |
2018 | n, h d^{−1} | 13.3 | 0.1 | 7.6 | 3.8 |
2018 | RH, % | 94.0 | 18.0 | 62.6 | 17.0 |
2018 | U, m s^{−1} | 6.4 | 0.9 | 3.2 | 1.2 |
2018 | hP, hPa | 1004.4 | 977.3 | 989.6 | 5.5 |
2018 | Precipitation, mm d^{−1} | 48.3 | 0.0 | 2.0 | 6.3 |
2019 | T_{ave}, °C | 29.8 | 13.4 | 22.0 | 3.7 |
2019 | T_{max}, °C | 38.4 | 19.3 | 28.0 | 3.7 |
2019 | T_{min}, °C | 26.1 | 3.1 | 16.6 | 4.9 |
2019 | n, h d^{−1} | 14.0 | 0.0 | 6.9 | 4.3 |
2019 | RH, % | 96.0 | 26.0 | 69.0 | 16.6 |
2019 | U, m s^{−1} | 6.4 | 1.2 | 2.8 | 1.2 |
2019 | hP, hPa | 1004.2 | 975.8 | 987.6 | 5.9 |
2019 | Precipitation, mm d^{−1} | 78.3 | 0.0 | 4.7 | 12.4 |

Crop Growth Stages | Training Period: 2017 | Training Period: 2018 | Testing Period: 2019 |
---|---|---|---|
Initial | 1 May–31 May (31 d) | 28 April–26 May (29 d) | 14 May–10 June (28 d) |
Crop development | 1 June–20 July (50 d) | 27 May–15 July (50 d) | 11 June–28 July (48 d) |
Mid-season | 21 July–1 September (43 d) | 16 July–30 August (46 d) | 29 July–7 September (41 d) |
Late-season | 2 September–27 September (26 d) | 31 August–4 October (35 d) | 8 September–28 September (21 d) |
Total days (d) | 150 | 160 | 138 |
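The stage lengths in the table above can be cross-checked with simple date arithmetic. The following is a minimal sketch that takes the start and end dates exactly as listed and counts both endpoints as part of each stage; the variable and function names are illustrative only.

```python
from datetime import date

# Start and end dates of the four growth stages, copied from the table above.
stages = {
    2017: [(date(2017, 5, 1), date(2017, 5, 31)),
           (date(2017, 6, 1), date(2017, 7, 20)),
           (date(2017, 7, 21), date(2017, 9, 1)),
           (date(2017, 9, 2), date(2017, 9, 27))],
    2018: [(date(2018, 4, 28), date(2018, 5, 26)),
           (date(2018, 5, 27), date(2018, 7, 15)),
           (date(2018, 7, 16), date(2018, 8, 30)),
           (date(2018, 8, 31), date(2018, 10, 4))],
    2019: [(date(2019, 5, 14), date(2019, 6, 10)),
           (date(2019, 6, 11), date(2019, 7, 28)),
           (date(2019, 7, 29), date(2019, 9, 7)),
           (date(2019, 9, 8), date(2019, 9, 28))],
}


def stage_days(start, end):
    """Number of days in a stage, counting both the first and the last day."""
    return (end - start).days + 1


# Per-stage lengths and season totals for each year.
lengths = {year: [stage_days(s, e) for s, e in spans] for year, spans in stages.items()}
totals = {year: sum(days) for year, days in lengths.items()}
```

The computed totals agree with the last row of the table (150, 160 and 138 days).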

**Table 3.** Statistical performance of the hybrid RF–SVR–PSO model and the standalone SVR, BPNN and RF models with three different input variable combinations for the training and testing periods.

Input/Model | Training R^{2} | Training RMSE (mm d^{−1}) | Training MAE (mm d^{−1}) | Training NSE | Testing R^{2} | Testing RMSE (mm d^{−1}) | Testing MAE (mm d^{−1}) | Testing NSE |
---|---|---|---|---|---|---|---|---|
K_{c}, T_{ave}, n, RH | | | | | | | | |
RF–SVR–PSO | 0.970 | 0.396 | 0.329 | 0.949 | 0.957 | 0.282 | 0.231 | 0.956 |
SVR | 0.979 | 0.252 | 0.194 | 0.979 | 0.948 | 0.365 | 0.289 | 0.927 |
BPNN | 0.976 | 0.271 | 0.211 | 0.976 | 0.943 | 0.418 | 0.333 | 0.904 |
RF | 0.989 | 0.234 | 0.169 | 0.983 | 0.939 | 0.434 | 0.369 | 0.897 |
K_{c}, T_{ave}, T_{max}, n, RH | | | | | | | | |
RF–SVR–PSO | 0.970 | 0.366 | 0.297 | 0.956 | 0.959 | 0.278 | 0.225 | 0.958 |
SVR | 0.985 | 0.216 | 0.165 | 0.985 | 0.957 | 0.320 | 0.263 | 0.944 |
BPNN | 0.984 | 0.221 | 0.169 | 0.984 | 0.960 | 0.341 | 0.277 | 0.936 |
RF | 0.988 | 0.238 | 0.177 | 0.982 | 0.915 | 0.508 | 0.426 | 0.858 |
K_{c}, T_{ave}, T_{max}, T_{min}, n, RH | | | | | | | | |
RF–SVR–PSO | 0.965 | 0.388 | 0.319 | 0.951 | 0.961 | 0.275 | 0.221 | 0.958 |
SVR | 0.986 | 0.210 | 0.162 | 0.986 | 0.948 | 0.340 | 0.281 | 0.936 |
BPNN | 0.986 | 0.209 | 0.158 | 0.986 | 0.955 | 0.341 | 0.279 | 0.936 |
RF | 0.988 | 0.206 | 0.152 | 0.982 | 0.918 | 0.486 | 0.409 | 0.870 |
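The four indicators reported in Table 3 can be computed from paired reference and estimated ET_{c} series. The sketch below uses the standard definitions of RMSE, MAE and NSE, and assumes that R^{2} is the squared Pearson correlation coefficient, which would explain why R^{2} and NSE can differ for the same model. The function name and the four-point sample series are hypothetical and are not data from the study.

```python
import math


def evaluate(observed, simulated):
    """Return R2, RMSE, MAE and NSE for paired reference and estimated values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_sim = sum(simulated) / n

    # Error-based indicators.
    sq_err = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    rmse = math.sqrt(sq_err / n)
    mae = sum(abs(o - s) for o, s in zip(observed, simulated)) / n

    # Nash-Sutcliffe efficiency: 1 minus error variance over observed variance.
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    nse = 1.0 - sq_err / ss_obs

    # R2 taken here as the squared Pearson correlation (an assumption).
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(observed, simulated))
    ss_sim = sum((s - mean_sim) ** 2 for s in simulated)
    r2 = cov ** 2 / (ss_obs * ss_sim)

    return {"R2": r2, "RMSE": rmse, "MAE": mae, "NSE": nse}


# Hypothetical daily values in mm per day, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 1.9, 3.2, 3.8]
scores = evaluate(observed, simulated)
```

Under this definition R^{2} is insensitive to systematic bias, whereas NSE penalizes it, so a model can show a high R^{2} alongside a lower NSE.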

Model | Input | RMSE (mm d^{−1}) | MAE (mm d^{−1}) | R^{2} | Reference |
---|---|---|---|---|---|
RF–SVR–PSO | K_{c}, T_{ave}, n, RH | 0.282 | 0.231 | 0.957 | This study |
SSA–ELM | T_{max}, T_{min}, n, GLAI, h | 0.433 | 0.342 | 0.895 | Jia et al. [60] |
AB | K_{c}, T_{max}, T_{min}, RH, U | 0.954 | 0.688 | 0.856 | Yamaç [20] |
SVM | K_{c}, T_{max}, T_{min}, RH, U | 0.699 | 0.557 | 0.923 | Yamaç [20] |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Hou, W.; Yin, G.; Gu, J.; Ma, N.
Estimation of Spring Maize Evapotranspiration in Semi-Arid Regions of Northeast China Using Machine Learning: An Improved SVR Model Based on PSO and RF Algorithms. *Water* **2023**, *15*, 1503.
https://doi.org/10.3390/w15081503
