Machine Learning for Prediction of Energy in Wheat Production

Global population growth has led to a considerable rise in demand for wheat. Energy consumption in agriculture has also increased because of the need to supply sufficient food for the growing population. Thus, agricultural policymakers in most countries rely on prediction models to shape food security policies. This research aims to predict and reduce the amount of energy consumed in wheat production. Data were collected from the farms of Estahban city in Fars province of Iran by the Jihad Agricultural Department's experts over 20 years, from 1994 to 2013. In this study, a novel prediction method based on the energy consumed in the production period is proposed. The model is developed based on artificial intelligence to forecast the output energy in wheat production and uses the extreme learning machine (ELM) and support vector regression (SVR). In the experimental stage, the values of the evaluation metrics for the EVM and ELM were reported to be 0.000000409 and 0.9531, respectively. Total input energy (consumed) is found to be 1,460,503.1 megajoules (MJ), and output energy (produced wheat) is 1,401,011.945 MJ for Estahban. The results indicate the superiority of the ELM model for supporting the decisions of agricultural policymakers.


Introduction
The adverse effects of population growth on food resources have been examined in several studies [1]. Food security has become, more than ever, a national security matter for countries worldwide [2]. The Food and Agriculture Organization (FAO) also emphasizes food security as a measure to ensure that all people have access to sufficient, safe, and nutritious food that satisfies their dietary needs and food preferences for an active and healthy life. With regard to this definition, agricultural policymakers must ensure that there is enough food for the communities' dietary needs [3]. Therefore, they should pay more attention to foodstuff forecasting methods [4]. In 2013, the total global harvested area of wheat was 218.4 million hectares, the average yield was 3264 kg per hectare, and total wheat production was 713 million tons. Wheat is an essential crop in Iran, with more than 50% of all arable land allocated to it [5]. Thus, in-depth insight into production and energy usage is of utmost importance for food security and energy planning.
Global wheat crop conditions, reported as mostly favorable by the Agricultural Market Information System (AMIS) [6], show that the European Union ranks first in wheat production, followed by China and India. Table 1 presents global wheat production and its projection in million tons. It shows that the EU ranks first, followed by China, India, and the USA, respectively [7]. In Iran, the total area under wheat cultivation in the 2014 crop year was reported to be 6.4 million hectares, and the country's total wheat harvest was 12 million tons, of which eight million tons were irrigated wheat. The average yield of irrigated wheat is 3.5 tons per hectare [8]. In 2015, the total cultivated area for wheat was estimated to be 5.7 million hectares, and the country produced about 11.5 million tons of wheat. Fars is the second-highest wheat-producing province, with 10.19% of Iran's total wheat production. Irrigated wheat yielded an average of 3993.2 kg per hectare in Fars from 2015 to 2016. Agricultural policymakers in Iran believe that the exact amount of wheat production in the country must be assessed. Importing excess wheat leads to a price reduction in the country, which can decrease farmers' profits or even cause them to incur losses. On the other hand, insufficient wheat imports can lead to unmet demand and an increase in the wheat price, so that people cannot afford their annual wheat costs. Engineers in various fields are interested in analyzing current and past data for future predictions using a variety of techniques, such as statistics, modeling, time series, and machine learning.
Agriculture 2020, 10, 517
With each publication, a variety of crop status maps are produced and distributed. A map showing crop conditions in the major wheat-growing areas was created at the Commission's request. A visual summary of global AMIS crop conditions for the study is available alongside the wheat production chart (Figure 1). The standalone crop report and the website provide further crop-specific and seasonal maps. Figure 1 shows the wheat-growing locations in Iran, adapted from [9]. It shows that Khorasan Province in the northeast and Fars Province in the south of Iran are significant locations for wheat production.
Table 2 shows the overall production of wheat and other main crops, such as barley and rice, in Iran. Iran's wheat imports have been decreasing dramatically since 2014 [10,11]. The main goal of this study was to predict the total amount of energy required for wheat production. For this purpose, the city of Estahban in Fars Province of Iran was selected as the case study. The contribution of this research is that, for the first time, the ELM and SVR methods are used to forecast the energy output of wheat production in the Iranian city of Estahban in order to determine the method with the lowest forecast error. A further contribution is the selection of the seven input variables used for analyzing the methods.
In Section 2, a literature review is presented. In Section 3, the materials and methods are discussed. Results and discussion are presented in Section 4. Finally, Section 5 is the conclusion.

Literature Review
In recent years, many researchers have analyzed the energy consumption of agricultural production. In the past 15 years, neural networks have attracted considerable attention. Artificial neural network (ANN) models are based on the biological activity of neurons. Neural networks have become a critical modeling technique that is used more than other methods for complex input-output problems. ANNs are good for some tasks but not for others. They can learn from examples and solve nonlinear problems, as can support vector regression (SVR) [12]. In the late 1990s, the support vector machine (SVM) method came to be considered better than ANN because it gave better solutions to many problems. SVR theory is developed from computational theory, whereas the development of ANNs is more heuristic. While ANNs minimize empirical risk (training error), SVMs minimize structural risk. In SVR, the objective function is convex, so the global optimum is always reached [13]. SVM is used for discrete data, and SVR is used for continuous data [14]. Hosseinzadeh-Bandbafha et al. [4] studied energy consumption and efficiency in dairy farms in Qazvin, Iran; they estimated the greenhouse gas emissions due to energy consumption in these farms and attempted to optimize the farms' energy use to reduce the emission rate and the total emissions produced. Memon et al. [15] studied the energy consumption pattern of wheat production in Pakistan and concluded that wheat cultivation achieved the highest net energy.
In a study by Zangeneh et al. [16] on the energy consumption of potato production units in Hamadan, Iran, energy consumption was determined through in-person interviews with 100 farmers. The farmers were divided into two groups: (1) 68 farmers with a high level of farming machinery, and (2) 32 farmers with a low level of farming technology and without machinery. The researchers concluded that farming machinery is necessary to improve potato production in terms of the benefit-to-cost ratio. The proximal support vector machine (PSVM) and least square support vector machine (LS-SVM) models are derived from SVM and run faster.
Single hidden layer feedforward neural networks (SLFNs) are learning platforms that can widely use feature mappings [17]. ELM is a complex learning algorithm for SLFNs, randomly selecting the input weights matrix and the hidden layer biases [18]. Neumann et al. [19] showed that ELM is appropriate for this purpose.
Nath et al. [20] published an analysis that used the autoregressive integrated moving average (ARIMA) method to estimate wheat production in India. Wheat production in India was projected with a time series method, and the optimal configuration for the analysis was identified as ARIMA(1,1,0). The goal was to predict future wheat production as accurately as possible by fitting ARIMA(1,1,0) to the time series data for up to 10 years ahead. The forecasts suggest that annual production would increase in 2026 and 2027. With an estimated annual growth rate of about 4%, wheat production will continue to grow. Haider et al. [21] reported a long short-term memory (LSTM) neural network forecasting model for wheat production in Pakistan. Their paper is concerned with creating an effective wheat production prediction model using LSTM neural networks. A data-smoothing preprocessing method is combined with the LSTM model to further enhance predictive accuracy. Santamaría-Artigas et al. [22] evaluated near-surface air temperatures from reanalysis datasets over the US and Ukraine. Their paper analyzes ERA-Interim (ERAI), the Japanese 55-year Reanalysis, the Modern-Era Retrospective analysis for Research and Applications, Version 2, and the NCEP1 and NCEP2 reanalyses for near-surface air temperature. The reanalysis data were first compared with measurements from weather stations in the US and Ukraine and then analyzed within a winter wheat yield model. The data were validated in the United States and Ukraine. The evaluation against the weather station results indicated that all the datasets performed well ($r^2 > 0.95$) and that more recent reanalyses, such as ERAI, had smaller root-mean-square deviation errors (RMSD ≈ 0.9 °C) relative to traditional high-resolution datasets, such as NCEP1 (RMSD ≈ 2.4 °C).
The multi-scale and multi-model gridded method for assessing crop production, risk analysis, and climate change impact studies was introduced by Shelia et al. [23]. Their paper provides an overview of gridded crop models and yield forecasts, together with risk analysis and climate impact analyses, methods, techniques, prototypes, and capabilities of the CCAFS Regional Agricultural Forecasting Toolbox (CRAFT). Yazdani [24] completed an economic and scientific evaluation of environmental components in Tabriz and Isfahan (Iran). In this analysis, various climatic variables, including temperature, precipitation, and freezing, were measured to determine the economic value of the environment. For the Tabriz and Isfahan agricultural areas, four products were selected, namely arable wheat, dry farm wheat, arable barley, and dry farm barley. Ram et al. [25] examined health identification of wheat crops using pattern recognition and image processing. The authors combined a pattern identification method with an image processing approach. The system allows a farmer to follow a particular crop trend in order to assess risks sooner. Combined with the Internet of Things (IoT), the process can be simplified without the need for human resources. Ultimately, this work will speed up farming and enable farmers to grow more in less time.
In a study by Ali and Deo [26], wheat yield was modeled by a data-intelligent algorithm based on the artificial neural network, and the results were compared with those of genetic programming and minimax probability machine regression (MPMR). The criteria used for this comparison were correlation (r), Willmott's index (WI), Nash-Sutcliffe coefficient (EV), root-mean-square error (RMSE), and mean absolute error (MAE). The r, WI, and EV values were obtained for the ANN at station 1. These results demonstrated the excellent capability of ANN as a data-intelligent algorithm for forecasting wheat yield based on the nearest neighbor scheme. Salim and Raza [27] studied nutrient use efficiency (NUE) for sustainable wheat production. Kamir et al. [28] studied the forecasting of wheat yields in Australia using weather data, satellite image time series, and machine learning. The machine-learning regression models showed superior performance compared to the methods based on peak normalized difference vegetation index (NDVI) and harvest index ($R^2 < 0.46$).
Pantazi et al. [29] introduced three models, namely supervised Kohonen networks (SKNs), counter-propagation artificial networks (CP-ANNs), and XY-fusion (XY-F), that implement supervised learning to associate high-resolution soil and crop data with iso-frequency classes of wheat yield productivity. They concluded that the SKN model had better prediction accuracy. Amato et al. [30] introduced a novel multimedia summarization model for online social networks (OSNs). They focused on the management and sharing of multimedia information and proposed a summarization model and heuristics to obtain a multimedia summary with priority, continuity, variety, and non-repetition. The results were also validated. Wang et al. [31] sought to improve the meteorological feedback of the surface energy balance network using a mesoscale analysis and prediction model. Data collected at weather stations were compared with the simulation to determine the quality of the weather research and forecasting (WRF) simulation. The results showed high agreement between the weather station reports and the WRF values of wind speed ($R^2 = 0.628$), air temperature ($R^2 = 0.8242$), relative humidity ($R^2 = 0.8089$), and surface pressure ($R^2 = 0.8915$).
Building on the above, this study aimed to improve the accuracy of the wheat production forecast in Estahban using energy input. Accurate forecasting of the harvested amount at the end of the harvest season is essential for import or export planning by policymakers in this field. We used the SVR and ELM methods to forecast the wheat yield. Consequently, the research questions are as follows.

1. How much energy is required to produce wheat in Estahban?
2. Is it efficient to produce wheat regarding energy consumption?
3. Which method has higher accuracy for prediction?

Data Collection and Processing
Fars is the second-largest wheat-producing province in Iran, with 10.19% of the country's total wheat production. The yield of irrigated wheat in Fars is 3993.2 kg/hectare [8]. The town of Estahban, with an elevation of 1690 m above sea level and annual precipitation of between 50 and 450 mm, has an average yearly temperature of Celsius degrees (Organization for Research and Planning). Figure 2 shows the geographic location of the studied area.
The data for this research were obtained from agricultural experts and engineers in the area of Estahban town, as well as from selected farmers in the region. Data were collected from the farms of Estahban city in Fars province by the Jihad Agricultural Department experts over 20 years, from 1994 to 2013. Out of 145 farms, 105 farms were selected as the sample (using Cochran's formula). All 105 farms in the Estahban region had the same climate and geographical characteristics and a mechanized agricultural system. The energy analysis in this study aimed to estimate the energy equivalent of inputs and outputs in wheat production. The inputs for wheat production consist of grains, water, different kinds of fertilizers, pesticides, labor, equipment, machinery, and gasoline [32].
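The sample size quoted above can be checked with a short sketch of Cochran's formula with finite population correction. The confidence level (z = 1.96), proportion (p = 0.5), and margin of error (e = 0.05) are assumptions chosen for illustration; the source does not state which values were used.

```python
def cochran_sample_size(population, z=1.96, p=0.5, e=0.05):
    """Cochran's sample size with finite population correction.

    z, p and e are assumed defaults (95% confidence, maximum variability,
    5% margin of error); the paper does not report them.
    """
    # Sample size for an infinite population
    n0 = (z ** 2) * p * (1 - p) / (e ** 2)
    # Correction for a finite population of the given size
    return n0 / (1 + (n0 - 1) / population)


n = cochran_sample_size(145)
print(round(n))
```

Under these assumed parameters, a population of 145 farms gives a sample size that rounds to the 105 farms reported in the study.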
The inputs and outputs have different units. For example, diesel fuel is measured in liters, while chemical substances are measured by mass (kilograms). In the present study, the energy equivalent (MJ ha$^{-1}$) was used to convert all inputs and outputs into the same unit. Table 3 lists the energy equivalents; such unit conversion is common in agricultural studies for determining input and output flows. To obtain the energy equivalent of an input, the input rate is multiplied by the corresponding energy coefficient [32].
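The conversion described above can be sketched as follows. The coefficient values and input names below are hypothetical placeholders chosen for easy arithmetic; they are not the values from Table 3.

```python
# Hypothetical energy coefficients in MJ per unit (NOT the Table 3 values).
ENERGY_COEFFICIENTS = {
    "diesel_l": 50.0,      # MJ per liter (assumed)
    "nitrogen_kg": 60.0,   # MJ per kilogram (assumed)
    "seed_kg": 15.0,       # MJ per kilogram (assumed)
}


def energy_equivalents(rates_per_ha, coefficients):
    """Multiply each input rate (unit/ha) by its energy coefficient (MJ/unit)."""
    return {name: rate * coefficients[name] for name, rate in rates_per_ha.items()}


def total_energy(rates_per_ha, coefficients):
    """Total input energy in MJ/ha."""
    return sum(energy_equivalents(rates_per_ha, coefficients).values())


# Hypothetical per-hectare input rates for one farm
farm_rates = {"diesel_l": 100.0, "nitrogen_kg": 150.0, "seed_kg": 200.0}
print(energy_equivalents(farm_rates, ENERGY_COEFFICIENTS))
print(total_energy(farm_rates, ENERGY_COEFFICIENTS))
```

Once every input is expressed in MJ ha$^{-1}$, quantities originally measured in liters, kilograms, or hours can be summed and compared directly.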

Selected Input for the Model
The first step in product forecasting is the selection of model inputs. The energy input from the various sources used in the production process is considered the model input, and the energy output is considered the model output. We used the units in Table 3 to find the energy equivalent of the input and output amounts. Each input and output energy is calculated as the product of the quantity and its energy equivalent [16]. We considered 12 agricultural input variables for the model. This study considered different agricultural mechanization indexes, frequency and duration of irrigation, and farm size as input variables to improve the forecast model. The model inputs were reported by the farmers, which keeps measurement simple and the answers clear. This study used IBM SPSS version 22 and MATLAB R2013a software.

Support Vector Machine (SVM)
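SVM is a classifier that belongs to the category of kernel methods in machine learning. The linear decision function that SVM seeks is as follows:

$$f(x) = \operatorname{sign}\left(w \cdot \phi(x) + b\right)$$

where $\phi(x)$ is the mapping into the transformed feature space. The distance between the two classes (the margin) in the transformed feature space is $2/\|w\|$. The SVM model seeks to maximize the margin and minimize the training error. The optimization problem is as follows:

$$\min_{w,b,\xi} \; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i$$

subject to:

$$t_i\left(w \cdot \phi(x_i) + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, N$$

where $\xi_i$ is the training error of the $i$th sample, the user-defined parameter $C$ determines the tradeoff between the margin size and the training error, and a data point $x_i$ for which $t_i\left(w \cdot \phi(x_i) + b\right) = 1$ is called a support vector.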

Least Square Support Vector Machine (LS-SVM) Model
The LS-SVM model is the least square error version of the SVM. In this model, the training errors are minimized as squared errors in the optimization problem, and the inequality constraints are changed to equality constraints. It is therefore possible to implement this method by solving a set of linear equations instead of a quadratic program. The LS-SVM optimization is as follows:

$$\min_{w,b,\xi} \; \frac{1}{2}\|w\|^2 + \frac{C}{2}\sum_{i=1}^{N}\xi_i^2$$

subject to:

$$t_i\left(w \cdot \phi(x_i) + b\right) = 1 - \xi_i, \quad i = 1, \dots, N$$

where $\xi_i$ is the training error of the $i$th sample and $\phi$ is the feature mapping. In LS-SVM, the $\alpha_i$ values are proportional to the error of the $i$th data point, while in SVM these values are equal to zero for most of the data.

Proximal Support Vector Machine (PSVM) Model
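The mathematical model for PSVM is given by:

$$\min_{w,b,\xi} \; \frac{1}{2}\left(\|w\|^2 + b^2\right) + \frac{C}{2}\sum_{i=1}^{N}\xi_i^2$$

subject to:

$$t_i\left(w \cdot \phi(x_i) + b\right) = 1 - \xi_i, \quad i = 1, \dots, N$$

where $\xi_i$ is the training error of the $i$th sample and $\phi$ is the feature mapping. The distinction between the SVM, LS-SVM, and PSVM models lies in how they deal with the training error in the optimization problem. The LS-SVM and PSVM models replace the inequality constraints with equality constraints by changing the objective function, and thus are able to solve the new optimization problem faster [40].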

Extreme Learning Machine (ELM)
ELM theory shows that although hidden neurons play an important role in neural algorithms, this role does not necessarily involve iterative tuning of neuron parameters. In fact, it is possible to have an algorithm in which every parameter of a hidden neuron is generated randomly from a continuous probability distribution, completely independently of the training samples [41]. Using ELM, it is possible to approximate any continuous function. ELM network structure: the parameters of the hidden nodes $(a_i, b_i)$ are randomly generated, and $h_i(x) = g(a_i, b_i, x)$ is the $i$th hidden node's output for input $x$, where $g$ is a nonlinear piecewise continuous function (Kasun et al., 2016). Figure 3 illustrates a schematic of the neuron network structure.

The aim of this model is to provide an integrated framework that encompasses the SVM, LS-SVM, and PSVM methods. The ELM model was first proposed for SLFNs and then developed for generalized SLFNs. The output function for generalized SLFNs is given by [42,43]:

$$f_L(x) = \sum_{i=1}^{L}\beta_i h_i(x) = h(x)\beta$$

where $h(x) = [h_1(x), \dots, h_L(x)]$ maps the $d$-dimensional input space into an $L$-dimensional space and $\beta = [\beta_1, \dots, \beta_L]^T$ is the vector of weights between the hidden layer and the output layer. It is argued that if an appropriate mapping is selected in the hidden layer, any function can be approximated with this model. For a two-class classification problem, the decision function according to [42,43] is as follows:

$$f_L(x) = \operatorname{sign}\left(h(x)\beta\right)$$
One of the features that differentiate the ELM models from typical learning methods is that they try to minimize the training error and the norm of the output weights; a feature that according to Bartlett's theory will result in better generalizability of the model. Therefore, the ELM objective function, which tries to minimize the training error and the norm of the output weights according to [43] is as follows.
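Minimize: $\|H\beta - T\|^2$ and $\|\beta\|$

where $T$ is the matrix of training targets and $H$ is the hidden layer output matrix, given by:

$$H = \begin{bmatrix} h(x_1) \\ \vdots \\ h(x_N) \end{bmatrix} = \begin{bmatrix} h_1(x_1) & \cdots & h_L(x_1) \\ \vdots & \ddots & \vdots \\ h_1(x_N) & \cdots & h_L(x_N) \end{bmatrix}$$

As can be seen, minimizing $\|\beta\|$ is, in fact, equivalent to maximizing the margin between the two classes in two-class classification, i.e., maximizing $2/\|\beta\|$.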
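The training procedure implied by this objective can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the study: the uniform weight range, the regularization constant `C`, the ridge-style closed form for $\beta$, and the synthetic data are all choices made here for the example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def elm_fit(X, T, n_hidden=30, C=1e4, seed=0):
    """Train a single-hidden-layer ELM (sketch; C and weight ranges assumed)."""
    rng = np.random.default_rng(seed)
    # Hidden node parameters (a_i, b_i) are drawn at random and never tuned
    A = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = sigmoid(X @ A + b)  # hidden layer output matrix H
    # Output weights: small training error ||H beta - T|| and small ||beta||
    beta = np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ T)
    return A, b, beta


def elm_predict(X, A, b, beta):
    return sigmoid(X @ A + b) @ beta


# Synthetic regression data (hypothetical, for illustration only)
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 - X[:, 2]

A, b, beta = elm_fit(X[:140], y[:140])        # 70% training
pred = elm_predict(X[140:], A, b, beta)       # 30% testing
rmse = float(np.sqrt(np.mean((pred - y[140:]) ** 2)))
print(rmse)
```

Only the output weights are learned, and they come from a single linear solve, which is why ELM training is typically fast compared with iterative methods.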

Types of ELM models:
1. Single-output multi-class classification.
2. Multi-output multi-class classification.
In this study, we use the first type, single-output multi-class classification.
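Since ELM can approximate any continuous function, one way to perform multi-class classification is to use a single output whose value is close to the label of the intended class in each area. In this case, the optimization problem according to [43] is as follows:

$$\min_{\beta,\xi} \; \frac{1}{2}\|\beta\|^2 + \frac{C}{2}\sum_{i=1}^{N}\xi_i^2 \quad \text{subject to} \quad h(x_i)\beta = t_i - \xi_i, \quad i = 1, \dots, N$$

where $t_i$ is the target of the $i$th sample, $\xi_i$ is its training error, and $C$ is the user-defined tradeoff parameter.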

Support Vector Regression
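Support vector machine (SVM) is a machine learning algorithm that can be used for both regression and classification, although SVMs are mainly used in classification problems. To tolerate a forecast error of up to $\varepsilon$, the support vector machine is extended to regression analysis, which is known as support vector regression (SVR). To bound the expected error, the cost function for the soft-margin hyperplane with constant $C$, based on [44,45], is as follows:

$$\min_{w,b} \; \frac{1}{2}\|w\|^2 + C\,\frac{1}{N}\sum_{i=1}^{N} l_\varepsilon(y_i, \hat{y}_i)$$

where the loss function $l_\varepsilon(y_i, \hat{y}_i)$, based on [44,45], is defined as follows:

$$l_\varepsilon(y_i, \hat{y}_i) = \begin{cases} 0, & |y_i - \hat{y}_i| \le \varepsilon \\ |y_i - \hat{y}_i| - \varepsilon, & \text{otherwise} \end{cases}$$

where $y_i$ is the real output and $\hat{y}_i$ is the output forecasted by the model. This function ignores forecast errors equal to or smaller than $\varepsilon$ and adds to the cost function, weighted by the constant $C$, only the part of the forecast error that exceeds $\varepsilon$. In other words, the $l_\varepsilon(y_i, \hat{y}_i)$ function is indifferent to errors equal to or smaller than $\varepsilon$. With $N$ training samples, the penalty term, which is the additional cost per unit of error in the cost function of the problem, is defined as $C\,\frac{1}{N}\sum_{i=1}^{N} l_\varepsilon(y_i, \hat{y}_i)$ and is known as the empirical risk. So, using two slack variables $\xi_i$ and $\xi_i^*$ to carry the surplus error (and absorbing the factor $1/N$ into $C$), the optimization problem, based on [44,45], is as follows:

$$\min_{w,b,\xi,\xi^*} \; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\left(\xi_i + \xi_i^*\right)$$

subject to:

$$y_i - w \cdot x_i - b \le \varepsilon + \xi_i, \quad w \cdot x_i + b - y_i \le \varepsilon + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0, \quad i = 1, \dots, N$$

Similar to the classification problem, the Lagrange function of the main problem is given by [46]:

$$L = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\left(\xi_i + \xi_i^*\right) - \sum_{i=1}^{N}\alpha_i\left(\varepsilon + \xi_i - y_i + w \cdot x_i + b\right) - \sum_{i=1}^{N}\alpha_i^*\left(\varepsilon + \xi_i^* + y_i - w \cdot x_i - b\right) - \sum_{i=1}^{N}\left(\eta_i\xi_i + \eta_i^*\xi_i^*\right)$$

Here the constraints $\xi_i \ge 0$ and $\xi_i^* \ge 0$ enter the Lagrangian through the multipliers $\eta_i, \eta_i^* \ge 0$. Additionally, contrary to the classification case, there is no assumption such as $\varepsilon = 1$, because in regression $\{y_i\}$ determines the scale of the problem. Now, taking the derivatives of the Lagrange function with respect to $w$, $b$, $\xi_i^*$, and $\xi_i$ and setting them to zero, we have [46]:

$$\max_{\alpha,\alpha^*} \; -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\left(\alpha_i - \alpha_i^*\right)\left(\alpha_j - \alpha_j^*\right) x_i \cdot x_j - \varepsilon\sum_{i=1}^{N}\left(\alpha_i + \alpha_i^*\right) + \sum_{i=1}^{N} y_i\left(\alpha_i - \alpha_i^*\right)$$

subject to:

$$\sum_{i=1}^{N}\left(\alpha_i - \alpha_i^*\right) = 0, \quad 0 \le \alpha_i, \alpha_i^* \le C$$

In the above problem, which is optimized over the variables $\alpha_i$ and $\alpha_i^*$, the support vectors are the points for which $\alpha_i^* - \alpha_i \ne 0$. Among the support vectors, if $|\alpha_i^* - \alpha_i| = C$, the vector falls outside the function range and is considered to be out of the acceptable region (the $\varepsilon$ region). As in the classification case, a kernel function can be substituted for the inner product $x_i \cdot x_j$ in the above function. The values of $\alpha_i$ and $\alpha_i^*$ are determined by solving the problem with quadratic programming. The coefficient vector is then obtained from $w = \sum_{i=1}^{N}\left(\alpha_i - \alpha_i^*\right) x_i$, which is made up of the input data $x_i$ and does not depend explicitly on the state variable $y_i$. To calculate $b$, the following equations can be used [46]:

$$b = y_i - w \cdot x_i - \varepsilon \quad \text{for } 0 < \alpha_i < C$$

$$b = y_i - w \cdot x_i + \varepsilon \quad \text{for } 0 < \alpha_i^* < C$$

Figure 4 illustrates the process of the proposed model from start to stop.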
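The $\varepsilon$-insensitive loss and the empirical risk used in this section can be sketched in plain Python. The function names and the sample values are illustrative assumptions, not part of the study.

```python
def eps_insensitive_loss(y, y_hat, eps):
    """Zero inside the eps tube, linear in the excess error outside it."""
    return max(0.0, abs(y - y_hat) - eps)


def empirical_risk(ys, y_hats, eps, C):
    """C * (1/N) * sum of eps-insensitive losses over N samples."""
    n = len(ys)
    total = sum(eps_insensitive_loss(y, yh, eps) for y, yh in zip(ys, y_hats))
    return C * total / n


# Hypothetical targets and forecasts
actual = [1.0, 2.0]
forecast = [1.05, 1.5]
print(eps_insensitive_loss(actual[0], forecast[0], eps=0.1))  # inside the tube
print(eps_insensitive_loss(actual[1], forecast[1], eps=0.1))  # outside the tube
print(empirical_risk(actual, forecast, eps=0.1, C=10.0))
```

The first forecast lies within $\varepsilon$ of its target and contributes nothing; only the second one, whose error exceeds $\varepsilon$, is penalized.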

The Functions Used in ELM
Some activation functions of the ELM model, based on [47], are as follows:
"sig" stands for the sigmoidal function.
"sin" stands for the sine function.
"hardlim" stands for the hard-limit function.
"tribas" stands for the triangular basis function.
Jahangir et al. [47] investigated rainfall-runoff process simulation using a back-propagation artificial neural network with a sigmoid activation function in the Kardeh watershed. The results showed that a multilayer perceptron network with one hidden layer simulated the runoff process with high accuracy.
In this study, we use the Sigmoidal function in ELM to forecast the output energy.
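For reference, the activation functions named in this section can be written as follows. The definitions of "hardlim" and "tribas" follow the usual conventions for transfer functions of those names (a unit step at zero and a triangle of width two); this is an assumption, since the source does not give their formulas.

```python
import math


def sig(x):
    """Sigmoidal function."""
    return 1.0 / (1.0 + math.exp(-x))


def sin_act(x):
    """Sine function."""
    return math.sin(x)


def hardlim(x):
    """Hard-limit function: 1 for x >= 0, otherwise 0 (assumed convention)."""
    return 1.0 if x >= 0 else 0.0


def tribas(x):
    """Triangular basis function: 1 - |x| on [-1, 1], otherwise 0 (assumed)."""
    return max(0.0, 1.0 - abs(x))


# Lookup by the short names used in the text
ACTIVATIONS = {"sig": sig, "sin": sin_act, "hardlim": hardlim, "tribas": tribas}

for name, fn in ACTIVATIONS.items():
    print(name, [round(fn(v), 4) for v in (-2.0, -0.5, 0.0, 0.5, 2.0)])
```

The sigmoid is smooth and bounded, which is one common reason it is chosen for regression outputs such as the energy forecast here.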

Using Kernels
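Kernels can also be used in ELM models, as in the SVM-based models. In this case, the ELM kernel matrix is defined as [48]:

$$\Omega_{ELM} = HH^T: \quad \Omega_{ELM\,i,j} = h(x_i) \cdot h(x_j) = K(x_i, x_j)$$

The output function of the model can be obtained with the kernel, as follows [48]:

$$f(x) = h(x)H^T\left(\frac{I}{C} + HH^T\right)^{-1}T = \begin{bmatrix} K(x, x_1) & \cdots & K(x, x_N) \end{bmatrix}\left(\frac{I}{C} + \Omega_{ELM}\right)^{-1}T$$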
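A kernel-based ELM of this form can be sketched with NumPy as below. The radial basis function kernel, its width `gamma`, the constant `C`, and the toy data are assumptions for illustration; the source does not specify which kernel or parameter values would be used.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """K(a, b) = exp(-gamma * ||a - b||^2) for all row pairs of A and B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kernel_elm_fit(X, T, C=1e4, gamma=1.0):
    """Solve (I/C + Omega) alpha = T, where Omega[i, j] = K(x_i, x_j)."""
    omega = rbf_kernel(X, X, gamma)
    return np.linalg.solve(np.eye(len(X)) / C + omega, T)


def kernel_elm_predict(X_new, X_train, alpha, gamma=1.0):
    """f(x) = [K(x, x_1), ..., K(x, x_N)] alpha."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha


# Toy one-dimensional regression problem (hypothetical data)
X_train = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
T_train = X_train[:, 0] ** 2
alpha = kernel_elm_fit(X_train, T_train)
y_mid = float(kernel_elm_predict(np.array([[0.5]]), X_train, alpha)[0])
print(y_mid)
```

With a kernel, the hidden layer mapping $h(x)$ never has to be formed explicitly; only kernel evaluations between samples are needed.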

Performance Measures
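The root mean square error (RMSE) measures the difference between the values forecasted by a model or a statistical estimator and the actual values. RMSE is a good tool for comparing the forecast errors of different models on a dataset [49-51]:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$

where $y_i$ is the actual value, $\hat{y}_i$ is the forecasted value, and $N$ is the number of samples.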
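The evaluation measures used in this study (MSE, RMSE, the correlation coefficient R, and $R^2$) can be sketched as follows. Taking $R^2$ as the square of the Pearson correlation is an assumption made here; the sample values are hypothetical.

```python
import math


def mse(actual, forecast):
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    return math.sqrt(mse(actual, forecast))


def pearson_r(actual, forecast):
    """Pearson correlation coefficient between actual and forecasted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_f = sum(forecast) / n
    cov = sum((a - mean_a) * (f - mean_f) for a, f in zip(actual, forecast))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    return cov / math.sqrt(var_a * var_f)


def r_squared(actual, forecast):
    """R^2 taken as the squared correlation (assumed definition)."""
    return pearson_r(actual, forecast) ** 2


# Hypothetical actual and forecasted values
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(mse(y_true, y_pred), rmse(y_true, y_pred))
print(pearson_r(y_true, y_pred), r_squared(y_true, y_pred))
```

Lower MSE and RMSE and higher R and $R^2$ indicate a closer match between forecasts and observations.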

The Pattern Used in Wheat Cultivation
Using the conversion factors presented in Table 4, the energy consumption pattern of each farm was studied. The average energy consumed from the different sources and the total energy production are summarized in Table 4. In this study, the ELM and SVR methods are used to forecast the energy output in order to determine the approach with the lowest forecast error. The accuracy and error of the model depend on the selected parameters (listed in Table 4). In the ELM method, the accuracy of the model depends on the correct selection of the number of hidden layer neurons, as well as on the amounts of training and testing data. The results obtained from the ELM and SVR models are shown in Table 5. Figure 5 shows the quantile-quantile (QQ) plot of the sample data versus the standard normal distribution, and Figure 6 shows the dispersion around the regression line for the factors affecting input and output energy. As can be seen, the plots follow a linear pattern. Clearly, the forecast at the training stage is better than the forecast at the testing stage. This indicates that the data for the effective energy input parameters in the experiment section are irregular and insufficient. In the ELM method, 70% of the data are used for training and 30% for testing. Additionally, the number of hidden neurons is 30, and the sigmoidal activation function is used:
(TrainResult,TestResult') = ELM_V2('Nis',70, 0, 30, 'sig'). (23)

Radial Basis SVR Method
The SVR method was implemented for this analysis with eleven neurons and a radial basis function kernel. Figures 7-9 show the results for the training data stage, the testing data stage, and all data, respectively.

Discussion
The total energy input and output of the wheat fields were calculated to be 1,460,503.1 MJ and 1,401,011.945 MJ, respectively. The energy efficiency (the ratio of output to input energy) is reported as 0.955653987. According to Table 4, diesel fuel accounts for 37.888% of the total energy used, nitrogen fertilizer for 21.7%, machinery and equipment for 14.67%, and water for 14.653%, in descending order of consumed energy. The results of the extreme learning machine (ELM) and the SVRrbf method are shown in Table 6. For performance measurement, we determined the relative importance of the independent input parameters of ELM on the output. The root mean squared error (RMSE), correlation coefficient (R), $R^2$, and mean squared error (MSE) were used to evaluate the difference between the model's forecasted and actual values. In the training stage, R for ELM is 0.981, which is less than R = 1 for SVR. In the testing stage, however, R for ELM is 0.9763, which is larger than R = 0.40872 in the testing of the SVR method. R in the testing stage of the ELM method is greater than 0.8, so it is considered acceptable. The value of $R^2$ = 0.9531 in the ELM testing stage is greater than $R^2$ = 0.167052 for the SVR method and is therefore acceptable. The values MSE = 0.0102 and RMSE = 0.01010 in the testing stage of ELM are much lower than MSE = 503.4801 and RMSE = 22.4384 for the SVR method. The results of this study were obtained for the Estahban region. The method could be applied in other regions of the world and to other crops, for which different data would be required.
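The energy ratio and net energy can be recomputed from the two totals quoted above with a short sketch. The function names are illustrative. A ratio computed from these totals need not match the reported efficiency figure exactly, since the source does not state how that figure was aggregated.

```python
def energy_ratio(output_mj, input_mj):
    """Energy efficiency as output energy divided by input energy."""
    return output_mj / input_mj


def net_energy(output_mj, input_mj):
    """Net energy gain (positive) or loss (negative) in MJ."""
    return output_mj - input_mj


INPUT_MJ = 1_460_503.1     # total input energy quoted in the text
OUTPUT_MJ = 1_401_011.945  # total output energy quoted in the text

print(energy_ratio(OUTPUT_MJ, INPUT_MJ))
print(net_energy(OUTPUT_MJ, INPUT_MJ))
```

A ratio below one and a negative net energy both indicate that more energy is consumed than is recovered in the harvested wheat.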

Application of the Developed Model in the Future
The ELM and SVR methods were studied in order to develop an accurate forecasting model with minimal error that policymakers and farmers can use to estimate the amount of product at the end of the harvesting season. At the beginning of the wheat planting season, farmers decide how much of their cultivated land to allocate to wheat. This decision is based on the amount of annual precipitation and the amount of water consumed. Farmers know the amounts of grain, chemical fertilizers, and other inputs consumed from their own experience and from the advice of the region's agricultural engineers. They also have enough information to estimate the materials required and the labor and machinery used. If farmers are properly trained on a large scale, they will be able to report their production forecasts to their local agricultural organizations. This will help policymakers decide how much wheat should be imported, if any. It can also have a significant impact on setting appropriate wheat prices that protect both farmers and the community. A further advantage of the model is that farmers' forecast data can be used to build a dynamic model of the agricultural system, simulate future data, estimate the amounts of product and materials to be consumed in the future, support decision making, and reduce future risks.

Conclusions
Energy consumption has a significant impact on global warming and climate change, so reducing it would help slow the rate of global warming. The data in Table 4 suggest that energy consumption in wheat production can be reduced by limiting the use of nitrogen fertilizer, diesel fuel, and water, and by using more energy-efficient machinery and equipment. In this study, extreme learning machine and support vector regression methods were applied to forecast the output energy of wheat production in the Estahban region. The data for the agricultural parameters that affect wheat production were converted into their energy equivalents and used as the model's inputs. The results showed that ELM outperforms SVR in forecasting the output energy of wheat production in Estahban. ELM has a much smaller error than radial basis function SVR and can provide more accurate forecasts. It is also faster than the SVR method for forecasting problems. The developed ELM model is capable of learning patterns and forecasts the output energy with the lowest error. The results indicate that the proposed method improves forecast accuracy and can be generalized, so it could also be used to forecast other agricultural products. ELM is therefore an excellent alternative to radial basis function SVR. The main conclusions are as follows.

• To produce wheat in Estahban, 1,460,503.1 MJ of energy is required.

• According to the obtained data, wheat production is not energy efficient, because 1,460,503.1 MJ of energy is required to produce wheat with a total energy content of 1,401,011.9 MJ. Nevertheless, wheat is necessary for human beings.

• The ELM model is capable of learning patterns and can forecast the output energy with the lowest error.

In the future, researchers could apply this approach to other crops to predict the required energy. Other countries could also be investigated for this purpose. Finally, different methods and algorithms could be used and compared.