Development of a Group Method of Data Handling Technique to Forecast Iron Ore Price

Iron is one of the most applicable metals in the world. The global price of iron ore is determined based on demand and supply. There are numerous parameters (e.g., price of steel, steel production, oil price, gold price, interest rate, inflation rate, iron production, and aluminum price) affecting the global iron ore price. Considering the high number of effective parameters and existence of complex relationship among them, artificial intelligence-based approaches can be employed to predict iron ore price. In this paper, a new intelligence system namely group method of data handling (GMDH) was developed and introduced to predict the price of iron ore. For comparison purposes, four other techniques i.e., autoregressive integrated moving average (ARIMA), support vector regression (SVR), artificial neural network (ANN), and classification and regression tree (CART) were developed for prediction of monthly iron ore price. Then, using testing datasets, the developed models were validated and their performance capacities were compared. The results showed that performance prediction of the GMDH model is significantly better than other predictive models based on four performance indices i.e., root mean square error, variance account for (VAF), mean absolute error, and mean absolute percentage error. Results of VAF (97.89%, 90.81%, 80.95%, 55.02%, and 23.87% for GMDH, SVR, ANN, CART, and ARIMA models, respectively) revealed that the GMDH technique is able to predict iron ore price with higher degree of accuracy compared to the other techniques.


Introduction
The market of iron ore with its numerous agents and contributors has been affected by different conditions and variables. Small and large producers, permanent and seasonal exporters and therefore, different agents have been active in this industry. Different agents consider not only product price but also price of initial ingredients like iron ore, coke, and other influential factors such as oil, transportation fee and demand value of consumptive or alternative products. So, rising or falling price of iron ore should be emphasized based on its influence on demand and price of other products or being affected by them [1].
Iron ore is the basic raw material used in steel production. About 10 to 20 percent of total cost of steel is attributed to iron ore so that identification of factors affecting price of iron ore can be beneficial At the moment, pricing of iron ore is based on seasonal agreements of major producers of iron ore and major consumers of steel so that the two markets of Europe and Eastern Asia (in past, Japan and Now, China) have significant role in internal iron ore deals. A major share of distributed iron ore is produced by the three companies of Vale (a semi-private company) (Rio de Janeiro, Brazil ), Rio Tinto(London, UK), and B.H.P Billiton(Melbourne, Australia). Because of the fact that each ton of crude steel is made by using 1.5 ton of iron ore, production of crude steel can be introduced as a demand agent and because provision of iron ore is almost exclusive, producers provide the market with an output totally in association with instant demand [10].
In the present paper, a new model in the field of mineral price prediction namely group method of data handling (GMDH) is introduced. Literature shows that the models that work on the basis of self-organizing networks containing active neurons (GMDH) are of a higher effectiveness in terms of making more accurate and less labor-intensive predictions. In addition, the paper evaluates the precision of other predictive techniques i.e., autoregressive integrated moving average (ARIMA), ANN, support vector regression (SVR), and classification and regression tree (CART) in predicting At the moment, pricing of iron ore is based on seasonal agreements of major producers of iron ore and major consumers of steel so that the two markets of Europe and Eastern Asia (in past, Japan and Now, China) have significant role in internal iron ore deals. A major share of distributed iron ore is produced by the three companies of Vale (a semi-private company) (Rio de Janeiro, Brazil ), Rio Tinto(London, UK), and B.H.P Billiton(Melbourne, Australia). Because of the fact that each ton of crude steel is made by using 1.5 ton of iron ore, production of crude steel can be introduced as a demand agent and because provision of iron ore is almost exclusive, producers provide the market with an output totally in association with instant demand [10].
In the present paper, a new model in the field of mineral price prediction namely group method of data handling (GMDH) is introduced. Literature shows that the models that work on the basis of self-organizing networks containing active neurons (GMDH) are of a higher effectiveness in terms of making more accurate and less labor-intensive predictions. In addition, the paper evaluates the Appl. Sci. 2020, 10, 2364 3 of 20 precision of other predictive techniques i.e., autoregressive integrated moving average (ARIMA), ANN, support vector regression (SVR), and classification and regression tree (CART) in predicting monthly price of iron ore. Then, after evaluations of performance predictions of the said models, the best one among them is selected and introduced to solve the problem. In the following parts, after introducing input and output parameters, the principles of predictive models and their implementations in predicting price of iron ore is given in detail. Afterward, the proposed models are evaluated through the use of well-known performance indices and four performance indices are calculated and compared to each other in order to choose the best predictive model.

Literature Review
Up to now, different techniques such as statistical and artificial intelligence methods have been suggested for prediction of iron ore price [7]. Traditional mathematical models such as time series [11,69] and the multi linear regression [70,71] models have been used for metals price prediction. One of the most common traditional mathematical models for predicting time series is the autoregressive integrated moving average (ARIMA) method. This technique is one of the most routinely used statistical approaches for time series predicting over the past decades. ARIMA have relished beneficial applications in forecasting insurance, economics, energy, engineering, social, stock problems, and foreign exchange [72][73][74][75]. Critical literature review show that the accuracy of SC and ML techniques is superior to the ARIMA models in predicting time series problems [76][77][78][79][80].
As stated before, the applications of ML and SC techniques have been widely used in fields of civil and mining engineering. In terms of metal and material price prediction, several researchers highlighted a successful implementation of such techniques. For example, Parisi et al. [11] used and developed an ANN model for the prediction of variation of gold prices. They considered the role of ward network in the architecture of the proposed ANN and reported a high ability of the ANN model in forecasting gold price. The monthly copper price was predicted by García and Kristjanpoller [81] proposing ANN and FIS predictive models. They constructed many soft computing models of ANN and FIS and finally concluded that FIS is a more powerful technique in predicting copper price compared to ANN model. In another study, Alameer et al. [82] forecasted a long-term monthly gold price fluctuations by using four ANN-based models i.e., particle swarm optimization (PSO)-ANN, genetic algorithm (GA)-ANN, and whale optimization algorithm (WOA)-ANN and grey wolf optimization-ANN. In fact, they studied the power of these hybrid models in predicting gold price fluctuations and finally, they introduced WOA-ANN as a new and applicable model in this field. Ewees et al. [2] proposed a hybrid intelligent model i.e., chaotic grasshopper optimization algorithm-ANN for estimation of iron ore prices and concluded that their proposed model is a promising technique for forecasting commodity prices with high accuracy. In another study of price prediction, the prediction of coal price fluctuations was considered as the objective by Alameer et al. [83]. They developed three models namely support vector machine (SVM), ANN, and deep neural network (DNN) with the aim to successfully show that the DNN model is able to provide higher performance capacity in estimating coal price fluctuations.
According to the above discussion, the authors decide to use and propose the GMDH which is considered as a new intelligent system in prediction of iron ore price. For comparison purposes, a traditional statistical model, i.e., ARIMA, and three ML techniques i.e., SVR, CART, and ANN are selected to be applied for iron ore price prediction. Then, an evaluation is performed between the applied models and the best model is selected and introduced in field of iron ore price prediction.

Input and Output Parameters
In order to estimate the price of iron ore, 12 measurable parameters of highest impact on price of iron ore [84], were defined and considered as input parameters and subsequently, an output parameter was chosen (see Table 1). In Table 1, input and output parameters with their unit, minimum, maximum, and descriptions are presented. All of the existing applied data in the present study was 311 which after using the 10-Fold Cross Validation method, was decreased to 248 number of data from January Appl. Sci. 2020, 10, 2364 4 of 20 1990 to December 2013 (see Figure 2). In the following section, principles and applications of various models i.e., ANN, SVR, GMDH, and CART in predicting price of iron ore are described.

Autoregressive Integrated Moving Average (ARIMA)
Box and Jenkins [85] developed a general forecasting approach to predict future points in the time series. ARIMA can be applied in some cases where data show signal of non-stationarity. In this approach, the future value of a variable is presumed to be a linear function of periodic past explanations and random errors. This model is known as ARIMA (p, q, d), where p stands for number of time lags of the autoregressive model, q is the instruction of moving-average parameters, and d is the number of times the data have had past values subtracted [86]. The parameter is estimated by the number of times the differencing is achieved on the time series [87]. In order to build the best ARIMA model for forecasting of iron ore price, and parameters have to be efficiently determined for a proper model. To estimate the best model, we use the criteria as follows: Bayesian information criterion (BIC), R-squared, root-mean-square error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE). Table 2 shows the different parameters , , and d in the ARIMA model. ARIMA (0, 1, 1) shows the best performance compared to other presented models in Table 2. The residual plots of autocorrelation function (ACF) and partial autocorrelation function (PACF) in

Autoregressive Integrated Moving Average (ARIMA)
Box and Jenkins [85] developed a general forecasting approach to predict future points in the time series. ARIMA can be applied in some cases where data show signal of non-stationarity. In this approach, the future value of a variable is presumed to be a linear function of periodic past explanations and random errors. This model is known as ARIMA (p, q, d), where p stands for number of time lags of the autoregressive model, q is the instruction of moving-average parameters, and d is the number of times the data have had past values subtracted [86]. The d parameter is estimated by the number of times the differencing is achieved on the time series [87]. In order to build the best ARIMA model for forecasting of iron ore price, p and q parameters have to be efficiently determined for a proper model. To estimate the best model, we use the criteria as follows: Bayesian information criterion (BIC), R-squared, root-mean-square error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE). Table 2 shows the different parameters p, q, and d in the ARIMA model. ARIMA (0, 1, 1) shows the best performance compared to other presented models in Table 2. The residual plots of autocorrelation function (ACF) and partial autocorrelation function (PACF) in Figure 3 shows the model is adequate as it shows a random variation from the origin zero (0), the points below and above are all uneven, hence the model fitted is acceptable.  Figure 3 shows the model is adequate as it shows a random variation from the origin zero (0), the points below and above are all uneven, hence the model fitted is acceptable.

Classification and Regression Tree (CART)
The decision tree (DT) is one of the non-parametrical approaches for classification purposes. In this approach, a model of classification is introduced for available observation using a very simple technique. The introduced model has a very simple structure, and is comprehensible for decisionmaking. Although this approach uses simple techniques, it can be performed even sometimes better than complicated approaches such as ANN [88]. The DT has a graph the same as a tree and is composed of roots, branches, leaves, and nodes [65,89]. In this approach, a variable is assigned as root or root node, and each node is then divided into sub-node according to the question about the interval of that variable which is either "Yes" or "No" which in turn indicates an input parameter that includes a prediction of output parameter in itself. Each branch in the DT shows the interval of the node which it is diverged from. The DT is usually drawn from top to bottom so that the roots are located at the top. A leaf is composed of roots, branches, and the final node. The DT includes various algorithms among which CHAID (chi-square automatic interaction detection), QUEST (quick, unbiased and efficient statistical tree), and CART are the most important [90,91]. Reviewing the

Classification and Regression Tree (CART)
The decision tree (DT) is one of the non-parametrical approaches for classification purposes. In this approach, a model of classification is introduced for available observation using a very simple technique. The introduced model has a very simple structure, and is comprehensible for decision-making. Although this approach uses simple techniques, it can be performed even sometimes better than complicated approaches such as ANN [88]. The DT has a graph the same as a tree and is composed of roots, branches, leaves, and nodes [65,89]. In this approach, a variable is assigned as root or root node, and each node is then divided into sub-node according to the question about the interval of that variable which is either "Yes" or "No" which in turn indicates an input parameter that includes a prediction of output parameter in itself. Each branch in the DT shows the interval of the node which it is diverged from. The DT is usually drawn from top to bottom so that the roots are located at the top. A leaf is composed of roots, branches, and the final node. The DT includes various algorithms among which CHAID (chi-square automatic interaction detection), QUEST (quick, unbiased and efficient statistical tree), and CART are the most important [90,91]. Reviewing the previous studies indicated that the CART model is more efficient and practical in the field of predicting and simulating compared to the other available ones. Therefore, this algorithm is selected and applied in the present study.
The CART model was developed by Breiman et al. [92] which can be made into a two-scale tree on the basis of some rules. This algorithm gives the capability to be used in parameters concerning quality and quantity. If the dependent parameter is of quantity type, the formed tree is called the "Regression tree," and if it is of the quality type, it is named "Classification tree." Parameters such as the "Maximum tree depth" and "Number of intervals" as controlling parameters are taken into consideration in this algorithm to avoid extreme complications of the model [92]. The MATLAB software was used to build the CART model. Many CART models considering the most effective parameters on this model, have been constructed and the best model among them was selected to predict price of iron ore. Figure 4 shows the desirable tree for modelling the iron ore price. In addition, Figure 5 displays the predicted iron ore price by CART verses their measured values. Based on this figure, R 2 is determined as 0.936 which revealed that the CART predictive model is able to estimate the iron ore price with high level of accuracy.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 6 of 20 previous studies indicated that the CART model is more efficient and practical in the field of predicting and simulating compared to the other available ones. Therefore, this algorithm is selected and applied in the present study. The CART model was developed by Breiman et al. [92] which can be made into a two-scale tree on the basis of some rules. This algorithm gives the capability to be used in parameters concerning quality and quantity. If the dependent parameter is of quantity type, the formed tree is called the "Regression tree," and if it is of the quality type, it is named "Classification tree." Parameters such as the "Maximum tree depth" and "Number of intervals" as controlling parameters are taken into consideration in this algorithm to avoid extreme complications of the model [92]. The MATLAB software was used to build the CART model. Many CART models considering the most effective parameters on this model, have been constructed and the best model among them was selected to predict price of iron ore. Figure 4 shows the desirable tree for modelling the iron ore price. In addition, Figure 5 displays the predicted iron ore price by CART verses their measured values. Based on this figure, R 2 is determined as 0.936 which revealed that the CART predictive model is able to estimate the iron ore price with high level of accuracy.

Support Vector Regression
Support vector machine (SVM) was introduced to solve both classification and regression problems. Among many methods of machine learning, the SVM is considered as a relatively new technique which is based on the structural risk minimization [93]. SVM can be referred to both regression and classification methods. Among them, support vector regression (SVR) is only able to solve extrapolative problems by building a predictive model. The SVR works based on regression estimation and function approximation considering an alternative "loss function." The SVR is defined as a common technique which is based on the Vapnik-Chervonenkis (VC) theory [94,95]. If the VC dimension is low, the expected probability of error is also low, which means good generalization [96].
The concept of the ε-insensitive loss function is graphically shown in Figure 6. Only the samples out of the ± ε margin (known as ε-insensitive tube) will have a nonzero slack variable. Normally, if the predicted value is inside the region, the loss will be zero, while if the predicted point is outside the tube, the loss is the magnitude of the difference between the predicted value and the radius ε of the tube.

Support Vector Regression
Support vector machine (SVM) was introduced to solve both classification and regression problems. Among many methods of machine learning, the SVM is considered as a relatively new technique which is based on the structural risk minimization [93]. SVM can be referred to both regression and classification methods. Among them, support vector regression (SVR) is only able to solve extrapolative problems by building a predictive model. The SVR works based on regression estimation and function approximation considering an alternative "loss function." The SVR is defined as a common technique which is based on the Vapnik-Chervonenkis (VC) theory [94,95]. If the VC dimension is low, the expected probability of error is also low, which means good generalization [96].
The concept of the ε-insensitive loss function is graphically shown in Figure 6. Only the samples out of the ± ε margin (known as ε-insensitive tube) will have a nonzero slack variable. Normally, if the predicted value is inside the region, the loss will be zero, while if the predicted point is outside the tube, the loss is the magnitude of the difference between the predicted value and the radius ε of the tube.

Support Vector Regression
Support vector machine (SVM) was introduced to solve both classification and regression problems. Among many methods of machine learning, the SVM is considered as a relatively new technique which is based on the structural risk minimization [93]. SVM can be referred to both regression and classification methods. Among them, support vector regression (SVR) is only able to solve extrapolative problems by building a predictive model. The SVR works based on regression estimation and function approximation considering an alternative "loss function." The SVR is defined as a common technique which is based on the Vapnik-Chervonenkis (VC) theory [94,95]. If the VC dimension is low, the expected probability of error is also low, which means good generalization [96].
The concept of the ε-insensitive loss function is graphically shown in Figure 6. Only the samples out of the ± ε margin (known as ε-insensitive tube) will have a nonzero slack variable. Normally, if the predicted value is inside the region, the loss will be zero, while if the predicted point is outside the tube, the loss is the magnitude of the difference between the predicted value and the radius ε of the tube.  Mentioning about statistical equations behind SVM and SVR is out of scope of this study. In order to have a detailed view of SVM and SVR, studies conducted by Hasanipanah et al. [97] Appl. Sci. 2020, 10, 2364 8 of 20 and Mahdevari et al. [98] are suggested. Generally, to construct a SVR model, three factors are very important to determine. These factors are the Gaussian kernel parameter γ, the loss function parameter ε, and the penalty term C [98]. A series models of SVR were built defining a fitness function in three loops for each combination of C, ε, and γ. Then, the optimum C, ε, and γ in each run was selected. Table 3 presents the obtained results of the developed SVR models in terms of performance indices i.e., R 2 , root mean square error (RMSE) and mean absolute error (Ea). In addition, different combinations of C, ε, and γ can be seen in Table 3. As shown in Table 3, model number 9 (R 2 = 0.98, RMSE = 6.63, Ea = 3.84 and) outperforms the others. The results demonstrated a high level of accuracy of SVR in estimating iron ore price. Correlation between predicted and measured values of iron ore price using SVR predictive model is displayed in Figure 7. Table 3. Results of ten-fold cross-validation and optimal values of C, ε, and γ in predicting iron ore price. Mentioning about statistical equations behind SVM and SVR is out of scope of this study. In order to have a detailed view of SVM and SVR, studies conducted by Hasanipanah et al. [97] and Mahdevari et al. [98] are suggested. Generally, to construct a SVR model, three factors are very important to determine. These factors are the Gaussian kernel parameter γ, the loss function parameter ε, and the penalty term C [98]. A series models of SVR were built defining a fitness function in three loops for each combination of C, ε, and γ. Then, the optimum C, ε, and γ in each run was selected. Table 3 presents the obtained results of the developed SVR models in terms of performance indices i.e., R 2 , root mean square error (RMSE) and mean absolute error (Ea). In addition, different combinations of C, ε, and γ can be seen in Table 3. As shown in Table 3, model number 9 (R 2 = 0.98, RMSE = 6.63, Ea = 3.84 and) outperforms the others. The results demonstrated a high level of accuracy of SVR in estimating iron ore price. Correlation between predicted and measured values of iron ore price using SVR predictive model is displayed in Figure 7. Table 3. Results of ten-fold cross-validation and optimal values of C, ε, and γ in predicting iron ore price.

Artificial Neural Networks
Artificial neural network (ANN), as it is apparent from their name, are computational networks which are used for simulation of the network of brain neural cells in living creatures. Current knowledge of biologic scientists about the activity of neural neurons in the past few decades is the basis of designing such networks. Neural networks obtain their needed rules through teaching educational teaching. Neural networks use different algorithm for learning but without attention to

Artificial Neural Networks
Artificial neural network (ANN), as it is apparent from their name, are computational networks which are used for simulation of the network of brain neural cells in living creatures. Current knowledge of biologic scientists about the activity of neural neurons in the past few decades is the basis of designing such networks. Neural networks obtain their needed rules through teaching educational teaching. Neural networks use different algorithm for learning but without attention to applied method, learning is generally a repetitive operation during which a set of educational examples are shown to the network so as to help it be thoroughly taught [99][100][101]. So, distribution is the most common learning algorithm of multi-layer neural networks with leading nutrition invented by Rumel Hurt. This method changes connection weights by slope reduction and minimizes error function [61,62,[102][103][104][105]. A neural network is composed of a number of neurons in consequent layers. Basically, three layers exist for such networks: an input layer, one or more middle layers in addition to existing neurons in each layer defined based on complexity of the issue through trial-and-error method. A simple network with leading nutrition of input R is presented in Figure 8 [106].
Appl. Sci. 2020, 10, x FOR PEER REVIEW 9 of 20 applied method, learning is generally a repetitive operation during which a set of educational examples are shown to the network so as to help it be thoroughly taught [99][100][101]. So, distribution is the most common learning algorithm of multi-layer neural networks with leading nutrition invented by Rumel Hurt. This method changes connection weights by slope reduction and minimizes error function [61,62,[102][103][104][105]. A neural network is composed of a number of neurons in consequent layers. Basically, three layers exist for such networks: an input layer, one or more middle layers in addition to existing neurons in each layer defined based on complexity of the issue through trial-and-error method. A simple network with leading nutrition of input R is presented in Figure 8 [106]. Each input, with suitable weight coefficient (W), is given a weight. Such weights are defined randomly and are corrected during the process of learning with reduction of network error. The sum of weight input and bias make up the input of function of stimulus F. In order to obtain the intended output, one can use a different transfer function. Tangent Sigmoid, logarithm sigmoid, linear and positive linear transfer functions are an example of such functions. In this model, trial-and-error method is used for the two intermediate hidden layers along with addition of logarithm sigmoid transfer function while its output shifts between zero and one. In output layer of leading multi-layer networks, positive linear transfer function was used [107][108][109][110].
Many researchers highlighted application of ANN in various fields of civil and mining works (e.g., [111][112][113]). A number of ANN models with different topology were built and among them, an ANN model with higher accuracy level was chosen. The most influential factors i.e., hidden layer No., No. of nodes in each hidden layer, and the applied transfer function were considered in modeling process. A network with architecture 12-6-1 with minimum error was found to be optimum. R 2 was determined between the predicted and the measured values as 0.977 for the optimum ANN model (see Figure 9) which shows capability of this model in predicting iron ore price. Each input, with suitable weight coefficient (W), is given a weight. Such weights are defined randomly and are corrected during the process of learning with reduction of network error. The sum of weight input and bias make up the input of function of stimulus F. In order to obtain the intended output, one can use a different transfer function. Tangent Sigmoid, logarithm sigmoid, linear and positive linear transfer functions are an example of such functions. In this model, trial-and-error method is used for the two intermediate hidden layers along with addition of logarithm sigmoid transfer function while its output shifts between zero and one. In output layer of leading multi-layer networks, positive linear transfer function was used [107][108][109][110].
Many researchers highlighted application of ANN in various fields of civil and mining works (e.g., [111][112][113]). A number of ANN models with different topology were built and among them, an ANN model with higher accuracy level was chosen. The most influential factors i.e., hidden layer No., No. of nodes in each hidden layer, and the applied transfer function were considered in modeling process. A network with architecture 12-6-1 with minimum error was found to be optimum. R 2 was determined between the predicted and the measured values as 0.977 for the optimum ANN model (see Figure 9) which shows capability of this model in predicting iron ore price.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 10 of 20 Figure 9. Scatter plot of the predicted vs. actual iron ore price developing artificial neural network (ANN) model.

Group Method of Data Handling
GMDH, which was introduced by Ivakhnenko in the 1960s [114], refers to a machine learning technique that works on the basis of the heuristic self-organizing principle [115]. This evolutionary computation technique is able to perform a series of operations like crossbreeding, rearing, seeding, and selection and rejection of seeds corresponding. These operations are related to the determination of the input variables, structure, and parameters of model, and the selection of model by the termination principle [116]. GMDH enjoys a high degree of flexibility and users are able to hybridize it by means of evolutionary and iterative algorithms, e.g., PSO [117], and GA [118]. GMDH helps to represent a model as set of neurons (nodes) wherein various pairs of existing neurons within each layer are linked to each other by quadratic polynomial, thereby generating new neurons in the succeeding layer. This type of representation is applicable to modelling processes for the purpose of mapping inputs to outputs and it can solve almost all non-linear and complex problems. Formally, the system identification problem is described as exploring a function f that can be approximately utilized instead of actual function f in a way to estimate the output for a certain input vector X = (x1, x2, x3, …, xn) in a way to be with the minimum distant to its actual output y. Aiming at finding an effective solution, GMDH constructs a general link between the output and input variables which can be in the shape of a mathematical description.
In designing GMDH, all polynomials of the neurons existing within each of the network layers are identical and also the network has been designed on the basis of the same processes. The secondorder polynomial is actually the fundamental structure of the GMDH network originally introduced by Ivakhnenko [114]. In general, various types of polynomial, including tri-quadratic, quadratic, bilinear, and third order have been applied to the process of designing the self-organized systems. Compared to the quadratic type of polynomial, with employing the third order and tri-quadratic polynomial, networks of a higher complexity will be constructed. Bilinear polynomial produced lower complicated structure. The quadratic polynomial involves six weighting coefficients that have been found effective in solving engineering problems [119]. Literature shows that selecting one among different types of polynomials depends upon two factors: (1) The minimum error of objective function, and (2) complexity of the polynomial type. In order to get more details about the GMDH model and its implementation, some other studies are suggested [21,114,115].
In order to design a GMDH model, its effective parameters i.e., the number of neurons, the number of GMDH layers, and selection pressure should be taken into consideration. Selection

Group Method of Data Handling
GMDH, which was introduced by Ivakhnenko in the 1960s [114], refers to a machine learning technique that works on the basis of the heuristic self-organizing principle [115]. This evolutionary computation technique is able to perform a series of operations like crossbreeding, rearing, seeding, and selection and rejection of seeds corresponding. These operations are related to the determination of the input variables, structure, and parameters of model, and the selection of model by the termination principle [116]. GMDH enjoys a high degree of flexibility and users are able to hybridize it by means of evolutionary and iterative algorithms, e.g., PSO [117], and GA [118]. GMDH helps to represent a model as set of neurons (nodes) wherein various pairs of existing neurons within each layer are linked to each other by quadratic polynomial, thereby generating new neurons in the succeeding layer. This type of representation is applicable to modelling processes for the purpose of mapping inputs to outputs and it can solve almost all non-linear and complex problems. Formally, the system identification problem is described as exploring a functionf that can be approximately utilized instead of actual function f in a way to estimate the outputŷ for a certain input vector X = (x 1 , x 2 , x 3 , . . . , x n ) in a way to be with the minimum distant to its actual output y. Aiming at finding an effective solution, GMDH constructs a general link between the output and input variables which can be in the shape of a mathematical description.
In designing GMDH, all polynomials of the neurons existing within each of the network layers are identical and also the network has been designed on the basis of the same processes. The second-order polynomial is actually the fundamental structure of the GMDH network originally introduced by Ivakhnenko [114]. In general, various types of polynomial, including tri-quadratic, quadratic, bilinear, and third order have been applied to the process of designing the self-organized systems. Compared to the quadratic type of polynomial, with employing the third order and tri-quadratic polynomial, networks of a higher complexity will be constructed. Bilinear polynomial produced lower complicated structure. The quadratic polynomial involves six weighting coefficients that have been found effective in solving engineering problems [119]. Literature shows that selecting one among different types of polynomials depends upon two factors: (1) The minimum error of objective function, and (2) complexity of the polynomial type. In order to get more details about the GMDH model and its implementation, some other studies are suggested [21,114,115].
In order to design a GMDH model, its effective parameters i.e., the number of neurons, the number of GMDH layers, and selection pressure should be taken into consideration. Selection pressure factor is actually a criterion that allows the system to select the best fit in each step and moves them to the next layer. This process is repeated till receiving to the defined criteria which is normally system error. Hence, a parametric study has been conducted to see the effects of this parameter through the trial-and-error procedure. The results showed that selection pressure of 50% can be considered as the one among all values. Therefore, it has been used in the rest of modelling process. The design of another effective parameter (neuron number) on the GMDH model needs another parametric study. To do this, considering the previous studies (e.g., [21]), values of 2,4,6,8,10,12,14,16,18, and 20 were selected to be used as number of neurons in the parametric study and results of GMDH models based on R 2 are shown in Table 4. Among the obtained results, as it can be seen in Table 4, GMDH model number 8 with 16 neurons shows the best prediction performance and due to that neuron number of 16 was selected in the rest of modelling process. It should be mentioned that this parametric study was conducted using selection pressure of 50% and layer number of 5. In the last step of GMDH modelling, number of layers should be determined through the conduct of another parametric study. For this purpose, amounts of 2, 3,4,5,6,7,8,9, and 10 were set as possible layers number. This was chosen according to previous investigations and their suggestions (e.g., [21]). According to the obtained results of this parametric study ( with R 2 of 0.980 receives the best performance prediction. Therefore, in design of GMDH model in this study, values of 50%, 16 and 8 were selected for selection pressure, neuron number and layer number, respectively. More discussions about the performance of selected GMDH model will be given in the next section.

Model Evaluation
In this section, evaluation of the developed models for iron ore price prediction is performed. To do this, several performance prediction i.e., R 2 , RMSE, E a , MAPE (mean absolute percentage error) and variance account for (VAF) (see Equations (1)-(4)) were chosen and calculated for all developed models i.e., SVR, ANN, CART, and GMDH. Obviously, higher VAF values and lower error values give a model with higher performance prediction.
where, y and y are the measured and predicted values, respectively. In addition, N in total number of datasets [120]. In order to validate/test the proposed models, the selected datasets which were not used in the modelling process (23 databases from January 2014 to November 2015) were applied and then predicted. Subsequently, the obtained results from testing step were compared with the measured iron ore price values as shown in Figures 10-14 for ARIMA, CART, ANN, SVR, and GMDH models, respectively. As shown in these figures, the measured and predicted values by GMDH are closer to each other compared to other proposed models. This confirmed that the GMDH model which is a new model in the area of this study is able to perform well for prediction of iron ore price. It is a powerful and practical predictive model for such purposes.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 12 of 20 and variance account for (VAF) (see Equations (1)-(4)) were chosen and calculated for all developed models i.e., SVR, ANN, CART, and GMDH. Obviously, higher VAF values and lower error values give a model with higher performance prediction.
where, y and y′ are the measured and predicted values, respectively. In addition, N in total number of datasets [120]. In order to validate/test the proposed models, the selected datasets which were not used in the modelling process (23 databases from January 2014 to November 2015) were applied and then predicted. Subsequently, the obtained results from testing step were compared with the measured iron ore price values as shown in Figures 10-14 for ARIMA, CART, ANN, SVR, and GMDH models, respectively. As shown in these figures, the measured and predicted values by GMDH are closer to each other compared to other proposed models. This confirmed that the GMDH model which is a new model in the area of this study is able to perform well for prediction of iron ore price. It is a powerful and practical predictive model for such purposes.         Aside from that, the obtained results of performance prediction for all proposed techniques are tabulated in Table 6. As it can be seen in this table, higher performance prediction is related to results of GMDH model in both train and test phases. Although the other methods e.g., SVR and ANN are also suitable for prediction purposes and they can provide an acceptable level of accuracy, the highest prediction capacity was achieved developing GMDH predictive approach. Hence, this technique can be introduced as a new predictive model in estimating iron ore price.  Aside from that, the obtained results of performance prediction for all proposed techniques are tabulated in Table 6. As it can be seen in this table, higher performance prediction is related to results of GMDH model in both train and test phases. Although the other methods e.g., SVR and ANN are also suitable for prediction purposes and they can provide an acceptable level of accuracy, the highest prediction capacity was achieved developing GMDH predictive approach. Hence, this technique can be introduced as a new predictive model in estimating iron ore price.

Conclusions
The main objective of this study was to propose and introduce a GMDH predictive model as a powerful, capable, and practical one to estimate iron ore price. For comparison purposes, several other techniques i.e., SVR, ANN, ARIMA, and CART were applied and developed to predict iron ore price. For this aim, oil price, steel price, iron ore production, steel production, gold price, inflation rate, exchange rate, interest rate, dowjones stock price, US GDP, aluminum price, China GDP were selected as input variables. A series of analyses with the most effective parameters on the GMDH were conducted in order to see their effects on the system. Then, the best GMDH model was selected among them. Considering several datasets which were not used in the modelling process, the developed models were validated. It was found that the GMDH model which was applied as the first time in the area of this study, could offer more accurate prediction results in comparison with the other implemented models. It was accomplished through the use of some performance prediction such as RMSE and VAF for both model development and model validation. The results of MAPE for testing datasets of GMDH, SVR, ANN, CART, and ARIMA were obtained as 0.039, 0.099, 0.132, 0.169, and 1.001, respectively which show higher prediction capacity of the GMDH technique.

Conflicts of Interest:
The authors declare no conflict of interest.