Hybrid Stochastic-Grey Model to Forecast the Behavior of Metal Price in the Mining Industry

Accurate metal price forecasting is the precondition for optimal and sustainable mine production planning. This paper combined two methods for time series analysis. The developed model represents the combination of the Grey System Theory and a Stochastic differential equation. More precisely, we added stochastic term to the first-order whitenization differential equation. Solution of this equation represents the time response function which is capable of creating artificial evolving paths of the metal price. The simulation process resulted in a distribution and adequate expected value at every single point. Further, model efficiency was increased by adding residuals modeled by the Singular Spectrum Analysis method. The model was tested on the monthly lead metal price series. Mean absolute percentage error is 4.37% and the model can be classified as a high-performance model.


Introduction
Many economic factors influence metal price, but the most important are supply and demand, interest and exchange rates, and country economic growth. Creation of an economic model for metal price is quite a hard and time-consuming task. Each of the price drivers is governed by their volatile law. To avoid such a hard task, we used metal price as the final indicator of joint action of these drivers. There is no doubt that metal price has a crucial influence on the wealth of any mining company. Globally, metal price is also very important for some developing countries. Their gross domestic product is significantly based on the metal mining industry. Disturbances in this sector can increase the economic uncertainties in developing countries. Hence, the metal mining industry, i.e., metal price, plays a key role in the sustainable development of those countries. Metal price is a driver that has a significant impact on mine production planning and is highly uncertain. Accordingly, we can call metal price a critical uncertainty. The benefit of developing a price forecasting model is to evaluate resilience and vulnerability of the production plan. The main goal of metal price forecasting is to enable creation of a stable and sustainable plan of mine production.
The following overview is related to the relevant literature about metal price forecasting. Barczak [1] made a comparison between Grey model GM (1,1), Holt's exponential smoothing model and Autoregressive Moving−Average (ARMA) model. All these models have been used to build short-time forecasts for daily time resolution. He concluded that the ARMA model is not suitable to forecast gold price; the Grey model requires more information and adaptive models can be treated as an initial assessment. Askari and Askari [2] applied the original GM (1,1) and Fourier series modified GM (1,1) to forecast gold price. Also, they compared these models with the Autoregressive Integrated Moving Average (ARIMA) method. Results of the comparison demonstrate that the Fourier series modified GM

Stochastic Grey Modeling
Let us consider a primitive time series of metal price X(t) = x(t) , t = 1, 2, . . . , T. which is obtained by monitoring over the specific time period. Applying the Accumulated Generation Operation (AGO) on the primitive series, the following monotonically increasing series is obtained [13][14][15][16]: where elements of the new series are calculated as follows: The sequence of mean value of adjacent values of accumulated series is denoted as: where elements are generated in the following way: A grey differential equation modeling the sequence of mean value is: x(t) + az (1) Equation is the whitened equation of the grey differential equation. Parameters a and b are defined by the Least-squares method as follows: where The solution of Equation (6) or time response function is: Sustainability 2020, 12, 6533 4 of 21 The reconstructed (predicted) value of primitive metal price data is given below: The universe of monitored data X(t) is a collection of observations x(t); each one has been recorded at time t where time is discrete. For t ≥ 2, each element of the universe can be expressed as follows: where w (t)-stochastic component or "white noise." White noise represents the time derivative of the Wiener process or Brownian motion. It means that x(t) is caused by x (t−1) only. Equation (11) represents the first-order stochastic model of X(t). Accordingly, the AGO series can be represented as stochastic time series and Equation (6) becomes the stochastic grey differential equation (SGDE) of the following form: Finally, we obtain a SGDE of the AGO series: where k-coefficient which depends on time resolution of monitoring; σ-standard deviation of AGO series; W(t)-an increment to a standard Brownian motion.
The value of coefficient k is defined as follows: The average number of monitoring (trading) days per year is 250. The solution of the stochastic grey differential equation is based on the application of Ito's lemma and Ito isometry. For t ∈ [0, T], the common form of stochastic differential equation (SDE) is [17][18][19][20][21]: with initial condition X(0) = x(0) and x (0) is constant. In our case; (1) (t) and β(t, X(t)) = kσ are components of SGDE respectively. For a function F(t) ≡ f (t, X(t)), which is at least one differentiable in t and at least twice differentiable in x(t), we have the following expression known as Ito's lemma: Sustainability 2020, 12, 6533 5 of 21 Equation (13) represents the SDE of Grey modeling. For our problem, we apply the function F (1) (t) ≡ f (t, X(t)) = x (1) e at which has the following partial derivatives: If we substitute obtained partial derivatives into expression of Ito's lemma, we get: The solution of the previous SDE is represented by the integral equation: where the last integral represents stochastic integral. The stochastic integral T 0 f (t, X(t))dW(t) has the centered Gaussian distribution [22] T 0 f (t, X(t))dW(t) N 0, with mean and variance given by the Ito isometry Applying Ito isometry in Equation (23), we obtain the final discrete time equation of the reconstructed AGO series: where N (0,1) is the normally distributed random variable with mean 0 and standard deviation 1 and ∆t = 1 (year, month, or day). Equation (27) represents an evolutionary time model which is capable of creating the given number of artificial scenarios (see Figure 1).
where N (0,1) is the normally distributed random variable with mean 0 and standard deviation 1 and Δt = 1 (year, month, or day). Equation (27) represents an evolutionary time model which is capable of creating the given number of artificial scenarios (see Figure 1). Simulating by Equation (27), we obtain the space of simulation which can be represented by the following simulation matrix: Simulatingx (1) (t) by Equation (27), we obtain the space of simulation which can be represented by the following simulation matrix: where S denotes the total number of simulations and T the period of monitoring. Each row of the previous matrix represents one artificial AGO path, while each column represents the set of artificial AGO values at time points. Obviously, for t ≥ 2, the model generates the sequence of probability density functions (see Figure 2) and sequence of expected values ofX: The Inverse Accumulated Generating Operation (IAGO) is used to reconstruct a primitive time series of metal price: For simplicity, reconstructed (predicted) series is expressed as follows: Sustainability 2020, 12, 6533 Accuracy of the developed model is estimated by the mean absolute percentage error (MAPE). MAPE is used as a measure describing the degree of deviation of predicted values from monitored values: If we want to see what possible metal price scenarios beyond point T are, then we should continue simulating Equation (27) from T to T + h, where h represents the number of steps ahead. In this way, we end the prediction phase and start the forecasting phase. Applying the same approach, as we have done in the prediction phase, we obtain the forecasted series as follows: Results of the developed model can be represented as:

Improvement of the Model by Residual Error Correction
To increase efficiency and versatility of the SGDE prediction model, it is necessary to correct reconstructed values with respect to the real situations. Assume that monitored and reconstructed value of the metal price at time point t is x(t) andx(t), respectively. The residual error is calculated according to the following expression: If the residual error is known, then the predicted value of the metal price can be defined as: Thus, it is necessary to establish the residual error prediction model. Such model enables us to define the final predicted value as: wherê ε(t)reconstructed (predicted) residual error obtained by model. Theory of the Singular Spectrum Analysis (SSA) [23][24][25][26] is used as the base for creation of the residual error model. The SSA technique is composed of the two following phases: decomposition and reconstruction. The objective of the first phase is to decompose the original residual error series into a sum of series, i.e., into a sum of components. This phase is followed by the second one, with the objective to reconstruct the original series. The reconstructed series will be composed of the defined number of components.
Suppose ε(t), t = 1, 2, . . . , T is the residual error time series obtained by Equation (36). Let L be an integer such that 2 ≤ L ≤ T, and it is called the window length. A mapping that transforms residual error series into multidimensional matrix The result of embedding is the trajectory matrix: This matrix has the unique characteristic where all elements along the anti-diagonals are equal. It is also known as the Hankel matrix. Let us create the lag-covariance matrix of the following form, S = MM T ∈ R L×L . Eigenvalues and corresponding eigenvectors of the matrix S must be arranged in decreasing order: Note: decreasing order is created only with respect to eigenvalues. The trajectory matrix M can be represented as the following sum of matrices: where r = max(i, such that λ i > 0 ), i.e., r is the rank of matrix M. Equation (41) represents the singular value decomposition of matrix M. The sum of matrices can be divided into several subsets. Within each disjointed subset, we are summing matrices and this procedure is called the grouping phase. It is usual to divide the set of matrices M 1 +M 2 + . . . +M r into two subsets (g = 2).
Subset I1 is associated with the signal component, while I2 with the noisy component. Obviously, selecting the value of p is important. One approach of selecting the appropriate value of p is based on the plot of the logarithms of eigenvalues. The point where there is a significant drop of logarithm value is adopted as the appropriate value of p [25]. Another approach is related to the ratio i / r i=1 λ i as the contribution of the eigenvalue to the singular value decomposition of matrix M. Eigenvalues with small contribution can be excluded from the decomposition. The value of p can be interpreted as a point where contributions are separated into significant and insignificant.
Forecasting is based on the linear recurrent equation. Calculation of the linear vector of coefficients is performed in the following way: where U (L−1) i vector composed of the first L−1 values of the eigenvectors U i α i -vector composed of the last value of the eigenvectors U i , i = 1, 2, . . . , p v 2 verticality coefficient; it must satisfy the following condition, v 2 < 1 Verticality coefficient is calculated as follows: The h−step ahead forecasting of the residual error is based on the following equation: where Outcomes of the residual error model can be represented as: Finally, outcomes of the improved model are as follows: Accuracy of the combined model is also tested by MAPE (see Equation (32)).

Numerical Example
The data used in this paper, for the purpose of testing the model, include monthly values of lead metal price. The period of monitoring ranged from January 2013 to December 2018; see Table 1 [27]. This set of prices is divided into two subsets. The first subset ranging from January 2013 to December Sustainability 2020, 12, 6533 11 of 21 2017 (60 months) was used as the training subset, and within this range, we checked the accuracy of the developed model by comparing the predicted and monitored prices. The second subset ranging from January 2018 to December 2019 (24 months) was used as the validity subset, and within this period, we made a comparison between the forecasted and actual prices. Despite the randomness in the price series, there are always some kinds of governing laws. It is obvious that our metal prices series is too complex and cannot be considered as a regular one. Having applied AGO on the monitored series, we created a precondition of establishing the characteristics of the series. The outcomes are represented in Table 2 and Figure 3.  The following matrices are used to define parameters a and b: According to Equation (7), we obtained a value of a = −0.0009282 and b = 1978.47. The grey differential equation of the AGO series is defined as: The solution of the previous equation is the time response function of the following form: By adding the stochastic term, the previous differential equation becomes the stochastic grey differential equation of the AGO series as follows: The corresponding solution of the SGDE is: We performed 6000 simulations of Equation (57), and the expected AGO outcomes are represented in Table 3. Applying Equation (10) on the reconstructed AGO series, we obtained the predicted series of metal prices (see Table 4 and Figure 4).  Residual errors of the SGDE model are represented in Table 5.   Residual errors of the SGDE model are represented in Table 5. Modeling the residual error series uses window length of L = 10, T = 59, and K = 50. The residual error matrix (trajectory matrix) is as follows: Eigenvalues and corresponding eigenvectors of the matrix S = MM T ∈ R 10×10 are shown in Table 6. The point p, which divides residual error series into the signal and noisy subset, was selected according to the logarithm eigenvalues plot and contribution of eigenvalues (see Figure 5 and Table 6). The point p, which divides residual error series into the signal and noisy subset, was selected according to the logarithm eigenvalues plot and contribution of eigenvalues (see Figure 5 and Table  6). The value of p was selected with respect to the contribution of eigenvalues and is p = 6. Corresponding vectors U 1 , U 2 , . . . , U 6 were included in the reconstruction phase of the residual error series. It indicates that the reconstructed trajectory matrix will be expressed as the sum of six reconstructed matrices,M = 6 i=1 U i U T i M =M 1 +M 2 + . . . +M 6 . Applying the Hankelization on each reconstructed matrix, we obtained the following six reconstructed series of residual errors (see Figure 6). reconstructed matrix, we obtained the following six reconstructed series of residual errors (see Figure  6). The reconstructed residual error series was obtained by summing up these six series and the result is represented by Figure 7. The reconstructed residual error series was obtained by summing up these six series and the result is represented by Figure 7. The final predicted metal price series was obtained by adding the reconstructed residuals to the predicted lead metal prices. The results of the model are represented in Table 7 and Figure 8. Table 7. Predicted values of the metal price obtained by the SGDE+SSA model.  The final predicted metal price series was obtained by adding the reconstructed residuals to the predicted lead metal prices. The results of the model are represented in Table 7 and Figure 8.  The data ranging from January 2018 to December 2019 were collected to estimate the forecasting capability of the SGDE+SSA model. Actual and forecasted metal prices and residual errors are represented in Table 8 and by Figure 9.   The data ranging from January 2018 to December 2019 were collected to estimate the forecasting capability of the SGDE+SSA model. Actual and forecasted metal prices and residual errors are represented in Table 8 and by Figure 9. in Table 9. Table 9. SGDE and SGDE+SSA model accuracy.  Accuracy of the developed methods used for modeling the lead metal price series is represented in Table 9. Table 9. SGDE and SGDE+SSA model accuracy.

Model
Phase MAPE Alipour et al. [9] made a comparison between the linear prediction method ARIMA, nonlinear TGARCH, and stochastic SDE. We used their results to check the effectiveness of our model (Table 10).

SGDE+SSA
The comparative analysis of testing period performance of the ARIMA, TGARCH, SDE, and SGDE+SSA techniques, using MAPE criterion, has been accomplished and is shown in Table 10. According to the table, for the ARIMA, TGARCH, SDE, and SGDE+SSA models, the MAPE values are 20.90%, 54.36%, 15.61%, and 2.04%, respectively. The results indicate that the prediction capability of the SGDE+SSA model is higher in comparison with ARIMA, TGARCH, and SDE. It can be seen that the SDE model is superior over the linear and nonlinear models. It indicates that our approach of merging nonlinear and stochastic techniques is a well-stated hypothesis.

Conclusions
Strategic planning is very important in the mining industry (business) to make a profitable and sustainable production plan. The main source of uncertainty is metal price, which has a significant influence on planning. The aim of this paper was to understand and describe the behavior of metal price. The proposed approach used Stochastic Grey modelling to roughly predict the next price from a set of recent data, and then used the Singular Spectrum Analysis to fit residual error. Through simulation results, this paper offers an effective novel model to deal with the high fluctuation sequence, such as metal price series. It offers a better accuracy prediction than traditional Grey modelling. This study is important for metal mining companies, because they can forecast the metal price behavior. The developed model enables companies to reduce the metal price risks and to increase the efficiency of their operations. Comparative analysis has shown that the developed model can be applied to forecast any mineral commodity. Also, it can be applied for any time series analysis and to forecast future values of many economic commodities having high fluctuation.
Future research will be directed to the creation of stochastic-grey differential equation, which will be capable of incorporating jumps into metal price forecasts. The length of observed time series data