Copper Price Prediction Using Support Vector Regression Technique

Abstract: Predicting the copper price is essential for making decisions that can affect companies and governments dependent on the copper mining industry. Copper prices follow a time series that is nonlinear and non-stationary and that has periods that change as a result of potential growth, cyclical fluctuation and errors. Sometimes, the trend and cyclical components together are referred to as a trend-cycle. In order to make predictions, it is necessary to consider the different characteristics of a trend-cycle. In this paper, we study a copper price prediction method using support vector regression (SVR). This work explores the potential of SVR with external recurrences to make predictions 5, 10, 15, 20 and 30 days into the future for the copper closing price on the London Metal Exchange. The best model for each forecast interval is found using a grid search and balanced cross-validation. In experiments on real data sets, our results indicate that the parameters (C, ε, γ) of the support vector regression model do not differ between the different prediction intervals. Additionally, the number of preceding values used to make the estimates does not vary with the predicted interval. The results show that the support vector regression model has a low prediction error and is robust. Our results show that the presented model is able to predict copper price volatilities near reality, as the root-mean-square error (RMSE) was equal to or less than 2.2% for prediction periods of 5 and 10 days.


Introduction
Copper is one of the first metal products to be listed on the world's main foreign exchange markets: the London Metal Exchange (LME), the Commodity Exchange Market of New York (COMEX) and the Shanghai Futures Exchange (SHFE). The copper price is determined by the supply and demand dynamics on the metal exchanges, especially the London Metal Exchange. Although it may be strongly influenced by the currency exchange rate and the investment flow, the factors that can cause volatile price fluctuations are partially associated with changes in the activity of the economic cycle [1].
There are many reasons for wanting to make predictions about the price of copper. On the one hand, copper, among other natural elements (e.g., silver), has a high electrical and thermal conductivity.
On the other hand, several studies include copper and other metals as products of interest in evaluations of prediction methods aimed at improving price forecasts. Such studies employ different methods and mathematical models, such as autoregressive integrated moving average (ARIMA) models combined with wavelets [18], meta-heuristic models [19,20], neural network models [2,21] and hybrid models [5][6][7][8][9]. The Fourier transform [22] is used to analyze the variability of the prices of various metals. In addition, there are works in the literature that study the relationships of commodity and asset price models, such as the case of oil prices and their effects on copper and silver prices [23].

Support Vector Regression Model
Given a data set of N elements {(X_i, y_i)}_{i=1}^N, where X_i = [x_{1,i}, . . . , x_{n,i}] ∈ R^n is the i-th element in a space of n dimensions and y_i ∈ R is the actual value for X_i, a nonlinear function φ : R^n → R^{n_h} is defined that maps the input data X_i into a high-dimensional space R^{n_h}, called the feature space, determined by the nonlinear transformation φ. In this high-dimensional space, there exists a linear function f that relates the input data X_i to the output y_i. That linear function, the SVR function, is presented in Equation (1),

f(X) = W^T φ(X) + b,    (1)

where f(X) represents the predicted values, W ∈ R^{n_h} and b ∈ R. The SVR minimizes the empirical risk shown in Equation (2),

R_reg(f) = (1/N) Σ_{i=1}^{N} Θ_ε(f(X_i) − y_i).    (2)

In the case of the ε-SVR, an ε-insensitive loss function is used [10,24], defined in Equation (3),

Θ_ε(δ) = 0 if |δ| ≤ ε;  Θ_ε(δ) = |δ| − ε otherwise.    (3)

Θ_ε is used, together with the nonlinear mapping φ into R^{n_h}, to find a function that fits the training data with a deviation less than or equal to ε (see Figure 1a). The function that minimizes the training error under the ε-insensitive loss is obtained from the optimization problem of Equation (4) [11,25],

min_{W, b, ξ, ξ*}  (1/2)‖W‖² + C Σ_{i=1}^{N} (ξ_i + ξ*_i),    (4)

subject to the restrictions (for all i = 1, . . . , N):

y_i − W^T φ(X_i) − b ≤ ε + ξ*_i,
W^T φ(X_i) + b − y_i ≤ ε + ξ_i,
ξ_i, ξ*_i ≥ 0.

Equation (4) punishes the training errors of f(X) and y through the ε-insensitive function (Figure 1b). The parameter C determines the compromise between the complexity of the model, expressed by the vector W, and the points that fulfill the condition |f(X) − y| ≤ ε in Equation (3). If C → ∞, the model has a small margin and is tightly adjusted to the data. If C → 0, the model has a big margin and is therefore smoothed. Finally, ξ*_i represents the training errors greater than ε and ξ_i the errors less than −ε (see Figure 1a). To solve this regression problem, the inner product of Equation (1) can be replaced by a kernel function K(·, ·).
This makes it possible to perform the operation in a higher-dimensional space using low-dimensional input data, without explicitly knowing the transformation φ [26], as shown in Equation (6). This is called the kernel trick:

f(X) = Σ_{i=1}^{N} (β_i − β*_i) K(X_i, X) + b.    (6)
The parameters β*_i and β_i are the Lagrange multipliers associated with the quadratic optimization problem. Several types of functions can be used as kernels [27], but in this work, we use the Gaussian radial basis function (RBF) [28]:

K(X_i, X_j) = exp(−γ ‖X_i − X_j‖²).    (7)

The parameter γ of the kernel function, the regularization constant C and the ε of the loss function are considered the design parameters of the SVR. Furthermore, they are obtained from a data set that is different from the training data.
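As a minimal illustration of an ε-SVR with an RBF kernel, the sketch below uses scikit-learn's SVR class on synthetic data; the values of C, ε and γ are illustrative placeholders, not the tuned parameters reported later in this paper.

```python
# Sketch: epsilon-SVR with a Gaussian RBF kernel (scikit-learn).
# Data and hyperparameter values are synthetic/illustrative.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))          # 200 samples, 3 features
y = np.sin(X.sum(axis=1)) + 0.01 * rng.normal(size=200)

# C: regularization constant; epsilon: insensitive-tube width; gamma: RBF width
model = SVR(kernel="rbf", C=10.0, epsilon=0.01, gamma=1.0)
model.fit(X, y)

y_hat = model.predict(X[:5])
print(y_hat.shape)  # (5,)
```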

Data Description
There are three major stock exchanges where copper is traded: LME, COMEX and SHFE. Similar to [6,7], we use the copper price given by the LME, which is widely considered a reference index for world prices of this metal [29]. The time series used in this research contains 2971 daily copper prices in US dollars per metric ton, from 2 January 2006 until 2 January 2018, as shown in Figure 2. (These data were obtained through a trial subscription on www.lme.com in January 2018. Currently, a membership must be paid for to get updated data.) Similar time ranges of the copper price have been used in [6,7,20].

Methodology
The methodology used in this research contains four steps.
Step 1 is focused on preparing the data. Steps 2 and 3 explain the training and prediction stages. Finally, step 4 details the performance measures used.

Step 1: Data Pre-Processing
First, the data are normalized with the min-max normalization (MMN) method [30] to the range 0 to 1. Then, a suitable range has to be determined for the SVR hyperparameter ε, which defines the margin of tolerance within which training errors are not punished. For this, it is necessary to know the noise level N of the time series. The noise has an average N = 3.66 × 10^−4, a root-mean-square value RMS_N = 0.0680 and a range of [−0.246, 0.205]. These characteristics allow us to define a conservative interval ε = [0, 0.3].
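The min-max normalization step can be sketched as follows; the price values are illustrative stand-ins, not actual LME data.

```python
# Sketch: min-max normalization (MMN) to the range [0, 1].
# The price values below are illustrative, not LME data.
import numpy as np

prices = np.array([5600.0, 5725.5, 5810.0, 5590.25, 6002.0])
p_min, p_max = prices.min(), prices.max()
normalized = (prices - p_min) / (p_max - p_min)

print(normalized.min(), normalized.max())  # 0.0 1.0
```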
Finally, the series of Figure 2 is divided into two series, S a and S b , each one with 50% of the data, which will be used for training and evaluating alternately.

Step 2: Parameter Adjustment and Training
In the case of time series prediction with SVR, it is assumed that the actual value y_t is a function of its L previous values x_t = [y_{t−1}, . . . , y_{t−L}] and the SVR hyperparameters w = [C, ε, γ]. Hence, the model has four parameters: L, the number of prior values used to predict the actual value, and the three SVR hyperparameters. The range of each one is shown in Table 1.

Table 1. Ranges for the grid of the parameters L, C, ε and γ for the radial and linear kernel.

The i-th SVR model (M_i) is defined by its set of parameters Q_i. To adjust these parameters, a grid search is performed, following the recommendation of Hsu et al. [31]; its computational design is shown in Algorithm 1. For each combination of parameters Q_i, the model M_i is trained with the series S_a and tested with the series S_b, and vice versa. Then, for each training/testing run, the set of parameters Q_i with the least mean squared error (MSE) between the predicted and the original data is selected. This training/testing process uses the balanced cross-validation method proposed by McCarthy [32].
Algorithm 1: Algorithm design for grid search to find the best testing models.
Input: training time series (R); testing time series (T)
Output: array of the best models with minimum testing error (BTM)
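A minimal sketch of this grid search is shown below, assuming scikit-learn. The grid values are illustrative (not the ranges of Table 1), the series is synthetic, and scikit-learn's standard k-fold cross-validation stands in for the balanced cross-validation used in the paper.

```python
# Sketch: grid search over (L, C, epsilon, gamma) for an RBF-kernel SVR.
# Grid values and data are illustrative; standard k-fold CV approximates
# the balanced cross-validation of the paper.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

def lagged_matrix(series, L):
    """Build (X, y) pairs where each row holds the L previous values."""
    X = np.array([series[t - L:t] for t in range(L, len(series))])
    y = series[L:]
    return X, y

rng = np.random.default_rng(1)
series = np.cumsum(rng.normal(size=400))  # synthetic stand-in for prices

best = None  # (mse, L, hyperparameters)
for L in (2, 3, 5):
    X, y = lagged_matrix(series, L)
    grid = GridSearchCV(
        SVR(kernel="rbf"),
        param_grid={"C": [1, 10, 100], "epsilon": [0.01, 0.1], "gamma": [0.1, 1.0]},
        scoring="neg_mean_squared_error",
        cv=5,
    )
    grid.fit(X, y)
    mse = -grid.best_score_
    if best is None or mse < best[0]:
        best = (mse, L, grid.best_params_)

print(best[1], best[2])  # best L and SVR hyperparameters
```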

Step 3: Prediction
To make the prediction at p_j days, M_i (with its best set of parameters Q_i) takes a vector of L_i past values, using previously predicted values whenever actual values are not yet available. Let ŷ_{t+p_j} be the value to predict at p_j days and x_{t+p_j} the vector containing the previous L_i values used in the prediction. Then ŷ_{t+p_j} = M_i(x_{t+p_j}), with x_{t+p_j} given by the expression in Equation (8).
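The recursive multi-step scheme described above can be sketched as follows, assuming scikit-learn; the model and series are illustrative, and predicted values are fed back into the lag vector at each step.

```python
# Sketch: recursive p-day-ahead prediction. Each step feeds previously
# predicted values back into the lag vector when actual values are not
# yet available. Model, data and hyperparameters are illustrative.
import numpy as np
from sklearn.svm import SVR

L, p = 3, 5
rng = np.random.default_rng(2)
series = np.sin(np.linspace(0, 20, 300)) + 0.01 * rng.normal(size=300)

X = np.array([series[t - L:t] for t in range(L, len(series))])
y = series[L:]
model = SVR(kernel="rbf", C=10.0, epsilon=0.01, gamma=1.0).fit(X, y)

history = list(series[-L:])       # last L observed values
predictions = []
for _ in range(p):
    x_next = np.array(history[-L:]).reshape(1, -1)
    y_hat = model.predict(x_next)[0]
    predictions.append(y_hat)
    history.append(y_hat)         # recurse on the predicted value

print(len(predictions))  # 5
```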

Step 4: Performance Measures
For each prediction interval, the effectiveness of the prediction model will be determined through performance measures such as the MSE and RMSE. These performance measures have been used in previous prediction works [33][34][35]. Furthermore, the correlation coefficient between the predicted and actual values will be used. The computational design of steps 3 and 4 is shown in Algorithm 2. Shapiro-Wilk normality tests will be used to (1) select the correlation coefficient (Pearson or Spearman) between the predicted and real values of the time series and (2) select the method for comparing the means (Wilcoxon rank-sum test or two-sample t-test) of the errors for the different prediction time horizons. For all tests, a p-value < 0.05 is considered significant.
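The performance measures and the normality-driven choice of correlation coefficient can be sketched as follows, assuming numpy and scipy; the actual/predicted values are synthetic placeholders.

```python
# Sketch: MSE/RMSE and a Shapiro-Wilk-driven choice between Pearson and
# Spearman correlation, per the test-selection rule above. Data are
# synthetic placeholders, not model outputs from the paper.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
actual = rng.normal(size=100)
predicted = actual + 0.1 * rng.normal(size=100)

mse = np.mean((predicted - actual) ** 2)
rmse = np.sqrt(mse)

# Use Pearson only if both samples pass normality at the 5% level,
# otherwise fall back to Spearman.
normal = (stats.shapiro(actual)[1] >= 0.05
          and stats.shapiro(predicted)[1] >= 0.05)
rho = (stats.pearsonr(actual, predicted)[0] if normal
       else stats.spearmanr(actual, predicted)[0])

print(round(rmse, 3), round(rho, 3))
```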

Results and Analysis
In the experiments, the best models were explored in a grid, choosing the one with the lowest MSE for each prediction interval p. Table 2 shows the parameters of the best SVR, according to the MSE index, for each prediction interval p and each training/test arrangement. Additionally, the correlation index ρ between the real and predicted data and the root-mean-square error (RMSE) are shown.
In Table 2, R_p → T means that the SVR is trained on the set R ∈ {S_a, S_b} and tested on the set T ∈ {S_a, S_b}, with R ≠ T, where p is the prediction interval, p ∈ {5, 10, 15, 20, 25, 30}. It is interesting to note that the number of previous values (L) is independent of the prediction interval, and the SVR parameters likewise remain practically unchanged. The fitting capacity of the five-day prediction of the S_a → S_b time series for the 2017 period is shown in Figure 3. The best prediction capacity is obtained when training with the series S_a, which is temporally the oldest. The training could have been enhanced by the noise level of this series: the RMS noise value of series S_a is RMS_{N_a} = 0.0848, which is higher than that of series S_b, RMS_{N_b} = 0.0318. For example, for a five-day prediction, training with S_a gives MSE = 0.0003, whereas training with S_b gives MSE = 0.0012. Furthermore, the dispersion of the MSE is lower compared to training based on the most recent part of the series (see Figure 4a).
In Figure 4a, we show the distribution of the MSE obtained in the predictions summarized in Table 2, which allows us to evaluate the prediction capabilities statistically. In addition, Figure 4b shows the confidence intervals, at a 5% significance level, for the mean MSE of the sample obtained from the simulation; that is, intervals built at 95% confidence. Table 3 presents the differences of means of the MSE between groups in the lower triangle, and the significance indicators in the upper triangle. The pairwise hypothesis tests of mean differences for the MSE were obtained from the simulation for the different prediction time intervals and the two time series S_a and S_b. Furthermore, the average difference of the MSE can be appreciated visually from the 95% confidence intervals (see Figure 4b). These results are presented in Figure 4b and Table 3. For the comparison of the MSE means between groups, the Wilcoxon rank-sum test was used because, according to the Shapiro-Wilk test, the groups do not fit a normal distribution, with p-value ≤ 1.73 × 10^−6 for all groups.

Table 3. Differences of means of the mean squared error (MSE) (Wilcoxon rank-sum test) between groups in the lower triangle and p-value in the upper triangle with significance symbols (., *, **, *** indicate statistical significance at the 90%, 95%, 99% and 99.9% levels, respectively).
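The group comparison used for Table 3 can be sketched as below, assuming scipy; the two MSE samples are synthetic placeholders for the simulation outputs, and Shapiro-Wilk decides between the Wilcoxon rank-sum test and the two-sample t-test.

```python
# Sketch: compare MSE distributions of two groups. Shapiro-Wilk selects
# between the two-sample t-test and the Wilcoxon rank-sum test.
# The MSE samples below are synthetic placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
mse_group_a = rng.lognormal(mean=-7, sigma=0.3, size=50)  # e.g. S_a -> S_b
mse_group_b = rng.lognormal(mean=-6, sigma=0.3, size=50)  # e.g. S_b -> S_a

normal = (stats.shapiro(mse_group_a)[1] >= 0.05
          and stats.shapiro(mse_group_b)[1] >= 0.05)
if normal:
    stat, pvalue = stats.ttest_ind(mse_group_a, mse_group_b)
else:
    stat, pvalue = stats.ranksums(mse_group_a, mse_group_b)

significant = pvalue < 0.05
print(significant)
```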

Discussion
As a result of the search for training parameters for the SVR models and for the best prediction models over different forecasting intervals, it was determined that the number of previous values of the time series that are needed (L = 3 for the best models) is independent of the prediction interval. Similar values have been used in the literature for similar time series, but without justification based on a parameter search that minimizes the prediction error. For example, a value of L = 2 is used in [8,9], L = 3 in [5] and L = 5 in [15].
Our results show that the presented model is able to predict copper price volatilities near reality (see Figure 3). Similar results are obtained compared to other works:
• In [9], an analysis of the dynamics of real prices for the main industrial metals is presented. Using monthly data, the authors estimated linear and threshold autoregressive models. For the nonlinear models, they assumed that the dynamics of metal prices depend on their deviation from the recursive mean. We use a monthly prediction (30 days) to compare the best RMSE value obtained in both works. Our RMSE (0.033) is similar to the RMSE (0.046) obtained in that work.

• In [8], the authors use time series models to predict the prices of Shanghai copper futures. That work introduces the application of X12-ARIMA-GARCH family models to futures price analysis and forecasting. To compare their results with ours, we use short prediction periods (5 and 10 days) and compare the RMSE obtained. For the same periods, our RMSE values (0.017 and 0.018) are similar to the RMSE values (0.018 and 0.022) in that work.

• In [5], a hybrid model is proposed to provide an accurate model for predicting copper prices. The proposed model combines an adaptive neuro-fuzzy inference system and a genetic algorithm. Our work presents an RMSE (0.033) similar to that of the GA-ANFIS method (RMSE = 0.0813) presented in that work. This is due to the granularity of the training and prediction data. It is interesting to note that that work also presents an SVM-based method whose error (RMSE = 0.1027) is high compared to ours. The difference is that in our work, a regression was used with an exhaustive search of its parameters.

• In [20], a Bat algorithm was used to predict copper price volatility. The copper price was estimated using time series and Bat algorithms. The time series function used in that work is similar to the one in our work. Under those conditions, the prediction error is RMSE = 0.132. With the method proposed in this work, a maximum error of RMSE = 0.08 is achieved (see Table 2).
Finally, we can observe in Figure 4a that there are significant differences in the MSE, at 95% confidence, between S_a → S_b and S_b → S_a for each of the prediction intervals of 5, 10, 15, 20, 25 and 30 days. In contrast, for the prediction S_a → S_b, there are no significant differences in the MSE between the prediction intervals of 5, 10, 15, 20, 25 and 30 days. This shows the robustness of the prediction in the short and medium term, since the prediction at the five-day interval has not lost performance relative to the 30-day prediction interval, considered here as medium-term. This is useful for the decision-making process of mining companies and traders. On the other hand, to support the decision-making process of investors and the government, it is necessary to have reliable long-term forecasts.

Conclusions
In this work, the construction of an SVR-based model was presented that allows predicting the closing value of copper on the London Metal Exchange, with an RMSE equal to or less than 2.2% for prediction periods of 5 and 10 days. The method consists of finding the best model through a grid search, wherein each model is trained and tested using balanced cross-validation. For the training process, only the closing price data of the series are used. The results indicate that the SVR model can be used regardless of the number of days of the prediction, and this can be done with only the three previous values. Additionally, we observed that more recent data negatively impacts the MSE. This phenomenon must be studied in the future, but there is an indication that it can be explained by the noise level and the amount of data in the training time series.
The importance of price prediction will depend on the interest of the agents and their objectives in short-, medium- and long-term prediction periods. Our work aims at short-term predictions of 5, 10, . . . , up to 30 days. These predictions will interest brokers and investors who seek to take advantage of periodic variations with active portfolio management. For medium-term predictions, such as monthly and annual predictions, governments may be more interested for their national budget, as is the case of Chile, an economy whose income and tax revenues come from copper mining. In the long term (more than one year), mining companies will be more interested in their long-term investment plans, such as process improvement, expansion and the search for new deposits that add value to their investments, as will institutional or private investors with long-term horizons and buy-and-hold strategies.
For future work, it is necessary to apply the method to other stock index time series, for example, the Standard & Poor's 500 (S&P 500), Dow Jones, National Association of Securities Dealers Automated Quotation (NASDAQ) and BOVESPA. Furthermore, we can apply the method to other commodities, such as gold, silver, Brent crude oil and corn, to determine whether it is possible to make forecasts with an error margin similar to the one found in this work.