1. Introduction
Accurate forecasting of commodity prices can affect the performance of companies. The turnover of trading firms and of firms producing goods depends on future prices, and the costs of commodity users are likewise determined by the prices of these products. The magnitude of profits naturally affects the value of firms (
Carter et al. 2017). Commodity prices should therefore be taken into account when planning revenues and costs, and hedging strategies should be developed to reduce profit volatility. Hedging the risk of price changes, however, also has costs. More accurate forecasting of commodity price movements is therefore a value-adding factor in all of these considerations.
Changes in commodity prices also affect the cost of credit (
Donders et al. 2018). Of course, the volatility of a firm’s profits also affects the perception of credit risk. Higher volatility leads to higher financing costs. Proper forecasting of commodity price movements can support the design of a strategy to reduce profit volatility.
A variety of methods are used to forecast the prices of different commodities and shares. They are basically grouped into three main categories. The first includes traditional statistical methods, the second some kind of artificial intelligence-based methods, and the third so-called hybrid methods (
Kim and Won 2018;
Vidal and Kristjanpoller 2020;
Zolfaghari and Gholami 2021). In our paper, we focus only on the group of predictive models based on artificial intelligence, which includes algorithms such as ANNs (Artificial Neural Networks), DNNs (Deep Neural Networks), GAs (Genetic Algorithms), SVM (Support Vector Machine), and FNNs (Fuzzy Neural Networks). Artificial intelligence-based models have several advantages over traditional statistical models: their greater flexibility typically translates into higher predictive accuracy. Because of their learning ability, AI-based models can recognise patterns in the data, such as non-linear movements. Financial time series, such as exchange rates and commodity prices, exhibit non-stationary and non-linear movements that traditional statistical models are unable to detect, and AI-based methodologies have taken the lead in this area over time.
In their study,
Gonzalez Miranda and Burgess (
1997) modelled the implied volatility of IBEX35 index options using a multi-layer perceptron neural network over the period November 1992 to June 1994. Their results show that forecasting with nonlinear NNs generally dominates forecasts from traditional linear methods. This is because the NN captures potentially complex nonlinear relationships that traditional linear models cannot handle well.
Hiransha et al. (
2018) produced forecasts of stock price movements on the NSE and NYSE. They based their analysis on the following models: multi-layer perceptron, RNN, LSTM (Long Short-Term Memory) and CNN (Convolutional Neural Network). In the empirical analysis, CNN performed best. The results were also compared with the outputs of the ARIMA method, and in this comparison CNN was again the optimal choice.
Ormoneit and Neuneier (
1996) used a multilayer perceptron and a density-estimating neural network to predict the volatility of the DAX index for the period January 1983 to May 1991. In comparing the two models, they concluded that the density-estimating neural network outperformed the perceptron method without a specific target distribution.
Hamid and Iqbal (
2004) applied the ANN methodology to predict the volatility of S&P 500 index futures. From their empirical analysis, they concluded that ANNs’ forecasts are better than implied volatility estimation models.
Ou and Wang (
2009) conducted research on trend forecasting of the Hang Seng index using tree-based classification, K-nearest neighbor (KNN), SVM, Bayesian clustering and neural network models. The final result of the analysis showed that SVM is able to outperform the other predictive methods.
Ballings et al. (
2015) compared the AdaBoost, Random Forest, Kernel factory, SVM, KNN, logistic regression and ANN methods using stock price data from European companies. They tried to predict stock price trajectories one year ahead. The final result showed that Random Forest was the best performer.
Nabipour et al. (
2020) compared the predictive ability of nine different machine learning and two deep learning algorithms (Recurrent Neural Network, Long-Short-Term Memory) on stock data of financial, oil, non-metallic mineral and metallic materials companies on the Tehran Stock Exchange. They concluded that RNN and LSTM outperformed all other predictive models.
Long et al. (
2020) used machine learning (Random Forest, Adaptive Boosting), bi-directional deep learning (BiLSTM) and other neural network models to investigate the predictability of Chinese stock prices. BiLSTM was able to achieve the highest performance, far outperforming the other forecasting methods.
Fischer and Krauss (
2018) examined data from the S&P500 index between 1992 and 2015. Random Forest, logistic regression and LSTM were used for the forecasts. Their final conclusion was that the long short-term memory algorithm gave the best results.
Nelson et al. (
2017) applied the multi-layer perceptron, Random Forest, and LSTM models to Brazilian stock market data to answer the question of which of the three models is the most accurate predictor. They concluded that the LSTM was the most accurate.
Nikou et al. (
2019) analysed the daily price movements of the iShares MSCI UK exchange-traded fund over the period January 2015 to June 2018. ANN, Support Vector Machine (SVM), Random Forest and LSTM models were used to generate the predicted values. LSTM obtained the best score, while SVM was the second-most accurate.
Recent research (
van der Lugt and Feelders 2019;
Hajiabotorabi et al. 2019) comparing the predictive ability of ANNs and RNNs concluded that RNNs can outperform traditional neural networks. Also prominent among these methods is the long-short-term memory (LSTM) model, which has been applied to a wide range of sequential datasets. This model variant has the advantage of showing high adaptability in the analysis of time series (
Petersen et al. 2019).
In their research,
Thi Kieu Tran et al. (
2020) demonstrated that the temporal impact of past information is not taken into account by ANNs for predicting time series, and therefore deep learning methods (DNN) have recently been increasingly used. A prominent group of these are Recurrent Neural Networks (RNNs), which have the advantage of providing feedback in their architecture.
Kaushik and Giri (
2020) compared LSTM, vector autoregression (VAR) and SVM for predicting exchange rate changes. Their analysis revealed that the LSTM model outperformed both SVM and VAR methods in forecasting.
Basak et al. (
2019) used XGBoost, logistic regression, SVM, ANN and Random Forest to predict stock market trends. The results showed that Random Forest outperformed the others.
Siami-Namini et al. (
2018) examined data from the S&P500 and Nikkei 225 indices in their study. Their final conclusion was that LSTM outperformed ARIMA.
Liu (
2019) focused on the prediction of the S&P500 index and Apple stock price in his study. He concluded that over a longer forecasting time horizon, LSTM and SVM outperform the GARCH model.
Based on the above, the LSTM is considered to be quite good in terms of predictive performance, but it has a serious shortcoming, namely that it cannot represent the multi-frequency characteristics of time series, and therefore it does not allow the frequency domain of the data to be modelled. To overcome this problem,
Zhang et al. (
2017) proposed the use of the Fourier transform to extract frequency information. In their research, they combined this method with a neural network model; however, the two representations are mutually exclusive, since the information content of the time domain is not included in the frequency domain and the information of the frequency domain does not appear in the time domain. This limitation is addressed by the wavelet transform (WT). The WT-based model and the ARIMA model were compared by
Skehin et al. (
2018) with respect to FAANG (Facebook, Apple, Amazon, Netflix, Google) stocks listed on the NASDAQ. They concluded that in all cases except Apple, ARIMA outperformed WT-LSTM for the next-day stock price prediction.
Liang et al. (
2019) investigated the predictive performance of the traditional LSTM and the LSTM model augmented with wavelet transform on S&P500 index data. Their work demonstrated that WT-LSTM can outperform the traditional long-short-term memory method.
Liang et al. (
2022) studied the evolution of the gold price. They propose a novel decomposition-based model to predict the price. First, the series is decomposed into sub-layers of different frequencies. Then, each sub-layer is forecast jointly by a model combining long short-term memory, convolutional neural networks and a convolutional block attention module (LSTM-CNN-CBAM). The last step is to aggregate the partial results. Their results show that the combination of LSTM, CNN and CBAM can enhance modelling capability and improve prediction accuracy. In addition, the ICEEMDAN decomposition algorithm can further improve the accuracy of the prediction, and its prediction effect is better than that of other decomposition methods.
Nickel is becoming an increasingly important raw material as electromobility develops.
Ozdemir et al. (
2022) discuss the medium- and long-term price forecast of nickel in their study. They employ two advanced deep learning architectures, namely LSTM and GRU. The MAPE criterion is used to evaluate the forecasting performance. For both models, their forecasting ability has been demonstrated. In addition to the prediction capability, the speed of the calculations is tested. When processing high resolution data, speed can be an important factor. The study found that GRU networks were 33% faster than LSTM networks.
For copper price prediction,
Luo et al. (
2022) propose a two-phase prediction architecture, in which the first phase produces the initial forecast and the second performs error correction. In the first phase, factors that could affect the price of copper are screened; after selecting the three most influential factors, a GALSTM model is developed to produce the initial forecasts. A 30-year historical data series is then used to validate the model.
Companies are diverse in terms of financial risk (
Ali et al. 2022). A more accurate price forecasting model can contribute to better enterprise risk management. It can help reduce earnings volatility. It can also support the development of a more effective hedging strategy.
The study aims to test modern forecasting techniques. Two families of models (decision trees and artificial intelligence-based neural networks) are used to produce the estimates. The question is which of the tested techniques gives more accurate forecasting results. The tests are carried out for eight commodity products across the categories of oil, gas, and precious metals.
2. Data and Methods
The most commonly used metrics in the literature for evaluating the predictive models and assessing their accuracy are root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) (
Nti et al. 2020).
- (a)
Root mean squared error (RMSE): this performance indicator shows an estimation of the residuals between actual and predicted values.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \quad (1)$$
where $\hat{y}_i$ is the estimated value produced by the model, $y_i$ is the actual value, and n is the number of observations.
- (b)
Mean Absolute Error (MAE): this indicator measures the average magnitude of the error in a set of predictions.
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \quad (2)$$
- (c)
Mean Absolute Percentage Error (MAPE): this indicator measures the average magnitude of the error in a set of predictions and shows the deviations in percentage.
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \quad (3)$$
The forecasts are more reliable and accurate if these indicators have lower values. It is important to note that RMSE penalizes larger deviations more heavily due to squaring, so this metric can give more extreme values than MAE. The first two indicators are interpreted in the units of the price, while the MAPE is interpreted as a percentage (deviations expressed as a percentage of the original value). For this reason, the MAPE can be used to compare different instruments because it does not depend on the nominal size of the price. As our study examined several commodities and two very different economic environments, we used the MAPE indicator for comparability in the overall assessment of the models.
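As a concrete illustration, the following minimal Python sketch computes the three metrics exactly as defined above; the function and variable names are our own and are not taken from the study's code.

```python
# Minimal sketch of the three error metrics used in the study (RMSE, MAE, MAPE).
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error: penalises large deviations more because of squaring."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.sqrt(np.mean((actual - predicted) ** 2))

def mae(actual, predicted):
    """Mean absolute error: average magnitude of the errors."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean(np.abs(actual - predicted))

def mape(actual, predicted):
    """Mean absolute percentage error: scale-independent, expressed in percent."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean(np.abs((actual - predicted) / actual)) * 100.0

# Example: comparing a short forecast against observed prices (illustrative numbers)
y_true = [100.0, 102.5, 101.0, 98.0]
y_pred = [101.0, 102.0, 100.0, 99.5]
print(rmse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```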
The SVM is used to predict time series where the behaviour of the variables is not constant or where classical methods are not justified because of high complexity. Support vector regression (SVR) is the variant of SVM used to predict future price paths. SVM is able to suppress irrelevant and high-variance data in the predictive process and thereby improve the accuracy of the prediction. SVM is based on the structural risk minimization principle from statistical learning theory and can be applied to financial data modelling without strong distributional assumptions. SVM relies on a linear classification of the data that seeks to maximize the separation margin. The optimal fit is obtained with quadratic programming methods, which are well-known techniques for solving constrained optimisation problems. Prior to the linear classification, the data are mapped by a function φ into a higher-dimensional space so that the algorithm can classify highly complex data. The algorithm thus uses a nonlinear mapping to transfer the original data to a higher dimension and a linear optimisation to find the separating hyperplane (
Nikou et al. 2019).
The decision boundary is defined in Equation (4), where SVMs map the input vectors $x_i$ into a high-dimensional feature space $\varphi(x_i)$, and the mapping is carried out by the kernel function $K(x_i, x_j)$:
$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b\right) \quad (4)$$
SVMs convert non-separable classes into separable ones with linear, non-linear, sigmoid, radial basis and polynomial kernel functions. The formulas of the kernel functions are shown in Equations (5)–(7), where γ is the constant of the radial basis function, d is the degree of the polynomial function, and the two adjustable parameters of the sigmoid function are the slope α and the intercept constant c:
$$K(x_i, x_j) = \left(\gamma\, x_i^{\top} x_j + c\right)^{d} \quad (5)$$
$$K(x_i, x_j) = \exp\left(-\gamma \left\lVert x_i - x_j \right\rVert^{2}\right) \quad (6)$$
$$K(x_i, x_j) = \tanh\left(\alpha\, x_i^{\top} x_j + c\right) \quad (7)$$
SVMs are often very efficient in high-dimensional spaces, including cases where the number of dimensions exceeds the number of samples. However, when the number of features is much larger than the number of samples, the regularisation term and the kernel function must be chosen carefully to avoid overfitting (
Nabipour et al. 2020). The hyperparameters of the SVM model can be found in
Table 1.
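For illustration, a minimal sketch of a support vector regression forecaster is given below, assuming scikit-learn is used; the lag-feature construction, kernel choice and parameter values are illustrative only and do not reproduce the configuration in Table 1.

```python
# Minimal SVR sketch with an RBF kernel; gamma plays the role of the constant in Equation (6).
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def make_lagged(prices, n_lags=5):
    """Build a supervised dataset where the previous n_lags prices predict the next one."""
    X = np.array([prices[i:i + n_lags] for i in range(len(prices) - n_lags)])
    y = np.array(prices[n_lags:])
    return X, y

rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(0, 1, 500)) + 100   # synthetic price path for demonstration
X, y = make_lagged(prices)

# Scaling matters for SVR; hyperparameter values are illustrative, not those of Table 1.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, gamma=0.1, epsilon=0.01))
model.fit(X[:-125], y[:-125])           # train on all but the last 125 observations
forecast = model.predict(X[-125:])      # pseudo out-of-sample forecast
```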
Random Forest (RF) is a combination of several decision trees, developed to achieve better prediction performance than a single decision tree. Each decision tree in RF is built on a bootstrap sample using binary recursive partitioning (BRP). In the BRP algorithm, a random subset of the input variables is selected and all possible splits on these variables are evaluated. The resulting best split is then used to create a binary partition. These steps are repeated recursively within each successive partition and terminate when the partition size reaches 1. Two important fine-tuning parameters are used in RF modelling: one is the number of trees in the ensemble (p), and the other is the number of input variables to be sampled at each split (k). Each decision tree in RF is learned from a random sample of the data set (
Ismail et al. 2020).
To build the RF model, three parameters must be defined beforehand: the number of trees (n), the number of variables (K) and the maximum depth of the decision trees (J). Learning sets ($D_i$, i = 1, …, n) and variable sets ($V_i$, i = 1, …, n) of the decision trees are created by random sampling with replacement, which is called bootstrapping. Each decision tree of depth J generates a weak learner $h_i(x)$ from its learning set and variable set. The hyperparameters of the RF model can be found in Table 2. These weak learners are then used to predict the test data, and an ensemble of n trees $\{h_1(x), \ldots, h_n(x)\}$ is generated. For a new sample x, the RF prediction can be defined as follows (Park et al. 2022):
$$\hat{y} = \frac{1}{n}\sum_{i=1}^{n} h_i(x)$$
where $\hat{y}$ is the predicted value for x.
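A minimal sketch of a Random Forest regressor along these lines is shown below, again assuming scikit-learn; the parameter values map loosely onto n, K and J but are illustrative rather than those reported in Table 2.

```python
# Minimal Random Forest sketch: an ensemble of bootstrapped trees averaged as in the equation above.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(0, 1, 500)) + 100      # synthetic price path
n_lags = 5
X = np.array([prices[i:i + n_lags] for i in range(len(prices) - n_lags)])
y = prices[n_lags:]

rf = RandomForestRegressor(
    n_estimators=200,      # n: number of trees
    max_features="sqrt",   # K: number of variables sampled at each split
    max_depth=10,          # J: maximum depth of each tree
    bootstrap=True,        # learning sets drawn with replacement
    random_state=0,
)
rf.fit(X[:-125], y[:-125])
y_hat = rf.predict(X[-125:])   # prediction = average of the individual trees' outputs
```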
XGBoost is a model based on decision trees. Compared to other tree-based models, XGBoost can achieve higher estimation accuracy and much faster learning speeds due to parallelisation and decentralisation. Other advantages of the XGBoost method are that it uses regularisation to prevent overfitting, has built-in cross-validation capability, and handles missing data well. The XGBoost model incorporates multiple classification and regression trees (CART). XGBoost performs binary splitting and generates a decision tree by segmenting a subset of the data set using all predictors, creating two subnodes. The XGBoost model with multiple CARTs can be defined as follows:
$$\hat{y}_i = \sum_{r=1}^{R} f_r(x_i), \quad f_r \in F$$
where R is the number of trees, F is the set of all possible CARTs, and $f_r$ corresponds to an independent tree with its leaf weights. The objective function of the XGBoost model can be defined as follows:
$$Obj = \sum_{i} l\left(y_i, \hat{y}_i\right) + \sum_{r=1}^{R} \Omega\left(f_r\right)$$
where l is a loss function measuring the difference between the actual value $y_i$ and the prediction $\hat{y}_i$, and $\Omega$ is a regularisation term that prevents overfitting by penalising the complexity of the model. Assuming that $\hat{y}_i^{(t)}$ is the predicted value at iteration t, the objective function can be written as (Han et al. 2023):
$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega\left(f_t\right)$$
The hyperparameters of the XGBoost model can be found in
Table 3.
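A minimal sketch using the xgboost package is shown below; the hyperparameter values are illustrative and do not correspond to those in Table 3.

```python
# Minimal boosted-tree sketch: sequentially added CARTs with a squared-error loss and L2 regularisation.
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(1)
prices = np.cumsum(rng.normal(0, 1, 500)) + 100
n_lags = 5
X = np.array([prices[i:i + n_lags] for i in range(len(prices) - n_lags)])
y = prices[n_lags:]

xgb = XGBRegressor(
    n_estimators=300,              # R: number of CARTs added sequentially
    max_depth=6,
    learning_rate=0.05,
    reg_lambda=1.0,                # L2 penalty entering the regularisation term Omega
    objective="reg:squarederror",  # squared-error loss l(y, y_hat)
)
xgb.fit(X[:-125], y[:-125])
y_hat = xgb.predict(X[-125:])
```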
GRU is a type of recurrent neural network (RNN) that can provide outstanding performance in predicting time series. It is similar to the other neural network model (LSTM) discussed in more detail in the next subchapter, but GRU has lower computational power requirements, which can greatly improve learning efficiency.
It has the same input and output structure as a simple RNN. The internal structure of the GRU unit contains only two gates: the update gate $z_t$ and the reset gate $r_t$. The update gate $z_t$ determines how much of the previous memory is kept at the current time step, and the reset gate $r_t$ determines how the new input information is combined with the previous memory value. Unlike the LSTM algorithm, the update gate can both forget and select the memory contents, which improves computational performance and reduces runtime requirements. The GRU can be defined by the following equations:
$$z_t = \sigma\left(W_z x_t + U_z h_{t-1}\right)$$
$$r_t = \sigma\left(W_r x_t + U_r h_{t-1}\right)$$
$$\tilde{h}_t = \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right)\right)$$
$$h_t = \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t$$
where σ(·) is the logistic sigmoid function, $h_{t-1}$ is the hidden state of the neuron at the previous time step, $W_z$ and $U_z$ are the weight matrices of the update gate, $W_r$ and $U_r$ are the weight matrices of the reset gate, $W_h$ and $U_h$ are the weight matrices of the temporary output, $x_t$ is the input value at time t, and $h_t$ and $\tilde{h}_t$ are the vectors that provide the hidden layer output and the temporary unit state at time t (
Xiao et al. 2022). The hyperparameters of the GRU model can be found in
Table 4.
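A minimal sketch of a GRU forecaster in Keras is shown below, assuming TensorFlow is available; the window length, layer size and training settings are illustrative and are not the hyperparameters of Table 4.

```python
# Minimal GRU sketch: a single recurrent layer (update/reset gates as above) followed by a dense output.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(2)
prices = np.cumsum(rng.normal(0, 1, 500)) + 100
window = 20
X = np.array([prices[i:i + window] for i in range(len(prices) - window)])[..., np.newaxis]
y = prices[window:]

model = tf.keras.Sequential([
    tf.keras.layers.GRU(64, input_shape=(window, 1)),  # gated recurrent layer
    tf.keras.layers.Dense(1),                          # next-day price estimate
])
model.compile(optimizer="adam", loss="mse")
model.fit(X[:-125], y[:-125], epochs=50, batch_size=32, verbose=0)
forecast = model.predict(X[-125:], verbose=0)
```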
LSTM is a type of recurrent neural network (RNN) often used in sequential data research. Long-term memory refers to the learned weights and short-term memory to the internal states of the cells. LSTM was created to solve the vanishing gradient problem of RNNs; the main change is the replacement of the middle layer of the RNN by a block (the LSTM block). The main feature of LSTM is its ability to learn long-term dependencies, which is impossible for simple RNNs. To predict the data associated with the next time point, the weight values of the network must be updated, which requires retaining data from the initial time interval. A simple RNN can only learn a limited number of short-term dependencies and cannot learn long-term relationships in time series; LSTM handles these adequately. The structure of the LSTM model is a set of recurrent subnets, called memory blocks. Each block contains one or more self-connected memory cells and three multiplicative units (input, output, and forget gates) that perform continuous write, read, and reset control of the cells (
Ortu et al. 2022). The LSTM model is defined by the following equations:
$$I_t = \sigma\left(X_t W_{xi} + H_{t-1} W_{hi} + b_i\right)$$
$$F_t = \sigma\left(X_t W_{xf} + H_{t-1} W_{hf} + b_f\right)$$
$$\tilde{C}_t = \tanh\left(X_t W_{xc} + H_{t-1} W_{hc} + b_c\right)$$
$$C_t = F_t \odot C_{t-1} + I_t \odot \tilde{C}_t$$
$$O_t = \sigma\left(X_t W_{xo} + H_{t-1} W_{ho} + b_o\right)$$
$$H_t = O_t \odot \tanh\left(C_t\right)$$
where h is the number of hidden units, $X_t$ is the mini-batch input at time step t, $H_{t-1}$ is the hidden state from the previous period, σ is the sigmoid function, $W_{xi}$ and $W_{hi}$ are the weight matrices of the input gate and $b_i$ is the offset term of the input gate, $W_{xf}$ and $W_{hf}$ are the weight matrices of the forget gate and $b_f$ is the offset term of the forget gate, $\tilde{C}_t$ is the candidate memory cell, $W_{xc}$ and $W_{hc}$ are the weight matrices of the gated unit and $b_c$ is the offset term of the gated unit, $C_t$ is the new cell state at the current time and $C_{t-1}$ is the cell state at the previous time, and $W_{xo}$ and $W_{ho}$ are the weight matrices of the output gate and $b_o$ is the offset term of the output gate (
Dai et al. 2022).
The hyperparameters of the models are specified in
Table 5. In order to make an even more accurate comparison, we tried to harmonize the hyperparameters of algorithms belonging to the same main type.
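A minimal sketch of the corresponding LSTM forecaster is shown below, structured like the GRU example above with only the recurrent layer changed; the settings are illustrative, not the harmonized hyperparameters of Table 5.

```python
# Minimal LSTM sketch: input, forget and output gates inside a single recurrent layer.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(3)
prices = np.cumsum(rng.normal(0, 1, 500)) + 100
window = 20
X = np.array([prices[i:i + window] for i in range(len(prices) - window)])[..., np.newaxis]
y = prices[window:]

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(window, 1)),  # gated memory cells as in the equations above
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X[:-125], y[:-125], epochs=50, batch_size=32, verbose=0)
forecast = model.predict(X[-125:], verbose=0)
```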
In the study, a total of eight commodities were included: Brent oil, copper, crude oil, gold, natural gas, palladium, platinum and silver. The data was downloaded from Yahoo Finance using Python. An appropriate database size is important in forecasting (
Hewamalage et al. 2022). The complete database contains futures price data for the period 1 January 2010 to 31 August 2022, which was split into two parts. In the first case, we focused on the interval between 1 January 2010 and 31 August 2018, and in the second case on the interval between 1 January 2014 and 31 August 2022; in both cases we included the commodity market instruments that are considered the most liquid, with turnover that stands out among other assets. For both studies, we split the datasets into learning and validation samples in proportions of approximately 94% and 6%. In the first estimation, the learning database covers the period between 1 January 2010 and 28 February 2018 (98 months), while the validation interval runs from 1 March 2018 to 31 August 2018 (6 months). In the second estimation (2022), the learning database covers the period between 1 January 2014 and 28 February 2022 (98 months), while the validation interval runs from 1 March 2022 to 31 August 2022 (6 months). The descriptive statistics of the commodity market for the full dataset are presented in
Table 6.
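A minimal sketch of the data collection and the approximately 94%/6% learning/validation split described above is given below, assuming the yfinance package is used to access Yahoo Finance; the futures ticker symbols shown are the usual Yahoo Finance codes and are an assumption, not quoted from the study.

```python
# Minimal sketch of downloading the eight commodity futures series and splitting the first window.
import yfinance as yf

tickers = {
    "Brent oil": "BZ=F", "Copper": "HG=F", "Crude oil": "CL=F", "Gold": "GC=F",
    "Natural gas": "NG=F", "Palladium": "PA=F", "Platinum": "PL=F", "Silver": "SI=F",
}

# First estimation window: learn on 2010-01-01..2018-02-28, validate on 2018-03-01..2018-08-31.
data = yf.download(list(tickers.values()), start="2010-01-01", end="2018-08-31")["Close"]
train = data.loc["2010-01-01":"2018-02-28"]   # 98 months, roughly 94% of the observations
valid = data.loc["2018-03-01":"2018-08-31"]   # 6 months, roughly 6% of the observations
print(len(train), len(valid))
```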
The two periods (2018 and 2022) reflect a significantly different general economic situation, which is not reflected in the descriptive statistics for the whole period. To obtain a more accurate picture and to assess the performance of the forecasting algorithms, it is important to know the period for which the forecast is made. This is shown in the following two tables (
Table 7 and
Table 8).
The two tables above show that not only are the average and median prices in 2022 higher, but the standard deviation is also several times higher than in 2018. The relative standard deviation values are likewise 2–3 times higher. In such an environment, the accuracy of forecasts declines, but their importance for enterprise risk management increases.
According to the correlation matrix (
Table 9), Brent and crude oil (0.8344), gold and silver (0.8023) move together, while the other commodities show negligible correlation and no common movement.
3. Results and Discussion
The basis for the evaluation of the results is the MAPE indicator, which is scale-independent and therefore suitable for comparing both over time and across instruments. The results are calculated for forecast periods of 21 and 125 days based on five different forecasting algorithms for robustness. The forecast was made for both 2018 and 2022, the reason being that the macroeconomic environment had changed significantly by 2022, due to the Russian-Ukrainian war, among other factors. The different forecasting methods are described in detail in the methodology chapter.
The results of the Support Vector Machine (SVM) forecast are presented in
Table 10, which shows the mean absolute percentage error (MAPE) values per commodity for 2018 and 2022, for 21- and 125-day forecast horizons.
The average error of the SVM-based estimation ranges from 4.36% to 4.96% for the selected commodities in the sample, with no significant difference between the forecast horizons and the years under study. As expected, the longer the time interval, the worse the accuracy of the forecast, although not significantly so. On a product-by-product basis, more substantial differences are visible. For the two types of oil, and especially for crude oil, the 125-day forecast error is several times larger than the 21-day error. For natural gas, the 125-day error for 2022 is more than three times greater than in 2018, reflecting a significant increase in pricing uncertainty. No other commodity shows a change of this magnitude; only silver shows roughly a 1.5-fold increase in its error rate.
A special feature of the Random Forest (RF) decision tree is that it uses several types of decision trees, so it employs a very different methodological approach than the Support Vector Machine (SVM) that preceded it. The estimation results are presented in
Table 11, in the same structure as the SVM.
For this forecasting procedure, there is already a significant difference between the overall mean errors for 2018 and 2022. For the 21-day forecast, the sample MAPEs for 2018 and 2022 differ significantly (p = 0.004), while for the 125-day forecast the difference is not statistically confirmed. Random Forest gives forecasts that are considerably more accurate than SVM over the four-week time horizon. However, for the year 2022, and in particular for the 125-day forecast, the RF shows a substantial error. For natural gas, the error is much higher over the six-month time horizon, exceeding 20%. In addition, it is important to highlight the positive results: for example, the gold price forecast shows an error of less than 1% in three out of four cases, which is outstanding, and relatively accurate forecasts are also seen for copper and silver.
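For readers who wish to reproduce this type of comparison, the sketch below shows one way such p-values could be obtained: a paired t-test of the eight per-commodity MAPE values for 2018 against those for 2022. The exact test variant is not specified in the text, and the numbers used here are placeholders, not the study's actual MAPE values.

```python
# Minimal sketch of a paired t-test across the eight commodities (placeholder MAPE values).
from scipy import stats

commodities = ["Brent oil", "Copper", "Crude oil", "Gold",
               "Natural gas", "Palladium", "Platinum", "Silver"]
mape_2018_21d = [1.2, 0.9, 1.4, 0.6, 2.1, 1.8, 1.0, 0.8]   # placeholder values
mape_2022_21d = [3.5, 2.2, 4.0, 1.1, 8.3, 4.9, 2.6, 2.0]   # placeholder values

t_stat, p_value = stats.ttest_rel(mape_2018_21d, mape_2022_21d)
print(f"paired t-test: t = {t_stat:.3f}, p = {p_value:.3f}")
```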
The Extreme Gradient Boost (XGBoost) prediction can be classified in the same family of decision trees as Random Forest (RF). The results of XGBoost are shown in
Table 12.
Comparing the results of XGBoost with RF, very similar values can be seen, without any significant difference between the two models. For the 21-day forecast, the average error for 2018 differs significantly from that for 2022 (p = 0.003), i.e., on average XGBoost was able to give a more accurate, less noisy forecast in 2018 than in 2022. Similar to the RF, the longer-term natural gas forecast contains the largest error, and precious metals show the lowest MAPE values.
Díaz et al. (
2020) found that decision trees (random forest and gradient boosting regression tree) provide a more reliable prediction of the copper price than linear methods, but the random walk process still performs better.
The results of the Gated Recurrent Unit (GRU) forecast are shown in
Table 13. The GRU model can be classified as a family of neural networks, i.e., the prediction is based on an algorithm using artificial intelligence. This is a very different concept to decision trees.
The results of the Gated Recurrent Unit (GRU) confirm expectations, with the average MAPE value being the lowest in all categories compared to the previous models (SVM, RF, XGBoost). The 21- and 125-day predictions for 2018 are outstandingly good, with an average error for the sample of around 1.3%. The t-tests show that, for the first time, the error rates for 2018 are statistically confirmed (p = 0.002 and p = 0.047) to be lower than the 2022 values for both time horizons (21 and 125 days). Based on these results, the GRU-based estimator produces very good four-week forecasts in a relatively smooth economic period free of turbulence. It is also important to highlight that the GRU forecasts for 2022 are the most accurate so far, improving on the previous models by more than 1 percentage point over the 125-day horizon. The forecast error for natural gas, which exceeded 20% for the decision trees, is reduced to 11% for GRU, although this is still substantial. It is also interesting to note that for the neural network-based estimation the pattern is reversed, with the longer time horizon resulting in a lower average error, although the two results are not significantly different. As before, the prediction of precious metals shows high accuracy.
Ozdemir et al. (
2022) used GRU and LSTM models to forecast the price of nickel over the long term, from 2022 to 2031. The MAPEs of the two estimation methods are very similar (about 7%), but the GRU models required on average 33% less computation time.
The Long-Short-Term Memory (LSTM) is part of a family of neural network models similar to the Gated Recurrent Unit (GRU) algorithm, and is built to provide reliable estimates primarily over longer time scales. The results of the LSTM are shown in
Table 14.
The LSTM results, similar to the GRU, are very convincing, but its overall sample averages are not better than those of the GRU. It is important to note that the differences amount to only a few tenths of a percentage point. At the 21-day time horizon, the 2018 values are significantly lower (p = 0.005); the same can be said for the 125-day horizon at the 10% significance level (p = 0.087).
Busari and Lim (
2021) used LSTM and GRU methodologies, among others, to forecast spot oil prices. In their study, the forecast time horizon is 6 days and the training-validation split is 75–25%. Their results show a forecasting accuracy (MAPE) of around 10–11%, compared to our results, where the 21-day forecast error is below 2% for 2018 and 6.8–7% for 2022.
For natural gas, which is considered critical, the MAPE for the 125-day estimate for 2022 is 13.37%, also higher than the 11.4% for GRU. In another study, also dealing with natural gas forecasting, GRU outperformed LSTM models (
Wang et al. 2021). Their MAPE values are higher than our own results, but the authors used weekly data. The most spectacular MAPE improvement was between the hybrid GRU (PSO-GRU) and LSTM, with a difference of more than 1 percentage point. The advantage of LSTM should appear in long-range forecasts, yet compared to GRU it performed better in only 3 out of 8 cases for 2018 and 2 out of 8 cases for 2022 on the longer 125-day horizon. Of course, this does not exclude the possibility that LSTM might perform better on an even longer time horizon, but no conclusive data are available to test this for 2022.
4. Conclusions
The study aimed to answer the question of how accurate the futures price forecasts of eight selected commodities (covering oil, gas and precious metals) produced by different models (decision trees, neural networks) are in different economic environments, and to what extent these forecasts can be used for corporate risk management. The period examined comprised six months of 2022 (from March to August), characterized by inflationary pressures, the Russian-Ukrainian war and global chip shortages, while the control period was six months of 2018, before the COVID-19 epidemic, which is considered a calm economic environment.
Enterprise risk management has different time horizons depending on the exposure to risk. For commodities, we assumed production lead times and warehousing, and therefore did not look at the very short term. Two periods were defined—a short period of one month (21 days), and a medium period (125 days). These are the time periods that can be planned for inventory management. Enterprise risk management comes to the fore in at least two respects when it comes to raw material replenishment. The necessary stock should be available at the right place and time, and stocks should be purchased at the best possible price. The second aspect is the use of forecasting algorithms of varying complexity as a decision support tool. The most accurate forecast possible helps achieve the so-called perfect timing that all investors—in this case, the purchasing department—desire. Purchasing at the best possible price means lower cost price and therefore higher profits and a market advantage over competitors.
The results show that forecast accuracy is higher in calmer economic environments, which is due to the fact that forecasting is easier in a less volatile environment (see the descriptive statistics for 2018 and 2022). More importantly, neural networks also produce better results in commodity markets than decision trees and the other models tested. For the year 2022, the MAPE (Mean Absolute Percentage Error) indicators show an average value of around 4%, i.e., the average difference between the model estimate and the real data. In the control period, this indicator is around 1.5%. This difference is broadly in line with the increase in standard deviation and relative standard deviation that occurred. Due to the Russian-Ukrainian war, the forecasts for oil and especially gas prices displayed the worst accuracy, followed by palladium. Forecasts of precious metal prices showed the highest accuracy. It can also be concluded that in the calmer economic period (also supported by the average MAPE values), we obtained more accurate estimates for the shorter forecast period, while the same conclusion cannot be drawn for the more volatile period (2022). Overall, the most accurate estimates were achieved with the GRU model, which even slightly outperformed the LSTM algorithm. For the examined instruments and periods, the weakest performance was produced by the SVM.
Our study is limited insofar as the models were only used individually and not combined. However, hybrid models may have several advantages over the traditional approach.
Liang et al. (
2022) used different hybrid neural models to predict spot and forward gold prices. Their results show that hybrid models provide more accurate estimates than LSTM models. The implementation of an error-correction hybrid model for copper price forecasting has resulted in significant MAPE improvements, in some cases of 1 percentage point or more (
Luo et al. 2022). Our study also does not cover the use of independent explanatory variables such as technical analysis indicators. Their application could aid the training of the models and thus the recognition of past technical levels.