Modelling South African Gold Sales Using SARIMA, GARCH and Neural Networks

Tsoku, Johannes Tshepiso; Metsileng, Daniel; Botlhoko, Tshegofatso

doi:10.3390/math14081289

by Johannes Tshepiso Tsoku ¹, Daniel Metsileng ² and Tshegofatso Botlhoko ²,*

¹ Department of Statistical Sciences, Sefako Makgatho Health Sciences University, Pretoria 0208, South Africa

² Department of Business Statistics and Operations Research, Mafikeng Campus, North-West University, Mafikeng 2745, South Africa

* Author to whom correspondence should be addressed.

Mathematics 2026, 14(8), 1289; https://doi.org/10.3390/math14081289

Submission received: 17 February 2026 / Revised: 7 April 2026 / Accepted: 9 April 2026 / Published: 13 April 2026

Abstract

This study investigated the performance of four models in forecasting the South African gold sales series: the seasonal autoregressive integrated moving average (SARIMA), generalised autoregressive conditionally heteroscedastic (GARCH), general regression neural network (GRNN) and artificial neural network (ANN)-based extreme learning machine (ELM) models. This study employed traditional methods and a recently developed machine learning (ML) method for single hidden-layer feed-forward neural networks (SLFNs). The findings revealed that SARIMA(0, 1, 1)(2, 1, 2)_{12} was the best model for the gold sales series. The empirical findings demonstrated that the SARIMA model outperforms neural network-based models, providing the South African government and its lenders with a more reliable and cost-effective tool for predicting foreign exchange earnings from gold. This study contributes to the literature by providing one of the first comparative evaluations of traditional time-series models (SARIMA and GARCH) and advanced neural network methods (GRNN and ANN-based ELM) for forecasting South African gold sales, and it reveals the superior forecasting performance of a traditional SARIMA model over advanced ML approaches. Future research should explore the development and application of hybrid models that integrate the strengths of linear SARIMA frameworks with the pattern-recognition capabilities of nonlinear ANN-based ELM models.

Keywords:

ANN-based ELM; forecasting accuracy; gold sales; GRNN; SARIMA model

MSC:

62M10; 62P20; 91B84; 68T07

1. Introduction

This study investigates the forecasting performance of the seasonal autoregressive integrated moving average (SARIMA), generalised autoregressive conditionally heteroscedastic (GARCH), general regression neural network (GRNN) and artificial neural network (ANN)-based extreme learning machine (ELM) for the South African gold sales series. According to [1], the gold mining industry in South Africa has long been a cornerstone of the nation’s economy, historically serving as a primary driver of economic growth, employment, and export revenue. Nevertheless, the industry is characterised by intrinsic volatility, shaped by a complex interaction of elements such as changing global commodity prices, increasing extraction depths, escalating operational expenses, and shifting socio-political environments [2]. Thus, precise predictions of gold sales are crucial for strategic planning, investment choices, and policy development to maintain the sector’s stability and its ongoing contribution to the national economy [3].

Traditional statistical models, such as the SARIMA and GARCH, have been widely used for time-series forecasting in commodity markets [4,5]. While SARIMA effectively captures linear structures and seasonal patterns, GARCH models use a small set of constant parameters to capture the time-varying conditional variance and volatility clustering that often exist in economic data [6]. GARCH models are computationally efficient, easy to estimate, and widely understood, making them particularly suitable for policy-oriented and applied forecasting studies where transparency and interpretability are essential. In contrast, the generalised autoregressive score (GAS) model proposed by [7] relies on the specification of a full conditional density and score-driven dynamics, which can increase model complexity and estimation uncertainty, especially in relatively low-frequency monthly data. However, a significant limitation of these classical techniques is their presumed linearity, which may render them insufficient for capturing the complex, nonlinear relationships inherent in gold market dynamics [8].

In recent years, sophisticated machine learning (ML) techniques have emerged as powerful alternatives, demonstrating remarkable success in modelling nonlinear and complex systems [9]. Techniques such as GRNN and ANN-based ELM offer significant advantages [10,11]. These models can autonomously learn complex patterns and nonlinear dependencies from historical data without requiring pre-specified relationships, potentially leading to more accurate and reliable forecasts in financial and resource economics [12].

Against this backdrop, this study presents a comparative investigation into the forecasting performance of a variety of models for the South African gold sales series. Specifically, this study examines and contrasts the predictive accuracy of the traditional SARIMA and GARCH models with the advanced ML capabilities of the GRNN and ANN-based ELM. By training and rigorously evaluating these models on historical data, this study aims to determine the most reliable and effective methodology for forecasting future trends in this critical sector. This study is among the first to apply and compare these advanced neural network architectures alongside traditional time-series models for forecasting South African gold sales, thereby contributing valuable insights for stakeholders in the mining industry and financial markets.

The rest of the study is structured as follows: Section 2 presents the literature review, Section 3 discusses the methodology, Section 4 is a discussion of findings and Section 5 presents the conclusion and recommendations.

2. Literature Review

2.1. Literature Review on Traditional Models

Various studies have been conducted to model gold sales/prices using different techniques. Among these, the ARIMA/SARIMA models have been a subject of exploration, along with the challenges associated with them. Studies such as [13,14,15,16] used linear statistical models while studies such as [17] employed nonlinear techniques. Different findings were obtained from different studies, bringing to the fore the complexity of gold sales/prices.

Traditional methods of gold price forecasting have been grounded in econometric and statistical time-series models. Among these, the ARIMA framework proposed by Box and Jenkins remains one of the most widely employed techniques for modelling univariate financial time series. ARIMA models tend to have limitations when the data exhibits cyclical or periodic structures. To address this limitation, ref. [13] employed the Box–Jenkins methodology to forecast South African gold sales using monthly gold sales data from January 2000 to June 2013 to account for seasonal effects present in the dataset. The analysis showed that the seasonal ARIMA(4, 1, 4) \times (0, 1, 1)_{12} model was the “best fit,” with a mean absolute percentage error (MAPE) value of 11%, indicating that the model was appropriate for forecasting future gold sales in South Africa. Additionally, the forecasted values indicated a decline in overall gold sales for the first six months of 2014.

Furthermore, the study by [15] utilised SARIMA for forecasting monthly rainfall and temperature in selected South Asian countries and showed the good performance of SARIMA to capture seasonality and long-term trends of climatic time series. SARIMA was able to produce reliable short-term predictions for both rainfall and temperature, and model diagnostics showed a good fit with residuals behaving like white noise. Outcomes revealed marked seasonality in rainfall and temperature, as well as long-term variability, which is essential for agriculture, water management and climate change efforts. The authors also determined that the SARIMA models could be a useful instrument for policymakers and researchers in formulating climate-resilient techniques for the area.

Accurate estimation of the seasonal trend of the Cooling Degree Days (CDDs) values is a crucial policy tool in determining the energy needed to cool buildings. To model the seasonal trend, ref. [18] examined the ability of the SARIMA model to forecast CDDs when predicting the energy demand of cooling systems. The results indicated that the SARIMA models estimate the seasonal and trend components of CDD data very well and, therefore, provide accurate and reliable forecasts. Model diagnostics revealed that the residuals behaved like a white noise series, suggesting a good model fit. The results suggested that SARIMA has a high capacity for forecasting short- and medium-term CDDs, which is useful for energy planning, climate change impact assessment and infrastructure management.

However, the ARIMA model is still regarded as a powerful tool in time-series modelling. Ref. [14] employed the ARIMA model to predict the daily gold price in USD for the first half of July 2018, based on an analysis of gold prices from July 2013 to June 2018 sourced from the World Gold Council. The study also used the autocorrelation function (ACF), partial autocorrelation function (PACF), Akaike Information Criterion (AIC), and Bayesian information criterion (BIC) to assess the models. The empirical results demonstrated that the ARIMA(3, 1, 2) model was the most effective for predicting the gold price in USD. Similarly, ref. [19] used the ARIMA model to forecast gold prices in the short term, for which 196 observations of daily gold prices in USD were obtained for a period of 9 months, from 1 November 2019 to 31 July 2020. The results indicated that the adjusted ARIMA model provided better accuracy for predicting gold prices in the short run.

In a recent study, the forecasting performance of ARIMA and SARIMA was compared. Ref. [20] compared the forecasting efficiency of two classical time-series models, namely, ARIMA and SARIMA, to investigate the predictive dynamics of retail gold prices in India. The forecast accuracy was assessed using the MAPE, root mean square error (RMSE), and the Diebold–Mariano (DM) test. The findings revealed that while both models provide statistically acceptable forecasts, the SARIMA model consistently outperformed ARIMA across all error metrics. Given silver’s historical importance as a valuable material for over 6000 years, ref. [21] employed the ARIMA model to forecast the price of silver. The ARIMA model identified and explained around 26% of the observed changes in silver prices.

A precise and accurate model development for gold prices is critical in managing assets because of their unique features. The ANN model has been used for modelling the price of gold compared with the ARIMA model. Ref. [22] compared the performance of the ARIMA and ANN models in forecasting gold prices based on the value of RMSE. The study used daily gold price data from the World Gold Council, spanning from 3 September 2018 to 30 October 2020. ARIMA (4,1,0) was chosen as the best model for the time series based on the AIC. Long Short-Term Memory (LSTM) was selected as the method for the ANN to forecast the gold price. After comparing multiple-step forecasting and one-step-ahead forecasting using ARIMA and LSTM, it was found that LSTM had a smaller RMSE compared to ARIMA. The results in this study showed that the ANN model outperformed the ARIMA model in forecasting the gold price.

The ARIMA model was also compared to the volatility model by different authors. Ref. [23] employed ARIMA and GARCH models to predict the gold price. This study used daily gold price data obtained as secondary data from the investing.com website, covering the period from 12 March 2016 to 31 December 2020. The results showed that the GARCH (2,1) model provided an RMSE of 2.375454 and a mean absolute error (MAE) of 1.702908, and an MAPE of 0.001168113. The findings indicated that long-term investment in gold was very profitable, as the model suggested an upward trend in gold prices.

2.2. Literature Review on Machine Learning and Deep Learning Models

Gold prices have been modelled using ML and deep learning algorithms. The major concerns in financial markets are fluctuations and price prediction. Modern quantitative finance uses mathematical modelling to recognise the hidden patterns of financial asset prices as fundamental information in cases of risk management and investment planning [24]. Ref. [17] explored ANNs to forecast future gold prices. The study proposed intelligent networks that need to be enhanced with a meta-heuristic algorithm, the BAT algorithm, to enable the ANN to track price fluctuations. The designed model was compared to those in a published scientific study and other competitive models, such as the ARIMA, ANN, adaptive neuro-fuzzy inference system (ANFIS), MLP Neural Network, RBF Neural Network, and GRNN. To evaluate the performance of these models, RMSE was used as the error index. The results showed that the proposed BAT Neural Network (BNN) outperformed both conventional and modern forecasting models.

The study by [25] evaluated the performance of the Gated Recurrent Unit (GRU) in predicting gold prices. The study employed mean squared error (MSE) and MAE to evaluate the model. Using MSE and MAE, the findings revealed that the GRU model was able to predict gold prices with an adequate level of accuracy with the best performance found in a batch size of 64 with 100 epochs.

Hybrid models have been introduced to improve the precision of stand-alone models. The study by [26] compared the performance of ARIMA, ANN and their hybrid model using a series of gold prices. In order to evaluate the performance of the models, the study employed MSE, MAE and MAPE. The findings revealed that hybrid models can improve the forecasting accuracy over stand-alone models. Similarly, ref. [16] explored the ARIMA, GARCH, and TGARCH models to predict gold prices. The dataset consisted of daily gold price data from 1 January 2021 to 13 October 2023. The results indicated that the hybrid ARIMA (2,1,3)-GARCH (1,1) model produced the lowest MAE (106.712) and RMSE (126.788) values, suggesting that it offers superior prediction accuracy compared to the other models. Therefore, the ARIMA (2,1,3)-GARCH (1,1) model provided more precise forecasts, making it the most reliable for predicting gold prices.

The hybrid model was also employed in the study conducted by [27] using the optimising technique. The study compared the forecasting performance of classical ANN and the hybrid model of ANN and genetic algorithm (GA) using a gold price series. GA was used to optimise the ANN neurons. The study employed three error metrics, namely, MAE, RMSE and MAPE, to compare the forecasting accuracy of the models. The findings revealed that the hybrid model of ANN and GA had better forecast accuracy.

The literature on gold production, sales, and price forecasting reveals extensive use of linear econometric models, nonlinear techniques, and hybrid approaches, reflecting the complex and dynamic nature of gold-related time series. Traditional methods, particularly ARIMA and its seasonal extension SARIMA, have been widely applied due to their robustness in modelling trends and seasonality. Studies such as [13,14,20] demonstrate that while ARIMA models can generate reliable short-term forecasts, SARIMA generally provides superior performance when seasonality is present. In addition, volatility-based models such as GARCH have proven effective in capturing time-varying variance and persistence in gold price movements [16,23]. Despite their strengths, these linear and volatility models are limited in their ability to fully capture the nonlinear dynamics inherent in gold markets.

To address these limitations, recent studies have increasingly adopted ML and hybrid modelling frameworks. ANNs and deep learning models, including ANN, LSTM, GRU, and GRNN, have consistently outperformed traditional econometric models in several empirical applications [17,22,25]. Moreover, hybrid models that combine linear and nonlinear structures such as ARIMA–GARCH and ANN–GA have been shown to further enhance forecasting accuracy by jointly modelling trend, volatility, and nonlinear patterns [16,26,27]. Nevertheless, findings remain context-dependent, with limited comparative evidence for South African gold sales. Consequently, the present study contributes to the literature by jointly evaluating the forecasting performance of SARIMA, GARCH, GRNN, and ANN-based ELM models using South African gold sales data.

3. Methodology

This section describes the dataset, preprocessing steps, and methodological framework employed in this study.

3.1. Data Preparation and Preprocessing

3.1.1. Data

The study employed monthly gold sales time-series data ranging from January 2003 to July 2024, with 259 observations sourced from Statistics South Africa (StatsSA). The data are publicly available and were accessed on 25 February 2025 at https://www.statssa.gov.za (accessed on 12 March 2026): Table P2041: Mining Production and Sales, series MVK24000: Mineral sales according to mining divisions, mineral groups and minerals, Gold. SARIMA, GRNN, and ANN-based ELM have been used individually in different studies to model the linear and nonlinear characteristics of time-series data. The use of GRNN and ANN-based ELM is justified over other ML and deep learning models for several theoretical and practical reasons. First, GRNN and ANN-based ELM are well suited for small-to-medium-sized datasets, such as monthly South African gold sales data. Second, GRNN and ANN-based ELM possess fast training speeds and simpler network architectures, reducing computational cost and the risk of overfitting compared to other deep learning models with multiple hidden layers and numerous hyperparameters. Lastly, these models are effective at capturing nonlinear relationships and complex patterns in time-series data without the need for extensive parameter tuning or long training horizons. The use of the SARIMA model is justified when compared with other seasonal time-series models due to its flexibility, statistical robustness, and strong forecasting performance in modelling stochastic seasonal processes [4]. Furthermore, SARIMA remains widely recognised as a benchmark seasonal forecasting model due to its parsimony, interpretability, and strong statistical foundation. This study therefore employs both linear and nonlinear models to investigate their performance in modelling gold sales time-series data.

3.1.2. Data Split and Normalisation

The dataset was split into two subsets, namely, 80% for training and 20% for testing. The train–test split was chosen because it strikes an optimal balance between model training and testing evaluation. Allocating 80% to training ensures that there is enough historical data to reliably capture the underlying patterns, seasonality, and volatility in the gold sales series. At the same time, reserving 20% for testing provides a sufficiently large hold-out sample to robustly assess the models’ predictive performance on unseen data [28,29].

The target variable, monthly gold sales, was normalised using Min–Max scaling in the range [0, 1] for ML models. To ensure stationarity, the series was differenced before fitting the SARIMA and GARCH models. For comparability, the same differenced series was used to train the neural network models. Additionally, normalisation was applied only to the neural network models, as ML algorithms are sensitive to feature scales, whereas SARIMA and GARCH models are not. Table 1 summarises the data handling and preprocessing for each model.
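The preprocessing steps described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' code: the toy series, the helper names (`difference`, `chronological_split`, `fit_min_max`) and the choice to fit the Min–Max scaler on the training portion only are assumptions made for the example.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def chronological_split(series, train_fraction=0.8):
    """Split without shuffling, so the test set is strictly later in time."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def fit_min_max(train):
    """Return a function that maps the training range onto [0, 1].

    Assumption: the minimum and maximum come from the training data only,
    so no information from the test period leaks into the scaling.
    """
    lo, hi = min(train), max(train)
    span = hi - lo
    if span == 0:
        raise ValueError("training data are constant; cannot scale")
    return lambda values: [(v - lo) / span for v in values]


# Hypothetical toy series standing in for monthly gold sales.
sales = [100.0, 104.0, 101.0, 110.0, 108.0, 115.0,
         111.0, 120.0, 118.0, 125.0, 121.0]

diffed = difference(sales)                  # first differences
train, test = chronological_split(diffed)   # 80% / 20%, in time order
scale = fit_min_max(train)
train_scaled, test_scaled = scale(train), scale(test)
```

With this scheme, scaled test values can fall slightly outside [0, 1] whenever the test period contains values beyond the training range; that is expected and is the price of keeping the test set unseen.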

3.1.3. Model Architecture

The GRNN was implemented as a non-parametric kernel-based neural network for time-series prediction. GRNN estimates the conditional expectation of the target variable using a Gaussian kernel function based on the distance between training observations and test observations. Given an input vector, x, the GRNN prediction is computed as a weighted average of observed outputs, where the weights are determined by the Gaussian spread parameter (σ). In this study, the spread parameter was set to 0.1 to control the smoothness of the regression function. Since GRNN relies on distance calculations, the input variables were normalised prior to model training to improve numerical stability and ensure comparable feature magnitudes.

The ANN-ELM model was implemented as a hybrid two-stage neural architecture combining ANN and an ELM. First, an ANN with two hidden layers (10 and 5 neurons) was trained using backpropagation to capture nonlinear patterns, and its predictions were used as inputs to the ELM to enhance predictive performance through model stacking. The ELM randomly initialised hidden-layer weights and biases, computed hidden-layer outputs using a ReLU activation function, and analytically estimated output weights using Ridge regression, resulting in faster training and improved numerical stability compared to traditional neural networks.
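To make the ELM stage concrete, the following sketch implements a single-hidden-layer ELM with random, untrained hidden weights, a ReLU activation and output weights obtained from the ridge-regression normal equations. It covers only the ELM stage, not the preceding backpropagation-trained ANN, and it is not the authors' implementation: the class name `ELM`, the uniform initialisation on [-1, 1], the ridge penalty of 1e-3, the hidden-layer size and the toy data are all assumptions. It uses only the standard library so that it runs anywhere; in practice a numerical library would replace the hand-written solver.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


class ELM:
    """Hypothetical minimal ELM: random hidden layer, ridge output layer."""

    def __init__(self, n_inputs, n_hidden, ridge=1e-3, seed=0):
        rng = random.Random(seed)
        # Hidden weights and biases are drawn once and never trained.
        self.w = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                  for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.ridge = ridge
        self.beta = None

    def hidden(self, x):
        """ReLU outputs of the hidden layer for one input vector."""
        return [max(0.0, sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(self.w, self.b)]

    def fit(self, X, t):
        H = [self.hidden(x) for x in X]
        size = len(self.b)
        # Ridge normal equations: (H'H + ridge * I) beta = H't.
        A = [[sum(h[i] * h[j] for h in H) + (self.ridge if i == j else 0.0)
              for j in range(size)] for i in range(size)]
        rhs = [sum(h[i] * ti for h, ti in zip(H, t)) for i in range(size)]
        self.beta = solve(A, rhs)
        return self

    def predict(self, X):
        return [sum(bi * hi for bi, hi in zip(self.beta, self.hidden(x)))
                for x in X]


# Toy data: learn t = x^2 on [0, 1].
X = [[i / 19] for i in range(20)]
t = [x[0] ** 2 for x in X]
model = ELM(n_inputs=1, n_hidden=8).fit(X, t)
preds = model.predict(X)
```

Because only the output weights are estimated, training reduces to one linear solve, which is the source of the speed advantage noted above. The ridge term keeps that system well conditioned even when some hidden units are inactive on the training data.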

3.1.4. Software

The analysis was conducted using Python version 2022.3.3 software, and the details of the models are discussed in the following subsections.

3.2. Seasonal Autoregressive Integrated Moving Average (SARIMA) Model

For the past three decades, ARIMA models have been widely used in numerous fields for time-series forecasting. Introduced by [4] in the early 1970s, the ARIMA model has become a proven and reliable method for predicting time-series data. The general form of ARIMA(p, d, q) is given as follows:

φ_{p}(B) (1 - B)^{d} (y_{t} - μ) = θ_{q}(B) ε_{t}

(1)

where φ_{p}(B) = 1 - \sum_{i = 1}^{p} φ_{i} B^{i} and θ_{q}(B) = 1 - \sum_{j = 1}^{q} θ_{j} B^{j} are polynomials in B of degrees p and q, respectively; \nabla = (1 - B) is the differencing operator; and B represents the backward shift operator, so that B y_{t} = y_{t - 1}. The seasonal ARIMA model, SARIMA(p, d, q) \times (P, D, Q)_{s}, is an extension of the ARIMA model which aims to improve the performance of the ARIMA model in modelling and predicting time series with seasonal effects. It is multiplicative in nature. The mathematical representation of SARIMA is given as follows:

φ_{p}(B) Φ_{P}(B^{s}) (1 - B)^{d} (1 - B^{s})^{D} X_{t} = θ_{0} + θ_{q}(B) Θ_{Q}(B^{s}) ε_{t}

(2)

The SARIMA(p, d, q)(P, D, Q)_{s} model is a widely used time-series forecasting method that consists of seven key parameters, namely, p: the number of autoregressive (AR) terms; d: the degree of differencing required to make the series stationary; q: the number of moving average (MA) terms; P: the seasonal AR lags; D: the degree of seasonal differencing; Q: the seasonal MA lags; and s: the length of the seasonal cycle. Φ_{P}(B^{s}) and Θ_{Q}(B^{s}) are the seasonal AR and MA polynomials of orders P and Q, respectively. (1 - B) and (1 - B^{s}) denote the nonseasonal and seasonal differencing operators, respectively.
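The multiplicative structure of Equation (2) can be illustrated by treating each operator as a polynomial in B and multiplying the coefficient lists. The sketch below is illustrative only: the helper names and the toy series are assumptions, and d = D = 1 with s = 12 is chosen simply because it matches monthly data.

```python
def poly_mul(a, b):
    """Multiply two polynomials in B (list index = power of B)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def diff_operator(s, order):
    """Coefficients of (1 - B^s)^order."""
    base = [1.0] + [0.0] * (s - 1) + [-1.0]
    out = [1.0]
    for _ in range(order):
        out = poly_mul(out, base)
    return out


def apply_operator(coeffs, series):
    """Apply sum_k coeffs[k] * B^k, dropping the first len(coeffs) - 1 points."""
    k = len(coeffs) - 1
    return [sum(c * series[t - i] for i, c in enumerate(coeffs))
            for t in range(k, len(series))]


# (1 - B)(1 - B^12) = 1 - B - B^12 + B^13
combined = poly_mul(diff_operator(1, 1), diff_operator(12, 1))

# Toy series: linear trend plus a fixed 12-month seasonal pattern.
season = [5, 3, 8, 1, 9, 2, 7, 4, 6, 0, 3, 5]
y = [2.0 * t + season[t % 12] for t in range(40)]
w = apply_operator(combined, y)  # trend and seasonality are both removed
```

Applying the combined operator costs d + sD observations at the start of the series (13 here), which is worth keeping in mind when the sample is short.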

The Box–Jenkins approach used in this study consists of three iterative stages: model identification, parameter estimation and diagnostic testing.

I. Model identification

The initial step in the Box–Jenkins methodology is model identification, which focuses on assessing whether the time series is stationary and determining the necessary level of differencing if required. Stationarity is a key assumption for ARIMA/SARIMA models, as failing to achieve it can result in inaccurate forecasts. A time series is deemed stationary when its statistical characteristics, including mean and variance, remain consistent over time [4]. To evaluate stationarity, methods such as the visual inspection of time-series plots, statistical tests such as the Augmented Dickey–Fuller (ADF) test pioneered by [30] and outlined by [13], and autocorrelation function (ACF) plots are frequently employed. If the time series exhibits non-stationary behaviour, techniques like differencing or logarithmic transformations can be used to stabilise its statistical properties.

Ref. [30] proposed the Augmented Dickey–Fuller (ADF) test as a formal approach to detect the presence of a unit root in a time series. Subsequently, in 1992, Kwiatkowski, Phillips, Schmidt, and Shin introduced the KPSS test as an alternative or complementary method to the ADF test; it offers a different perspective by taking level or trend stationarity as the null hypothesis [31]. This study employs visual plots as well as both the ADF and KPSS formal tests to assess stationarity in the time-series data. Furthermore, the ACF and PACF are analysed to determine possible AR and MA components, aiding in the selection of appropriate model parameters.

II. Parameter estimation

After identifying the appropriate model structure, the next phase is estimation, which involves determining the parameters of the ARIMA model. This is commonly achieved through methods such as Maximum Likelihood Estimation (MLE), which identifies the parameter values that maximise the likelihood of the observed data. Once the parameters are estimated, model selection is conducted using statistical criteria such as the AIC and BIC. These metrics assist in comparing models by weighing the trade-off between goodness of fit and complexity, with lower AIC or BIC values indicating a more suitable model. Choosing the best model at this stage is essential for ensuring precise forecasts while preventing overfitting.
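The trade-off between fit and complexity can be seen from the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of estimated parameters, n the sample size and L the maximised likelihood. The short sketch below compares two candidates; the log-likelihoods, parameter counts and sample size are hypothetical numbers chosen for illustration, not results from this study.

```python
import math


def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical candidates: (label, maximised log-likelihood, parameter count).
candidates = [
    ("model A", -512.4, 3),
    ("model B", -508.0, 6),
]
n_obs = 200  # hypothetical sample size

best_aic = min(candidates, key=lambda c: aic(c[1], c[2]))
best_bic = min(candidates, key=lambda c: bic(c[1], c[2], n_obs))
```

In this made-up case the larger model wins on AIC while the smaller one wins on BIC, because the BIC penalty per parameter (ln n) exceeds the AIC penalty (2) once n is greater than about 7.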

III. Diagnostic testing

Diagnostic testing in the Box–Jenkins methodology focuses on evaluating the statistical properties of the error terms, specifically the normality assumption and the weak white noise assumption. These assumptions ensure that the residuals (errors) from the fitted model do not exhibit any predictable patterns or structures. To assess the model’s overall validity, the Ljung–Box test proposed by [32] is commonly used. This test examines whether the residuals are independently distributed, effectively checking for any remaining autocorrelation. If the Ljung–Box test indicates significant autocorrelation, it suggests that the model has not fully captured the underlying data patterns, and further adjustments to the model may be required. The hypotheses to be tested are as follows:

H0: The model is adequate.

H1: The model is inadequate.

The test statistic for the Ljung–Box test is given as follows:

LB^{*} = n'(n' + 2) \sum_{l = 1}^{z} \frac{r_{l}^{2}(\hat{α})}{n' - l}

(3)

where n' = n - d, n denotes the number of observations, d is the degree of nonseasonal differencing used to transform the original series into a stationary one, and z is the number of lags tested. The term r_{l}^{2}(\hat{α}) denotes the square of the autocorrelation of the residuals at lag l [33].
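Equation (3) translates directly into code. In the sketch below, the length of the residual list plays the role of n' (the number of observations left after differencing); the function name `ljung_box` and the toy residuals are assumptions for illustration. The standard library has no chi-square distribution, so the comparison of the statistic with its critical value is left out.

```python
def ljung_box(residuals, max_lag):
    """Ljung-Box statistic over lags 1..max_lag, following Equation (3)."""
    n = len(residuals)
    mean = sum(residuals) / n
    e = [r - mean for r in residuals]
    denom = sum(v * v for v in e)
    total = 0.0
    for lag in range(1, max_lag + 1):
        # Sample autocorrelation of the residuals at this lag.
        r_l = sum(e[t] * e[t - lag] for t in range(lag, n)) / denom
        total += r_l * r_l / (n - lag)
    return n * (n + 2) * total


# Toy residuals with strong negative lag-1 autocorrelation.
alternating = [1.0 if t % 2 == 0 else -1.0 for t in range(20)]
q1 = ljung_box(alternating, max_lag=1)
```

For the alternating toy residuals the lag-1 autocorrelation is -19/20, so the statistic is 20 × 22 × 0.9025 / 19 = 20.9, a value large enough to signal that such residuals are far from white noise.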

3.3. Generalised Autoregressive Conditionally Heteroscedastic (GARCH) Model

In 1982, ref. [5] introduced autoregressive conditional heteroscedasticity (ARCH) models to account for the time-varying volatility frequently observed in economic and financial time-series data. Later, in 1986, ref. [6] expanded on this concept by developing generalised autoregressive conditional heteroscedasticity (GARCH) models, which efficiently model the dynamics of conditional heteroscedasticity as a variance process. In GARCH models, the conditional variance, σ_{t}^{2}, is influenced by both its own past values and past squared innovations. This characteristic makes them a generalised form of ARCH models. The GARCH(p, q) process is given as follows:

σ_{t}^{2} = α_{0} + \sum_{i = 1}^{q} α_{i} ε_{t - i}^{2} + \sum_{j = 1}^{p} β_{j} σ_{t - j}^{2}

(4)

where α_{0} > 0, α_{i} \geq 0 for i = 1, 2, \dots, q, and β_{j} \geq 0 for j = 1, 2, \dots, p, to ensure that the conditional variance remains positive. The constraint \sum_{i = 1}^{\max(p, q)} (α_{i} + β_{i}) < 1 also ensures that the unconditional variance of ε_{t} is finite while its conditional variance σ_{t}^{2} changes over time [34]. Setting p = q = 1 in GARCH(p, q), the random variable ε_{t} follows a GARCH(1, 1) model if the following is true:

ε_{t} | \mathcal{F}_{t - 1} \sim N(0, σ_{t}^{2})

(5)

σ_{t}^{2} = α_{0} + α_{1} ε_{t - 1}^{2} + β_{1} σ_{t - 1}^{2}

(6)

where \mathcal{F}_{t - 1} denotes the information available up to time t - 1. The parameters of a GARCH(1, 1) model can be estimated using the Maximum Likelihood Estimation (MLE) method, which optimises the likelihood function to find the values that best fit the observed data.
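The recursion in Equation (6) is easy to trace by hand or in code. The sketch below computes the conditional variance path for a given sequence of innovations; it does not estimate the parameters. The function name, the parameter values and the choice to start the recursion at the unconditional variance α_0 / (1 - α_1 - β_1) are assumptions made for the example.

```python
def garch11_variance(eps, alpha0, alpha1, beta1):
    """Conditional variances from the GARCH(1, 1) recursion in Equation (6).

    Returns a list of length len(eps) + 1; element t is the variance that
    applies before innovation eps[t] is observed.
    """
    if alpha0 <= 0 or alpha1 < 0 or beta1 < 0:
        raise ValueError("need alpha0 > 0 and alpha1, beta1 >= 0")
    if alpha1 + beta1 >= 1:
        raise ValueError("need alpha1 + beta1 < 1 for a finite variance")
    # Assumption: initialise at the unconditional variance.
    sigma2 = [alpha0 / (1.0 - alpha1 - beta1)]
    for e in eps:
        sigma2.append(alpha0 + alpha1 * e * e + beta1 * sigma2[-1])
    return sigma2


# Hypothetical parameters and a single large shock at the second step.
path = garch11_variance([0.0, 3.0, 0.0], alpha0=0.1, alpha1=0.2, beta1=0.7)
```

The path is approximately 1.0, 0.8, 2.46, 1.822: the shock of size 3 lifts the next variance well above its long-run level of 1.0, and the following value decays back only gradually, which is the volatility clustering the model is designed to capture.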

3.4. Generalised Regression Neural Network (GRNN)

The general regression neural network (GRNN) was first introduced by [10] in 1991 and is widely recognised for its advantages as a meta-modelling algorithm. The GRNN model is based on nonlinear regression theory, which allows it to model complex, nonlinear relationships between inputs and outputs using a non-parametric approach. According to [35], GRNN is based on non-parametric regression principles, relying on sampled data and employing Parzen non-parametric estimation to determine network output using the maximum probability principle. Additionally, unlike backpropagation-based methods, GRNN does not require an iterative training process. GRNN excels in nonlinear approximation.

According to [36], a GRNN consists of four layers: input, pattern, summation, and output. The input layer receives the observed parameters, with one input unit per parameter. The pattern layer stores the training patterns, while the summation layer contains two types of neurons: a single D-summation (division) neuron and S-summation neurons, which are fed by the pattern layer and linked to the output layer. The hidden and output layers utilise radial basis and linear activation functions, respectively. Each hidden neuron corresponds to one training pattern, which allows the network to perform non-parametric regression and estimate the probability density function of the underlying data. Learning in the GRNN is instantaneous. Finally, the output layer normalises the output by dividing the output of each S-summation neuron by the output of the D-summation neuron, producing the predicted value Y(x) for the given unknown input vector x, computed as follows:

Y(x) = \frac{\sum_{i = 1}^{n} y_{i} \exp[-D(x, x_{i})]}{\sum_{i = 1}^{n} \exp[-D(x, x_{i})]}

(7)

where

D(x, x_{i}) = \sum_{k = 1}^{m} \left(\frac{x_{k} - x_{ik}}{σ}\right)^{2}

(8)

where n represents the number of training patterns; y_{i} represents the weighted connection between the ith pattern-layer neuron and the S-summation neuron; D is the scaled squared distance that enters the Gaussian function; m denotes the number of input vector elements; and x_{k} and x_{ik} are the kth elements of x and x_{i}, respectively. The optimal value of the spread parameter (σ) is determined experimentally. One of the key advantages of GRNN is its rapid learning capability and its ability to achieve an optimal regression surface as the sample size grows. This makes GRNN particularly useful in real-time applications with limited data, as it can quickly establish the regression surface even with a small number of samples [37]. Shapley additive explanations (SHAP) were employed to quantify the contribution of each lagged input to the model’s predictions. This approach enables interpretability of nonlinear temporal dependencies captured by the GRNN, identifying dominant short-term and seasonal memory effects. The structure of the GRNN architecture is visually represented in Figure 1, providing a schematic overview of its four key layers: input, pattern, summation, and output.
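Because a GRNN has no iterative training, Equations (7) and (8) are essentially the whole algorithm: each stored pattern receives a Gaussian weight based on its distance from the query, and the prediction is the weighted average of the stored targets. The sketch below is a minimal standard-library version; the function name `grnn_predict` and the toy patterns are assumptions for illustration.

```python
import math


def grnn_predict(x, train_x, train_y, sigma):
    """Predict the target for input vector x using Equations (7) and (8)."""
    weights = []
    for xi in train_x:
        # Equation (8): scaled squared distance to the stored pattern.
        d = sum(((xk - xik) / sigma) ** 2 for xk, xik in zip(x, xi))
        weights.append(math.exp(-d))
    total = sum(weights)  # output of the D-summation neuron
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase sigma")
    # Equation (7): S-summation output divided by D-summation output.
    return sum(w * y for w, y in zip(weights, train_y)) / total


# Toy patterns: two stored inputs with targets 0.0 and 2.0.
train_x = [[0.0], [1.0]]
train_y = [0.0, 2.0]

midpoint = grnn_predict([0.5], train_x, train_y, sigma=0.5)  # equal weights
near_first = grnn_predict([0.0], train_x, train_y, sigma=0.1)
```

The two calls show the role of the spread parameter. At the midpoint both patterns get equal weight and the prediction is the plain average of the targets, whereas with a small σ the prediction at a stored input collapses onto that pattern's own target, which is why an overly small spread tends to reproduce the training data rather than smooth it.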

3.5. Extreme Learning Machine (ELM)

An artificial neural network (ANN), inspired by the human nervous system, is a widely used tool in artificial intelligence, particularly for tasks such as prediction, pattern recognition, and classification [38]. According to [38], the performance of ANN-based techniques heavily depends on the careful tuning of key parameters, including the number of hidden layers, nodes, weights, and the choice of transfer function. However, extensive research and practical applications have revealed certain limitations of this approach [39]. Ref. [11] highlighted several drawbacks associated with traditional ANN methods, such as long computation times, difficulties in determining stopping criteria, challenges in managing the learning rate and epochs, susceptibility to local minima, and the need for extensive fine-tuning.

To overcome these limitations, ref. [40] introduced a novel learning algorithm designed for SLFNs, known as the extreme learning machine (ELM). In this approach, input weights and hidden biases are randomly assigned, while output weights are determined analytically using the Moore–Penrose (MP) generalised inverse method. In this study, each training sample consisted of the normalised gold sale together with its corresponding time index as the input feature vector, while the target variable was the normalised gold sale. Prior to model training, the data were normalised to ensure comparable feature magnitudes and improve numerical stability of the hidden layer activations. Given a training dataset of N unique samples

(x_{i}, t_{i}) \in ℝ^{n} \times ℝ^{m}

, the output with zero error for the SLFN with

\hat{N}

hidden neurons can be expressed as follows:

\sum_{i = 1}^{\hat{N}} β_{i} g (w_{i}, x_{j}, b_{i}) = t_{j}, j = 1, 2, \dots, N

(9)

where

w_{i}

denotes the input weights;

β_{i}

denotes the weights connecting the hidden-to-output layer; and

b_{i}

denotes the biases in the hidden layer. The matrix representation of the

N

equations in Equation (9) is given as follows:

H β = T

(10)

where

H = {[\begin{matrix} g (w_{1}, x_{1}, b_{1}) & \dots & g (w_{\hat{N}}, x_{1}, b_{\hat{N}}) \\ ⋮ & ⋱ & ⋮ \\ g (w_{1}, x_{N}, b_{1}) & \dots & g (w_{\hat{N}}, x_{N}, b_{\hat{N}}) \end{matrix}]}_{N \times \hat{N}}, \quad β = {[\begin{matrix} β_{1} \\ β_{2} \\ ⋮ \\ β_{\hat{N}} \end{matrix}]}_{\hat{N} \times m} \quad \text{and} \quad T = {[\begin{matrix} t_{1} \\ t_{2} \\ ⋮ \\ t_{N} \end{matrix}]}_{N \times m}

Since the weights

w_{i}

and biases

b_{i}

are assigned randomly, the weight vector

β

is the only parameter that needs to be estimated. However, the structure of the hidden-layer output matrix

H

depends on the data sample and

\hat{N},

meaning that Equation (10) may not always hold. As a result, estimating

β

is essentially reformulated as a least squares optimisation problem, expressed in the following form:

Ω (β) = \min_{β} {‖ H β - T ‖}_{2}^{2}

(11)

Ref. [41] stated that, according to optimisation theory, the solution that minimises the objective function

Ω (β)

is given as follows:

β = H^{†} T

(12)

where

H^{†} = {(H^{T} H)}^{- 1} H^{T}

which is known as the MP generalised inverse (also called the pseudoinverse) of

H

. The key difference between ELM and traditional neural network approaches is that, in ELM, there is no need to fine-tune all the parameters of the feedforward network, such as the input weights and hidden layer biases [42]. The number of hidden neurons and activation/spread selection help control model complexity. SHAP was also computed to interpret the nonlinear forecasts by quantifying the contribution of each lagged input. This approach provides insight into the temporal memory structure learned by the ELM, distinguishing between short-term persistence and seasonal effects. Figure 2 depicts the schematic structure of ELM.
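The training procedure in Equations (9) to (12) can be sketched in a few lines. The following is an illustrative NumPy version rather than the configuration used in this study: the sigmoid activation, the number of hidden neurons, the random seed and the sine-shaped toy target are all assumptions, and `elm_fit` and `elm_predict` are hypothetical helper names. The sketch uses `np.linalg.pinv`, which coincides with the expression for the MP generalised inverse given above when the hidden-layer output matrix has full column rank.

```python
import numpy as np

def elm_fit(x, t, n_hidden, rng):
    """Fit an SLFN the ELM way: random hidden layer, analytic output weights."""
    w = rng.normal(size=(x.shape[1], n_hidden))   # random input weights w_i
    b = rng.normal(size=n_hidden)                 # random hidden biases b_i
    h = 1.0 / (1.0 + np.exp(-(x @ w + b)))        # hidden-layer output matrix H
    beta = np.linalg.pinv(h) @ t                  # beta = H^dagger T, Equation (12)
    return w, b, beta

def elm_predict(x, w, b, beta):
    """Propagate inputs through the fixed hidden layer and the fitted beta."""
    h = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return h @ beta

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40).reshape(-1, 1)      # hypothetical normalised input
t = np.sin(2.0 * np.pi * x)                       # hypothetical target, shape (N, m)
w, b, beta = elm_fit(x, t, n_hidden=20, rng=rng)
fitted = elm_predict(x, w, b, beta)
train_rmse = float(np.sqrt(np.mean((fitted - t) ** 2)))
```

Only `beta` is estimated from the data; `w` and `b` stay at their random draws, which is what removes the iterative tuning step required by gradient-based training.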

3.6. Evaluation of the Forecasting Performance of the Models

In this study, evaluation metrics are employed to gauge the effectiveness of the proposed models. These metrics are RMSE, MAE, MAPE, MFE and Theil’s U. The metrics are computed using the following equations respectively:

\mathrm{RMSE} = \sqrt{\frac{\sum_{i = 1}^{N} {(x_{i} - z_{i})}^{2}}{N}}

(13)

\mathrm{MAE} = \frac{1}{N} \sum_{i = 1}^{N} |x_{i} - z_{i}|

(14)

\mathrm{MAPE} = \frac{1}{N} \sum_{i = 1}^{N} |\frac{x_{i} - z_{i}}{x_{i}}| \times 100

(15)

\mathrm{MFE} = \frac{1}{N} \sum_{i = 1}^{N} (x_{i} - z_{i})

(16)

U = \frac{\sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(z_{i} - x_{i})}^{2}}}{\sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(x_{i} - x_{i - 1})}^{2}}}

(17)

where

x_{i}

denotes the actual value and

z_{i}

represents the predicted values of the gold sales while

N

is the total number of observations. Furthermore, the Diebold–Mariano (DM) test developed by [43] was employed to statistically compare the forecast accuracy of traditional time-series forecasting models with the ML techniques employed in the study to test whether differences in their forecasting performance are statistically significant. The DM test statistic is computed using the following formula:

D M = \frac{\bar{d}}{\sqrt{\frac{{\hat{γ}}_{d} (0) + 2 \sum_{k = 1}^{m - 1} {\hat{γ}}_{d} (k)}{T}}}

(18)

where

\bar{d}

is the average loss differential between the two models;

{\hat{γ}}_{d} (k)

represents the estimated autocovariance of the loss differential at lag k; m is the truncation lag (also called the bandwidth); and T denotes the number of forecasts (sample size).
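The evaluation measures in Equations (13) to (18) can be computed as in the sketch below. It is illustrative only: the helper names and the toy values are hypothetical, a squared-error loss is assumed for the DM statistic, and Theil's U is evaluated from the second observation onwards because the first observation has no predecessor for the naive benchmark.

```python
import numpy as np

def forecast_metrics(actual, predicted):
    """RMSE, MAE, MAPE, MFE and Theil's U for actual x_i and predicted z_i."""
    x = np.asarray(actual, dtype=float)
    z = np.asarray(predicted, dtype=float)
    err = x - z
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / x)) * 100.0
    mfe = np.mean(err)
    # Theil's U: model RMSE relative to the naive "no change" forecast x_{i-1}
    naive_rmse = np.sqrt(np.mean((x[1:] - x[:-1]) ** 2))
    model_rmse = np.sqrt(np.mean(err[1:] ** 2))
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "MFE": mfe,
            "U": model_rmse / naive_rmse}

def dm_statistic(loss_a, loss_b, m=1):
    """Diebold-Mariano statistic for two loss series, Equation (18)."""
    d = np.asarray(loss_a, dtype=float) - np.asarray(loss_b, dtype=float)
    t_len = d.size
    d_bar = d.mean()
    centred = d - d_bar

    def gamma(k):
        # Sample autocovariance of the loss differential at lag k
        return np.sum(centred[k:] * centred[:t_len - k]) / t_len

    long_run_var = gamma(0) + 2.0 * sum(gamma(k) for k in range(1, m))
    return d_bar / np.sqrt(long_run_var / t_len)

# Hypothetical values, not taken from the gold sales data
actual = np.array([10.0, 12.0, 11.0, 13.0])
model_a = np.array([11.0, 11.0, 12.0, 12.0])
model_b = np.array([12.0, 11.0, 12.0, 11.0])
metrics_a = forecast_metrics(actual, model_a)
dm = dm_statistic((actual - model_a) ** 2, (actual - model_b) ** 2, m=1)
```

With the loss differential defined as the loss of model A minus the loss of model B, a negative DM value indicates that model A has the smaller average loss. A negative MFE indicates that the forecasts exceed the actual values on average.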

4. Discussion of Findings

This section presents an analysis of the study’s findings, with the results illustrated through tables and figures.

4.1. Exploratory Data Analysis (EDA)

The EDA was performed to understand the characteristics of the dataset. The results are presented in Table 2.

The gold sales dataset consists of 259 observations with a mean value of 5602.078 and a median value of 5114.900, which indicates that the data is slightly skewed to the right, as the mean is greater than the median. The minimum value observed is 1477.10, while the maximum reaches as high as 20,492.500, which reflects a wide range of values. The standard deviation of 2997.16, along with the high variance of 8,982,938.18, indicates significant variability in the dataset, suggesting that the data points are widely spread out around the mean. Figure 3 provides a visual representation of the gold sales.

As shown in Figure 3, the gold sales series fluctuates noticeably over the sample period, and visual inspection suggests that it is non-stationary. To confirm this, formal stationarity tests were conducted, with the results detailed in Table 3.

The results in Table 3 demonstrate that the p-value of the ADF test at level is 0.985, which indicates non-stationarity since it is greater than the 0.05 significance level. Similarly, the KPSS test at level yielded a p-value of 0.010, which rejects the null hypothesis of stationarity and therefore also points to non-stationarity. However, after first differencing, both tests confirmed stationarity. The ADF test reported a p-value of 0.000, while the KPSS test yielded a p-value of 0.100, which is above the 0.05 threshold, indicating that the study failed to reject the null hypothesis of stationarity. Overall, these results confirm that the time series is integrated of order 1, I (1).
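Because the ADF and KPSS tests have opposite null hypotheses (a unit root for ADF, stationarity for KPSS), their p-values must be read in opposite directions. The sketch below encodes that decision rule and applies it to the p-values reported in Table 3; the function name `classify_stationarity` is hypothetical and the rule is a common convention rather than a procedure stated by the tests themselves.

```python
def classify_stationarity(adf_p, kpss_p, alpha=0.05):
    """Combine ADF (null: unit root) and KPSS (null: stationarity) p-values."""
    adf_rejects_unit_root = adf_p < alpha
    kpss_rejects_stationarity = kpss_p < alpha
    if adf_rejects_unit_root and not kpss_rejects_stationarity:
        return "stationary"
    if not adf_rejects_unit_root and kpss_rejects_stationarity:
        return "non-stationary"
    return "inconclusive"

# p-values reported in Table 3
level = classify_stationarity(adf_p=0.985, kpss_p=0.010)
first_difference = classify_stationarity(adf_p=0.000, kpss_p=0.100)
```

Under this rule the level series is classified as non-stationary and the first-differenced series as stationary, which matches the I (1) conclusion.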

4.2. Results of the SARIMA Model

In this section, the ACF and PACF are used to identify the appropriate order for the time-series model. The ACF and PACF plots are presented in Figure 4.

According to Figure 4, the identified competing SARIMA models were

SARIMA (0, 1, 1) {(2, 1, 2)}_{12}

,

SARIMA (0, 1, 1) {(2, 1, 1)}_{12}

,

SARIMA (0, 1, 1) {(1, 1, 1)}_{12}

and

SARIMA (0, 1, 1) {(1, 1, 2)}_{12}

. Using the AIC,

SARIMA (0, 1, 1) {(2, 1, 2)}_{12}

was deemed most suitable for gold sales. The parameter estimates for this model are summarised in Table 4.

The results in Table 4 reveal that all the parameters in the model are statistically significant, as all their p-values are below the threshold of 0.05. The MA(1) term, with a coefficient of 0.797, has a strong impact on the model, highlighting the role of past errors at lag 1. The model also reveals that seasonal effects are important, with seasonal AR(12) and seasonal AR(24) capturing yearly and two-year seasonal patterns, with coefficients of 1.546 and 0.637, respectively. The seasonal MA(12) and seasonal MA(24) terms, with coefficients of 1.515 and 0.698, indicate a strong positive effect on the model from errors at lags 12 and 24. Finally, the Sigma2 estimate of 0.0671, which is also statistically significant, gives the estimated variance of the model errors. Overall, the results suggest that the

SARIMA (0, 1, 1) {(2, 1, 2)}_{12}

model successfully incorporates both autocorrelation and seasonality, making it effective for capturing complex temporal patterns in the data. The results of the diagnostic tests for the fitted SARIMA model are summarised in Table 5.

The results summarised in Table 5 from the JB test show that the residuals are not normally distributed, as the p-value is less than 0.05. Additionally, the Ljung–Box Q test yields a p-value of 0.590, which exceeds the 0.05 significance level, so the null hypothesis of no residual autocorrelation is not rejected, supporting the adequacy of the

SARIMA (0, 1, 1) {(2, 1, 2)}_{12}

model. Figure 5 displays the observed versus the fitted values of the SARIMA model.

As evident from Figure 5, the SARIMA model captures the general trend and direction of returns, but it smooths out extreme spikes and sharp fluctuations, underestimating some of the higher volatility periods. The close alignment in most periods indicates that the model fits reasonably well in stable market conditions, though it struggles to fully replicate extreme market movements. The log returns also exhibit mean-reverting behaviour around zero, which is a typical characteristic of financial and commodity return series. Furthermore, from around 2020 onwards, the series shows more pronounced volatility and extreme fluctuations, possibly reflecting market disruptions associated with the COVID-19 pandemic and related economic uncertainty. This increase in volatility may also explain the larger prediction deviations observed during this period, as sudden market shocks are generally more difficult for SARIMA models to capture. The results of the GARCH model are presented in Section 4.3.

4.3. Results of the GARCH Model

Table 6 presents the estimation results of the GARCH model, which was performed using the differenced data.

The results in Table 6 revealed that the mean (µ) estimate of -0.017 is not statistically significant with a p-value of 0.160, suggesting the average return is not meaningfully different from zero. Similarly, the omega (Ω) parameter, which represents the constant in the volatility equation, is also not significant (p = 0.340), indicating that the base level of volatility is small and not statistically different from zero. In contrast, gamma1

(γ_{1})

, with an estimate of 0.355 and a p-value of 0.031, is significant, suggesting that past shocks or asymmetries have an important effect on current volatility. Lastly, beta1

(β_{1})

has a strong and highly significant estimate of 0.823, indicating that past volatility carries over and has a major influence on current volatility, highlighting persistence in the volatility process.

4.4. SHAP Analysis Results of GRNN Model and ANN-ELM Model

To gain deeper insights into the predictive behaviour of the neural networks, SHAP analysis was performed for both the GRNN and ANN-ELM models, and the results are reported in this section. To visually interpret the contributions of individual lagged features to model predictions, a GRNN SHAP value diagram was generated to illustrate the magnitude and direction of each feature’s impact on the predicted gold sales values.

According to Figure 6, the SHAP results indicate that lag 1 is the most dominant feature, implying that the most recent observation plays a crucial role in driving the model’s predictions. This suggests that the GRNN model is strongly driven by short-term temporal dependence, with recent gold sales observations containing the most relevant predictive information. Moreover, lags 6 and 7 show moderate contributions, reflecting some medium-term influence in the predictive structure. This may reflect delayed responses in the gold sales series, where the effects of economic or market shocks take several months to fully influence observed sales patterns. The SHAP distribution further shows that higher values of lag 1 tend to increase the predicted outcome, while lag 7 exhibits a more mixed or opposite effect, suggesting possible delayed adjustment effects in the series. In contrast, lags 11, 8, 3, 5 and 12 exhibit very small SHAP values, indicating minimal long-term or seasonal dependence within the model. This confirms that the model relies predominantly on short-term memory effects, with limited capacity to capture longer-term seasonal patterns. This partially explains the weaker performance of the GRNN model observed in Table 9, as the model appears to underutilise longer-term structural information that may be important for capturing seasonal dynamics in gold sales. For decision-makers, this suggests that recent trends are the most important for short-term prediction, while older observations provide supporting information for more stable forecasting. The SHAP values are presented in Table 7.

Table 7 presents the mean absolute SHAP values, summarising the overall contribution of each lag to the model’s output. The results confirm the relative importance of the lags observed in the SHAP value diagram. To visually interpret the contributions of individual lagged features to model predictions, Figure 7 presents the ANN-ELM SHAP value diagram, which illustrates the magnitude and direction of each feature’s impact on the predicted gold sales values.

According to Figure 7, the SHAP results indicate that lag 12 is the most influential feature, with the highest mean absolute SHAP value of 0.019797. This reveals that the value from 12 periods prior plays a significant role in shaping current predictions, pointing to strong seasonal or long-term dependency in the model. The SHAP distribution further shows that higher values of lag 12 tend to push the predictions upward, confirming its strong positive contribution to predictability, while lower values tend to reduce the predicted outcome. Additionally, lags 7, 6, and 9 have comparatively substantial contributions, suggesting that the model’s predictions are significantly impacted by medium-term historical values. The spread of SHAP values for these lags also indicates that they provide complementary predictive information by capturing delayed temporal effects. This implies that while making predictions, the model takes into account intermediate past data in addition to seasonality. In contrast, more recent lags such as lag 1 and lag 2 have smaller contributions compared to lag 12 and the mid-range lags, implying that short-term fluctuations are less dominant in the predictive process. Furthermore, lags 3, 5, and 11 exhibit minimal SHAP values, indicating that these specific past observations contribute very little to the model’s output. Overall, the diagram highlights that the model captures strong seasonal and medium-term dynamics, rather than relying on short-term memory. From a decision-making perspective, this suggests that historical seasonal patterns are the most informative for forecasting, while very recent changes have limited influence. Overall, the SHAP analysis confirms that ANN-ELM emphasises medium- and long-term dynamics, providing an interpretable understanding of how seasonal and cyclical effects drive gold sales forecasts. The SHAP values are presented in Table 8.

Table 8 provides the mean absolute SHAP values, which quantify the overall contribution of each lag to the model’s predictions. The ranking of lags demonstrates that lags 12 and 7 exert the greatest influence, while the remaining predictors have comparatively smaller effects. The performance of the GARCH model is compared with that of other models such as

SARIMA (0, 1, 1) {(2, 1, 2)}_{12}

, GRNN and ANN-ELM in Section 4.5.
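The mean absolute SHAP values reported for the two networks summarise a matrix of per-observation SHAP values whose columns correspond to lagged inputs. The sketch below shows, under stated assumptions, how such a lag matrix can be built and how lags can be ranked once SHAP values are available. The helper names are hypothetical, lags 1 to 12 are assumed as inputs, and the small SHAP matrix is made up for illustration; the SHAP values themselves would come from an explainer that is not shown here.

```python
import numpy as np

def make_lag_matrix(series, max_lag):
    """Return (X, y) where column j-1 of X holds the series lagged by j periods."""
    s = np.asarray(series, dtype=float)
    rows = s.size - max_lag
    x = np.column_stack(
        [s[max_lag - j: max_lag - j + rows] for j in range(1, max_lag + 1)]
    )
    y = s[max_lag:]
    return x, y

def rank_lags(shap_values):
    """Rank lags by mean absolute SHAP value (rows: observations, columns: lags)."""
    mean_abs = np.mean(np.abs(shap_values), axis=0)
    order = np.argsort(mean_abs)[::-1]
    return [(int(j) + 1, float(mean_abs[j])) for j in order]

# Hypothetical series and SHAP matrix, for illustration only
series = np.arange(20.0)
x_lags, y_target = make_lag_matrix(series, max_lag=12)
shap_demo = np.array([[0.02, -0.01, 0.0],
                      [-0.04, 0.01, 0.0]])
ranking = rank_lags(shap_demo)
```

Taking the absolute value before averaging prevents positive and negative contributions of the same lag from cancelling, so the ranking reflects the size of a lag's influence rather than its direction.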

4.5. Comparison of SARIMA, GARCH, GRNN and ANN-Based ELM Models

To assess the forecasting performance of the best

SARIMA (0, 1, 1) {(2, 1, 2)}_{12}

, GARCH, GRNN and ANN-based ELM models, RMSE, MAE, MAPE, MFE and Theil’s U were computed for periods pre-COVID-19, COVID-19 and beyond, and for the overall periods. The results are summarised in Table 9.

According to the results presented in Table 9, the performance comparison of the models in the pre-COVID-19 regime reveals that the ANN-ELM model outperformed all other models on all the error metrics, implying that it captured the pre-COVID-19 patterns very well. This conclusion is supported by its lowest RMSE of 0.157 and MAE of 0.112, which indicate smaller forecast deviations compared to the other competing models. Additionally, all the models in the regime presented negative MFE; this indicates that the models overestimated the actual gold sales on average and that there is a systematic upward bias in the forecasts.

SARIMA (0, 1, 1) {(2, 1, 2)}_{12}

is the second-best performing model of the four and the GRNN model performed worst as it struggled to capture the pre-COVID-19 data patterns.

In the COVID-19 and beyond regime, all errors increased compared to the pre-COVID-19 regime, likely due to market volatility and structural changes. However, the ANN-ELM model again performed best, showing robustness even under extreme volatility. In the same regime, SARIMA’s performance dropped significantly, with RMSE increasing from 0.202 to 0.424 and MAE increasing from 0.141 to 0.347. Similar increases were observed in MAPE and Theil’s U, indicating a general deterioration in forecast accuracy under heightened volatility. Again, all models overestimated gold sales across the regime, as shown by the negative MFEs. GARCH and GRNN showed similarly large errors, highlighting the difficulty of capturing extreme events and nonlinearities during the period.

Over the full dataset, the results revealed that the

SARIMA (0, 1, 1) {(2, 1, 2)}_{12}

model achieved the best results, with the lowest RMSE of 0.260, MAE of 0.184 and MFE of -0.016. This indicates its effectiveness in capturing the underlying patterns and seasonality in the data. The ANN-based ELM model followed as the second-best performer, outperforming both the GRNN and GARCH models with an MAPE value of 45.68 and Theil’s U of 0.241. The results showed that the GARCH model performed worse than both the

SARIMA (0, 1, 1) {(2, 1, 2)}_{12}

and ANN-based ELM models but better than the GRNN model. Over the full sample, the GRNN model demonstrated the weakest performance, indicating its relative unsuitability for this dataset. Although ANN-ELM achieved the lowest MAPE (45.68%) and Theil’s U (0.241), indicating better relative percentage accuracy and benchmark performance, SARIMA maintained superior performance in terms of absolute error minimisation, as reflected by RMSE and MAE. This suggests that while ANN-ELM provides competitive performance, SARIMA provides a more stable overall forecast accuracy across the full sample period.

Overall,

SARIMA (0, 1, 1) {(2, 1, 2)}_{12}

was selected as the best performing model due to its ability to effectively capture seasonal and trend patterns in the overall sample. It demonstrated consistency and reliability across the full dataset, providing stable and interpretable forecasts that balance accuracy and practical decision-making. The results highlight the strength of traditional statistical approaches such as the specified SARIMA model in this context, while also acknowledging the potential of advanced ML models such as ANN-based ELM. Therefore, it is concluded that the selected traditional model performed effectively in modelling the South African gold sales data. This contradicts the study by [17], which found the opposite. The study in [23] also revealed that the ANN outperformed the traditional ARIMA, which contradicts the findings of the current study. These differences may be attributed to variations in datasets, volatility conditions, model tuning procedures and evaluation frameworks.

4.6. DM Test Results for Forecast Comparison

Table 10 presents the results of the DM test for the overall sample.

The results in Table 10 suggest that the difference in forecast accuracy between the

SARIMA (0, 1, 1) {(2, 1, 2)}_{12}

and ANN-ELM models is not statistically significant at the 5% level. This implies that although the SARIMA model exhibits marginally lower forecast errors, the improvement is not statistically distinguishable from that of ANN-ELM. The DM test therefore does not by itself establish the superiority of either model; the

SARIMA (0, 1, 1) {(2, 1, 2)}_{12}

model is retained as the preferred model for the prediction of gold sales on the basis of its lower RMSE and MAE over the full sample.

4.7. One Step Ahead Forecast

Table 11 presents a two-year forecast using SARIMA as the best performing model.

Table 11 presents the two-year SARIMA forecasts for the log-differenced monthly gold sales. Positive forecast values indicate expected month-to-month increases in gold sales, whereas negative values signal anticipated declines. The forecasts reveal a clear pattern of alternating gains and losses, reflecting the inherently volatile behaviour of gold sales. In the short-term horizon (the first six months of the forecast), the model predicts several negative movements interspersed with modest positive corrections. This pattern suggests temporary downward pressure on gold sales, accompanied by short-lived recoveries rather than sustained growth.
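Forecasts expressed as log-differences can be mapped back to levels by cumulating them and exponentiating. The sketch below illustrates this under the assumption that each forecast value is the first difference of the natural logarithm of monthly gold sales; the function name and all numbers are hypothetical and are not taken from Table 11.

```python
import numpy as np

def levels_from_log_differences(last_level, log_diff_forecasts):
    """Turn forecast log-differences r_t into levels: y_t = y_{t-1} * exp(r_t)."""
    r = np.asarray(log_diff_forecasts, dtype=float)
    return last_level * np.exp(np.cumsum(r))

# Hypothetical numbers, for illustration only
last_observed = 100.0
forecast_r = np.array([-0.05, 0.02, 0.03])
path = levels_from_log_differences(last_observed, forecast_r)
```

In this toy path the first negative value lowers the level and the two positive values that follow bring it back to its starting point, which mirrors the reading of positive and negative forecast values as month-to-month increases and declines.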

During the medium-term period (six to twelve months ahead), gold sales are forecast to continue exhibiting volatility, with fluctuations between positive and negative changes. However, the magnitude of these movements appears to moderate relative to the short-term period, indicating a gradual reduction in extreme adjustments. In the long-term horizon (twelve to twenty-four months ahead), the forecasts point to modest but more consistent positive values, suggesting an emerging upward momentum. Although occasional negative shocks remain present, the overall pattern implies a degree of stabilisation and a slow increase in gold sales over time. Overall, the SARIMA model effectively captures the volatile yet mean-reverting nature of gold sales dynamics. The results indicate that while short-term fluctuations and uncertainty are likely to persist, the longer-term outlook suggests a gradual recovery and potential stabilisation of gold sales toward the end of the forecast horizon. Figure 8 shows a graphical comparison of the differenced gold sales with their corresponding SARIMA forecasted values.

Figure 8 presents the two-year-ahead forecast generated by the SARIMA model using the log-differenced series. The black solid line represents the historical log-differenced gold sale values, while the red dashed line illustrates the SARIMA forecast for the next 24 months. The SARIMA forecast remains centred close to zero throughout the forecast horizon. This suggests that the model expects future changes in gold sales to fluctuate around the historical average growth rate rather than exhibit a strong upward or downward trend. The forecast also shows relatively stable and moderate fluctuations, indicating that SARIMA primarily captures the linear and seasonal components of the series but does not project extreme volatility, as observed in some historical periods. Economically, this stability implies relatively balanced market expectations where supply and demand conditions are likely to remain steady in the short- to medium-term.

5. Conclusions and Recommendations

The study investigated the forecasting performance of the SARIMA, GARCH, GRNN and ANN-based ELM using the South African gold sales series. A visual inspection suggested that the series is non-stationary. The formal test of stationarity confirmed that the gold sales series is non-stationary at level and stationary at first difference. Therefore, the series is integrated of order one,

I (1)

. The findings from the linear process revealed that

SARIMA (0, 1, 1) {(2, 1, 2)}_{12}

is the best model for the gold sales series. In contrast, ref. [23] found that the ANN model outperformed the ARIMA model in forecasting the gold price. The findings from GARCH revealed that past shocks or asymmetries have an important effect on current volatility and there is persistence in the volatility process.

The findings from the nonlinear process revealed that the ANN-based ELM performed better than GRNN and GARCH models. The results revealed that both the SARIMA model and the ANN-based ELM model can deliver accurate forecasts when applied to real-world scenarios. The overall findings revealed that the linear model outperformed the nonlinear models when comparing the forecasting accuracy.

The findings of this study carry broader implications for economic forecasting and analytical methodology. The superior performance of the traditional SARIMA model over more complex ML techniques supports the view that methodological sophistication does not automatically guarantee forecasting accuracy. The optimal model choice is inherently dependent on the characteristics of the data, reinforcing the continuing value and interpretability of classical time-series approaches. This underscores the importance of robust preprocessing, such as achieving stationarity, as a non-negotiable first step for any modelling exercise. The SARIMA forecasts suggest that the gold sales series will remain volatile in the short- to medium-term, with alternating gains and losses, before showing a modest upward momentum over the long term. Economically, this indicates that gold will continue to serve as a hedge against inflation and market uncertainty, supporting its role in portfolio diversification and risk management. For gold-producing economies and mining firms, short-term price fluctuations may affect revenues and trade balances, while long-term appreciation could enhance profitability and external reserves. Overall, the results highlight the need for active investment and risk strategies in response to gold’s inherent volatility.

For practitioners in the resource sector and beyond, these results suggest that reliable forecasting and risk management, essential for strategic planning, can often be achieved with robust, traditional statistical models. Ultimately, this research provides a cautionary benchmark against the uncritical adoption of advanced algorithms and establishes a comparative framework for evaluating model performance on other volatile economic time series. In support of the findings, ref. [15] also determined that the SARIMA models could be a useful instrument for policymakers and researchers in formulating climate-resilient techniques for this area. By reinforcing the empirical and methodological relevance of SARIMA in seasonal financial time-series forecasting, this study contributes to both the econometric modelling literature and applied financial analytics.

The primary research contribution of this study is its empirical demonstration that, for the specific case of non-stationary South African gold sales data, a well-specified traditional linear model (SARIMA) outperformed both the GARCH model and the more advanced nonlinear ML approaches (GRNN and ANN-ELM) in forecasting accuracy. This finding challenges the prevailing assumption that complex ML models inherently deliver superior results for economic time-series forecasting. Furthermore, the study provides a rigorous comparative framework for evaluating model performance, highlighting the critical importance of model selection based on data characteristics rather than algorithmic complexity, and offers valuable insights for stakeholders in resource economics by identifying the most reliable forecasting tools for strategic decision-making.

Despite its contributions, this study has several limitations. First, the analysis is restricted to the South African gold sales series, which may limit the generalisability of the findings to other commodities, countries, or macroeconomic indicators. Second, while the study compares traditional linear models (SARIMA and GARCH) with selected nonlinear ML models (GRNN and ANN-based ELM), it does not include more advanced deep learning architectures such as Long Short-Term Memory (LSTM) or Transformer models. Third, this study did not take into account potential structural breaks and regime shifts in the gold sector. Additionally, the study used nominal gold sales values without adjusting for inflation, despite its potential influence on gold sales values. Lastly, this study relies on monthly data spanning January 2003 to July 2024, and the results may be sensitive to the frequency and length of the time series.

Building upon these findings, future research should explore the development and application of hybrid models that integrate the strengths of linear SARIMA frameworks with the pattern-recognition capabilities of nonlinear ANN-based ELM models, potentially leveraging SARIMA to capture linear components and ANNs to model residuals, thereby enhancing overall forecast accuracy. Subsequent studies could also validate these comparative results across different commodity markets (such as platinum, diamonds, or crude oil) and economic indicators to determine the generalisability of the findings and identify the specific data conditions under which ML models might outperform traditional models. Future studies may benefit from incorporating inflation-adjusted (real) gold sales values or including inflation as an explanatory variable to better understand the economic significance of the results. Furthermore, employing more advanced deep learning architectures, such as LSTM or Transformer networks, could provide a more rigorous benchmark for assessing the potential of complex nonlinear methods in economic forecasting using daily data. Future studies may also consider rolling window evaluation schemes and alternative loss functions to provide more comprehensive benchmarking.

Author Contributions

Conceptualization, J.T.T., D.M. and T.B.; Methodology, J.T.T.; Formal analysis, T.B.; Data curation, J.T.T.; Writing—original draft, J.T.T., D.M. and T.B.; Writing—review & editing, J.T.T., D.M. and T.B. All authors have read and agreed to the published version of the manuscript.

Funding

This study received no external funding.

Informed Consent Statement

Not applicable.

Data Availability Statement

The study employed secondary data obtained from the Statistics South Africa (StatsSA) website repository. The data presented in this study are openly available from Statistics South Africa at https://www.statssa.gov.za (accessed on 12 March 2026).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ACF	Autocorrelation function
ADF	Augmented Dickey–Fuller
AIC	Akaike information criterion
ANFIS	Adaptive neuro-fuzzy inference system
ANN	Artificial neural network
AR	Autoregressive
BIC	Bayesian information criterion
CDD	Cooling degree days
DM	Diebold–Mariano
ELM	Extreme learning machine
GA	Genetic algorithm
GARCH	Generalised autoregressive conditionally heteroscedastic
GAS	Generalised autoregressive score
GRNN	General regression neural network
GRU	Gated recurrent unit
LSTM	Long short-term memory
MA	Moving average
MAE	Mean absolute error
MAPE	Mean absolute percentage error
ML	Machine learning
MLE	Maximum likelihood estimation
MP	Moore–Penrose
MSE	Mean squared error
PACF	Partial autocorrelation function
RMSE	Root mean square error
SARIMA	Seasonal autoregressive integrated moving average
SLFNs	Single hidden layer feed-forward neural networks
StatsSA	Statistics South Africa

References

1. Dyan, C.I. South Africa’s Stolen Gold: A Legal Analysis of the Impact of Illegal Mining on South Africa’s National Security. Master’s Thesis, University of Cape Town, Cape Town, South Africa, 2022.
2. Ericsson, M.; Löf, O.; Löf, A. Chinese Control over African and Global Mining—Past, Present and Future. Miner. Econ. 2020, 33, 153–181.
3. Tilton, J.E.; Guzmán, J.I. Mineral Economics and Policy; Routledge: Oxfordshire, UK, 2016.
4. Box, G.E.P.; Jenkins, G.M. Time Series Analysis: Forecasting and Control; Holden-Day: San Francisco, CA, USA, 1976.
5. Engle, R.F. Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation. Econometrica 1982, 50, 987–1007.
6. Bollerslev, T. Generalized Autoregressive Conditional Heteroskedasticity. J. Econom. 1986, 31, 307–327.
7. Creal, D.; Koopman, S.J.; Lucas, A. Generalized Autoregressive Score Models with Applications. J. Appl. Econom. 2013, 28, 777–795.
8. Wang, Y.; Lin, T. A Novel Deterministic Probabilistic Forecasting Framework for Gold Price with a New Pandemic Index Based on Quantile Regression Deep Learning and Multi-Objective Optimization. Mathematics 2023, 12, 29.
9. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning; Springer Series in Statistics; Springer: New York, NY, USA, 2001.
10. Specht, D.F. A General Regression Neural Network. IEEE Trans. Neural Netw. 1991, 2, 568–576.
11. Huang, G.-B.; Zhu, Q.-Y.; Siew, C.-K. Extreme Learning Machine: Theory and Applications. Neurocomputing 2006, 70, 489–501.
12. Zhang, G.; Patuwo, B.E.; Hu, M.Y. Forecasting with Artificial Neural Networks: The State of the Art. Int. J. Forecast. 1998, 14, 35–62.
13. Tsoku, J.T.; Phukuntsi, N.; Metsileng, D. Gold Sales Forecasting: The Box-Jenkins Methodology. Change 2013, 7, 54–60.
14. Yang, X. The Prediction of Gold Price Using ARIMA Model. In Proceedings of the 2nd International Conference on Social Science, Public Health and Education (SSPHE 2018); Atlantis Press: Dordrecht, The Netherlands, 2019; pp. 273–276.
15. Ray, S.; Das, S.S.; Mishra, P.; Al Khatib, A.M.G. Time Series SARIMA Modelling and Forecasting of Monthly Rainfall and Temperature in the South Asian Countries. Earth Syst. Environ. 2021, 5, 531–546.
16. Bunnag, T. The Importance of Gold’s Effect on Investment and Predicting the World Gold Price Using the ARIMA and ARIMA-GARCH Model. Ekon. J. Econ. 2024, 2, 38–52.
17. Hafezi, R.; Akhavan, A.N. Forecasting Gold Price Changes: Application of an Equipped Artificial Neural Network. AUT J. Model. Simul. 2018, 50, 71–82.
18. Bilgili, M. Time Series Forecasting on Cooling Degree-Days (CDD) Using SARIMA Model. Nat. Hazards 2023, 118, 2569–2592.
19. Saranya, P.B. Modelling and Forecasting Gold Prices Using ARIMA. Asian J. Res. Bus. Econ. Manag. 2021, 11, 1–10.
20. Bhattacharya, R. Seasonal Dynamics in Gold Price Forecasting: A Comparative Analysis of ARIMA and SARIMA Models for Retail Gold Prices-Evidence from Daily Data, 2014–2025. IJTAS—Int. J. Technol. Appl. Sci. 2025, 16, 1–17.
21. Bagrecha, C.; Singh, K.; Sharma, G.; Saranya, P.B. Forecasting Silver Prices: A Univariate ARIMA Approach and a Proposed Model for Future Direction. Miner. Econ. 2025, 38, 131–141.
22. Hong, U.; Majid, N. Comparison of ARIMA Model and Artificial Neural Network in Forecasting Gold Price. J. Qual. Meas. Anal. 2021, 17, 31–39.
23. Setyowibowo, S.; As’ad, M.; Sujito, S.; Farida, E. Forecasting of Daily Gold Price Using ARIMA-GARCH Hybrid Model. J. Ekon. Pembang. 2022, 19, 257–270.
24. Long, Y. Visibility Graph Network Analysis of Gold Price Time Series. Phys. A Stat. Mech. Its Appl. 2013, 392, 3374–3384.
25. Sudiatmika, I.P.G.A.; Putra, I.M.A.W.; Artana, W.W. The Implementation of Gated Recurrent Unit (GRU) for Gold Price Prediction Using Yahoo Finance Data: A Case Study and Analysis. Brill. Res. Artif. Intell. 2024, 4, 176–184.
26. Alsuwaylimi, A.A. Comparison of ARIMA, ANN and Hybrid ARIMA-ANN Models for Time Series Forecasting. Inf. Sci. Lett. 2023, 12, 1003–1016.
27. Bin Khamis, A.; Yee, P.H. A Hybrid Model of Artificial Neural Network and Genetic Algorithm in Forecasting Gold Price. Eur. J. Eng. Technol. Res. 2018, 3, 10–14.
28. Bergmeir, C.; Benítez, J.M. On the Use of Cross-Validation for Time Series Predictor Evaluation. Inf. Sci. 2012, 191, 192–213.
29. Brownlee, J. Ensemble Learning Algorithms with Python: Make Better Predictions with Bagging, Boosting, and Stacking; Machine Learning Mastery: San Francisco, CA, USA, 2021.
30. Dickey, D.A.; Fuller, W.A. Distribution of the Estimators for Autoregressive Time Series with a Unit Root. J. Am. Stat. Assoc. 1979, 74, 427–431.
31. Kwiatkowski, D.; Phillips, P.C.; Schmidt, P.; Shin, Y. Testing the Null Hypothesis of Stationarity against the Alternative of a Unit Root: How Sure Are We That Economic Time Series Have a Unit Root? J. Econom. 1992, 54, 159–178.
32. Ljung, G.M.; Box, G.E. On a Measure of Lack of Fit in Time Series Models. Biometrika 1978, 65, 297–303.
33. Bowerman, B.L.; O’Connell, R.T.; Koehler, A.B. Forecasting, Time Series, and Regression: An Applied Approach; Thomson Brooks/Cole: Monterey, CA, USA, 2005.
34. Matei, M.; Rovira, X.; Agell, N. Bivariate Volatility Modeling with High-Frequency Data. Econometrics 2019, 7, 41.
35. Hu, R.; Wen, S.; Zeng, Z.; Huang, T. A Short-Term Power Load Forecasting Model Based on the Generalized Regression Neural Network with Decreasing Step Fruit Fly Optimization Algorithm. Neurocomputing 2017, 221, 24–31.
36. Kim, B.; Lee, D.W.; Park, K.Y.; Choi, S.R.; Choi, S. Prediction of Plasma Etching Using a Randomized Generalized Regression Neural Network. Vacuum 2004, 76, 37–43.
37. Cigizoglu, H.K. Generalized Regression Neural Network in Monthly Flow Forecasting. Civ. Eng. Environ. Syst. 2005, 22, 71–81.
38. Acharya, N.; Shrivastava, N.A.; Panigrahi, B.K.; Mohanty, U.C. Development of an Artificial Neural Network Based Multi-Model Ensemble to Estimate the Northeast Monsoon Rainfall over South Peninsular India: An Application of Extreme Learning Machine. Clim. Dyn. 2014, 43, 1303–1310.
39. Huang, G.-B.; Chen, L.; Siew, C.-K. Universal Approximation Using Incremental Constructive Feedforward Networks with Random Hidden Nodes. IEEE Trans. Neural Netw. 2006, 17, 879–892.
40. Huang, G.-B.; Li, M.-B.; Chen, L.; Siew, C.-K. Incremental Extreme Learning Machine with Fully Complex Hidden Nodes. Neurocomputing 2008, 71, 576–583.
41. Chong, E.K.; Lu, W.-S.; Zak, S.H. An Introduction to Optimization: With Applications to Machine Learning; John Wiley & Sons: Hoboken, NJ, USA, 2023.
42. Zhang, J.; Xiao, W.; Zhang, S.; Huang, S. Device-Free Localization via an Extreme Learning Machine with Parameterized Geometrical Feature Extraction. Sensors 2017, 17, 879.
43. Diebold, F.X.; Mariano, R.S. Comparing Predictive Accuracy. J. Bus. Econ. Stat. 2002, 20, 134–144.

Figure 1. Schematic diagram of a GRNN architecture. Source: [37] p. 74.

Figure 2. Schematic representation of the structure of ELM. Source: [42].

Figure 3. Gold sales plot.

Figure 4. ACF and PACF plots.

Figure 5. SARIMA observed vs. fitted values.

Figure 6. SHAP value diagram for ANN-ELM.

Figure 7. SHAP value diagram for GRNN.

Figure 8. SARIMA forecasted values.

Table 1. Data handling and preprocessing for all models.

Model	Data Transformation	Data Split	Reason
SARIMA	Differencing	Full log-differenced series	Stationarity requirement
GARCH	Differencing	Full log-differenced series	Volatility modelling requires a stationary series
ANN/GRNN/ELM	Differencing + normalisation (Min–Max [0, 1])	80% training and 20% testing	Stationarity + scale sensitivity of ML models
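As a minimal sketch of the preprocessing summarised in Table 1, the steps for the neural network models (log-differencing, a chronological 80/20 split and Min–Max scaling) could be written as follows. The function names are hypothetical, and fitting the scaling bounds on the training portion only is an assumption made here to avoid information leaking from the test set; the table does not state how the bounds were obtained.

```python
import math
from typing import List, Sequence, Tuple


def log_diff(levels: Sequence[float]) -> List[float]:
    """First difference of the natural log of a strictly positive series."""
    return [math.log(b) - math.log(a) for a, b in zip(levels, levels[1:])]


def chronological_split(
    values: Sequence[float], train_frac: float = 0.8
) -> Tuple[List[float], List[float]]:
    """Split without shuffling so the test set follows the training set in time."""
    cut = int(len(values) * train_frac)
    return list(values[:cut]), list(values[cut:])


def fit_min_max(train: Sequence[float]) -> Tuple[float, float]:
    """Return the (min, max) bounds of the training data."""
    lo, hi = min(train), max(train)
    if hi == lo:
        raise ValueError("training data are constant; Min-Max scaling is undefined")
    return lo, hi


def apply_min_max(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Map values to [0, 1] relative to the training bounds.

    Test values outside the training range fall outside [0, 1].
    """
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sales levels, used only to exercise the functions.
levels = [100.0, 104.0, 99.0, 108.0, 112.0, 107.0, 115.0, 121.0, 118.0, 125.0, 130.0]
returns = log_diff(levels)
train, test = chronological_split(returns)
lo, hi = fit_min_max(train)
train_scaled = apply_min_max(train, lo, hi)
test_scaled = apply_min_max(test, lo, hi)
print(len(train), len(test))
```

Forecasts produced on the scaled data would need to be mapped back with the same bounds before being compared with the SARIMA and GARCH results, which are reported on the log-differenced scale.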

Table 2. EDA results on gold sales.

No. of Observations	Mean	Median	Variance	Std Dev.	Min	Max
259	5602.078	5114.900	8,982,938.184	2997.155	1477.100	20,492.500

Table 3. Gold sales stationarity test results.

Test	Test Statistic	Probability (p-Value)
ADF test results at level	0.487	0.985
KPSS test results at level	2.251	0.010
ADF test results at first difference	−7.905	0.000
KPSS test results at first difference	0.089	0.100

Table 4. Parameter estimate results of the $SARIMA (0, 1, 1) {(2, 1, 2)}_{12}$ model on differenced data.

Variables	Coefficient	Standard Error	Z	P > |z|
MA (1)	0.797	0.033	−23.847	0.000
Seasonal AR (12)	1.546	1.5455	0.290	0.000
Seasonal AR (24)	0.637	0.285	−2.236	0.025
Seasonal MA (12)	1.515	0.273	−5.542	0.000
Seasonal MA (24)	0.698	0.234	2.981	0.003
Sigma2	0.067	0.004	15.055	0.000

Table 5. Diagnostic test results of the fitted $SARIMA (0, 1, 1) {(2, 1, 2)}_{12}$ model on differenced data.

Test	Test Statistic	Probability (p-Value)
JB Test	48.73	0.00
Ljung–Box (L1) Q	0.290	0.590

Table 6. Parameter estimate results of GARCH (0,1) model.

Parameters	Estimate	Std. Error	t-Value	Pr (>|t|)
$µ$	−0.017	0.012	−1.405	0.160
$Ω$	0.003	0.003	0.954	0.340
$γ_{1}$	0.355	0.164	2.162	0.031
$β_{1}$	0.823	0.104	7.907	0.000

Table 7. GRNN SHAP values for feature importance.

Feature	Mean Absolute SHAP
lag_1	0.053
lag_6	0.026
lag_7	0.019
lag_2	0.010
lag_9	0.009
lag_10	0.001
lag_4	0.001
lag_11	0.000
lag_8	0.000
lag_3	0.000
lag_5	0.000
lag_12	0.000

Table 8. ANN-ELM SHAP values for feature importance.

Feature	Mean Absolute SHAP
lag_12	0.019
lag_7	0.012
lag_6	0.010
lag_9	0.009
lag_10	0.008
lag_1	0.007
lag_2	0.006
lag_4	0.004
lag_8	0.004
lag_3	0.002
lag_5	0.002
lag_11	0.001

Table 9. Comparison results of SARIMA, GARCH, GRNN and ANN-based ELM on test data.

Model	RMSE	MAE	MAPE (%)	MFE	Theil’s U
Pre-COVID-19
$SARIMA (0, 1, 1) {(2, 1, 2)}_{12}$	0.202	0.141	263.90	−0.009	0.502
GARCH	0.247	0.172	110.58	−0.013	0.968
GRNN	0.374	0.346	100	−0.346	1.000
ANN-ELM	0.157	0.112	41.77	0.062	0.200
COVID-19 and beyond
$SARIMA (0, 1, 1) {(2, 1, 2)}_{12}$	0.424	0.347	90.240	−0.080	0.437
GARCH	0.607	0.508	100.59	−0.002	0.967
GRNN	0.602	0.515	100	−0.515	1.000
ANN-ELM	0.359	0.279	65.17	−0.184	0.385
Overall sample
$SARIMA (0, 1, 1) {(2, 1, 2)}_{12}$	0.260	0.184	248.30	−0.016	0.451
GARCH	0.356	0.243	110.89	−0.018	0.972
GRNN	0.441	0.390	100	−0.389	1.000
ANN-ELM	0.284	0.255	45.68	−0.044	0.241
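The accuracy measures reported in Table 9 can be computed as in the following sketch. Several conventions are assumptions rather than details confirmed by the table: errors are taken as actual minus forecast, MAPE skips observations whose actual value is exactly zero, and Theil's U is taken to be the U1 form (RMSE divided by the sum of the root mean squares of the actual and forecast series), which equals 1 when every forecast is zero.

```python
import math
from typing import Dict, Sequence


def forecast_metrics(actual: Sequence[float], forecast: Sequence[float]) -> Dict[str, float]:
    """Return RMSE, MAE, MAPE (%), MFE and Theil's U1 for paired observations."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    n = len(actual)
    errors = [a - f for a, f in zip(actual, forecast)]  # assumed sign convention

    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mfe = sum(errors) / n  # mean forecast error (bias)

    # MAPE divides by the actual value, so it is unstable when actual values
    # are close to zero, as is common for differenced series.
    ratios = [abs(e / a) for e, a in zip(errors, actual) if a != 0.0]
    mape = 100.0 * sum(ratios) / len(ratios) if ratios else float("nan")

    rms_actual = math.sqrt(sum(a * a for a in actual) / n)
    rms_forecast = math.sqrt(sum(f * f for f in forecast) / n)
    denom = rms_actual + rms_forecast
    theil_u1 = rmse / denom if denom > 0.0 else float("nan")

    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "MFE": mfe, "TheilU1": theil_u1}


# Hypothetical values: a forecast that is always zero.
demo = forecast_metrics([1.0, 2.0, -1.0, -2.0], [0.0, 0.0, 0.0, 0.0])
print(demo)
```

Under these conventions, a forecast that is always zero yields a MAPE of 100% and a Theil's U1 of 1, regardless of the data.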

Table 10. DM test results.

Models	DM-Statistic	p-Value
$SARIMA (0, 1, 1) {(2, 1, 2)}_{12}$ vs. ANN-ELM	1.415	0.157
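For readers who wish to reproduce a test of this kind, the following is a minimal sketch of a one-step-ahead DM statistic with squared-error loss and a two-sided p-value from the standard normal distribution. The loss function, the variance estimator (no autocovariance correction, which is only appropriate for one-step forecasts) and the sign convention are assumptions; Table 10 does not specify them.

```python
import math
from typing import Sequence


def dm_statistic(errors_a: Sequence[float], errors_b: Sequence[float]) -> float:
    """DM statistic for one-step-ahead forecasts under squared-error loss.

    Positive values indicate that model A has the larger average loss.
    """
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("error series must have equal length of at least 2")
    n = len(errors_a)
    diffs = [ea * ea - eb * eb for ea, eb in zip(errors_a, errors_b)]  # loss differential
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / n
    if var_d == 0.0:
        raise ValueError("loss differential has zero variance")
    return mean_d / math.sqrt(var_d / n)


def two_sided_p_value(statistic: float) -> float:
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(statistic) / math.sqrt(2.0))


# Hypothetical forecast errors, used only to exercise the function.
stat = dm_statistic([1.0, -1.0, 2.0, -2.0], [0.5, -0.5, 1.0, -1.0])
print(round(stat, 3), round(two_sided_p_value(1.415), 3))
```

Applying `two_sided_p_value` to the statistic of 1.415 reported in Table 10 gives approximately 0.157, which agrees with the tabulated p-value under a normal approximation.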

Table 11. SARIMA two-year forecast.

Month Start	Month End	Forecasts	Month Start	Month End	Forecasts
1 August 2024	31 August 2024	−0.386	1 August 2025	31 August 2025	0.069
1 September 2024	30 September 2024	−0.162	1 September 2025	30 September 2025	−0.189
1 October 2024	31 October 2024	0.184	1 October 2025	31 October 2025	0.216
1 November 2024	30 November 2024	−0.147	1 November 2025	30 November 2025	−0.166
1 December 2024	31 December 2024	−0.018	1 December 2025	31 December 2025	−0.007
1 January 2025	31 January 2025	0.189	1 January 2026	31 January 2026	0.205
1 February 2025	28 February 2025	−0.415	1 February 2026	28 February 2026	−0.443
1 March 2025	31 March 2025	0.250	1 March 2026	31 March 2026	0.258
1 April 2025	30 April 2025	−0.179	1 April 2026	30 April 2026	−0.191
1 May 2025	31 May 2025	0.158	1 May 2026	31 May 2026	0.165
1 June 2025	30 June 2025	−0.006	1 June 2026	30 June 2026	0.071
1 July 2025	31 July 2025	0.145	1 July 2026	31 July 2026	0.105

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Tsoku, J.T.; Metsileng, D.; Botlhoko, T. Modelling South African Gold Sales Using SARIMA, GARCH and Neural Networks. Mathematics 2026, 14, 1289. https://doi.org/10.3390/math14081289

