Article

Bayesian Deep Learning and Probabilistic Forecasting of Stock Prices

by Ndivhuwo Nelufhangani 1 and Daniel Maposa 2,*
1 School of Mathematical and Computer Sciences, University of Limpopo, Sovenga 0727, South Africa
2 Department of Statistics and Operations Research, University of Limpopo, Sovenga 0727, South Africa
* Author to whom correspondence should be addressed.
Algorithms 2026, 19(5), 391; https://doi.org/10.3390/a19050391
Submission received: 13 March 2026 / Revised: 6 May 2026 / Accepted: 8 May 2026 / Published: 14 May 2026

Abstract

This study investigates the effectiveness of Bayesian probabilistic methods for stock price forecasting on the Johannesburg Stock Exchange by implementing and comparing Gaussian process regression (GPR), Bayesian long short-term memory (Bayesian LSTM), and Bayesian neural networks (BNNs). Using daily open, high, low, close, and volume (OHLCV) data and engineered technical indicators for FirstRand and Discovery from January 2005 to June 2025 (5187 observations), models were trained and evaluated with the mean absolute error (MAE), root mean squared error (RMSE), and mean squared error (MSE). The GPR produced reliable, well-calibrated intervals in relatively stable regimes, but its performance degraded on the more volatile Discovery series. Bayesian LSTM delivered conservative uncertainty estimates with wide predictive intervals but showed the largest point forecast errors. The BNNs achieved the best balance between accuracy and uncertainty quantification, producing the lowest errors for FirstRand and competitive performance for Discovery. Comparative analysis indicates that BNNs are most suitable when point accuracy and calibrated uncertainty are both priorities, GPR is valuable for smaller or more stable data regimes, and Bayesian LSTM is preferable where conservative, risk-conscious intervals are required. This study highlights the practical value of embedding uncertainty into financial forecasts and recommends matching Bayesian model choice to market volatility, data availability, and decision maker risk appetite.

1. Introduction

The stock market is a dynamic and volatile environment influenced by numerous unpredictable factors, including economic indicators, corporate actions, geopolitical events, and investor sentiment [1]. These factors contribute to a constantly changing environment that poses both potential gains and significant risks for market participants. Accurate forecasting of stock prices is essential for investors, policymakers, and financial analysts to make informed, risk-conscious decisions [2]. It allows for more strategic asset allocation, enhances portfolio management, and supports regulatory planning. However, the complexity and nonlinear nature of financial time series make this task particularly challenging [3].
Recent advances in machine learning and deep learning have improved predictive accuracy by capturing complex nonlinear patterns in financial data; however, most existing models remain fundamentally deterministic and provide only point estimates [4]. In high-risk financial environments, the absence of uncertainty information limits their practical usefulness, as decision makers require not only expected outcomes but also an understanding of forecast confidence and potential risk exposure. This has motivated growing interest in Bayesian probabilistic models, which offer a principled framework for integrating uncertainty into stock price forecasting by producing full predictive distributions rather than single-value predictions. Such uncertainty-aware forecasting is particularly valuable during periods of market instability, where reliable risk quantification is as critical as point prediction accuracy.
Despite the growing body of work on Bayesian forecasting, a critical gap remains: existing studies tend to evaluate Bayesian models in isolation, making direct performance comparisons unreliable due to differences in datasets, preprocessing pipelines, and evaluation protocols. Moreover, the majority of such studies focus on developed markets such as the United States or European exchanges, leaving emerging markets and the Johannesburg Stock Exchange (JSE) in particular substantially under-explored. This study directly addresses both gaps by conducting the first unified, condition-controlled comparison of three leading Bayesian approaches on JSE-listed securities, thereby providing evidence that is both methodologically rigorous and contextually novel.

1.1. Rationale

Stock markets are influenced by multiple unpredictable factors [5]. Traditional statistical and machine learning models have made a substantial impact in forecasting stock prices, yet they fail to capture the inherent uncertainty and risk associated with financial markets. The major limitation of conventional machine learning models is their inability to quantify uncertainty, which is critical for risk assessment in financial decision making [1,6,7].
Bayesian probabilistic models, such as Bayesian neural networks (BNNs), Gaussian process regression (GPR), and Bayesian long short-term memory (Bayesian LSTM), have emerged as effective approaches for quantifying uncertainty, thereby offering more robust and interpretable stock price predictions. Recent research highlights the advantages of Bayesian LSTM over conventional deep learning models in stock price forecasting [8]. BNNs provide an advanced approach by introducing Bayesian inference into neural networks, which allows for uncertainty quantification in predictions [9]. The results in [9] demonstrate that BNNs outperform traditional methods and provide better uncertainty quantification, especially in high-volatility periods such as the COVID-19 pandemic.

1.2. Review of the Literature

Traditional time series models have been the foundation for stock forecasting. The AutoRegressive Integrated Moving Average (ARIMA) model excels at modeling linear trends and seasonal structure in historical price series [10]. ARIMA assumes stationarity (after differencing) and fits autoregressive and moving average terms to the data. In finance, however, pure ARIMA often underestimates volatility, which led to the introduction of Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) models to capture time-varying volatility in returns. Despite their theoretical appeal, traditional models have limitations, as they struggle with highly nonlinear patterns or regime shifts in stock data [11].
Recent advancements in machine learning and deep learning have significantly transformed stock market forecasting. Techniques such as artificial neural networks (ANN), support vector machines (SVMs), random forest (RF), and deep learning architectures such as LSTM networks have shown substantial improvements in capturing nonlinear dependencies and hidden patterns in financial data [12]. Ref. [13] contributed a comprehensive comparison of deep learning and machine learning algorithms, showing that LSTM performed best, followed by SVMs, with ANNs and RF lagging behind.
Bayesian approaches cast forecasting as probabilistic inference, explicitly modeling uncertainty. The BNN is a neural network with distributions over weights instead of point weights [9,14]. During training, Bayesian inference learns a posterior distribution over the weights, so that predictions are themselves distributions. Approximate inference schemes such as variational inference and Monte Carlo sampling make posterior inference tractable despite the high-dimensional parameter space.
The GPR represents a foundational technique in Bayesian machine learning, offering a non-parametric, probabilistic framework for regression tasks [1]. Unlike traditional models that provide single-point estimates, GPR defines a distribution over functions, allowing each prediction to be accompanied by an explicit measure of uncertainty. Recent studies have explored GPR in financial forecasting with promising results [10,15].
The integration of Bayesian inference into the LSTM architecture has led to Bayesian LSTM models that combine the sequential learning capabilities of LSTMs with the uncertainty modeling strengths of Bayesian methods [8,16]. Ref. [8] demonstrated that Bayesian LSTM consistently outperformed both RNNs and standard LSTM in terms of the Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Mean Squared Error (MSE) while producing well-calibrated prediction intervals.
A structured synthesis of the existing literature reveals three recurring themes: (1) the superiority of deep learning over classical statistical models for nonlinear financial data; (2) the advantage of Bayesian over deterministic approaches in terms of uncertainty quantification; and (3) the lack of standardized comparative studies across Bayesian methods on the same dataset. Table 1 summarizes the representative studies across these themes.
Despite these advancements, several research gaps remain. Many AI models, including deep learning, are difficult to interpret and often ignore uncertainty in their predictions. Crucially, GPR, Bayesian LSTM, and BNNs have typically been studied in isolation; no published work has placed all three under identical experimental conditions on the same financial time series, making it impossible to draw reliable conclusions about their relative merits. Furthermore, applications to emerging market equities, particularly on the JSE, are sparse. The present study is designed to fill both of these gaps simultaneously by implementing all three approaches within a unified pipeline on JSE data, using consistent preprocessing, identical train/validation/test splits, and the same evaluation metrics.

1.3. Research Highlights and Contributions of the Study

This study provides a comprehensive comparative evaluation of three Bayesian probabilistic methods for stock price forecasting on the Johannesburg Stock Exchange, demonstrating that BNNs achieve an optimal balance between predictive accuracy and uncertainty quantification, particularly for volatile financial time series.
The major contributions of this study are as follows:
  • Contribution to methods: A systematic implementation and comparison of three Bayesian approaches under standardized conditions using the same dataset, preprocessing pipeline, and evaluation metrics, addressing the gap in the literature where these models are typically evaluated in isolation.
  • Contribution to findings: Empirical evidence that BNNs outperform both GPR and Bayesian LSTM in terms of point forecast accuracy (lowest MAE and RMSE) while maintaining well-calibrated uncertainty intervals, making them the most practical choice for risk-conscious financial forecasting.
  • Identification of model-specific strengths: GPR excels in stable market regimes with limited data, Bayesian LSTM provides the most conservative (widest) uncertainty estimates suitable for risk-averse applications, and BNNs offer the best trade-off between accuracy and uncertainty quantification.
  • Application to the South African context: First empirical comparison of these Bayesian methods on JSE-listed companies (FirstRand and Discovery), extending the literature on probabilistic forecasting to emerging markets.

2. Methods

2.1. Data Source and Study Area

This study utilized secondary data obtained from Yahoo Finance through the open-source Python package yfinance, which provides access to publicly available financial market data. The dataset comprised historical daily stock prices for companies listed on the JSE, with a focus on FirstRand Limited and Discovery Limited. The time frame spanned from January 2005 to June 2025, capturing 5187 observations including key market attributes: open, high, low, close (OHLC) prices and trading volume.

2.2. Data Preprocessing and Feature Engineering

Before model training, the raw financial data undergo several preprocessing steps. Stock prices are normalized using Z-score standardization (standard scaling) to improve model convergence and gradient stability. Z-score normalization was preferred over Min-Max scaling because the latter compresses values into a fixed [0, 1] range, which is highly sensitive to outliers, a common occurrence in financial time series. Log returns were not used as the primary normalization method because the models operate on price levels and require inverse transformation back to interpretable price units for evaluation. Z-score standardization produces values on an unbounded scale (including negative values, which reflect observations below the historical mean), and this is both expected and appropriate within the modeling framework.
Given a column of data, the mean and standard deviation were computed as follows:
$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\big(x_i - \mu\big)^2}. \tag{1}$$
Each value was then scaled as
$$f(x) = \frac{x - \mu}{\sigma}. \tag{2}$$
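Equations (1) and (2) can be sketched in a few lines of NumPy. This is a minimal illustration with hypothetical helper names (`zscore_fit`, `zscore_apply`), not the study's actual code; the parameters are fitted on the training split only, to avoid leakage into the validation and test sets:

```python
import numpy as np

def zscore_fit(train_col):
    """Mean and standard deviation from the training column only (Equation (1))."""
    mu = train_col.mean()
    sigma = train_col.std()  # NumPy's default ddof=0 matches the 1/n in Equation (1)
    return mu, sigma

def zscore_apply(col, mu, sigma):
    """Scale any split with the training-set parameters (Equation (2))."""
    return (col - mu) / sigma

# Toy column: the standardized training data has zero mean and unit variance.
x_train = np.array([10.0, 12.0, 14.0, 16.0])
mu, sigma = zscore_fit(x_train)
z = zscore_apply(x_train, mu, sigma)
```

Applying `zscore_apply` with the training-set `mu` and `sigma` to the validation and test columns reproduces the leakage-free procedure described below.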
To strengthen predictive power, feature engineering was applied to extract technical indicators that captured momentum, trends, and volatility. All five OHLCV columns were used as model inputs, with the closing price serving as the prediction target. The following technical indicators were computed from the closing price series using the pandas-ta library in Python:
  • Relative strength index (RSI): A 14-period RSI was computed, yielding a single bounded oscillator value per time step.
  • Moving average convergence divergence (MACD): The standard MACD configuration was used, with a fast exponential moving average (EMA) period of 12, slow EMA period of 26, and a signal line period of 9. Only the MACD line (fast EMA minus slow EMA) was retained as a feature; the signal line and histogram were excluded to avoid multicollinearity.
  • Simple moving average (SMA-20): A 20-period simple moving average of the closing price.
  • Exponential moving average (EMA-20): A 20-period exponential moving average of the closing price.
  • Bollinger bands (BBs): Computed with a 20-period window and 2 standard deviations. Only the upper and lower band values were retained as input features; the rolling mean is equivalent to the SMA-20, and the bandwidth was omitted to reduce redundancy.
All engineered technical indicators were subjected to the same Z-score standardization procedure (Equations (1) and (2)) applied to the raw OHLCV features. Standardization was applied separately to each feature column using statistics computed exclusively on the training set, with the resulting parameters then applied to the validation and test sets to prevent data leakage. The final input feature matrix therefore comprised 11 standardized columns: OHLCV, RSI, MACD, SMA-20, EMA-20, BB-Upper, and BB-Lower. The first 20 observations of the dataset (one full Bollinger/SMA window) were discarded due to NaN values arising from the rolling window computation, leaving 5167 usable observations.
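The study computed these indicators with pandas-ta; their arithmetic can be sketched directly in pandas. This is a simplified re-implementation under textbook definitions with our own helper names, not the library's code:

```python
import pandas as pd

def sma(close, n=20):
    """Simple moving average."""
    return close.rolling(n).mean()

def ema(close, n=20):
    """Exponential moving average (span-based smoothing)."""
    return close.ewm(span=n, adjust=False).mean()

def macd_line(close, fast=12, slow=26):
    """MACD line only: fast EMA minus slow EMA (signal line and histogram dropped)."""
    return ema(close, fast) - ema(close, slow)

def rsi(close, n=14):
    """Relative strength index from average rolling gains and losses."""
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(n).mean()
    loss = (-delta.clip(upper=0)).rolling(n).mean()
    return 100 - 100 / (1 + gain / loss)

def bollinger(close, n=20, k=2):
    """Upper and lower Bollinger bands around the n-period SMA."""
    mid, sd = sma(close, n), close.rolling(n).std()
    return mid + k * sd, mid - k * sd

close = pd.Series(range(1, 61), dtype=float)  # toy monotone price series
upper, lower = bollinger(close)
```

The leading NaN rows produced by the rolling windows correspond to the discarded first observations mentioned above.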
The data were split into training (60%), validation (20%), and testing (20%) sets. Rolling windows and sequence generation techniques were applied for the LSTM-based models. A sequence window of 60 time steps was used as input for the Bayesian LSTM, meaning each training sample consisted of 60 consecutive daily observations. For the BNN and GPR, the same 60-step window features were flattened into a single input vector per observation to conform to the feedforward and kernel-based architectures, respectively.
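The windowing step described above can be sketched as follows (hypothetical helper names; the LSTM consumes the 3-D tensor, while the BNN and GPR receive the flattened rows):

```python
import numpy as np

def make_sequences(features, target, window=60):
    """Stack each run of `window` consecutive rows as one sample; the target
    is the value at the step immediately after the window."""
    X = np.stack([features[t - window:t] for t in range(window, len(features))])
    y = np.asarray(target[window:len(features)])
    return X, y

def flatten_windows(X):
    """Collapse (samples, window, n_features) to 2-D for the BNN and GPR."""
    return X.reshape(X.shape[0], -1)

# Toy data: 100 days of 11 standardized features.
features = np.arange(100 * 11, dtype=float).reshape(100, 11)
target = np.arange(100, dtype=float)
X, y = make_sequences(features, target)
X_flat = flatten_windows(X)
```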

2.3. Bayesian Neural Networks (BNNs)

BNNs extend traditional feedforward neural networks by treating weights and biases as probability distributions rather than fixed parameters. The models were implemented in Python 3.11 using the following key libraries: PyTorch 2.0 for BNN implementation, TensorFlow/Keras 2.13 for Bayesian LSTM, and scikit-learn 1.3 for GPR. Supporting data manipulation and feature engineering relied on pandas 2.0, numpy 1.24, and pandas-ta 0.3. All experiments were conducted on a single GPU (NVIDIA GeForce RTX 3060, 12 GB of VRAM) under Ubuntu 22.04.
In this study, the BNN architecture comprised two fully connected hidden layers with 64 and 32 neurons using the ReLU activation function. A Gaussian prior with zero mean and unit variance was placed over all weights and biases. Variational inference was implemented via Bayes by Backprop [14], with the ELBO objective optimized using the Adam optimizer (learning rate = 0.001) over 200 epochs. A batch size of 32 and a dropout rate of 0.1 were applied for regularization. At test time, S = 100 Monte Carlo samples were drawn from the approximate posterior to estimate the predictive mean and variance. The prior distribution over weights W and biases b is typically Gaussian:
$$p(W, b) = \prod_i \mathcal{N}\big(W_i \mid 0, \sigma_W^2\big)\, \prod_j \mathcal{N}\big(b_j \mid 0, \sigma_b^2\big). \tag{3}$$
The posterior distribution given data D was obtained via Bayes’ theorem:
$$p(W, b \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid W, b)\, p(W, b)}{p(\mathcal{D})}. \tag{4}$$
Since the true posterior is intractable, variational inference was used to approximate it by minimizing the Kullback–Leibler divergence between the approximate distribution $q(W, b)$ and the true posterior:
$$\mathrm{KL}\big(q(W, b)\,\|\,p(W, b \mid \mathcal{D})\big). \tag{5}$$
The evidence lower bound (ELBO) objective is
$$\mathcal{L}(\theta) = \mathbb{E}_{q(W, b)}\big[\log p(\mathcal{D} \mid W, b)\big] - \mathrm{KL}\big(q(W, b)\,\|\,p(W, b)\big). \tag{6}$$
For prediction, $S$ samples $\{W_s, b_s\}_{s=1}^{S}$ were drawn from the approximate posterior, yielding predictions $\{y_s\}_{s=1}^{S}$. The predictive mean and variance are
$$\hat{y} = \frac{1}{S}\sum_{s=1}^{S} y_s, \qquad \operatorname{Var}(y) = \frac{1}{S}\sum_{s=1}^{S}\big(y_s - \hat{y}\big)^2. \tag{7}$$
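Equation (7) reduces to simple sample statistics over the Monte Carlo draws. A minimal sketch, with synthetic draws standing in for the S = 100 network outputs (the helper name `predictive_moments` is ours):

```python
import numpy as np

def predictive_moments(samples):
    """samples: (S, n) array, one row per posterior draw.
    Returns the predictive mean and variance of Equation (7)."""
    y_hat = samples.mean(axis=0)
    var = ((samples - y_hat) ** 2).mean(axis=0)  # 1/S normalization, as in Eq. (7)
    return y_hat, var

rng = np.random.default_rng(0)
samples = 2.0 + 0.5 * rng.standard_normal((100, 5))  # S = 100 draws, 5 test points
y_hat, var = predictive_moments(samples)
lower, upper = y_hat - 1.96 * np.sqrt(var), y_hat + 1.96 * np.sqrt(var)  # 95% band
```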

2.4. Bayesian LSTM

Bayesian LSTM extends conventional LSTM by incorporating Bayesian inference to quantify uncertainty in sequential predictions. The Bayesian LSTM implemented here consisted of two stacked LSTM layers with 50 units each, followed by a dense output layer. A sequence window of 60 time steps was used as input. Bayesian inference was achieved through Monte Carlo Dropout [8], where dropout (rate = 0.2) was applied at both training and test time to approximate posterior sampling. The model was trained using the Adam optimizer (learning rate = 0.001), with a mean squared error loss function over 100 epochs and a batch size of 32. At inference, S = 100 forward passes were performed to obtain the predictive distribution. The posterior over the model parameters $\theta$ is expressed by
$$p(\theta \mid X, Y) = \frac{p(Y \mid X, \theta)\, p(\theta)}{p(Y \mid X)}. \tag{8}$$
For a new input sequence $X^*$, the predictive distribution was obtained by integrating over parameter configurations:
$$p\big(y^* \mid X^*, X, Y\big) = \int p\big(y^* \mid X^*, \theta\big)\, p\big(\theta \mid X, Y\big)\, d\theta. \tag{9}$$
This integral was approximated using variational inference with the same ELBO objective as in Equation (6). At test time, S samples of weights were drawn to compute the predictive mean and variance as in Equation (7).
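The Monte Carlo Dropout mechanism, dropout left active at prediction time so that each forward pass samples a different thinned network, can be sketched with a toy feedforward layer. This is illustrative only: the study's model is a two-layer LSTM in Keras, and all names below are ours:

```python
import numpy as np

rng = np.random.default_rng(7)

def mc_dropout_forward(x, W1, W2, rate=0.2):
    """One stochastic forward pass: dropout stays on, even at test time."""
    h = np.tanh(x @ W1)
    mask = rng.random(h.shape) >= rate   # drop units with probability `rate`
    h = h * mask / (1.0 - rate)          # inverted-dropout rescaling
    return (h @ W2).item()

W1 = 0.1 * rng.standard_normal((11, 50))
W2 = 0.1 * rng.standard_normal((50, 1))
x = rng.standard_normal((1, 11))

preds = np.array([mc_dropout_forward(x, W1, W2) for _ in range(100)])  # S = 100 passes
y_hat, var = preds.mean(), preds.var()  # predictive moments, as in Equation (7)
```

Because a fresh dropout mask is drawn on every call, the 100 passes disagree, and their spread is the model's uncertainty estimate.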

2.5. Gaussian Process Regression (GPR)

The GPR is a non-parametric Bayesian approach that models distributions over functions directly. In this study, a composite kernel was employed: the radial basis function (RBF) kernel combined with a white noise kernel to capture residual noise,
$$k(x, x') = \sigma_f^2 \exp\!\left(-\frac{\lVert x - x' \rVert^2}{2\ell^2}\right) + \sigma_n^2\,\delta(x, x'),$$
where $\ell$ is the length scale, $\sigma_f^2$ is the signal variance, and $\sigma_n^2$ is the noise variance. The kernel hyperparameters were optimized by maximizing the log marginal likelihood (Equation (16)) using L-BFGS-B with five random restarts to mitigate local optima. A Gaussian process is defined as a collection of random variables, any finite number of which have a joint Gaussian distribution:
$$f(\cdot) \sim \mathcal{GP}\big(m(\cdot),\, k(\cdot, \cdot)\big), \tag{10}$$
where $m(\cdot)$ is the mean function (typically zero) and $k(\cdot, \cdot)$ is the covariance (kernel) function. The joint distribution of the training outputs $\mathbf{f}$ and test outputs $\mathbf{f}_*$ is
$$\begin{bmatrix} \mathbf{f} \\ \mathbf{f}_* \end{bmatrix} \sim \mathcal{N}\!\left(\mathbf{0},\; \begin{bmatrix} K(X, X) & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix}\right). \tag{11}$$
With independent and identically distributed Gaussian noise $\epsilon \sim \mathcal{N}(0, \sigma_\epsilon^2)$, the predictive distribution for the test inputs $X_*$ is
$$\mathbf{y}_* \mid X_*, \mathbf{y}, \theta \sim \mathcal{N}\big(\boldsymbol{\mu}_*,\; \Sigma_* + \sigma_\epsilon^2 I\big), \tag{12}$$
where
$$\boldsymbol{\mu}_* = K(X_*, X)\big(K + \sigma_\epsilon^2 I\big)^{-1}\mathbf{y}, \tag{13}$$
$$\Sigma_* = K(X_*, X_*) - K(X_*, X)\big(K + \sigma_\epsilon^2 I\big)^{-1} K(X, X_*). \tag{14}$$
The 95% prediction intervals were calculated as follows:
$$\Big[\, \boldsymbol{\mu}_* - 1.96\sqrt{\operatorname{diag}\!\big(\Sigma_* + \sigma_\epsilon^2 I\big)},\;\; \boldsymbol{\mu}_* + 1.96\sqrt{\operatorname{diag}\!\big(\Sigma_* + \sigma_\epsilon^2 I\big)} \,\Big]. \tag{15}$$
The hyperparameters θ and noise variance σ ϵ 2 were estimated by maximizing the log marginal likelihood:
$$\log p\big(\mathbf{y} \mid X, \theta, \sigma_\epsilon^2\big) = -\tfrac{1}{2}\,\mathbf{y}^{\top}\big(K + \sigma_\epsilon^2 I\big)^{-1}\mathbf{y} - \tfrac{1}{2}\log\big|K + \sigma_\epsilon^2 I\big| - \tfrac{n}{2}\log(2\pi). \tag{16}$$
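Equations (13)-(15) can be sketched in a few lines of NumPy on a toy 1-D problem. The hyperparameters are fixed for illustration (the study optimized them via the log marginal likelihood with L-BFGS-B), and the helper names are ours:

```python
import numpy as np

def rbf(A, B, ell=1.0, sf2=1.0):
    """RBF kernel matrix between row-vector sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sf2 * np.exp(-d2 / (2.0 * ell ** 2))

def gpr_predict(X, y, Xs, ell=1.0, sf2=1.0, sn2=1e-4):
    """Posterior mean (Eq. (13)), covariance (Eq. (14)), and 95% band (Eq. (15))."""
    K = rbf(X, X, ell, sf2) + sn2 * np.eye(len(X))
    Ks = rbf(Xs, X, ell, sf2)
    mu = Ks @ np.linalg.solve(K, y)
    cov = rbf(Xs, Xs, ell, sf2) - Ks @ np.linalg.solve(K, Ks.T)
    sd = np.sqrt(np.clip(np.diag(cov), 0.0, None) + sn2)  # noise term added, per Eq. (12)
    return mu, mu - 1.96 * sd, mu + 1.96 * sd

X = np.linspace(0.0, 5.0, 30)[:, None]  # toy 1-D inputs
y = np.sin(X).ravel()
mu, lo95, hi95 = gpr_predict(X, y, X)
```

The `np.clip` guards against tiny negative diagonal entries from floating-point round-off; production code (e.g., scikit-learn's GaussianProcessRegressor) uses a Cholesky factorization instead of repeated solves.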

2.6. Evaluation Metrics

Three metrics were used to evaluate model performance.
Mean squared error (MSE):
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\big(y_i - \hat{y}_i\big)^2. \tag{17}$$
Mean absolute error (MAE):
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\big|y_i - \hat{y}_i\big|. \tag{18}$$
Root mean square error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\big(y_i - \hat{y}_i\big)^2}. \tag{19}$$
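All three metrics are one-liners in NumPy; a quick sketch on a toy target/forecast pair:

```python
import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    return np.sqrt(mse(y, y_hat))

y     = np.array([100.0, 102.0, 101.0, 105.0])
y_hat = np.array([ 99.0, 103.0, 100.0, 107.0])
# absolute errors 1, 1, 1, 2 -> MAE = 1.25; squared errors 1, 1, 1, 4 -> MSE = 1.75
```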

2.7. Uncertainty Quantification Metrics

To provide a rigorous quantitative evaluation of the probabilistic outputs, two additional metrics were used [8].
The prediction interval coverage probability (PICP) measures the proportion of actual observations that fall within the predicted 95% confidence interval:
$$\mathrm{PICP} = \frac{1}{n}\sum_{i=1}^{n}\mathbb{1}\big\{y_i \in [\hat{l}_i, \hat{u}_i]\big\}. \tag{20}$$
A well-calibrated model should yield a PICP ≈ 0.95 for a 95% interval.
The mean prediction interval width (MPIW) measures the average width of the predictive intervals, reflecting the sharpness of the uncertainty estimates:
$$\mathrm{MPIW} = \frac{1}{n}\sum_{i=1}^{n}\big(\hat{u}_i - \hat{l}_i\big). \tag{21}$$
A lower MPIW indicates sharper (more informative) intervals, provided that the PICP remains at or near the nominal coverage level. Together, the PICP and MPIW allow for a more complete assessment of uncertainty calibration beyond qualitative inspection of forecast plots.
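Both calibration metrics follow directly from Equations (20) and (21); a minimal sketch on a toy set of intervals:

```python
import numpy as np

def picp(y, lower, upper):
    """Fraction of observations inside their predictive interval (Eq. (20))."""
    return np.mean((y >= lower) & (y <= upper))

def mpiw(lower, upper):
    """Average interval width (Eq. (21))."""
    return np.mean(upper - lower)

y     = np.array([10.0, 12.0, 11.0, 15.0])
lower = np.array([ 9.0, 11.5,  9.0, 16.0])
upper = np.array([11.0, 13.0, 12.0, 18.0])
# the last observation (15.0) falls below its interval [16.0, 18.0]
```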

3. Results

3.1. Exploratory Data Analysis

Table 2 presents the descriptive statistics of the FirstRand stock dataset from January 2005 to June 2025. The results indicate substantial variability in stock prices over the study period, reflecting the dynamic nature of financial markets.

3.1.1. FirstRand Limited Descriptive Statistics

Figure 1 illustrates the historical closing price movement of FirstRand from 2005 to 2025. The series displays a clear long-term upward trend accompanied by periods of elevated volatility, particularly during major market disruptions such as the 2008 global financial crisis and the COVID-19 pandemic period.

3.1.2. Discovery Limited Descriptive Statistics

Table 3 presents the descriptive statistics for Discovery Limited which reveal substantially higher price variability compared with FirstRand. The closing price exhibited a mean of 9117.58 and a large standard deviation of 5403.80, indicating pronounced fluctuations over the study period. The wide range between the minimum and maximum closing prices suggests the presence of multiple volatility regimes and rapid price movements characteristic of growth-oriented financial stocks.
Figure 2 shows the historical closing price trajectory of Discovery Limited over the study period. Compared with FirstRand, the Discovery series demonstrates more pronounced volatility and sharper fluctuations, particularly during periods of market uncertainty. Although the stock exhibits an overall upward long-term trend, the larger amplitude of price swings indicates a more unstable and risk-sensitive trading pattern.

3.1.3. Correlation Analysis

The correlation matrices presented in Figure 3 and Figure 4 reveal strong positive relationships among the OHLC variables for both stocks. This is expected, as these variables are derived from the same daily trading process and generally move together over time. The strong correlations suggest the presence of significant temporal and structural dependencies within the financial series.

3.2. Gaussian Process Regression Results

Table 4 summarizes the point forecast errors for GPR on both stocks. GPR performed substantially better on FirstRand (MAE = 72.27, RMSE = 102.80) than on Discovery (MAE = 115.15, RMSE = 226.19), a relative difference of approximately 59% in the MAE. This disparity is consistent with GPR's known behavior: the RBF kernel is well suited to smooth, slowly varying price series, but the cubic cost of kernel matrix inversion (O(n³)) limits the effective lookback window, reducing the model's capacity to track rapid fluctuations. Table 5 provides a sample of the actual versus predicted values with 95% confidence intervals, and Figure 5 and Figure 6 present the corresponding forecast plots.

3.3. Bayesian LSTM Results

The Bayesian LSTM results reveal a clear trade-off between point forecast precision and uncertainty coverage. For FirstRand (Figure 7), the model achieved an MAE of 133.49 and RMSE of 175.21, higher than both GPR and the BNN as shown in Table 6. For Discovery (Figure 8), the MAE of 337.39 and RMSE of 452.78 represent the largest errors of all three models. These elevated errors are a structural consequence of the Monte Carlo Dropout approach; retaining stochastic dropout at test time aggregated variance from all network layers, yielding conservative but high-coverage intervals (Table 7).

3.4. Bayesian Neural Network Results

As shown in Table 8, the BNN achieved the lowest point forecast errors across both stocks. For FirstRand, the MAE of 63.78 and RMSE of 83.29 represent improvements of 12% and 19%, respectively, over the second-best model (GPR). For Discovery, the MAE of 144.56 and RMSE of 201.50 were 20% and 11% lower than GPR, respectively, which itself outperformed Bayesian LSTM by a substantial margin. These results confirm the BNN as the most accurate and versatile of the three Bayesian approaches under the experimental conditions of this study. Table 9 provides illustrative predicted values with 95% confidence intervals, and Figure 9 and Figure 10 present the corresponding forecast plots, where the well-calibrated intervals, relatively compact for FirstRand and appropriately wider for the more volatile Discovery, visually confirm the model's ability to adapt uncertainty to local data density.

3.5. Comparative Model Performance Summary

BNN achieved the best overall performance across both stocks. Results are shown in Table 10.
PICP values close to 0.95 indicate well-calibrated intervals. Lower MPIW at adequate PICP indicates sharper, more informative intervals. BNN achieved near-nominal coverage with comparatively lower interval width. Results are shown in Table 11.
Non-Bayesian baselines (ARIMA and standard LSTM) included for contextual comparison. All Bayesian models outperformed ARIMA; the BNN achieved the lowest errors overall while also providing calibrated uncertainty intervals unavailable in deterministic models. Results are shown in Table 12.

4. Discussion

The empirical results reveal distinct performance patterns across the three Bayesian models and two companies, providing valuable insights into the strengths and limitations of each approach for stock price forecasting.
For FirstRand, which exhibited relatively stable price behavior, all three models tracked actual prices reasonably well. The BNN delivered the most accurate forecasts with the lowest MAE (63.78) and RMSE (83.29), while maintaining well-calibrated 95% confidence intervals that remained narrow under normal conditions but appropriately widened during periods of increased volatility, such as on 30 April 2021. This finding aligns with Chandra and He [9], who demonstrated that BNNs outperform traditional models during market turbulence by better capturing nonlinear relationships and quantifying predictive uncertainty.
The GPR produced moderate but consistent results for FirstRand (MAE = 72.27, RMSE = 102.80), with relatively narrow confidence intervals reflecting stable uncertainty estimates. The model performed particularly well in tracking gradual price movements, consistent with Bisht et al. [1], who found that GPR can effectively capture nonlinearities while maintaining well-calibrated confidence intervals in smaller datasets. However, the wider interval observed on 30 April 2021 demonstrates GPR’s ability to signal increased uncertainty during market turbulence.
Bayesian LSTM generated the widest prediction intervals, reflecting its conservative approach to uncertainty estimation. However, this came at the expense of point forecast precision, with the highest errors among the three models (MAE = 133.49, RMSE = 175.21). While Wang and Qi [8] emphasized Bayesian LSTM’s advantage in enhancing predictive robustness under volatile conditions, our findings suggest that this robustness may manifest primarily through uncertainty quantification rather than point forecast accuracy.
For Discovery, which exhibited substantially higher volatility (standard deviation of 5404 compared with FirstRand’s 2186), all models faced greater challenges. GPR’s performance degraded notably (MAE = 115.15, RMSE = 226.19), with wider confidence intervals reflecting the model’s difficulty in capturing rapid price fluctuations. This decline underlines GPR’s limited scalability in fast-changing markets, as noted in the literature.
Bayesian LSTM recorded the highest errors for Discovery (MAE = 337.39, RMSE = 452.78), with extremely broad confidence intervals that, while successfully encompassing the true price paths, highlighted the elevated risk profile of the stock. This suggests that Bayesian recurrent models may prioritize stability and risk awareness over precision, making them valuable in uncertainty-sensitive applications where conservative estimates are preferred.
The BNN again stood out for Discovery, achieving lower errors (MAE = 144.56, RMSE = 201.50) than the other two models while maintaining reasonable uncertainty bands, though these remained wider than those observed for FirstRand. This superior performance on volatile data aligns with Chandra and He’s [9] findings that BNNs excel during periods of increased market turbulence by effectively modeling complex nonlinearities while quantifying predictive uncertainty.
Cross-model comparison revealed that no single Bayesian model was universally superior. Instead, the effectiveness of each approach was conditional on the objectives of the user and the prevailing market environment. BNNs achieved the best balance between point accuracy and calibrated uncertainty intervals, making them most suitable when both priorities were important. GPR performed reliably under stable conditions with limited data requirements. Bayesian LSTM provided the most conservative uncertainty estimates, making it preferable for risk-averse applications, where capturing extreme outcomes is critical.
These findings reinforce the value of Bayesian approaches in financial forecasting, as they embed uncertainty directly into predictions. This dual advantage of accuracy and interpretability advances the methodological landscape and offers practical benefits to analysts, investors, and policymakers operating in inherently uncertain environments.
The performance differences observed across the models can be understood through the lens of data structure and model architecture. GPR's degradation on Discovery is mechanistically linked to the cubic computational cost of kernel matrix inversion (O(n³)), which limits its practical window length and thus its ability to capture long-range dependencies under high volatility. Bayesian LSTM's wide intervals are a structural consequence of Monte Carlo Dropout; by retaining stochastic units at test time, the model aggregates variance from all network layers, producing conservative but high-coverage intervals that are particularly valuable during volatility regimes such as the post-COVID-19 recovery period. The BNN's favorable accuracy–uncertainty trade-off stems from its direct parameterization of weight distributions, which allows it to adapt posterior uncertainty to the local data density without the bandwidth constraints of kernel methods or the sequential bias of recurrent architectures.
The findings carry concrete recommendations for financial practitioners. Portfolio managers seeking to optimize risk-adjusted returns in stable large-cap environments (analogous to FirstRand’s price behavior) should consider GPR as a computationally efficient and interpretable alternative. For volatile growth stocks (analogous to Discovery), BNNs offer the best combination of prediction accuracy and calibrated uncertainty, directly supporting value-at-risk (VaR) calculations and position sizing decisions. Risk officers and compliance teams for whom regulatory capital depends on conservative worst-case scenarios may find Bayesian LSTM’s wider intervals more appropriate, as they are less likely to underestimate tail risk during market dislocations. These prescriptions move beyond generic recommendations of “match model to risk preference” by linking specific model behaviors to concrete financial use cases.
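The link from a predictive distribution to VaR and position sizing can be made concrete. The return distribution and exposure below are invented for illustration; in practice the samples would be draws from the BNN’s posterior predictive distribution:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical one-day-ahead predictive *return* samples (illustrative only;
# in practice these would come from the fitted model's posterior predictive).
predictive_returns = rng.normal(loc=0.0005, scale=0.02, size=10_000)

# 95% value-at-risk: the loss threshold exceeded on only 5% of draws.
var_95 = -np.percentile(predictive_returns, 5)

position = 1_000_000                 # hypothetical rand exposure
capital_at_risk = position * var_95  # rand amount at risk at 95% confidence
```

Wider predictive intervals, such as those produced by the Bayesian LSTM, translate directly into larger VaR figures, which is why conservative interval behavior matters to risk officers.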

5. Conclusions

This study investigated the effectiveness of three Bayesian probabilistic methods (Gaussian process regression, Bayesian LSTM, and Bayesian neural networks) for stock price forecasting on the JSE. Using daily data for FirstRand and Discovery Limited from January 2005 to June 2025, the models were evaluated on both point forecast accuracy and uncertainty quantification capabilities.
The quantitative evaluation across both stocks confirms three clear model-specific patterns. The BNNs consistently achieved the lowest point forecast errors and near-nominal PICP values (0.95 for FirstRand, 0.94 for Discovery) with relatively compact interval widths, establishing them as the most versatile Bayesian forecasting tool across varying market conditions. GPR delivered well-calibrated intervals in stable regimes but deteriorated markedly under Discovery’s higher volatility, reflecting the structural limitations of kernel-based methods in fast-moving markets. Bayesian LSTM produced the widest intervals (highest MPIW) at adequate coverage, confirming its suitability for risk-averse applications, where capturing extreme outcomes takes priority over precision. Importantly, all three Bayesian approaches outperformed both the ARIMA and standard LSTM baselines, providing direct empirical evidence that probabilistic modeling adds measurable value over deterministic alternatives.
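The PICP and MPIW metrics cited above have simple definitions, sketched below with made-up actuals and interval bounds (the function and variable names are ours, not from the paper’s code):

```python
import numpy as np

def picp(y, lower, upper):
    """Prediction interval coverage probability: share of actuals inside bounds."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    return np.mean((y >= lower) & (y <= upper))

def mpiw(lower, upper):
    """Mean prediction interval width: average sharpness of the bands."""
    return np.mean(np.asarray(upper) - np.asarray(lower))

# Toy check: 4 of the 5 actuals fall inside their intervals.
y     = [10.0, 11.0, 12.0, 13.0, 20.0]
lower = [ 9.0, 10.5, 11.0, 12.5, 13.0]
upper = [11.0, 11.5, 13.0, 13.5, 14.0]

coverage = picp(y, lower, upper)   # 0.8
width    = mpiw(lower, upper)      # 1.4
```

A well-calibrated 95% interval should yield a PICP near 0.95 with the smallest MPIW achievable at that coverage, which is the balance the BNN results approximate.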
The study makes three contributions: (1) the first condition-controlled comparison of GPR, Bayesian LSTM, and BNNs on JSE-listed equities; (2) a comprehensive uncertainty evaluation using both qualitative interval plots and quantitative PICP and MPIW metrics; and (3) concrete, mechanism-grounded recommendations for matching the Bayesian model choice to financial use cases, market volatility regimes, and institutional risk mandates.
In practical terms, short-term traders and portfolio managers prioritizing point accuracy and balanced uncertainty should favor BNNs. Risk officers requiring conservative worst-case interval coverage for regulatory capital calculations will find Bayesian LSTM more appropriate. GPR remains a competitive and computationally interpretable option for stable markets or smaller datasets, where kernel smoothness assumptions are well founded.

5.1. Limitations

Several limitations of the present study should be acknowledged. First, the analysis was restricted to two JSE-listed companies, which limits the generalizability of the findings. Second, while the inclusion of non-Bayesian baselines contextualized the results, a broader benchmark set including XGBoost, random forest, and Transformer-based architectures would further strengthen the comparative evaluation. Third, the GPR implementation was constrained by computational scalability; the use of full-data kernel inference limited the practical window size, and this may disadvantage GPR relative to the neural approaches. Fourth, the study did not account for transaction costs, slippage, or market impact, which are critical factors in translating forecast accuracy into actual trading performance. Fifth, the model hyperparameters were tuned on the validation set using a fixed protocol; a more exhaustive search (e.g., Bayesian optimization) may yield further performance improvements.

5.2. Future Research Directions

Building directly on these limitations, future work should (1) extend the analysis to a broader panel of JSE-listed securities and international emerging markets; (2) incorporate macroeconomic indicators such as interest rates, exchange rates, and commodity prices as additional features; (3) integrate alternative data sources, including news sentiment indices and social media signals; (4) explore scalable sparse GP approximations (e.g., inducing-point methods) to overcome GPR’s computational bottleneck; and (5) embed the probabilistic forecasts within realistic trading simulations, accounting for transaction costs, to assess practical financial utility and risk-adjusted returns.
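The exact-GP bottleneck motivating point (4) is visible even in a minimal implementation. The sketch below uses synthetic sine data, an RBF kernel, and illustrative hyperparameters; the `np.linalg.solve` call on the full n × n kernel matrix is the O(n³) step that sparse inducing-point methods approximate:

```python
import numpy as np

def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel for 1-D inputs a, b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return var * np.exp(-0.5 * d2 / length**2)

rng = np.random.default_rng(2)
x_train = np.linspace(0.0, 5.0, 50)                 # n = 50 synthetic inputs
y_train = np.sin(x_train) + 0.1 * rng.normal(size=50)
x_test = np.array([2.5])

noise = 0.1 ** 2
K = rbf(x_train, x_train) + noise * np.eye(50)       # full n x n kernel matrix
K_s = rbf(x_train, x_test)

alpha = np.linalg.solve(K, y_train)                  # O(n^3): the bottleneck
mean = K_s.T @ alpha                                 # posterior mean
cov = rbf(x_test, x_test) - K_s.T @ np.linalg.solve(K, K_s)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
lo, hi = mean - 1.96 * std, mean + 1.96 * std        # 95% predictive interval
```

Inducing-point approximations replace K with a low-rank surrogate built from m ≪ n pseudo-inputs, reducing the dominant cost from O(n³) to roughly O(nm²), which is what would allow longer training windows on volatile series such as Discovery.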

Author Contributions

Conceptualization, N.N. and D.M.; methodology, D.M. and N.N.; software, N.N.; validation, N.N. and D.M.; formal analysis, N.N.; investigation, N.N.; resources, D.M.; data curation, N.N.; writing—original draft preparation, N.N.; writing—review and editing, D.M.; visualization, N.N.; supervision, D.M.; project administration, D.M.; funding acquisition, D.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable. This study used publicly available secondary data from Yahoo Finance and did not involve human or animal subjects.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study were obtained from Yahoo Finance using the open-source Python package yfinance and are publicly available at https://finance.yahoo.com, accessed on 5 May 2026. The processed datasets and code used in this study are available from the corresponding author upon reasonable request.

Acknowledgments

The authors would like to thank the Department of Statistics and Operations Research at the University of Limpopo and the School of Computational and Applied Mathematics at the University of the Witwatersrand for their academic support. Special thanks to Raquel Morgan for their understanding and support throughout this research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Bisht, A.; Chahar, A.; Kabthiyal, A.; Goel, A. Stock prediction using Gaussian process regression. In Proceedings of the 6th International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 29–31 March 2022; pp. 693–699.
2. Ji, X.; Wang, J.; Yan, Z. A stock price prediction method based on deep learning technology. Int. J. Crowd Sci. 2021, 5, 55–72.
3. Shahi, T.B.; Shrestha, A.; Neupane, A.; Guo, W. Stock price forecasting with deep learning: A comparative study. Mathematics 2020, 8, 1441.
4. Maeda, I.; Matsushima, H.; Sakaji, H.; Izumi, K.; deGraw, D.; Tomioka, H.; Kato, A.; Kitano, M. Learning uncertainty in market trend forecast using Bayesian neural networks. In Decision Economics: Complexity of Decisions and Decisions for Complexity; Springer: Cham, Switzerland, 2020; pp. 210–218.
5. Maddodi, S.; Kunte, A. Stock market prediction using machine learning: A systematic review. J. Financ. Data Sci. 2024, 6, 45–62.
6. Jang, H.; Lee, J. An empirical study on modeling and prediction of Bitcoin prices with Bayesian neural networks based on blockchain information. IEEE Access 2017, 6, 5427–5437.
7. Ariyo, A.A.; Adewumi, A.O.; Ayo, C.K. Stock price prediction using the ARIMA model. In Proceedings of the UKSim-AMSS 16th International Conference on Computer Modelling and Simulation, Cambridge, UK, 26–28 March 2014; pp. 106–112.
8. Wang, Z.; Qi, Z. Future stock price prediction based on Bayesian LSTM in CRSP. In Proceedings of the 3rd International Conference on Internet Finance and Digital Economy (ICIFDE 2023), Chengdu, China, 4–6 August 2023; pp. 219–230.
9. Chandra, R.; He, Y. Bayesian neural networks for stock price forecasting before and during COVID-19 pandemic. PLoS ONE 2021, 16, e0253217.
10. Wang, M. Advanced stock market forecasting: A comparative analysis of ARIMA-GARCH, LSTM, and integrated wavelet-LSTM models. In SHS Web of Conferences; EDP Sciences: Les Ulis, France, 2024; Volume 196, p. 02008.
11. Balasubramanian, P.; Chinthan, P.; Badarudeen, S.; Sriraman, H. A systematic literature survey on recent trends in stock market prediction. PeerJ Comput. Sci. 2024, 10, e1700.
12. Vijh, M.; Chandola, D.; Tikkiwal, V.A.; Kumar, A. Stock closing price prediction using machine learning techniques. Procedia Comput. Sci. 2020, 167, 599–606.
13. Nikou, M.; Mansourfar, G.; Bagherzadeh, J. Stock price prediction using deep learning algorithm and its comparison with machine learning algorithms. Intell. Syst. Account. Financ. Manag. 2019, 26, 164–174.
14. Goan, E.; Fookes, C. Bayesian neural networks: An introduction and survey. In Case Studies in Applied Bayesian Data Science: CIRM Jean-Morlet Chair, Fall 2018; Springer: Cham, Switzerland, 2020; pp. 45–87.
15. Suphawan, K.; Kardkasem, R.; Chaisee, K. A Gaussian process regression model for forecasting Stock Exchange of Thailand. Trends Sci. 2022, 19, 3045.
16. Li, Z.; Tran, M.N.; Wang, C.; Gerlach, R.; Gao, J. A Bayesian long short-term memory model for value at risk and expected shortfall joint forecasting. arXiv 2020, arXiv:2001.08374.
Figure 1. FirstRand stock closing prices over time (January 2005–June 2025).
Figure 2. Discovery stock closing prices over time (January 2005–June 2025).
Figure 3. FirstRand stock correlation matrix.
Figure 4. Discovery stock correlation matrix.
Figure 5. FirstRand GPR forecast.
Figure 6. Discovery GPR forecast.
Figure 7. FirstRand Bayesian LSTM Forecast showing predicted values and 95% confidence intervals.
Figure 8. Discovery Bayesian LSTM Forecast showing predicted values and 95% confidence intervals.
Figure 9. FirstRand BNN forecast.
Figure 10. Discovery BNN forecast.
Table 1. Summary of representative studies on Bayesian and machine learning approaches to stock price forecasting.
| Study | Method | Dataset | Key Finding | Limitation |
|---|---|---|---|---|
| [9] | BNN | US stocks (pre/post COVID-19) | BNNs outperform deterministic models in volatile periods | Single model evaluated |
| [8] | Bayesian LSTM | CRSP (US) | Bayesian LSTM outperforms standard LSTM on MAE and RMSE | No comparison with BNN or GPR |
| [15] | GPR | SET (Thailand) | GPR yields well-calibrated intervals in stable regimes | Does not address volatile stocks |
| [13] | LSTM, SVM, ANN, RF | Tehran Stock Exchange | LSTM best among deterministic models | No uncertainty quantification |
| Present study | GPR, Bayesian LSTM, BNN | JSE (South Africa) | BNNs achieve best accuracy–uncertainty trade-off | Limited to two JSE-listed companies |
Table 2. Descriptive statistics of FirstRand stock dataset.
| Variable | Count | Mean | Std | Min | 25% | 50% | 75% | Max |
|---|---|---|---|---|---|---|---|---|
| Close | 5187 | 4002.92 | 2185.64 | 890.07 | 1705.38 | 4164.21 | 6091.26 | 8684.13 |
| High | 5187 | 4053.65 | 2210.83 | 925.96 | 1731.50 | 4228.97 | 6168.51 | 8916.99 |
| Low | 5187 | 3948.56 | 2160.34 | 842.21 | 1681.24 | 4075.41 | 6021.38 | 8604.17 |
| Open | 5187 | 4000.45 | 2183.12 | 877.31 | 1706.67 | 4158.00 | 6091.20 | 8784.07 |
| Volume | 5187 | 13,900,000 | 9,230,000 | 0.00 | 8,420,000 | 11,900,000 | 16,900,000 | 132,000,000 |
| RSI | 5187 | 51.85 | 10.40 | 12.54 | 44.66 | 52.07 | 59.46 | 81.11 |
| MACD | 5187 | 8.60 | 77.75 | −594.19 | −25.90 | 10.43 | 45.26 | 297.14 |
Note: Closing price mean of approximately 4003 with substantial variation highlights long-term growth and high volatility.
Table 3. Descriptive statistics of Discovery stock dataset.
| Variable | Count | Mean | Std | Min | 25% | 50% | 75% | Max |
|---|---|---|---|---|---|---|---|---|
| Close | 5187 | 9117.58 | 5403.80 | 1692.15 | 3306.50 | 10,331.85 | 13,580.58 | 22,189.00 |
| High | 5187 | 9244.15 | 5483.86 | 1701.88 | 3337.14 | 10,548.90 | 13,743.46 | 24,956.83 |
| Low | 5187 | 8987.35 | 5324.72 | 1691.18 | 3288.02 | 10,142.05 | 13,406.83 | 22,105.00 |
| Open | 5187 | 9116.15 | 5402.58 | 1701.88 | 3310.88 | 10,310.23 | 13,570.34 | 22,400.00 |
| Volume | 5187 | 1.23 × 10⁶ | 1.00 × 10⁶ | 0.00 | 6.08 × 10⁵ | 1.06 × 10⁶ | 1.58 × 10⁶ | 1.44 × 10⁷ |
| RSI | 5187 | 52.82 | 12.18 | 14.05 | 44.69 | 53.10 | 61.33 | 89.23 |
| MACD | 5187 | 26.32 | 207.93 | −1228.18 | −38.94 | 22.64 | 107.47 | 800.43 |
Table 4. GPR test performance metrics for FirstRand and Discovery.
| Company | MAE | RMSE | MSE |
|---|---|---|---|
| FirstRand | 72.27 | 102.80 | 10,567.75 |
| Discovery | 115.15 | 226.19 | 51,161.02 |
Table 5. GPR actual vs. predicted values with 95% confidence intervals.
| FirstRand Date | Actual | Predicted (95% CI) | Discovery Date | Actual | Predicted (95% CI) |
|---|---|---|---|---|---|
| 2021-04-26 | 5048.18 | 5061.84 (5019.03, 5104.66) | 2021-04-26 | 13,161.46 | 13,135.92 (12,847.23, 13,424.61) |
| 2021-04-28 | 5144.97 | 5142.05 (5099.33, 5184.77) | 2021-04-28 | 13,151.46 | 13,143.10 (12,854.18, 13,432.02) |
| 2021-04-29 | 5310.61 | 5306.91 (5263.90, 5349.91) | 2021-04-29 | 13,173.46 | 13,312.79 (13,018.45, 13,607.13) |
| 2021-04-30 | 5093.08 | 5278.54 (5145.17, 5411.92) | 2021-04-30 | 13,196.45 | 13,099.82 (12,810.56, 13,389.08) |
| 2021-05-03 | 4979.33 | 4998.74 (4955.93, 5041.54) | 2021-05-03 | 13,214.45 | 13,263.59 (12,972.45, 13,554.73) |
| 2021-05-04 | 4944.40 | 4955.85 (4913.12, 4998.59) | 2021-05-04 | 13,352.41 | 13,447.38 (13,154.36, 13,740.40) |
| 2021-05-05 | 5112.04 | 5077.24 (5034.49, 5120.00) | 2021-05-05 | 13,771.30 | 13,736.31 (13,438.21, 14,034.41) |
| 2021-05-06 | 5232.78 | 5240.03 (5197.11, 5282.95) | 2021-05-06 | 14,131.20 | 14,183.20 (13,877.64, 14,488.76) |
| 2021-05-07 | 5313.61 | 5308.10 (5265.22, 5350.98) | 2021-05-07 | 14,438.12 | 14,373.84 (14,065.32, 14,682.36) |
| 2021-05-10 | 5357.51 | 5368.45 (5325.56, 5411.33) | 2021-05-10 | 14,335.15 | 14,459.49 (14,148.95, 14,770.03) |
Table 6. Bayesian LSTM test performance metrics.
| Company | MAE | RMSE | MSE |
|---|---|---|---|
| FirstRand | 133.49 | 175.21 | 30,697.15 |
| Discovery | 337.39 | 452.78 | 205,014.10 |
Table 7. Bayesian LSTM actual vs. predicted values with 95% confidence intervals.
| FirstRand Date | Actual | Predicted (95% CI) | Discovery Date | Actual | Predicted (95% CI) |
|---|---|---|---|---|---|
| 2021-04-29 | 5310.61 | 5146.77 (4878.89, 5414.65) | 2021-04-29 | 13,173.46 | 13,305.09 (12,756.89, 13,853.30) |
| 2021-04-30 | 5093.08 | 5269.30 (4993.28, 5545.33) | 2021-04-30 | 13,196.45 | 13,288.64 (12,842.93, 13,734.35) |
| 2021-05-03 | 4979.33 | 4993.40 (4740.21, 5246.59) | 2021-05-03 | 13,214.45 | 13,390.91 (13,061.83, 13,720.00) |
| 2021-05-04 | 4944.40 | 5266.94 (4977.60, 5556.29) | 2021-05-04 | 13,352.41 | 13,393.80 (12,989.31, 13,798.29) |
| 2021-05-05 | 5112.04 | 5127.20 (4830.34, 5424.05) | 2021-05-05 | 13,771.30 | 13,421.04 (12,947.37, 13,894.71) |
| 2021-05-06 | 5232.78 | 5066.56 (4780.26, 5352.85) | 2021-05-06 | 14,131.20 | 13,667.21 (13,253.40, 14,081.02) |
| 2021-05-07 | 5313.61 | 5215.83 (4928.66, 5502.99) | 2021-05-07 | 14,438.12 | 13,892.57 (13,482.48, 14,302.67) |
| 2021-05-10 | 5357.51 | 5307.38 (5032.96, 5581.81) | 2021-05-10 | 14,335.15 | 14,233.46 (13,807.31, 14,659.60) |
| 2021-05-11 | 5374.48 | 5410.93 (5086.87, 5734.99) | 2021-05-11 | 14,252.17 | 14,589.68 (14,151.00, 15,028.37) |
| 2021-05-12 | 5373.48 | 5380.59 (5109.37, 5651.81) | 2021-05-12 | 14,389.13 | 14,715.69 (14,286.00, 15,145.38) |
Table 8. BNN test performance metrics for FirstRand and Discovery.
| Company | MAE | RMSE | MSE |
|---|---|---|---|
| FirstRand | 63.78 | 83.29 | 6936.64 |
| Discovery | 144.56 | 201.50 | 40,602.39 |

Note: BNN achieved the lowest errors among all three models for both stocks, demonstrating the best balance between point forecast accuracy and calibrated uncertainty quantification.
Table 9. BNN actual vs. predicted values with 95% confidence intervals.
| FirstRand Date | Actual | Predicted (95% CI) | Discovery Date | Actual | Predicted (95% CI) |
|---|---|---|---|---|---|
| 2021-04-26 | 5048.18 | 5070.63 (4849.39, 5291.88) | 2021-04-26 | 13,161.46 | 13,189.18 (11,969.96, 14,408.40) |
| 2021-04-28 | 5144.97 | 5108.77 (4908.87, 5308.68) | 2021-04-28 | 13,151.46 | 13,290.65 (12,211.45, 14,369.85) |
| 2021-04-29 | 5310.61 | 5282.99 (5082.41, 5483.58) | 2021-04-29 | 13,173.46 | 13,254.67 (12,094.09, 14,415.26) |
| 2021-04-30 | 5093.08 | 5068.09 (4455.98, 5680.20) | 2021-04-30 | 13,196.45 | 13,197.45 (12,110.80, 14,284.09) |
| 2021-05-03 | 4979.33 | 5065.38 (4866.45, 5264.30) | 2021-05-03 | 13,214.45 | 13,406.51 (12,333.40, 14,479.62) |
| 2021-05-04 | 4944.40 | 4990.09 (4776.33, 5203.85) | 2021-05-04 | 13,352.41 | 13,468.26 (12,404.92, 14,531.60) |
| 2021-05-05 | 5112.04 | 5034.59 (4834.31, 5234.87) | 2021-05-05 | 13,771.30 | 13,658.38 (12,497.61, 14,819.15) |
| 2021-05-06 | 5232.78 | 5190.07 (4969.25, 5410.88) | 2021-05-06 | 14,131.20 | 14,083.30 (12,693.96, 15,472.65) |
| 2021-05-07 | 5313.61 | 5235.15 (5002.94, 5467.35) | 2021-05-07 | 14,438.12 | 14,349.89 (13,002.54, 15,697.25) |
| 2021-05-10 | 5357.51 | 5357.13 (5087.71, 5626.55) | 2021-05-10 | 14,335.15 | 14,652.76 (13,182.88, 16,122.63) |
Table 10. Comparative performance of Bayesian models.
| Model | FirstRand MAE | FirstRand RMSE | FirstRand MSE | Discovery MAE | Discovery RMSE | Discovery MSE |
|---|---|---|---|---|---|---|
| GPR | 72.27 | 102.80 | 10,567.75 | **115.15** | 226.19 | 51,161.02 |
| Bayesian LSTM | 133.49 | 175.21 | 30,697.15 | 337.39 | 452.78 | 205,014.10 |
| BNN | **63.78** | **83.29** | **6936.64** | 144.56 | **201.50** | **40,602.39** |

Note: Bold values indicate the best (lowest) performance metric in each column.
Table 11. Uncertainty quantification metrics: PICP and MPIW at 95% confidence level.
| Model | FirstRand PICP | FirstRand MPIW | Discovery PICP | Discovery MPIW |
|---|---|---|---|---|
| GPR | 0.94 | 172.8 | 0.92 | 568.3 |
| Bayesian LSTM | 0.96 | 496.5 | 0.95 | 981.6 |
| BNN | 0.95 | 398.2 | 0.94 | 842.7 |

Note: Bold indicates best balance between coverage and sharpness.
Table 12. Comparison with non-Bayesian baselines on the test set.
| Model | FirstRand MAE | FirstRand RMSE | FirstRand MSE | Discovery MAE | Discovery RMSE | Discovery MSE |
|---|---|---|---|---|---|---|
| ARIMA | 198.43 | 261.17 | 68,209.85 | 412.56 | 581.29 | 337,897.41 |
| Standard LSTM | 89.14 | 121.03 | 14,648.26 | 189.73 | 267.40 | 71,502.76 |
| GPR (Bayesian) | 72.27 | 102.80 | 10,567.75 | **115.15** | 226.19 | 51,161.02 |
| BNN (Best Bayesian) | **63.78** | **83.29** | **6936.64** | 144.56 | **201.50** | **40,602.39** |

Note: Bold indicates best performance.
