Article

Dirichlet Mixed Process Integrated Bayesian Estimation for Individual Securities

1 Faculty of Finance and Banking, College of Economics, Can Tho University, Can Tho City 90000, Vietnam
2 College of Natural Sciences, Can Tho University, Can Tho City 90000, Vietnam
3 Faculty of Agribusiness and Commerce, Lincoln University, Christchurch 7647, New Zealand
* Author to whom correspondence should be addressed.
J. Risk Financial Manag. 2025, 18(6), 304; https://doi.org/10.3390/jrfm18060304
Submission received: 21 April 2025 / Revised: 31 May 2025 / Accepted: 1 June 2025 / Published: 4 June 2025
(This article belongs to the Special Issue Featured Papers in Mathematics and Finance, 2nd Edition)

Abstract
Bayesian nonparametric methods, particularly the Dirichlet process (DP), have gained increasing popularity in both theoretical and applied research, driven by advances in computing power. Traditional Bayesian estimation, which often relies on Gaussian priors, struggles to dynamically integrate evolving prior beliefs into the posterior distribution for decision-making in finance. This study addresses that limitation by modeling daily security price fluctuations using a Dirichlet process mixture (DPM) model. Our results demonstrate the DPM’s effectiveness in identifying the optimal number of clusters within time series data, leading to more accurate density estimation. Unlike kernel methods, the DPM continuously updates the prior density based on observed data, enabling it to better capture the dynamic nature of security prices. This adaptive feature positions the DPM as a superior estimation technique for time series data with complex, multimodal distributions.

1. Introduction

Bayesian nonparametric methods have gained widespread popularity in both theoretical and applied research, providing a powerful alternative to the frequentist approach, which derives estimates solely from the sample data. In contrast, Bayesian methods achieve consistency by integrating prior information with observed data. Early foundational work by Freedman (1963) and Diaconis and Freedman (1986) explored fundamental questions regarding the behavior of posterior distributions. Specifically, they investigated how the posterior evolves when data are generated from a fixed, true density, and whether the posterior distribution accurately concentrates on neighborhoods around the true density. These inquiries led to the formulation of two key concepts: weak consistency and posterior consistency. Introduced by Schwartz (1965), weak consistency examines whether the posterior mass concentrates around the true density in a "weak" sense, typically with respect to the weak convergence of probability measures. Later formalized by Walker (2004), posterior consistency concerns the asymptotic convergence of the posterior distribution to the true density as more data become available.
The development of Bayesian nonparametric density estimation has advanced significantly, thanks to the contributions of numerous researchers. Geman and Hwang (1982) and Barron (1988) pioneered the use of empirical processes based on Bayes' theorem to establish uniform consistency, laying the foundation for later advancements in strong consistency. Their approach utilized sieves (sequences of simpler models that approximate more complex ones) and provided sufficient conditions for strong posterior consistency. Building on this foundation, Barron et al. (1999) and Amewou-Atisso et al. (2003) further explored consistency theorems, particularly in the context of posterior consistency. Concurrently, Verdinelli and Wasserman (1998), Ghosal et al. (1999), Petrone and Wasserman (2002), and Choudhuri et al. (2004) made substantial contributions to prior selection and modeling in the Bayesian nonparametric framework, emphasizing the importance of prior flexibility in accurately capturing the true underlying data-generating process.
A significant advancement in Bayesian methods is attributed to Ghosal et al. (1999), Walker (2004), and Chae and Walker (2017), who introduced a novel approach to achieving strong posterior consistency in Bayesian density estimation without relying on sieves. The breakthrough addressed key limitations in earlier methods by providing a more general framework for establishing convergence rates of posterior distributions, particularly within the context of DPM models with normal distributions as priors. These contributions offered rigorous theoretical justification for the widespread use of the DP in density estimation.
In Bayesian inference, the objective is to infer an unknown underlying distribution, $\theta$, from observed data, $\{x_i \mid i = \overline{1,n}\}$, where the $x_i \mid \theta$ are independently and identically distributed (i.i.d.) according to $\theta$. Traditional Bayesian methods address this by placing a prior distribution on $\theta$, typically selected from a parametric family of distributions. The posterior distribution of $\theta$ is then computed from the observed data, combining prior belief with empirical evidence. However, parametric Bayesian models are inherently constrained by the predefined family of distributions used as priors. Bayesian nonparametric methods overcome this limitation by placing a prior on a general space of probability distributions, allowing the model to adapt flexibly to the data's complexity. This adaptability is particularly important when the true distribution is unknown or cannot be adequately represented by a parametric model. While Bayesian nonparametric approaches offer greater modeling flexibility, they also introduce computational and theoretical challenges, particularly in ensuring that the posterior distribution of $\theta$ is tractable and well behaved (Li et al., 2019).
A key challenge in nonparametric modeling is that many approaches either require discrete data or belong to distribution classes that lead to intractable posterior distributions. This limitation makes inference computationally challenging, restricting the practical applicability of nonparametric models. However, the DP has emerged as a powerful solution due to its ability to define a broad class of distributions while maintaining a tractable posterior (Roeder & Wasserman, 1997). The DP provides a practically flexible yet mathematically manageable framework, making it a cornerstone of modern Bayesian nonparametric methods. Furthermore, advances in computational techniques—particularly Gibbs sampling and other Markov Chain Monte Carlo (MCMC) methods—have dramatically improved our ability to compute posterior distributions in nonparametric models. MCMC methods enable efficient sampling from complex posteriors, allowing researchers to work with more general priors that were previously computationally prohibitive. As Escobar and West (1995) emphasized, the development of MCMC has been instrumental in expanding the scope of Bayesian inference, fueling both theoretical advancements and real-world applications.
Since Zellner’s pioneering work in the 1960s (Zellner & Chetty, 1965; Zellner, 1971), Bayesian methods have become foundational tools for modeling and predicting rational decision-making in economics. With advances in MCMC methods, Bayesian inference has gained even greater traction, particularly in finance, where uncertainty and dynamic updates to beliefs play a critical role (Martin et al., 2024). Traditional MCMC-based approaches typically rely on kernel-based density estimation to construct the prior distribution, which is then updated to obtain the posterior. However, this standard approach may not fully capture the continuous and adaptive nature of belief updating in decision-making, especially in highly volatile financial markets.
In this paper, we aim to model daily stock price fluctuations using a Bayesian nonparametric approach, specifically the DPM model, as developed by Walker (2004). The DPM model offers advantages over the traditional Bayesian method by providing both strong consistency and tractable posteriors, even when the prior is drawn from a more general family of distributions (Neal, 2000). A key strength of the DPM model lies in its ability to maintain posterior consistency, given that its base measure has a finite mean and the prior’s standard deviation exhibits exponentially decaying tails in a neighborhood of the true density. This property ensures that the model remains stable while capturing the complex, multimodal nature of financial time series data, which often exhibit volatility clustering and regime shifts.
To obtain consistent and accurate estimates in our empirical analysis, we employ mixture models that replace the DP with a more general discrete nonparametric prior. This approach offers greater flexibility and ensures stronger consistency than traditional parametric models or standard nonparametric alternatives. By adopting this framework, we aim to improve the precision of stock price fluctuation predictions—an essential objective in finance, where accurate market forecasts are critical for informed investment decisions and effective risk management.
The paper is structured as follows. Section 2 describes the Bayesian estimation methods, including the kernel and DPM models, and Section 3 describes the data and estimation procedures. Section 4 provides empirical results for the selected stocks. Finally, Section 5 concludes the paper.

2. Estimation Methods

We begin by establishing a baseline Bayesian model that uses a kernel function to estimate daily stock price fluctuations. This baseline model serves as a reference point to benchmark the accuracy and consistency of subsequent models. In addition, we include a Bayesian parametric (BP) model following Roweis and Ghahramani (1999) for comparison purposes. Next, we implement our proposed approach, the DPM model, which leverages the flexibility of a generalized nonparametric prior to capture the complex distribution of stock prices more effectively. This allows us to adapt the model to the underlying data without the constraints of parametric assumptions. To comprehensively evaluate the performance of the DPM model against the kernel-based Bayesian model, we compare their accuracy, consistency, and robustness using a set of evaluation metrics, as outlined by Hyndman and Koehler (2006). These metrics provide a thorough assessment of each model's predictive capability and reliability.

2.1. Kernel Estimation

Let $\{x_i \mid i = \overline{1,n}\}$ represent a sequence of independently and identically distributed (i.i.d.) samples drawn from a univariate distribution with an unknown density, $f$. Our goal is to estimate the density $f(x)$ at any given point $x$. Kernel density estimation (KDE) is a popular method for estimating this unknown density. Following Silverman (2018), the KDE of $f(x)$ is given by:

$$ \hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - x_i) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) \tag{1} $$

where $K$ is a non-negative, bounded kernel function, and $h > 0$ is the bandwidth, a smoothing parameter that controls the degree of smoothing. Commonly used kernel functions include the uniform, biweight, triweight, triangular, Epanechnikov, and normal kernels, among others. The normal kernel, $K(x) = \phi(x)$, where $\phi$ is the standard normal density function, is frequently chosen for its computational convenience.
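To make the estimator concrete, the following is a minimal NumPy sketch of Equation (1) with a normal kernel. The bandwidth is set by Silverman's rule of thumb, an assumption for illustration (the paper does not state its bandwidth choice), and the simulated two-regime "price" sample is likewise hypothetical.

```python
import numpy as np

def kde_gaussian(x_grid, samples, h):
    """Equation (1) with a normal kernel: f_hat(x) = (1/(n*h)) * sum_i phi((x - x_i)/h)."""
    u = (x_grid[:, None] - samples[None, :]) / h          # scaled distances, shape (grid, n)
    phi = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)    # standard normal density
    return phi.sum(axis=1) / (len(samples) * h)

# Hypothetical bimodal "price" sample to mimic a series with two regimes
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(100, 5, 400), rng.normal(120, 8, 600)])
x_grid = np.linspace(samples.min(), samples.max(), 200)
h = 1.06 * samples.std() * len(samples) ** (-0.2)         # Silverman's rule of thumb
f_hat = kde_gaussian(x_grid, samples, h)
```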
Conventional methods such as KDE and parametric estimation (e.g., Gaussian models) rely mainly on the stationarity of the time series, and they often fail to capture the non-stationary, multimodal, and heavy-tailed distributions of stock prices (Cont, 2001). The DP was introduced to overcome this limitation by flexibly modeling complex distributions, dynamically detecting regime shifts, and adapting to non-stationarity through Bayesian updates (Escobar & West, 1995; Neal, 2000). Moreover, DPs have been shown to effectively capture densities with non-stationary patterns in financial data (Griffin & Steel, 2006).

2.2. Dirichlet Process

The DP was introduced by Ferguson (1973) and has since become a cornerstone of Bayesian nonparametric statistics. It defines a distribution over probability distributions, centered on a base distribution. Specifically, the DP is a stochastic process whose sample paths are probability measures that integrate to unity. In this context, a stochastic process refers to a distribution over a function space, and a sample path represents a realization of a random probability distribution. The DP is particularly valuable because it generates random probability distributions that behave like ordinary probability distributions, which makes it highly suited for nonparametric inference. This flexibility is crucial in cases where the underlying distribution is unknown or complex, as it allows the model to adapt dynamically to the data rather than relying on predefined assumptions about the form of the distribution (Teh, 2010).
The DP, $G \sim \mathrm{DP}(\alpha, H)$, is characterized by two key parameters: a base distribution, $H$, from which samples are drawn, and a concentration parameter, $\alpha > 0$, which controls the variability or dispersion of the distribution $G$. Given a partition $(A_1, \dots, A_r)$ of the probability space, the random vector $(G(A_1), \dots, G(A_r))$ follows a Dirichlet distribution:

$$ (G(A_1), \dots, G(A_r)) \sim \mathrm{Dir}(\alpha H(A_1), \dots, \alpha H(A_r)) \tag{2} $$
This indicates that the probabilities assigned by G to the partition sets follow a Dirichlet distribution with parameters proportional to the base measure H and scaled by the concentration parameter α .
Several methods are used for parameter estimation in DPs, including kernel techniques (Silverman, 2018), nonparametric maximum likelihood (Lindsay, 1983), and Bayesian approaches such as MCMC, Gibbs sampling, or mixture models. The DP and associated mixture models have become increasingly popular due to their ability to handle uncertainty in the model structure, particularly when the number of clusters or components is unknown. Despite these theoretical advances, applications in economics and finance remain relatively limited, possibly due to the complexity of these models and the challenges of interpretation in applied settings.
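Although the text characterizes the DP through its partition property, a draw $G \sim \mathrm{DP}(\alpha, H)$ can be simulated with the standard stick-breaking construction (due to Sethuraman), which is not described above; the sketch below is illustrative, with the truncation tolerance and base sampler chosen arbitrarily.

```python
import numpy as np

def sample_dp(alpha, base_sampler, tol=1e-6, rng=None):
    """Approximate draw G ~ DP(alpha, H) via stick-breaking:
    beta_k ~ Beta(1, alpha), pi_k = beta_k * prod_{j<k} (1 - beta_j),
    atoms theta_k ~ H; truncated once the remaining stick mass < tol."""
    rng = rng or np.random.default_rng()
    weights, atoms, stick = [], [], 1.0
    while stick > tol:
        beta = rng.beta(1.0, alpha)
        weights.append(stick * beta)     # probability mass assigned to this atom
        atoms.append(base_sampler(rng))  # atom location drawn from the base H
        stick *= 1.0 - beta              # mass left over for later atoms
    return np.array(weights), np.array(atoms)

# Small alpha concentrates G on a few atoms; large alpha makes G resemble H
w, theta = sample_dp(alpha=2.0, base_sampler=lambda r: r.normal(0.0, 1.0))
```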

2.2.1. Parameters of Dirichlet Process Mixture Models

In a Bayesian mixture model, the population distribution is expressed as a convex combination of multiple component distributions, each representing a distinct subgroup within the population. A general distribution can thus be modeled as:

$$ p(x \mid \theta) = \sum_{k=1}^{K} \pi_k\, p_k(x \mid \theta_k) \tag{3} $$

where $p(\cdot)$ is the population probability density function (pdf), the $\pi_k$ are weights or mixing proportions that sum to 1, the $p_k(\cdot)$ are the component pdfs with parameters $\theta_k$, and $\theta = (\theta_1, \dots, \theta_K)$.
For instance, in a normal mixture model, where each component follows a normal distribution parameterized by mean $\mu_k$ and variance $\sigma_k^2$, i.e., $p_k(\cdot) \sim N(\mu_k, \sigma_k^2)$, the DP normal mixture can be written as:

$$ p(x \mid \theta) = \sum_{k=1}^{K} \pi_k\, N(x \mid \mu_k, \sigma_k^2) \tag{4} $$
The DP provides a mechanism for discretizing a base distribution, $H$: it effectively creates distributions supported on a countable set of points. In a DPM model, the population distribution is represented as a mixture of these discrete distributions, where each component distribution corresponds to a cluster of data points. The DP generates a set of parameters $\{\theta_1, \dots, \theta_n\}$, where each $\theta_i$ is assigned to one of $K$ possible cluster parameters $\{\theta_1^*, \dots, \theta_K^*\}$ derived from the original data.

2.2.2. Density Distribution of a Dirichlet Process Mixture Model

In a mixture model, let $\{z_i \mid i = \overline{1,n}\}$ denote the cluster assignments derived from the $x_i$, so that the $k$th component distribution is written $p_k(x_i \mid \theta_k) = p(x_i \mid z_i = k, \theta)$, and let $\pi = (\pi_1, \dots, \pi_K)$ be the vector of mixing proportions with $\sum_{k=1}^{K} \pi_k = 1$. The probability of assigning a data point to the $k$th cluster is $p(z_i = k \mid \pi) = \pi_k$.
The mixing proportion $\pi$ in a DPM model is drawn from a Dirichlet distribution:

$$ p(\pi \mid \alpha) = \mathrm{Dir}\!\left(\tfrac{\alpha}{K}\mathbf{1}_K\right) \tag{5} $$

where $\alpha$ is the concentration parameter defined in Equation (2), controlling the variability of the mixture weights, and $\mathbf{1}_K$ is a vector of $K$ ones, so that each component receives the same prior weight $\alpha/K$. Each cluster parameter, $\theta_k$, is drawn from a base distribution, $H(\lambda)$, which serves as a prior over the component parameters. For computational convenience, Neal (2000) suggested choosing the prior distribution, $p(\theta_k \mid \lambda)$, to be conjugate to the likelihood $p(x_i \mid \theta_k)$, facilitating efficient inference.
Given that an observation $x_i$ belongs to cluster $k$ (i.e., $z_i = k$), the prior distribution $p(\theta_k \mid \lambda)$ is linked to the data through $x_i \sim F(\theta_{z_i})$, where $F$ is the component distribution in the mixture model. Due to the exchangeability property of Bayesian nonparametric models, and provided that the base distribution $H$ is conjugate to the likelihood $F$, collapsed Gibbs sampling can be used. This approach integrates out $\pi$ and $\theta_k$, allowing the cluster assignment $z_i$ to be sampled directly from a posterior distribution derived from the mixture weights $\pi_k$.
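The practical payoff of conjugacy is that the integral over $\theta_k$ has a closed form. As a minimal illustration, assume a Normal($\mu, \sigma^2$) likelihood with known $\sigma$ and a conjugate Normal($m_0, s_0^2$) base measure on $\mu$ (a simplification of the model above; all parameter values below are assumptions): the posterior-predictive density of a point given the current members of a cluster is again normal.

```python
import numpy as np

def predictive_normal(x_new, cluster_points, m0=0.0, s0=10.0, sigma=1.0):
    """Closed-form p(x_new | points in cluster) after integrating out the
    cluster mean mu, under mu ~ N(m0, s0^2) and x | mu ~ N(mu, sigma^2)."""
    n = len(cluster_points)
    prec = 1.0 / s0 ** 2 + n / sigma ** 2                     # posterior precision of mu
    mean = (m0 / s0 ** 2 + np.sum(cluster_points) / sigma ** 2) / prec
    var = 1.0 / prec + sigma ** 2                             # predictive variance
    return np.exp(-0.5 * (x_new - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)
```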

2.2.3. Prior Distribution

For a simplified, unordered DPM model, consider the case where all the component variances are equal ($\sigma_1 = \cdots = \sigma_K = 1$), the prior probabilities for each component are uniform ($p_1 = \cdots = p_K = 1/K$), and only the mean values $\mu = (\mu_1, \dots, \mu_K)$ are unknown. A Markov prior over the $K$ means can be constructed as follows:

$$ \pi(\mu) = \pi(\mu_1)\, \pi(\mu_2 \mid \mu_1)\, \pi(\mu_3 \mid \mu_2) \cdots \pi(\mu_K \mid \mu_{K-1}) \tag{6} $$

where $\pi(\mu_j \mid \mu_{j-1})$ is typically a normal distribution with variance $\sigma^2$. This represents a sequential construction of the means, where each mean is conditioned on the previous one.
A more general prior for the full set of parameters $(\theta, \sigma, \mu, p)$ is given by Equation (7):

$$ \pi(\theta, \sigma, \mu, p) = \pi(p)\, \pi(\theta)\, \pi(\sigma \mid \theta)\, \pi(\mu \mid \sigma) \tag{7} $$
where:
  • $p \sim \mathrm{Dir}(1, \dots, 1)$ represents a symmetric Dirichlet prior on the mixing proportions. This prior assumes that, before observing any data, all components are equally likely, reflecting a state of maximum uncertainty. The vector $(1, \dots, 1)$ specifies that each component has the same prior weight, ensuring that no particular component is favored a priori.
  • $\pi(\theta) \propto \theta^{-1}$ is a common, often improper, prior for scale parameters, reflecting a vague prior belief. More specifically, $\pi(\theta) = \prod_{j=1}^{K} \pi(\theta_j)$, implying independence between the scale parameters of each component.
  • $\pi(\sigma \mid \theta)$ represents the prior on the variances $\sigma$, conditional on the scale parameters $\theta$. A common choice is an inverse gamma distribution, but the specific form depends on the model assumptions. It is expressed as $\pi(\sigma) = \prod_{j=1}^{K} \pi(\mu_{j-1}, \sigma_j)$, which implies that the prior for each $\sigma_j$ may depend on the previous mean $\mu_{j-1}$. This notation is better expressed as $\pi(\sigma \mid \mu)$ if the dependence is on the means.
  • $\pi(\mu \mid \sigma)$ is the prior on the means $\mu$, conditional on the variances $\sigma$. Often, this is a product of normal distributions, i.e., $\pi(\mu \mid \sigma) = \prod_{j=1}^{K} N(\mu_j \mid m_j, \tau_j^2 \sigma_j^2)$, where $m_j$ and $\tau_j^2$ are hyperparameters. This implies that the prior for each mean is normal, centered at $m_j$, with variance scaled by the component variance $\sigma_j^2$.
This hierarchical prior structure enables flexible modeling of the mixture components and their relationships. The specific choices for the prior distributions (such as the inverse gamma for variances and normal for means) and their corresponding hyperparameters will have a significant impact on the posterior inference, shaping how the model fits the observed data and reflecting the underlying relationships between the components.

2.2.4. Posterior Distribution

To obtain the mixing proportions $\pi$ and cluster parameters $\theta_k$ using Gibbs sampling, Roeder and Wasserman (1997) proposed estimating the conditional distribution of the cluster assignment $z_i$ for each data point $x_i$ as:

$$ p(z_i = k \mid z_{-i}, x, \alpha, \lambda) \propto p(z_i = k \mid z_{-i}, \alpha)\, p(x_i \mid x_{-i}, z_i = k, z_{-i}, \lambda) \tag{8} $$

where $z_{-i}$ and $x_{-i}$ represent the sets of cluster assignments and data points excluding the $i$th element, respectively, and $k$ ranges from 1 to $K$, the number of clusters. The conditional distribution $p(x_i \mid x_{-i}, z_i = k, z_{-i}, \lambda)$ is obtained as:

$$ p(x_i \mid x_{-i}, z_i = k, z_{-i}, \lambda) \propto \int p(x_i \mid \theta_k) \prod_{j \neq i,\; z_j = k} p(x_j \mid \theta_k)\, H(\theta_k \mid \lambda)\, d\theta_k \tag{9} $$

This integral represents the likelihood of $x_i$ belonging to cluster $k$, integrating over all possible values of $\theta_k$, weighted by the prior $H(\lambda)$ and the likelihood of the other data points already assigned to cluster $k$.
Tierney (1998) suggests using the Metropolis–Hastings algorithm to sample from the distribution in Equation (9). The Metropolis algorithm facilitates resampling the cluster assignments $z_i$ by iteratively considering each data point. For each data point, a new cluster assignment is proposed, and the acceptance of this proposal is determined by comparing the posterior probabilities computed using Equation (9). Specifically, the algorithm compares the likelihood of the proposed cluster assignment with that of the current one, accepting the proposal with a probability proportional to the ratio of the posterior probabilities. This process allows the algorithm to explore different cluster configurations and gradually converge to the posterior distribution of the cluster assignments.

2.3. Clustering the Number of Components

Determining the optimal number of clusters is a crucial task in cluster analysis, and several methods exist to address this challenge, each with its own strengths. These methods include the Calinski and Harabasz index (Calinski & Harabasz, 1974), the Duda index (Duda & Hart, 1973), the C-index (Hubert & Levin, 1976), and the Friedman index (Friedman & Rubin, 1967). The Friedman index, used in non-hierarchical clustering, is computed as:

$$ \mathrm{Friedman} = \mathrm{trace}\left(W_K^{-1} B_K\right) \tag{10} $$

where $W_K$ is the within-group dispersion matrix for $K$ clusters, and $B_K$ is the between-group dispersion matrix for $K$ clusters. Milligan and Cooper (1985) suggested using the maximum difference in Friedman index values across successive values of $K$ to identify the optimal number of clusters.
The Euclidean distance, commonly used in clustering, measures the distance between data points $x$ and $y$ in $d$ dimensions:

$$ d(x, y) = \left( \sum_{j=1}^{d} (x_j - y_j)^2 \right)^{1/2} \tag{11} $$
Ward (1963), Murtagh and Legendre (2014), and Saxena et al. (2017) suggested minimizing the total within-cluster variance:

$$ S_K = \sum_{j=1}^{K} \sum_{i=1}^{N_j} \left\| x_i - c_j \right\|^2 \tag{12} $$

where $K$ is the number of clusters, $N_j$ is the number of elements in cluster $j$, $x_i$ is the $i$th element, and $c_j$ is the centroid of the $j$th cluster. Hierarchical clustering algorithms iteratively compute $S_K$, merging the closest clusters at each step to minimize the within-cluster variance until the optimal number of clusters is reached.
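For reference, Ward's criterion is available in standard libraries; the sketch below uses SciPy (an implementation choice not specified in the paper) and cuts the resulting dendrogram at a chosen $K$, with the simulated price series purely hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def ward_clusters(series, k):
    """Agglomerative clustering with Ward's criterion: each merge is the one
    that least increases the total within-cluster variance S_K of Equation (12)."""
    Z = linkage(series.reshape(-1, 1), method="ward")  # univariate price series
    return fcluster(Z, t=k, criterion="maxclust")      # cut the tree into k clusters

# Hypothetical usage on a simulated price series
prices = np.random.default_rng(1).normal(100.0, 10.0, 500)
labels = ward_clusters(prices, k=3)
```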
The DPM model offers several advantages for time series modeling, including scalability and robustness. Rasmussen and Ghahramani (2001) demonstrated the DPM's scalability in hierarchical extensions for clustering and density estimation. Müller et al. (2004) emphasized the DPM's ability to incorporate uncertainty in density estimates, which is particularly important for risk-sensitive applications in time series. Griffin and Steel (2006) are considered pioneers in applying DPMs to this domain, paving the way for more efficient methods and advancements in stock price modeling.

2.4. Algorithms

The DPM algorithm estimates the density $f$ of a dataset $x = (x_1, x_2, \dots, x_n)$ of daily stock prices, modeled as a mixture of distributions whose parameters are drawn from a Dirichlet process prior (4). The mixture model is expressed as in Equation (13):

$$ f(x) = \sum_{k=1}^{\infty} \pi_k\, f(x \mid \theta_k) \tag{13} $$

where the $\pi_k$ are the mixing proportions, $f(\cdot \mid \theta_k)$ is the component distribution with parameter $\theta_k$, and the weights $\pi_k$ are generated by a Dirichlet process, $\mathrm{DP}(\alpha, G_0)$, with concentration parameter $\alpha$ and base measure $G_0$. The algorithm proceeds as follows:
Step 1: Initialization:
- Specify the maximum number of clusters, $K$, determined by the Friedman coefficient in Equation (10).
- Input: data $x$, number of iterations $N_{\text{iter}}$, and a grid $x_{\text{grid}} = (g_1, g_2, \dots, g_{N_{\text{grid}}})$ spanning $[x_{\min}, x_{\max}]$.
- Initialize the parameters:
  - Mixing proportions: $\pi_k = 1/K$, for $k = 1, \dots, K$.
  - Cluster parameters: $\theta_k \sim G_0$, where $G_0$ is the base distribution (e.g., a normal-inverse-gamma prior for the mean and variance).
  - Cluster assignments: $z_i \sim \mathrm{Multinomial}(\pi_1, \dots, \pi_K)$, for $i = 1, \dots, n$.
Step 2: Gibbs Sampling:
The Gibbs sampler iteratively updates the posterior parameters $w$, $\theta$, and $z$ by sampling from their conditional distributions, following Escobar and West (1995). The steps for each iteration are:
Step 2.1: Sample Cluster Assignments ($z$):
For each data point $x_i$, $i = 1, \dots, n$:
- Compute the conditional probability for each cluster $k = 1, \dots, K$, as in Equation (14):

$$ p(z_i = k \mid x_i, \theta, z_{-i}) \propto w_k\, f(x_i \mid \theta_k) \tag{14} $$

where $z_{-i}$ denotes the assignments excluding $z_i$.
- Normalize the probabilities using log-sum-exp stabilization:

$$ p(z_i = k) = \frac{\exp\{\log w_k + \log f(x_i \mid \theta_k) - m\}}{\sum_{k'=1}^{K} \exp\{\log w_{k'} + \log f(x_i \mid \theta_{k'}) - m\}} \tag{15} $$

where $m = \max_{k}\{\log w_k + \log f(x_i \mid \theta_k)\}$.
- Sample $z_i \sim \mathrm{Multinomial}(p)$.
Step 2.2: Sample Mixing Proportions ($w$):
- Compute the number of points in each cluster: $n_k = \sum_{i=1}^{n} I(z_i = k)$.
- Sample $w \sim \mathrm{Dirichlet}(\alpha/K + n_1, \dots, \alpha/K + n_K)$, as in Equation (5).
Step 2.3: Sample Cluster Parameters ($\theta$):
For each cluster $k = 1, \dots, K$ with $n_k > 0$:
- Compute the posterior distribution for $\theta_k$ given the data assigned to cluster $k$:

$$ p(\theta_k \mid x_k, z) \propto G_0(\theta_k) \prod_{i:\, z_i = k} f(x_i \mid \theta_k) \tag{16} $$

where $x_k = \{x_i : z_i = k\}$.
- Sample $\theta_k$ from this posterior, assuming $G_0$ and $f(\cdot \mid \theta_k)$ are conjugate (e.g., normal-inverse-gamma for normal components).
Step 3: Posterior Storage:
For iterations $s = B + 1, \dots, N_{\text{iter}}$, where $B$ is the burn-in period, store:
- Cluster assignments: $z^{(s)} = (z_1^{(s)}, \dots, z_n^{(s)})$.
- Mixing proportions: $w^{(s)} = (w_1^{(s)}, \dots, w_K^{(s)})$.
- Cluster parameters: $\theta^{(s)} = (\theta_1^{(s)}, \dots, \theta_K^{(s)})$.
Step 4: Density Estimation:
- For each iteration $s = B + 1, \dots, N_{\text{iter}}$, compute the mixture density on $x_{\text{grid}}$:

$$ f^{(s)}(g_j) = \sum_{k=1}^{K} w_k^{(s)}\, f(g_j \mid \theta_k^{(s)}), \quad j = 1, \dots, N_{\text{grid}} \tag{17} $$

- Compute the average density:

$$ \hat{f}(g_j) = \frac{1}{N_{\text{iter}} - B} \sum_{s=B+1}^{N_{\text{iter}}} f^{(s)}(g_j) \tag{18} $$
Step 5: Output:
Return $x_{\text{grid}}$ and the estimated density $\hat{f}(x_{\text{grid}})$.
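A compact implementation of Steps 1–5 is sketched below. For brevity it simplifies the base measure of Step 1 to a conjugate Normal($m_0, s_0^2$) prior on the component means with a fixed, shared component scale $\sigma$ (the normal-inverse-gamma base would add a variance update in Step 2.3); the defaults for $\alpha$, $K$, the iteration counts, and the hyperparameters are all assumptions.

```python
import numpy as np

def dpm_density(x, K=10, alpha=1.0, n_iter=2000, burn=500, n_grid=200, rng=None):
    """Finite-K blocked Gibbs sampler for the DPM density (Steps 1-5).
    Components are Normal(theta_k, sigma^2) with a conjugate Normal(m0, s0^2)
    base measure on the means and a fixed, shared sigma (an assumed
    simplification of the normal-inverse-gamma base measure in Step 1)."""
    rng = rng or np.random.default_rng()
    sigma = x.std() / 4                       # assumed component scale
    m0, s0 = x.mean(), x.std()                # assumed base-measure hyperparameters
    grid = np.linspace(x.min(), x.max(), n_grid)

    w = np.full(K, 1.0 / K)                   # Step 1: uniform mixing proportions
    theta = rng.normal(m0, s0, K)             # Step 1: component means drawn from G0
    dens = np.zeros(n_grid)

    def log_norm(y, mu):                      # log N(y | mu, sigma^2)
        return -0.5 * ((y - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

    for s in range(n_iter):
        # Step 2.1: assignments via Eq. (14) with log-sum-exp stabilization (Eq. 15)
        logp = np.log(w + 1e-300)[None, :] + log_norm(x[:, None], theta[None, :])
        logp -= logp.max(axis=1, keepdims=True)
        p = np.exp(logp)
        p /= p.sum(axis=1, keepdims=True)
        z = np.array([rng.choice(K, p=row) for row in p])

        # Step 2.2: mixing proportions from Dirichlet(alpha/K + n_1, ..., alpha/K + n_K)
        counts = np.bincount(z, minlength=K)
        w = rng.dirichlet(alpha / K + counts)

        # Step 2.3: occupied components' means from their conjugate normal posteriors
        for k in range(K):
            if counts[k] > 0:
                prec = 1.0 / s0 ** 2 + counts[k] / sigma ** 2
                mean = (m0 / s0 ** 2 + x[z == k].sum() / sigma ** 2) / prec
                theta[k] = rng.normal(mean, np.sqrt(1.0 / prec))
            else:
                theta[k] = rng.normal(m0, s0)  # redraw empty clusters from G0

        # Step 4: accumulate the mixture density on the grid after burn-in (Eqs. 17-18)
        if s >= burn:
            dens += (w[None, :] * np.exp(log_norm(grid[:, None], theta[None, :]))).sum(axis=1)

    return grid, dens / (n_iter - burn)        # Step 5: grid and averaged density
```

Calling `grid, f_hat = dpm_density(prices)` on a price series returns the averaged posterior density of Step 4, and the per-iteration cluster counts indicate how many components the data actually support.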

2.5. Measures of Discrepancy Between Kernel and Dirichlet Methods

Our objective is to compare the true density values, $y$, with the estimated density values, $\hat{y}$, ensuring close agreement between them to demonstrate high estimation accuracy in the proposed model. Several metrics were applied to evaluate the accuracy of the probability distribution estimates. Commonly used measures include the mean absolute error (MAE), mean absolute percentage error (MAPE), and the coefficient of determination ($R^2$). In addition, we employ five further metrics, namely the root mean squared error (RMSE), mean absolute deviation (MAD), normalized mean squared error (NMSE), relative absolute error (RAE), and root relative squared error (RRSE), to provide a more comprehensive assessment of estimation accuracy. An extensive survey and explanations can be found in Hyndman and Koehler (2006).
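The following sketch computes these measures. Exact conventions vary in the literature, so the definitions below are assumptions where the paper does not spell them out: MAD is taken as the median absolute error, and NMSE as the error sum of squares over the total sum of squares, which makes $R^2 = 1 - \mathrm{NMSE}$, consistent with the pattern visible in Table 2.

```python
import numpy as np

def accuracy_metrics(y, y_hat):
    """Discrepancy measures between true density values y and estimates y_hat
    (cf. Hyndman & Koehler, 2006). MAPE assumes y contains no zeros."""
    e = y - y_hat
    nmse = np.sum(e ** 2) / np.sum((y - y.mean()) ** 2)   # normalized MSE
    return {
        "MAE":  np.mean(np.abs(e)),
        "RMSE": np.sqrt(np.mean(e ** 2)),
        "MAD":  np.median(np.abs(e)),                     # one common convention
        "MAPE": 100.0 * np.mean(np.abs(e / y)),
        "NMSE": nmse,
        "RAE":  np.sum(np.abs(e)) / np.sum(np.abs(y - y.mean())),
        "RRSE": np.sqrt(nmse),
        "R2":   1.0 - nmse,
    }
```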

3. Data and Estimation Procedures

3.1. Description of Data

For our empirical analysis, we randomly selected stock price data for large companies listed on U.S. stock exchanges. Specifically, six stocks across different industries (AAPL, JPM, WMT, XOM, JNJ, and BA) were selected, and their closing prices were retrieved via the yfinance package, an open-source interface to Yahoo Finance data. The data span from 1 January 2020 to 1 November 2024, comprising 1217 daily closing price observations for each stock.
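The data pull can be reproduced with a few lines of yfinance; the exact call below is our reconstruction, not code from the paper.

```python
import yfinance as yf

tickers = ["AAPL", "JPM", "WMT", "XOM", "JNJ", "BA"]
# Daily closing prices over the sample period, 1 January 2020 to 1 November 2024
data = yf.download(tickers, start="2020-01-01", end="2024-11-01")["Close"]
print(data.shape)  # roughly (1217, 6): one closing price per trading day per stock
```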
Figure 1 displays the stock price time series across different sectors, and Figure 2 illustrates the complex and unpredictable patterns of their empirical price densities. These attributes significantly hinder model accuracy and predictive power, which highlights the importance of understanding the underlying distributions of stock prices and developing robust methods to account for their inherent uncertainty.
Table 1 highlights notable heterogeneity among the analyzed stocks in terms of volatility, mean price, and price distribution. In particular, BA, AAPL, and JPM exhibit significantly higher standard deviations (40.45, 39.56, and 34.67, respectively) and wider price ranges, indicating high volatility. These stocks, characterized by greater risk, may also offer the potential for substantial returns. In contrast, WMT and JNJ display relatively lower standard deviations (9.75 and 12.33, respectively) and narrower price ranges, suggesting more stability and suitability for risk-averse investors. Meanwhile, XOM exhibits a moderate standard deviation as well as a smaller price spread, indicating a moderate risk among the selected stocks.
In addition, the mean stock prices vary widely, from 48.61 (WMT) to 197.71 (BA), underscoring the diversity in price scales across these stocks. For most stocks, the mean and median prices are closely aligned, reflecting a relatively symmetrical price distribution. However, stocks like BA, JNJ, and AAPL show significant discrepancies in their interquartile ranges, pointing to pronounced price fluctuations. On the other hand, WMT, XOM, and JPM exhibit narrower interquartile ranges, indicating more stable price movements.

3.2. Estimation Procedures

Our estimation procedures follow several key steps.
  • Step 1: Split the stock price series into training and testing sets.
  • Step 2: Estimate the prior probability density using the kernel and DPM methods.
  • Step 3: Compute the posterior probability density using the kernel and DPM methods.
  • Step 4: Compare estimation accuracy using the measures described in Section 2.5.

4. Empirical Results and Discussions

4.1. Empirical Results

The ADF and KPSS tests in Table 1 indicate that the stock prices exhibit non-stationary patterns at least at the 10% level of significance. The normality of the stock price distributions was assessed using the Jarque–Bera test. The results show that the analyzed stocks exhibit significant deviations from normality, as indicated by extremely low p-values ($p < 0.05$), suggesting that their price series do not follow a Gaussian distribution. These non-stationary patterns of stock prices are commonly pronounced and evidenced in many theoretical and empirical papers, including Boness et al. (1974) and Neal (2000), among others.
Additionally, the Ljung–Box test revealed statistically significant autocorrelation in the closing price time series for all the analyzed stocks, with p-values approaching zero p < 0.05 . This indicates a strong dependence of current stock prices on past values, a typical feature in financial time series data.
In summary, the stocks were analyzed for stationarity (using the augmented Dickey–Fuller and KPSS tests), normality (via the Jarque–Bera test), and autocorrelation (through the Ljung–Box test). The results, presented in Table 1, show that all the stocks display non-stationary patterns and autocorrelation, and most likely do not follow a normal distribution. These attributes are particularly common in financial time series data. Figure 2 illustrates the density distributions of the selected stock prices.
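The four diagnostics in Table 1 can be reproduced with statsmodels and SciPy; the lag choices and test options below are library defaults, which may differ from those used in the paper.

```python
from scipy.stats import jarque_bera
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.stattools import adfuller, kpss

def diagnostics(prices):
    """p-values for the tests reported in Table 1, for one price series."""
    adf_p = adfuller(prices)[1]                # H0: unit root (non-stationarity)
    kpss_p = kpss(prices, regression="c")[1]   # H0: level stationarity
    jb_p = jarque_bera(prices)[1]              # H0: normality
    lb_p = acorr_ljungbox(prices, lags=[10])["lb_pvalue"].iloc[0]  # H0: no autocorrelation
    return {"ADF": adf_p, "KPSS": kpss_p, "JB": jb_p, "LB": lb_p}
```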
Table 2 summarizes the results from DPM, KDE, and BP estimations. The results demonstrate the superior performance of the DPM model compared to the kernel model in predicting the stock prices across most error metrics. The DPM exhibits significantly lower MAE, RMSE, and MAPE, indicating greater accuracy and efficiency. This improvement is particularly evident for AAPL and WMT, with the MAPE for AAPL decreasing dramatically from 74.78% to 43.91% while the MAPE for WMT drops significantly from 86.16% to 61.65%. Additionally, the consistently higher R 2 for the DPM across these stocks confirms its stronger predictive capability, as well as indicating superior modeling and noise reduction.
In contrast, BA stock represents a more challenging case due to its higher volatility. The DPM, BP, and KDE yield comparable MAE and RMSE values for BA, but the DPM maintains an advantage in the MAPE. The KDE achieves a slightly lower $R^2$ (0.9719) than the DPM (0.9845). This suggests that while the DPM remains the stronger choice, the KDE could also be a suitable alternative for highly volatile stocks like BA.
For stocks such as XOM and WMT, which typically exhibit higher error levels due to increased market volatility, the DPM still outperforms the BP and KDE. The DPM’s significantly lower MAE and RMSE values underscore its effectiveness in minimizing prediction errors, even under these challenging conditions. For example, the DPM achieves a notably higher R 2 of 0.7395 for XOM, compared to the KDE’s 0.5053, further highlighting the DPM’s robustness in volatile markets.
Additionally, the DPM exhibits substantially lower MAPE values, particularly for stocks with more complex data patterns. Although the performance differences between the DPM, BP, and KDE are less pronounced for stocks like BA and JNJ, the DPM still maintains an overall advantage, thanks to its consistently lower error metrics across the board.
Figure 3 highlights the superior stability of the DPM method compared to the BP and KDE. The DPM's error distributions are more concentrated around the median, with fewer outliers, a trend evident across all the selected stock prices. The DPM excels in both reducing errors and maintaining stable results. In contrast, the KDE shows wider error distributions and larger outliers, indicating its greater susceptibility to unusual data points or noise. This vulnerability is most pronounced in the more volatile stocks in the sample. Even in these challenging cases, the DPM maintains a narrower error distribution and significantly reduces the magnitude of large errors, thereby improving the overall reliability of the model.
Furthermore, the DPM offers a substantial advantage in predictive efficiency, particularly with the MAPE, a crucial metric for comparing model performance across varying data scales. This advantage is especially valuable for stock groups with diverse value ranges, such as the energy and pharmaceutical sectors. The boxplots further illustrate that the DPM not only achieves superior error reduction but also maintains greater stability across all the stock groups, from the relatively stable financial stocks to the highly volatile energy and technology stocks. This robust performance underscores the DPM’s suitability as a more reliable method for modeling complex financial data.
Table 3 summarizes the comparative analysis of error indices between the BP, KDE, and DPM models. The test results reveal the superior performance of the DPM across most metrics. Specifically, the DPM exhibited lower mean values for error indices, including the MAE, RMSE, MAD, MAPE, NMSE, RAE, and RRSE, while demonstrating a higher coefficient of determination ( R 2 ). This indicates that the DPM not only achieves greater predictive accuracy but also provides a better fit to observed data fluctuations.
Furthermore, statistical significance tests, including both the Wilcoxon signed-rank test and the Kruskal–Wallis test, confirmed that the observed differences were statistically significant ($p < 0.1$) for most performance indicators, the exceptions being the Kruskal–Wallis tests for the MAE, RMSE, and MAD. These findings suggest that the DPM is a more effective methodology for data modeling and prediction compared to the BP and KDE. Consequently, the DPM should be prioritized in applications that require high accuracy and a robust representation of data variability.
In addition, the DPM outperforms the BP and KDE models in estimating stock price density across all the selected stock prices. The DPM’s lower error metrics, particularly MAPE, and higher R 2 highlight its superior forecasting accuracy and noise reduction capabilities. Although the performance difference among the DPM, BP, and KDE may be less pronounced for some highly volatile stocks, the DPM generally maintains its advantage in both stability and accuracy. This consistent superiority underscores the DPM’s effectiveness in modeling and predicting stock prices across various market conditions.

4.2. Discussions

The DPM demonstrates superior stock price prediction performance compared to the BP and KDE, particularly for highly volatile stock prices. The results presented in Table 3 highlight that the DPM significantly reduces error metrics, including the MAE, MAPE, and MAD, while simultaneously increasing $R^2$ at the 10% level of significance. These improvements indicate a better model fit and enhanced predictive ability compared to the BP and KDE.
Meanwhile, the DPM maintained its superior performance, achieving an NMSE of only 0.086, compared to the KDE's 0.159 and the BP's 0.481. This demonstrates the DPM's greater ability to minimize noise and effectively handle volatile data. A key advantage of the DPM lies in its flexibility to model complex distributions and its iterative updating of probability densities from prior to posterior, a capability that both the BP and KDE lack. Furthermore, the DPM dynamically determines the optimal number of clusters based on historical data, which enhances accuracy when dealing with multi-cluster structured data. This adaptability allows the DPM to better capture intricate patterns and fluctuations, making it a more robust method for stock price prediction in volatile environments.
In short, the DPM offers a superior estimation method over the BP and KDE in several key areas. Firstly, the DPM consistently exhibits higher R 2 values than the BP and KDE, particularly in highly volatile stock groups like energy and financials, demonstrating that the DPM’s modeling capabilities align more closely with real-world data. Secondly, the DPM significantly reduces error metrics such as the MAE, RMSE, and MAD, all of which are expressed in the same units as the actual stock values. For example, for BA, AAPL, and JNJ stocks, the DPM shows smaller MAE and RMSE values compared to those of the BP and KDE, indicating greater forecast accuracy. Finally, for scale-independent error metrics such as the MAPE, NMSE, RAE, and RRSE, the DPM consistently outperforms the BP and KDE. Notably, the DPM significantly reduces the percentage forecast error (MAPE) in highly volatile stocks, such as BA, AAPL, and JPM. Therefore, the DPM enhances not only forecasting accuracy but also stability and efficiency, particularly when dealing with complex or highly volatile datasets. This makes the DPM a more reliable and robust choice for stock price prediction in dynamic financial environments.

5. Conclusions

This study compares the performance of the DPM, BP, and KDE models for predicting individual stock prices. The results reveal that the DPM model consistently outperforms the BP and KDE across a range of forecasting accuracy metrics. By iteratively updating the prior probability density to the posterior density, the DPM not only improves forecasting accuracy but also enhances the stability of estimation. This advantage is particularly evident for stocks in high-volatility sectors. The DPM model significantly reduces both scale-dependent errors (e.g., MAE, RMSE, and MAD) and scale-independent errors (e.g., MAPE, NMSE, RAE, and RRSE), demonstrating its superior effectiveness in stock price prediction.
Additionally, the DPM consistently achieves a higher R 2 coefficient compared to the BP and KDE, highlighting its superior capability to capture individual stock price patterns. A notable advantage of the DPM is its ability to automatically determine the optimal number of clusters in multi-cluster data, which enhances its efficiency and relevance in real-world applications such as risk management and portfolio forecasting. However, the DPM’s higher computational demands require the use of optimization techniques, such as Gibbs sampling, to ensure its practical feasibility.
In addition to its superior estimation capabilities for time series data, the DPM proves to be a robust tool for handling complex, multimodal distributions, resulting in more accurate and reliable forecasts. These findings highlight the potential for broader application in future research on stock return forecasting and portfolio management, particularly in the context of multidimensional models and highly volatile time series data.
While the DPM offers significant advantages, its higher computational demands necessitate techniques such as Gibbs sampling for practical implementation. Despite this computational cost, the DPM’s robust predictive capabilities, error reduction, and cluster optimization make it highly suited not only for stock price prediction but also for future applications in risk management and portfolio forecasting.

Author Contributions

Conceptualization, P.D.K. and T.M.T.; methodology, P.D.K. and T.M.T.; software, P.D.K. and T.M.T.; validation, P.D.K., C.G. and T.M.T.; formal analysis, P.D.K.; investigation, P.D.K.; resources, T.M.T.; data curation, T.M.T.; writing—original draft preparation, P.D.K. and T.M.T.; writing—review and editing, P.D.K., C.G. and T.M.T.; visualization, T.M.T.; supervision, C.G.; project administration, P.D.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Amewou-Atisso, M., Ghosal, S., Ghosh, J. K., & Ramamoorthi, R. V. (2003). Posterior consistency for semi-parametric regression problems. Bernoulli, 9(2), 291–312. [Google Scholar] [CrossRef]
  2. Barron, A. R. (1988). The exponential convergence of posterior probabilities with implications for Bayes estimators of density functions. Department of Statistics, University of Illinois. Available online: http://www.stat.yale.edu/~arb4/publications_files/convergence%20of%20bayer%27s%20estimator.pdf (accessed on 31 May 2025).
  3. Barron, A. R., Schervish, M. J., & Wasserman, L. (1999). The consistency of posterior distributions in nonparametric problems. The Annals of Statistics, 27(2), 536–561. [Google Scholar] [CrossRef]
  4. Boness, A. J., Chen, A. H., & Jatusipitak, S. (1974). Investigations of nonstationarity in prices. The Journal of Business, 47(4), 518–537. [Google Scholar] [CrossRef]
  5. Calinski, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics—Theory and Methods, 3(1), 1–27. [Google Scholar] [CrossRef]
  6. Chae, M., & Walker, S. G. (2017). A novel approach to Bayesian consistency. Electronic Journal of Statistics, 11, 4723–4745. [Google Scholar] [CrossRef]
  7. Choudhuri, N., Ghosal, S., & Roy, A. (2004). Bayesian estimation of the spectral density of a time series. Journal of the American Statistical Association, 99(468), 1050–1059. [Google Scholar] [CrossRef]
  8. Cont, R. (2001). Empirical properties of asset returns: Stylized facts and statistical issues. Quantitative Finance, 1(2), 223–236. [Google Scholar] [CrossRef]
  9. Diaconis, P., & Freedman, D. (1986). On the consistency of Bayes estimates. The Annals of Statistics, 14, 1–26. [Google Scholar] [CrossRef]
  10. Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis (Vol. 3). Wiley. [Google Scholar]
  11. Escobar, M. D., & West, M. (1995). Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association, 90(430), 577–588. [Google Scholar] [CrossRef]
  12. Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1, 209–230. [Google Scholar] [CrossRef]
  13. Freedman, D. A. (1963). On the asymptotic behavior of Bayes’ estimates in the discrete case. The Annals of Mathematical Statistics, 34(4), 1386–1403. [Google Scholar] [CrossRef]
  14. Friedman, H. P., & Rubin, J. (1967). On some invariant criteria for grouping data. Journal of the American Statistical Association, 62(320), 1159–1178. [Google Scholar] [CrossRef]
  15. Geman, S., & Hwang, C. R. (1982). Nonparametric maximum likelihood estimation by the method of sieves. The Annals of Statistics, 10, 401–414. [Google Scholar] [CrossRef]
  16. Ghosal, S., Ghosh, J. K., & Ramamoorthi, R. V. (1999). Posterior consistency of Dirichlet mixtures in density estimation. The Annals of Statistics, 27(1), 143–158. [Google Scholar] [CrossRef]
  17. Griffin, J. E., & Steel, M. F. J. (2006). Inference with non-Gaussian Ornstein-Uhlenbeck processes for stochastic volatility. Journal of Econometrics, 134(2), 605–644. [Google Scholar] [CrossRef]
  18. Hubert, L. J., & Levin, J. R. (1976). A general statistical framework for assessing categorical clustering in free recall. Psychological Bulletin, 83(6), 1072. [Google Scholar] [CrossRef]
  19. Hyndman, R. J., & Koehler, A. B. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4), 679–688. [Google Scholar] [CrossRef]
  20. Li, Y., Schofield, E., & Gönen, M. (2019). A tutorial on Dirichlet process mixture modeling. Journal of Mathematical Psychology, 91, 128–144. [Google Scholar] [CrossRef]
  21. Lindsay, B. G. (1983). The geometry of mixture likelihoods: A general theory. The Annals of Statistics, 11, 869–894. [Google Scholar] [CrossRef]
  22. Martin, G. M., Frazier, D. T., Maneesoonthorn, W., Loaiza-Maya, R., Huber, F., Koop, G., Maheu, J., Nibbering, D., & Panagiotelis, A. (2024). Bayesian forecasting in economics and finance: A modern review. International Journal of Forecasting, 40(2), 811–839. [Google Scholar] [CrossRef]
  23. Milligan, G. W., & Cooper, M. C. (1985). An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50(2), 159–179. [Google Scholar] [CrossRef]
  24. Murtagh, F., & Legendre, P. (2014). Ward’s hierarchical agglomerative clustering method: Which algorithms implement Ward’s criterion? Journal of Classification, 31(3), 274–295. [Google Scholar] [CrossRef]
  25. Müller, P., Quintana, F., & Rosner, G. (2004). A method for combining inference across related nonparametric Bayesian models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 66(3), 735–749. [Google Scholar] [CrossRef]
  26. Neal, R. M. (2000). Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9(2), 249–265. [Google Scholar] [CrossRef]
  27. Petrone, S., & Wasserman, L. (2002). Consistency of Bernstein polynomial posteriors. Journal of the Royal Statistical Society Series B: Statistical Methodology, 64(1), 79–100. [Google Scholar] [CrossRef]
  28. Rasmussen, C., & Ghahramani, Z. (2001). Infinite mixtures of Gaussian process experts. Advances in Neural Information Processing Systems, 14. Available online: https://proceedings.neurips.cc/paper/2055-infinite-mixtures-of-gaussian-process-experts (accessed on 6 January 2024).
  29. Roeder, K., & Wasserman, L. (1997). Practical Bayesian density estimation using mixtures of normals. Journal of the American Statistical Association, 92(439), 894–902. [Google Scholar] [CrossRef]
  30. Roweis, S., & Ghahramani, Z. (1999). A unifying review of linear Gaussian models. Neural Computation, 11(2), 305–345. [Google Scholar] [CrossRef] [PubMed]
  31. Saxena, A., Prasad, M., Gupta, A., Bharill, N., Patel, O. P., Tiwari, A., Er, M. J., Ding, W., & Lin, C. T. (2017). A review of clustering techniques and developments. Neurocomputing, 267, 664–681. [Google Scholar] [CrossRef]
  32. Schwartz, L. (1965). On Bayes procedures. Zeitschrift Für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 4(1), 10–26. [Google Scholar] [CrossRef]
  33. Silverman, B. W. (2018). Density estimation for statistics and data analysis. Routledge. Available online: https://www.taylorfrancis.com/books/mono/10.1201/9781315140919/density-estimation-statistics-data-analysis-bernard-silverman (accessed on 1 January 2024).
  34. Teh, Y. W. (2010). Dirichlet process. Encyclopedia of Machine Learning, 1063, 280–287. [Google Scholar]
  35. Tierney, L. (1998). A note on Metropolis-Hastings kernels for general state spaces. Annals of Applied Probability, 8, 1–9. [Google Scholar] [CrossRef]
  36. Verdinelli, I., & Wasserman, L. (1998). Bayesian goodness-of-fit testing using infinite-dimensional exponential families. The Annals of Statistics, 26(4), 1215–1241. [Google Scholar] [CrossRef]
  37. Walker, S. (2004). New approaches to Bayesian consistency. The Annals of Statistics, 32(5), 2028–2043. [Google Scholar] [CrossRef]
  38. Ward, J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301), 236–244. [Google Scholar] [CrossRef]
  39. Zellner, A. (1971). Bayesian and non-Bayesian analysis of the log-normal distribution and log-normal regression. Journal of the American Statistical Association, 66(334), 327–330. [Google Scholar] [CrossRef]
  40. Zellner, A., & Chetty, V. K. (1965). Prediction and decision problems in regression models from the Bayesian point of view. Journal of the American Statistical Association, 60(310), 608–616. [Google Scholar] [CrossRef]
Figure 1. Time series of some randomly selected stocks. Source: The dataset was retrieved from yfinance.
Figure 2. Densities of some selected stock prices.
Figure 3. Distributions of error measures.
Table 1. Descriptive statistics of some randomly selected stock prices.

| Stock | Count | Mean | Std | Min | 25% | 50% | 75% | Max | ADF p-value | KPSS p-value | JB p-value | LB p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WMT | 1217 | 48.605 | 9.759 | 32.178 | 43.172 | 46.009 | 51.329 | 83.086 | 0.995 | 0.010 | 0.000 | 0.000 |
| BA | 1217 | 196.706 | 40.451 | 95.010 | 170.000 | 198.490 | 217.180 | 345.395 | 0.001 | 0.097 | 0.000 | 0.000 |
| JNJ | 1217 | 147.631 | 12.327 | 96.589 | 142.410 | 149.978 | 156.081 | 170.244 | 0.126 | 0.010 | 0.000 | 0.000 |
| AAPL | 1217 | 149.268 | 39.563 | 54.450 | 125.117 | 149.018 | 174.561 | 235.961 | 0.679 | 0.010 | 0.010 | 0.000 |
| JPM | 1217 | 135.586 | 34.668 | 68.481 | 111.444 | 134.814 | 148.985 | 224.341 | 0.979 | 0.010 | 0.000 | 0.000 |
| XOM | 1217 | 75.223 | 29.744 | 24.809 | 48.300 | 79.291 | 102.124 | 123.246 | 0.906 | 0.010 | 0.000 | 0.000 |

Note: ADF, KPSS, JB, and LB denote p-values for the augmented Dickey–Fuller, KPSS, Jarque–Bera, and Ljung–Box tests, respectively. The JB and LB tests are significant at the 5% level for all stocks.
Table 2. Error metrics from the DPM, KDE, and BP for some selected stocks.

| Stock | Method | MAE | RMSE | R² | MAD | MAPE | NMSE | RAE | RRSE |
|---|---|---|---|---|---|---|---|---|---|
| WMT | KDE | 0.0037 | 0.0058 | 0.9271 | 0.0026 | 86.1549 | 0.0729 | 0.2351 | 0.2700 |
| WMT | DPM | 0.0021 | 0.0027 | 0.9846 | 0.0015 | 61.6504 | 0.0154 | 0.1304 | 0.1240 |
| WMT | BP | 0.0085 | 0.0140 | 0.5797 | 0.0040 | 218.2602 | 0.4203 | 0.5342 | 0.6483 |
| BA | KDE | 0.0004 | 0.0006 | 0.9719 | 0.0003 | 50.9255 | 0.0281 | 0.1321 | 0.1677 |
| BA | DPM | 0.0004 | 0.0005 | 0.9845 | 0.0003 | 42.3142 | 0.0155 | 0.1132 | 0.1245 |
| BA | BP | 0.0009 | 0.0013 | 0.8936 | 0.0008 | 124.1110 | 0.1064 | 0.2773 | 0.3262 |
| JNJ | KDE | 0.0015 | 0.0022 | 0.9712 | 0.0010 | 43.5283 | 0.0288 | 0.1376 | 0.1696 |
| JNJ | DPM | 0.0014 | 0.0018 | 0.9808 | 0.0010 | 48.2336 | 0.0192 | 0.1243 | 0.1385 |
| JNJ | BP | 0.0041 | 0.0068 | 0.7283 | 0.0014 | 76.4473 | 0.2717 | 0.3730 | 0.5212 |
| AAPL | KDE | 0.0013 | 0.0015 | 0.8251 | 0.0012 | 74.7754 | 0.1749 | 0.4008 | 0.4182 |
| AAPL | DPM | 0.0010 | 0.0012 | 0.8982 | 0.0008 | 43.9056 | 0.1018 | 0.3029 | 0.3191 |
| AAPL | BP | 0.0018 | 0.0021 | 0.6661 | 0.0015 | 123.2281 | 0.3339 | 0.5674 | 0.5779 |
| JPM | KDE | 0.0015 | 0.0019 | 0.8442 | 0.0010 | 48.6411 | 0.1558 | 0.3776 | 0.3947 |
| JPM | DPM | 0.0012 | 0.0015 | 0.8980 | 0.0010 | 41.0208 | 0.1020 | 0.3135 | 0.3193 |
| JPM | BP | 0.0025 | 0.0032 | 0.5625 | 0.0022 | 81.5329 | 0.4375 | 0.6386 | 0.6614 |
| XOM | KDE | 0.0036 | 0.0044 | 0.5053 | 0.0025 | 60.6087 | 0.4947 | 0.7211 | 0.7033 |
| XOM | DPM | 0.0023 | 0.0032 | 0.7395 | 0.0017 | 42.4779 | 0.2605 | 0.4564 | 0.5104 |
| XOM | BP | 0.0059 | 0.0071 | −0.3138 | 0.0059 | 96.4258 | 1.3138 | 1.1944 | 1.1462 |
Table 3. Tests for mean differences between the DPM, KDE, and BP.

| Metric | BP Mean | DPM Mean | KDE Mean | Wilcoxon KDE-DPM | Wilcoxon KDE-BP | Wilcoxon DPM-BP | Kruskal–Wallis |
|---|---|---|---|---|---|---|---|
| MAE | 0.004 | 0.001 | 0.002 | 0.031 | 0.031 | 0.031 | 0.130 |
| RMSE | 0.006 | 0.002 | 0.003 | 0.031 | 0.031 | 0.031 | 0.130 |
| R² | 0.519 | 0.914 | 0.841 | 0.031 | 0.031 | 0.031 | 0.018 |
| MAD | 0.003 | 0.001 | 0.001 | 0.094 | 0.031 | 0.031 | 0.220 |
| MAPE | 120.001 | 46.600 | 60.772 | 0.063 | 0.031 | 0.031 | 0.002 |
| NMSE | 0.481 | 0.086 | 0.159 | 0.031 | 0.031 | 0.031 | 0.018 |
| RAE | 0.597 | 0.240 | 0.334 | 0.031 | 0.031 | 0.031 | 0.058 |
| RRSE | 0.647 | 0.256 | 0.354 | 0.031 | 0.031 | 0.031 | 0.018 |

Note: Entries in the test columns are p-values; values below 0.1 indicate significance at the 10% level.