Article

Parameter Estimation of MSNBurr-Based Hidden Markov Model: A Simulation Study

Department of Statistics, Institut Teknologi Sepuluh Nopember, Surabaya 60111, Indonesia
*
Author to whom correspondence should be addressed.
Symmetry 2025, 17(11), 1931; https://doi.org/10.3390/sym17111931
Submission received: 7 October 2025 / Revised: 29 October 2025 / Accepted: 6 November 2025 / Published: 11 November 2025

Abstract

The Hidden Markov Model (HMM) is a well-known probabilistic framework for representing sequential phenomena governed by doubly stochastic processes. Specifically, it features a Markov chain with hidden (unobserved) states, where each state emits observable values through a state-conditioned emission distribution at every time step. In this framework, selecting an appropriate emission distribution is essential because an unsuitable choice may prevent the HMM from accurately representing the observed phenomenon. To accommodate emission patterns whose symmetry varies from case to case, we propose an HMM framework with an adaptive emission distribution, named MSNBurr-HMM. This method is based on the MSNBurr distribution, which can effectively represent symmetric, right-skewed, and left-skewed emission patterns. We also provide its parameter estimation algorithm using the Baum–Welch algorithm. For model validation, we conduct fitting simulations across diverse scenarios and compare the findings against Gaussian-HMM and Fernández–Steel Skew Normal-HMM using the log-likelihood, the Akaike Information Criterion (AIC), the corrected AIC (AICc), and the Bayesian Information Criterion (BIC). The results demonstrate that the algorithm estimates the target parameters accurately in all tested scenarios. In terms of performance, MSNBurr-HMM generally outperforms the benchmark models across all evaluation metrics, confirming the promise of the proposed method.

1. Introduction

The Hidden Markov Model (HMM) is a probabilistic modeling framework that represents a doubly stochastic process, where one stochastic process underlies another stochastic process [1]. The underlying process in an HMM is a Markov chain characterized by a set of hidden states and transition probabilities [2]. The second stochastic process generates an observable emission variable whose value depends on the hidden state at each time point [3]. This framework has been widely implemented in various research domains [4], including speech recognition [5], econometrics [6], ecology [7], bioinformatics [8], and predictive maintenance [9]. Despite its widespread use, there are important points to note in HMM modeling. A key feature of HMMs is that observed values are emitted according to a particular probability distribution, known as the emission distribution, conditioned on the hidden state. This distribution must accurately represent the data’s properties, including whether the data are continuous or discrete, as well as characteristics such as skewness, kurtosis, or modality. An unsuitable choice leads to poor fitting results and degrades overall model performance [10]. Several studies have taken different approaches when dealing with skewness. For instance, Sowan et al. [11] address skewness and instability in traditional lift values by proposing four adjusted lift formulations, namely Smoothed, Weighted, Log, and Thresholded lift. On the other hand, Dash et al. [12] address skewed data by removing outliers. In the context of HMMs, skewed patterns can be accommodated by selecting an emission distribution that adapts to them. Notably, Gaussian distributions are frequently employed for continuous data [13], but their symmetry assumption often mismatches real-world data. Other continuous distributions, such as the Gamma, Log-Gaussian, Log-Gamma, or Weibull distribution, may seem tempting for modeling asymmetric emissions. However, these distributions cannot easily accommodate varying skew patterns (both right and left) or nearly symmetric patterns. Moreover, their restricted domains further limit their flexibility.
Based on these findings, we develop an adaptive HMM framework capable of accommodating both symmetric and asymmetric emission distributions. Specifically, we integrate the HMM with the Modified-to-be-Stable-to-Normal from Burr (MSNBurr) distribution proposed by Iriawan [14]. Several factors motivated this integration. First, the MSNBurr distribution was formulated to effectively capture symmetric, right-skewed, and left-skewed patterns. Choir et al. [15] also explain that this distribution can perfectly fit the symmetric pattern of the Gaussian distribution while exhibiting heavier and longer tails, making it more representative for data with extreme events. Another advantage of the MSNBurr distribution is the stability of its mode with respect to the location parameter. This contrasts with other distributions such as Azzalini’s Skew-Normal [16] and Skew-t [17], where changes in parameter values cause the mode to shift away from the location parameter. The stability of the mode is important because it ensures that the central position of the distribution remains consistently identifiable when other parameters change. This understanding is crucial in statistics, especially for large real-world datasets, as it helps identify where most of the data is concentrated [18]. Due to its adaptive properties, the use of the MSNBurr distribution has led to improvements in several methods, including mixture models [19], Markov switching autoregressive models [20], and the Ising model [21].
This integration yields a new framework, which we name MSNBurr-HMM. We present its parameter estimation algorithm based on an Expectation-Maximization scheme, commonly known as the Baum–Welch algorithm [22]. It is an effective and widely used method for estimating HMM parameters when the hidden states are unknown. Performance validation via simulations is conducted in various scenarios that represent different emission symmetry conditions. We compare our proposed method with the well-known Gaussian-HMM and the Fernández–Steel Skew-Normal-based HMM (FSSN-HMM) through evaluation metrics such as the log-likelihood, the Akaike Information Criterion (AIC) [23], the corrected AIC (AICc) [24], and the Bayesian Information Criterion (BIC) [25]. Besides Gaussian-HMM, FSSN-HMM was chosen because the FSSN distribution has similar characteristics to the MSNBurr distribution, particularly its flexibility in modeling both symmetric and asymmetric fat-tailed patterns [26]. In addition, the mode of the FSSN distribution is also stable at the location parameter [19]. Several studies have confirmed the superiority of the FSSN distribution over the Gaussian distribution, such as in modeling volcano height data [27], magnetic resonance imaging brain image intensity data [19], and the Scotland lip cancer dataset [28]. Based on these considerations, FSSN-HMM serves as an additional ideal benchmark for the MSNBurr-based method proposed here, alongside the symmetric Gaussian-HMM. The results will demonstrate the advantages of the MSNBurr-HMM and highlight its potential for further HMM development.

2. Methods

2.1. Hidden Markov Model

A Hidden Markov Model (HMM) is a doubly stochastic process in which the hidden process follows a Markov chain, and each hidden state generates an observation based on a specific probability distribution. Formally, an HMM consists of two sequences, namely the hidden state sequence q_{1:T} = (q_1, q_2, …, q_T) representing the underlying Markov process and the sequence of observations O_{1:T} = (O_1, O_2, …, O_T), where T denotes the sequence length. Each hidden state q_t, for t = 1, 2, …, T, takes values from a finite set {S_1, S_2, …, S_K}, where K is the number of possible hidden states. The observation O_t, for t = 1, 2, …, T, is generated from a conditional emission distribution p(O_t | q_t = S_i) and is conditionally independent, meaning that each observation depends only on its corresponding hidden state and not on other observations when the state is known.
HMMs are parameterized by a set of parameters, λ, which consists of three components [29]: the initial state distribution (π), the transition probability matrix (A), and the emission distribution parameters Θ = (θ_1, θ_2, …, θ_K) corresponding to the K hidden states. The definitions of π and A, along with their constraints, are provided in Equations (1)–(8). Regarding Θ, each θ_i defines the parameters of the emission distribution for state S_i (i = 1, 2, …, K). If the emission probability p(O_t | q_t = S_i) follows a Gaussian distribution, then θ_i consists of the mean and standard deviation parameters of that distribution. Using the Markov property of the hidden sequence and the conditional independence assumption of O_t given q_t, we obtain the complete-data likelihood p(O_{1:T}, q_{1:T} | λ) in Equation (9) [30]. Equation (9) can also be rewritten as Equation (10) by decomposing λ into its explicit components (π, A, Θ), where each parameter governs a different part of the model.
$$\pi = (\pi_1, \pi_2, \ldots, \pi_K) \tag{1}$$
$$\pi_i = p(q_1 = S_i), \quad i = 1, 2, \ldots, K \tag{2}$$
$$\pi_i \ge 0, \quad i = 1, 2, \ldots, K \tag{3}$$
$$\sum_{i=1}^{K} \pi_i = 1 \tag{4}$$
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1K} \\ a_{21} & a_{22} & \cdots & a_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ a_{K1} & a_{K2} & \cdots & a_{KK} \end{pmatrix} \tag{5}$$
$$a_{ij} = p(q_t = S_j \mid q_{t-1} = S_i), \quad i, j = 1, 2, \ldots, K, \quad t = 2, 3, \ldots, T \tag{6}$$
$$a_{ij} \ge 0, \quad i, j = 1, 2, \ldots, K \tag{7}$$
$$\sum_{j=1}^{K} a_{ij} = 1, \quad i = 1, 2, \ldots, K \tag{8}$$
$$p(O_{1:T}, q_{1:T} \mid \lambda) = p(q_1 \mid \lambda) \prod_{t=2}^{T} p(q_t \mid q_{t-1}, \lambda) \prod_{t=1}^{T} p(O_t \mid q_t, \lambda) \tag{9}$$
$$p(O_{1:T}, q_{1:T} \mid \pi, A, \Theta) = p(q_1 \mid \pi) \prod_{t=2}^{T} p(q_t \mid q_{t-1}, A) \prod_{t=1}^{T} p(O_t \mid q_t, \Theta) \tag{10}$$
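To make the decomposition in Equation (10) concrete, the following minimal R sketch shows one way to store λ = (π, A, Θ) and to evaluate the complete-data log-likelihood for a known state path. The structure and names are our own illustration (not taken from the authors’ repository), and the emission density is passed in as a generic log_emission function so that any state-conditioned distribution can be plugged in; a Gaussian emission, as mentioned above, is used in the example call.

```r
# Minimal sketch (illustrative names): evaluate the complete-data log-likelihood of
# Equation (10) for a known state path, given lambda = (pi, A, Theta).
complete_data_loglik <- function(obs, states, lambda, log_emission) {
  T_len <- length(obs)
  ll <- log(lambda$pi[states[1]])                       # log p(q_1 | pi)
  for (t in 2:T_len)                                    # + sum_t log p(q_t | q_{t-1}, A)
    ll <- ll + log(lambda$A[states[t - 1], states[t]])
  for (t in 1:T_len)                                    # + sum_t log p(O_t | q_t, Theta)
    ll <- ll + log_emission(obs[t], lambda$Theta[[states[t]]])
  ll
}

# Example with Gaussian emissions, theta_i = (mean, standard deviation)
gauss_log_emission <- function(o, theta) dnorm(o, mean = theta[1], sd = theta[2], log = TRUE)
lambda <- list(
  pi    = c(0.5, 0.5),                                        # initial state distribution
  A     = matrix(c(0.7, 0.3, 0.2, 0.8), 2, 2, byrow = TRUE),  # transition probabilities
  Theta = list(c(0, 1), c(5, 2))                              # emission parameters per state
)
complete_data_loglik(c(0.2, 4.8, 5.1), c(1, 2, 2), lambda, gauss_log_emission)
```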

2.2. MSNBurr-Based HMM

We introduce a modified HMM with an emission distribution following the MSNBurr distribution [14]. Derived from the Burr Type II distribution, the MSNBurr distribution accommodates both symmetric and asymmetric patterns while ensuring mode stability with respect to its location parameter [14]. Additionally, it can also accommodate long-tailed data [19]. We formulate an MSNBurr-based HMM, abbreviated as MSNBurr-HMM, by defining the emission probability p(O_t | q_t = S_i) to follow the MSNBurr distribution. Each state’s emission distribution is characterized by parameters θ_i = (μ_i, ϕ_i, γ_i), where μ_i, ϕ_i, and γ_i are the location, scale, and shape parameters, respectively. The complete specification of this distribution is provided in Equation (11), with −∞ < O_t < ∞, −∞ < μ_i < ∞, ϕ_i > 0, and γ_i > 0, where κ_i is a function of γ_i as shown in Equation (12).
$$p(O_t \mid q_t = S_i, \theta_i) = \frac{\kappa_i}{\phi_i} \exp\!\left(-\kappa_i \frac{O_t - \mu_i}{\phi_i}\right) \left(1 + \frac{\exp\!\left(-\kappa_i \frac{O_t - \mu_i}{\phi_i}\right)}{\gamma_i}\right)^{-(\gamma_i + 1)} \tag{11}$$
$$\kappa_i = \frac{1}{\sqrt{2\pi}} \left(1 + \frac{1}{\gamma_i}\right)^{\gamma_i + 1} \tag{12}$$
To illustrate the adaptability of this distribution, we present the probability density function (pdf) plots for several different parameter values in Figure 1. We can observe that symmetry is achieved when γ = 1 . A left-skewed pattern begins to emerge when γ < 1 , while γ > 1 is associated with right-skewness.
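For readers who wish to reproduce plots in the spirit of Figure 1, the following R sketch evaluates the density in Equations (11) and (12). The function name dmsnburr() and its interface are our own illustrative choices (existing implementations may use different conventions), and the computation is done on the log scale for numerical stability.

```r
# Sketch of the MSNBurr density in Equations (11)-(12); illustrative interface only.
dmsnburr <- function(x, mu, phi, gamma, log = FALSE) {
  kappa <- (1 / sqrt(2 * pi)) * (1 + 1 / gamma)^(gamma + 1)                 # Equation (12)
  z     <- -kappa * (x - mu) / phi
  logf  <- log(kappa) - log(phi) + z - (gamma + 1) * log1p(exp(z) / gamma)  # Equation (11)
  if (log) logf else exp(logf)
}

# Flavour of Figure 1: symmetric at gamma = 1, right-skewed for gamma > 1
curve(dmsnburr(x, mu = 0, phi = 1, gamma = 1), from = -6, to = 6, ylab = "density")
curve(dmsnburr(x, mu = 0, phi = 1, gamma = 5), add = TRUE, lty = 2)
```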

2.3. Parameter Estimation of MSNBurr-HMM

In this section, we present the parameter estimation of the MSNBurr-HMM using an EM-based approach, commonly known as the Baum–Welch algorithm. We work with Equation (10) within the EM framework: in the E-Step, we compute the expected log-likelihood of the complete data, and in the M-Step, this quantity is maximized with respect to the model parameters λ. The expected log-likelihood of the complete data is expressed as Q(λ, λ^{old}) in Equation (13), where λ^{old} is the current parameter estimate obtained from the previous iteration. To solve this, we adopt the Baum–Welch algorithm steps, which require a forward variable α_t(i), a backward variable β_t(i), and two other quantities calculated from α_t(i) and β_t(i), namely δ_t(i) and ξ_t(i, j) [22,29,31].
$$Q(\lambda, \lambda^{old}) = \mathbb{E}_{q_{1:T} \mid O_{1:T}, \lambda^{old}}\!\left[ \log p(O_{1:T}, q_{1:T} \mid \lambda) \right] \tag{13}$$
The forward variable, denoted by α_t(i), satisfies the definition in Equation (14). At t = 1, this quantity is calculated using Equation (15), where p(O_1 | q_1 = S_i, θ_i) follows the form given in Equation (11). For t = 2, 3, …, T, α_t(i) is computed recursively via Equation (16). We next define the backward variable β_t(i) in Equation (17). In contrast to α_t(i), which is computed forward from t = 1, β_t(i) is computed backward, starting from t = T using Equation (18) and then recursively for t = T − 1, T − 2, …, 1 via Equation (19).
$$\alpha_t(i) = p(O_{1:t}, q_t = S_i \mid \lambda) \tag{14}$$
$$\alpha_1(i) = \pi_i \, p(O_1 \mid q_1 = S_i, \theta_i) \tag{15}$$
$$\alpha_t(i) = \left[\sum_{j=1}^{K} \alpha_{t-1}(j) \, a_{ji}\right] p(O_t \mid q_t = S_i, \theta_i) \tag{16}$$
$$\beta_t(i) = p(O_{t+1:T} \mid q_t = S_i, \lambda) \tag{17}$$
$$\beta_T(i) = 1 \tag{18}$$
$$\beta_t(i) = \sum_{j=1}^{K} a_{ij} \, p(O_{t+1} \mid q_{t+1} = S_j, \theta_j) \, \beta_{t+1}(j) \tag{19}$$
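A direct implementation of Equations (15), (16), (18), and (19) multiplies many small probabilities and underflows for longer sequences, so the R sketch below applies the usual per-time-step scaling (an implementation detail, not part of the equations themselves). Here B is a T × K matrix of emission densities with B[t, i] = p(O_t | q_t = S_i, θ_i); the function names are our own.

```r
# Forward and backward recursions of Equations (15), (16), (18), (19), with scaling.
forward_pass <- function(pi, A, B) {
  T_len <- nrow(B); K <- ncol(B)
  alpha <- matrix(0, T_len, K)
  c_scale <- numeric(T_len)
  a <- pi * B[1, ]                                          # Equation (15)
  c_scale[1] <- sum(a); alpha[1, ] <- a / c_scale[1]
  for (t in 2:T_len) {
    a <- (alpha[t - 1, ] %*% A) * B[t, ]                    # Equation (16)
    c_scale[t] <- sum(a); alpha[t, ] <- a / c_scale[t]
  }
  list(alpha = alpha, c_scale = c_scale,
       log_lik = sum(log(c_scale)))                         # equals log sum_i alpha_T(i)
}

backward_pass <- function(A, B, c_scale) {
  T_len <- nrow(B); K <- ncol(B)
  beta <- matrix(0, T_len, K)
  beta[T_len, ] <- 1                                        # Equation (18)
  for (t in (T_len - 1):1)
    beta[t, ] <- (A %*% (B[t + 1, ] * beta[t + 1, ])) / c_scale[t + 1]  # Equation (19)
  beta
}
```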
Once α_t(i) and β_t(i) are obtained for each t = 1, 2, …, T and i = 1, 2, …, K, we compute δ_t(i) and ξ_t(i, j) using the formal definitions given in Equations (20) and (21), respectively. We refer to δ_t(i) as the expected state occupancy count and ξ_t(i, j) as the expected state transition count. Both become the basis for updating the parameters λ in the next stage.
$$\delta_t(i) = p(q_t = S_i \mid O_{1:T}, \lambda) = \frac{\alpha_t(i) \, \beta_t(i)}{\sum_{l=1}^{K} \alpha_t(l) \, \beta_t(l)} \tag{20}$$
$$\xi_t(i, j) = p(q_t = S_i, q_{t+1} = S_j \mid O_{1:T}, \lambda) = \frac{\alpha_t(i) \, a_{ij} \, p(O_{t+1} \mid q_{t+1} = S_j, \theta_j) \, \beta_{t+1}(j)}{\sum_{i=1}^{K} \sum_{j=1}^{K} \alpha_t(i) \, a_{ij} \, p(O_{t+1} \mid q_{t+1} = S_j, \theta_j) \, \beta_{t+1}(j)} \tag{21}$$
The update of λ is carried out independently for each component (π, A, and Θ). The update of π_i (i = 1, 2, …, K) is calculated using Equation (22), while that of a_{ij} (i, j = 1, 2, …, K) is obtained by Equation (23). Regarding Θ, the update is performed on each hidden state emission θ_i using Equation (24). The δ_t(i) and ξ_t(i, j) values used in all three equations are calculated using the current λ. The output of Equations (22)–(24) gives us the updated parameter estimates λ^{new} = (π^{new}, A^{new}, Θ^{new}). Since δ_t(i) and ξ_t(i, j) directly depend on these parameters, their values will consequently be updated as well.
$$\pi_i^{new} = \delta_1(i) \tag{22}$$
$$a_{ij}^{new} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \delta_t(i)} \tag{23}$$
$$\theta_i^{new} = \arg\max_{\theta_i} \sum_{t=1}^{T} \delta_t(i) \log p(O_t \mid q_t = S_i, \theta_i) \tag{24}$$
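The sketch below collects Equations (20)–(23) and solves the weighted maximum-likelihood problem in Equation (24) numerically with optim(). It relies on the dmsnburr(), forward_pass(), and backward_pass() helpers sketched earlier; the bounded optimizer and its settings are our own pragmatic choices rather than a prescription from the paper.

```r
# One update of lambda from the E-step quantities: Equations (20)-(24).
update_lambda <- function(obs, alpha, beta, A, B, Theta_old) {
  T_len <- nrow(B); K <- ncol(B)
  delta <- alpha * beta
  delta <- delta / rowSums(delta)                           # Equation (20)
  xi <- array(0, c(T_len - 1, K, K))
  for (t in 1:(T_len - 1)) {
    x <- outer(alpha[t, ], B[t + 1, ] * beta[t + 1, ]) * A  # alpha_t(i) a_ij p(O_{t+1}|S_j) beta_{t+1}(j)
    xi[t, , ] <- x / sum(x)                                 # Equation (21)
  }
  pi_new <- delta[1, ]                                      # Equation (22)
  A_new  <- apply(xi, c(2, 3), sum) /
            colSums(delta[1:(T_len - 1), , drop = FALSE])   # Equation (23)
  Theta_new <- lapply(1:K, function(i) {                    # Equation (24), solved numerically
    optim(Theta_old[[i]],
          function(par) -sum(delta[, i] *
            dmsnburr(obs, par[1], par[2], par[3], log = TRUE)),
          method = "L-BFGS-B", lower = c(-Inf, 1e-6, 1e-6))$par
  })
  list(pi = pi_new, A = A_new, Theta = Theta_new)
}
```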
The next step is to recompute the forward and backward variables using the updated parameters λ^{new}. This process is repeated until it reaches M iterations or satisfies the specified convergence criterion. Such a criterion can be defined, for example, by requiring that the changes in all parameter components become sufficiently small, as formulated in Equation (25), where m denotes the iteration index and ϵ is a convergence tolerance.
$$\max\left\{\left|\pi^{(m+1)} - \pi^{(m)}\right|,\ \left|A^{(m+1)} - A^{(m)}\right|,\ \left|\Theta^{(m+1)} - \Theta^{(m)}\right|\right\} < \epsilon \tag{25}$$
Furthermore, we refer readers to Yang et al. [32] for a detailed discussion of the properties and convergence guarantees of the Baum–Welch algorithm. We finally summarize the procedure in Algorithm 1; a compact R sketch of this loop is also given after the algorithm. Note that an initial parameter value λ^{(0)} is required to start the iteration, which can be obtained using a reasonable data-driven strategy. For instance, λ^{(0)} may be assigned values derived from simple statistics, such as the sample mean and standard deviation of the observable values.
Algorithm 1. Baum–Welch algorithm for MSNBurr-HMM.
Input: O_{1:T}; initialization λ^{(0)} = (π^{(0)}, A^{(0)}, Θ^{(0)}); convergence tolerance ϵ
Output: λ^{(m*)}
Initialization:
Set the iteration index m = 0
Repeat:
E-Step:
Compute α_t(i) and β_t(i) using Equations (15), (16), (18) and (19).
Compute δ_t(i) and ξ_t(i, j) using Equations (20) and (21).
for t = 1, 2, …, T and i, j = 1, 2, …, K
M-Step:
Update π using Equation (22) to obtain π^{(m+1)}
Update A using Equation (23) to obtain A^{(m+1)}
Update Θ using Equation (24) to obtain Θ^{(m+1)}
Convergence Check:
Evaluate the parameter change using Equation (25)
If the criterion in Equation (25) is satisfied, then stop the iteration and set m* = m + 1;
Else set m ← m + 1 and repeat the process.
End Repeat
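Putting the pieces together, a compact driver for Algorithm 1 might look as follows. It assumes the dmsnburr(), forward_pass(), backward_pass(), and update_lambda() sketches above, with λ stored as a list holding pi, A, and Theta (each Theta[[i]] being (μ_i, ϕ_i, γ_i)); this is an illustrative outline, not the authors’ released code.

```r
# Illustrative driver for Algorithm 1 (Baum-Welch for the MSNBurr-HMM).
baum_welch_msnburr <- function(obs, lambda0, eps = 1e-4, max_iter = 500) {
  lambda <- lambda0
  for (m in 1:max_iter) {
    B <- sapply(lambda$Theta, function(th)                # emission matrix B[t, i]
      dmsnburr(obs, th[1], th[2], th[3]))
    fwd  <- forward_pass(lambda$pi, lambda$A, B)          # E-step
    beta <- backward_pass(lambda$A, B, fwd$c_scale)
    lambda_new <- update_lambda(obs, fwd$alpha, beta,     # M-step
                                lambda$A, B, lambda$Theta)
    change <- max(abs(lambda_new$pi - lambda$pi),         # convergence check, Equation (25)
                  abs(lambda_new$A  - lambda$A),
                  abs(unlist(lambda_new$Theta) - unlist(lambda$Theta)))
    lambda <- lambda_new
    if (change < eps) break                               # stop when all changes < epsilon
  }
  lambda
}
```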

2.4. Simulation Design

In this study, we focus on the case of two hidden states with a fixed sequence length of T = 200. We examine three main simulation scenarios, each representing a different skewness in the emission distributions. Scenario 1 consists of two symmetric distributions. In Scenario 2, one emission distribution remains symmetric, while the other exhibits a skewed (asymmetric) behavior. Scenario 3 extends this further, with both emissions demonstrating skewness. Each of these scenarios is then divided into two sub-scenarios based on the overlap between the emission distributions. In sub-scenario (a), the two emission distributions are completely separated, whereas in sub-scenario (b), they partially overlap. As a result, we analyze a total of six distinct cases, labeled as Scen1a, Scen1b, Scen2a, Scen2b, Scen3a, and Scen3b.
The data generation process utilizes an equal initial state distribution, π = (0.5, 0.5), and a fixed transition probability matrix A with entries a_{11} = 0.7, a_{12} = 0.3, a_{21} = 0.2, and a_{22} = 0.8 for all scenarios. The emission distribution parameters vary across scenarios according to their respective criteria, with full details provided in Table 1. We also replicate each scenario 100 times to obtain a comprehensive view of the results. Finally, the R code of this simulation is publicly available at https://github.com/dbunggul/msnburrHMM-bwa (accessed on 20 June 2025) for reproducibility.
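For illustration, the following R sketch mirrors this data-generating process for one replication. The rmsnburr() sampler is our own helper, obtained by inverting the distribution function implied by Equation (11) (inverse-transform sampling); it is not necessarily the generator used in the linked repository.

```r
# One replication of the simulation design: a 2-state Markov chain with the stated
# pi and A, and MSNBurr emissions drawn by inverse-transform sampling.
rmsnburr <- function(n, mu, phi, gamma) {
  kappa <- (1 / sqrt(2 * pi)) * (1 + 1 / gamma)^(gamma + 1)   # Equation (12)
  u <- runif(n)
  mu - (phi / kappa) * log(gamma * (u^(-1 / gamma) - 1))      # inverse CDF (our derivation)
}

simulate_hmm <- function(T_len, pi, A, Theta) {
  states <- integer(T_len); obs <- numeric(T_len)
  states[1] <- sample(seq_along(pi), 1, prob = pi)
  for (t in 2:T_len)
    states[t] <- sample(seq_along(pi), 1, prob = A[states[t - 1], ])
  for (t in 1:T_len) {
    th <- Theta[[states[t]]]
    obs[t] <- rmsnburr(1, th[1], th[2], th[3])
  }
  list(states = states, obs = obs)
}

# Scen1a-style setting: two symmetric, well-separated emissions, T = 200
set.seed(1)
dat <- simulate_hmm(200, pi = c(0.5, 0.5),
                    A = matrix(c(0.7, 0.3, 0.2, 0.8), 2, 2, byrow = TRUE),
                    Theta = list(c(2, 1, 1), c(10, 1, 1)))
```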

2.5. Evaluation Criteria

The evaluation procedure will be conducted by analyzing the log-likelihood and three information criteria, namely Akaike’s Information Criterion (AIC) [23], its bias-corrected version (AICc) [24], and Bayesian Information Criterion (BIC) [25]. The log-likelihood, denoted as log p O 1 : T | λ , can be efficiently computed using the forward variable at the final time step within the Baum–Welch algorithm. The formula of log-likelihood is given by Equation (26) below. Furthermore, we also consider AIC, AICc, and BIC as they account for the number of parameters ( d ). This process of penalizing complexity offers a clearer perspective when comparing models with different numbers of parameters. AIC is defined by Equation (27), while AICc is given in Equation (28), where n represents the number of observations. Lastly, BIC is written in Equation (29). Models with higher log-likelihood are preferred [33]. As for AIC, AICc, and BIC, we prefer models with lower values.
$$\log p(O_{1:T} \mid \lambda) = \log \sum_{i=1}^{K} \alpha_T(i) \tag{26}$$
$$AIC = -2 \log p(O_{1:T} \mid \lambda) + 2d \tag{27}$$
$$AICc = AIC + \frac{2d(d+1)}{n - d - 1} \tag{28}$$
$$BIC = -2 \log p(O_{1:T} \mid \lambda) + d \ln n \tag{29}$$
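As a small worked example, the metrics in Equations (26)–(29) can be computed directly from the log-likelihood returned by the forward pass. The parameter count d below follows the usual free-parameter count for a K-state HMM with three emission parameters per state, (K − 1) + K(K − 1) + 3K, which gives d = 9 for the two-state MSNBurr-HMM considered here; the helper name is our own.

```r
# Evaluation metrics of Equations (26)-(29), given the log-likelihood from forward_pass().
hmm_criteria <- function(log_lik, K, n, n_emis_par = 3) {
  d    <- (K - 1) + K * (K - 1) + n_emis_par * K   # free parameters in pi, A, and emissions
  aic  <- -2 * log_lik + 2 * d                     # Equation (27)
  aicc <- aic + (2 * d * (d + 1)) / (n - d - 1)    # Equation (28)
  bic  <- -2 * log_lik + d * log(n)                # Equation (29)
  c(logLik = log_lik, AIC = aic, AICc = aicc, BIC = bic)
}

# Example: K = 2 states, n = T = 200 observations
# (compare with the Scen1a MSNBurr-HMM median row of Table 4)
hmm_criteria(log_lik = -416.93, K = 2, n = 200)
```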

3. Results

3.1. Simulation Results

We begin by describing the data generation results from 100 replications per scenario. Their suitability to the desired conditions can be observed through the median skewness values in Table 1. We can observe from Table 1 that emissions generated from the MSNBurr distribution with γ = 1 had medians of empirical skewness close to zero. In contrast, the emission distributions generated with γ ≠ 1 successfully simulated asymmetric conditions, with the median skewness deviating from zero. As an illustration, Figure 2 visualizes the data generation output and the corresponding emission histogram for one of the replications in Scen3b.
In general, the parameter estimation procedure constructed based on Algorithm 1 was executed well. The convergence speed in each simulation varied per scenario. To summarize this, we provide Table 2, which contains the number of replications that converged within a certain iteration range. The model was deemed converged when the difference between each parameter’s value in consecutive iterations was smaller than a set threshold, which was chosen to be ϵ = 0.0001 for this study. The first key insight is that sub-scenario (b) required more iterations to achieve convergence than sub-scenario (a). This pattern holds when comparing Scen1a-Scen1b, Scen2a-Scen2b, and Scen3a-Scen3b. The majority of replications in sub-scenario (a) converged before the eleventh iteration. In contrast, sub-scenario (b) took longer to converge (11–50 iterations or even more). This result was expected, given that sub-scenario (a) emissions were conditioned to be fully separated, whereas sub-scenario (b) involved two overlapping emissions and thus required more time to converge.
Our next finding indicates that Scenario 1 (two symmetric emissions) converged more slowly than Scenario 2 (one skewed emission and one symmetric) and Scenario 3 (two skewed emissions). Specifically, up to 81% of replications in Scen1a converged before the eleventh iteration. This percentage was lower than those observed in Scen2a (95%) and Scen3a (100%). Similar patterns were evident when comparing Scen1b, Scen2b, and Scen3b. In Scen1b, only 5% of the replications converged in fewer than 26 iterations. This was markedly different from Scen2b and Scen3b, where the corresponding percentages were 35% and 54%, respectively.
We now turn to the results of parameter estimation and their closeness to the target parameters in each scenario. To summarize the results, we present Table 3, which shows the measures of central tendency for the estimated parameter values. For the parameters in π and A , we computed the mean of the 100 replication estimates as the measure of central tendency, which is shown in the ‘Est.’ column of Table 3. On the other hand, the median was preferred for the emission parameters, given the potential deviation of the mean due to extreme values. This issue is less likely to occur in π and A because their values are restricted to the range [0, 1]. In general, the parameter estimation performed well, as the central tendency estimates closely approximated the true values. This demonstrated the model’s reliable estimation performance.
We next compare the performance of MSNBurr-HMM against Gaussian-HMM and FSSN-HMM as the benchmark models. The model fitting results across all scenarios are presented in Figure 3, Figure 4, Figure 5 and Figure 6. Figure 3 illustrates the violin plots of the log-likelihood values, while Figure 4, Figure 5 and Figure 6 correspond to the AIC, AICc, and BIC values, respectively. These visualizations reveal key information about model performance.
Firstly, the spread of the log-likelihood, AIC, AICc, and BIC values for the three models appeared fairly wide across all scenarios. However, MSNBurr-HMM typically displayed a narrower spread than the Gaussian-HMM and FSSN-HMM, particularly in Scen1b, Scen2a, and Scen3a. This was further confirmed by the Interquartile Range (IQR) values presented in Table 4. We can see that the IQRs for the log-likelihood, AIC, AICc, and BIC values of MSNBurr-HMM were all lower than those for Gaussian-HMM. A similar pattern held when comparing against FSSN-HMM, except in Scen2b. At this point, it can be said that MSNBurr-HMM exhibited more consistent performance overall.
Secondly, across all six scenarios, the log-likelihood values for the MSNBurr-HMM were also visually higher than those for the Gaussian model, indicating that MSNBurr-HMM generally provided a better fit to the data. When switching to the FSSN-HMM, we observe that in certain scenarios, such as Scen1b, Scen2b, and Scen3b, the log-likelihood violin plot of the MSNBurr-HMM lies visibly higher and has a narrower spread, indicating better performance.
This was consistent for the AIC, AICc, and BIC metrics, where, visually, MSNBurr-HMM had lower values than Gaussian-HMM and FSSN-HMM, especially in scenarios Scen2a, Scen2b, Scen3a, and Scen3b, which involve asymmetric emission conditions. This was supported by the median values of the four metrics in Table 4. The median log-likelihood of MSNBurr-HMM outperformed Gaussian-HMM and FSSN-HMM in all scenarios. For AIC, AICc, and BIC, the superiority of MSNBurr-HMM was also evident in five of the six scenarios, as highlighted in blue in Table 4.
However, it should be noted that the first and second points do not account for pairwise comparisons between models and instead only provide an overview of their performance across 100 replications. As a result, the superiority of MSNBurr-HMM still requires further investigation. To address this issue, for each scenario, we conducted four separate one-sided Wilcoxon signed-rank tests (one for each evaluation metric) on paired replications. More precisely, we compared MSNBurr-HMM against Gaussian-HMM and MSNBurr-HMM against FSSN-HMM under each metric, resulting in two pairwise tests per metric for each scenario. We then applied the Bonferroni correction to mitigate the increased risk of Type I errors that arises when multiple hypothesis tests are performed [34].
In the case of the log-likelihood, the null hypothesis stated that the median difference between the log-likelihoods of MSNBurr-HMM and a comparator model was less than or equal to zero (i.e., the log-likelihood of MSNBurr-HMM was not greater than that of the comparator model). The alternative hypothesis, therefore, was that the median of the difference was greater than zero. A significance level of 0.05 was adopted and subsequently adjusted using the Bonferroni correction. If the p-value was below the Bonferroni-adjusted significance level of 0.025 (reflecting two pairwise comparisons: MSNBurr-HMM versus Gaussian-HMM and MSNBurr-HMM versus FSSN-HMM), then the null hypothesis was rejected. This provided statistically significant evidence that the log-likelihood values of MSNBurr-HMM were greater than those of both Gaussian-HMM and FSSN-HMM. On the other hand, we reversed the hypotheses for the AIC, AICc, and BIC cases. For these three metrics, we defined the null hypothesis as the median of the difference between the AIC (or AICc or BIC) values of MSNBurr-HMM and a comparator model being zero or greater. Consequently, the alternative hypothesis was that the median of the difference was less than zero. This approach was reasonable because the nature of AIC, AICc, and BIC (where smaller values indicate better performance) is opposite to that of the log-likelihood.
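A single pairwise comparison of the kind described above can be carried out in R with wilcox.test(); the wrapper below is our own illustration, taking the per-replication metric values of MSNBurr-HMM and one comparator and applying the Bonferroni-adjusted threshold for the two comparisons per metric.

```r
# One-sided paired Wilcoxon signed-rank test with a Bonferroni-adjusted alpha.
compare_models <- function(metric_msnburr, metric_comparator,
                           alternative = c("greater", "less"), n_tests = 2) {
  alternative <- match.arg(alternative)
  test <- wilcox.test(metric_msnburr, metric_comparator, paired = TRUE,
                      alternative = alternative, conf.int = TRUE, conf.level = 0.95)
  list(statistic = unname(test$statistic), p_value = test$p.value,
       reject_h0 = test$p.value < 0.05 / n_tests)   # Bonferroni-adjusted level 0.025
}

# Usage (vectors of length 100, one value per replication):
#   compare_models(loglik_msnburr, loglik_gaussian, "greater")  # log-likelihood
#   compare_models(aic_msnburr, aic_gaussian, "less")           # AIC, AICc, or BIC
```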
The results reveal that, for the MSNBurr-HMM versus FSSN-HMM comparison, the null hypothesis was rejected in all scenarios and across all evaluation metrics, confirming that MSNBurr-HMM consistently delivered significantly better performance. For the MSNBurr-HMM versus Gaussian-HMM comparison, the null hypothesis was not rejected in three specific cases: AICc under Scen1b (V = 2071, p-value = 0.059, one-sided 95% CI bounded above at 0.050), BIC under Scen1a (V = 4201, p-value = 1.000, one-sided 95% CI bounded above at 5.318), and BIC under Scen1b (V = 4318, p-value = 1.000, one-sided 95% CI bounded above at 6.283). In these scenarios, the one-sided confidence intervals crossed zero and the p-values exceeded the Bonferroni-adjusted significance level of 0.025, indicating insufficient statistical evidence that MSNBurr-HMM provided a lower AICc or BIC than Gaussian-HMM. In all other scenarios, however, the null hypothesis was decisively rejected. These outcomes are reasonable from a model selection perspective. In Scen1a and Scen1b, both emission distributions were intentionally constructed to be symmetric, meaning that Gaussian-HMM, which assumes symmetric emission behavior, is already compatible with the data-generating process. In contrast, MSNBurr-HMM is more flexible and contains additional parameters to accommodate skewness and heavy-tailed characteristics. Since AICc and BIC explicitly penalize model complexity, this additional flexibility does not translate into a meaningful advantage when the true emission structure is already symmetric. However, in all other scenarios, where at least one emission distribution exhibits asymmetry, the superiority of MSNBurr-HMM becomes evident. This confirms that when the data deviates from symmetry, MSNBurr-HMM is clearly preferable over Gaussian-HMM due to its enhanced ability to capture skewed behavior.
If we shift the analysis to count the number of replications where MSNBurr-HMM outperformed Gaussian-HMM and FSSN-HMM, the results remain largely consistent. Figure 7, Figure 8, Figure 9 and Figure 10 display the number of replications won by each model based on the log-likelihood, AIC, AICc, and BIC, respectively. Key information can be drawn from these figures. For instance, the top-left panel of Figure 7 exhibits a bar chart showing values of 78 for MSNBurr-HMM, 22 for FSSN-HMM, and 0 for Gaussian-HMM. This indicates that in Scen1a, MSNBurr-HMM achieved the highest log-likelihood value in 78 out of 100 replications, while FSSN-HMM claimed the best performance in 22 replications. In short, MSNBurr-HMM showed better log-likelihood-based performance in most replications. Figure 7 further reveals that the proportion of MSNBurr-HMM’s superior performance was consistent in Scenarios 2 and 3. Moreover, as asymmetric emission behavior emerges in Scenarios 2 and 3, Gaussian-HMM becomes increasingly less competitive.
We observed a trend in Figure 8 where the dominance of MSNBurr-HMM becomes increasingly pronounced as the emission structure shifts from symmetric to asymmetric. In Scen1a and Scen1b (both emissions symmetric), MSNBurr-HMM achieved the lowest AIC value in 65 and 45 replications, respectively. These numbers increased to 72 and 64 replications in Scen2a and Scen2b (one asymmetric emission). The trend continued with 78 and 74 replications in Scen3a and Scen3b (both emissions asymmetric). In contrast, the number of replications won by Gaussian-HMM consistently decreased as the emission setting transitioned from symmetric to asymmetric. A very similar behavior is observed for AICc in Figure 9. Gaussian-HMM remained relatively competitive under fully symmetric emission conditions, especially in the overlapping case (Scen1b). However, MSNBurr-HMM dominated the remaining scenarios. A comparable pattern is also found for BIC. In Figure 10, Gaussian-HMM exhibited stronger performance in Scen1a and Scen1b. The situation began to reverse in Scen2a and Scen2b, where MSNBurr-HMM started to outperform the competing models more frequently. Finally, in Scen3a and Scen3b, which involve fully asymmetric emissions, more than 70% of the replications indicated that MSNBurr-HMM attained the lowest BIC.

3.2. Real-Data Example

This section demonstrates the application of the MSNBurr-HMM in modeling the daily closing price movements of the Jakarta Composite Index (JCI) during 2023–2024. This case was selected based on the distributional characteristics of the emissions, as visualized in Figure 11. First, the histogram clearly exhibits a consistent bimodal pattern, indicating the presence of two distinct components that are naturally modeled as two emissions in an HMM framework. Second, each mode displays a non-symmetric (skewed) shape, suggesting that the symmetric Gaussian assumption may be inadequate. The data will be modeled using the MSNBurr-HMM, and the results will be compared with the performance of the FSSN-HMM and the Gaussian-HMM.
The initial values were set to π^{(0)} = (0.4, 0.6) and a transition matrix A^{(0)} with diagonal entries of 0.6 and off-diagonal entries of 0.4. We chose a slightly larger initial-state probability for the second emission (π_2^{(0)} = 0.6) to reflect its visually wider spread around the higher-valued mode, as inferred from the histogram. The diagonal entries of A^{(0)} were set to 0.6 to encode the expectation that remaining in the same state is more likely than switching. These values were kept close to 0.5 to maintain a conservative, minimally biased initialization. For the emission parameter initialization, we used the sample mean and standard deviation as references for the location and scale. Specifically, μ_1^{(0)} = Ō − 0.5 sd(O), μ_2^{(0)} = Ō + 0.5 sd(O), ϕ_1^{(0)} = ϕ_2^{(0)} = sd(O), and γ_1^{(0)} = γ_2^{(0)} = 0.1, where Ō and sd(O) denote the mean and standard deviation of the observable data. The convergence criterion was set to ϵ = 0.0001. With this setup, the MSNBurr-HMM reached convergence at the ninth iteration, and the resulting parameter estimates are given in Table 5. Figure 12 is also provided to illustrate the trajectory of the parameter estimates up to the point of convergence.
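The initialization just described can be written as a small helper; jci below stands for the vector of daily closing prices (assumed to have been loaded separately), and the function name is our own.

```r
# Data-driven initialization for the JCI example, following the choices described above.
init_lambda <- function(obs) {
  m <- mean(obs); s <- sd(obs)
  list(pi    = c(0.4, 0.6),
       A     = matrix(c(0.6, 0.4, 0.4, 0.6), 2, 2, byrow = TRUE),
       Theta = list(c(m - 0.5 * s, s, 0.1),      # (mu, phi, gamma) for state 1
                    c(m + 0.5 * s, s, 0.1)))     # (mu, phi, gamma) for state 2
}
# fit <- baum_welch_msnburr(jci, init_lambda(jci), eps = 1e-4)
```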
The estimated parameters indicate a clear tendency for the process to start in state 1. Furthermore, the estimated transition probabilities highlight that the process tends to stay in the same state rather than switch to another. Regarding the emission distributions, the first emission exhibits its mode at 6879.568, whereas the second emission attains its mode at 7229.954. The second emission has a wider spread than the first, as reflected by the larger ϕ ^ 2 relative to ϕ ^ 1 . In addition, the two emission distributions differ in their skewness characteristics. The first emission shows left skewness ( γ ^ 1 < 1 ), whereas the second emission displays right skewness ( γ ^ 2 > 1 ). We can observe how the estimated emission distributions align with the data pattern in Figure 13.
The figure confirms that the overall data distribution is well captured by the emission from MSNBurr-HMM. As a further analysis, we present the model performance measures including Log-Likelihood, AIC, AICc, and BIC for the MSNBurr-HMM, FSSN-HMM, and Gaussian-HMM in Table 6. It can be seen that the MSNBurr-HMM consistently outperforms both benchmark models across all evaluated metrics. It achieves the highest log-likelihood of −3060.60, outperforming FSSN-HMM (−3066.02) and Gaussian-HMM (−3083.43). This improvement is also reflected in the lowest AIC value of 6139.19, compared to 6150.04 for FSSN-HMM and 6180.86 for Gaussian-HMM. In terms of AICc, MSNBurr-HMM similarly achieves the smallest value of 6139.57, whereas FSSN-HMM and Gaussian-HMM yield 6150.42 and 6181.10, respectively. The lowest BIC value of 6176.72 further validates the advantage of MSNBurr-HMM in balancing model fit and complexity.

4. Discussion

Several points can be discussed from the findings presented in the previous section. MSNBurr-HMM most clearly outperforms competing models when the underlying emission distributions exhibit asymmetric behavior. In Scenarios 2 and 3, where at least one emission distribution is skewed, MSNBurr-HMM consistently achieves the best performance across log-likelihood, AIC, AICc, and BIC. This demonstrates that the model is particularly effective when the data deviate from Gaussian assumptions, especially in the presence of skewness.
In terms of computational considerations, MSNBurr-HMM indeed involves more parameters than Gaussian-HMM. However, the superiority of MSNBurr-HMM under AIC, AICc, and BIC, all of which incorporate explicit penalties for model complexity, confirms that its improved performance is not simply a consequence of overfitting. When compared to FSSN-HMM, which has the same number of emission parameters, MSNBurr-HMM consistently delivers better results, highlighting its modeling capability. This advantage is further supported in the real-data application, where MSNBurr-HMM not only maintains superior performance but also offers clear interpretability. In particular, the estimated emission parameters provide meaningful insights regarding the mode position, dispersion, and skewness direction of the observed process.
Despite the promising results, this study has several limitations that also present opportunities for future research. First, the simulation design and estimation algorithm are currently restricted to a predetermined number of hidden states. Further exploration is needed on how the MSNBurr-HMM algorithm can be designed to automatically identify the optimal number of hidden states, which would reduce potential bias and enhance the objectivity of the results. Second, this article focuses solely on parameter estimation and has not yet been extended to decode hidden states for each observation. Developing a decoding algorithm and evaluating the accuracy of the MSNBurr-HMM in performing this task would be a valuable direction for future research.

5. Conclusions

We can draw several conclusions from this study. First, MSNBurr-HMM can be effectively estimated using the Baum–Welch Algorithm. Second, the model shows its strongest advantage when the underlying emission distribution is asymmetric. Together, these findings highlight MSNBurr-HMM’s adaptability and effectiveness, confirming its practical potential across diverse modeling contexts.

Author Contributions

Conceptualization, D.B.U.; methodology, D.B.U., N.I. and I.I.; software, D.B.U.; validation, D.B.U., N.I. and I.I.; formal analysis, D.B.U. and N.I.; investigation, D.B.U. and N.I.; resources, D.B.U. and N.I.; data curation, D.B.U.; writing—original draft preparation, D.B.U.; writing—review and editing, N.I. and I.I.; visualization, D.B.U.; supervision, N.I. and I.I.; project administration, D.B.U. and N.I.; funding acquisition, D.B.U. and N.I. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Lembaga Pengelola Dana Pendidikan (LPDP) through a Doctoral Scholarship (Award Letter No. LOG-19050/LPDP.3/2024).

Data Availability Statement

The data used in the simulation study were generated through simulation procedures described in the paper. All codes required to reproduce the results and regenerate the data are openly available at https://github.com/dbunggul/msnburrHMM-bwa (accessed on 20 June 2025). The daily closing price data for the JCI is available at https://id.investing.com/indices/idx-composite-historical-data (accessed on 15 August 2025).

Acknowledgments

The authors gratefully acknowledge the financial support provided by the Lembaga Pengelola Dana Pendidikan (LPDP). We also thank Department of Statistics Institut Teknologi Sepuluh Nopember for its support throughout this research.

Conflicts of Interest

The authors declare no conflicts of interest.

Nomenclature

The following notations are used in this manuscript:
T: Total number of observation periods
K: Number of hidden states in the MSNBurr-HMM
O_t; O_{1:T}: Observable value at time t, for t = 1, 2, …, T; full observation sequence (O_1, O_2, …, O_T)
q_t; q_{1:T}: Hidden state at time t, for t = 1, 2, …, T; full hidden state sequence (q_1, q_2, …, q_T)
λ: Full parameter set of the MSNBurr-HMM
π: Initial state probability
A: Transition probability matrix
Θ: Emission parameter set
θ_i: Emission parameter set for the i-th hidden state, θ_i = (μ_i, ϕ_i, γ_i), for i = 1, 2, …, K
μ_i: Location parameter of the emission distribution in the i-th hidden state
ϕ_i: Scale parameter of the emission distribution in the i-th hidden state
γ_i: Shape parameter of the emission distribution in the i-th hidden state
α_t(i): Forward variable, for t = 1, 2, …, T and i = 1, 2, …, K
β_t(i): Backward variable, for t = 1, 2, …, T and i = 1, 2, …, K
δ_t(i): Expected state occupancy, for t = 1, 2, …, T and i = 1, 2, …, K
ξ_t(i, j): Expected state transition, for t = 1, 2, …, T and i, j = 1, 2, …, K
m: Iteration index of the Baum–Welch algorithm (BWA)
ϵ: Convergence tolerance

References

  1. Yu, S.-Z. Hidden Semi-Markov Models. Artif. Intell. 2010, 174, 215–243. [Google Scholar] [CrossRef]
  2. Kouemou, G.L. History and Theoretical Basics of Hidden Markov Models. In Hidden Markov Models, Theory and Applications; Dymarski, P., Ed.; InTech: Houston, TX, USA, 2011; ISBN 978-953-307-208-1. [Google Scholar]
  3. Bouarada, O.; Azam, M.; Amayri, M.; Bouguila, N. Hidden Markov Models with Multivariate Bounded Asymmetric Student’s t-Mixture Model Emissions. Pattern Anal. Appl. 2024, 27, 117. [Google Scholar] [CrossRef]
  4. Mor, B.; Garhwal, S.; Kumar, A. A Systematic Review of Hidden Markov Models and Their Applications. Arch. Comput. Methods Eng. 2021, 28, 1429–1448. [Google Scholar] [CrossRef]
  5. Srivastava, R.K.; Shree, R.; Shukla, A.K.; Pandey, R.P.; Shukla, V.; Pandey, D. A Feature Based Classification and Analysis of Hidden Markov Model in Speech Recognition. In Cyber Intelligence and Information Retrieval; Tavares, J.M.R.S., Dutta, P., Dutta, S., Samanta, D., Eds.; Lecture Notes in Networks and Systems; Springer: Singapore, 2022; Volume 291, pp. 365–379. ISBN 978-981-16-4283-8. [Google Scholar]
  6. Bhar, R.; Hamori, S. Hidden Markov Models: Applications to Financial Economics; Advanced Studies in Theoretical and Applied Econometrics; Kluwer Academic Publishers: Boston, MA, USA, 2004; Volume 40, ISBN 978-1-4020-7899-6. [Google Scholar]
  7. Glennie, R.; Adam, T.; Leos-Barajas, V.; Michelot, T.; Photopoulou, T.; McClintock, B.T. Hidden Markov Models: Pitfalls and Opportunities in Ecology. Methods Ecol. Evol. 2023, 14, 43–56. [Google Scholar] [CrossRef]
  8. Yoon, B.-J. Hidden Markov Models and Their Applications in Biological Sequence Analysis. Curr. Genom. 2009, 10, 402–415. [Google Scholar] [CrossRef]
  9. Ramezani, S.B.; Killen, B.; Cummins, L.; Rahimi, S.; Amirlatifi, A.; Seale, M. A Survey of HMM-Based Algorithms in Machinery Fault Prediction. In Proceedings of the 2021 IEEE Symposium Series on Computational Intelligence (SSCI), Orlando, FL, USA, 5–7 December 2021; IEEE: Orlando, FL, USA, 2021; pp. 1–9. [Google Scholar]
  10. Volant, S.; Bérard, C.; Martin-Magniette, M.-L.; Robin, S. Hidden Markov Models with Mixtures as Emission Distributions. Stat. Comput. 2014, 24, 493–504. [Google Scholar] [CrossRef]
  11. Sowan, B.; Zhang, L.; Matar, N.; Zraqou, J.; Omar, F.; Alnatsheh, A. A Novel Lift Adjustment Methodology for Improving Association Rule Interpretation. Decis. Anal. J. 2025, 15, 100582. [Google Scholar] [CrossRef]
  12. Dash, C.S.K.; Behera, A.K.; Dehuri, S.; Ghosh, A. An Outliers Detection and Elimination Framework in Classification Task of Data Mining. Decis. Anal. J. 2023, 6, 100164. [Google Scholar] [CrossRef]
  13. Murphy, K.P. Machine Learning: A Probabilistic Perspective; Adaptive Computation and Machine Learning Series; MIT Press: Cambridge, MA, USA, 2012; ISBN 978-0-262-01802-9. [Google Scholar]
  14. Iriawan, N. Computationally Intensive Approaches to Inference in Neo-Normal Linear Models. Ph.D. Thesis, Curtin University of Technology, Perth, Australia, 2000. [Google Scholar]
  15. Choir, A.S.; Iriawan, N.; Ulama, B.S.S.; Dokhi, M. MSEPBurr Distribution: Properties and Parameter Estimation. Pak. J. Stat. Oper. Res. 2019, 15, 179–193. [Google Scholar] [CrossRef]
  16. Azzalini, A. A Class of Distributions Which Includes the Normal Ones. Scand. J. Stat. 1985, 12, 171–178. [Google Scholar]
  17. Azzalini, A.; Capitanio, A. Distributions Generated by Perturbation of Symmetry with Emphasis on a Multivariate Skew T-Distribution. J. R. Stat. Soc. Ser. B Stat. Methodol. 2003, 65, 367–389. [Google Scholar] [CrossRef]
  18. Alruwaili, B. The Modality of Skew T-Distribution. Stat. Pap. 2023, 64, 497–507. [Google Scholar] [CrossRef]
  19. Pravitasari, A.A.; Iriawan, N.; Fithriasari, K.; Purnami, S.W.; Irhamah; Ferriastuti, W. A Bayesian Neo-Normal Mixture Model (Nenomimo) for MRI-Based Brain Tumor Segmentation. Appl. Sci. 2020, 10, 4892. [Google Scholar] [CrossRef]
  20. Rasyid, D.A.; Iriawan, N.; Mashuri, M. On the Flexible Neo-Normal MSAR MSN-Burr Control Chart in Air Quality Monitoring. Air Soil Water Res. 2024, 17, 11786221241272391. [Google Scholar] [CrossRef]
  21. Nuraini, U.S.; Iriawan, N.; Fithriasari, K.; Hidayat, T. Ising Neo-Normal Model (INNM) for Segmentation of Cardiac Ultrasound Imaging. In Proceedings of the 2024 Beyond Technology Summit on Informatics International Conference (BTS-I2C), Jember, Indonesia, 19 December 2024; IEEE: Jember, Indonesia, 2024; pp. 298–303. [Google Scholar]
  22. Welch, L.R. Hidden Markov Models and the Baum-Welch Algorithm. IEEE Inf. Theory Soc. Newsl. 2003, 53, 10–13. [Google Scholar]
  23. Akaike, H. A New Look at the Statistical Model Identification. IEEE Trans. Autom. Control 1974, 19, 716–723. [Google Scholar] [CrossRef]
  24. Hurvich, C.M.; Tsai, C.-L. Regression and Time Series Model Selection in Small Samples. Biometrika 1989, 76, 297–307. [Google Scholar] [CrossRef]
  25. Schwarz, G. Estimating the Dimension of a Model. Ann. Stat. 1978, 6, 461–464. [Google Scholar] [CrossRef]
  26. Fernández, C.; Steel, M.F.J. On Bayesian Modeling of Fat Tails and Skewness. J. Am. Stat. Assoc. 1998, 93, 359–371. [Google Scholar] [CrossRef]
  27. Castillo, N.O.; Gómez, H.W.; Leiva, V.; Sanhueza, A. On the Fernández–Steel Distribution: Inference and Application. Comput. Stat. Data Anal. 2011, 55, 2951–2961. [Google Scholar] [CrossRef]
  28. Rantini, D.; Iriawan, N.; Irhamah. Fernandez–Steel Skew Normal Conditional Autoregressive (FSSN CAR) Model in Stan for Spatial Data. Symmetry 2021, 13, 545. [Google Scholar] [CrossRef]
  29. Rabiner, L.R. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proc. IEEE 1989, 77, 257–286. [Google Scholar] [CrossRef]
  30. Chen, Z.; Li, Y.; Xia, T.; Pan, E. Hidden Markov Model with Auto-Correlated Observations for Remaining Useful Life Prediction and Optimal Maintenance Policy. Reliab. Eng. Syst. Saf. 2019, 184, 123–136. [Google Scholar] [CrossRef]
  31. Baum, L.E.; Petrie, T.; Soules, G.; Weiss, N. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains. Ann. Math. Stat. 1970, 41, 164–171. [Google Scholar] [CrossRef]
  32. Yang, F.; Balakrishnan, S.; Wainwright, M.J. Statistical and Computational Guarantees for the Baum-Welch Algorithm. J. Mach. Learn. Res. 2017, 18, 1–53. [Google Scholar]
  33. Nirmal, A.; Jayaswal, D.; Kachare, P.H. A Hybrid Bald Eagle-Crow Search Algorithm for Gaussian Mixture Model Optimisation in the Speaker Verification Framework. Decis. Anal. J. 2024, 10, 100385. [Google Scholar] [CrossRef]
  34. Armstrong, R.A. When to Use the Bonferroni Correction. Ophthalmic Physiol. Opt. 2014, 34, 502–508. [Google Scholar] [CrossRef]
Figure 1. Density of the MSNBurr distribution for μ = 0 , ϕ = 1 , and various values of γ : (a) γ = 1 ; (b) γ = 0.5 ; (c) γ = 2 ; (d) γ = 0.2 ; (e) γ = 5 ; (f) γ = 0.1 ; (g) γ = 10 .
Figure 2. Illustration of generated data and emission histogram in the Scen3b.
Figure 3. Violin plots of log-likelihood values from 100 replications per scenario.
Figure 4. Violin plots of AIC values from 100 replications per scenario.
Figure 5. Violin plots of AICc values from 100 replications per scenario.
Figure 6. Violin plots of BIC values from 100 replications per scenario.
Figure 7. The number of replications where each HMM outperforms the others based on the log-likelihood.
Figure 8. The number of replications where each HMM outperforms the others based on the AIC.
Figure 9. The number of replications where each HMM outperforms the others based on the AICc.
Figure 10. The number of replications where each HMM outperforms the others based on the BIC.
Figure 11. Observable movement of daily JCI closing prices from 2023 to 2024 along with its histogram.
Figure 12. Trajectory of the parameter estimates up to the point of convergence indicated by the red dotted line.
Figure 13. Histogram of the Daily JCI closing price with two estimated emission density functions.
Table 1. Characteristics summary of generated data for all replications per scenario.
Scenario | Target Emission Distribution (State 1) | Target Emission Distribution (State 2) | Median of Empirical Skewness (State 1) | Median of Empirical Skewness (State 2)
Scen1a | MSNBurr(2, 1, 1) | MSNBurr(10, 1, 1) | −0.0159 | 0.0584
Scen1b | MSNBurr(4, 1, 1) | MSNBurr(7, 1, 1) | −0.0110 | −0.0834
Scen2a | MSNBurr(2, 1, 1) | MSNBurr(10, 1, 5) | −0.0466 | 0.8220
Scen2b | MSNBurr(4, 1, 1) | MSNBurr(7, 1, 5) | −0.0929 | 0.8865
Scen3a | MSNBurr(2, 1, 0.5) | MSNBurr(10, 1, 5) | −0.6443 | 0.8058
Scen3b | MSNBurr(4, 1, 0.5) | MSNBurr(7, 1, 5) | −0.7681 | 0.8264
Table 2. Number of replications converging within specific iteration ranges for each scenario.
Iterations to Convergence | Scen1a | Scen1b | Scen2a | Scen2b | Scen3a | Scen3b
Below 11 Iterations | 81 | 0 | 95 | 1 | 100 | 0
11–25 Iterations | 19 | 5 | 4 | 34 | 0 | 54
26–50 Iterations | 0 | 64 | 1 | 48 | 0 | 41
Above 50 Iterations | 0 | 31 | 0 | 17 | 0 | 5
Table 3. Measure of central tendency for parameter estimate values.
Parameter | Scen1a (Target / Est.) | Scen1b (Target / Est.) | Scen2a (Target / Est.) | Scen2b (Target / Est.) | Scen3a (Target / Est.) | Scen3b (Target / Est.)
π1 | 0.50 / 0.47 | 0.50 / 0.53 | 0.50 / 0.47 | 0.50 / 0.50 | 0.50 / 0.47 | 0.50 / 0.52
π2 | 0.50 / 0.53 | 0.50 / 0.47 | 0.50 / 0.53 | 0.50 / 0.50 | 0.50 / 0.53 | 0.50 / 0.48
a11 | 0.70 / 0.68 | 0.70 / 0.71 | 0.70 / 0.68 | 0.70 / 0.70 | 0.70 / 0.69 | 0.70 / 0.71
a12 | 0.30 / 0.32 | 0.30 / 0.29 | 0.30 / 0.32 | 0.30 / 0.30 | 0.30 / 0.31 | 0.30 / 0.29
a21 | 0.20 / 0.21 | 0.20 / 0.21 | 0.20 / 0.21 | 0.20 / 0.20 | 0.20 / 0.21 | 0.20 / 0.20
a22 | 0.80 / 0.79 | 0.80 / 0.79 | 0.80 / 0.79 | 0.80 / 0.80 | 0.80 / 0.79 | 0.80 / 0.80
μ1 | 2.00 / 1.98 | 4.00 / 3.99 | 2.00 / 2.01 | 4.00 / 4.02 | 2.00 / 1.97 | 4.00 / 4.02
μ2 | 10.00 / 9.98 | 7.00 / 6.96 | 10.00 / 9.99 | 7.00 / 6.98 | 10.00 / 10.01 | 7.00 / 7.00
ϕ1 | 1.00 / 0.98 | 1.00 / 0.99 | 1.00 / 0.98 | 1.00 / 0.97 | 1.00 / 1.00 | 1.00 / 0.98
ϕ2 | 1.00 / 1.03 | 1.00 / 0.95 | 1.00 / 0.99 | 1.00 / 0.99 | 1.00 / 1.00 | 1.00 / 0.98
γ1 | 1.00 / 1.06 | 1.00 / 1.06 | 1.00 / 0.94 | 1.00 / 0.94 | 0.50 / 0.53 | 0.50 / 0.48
γ2 | 1.00 / 1.07 | 1.00 / 1.16 | 5.00 / 5.82 | 5.00 / 6.68 | 5.00 / 5.99 | 5.00 / 6.15
Table 4. Comparison of the medians and IQRs of performance metrics between MSNBurr-HMM, Gaussian-HMM, and FSSN-HMM.
Scenario | Model | Log-Likelihood (Median / IQR) | AIC (Median / IQR) | AICc (Median / IQR) | BIC (Median / IQR)
Scen1a | MSNBurr-HMM | −416.93 / 18.77 | 851.86 / 37.54 | 852.81 / 37.54 | 881.55 / 37.54
Scen1a | FSSN-HMM | −417.58 / 18.91 | 853.16 / 37.81 | 854.11 / 37.81 | 882.85 / 37.81
Scen1a | Gaussian-HMM | −418.80 / 18.80 | 851.59 / 37.60 | 852.18 / 37.60 | 874.68 / 37.60
Scen1b | MSNBurr-HMM | −377.61 / 14.68 | 773.22 / 29.36 | 774.17 / 29.36 | 802.91 / 29.36
Scen1b | FSSN-HMM | −389.26 / 17.44 | 796.52 / 34.88 | 797.47 / 34.88 | 826.21 / 34.88
Scen1b | Gaussian-HMM | −381.23 / 17.51 | 776.46 / 35.02 | 777.04 / 35.02 | 799.54 / 35.02
Scen2a | MSNBurr-HMM | −409.68 / 19.91 | 837.37 / 39.83 | 838.32 / 39.83 | 867.05 / 39.83
Scen2a | FSSN-HMM | −412.32 / 21.94 | 842.65 / 43.89 | 843.60 / 43.89 | 872.33 / 43.89
Scen2a | Gaussian-HMM | −418.89 / 22.74 | 851.79 / 45.47 | 852.37 / 45.47 | 874.88 / 45.47
Scen2b | MSNBurr-HMM | −387.66 / 20.52 | 793.32 / 41.04 | 794.27 / 41.04 | 823.00 / 41.04
Scen2b | FSSN-HMM | −393.53 / 19.35 | 805.06 / 38.69 | 806.01 / 38.69 | 834.75 / 38.69
Scen2b | Gaussian-HMM | −395.10 / 21.37 | 804.19 / 42.75 | 804.78 / 42.75 | 827.28 / 42.75
Scen3a | MSNBurr-HMM | −417.50 / 17.04 | 852.99 / 34.09 | 853.94 / 34.09 | 882.68 / 34.09
Scen3a | FSSN-HMM | −419.76 / 19.65 | 857.52 / 39.29 | 858.47 / 39.29 | 887.21 / 39.29
Scen3a | Gaussian-HMM | −429.20 / 21.37 | 872.41 / 42.75 | 872.99 / 42.75 | 895.49 / 42.75
Scen3b | MSNBurr-HMM | −400.67 / 18.10 | 819.34 / 36.19 | 820.29 / 36.19 | 849.02 / 36.19
Scen3b | FSSN-HMM | −404.80 / 18.43 | 827.59 / 36.85 | 828.54 / 36.85 | 857.28 / 36.85
Scen3b | Gaussian-HMM | −408.70 / 18.64 | 831.41 / 37.28 | 831.99 / 37.28 | 854.50 / 37.28
The blue-highlighted cells in the median columns indicate the best model (higher log-likelihood or lower AIC, AICc, and BIC), while those in the IQR columns indicate the smaller measure of dispersion.
Table 5. Parameter estimates of the MSNBurr-HMM for the daily JCI closing prices data from 2023 to 2024.
Parameter | Estimated Value
π1 | 1
π2 | 1 × 10^−12
a11 | 0.991
a12 | 0.009
a21 | 0.004
a22 | 0.996
μ1 | 6879.568
μ2 | 7229.954
ϕ1 | 88.677
ϕ2 | 170.374
γ1 | 0.348
γ2 | 27,871.600
Table 6. Performance of the three models based on Log-Likelihood, AIC, AICc, and BIC.
Model | Log-Likelihood | AIC | AICc | BIC
MSNBurr-HMM | −3060.60 | 6139.19 | 6139.57 | 6176.72
FSSN-HMM | −3066.02 | 6150.04 | 6150.42 | 6187.56
Gaussian-HMM | −3083.43 | 6180.86 | 6181.10 | 6210.05
The blue-highlighted cells indicate the best performance (the highest log-likelihood or the lowest AIC, AICc, and BIC).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
