SEP-HMM: A Flexible Hidden Markov Model Framework for Asymmetric and Non-Mesokurtic Emission Patterns

Unggul, Didik Bani; Iriawan, Nur; Irhamah, Irhamah; Prabowo, Andriyas Aryo

doi:10.3390/math14030393

¹ Department of Statistics, Institut Teknologi Sepuluh Nopember, Surabaya 60111, Indonesia

² Yogyakarta Climatology Station, Indonesia Agency for Meteorology Climatology and Geophysics, Yogyakarta 55285, Indonesia

* Author to whom correspondence should be addressed.

Mathematics 2026, 14(3), 393; https://doi.org/10.3390/math14030393

Submission received: 19 December 2025 / Revised: 14 January 2026 / Accepted: 21 January 2026 / Published: 23 January 2026

(This article belongs to the Special Issue Statistics and Data Science)

Abstract

This paper proposes a new Hidden Markov Model (HMM) framework integrated with the Skew Exponential Power (SEP) distribution, named SEP-HMM. The primary advantage of this method is its ability to capture and represent asymmetric and non-mesokurtic emission patterns, which are often encountered in real-world phenomena. This advantage makes it more flexible than well-known HMMs, such as Gaussian-HMM, which are still rigidly based on symmetric and mesokurtic assumptions. We formulate and present its complete algorithm for parameter estimation and hidden state decoding. To test its effectiveness, we run simulations with various scenarios and apply SEP-HMM to real datasets consisting of stock price and temperature datasets. In the simulations conducted, the superiority of SEP-HMM compared to Gaussian-HMM and Skew Normal-HMM is confirmed in most of the replications, both in assessing model fit and in identifying hidden states. This is also supported by the real-case dataset, where SEP-HMM outperforms the benchmark models in all tested metrics.

Keywords: Hidden Markov Model; skew exponential power distribution; stochastic modeling; asymmetric pattern; non-mesokurtic pattern; stock price; temperature

MSC: 60J10; 62M05

1. Introduction

A Hidden Markov Model (HMM) is a modeling framework based on a doubly stochastic process, consisting of a hidden state process that cannot be observed directly and an observation process whose outputs depend on those hidden states [1,2]. The hidden process evolves as a Markov chain whose transitions are governed by a state-transition probability matrix [3]. The observable data are then produced through an emission mechanism, where each hidden state is associated with a specific emission distribution that governs the probabilistic output.

The applications of HMMs span a wide range of fields, including economics [4], bioinformatics [5], and ecology [6]. In the context of bioinformatics, Ma et al. [5] explain that HMMs can be applied to tasks such as transmembrane protein prediction, gene finding, sequence alignment, and related analyses. In economics, HMMs are used to identify hidden regimes in macroeconomic growth, inflation dynamics, stock returns, and broader financial market conditions [4]. Other applications also emerge in ecology, where HMMs help provide a more complete picture of animal behavior, integrate individual- and population-level movement dynamics, and model behavior-dependent population processes [6]. Moreover, HMMs have also been employed in numerous studies in psychology as well as in speech recognition [7,8].

Even with their widespread applications, the fundamental aspects of HMMs should not be neglected. The performance of an HMM depends on the choice of emission distribution. Selecting a distribution that does not adequately represent the emission patterns can lead to poor overall model fit and reduced performance in hidden-state identification. Therefore, choosing the emission distribution is a crucial step in model specification. The most commonly used HMM, the Gaussian-HMM, fails to capture asymmetric patterns because its underlying emission distribution is strictly symmetric [9]. Several studies have developed HMMs using distributions capable of capturing asymmetric patterns, such as the Gamma distribution [10,11] and the Weibull distribution [12]. These distributions are effective for non-negative data and right-skewed behavior. However, because their shapes are inherently non-symmetric, their flexibility remains limited when the data exhibit patterns closer to symmetry.

Other studies, such as Nigri [13], incorporate the Skew-Normal distribution [14] within an HMM framework. This distribution is chosen for its flexibility in capturing both symmetric and asymmetric patterns. In other words, depending on the parameter values, it can exhibit left- or right-skewed behavior, yet also reduce to a symmetric form under specific conditions. This same issue is also highlighted by Unggul et al. [9], who constructed HMMs based on the MSNBurr [15] and Fernández–Steel Skew-Normal [16] distributions. We observe that the flexibility addressed in previous studies has predominantly focused on the ability to accommodate different levels of skewness. Flexibility with respect to varying levels of kurtosis, however, remains rarely explored in HMM-related research. In fact, real-world phenomena may naturally exhibit leptokurtic or platykurtic patterns [17,18,19].

Motivated by this problem, we modify a well-known HMM framework by integrating a flexible distribution called the Skew Exponential Power (SEP) distribution [20]. This distribution has four parameters: a location parameter, a scale parameter, and two shape parameters. Its main advantage lies in its robustness in accommodating both symmetric and asymmetric patterns, as well as a wide range of kurtosis levels, including mesokurtic, leptokurtic, and platykurtic. These two characteristics (skewness and kurtosis) are directly controlled by the two shape parameters. At certain values of the shape parameters, this distribution can resemble the density of the Normal, Skew-Normal, Exponential Power, Laplace, and Uniform distributions, confirming its ability to generalize a wide range of distributional shapes. Moreover, the SEP distribution maintains a stable mode at the location parameter. This property makes the mode identifiable and provides a clearer advantage in interpreting the measure of central tendency.

We refer to the integration of the HMM framework with the SEP distribution as the SEP-HMM. In addition to introducing the model, we present procedures for maximum likelihood parameter estimation and hidden-state identification (decoding). The maximum likelihood estimates are computed using the Baum-Welch algorithm, an Expectation-Maximization procedure. Decoding is then performed using the widely used Viterbi algorithm based on the fitted model parameters. To validate the performance of the SEP-HMM, we conducted simulations under various emission scenarios constructed from combinations of symmetric, asymmetric, mesokurtic, leptokurtic, and platykurtic patterns. The proposed model is compared with two benchmark methods, namely the Gaussian-HMM and the Skew-Normal-HMM (SN-HMM), using evaluation metrics for goodness-of-fit and hidden-state decoding accuracy. More specifically, we compare the log-likelihood, Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), and two confusion matrix-based metrics: accuracy and GMean. In addition to the simulation study, we further evaluate the three models on two real-world datasets involving stock-price and temperature-data modeling.

Thus, the objectives of this study are fourfold: (1) to formulate a SEP-based HMM (SEP-HMM) modeling framework; (2) to provide parameter estimation and hidden-state decoding algorithms for SEP-HMM; (3) to conduct a series of simulations to evaluate the robustness of the developed model; and (4) to demonstrate the application of SEP-HMM to two real-world cases, specifically for modeling stock price and temperature datasets. The remainder of this article is organized as follows. Section 2 outlines the methodology, including the SEP-HMM framework, parameter estimation techniques, decoding algorithms, model evaluation metrics, and the simulation design. Section 3 presents the simulation results. Section 4 applies SEP-HMM to real-world examples. Section 5 provides a discussion, and Section 6 concludes with a summary of the findings.

2. Methods

2.1. Hidden Markov Model

A Hidden Markov Model (HMM) is a doubly stochastic process consisting of two sequences, namely a hidden state sequence that evolves according to a Markov process and an observable sequence that is generated conditionally on the hidden states. Let $q_{1:T} = (q_{1}, q_{2}, \dots, q_{T})$ denote the sequence of hidden states, where $q_{t}$ for $t = 1, 2, \dots, T$ is drawn from the set of $K$ possible states, i.e., $q_{t} \in \{S_{1}, S_{2}, \dots, S_{K}\}$. Here, the label of the hidden state at a given time $t$ depends only on the hidden state at the previous time step, reflecting the Markov property. In addition, there exists another sequence, $O_{1:T} = (O_{1}, O_{2}, \dots, O_{T})$, which is observable. This sequence is generated according to a conditional emission process, where the observed value at each time step depends on the corresponding hidden state at that time.

This process is accommodated through the emission distribution, represented by the probability density function denoted as $b_{i}(O_{t} \mid θ_{i})$, for $i = 1, 2, \dots, K$ and $t = 1, 2, \dots, T$. Here, $θ_{i}$ represents the parameters associated with the emission distribution of the $i$-th hidden state. These parameters define the likelihood of observing $O_{t}$ given that the system is in state $S_{i}$ at time $t$. The complete set of emission parameters is denoted by $Θ = (θ_{1}, θ_{2}, \dots, θ_{K})$. Typically, $b_{i}(O_{t} \mid θ_{i})$ inherits the PDF form of common probability distributions such as the Gaussian, Gamma, or Weibull distributions. It may also follow discrete distributions, including the Poisson [21,22,23] or Negative Binomial [24,25] distributions. The model also has $π = (π_{1}, π_{2}, \dots, π_{K})$, the initial state distribution, and $A$, the $K \times K$ state transition matrix, which governs the probabilities of transitioning between hidden states. Thus, an HMM contains a set of parameters $λ = (π, A, Θ)$ [2]. For better clarity, we present Figure 1, which visualizes the relationship between the hidden sequence, observable sequence, and each parameter component in $λ$. Both sequences ($q_{1:T}$ and $O_{1:T}$) are shown by the blue boxes. The solid blue arrows represent the Markov property between hidden state periods. The dashed red arrows illustrate the conditional emission process from the hidden states to the observed values. Finally, the dotted black arrows indicate the parameters governing the process.

2.2. Proposed Method: SEP-HMM

In this section, we introduce the integration of HMM with the Skew Exponential Power (SEP) distribution [20], which we refer to as SEP-HMM. The core innovation lies in adopting a more flexible emission distribution that relaxes the strict symmetry and mesokurtic assumptions of traditional HMMs while still being able to represent those cases when needed. This flexibility allows the model to capture a wider range of emission behaviors. To clarify the context, we first provide a brief overview of the four-parameter SEP distribution. Let $Y$ follow an SEP distribution with parameters $(μ, σ, α, β)$, where $-\infty < μ < \infty$, $σ > 0$, $0 < α < 1$, and $-1 < β \leq 1$. The probability density function of $Y$ is given in Equation (1), with $z$ and $τ$ defined in Equations (2) and (3), respectively [20].

$$f_{\mathrm{SEP}}(y \mid μ, σ, α, β) = \begin{cases} \dfrac{τ}{σ} \exp\left\{-\dfrac{1}{2}\left(|z| + (2α - 1) z\right)^{\frac{2}{1 + β}}\right\}, & -\infty < y < \infty \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

$$z = \frac{y - μ}{σ} \tag{2}$$

$$τ = \left[Γ\left(1 + \frac{1 + β}{2}\right)\right]^{-1} \frac{4α(1 - α)}{2^{1 + \frac{1 + β}{2}}} \tag{3}$$
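As a quick numerical check of Equations (1)–(3), the density can be coded directly. The following is a minimal sketch using only the Python standard library; the function names `sep_pdf` and `integrate`, and the parameter values in the example, are illustrative assumptions and are not taken from the authors' R code.

```python
import math

def sep_pdf(y, mu=0.0, sigma=1.0, alpha=0.5, beta=0.0):
    """SEP density following Equations (1)-(3)."""
    if not (sigma > 0 and 0 < alpha < 1 and -1 < beta <= 1):
        raise ValueError("parameters outside the SEP domain")
    half = (1.0 + beta) / 2.0                       # (1 + beta) / 2
    tau = 4.0 * alpha * (1.0 - alpha) / (math.gamma(1.0 + half) * 2.0 ** (1.0 + half))
    z = (y - mu) / sigma                            # Equation (2)
    kernel = abs(z) + (2.0 * alpha - 1.0) * z       # non-negative for 0 < alpha < 1
    return tau / sigma * math.exp(-0.5 * kernel ** (2.0 / (1.0 + beta)))

def integrate(f, lo, hi, n=200_000):
    """Plain midpoint rule; adequate for a sanity check only."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

# With alpha = 0.5 and beta = 0 the density coincides with the standard Normal.
normal_at_0 = 1.0 / math.sqrt(2.0 * math.pi)
print(abs(sep_pdf(0.0) - normal_at_0) < 1e-12)

# A skewed, heavier-tailed case (hypothetical values) should still integrate to about one.
area = integrate(lambda y: sep_pdf(y, 1.0, 2.0, 0.3, 0.5), -400.0, 400.0)
print(abs(area - 1.0) < 1e-3)
```

The second check is a numerical counterpart of the normalization property, not a proof; the integration limits are simply wide enough for the chosen parameters.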

We also present several propositions that summarize important characteristics of the SEP distribution. Proposition 1 establishes that the function defined above is a probability density function (PDF). Proposition 2 shows that the maximum of $f_{\mathrm{SEP}}(y \mid μ, σ, α, β)$ always occurs at the location parameter $μ$, regardless of the values of the other parameters. Hence, the SEP distribution has a stable mode at $μ$. Proposition 3 explains how the parameter $α$ determines the direction of skewness of the distribution. Finally, Proposition 4 describes how $β$ relates to the kurtosis of the distribution. These four propositions are stated below, and their complete proofs are provided in Appendix A.

Proposition 1. The function $f_{\mathrm{SEP}}(y \mid μ, σ, α, β)$ defined in Equation (1) is a probability density function.

Proposition 2. For any $σ > 0$, $0 < α < 1$, and $-1 < β \leq 1$, the density $f_{\mathrm{SEP}}(y \mid μ, σ, α, β)$ attains its global maximum at $y = μ$. Hence, $μ$ is the mode of the SEP distribution.

Proposition 3. Let $Y \sim \mathrm{SEP}(μ, σ, α, β)$. The direction of skewness is governed by the parameter $α$ as follows:

The distribution is symmetric if and only if $α = 0.5$
The distribution is positively skewed if and only if $α < 0.5$
The distribution is negatively skewed if and only if $α > 0.5$

Proposition 4. Let $Y \sim \mathrm{SEP}(μ, σ, α, β)$ and define $p$ as in Equation (4). Consider the kurtosis coefficient about the location parameter $μ$ defined in Equation (5).

$$p = \frac{2}{1 + β} \tag{4}$$

$$\mathrm{Kurt}_{μ}(Y) := \frac{E[(Y - μ)^{4}]}{\{E[(Y - μ)^{2}]\}^{2}} \tag{5}$$

Then, for any fixed $α \in (0, 1)$, $\mathrm{Kurt}_{μ}(Y)$ depends on $β$ through $p$. Moreover, increasing $β$ (decreasing $p$) modifies $\mathrm{Kurt}_{μ}(Y)$ in a way that is typically associated with heavier tails, whereas decreasing $β$ (increasing $p$) is typically associated with lighter tails.
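To make Proposition 4 concrete, the coefficient in Equation (5) can be evaluated in closed form. The sketch below uses an expression obtained here by integrating Equation (1) separately on each side of $μ$ (each side reduces to a Gamma integral); it is an assumption of this illustration and is not copied from Appendix A. The function name `kurt_about_mu` is hypothetical.

```python
import math

def kurt_about_mu(alpha, beta):
    """Kurt_mu(Y) of Equation (5) for Y ~ SEP(mu, sigma, alpha, beta).

    Assumption: closed form derived for this sketch by splitting the
    integrals of Equation (1) at mu; mu and sigma cancel in the ratio.
    """
    x = (1.0 + beta) / 2.0                       # x = 1 / p, with p from Equation (4)
    tail = math.gamma(5 * x) * math.gamma(x) / math.gamma(3 * x) ** 2
    a, c = alpha, 1.0 - alpha
    skew = (a / c**4 + c / a**4) / (a / c**2 + c / a**2) ** 2
    return tail * skew

# Symmetric case: the coefficient grows as beta increases (p decreases).
for beta in (-0.5, 0.0, 0.5, 1.0):
    print(beta, round(kurt_about_mu(0.5, beta), 3))
```

In the symmetric case this expression gives 3 at $β = 0$ (Normal) and 6 at $β = 1$ (Laplace), which matches the known kurtosis of those two distributions and agrees with the direction stated in Proposition 4.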

In addition, Figure 2 is provided to illustrate the flexibility of the SEP distribution in capturing a variety of distributional shapes. The figure also offers a visual intuition for the properties stated in Propositions 2–4, where the mode remains at $μ$, $α$ is associated with the direction of skewness, and $β$ is related to kurtosis.

Building on these distributional properties, we return to the HMM setting and specify the SEP distribution introduced above as the state-dependent emission model, resulting in the SEP-HMM. For each state $i = 1, 2, \dots, K$, the emission density $b_{i}(O_{t} \mid θ_{i})$ follows Equation (6), with the corresponding normalizing constant $τ_{i}$ defined in Equation (7). With this setup, the set of emission parameters for the $i$-th state is given by $θ_{i} = \{μ_{i}, σ_{i}, α_{i}, β_{i}\}$, where these parameters correspond to location, scale, and two shape parameters, respectively [20]. The parameter domain is specified as $-\infty < μ_{i} < \infty$, $σ_{i} > 0$, $0 < α_{i} < 1$, and $-1 < β_{i} \leq 1$ for $i = 1, 2, \dots, K$.

$$b_{i}(O_{t} \mid θ_{i}) = \begin{cases} \dfrac{τ_{i}}{σ_{i}} \exp\left\{-\dfrac{1}{2}\left[\left|\dfrac{O_{t} - μ_{i}}{σ_{i}}\right| + (2α_{i} - 1)\left(\dfrac{O_{t} - μ_{i}}{σ_{i}}\right)\right]^{\frac{2}{1 + β_{i}}}\right\}, & -\infty < O_{t} < \infty \\ 0, & \text{otherwise} \end{cases} \tag{6}$$

$$τ_{i} = \left[Γ\left(1 + \frac{1 + β_{i}}{2}\right)\right]^{-1} \frac{4α_{i}(1 - α_{i})}{2^{1 + \frac{1 + β_{i}}{2}}} \tag{7}$$

In addition to the emission parameters $Θ = (θ_{1}, θ_{2}, \dots, θ_{K})$, SEP-HMM is also characterized by the parameters $π$ and $A$, which represent the initial state distribution vector of size $K \times 1$ and the state transition probability matrix of size $K \times K$, respectively. The elements of $π$, denoted by $π_{i}$ for $i = 1, 2, \dots, K$, are defined in Equation (8) and satisfy the constraints $π_{i} \geq 0$ and $\sum_{i=1}^{K} π_{i} = 1$. The elements of the matrix $A$, denoted by $a_{ij}$ for $i, j = 1, 2, \dots, K$, also satisfy $a_{ij} \geq 0$ and the row-stochastic constraint $\sum_{j=1}^{K} a_{ij} = 1$ for each $i = 1, 2, \dots, K$. The definition of $a_{ij}$ is given in Equation (9).

$$π_{i} = \Pr(q_{1} = S_{i}), \quad i = 1, 2, \dots, K \tag{8}$$

$$a_{ij} = \Pr(q_{t} = S_{j} \mid q_{t-1} = S_{i}), \quad i, j = 1, 2, \dots, K, \; t = 2, 3, \dots, T \tag{9}$$

We can construct the complete-data likelihood as the joint probability of the hidden state sequence $q_{1:T}$ and the observable sequence $O_{1:T}$. Under the Markov property governing the hidden states and the conditional independence assumptions that characterize the emission process, this joint probability factorizes into the product of the initial state probability, the state transition probabilities, and the emission densities. Formally, the complete-data likelihood is expressed in Equation (10). In the next section, we use this $L_{c}(λ)$ to derive the Expectation-Maximization updates (as implemented by the Baum-Welch algorithm) to compute the maximum likelihood estimate of $λ$.

$$L_{c}(λ) = \Pr(q_{1:T}, O_{1:T} \mid π, A, Θ) = \Pr(q_{1} \mid π) \prod_{t=2}^{T} \Pr(q_{t} \mid q_{t-1}, A) \prod_{t=1}^{T} \Pr(O_{t} \mid q_{t}, Θ) \tag{10}$$
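The factorization in Equation (10) is easiest to see on the log scale, where the three products become three sums. The following is a minimal sketch under stated assumptions: states are 0-based indices, each $θ_{i}$ is a tuple `(mu, sigma, alpha, beta)`, and all numerical values are hypothetical.

```python
import math

def sep_logpdf(y, mu, sigma, alpha, beta):
    """Log of the SEP emission density in Equation (6)."""
    x = (1.0 + beta) / 2.0
    log_tau = (math.log(4.0 * alpha * (1.0 - alpha))
               - math.lgamma(1.0 + x) - (1.0 + x) * math.log(2.0))
    z = (y - mu) / sigma
    return log_tau - math.log(sigma) - 0.5 * (abs(z) + (2.0 * alpha - 1.0) * z) ** (1.0 / x)

def complete_loglik(states, obs, pi, A, theta):
    """log L_c(lambda) from Equation (10) for a known hidden path."""
    ll = math.log(pi[states[0]])                       # initial state term
    for t in range(1, len(states)):                    # transition terms
        ll += math.log(A[states[t - 1]][states[t]])
    for q, o in zip(states, obs):                      # emission terms
        ll += sep_logpdf(o, *theta[q])
    return ll

# Hypothetical two-state example (values chosen for illustration only).
pi = [0.5, 0.5]
A = [[0.8, 0.2], [0.2, 0.8]]
theta = [(0.0, 1.0, 0.5, 0.0), (3.0, 1.0, 0.3, 0.5)]   # (mu, sigma, alpha, beta)
obs = [0.1, -0.4, 2.9, 3.6]
ll_plausible = complete_loglik([0, 0, 1, 1], obs, pi, A, theta)
ll_swapped = complete_loglik([1, 1, 0, 0], obs, pi, A, theta)
print(ll_plausible > ll_swapped)
```

In practice the path is unobserved, which is why the EM procedure works with the expectation of this quantity rather than the quantity itself.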

2.3. Parameter Estimation of SEP-HMM

We estimate the SEP-HMM parameters by maximum likelihood. The maximum likelihood estimate is computed numerically using the Expectation-Maximization (EM) algorithm, implemented via the Baum-Welch algorithm [26,27,28]. Let $λ = (π, A, Θ)$ denote the parameter set. Here, $O_{1:T}$ is fixed (observed) and the likelihood is viewed as a function of $λ$. In each iteration, the E-step uses the current parameters to compute posterior quantities for $q_{1:T}$ (equivalently, expectations of the complete-data log-likelihood $\log L_{c}(λ)$), and the M-step updates $π$, $A$, and $Θ$. To avoid ill-defined maximum likelihood estimation due to the unboundedness of the likelihood, the M-step for the SEP emission parameters is carried out under the constraint $σ_{i} \geq σ_{min}$ (we use $σ_{min} = 10^{-8}$). This procedure iteratively increases $\log \Pr(O_{1:T} \mid λ)$ with respect to $λ$ until convergence, where $\Pr(O_{1:T} \mid λ) = \sum_{q_{1:T}} \Pr(q_{1:T}, O_{1:T} \mid λ)$.

To implement the E-step efficiently, we define the forward and backward variables, denoted by $f_{t}(i)$ and $b_{t}(i)$, respectively. The forward variable for each time index $t = 1, 2, \dots, T$ and state $i = 1, 2, \dots, K$ is defined in Equation (11). In general, Equation (11) can be rewritten in a recursive form for $t = 2, 3, \dots, T$, as shown in Equation (12). This process shows that the forward variable is computed sequentially from the first to the last period. In contrast, the backward variable $b_{t}(i)$ is computed in the opposite direction, starting from the $T$-th period and moving back to the first period. The general definition of the backward variable is given in Equation (13), and the recursive form for each $t$ is presented in Equation (14).

$$f_{t}(i) = \Pr(O_{1:t}, q_{t} = S_{i} \mid λ), \quad t = 1, 2, \dots, T, \; i = 1, 2, \dots, K \tag{11}$$

$$f_{t}(i) = \begin{cases} π_{i}\, b_{i}(O_{1} \mid θ_{i}), & t = 1 \\ \left(\sum_{j=1}^{K} f_{t-1}(j)\, a_{ji}\right) b_{i}(O_{t} \mid θ_{i}), & t = 2, 3, \dots, T \end{cases} \quad i = 1, 2, \dots, K \tag{12}$$

$$b_{t}(i) = \Pr(O_{t+1:T} \mid q_{t} = S_{i}, λ), \quad t = 1, 2, \dots, T, \; i = 1, 2, \dots, K \tag{13}$$

$$b_{t}(i) = \begin{cases} 1, & t = T \\ \sum_{j=1}^{K} a_{ij}\, b_{j}(O_{t+1} \mid θ_{j})\, b_{t+1}(j), & t = T-1, T-2, \dots, 1 \end{cases} \quad i = 1, 2, \dots, K \tag{14}$$
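The recursions in Equations (12) and (14) multiply many small numbers, so implementations usually rescale them at every step. The sketch below shows one common scaling scheme; the scaling itself, the function name `forward_backward`, and the toy inputs are assumptions of this illustration and are not stated in the text. The emission values are passed in as a precomputed table `B[t][i]`, so the code does not depend on the emission family.

```python
import math

def forward_backward(pi, A, B):
    """Scaled versions of Equations (12) and (14).

    B[t][i] holds b_i(O_t | theta_i). Returns (fhat, bhat, loglik), where
    fhat[t] is f_t rescaled to sum to one, bhat is rescaled with the same
    constants, and loglik equals log Pr(O_{1:T} | lambda).
    """
    T, K = len(B), len(pi)
    fhat, c = [], []
    for t in range(T):
        if t == 0:
            row = [pi[i] * B[0][i] for i in range(K)]
        else:
            row = [sum(fhat[t - 1][j] * A[j][i] for j in range(K)) * B[t][i]
                   for i in range(K)]
        c.append(sum(row))                       # scaling constant for time t
        fhat.append([v / c[t] for v in row])
    bhat = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        bhat[t] = [sum(A[i][j] * B[t + 1][j] * bhat[t + 1][j] for j in range(K)) / c[t + 1]
                   for i in range(K)]
    return fhat, bhat, sum(math.log(v) for v in c)

# Hypothetical toy inputs: two states, three time points.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.1], [0.2, 0.3], [0.05, 0.4]]
fhat, bhat, loglik = forward_backward(pi, A, B)
print(loglik)
```

With this particular scaling, the product `fhat[t][i] * bhat[t][i]` already sums to one over `i` at each `t`, and the sum of the log scaling constants gives the log-likelihood without forming the unscaled $f_{T}(i)$.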

The next step is to use $f_{t}(i)$ and $b_{t}(i)$ to obtain $ϕ_{t}(i)$ and $ξ_{t}(i, j)$. These two quantities are computed using Equations (15) and (16), respectively. Intuitively, $ϕ_{t}(i)$ is the posterior probability of occupying state $S_{i}$ at time $t$, so its sum over $t$ gives the expected state occupancy count, whereas $ξ_{t}(i, j)$ is the posterior probability of a transition from $S_{i}$ to $S_{j}$ at time $t$, so its sum over $t$ gives the expected state transition count [9]. The resulting values of $ϕ_{t}(i)$ and $ξ_{t}(i, j)$ are then used to update the parameter set $λ$.

$$ϕ_{t}(i) = \Pr(q_{t} = S_{i} \mid O_{1:T}, λ) = \frac{f_{t}(i)\, b_{t}(i)}{\sum_{l=1}^{K} f_{t}(l)\, b_{t}(l)}, \quad t = 1, 2, \dots, T, \; i = 1, 2, \dots, K \tag{15}$$

$$ξ_{t}(i, j) = \Pr(q_{t} = S_{i}, q_{t+1} = S_{j} \mid O_{1:T}, λ) = \frac{f_{t}(i)\, a_{ij}\, b_{j}(O_{t+1} \mid θ_{j})\, b_{t+1}(j)}{\sum_{i'=1}^{K} \sum_{j'=1}^{K} f_{t}(i')\, a_{i'j'}\, b_{j'}(O_{t+1} \mid θ_{j'})\, b_{t+1}(j')}, \quad t = 1, 2, \dots, T-1, \; i, j = 1, 2, \dots, K \tag{16}$$
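Equations (15) and (16) can be checked against each other, because summing $ξ_{t}(i, j)$ over $j$ must return $ϕ_{t}(i)$. The sketch below computes both quantities with unscaled recursions, which is acceptable only for short sequences; the function name `posteriors` and the toy inputs are hypothetical.

```python
def posteriors(pi, A, B):
    """phi_t(i) and xi_t(i, j) from Equations (15)-(16), unscaled for clarity.

    B[t][i] holds b_i(O_t | theta_i). For long sequences the recursions
    would need rescaling to avoid numerical underflow.
    """
    T, K = len(B), len(pi)
    # Forward pass, Equation (12).
    f = [[pi[i] * B[0][i] for i in range(K)]]
    for t in range(1, T):
        f.append([sum(f[t - 1][j] * A[j][i] for j in range(K)) * B[t][i]
                  for i in range(K)])
    # Backward pass, Equation (14).
    b = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        b[t] = [sum(A[i][j] * B[t + 1][j] * b[t + 1][j] for j in range(K))
                for i in range(K)]
    phi, xi = [], []
    for t in range(T):
        norm = sum(f[t][l] * b[t][l] for l in range(K))
        phi.append([f[t][i] * b[t][i] / norm for i in range(K)])
    for t in range(T - 1):
        raw = [[f[t][i] * A[i][j] * B[t + 1][j] * b[t + 1][j] for j in range(K)]
               for i in range(K)]
        norm = sum(map(sum, raw))
        xi.append([[v / norm for v in row] for row in raw])
    return phi, xi

# Hypothetical toy inputs: two states, three time points.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.1], [0.2, 0.3], [0.05, 0.4]]
phi, xi = posteriors(pi, A, B)
print(phi[0], xi[0])
```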

Appendix B details how these quantities enter the expected complete-data log-likelihood and how they lead to the update rules for $π$, $A$, and $Θ$. Briefly, this procedure is carried out independently for each component of $λ$. The initial-state probabilities $π_{i}$ are updated using $ϕ_{1}(i)$, as shown in Equation (17). The transition probabilities are updated according to Equation (18). The emission parameters are updated in the M-step by maximizing the expected complete-data log-likelihood as in Equation (19) (details are provided in Appendix B).

The updated parameters are then used to recompute the forward and backward variables. This iterative process continues until the convergence criterion specified in Equation (20) is satisfied, where $m$ and $m + 1$ denote two consecutive EM iteration indices, and $ε$ denotes a small positive convergence threshold. The parameter values obtained in the final iteration are treated as the final estimates. Algorithm 1 summarizes the complete procedure.

$$π_{i}^{new} = ϕ_{1}(i), \quad i = 1, 2, \dots, K \tag{17}$$

$$a_{ij}^{new} = \frac{\sum_{t=1}^{T-1} ξ_{t}(i, j)}{\sum_{t=1}^{T-1} ϕ_{t}(i)}, \quad i, j = 1, 2, \dots, K \tag{18}$$

$$θ_{i}^{new} = \arg\max_{θ_{i}} \sum_{t=1}^{T} ϕ_{t}(i) \left\{ \log\left[\frac{4α_{i}(1 - α_{i})}{σ_{i}\, Γ\left(1 + \frac{1 + β_{i}}{2}\right) 2^{1 + \frac{1 + β_{i}}{2}}}\right] - \frac{1}{2}\left[\left|\frac{O_{t} - μ_{i}}{σ_{i}}\right| + (2α_{i} - 1)\left(\frac{O_{t} - μ_{i}}{σ_{i}}\right)\right]^{\frac{2}{1 + β_{i}}} \right\}, \quad i = 1, 2, \dots, K \tag{19}$$

$$\left\|λ^{(m+1)} - λ^{(m)}\right\| < ε \tag{20}$$
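The M-step therefore has two closed-form parts, Equations (17) and (18), and one part that requires numerical optimization, Equation (19). The sketch below implements the closed-form updates and only the objective function inside the arg max; the choice of optimizer and the handling of the constraint $σ_{i} \geq σ_{min}$ are left out, since they are not described here. Function names and the toy posterior values are hypothetical.

```python
import math

def update_pi_A(phi, xi):
    """Closed-form updates of Equations (17) and (18)."""
    K = len(phi[0])
    new_pi = list(phi[0])                                    # Equation (17)
    new_A = []
    for i in range(K):
        denom = sum(phi[t][i] for t in range(len(xi)))       # t = 1, ..., T-1
        new_A.append([sum(x[i][j] for x in xi) / denom for j in range(K)])
    return new_pi, new_A

def emission_objective(theta_i, obs, phi_i):
    """Weighted log-likelihood inside the arg max of Equation (19)."""
    mu, sigma, alpha, beta = theta_i
    x = (1.0 + beta) / 2.0
    log_norm = (math.log(4.0 * alpha * (1.0 - alpha)) - math.log(sigma)
                - math.lgamma(1.0 + x) - (1.0 + x) * math.log(2.0))
    total = 0.0
    for o, w in zip(obs, phi_i):
        z = (o - mu) / sigma
        total += w * (log_norm - 0.5 * (abs(z) + (2.0 * alpha - 1.0) * z) ** (1.0 / x))
    return total

# Hypothetical posteriors for T = 3 and K = 2 (rows of xi[t] sum to phi[t]).
phi = [[0.7, 0.3], [0.4, 0.6], [0.5, 0.5]]
xi = [[[0.35, 0.35], [0.05, 0.25]],
      [[0.30, 0.10], [0.20, 0.40]]]
new_pi, new_A = update_pi_A(phi, xi)
print(new_pi, new_A)
```

Any bounded numerical optimizer could in principle be applied to `emission_objective` for each state; which one the authors use is documented in their Appendix B and repository rather than in this sketch.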

Algorithm 1. Baum-Welch algorithm for SEP-HMM

Input: Sequence of observed data ($O_{1:T}$); initial parameter values ($λ^{(0)}$); convergence tolerance ($ε$)

Output: Estimated model parameters ($\hat{λ}$)

1. Initialize the model parameters: $λ^{(0)} = (π^{(0)}, A^{(0)}, Θ^{(0)})$

2. Set the iteration index: $m \leftarrow 0$

3. Repeat until convergence is reached:

// E-Step

Calculate $f_{t}(i)$ and $b_{t}(i)$ using Equations (12) and (14), $t = 1, 2, \dots, T$, $i = 1, 2, \dots, K$

Calculate $ϕ_{t}(i)$ using Equation (15), $t = 1, 2, \dots, T$, $i = 1, 2, \dots, K$

Calculate $ξ_{t}(i, j)$ using Equation (16), $t = 1, 2, \dots, T - 1$, $i, j = 1, 2, \dots, K$

// M-Step

Update $π_{i}$, $i = 1, 2, \dots, K$, using Equation (17), where $π_{i}^{(m+1)} \leftarrow π_{i}^{new}$

Update $a_{ij}$, $i, j = 1, 2, \dots, K$, using Equation (18), where $a_{ij}^{(m+1)} \leftarrow a_{ij}^{new}$

Update $θ_{i}$, $i = 1, 2, \dots, K$, using Equation (19), where $θ_{i}^{(m+1)} \leftarrow θ_{i}^{new}$

// Convergence check

If $\|λ^{(m+1)} - λ^{(m)}\| < ε$, then stop the iteration and set $\hat{λ} \leftarrow λ^{(m+1)}$

Else, increment the iteration index: $m \leftarrow m + 1$ and repeat step 3.

return $\hat{λ} = (\hat{π}, \hat{A}, \hat{Θ})$

2.4. Decoding Algorithm of SEP-HMM

Hidden states are decoded using the Viterbi algorithm [29], which is applied after parameter estimation has produced $\hat{λ}$. This algorithm relies on two quantities, $δ_{t}(i)$ and $ψ_{t}(i)$, for $t = 1, 2, \dots, T$ and $i = 1, 2, \dots, K$, obtained from Equations (21) and (22), respectively. Notice that for the SEP-HMM, the value of $b_{i}(O_{t} \mid θ_{i})$ is computed using Equation (6), which corresponds directly to the PDF of the SEP distribution with parameters taken from $\hat{λ}$.

$$δ_{t}(i) = \begin{cases} π_{i}\, b_{i}(O_{1} \mid θ_{i}), & t = 1 \\ \left[\max_{j \in \{1, 2, \dots, K\}} δ_{t-1}(j)\, a_{ji}\right] b_{i}(O_{t} \mid θ_{i}), & t = 2, 3, \dots, T \end{cases} \tag{21}$$

$$ψ_{t}(i) = \begin{cases} 0, & t = 1 \\ \arg\max_{j \in \{1, 2, \dots, K\}} δ_{t-1}(j)\, a_{ji}, & t = 2, 3, \dots, T \end{cases} \tag{22}$$

After reaching $t = T$, the procedure continues by selecting the hidden state with the highest $δ_{T}(\cdot)$ value. This hidden state is denoted by $q_{T}^{*}$ and is formally defined in Equation (23). The subsequent step is backtracking to determine $q_{T-1}^{*}$, $q_{T-2}^{*}$, …, $q_{1}^{*}$ according to Equation (24).

$$q_{T}^{*} = \arg\max_{i \in \{1, 2, \dots, K\}} δ_{T}(i) \tag{23}$$

$$q_{t}^{*} = ψ_{t+1}(q_{t+1}^{*}), \quad t = T-1, T-2, \dots, 1 \tag{24}$$
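Equations (21)–(24) translate almost line by line into code. The sketch below works with logarithms instead of raw products, which is a standard numerical safeguard and an assumption of this illustration rather than something stated in the text; it also assumes that all entries of $π$, $A$, and the emission table are strictly positive. States are returned as 0-based indices.

```python
import math

def viterbi(pi, A, B):
    """Equations (21)-(24) in log space; B[t][i] = b_i(O_t | theta_i) > 0."""
    T, K = len(B), len(pi)
    delta = [[math.log(pi[i]) + math.log(B[0][i]) for i in range(K)]]
    psi = [[0] * K]                                  # psi_1(i) = 0 by convention
    for t in range(1, T):
        d_row, p_row = [], []
        for i in range(K):
            scores = [delta[t - 1][j] + math.log(A[j][i]) for j in range(K)]
            best = max(range(K), key=scores.__getitem__)
            d_row.append(scores[best] + math.log(B[t][i]))   # Equation (21)
            p_row.append(best)                               # Equation (22)
        delta.append(d_row)
        psi.append(p_row)
    path = [max(range(K), key=delta[T - 1].__getitem__)]     # Equation (23)
    for t in range(T - 1, 0, -1):                            # Equation (24)
        path.append(psi[t][path[-1]])
    return path[::-1]

# Hypothetical toy inputs: two states, three time points.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.1], [0.2, 0.3], [0.05, 0.4]]
best_path = viterbi(pi, A, B)
print(best_path)
```

For this toy input the decoded path is `[0, 1, 1]`, which can be confirmed by enumerating all eight possible paths.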

The output of this algorithm is the decoded hidden-state path $q_{1:T}^{*} = (q_{1}^{*}, q_{2}^{*}, \dots, q_{T}^{*})$, which serves as the estimate of $q_{1:T}$ given $\hat{λ}$ and $O_{1:T}$. These steps are summarized in Algorithm 2, as written below.

Algorithm 2. Viterbi algorithm for SEP-HMM

Input: Sequence of observed data ($O_{1:T}$); model parameters $λ = (π, A, Θ)$

Output: The optimal hidden state sequence ($q_{1:T}^{*}$)

1. Initialization and Recursion

For $t = 1, 2, \dots, T$ do

For $i = 1, 2, \dots, K$ do

Calculate $δ_{t}(i)$ using Equation (21)

Calculate $ψ_{t}(i)$ using Equation (22)

End For

End For

2. Termination

Calculate $q_{T}^{*}$ using Equation (23)

3. Backtracking

For $t = T - 1, T - 2, \dots, 1$ do

Calculate $q_{t}^{*}$ using Equation (24)

End For

return $q_{1:T}^{*} = (q_{1}^{*}, q_{2}^{*}, \dots, q_{T}^{*})$

2.5. Model Evaluation

The evaluation of HMM performance can generally be classified into two main types. One type is model fitness, which measures how well the model fits the observed data. Common metrics for this include Log-Likelihood and information criteria such as Akaike Information Criterion (AIC) [30] and Bayesian Information Criterion (BIC) [31]. The other type is hidden state decoding accuracy, which is assessed using metrics based on confusion matrices, measuring the difference between actual and predicted hidden state categories. Examples of these metrics include accuracy and GMean.

The log-likelihood value can be derived from the parameter estimation process, specifically using $f_{T}(\cdot)$, as shown in Equation (25). Using these results, we can calculate the AIC and BIC with Equations (26) and (27). Note that the basic idea of AIC and BIC is to penalize complex models by including the number of parameters ($ϱ$) in their formulas. The smaller the AIC and BIC values, the better the model’s performance. In contrast, a higher log-likelihood value indicates a better fit.

$$\text{Log-Likelihood} = \log \Pr(O_{1:T} \mid λ) = \log\left(\sum_{i=1}^{K} f_{T}(i)\right) \tag{25}$$

$$\mathrm{AIC} = -2 \log \Pr(O_{1:T} \mid λ) + 2ϱ \tag{26}$$

$$\mathrm{BIC} = -2 \log \Pr(O_{1:T} \mid λ) + ϱ \log(T) \tag{27}$$
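Equations (26) and (27) are simple to compute once the log-likelihood and $ϱ$ are known. The sketch below is illustrative only: the way $ϱ$ is counted (free initial probabilities, free transition probabilities, and emission parameters per state) is a common convention assumed here, and the paper may count parameters differently. The log-likelihood values are hypothetical.

```python
import math

def n_free_params(K, per_state=4):
    """Assumed count: (K - 1) initial + K(K - 1) transition + per_state * K emission."""
    return (K - 1) + K * (K - 1) + per_state * K

def aic(loglik, rho):
    """Equation (26)."""
    return -2.0 * loglik + 2.0 * rho

def bic(loglik, rho, T):
    """Equation (27)."""
    return -2.0 * loglik + rho * math.log(T)

# Hypothetical log-likelihoods for two fitted two-state models with T = 500.
rho_sep = n_free_params(2, per_state=4)      # SEP: mu, sigma, alpha, beta
rho_gauss = n_free_params(2, per_state=2)    # Gaussian: mean, standard deviation
print(aic(-980.0, rho_sep), bic(-980.0, rho_sep, 500))
print(aic(-990.0, rho_gauss), bic(-990.0, rho_gauss, 500))
```

Because $\log(500) > 2$, BIC penalizes each extra SEP shape parameter more heavily than AIC does, so a richer model needs a larger gain in log-likelihood to win under BIC.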

Confusion matrix-based evaluation metrics, such as accuracy and GMean, measure how well the hidden state identification from the Viterbi algorithm aligns with the actual labels. For the two hidden state categories, the confusion matrix is shown in Table 1, where True Positive (TP) represents the number of times State 1 is correctly decoded as State 1, and True Negative (TN) represents the number of times State 2 is correctly decoded as State 2. False Positive (FP) denotes the cases where State 2 is incorrectly decoded as State 1, while False Negative (FN) denotes the cases where State 1 is incorrectly decoded as State 2. Accuracy and GMean can be calculated using Equations (28) and (29), respectively. Accuracy measures how many predictions are correct out of all predictions. Meanwhile, GMean evaluates the balance between correctly identifying the positive class and the negative class, as it combines Sensitivity and Specificity through their geometric mean. The model’s performance is considered better as these values increase.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{28}$$

$$\mathrm{GMean} = \sqrt{\mathrm{Sensitivity} \times \mathrm{Specificity}} = \sqrt{\frac{TP}{TP + FN} \times \frac{TN}{TN + FP}} \tag{29}$$
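A small worked example may help fix the definitions in Equations (28) and (29). In the sketch below, State 1 plays the role of the positive class, as in the description of Table 1; the function name and the label sequences are hypothetical, and the code assumes that the decoded labels already use the same labelling as the true states.

```python
import math

def accuracy_gmean(true_states, decoded_states, positive=1):
    """Accuracy and GMean from Equations (28) and (29) for two states."""
    tp = tn = fp = fn = 0
    for actual, decoded in zip(true_states, decoded_states):
        if actual == positive and decoded == positive:
            tp += 1
        elif actual != positive and decoded != positive:
            tn += 1
        elif actual != positive and decoded == positive:
            fp += 1
        else:
            fn += 1
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, math.sqrt(sensitivity * specificity)

# Hypothetical labels: TP = 3, FN = 1, TN = 3, FP = 1.
true_states = [1, 1, 1, 2, 2, 2, 2, 1]
decoded_states = [1, 1, 2, 2, 2, 1, 2, 1]
acc, gmean = accuracy_gmean(true_states, decoded_states)
print(acc, gmean)
```

Here both metrics equal 0.75 because sensitivity and specificity are both 3/4; they diverge when one class is decoded much better than the other.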

2.6. Simulation Setup

This study constructs nine distinct simulation scenarios. Each scenario is replicated 500 times, with each replication generated from a different random seed. Each replication across all scenarios has a length of $T = 500$ periods, with the number of states fixed at 2. The initial distribution is $π = (0.5, 0.5)$, and the transition probabilities are $a_{ii} = 0.8$ for $i = 1, 2$ (so that $a_{ij} = 0.2$ for $i \neq j$).
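One replication of this design can be sketched as follows. The state-sequence settings ($T = 500$, $π = (0.5, 0.5)$, $a_{ii} = 0.8$) come from the text, but the emission parameters below are hypothetical placeholders, not the Table 2 values. The SEP sampler is a two-piece Gamma transform derived here from Equation (1), using the fact that the density puts mass $α$ below $μ$; it is an assumption of this illustration and not necessarily how the authors' R code generates data.

```python
import random

def sample_sep(rng, mu, sigma, alpha, beta):
    """One SEP draw via a two-piece Gamma transform (derived from Equation (1))."""
    x = (1.0 + beta) / 2.0                    # x = 1 / p
    w = rng.gammavariate(x, 1.0)              # w = kernel**p / 2 ~ Gamma(1/p, 1)
    kernel = (2.0 * w) ** x
    if rng.random() < alpha:                  # P(Y < mu) = alpha
        return mu - sigma * kernel / (2.0 * (1.0 - alpha))
    return mu + sigma * kernel / (2.0 * alpha)

def simulate_hmm(rng, T, pi, A, theta):
    """Draw hidden states (0-based) and SEP emissions for one replication."""
    states, obs = [], []
    for t in range(T):
        weights = pi if t == 0 else A[states[-1]]
        q = rng.choices(range(len(pi)), weights=weights)[0]
        states.append(q)
        obs.append(sample_sep(rng, *theta[q]))
    return states, obs

rng = random.Random(2026)                     # one seed per replication
pi = [0.5, 0.5]
A = [[0.8, 0.2], [0.2, 0.8]]
theta = [(0.0, 1.0, 0.3, 0.5), (4.0, 1.0, 0.7, 0.5)]   # hypothetical (mu, sigma, alpha, beta)
states, obs = simulate_hmm(rng, 500, pi, A, theta)
stay = sum(a == b for a, b in zip(states, states[1:])) / (len(states) - 1)
print(round(stay, 3))
```

The printed fraction of self-transitions should be close to 0.8, which gives a quick check that the hidden chain follows the stated transition matrix.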

The difference between each scenario lies in the underlying emission distribution. Each of them represents a unique emission condition, as shown in Table 2 and visualized in Figure 3. Figure 3 visualizes the SEP emission targets used for data generation (solid: state 1; dashed: state 2). By Proposition 1, each curve is a valid (normalized) SEP density. The scenario grid is constructed to reflect Proposition 3, where $α = 0.5$ yields symmetry and $α \neq 0.5$ induces skewness (with more extreme $α$ producing stronger skewness). Accordingly, Scenarios 1, 4, and 7 use $α = (α_{1}, α_{2}) = (0.5, 0.5)$ for both states (symmetric), while Scenarios 2, 5, and 8 use a moderate asymmetry setting (mild skewness), and Scenarios 3, 6, and 9 use a more extreme asymmetry setting (strong skewness). In addition, the three rows in Figure 3 vary $β$ to represent different tail and peak shapes (Proposition 4): larger $β$ corresponds to heavier tails and typically a more peaked appearance (more leptokurtic), whereas smaller $β$ yields lighter tails and a flatter peak (more platykurtic appearance); the middle row corresponds to an intermediate, mesokurtic case. Overall, the nine scenarios span distinct combinations of skewness level (controlled by $α$) and tail behavior/kurtosis level (controlled by $β$), yielding a compact grid of emission conditions for evaluating both fit and decoding performance.

In each replication, we fit the SEP-HMM and evaluate its fit and decoding accuracy using the metrics described in Section 2.5. These results are compared with those of the benchmark models, namely Gaussian-HMM and SN-HMM, to assess relative performance. Data generation, model estimation, decoding, and visualization were implemented in the R statistical programming environment (version 4.3.1) [32]. The results can be reproduced using the R code available at https://github.com/dbunggul/sepHMM-bwa (accessed on 24 November 2025).

3. Simulation Results

This section is divided into three parts. The first part examines the closeness of the SEP-HMM parameter estimates to the actual target values. The second part discusses the fit of the SEP-HMM model (in terms of Log-Likelihood, AIC, and BIC) compared to its comparators. The third part explains the hidden state decoding capabilities of the proposed model.

3.1. Evaluation of SEP-HMM Parameter Estimates

We summarize how closely the SEP-HMM parameter estimates align with the target values in Table 3. This table presents the mean of the parameter estimates over the 500 replications per scenario. The estimates are very close to the targets for almost all scenarios and parameters. However, in Scenario 6 the estimates of the emission parameter $β = (β_{1}, β_{2})$ deviate more from their targets than the estimates of the other parameters do.

Further investigation is then provided by presenting Figure 4, which visualizes the target emission PDF plot (black line) with the estimated PDF plot for each replication (blue line). In Scenarios 1–5 and 7–9, it is evident that the obtained parameter estimates can produce PDF plots that accurately capture emission patterns, whether symmetrical, asymmetrical, or with varying degrees of tail thickness. However, Scenario 6 displays a noticeably different behavior from the other cases. This scenario is constructed under a more extreme configuration of the emission parameters, specifically with highly contrasting values of the shape parameters $α_{1} = 0.9$ and $α_{2} = 0.1$, together with a positive $β$ parameter ($β_{1} = β_{2} = 0.5$), which produces extremely heavy tails while still inducing noticeable skewness and a relatively sharp peak in the emission density. These combinations create a shape that is substantially more difficult to approximate with high precision.

Nevertheless, it should be acknowledged that the model still captures several key distributional characteristics remarkably well. It correctly identifies the mode, the direction of skewness, and the relative thickness of the tails, complementing the strong performance already demonstrated in the other scenarios. To more objectively assess how well SEP-HMM fits the data, the next subsection presents a comparative evaluation against two benchmark models: Gaussian-HMM and the Skew Normal-based HMM (SN-HMM), using three standard metrics: Log-Likelihood, AIC, and BIC.

3.2. Model Fit Comparison

The comparison between SEP-HMM, SN-HMM, and Gaussian-HMM begins with Table 4, which reports the mean and standard deviation of the Log-Likelihood, AIC, and BIC values for each model per scenario. SEP-HMM outperforms the other models in most cases. For the Log-Likelihood metric, SEP-HMM attains the highest mean value in all nine scenarios. Regarding AIC, Gaussian-HMM has the lowest mean AIC in Scenario 1, whereas SEP-HMM has the lowest mean AIC in the other eight scenarios. A similar pattern is found for BIC: Gaussian-HMM has the lowest mean BIC in Scenarios 1 and 4, whereas SEP-HMM has the lowest mean BIC in the other seven scenarios.

Upon further examination, SEP-HMM does not emerge as the best model in Scenario 1 (AIC, BIC) and Scenario 4 (BIC), where Gaussian-HMM attains lower values. This can be attributed to the fact that both scenarios are constructed under emission conditions that are already symmetric, so the added complexity of SEP-HMM is penalized without offering a significant advantage. In the scenarios where skewness is induced, the advantage of SEP-HMM becomes more evident, as it is better suited to asymmetric data and captures the skewness and kurtosis of the emissions more effectively.
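The trade-off between fit and complexity described above can be illustrated with a minimal Python sketch. It assumes the standard definitions AIC = 2k − 2 log L and BIC = k log T − 2 log L, and it assumes that the number of free parameters of a K-state HMM is counted as (K − 1) initial probabilities, K(K − 1) transition probabilities, and a fixed number of emission parameters per state (four for SEP, two for Gaussian). The log-likelihood values and the series length are hypothetical and are not taken from Table 4.

```python
import math

def n_free_params(k, emission_params_per_state):
    """Free parameters of a k-state HMM under the assumed counting rule."""
    return (k - 1) + k * (k - 1) + emission_params_per_state * k

def aic(log_lik, n_params):
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_obs):
    return n_params * math.log(n_obs) - 2 * log_lik

# Hypothetical replication: symmetric data, so both models fit almost equally well.
T = 500
models = {
    "SEP-HMM": (-700.0, n_free_params(2, 4)),       # mu, sigma, alpha, beta per state
    "Gaussian-HMM": (-703.0, n_free_params(2, 2)),  # mean and sd per state
}
for name, (log_lik, k) in models.items():
    print(name, k, round(aic(log_lik, k), 2), round(bic(log_lik, k, T), 2))
```

In this hypothetical case the SEP-HMM has the higher log-likelihood, yet the Gaussian-HMM has the lower AIC and BIC, because a gain of three log-likelihood units does not offset four additional parameters.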

3.3. Hidden State Decoding Performance

This subsection examines the model's ability to correctly identify which observations belong to the first and second hidden states. SEP-HMM achieved the best performance in most replications across all scenarios. To summarize, we present two figures that compare the number of replications in which SEP-HMM achieved the best performance with the number in which it was outperformed by its competitors. Figure 5 reports this comparison for accuracy, while Figure 6 reports it for GMean.

To facilitate a deeper analysis, we will cluster the scenarios into three main groups based on the pattern formed by the

β

parameter values: the first group (

β_{1} = β_{2} = 0

; Scenarios 1–3), the second group (

β_{1} = β_{2} = 0.5

; Scenarios 4–6), and the third group (

β_{1} = β_{2} = - 0.5

; Scenarios 7–9), with each group consisting of three scenarios that vary in

α

values. In terms of accuracy, we observe that in each group, there is an increase in the number of replications where SEP-HMM achieves the best performance as

α_{1}

and

α_{2}

move away from the value indicating symmetry (

α_{1} = α_{2} = 0.5

). For example, in the first group (Scenarios 1–3), SEP-HMM’s performance improves as

α

deviates further from the symmetric condition. In Scenario 1 (

α_{1} = α_{2} = 0.5

), SEP-HMM performs best in 288 (57.60%) replications. However, when slight asymmetry is introduced (

α_{1} = 0.7

,

α_{2} = 0.3

) in Scenario 2 and stronger asymmetry (

α_{1} = 0.9

,

α_{2} = 0.1

) in Scenario 3, the number of replications in which SEP-HMM performs best rises to 420 (84.00%) and 496 (99.20%), respectively. A similar trend is observed in the second group, where the number of replications in which SEP-HMM achieves the best performance in Scenarios 4, 5, and 6 continues to increase as

α

moves away from the symmetry point. The same pattern is observed in the third group (Scenarios 7–9), where the number of replications in which SEP-HMM achieves the best performance increases from 493 (98.60%) in Scenario 7 to 499 (99.80%) in Scenarios 8–9. As visualized in Figure 5, the superiority of SEP-HMM becomes more apparent as the skewness of the emission increases. Regarding GMean, we observe similar results in Figure 6, where the number of replications in which SEP-HMM achieves the best performance increases as

α

moves further away from 0.5 (the symmetric condition).
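For readers who wish to reproduce this kind of comparison, the following Python sketch shows one way to score a decoded state sequence against the true one. The metrics themselves are defined in Section 2.5. Here GMean is assumed to be the geometric mean of the per-state recall, which for two states equals the square root of sensitivity times specificity. The label-alignment helper is also an assumption: it reflects the fact that HMM state labels are arbitrary and is not necessarily the procedure used in the simulations. The sequences are hypothetical.

```python
import math

def accuracy(true_states, decoded_states):
    """Share of time points whose decoded state matches the true state."""
    hits = sum(t == d for t, d in zip(true_states, decoded_states))
    return hits / len(true_states)

def gmean(true_states, decoded_states, labels=(1, 2)):
    """Geometric mean of the per-state recall."""
    recalls = []
    for lab in labels:
        idx = [i for i, t in enumerate(true_states) if t == lab]
        if not idx:                      # recall is undefined for an absent state
            return float("nan")
        recalls.append(sum(decoded_states[i] == lab for i in idx) / len(idx))
    return math.prod(recalls) ** (1.0 / len(labels))

def align_two_state_labels(true_states, decoded_states):
    """Keep the labelling (as decoded or swapped) with the higher accuracy."""
    swapped = [3 - d for d in decoded_states]   # maps 1 -> 2 and 2 -> 1
    return max(decoded_states, swapped, key=lambda d: accuracy(true_states, d))

# Hypothetical true and decoded sequences for a two-state model.
true_q = [1, 1, 1, 2, 2, 2, 2, 1]
decoded_q = [1, 1, 2, 2, 2, 2, 1, 1]
aligned_q = align_two_state_labels(true_q, decoded_q)
print(accuracy(true_q, aligned_q), gmean(true_q, aligned_q))
```

Counting, for each replication, which model attains the highest value of each metric gives win counts of the kind summarized in Figures 5 and 6.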

4. Real-Case Implementation

This section demonstrates the application of the proposed model and compares its goodness of fit against the benchmark models. The study employs two datasets from different domains: a stock price dataset, referred to as JCI, and an average temperature dataset from the Yogyakarta Climatology Station, Indonesia, referred to as TEMP. Specifically, the first case models the Jakarta Composite Index (JCI) using daily data from 2023 to 2024. The second case analyzes the change in average air temperature every 10 min, recorded by an Automatic Weather Station (AWS) at the Yogyakarta Climatology Station from 11 to 15 October 2023. The exploratory results in Figure 7, which show the observed series and their histograms, indicate the presence of multiple modes in the data. This is consistent with an HMM, whose marginal distribution, when collapsed over time, is a mixture distribution [33]. The plots in Figure 7 also suggest that the JCI level series is nonstationary and that the TEMP series exhibits visually recurrent behavior over time. Nevertheless, in this section we focus on illustrating model fitting and regime decoding with flexible state-dependent emission patterns captured by the SEP distribution.

The modeling results in Table 5 for SEP-HMM, SN-HMM, and Gaussian-HMM indicate that SEP-HMM outperforms the other two models across all evaluation metrics. For the JCI dataset, SEP-HMM achieves the highest log-likelihood value of −3047.90, exceeding those of SN-HMM and Gaussian-HMM, which are −3083.43 and −3083.43, respectively. SEP-HMM also shows superior performance in terms of AIC and BIC, achieving the lowest values among the three models. A similar pattern is observed for the TEMP dataset, where SEP-HMM again yields the highest log-likelihood (402.11), the lowest AIC (−782.23), and the lowest BIC (−731.89). These results collectively demonstrate the clear superiority of SEP-HMM over the benchmark models. It is important to note that this section does not evaluate decoding accuracy because such an evaluation requires the true hidden state at each time point, which is not available for these datasets. Nevertheless, the decoding results are presented by summarizing the characteristics of each hidden state in Table 6, including the mean, standard deviation (SD), maximum and minimum values, as well as the skewness and excess kurtosis of the associated observations. We also visualized these results in Figure 8, accompanied by the emission fitting for each hidden state based on the corresponding parameter estimates.

We can observe that in the JCI dataset, the proportion of hidden states 1 and 2 is fairly balanced, with hidden state 1 accounting for 48.54% and hidden state 2 for 51.46%. A noticeable difference is found in the TEMP dataset, where hidden state 1 dominates with 69.68%, higher than hidden state 2, which accounts for 30.32%. In the JCI dataset, hidden state 2 has a higher mean and a larger SD than hidden state 1, showing that it occurs at higher levels and with greater variability. In the TEMP dataset, hidden state 1 has a mean of −0.10 and an SD of 0.14, while hidden state 2 has a mean of 0.24 and an SD of 0.19.

For the JCI dataset, the characteristics of the two hidden states are captured well by the model. In terms of skewness, hidden state 1 has a negative skewness of −0.40, suggesting a distribution with a longer left tail. Hidden state 2, in contrast, has a positive skewness of 0.87, indicating a pattern with a longer right tail. The TEMP dataset also shows skewness, but with both hidden states exhibiting negative skewness. Hidden state 1 has a much stronger left skew, with a skewness of −1.58, compared with −0.52 for hidden state 2. The more pronounced negative skewness in hidden state 1 indicates a greater concentration of extremely low values than in hidden state 2. This is particularly important as it demonstrates the SEP-HMM's capability to handle heavy left-tailed distributions effectively. By accommodating both mild and extreme skewness, the SEP-HMM proves to be a flexible model that can capture asymmetric behaviors commonly found in real-world data.

Turning to kurtosis, the JCI dataset presents two distributions with platykurtic characteristics. Both hidden states exhibit negative excess kurtosis, with hidden state 1 showing −0.73 and hidden state 2 slightly below zero at −0.02. In contrast, the TEMP dataset displays leptokurtic behavior, with both hidden states showing positive excess kurtosis: 2.72 for hidden state 1 and 1.63 for hidden state 2. The ability of the SEP-HMM to handle both negative and positive skewness, as well as platykurtic and leptokurtic shapes, underscores its versatility in capturing a broad spectrum of distribution types. This flexibility makes it well-suited for modeling non-normal distributions that exhibit asymmetric behaviors and heavy or light tails, such as those often encountered in financial markets, temperature fluctuations, or other complex datasets.
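Per-state summaries of the kind reported in Table 6 can be computed directly from the observations and the decoded state sequence. The Python sketch below is illustrative only. It assumes the moment-based estimators of skewness and excess kurtosis, which may differ slightly from the estimators used for Table 6, and the toy data are hypothetical.

```python
import math

def moment_summary(values):
    """Descriptive statistics for the observations assigned to one hidden state."""
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    m2 = sum(c ** 2 for c in centred) / n   # second central moment
    m3 = sum(c ** 3 for c in centred) / n   # third central moment
    m4 = sum(c ** 4 for c in centred) / n   # fourth central moment
    return {
        "n": n,
        "mean": mean,
        "sd": math.sqrt(m2 * n / (n - 1)),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }

def summarize_by_state(observations, decoded_states):
    """Group observations by decoded state and summarize each group."""
    groups = {}
    for y, s in zip(observations, decoded_states):
        groups.setdefault(s, []).append(y)
    return {s: moment_summary(v) for s, v in sorted(groups.items())}

# Hypothetical observations and decoded states.
obs = [1.0, 2.0, 3.0, 10.0, 12.0, 14.0]
states = [1, 1, 1, 2, 2, 2]
for state, stats in summarize_by_state(obs, states).items():
    print(state, stats)
```

A negative excess kurtosis indicates a platykurtic group and a positive value a leptokurtic one, which is the convention used in the discussion above.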

5. Discussion

This study demonstrates that relaxing the symmetry and mesokurtic-shape assumptions by incorporating the SEP distribution is substantively advantageous. The proposed SEP-HMM consistently outperforms both the Gaussian-HMM and SN-HMM under asymmetric, leptokurtic, and platykurtic emission settings across nearly all evaluation metrics, including those that penalize model complexity. These findings indicate that, despite its larger number of parameters, the SEP-HMM remains more favorable across the scenarios examined. These advantages are also reflected in the real-data illustrations, which are drawn from practical series exhibiting non-stationarity and recurrent patterns. Such features may be less ideal for a plain time-homogeneous HMM when the primary goal is to fully model trends or seasonality. In this paper, however, the real-data examples are used mainly to illustrate regime segmentation and decoding with flexible state-dependent emission shapes.

Nevertheless, this study still has several limitations. First, although the SEP-HMM framework allows for a general number of hidden states

K

, the simulation design focuses on

K = 2

. Second, the simulation scenarios primarily varied emission-pattern characteristics, specifically skewness and kurtosis, while holding other structural properties constant. An important yet unexplored dimension is the variation in transition-probability structures across hidden states. Differences in transition probabilities can induce substantial class-proportion imbalances, potentially influencing both fitting and decoding performance. These limitations open several avenues for future research. Developing an estimation procedure capable of jointly determining the optimal number of hidden states and evaluating settings with more than two states could be a valuable extension. Additionally, investigating the model’s fitting and decoding behavior under moderate-to-severe class imbalance due to a highly persistent transition matrix is an important direction for further exploration.

6. Conclusions

In this paper, we modify the HMM framework by utilizing flexible SEP distributions that can accommodate emission patterns with varying degrees of skewness and kurtosis. This model has been verified to outperform its competitors in terms of fitting and decoding capabilities, particularly in cases that exhibit deviations from symmetric and mesokurtic properties. Consequently, it offers greater flexibility in real-world implementation, which often involves asymmetric and non-mesokurtic emission characteristics.

Author Contributions

Conceptualization, D.B.U. and N.I.; methodology, D.B.U., N.I. and I.I.; software, D.B.U.; validation, D.B.U., N.I. and I.I.; formal analysis, D.B.U., N.I. and I.I.; investigation, D.B.U. and N.I.; resources, D.B.U., N.I. and I.I.; data curation, D.B.U. and A.A.P.; writing—original draft preparation, D.B.U.; writing—review and editing, N.I. and I.I.; visualization, D.B.U.; supervision, N.I. and I.I.; project administration, D.B.U. and N.I.; funding acquisition, D.B.U. and N.I. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Lembaga Pengelola Dana Pendidikan (LPDP) under Scholarship No. LOG-19050/LPDP.3/2024.

Data Availability Statement

The daily closing price data for the JCI can be obtained from https://id.investing.com/indices/idx-composite-historical-data (accessed on 10 November 2025), whereas the average temperature dataset from the Yogyakarta Climatology Station is available upon request.

Acknowledgments

The authors would like to thank the Department of Statistics, Institut Teknologi Sepuluh Nopember, BMKG Yogyakarta Climatology Station, and Lembaga Pengelola Dana Pendidikan for their support throughout this research.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

This appendix presents the proofs of the four propositions stated earlier. Let

Y \sim \mathrm{SEP}(μ, σ, α, β)

. We use the function in Equation (1), with

z

,

τ

, and

p

defined in Equations (2)–(4), where

- \infty < μ < \infty, σ > 0, 0 < α < 1,

and

- 1 < β \leq 1

. For convenience, we define

g (z)

as in Equation (A1).

g(z) = |z| + (2 α - 1) z = \begin{cases} 2 α z, & z \geq 0 \\ 2 (1 - α)(- z), & z < 0 \end{cases}

(A1)

Appendix A.1. Proof of Proposition 1

To establish the validity of

f_{\mathrm{SEP}}(y \mid μ, σ, α, β)

, we demonstrate that it satisfies the two requisite properties of a probability density: (i) it is non-negative for all

y \in R

and (ii) its integral over the real line equals one.

(i) Non-negativity

Observe that for

0 < α < 1

, the factor

4 α (1 - α)

is strictly positive. The Gamma function is positive for positive arguments, ensuring the normalizing constant

τ > 0

. Furthermore, the exponential function returns strictly positive values for any real input and

σ > 0

by definition. Consequently,

f_{\mathrm{SEP}}(y \mid μ, σ, α, β)

is non-negative for all

y \in R

.

(ii) Integrates to One

Performing the substitution

z = (y - μ) / σ

, with

d y = σ d z

, yields

\int_{-\infty}^{\infty} f_{\mathrm{SEP}}(y \mid μ, σ, α, β) \, dy = \int_{-\infty}^{\infty} \frac{τ}{σ} \exp\left\{- \frac{1}{2} [g(z)]^{p}\right\} σ \, dz = τ \int_{-\infty}^{\infty} \exp\left\{- \frac{1}{2} [g(z)]^{p}\right\} dz

(A2)

We split the domain at

z = 0

. Thus,

\int_{- \infty}^{\infty} \exp \{- \frac{1}{2} {[g (z)]}^{p}\} d z = \int_{- \infty}^{0} \exp \{- \frac{1}{2} {[2 (- z) (1 - α)]}^{p}\} d z + \int_{0}^{\infty} \exp \{- \frac{1}{2} {[2 α z]}^{p}\} d z

(A3)

For the first integral, set

u = - z

so

d u = - d z

. Consequently, both integrals in Equation (A3) share the same form, and we may write Equation (A4).

\int_{- \infty}^{\infty} \exp \{- \frac{1}{2} {[g (z)]}^{p}\} d z = \int_{0}^{\infty} [\exp \{- \frac{1}{2} {[2 (1 - α) z]}^{p}\} + \exp \{- \frac{1}{2} {[2 α z]}^{p}\}] d z

(A4)

For any constant

c > 0

and

p > 0

, we have Equation (A5).

\int_{0}^{\infty} \exp \{- \frac{1}{2} {[c z]}^{p}\} d z = \frac{2^{\frac{1}{p}}}{c p} Γ (\frac{1}{p})

(A5)
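For completeness, Equation (A5) can be verified with the substitution

u = \frac{1}{2} (c z)^{p}

, which is the same device used for Equation (A11). It gives

z = \frac{(2 u)^{1 / p}}{c}, \quad d z = \frac{2^{1 / p}}{c p} u^{\frac{1}{p} - 1} d u

so that the integral reduces to the definition of the Gamma function:

\int_{0}^{\infty} \exp\left\{- \frac{1}{2} (c z)^{p}\right\} d z = \frac{2^{1 / p}}{c p} \int_{0}^{\infty} u^{\frac{1}{p} - 1} e^{- u} d u = \frac{2^{1 / p}}{c p} Γ\left(\frac{1}{p}\right)

As a sanity check, setting p = 2 gives \sqrt{π / 2} / c, the familiar half-Gaussian integral.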

Applying Equation (A5) to each term in (A4) gives Equation (A6).

\int_{- \infty}^{\infty} \exp \{- \frac{1}{2} {[g (z)]}^{p}\} d z = \frac{2^{\frac{1}{p}}}{p} Γ (\frac{1}{p}) \frac{1}{2 α (1 - α)} = \frac{2^{(\frac{1 + β}{2})}}{(\frac{2}{1 + β})} Γ (\frac{1 + β}{2}) \frac{1}{2 α (1 - α)}

(A6)

We use

Γ (1 + x) = x Γ (x)

to obtain Equation (A7).

\begin{aligned} \int_{-\infty}^{\infty} f_{\mathrm{SEP}}(y \mid μ, σ, α, β) \, dy &= τ \, \frac{2^{\frac{1 + β}{2}}}{\frac{2}{1 + β}} \, Γ\left(\frac{1 + β}{2}\right) \frac{1}{2 α (1 - α)} \\ &= \left[\frac{1 + β}{2} \, Γ\left(\frac{1 + β}{2}\right)\right]^{-1} \frac{4 α (1 - α)}{2 \cdot 2^{\frac{1 + β}{2}}} \cdot \frac{2^{\frac{1 + β}{2}}}{\frac{2}{1 + β}} \, Γ\left(\frac{1 + β}{2}\right) \frac{1}{2 α (1 - α)} \\ &= 1 \end{aligned}

(A7)

Thus, the SEP density in Equation (1) integrates to one over the real line.

Hence,

f_{\mathrm{SEP}}(y \mid μ, σ, α, β)

is a valid probability density function, as it satisfies the two defining properties: it is non-negative for all

y \in R

and integrates to one over the real line.
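Proposition 1 can also be checked numerically. The Python sketch below implements the SEP density with the normalizing constant τ written as in Equation (A38) and integrates it with a simple trapezoidal rule. The function names, the integration range, and the grid size are illustrative choices rather than part of the original implementation, and the parameter pairs are picked to include the extreme configuration of Scenario 6.

```python
import math

def sep_pdf(y, mu=0.0, sigma=1.0, alpha=0.5, beta=0.0):
    """SEP density: (tau / sigma) * exp(-0.5 * g(z)**p) with z = (y - mu) / sigma."""
    inv_p = (1.0 + beta) / 2.0                      # 1 / p, since p = 2 / (1 + beta)
    tau = 4.0 * alpha * (1.0 - alpha) / (math.gamma(1.0 + inv_p) * 2.0 ** (1.0 + inv_p))
    z = (y - mu) / sigma
    g = abs(z) + (2.0 * alpha - 1.0) * z            # Equation (A1), always >= 0
    return tau / sigma * math.exp(-0.5 * g ** (1.0 / inv_p))

def integrate(f, lo, hi, n=80_000):
    """Composite trapezoidal rule on [lo, hi] with n sub-intervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

# The range must be wide because the heavier tail decays slowly when alpha is near 0 or 1.
for alpha, beta in [(0.5, 0.0), (0.9, 0.5), (0.1, 0.5)]:
    area = integrate(lambda y: sep_pdf(y, alpha=alpha, beta=beta), -400.0, 400.0)
    print(alpha, beta, round(area, 4))
```

Each printed area should be close to one. With α = 0.5 and β = 0 the function reduces to the standard normal density, which gives a second quick check of the constant τ.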

Appendix A.2. Proof of Proposition 2

Since

τ / σ

does not depend on

y

, maximizing

f_{\mathrm{SEP}}(y \mid μ, σ, α, β)

over

y \in R

is equivalent to maximizing

\exp \{- \frac{1}{2} {[g (z)]}^{p}\}

, or equivalently minimizing

{[g (z)]}^{p}

. Minimizing

{[g (z)]}^{p}

is the same as minimizing

g (z)

because

p > 0

. Using the definition in Equation (A1), we find that

g (z) \geq 0

for all

z

and

g (z) = 0

occurs only at

z = 0

. Therefore,

{[g (z)]}^{p}

is uniquely minimized at

z = 0

and

\exp \{- \frac{1}{2} {[g (z)]}^{p}\}

is uniquely maximized at

z = 0

. Since

z = 0

is equivalent to

y = μ

in Equation (2), the density

f_{\mathrm{SEP}}(y \mid μ, σ, α, β)

is maximized at

y = μ

.

Appendix A.3. Proof of Proposition 3

Without loss of generality, consider the standardized variable

Z = (Y - μ) / σ

, which follows an SEP(0,1,

α

,

β

) distribution. Its density is given by Equation (A8), where

τ

,

p

and

g (z)

follow Equations (3), (4) and (A1).

f_{Z} (z) = τ \exp \{- \frac{1}{2} {[g (z)]}^{p}\}, - \infty < z < \infty

(A8)

Take any

c > 0

and compare the density at points

c

and

- c

:

\frac{f_{Z} (c)}{f_{Z} (- c)} = \frac{\exp \{- \frac{1}{2} {[g (c)]}^{p}\}}{\exp \{- \frac{1}{2} {[g (- c)]}^{p}\}} = \exp \{- \frac{1}{2} [{(g (c))}^{p} - {(g (- c))}^{p}]\}

Using the definition of

g (\cdot)

in Equation (A1), we obtain

g (c) = 2 α c

and

g(- c) = 2 c (1 - α)

. Hence, we get Equation (A9).

\frac{f_{Z} (c)}{f_{Z} (- c)} = \exp \{- \frac{1}{2} {(2 c)}^{p} [α^{p} - {(1 - α)}^{p}]\}

(A9)

Since

p > 0

and

c > 0

, the factor

{(2 c)}^{p} > 0

. The sign of the component inside the exponential term is therefore determined by the factor

- [α^{p} - {(1 - α)}^{p}]

. Consequently, we obtain the following chain of equivalences for all

c > 0

:

$α = 0.5 ⟺ α = 1 - α ⟺ α^{p} = {(1 - α)}^{p} ⟺ \frac{f_{Z} (c)}{f_{Z} (- c)} = 1 ⟺ f_{Z} (c) = f_{Z} (- c)$
$α < 0.5 ⟺ α < 1 - α ⟺ α^{p} < {(1 - α)}^{p} ⟺ \frac{f_{Z} (c)}{f_{Z} (- c)} > 1 ⟺ f_{Z} (c) > f_{Z} (- c)$
$α > 0.5 ⟺ α > 1 - α ⟺ α^{p} > {(1 - α)}^{p} ⟺ \frac{f_{Z} (c)}{f_{Z} (- c)} < 1 ⟺ f_{Z} (c) < f_{Z} (- c)$

Symmetry about zero corresponds to

f_{Z} (c) = f_{Z} (- c)

for all

c > 0

, positive skewness corresponds to

f_{Z} (c) > f_{Z} (- c)

for all

c > 0

(indicating a higher density on the right side), and negative skewness corresponds to

f_{Z} (c) < f_{Z} (- c)

for all

c > 0

(indicating a higher density on the left side). Thus, the direction of skewness of

Z

is fully characterized by

α

. The same conclusions regarding the direction of skewness hold for the original variable

Y = μ + σ Z

, as skewness is invariant under location and scale transformations. Therefore, the parameter

α

governs the direction of skewness of the SEP distribution in the same manner for

Y

.
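The chain of equivalences above can be illustrated by evaluating the ratio in Equation (A9) directly. The short Python sketch below does so for three values of α, with β = 0 and c = 1 chosen arbitrarily for illustration.

```python
import math

def density_ratio(c, alpha, beta):
    """f_Z(c) / f_Z(-c) for c > 0, following Equation (A9)."""
    p = 2.0 / (1.0 + beta)
    return math.exp(-0.5 * (2.0 * c) ** p * (alpha ** p - (1.0 - alpha) ** p))

for alpha in (0.3, 0.5, 0.7):
    ratio = density_ratio(1.0, alpha, beta=0.0)
    if ratio > 1.0:
        side = "more density on the right (positive skewness)"
    elif ratio < 1.0:
        side = "more density on the left (negative skewness)"
    else:
        side = "symmetric"
    print(alpha, round(ratio, 4), side)
```

The ratio exceeds one for α < 0.5, equals one for α = 0.5, and falls below one for α > 0.5, in line with Proposition 3.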

Appendix A.4. Proof of Proposition 4

Without loss of generality, consider the standardized variable

Z = (Y - μ) / σ

, whose density is given in Equation (A8). For any

c \in (0,1)

and any

r > - 1

, define

I_{r} (c)

as in Equation (A10).

I_{r} (c) ≔ \int_{0}^{\infty} z^{r} \exp \{- \frac{1}{2} {(2 c z)}^{p}\} d z

(A10)

Using the substitution

u = \frac{1}{2} {(2 c z)}^{p}

, we obtain Equation (A11).

I_{r} (c) = \frac{2^{(r + 1) / p}}{p {(2 c)}^{r + 1}} Γ (\frac{r + 1}{p})

(A11)

Next, for even

r \in \{2,4\}

, we compute

E [Z^{r}]

. Splitting the integral at 0 and using Equation (A1), we obtain Equation (A12).

E [Z^{r}] = \int_{- \infty}^{\infty} z^{r} f_{Z} (z) d z = τ \int_{0}^{\infty} z^{r} e^{- \frac{1}{2} {(2 α z)}^{p}} d z + τ \int_{0}^{\infty} z^{r} e^{- \frac{1}{2} {(2 (1 - α) z)}^{p}} d z

(A12)

By the definition of

I_{r} (c)

in Equation (A10), this can be rewritten as Equation (A13).

E [Z^{r}] = τ [I_{r} (α) + I_{r} (1 - α)], r \in \{2,4\}

(A13)

Substituting Equation (A11) into (A13) yields, for

r \in \{2,4\}

,

E [Z^{r}] = \frac{τ 2^{(r + 1) / p}}{p} Γ (\frac{r + 1}{p}) \{\frac{1}{{(2 α)}^{r + 1}} + \frac{1}{{[2 (1 - α)]}^{r + 1}}\}

(A14)

In particular,

E [Z^{2}] = \frac{τ 2^{3 / p}}{p} Γ (\frac{3}{p}) \{\frac{1}{{(2 α)}^{3}} + \frac{1}{{[2 (1 - α)]}^{3}}\}

(A15)

E [Z^{4}] = \frac{τ 2^{5 / p}}{p} Γ (\frac{5}{p}) \{\frac{1}{{(2 α)}^{5}} + \frac{1}{{[2 (1 - α)]}^{5}}\}

(A16)

Since

Y - μ = σ Z

, we have

\mathrm{Kurt}_{μ}(Y)

as in Equation (5).

\mathrm{Kurt}_{μ}(Y) = \frac{E[(Y - μ)^{4}]}{\{E[(Y - μ)^{2}]\}^{2}} = \frac{E[Z^{4}]}{\{E[Z^{2}]\}^{2}}

Substituting Equations (A15) and (A16) into Equation (5), and using

τ

from Equation (3) together with

p

in Equation (4), we obtain Equation (A17).

\mathrm{Kurt}_{μ}(Y) = \frac{Γ(5 / p) \, Γ(1 / p)}{[Γ(3 / p)]^{2}} \cdot \frac{(1 - α) α^{-4} + α (1 - α)^{-4}}{[(1 - α) α^{-2} + α (1 - α)^{-2}]^{2}}

(A17)

Finally, since

p = 2 / (1 + β)

, the expression in Equation (A17) depends on

β

through

p

. For fixed

α

, variations in

β

induce variations in

\mathrm{Kurt}_{μ}(Y)

. Moreover, since

p

decreases as

β

increases, larger

β

corresponds to smaller

p

, which is typically reflected in larger values of

\mathrm{Kurt}_{μ}(Y)

. In turn, larger kurtosis indicates heavier tails (often described as a more leptokurtic shape), whereas smaller values of

\mathrm{Kurt}_{μ}(Y)

correspond to lighter tails (more platykurtic shapes).
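Equation (A17) is straightforward to evaluate numerically, which makes the effect of β easy to inspect. The Python sketch below is a direct transcription of the formula. Note that the quantity is the kurtosis about μ (the mode), so it coincides with the usual mean-centered kurtosis only in the symmetric case α = 0.5.

```python
import math

def sep_kurtosis_about_mu(alpha, beta):
    """Kurtosis about mu of SEP(mu, sigma, alpha, beta), following Equation (A17)."""
    p = 2.0 / (1.0 + beta)
    gamma_part = math.gamma(5.0 / p) * math.gamma(1.0 / p) / math.gamma(3.0 / p) ** 2
    numerator = (1.0 - alpha) * alpha ** -4 + alpha * (1.0 - alpha) ** -4
    denominator = ((1.0 - alpha) * alpha ** -2 + alpha * (1.0 - alpha) ** -2) ** 2
    return gamma_part * numerator / denominator

# Symmetric case: beta = 0 is the Gaussian shape and beta = 1 the Laplace shape.
for beta in (-0.5, 0.0, 0.5, 1.0):
    print(beta, round(sep_kurtosis_about_mu(0.5, beta), 4))
```

For α = 0.5 the formula returns 3 at β = 0 and 6 at β = 1, the known kurtosis values of the Gaussian and Laplace distributions, and the printed values increase with β as stated above.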

Appendix B

This appendix details the derivation of the Baum-Welch algorithm within the Expectation-Maximization (EM) framework described in Section 2.3. The goal is to compute the maximum likelihood estimate of

λ

by maximizing the observed-data likelihood

L (λ) = \Pr (O_{1 : T}| λ)

, viewed as a function of

λ

with

O_{1 : T}

fixed. Since

q_{1 : T}

is unobserved,

\Pr(O_{1 : T} \mid λ) = \sum_{q_{1 : T}} \Pr(q_{1 : T}, O_{1 : T} \mid λ)

. Given the model parameters

λ = (π, A, Θ)

, the joint probability of the hidden state sequence

q_{1 : T}

and the observations

O_{1 : T}

can be decomposed into an initial-state term, a product of transition probabilities, and a product of emission densities, as shown in Equation (A18).

\Pr (q_{1 : T}, O_{1 : T} | π, A, Θ) = \Pr (q_{1} | π) \prod_{t = 2}^{T} \Pr (q_{t} | q_{t - 1}, A) \prod_{t = 1}^{T} \Pr (O_{t} | q_{t}, Θ)

(A18)

Taking the logarithm yields an additive representation, which can be written using indicator functions to make the dependence on each parameter block explicit. This reformulation, as in Equation (A19), is crucial because it reveals that the complete-data log-likelihood, viewed as a function of

λ = (π, A, Θ)

with

(q_{1 : T}, O_{1 : T})

fixed, is separable with respect to

π

,

A

, and

Θ

.

\begin{aligned} \log \Pr(q_{1 : T}, O_{1 : T} \mid π, A, Θ) &= \sum_{i = 1}^{K} I(q_{1} = S_{i}) \log π_{i} \\ &\quad + \sum_{t = 1}^{T - 1} \sum_{i = 1}^{K} \sum_{j = 1}^{K} I(q_{t} = S_{i}, q_{t + 1} = S_{j}) \log a_{i j} \\ &\quad + \sum_{t = 1}^{T} \sum_{i = 1}^{K} I(q_{t} = S_{i}) \log b_{i}(O_{t} \mid θ_{i}) \end{aligned}

(A19)

The E-step forms the expected complete-data log-likelihood using the current parameter value

λ^{(m)}

. With this definition, Equation (A20) gives the corresponding

Q (λ, λ^{(m)})

function. Replacing the complete-data log-likelihood with its indicator-expanded form inside this expectation leads to the decomposition in Equation (A21).

Q (λ, λ^{(m)}) = E_{q_{1 : T} | O_{1 : T}, λ^{(m)}} [\log \Pr (q_{1 : T}, O_{1 : T} | π, A, Θ)]

(A20)

\begin{aligned} Q(λ, λ^{(m)}) &= \sum_{i = 1}^{K} E_{q_{1 : T} \mid O_{1 : T}, λ^{(m)}} [I(q_{1} = S_{i})] \log π_{i} \\ &\quad + \sum_{t = 1}^{T - 1} \sum_{i = 1}^{K} \sum_{j = 1}^{K} E_{q_{1 : T} \mid O_{1 : T}, λ^{(m)}} [I(q_{t} = S_{i}, q_{t + 1} = S_{j})] \log a_{i j} \\ &\quad + \sum_{t = 1}^{T} \sum_{i = 1}^{K} E_{q_{1 : T} \mid O_{1 : T}, λ^{(m)}} [I(q_{t} = S_{i})] \log b_{i}(O_{t} \mid θ_{i}) \end{aligned}

(A21)

For any indicator

I (\cdot)

,

E_{q_{1 : T} \mid O_{1 : T}, λ^{(m)}} [I(\mathrm{event})] = \Pr(\mathrm{event} \mid O_{1 : T}, λ^{(m)})

, which leads to Equation (A22) (

t = 1,2, \dots, T

) and Equation (A23) (

t = 1,2, \dots, T - 1

) for

i, j = 1,2, \dots, K

. Substituting these expressions into Equation (A21) yields the final decomposition in Equation (A24). This equation shows that the parameters

π

,

A

, and

Θ

can be updated separately in the M-step: each parameter block is updated by maximizing only the term of

Q (λ, λ^{(m)})

that depends on it. These terms are denoted by

Q_{π}

,

Q_{A}

, and

Q_{Θ}

.

E_{q_{1 : T} | O_{1 : T}, λ^{(m)}} [I (q_{t} = S_{i})] = \Pr (q_{t} = S_{i} | O_{1 : T}, λ^{(m)}) ≔ ϕ_{t} (i)

(A22)

E_{q_{1 : T} | O_{1 : T}, λ^{(m)}} [I (q_{t} = S_{i}, q_{t + 1} = S_{j})] = \Pr (q_{t} = S_{i}, q_{t + 1} = S_{j} | O_{1 : T}, λ^{(m)}) ≔ ξ_{t} (i, j)

(A23)

Q(λ, λ^{(m)}) = \underbrace{\sum_{i = 1}^{K} ϕ_{1}(i) \log π_{i}}_{Q_{π}} + \underbrace{\sum_{t = 1}^{T - 1} \sum_{i = 1}^{K} \sum_{j = 1}^{K} ξ_{t}(i, j) \log a_{i j}}_{Q_{A}} + \underbrace{\sum_{t = 1}^{T} \sum_{i = 1}^{K} ϕ_{t}(i) \log b_{i}(O_{t} \mid θ_{i})}_{Q_{Θ}}

(A24)
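The smoothed probabilities ϕ_t(i) and ξ_t(i, j) in Equations (A22) and (A23) are typically obtained with a forward-backward recursion. The Python sketch below shows a scaled version of that recursion for a generic emission model. It is a minimal illustration and not the implementation in the authors' R code: the function name, the variable names, and the convention of passing a precomputed table of emission densities are all assumptions, and the toy numbers are hypothetical.

```python
import math

def forward_backward(pi, A, B):
    """Scaled forward-backward pass for a K-state HMM.

    pi : list of K initial probabilities
    A  : K x K row-stochastic transition matrix
    B  : T x K table with B[t][i] = b_i(O_t), the emission density of the
         observation at time t under state i (computed beforehand)
    Returns (phi, xi, log_lik); phi[t][i] and xi[t][i][j] correspond to
    Equations (A22) and (A23).
    """
    T, K = len(B), len(pi)
    fwd = [[0.0] * K for _ in range(T)]
    scale = [0.0] * T

    # Forward pass; each step is normalized to avoid numerical underflow.
    for t in range(T):
        for j in range(K):
            if t == 0:
                prior = pi[j]
            else:
                prior = sum(fwd[t - 1][i] * A[i][j] for i in range(K))
            fwd[t][j] = prior * B[t][j]
        scale[t] = sum(fwd[t])
        fwd[t] = [v / scale[t] for v in fwd[t]]

    # Backward pass, reusing the forward scaling constants.
    bwd = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(K):
            bwd[t][i] = sum(A[i][j] * B[t + 1][j] * bwd[t + 1][j]
                            for j in range(K)) / scale[t + 1]

    phi = [[fwd[t][i] * bwd[t][i] for i in range(K)] for t in range(T)]
    xi = [[[fwd[t][i] * A[i][j] * B[t + 1][j] * bwd[t + 1][j] / scale[t + 1]
            for j in range(K)] for i in range(K)] for t in range(T - 1)]
    log_lik = sum(math.log(c) for c in scale)   # observed-data log-likelihood
    return phi, xi, log_lik

# Toy example with K = 2 states and T = 3 observations (hypothetical numbers).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.2, 0.8]]
B = [[0.5, 0.1], [0.4, 0.3], [0.1, 0.6]]
phi, xi, log_lik = forward_backward(pi, A, B)
print(phi[0], log_lik)
```

In the SEP-HMM setting, the entries of `B` would be SEP densities evaluated at the current emission parameters. The sum of the logged scaling constants gives the observed-data log-likelihood as a by-product of the same pass.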

Appendix B.1. Update for the Initial Distribution $π$

We first maximize the component of

Q (λ, λ^{(m)})

that depends on

π

. The parameter vector

π = (π_{1}, π_{2}, \dots, π_{K})

must satisfy

π_{i} \geq 0

for

i = 1,2, \dots, K

and

\sum_{i = 1}^{K} π_{i} = 1

. We enforce the normalization constraint using a Lagrange multiplier

η

, yielding the Lagrangian in Equation (A25). Differentiating

L (π, η)

with respect to

π_{i}

gives Equation (A26), which implies

π_{i} = \frac{ϕ_{1} (i)}{η}

as in Equation (A27). Applying the constraint

\sum_{i = 1}^{K} π_{i} = 1

yields

η = \sum_{i = 1}^{K} ϕ_{1} (i)

. Since

\sum_{i = 1}^{K} ϕ_{1} (i) = 1

, it follows that

η = 1

. Substituting this value back into Equation (A27) yields the update formula presented in Equation (A28). Moreover, because

ϕ_{1} (i) \geq 0

, the solution automatically satisfies

π_{i} \geq 0

.

L (π, η) = \sum_{i = 1}^{K} ϕ_{1} (i) \log π_{i} + η (1 - \sum_{i = 1}^{K} π_{i})

(A25)

\frac{\partial L (π, η)}{\partial π_{i}} = \frac{ϕ_{1} (i)}{π_{i}} - η = 0

(A26)

π_{i} = \frac{ϕ_{1} (i)}{η}, i = 1,2, \dots, K

(A27)

π_{i}^{\mathrm{new}} = ϕ_{1}(i)

(A28)

Appendix B.2. Update for the Transition Probability Matrix $A$

We now maximize the component of

Q (λ, λ^{(m)})

that involves the transition probabilities, namely

Q_{A}

. Because the transition probabilities form a row-stochastic matrix, this maximization must be carried out subject to the constraints

a_{i j} \geq 0

for all

i, j = 1,2, \dots, K

, and

\sum_{j = 1}^{K} a_{i j} = 1

,

i = 1,2, \dots, K

. These constraints couple the elements within each row but not across rows. Consequently, the maximization decomposes into

K

independent optimization problems. We therefore define the row-wise objective in Equation (A29) and enforce the normalization constraint using a Lagrange multiplier

η_{i}

, leading to the Lagrangian in Equation (A30).

Q_{A_{i}} = \sum_{j = 1}^{K} (\sum_{t = 1}^{T - 1} ξ_{t} (i, j)) \log a_{i j}

(A29)

L (a_{i 1}, a_{i 2}, \dots, a_{i K}, η_{i}) = \sum_{j = 1}^{K} (\sum_{t = 1}^{T - 1} ξ_{t} (i, j)) \log a_{i j} + η_{i} (1 - \sum_{j = 1}^{K} a_{i j})

(A30)

Differentiating the Lagrangian in Equation (A30) with respect to each

a_{i j}

produces the first-order condition given in Equation (A31), from which the expression for

a_{i j}

in Equation (A32) follows directly. The associated Lagrange multiplier

η_{i}

is then obtained by imposing the row-normalization requirement

\sum_{j = 1}^{K} a_{i j} = 1

resulting in Equation (A33). Using the marginalization identity in Equation (A34) for

t = 1,2, \dots, T - 1

, the value of

η_{i}

provides the row-wise update reported in Equation (A35).

\frac{\partial L (a_{i 1}, a_{i 2}, \dots, a_{i K}, η_{i})}{\partial a_{i j}} = \frac{\sum_{t = 1}^{T - 1} ξ_{t} (i, j)}{a_{i j}} - η_{i} = 0

(A31)

a_{i j} = \frac{\sum_{t = 1}^{T - 1} ξ_{t} (i, j)}{η_{i}}

(A32)

η_{i} = \sum_{t = 1}^{T - 1} \sum_{j = 1}^{K} ξ_{t} (i, j)

(A33)

\sum_{j = 1}^{K} ξ_{t} (i, j) = \Pr (q_{t} = S_{i} | O_{1 : T}, λ^{(m)}) = ϕ_{t} (i)

(A34)

a_{i j}^{\mathrm{new}} = \frac{\sum_{t = 1}^{T - 1} ξ_{t}(i, j)}{\sum_{t = 1}^{T - 1} ϕ_{t}(i)}

(A35)
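The closed-form updates in Equations (A28) and (A35) translate directly into code. The Python sketch below assumes that ϕ and ξ are stored as nested lists indexed as `phi[t][i]` and `xi[t][i][j]`. The function name and the toy values are hypothetical, with the toy values chosen so that they satisfy the marginalization identity in Equation (A34).

```python
def update_initial_and_transition(phi, xi):
    """M-step updates for pi (Equation (A28)) and A (Equation (A35))."""
    K = len(phi[0])
    pi_new = list(phi[0])                                # smoothed state probabilities at t = 1
    A_new = []
    for i in range(K):
        # Expected number of visits to state i over t = 1, ..., T - 1.
        visits = sum(phi[t][i] for t in range(len(xi)))
        # Expected number of transitions i -> j, divided by the visits to i.
        A_new.append([sum(step[i][j] for step in xi) / visits for j in range(K)])
    return pi_new, A_new

# Hypothetical smoothed probabilities for K = 2 states and T = 3 time points.
xi = [
    [[0.5, 0.1], [0.1, 0.3]],
    [[0.4, 0.2], [0.1, 0.3]],
]
phi = [[0.6, 0.4], [0.6, 0.4], [0.5, 0.5]]
pi_new, A_new = update_initial_and_transition(phi, xi)
print(pi_new, A_new)
```

Each row of the updated transition matrix sums to one by construction, which mirrors the role of the Lagrange multiplier in Equation (A33).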

Appendix B.3. Update for the Emission Parameter $Θ$

Because the emission parameters are state-specific (no parameter tying across states), the maximization over

Θ

separates into

K

independent problems, one for each

θ_{i}, \quad i = 1,2, \dots, K

. The objective function

Q_{Θ}

decomposes additively over

i = 1,2, \dots, K

as seen in Equation (A36). For each state

i

, we therefore maximize

Q_{θ_{i}}

with respect to

θ_{i}

subject to the natural parameter domain of the emission distribution and the bound constraint

σ_{i} \geq σ_{\min}

(we use

σ_{\min} = 10^{- 8}

) to prevent degenerate solutions caused by the unboundedness of the likelihood. The resulting update is given in Equation (A37), which for the SEP-HMM expands into Equation (A38). This result serves as the basis for updating the emission parameters

Θ = (θ_{1}, θ_{2}, \dots, θ_{K})

.

Q_{Θ} = \sum_{i = 1}^{K} Q_{θ_{i}}, Q_{θ_{i}} ≔ \sum_{t = 1}^{T} ϕ_{t} (i) \log b_{i} (O_{t} | θ_{i})

(A36)

θ_{i}^{\mathrm{new}} = \arg\max_{θ_{i}} Q_{θ_{i}} = \arg\max_{θ_{i}} \sum_{t = 1}^{T} ϕ_{t}(i) \log b_{i}(O_{t} \mid θ_{i})

(A37)

θ_{i}^{\mathrm{new}} = \arg\max_{θ_{i}} \sum_{t = 1}^{T} ϕ_{t}(i) \left\{ \log \left[ \frac{4 α_{i} (1 - α_{i})}{σ_{i} \, Γ\left(1 + \frac{1 + β_{i}}{2}\right) 2^{1 + \frac{1 + β_{i}}{2}}} \right] - \frac{1}{2} \left[ \left| \frac{O_{t} - μ_{i}}{σ_{i}} \right| + (2 α_{i} - 1) \left( \frac{O_{t} - μ_{i}}{σ_{i}} \right) \right]^{\frac{2}{1 + β_{i}}} \right\}

(A38)
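Unlike the updates for π and A, Equation (A38) has no closed-form solution, so the maximization is carried out numerically. The Python sketch below only evaluates the objective for one state, which is the function a numerical optimizer would maximize. The function names and the convention of returning minus infinity outside the parameter domain are illustrative assumptions. No particular optimizer is implied, and the toy data are hypothetical.

```python
import math

def sep_logpdf(y, mu, sigma, alpha, beta):
    """Log of the SEP density, written as the summand in Equation (A38)."""
    inv_p = (1.0 + beta) / 2.0                        # 1 / p
    log_tau = (math.log(4.0 * alpha * (1.0 - alpha))
               - math.lgamma(1.0 + inv_p)
               - (1.0 + inv_p) * math.log(2.0))
    z = (y - mu) / sigma
    g = abs(z) + (2.0 * alpha - 1.0) * z
    return log_tau - math.log(sigma) - 0.5 * g ** (1.0 / inv_p)

def q_theta(theta, observations, phi_i, sigma_min=1e-8):
    """Weighted log-likelihood Q_{theta_i} for one state; phi_i[t] = phi_t(i)."""
    mu, sigma, alpha, beta = theta
    # Enforce the parameter domain and the lower bound on sigma.
    if sigma < sigma_min or not 0.0 < alpha < 1.0 or not -1.0 < beta <= 1.0:
        return -math.inf
    return sum(w * sep_logpdf(y, mu, sigma, alpha, beta)
               for y, w in zip(observations, phi_i))

# Hypothetical observations and state weights.
obs = [-1.0, 0.0, 1.0]
weights = [1.0, 1.0, 1.0]
print(q_theta((0.0, 1.0, 0.5, 0.0), obs, weights))
print(q_theta((2.0, 1.0, 0.5, 0.0), obs, weights))
```

In this toy case the objective is larger at μ = 0 than at μ = 2, as expected for data centered at zero. Working with the log-density and `math.lgamma` avoids overflow in the Gamma function and underflow in the exponential term.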


Figure 1. Visualization of the HMM framework, which includes two sequences (hidden state and observable sequence), a parameter set $λ = (π, A, Θ)$, and the interactions connecting them.


Figure 2. PDF plots of the SEP distribution with various shapes, representing (a) symmetric patterns, (b) left-skewed patterns, and (c) right-skewed patterns, along with different levels of kurtosis or tail thickness.

Figure 3. Visualization of the first emission target (solid line) and the second emission target (dashed line) from nine simulated scenarios. The vertical dotted line indicates the mode value of each emission that is stable at the location parameter.

Figure 4. Emission density plots for all scenarios. Each panel shows the emission PDF for a specific scenario. The black line represents the true emission PDF with the actual target parameters, while the blue lines represent the estimated emission PDFs from 500 replications.

Figure 5. SEP-HMM decoding performance in terms of accuracy. Bar plots showing the number of replications where SEP-HMM achieved the best decoding performance (blue) or lower accuracy compared to the benchmark (red) across nine scenarios.

Figure 6. SEP-HMM decoding performance in terms of GMean. Bar plots showing the number of replications where SEP-HMM achieved the best decoding performance (blue) or lower GMean compared to the benchmark (red) across nine scenarios.

Figure 7. Exploratory visualization of the JCI closing price data and the average temperature data. Panel (a) shows the JCI time series on the left and its histogram on the right. Panel (b) shows the 10 min average temperature change time series on the left and its corresponding value distribution on the right.

Figure 8. Decoding results showing the observable sequence and emission distribution for each hidden state for two cases: (a) Closing price of JCI and (b) Temperature change in Yogyakarta.

Table 1. Confusion matrix for evaluating two-state HMM decoding.

	Predicted State 1	Predicted State 2
Actual State 1	True Positive (TP)	False Negative (FN)
Actual State 2	False Positive (FP)	True Negative (TN)
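
The cells of Table 1 can be turned into the accuracy and GMean values reported in Figures 5 and 6 along the following lines. This is a minimal sketch rather than the evaluation code used in the paper. It assumes that State 1 is treated as the positive class, that GMean is the usual geometric mean of sensitivity and specificity, and that the decoded labels have already been aligned with the true labels (label switching is not handled). The function name `decoding_metrics` and the toy sequences are hypothetical.

```python
import numpy as np


def decoding_metrics(actual, predicted):
    """Accuracy and G-mean for two-state decoding (State 1 = positive class)."""
    actual = np.asarray(actual)
    predicted = np.asarray(predicted)
    tp = int(np.sum((actual == 1) & (predicted == 1)))
    fn = int(np.sum((actual == 1) & (predicted == 2)))
    fp = int(np.sum((actual == 2) & (predicted == 1)))
    tn = int(np.sum((actual == 2) & (predicted == 2)))
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    sensitivity = tp / (tp + fn)   # share of State 1 recovered
    specificity = tn / (tn + fp)   # share of State 2 recovered
    gmean = float(np.sqrt(sensitivity * specificity))
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn,
            "accuracy": accuracy, "gmean": gmean}


# Toy example (not data from the paper)
actual = [1, 1, 1, 2, 2, 2, 2, 1]
predicted = [1, 1, 2, 2, 2, 1, 2, 1]
print(decoding_metrics(actual, predicted))
```

Unlike accuracy, GMean drops toward zero when one of the two states is rarely recovered, which makes it the more informative metric when the states are unbalanced.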

Table 2. Overview of the simulation scenarios and their intuitive interpretations.

Scenario	Target Parameters (Emission 1)	Target Parameters (Emission 2)	Intuitive Interpretation
Scenario 1	$μ_{1} = -2; σ_{1} = 1; α_{1} = 0.5; β_{1} = 0$	$μ_{2} = 2; σ_{2} = 1; α_{2} = 0.5; β_{2} = 0$	Symmetric, Mesokurtic
Scenario 2	$μ_{1} = -2; σ_{1} = 1; α_{1} = 0.7; β_{1} = 0$	$μ_{2} = 2; σ_{2} = 1; α_{2} = 0.3; β_{2} = 0$	Slightly Asymmetric, Mesokurtic
Scenario 3	$μ_{1} = -2; σ_{1} = 1; α_{1} = 0.9; β_{1} = 0$	$μ_{2} = 2; σ_{2} = 1; α_{2} = 0.1; β_{2} = 0$	Strongly Asymmetric, Mesokurtic
Scenario 4	$μ_{1} = -2; σ_{1} = 1; α_{1} = 0.5; β_{1} = 0.5$	$μ_{2} = 2; σ_{2} = 1; α_{2} = 0.5; β_{2} = 0.5$	Symmetric, Leptokurtic
Scenario 5	$μ_{1} = -2; σ_{1} = 1; α_{1} = 0.7; β_{1} = 0.5$	$μ_{2} = 2; σ_{2} = 1; α_{2} = 0.3; β_{2} = 0.5$	Slightly Asymmetric, Leptokurtic
Scenario 6	$μ_{1} = -2; σ_{1} = 1; α_{1} = 0.9; β_{1} = 0.5$	$μ_{2} = 2; σ_{2} = 1; α_{2} = 0.1; β_{2} = 0.5$	Strongly Asymmetric, Leptokurtic
Scenario 7	$μ_{1} = -2; σ_{1} = 1; α_{1} = 0.5; β_{1} = -0.5$	$μ_{2} = 2; σ_{2} = 1; α_{2} = 0.5; β_{2} = -0.5$	Symmetric, Platykurtic
Scenario 8	$μ_{1} = -2; σ_{1} = 1; α_{1} = 0.7; β_{1} = -0.5$	$μ_{2} = 2; σ_{2} = 1; α_{2} = 0.3; β_{2} = -0.5$	Slightly Asymmetric, Platykurtic
Scenario 9	$μ_{1} = -2; σ_{1} = 1; α_{1} = 0.9; β_{1} = -0.5$	$μ_{2} = 2; σ_{2} = 1; α_{2} = 0.1; β_{2} = -0.5$	Strongly Asymmetric, Platykurtic

Table 3. Comparison of SEP-HMM parameter estimates and actual target values across all scenarios.

Parameter	Value	Scenario 1	Scenario 2	Scenario 3	Scenario 4	Scenario 5	Scenario 6	Scenario 7	Scenario 8	Scenario 9
$π_{1}$	Target	0.500	0.500	0.500	0.500	0.500	0.500	0.500	0.500	0.500
	Estimate	0.508	0.512	0.510	0.510	0.510	0.514	0.514	0.512	0.514
$π_{2}$	Target	0.500	0.500	0.500	0.500	0.500	0.500	0.500	0.500	0.500
	Estimate	0.492	0.488	0.490	0.490	0.490	0.486	0.486	0.488	0.486
$a_{11}$	Target	0.800	0.800	0.800	0.800	0.800	0.800	0.800	0.800	0.800
	Estimate	0.799	0.798	0.798	0.800	0.800	0.797	0.798	0.798	0.798
$a_{12}$	Target	0.200	0.200	0.200	0.200	0.200	0.200	0.200	0.200	0.200
	Estimate	0.201	0.202	0.202	0.200	0.200	0.203	0.202	0.202	0.202
$a_{21}$	Target	0.200	0.200	0.200	0.200	0.200	0.200	0.200	0.200	0.200
	Estimate	0.200	0.201	0.201	0.199	0.199	0.202	0.201	0.201	0.201
$a_{22}$	Target	0.800	0.800	0.800	0.800	0.800	0.800	0.800	0.800	0.800
	Estimate	0.800	0.799	0.799	0.801	0.801	0.798	0.799	0.799	0.799
$μ_{1}$	Target	−2.000	−2.000	−2.000	−2.000	−2.000	−2.000	−2.000	−2.000	−2.000
	Estimate	−1.989	−1.986	−2.076	−1.973	−1.986	−2.628	−2.013	−2.012	−2.180
$μ_{2}$	Target	2.000	2.000	2.000	2.000	2.000	2.000	2.000	2.000	2.000
	Estimate	2.027	2.019	2.104	1.993	2.025	2.746	2.018	2.026	2.178
$σ_{1}$	Target	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000
	Estimate	0.990	0.982	1.171	1.000	0.998	2.332	0.974	0.981	1.148
$σ_{2}$	Target	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000
	Estimate	0.983	1.000	1.221	0.994	1.013	2.464	0.973	0.989	1.153
$α_{1}$	Target	0.500	0.700	0.900	0.500	0.700	0.900	0.500	0.700	0.900
	Estimate	0.504	0.705	0.891	0.507	0.703	0.826	0.495	0.697	0.876
$α_{2}$	Target	0.500	0.300	0.100	0.500	0.300	0.100	0.500	0.300	0.100
	Estimate	0.508	0.303	0.113	0.497	0.304	0.187	0.510	0.307	0.124
$β_{1}$	Target	0.000	0.000	0.000	0.500	0.500	0.500	−0.500	−0.500	−0.500
	Estimate	−0.016	−0.013	−0.186	0.476	0.478	−0.328	−0.520	−0.509	−0.458
$β_{2}$	Target	0.000	0.000	0.000	0.500	0.500	0.500	−0.500	−0.500	−0.500
	Estimate	0.006	−0.010	−0.196	0.487	0.489	−0.340	−0.509	−0.509	−0.463

Table 4. Model fitting performance per scenario, represented by the mean value (standard deviation) of 500 replications.

Metric	Model	Scenario 1	Scenario 2	Scenario 3	Scenario 4	Scenario 5	Scenario 6	Scenario 7	Scenario 8	Scenario 9
Log-Likelihood	SEP-HMM	−934.77 (18.78)	−1039.36 (20.58)	−1437.17 (18.96)	−1102.21 (17.76)	−1246.26 (21.07)	−1524.25 (17.76)	−753.28 (18.77)	−840.55 (17.51)	−1265.30 (17.18)
	SN-HMM	−936.55 (18.75)	−1048.52 (22.42)	−1464.38 (24.04)	−1110.66 (18.05)	−1269.27 (21.96)	−1548.24 (19.28)	−770.91 (18.70)	−861.62 (18.46)	−1300.50 (19.52)
	Gaussian-HMM	−936.90 (18.72)	−1056.87 (20.87)	−1481.04 (18.66)	−1111.62 (18.00)	−1276.60 (21.12)	−1553.13 (16.85)	−771.20 (18.70)	−865.34 (18.76)	−1306.56 (18.70)
AIC	SEP-HMM	1891.54 (37.57)	2100.71 (41.16)	2896.34 (37.91)	2226.42 (35.52)	2514.53 (42.14)	3070.50 (35.52)	1528.55 (37.54)	1703.10 (35.02)	2552.61 (34.35)
	SN-HMM	1891.10 (37.49)	2115.04 (44.84)	2946.76 (48.08)	2239.32 (36.11)	2556.53 (43.92)	3114.47 (38.55)	1559.83 (37.40)	1741.25 (36.93)	2619.00 (39.05)
	Gaussian-HMM	1887.81 (37.44)	2127.75 (41.73)	2976.08 (37.32)	2237.23 (35.99)	2567.20 (42.24)	3120.26 (33.70)	1556.39 (37.40)	1744.67 (37.53)	2627.11 (37.40)
BIC	SEP-HMM	1937.90 (37.57)	2147.08 (41.16)	2942.70 (37.91)	2272.78 (35.52)	2560.89 (42.14)	3116.86 (35.52)	1574.91 (37.54)	1749.46 (35.02)	2598.97 (34.35)
	SN-HMM	1929.03 (37.49)	2152.97 (44.84)	2984.69 (48.08)	2277.25 (36.11)	2594.46 (43.92)	3152.41 (38.55)	1597.76 (37.40)	1779.18 (36.93)	2656.94 (39.05)
	Gaussian-HMM	1917.31 (37.44)	2157.25 (41.73)	3005.58 (37.32)	2266.74 (35.99)	2596.70 (42.24)	3149.77 (33.70)	1585.90 (37.40)	1774.18 (37.53)	2656.61 (37.40)

Cells highlighted in green indicate the best performance in each scenario.

Table 5. Summary of the fitting results for all three models across the two cases.

Dataset	Model	Log-Likelihood	AIC	BIC
JCI	SEP-HMM	−3047.90	6117.79	6163.66
	SN-HMM	−3083.43	6184.85	6222.38
	Gaussian-HMM	−3083.43	6180.85	6210.04
TEMP	SEP-HMM	402.11	−782.23	−731.89
	SN-HMM	339.53	−661.06	−619.87
	Gaussian-HMM	282.40	−550.80	−518.77

Cells highlighted in green indicate the best performance in each dataset.
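
The AIC and BIC columns of Table 5 can be related to the log-likelihood column with a short calculation. The sketch below assumes the standard definitions AIC = −2 log L + 2k and BIC = −2 log L + k log T. The free-parameter counts k are not taken from the paper; they are an assumption for a two-state model (one free initial probability, two free transition probabilities, plus four, three, or two emission parameters per state for SEP, Skew Normal, and Gaussian emissions, respectively). The series length T = 478 for JCI is the sum of the state counts in Table 6.

```python
import math


def aic(loglik, k):
    """Akaike information criterion."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n_obs):
    """Bayesian (Schwarz) information criterion."""
    return -2.0 * loglik + k * math.log(n_obs)


# Assumed free-parameter counts for K = 2 states (see the lead-in text)
N_PARAMS = {"SEP-HMM": 11, "SN-HMM": 9, "Gaussian-HMM": 7}

# Log-likelihoods for the JCI dataset, copied from Table 5
JCI_LOGLIK = {"SEP-HMM": -3047.90, "SN-HMM": -3083.43, "Gaussian-HMM": -3083.43}
JCI_N = 232 + 246  # decoded state counts from Table 6

for model, loglik in JCI_LOGLIK.items():
    k = N_PARAMS[model]
    print(f"{model}: AIC = {aic(loglik, k):.2f}, BIC = {bic(loglik, k, JCI_N):.2f}")
```

Under these assumptions the computed values agree with the JCI rows of Table 5 up to rounding of the reported log-likelihoods. The calculation also shows why SN-HMM and Gaussian-HMM, which share the same reported log-likelihood on JCI, still differ in AIC and BIC: the penalty terms depend only on k.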

Table 6. Summary statistics of the decoded hidden states for the JCI and TEMP datasets.

Dataset	Hidden State	Count (%)	Mean	SD	Max	Min	Skewness	Kurtosis
JCI	State 1	232 (48.54%)	6830.40	105.23	7016.84	6565.73	−0.40	−0.73
	State 2	246 (51.46%)	7323.01	206.00	7905.39	6970.74	0.87	−0.02
TEMP	State 1	501 (69.68%)	−0.10	0.14	0.21	−0.80	−1.58	2.72
	State 2	218 (30.32%)	0.24	0.19	0.88	−0.41	−0.52	1.63
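
Per-state summaries of the kind shown in Table 6 could be produced from an observation sequence and its decoded state path as sketched below. The function name `state_summary` and the toy data are hypothetical, and the estimator choices are assumptions: sample standard deviation with `ddof=1`, SciPy's default (biased) skewness, and excess kurtosis, for which a Gaussian sample scores near zero. The negative kurtosis entries in Table 6 suggest an excess-kurtosis convention, but the exact estimators behind the table are not stated here.

```python
import numpy as np
from scipy.stats import kurtosis, skew


def state_summary(obs, states):
    """Summary statistics of the observations grouped by decoded state."""
    obs = np.asarray(obs, dtype=float)
    states = np.asarray(states)
    summary = {}
    for s in np.unique(states):
        x = obs[states == s]
        summary[int(s)] = {
            "count": int(x.size),
            "share_pct": 100.0 * x.size / obs.size,
            "mean": float(x.mean()),
            "sd": float(x.std(ddof=1)),
            "max": float(x.max()),
            "min": float(x.min()),
            "skewness": float(skew(x)),
            "kurtosis": float(kurtosis(x)),  # Fisher definition: excess kurtosis
        }
    return summary


# Toy example (not data from the paper)
obs = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0, 40.0]
states = [1, 1, 1, 2, 2, 2, 2]
for state, row in state_summary(obs, states).items():
    print(state, row)
```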


