1. Introduction
Factor analysis plays a critical role in quantitative finance, with applications in asset pricing, risk management, portfolio selection, and statistical arbitrage strategies. Broadly speaking, factor models can be categorized into two types: explicit factor models, which fall under supervised learning, and statistical factor models, which are typically unsupervised. Compared to explicit factor models such as the Barra risk model (
Barra, 2004), statistical factor models offer two key advantages: they are numerically more stable in capturing the underlying covariance structure using fewer factors, and they are asset-class agnostic, making them broadly applicable across different markets.
However, financial markets are far from stationary. They undergo regime shifts that fundamentally alter market dynamics. For example, the transition from the unusually low-volatility environment of 2017 to the volatility spike in February 2018, the U.S. Federal Reserve’s abrupt rate hikes in 2022, or geopolitical shocks such as Russia’s invasion of Ukraine all represent regime-switching events. Standard statistical factor models, which assume globally static factor loadings, often fail to forecast covariances accurately during such transitions, the very moments when robust risk modeling is most needed.
The literature on statistical factor models has evolved to address such challenges. Early PCA-based factor models (
Jolliffe, 2004) are limited in parametric structure and perform poorly when within-group variation dominates between-group variation (
Hinton & Dayan, 1997;
McLachlan & Peel, 2000). In the 1980s, maximum likelihood (ML) factor analysis estimated via the EM algorithm was introduced by (
Rubin & Thayer, 1982). This was further generalized in the 1990s by (
Ghahramani & Hinton, 1996), who proposed the mixture of factor analyzers (MFA). By combining clustering with local dimension reduction, MFA improves on classical ML factor analysis through its ability to segment data into different market regimes, each with its own factor structure, moving beyond the “one-size-fits-all” ML factor model.
Despite these advancements, within each regime, MFA still uses a classical ML factor model which assumes i.i.d. hidden factors and Gaussian innovations. Dynamic factor models extend this framework by introducing a recursive structure for hidden factor returns, capturing their time dynamics more realistically. Earlier works such as (
Bai & Wang, 2014;
Forni et al., 2009;
Geweke & Zhou, 1996;
Kim & Halbert, 1999) typically assume a simple first-order autoregressive (AR(1)) structure for factor returns, while later developments extend the underlying process to ARMA and VARMA formulations (see
Peña & Poncela, 2004,
2006;
Urteaga et al., 2017;
Varga & Szendrei, 2024). These dynamic factor models are capable of handling both stationary and nonstationary hidden factors.
In this paper, we focus on AR(1)-based dynamic factor models with regime-switching factor loadings governed by a first-order Markov chain. Such models, or, more generally, regime-switching state space models, have been extensively studied in the literature (see
Carvalho et al., 2010;
Kim & Halbert, 1999;
Whiteley et al., 2010). This framework not only models the hidden factor process but also captures the regime-switching effects inherent in financial data. Common estimation approaches include Gibbs sampling, the EM algorithm, particle MCMC, and particle filtering. We adopt the particle filtering framework for its computational efficiency and sequential structure, which aligns closely with real-world trading scenarios.
Particle filtering (
Arulampalam et al., 2002;
Doucet & Johansen, 2009) is a class of Monte Carlo methods designed to perform recursive Bayesian inference when analytical solutions such as the Kalman filter or the Baum–Welch algorithm are infeasible. Although particle filtering is well developed when model parameter values are known, parameter learning in more complex settings remains challenging. Specifically, when parameters evolve according to a stochastic process rather than being constant, as in many financial applications, an adaptive parameter-learning algorithm becomes critical. To learn the parameter values, one early approach by (
Liu & West, 2001) incorporates parameters into the hidden state vector, but this often exacerbates the degeneracy issue in particle filtering. More computationally robust approaches have been introduced in (
Djuric et al., 2012) and (
Carvalho et al., 2010). Instead of folding parameter estimation into the augmented state vector, they marginalize over parameters to reduce the estimation variance of the system. In this work, we implemented a particle filtering algorithm inspired by (
Djuric et al., 2012) and (
Carvalho et al., 2010) and applied it to estimate a regime switching dynamic factor model using real financial data.
The goal of this study is not merely methodological. By evaluating statistical factor models in the context of statistical arbitrage strategies, we directly test their practical usefulness for risk modeling. Following (
Avellaneda & Lee, 2010), we adopt a simplified trading framework where performance differences are driven primarily by the quality of covariance forecasts. In this setup, a superior factor model should deliver more accurate residuals, adapt more quickly to regime shifts, and ultimately provide a stronger risk foundation for arbitrage strategies.
The rest of the paper is organized as follows:
Section 2 introduces the state space model and its estimation methods, including Kalman filter, EM algorithm, Particle filter, and the more recent development in parameter learning—the particle learning algorithm.
Section 3 presents statistical factor models such as the MLE factor model, the mixture of factor analyzers, and the regime-switching dynamic factor model. We highlight the evolution of statistical factor models over recent decades and provide a simulation study demonstrating the effectiveness of the particle learning algorithm for estimating the regime-switching dynamic factor model.
Section 4 presents empirical studies applying the regime-switching dynamic factor model in the context of equity stat-arb strategy and compares its performance to the conventional MLE factor model. Finally,
Section 5 concludes the paper.
2. State Space Model and Its Estimation
A state space model is a framework used to describe the evolution of a dynamical system over time, particularly when the system is only partially observed. It has numerous applications in finance, such as time series prediction (alpha generation), statistical factor analysis (risk modeling), and portfolio selection (see
Kim & Halbert, 1999;
Kolm & Ritter, 2014). In the most general form, a state space model can be represented by the following:

$$\mathbf{x}_t = f(\mathbf{x}_{t-1}, \mathbf{u}_t),$$
$$\mathbf{y}_t = g(\mathbf{x}_t, \mathbf{v}_t),$$

where $\mathbf{x}_t$ denotes the hidden states with dimension k, and $\mathbf{y}_t$ the observations with dimension s. The terms $\mathbf{u}_t$ and $\mathbf{v}_t$ represent the noise terms for the evolution equation and the observation equation, with dimensions w and v, respectively. Throughout this work, we use bold lowercase to denote vectors, and bold uppercase to denote matrices. The functions f and g are general nonlinear functions defining the state transition and observation processes. When the parameter values of the above equations are known, inference about the hidden states can be efficiently performed using the recursive Bayesian algorithm (Arulampalam et al., 2002), which is summarized as Algorithm 1:
Algorithm 1 Recursive Bayesian Algorithm

Prediction: $p(\mathbf{x}_t \mid \mathbf{y}_{1:t-1}) = \int p(\mathbf{x}_t \mid \mathbf{x}_{t-1})\, p(\mathbf{x}_{t-1} \mid \mathbf{y}_{1:t-1})\, d\mathbf{x}_{t-1}$

Update: $p(\mathbf{x}_t \mid \mathbf{y}_{1:t}) = \dfrac{p(\mathbf{y}_t \mid \mathbf{x}_t)\, p(\mathbf{x}_t \mid \mathbf{y}_{1:t-1})}{p(\mathbf{y}_t \mid \mathbf{y}_{1:t-1})}$

where the denominator in the update step, $p(\mathbf{y}_t \mid \mathbf{y}_{1:t-1})$, is called the marginal likelihood (or evidence), which is another integral that needs to be solved (see Appendix A).
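As a concrete illustration of Algorithm 1, the predict–update recursion can be written in a few lines for a discrete (finite) hidden state, where the integrals reduce to sums. The two-regime transition matrix and likelihood values below are hypothetical numbers chosen only for illustration:

```python
import numpy as np

def bayes_filter_step(prior, transition, likelihood):
    """One predict-update step of the recursive Bayesian algorithm
    for a discrete hidden state.

    prior      : p(x_{t-1} | y_{1:t-1}), shape (k,)
    transition : p(x_t | x_{t-1}), shape (k, k), rows sum to 1
    likelihood : p(y_t | x_t), shape (k,)

    Returns the filtered posterior p(x_t | y_{1:t}) and the
    marginal likelihood (evidence) p(y_t | y_{1:t-1}).
    """
    predicted = transition.T @ prior       # prediction (Chapman-Kolmogorov sum)
    unnormalized = likelihood * predicted  # Bayes update, numerator only
    evidence = unnormalized.sum()          # marginal likelihood (the denominator)
    return unnormalized / evidence, evidence

# Hypothetical two-regime example: calm vs. volatile.
P = np.array([[0.95, 0.05],
              [0.10, 0.90]])
prior = np.array([0.5, 0.5])
lik = np.array([0.8, 0.2])                 # p(y_t | regime)
posterior, evidence = bayes_filter_step(prior, P, lik)
```

Because the state is discrete, the evidence integral is just a sum, which is exactly the quantity discussed above.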
2.1. Kalman Filter and EM-Algorithm Estimation
When the dynamical system described by the evolution and observation equations is linear, time-invariant, and finite-dimensional, the functions f and g can be expressed in the following form:

$$\mathbf{x}_t = \mathbf{A}\mathbf{x}_{t-1} + \mathbf{u}_t,$$
$$\mathbf{y}_t = \mathbf{B}\mathbf{x}_t + \mathbf{v}_t,$$

and if we further assume Gaussian noise, $\mathbf{u}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{Q})$ and $\mathbf{v}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{R})$, then the well-known Kalman filter can be derived to solve the prediction and update equations analytically, as described in (Arulampalam et al., 2002) and in Algorithm A1 (see Appendix B). When the parameters $\mathbf{A}$, $\mathbf{B}$, $\mathbf{Q}$, and $\mathbf{R}$ are unknown, the EM algorithm (Ghahramani & Hinton, 1996) can be employed for their estimation. In this framework, the Kalman filter is used in the E-step to compute the expected sufficient statistics of the hidden states, while the M-step updates the parameter estimates (see Appendix C).
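For the linear-Gaussian case, one predict–update step of the Kalman filter can be sketched as follows; the matrix names mirror the standard textbook form rather than any particular implementation in the appendices:

```python
import numpy as np

def kalman_step(m, P, y, A, B, Q, R):
    """One Kalman predict-update step for the linear-Gaussian model
        x_t = A x_{t-1} + u_t,  u_t ~ N(0, Q)
        y_t = B x_t + v_t,      v_t ~ N(0, R)

    m, P : filtered mean and covariance of x_{t-1}
    Returns the filtered mean and covariance of x_t.
    """
    # Predict step
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    # Update step
    S = B @ P_pred @ B.T + R                 # innovation covariance
    K = P_pred @ B.T @ np.linalg.inv(S)      # Kalman gain
    m_new = m_pred + K @ (y - B @ m_pred)    # correct with the innovation
    P_new = P_pred - K @ B @ P_pred          # posterior covariance shrinks
    return m_new, P_new

# Toy one-dimensional example.
A = np.array([[0.9]]); B = np.array([[1.0]])
Q = np.array([[0.1]]); R = np.array([[0.5]])
m, P = np.array([0.0]), np.array([[1.0]])
m, P = kalman_step(m, P, np.array([1.0]), A, B, Q, R)
```

In the EM setting described above, this same recursion runs inside the E-step; only the surrounding parameter updates change.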
2.2. Particle Filter
When the functions f and g do not admit a linear-Gaussian representation as above, deriving an analytical solution for the recursive Bayesian inference becomes challenging. In such cases, particle filtering techniques have become a very popular approach for solving nonlinear dynamical systems, including models like the stochastic volatility model (see Djuric et al., 2012).

The Sequential Importance Sampling (SIS) filter (Gordon et al., 1993) approximates the joint posterior density at time t using the importance sampling method:

$$p(\mathbf{x}_{0:t} \mid \mathbf{y}_{1:t}) \approx \sum_{i=1}^{N} w_t^{(i)}\, \delta\!\left(\mathbf{x}_{0:t} - \mathbf{x}_{0:t}^{(i)}\right),$$

where $\delta(\cdot)$ is the Dirac delta function, and $w_t^{(i)}$ is the importance weight for particle i, given by the following:

$$w_t^{(i)} \propto \frac{p\!\left(\mathbf{x}_{0:t}^{(i)} \mid \mathbf{y}_{1:t}\right)}{q\!\left(\mathbf{x}_{0:t}^{(i)} \mid \mathbf{y}_{1:t}\right)}.$$

With the factorization

$$q(\mathbf{x}_{0:t} \mid \mathbf{y}_{1:t}) = q(\mathbf{x}_t \mid \mathbf{x}_{0:t-1}, \mathbf{y}_{1:t})\, q(\mathbf{x}_{0:t-1} \mid \mathbf{y}_{1:t-1}),$$

sequential sampling of states becomes feasible, which forms the foundation of the SIS particle filter (see Algorithm A4).
In practice, the SIS particle filter suffers from weight degeneracy: the variance of the weights $w_t^{(i)}$ increases over time until only one particle dominates. To alleviate this issue, the Sequential Importance Resampling (SIR) filter resamples the particles based on their weights, replicating particles with large weights and eliminating particles with small weights, thereby reducing variance (Doucet & Johansen, 2009).
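The resampling step can be sketched as follows; systematic resampling is shown here as one common low-variance scheme (the text does not commit to a specific scheme):

```python
import numpy as np

def systematic_resample(weights, rng):
    """Systematic resampling: return particle indices drawn in
    proportion to the normalized weights using a single uniform
    draw spread over a stratified grid (low-variance scheme)."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n  # one draw, n strata
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0                           # guard against round-off
    return np.searchsorted(cumulative, positions)

rng = np.random.default_rng(0)
w = np.array([0.05, 0.05, 0.80, 0.10])             # normalized weights
idx = systematic_resample(w, rng)                  # heavy particle replicated
```

After resampling, all particles carry equal weight 1/N, which is exactly the variance-reduction mechanism described above.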
Although the optimal importance density for a generic particle filter algorithm is $p(\mathbf{x}_t \mid \mathbf{x}_{t-1}^{(i)}, \mathbf{y}_t)$ (Doucet et al., 2000), the SIR particle filter typically uses the transition prior $p(\mathbf{x}_t \mid \mathbf{x}_{t-1}^{(i)})$ as the importance density and applies resampling at each step (see Algorithm A5). While the SIR filter improves stability, it still has weaknesses (Pitt & Shephard, 1999). First, if there is an outlier at time t, the weights $w_t^{(i)}$ will be unevenly distributed and the filter will become imprecise. Second, the predictive density $p(\mathbf{x}_t \mid \mathbf{x}_{t-1}^{(i)})$ often fails to capture tail behavior accurately, causing some particles to drift into low-likelihood regions during step 2 in Algorithm A5.
To address these limitations, the auxiliary particle filter proposed by (Pitt & Shephard, 1999) introduces a lookahead step, also known as the auxiliary variable mechanism. This approach uses a predictive weighting scheme (see Algorithm A6, step 1) that estimates the likelihood of the upcoming observation given the predicted state, a process often referred to as lookahead weighting. When the observation equation is informative, the auxiliary particle filter will usually reduce the variance of the particle weights and hence perform better.
2.3. Particle Filter with Parameter Learning
The particle filter algorithms discussed so far assume that model parameters are known. However, in many applications the model parameters are unknown. Learning static parameter values, or, more broadly, identifying the coefficient evolution process, is a critical aspect of system identification. The integration of parameter learning into the particle filtering framework has been extensively studied (Kantas et al., 2015).

One idea is to treat parameters as additional hidden state variables in the particle filtering framework, but this approach often leads to severe degeneracy issues. To address this issue, Liu and West (2001) proposed introducing an artificial evolution model for the parameters and augmenting the hidden state space to include them.
Building on these ideas, Carvalho et al. (2010) introduced the particle learning approach, a novel particle filter with an embedded parameter-learning step. The particle learning method marginalizes both the parameters and the hidden states by tracking only their sufficient statistics, which are updated using the recursive Bayesian algorithm and the Kalman filter, respectively. It is also constructed under the auxiliary particle filter framework; the resulting performance improvement has been demonstrated in (Carvalho et al., 2010; Lopes & Tsay, 2010).
In our paper, we develop a simplified version of the particle learning algorithm to estimate the regime-switching dynamic factor model.
2.3.1. Particle Learning
The particle learning algorithm was introduced by Carvalho et al. (2010), extending the idea of the auxiliary particle filter to incorporate parameter learning. While the overall filtering scheme is the same as the auxiliary particle filter, the key innovation lies in tracking the “essential state” $Z_t$.
Specifically, $Z_t$ includes the following:
the sufficient statistics of the parameter vector;
the sufficient statistics of the hidden states; and
the current value of the parameter vector.
The reason we include the parameter value in the essential state is that evaluating the likelihood and drawing samples from the predictive density both require parameter values. However, these values are not sampled directly at each step, which would most likely cause degeneracy issues; instead, they are inferred offline from their associated sufficient statistics. This approach is also referred to as marginalization by simulation.
The derivation of this filter starts with the recursive relation:

$$p(Z_t \mid \mathbf{y}_{1:t}) = \int p(Z_t \mid Z_{t-1}, \mathbf{y}_t)\, p(Z_{t-1} \mid \mathbf{y}_{1:t})\, dZ_{t-1},$$

where

$$p(Z_{t-1} \mid \mathbf{y}_{1:t}) \propto p(\mathbf{y}_t \mid Z_{t-1})\, p(Z_{t-1} \mid \mathbf{y}_{1:t-1}).$$

So, given the particles of $Z_{t-1}$, we can construct a Monte Carlo estimate for $p(Z_t \mid \mathbf{y}_{1:t})$ by resampling from $p(Z_{t-1} \mid \mathbf{y}_{1:t})$ and then sampling from $p(Z_t \mid Z_{t-1}, \mathbf{y}_t)$, respectively. After that, we can update the parameter and state sufficient statistics deterministically. The particle learning method can be summarized as Algorithm 2:

Algorithm 2 Particle Learning

(Resample) Draw particle indices with weights proportional to the predictive likelihood $p(\mathbf{y}_t \mid Z_{t-1}^{(i)})$.
(Propagate) Draw the new hidden states from their conditional posterior given the resampled particles and the new observation $\mathbf{y}_t$.
(Update) Update the state sufficient statistics via the Kalman filter recursion.
(Sample) Draw the parameter values from their posterior given the parameter sufficient statistics.
(Update) Update the parameter sufficient statistics deterministically.

where the updating of the parameter sufficient statistics is performed by a deterministic recursion and the updating of the state sufficient statistics by the Kalman filter.
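To make the deterministic sufficient-statistics update concrete, the sketch below tracks conjugate statistics for a Bayesian linear regression; this is a generic stand-in for the idea, not the exact update used for the factor model:

```python
import numpy as np

def update_suff_stats(S_xx, S_xy, x, y):
    """Deterministic sufficient-statistics update for a conjugate
    Bayesian linear regression y_t = theta' x_t + noise. Tracking
    (S_xx, S_xy) is enough to recover theta offline, so theta itself
    never needs to be carried through the filter."""
    S_xx = S_xx + np.outer(x, x)
    S_xy = S_xy + x * y
    return S_xx, S_xy

def posterior_mean(S_xx, S_xy, prior_precision=1e-6):
    """Ridge-regularized point estimate implied by the statistics."""
    k = S_xx.shape[0]
    return np.linalg.solve(S_xx + prior_precision * np.eye(k), S_xy)

# Simulated stream: the statistics converge to the true coefficients.
rng = np.random.default_rng(1)
theta_true = np.array([0.5, -1.0])
S_xx, S_xy = np.zeros((2, 2)), np.zeros(2)
for _ in range(500):
    x = rng.normal(size=2)
    y = x @ theta_true + 0.1 * rng.normal()
    S_xx, S_xy = update_suff_stats(S_xx, S_xy, x, y)
theta_hat = posterior_mean(S_xx, S_xy)
```

The update is purely deterministic given the new data point, which is what allows particle learning to defer parameter sampling to an offline draw.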
4. Results in Statistical Arbitrage Strategy
In both academia and industry, an equity statistical arbitrage strategy (equity stat-arb) usually refers to trading groups of stocks against either explicit or synthetic factors, which can be seen as a generalization of “pairs trading” (see Avellaneda & Lee, 2010). In some cases, we long an individual stock and short the factors, and in others we short the stock and long the factors. Overall, due to the netting of long and short positions, our exposure to the factors will be small. One key step of the equity stat-arb strategy is the decomposition of asset returns into a systematic part and an idiosyncratic part. With a more effective factor model, we obtain a better decomposition and hence better strategy performance.
4.1. Market Neutral Strategy
In this paper, we evaluate the quality of factor models within the context of an equity statistical arbitrage strategy. Inspired by the approach of (
Khandani & Lo, 2008), and in order to avoid unnecessary complexity in signal generation and potential overfitting, we adopt a deliberately simple reversion signal: the negative of the previous day’s residual. To ensure market neutrality, we construct a dollar-neutral portfolio each day, with equal dollar amounts allocated to long and short positions.
To be specific, we compare the performance of trading strategies derived from both the regime-switching dynamic factor model and static MLE factor model. For estimating the regime-switching dynamic factor model, we employ Algorithm 5. The initialization follows the same procedure as in the simulation analysis (
Section 3.3.2), with one refinement: to improve initial convergence, we use the estimates from a mixture of factor analyzers (MFA), trained on a “burn-in” sample from 3 January 2005 to 1 January 2011, as the initial mean of the factor loadings. Furthermore, the residualization step is implemented in a rolling-window framework so that the residuals are always generated out-of-sample. For the estimation of the static MLE factor model, we employed Algorithm 3 as described in
Section 3.1. This serves as our baseline approach, providing a standard maximum likelihood estimation of factor loadings and covariances under the assumption of a single, time-invariant regime.
For computational efficiency, we designed a relatively lean experiment by restricting the dataset to the daily prices of the Dow Jones 30 constituents. Since factor models do not require the same scale-invariance adjustments as PCA-based approaches, we use raw returns rather than normalized returns for the estimation. In this setting, we employ a three-factor model and assume that the factor loadings evolve according to a two-regime Markov chain. For each experiment, we use a rolling-window, walk-forward time series cross-validation framework for backtesting. The training sample size is 120 days (around 6 months) and the test sample size is 5 days (around one week). The objective of this experiment is not to identify or interpret the economic meaning of market regimes. Instead, our goal is to demonstrate that when a regime shift occurs, regardless of its underlying cause, our system adapts more rapidly to the new distribution (or data-generating process). This adaptability enables it to deliver a superior risk model for statistical arbitrage strategies.
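The rolling-window, walk-forward splitting described above can be sketched as a simple index generator (window lengths follow the text: a 120-day training window and a 5-day test window):

```python
def walk_forward_splits(n_obs, train_size=120, test_size=5):
    """Yield (train_idx, test_idx) pairs for rolling-window,
    walk-forward backtesting: the model is refit on each training
    window and evaluated on the subsequent out-of-sample window."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        start += test_size  # roll forward by one test window

# Example: roughly one year of daily data.
splits = list(walk_forward_splits(250))
```

Because each test window lies strictly after its training window, residuals and covariance forecasts are always generated out-of-sample, as required by the backtest.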
After the residualization step, we can generate our signals using a naive rule-based approach. If the previous day’s residual for asset i exceeds a threshold d, we say that asset i is over-priced at time t-1 and short this asset for the next time stamp t. If the residual falls below -d, we perform the opposite operation. For simplicity, in our experiment, we set d to a fixed value. At each time step, we form a dollar-neutral portfolio by applying the mean-variance closed-form solution to both the long and short sides:

$$\mathbf{w}_t = \lambda\, \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_t,$$

where $\boldsymbol{\Sigma}$ is the covariance matrix of returns, $\boldsymbol{\mu}_t$ represents the rule-based prediction, and $\lambda$ is a scaling factor that controls overall risk.
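A minimal sketch of this portfolio construction, assuming the raw mean-variance weights are rescaled so that the long and short sides each carry equal (unit) dollar amounts; the exact normalization used in the paper may differ:

```python
import numpy as np

def dollar_neutral_weights(sigma, mu, scale=1.0):
    """Mean-variance weights w = scale * inv(Sigma) @ mu, rescaled so
    the long side sums to +1 and the short side to -1 dollar."""
    raw = scale * np.linalg.solve(sigma, mu)   # closed-form mean-variance tilt
    longs, shorts = raw.clip(min=0.0), raw.clip(max=0.0)
    w = np.zeros_like(raw)
    if longs.sum() > 0:
        w += longs / longs.sum()               # long side sums to +1
    if shorts.sum() < 0:
        w += shorts / (-shorts.sum())          # short side sums to -1
    return w

# Hypothetical three-asset example with a diagonal covariance.
sigma = np.diag([0.04, 0.09, 0.01])
mu = np.array([0.01, -0.02, 0.005])            # rule-based reversion signal
w = dollar_neutral_weights(sigma, mu)
```

The netting of the two unit-sized sides is what keeps the factor exposure of the book small, as discussed above.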
4.2. Performance Comparison
We run this naive strategy using the MLE static factor model, the dynamic factor model without regime switching mechanism, and the regime switching dynamic factor model, respectively, and the strategy performance is as shown in
Figure 4:
We can see from
Figure 4 that the strategy achieves a higher Sharpe ratio and a lower maximum drawdown when using the regime-switching dynamic factor model compared to both the static MLE factor model and the simple dynamic factor model. Specifically, the improvement in Sharpe is approximately 63%, while the reduction in maximum drawdown is about 12%. We also observe that the simple dynamic factor model performs well, underscoring the importance of modeling the dynamics of factor returns in equity arbitrage strategies. However, incorporating the regime-switching mechanism provides an additional advantage: it meaningfully reduces drawdowns during market regime shifts (most notably during the COVID period) and improves estimation accuracy in the aftermath of such events. Based on these findings, we conclude that dynamic factor models consistently outperform static factor models, and that incorporating a regime-switching mechanism enables the model to adapt much more quickly to abrupt regime shifts, such as those observed during COVID, than conventional approaches.
The associated historical effective sample size plot and regime detection plot are given in
Figure 5:
In the above graphs, the effective sample size (ESS) is used to measure the stability of the particle filtering algorithm on real data, and is defined as:

$$\mathrm{ESS}_t = \frac{1}{\sum_{i=1}^{N} \left(w_t^{(i)}\right)^2},$$

where $w_t^{(i)}$ are the normalized weights.
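The ESS computation is a one-liner; the two extreme cases below bracket its range, from total degeneracy (ESS = 1) to uniform weights (ESS = N):

```python
import numpy as np

def effective_sample_size(weights):
    """ESS = 1 / sum_i w_i^2 for normalized particle weights; ranges
    from 1 (all mass on one particle) to N (perfectly uniform)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()          # normalize defensively
    return 1.0 / np.sum(w ** 2)

n = 4000
uniform = np.full(n, 1.0 / n)                  # best case: ESS = N
degenerate = np.array([1.0] + [0.0] * (n - 1)) # worst case: ESS = 1
```

An ESS that stays well above a fixed fraction of N, as reported in the text, indicates that the particle weights remain well spread.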
From the ESS graph, we observe that the particle filter estimation is stable, as most of the time the ESS is larger than 2000 for a three-factor model. From the regime detection plot, we can see there are more regime changes during the 2020 COVID period and in early 2024. Indeed, our regime-switching model performs better during those periods compared with the simple dynamic factor model and the static factor model.
It is also worth noting that although our model does not assume the factor return process is stationary, it is informative to check the estimated transition matrix of the factor process. The moduli of its eigenvalues are 0.074, 0.046, and 0.046. Since all of them are strictly less than 1, the implied factor dynamics are stationary.
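The stationarity check can be reproduced in a few lines; the transition matrix below is a hypothetical stand-in with small entries, not the paper's estimate:

```python
import numpy as np

def is_stationary(transition):
    """An AR(1) factor process x_t = T x_{t-1} + e_t is stationary
    iff every eigenvalue of T lies strictly inside the unit circle."""
    moduli = np.abs(np.linalg.eigvals(transition))
    return bool(np.all(moduli < 1.0)), moduli

# Hypothetical 3x3 transition matrix for a three-factor model.
T = np.array([[0.05, 0.02, 0.00],
              [0.01, 0.04, 0.01],
              [0.00, 0.02, 0.03]])
stationary, moduli = is_stationary(T)
```

Eigenvalue moduli on the order of those reported in the text (well below 1) imply that shocks to the hidden factors decay quickly.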
Although determining the number of factors and regimes is beyond the scope of this work, it is still worthwhile to examine results under varying specifications. By checking different numbers of factors and regimes, we can assess whether our earlier findings remain robust across alternative setups. The relevant results are summarized in Table 1 and Table 2:
From the results above, we observe that our findings remain consistent across varying specifications of factor numbers and regime counts. This robustness test provides further confidence that the improvements from the regime-switching dynamic factor model are not sensitive to any specific parameter choice.
5. Discussion
This paper investigates the estimation of a regime-switching dynamic factor model using a particle learning algorithm. Through simulation studies, we validate our estimation approach by evaluating regime detection accuracy, idiosyncratic risk estimation, and the model’s ability to track expected returns. The empirical study in the equity statistical arbitrage framework further demonstrates that the regime-switching dynamic factor model outperforms a conventional static MLE factor model in capturing underlying data dynamics.
The particle learning algorithm implemented in this paper builds on the framework of (
Carvalho et al., 2010), with several modifications. First, we extend the innovation distribution to a mixture of Gaussian distributions. Second, we simplify the parameter learning step by only tracking the sufficient statistics, rather than using the marginalization via simulation. Lastly, we apply the estimated model to the empirical data and compare its performance against the conventional MLE factor model.
We observe performance improvements in the equity statistical arbitrage strategy when using our model. Our model not only improves the Sharpe ratio but also reduces the maximum drawdown. These results align with our initial motivation: it is well known that financial markets operate under multiple regimes, and incorporating a regime-switching mechanism governed by a hidden Markov model allows us to better model structural shifts and enhances risk management, particularly during periods of elevated volatility and contagion risk.
While the current particle learning algorithm is based on the vanilla auxiliary particle filter, there remains substantial room to improve the quality of our particle filtering algorithm and hence the tracking of the hidden states (the hidden factors in this model). For instance, the ABC-based sequential Monte Carlo filter (
Jasra et al., 2010) could further enhance the filtering accuracy. Additionally, incorporating more robust, heavy-tailed innovation distributions (
Schoutens, 2005) may improve the robustness of our estimation, particularly in the presence of extreme market movements. Investigating more sophisticated specifications for the hidden factor process, including nonstationary factor models, would also be a promising direction for future research (
Peña & Poncela, 2004,
2006).