Article

PortRSMs: Learning Regime Shifts for Portfolio Policy

by Bingde Liu * and Ryutaro Ichise *
Department of Industrial Engineering and Economics, School of Engineering, Institute of Science Tokyo, Tokyo 152-8550, Japan
* Authors to whom correspondence should be addressed.
J. Risk Financial Manag. 2025, 18(8), 434; https://doi.org/10.3390/jrfm18080434
Submission received: 17 July 2025 / Revised: 1 August 2025 / Accepted: 2 August 2025 / Published: 5 August 2025
(This article belongs to the Special Issue Machine Learning Applications in Finance, 2nd Edition)

Abstract

This study proposes a novel Deep Reinforcement Learning (DRL) policy network structure for portfolio management called PortRSMs. PortRSMs employs stacked State-Space Models (SSMs) for the modeling of multi-scale continuous regime shifts in financial time series, striking a balance between exploring consistent distribution properties over short periods and maintaining sensitivity to sudden shocks in price sequences. PortRSMs also performs cross-asset regime fusion through hypergraph attention mechanisms, providing a more comprehensive state space for describing changes in asset correlations and co-integration. Experiments conducted on two different trading frequencies in the stock markets of the United States and Hong Kong show the superiority of PortRSMs compared to other approaches in terms of profitability, risk–return balancing, robustness, and the ability to handle sudden market shocks. Specifically, PortRSMs achieves up to a 0.03 improvement in the annual Sharpe ratio in the U.S. market, and up to a 0.12 improvement for the Hong Kong market compared to baseline methods.

1. Introduction

High-frequency trading often relies on the modeling of the distribution of short-term asset returns. Many studies have shown the serial correlation of volatility in asset returns (LeBaron, 1992; Shiller, 1990), indicating that the distribution of asset returns may exhibit consistency over a period of time. Basic time-series models can effectively capture these properties (Bauwens et al., 2006; Bollerslev et al., 1994). However, due to structural changes, the statistical characteristics of asset price series can completely change from one period to another. For example, after policy or macroeconomic shocks, the volatility of stock prices may undergo drastic changes, leading to the failure of basic time-series models. On the other hand, Regime Shift Model (RSM) paradigms address the shortcomings in basic time-series modeling by dividing time series into different states, effectively dealing with such shocks. Therefore, short-term asset return distributions are widely modeled by RSMs (Cai, 1994; Haas et al., 2004; So et al., 1998).
Optimized in the Deep Reinforcement Learning (DRL) formulation, the portfolio policy network aims to generate an effective policy to guide high-frequency portfolio rebalancing trading strategies (Jiang & Liang, 2017). It is essential to let the portfolio policy network model the distribution of asset returns in each trading period. In previous work, modeling was often done by neural network models (Jiang & Liang, 2017; X. Li et al., 2022; Wang et al., 2021; Xu et al., 2021). These methods can effectively extract consistent distribution properties in a short period, but they do not follow the RSM modeling paradigm, making them sub-optimal in financial series modeling. Constructing a policy network within the RSM paradigm has been challenging because little previous research has realized RSMs with deep neural networks, which are essential for DRL.
With the recent breakthroughs in State-Space Model (SSM) research in the field of deep learning (Gu et al., 2020, 2022; Schiff et al., 2024), we can now use neural networks to model regime shifts in financial series. This enables us to come up with new policy network designs. In our work, we use stacked SSMs to model multi-scale continuous regime shifts present in financial time series, serving as the backbone of the DRL policy network. This method excels at balancing the exploration of consistent distribution properties over short periods and sensitivity to sudden shocks in price sequences. We also perform regime fusion between different assets through hypergraph attention mechanisms (HGAMs) (X. Li et al., 2022), providing a more comprehensive state space for describing changes in asset correlations and co-integration. These features give our method better performance compared to previous methods. We call our method PortRSMs, which is the abbreviation of “Portfolio RSMs”.
Our contributions are summarized as follows:
  • We propose a new portfolio policy network structure with an RSM paradigm. The new structure can model regime shifts present in financial time series to strike a balance between exploring consistent distribution properties over short periods and maintaining sensitivity to sudden shocks for better portfolio decision-making.
  • We propose a method for cross-asset regime fusion through HGAM, providing a more comprehensive state space for describing changes in asset correlations and co-integration.
  • We conducted experiments on two different trading frequencies in the United States and Hong Kong stock markets. The experimental results showed the superiority of our method compared with other methods in terms of profitability, risk–return balancing, robustness, and ability to deal with sudden market shocks.
The remainder of this paper is organized as follows. Section 2 reviews related work in financial time-series modeling and deep reinforcement learning. Section 3 introduces the DRL framework for portfolio management and establishes the key mathematical notations used throughout the paper. Section 4 presents our proposed PortRSMs method, including the formulation of regime shift modeling and the regime fusion mechanism. Section 5 reports and analyzes the experimental results for multiple stock markets under different trading frequencies. Finally, Section 6 concludes the paper and discusses future research directions.

2. Related Work

Markowitz first introduced modern portfolio theory to design portfolios of assets with fixed weights using mean-variance analysis (Markowitz, 1952). The general portfolio algorithms are portfolio selection algorithms that rebalance the portfolio at the end of each trading period rather than using fixed weights. The general portfolio algorithms can be roughly divided into “follow the winner” (Agarwal et al., 2006; Helmbold et al., 1998), “follow the loser” (Borodin et al., 2003; Lai et al., 2018; B. Li & Hoi, 2012; B. Li et al., 2011b, 2012), and “pattern matching” (Györfi et al., 2006; B. Li et al., 2011a). Traditional strategies are interpretable and rest on a sound mathematical foundation, but they achieve suboptimal results in the long run because they fail to model the complex dynamics of financial markets.
DRL approaches are a family of strong “pattern matching” strategies that leverage the ability of deep learning for feature representation and pattern recognition. They have attracted significant attention in recent years. The EIIE methods (Jiang & Liang, 2017) first proposed a general framework to apply DRL for portfolio management, and they initially used the Temporal Convolutional Network (TCN) (Lea et al., 2017) and Long Short-Term Memory (LSTM) model (Hochreiter & Schmidhuber, 1997) as the policy network structure. The RAT (Xu et al., 2021) method first used the Transformer (Vaswani et al., 2017) structure as the policy network structure to extract complex information from financial time series. Those structures have also become commonly used in follow-up work (J. Li et al., 2023; X. Li et al., 2022; Wang et al., 2021). However, the above-mentioned structures are not derived from RSMs. We start with continuous-time RSMs, derive a method using SSMs (Gu et al., 2020, 2022; Schiff et al., 2024) as the policy network structure, and optimize it according to the specific needs of portfolio management tasks.

3. DRL for Portfolio Management

In this section, we briefly introduce the DRL method for the portfolio management problem, which provides the foundation for our work and establishes key mathematical symbols. Our formulation follows the foundational framework introduced in EIIE (Jiang & Liang, 2017) for DRL-based portfolio management, sharing all constraints.

3.1. Action

The portfolio vector represents the weights distributed to each asset. Let
$$\mathbf{w}_k = (w_{k,0}, w_{k,1}, w_{k,2}, \dots, w_{k,m}) \in \mathbb{R}_+^{m+1}, \quad w_{k,i} \in [0, 1], \quad \text{s.t.} \ \sum_{i=0}^{m} w_{k,i} = 1$$
be the portfolio vector before trading period $k$, where $i = 0, 1, 2, \dots, m$ indexes the assets and $k \in \mathbb{N}_+$ indexes the trading periods. Note that $w_{k,0}$ indicates the weight allocated to the risk-free asset (e.g., cash). The vector $\mathbf{w}_k$ directly defines the action to take in trading period $k$ in the framework. It then transitions to $\mathbf{w}'_k \in \mathbb{R}_+^{m+1}$, i.e., the portfolio vector after period $k$, due to changes in the prices of the assets.

3.2. State

In the framework, the state comprises market environment features and current asset holdings. Research in investment science indicates that, due to market inefficiency, asset price data over a historical period has a certain predictive power for future price changes (Bustos & Pomares-Quimbaya, 2020; Jegadeesh & Titman, 2023). Therefore, the most intuitive market features are asset price time series. Asset price time series are sampled at equal intervals, using techniques like candlestick charts. Let $t := kT \in \mathbb{N}_+$ represent the sample timestamps, where $T \in \mathbb{N}_+$ is the number of timestamps in a trading period. The portfolio vector after period $k-1$, represented as $\mathbf{w}'_{k-1}$, is also included in the state, since adjusting it to $\mathbf{w}_k$ incurs transaction fees. In summary, the state in the framework is $s_k = (\mathbf{P}_k, \mathbf{w}'_{k-1})$, where
  • $\mathbf{P}_k := (P_{kT-l+1}, \dots, P_{kT}) \in \mathbb{R}_+^{l \times (m+1) \times 4}$ are the $l \in \mathbb{N}_+$ latest samples in the price series in trading period $k$;
  • $P_t := (P_{t,0}, \dots, P_{t,m}) \in \mathbb{R}_+^{(m+1) \times 4}$ represents the price samples of all assets at timestamp $t$;
  • $P_{t,i} := (p^O_{t,i}, p^H_{t,i}, p^L_{t,i}, p^C_{t,i}) \in \mathbb{R}_+^{4}$ indicates the opening, highest, lowest, and closing prices of asset $i$ in the sample with timestamp $t$.
Initially, $\mathbf{w}_0 = (1, 0, 0, \dots, 0)$, i.e., the situation where all funds are in the risk-free asset.

3.3. State Transition Function

Let $r_{k,i} \in \mathbb{R}_+$ be the price change ratio of asset $i$ in trading period $k$, defined as $r_{k,i} := p^C_{(k+1)T,i} / p^C_{kT,i}$, and let $\mathbf{r}_k := (r_{k,0}, r_{k,1}, \dots, r_{k,m}) \in \mathbb{R}_+^{m+1}$ collect the price change ratios of all assets in trading period $k$. The price change ratios $\mathbf{r}_k$ can be predicted from $\mathbf{P}_k$; namely, there exists a probability model $\mathcal{T}(\mathbf{r}_k \mid \mathbf{P}_k)$. Given $\mathbf{r}_k$, the portfolio vector $\mathbf{w}_k$ deterministically becomes $\mathbf{w}'_k$. Concurrently, the new asset price series $\mathbf{P}_{k+1}$ generated deterministically by the price changes becomes observable. The state transition model $\mathcal{T}(s_{k+1} \mid s_k, a_k) := \mathcal{T}((\mathbf{P}_{k+1}, \mathbf{w}'_k) \mid (\mathbf{P}_k, \mathbf{w}'_{k-1}), \mathbf{w}_k)$ can then be readily obtained from the probability model $\mathcal{T}$ and all the aforementioned deterministic relationships.
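The deterministic step from the pre-period weights to the post-period weights can be illustrated with a short sketch. The drift formula used here (each holding grows by its own ratio, then the weights are renormalized) is the usual convention in EIIE-style frameworks and is an assumption, since this section does not spell it out; the function name `drift_weights` is hypothetical.

```python
def drift_weights(weights, price_ratios):
    """Weights after the period, given pre-period weights and gross price ratios."""
    if len(weights) != len(price_ratios):
        raise ValueError("weights and price_ratios must have the same length")
    grown = [w * r for w, r in zip(weights, price_ratios)]  # value of each holding
    total = sum(grown)  # portfolio growth factor, i.e. the dot product of w_k and r_k
    return [g / total for g in grown]


w_k = [0.2, 0.5, 0.3]    # index 0 is the risk-free asset (cash)
r_k = [1.0, 1.10, 0.90]  # gross close-to-close ratios; cash is assumed to stay at 1
w_after = drift_weights(w_k, r_k)
```

In this example the asset that gained value ends the period with a larger share of the portfolio, and the asset that lost value with a smaller one, even though no trade took place.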

3.4. Reward Function

Under portfolio vector $\mathbf{w}_k$, the return generated by the price changes $\mathbf{r}_k$ is rewarded. Note that the transaction fees incurred as a result of adjusting the portfolio from $\mathbf{w}'_{k-1}$ to $\mathbf{w}_k$ should be deducted from the reward. Let the transaction fee ratio be $c \in [0, 1]$; then, the return can be calculated as follows:
$$r_k := \mathbf{w}_k^{\top} \mathbf{r}_k - c \, \lVert \mathbf{w}_k - \mathbf{w}'_{k-1} \rVert_1 ,$$
where the first term is the initial return and the second term represents the trading costs incurred from transaction fees. The formulation of the reward function in trading period $k$ is $R(s_k, a_k) = \log(r_k + 1)$, where $a_k = \mathbf{w}_k$. Here, the logarithm of the return is taken to ensure the additivity of the reward function over time.
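As a rough illustration of the reward computation, the sketch below applies the return formula and the logarithm literally. It assumes that the price changes are passed in as net changes (ratio minus one), so that the logarithm of the return plus one is the log growth of the portfolio; that reading, and the function names, are assumptions made for the example.

```python
import math


def period_return(w_k, net_changes, w_prev_after, fee_ratio):
    """Weighted price change minus the fee charged on the L1 turnover."""
    gross = sum(w * x for w, x in zip(w_k, net_changes))
    turnover = sum(abs(a - b) for a, b in zip(w_k, w_prev_after))
    return gross - fee_ratio * turnover


def reward(w_k, net_changes, w_prev_after, fee_ratio):
    """Log of (return + 1), so rewards add up over consecutive periods."""
    return math.log(period_return(w_k, net_changes, w_prev_after, fee_ratio) + 1.0)


# Moving from an all-cash portfolio into two risky assets at a 0.06% fee ratio.
example_return = period_return([0.2, 0.5, 0.3], [0.0, 0.10, -0.10], [1.0, 0.0, 0.0], 0.0006)
example_reward = reward([0.2, 0.5, 0.3], [0.0, 0.10, -0.10], [1.0, 0.0, 0.0], 0.0006)
```

Because the reward is a logarithm, summing it over periods corresponds to multiplying the per-period growth factors.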

3.5. Deterministic Policy Gradient

In deterministic policy gradient algorithms, the policy network $\pi_\theta(s)$ is a neural network that maps the current state to an action (Silver et al., 2014). The policy network is parameterized by $\theta$, which refers to the trainable weights and biases of the neural network. During training, the reward function is differentiated directly, and gradient ascent is applied to update $\theta$ with a learning rate of $\alpha \in \mathbb{R}_+$, according to $\theta \leftarrow \theta + \alpha \nabla_\theta R(s, \pi_\theta(s))$, where $\nabla_\theta R(s, \pi_\theta(s))$ denotes the gradient of the reward function with respect to the policy parameters $\theta$.
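The update rule can be tried on a toy problem. The sketch below is not the policy network of this paper: it assumes a state-independent policy whose weights are a softmax over trainable logits, ignores transaction fees, and uses net price changes so that the reward is the log of one plus the weighted change. The gradient is written out by hand for this toy case.

```python
import math


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_reward(theta, net_changes):
    weights = softmax(theta)
    return math.log(1.0 + sum(w * x for w, x in zip(weights, net_changes)))


def reward_gradient(theta, net_changes):
    """Analytic gradient of log_reward with respect to the logits."""
    weights = softmax(theta)
    mean_change = sum(w * x for w, x in zip(weights, net_changes))
    return [w * (x - mean_change) / (1.0 + mean_change)
            for w, x in zip(weights, net_changes)]


def ascent_step(theta, net_changes, learning_rate):
    """One gradient-ascent update: theta <- theta + learning_rate * gradient."""
    grad = reward_gradient(theta, net_changes)
    return [t + learning_rate * g for t, g in zip(theta, grad)]


theta = [0.0, 0.0, 0.0]
net_changes = [0.0, 0.02, -0.01]
theta_next = ascent_step(theta, net_changes, learning_rate=0.1)
```

After one step the logit of the asset with the largest price change grows relative to the others, which raises the reward on the same data.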

4. PortRSMs

In this section, we introduce how we establish the PortRSMs method step by step, including the mathematical form of RSMs and regime fusion, as well as the portfolio weight-generating method. Figure 1 is a data flow diagram providing an overview of PortRSMs.

4.1. SSMs

Recent research used time-series modeling with SSMs to describe regime shifts (Gu et al., 2020). For continuous-time signals, SSMs have the following form:
$$\begin{aligned}
u'(t) &= A u(t) + B x(t), && \text{(Transition Model)} \\
y(t) &= C u(t) + D x(t), && \text{(Emission Model)}
\end{aligned}$$
$$\text{where } u(t) \in \mathbb{R}^{h \times 1}, \ x(t) \in \mathbb{R}^{d \times 1}, \ y(t) \in \mathbb{R}^{d \times 1}, \ A \in \mathbb{R}^{h \times h}, \ B \in \mathbb{R}^{h \times d}, \ C \in \mathbb{R}^{d \times h}, \ D \in \mathbb{R}^{d \times d}.$$
Here, $u(t)$ represents the hidden state with hidden dimension $h \in \mathbb{N}_+$. $x(t)$ describes the observable instantaneous information at time $t$, while $y(t)$ is the instantaneous information that cannot be observed up to time $t$. Information is described with a vector of dimension $d \in \mathbb{N}_+$. In addition to $A$, which describes the inherent transitions of hidden states, the transition model also uses matrix $B$ to describe how current observable instantaneous information affects hidden states. The emission model then maps $u(t)$ and $x(t)$ to the current $y(t)$ through matrices $C$ and $D$.
Follow-up research improved SSMs. On one hand, $A$ and $D$ are considered trainable neural network parameters (Gu et al., 2022). On the other hand, $B$ and $C$ are considered time-variant parameters ($B_t$ and $C_t$, respectively) to improve performance (Schiff et al., 2024). This makes SSMs a linear time-variant system where $B_t$ and $C_t$ are obtained through nonlinear transformation:
$$B_t = \delta(\Gamma_B x(t)), \quad C_t = \delta(\Gamma_C x(t)), \quad \text{where } \Gamma_B, \Gamma_C \in \mathbb{R}^{h \times d},$$
where $\delta(\cdot)$ is an SiLU activation function (see Note 1) and $\Gamma_B$ and $\Gamma_C$ are learnable projection matrices.
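The input-dependent parameters can be computed in a few lines. This is a minimal sketch with plain Python lists; the helper names are hypothetical, and the projection matrices are stored as lists of rows with one row per hidden dimension.

```python
import math


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def input_dependent_params(gamma_b, gamma_c, x_t):
    """Return (B_t, C_t), each a list of h values derived from the input x_t."""
    b_t = [silu(v) for v in matvec(gamma_b, x_t)]
    c_t = [silu(v) for v in matvec(gamma_c, x_t)]
    return b_t, c_t


# h = 2 hidden dimensions, d = 3 input dimensions.
b_t, c_t = input_dependent_params(
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]],
    [0.0, 1.0, 2.0],
)
```

Since both parameters change with every input, the same layer can react strongly to some samples and weakly to others, which is what makes the system time-variant.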

4.2. RSMs in Price Series

RSMs, based on the improvement of hidden Markov models, are widely used in modeling asset price change ratio distributions (Cai, 1994; Haas et al., 2004; So et al., 1998). They take instantaneous prices as the input information and the instantaneous price change ratios as the output.
In our work, we use SSMs to model the continuous regime shifts from the discretized price series. Therefore, RSMs are defined as follows:
$$\begin{aligned}
x_{t,i} &= \delta(\Gamma_P P_{t,i}), \\
u_{t,i} &= \tilde{A}_{t,i} \, u_{t-1,i} + \tilde{B}_{t,i} \, x_{t,i}, \\
y_{k,i} &= \delta(\Gamma_C x_{t,i}) \, u_{t,i} + D x_{t,i}, \\
\text{s.t.} \quad & t = kT, \\
& \tilde{A}_{t,i} = \exp(\Delta_{t,i} A), \\
& \tilde{B}_{t,i} = (\Delta_{t,i} A)^{-1} \left( \exp(\Delta_{t,i} A) - I \right) \Delta_{t,i} B_{t,i}, \\
& B_{t,i} = \delta(\Gamma_B x_{t,i}), \quad \Delta_{t,i} = \delta(\Gamma_\Delta x_{t,i}), \\
\text{where} \quad & \Gamma_P \in \mathbb{R}^{d \times 4}, \ \Gamma_\Delta \in \mathbb{R}^{h \times d},
\end{aligned}$$
where $\Gamma_P$ is a projection matrix to restore input information $x_{t,i}$ from $P_{t,i}$, as defined in Section 3.2, while $y_{k,i}$ is a descriptor vector for the distribution characteristics of $r_{k,i}$, as defined in Section 3.3. $\tilde{A}_{t,i}$ and $\tilde{B}_{t,i}$ represent discretized versions of $A$ and $B_{t,i}$. In the case of zero-order hold sampling, the relationships between $\tilde{A}_{t,i}$, $\tilde{B}_{t,i}$ and $A$, $B_{t,i}$ have been shown in existing research (Gu et al., 2022; Schiff et al., 2024). $\Delta_{t,i}$ describes the time scale that one sample interval maps to in the continuous perspective. Note that $\Delta_{t,i}$ is also a time-variant parameter, obtained from $x_{t,i}$ through the projection matrix $\Gamma_\Delta$ and the nonlinear activation function. This property further enhances its ability to model the widespread fractal properties in financial time series (Evertsz, 1995; Ni et al., 2011; Peters, 1989). The time-variant $\Delta_{t,i}$ can map different dynamics in the discrete perspective to the same dynamics in the continuous perspective to model the continuous regime shifts. During training, optimization is carried out for parameters $A$, $D$, $\mu_i$, $\Gamma_B$, $\Gamma_C$, $\Gamma_\Delta$, and $\Gamma_P$.
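The discretized recurrence can be sketched as a plain loop. Two simplifications here are assumptions and not taken from the paper: the matrix A is diagonal with non-zero entries (so the zero-order-hold formulas reduce to element-wise expressions), and each input channel keeps its own hidden state of size h, as in common selective-SSM implementations. All function names are hypothetical.

```python
import math


def silu(v):
    return v / (1.0 + math.exp(-v))


def matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def selective_scan(xs, a_diag, gamma_b, gamma_c, gamma_delta, d_matrix):
    """Run the recurrence over a list of d-dimensional inputs xs.

    a_diag holds the h diagonal entries of A (assumed non-zero); the gamma_*
    matrices have h rows and d columns; d_matrix is the d x d skip matrix D.
    """
    d, h = len(xs[0]), len(a_diag)
    state = [[0.0] * h for _ in range(d)]  # hidden state starts at zero
    outputs = []
    for x in xs:
        b_t = [silu(v) for v in matvec(gamma_b, x)]
        c_t = [silu(v) for v in matvec(gamma_c, x)]
        delta = [silu(v) for v in matvec(gamma_delta, x)]
        # Zero-order hold for diagonal A, element by element:
        # A_bar = exp(delta * a), B_bar = (A_bar - 1) / a * B_t.
        a_bar = [math.exp(dl * a) for dl, a in zip(delta, a_diag)]
        b_bar = [(ab - 1.0) / a * b for ab, a, b in zip(a_bar, a_diag, b_t)]
        for ch in range(d):
            for j in range(h):
                state[ch][j] = a_bar[j] * state[ch][j] + b_bar[j] * x[ch]
        skip = matvec(d_matrix, x)
        outputs.append([
            sum(c * u for c, u in zip(c_t, state[ch])) + skip[ch]
            for ch in range(d)
        ])
    return outputs
```

A large step size lets the state forget its past quickly and follow the new input, while a small one keeps the state close to its previous value; making the step size depend on the input is what lets the layer switch between these behaviors.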
Notice that the absolute price ranges of different assets at different times may vary greatly. On the one hand, this can lead to similar dynamics being considered completely different, further causing a decrease in training data efficiency. On the other hand, it can result in uneven gradient values, making convergence difficult during training. Therefore, data normalization is important in data pre-processing. We adopt the same approach used in previous work (Jiang & Liang, 2017; Xu et al., 2021), using the latest close price in the state representation defined in Section 3.2, i.e., $p^C_{kT}$, as the denominator to normalize all price data in $\mathbf{P}_k$. To apply this method, in each trading period $k$, the SSM is re-executed on $\mathbf{P}_k$, i.e., all samples of the latest $l$ timestamps. The calculation can be performed efficiently on modern hardware by converting recursion to convolution (Schiff et al., 2024). In each calculation, $u_{kT-l,i}$ is initialized as a zero vector.
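The normalization step is simple enough to show directly. The sketch below assumes that each asset is divided by its own latest closing price and that the window is stored as nested lists of (open, high, low, close) tuples; both the layout and the name `normalize_window` are choices made for this example.

```python
def normalize_window(window):
    """Divide every price of asset i by that asset's latest closing price.

    window[t][i] is an (open, high, low, close) tuple for asset i at the
    t-th sample of the look-back window; the last entry is the newest.
    """
    latest_close = [asset[3] for asset in window[-1]]
    return [
        [tuple(p / latest_close[i] for p in asset) for i, asset in enumerate(sample)]
        for sample in window
    ]


window = [
    [(10.0, 11.0, 9.0, 10.0), (200.0, 210.0, 190.0, 205.0)],
    [(10.0, 12.0, 10.0, 12.5), (205.0, 206.0, 195.0, 200.0)],
]
normalized = normalize_window(window)
```

After normalization the newest close of every asset equals one, so assets trading at very different price levels produce inputs on the same scale.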

4.3. Stacked SSMs

To further improve performance, considering the existence of multiple regime shifts of different scales in the financial time series, we use stacked SSMs to model them. The formulation of RSMs modeled by N-layer SSMs is expressed as follows:
$$\begin{aligned}
x^{1}_{t,i} &= \delta(\Gamma_P P_{t,i}), \\
u^{1}_{t,i} &= \tilde{A}^{1}_{t,i} \, u^{1}_{t-1,i} + \tilde{B}^{1}_{t,i} \, x^{1}_{t,i}, \\
x^{2}_{t,i} = y^{1}_{t,i} &= \delta(\Gamma^{1}_C x^{1}_{t,i}) \, u^{1}_{t,i} + D^{1} x^{1}_{t,i}, \\
u^{2}_{t,i} &= \tilde{A}^{2}_{t,i} \, u^{2}_{t-1,i} + \tilde{B}^{2}_{t,i} \, x^{2}_{t,i}, \\
x^{3}_{t,i} = y^{2}_{t,i} &= \delta(\Gamma^{2}_C x^{2}_{t,i}) \, u^{2}_{t,i} + D^{2} x^{2}_{t,i}, \\
& \ \ \vdots \\
u^{N}_{t,i} &= \tilde{A}^{N}_{t,i} \, u^{N}_{t-1,i} + \tilde{B}^{N}_{t,i} \, x^{N}_{t,i}, \\
y^{N}_{k,i} = y^{N}_{t,i} &= \delta(\Gamma^{N}_C x^{N}_{t,i}) \, u^{N}_{t,i} + D^{N} x^{N}_{t,i}, \\
\text{s.t.} \quad & t = kT, \\
& \tilde{A}^{n}_{t,i} = \exp(\Delta^{n}_{t,i} A^{n}), \\
& \tilde{B}^{n}_{t,i} = (\Delta^{n}_{t,i} A^{n})^{-1} \left( \exp(\Delta^{n}_{t,i} A^{n}) - I \right) \Delta^{n}_{t,i} B^{n}_{t,i}, \\
& B^{n}_{t,i} = \delta(\Gamma^{n}_B x^{n}_{t,i}), \quad \Delta^{n}_{t,i} = \delta(\Gamma^{n}_\Delta x^{n}_{t,i}), \quad n = 1, 2, \dots, N.
\end{aligned}$$
Note that parameters are not shared between layers. In this case, we take the output $y^{n-1}_{t,i}$ of the $(n-1)$th SSM layer as the input $x^{n}_{t,i}$ of the $n$th SSM layer. Each SSM layer can model regime shifts over the regimes modeled by the previous layer on a more abstract scale, thereby mining more stable patterns and longer dependencies.
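The stacking itself amounts to feeding each layer's output sequence into the next layer. The sketch below uses a deliberately simplified stand-in for a layer (scalar state, fixed parameters, no input dependence), which is an assumption made to keep the example short; `make_layer` and `run_stack` are hypothetical names.

```python
import math


def make_layer(a, b, c, d_skip, delta):
    """Build a simplified, time-invariant SSM layer with a scalar state.

    This stands in for a full selective layer; a must be non-zero.
    """
    a_bar = math.exp(delta * a)    # zero-order-hold discretization of a
    b_bar = (a_bar - 1.0) / a * b  # zero-order-hold discretization of b

    def layer(xs):
        u, ys = 0.0, []            # the hidden state starts at zero
        for x in xs:
            u = a_bar * u + b_bar * x
            ys.append(c * u + d_skip * x)
        return ys

    return layer


def run_stack(layers, xs):
    """Feed the output sequence of each layer into the next one."""
    for layer in layers:
        xs = layer(xs)
    return xs


# Two layers with separate parameters: a fast one followed by a slow one.
fast = make_layer(a=-1.0, b=1.0, c=1.0, d_skip=0.0, delta=0.1)
slow = make_layer(a=-0.1, b=1.0, c=1.0, d_skip=1.0, delta=1.0)
stacked_output = run_stack([fast, slow], [1.0, 0.0, 0.0, 2.0])
```

Each layer is built with its own parameters, matching the statement that parameters are not shared between layers.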

4.4. Hypergraph Attention for Cross-Asset Regime Fusion

So far, we have only discussed the situation of individual assets. In the portfolio management task, multiple asset price series should be modeled simultaneously. The correlation and co-integration between these series can shift over time. Modeling these properties has been shown to be crucial for the learning of portfolio policy in existing work (X. Li et al., 2022; Shi et al., 2022; Soleymani & Paquet, 2021; Xu et al., 2021).
We include considerations of relevance and co-integration in the RSMs by utilizing HGAM (X. Li et al., 2022). The HGAM aims to learn the differing importance of asset neighbors for information merging with the attention mechanism (Vaswani et al., 2017). For stock $i$ and each of its neighbors $j = 0, 1, \dots, m$, we quantify the degree to which $i$ is related to $j$ based on the output $y^{n}_{k,i}$ of the $n$th SSM layer, i.e.,
$$D(y^{n}_{k,i}, y^{n}_{k,j}) = \delta\left( b^{n} \left[ \Gamma^{n}_R y^{n}_{k,i}, \ \Gamma^{n}_R y^{n}_{k,j} \right] \right),$$
where $\Gamma^{n}_R \in \mathbb{R}^{d \times 1}$ is a projection matrix to be learned, $b^{n} \in \mathbb{R}^{1 \times d}$ is a shared attention vector, $d$ is the model dimension defined in Section 4.2, $[\cdot]$ denotes the concatenation operation, and $\delta(\cdot)$ denotes a nonlinear activation function like LeakyReLU (see Note 2). Then, the softmax function is applied to obtain the importance weight $\alpha_{ij}$:
$$\alpha_{ij} = \frac{\exp\left( D(y^{n}_{k,i}, y^{n}_{k,j}) \right)}{\sum_{v=0}^{m} \exp\left( D(y^{n}_{k,i}, y^{n}_{k,v}) \right)} .$$
After that, $y^{n}_{k,i}$ is aggregated across assets as follows:
$$y^{n}_{k,i} \leftarrow \delta\left( \sum_{j=0}^{m} \alpha_{ij} P^{n} y^{n}_{k,j} \right)$$
In the case of stacked SSMs, information from different assets is aggregated layer by layer, thereby achieving cross-asset regime fusion, providing more comprehensive state spaces for the description of changes in asset correlations and co-integration.
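The score, softmax, and aggregation steps can be sketched for one layer as follows. The shapes are assumptions chosen so that the operations are well defined: the projection maps each descriptor to a short vector, the attention vector has twice that length, the aggregation matrix is square, and LeakyReLU is used for both nonlinearities. Function names are hypothetical.

```python
import math


def leaky_relu(v, slope=0.01):
    return v if v > 0 else slope * v


def matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(descriptors, gamma_r, b_vec):
    """Row i holds the importance of every asset j for asset i."""
    projected = [matvec(gamma_r, y) for y in descriptors]
    weights = []
    for p_i in projected:
        # Score = activation(b . [projected_i, projected_j]) for every j.
        scores = [
            leaky_relu(sum(b * v for b, v in zip(b_vec, p_i + p_j)))
            for p_j in projected
        ]
        weights.append(softmax(scores))
    return weights


def fuse(descriptors, gamma_r, b_vec, p_matrix):
    """Replace each descriptor by an attention-weighted mix over all assets."""
    alpha = attention_weights(descriptors, gamma_r, b_vec)
    mapped = [matvec(p_matrix, y) for y in descriptors]
    width = len(mapped[0])
    fused = []
    for row in alpha:
        mixed = [sum(a * m[c] for a, m in zip(row, mapped)) for c in range(width)]
        fused.append([leaky_relu(v) for v in mixed])
    return fused
```

Applying `fuse` after every SSM layer mirrors the layer-by-layer aggregation described above: each asset's descriptor entering the next layer already carries information from the other assets.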

4.5. Portfolio Generation

The last SSM-layer output $y^{N}_{k,i} \in \mathbb{R}^{d \times 1}$ is a descriptor of the distribution characteristics of $r_{k,i}$. $\mathcal{T}$, as defined in Section 3.3, can be established following previous practices (Jiang & Liang, 2017; Xu et al., 2021). The formulation is expressed as follows:
$$\mathbf{w}_k = \mathrm{Softmax}\left( \left[ b, \ \left( \Gamma_w \left[ \mathbf{w}'_{k-1}, \ y_k \right] \right) \right] \right),$$
$$\text{where } y_k = (y^{N}_{k,1}, y^{N}_{k,2}, \dots, y^{N}_{k,m}) \in \mathbb{R}^{d \times m}, \ \mathbf{w}'_{k-1} \in \mathbb{R}^{1 \times m}, \ b \in \mathbb{R}^{1 \times 1}, \ \Gamma_w \in \mathbb{R}^{(d+1) \times 1},$$
where $[\cdot]$ denotes the concatenation operation in the first dimension. After concatenating the distribution descriptors and portfolio vector ($\mathbf{w}'_{k-1}$, without risk-free asset weights), the absolute score of each asset can be obtained by applying a projection matrix ($\Gamma_w$). Finally, $\mathbf{w}_k$ can be generated from absolute scores by concatenating a cash bias ($b$) and applying a softmax function on the first dimension.
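A minimal sketch of the weight-generation step is given below. It assumes the projection is applied as a dot product between the projection vector and each asset's concatenated feature vector (previous weight followed by the descriptor); the function names and data layout are choices made for the example.

```python
import math


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def generate_portfolio(prev_weights, descriptors, gamma_w, cash_bias):
    """Return m + 1 weights, with the risk-free asset at index 0.

    prev_weights: previous weights of the m risky assets (no cash entry).
    descriptors: m descriptor vectors of length d from the last SSM layer.
    gamma_w: d + 1 projection coefficients; cash_bias: score given to cash.
    """
    scores = []
    for w_prev, y in zip(prev_weights, descriptors):
        features = [w_prev] + list(y)  # concatenate weight and descriptor
        scores.append(sum(g * f for g, f in zip(gamma_w, features)))
    return softmax([cash_bias] + scores)


weights = generate_portfolio(
    prev_weights=[0.5, 0.5],
    descriptors=[[1.0, 2.0], [3.0, 4.0]],
    gamma_w=[0.0, 1.0, 0.0],
    cash_bias=0.0,
)
```

The softmax guarantees non-negative weights that sum to one, so the output always satisfies the portfolio constraints of Section 3.1 without any extra projection step.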

5. Experiments

5.1. Experimental Settings

5.1.1. Datasets

We conducted experiments with two different trading frequencies (i.e., 1 day and 1 week as the duration of the trading period) in the United States and Hong Kong stock markets. Specifically, we used the Yahoo Finance API (Yahoo, 2025) and AKShare API (King, 2019) to collect data on the Dow Jones Industrial Average (DJIA) and Hang Seng Index (HSI) constituents from 1 January 2004 to 1 January 2024. The data are sampled at daily frequency, which means $T = 1$ when the trading period is 1 day and $T = 5$ when the trading period is 1 week. Stocks with more than 70% missing data were excluded. When reporting the experimental results, we use DJIA1d/DJIA1w and HSI1d/HSI1w as the names of the datasets, where 1d means 1 day as the duration of the trading period and 1w means 1 week. For trading fees, we used an industry-standard round-trip cost of 0.06%. The training set and test set are divided chronologically in a ratio of 4:1.
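The chronological split and the grouping of daily samples into trading periods can be sketched as follows. The helper names are hypothetical, and keeping only the close of the last sample in each complete period is an assumption about how a weekly series would be derived from daily data.

```python
def chronological_split(samples, train_parts=4, test_parts=1):
    """Split an ordered sequence without shuffling: earliest part for training."""
    cut = len(samples) * train_parts // (train_parts + test_parts)
    return samples[:cut], samples[cut:]


def to_trading_periods(daily_closes, samples_per_period):
    """Keep the close of the last sample in each complete trading period."""
    return [
        daily_closes[end - 1]
        for end in range(samples_per_period, len(daily_closes) + 1, samples_per_period)
    ]


days = list(range(1, 21))                 # 20 consecutive daily samples
train, test = chronological_split(days)   # 4:1 ratio, oldest samples first
weekly = to_trading_periods(days, 5)      # five daily samples per weekly period
```

Splitting by time rather than at random keeps every test sample strictly later than every training sample, which avoids leaking future prices into training.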

5.1.2. Methods for Comparison

We use the constant rebalanced portfolio (CRP) and buy-and-hold (BAH) method as a benchmark. CRP allocates equal funds to all assets at all times, representing average market performance. BAH allocates equal funds to all assets in the first trading period, without any further trading actions. We also compare our method with traditional portfolio management algorithms (EG (Helmbold et al., 1998), OLMAR (B. Li & Hoi, 2012), RMR (Huang et al., 2016), BNN (Györfi et al., 2006), and CORN (B. Li et al., 2011a)), as well as state-of-the-art DRL-based methods with different policy network structures (bRNN EIIE (Jiang & Liang, 2017), CNN EIIE (Jiang & Liang, 2017), RAT (Xu et al., 2021), HGAM (X. Li et al., 2022), and LSRE-CAAN (J. Li et al., 2023)).

5.1.3. Evaluation Metric

Following previous works (Jiang & Liang, 2017; Wang et al., 2021; Xu et al., 2021), we use three metrics to evaluate performance: the annualized return (AR), annualized Sharpe ratio (ASR), and annualized Calmar ratio (ACR). AR measures compound annual portfolio growth over a period, with a higher AR indicating higher profitability. ASR measures volatility-adjusted return, with a higher ASR reflecting better risk–return balancing. ACR is a risk–return balancing metric similar to ASR that measures return adjusted by the maximum drawdown. We used five different random seeds $\{0, 64, 128, 256, 512\}$ for each experimental group to conduct repeated experiments and reported the mean and standard deviation of the results to reduce the randomness of the results and study the robustness of training. We plot the cumulative log return over time in the testing period for qualitative analysis.
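The three metrics can be computed from a list of per-period net returns. The exact formulas are not given in this section, so the ones below are common textbook conventions and should be read as assumptions: a zero risk-free rate, the population standard deviation for the Sharpe ratio, and drawdown measured from the running peak of cumulative wealth.

```python
import math
import statistics


def annualized_return(returns, periods_per_year):
    """Compound growth rate per year implied by the per-period returns."""
    wealth = 1.0
    for r in returns:
        wealth *= 1.0 + r
    return wealth ** (periods_per_year / len(returns)) - 1.0


def annualized_sharpe(returns, periods_per_year):
    """Mean over standard deviation, scaled by sqrt(periods per year)."""
    spread = statistics.pstdev(returns)
    if spread == 0:
        raise ValueError("Sharpe ratio is undefined for constant returns")
    return statistics.fmean(returns) / spread * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest relative fall of cumulative wealth below its running peak."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = max(worst, (peak - wealth) / peak)
    return worst


def annualized_calmar(returns, periods_per_year):
    """Annualized return divided by the maximum drawdown."""
    drawdown = max_drawdown(returns)
    if drawdown == 0:
        raise ValueError("Calmar ratio is undefined without a drawdown")
    return annualized_return(returns, periods_per_year) / drawdown
```

For daily trading periods a value such as 252 periods per year is typical, and 52 for weekly periods; these counts are conventions, not values stated in the paper.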

5.1.4. Implementation Details

Training was performed using the Adam optimizer on a single NVIDIA RTX A6000 GPU, with the learning rate set to $\alpha = 1.92 \times 10^{-5}$ and a batch size of 32. We used a model dimension of $d = 36$ and hidden dimension of $h = 16$. By default, we use an SSM layer number of $N = 2$ and sample count of $l = 50$. Cross-asset regime fusion is also applied by default.

5.2. Ablation Studies

5.2.1. Modeling Paradigm

We conducted experiments by replacing the SSM module with LSTM (Hochreiter & Schmidhuber, 1997), TCN (Lea et al., 2017), and Transformer (Vaswani et al., 2017) decoder modules with the same model dimension. As for the quantitative results, we present the AR in Table 1, ASR in Table 2, and ACR in Table 3. SSM achieves the best performance on most datasets for most values of the time-series sample count (l).
Insights into the performance of different modules can be gained based on how their performance changes with l and the stability of training performance with different random seeds. Note that when using the LSTM module, although the algorithm’s performance remains stable for most of the l, it cannot surpass our SSMs’ performance, for it has sub-optimal modeling paradigms relative to RSMs. TCN shows a larger variance in performance metrics compared to other modules when trained with different random seeds, likely due to the assignment of equal weights to each sample during learning, which can lead to overfitting on noise. The Transformer’s performance is highly unstable as l varies, especially on the HSI dataset, where its performance sharply declines as l increases because the attention mechanism tends to focus on over-long dependencies during training, affecting sensitivity toward sudden shocks.
With the RSM paradigm, the SSM module exhibits better stability as l changes and lower variance when trained with different random seeds compared to other models, indicating a balance between the extraction of consistent distribution properties and sensitivity towards sudden shocks.
Figure 2 and Figure 3 show a performance comparison of different modules from a qualitative analysis perspective. The SSM module has a stable and rapidly growing return curve in most periods compared with other modules. Specifically, as shown in Figure 3, during the COVID outbreak period, when overall market prices were falling rapidly, the SSM module did not incur excessive losses, and during periods of price rebounding, the SSM module quickly identified profit opportunities in the market, resulting in higher returns than other methods.

5.2.2. Stacked SSMs

In the ablation experiment, we analyze the role of each component in the proposed algorithm. First, we compare the effect of the SSM module with the RSM paradigm and other existing time-series modeling modules with other paradigms by replacing each layer with other modules with the same model dimension. Secondly, we study the effect of stacked SSMs by changing the number of SSM layers. Finally, we compare the performance of algorithms with and without cross-asset regime fusion to illustrate its effectiveness.
We study the relationship between different numbers of SSM layers and performance when the time-series sample count (l) varies. As for the performance metrics, we present the AR in Table 4, ASR in Table 5, and ACR in Table 6. Note that on the DJIA1d and HSI1d datasets, the choice of layer number is independent of l. This indicates that in high-frequency trading, it is only necessary to model very short-term regime shifts, so increasing l does not lead to significant changes in regime-shift information. In contrast, on the DJIA1w dataset, increasing l results in richer regime-shift information for portfolio policy generation. Therefore, using more layers for RSM modeling results in better performance. Finally, note that on the HSI1w dataset, although performance improves with three layers compared to using two SSM layers as l increases, better performance is achieved with just one layer instead. This is because the HSI1w dataset exhibits a clear mean-reversion property that can be modeled effectively with only one SSM layer. The mean-reversion nature of HSI1w is also shown in the results in Section 5.3.

5.2.3. Cross-Asset Regime Fusion

We study the relationship between performance with and without cross-asset regime fusion when the time-series sample count (l) varies. As for the performance metrics, we present the AR in Table 7, ASR in Table 8, and ACR in Table 9. In 1d high-frequency trading, cross-asset regime fusion can consistently improve performance, indicating that asset correlation and co-integration are particularly important for the generation of high-frequency portfolio policies. However, in 1w low-frequency trading scenarios, whether cross-asset regime fusion can improve performance is closely related to l. In both DJIA and HSI, although cross-asset regime fusion can enhance performance at small values of l, it may lead to a decrease in performance as l increases. This suggests that the correlation and co-integration of assets in DJIA and HSI are relatively unstable, where interference from instability characteristics within large samples leads to decreased performance.

5.3. Comparison with Other Methods

We compare our method’s performance to that of other methods, showing the AR in Table 10, ASR in Table 11, and ACR in Table 12. Overall, DRL methods generally outperform traditional methods. However, on HSI1w, mean reversion-based methods such as OLMAR (B. Li & Hoi, 2012) and RMR (Huang et al., 2016) perform better than deep learning methods, indicating a clear mean-reversion characteristic on HSI1w. The performance of previous DRL methods depends on the dataset. CNN EIIE (Jiang & Liang, 2017) and LSRE-CAAN (J. Li et al., 2023) perform well on the DJIA dataset, while RAT (Xu et al., 2021) handles the HSI datasets well. Among DRL methods, our PortRSMs achieved the best performance on almost all datasets, showing its effectiveness and robustness.

6. Conclusions

This paper presents an innovative portfolio policy network structure that effectively addresses the challenges in financial time-series modeling by combining the regime shift model (RSM) paradigm with recent advancements in deep learning techniques. The experimental results across multiple stock markets and trading frequencies confirm the superiority of the proposed approach. This paper offers new insights for the development of high-frequency trading strategies and contributes valuable perspectives to the field of financial time-series modeling.
However, we acknowledge certain limitations of the current study. Due to the varying characteristics of different financial markets and time periods, some architecture hyper-parameters of the proposed model need to be adjusted for different scenarios. This sensitivity limits the model’s direct generalization across markets.
In addition, our approach implicitly assumes that short-term asset return distributions are locally consistent within regimes and that regime shifts can be effectively captured by structural patterns in the data. This assumption is supported by prior work on volatility clustering and regime shift models (RSMs) but may not hold under all market conditions, such as in cases of highly irregular or non-stationary shocks.
To address this, future research could explore the integration of meta-learning or online learning techniques, which are capable of adapting to dynamic and heterogeneous market environments in a more automated and robust manner.
Future research could also explore the application of the proposed method in other financial markets and asset classes, as well as further optimize the model to adapt to more complex market environments.

Author Contributions

Conceptualization, B.L.; methodology, B.L.; software, B.L.; validation, B.L. and R.I.; formal analysis, B.L. and R.I.; investigation, B.L.; resources, R.I.; data curation, B.L.; writing—original draft preparation, B.L.; writing—review and editing, R.I.; visualization, B.L.; supervision, R.I. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Institute of Science Tokyo.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Dataset available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Notes

  1. The SiLU (Sigmoid Linear Unit) function is defined as $\mathrm{SiLU}(x) = x \cdot \mathrm{sigmoid}(x)$ and is known for being smooth and non-monotonic.
  2. LeakyReLU is defined as $f(x) = x$ for $x > 0$ and $f(x) = \alpha x$ for $x \le 0$, where $\alpha$ is a small constant (e.g., 0.01), allowing a small gradient when the input is negative.
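The two definitions in these notes can be written down directly. The snippet below is a small illustrative sketch in plain Python; the function names and the default `alpha=0.01` (taken from the example value in the note) are choices made here, not the authors' implementation.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, computed in a form that avoids overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def silu(x: float) -> float:
    """SiLU(x) = x * sigmoid(x)."""
    return x * sigmoid(x)


def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """LeakyReLU: x for x > 0, alpha * x otherwise."""
    return x if x > 0 else alpha * x


if __name__ == "__main__":
    for value in (-5.0, -1.0, 0.0, 1.0):
        print(value, silu(value), leaky_relu(value))
```

Evaluating `silu` at -5, -1, and 0 shows the non-monotonic behaviour mentioned in the note: the value at -1 is lower than at both -5 and 0.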

References

  1. Agarwal, A., Hazan, E., Kale, S., & Schapire, R. E. (2006, June 25–29). Algorithms for portfolio management based on the Newton method. 23rd International Conference on Machine Learning (pp. 9–16), Pittsburgh, PA, USA.
  2. Bauwens, L., Laurent, S., & Rombouts, J. V. (2006). Multivariate GARCH models: A survey. Journal of Applied Econometrics, 21(1), 79–109.
  3. Bollerslev, T., Engle, R. F., & Nelson, D. B. (1994). Chapter 49 ARCH models. In Handbook of econometrics (Vol. 4, pp. 2959–3038). Elsevier.
  4. Borodin, A., El-Yaniv, R., & Gogan, V. (2003, December 8–13). Can we learn to beat the best stock. Advances in Neural Information Processing Systems (Vol. 16), Vancouver, BC, Canada.
  5. Bustos, O., & Pomares-Quimbaya, A. (2020). Stock market movement forecast: A systematic review. Expert Systems with Applications, 156, 113464.
  6. Cai, J. (1994). A Markov model of switching-regime ARCH. Journal of Business & Economic Statistics, 12(3), 309–316.
  7. Evertsz, C. J. (1995). Fractal geometry of financial time series. Fractals, 3(3), 609–616.
  8. Gu, A., Dao, T., Ermon, S., Rudra, A., & Ré, C. (2020, December 6–12). HiPPO: Recurrent memory with optimal polynomial projections. 34th International Conference on Neural Information Processing Systems (Vol. 33, pp. 1474–1487), Vancouver, BC, Canada.
  9. Gu, A., Goel, K., & Re, C. (2022, April 25–29). Efficiently modeling long sequences with structured state spaces. International Conference on Learning Representations, Virtual.
  10. Györfi, L., Lugosi, G., & Udina, F. (2006). Nonparametric kernel-based sequential investment strategies. Mathematical Finance, 16(2), 337–357.
  11. Haas, M., Mittnik, S., & Paolella, M. S. (2004). A new approach to Markov-switching GARCH models. Journal of Financial Econometrics, 2(4), 493–530.
  12. Helmbold, D. P., Schapire, R. E., Singer, Y., & Warmuth, M. K. (1998). On-line portfolio selection using multiplicative updates. Mathematical Finance, 8(4), 325–347.
  13. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
  14. Huang, D., Zhou, J., Li, B., Hoi, S. C., & Zhou, S. (2016). Robust median reversion strategy for online portfolio selection. IEEE Transactions on Knowledge and Data Engineering, 28(9), 2480–2493.
  15. Jegadeesh, N., & Titman, S. (2023). Momentum: Evidence and insights 30 years later. Pacific-Basin Finance Journal, 82, 102202.
  16. Jiang, Z., & Liang, J. (2017, September 7–8). Cryptocurrency portfolio management with deep reinforcement learning. 2017 Intelligent Systems Conference (pp. 905–913), London, UK.
  17. King, A. (2019). Akshare. Available online: https://github.com/akfamily/akshare (accessed on 31 July 2025).
  18. Lai, Z.-R., Yang, P.-Y., Fang, L., & Wu, X. (2018). Reweighted price relative tracking system for automatic portfolio optimization. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 50(11), 4349–4361.
  19. Lea, C., Flynn, M. D., Vidal, R., Reiter, A., & Hager, G. D. (2017, July 21–26). Temporal convolutional networks for action segmentation and detection. 2017 IEEE Conference on Computer Vision and Pattern Recognition (pp. 1003–1012), Honolulu, HI, USA.
  20. LeBaron, B. (1992). Some relations between volatility and serial correlations in stock market returns. The Journal of Business, 65(2), 199–219.
  21. Li, B., & Hoi, S. C. (2012, June 26–July 1). On-line portfolio selection with moving average reversion. 29th International Conference on Machine Learning (pp. 273–280), Edinburgh, Scotland.
  22. Li, B., Hoi, S. C., & Gopalkrishnan, V. (2011a). CORN: Correlation-driven nonparametric learning approach for portfolio selection. ACM Transactions on Intelligent Systems and Technology, 2(3), 1–29.
  23. Li, B., Hoi, S. C., Zhao, P., & Gopalkrishnan, V. (2011b, April 11–13). Confidence weighted mean reversion strategy for on-line portfolio selection. 14th International Conference on Artificial Intelligence and Statistics (Vol. 15, pp. 434–442), Fort Lauderdale, FL, USA.
  24. Li, B., Zhao, P., Hoi, S. C., & Gopalkrishnan, V. (2012). PAMR: Passive aggressive mean reversion strategy for portfolio selection. Machine Learning, 87(2), 221–258.
  25. Li, J., Zhang, Y., Yang, X., & Chen, L. (2023). Online portfolio management via deep reinforcement learning with high-frequency data. Information Processing & Management, 60(3), 103247.
  26. Li, X., Cui, C., Cao, D., Du, J., & Zhang, C. (2022, May 23–27). Hypergraph-based reinforcement learning for stock portfolio selection. IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 4028–4032), Singapore.
  27. Markowitz, H. (1952). Portfolio selection. The Journal of Finance, 7(1), 77–91.
  28. Ni, L.-P., Ni, Z.-W., & Gao, Y.-Z. (2011). Stock trend prediction based on fractal feature selection and support vector machine. Expert Systems with Applications, 38(5), 5569–5576.
  29. Peters, E. E. (1989). Fractal structure in the capital markets. Financial Analysts Journal, 45(4), 32–37.
  30. Schiff, Y., Kao, C.-H., Gokaslan, A., Dao, T., Gu, A., & Kuleshov, V. (2024, July 21–27). Caduceus: Bi-directional equivariant long-range DNA sequence modeling. 41st International Conference on Machine Learning (Vol. 235, pp. 43632–43648), Vienna, Austria.
  31. Shi, S., Li, J., Li, G., Pan, P., Chen, Q., & Sun, Q. (2022). GPM: A graph convolutional network based reinforcement learning framework for portfolio management. Neurocomputing, 498, 14–27.
  32. Shiller, R. J. (1990). Market volatility and investor behavior. The American Economic Review, 80(2), 58–62.
  33. Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., & Riedmiller, M. (2014, June 21–26). Deterministic policy gradient algorithms. 31st International Conference on Machine Learning (pp. 387–395), Beijing, China.
  34. So, M. K. P., Lam, K., & Li, W. K. (1998). A stochastic volatility model with Markov switching. Journal of Business & Economic Statistics, 16(2), 244–253.
  35. Soleymani, F., & Paquet, E. (2021). Deep graph convolutional reinforcement learning for financial portfolio management—DeepPocket. Expert Systems with Applications, 182, 115127.
  36. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017, December 4–9). Attention is all you need. International Conference on Neural Information Processing Systems (pp. 6000–6010), Long Beach, CA, USA.
  37. Wang, Z., Huang, B., Tu, S., Zhang, K., & Xu, L. (2021, February 2–9). DeepTrader: A deep reinforcement learning approach for risk-return balanced portfolio management with market conditions embedding. AAAI Conference on Artificial Intelligence (pp. 643–650), Virtual.
  38. Xu, K., Zhang, Y., Ye, D., Zhao, P., & Tan, M. (2021, January 7–15). Relation-aware transformer for portfolio policy learning. 29th International Conference on International Joint Conferences on Artificial Intelligence (pp. 4647–4653), Yokohama, Japan.
  39. Yahoo. (2025). Yahoo finance—Stock market live, quotes, business & finance news. Available online: https://finance.yahoo.com/ (accessed on 31 July 2025).

Figure 1. Data flow diagram of PortRSMs at timestamp $t$ and trading period $k$ ($t = kT$, as defined in Section 3.2). PortRSMs takes a sample $P$ of the asset price time series as input. To model RSMs, SSMs are used to simulate the transition of state $u$. The output ($y$) of each SSM layer is used as the input of the next layer, and the output of each layer is fused across assets through the HGAM. The output of the last layer, after concatenation with the existing portfolio weights ($w_{k-1}$), generates the new portfolio weights ($w_k$) through the output projection and softmax functions.
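The last step described in the Figure 1 caption (concatenate with the existing weights, project, apply softmax) can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the authors' implementation: it assumes one feature vector per asset from the last layer, a single shared linear projection to a scalar score, and no separate cash asset. The names `output_head`, `features`, `proj`, and `bias` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def output_head(
    features: Sequence[Sequence[float]],  # one feature vector per asset (assumed shape)
    prev_weights: Sequence[float],        # existing portfolio weights w_{k-1}
    proj: Sequence[float],                # hypothetical projection of length d + 1
    bias: float = 0.0,
) -> List[float]:
    """Concatenate each asset's features with its previous weight, project, then softmax."""
    if len(features) != len(prev_weights):
        raise ValueError("features and prev_weights must cover the same assets")
    scores = []
    for h, w_prev in zip(features, prev_weights):
        z = list(h) + [w_prev]  # concatenation with w_{k-1}
        if len(z) != len(proj):
            raise ValueError("projection length must equal feature size + 1")
        scores.append(sum(p * v for p, v in zip(proj, z)) + bias)
    return softmax(scores)  # new weights w_k: positive and summing to 1


if __name__ == "__main__":
    feats = [[0.2, -0.1], [0.5, 0.3], [0.0, 0.0]]
    w_prev = [0.3, 0.3, 0.4]
    print(output_head(feats, w_prev, proj=[1.0, 0.5, 2.0]))
```

The softmax guarantees that the resulting weights are positive and sum to one, which is why it is a natural final layer for a long-only portfolio policy.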
Figure 2. The cumulative log return in the testing period when using different modules, with $l = 50$. The result for CRP is also shown as a benchmark.
Figure 3. The cumulative log return during the COVID outbreak period for the DJIA dataset when using different modules, with $l = 50$. The result for CRP is also shown as a benchmark.
Table 1. AR performance comparison of different modules across various datasets, using different time-series sample counts (l). Bold values indicate the best result in each group of comparisons, and underlined values indicate the suboptimal result.

| Dataset | Module | l = 20 | l = 50 | l = 100 |
|---|---|---|---|---|
| DJIA1d | LSTM | 0.104 ± 0.003 | 0.105 ± 0.004 | 0.104 ± 0.003 |
| DJIA1d | TCN | 0.099 ± 0.011 | 0.046 ± 0.036 | 0.093 ± 0.054 |
| DJIA1d | Transformer | 0.104 ± 0.001 | 0.109 ± 0.003 | 0.119 ± 0.013 |
| DJIA1d | SSMs | 0.109 ± 0.008 | 0.121 ± 0.012 | 0.127 ± 0.013 |
| DJIA1w | LSTM | 0.124 ± 0.006 | 0.127 ± 0.008 | 0.131 ± 0.009 |
| DJIA1w | TCN | 0.158 ± 0.022 | 0.171 ± 0.035 | 0.298 ± 0.078 |
| DJIA1w | Transformer | 0.132 ± 0.010 | 0.123 ± 0.003 | 0.156 ± 0.029 |
| DJIA1w | SSMs | 0.161 ± 0.015 | 0.170 ± 0.026 | 0.157 ± 0.017 |
| HSI1d | LSTM | 0.218 ± 0.034 | 0.175 ± 0.036 | 0.201 ± 0.085 |
| HSI1d | TCN | 0.236 ± 0.020 | 0.211 ± 0.054 | 0.160 ± 0.086 |
| HSI1d | Transformer | 0.232 ± 0.034 | 0.179 ± 0.020 | 0.112 ± 0.052 |
| HSI1d | SSMs | 0.296 ± 0.088 | 0.220 ± 0.029 | 0.247 ± 0.122 |
| HSI1w | LSTM | 0.225 ± 0.012 | 0.231 ± 0.015 | 0.233 ± 0.018 |
| HSI1w | TCN | 0.215 ± 0.047 | 0.245 ± 0.048 | 0.183 ± 0.087 |
| HSI1w | Transformer | 0.251 ± 0.023 | 0.180 ± 0.071 | 0.145 ± 0.064 |
| HSI1w | SSMs | 0.292 ± 0.048 | 0.279 ± 0.048 | 0.251 ± 0.076 |

Table 2. ASR performance comparison of different modules across various datasets, using different time-series sample counts (l). Bold values indicate the best result in each group of comparisons, and underlined values indicate the suboptimal result.

| Dataset | Module | l = 20 | l = 50 | l = 100 |
|---|---|---|---|---|
| DJIA1d | LSTM | 0.558 ± 0.006 | 0.559 ± 0.010 | 0.557 ± 0.007 |
| DJIA1d | TCN | 0.506 ± 0.067 | 0.394 ± 0.136 | 0.532 ± 0.077 |
| DJIA1d | Transformer | 0.558 ± 0.002 | 0.565 ± 0.004 | 0.586 ± 0.028 |
| DJIA1d | SSMs | 0.573 ± 0.025 | 0.613 ± 0.036 | 0.630 ± 0.040 |
| DJIA1w | LSTM | 0.621 ± 0.011 | 0.619 ± 0.008 | 0.616 ± 0.008 |
| DJIA1w | TCN | 0.532 ± 0.081 | 0.514 ± 0.055 | 0.681 ± 0.099 |
| DJIA1w | Transformer | 0.623 ± 0.003 | 0.617 ± 0.005 | 0.642 ± 0.019 |
| DJIA1w | SSMs | 0.646 ± 0.024 | 0.662 ± 0.018 | 0.665 ± 0.014 |
| HSI1d | LSTM | 0.695 ± 0.066 | 0.635 ± 0.051 | 0.676 ± 0.121 |
| HSI1d | TCN | 0.724 ± 0.044 | 0.652 ± 0.088 | 0.565 ± 0.179 |
| HSI1d | Transformer | 0.776 ± 0.040 | 0.710 ± 0.037 | 0.503 ± 0.155 |
| HSI1d | SSMs | 0.808 ± 0.142 | 0.768 ± 0.040 | 0.792 ± 0.147 |
| HSI1w | LSTM | 0.756 ± 0.013 | 0.779 ± 0.027 | 0.785 ± 0.027 |
| HSI1w | TCN | 0.718 ± 0.128 | 0.714 ± 0.062 | 0.613 ± 0.214 |
| HSI1w | Transformer | 0.854 ± 0.049 | 0.679 ± 0.193 | 0.537 ± 0.136 |
| HSI1w | SSMs | 0.868 ± 0.060 | 0.899 ± 0.065 | 0.793 ± 0.113 |

Table 3. ACR performance comparison of different modules across various datasets, using different time-series sample counts (l). Bold values indicate the best result in each group of comparisons, and underlined values indicate the suboptimal result.

| Dataset | Module | l = 20 | l = 50 | l = 100 |
|---|---|---|---|---|
| DJIA1d | LSTM | 0.307 ± 0.004 | 0.307 ± 0.006 | 0.306 ± 0.004 |
| DJIA1d | TCN | 0.235 ± 0.073 | 0.164 ± 0.122 | 0.325 ± 0.119 |
| DJIA1d | Transformer | 0.305 ± 0.001 | 0.309 ± 0.002 | 0.321 ± 0.015 |
| DJIA1d | SSMs | 0.313 ± 0.015 | 0.345 ± 0.025 | 0.354 ± 0.029 |
| DJIA1w | LSTM | 0.387 ± 0.017 | 0.388 ± 0.014 | 0.385 ± 0.066 |
| DJIA1w | TCN | 0.364 ± 0.139 | 0.393 ± 0.116 | 0.724 ± 0.279 |
| DJIA1w | Transformer | 0.405 ± 0.011 | 0.399 ± 0.007 | 0.473 ± 0.079 |
| DJIA1w | SSMs | 0.499 ± 0.075 | 0.555 ± 0.101 | 0.515 ± 0.066 |
| HSI1d | LSTM | 0.567 ± 0.102 | 0.488 ± 0.058 | 0.560 ± 0.182 |
| HSI1d | TCN | 0.590 ± 0.08 | 0.488 ± 0.113 | 0.431 ± 0.210 |
| HSI1d | Transformer | 0.661 ± 0.069 | 0.604 ± 0.060 | 0.354 ± 0.181 |
| HSI1d | SSMs | 0.728 ± 0.217 | 0.688 ± 0.068 | 0.762 ± 0.245 |
| HSI1w | LSTM | 0.804 ± 0.053 | 0.884 ± 0.058 | 0.881 ± 0.060 |
| HSI1w | TCN | 0.676 ± 0.262 | 0.688 ± 0.131 | 0.478 ± 0.225 |
| HSI1w | Transformer | 0.979 ± 0.112 | 0.588 ± 0.252 | 0.346 ± 0.158 |
| HSI1w | SSMs | 1.189 ± 0.142 | 1.282 ± 0.182 | 0.830 ± 0.160 |

Table 4. Performance comparison of different numbers of stacked SSM layers across various datasets, using different time-series sample counts (l), as evaluated by AR. Bold values indicate the best result in each group of comparisons, and underlined values indicate the suboptimal result.

| Dataset | Layer Num | l = 20 | l = 50 | l = 100 |
|---|---|---|---|---|
| DJIA1d | 1 | 0.100 ± 0.010 | 0.110 ± 0.014 | 0.114 ± 0.011 |
| DJIA1d | 2 | 0.109 ± 0.008 | 0.121 ± 0.012 | 0.127 ± 0.013 |
| DJIA1d | 3 | 0.107 ± 0.005 | 0.116 ± 0.015 | 0.125 ± 0.019 |
| DJIA1w | 1 | 0.140 ± 0.003 | 0.156 ± 0.006 | 0.164 ± 0.008 |
| DJIA1w | 2 | 0.161 ± 0.015 | 0.170 ± 0.026 | 0.157 ± 0.017 |
| DJIA1w | 3 | 0.142 ± 0.030 | 0.187 ± 0.052 | 0.190 ± 0.048 |
| HSI1d | 1 | 0.273 ± 0.060 | 0.186 ± 0.022 | 0.181 ± 0.022 |
| HSI1d | 2 | 0.296 ± 0.088 | 0.220 ± 0.029 | 0.247 ± 0.122 |
| HSI1d | 3 | 0.355 ± 0.096 | 0.262 ± 0.082 | 0.317 ± 0.164 |
| HSI1w | 1 | 0.284 ± 0.007 | 0.341 ± 0.020 | 0.315 ± 0.031 |
| HSI1w | 2 | 0.292 ± 0.048 | 0.279 ± 0.048 | 0.251 ± 0.076 |
| HSI1w | 3 | 0.282 ± 0.040 | 0.294 ± 0.071 | 0.272 ± 0.029 |

Table 5. Performance comparison of different numbers of stacked SSM layers across various datasets, using different time-series sample counts (l), as evaluated by ASR. Bold values indicate the best result in each group of comparisons, and underlined values indicate the suboptimal result.

| Dataset | Layer Num | l = 20 | l = 50 | l = 100 |
|---|---|---|---|---|
| DJIA1d | 1 | 0.559 ± 0.004 | 0.582 ± 0.014 | 0.589 ± 0.020 |
| DJIA1d | 2 | 0.573 ± 0.025 | 0.613 ± 0.036 | 0.630 ± 0.040 |
| DJIA1d | 3 | 0.569 ± 0.014 | 0.596 ± 0.046 | 0.614 ± 0.054 |
| DJIA1w | 1 | 0.635 ± 0.008 | 0.654 ± 0.006 | 0.661 ± 0.006 |
| DJIA1w | 2 | 0.646 ± 0.024 | 0.662 ± 0.018 | 0.665 ± 0.014 |
| DJIA1w | 3 | 0.639 ± 0.037 | 0.658 ± 0.026 | 0.670 ± 0.023 |
| HSI1d | 1 | 0.763 ± 0.093 | 0.713 ± 0.043 | 0.685 ± 0.036 |
| HSI1d | 2 | 0.808 ± 0.142 | 0.768 ± 0.040 | 0.792 ± 0.147 |
| HSI1d | 3 | 0.948 ± 0.141 | 0.818 ± 0.105 | 0.880 ± 0.213 |
| HSI1w | 1 | 0.859 ± 0.012 | 0.950 ± 0.033 | 0.886 ± 0.041 |
| HSI1w | 2 | 0.868 ± 0.060 | 0.899 ± 0.065 | 0.793 ± 0.113 |
| HSI1w | 3 | 0.856 ± 0.069 | 0.905 ± 0.052 | 0.827 ± 0.041 |

Table 6. Performance comparison of different numbers of stacked SSM layers across various datasets, using different time-series sample counts (l), as evaluated by ACR. Bold values indicate the best result in each group of comparisons, and underlined values indicate the suboptimal result.

| Dataset | Layer Num | l = 20 | l = 50 | l = 100 |
|---|---|---|---|---|
| DJIA1d | 1 | 0.309 ± 0.004 | 0.310 ± 0.012 | 0.316 ± 0.010 |
| DJIA1d | 2 | 0.313 ± 0.015 | 0.345 ± 0.025 | 0.354 ± 0.029 |
| DJIA1d | 3 | 0.314 ± 0.006 | 0.336 ± 0.032 | 0.343 ± 0.041 |
| DJIA1w | 1 | 0.428 ± 0.016 | 0.483 ± 0.014 | 0.508 ± 0.026 |
| DJIA1w | 2 | 0.499 ± 0.075 | 0.555 ± 0.101 | 0.515 ± 0.066 |
| DJIA1w | 3 | 0.476 ± 0.089 | 0.604 ± 0.190 | 0.611 ± 0.169 |
| HSI1d | 1 | 0.633 ± 0.128 | 0.603 ± 0.080 | 0.553 ± 0.059 |
| HSI1d | 2 | 0.728 ± 0.217 | 0.688 ± 0.068 | 0.762 ± 0.245 |
| HSI1d | 3 | 0.914 ± 0.210 | 0.780 ± 0.186 | 0.908 ± 0.306 |
| HSI1w | 1 | 1.159 ± 0.047 | 1.479 ± 0.088 | 1.206 ± 0.178 |
| HSI1w | 2 | 1.189 ± 0.142 | 1.282 ± 0.182 | 0.830 ± 0.178 |
| HSI1w | 3 | 1.209 ± 0.220 | 1.291 ± 0.278 | 0.979 ± 0.200 |

Table 7. Performance comparison (AR) of algorithms with and without cross-asset regime fusion across various datasets and time-series sample counts (l). Bold values indicate the best result in each comparison group. (✔) means with cross-asset regime fusion, and (×) means without.

| Dataset | Fusion | l = 20 | l = 50 | l = 100 |
|---|---|---|---|---|
| DJIA1d | × | 0.100 ± 0.004 | 0.103 ± 0.007 | 0.112 ± 0.005 |
| DJIA1d | ✔ | 0.109 ± 0.008 | 0.121 ± 0.012 | 0.127 ± 0.013 |
| DJIA1w | × | 0.145 ± 0.006 | 0.158 ± 0.003 | 0.166 ± 0.006 |
| DJIA1w | ✔ | 0.161 ± 0.015 | 0.170 ± 0.026 | 0.157 ± 0.017 |
| HSI1d | × | 0.192 ± 0.033 | 0.171 ± 0.008 | 0.171 ± 0.006 |
| HSI1d | ✔ | 0.296 ± 0.088 | 0.220 ± 0.029 | 0.247 ± 0.122 |
| HSI1w | × | 0.244 ± 0.015 | 0.305 ± 0.015 | 0.257 ± 0.016 |
| HSI1w | ✔ | 0.292 ± 0.048 | 0.279 ± 0.048 | 0.251 ± 0.076 |

Table 8. Performance comparison (ASR) of algorithms with and without cross-asset regime fusion across various datasets and time-series sample counts (l). Bold values indicate the best result in each comparison group. (✔) means with cross-asset regime fusion, and (×) means without.

| Dataset | Fusion | l = 20 | l = 50 | l = 100 |
|---|---|---|---|---|
| DJIA1d | × | 0.557 ± 0.002 | 0.563 ± 0.010 | 0.578 ± 0.013 |
| DJIA1d | ✔ | 0.573 ± 0.025 | 0.613 ± 0.036 | 0.630 ± 0.040 |
| DJIA1w | × | 0.642 ± 0.006 | 0.661 ± 0.004 | 0.668 ± 0.004 |
| DJIA1w | ✔ | 0.646 ± 0.024 | 0.662 ± 0.018 | 0.665 ± 0.014 |
| HSI1d | × | 0.647 ± 0.041 | 0.678 ± 0.013 | 0.655 ± 0.015 |
| HSI1d | ✔ | 0.808 ± 0.142 | 0.768 ± 0.040 | 0.792 ± 0.147 |
| HSI1w | × | 0.816 ± 0.019 | 0.909 ± 0.019 | 0.836 ± 0.026 |
| HSI1w | ✔ | 0.868 ± 0.060 | 0.899 ± 0.065 | 0.793 ± 0.113 |

Table 9. Performance comparison (ACR) of algorithms with and without cross-asset regime fusion across various datasets and time-series sample counts (l). Bold values indicate the best result in each comparison group. (✔) means with cross-asset regime fusion, and (×) means without.

| Dataset | Fusion | l = 20 | l = 50 | l = 100 |
|---|---|---|---|---|
| DJIA1d | × | 0.308 ± 0.001 | 0.307 ± 0.002 | 0.308 ± 0.007 |
| DJIA1d | ✔ | 0.313 ± 0.015 | 0.345 ± 0.025 | 0.354 ± 0.029 |
| DJIA1w | × | 0.446 ± 0.019 | 0.474 ± 0.049 | 0.524 ± 0.019 |
| DJIA1w | ✔ | 0.499 ± 0.075 | 0.555 ± 0.101 | 0.515 ± 0.066 |
| HSI1d | × | 0.485 ± 0.063 | 0.557 ± 0.024 | 0.521 ± 0.024 |
| HSI1d | ✔ | 0.728 ± 0.217 | 0.688 ± 0.068 | 0.762 ± 0.245 |
| HSI1w | × | 1.006 ± 0.044 | 1.311 ± 0.062 | 0.963 ± 0.086 |
| HSI1w | ✔ | 1.189 ± 0.142 | 1.282 ± 0.182 | 0.830 ± 0.160 |

Table 10. Performance comparison (AR) of different methods across various datasets. The best and second best results are marked by bold and underlined values, respectively.

| Method | DJIA1d | DJIA1w | HSI1d | HSI1w |
|---|---|---|---|---|
| CRP | 0.104 | 0.104 | 0.021 | 0.012 |
| BAH | 0.093 | 0.092 | 0.014 | 0.007 |
| EG | 0.104 | 0.104 | 0.020 | 0.012 |
| OLMAR | −0.102 | −0.042 | −0.240 | 0.316 |
| RMR | −0.211 | 0.024 | −0.229 | 0.508 |
| BNN | 0.079 | −0.171 | −0.095 | −0.171 |
| CORN | −0.075 | −0.069 | −0.184 | −0.314 |
| CNN EIIE | 0.114 ± 0.012 | 0.138 ± 0.028 | 0.082 ± 0.065 | 0.130 ± 0.094 |
| bRNN EIIE | 0.097 ± 0.006 | 0.122 ± 0.003 | 0.043 ± 0.013 | 0.107 ± 0.040 |
| RAT | 0.103 ± 0.002 | 0.124 ± 0.003 | 0.149 ± 0.013 | 0.215 ± 0.045 |
| HGAM | 0.103 ± 0.000 | 0.170 ± 0.056 | 0.078 ± 0.061 | 0.241 ± 0.110 |
| LSRE-CAAN | 0.119 ± 0.000 | 0.102 ± 0.029 | 0.011 ± 0.007 | 0.009 ± 0.000 |
| PortRSMs | 0.121 ± 0.012 | 0.170 ± 0.026 | 0.220 ± 0.029 | 0.279 ± 0.048 |

Table 11. Performance comparison (ASR) of different methods across various datasets. The best and second best results are marked by bold and underlined values, respectively.

| Method | DJIA1d | DJIA1w | HSI1d | HSI1w |
|---|---|---|---|---|
| CRP | 0.561 | 0.595 | 0.204 | 0.156 |
| BAH | 0.520 | 0.551 | 0.171 | 0.134 |
| EG | 0.559 | 0.593 | 0.202 | 0.154 |
| OLMAR | −0.034 | 0.177 | −0.262 | 0.815 |
| RMR | −0.353 | 0.297 | −0.225 | 1.116 |
| BNN | 0.388 | −0.328 | 0.059 | −0.216 |
| CORN | −0.092 | −0.108 | −0.185 | −0.542 |
| CNN EIIE | 0.584 ± 0.022 | 0.635 ± 0.032 | 0.398 ± 0.193 | 0.523 ± 0.287 |
| bRNN EIIE | 0.557 ± 0.001 | 0.625 ± 0.005 | 0.308 ± 0.056 | 0.516 ± 0.081 |
| RAT | 0.560 ± 0.001 | 0.620 ± 0.002 | 0.644 ± 0.041 | 0.805 ± 0.102 |
| HGAM | 0.559 ± 0.000 | 0.669 ± 0.034 | 0.428 ± 0.216 | 0.814 ± 0.176 |
| LSRE-CAAN | 0.564 ± 0.000 | 0.508 ± 0.094 | 0.182 ± 0.004 | 0.149 ± 0.000 |
| PortRSMs | 0.613 ± 0.036 | 0.662 ± 0.018 | 0.768 ± 0.040 | 0.899 ± 0.065 |

Table 12. Performance comparison (ACR) of different methods across various datasets. The best and second best results are marked by bold and underlined values, respectively.

| Method | DJIA1d | DJIA1w | HSI1d | HSI1w |
|---|---|---|---|---|
| CRP | 0.310 | 0.332 | 0.067 | 0.043 |
| BAH | 0.279 | 0.297 | 0.039 | 0.023 |
| EG | 0.309 | 0.330 | 0.065 | 0.042 |
| OLMAR | −0.149 | 0.064 | −0.294 | 0.963 |
| RMR | −0.304 | 0.036 | −0.286 | 1.305 |
| BNN | 0.135 | −0.239 | 0.124 | −0.235 |
| CORN | −0.154 | −0.118 | −0.240 | −0.377 |
| CNN EIIE | 0.312 ± 0.005 | 0.462 ± 0.070 | 0.268 ± 0.206 | 0.452 ± 0.326 |
| bRNN EIIE | 0.309 ± 0.002 | 0.395 ± 0.004 | 0.150 ± 0.047 | 0.399 ± 0.069 |
| RAT | 0.308 ± 0.001 | 0.402 ± 0.004 | 0.514 ± 0.045 | 0.856 ± 0.209 |
| HGAM | 0.308 ± 0.000 | 0.578 ± 0.191 | 0.274 ± 0.204 | 0.870 ± 0.316 |
| LSRE-CAAN | 0.307 ± 0.000 | 0.292 ± 0.078 | 0.062 ± 0.022 | 0.025 ± 0.000 |
| PortRSMs | 0.345 ± 0.025 | 0.555 ± 0.101 | 0.688 ± 0.068 | 1.282 ± 0.182 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liu, B.; Ichise, R. PortRSMs: Learning Regime Shifts for Portfolio Policy. J. Risk Financial Manag. 2025, 18, 434. https://doi.org/10.3390/jrfm18080434
