Trading Strategy for Market Situation Estimation Based on Hidden Markov Model

Abstract: Determining the states of the market and the scientific laws governing transitions between these states is an important subject in financial mathematics. Trading strategies formulated from the results of market situation estimation can earn profits in the market through machine trading. The market situation is mainly divided into three types, bull market, mixed market and bear market, and can be further subdivided into more types. Compared with traditional linear models, using the hidden Markov model (HMM) to estimate the market situation is not restricted by linearity assumptions. In this paper, we first use an HMM to model the market situation, perform feature analysis on the observation features input to the model, and estimate the three market situations; we then propose the Markov situation estimation trading strategy. On this basis, we divide the market situation more finely and increase the number of hidden states in the model. Experiments verify that this method can improve the profitability of the strategy.


Introduction
Theoretically, a stock's price is determined by its value, but in the stock market many factors affect the price, such as political, economic, military, industry, company and investor-psychology factors. Because of these many influences, precisely predicting the financial market is a very difficult task. Although the market cannot be predicted exactly, much recent literature has studied the trend of financial market price changes, the states of the market, and the scientific laws governing transitions between these states. The hope is that applying these studies to machine trading will make it possible to earn profits in the market. Abad-Segura et al. [1] reviewed the evolution of scientific production on financial transactions in a global context.
Previous studies on the law of market price fluctuations were mainly based on traditional statistical methods, such as the moving average model [2] and the autoregressive moving average model [3]. The moving average model averages the corresponding market indexes over a period of time, connects these averages into a moving average line, and then uses it to analyze the market and study the law of price changes. The autoregressive moving average model is a hybrid model that combines the autoregressive model and the moving average model; building on the moving average model, it adds parameter estimation to improve accuracy. These traditional models all rely on the linearity of the data, whereas market data are often nonlinear, so they are restricted by this nonlinearity when used to study the law of market prices.
In recent years, many scholars have studied quantitative trading strategies. In 2018, Wen et al. [4] systematically studied the characteristics of trading strategies from a network perspective and analyzed the Chinese stock market. Yang et al. [5] used Pearson's correlation coefficient to calculate the correlation of market data and established a trading strategy based on the capital flow model. In 2019, Pierre et al. [6] used data obtained from Google's website to study investors' interest in the stocks listed in the Dow Jones Index and formulated corresponding trading strategies. Li et al. [7] used deep reinforcement learning to study financial markets and established corresponding trading strategies. In 2020, Thibaut et al. [8] used deep reinforcement learning to study the optimal timing of market transactions and formulated corresponding market timing strategies. The hidden Markov model (HMM), a tool for time series problems, does not depend on whether the data are linear when analyzing market states and the laws of transition between them. Since Hassan and Nath [9] began using the hidden Markov model to study stock market changes in 2005, more and more scholars have used it to analyze stock market behavior, and it is increasingly used in the study of trading strategies. In 2013, Tuyen [10] applied a hidden Markov model with normally distributed emissions to historical VN-Index data to find the most suitable Markov model for the data. In 2015, Nguyen Nguyet et al. [11] used the hidden Markov model to predict the hidden states of observed market data, selected stocks based on the predicted states, and proposed a stock portfolio plan using this method. In 2016, Holzmann et al. [12] tested the number of states in the hidden Markov model and found that the highest-volatility state corresponds to the recent financial crisis.
In 2017, Liu et al. [13] used a hidden Markov model with three hidden states to explain the time-varying distribution of Chinese stock market returns since 2005. In 2018, Fu and Wu et al. [14] conducted a quantitative timing study on the CSI 300 Index based on the hidden Markov model, successfully identified the market status, and obtained good results. In the same year, Chen and Xia [15] proposed a temporal probabilistic method called the piecewise continuous hidden Markov model to address commodity classification in the market. Cheng et al. [16] used the hidden Markov model to identify investor sentiment in China's A-share market and formulated corresponding trading strategies. In 2019, Huang et al. [17] studied the non-homogeneous hidden Markov model and proposed an improved EM algorithm to reveal the different patterns of bull and bear markets. Constandina et al. [18] used HMMs to study the statistical and financial properties of cryptocurrencies. Eun-Chong Kim et al. [19] used HMMs to identify the phases of individual assets and proposed an investment strategy that uses price trends effectively. Shi et al. [20] pointed out that applying conventional econometric models to prediction can incur significant errors; they used a series of machine learning models to predict the prices of two representative indexes and a Hong Kong stock, and evaluated and compared the prediction performance of the different models. In 2020, Liu et al. [21] established a hidden Markov-modulated jump-diffusion model in a regime-switching economy and studied option pricing under regime-switching risk.
In this paper, we first divide the market situation into three types, bull market, mixed market and bear market, and establish a hidden Markov model for state estimation to solve the problem of market situation estimation. We then propose the Markov situation estimation trading strategy. On this basis, we divide the market situation more finely and increase the number of hidden states in the model. Experiments verify that this method can improve the profitability of the strategy.

Model and Method
The hidden Markov model is a statistical model that describes a Markov process with unknown parameters. The model contains two random sequences: the unobservable hidden sequence I and the observable sequence O. The hidden sequence I is a random sequence that affects the observation sequence O but cannot be directly observed or acquired; when studying market states and behavior, the state of the market is the hidden sequence. The observable sequence O is a random sequence, generated under the influence of the hidden sequence I, that can be directly observed or acquired; when studying market states and behavior, the various historical data available from the market form the observable sequence. For a sequence of length T, the hidden sequence I has the following form, where each state $i_t$ takes one of N possible values $q_1, \dots, q_N$:
$$I = (i_1, i_2, \dots, i_T), \quad i_t \in Q = \{q_1, q_2, \dots, q_N\}$$
Similarly, each element $o_t$ of the observable sequence O takes one of M possible values $v_1, \dots, v_M$:
$$O = (o_1, o_2, \dots, o_T), \quad o_t \in V = \{v_1, v_2, \dots, v_M\}$$
In an HMM, a hidden Markov chain randomly generates the hidden sequence I, and each state of the hidden sequence then generates an observation, yielding the observation sequence O. Figure 1 shows the relationship between the hidden and observable sequences in an HMM. An HMM is specified by the initial probability distribution π, the state transition matrix A and the emission matrix B.
Here the initial probability distribution π gives the distribution of the initial hidden state:
$$\pi = (\pi_i)_{N}, \quad \pi_i = P(i_1 = q_i), \quad i = 1, 2, \dots, N$$
The state transition matrix A gives the probability that the hidden state is $q_j$ at time t + 1 given that it is $q_i$ at time t:
$$A = [a_{ij}]_{N \times N}, \quad a_{ij} = P(i_{t+1} = q_j \mid i_t = q_i)$$
The emission matrix B gives the probability that the observation is $v_k$ when the hidden state at time t is $q_j$:
$$B = [b_j(k)]_{N \times M}, \quad b_j(k) = P(o_t = v_k \mid i_t = q_j)$$
An HMM rests on two important assumptions. The first is the homogeneous Markov assumption: the hidden random sequence satisfies the Markov property, so the hidden state at any time depends only on the hidden state at the previous time and is independent of other factors. The second is the observation independence assumption: at any time, the observation depends only on the hidden state at that time and is independent of other factors.
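The generative process described above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' code: the function name `sample_hmm` and the values chosen for π, A and B are hypothetical, picked only to show where each of the two assumptions enters the sampling loop.

```python
import numpy as np


def sample_hmm(pi, A, B, T, seed=0):
    """Draw a hidden sequence I and an observation sequence O of length T
    from a discrete HMM with parameters lambda = (pi, A, B)."""
    rng = np.random.default_rng(seed)
    N, M = B.shape
    states = np.empty(T, dtype=int)
    obs = np.empty(T, dtype=int)
    states[0] = rng.choice(N, p=pi)  # i_1 is drawn from pi
    for t in range(T):
        if t > 0:
            # Homogeneous Markov assumption: i_t depends only on i_{t-1}
            states[t] = rng.choice(N, p=A[states[t - 1]])
        # Observation independence: o_t depends only on i_t
        obs[t] = rng.choice(M, p=B[states[t]])
    return states, obs


# Hypothetical parameters: N = 3 hidden states, M = 3 observation values
pi = np.array([0.3, 0.4, 0.3])
A = np.array([[0.80, 0.15, 0.05],
              [0.10, 0.80, 0.10],
              [0.05, 0.15, 0.80]])
B = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
states, obs = sample_hmm(pi, A, B, T=10)
```

With a long sample, the empirical transition frequencies of `states` approach the rows of A, which is a quick sanity check on the sampler.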

HMM Stock Market Modeling
Most studies on the market situation divide the market into three states: bull market, mixed market and bear market. These states cannot be observed directly. When modeling the stock market with an HMM, we take these market states as the hidden sequence, and take the basic historical data that can be obtained directly, or data computed from them, as the observable sequence. Under the efficient market hypothesis, and ignoring market manipulation, we can assume that changes in stock prices depend only on the current state of the stock market; that is, current market information summarizes the historical information in the market, which is consistent with the assumptions of the hidden Markov model.
Figure 2 shows a schematic diagram of modeling the stock market with an HMM. In the figure, $i_1$, $i_2$ and $i_3$ represent the bull, mixed and bear market states respectively, and $o_1$, $o_2$ and $o_3$ represent the observable outcomes of a stock price rise, stock price volatility and a stock price fall respectively. $a_{ij}$ is the probability that the market moves from the i-th state to the j-th state, and $b_j(k)$ is the probability that the observable outcome is $o_k$ when the market is in the j-th state. The transition probability between market states is $P(i_{t+1} \mid i_t)$, giving the state transition matrix $A = [a_{ij}]$; the emission probability from market state to stock price change is $P(o_t \mid i_t)$, giving the emission matrix $B = [b_j(k)]$. Together with the initial probability distribution π of market states, these fully specify the HMM, and the model parameters λ = (π, A, B) can be trained on the market's historical data. Once the parameters are obtained, the market situation can be estimated and an automated trading strategy can be constructed according to the situation.
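Once λ = (π, A, B) is known, the most likely sequence of market states behind a sequence of observed outcomes can be found with the Viterbi algorithm. The sketch below assumes the discrete three-state, three-outcome model of Figure 2; the probabilities and the names `STATES`, `OBS` and `viterbi` are made-up placeholders, not values trained on CSI 300 data.

```python
import numpy as np

STATES = ["bull", "mixed", "bear"]    # hidden states i_1, i_2, i_3
OBS = ["rise", "volatile", "fall"]    # observable outcomes o_1, o_2, o_3

# Hypothetical parameters, for illustration only
pi = np.array([0.45, 0.35, 0.20])
A = np.array([[0.72, 0.20, 0.08],
              [0.26, 0.47, 0.27],
              [0.09, 0.29, 0.62]])
B = np.array([[0.63, 0.28, 0.09],
              [0.23, 0.53, 0.24],
              [0.11, 0.31, 0.58]])


def viterbi(pi, A, B, obs):
    """Most likely hidden state path for a discrete HMM (log-space Viterbi)."""
    log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    T, N = len(obs), len(pi)
    delta = np.empty((T, N))             # best log-probability of a path ending in each state
    psi = np.zeros((T, N), dtype=int)    # back-pointers to the best previous state
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # scores[i, j]: move from state i to state j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):                # follow the back-pointers
        path.append(int(psi[t][path[-1]]))
    return path[::-1]


observed = [OBS.index(x) for x in ["rise", "rise", "volatile", "fall", "fall", "volatile", "rise"]]
estimated = [STATES[s] for s in viterbi(pi, A, B, observed)]
```

Working in log space avoids the numerical underflow that multiplying many small probabilities would cause on long price histories.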
The process of modeling the stock market with an HMM is shown in Figure 3. It is divided into three main stages: data preparation and feature extraction, model training, and trading strategy application.
Feature analysis in the stock market generally falls into two categories: fundamental feature analysis and technical feature analysis. Fundamental analysis mainly covers macroeconomic analysis, such as national fiscal and monetary policy, industry analysis, and company analysis, including company financial data and performance reports. Technical analysis mainly covers technical indicators such as candlestick (K-line) charts, the moving average and the exponential moving average. Different features often reflect market conditions from different aspects. Because fundamental feature data are difficult to obtain and quantify, and to improve the applicability of the model, in this paper we analyze technical features.

In the data preparation and feature extraction stage, historical stock market data are obtained through a data interface, features are extracted, and effective basic features are selected. The data obtained through the interface are generally basic data, such as the daily closing price, opening price, highest price, lowest price and trading volume. To reduce the noise these raw data introduce into modeling, we transform them. In this paper, we use three observation features in the model: the daily logarithmic difference between the daily high and low prices, the five-day logarithmic return of the index, and the five-day logarithmic difference of the index trading volume.
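The three observation features can be computed from daily high, low, close and volume series as in the following sketch. It assumes that the five-day quantities are differences between day t and day t - 5, and that rows are aligned on the days that have a full five-day history; the function name `observation_features` is hypothetical.

```python
import numpy as np


def observation_features(high, low, close, volume, lag=5):
    """Return a (T - lag, 3) array of observation features:
    ln(H_t) - ln(L_t), ln(C_t) - ln(C_{t-lag}), ln(V_t) - ln(V_{t-lag})."""
    high, low, close, volume = (np.asarray(x, dtype=float) for x in (high, low, close, volume))
    log_hl = np.log(high) - np.log(low)                      # daily high-low log spread
    log_ret = np.log(close[lag:]) - np.log(close[:-lag])     # five-day log return
    log_vol = np.log(volume[lag:]) - np.log(volume[:-lag])   # five-day log volume change
    # Drop the first `lag` days of the spread so all three columns refer to the same day t
    return np.column_stack([log_hl[lag:], log_ret, log_vol])
```

Taking logarithms before differencing puts price and volume on comparable scales, which matches the motivation given for the log-difference transformation.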

Trading Strategy Application
In the model training stage, we train the hidden Markov model on the selected observation features, with three and five hidden states respectively, and use the trained models to estimate the market situation. We assume that the emission probabilities of the model follow a Gaussian distribution, which suits the continuous-valued observation features and makes the model convenient to train.
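The training stage can be sketched as Baum-Welch (EM) estimation for an HMM with diagonal-covariance Gaussian emissions. This is a self-contained NumPy illustration under stated assumptions, not the authors' implementation: the deterministic initialization, stopping tolerance, variance floor, and the use of posterior (per-day most probable) decoding instead of Viterbi decoding are choices made for the example, and `fit_gaussian_hmm` and `decode` are hypothetical names.

```python
import numpy as np


def _log_gauss(X, means, variances):
    # Diagonal Gaussian log-density: entry [t, j] is log N(x_t; mean_j, var_j)
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * np.sum(np.log(2 * np.pi * variances)[None] + diff ** 2 / variances[None], axis=2)


def _e_step(X, pi, A, means, variances):
    """Scaled forward-backward pass: state posteriors, pair posteriors, log-likelihood."""
    log_b = _log_gauss(X, means, variances)
    shift = log_b.max(axis=1, keepdims=True)
    b = np.exp(log_b - shift)                   # emissions rescaled per day to avoid underflow
    T, N = b.shape
    alpha, beta, c = np.zeros((T, N)), np.ones((T, N)), np.zeros(T)
    alpha[0] = pi * b[0]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):                       # forward pass, normalized at every step
        alpha[t] = (alpha[t - 1] @ A) * b[t]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    for t in range(T - 2, -1, -1):              # backward pass with the same scale factors
        beta[t] = A @ (b[t + 1] * beta[t + 1]) / c[t + 1]
    gamma = alpha * beta                        # gamma[t, j] = P(i_t = q_j | O, lambda)
    xi = alpha[:-1, :, None] * A[None] * (b[1:] * beta[1:])[:, None, :] / c[1:, None, None]
    loglik = np.log(c).sum() + shift.sum()      # log P(O | lambda)
    return gamma, xi, loglik


def fit_gaussian_hmm(X, n_states, n_iter=100, tol=1e-6, min_var=1e-6):
    """Baum-Welch (EM) for an HMM with diagonal Gaussian emissions (sketch)."""
    X = np.asarray(X, dtype=float)
    T = X.shape[0]
    # Deterministic start: means taken at evenly spaced ranks of the first feature
    order = np.argsort(X[:, 0])
    picks = order[np.linspace(0, T - 1, n_states + 2)[1:-1].astype(int)]
    means = X[picks].copy()
    variances = np.tile(X.var(axis=0) + min_var, (n_states, 1))
    pi = np.full(n_states, 1.0 / n_states)
    A = np.full((n_states, n_states), 0.1 / max(n_states - 1, 1))
    np.fill_diagonal(A, 0.9)                    # sticky start: market states tend to persist
    A /= A.sum(axis=1, keepdims=True)
    history = []
    for _ in range(n_iter):
        gamma, xi, loglik = _e_step(X, pi, A, means, variances)
        history.append(loglik)
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
        # M-step: re-estimate lambda = (pi, A, B) from the expected counts
        pi = gamma[0]
        A = xi.sum(axis=0)
        A /= A.sum(axis=1, keepdims=True)
        weights = gamma.sum(axis=0)[:, None]
        means = gamma.T @ X / weights
        variances = np.maximum(gamma.T @ X ** 2 / weights - means ** 2, min_var)
    return pi, A, means, variances, history


def decode(X, pi, A, means, variances):
    """Most probable hidden state for each day (posterior decoding)."""
    gamma, _, _ = _e_step(np.asarray(X, dtype=float), pi, A, means, variances)
    return gamma.argmax(axis=1)
```

For example, `fit_gaussian_hmm(features, n_states=3)` and `fit_gaussian_hmm(features, n_states=5)` would correspond to the three- and five-state models. Because EM only finds a local optimum, the estimated states can depend on the initialization.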
In the trading strategy application stage, we formulate trading strategies based on the situation estimates obtained with different numbers of hidden states and evaluate the strategies by comparing their returns.

Experiments and Analysis
Our experiments analyze the financial market situation with an HMM. We choose CSI 300 Index data from January 2018 to January 2020 as the experimental object. The CSI 300 Index reflects the overall price changes of representative stocks with strong liquidity and large market capitalization. The Huatai Borui Shanghai and Shenzhen 300 Trading Open Index Securities Investment Fund uses the CSI 300 Index as the target for trading and for purchases and redemptions in the secondary market.

Data Acquisition
We obtained the basic data through a data interface; part of it is shown in Table 1. Because the dataset is large, the table lists only data from the beginning, middle and end of 2018 and 2019.

Analysis of Input Observation Features
In previous studies, scholars often used basic market data as the input observation features for training hidden Markov models. For example, when Cheng et al. [16] used the hidden Markov model to study the stock market, they used the daily opening price, closing price, highest price, lowest price and trading volume as the observation sequence. However, when studying stock market time series, the data can be further transformed to reflect more of the correlation between observations. In this paper, we calculate the daily logarithmic difference between the daily high and low prices, the five-day logarithmic return of the index, and the five-day logarithmic difference of the index trading volume as the observed features of the model. Differencing over five days captures more of the correlation between the data and thus helps to judge the market state. Because quantities such as price and trading volume differ greatly in magnitude, taking logarithmic differences effectively reduces the influence of this magnitude difference on the fluctuations of the observation series. To demonstrate the effectiveness of our feature extraction, we built hidden Markov models using both our method and the traditional method and backtested strategies based on their results. Figure 4 shows the returns of the different methods: our method achieves a higher overall return and a more stable cumulative return curve. The rate of return used in this paper is calculated as
$$\delta = \frac{x_t - x_{t-1}}{x_{t-1}}$$
where δ is the rate of return, $x_t$ is the closing price of the day and $x_{t-1}$ is the closing price of the previous day. Because daily changes in the closing price are usually small, the daily rate of return is a small value, so the following approximation holds:
$$\ln\frac{x_t}{x_{t-1}} = \ln(1 + \delta) \approx \delta$$
that is,
$$\delta \approx \ln x_t - \ln x_{t-1}$$
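The return definition and its logarithmic approximation can be checked numerically. The helper names and closing prices below are hypothetical; the cumulative curve compounds simple daily returns, which is one common way (assumed here) to build a cumulative return curve such as the one in Figure 4.

```python
import numpy as np


def simple_returns(close):
    close = np.asarray(close, dtype=float)
    return close[1:] / close[:-1] - 1.0                      # delta_t = (x_t - x_{t-1}) / x_{t-1}


def log_returns(close):
    return np.diff(np.log(np.asarray(close, dtype=float)))  # ln x_t - ln x_{t-1} = ln(1 + delta_t)


def cumulative_return(daily_returns):
    # Compound the daily simple returns into a cumulative return curve
    return np.cumprod(1.0 + np.asarray(daily_returns, dtype=float)) - 1.0


close = [3000.0, 3030.0, 3015.0, 3060.0]   # hypothetical closing prices
delta, approx = simple_returns(close), log_returns(close)
curve = cumulative_return(delta)
```

For daily moves of around 1%, `delta` and `approx` differ by roughly δ²/2, on the order of 10⁻⁴, which is why the approximation is adequate here.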

Situation Assessment with Three Hidden States
We first train a hidden Markov model with three hidden states on the prepared observation features and mark the states in different colors on the index curve; Figure 5 shows the resulting three situation estimates. In hidden state 0 (blue dots), the CSI 300 Index tends to rise, so the market can be regarded as a bull market. In hidden state 2 (green dots), the index tends to decline, so the market is regarded as a bear market. In hidden state 1 (orange dots), the index shows no clear upward or downward trend, so the market can be regarded as a mixed (volatile) market.
Figure 5. Index curve with hidden states.
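Above, the market labels are assigned to hidden states by inspecting Figure 5. One way to automate this step (an assumption, not the authors' procedure) is to rank the hidden states by the mean index return on the days assigned to each state; `label_states` is a hypothetical helper.

```python
import numpy as np


def label_states(states, returns, names=("bull", "mixed", "bear")):
    """Map hidden-state ids to market labels by ranking each state's mean return
    (highest mean -> names[0], lowest mean -> names[-1])."""
    states = np.asarray(states)
    returns = np.asarray(returns, dtype=float)
    ids = np.unique(states)
    if len(ids) != len(names):
        raise ValueError("expected one name per hidden state")
    mean_ret = np.array([returns[states == s].mean() for s in ids])
    ranked = ids[np.argsort(-mean_ret)]
    return {int(s): name for s, name in zip(ranked, names)}
```

Passing `names=("bull", "small bull", "mixed", "small bear", "bear")` would apply the same ranking idea to a five-state model.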

Markov Situation Assessment Trading Strategy
After obtaining the three situation estimates, we apply them to a quantitative trading strategy that goes long or short according to the market situation: when the market is in a bull market situation, buy long the next day; when it is in a bear market situation, sell short the next day; and when it is in a mixed market situation, neither buy nor sell.
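The trading rule can be expressed as a position series shifted by one day, so that the situation estimated on day t determines the position held on day t + 1. This sketch ignores transaction costs, slippage and short-selling constraints, and the names `POSITION` and `markov_strategy_returns` are hypothetical.

```python
import numpy as np

# Position taken the day after each estimated situation: long, flat or short
POSITION = {"bull": 1.0, "mixed": 0.0, "bear": -1.0}


def markov_strategy_returns(labels, daily_returns):
    """Strategy return on days 1..T-1: position from labels[t-1] times daily_returns[t]."""
    positions = np.array([POSITION[label] for label in labels], dtype=float)
    r = np.asarray(daily_returns, dtype=float)
    return positions[:-1] * r[1:]
```

Shifting by one day keeps the backtest free of look-ahead: a position never uses the return of the same day whose data produced the estimate.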
To demonstrate the effectiveness of the trading strategy developed with our method, we compare it with the double moving average strategy. This strategy trades on crossovers of moving averages over different numbers of days, which capture the relative strength or weakness of a stock at the moment: it computes an m-day and an n-day moving average and finds where the two lines intersect. With m > n, the point at which the n-day moving average crosses above the m-day moving average is the golden cross, and the point at which it crosses below is the death cross. The strategy buys long at a golden cross and sells short at a death cross. Figure 6 shows the five-day and 30-day moving averages of the CSI 300 Index from 2018 to 2019.
Figure 6. Changes in the double moving average curves.
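The golden-cross and death-cross rule can be sketched as follows, using n = 5 and m = 30 as in Figure 6 by default. The simple (equally weighted) moving average and the function names are assumptions made for illustration.

```python
import numpy as np


def moving_average(x, window):
    """Simple moving average; the first window - 1 entries are NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full(len(x), np.nan)
    csum = np.cumsum(np.insert(x, 0, 0.0))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out


def crossover_signals(close, n=5, m=30):
    """+1 on a golden cross (n-day MA crosses above the m-day MA, m > n),
    -1 on a death cross, 0 otherwise."""
    fast, slow = moving_average(close, n), moving_average(close, m)
    above = fast > slow            # False wherever the slow average is still undefined
    valid = ~np.isnan(slow)
    signals = np.zeros(len(close), dtype=int)
    for t in range(1, len(close)):
        if valid[t - 1]:
            if above[t] and not above[t - 1]:
                signals[t] = 1     # golden cross: buy long
            elif above[t - 1] and not above[t]:
                signals[t] = -1    # death cross: sell short
    return signals
```

Signals are only emitted once both averages exist, so the first m days never produce a spurious cross.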
Figure 7 compares the yield index curves of the three-hidden-state hidden Markov situation estimation trading strategy and the double moving average strategy. Overall, the three-state strategy earns more than the double moving average strategy, but its return fluctuates greatly. In particular, a large drawdown began around November 2018, and the strategy even lost money over the following three to four months. We believe that the double moving average strategy relies on the linearity of the data, whereas the hidden Markov situation estimation trading strategy accounts for the nonlinearity of the data, which is why the overall return of the double moving average strategy is lower. The sharp drawdown of the three-state strategy at the end of 2018 coincides with the world economic crisis of 2018, when market dynamics were more intense and the CSI 300 Index declined significantly overall; three hidden states are not enough to capture the market situation fully. To improve the accuracy of the hidden Markov model in estimating the market situation, we increased the number of hidden states in the model and divided the market situation at a finer granularity to capture more useful information.
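The drawdown discussed above can be measured as the largest peak-to-trough fall of the compounded equity curve. The helper below is a hypothetical sketch of that standard definition, starting from an initial capital of 1.0.

```python
import numpy as np


def max_drawdown(daily_returns):
    """Largest peak-to-trough fall of the compounded equity curve, as a fraction."""
    equity = np.cumprod(1.0 + np.asarray(daily_returns, dtype=float))
    equity = np.insert(equity, 0, 1.0)            # start from an initial capital of 1.0
    running_peak = np.maximum.accumulate(equity)  # highest equity seen so far
    return float(np.max(1.0 - equity / running_peak))
```

For example, returns of +10%, -50% and +20% give a maximum drawdown of 0.5, measured from the peak after the first day to the trough after the second.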

Trading Strategy under the Division of Fine-Grained Market Situation
To improve the profitability of the hidden Markov situation estimation trading strategy, we modeled the market with a hidden Markov model of five hidden states. Just as the three hidden states above correspond to bull, mixed and bear markets, the five hidden states correspond to bull, small bull, mixed, small bear and bear markets. This finer division of the market situation identifies market conditions more clearly and allows a trading strategy with better returns. Figure 8 compares the yield index curves of the five-hidden-state strategy and the double moving average strategy; the five-state strategy earns significantly more than the double moving average strategy. Figure 9 compares the yield index curves of the strategies with five and three hidden states. Modeling the market with five hidden states captures more useful information, and the resulting strategy has lower volatility and a higher overall yield.
Figure 9. Yield index curves of the hidden Markov situation estimation trading strategy with different numbers of hidden states.
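The five-state strategy can be sketched by mapping only the extreme states to positions, since no trades are made in the small bull, mixed and small bear states, and then summarizing a return series by total return and volatility. The annualization factor of 252 trading days and all names here are assumptions for illustration.

```python
import numpy as np

# Only the extreme states trade; small bull, mixed and small bear stay flat
FIVE_STATE_POSITION = {"bull": 1.0, "small bull": 0.0, "mixed": 0.0,
                       "small bear": 0.0, "bear": -1.0}


def five_state_strategy_returns(labels, daily_returns):
    positions = np.array([FIVE_STATE_POSITION[label] for label in labels], dtype=float)
    r = np.asarray(daily_returns, dtype=float)
    return positions[:-1] * r[1:]  # situation on day t sets the position for day t + 1


def summarize(strategy_returns, periods_per_year=252):
    """Total compounded return and annualized volatility of a daily return series."""
    r = np.asarray(strategy_returns, dtype=float)
    return {
        "total_return": float(np.prod(1.0 + r) - 1.0),
        "annualized_volatility": float(r.std(ddof=1) * np.sqrt(periods_per_year)),
    }
```

Applying `summarize` to the three-state and five-state return series gives one numeric way to compare the volatility and overall yield of the two strategies.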

Model Accuracy Analysis
Our purpose in building a hidden Markov model of the market is to estimate the market state and formulate trading strategies from it. To assess how well the model estimates market conditions, we analyzed its accuracy: we expect the index to rise on days the model estimates as bull market and to fall on days it estimates as bear market. We define the following accuracy indicators: η is the overall state estimation accuracy of the model, $\eta_1$ is the bull market state estimation accuracy and $\eta_2$ is the bear market state estimation accuracy. The bull state accuracy $\eta_1$ is the proportion of days on which the market rose among all days identified by the model as bull states:
$$\eta_1 = \frac{M_1}{N_1}$$
where $N_1$ is the number of days in the test data set that the model estimates as bull market and $M_1$ is the number of those days on which the index rose. The bear state accuracy $\eta_2$ is the proportion of days on which the market fell among all days identified by the model as bear states:
$$\eta_2 = \frac{M_2}{N_2}$$
where $N_2$ is the number of days in the test data set that the model estimates as bear market and $M_2$ is the number of those days on which the index fell. The data used to calculate accuracy are those directly relevant to the trading strategy: since no trades are made in the mixed, small bull or small bear market states, days in these states are excluded from the accuracy calculation.
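The accuracy indicators can be computed as below. The code assumes `index_returns[t]` is the index change on the day to which `labels[t]` applies, and it pools bull and bear days into the overall η; that pooled definition is an assumption, since the text defines η₁ and η₂ but not how they combine into η. The name `state_accuracy` is hypothetical.

```python
import numpy as np


def state_accuracy(labels, index_returns):
    """Return (eta, eta1, eta2): eta1 = M1 / N1 over bull days, eta2 = M2 / N2 over
    bear days, and eta pooled over both (an assumed definition)."""
    labels = np.asarray(labels)
    r = np.asarray(index_returns, dtype=float)
    bull, bear = labels == "bull", labels == "bear"   # other states are excluded
    N1, M1 = int(bull.sum()), int((r[bull] > 0).sum())
    N2, M2 = int(bear.sum()), int((r[bear] < 0).sum())
    eta1 = M1 / N1 if N1 else float("nan")
    eta2 = M2 / N2 if N2 else float("nan")
    eta = (M1 + M2) / (N1 + N2) if N1 + N2 else float("nan")
    return eta, eta1, eta2
```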
The traditional model trains its parameters using basic data such as the daily opening price, closing price, highest price, lowest price and trading volume as the observation sequence, whereas our model trains on the features extracted through feature analysis. We analyzed the accuracy of the traditional hidden Markov model with three hidden states, our hidden Markov model with three hidden states, and our hidden Markov model with five hidden states; the results are shown in Table 2. As Table 2 shows, the accuracy of all three methods exceeds 50%. Since the method uses past data to estimate the next day's market state, an accuracy above 50% on constantly changing market data suggests that the method is broadly successful. With feature extraction, our method clearly improves the accuracy of market state estimation over the traditional method. However, with five hidden states the model identifies relatively few bull and bear market days. This is because the finer division of the market places stricter requirements on classifying a day as bull or bear: the model estimates bull or bear states when the index rises or falls substantially, and classifies smaller rises or falls as small bull, small bear or mixed market states. This also explains why the trading strategy based on the five-state hidden Markov model has a more stable yield curve.
To verify that the five-state hidden Markov model better identifies bull and bear states with large fluctuations, we tightened the criteria for a true bull or bear day: the index must rise or fall by more than a certain percentage. We calculated the daily rate of change of the index in the test data set: on days the index rose, the average increase was 0.48%, and on days it fell, the average decline was 0.47%. We treat days whose rise or fall exceeds these averages as true bull or bear market days respectively and recalculated the model's accuracy; the results are shown in Table 3. As Table 3 shows, under these stricter criteria the traditional method no longer achieves good accuracy. The accuracy of our method with three hidden states is just over 50%; although the model can still be considered able to estimate the state, its performance is weak. The accuracy of our method with five hidden states reaches 53%, showing that it still performs well under the stricter criteria for true bull and bear markets.
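The stricter test can be sketched by taking the average rise and the average fall over the test set as thresholds (0.48% and 0.47% in the paper's data) and counting a bull or bear day as correct only if the index move exceeds the corresponding threshold. The function names are hypothetical.

```python
import numpy as np


def strict_thresholds(index_returns):
    """Average rise over up days and average fall magnitude over down days."""
    r = np.asarray(index_returns, dtype=float)
    return r[r > 0].mean(), -r[r < 0].mean()


def strict_accuracy(labels, index_returns):
    """eta1 and eta2 where only moves beyond the average rise or fall count as correct."""
    labels = np.asarray(labels)
    r = np.asarray(index_returns, dtype=float)
    up, down = strict_thresholds(r)
    bull, bear = labels == "bull", labels == "bear"
    eta1 = float((r[bull] > up).mean()) if bull.any() else float("nan")
    eta2 = float((r[bear] < -down).mean()) if bear.any() else float("nan")
    return eta1, eta2
```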

Conclusions
This paper models the market situation with an HMM to estimate the market state and builds a hidden Markov situation estimation trading strategy on the resulting estimates. We verified the effectiveness of the model on real market data.
The experimental data show that the overall return of the hidden Markov situation estimation trading strategy is better than that of the double moving average strategy. Estimating the market situation with the hidden Markov model is not restricted by linearity assumptions about market data; it can estimate the market situation from market data, seize opportunities and achieve profitability.
Processing the observation features input to the model to strengthen the correlation between the data improves the accuracy of the model's estimates, so that a better trading strategy can be derived from the model.
Dividing the market more finely and increasing the number of hidden states in the hidden Markov model improves the accuracy of market situation estimation. Experiments show that trading strategies based on fine-grained market situation estimates are more profitable.