A Multifactor Fuzzy Time-Series Fitting Model for Forecasting the Stock Index

Abstract: Fuzzy time series (FTS) models have received much scholarly attention for handling sequential data with incomplete and ambiguous patterns. Many conventional time series methods employ a single variable in forecasting without considering other variables that can impact stock volatility. Hence, this paper modifies the multi-period adaptive expectation model to propose a novel multifactor FTS fitting model for forecasting the stock index. After a literature review, we selected three important factors (stock index, trading volume, and the daily difference of two stock market indexes) to build the multifactor FTS fitting model. To evaluate the proposed model, three datasets were collected from the Nasdaq Stock Market (NASDAQ), the Taiwan Stock Exchange Index (TAIEX), and the Hang Seng Index (HSI), and the root mean square error (RMSE) was employed as the performance measure. The results show that the proposed model performs better than the listed models, and these findings could serve as a reference for investors.


Introduction
Humans have attempted to forecast future events based on past history since ancient times. Consequently, people have constantly endeavored to develop new technologies for making better predictions. In recent decades, numerous scholars have proposed fuzzy time series (FTS) methods for improving forecast performance. Song and Chissom [1] first presented an FTS model, and many different FTS models have subsequently been proposed for forecasting in various domains, such as university enrollment [2][3][4][5], stock indexes [6][7][8][9][10], and meteorology [11,12]. For investors, forecasting the stock index is a very important topic; substantial profits can be earned if stock trends can be predicted accurately. Many factors can affect the stock market and cause market volatility, such as global economic conditions, domestic macroeconomic factors, and highly relevant foreign stock markets. However, most time series models employ the stock index as the only factor in forecasting and ignore other important factors [13]. Intuitively, a better forecast can be obtained if more variables are considered when making predictions [14].
Since the 1990s, we have experienced many financial crises, such as the 1994 Mexican financial crisis, the 1997 Asian financial crisis, and the 2008 global financial tsunami. As a result, international financial markets have undergone numerous changes, the most critical being that large amounts of money are now free to flow between countries without government restrictions. In the context of this liberalization and globalization of international capital flow, the stock market reflects all economic activities worldwide. Consequently, using the stock price index, we can gauge the economic circumstances of a country. Finding a pattern in the fluctuations of a country's stock market would help in forecasting the future trends of that country's economy. It would also provide information for enterprises and investors to help them make decisions. Dickinson [15] showed that international stock markets can lead to fluctuations in global stock indexes. For individual investors, stocks are highly profitable financial products but involve high risk. The stock market is a dynamic environment, and obtaining precise information is crucial for investors planning their investments.
Trading volume is a fundamental factor suitable for use in long-term forecasting [16]. Many researchers have reported a relationship between stock index and trading volume [6][17][18][19][20][21][22][23][24]. Campbell et al. [18] showed the relationship between the trading volume of the stock market and the daily serial stock returns. Hiemstra and Jones [20] presented evidence of a clear bidirectional nonlinear causal relationship between returns and trading volumes. Rashid [17] showed that volume has important nonlinear explanatory power for stock returns, whereas stock returns have a linear relation with trading volumes, which indicates that knowledge of one of these factors can improve the forecast of the other. Zhu et al. [22] demonstrated that volume can enhance medium-term and long-term forecasts in neural networks. Le and Zurbruegg [23] proposed using trading volume to improve the forecasts of various autoregressive conditional heteroscedasticity (ARCH) models.
Among soft computing techniques, artificial neural networks (ANNs) are used in most cases due to their performance in nonlinear systems. To deal with nonlinear or nonstationary time series, many notable studies on adaptive neuro-fuzzy models (ANFISs) have been done by Su and Cheng [25], Stefanakos [26], and others. The motivation for this study is to use three factors to fit a linear combination of multifactor fuzzy time series to forecast the stock index, because a linear model is simple and easily explained. Accordingly, this study proposes a novel multifactor FTS model based on stock volatility to forecast a stock index. The trading volume, stock index, and interaction between two stock markets are employed as forecasting factors in the proposed model, and the optimal parameters α, β, and γ are obtained through training.
The remainder of this paper is organized as follows: Section 2 presents the related work; Section 3 outlines the proposed model; Section 4 shows the verification and comparison; Section 5 presents the findings; and Section 6 gives the conclusions.

Related Work
This section briefly reviews FTS and the causal relationship between price and volume.

Fuzzy Time Series
Fuzzy theory [27] was first employed to solve problems involving linguistic values. Song and Chissom [1,25] applied fuzzy theory to a time series model to forecast student enrollment. Soon afterward, Song and Chissom [28] also defined a first-order time-invariant FTS model to forecast enrollment. Later, Chen [4] presented an FTS model that produced a more accurate forecast than that of [1] by applying simplified arithmetic operations in the forecasting algorithm. Bose and Mali [29] reviewed 25 years (1993-2018) of fuzzy time series forecasting research in Elsevier journals. Based on the five main steps of FTS forecasting models (define the range, partition the range into intervals, fuzzify the dataset, build fuzzy logical relationships (FLRs) and FLR groups (FLRGs), and defuzzify the fuzzy values), they classified contributions into two main phases: (1) the data partitioning phase and (2) the prediction phase.
For partitioning, Chen et al. [30] proposed a new FTS forecasting method based on the proportions of intervals and particle swarm optimization (PSO) techniques; they used PSO to obtain the optimal partitions of intervals. In forecasting, an important factor affecting the performance of fuzzy time series forecasting models is the FLRs. Traditionally, rule-based methods have been used to build FLRs [29]. However, to reduce computational complexity and improve forecast accuracy, authors have used ANNs [31,32], fuzzy inference systems [33], support vector regression [34], and general regression neural networks [35] as alternatives. For multifactor FTS, Chen and Chen [36] proposed a method that leverages the fuzzy variations between the main factor and the secondary factor to forecast the TAIEX. Huarng et al. [37] also presented a multivariate heuristic model for fuzzy time-series forecasting.
The current FTS approach benefits both theoretical development and related applications, which offers wider uses. This trend indicates that the development of FTS has improved significantly. In particular, artificial intelligence is used in most forecasting problems due to its performance in nonlinear systems. For nonlinear or nonstationary time series, many studies on adaptive neuro-fuzzy models (ANFISs) have been proposed, such as those by Su and Cheng [25] and Stefanakos [26].

The Causality between Price and Volume
In financial prediction, we usually use the price as an index to predict the financial trend. In the stock market, price fluctuations affect investors' thinking and behavior. Volume is a measure of market liquidity, while index refers to the price. If a relationship exists between volume and price fluctuations, a trading volume that is higher than the average level generally represents the real trend of the price.
Recently, Tsai [38] used the average price and transaction volume of existing houses to analyze the dynamic price-volume causality in the American housing market; he applied a rolling window sample to estimate the relationship in the bootstrap Granger causality test, and the results reveal that the transaction volume tends to be informative during price rigidity. Previously, most investors paid attention to the relationship between volume and stock returns, but recent studies have begun to show a dynamic relationship (causality) between daily share returns and trading volumes, following the concept of Granger causality [39]. This relationship can be found in multinational stock markets and used to enhance the accuracy of predicting the stock price and forecasting the trend of the stock market index [40][41][42][43]. To forecast stock prices correctly, experts have studied technical analysis indicators and demonstrated their significance. The volume ratio value has been verified to be a significant technical indicator and is used in our proposed model [44].
However, to obtain better forecasts, more factors must be added to the FTS model, and the causal relationship of the FTS should be considered. High-order time-series models must also be employed. Lee et al. [14] proposed a two-factor, high-order FTS, which has been shown to be highly efficient.

Proposed Model
Stock market volatility is affected by many variables, such as global economic conditions, the domestic macroeconomy, and highly interrelated foreign stock markets. In previous studies, many time series models only employed the stock index as a single factor in forecasting and ignored other factors. To improve accuracy, more factors should be considered in stock forecasting models.
Previous studies paid attention to the relationship between stock returns and trading volume, but many researchers have begun to present the dynamic relationship between daily trading volume and stock returns, which is named "causality" [20]. Stock market causalities have shown a correlation between trading volume and the stock market index. Hence, the trading volume must be added as a factor in stock index forecasting.
Likewise, both academics and investors have given considerable attention to the interactions between different stock markets when forecasting a stock index. The practical experience of stock market investors has shown that the relationships between different stock markets are strong enough to be noticeable; therefore, numerous investors have used the fluctuations of other stock markets that significantly affect their respective domestic markets as the basis for forecasting [13]. Lee et al. [45] looked for spillovers from the returns of the Nasdaq Stock Market (NASDAQ) and volatility in Asian second board markets after checking for spillovers from the New York Stock Exchange and discovered strong evidence of staggered spillovers from the NASDAQ to Asian markets when concurrent main board market returns were excluded. Yang [46] showed that the price change spillover effects from the NASDAQ were significant for both the Taiwan Stock Exchange Index (TAIEX) and the Taiwan Electronic Sector Index Futures (TEXF), with close-to-close returns being the most strongly affected. Because of the time difference between the United States and Taiwan, the opening price of the Taiwan stock market might show an overreaction in its close-to-open rate of return. Following the overreaction of the opening price, the market index would adjust to the proper price during trading hours. Other studies have clearly described the significant interaction between two stock markets [47,48]. Therefore, in this study, the NASDAQ, TAIEX, and Hang Seng Index (HSI) were used as experimental datasets with which to forecast a stock index.
Most existing stock market forecasting models have three major drawbacks: (1) they employ the stock index as the only factor in forecasting; (2) they include only the causality between the trading volume and stock index; and (3) they consider the interactions between only two stock markets. To overcome these drawbacks, this paper refers to [49] to improve the proposed method and experiment. The proposed model extends the multi-period adaptation model in [7] by including the three crucial factors mentioned: stock index, trading volume of stock, and the interactions between two stock markets. The procedure of the proposed model is shown in Figure 1.

Proposed Computational Step
As mentioned above, this paper proposes a multifactor fuzzy time series model to improve stock forecasting performance. The proposed computational method entails eight steps: (1) collect datasets; (2) fuzzify the observations and defuzzify F(t) to obtain the initial forecast; (3) transfer trading volume into market signals; (4) calculate the daily difference of the indexes between two stock markets; (5) propose a multifactor forecasting equation; (6) adapt the best parameters for α, β, and γ; (7) forecast the stock index; and (8) evaluate the forecasting performance. Each step of the proposed algorithm is described as follows:
Step 1. Collect the datasets.
Step 2. Fuzzify the data and defuzzify F(t) to obtain an initial forecast.
(1) Determine the range and linguistic values.
(2) Build fuzzy sets and fuzzify the historical data.
Build the fuzzy sets B1, B2, …, Bk on this range, where bij is the grade of uj in fuzzy set Bi. Determine the grade of each historical datum belonging to each Bi (i = 1, …, m). If the maximal grade is located in Bk, then we mark the fuzzified stock index as Bk. The following seven linguistic values were used in this study: B1 (very low), B2 (low), B3 (slightly low), B4 (normal), B5 (slightly high), B6 (high), and B7 (very high).
(3) Build fuzzy logical relationships (FLRs).
From two consecutive fuzzy sets, Bi(t − 1) and Bj(t), we can use Bi→Bj to represent a fuzzy logical relationship; i.e., the "If part (rule condition)" is the value on trading day t, and the "Then part (rule conclusion)" is the value on trading day t + 1. … (1)); then, the FLR can be represented as B6(t)→B5(t + 1).
(4) Merge all FLRs.
We merge the FLRs that share the same left-hand side to form an FLR group. For instance, Bi→Bj, Bi→Bk, and Bi→Bm can be grouped as Bi→Bj, Bk, Bm. A fluctuation-type stock index has three trends: upward trend, no change, and downward trend; these three trends are used to express the FLR groups. The range of the stock index is partitioned into seven linguistic terms in this paper. As mentioned, price fluctuation is used to group the FLRs. For example, B1→B2 is grouped as an "upward" trend, B1→B1 is "no change", and B2→B1 is a "downward" trend.

(5) Give weights to all FLRs groups
Using the fluctuation-weighted approach [51], we assign weights to all FLR groups and normalize them into a normalized weight matrix, Wn(t), as presented in Equation (2).
(6) Defuzzify F(t + 1) to obtain the initial forecast.
The defuzzified forecast F(t + 1)df and the defuzzified matrix Ldf(t), which are defined in Equations (3) and (4), respectively, are applied in this step, where mi is the intermediate point of each linguistic interval, Li. The defuzzified value then denotes the initial stock index forecast.
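Step 2 can be illustrated with a minimal Python sketch. It assumes equal-width intervals over the range, takes the linguistic value with the maximal grade to be the interval that contains the observation, and uses normalized transition counts as a stand-in for the fluctuation-weighted scheme of [51], whose exact weights are not reproduced here. The function names and the sample index values are hypothetical.

```python
def make_intervals(low, high, k=7):
    """Split [low, high] into k equal-width intervals (equal width is an assumption)."""
    width = (high - low) / k
    return [(low + i * width, low + (i + 1) * width) for i in range(k)]


def fuzzify(value, intervals):
    """Return the 0-based index i of the linguistic value B(i+1) containing value."""
    for i, (lo, hi) in enumerate(intervals):
        if lo <= value < hi:
            return i
    # Values on or beyond the edges are clamped to the nearest linguistic value.
    return len(intervals) - 1 if value >= intervals[-1][1] else 0


def trend(a, b):
    """Classify the FLR B(a) -> B(b) as one of the three fluctuation trends."""
    return "upward" if b > a else "downward" if b < a else "no change"


def build_weights(labels, k):
    """Count FLRs B(i) -> B(j) and normalize each row so that its weights sum to 1.

    Assumption: plain transition frequencies replace the weights of [51].
    """
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1
    weights = []
    for row in counts:
        total = sum(row)
        weights.append([c / total if total else 0.0 for c in row])
    return weights


def defuzzify(current_label, weights, intervals):
    """Initial forecast: weighted sum of interval midpoints m_i for the current state."""
    midpoints = [(lo + hi) / 2 for lo, hi in intervals]
    row = weights[current_label]
    if sum(row) == 0:  # no FLR with this left-hand side: fall back to its own midpoint
        return midpoints[current_label]
    return sum(w * m for w, m in zip(row, midpoints))


closes = [10, 20, 30, 20, 30, 60, 70]  # hypothetical index values
intervals = make_intervals(0, 70)
labels = [fuzzify(c, intervals) for c in closes]
weights = build_weights(labels, len(intervals))
initial_forecast = defuzzify(labels[4], weights, intervals)
```

Keeping the weights as a row-normalized matrix means the defuzzified forecast always lies between the smallest and largest midpoints that have followed the current state in the training data.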
Step 3. Transfer the trading volume into market signals.
(1) The daily trading volume is converted into a technical indicator of volume, VR(t), as defined in Equation (5). The range X = [low, high] covers all observations of VR(t) in the training dataset and is initially partitioned into five linguistic intervals Li based on stock market properties.
The signal transfer function is defined in Equation (6). This function transfers all possible linguistic terms into the corresponding market signals.
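A rough Python sketch of Step 3 follows. Because the definitions in Equations (5) and (6) are not reproduced here, the sketch assumes the common textbook volume ratio (volume on rising days plus half the volume on unchanged days, divided by volume on falling days plus half the volume on unchanged days, times 100). The interval cut points and signal values are purely hypothetical placeholders, not the values used in this study.

```python
from bisect import bisect_right


def volume_ratio(closes, volumes, window):
    """Volume ratio VR(t) over the `window` days ending at day t.

    Assumed textbook definition, which may differ from Equation (5).
    Entries are None until `window` daily changes are available.
    """
    vr = [None] * len(closes)
    for t in range(window, len(closes)):
        up = down = flat = 0.0
        for s in range(t - window + 1, t + 1):
            change = closes[s] - closes[s - 1]
            if change > 0:
                up += volumes[s]
            elif change < 0:
                down += volumes[s]
            else:
                flat += volumes[s]
        denom = down + 0.5 * flat
        vr[t] = None if denom == 0 else 100.0 * (up + 0.5 * flat) / denom
    return vr


def market_signal(vr_value, cuts, signals):
    """Map VR(t) to a market signal via its linguistic interval.

    `cuts` holds the four boundaries between the five intervals and `signals`
    holds one signal per interval; both are supplied by the caller.
    """
    return signals[bisect_right(cuts, vr_value)]


# Hypothetical data and placeholder parameters, for illustration only.
closes = [100, 102, 101, 101, 104]
volumes = [10, 20, 30, 40, 50]
cuts = [50, 80, 150, 250]
signals = [-1.0, -0.5, 0.0, 0.5, 1.0]

vr = volume_ratio(closes, volumes, window=4)
signal = market_signal(vr[4], cuts, signals)
```

Passing the cut points and signals as arguments keeps the sketch independent of any particular partition, so the five intervals chosen from stock market properties can be substituted directly.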
Step 4. Compute the daily differences of the two stock market indexes.
(1) Select the causal relationship of the two stock markets.
It is common knowledge among Taiwanese investors that the fluctuation of the NASDAQ strongly impacts the TAIEX. This correlation has two main aspects: (1) the US stock market occupies a leading role in the global economic environment and thus has a certain effect on other stock markets; and (2) Taiwan's economy relies mainly on exports, and the United States is one of its major export destinations [13]. Therefore, Taiwan is affected whenever the American economy is in a downturn. More specifically, the Taiwan stock market mostly consists of stocks in the electronic technology industry, such as the Taiwan Semiconductor Manufacturing Company and Chunghwa Telecom. The NASDAQ is thus a crucial indicator for the Taiwan stock index and is used as an important factor affecting the TAIEX and HSI.
(2) Compute the daily differences of the two stock market indexes.
If no transactions exist on day t, the index of this day is replaced by that on day t − 1. The daily difference between the Taiwan stock index (or the HSI) and the NASDAQ index is |P(t) − N(t)|, where P(t) is the stock index on day t, and N(t) is the NASDAQ index on day t.
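The carry-forward rule and the absolute difference in Step 4 can be sketched in a few lines of Python. The sketch assumes that both series are already aligned on a common calendar, with None marking a day on which a market did not trade; the helper names and sample values are hypothetical.

```python
def fill_missing(series):
    """Replace None (no transactions that day) with the previous day's value."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def daily_difference(local_index, nasdaq_index):
    """Return |P(t) - N(t)| for each day after carrying forward missing values.

    A day stays None only if a series has no earlier value to carry forward.
    """
    p = fill_missing(local_index)
    n = fill_missing(nasdaq_index)
    return [None if a is None or b is None else abs(a - b) for a, b in zip(p, n)]


# Hypothetical index levels; None marks a day without trading.
local = [5000.0, None, 5100.0]
nasdaq = [2000.0, 2050.0, None]
diff = daily_difference(local, nasdaq)
```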
Step 5. Propose a multifactor forecasting model.
This step builds on the adaptive expectation model (Kmenta, 1986) and the multi-period adaptation model (Chen et al., 2008) to propose a novel multifactor forecasting model, which contains three significant factors, as per Equation (7), where P(t) represents the stock index on day t, N(t) denotes the NASDAQ index on day t, F(t + 1) denotes the forecast of the stock index on day t + 1, LVR(t) represents the linguistic term of the VR(t) indicator, VR(t) denotes the technical indicator of volume, M(LVR(t)) represents the signal transfer function, and α, β, and γ are the coefficients of the three factors in the proposed fitting forecast model. The meanings of parameters α, β, and γ are explained as follows:
(1) α represents the degree of influence on the F(t + 1) forecast of the market signals of trading volume and the actual stock index. Taiwan stocks have a volatility limitation of ±7%, whereas the HSI has no such restriction; thus, to obtain accurate factors and better train the forecasting equation, we extend the range of α to between −0.15 and 0.15.
(2) β represents the degree of influence on the F(t + 1) forecast of the difference between the forecast stock index and the actual stock index. Moreover, given the volatility limitation of the TAIEX (±7%) and the lack of a limit for the HSI, we plot the daily fluctuation of the HSI in Figure 2. From Figure 2, we see that the daily fluctuation is no greater than ±15%. We therefore set the range of β from −0.15 to 0.15 to search for the optimal β.
(3) γ represents the degree of influence on the F(t + 1) forecast of the daily difference of the two stock indexes; the range of γ is [−1, 1], where −1 represents an entirely negative correlation, and 1 represents a completely positive correlation.
In Equation (7), F(t) has been defuzzified by Equation (3), and M(LVR(t)) has been transferred into the corresponding market signals by Equation (6). Therefore, the three factors are crisp values, M(LVR(t)) is an indicator signal, and M(LVR(t)) × P(t) is a linear factor. Hence, the proposed model in Equation (7) is a linear multifactor forecasting model. Next, using the outputs of Steps 1 to 4, we fit Equation (7) to the collected stock datasets by minimizing the RMSE to obtain the best parameters for α, β, and γ. The iteration step is 0.001 for each of α, β, and γ, and the detailed computation is given in Algorithm 1.
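For orientation, one linear form that is consistent with the symbol definitions and parameter descriptions in this step is sketched here. It is an assumption rather than a reproduction of Equation (7): the base term P(t) and the sign and ordering inside the β term in particular are guesses, chosen so that each coefficient scales exactly the quantity that its description names.

```latex
F(t+1) = P(t)
       + \alpha \, M\bigl(L_{VR}(t)\bigr) \, P(t)
       + \beta \, \bigl(P(t) - F(t)\bigr)
       + \gamma \, \lvert P(t) - N(t) \rvert
```

Under this assumed form, setting α = β = γ = 0 reduces the model to the naive forecast F(t + 1) = P(t), and the forecast is linear in the three coefficients once F(t) and M(LVR(t)) are crisp values.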
Step 6. Adapt the best parameters for α, β, and γ.
To obtain the best parameters for α, β, and γ, the training data are used to adapt these parameters by minimizing the RMSE in Equation (8). We set a reasonable iteration step (here, step = 0.001), thereby finally producing the best parameters for α, β, and γ.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\bigl(A(t) - F(t)\bigr)^{2}} \qquad (8)$$
where A(t) is the actual value on day t, F(t) is the forecast value on day t, and n is the number of trading days.
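The parameter search can be sketched as an exhaustive grid search in Python. The linear form inside predict is an assumption that merely matches the descriptions of α, β, and γ, so it should be read as a stand-in for the fitted equation. The function names are hypothetical, and the default step is deliberately coarser than 0.001, because a full grid at step 0.001 has 301 × 301 × 2001 (about 1.8 × 10^8) combinations.

```python
import math


def rmse(actual, forecast):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def predict(p, f, signal, diff, alpha, beta, gamma):
    """Assumed linear form: P + alpha * M * P + beta * (P - F) + gamma * |P - N|."""
    return p + alpha * signal * p + beta * (p - f) + gamma * diff


def frange(lo, hi, step):
    """Inclusive grid from lo to hi, rounded to suppress floating-point drift."""
    n = int(round((hi - lo) / step))
    return [round(lo + i * step, 10) for i in range(n + 1)]


def grid_search(P, F, M, D, step=0.05):
    """Return (best RMSE, (alpha, beta, gamma)) over the training data.

    P: actual index, F: defuzzified initial forecast, M: market signal,
    D: daily difference |P - N|; all are per-day lists of equal length.
    The forecast made on day t is scored against the actual index on day t + 1.
    """
    actual = P[1:]
    best_err, best_params = float("inf"), None
    for alpha in frange(-0.15, 0.15, step):
        for beta in frange(-0.15, 0.15, step):
            for gamma in frange(-1.0, 1.0, step):
                pred = [predict(P[t], F[t], M[t], D[t], alpha, beta, gamma)
                        for t in range(len(P) - 1)]
                err = rmse(actual, pred)
                if err < best_err:
                    best_err, best_params = err, (alpha, beta, gamma)
    return best_err, best_params
```

Since the assumed model is linear in its coefficients, an ordinary least-squares fit would be a faster alternative to the grid, but the grid mirrors the bounded, fixed-step search described in this step.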
Step 7. Forecast the stock index.
In Step 6, the parameters that optimize forecasting performance (minimum RMSE) are obtained in the training process. The trained equation can then forecast F(t + 1) for the testing dataset.
Step 8. Evaluate the performance of the forecast.
Using Equation (8), the RMSE for all the testing data is calculated and used as the evaluation criterion for comparison with the listed models.

The Pseudocode of the Proposed Model
For easy computation, this section presents pseudocode for the proposed model as Algorithm 1. The notation used in Algorithm 1 denotes, for training in the i-th year, the stock index, the trading volume, the interaction between two stock markets, and the closing price; and, for testing in the i-th year, the next half-year stock index, the next half-year trading volume, the next half-year interaction between two stock markets, and the next half-year closing price.

Verification and Comparison
The TAIEX, HSI, and NASDAQ stock data from 1997 to 2004 are used as the experimental dataset in this study. Using the half-year sliding window method, we divide these datasets into fourteen overlapping sub-datasets. Each sub-dataset contains one year of data for training and half a year for testing, and the window is shifted forward by half a year for each sub-dataset. For verification, the forecast results of the fourteen TAIEX testing periods are listed in Table 2 and those of the fourteen HSI testing periods in Table 3. To show the forecast results of the proposed model, we employed the RMSE as an evaluation indicator and used Chen's [4], Huarng and Yu's [9], and Chen et al.'s [7] FTS models as comparison models. To evaluate whether the performance of the proposed model surpassed that of conventional time series models, we also compared the performance of the proposed model with support vector regression [33] and a general regression neural network [34]. The experimental results for the TAIEX are shown in Table 4, and the results for the HSI are shown in Table 5. These tables show that the proposed model outperformed the other models in the fourteen testing periods, and the average RMSEs in Tables 4 and 5 also show that the proposed model outperformed the other models.
…/12               155   237   157   672   2077    82 a
2003/01~2003/06    160   150   157   159   1582    71 a
2003/07~2003/12    150   459   246   733   1473    61 a
2004/01~2004/06    188   534   314   360   1924   112 a
2004/07~2004/12    106   166    96   392   1357    59 a
Average            241   303   303   541   2064   101 a
Note: "a" denotes the best performance among the six models. SVR: support vector regression; GRNN: general regression neural network.
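The half-year sliding window can be enumerated with a short Python sketch. It only generates the period boundaries (one year of training followed by half a year of testing, shifted by half a year each time); the function names and the (year, half) encoding are hypothetical.

```python
def half_year_windows(first_year, last_year):
    """List the sliding windows as dicts of (year, half) pairs, with half in {1, 2}.

    Each window trains on two consecutive half-years and tests on the next one.
    """
    halves = [(y, h) for y in range(first_year, last_year + 1) for h in (1, 2)]
    return [
        {"train": (halves[i], halves[i + 1]), "test": halves[i + 2]}
        for i in range(len(halves) - 2)
    ]


def label(half):
    """Format a (year, half) pair in the period style used in the result tables."""
    year, h = half
    return f"{year}/01~{year}/06" if h == 1 else f"{year}/07~{year}/12"


windows = half_year_windows(1997, 2004)
test_periods = [label(w["test"]) for w in windows]
```

For 1997 to 2004 this yields fourteen windows, whose testing periods run from the first half of 1998 to the second half of 2004.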

Algorithm 1: Multifactor FTS model
To test whether the proposed model outperforms the other models, we use the nonparametric Wilcoxon signed-rank test [52] to compare the RMSEs of each pair of matched models and determine whether their population mean ranks differ. Each stock market has 15 pairwise comparisons (all pairs of the six models), and the test results for the two stock markets are shown in Table 6. All cells are "+ *" in the second and eighth rows of Table 6, indicating that the proposed method is better than the other five models on the TAIEX and HSI datasets; the GRNN presents the worst result.
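The pairwise test can be illustrated with a small pure-Python sketch of the Wilcoxon signed-rank test. It computes an exact two-sided p-value by enumerating all sign assignments, which is feasible for fourteen paired periods but is an implementation choice of this sketch, not necessarily the procedure behind Table 6. The sample inputs are the first and last RMSE columns of the five testing periods that survive in the RMSE table of this section; the last column carries the best-performance mark, and the function names are hypothetical.

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Return (W+, W-, exact two-sided p-value) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    stat = min(w_plus, w_minus)
    total = sum(ranks)
    # Under the null hypothesis every sign pattern is equally likely.
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):
        wp = sum(r for s, r in zip(signs, ranks) if s)
        if min(wp, total - wp) <= stat:
            hits += 1
    return w_plus, w_minus, hits / 2 ** len(ranks)


comparison_rmse = [155, 160, 150, 188, 106]  # first RMSE column, five periods
proposed_rmse = [82, 71, 61, 112, 59]        # last RMSE column, five periods
w_plus, w_minus, p_value = wilcoxon_signed_rank(comparison_rmse, proposed_rmse)
```

With only five pairs, even a difference that has the same sign in every period cannot reach the 5% level in a two-sided exact test, which is one reason to run the comparison over all fourteen testing periods.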

Findings and Discussion
After verification and comparison, this section provides some findings and discusses the relevant problems. First, there are five findings:
(1) According to the literature review, the selected attributes (trading volume, stock index, and interaction between two stock markets) have been shown to affect stock market forecasts, and the results have a minimal RMSE, which will lead to higher profits for investors.
(2) Tables 4 and 5 indicate that the TAIEX is less volatile than the HSI. This is because Taiwan limits the volatility of shares to ±7%, whereas Hong Kong has no limit. From Figure 2, the daily fluctuation of the HSI can help us set the search range for quickly obtaining the optimal parameters for α and β.
(3) As shown in Table 2, Taiwan is influenced by US stock market activity; the maximal impact γ is ±0.003. However, Table 3 shows that Hong Kong is less influenced by US stock market activity, and the impact γ is 0 in five training periods.
The experimental results (Tables 4 and 5) and the statistical test (Table 6) show that the proposed model outperforms the other models in forecast accuracy (lower RMSE), because the proposed linear multifactor forecasting equation with three optimized parameters (α, β, and γ) produces an optimal prediction that matches past stock index patterns and generates a more accurate forecast.
From the results and findings, we list three issues for discussion:
(1) Will an accurate forecast lead to higher profits?
In general, a minimal forecast RMSE will lead to higher profits for investors. However, it is very dangerous to directly use an AI forecast model for investing in the financial market. In practice, the use of an AI forecast model has its limitations, such as factor selection (technical indicators, fundamental analysis, and news), information quality, timely information, and investor policies (risk-adjusted trading strategy returns, value investment, short-term, very short-term, medium-to-long-term, and stock ownership). Therefore, we suggest that investors be more cautious in using AI forecast models.
(2) How should one select a forecast model?
There are many algorithms we can use, from longstanding best practices to cutting-edge methodologies. Each algorithm has its pros and cons. The complexity, assumptions, and types of data inputs used in a given model type will vary, but the basic ingredients are similar across the board. Therefore, we suggest that a user first understand the pros and cons of each model and then identify the single model type that fits the dataset. To understand the model limitations, it is also possible to build many forecast models and compare their performance.
(3) What are the advantages of using fuzzified historical data?
Among the compared methods [4,7,9,33,34], Chen's method [4] is a one-factor FTS model; Huarng and Yu's method [9] is also a one-factor model that takes the daily high (the highest price of the day) and low (the lowest price of the day) as Type 2 observations; and Chen et al.'s method [7] is a one-factor FTS model with a multi-period adaptation model. These three methods [4,7,9] are simple and easy to understand, but they do not consider additional key factors; hence, they do not perform better than the proposed model. Furthermore, the GRNN has local minimum and over-fitting problems, and the computational requirement of the SVR is quite tedious [53]; in particular, selecting a suitable kernel function is a key question.

Conclusions
This study proposed a novel multifactor FTS model based on stock volatility. The proposed model employs three crucial forecasting factors, namely, stock trading volume, stock index, and interaction between two stock markets. Experimental results for the TAIEX and HSI demonstrate that the proposed model forecasts the stock index effectively and outperforms the other models in RMSE. For both stock markets, the average RMSE indicates that the proposed model outperformed the other models, as shown in Tables 4 and 5. However, the HSI fluctuates more than the TAIEX, because Taiwan stocks have a volatility limitation of ±7%, whereas HSI stocks have no such restriction. Hence, the larger fluctuation leads to a larger average RMSE for all models except SVR on the HSI dataset. On the other hand, the proposed model is tolerant of fluctuation (smaller RMSE), as shown in Tables 4 and 5.
From our findings and discussions, the proposed model has three major advantages: (1) it is a fuzzy time series model with multifactor prediction; (2) it transforms a nonlinear model into a linear fitting model with fast convergence and accurate forecast capability; and (3) it uses three important factors selected from the literature review (stock index, trading volume, and the daily difference of two stock market indexes). Therefore, we think that the proposed model fits well into the current state of knowledge.
A simple and easily explained model is the motivation of this study; therefore, we used three factors to build a linear combination of multifactor fuzzy time series for forecasting the stock index. From Ahmed and Khalid [53], we found that ANN-based forecasting models have some drawbacks, such as local minima and over-fitting problems. These drawbacks can be overcome by advanced hybrid intelligent models such as SVM, ELM, and ANFIS. However, the computational requirement of most of these models is quite tedious, especially if training through an optimization technique is involved; this is a drawback for real applications. Hence, this study proposes a linear fitting model, converted from a nonlinear fuzzy time-series multifactor model, that has fast convergence and accurate forecast capability.
Three suggestions may further enhance the proposed model by making its results less conservative and improving forecasting performance: (1) other forecast factors could be used in the proposed model; (2) other methods (such as machine learning algorithms) could be embedded in the proposed model; and (3) because investor policy is a key factor in obtaining profits, adding investor policies could improve forecast capability and allow profits to be calculated.