Forecasting the Stock Market with Linguistic Rules Generated from the Minimize Entropy Principle and the Cumulative Probability Distribution Approaches

To forecast a complex and non-linear system, such as a stock market, advanced artificial intelligence algorithms, like neural networks (NNs) and genetic algorithms (GAs) have been proposed as new approaches. However, for the average stock investor, two major disadvantages are argued against these advanced algorithms: (1) the rules generated by NNs and GAs are difficult to apply in investment decisions; and (2) the time complexity of the algorithms to produce forecasting outcomes is very high. Therefore, to provide understandable rules for investors and to reduce the time complexity of forecasting algorithms, this paper proposes a novel model for the forecasting process, which combines two granulating methods (the minimize entropy principle approach and the cumulative probability distribution approach) and a rough set algorithm. The model verification demonstrates that the proposed model surpasses the three listed conventional fuzzy time-series models and a multiple regression model (MLR) in forecast accuracy.


Introduction
Individual stock investors never stop dreaming of becoming wealthy by trading stocks. However, only a very few people can make huge profits because it is enormously difficult to accurately predict stock prices on a daily basis. In the stock market, too many factors influence prices, such as stock news, company financial reports and government economic policies. Therefore, since the first stock market opened, many analytical methods and forecasting models have been advanced in an attempt to land the big fish in the stock market sea. Two major approaches to stock market analysis, fundamental analysis and technical analysis [1-4], are commonly used both by stock analysts and in the artificial intelligence (AI) methods proposed by researchers who are interested in stock markets [5-10].
Technical analysis is a subjective way to predict stock market fluctuations, although technical indicators, which are derived from the basic indexes by specific mathematical equations [2,11], reveal more hidden information about future prices than the daily basic indexes themselves (time, open index, high index, low index, close index and volume). Two analysts can come up with two completely different forecasts from the same analytical charts and technical indicators. Much of technical analysis is truly "in the eye of the beholder" [4]. Therefore, from the investor's point of view, empirical rules or investment experience are necessary in order to predict stock prices accurately.
However, with the emergence of data mining techniques, more and more AI tools have been applied in predicting stock markets, such as choosing an optimal portfolio by genetic algorithms [5], selecting real-world stocks by neural networks [12], and predicting the S&P 100 index by rough sets [13]. In this paper, in order to keep the model designer's subjective predictions out of the forecasting process, an objective, automatic artificial intelligence model based on technical analysis is proposed. It combines three data mining techniques: (1) MEPA (minimize entropy principle approach), which subdivides data into membership functions [14-18]; (2) CPDA (cumulative probability distribution approach), which fuzzifies the observations into linguistic values based on the cumulative probability of the observations [17,19,20]; and (3) rough set theory [17,19,21-24], which mines rules from the linguistic dataset. Using these techniques, objective and effective rules can be produced as the basis for forecasting.

Related Works
This section briefly reviews the related literature, including the minimize entropy principle approach (MEPA), the cumulative probability distribution approach (CPDA), rough set theory, and defuzzification methods.

The Minimize Entropy Principle Approach (MEPA)
A key goal of entropy minimization analysis is to determine the quantity of information in a given dataset. The entropy of a probability distribution is a measure of the uncertainty of the distribution [15]. To subdivide the data into membership functions, a threshold between classes of data must be established. A threshold line can be determined with an entropy minimization screening method, after which the segmentation process may begin, with the initial segmentation divided into two classes.
Therefore, a repeated partitioning with threshold value calculations will allow us to partition the dataset into a number of fuzzy sets [25].
Assume that a threshold value is being sought for a sample in the range between $x_1$ and $x_2$. An entropy equation is written for the regions $[x_1, x]$ and $[x, x_2]$, with the first region denoted p and the second region denoted q. The entropy [14,16] for each value of x is expressed by equations (1) through (3):

$$S(x) = p(x)\,S_p(x) + q(x)\,S_q(x) \tag{1}$$

where:

$$S_p(x) = -\left[p_1(x)\ln p_1(x) + p_2(x)\ln p_2(x)\right] \tag{2}$$

$$S_q(x) = -\left[q_1(x)\ln q_1(x) + q_2(x)\ln q_2(x)\right] \tag{3}$$

where $p_k(x)$ and $q_k(x)$ are the conditional probabilities that the class k sample is in the region $[x_1, x_1 + x]$ and $[x_1 + x, x_2]$, respectively; $p(x)$ and $q(x)$ are the probabilities that all samples are in the region $[x_1, x_1 + x]$ and $[x_1 + x, x_2]$, respectively:

$$p(x) + q(x) = 1 \tag{4}$$

A value of x that gives the minimum entropy is the optimum threshold value. The entropy estimates [14,16] of $p_k(x)$, $q_k(x)$, $p(x)$ and $q(x)$ are calculated by equations (5) through (8):

$$p_k(x) = \frac{n_k(x) + 1}{n(x) + 1} \tag{5}$$

$$q_k(x) = \frac{N_k(x) + 1}{N(x) + 1} \tag{6}$$

$$p(x) = \frac{n(x)}{n} \tag{7}$$

$$q(x) = 1 - p(x) \tag{8}$$

where $n_k(x)$ is the number of class k samples located in $[x_1, x_1 + x]$; $n(x)$ is the total number of samples located in $[x_1, x_1 + x]$; $N_k(x)$ is the number of class k samples located in $[x_1 + x, x_2]$; $N(x)$ is the total number of samples located in $[x_1 + x, x_2]$; and n is the total number of samples in $[x_1, x_2]$.

Figure 1 shows the partitioning process for MEPA. While moving x in the region $[x_1, x_2]$, we calculate the value of entropy for each position of x. The value of x in the region that holds the minimum entropy is called the primary threshold (PRI) value.

Figure 1. Partitioning process of the minimize entropy principle approach.
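The search for the primary threshold can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes two classes labeled 1 and 2, the usual textbook form of the entropy estimates (region entropies weighted by the share of samples in each region, with one added to the counts in the conditional probabilities), and candidate thresholds taken at midpoints between neighboring sample values. The function names `entropy_at` and `primary_threshold` are hypothetical.

```python
import math


def entropy_at(samples, x):
    """Entropy S(x) of splitting `samples` at threshold x.

    `samples` is a list of (value, class_label) pairs with labels 1 or 2
    (an assumption of this sketch).
    """
    left = [label for value, label in samples if value <= x]
    right = [label for value, label in samples if value > x]
    p = len(left) / len(samples)  # share of samples in the first region
    q = 1.0 - p                   # share of samples in the second region

    def region_entropy(labels):
        total = 0.0
        for k in (1, 2):
            # Smoothed conditional probability of class k in this region.
            pk = (labels.count(k) + 1) / (len(labels) + 1)
            total -= pk * math.log(pk)
        return total

    return p * region_entropy(left) + q * region_entropy(right)


def primary_threshold(samples):
    """Return the candidate threshold with minimum entropy (the PRI value)."""
    values = sorted({value for value, _ in samples})
    # Candidate thresholds: midpoints between neighboring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(candidates, key=lambda x: entropy_at(samples, x))


# Illustrative data only: three class-1 samples and three class-2 samples.
demo = [(1, 1), (2, 1), (3, 1), (7, 2), (8, 2), (9, 2)]
pri = primary_threshold(demo)
```

Applying `primary_threshold` again to the samples on each side of the PRI value would yield secondary thresholds, which is how repeated partitioning produces several fuzzy sets.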

The Cumulative Probability Distribution Approach (CPDA)
The cumulative probability of the normal distribution can be used to define intervals of linguistic values [17,19,26]. The procedure of the cumulative probability distribution approach is described in four steps, as follows:
Step 1: Test normal distribution. In this step, CPDA ascertains whether the target dataset follows a normal distribution. The Lilliefors test [27] is used to identify the distribution characteristic of the observations contained in the dataset.
Step 2: Define the universe of discourse U. Define the universe of discourse, U, as $[D_{min} - \sigma, D_{max} + \sigma]$ for the target dataset, where $D_{min}$ denotes the minimum value; $D_{max}$ denotes the maximum value; and σ denotes the standard deviation of the observations contained in the target dataset.
Step 3: Determine interval length and build membership function. There are three sub-steps in this process: (1) define the lower bound of cumulative probability ($P_{LB}$) and the upper bound of cumulative probability ($P_{UB}$); (2) invert the normal cumulative distribution function (CDF) for the defined linguistic values; and (3) define fuzzy sets and build membership functions.
Step 3-1: Define lower bound and upper bound of cumulative probability. For each given linguistic value, the lower bound of cumulative probability ($P_{LB}$) and the upper bound of cumulative probability ($P_{UB}$) are defined by equations (9) and (10) [28], where i denotes the order of the linguistic value and n denotes the number of defined linguistic values.
Based on equations (9) and (10), the lower and upper bounds of cumulative probability for five linguistic values are listed in Table 1.
Step 3-2: Invert the normal cumulative distribution function. The inverse of the normal CDF is given by equation (11):

$$x = F^{-1}(P \mid \mu, \sigma) = \{x : F(x \mid \mu, \sigma) = P\}, \qquad P = F(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{(t-\mu)^2}{2\sigma^2}}\, dt \tag{11}$$

where P denotes the probability that a single observation from a normal distribution with parameters µ and σ will fall in the interval (−∞, x]. From an algorithm for computing the inverse normal cumulative distribution function [30], the lower and upper bounds for five linguistic values can be produced.
Step 3-3: Define fuzzy sets and build membership functions. The triangular fuzzy number (TFN) [31] is used to present the fuzzy sets for the linguistic variables of the price fluctuations (A1 to A5) based on the linguistic intervals from Table 2. The membership function of the TFN is defined as equation (12). If an observation meets two or more membership functions, the linguistic value with the maximum membership value is chosen and labeled on the observation. Table 3 demonstrates the parameterized triangular fuzzy numbers for linguistic variables of the price fluctuations (L1 to L5) from Table 2.
Step 4: Fuzzify the historical data. With the inverse of the normal CDF and the parameterized triangular fuzzy numbers for linguistic variables of price fluctuations, all observations contained in the target set can be fuzzified as linguistic values.
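Steps 2 through 4 can be sketched in Python with the standard library's `statistics.NormalDist`. This is a hedged illustration, not the authors' code: the cumulative probability bounds in `ILLUSTRATIVE_BOUNDS` are made-up overlapping values standing in for Table 1, the peak of each triangle is assumed to sit at the midpoint of its interval, and the names `linguistic_intervals`, `triangular` and `fuzzify` are hypothetical. The normality test of Step 1 is omitted.

```python
from statistics import NormalDist, mean, stdev

# Illustrative (P_LB, P_UB) pairs only; the real bounds come from Table 1.
ILLUSTRATIVE_BOUNDS = [(0.0, 0.2), (0.1, 0.4), (0.3, 0.7), (0.6, 0.9), (0.8, 1.0)]


def linguistic_intervals(data, bounds):
    """Turn cumulative-probability bounds into value intervals.

    Uses the inverse normal CDF fitted to `data`, clipped to the universe
    of discourse [D_min - sigma, D_max + sigma].
    """
    dist = NormalDist(mean(data), stdev(data))
    u_low = min(data) - dist.stdev
    u_high = max(data) + dist.stdev

    def invert(p):
        # inv_cdf is undefined at 0 and 1, so map those to the universe edges.
        if p <= 0.0:
            return u_low
        if p >= 1.0:
            return u_high
        return min(max(dist.inv_cdf(p), u_low), u_high)

    return [(invert(lb), invert(ub)) for lb, ub in bounds]


def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy number with bounds a < b < c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(x, intervals):
    """Return the 1-based index of the interval with maximum membership."""
    memberships = [triangular(x, a, (a + c) / 2, c) for a, c in intervals]
    best = max(range(len(intervals)), key=memberships.__getitem__)
    return best + 1


fluctuations = [-2, -1, 0, 1, 2]  # toy price fluctuations
intervals = linguistic_intervals(fluctuations, ILLUSTRATIVE_BOUNDS)
label_index = fuzzify(0.0, intervals)
```

The index returned by `fuzzify` counts intervals from the lowest values upward; mapping it to a named linguistic value is left to the caller.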

Rough Set Theory
Rough set theory (RST) was proposed by Pawlak [21] as a method of set approximation, and it has continued to flourish as a tool for data mining [17,22-24].
Rough set theory is also a mathematical framework that deals with vagueness and uncertainty, and can be situated within the fields of artificial intelligence (AI), knowledge discovery in databases and data mining (DM). The rough set philosophy is founded on the assumption that some information is associated with every object of the universe of discourse. Objects characterized by the same information are indiscernible in view of the available information about them. Any set composed of all indiscernible objects is called an elementary set and forms a basic granule of knowledge about the universe. Any union of elementary sets is referred to as a precise set; otherwise the set is rough.
A pair of precise sets, called the lower and the upper approximation of the rough set, is associated [21,32] with any rough set.The lower approximation consists of all objects which surely belong to the set, and the upper approximation contains all objects which possibly belong to the set.
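The lower and upper approximations can be illustrated with a short Python sketch. The decision table below is hypothetical: its attribute values are invented for illustration and only mimic the shape of an accident-case table with three condition attributes and one decision attribute. The function names are likewise assumptions of this sketch.

```python
from collections import defaultdict


def elementary_sets(table, condition_attrs):
    """Group objects that share the same values on all condition attributes."""
    groups = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[attr] for attr in condition_attrs)
        groups[key].add(obj)
    return list(groups.values())


def approximations(table, condition_attrs, target):
    """Return (lower, upper) approximations of the object set `target`."""
    lower, upper = set(), set()
    for block in elementary_sets(table, condition_attrs):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper


# Hypothetical cases; cases 1 and 3 are indiscernible on the conditions.
cases = {
    1: {"age": "young", "vehicle": "car", "climate": "rain", "accident": "off-road"},
    2: {"age": "old", "vehicle": "car", "climate": "dry", "accident": "rollover"},
    3: {"age": "young", "vehicle": "car", "climate": "rain", "accident": "rollover"},
    4: {"age": "old", "vehicle": "truck", "climate": "rain", "accident": "off-road"},
    5: {"age": "young", "vehicle": "truck", "climate": "dry", "accident": "rollover"},
}
conditions = ["age", "vehicle", "climate"]
off_road = {obj for obj, row in cases.items() if row["accident"] == "off-road"}
lower, upper = approximations(cases, conditions, off_road)
boundary = upper - lower
```

In this toy table the set of off-road cases is rough: case 4 surely belongs to it, while cases 1 and 3 cannot be told apart by the condition attributes and therefore fall in the boundary region.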
The difference between the upper and the lower approximation constitutes the boundary region of the rough set. Approximations are the two basic operations in rough set theory. The basic notions in rough sets are shown in Figure 2 [19,33]. The rough set method is a series of logical reasoning procedures used for analyzing an information system. An information system can be seen as a decision table, denoted by S = (U, A, C, D). An example of an accident occurrence decision table [34] is illustrated in Table 4.

Defuzzification
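Defuzzification is the conversion of a fuzzy quantity to a precise quantity. Many defuzzification methods have been proposed and have become popular in defuzzifying fuzzy output functions. Four of these methods are summarized as follows [25], where $z^*$ denotes the defuzzified value and $\mu(z)$ denotes the membership function of the fuzzy output:
Max-membership principle: this scheme is limited to peaked output functions; it is given as the algebraic expression (13):

$$\mu(z^*) \ge \mu(z) \quad \text{for all } z \in Z \tag{13}$$

Centroid method: this procedure (also called center of area or center of gravity) is the most popular defuzzification method; it is given as the algebraic expression (14):

$$z^* = \frac{\int \mu(z)\, z \, dz}{\int \mu(z)\, dz} \tag{14}$$

Weighted average method: this method is only valid for symmetrical output membership functions; it is given as the algebraic expression (15), where $\bar{z}$ is the centroid of each symmetric membership function:

$$z^* = \frac{\sum \mu(\bar{z})\, \bar{z}}{\sum \mu(\bar{z})} \tag{15}$$

Mean-max membership: this method (also called middle-of-maxima) is closely related to the first method, except that the location of the maximum membership can be non-unique; it is given as the algebraic expression (16), where a and b are the endpoints of the interval on which the membership is maximal:

$$z^* = \frac{a + b}{2} \tag{16}$$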
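The four methods can be compared on a sampled membership function with a small Python sketch. It rests on assumptions not made in the text: the fuzzy output is represented by membership values on an evenly spaced grid, so the centroid integral is approximated by a sum, and all function names are hypothetical.

```python
def max_membership(zs, mus):
    """Grid point with the highest membership (first one if tied)."""
    best = max(range(len(zs)), key=mus.__getitem__)
    return zs[best]


def centroid(zs, mus):
    """Discrete approximation of the centroid on an evenly spaced grid."""
    return sum(z * m for z, m in zip(zs, mus)) / sum(mus)


def weighted_average(peaks):
    """Weighted average over (center, membership) pairs of symmetric sets."""
    return sum(z * m for z, m in peaks) / sum(m for _, m in peaks)


def mean_max(zs, mus):
    """Midpoint of the region where the membership is maximal."""
    top = max(mus)
    plateau = [z for z, m in zip(zs, mus) if m == top]
    return (plateau[0] + plateau[-1]) / 2


grid = [0, 1, 2, 3, 4]
triangle = [0.0, 0.5, 1.0, 0.5, 0.0]   # single peak at z = 2
trapezoid = [0.0, 1.0, 1.0, 1.0, 0.0]  # maximum held on [1, 3]
```

For a symmetric triangle all four methods agree on the peak, which is why the choice of method matters mainly for asymmetric or flat-topped outputs.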

The Proposed Model
In stock market forecasting, we argue that two issues for statistical time-series models are considered imperfect in forecasting algorithms: (1) some mathematic distribution assumptions are made for stock market data, but sometimes the observations do not follow these assumptions; and (2) basic indexes (time, open index, high index, low index, close index and volume) cannot provide enough of the stock information hidden in history for statistical time-series models to predict stock market movements accurately because the basic indexes can only exhibit the daily static conditions of the past, which cannot express the dynamic trends of a stock market.
In recent research, many advanced forecasting systems have utilized neural networks [7-10] and genetic algorithms [35] to predict stock prices. However, we argue that there are some disadvantages to these advanced systems.
For the systems based on neural networks, three drawbacks are addressed: (1) there is little perceived reliability for neural-fuzzy systems because it is hard to determine whether the number of observations in a training dataset is adequate for forecasting; (2) the forecasting algorithms employing neural networks or genetic algorithms are not easily understood by the average stock investor; and (3) the neural-fuzzy technique is strictly quantitative and generalized to the point where human qualitative judgments are completely removed from the system [36].
For the systems based on genetic algorithms, two disadvantages are found: (1) computing costs, such as time consumption and computer resources, are higher than those of other statistical forecasting systems; and (2) the optimal forecast is not easily certifiable.

Proposed Concepts
To overcome the problems mentioned above, this paper proposes a novel forecasting model (its framework is illustrated in Figure 3), which integrates two advanced data granulating approaches (CPDA and MEPA) and a data mining method (rough set theory) in the forecasting process. The three main procedures of the proposed model are described as follows:
(1) Data preprocess. Convert six basic indexes of the stock database (time, open index, high index, low index, close index, and volume) into nine useful technical indicators (RSI, MA, DIS, STOD, ROC, OBV, VR, PSY and AR, defined in Table 5), which are highly related to stock price fluctuation [2], in order to compose the attributes of the experimental datasets.
(2) Granulate observations and produce rules. Utilize two advanced data granulating approaches, CPDA and MEPA, to granulate the observations of the nine technical indicators (defined in Table 5) and the stock price fluctuation (defined in equation (17)) into linguistic values. The technical indicators are defined as conditional attributes and the price fluctuation is defined as the decision attribute. Apply a rough set algorithm (LEM2, Learning from Examples Module, version 2 [37]) to the training dataset to produce forecasting rules of linguistic values.
(3) Forecast and evaluate performance. Produce linguistic forecasts for the testing dataset with the rules extracted from the training dataset, and defuzzify the linguistic forecasts into numeric forecasts. Use root mean square error (RMSE) as a forecasting performance indicator for the proposed model.
We argue that the proposed model can produce effective rules for forecasting stock market prices, for three reasons:
Firstly, we employ technical indicators as forecasting factors instead of daily basic indexes; they are practical tools that stock analysts and fund managers use in forecasting stock market prices. Also, it has been shown that some technical indicators are highly related to future stock prices [2].
Secondly, from past literature related to rough set theory, three advantages have been found: (1) the rough set algorithms can process data without making any assumptions about the dataset; (2) rough set theory has powerful algorithms which can deal with a dataset that contains both quantitative and qualitative attributes; and (3) rough set algorithms can discover non-linear relations between observations hidden in multi-dimensional datasets, and produce understandable rules in an If-Then format that are meaningful to the average stock investor.
Lastly, the advantages of using data granulating methods to preprocess raw data are that the data dimension of a database can be reduced and simplified, and that discrete features are usually more compact and shorter than continuous ones [38]. We argue that data granulating approaches can use linguistic values to represent observations in order to reduce the data complexity when a high-dimension numeric dataset is used as an experimental dataset. Therefore, the proposed model can promote efficiency in data preprocessing by employing CPDA and MEPA.
The granulating step converts the numeric experimental dataset, which consists of two types of attributes (conditional and decision), into a granulated dataset of linguistic values for rule mining. The experimental dataset is preprocessed by two different approaches: CPDA is used to granulate the records of the decision attribute (stock price fluctuation), and MEPA is employed to granulate the records of the conditional attributes (nine technical indicators). The appropriate number of categories, based on human short-term memory function, is seven, plus or minus two [39]. Therefore, from the researchers' perspective, the decision attribute is granulated with five linguistic values and each conditional attribute is granulated with seven linguistic values. The five linguistic values used to present stock price fluctuations are as follows: L1 denotes going up sharply; L2 denotes going up; L3 denotes remaining flat; L4 denotes going down; and L5 denotes going down sharply. Because a technical indicator value cannot be defined in meaningful terms, the seven linguistic values that represent a technical indicator are defined as seven labeled numbers (L1 through L7). Table 8 demonstrates five parameterized triangular fuzzy numbers for the five linguistic values of stock price fluctuations. Table 9 demonstrates the seven linguistic values (fuzzy numbers) and their corresponding numeric ranges for the conditional attribute of MA. Table 10 lists some observations for conditional and decision attributes for the experimental datasets.
The rough set algorithm (LEM2 [37]) is applied to the training dataset to produce rules for forecasting the future price. Table 11 lists some raw rules extracted from the training dataset. The rules can be expressed in the format of "If-Then" (Table 12 demonstrates three rules).

Forecast and Evaluate Performance
Table 11. Examples of rules extracted from training dataset using rough set algorithm (columns: Conditional Attribute, Decision Attribute).

Step 4: Forecast based on the extracted rules. This step maps the conditional attributes of every record in the testing dataset to the rules extracted from the training dataset (see Table 11) in order to generate a linguistic forecast for future price trends. If the conditional attributes of a record satisfy the "If" criteria of a specific rule, the linguistic forecast for this instance is defined as the "Then" part of the rule. Whenever no rule can be found for the conditional attributes of a record, the naïve forecast [40] is employed as the forecast for the future price trend. Table 13 demonstrates the linguistic conditional attributes of some records and their corresponding linguistic forecasts for a testing dataset.
Step 5: Defuzzify and forecast testing datasets. The max-membership principle [25] (see equation (16)) is employed to defuzzify the linguistic forecast from Step 4. After a linguistic forecast has been defuzzified to a numeric value, a numeric forecast (see Table 14) for a future stock price is generated by equation (18):

$$F(t) = P(t - 1) + f(t) \tag{18}$$

where P(t − 1) denotes the stock price at time t − 1; f(t) denotes the numeric value defuzzified from the linguistic forecast for the future price trend at time t; and F(t) denotes the numeric forecast for the future stock price at time t.
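Steps 4 and 5 amount to a rule lookup followed by an addition, which the following Python sketch illustrates. Everything concrete in it is assumed for illustration: the rule, the peak values in `PEAKS`, and the interpretation of the naïve forecast as repeating the previous period's linguistic value are not taken from the paper's tables, and the function names are hypothetical.

```python
def linguistic_forecast(record, rules, previous_label):
    """Return the 'Then' label of the first rule whose 'If' part matches.

    `rules` is a list of (conditions, decision) pairs, where `conditions`
    maps attribute names to linguistic values. When no rule matches, fall
    back to `previous_label` (this sketch's reading of the naive forecast).
    """
    for conditions, decision in rules:
        if all(record.get(attr) == value for attr, value in conditions.items()):
            return decision
    return previous_label


def numeric_forecast(previous_price, label, peaks):
    """Add the defuzzified fluctuation for `label` to the previous price."""
    return previous_price + peaks[label]


# Hypothetical rule and defuzzified peak values (index points), not from Table 8.
RULES = [({"MA": "L3", "RSI": "L5"}, "L2")]
PEAKS = {"L1": 120.0, "L2": 40.0, "L3": 0.0, "L4": -40.0, "L5": -120.0}

matched = linguistic_forecast({"MA": "L3", "RSI": "L5", "ROC": "L1"}, RULES, "L3")
unmatched = linguistic_forecast({"MA": "L1", "RSI": "L5"}, RULES, "L3")
forecast = numeric_forecast(5000.0, matched, PEAKS)
```

Because each rule is a plain attribute-to-value mapping, the same structure can be printed directly as an "If-Then" statement for an investor to read.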
Step 6: Evaluate performance with RMSE. In this step, RMSE (defined in equation (19)) is used as a performance indicator for the proposed model. Table 15 demonstrates some forecasts produced from the proposed model and how to compute RMSE as a performance datum:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{t=1}^{n} \left(P(t) - F(t)\right)^2}{n}} \tag{19}$$

where P(t) denotes the actual stock price at time t; F(t) denotes the forecast at time t; and n is the total number of forecasts.
From the experimental results for the first part of the performance comparisons, listed in Table 16 and illustrated in Figure 4, it is clear that the proposed model outperforms Chen's (1996) [11] and Huarng et al.'s (2006) [6] models. Among the three models, the proposed model bears the smallest RMSE for four of the five experimental datasets (2001, 2003, 2004 and 2005). The proposed model also holds the smallest average RMSE (81) among the comparison models (94 for Chen's model [11] and 84 for Huarng et al.'s model [6]). Further, the variance of RMSE for the proposed model is the smallest (580 for the proposed model, 854 for Chen's model and 630 for Huarng et al.'s model [6]). The smallest variance implies that the proposed model performs with more stability than the other two models.
From the experimental results for the second part of the performance comparisons, listed in Table 17 and illustrated in Figure 5, we may note that the proposed model still performs with the smallest RMSE in four testing datasets (2001, 2003, 2004 and 2005) and the smallest average RMSE (81 for the proposed model, 493 for the multiple regression model, and 462 for Cheng et al.'s model [41]). Additionally, in the stability analysis, the proposed model has better forecasting stability than the other two multiple-factor forecasting models, based on the variance of RMSE (585 for the proposed model, 105310 for the multiple regression model [42], and 58777 for Cheng et al.'s model [41]).
As the performance data above show, the proposed model demonstrates outstanding performance and stability in forecasting Taiwan's stock market trends.
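The RMSE indicator used in Step 6 is straightforward to compute; a minimal Python sketch follows. The price series are toy numbers chosen for illustration, not values from Table 15, and the function name `rmse` is an assumption of this sketch.

```python
import math


def rmse(actual, forecast):
    """Root mean square error between actual prices and forecasts."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("series must be non-empty and of equal length")
    squared_errors = [(p - f) ** 2 for p, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Toy series: errors of 3, 4 and 0 index points.
actual_prices = [100.0, 110.0, 120.0]
forecast_prices = [103.0, 106.0, 120.0]
error = rmse(actual_prices, forecast_prices)
```

Computing this value once per testing year gives the per-dataset figures whose average and variance are compared across models.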

Conclusions and Future Research
In this paper, a novel forecasting model based on two advanced granulating methods (MEPA and CPDA) and rough set theory is proposed to provide understandable rules for the average stock investor and to improve the forecasting accuracy for Taiwan's stock market. Based on the model verification, we argue that the proposed model has reached the research objectives. After implementing the experiment for evaluating the proposed model, three findings are noted, as follows:
Firstly, technical indicators can provide more information for forecasting future stock prices. In practical stock market analysis, multiple technical indicators convey more meaningful stock information, such as stock price trends, fluctuations and momentum, and many stock analysts do employ technical indicators to analyze market trends. However, past fuzzy time-series models, such as Chen's (1996) [11] and Huarng et al.'s (2006) [6], employed only one forecasting factor, past stock price, to predict the future stock price. A single forecasting factor is insufficient to reveal the complex relationships within a stock market. Regarding the forecasting models that use basic indexes (time, open index, high index, low index, close index and volume) as multiple forecasting factors, such as Cheng et al.'s model [41] and the multiple regression model (MLR) [42], we argue that the basic indexes cannot provide useful stock information for forecasting stock markets because they can only display static statistics of stock markets, not dynamic market trends and fluctuations. From the performance comparisons, it is clear that the proposed model outperforms the four listed models: Chen's (1996) [11], Huarng et al.'s (2006) [6], Cheng et al.'s [41] and MLR [42]. This evidence supports the finding.
Secondly, granulating methods can reduce the complexity of experiments using high-dimension datasets. The proposed model employs MEPA and CPDA to produce linguistic values for conditional and decision attributes, which makes the rule-extracting process of the rough set algorithm simpler and faster.
Lastly, the rough set algorithm can find useful rules from historical stock data for investment decision-making. As Tables 11 and 12 show, the rules extracted by the rough set algorithm can be used as investment decision suggestions for average investors. Although a linguistic forecast generated by the rules cannot itself be employed as a forecasting value, the proposed model provides a valid defuzzifying method that produces an accurate forecasting value from the linguistic forecast in order to predict future stock prices.
For future research, two suggestions are offered: (1) other financial markets, such as commodity futures and mutual funds, can be used as forecasting targets to evaluate the proposed model; and (2) other modifying models, such as adaptive expectation models and neural networks, can be used to adjust the forecasts produced by the proposed model, enabling more accurate forecasts.
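The membership function of the triangular fuzzy number in equation (12) is:

$$\mu_{\tilde{A}_i}(x) = \begin{cases} 0, & x < a \\ \dfrac{x - a}{b - a}, & a \le x \le b \\ \dfrac{c - x}{c - b}, & b < x \le c \\ 0, & x > c \end{cases} \tag{12}$$

where $\mu_{\tilde{A}_i}(x)$ denotes the membership value of crisp data x belonging to fuzzy set $\tilde{A}_i$; the lower bound, midpoint and upper bound of $\tilde{A}_i$ are defined by a, b and c, respectively.
In the decision table $S = (U, A, C, D)$, U is the universe of discourse, A is a set of primitive features, and $C, D \subset A$ are two subsets of features, assuming that $A = C \cup D$ and $C \cap D = \emptyset$, where C is called the condition attribute set and D the decision attribute set.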

Figure 3. Framework of the proposed model.


OBV is a running cumulative total which should confirm the price trend.
DIS: DIS shows the stability of the most recent closing prices.
AR ROC gives buy (130 and above) and sell (70 and below) signals.

Figure 4. Performance comparisons with single-factor forecasting models.

Figure 5. Performance comparisons with multiple-factor forecasting models.

Table 1. Lower and upper bound of cumulative probability for linguistic value.

Table 2 demonstrates the five sets of linguistic intervals (lower and upper bound values) for five linguistic values of price fluctuation in the 2001 TAIEX, based on the lower bound ($P_{LB}$) and upper bound ($P_{UB}$) of cumulative probability from Table 1.

Table 2. Linguistic intervals for five linguistic values of price fluctuation.

Table 3. Parameterized fuzzy numbers for price fluctuations of the 2001 TAIEX.

In Table 4, five cases are characterized with three condition attributes: driver's age, vehicle type and climate; and one decision attribute: accident type. The three condition attributes form four elementary sets: {1, 3}, {2}, {4} and {5}. This means that cases 1 and 3 are indiscernible, while the other cases are characterized uniquely with all available information. Therefore, the off-road accident type

Table 4. Accident cases with describing features.

[Figure 3 flowchart boxes: Stock Data Base; Transfer six basic indexes into popular technical indicators; Granulate attributes by MEPA and CPDA; Extract fuzzy rules from training datasets by rough set theory; Forecast based on the extracted rules; Forecast testing dataset and defuzzify; Evaluate performance with RMSE]

Table 5. Defined equations for popular technical indicators.

Table 7. Original data of conditional attributes and decision attribute.

Step 2: Granulate conditional and decision attributes by MEPA and CPDA. In the experimental dataset, nine technical indicators are used as conditional attributes, and the stock price fluctuation, defined in equation (17), is employed as the decision attribute:

$$\text{price fluctuation}(t) = P(t) - P(t - 1) \tag{17}$$

where price fluctuation(t) denotes the price change from time t − 1 to time t; P(t) denotes the closing price at time t; and P(t − 1) denotes the closing price at time t − 1.

Table 8. Parameterized fuzzy numbers for decision attributes (price fluctuation).

Table 10. Observations for conditional and decision attributes (TAIEX).

Step 3: Extract fuzzy rules from training datasets by rough set theory. In this step, the experimental dataset of linguistic values is split into two datasets, training and testing. The training dataset is extracted by a rough set algorithm (LEM2, Learning from Examples Module, version 2

Table 13. Linguistic forecasts for testing dataset.

Table 14. Numeric forecasting value for testing dataset.

Table 15. Forecasting value and performance with RMSE.

In the second part, the purpose is to verify the superiority of the proposed model. Therefore, two forecasting models using multiple forecasting factors, Cheng et al.'s model [41] and a multiple regression model (MLR) [42], are employed for purposes of comparison.

Table 16. Performance comparisons with single-factor forecasting models.
* denotes the minimum value among three models.

Table 17. Performance comparisons with multiple-factor forecasting models.