A Neutrosophic Forecasting Model for Time Series Based on First-Order State and Information Entropy of High-Order Fluctuation

In time series forecasting, information presentation directly affects prediction efficiency. Most existing time series forecasting models follow logical rules derived from the relationships between neighboring states, without considering the inconsistency of fluctuations over a related period. In this paper, we propose a new perspective on the prediction problem, in which inconsistency is quantified and regarded as a key characteristic of prediction rules. First, a time series is converted into a fluctuation time series by comparing each data point with the corresponding previous data point. Then, the upward trend of each fluctuation is mapped to the truth-membership of a neutrosophic set, while a falsity-membership is used for the downward trend. The information entropy of the high-order fluctuation time series is introduced to describe the inconsistency of historical fluctuations and is mapped to the indeterminacy-membership of the neutrosophic set. Finally, an existing similarity measurement method for neutrosophic sets is introduced to find similar states during the forecasting stage, and a weighted arithmetic averaging (WAA) aggregation operator is used to obtain the forecasting result according to the corresponding similarities. Compared to existing forecasting models, the neutrosophic forecasting model based on information entropy (NFM-IE) can represent both fluctuation trend and fluctuation consistency information. In order to test its performance, we used the proposed model to forecast several realistic time series, such as the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX), the Shanghai Stock Exchange Composite Index (SHSECI), and the Hang Seng Index (HSI). The experimental results show that the proposed model predicts stably across different datasets. Simultaneously, comparing its prediction error with that of other approaches shows that the model has outstanding prediction accuracy and universality.


Introduction
Financial markets are complex systems whose fluctuations result from many interacting variables. These variables cause frequent market fluctuations whose trends exhibit degrees of ambiguity, inconsistency, and uncertainty. This pattern underlines the importance of time series representation, and thus an urgent demand arises for analyzing time series data in more detail. To some extent, effective time series representation can be understood from two aspects: traditional time series prediction approaches [1][2][3][4], and fuzzy time series prediction approaches [5,6]. The former emphasize the use of a crisp set to represent the time series, while the latter use the fuzzy set.
Generally speaking, data are the source of the prediction process and the input to the prediction system. The original data, however, are full of noise, incompleteness, and inconsistency, which limits the performance of traditional prediction methods. Therefore, Song and Chissom [7][8][9] developed a fuzzy time series model to forecast real-world scenarios such as college admissions. The fuzzification method effectively eliminates part of the noise inside the data, and the prediction performance on the time series is strengthened. Subsequently, with advancing research, the non-determinacy of information has become the main contradiction affecting prediction accuracy. Some studies proposed novel information representation approaches, such as the type 2 fuzzy time series [5], rough set fuzzy time series [10], and intuitionistic fuzzy time series [11].
Although the above work has achieved considerable results for specific problems, certain shortcomings remain that pose a barrier to the accuracy and applicability of predictions. More specifically, complex scenarios and variables in actual situations make it unrealistic to define and classify explicitly the membership and non-membership of elements.
The neutrosophic sets (NSs) method, first proposed by Smarandache [12], is suitable for the expression of incomplete, indeterminate, and inconsistent information. A neutrosophic set consists of truth-, indeterminacy-, and falsity-memberships. From the perspective of information representation, scholars have proposed two specific concepts based on the neutrosophic set: single-valued NSs [13] and interval-valued NSs [14]. These concepts are intended to provide a more detailed information representation, thereby enabling NSs to quantify uncertain information more accurately. To deal with the above problem, entropy offers an important representation of the degree of complexity and inconsistency. In a nutshell, entropy is more focused on the representation and measurement of inconsistency, while NSs tend to describe uncertainty. Zadeh [15] first proposed the entropy of fuzzy events, which measures the uncertainty of fuzzy events by probability. Subsequently, De Luca and Termini [16] proposed the concept of entropy for fuzzy sets (FSs) based on Shannon's information entropy theory and further proposed a method of fuzzy entropy measurement. Since information entropy is an effective measurement of the degree of order in a system, it has been gaining popularity in different applications, such as climate variability [17], uncertainty analysis [18,19], financial analysis [20], image encryption [21], and detection [22]. Specifically, He et al. [23] proposed a collapse hazard forecasting method and applied information entropy measurement to reduce the influence of collapse activity indices. Bariviera [24] proposed a prediction method based on the maximum entropy principle to predict the market and further monitor market anomalies. In Liang's research [25], information entropy was introduced to analyze trends for capacity assessment of sustainable hydropower development. Zhang et al. [26] proposed a signal recognition theory and algorithm based on information entropy and integrated learning, which applied various types of information entropy including energy entropy and Renyi entropy.
In order to describe the indeterminacy of fluctuations and further measure the inconsistency and uncertainty of dynamic fluctuation trends, we propose a neutrosophic forecasting model based on NSs and the information entropy of high-order fuzzy fluctuation time series (NFM-IE). The biggest difference from existing models is that NFM-IE represents both fluctuation trend information and fluctuation consistency information. First of all, a time series is converted into a fluctuation time series by comparing each data point with the corresponding previous data point. Then, the upward trend of each fluctuation is mapped to the truth-membership of a neutrosophic set, and a falsity-membership is used for the downward trend. The information entropy of the high-order fluctuation time series is introduced to describe the inconsistency of historical fluctuations and is mapped to the indeterminacy-membership of the neutrosophic set. Finally, an existing similarity measurement method for neutrosophic sets is introduced to find similar states during the forecasting stage, and the weighted arithmetic averaging (WAA) aggregation operator is employed to obtain the forecasting result according to the corresponding similarities. The largest contributions of the proposed model are as follows: (1) Introducing information entropy to quantify the inconsistency of fluctuations in related periods and mapping it to the indeterminacy-membership of neutrosophic sets allows NFM-IE to extend traditional forecasting models to a certain extent. (2) Employing a similarity measurement method and an aggregation operator allows NFM-IE to integrate more possible rules. In order to test its performance, we used the proposed model to forecast several realistic time series, such as the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX), the Shanghai Stock Exchange Composite Index (SHSECI), and the Hang Seng Index (HSI).
The experimental results show that the model has a stable prediction ability for different datasets. Simultaneously, comparing the prediction error with that from other approaches proves that the model has outstanding prediction accuracy and universality.
The rest of this paper is organized as follows: Section 2 introduces the basic concepts of fluctuation time series and information entropy. Then, the concepts proposed in this paper, such as the neutrosophic fluctuation time series (NFTS) and the neutrosophic fluctuation logical relationship, are defined. Section 3 presents the specific modules of the proposed model. Section 4 details the prediction steps and validates the model using TAIEX as the dataset. Section 5 further analyzes the prediction accuracy and universality of the model based on SHSECI and HSI. Finally, the conclusions and prospects are presented in Section 6.

Information Entropy of the m-th-Order Fluctuation in a Time Series
Information entropy (IE) [27] was proposed as a measurement of event uncertainty, where the amount of information is expressed as a function of the event occurrence probability. The general formula for information entropy is:

E = -\sum_{t=1}^{N} p(x_t) \log_2 p(x_t), (1)

where p(·) is the probability function of a set of N events. In addition, the probabilities must satisfy \sum_{t=1}^{N} p(x_t) = 1 and 0 < p(x_t) < 1, so the information entropy is always positive.
According to the fuzzy set definition by Zadeh [28], each number in a time series can be fuzzified by its membership function over a fuzzy set L = {L_1, L_2, ..., L_g}, which can be regarded as an event in a time series. For example, when g = 5, L might represent a set of linguistic event variants: L = {L_1, L_2, L_3, L_4, L_5} = {very low, low, equal, high, very high}.

Definition 1. Let {V_t | t = 1, 2, ..., T} be a time series. Its fluctuation time series is {U_t | t = 2, 3, ..., T}, where U_t = V_t - V_{t-1} (t = 2, 3, ..., T).

Definition 2. Let F(t - 1), F(t - 2), ..., F(t - m) be the fuzzified values of the m-th-order fluctuation time series {U_t | t = m + 1, m + 2, ..., T}. Let p_{U_t}(L_1), p_{U_t}(L_2), p_{U_t}(L_3), p_{U_t}(L_4), and p_{U_t}(L_5) be the probabilities of occurrence of the linguistic variants L_1, L_2, L_3, L_4, and L_5 among F(t - 1), F(t - 2), ..., F(t - m). The information entropy of the m-th-order fluctuation at point t in the fluctuation time series {U_t | t = m + 1, m + 2, ..., T} is defined as:

E(U_t) = -\sum_{n=1}^{g} p_{U_t}(L_n) \log_2 p_{U_t}(L_n), (2)

where g = 5.
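The m-th-order entropy of Equation (2) can be sketched in a few lines of Python; the label names and example windows below are illustrative, not taken from the paper's data:

```python
import math
from collections import Counter

def mth_order_entropy(labels):
    """Shannon entropy (Equation (2)) of the linguistic labels observed in an
    m-th-order window; `labels` holds F(t-1), ..., F(t-m), e.g. ['L1', 'L3', ...]."""
    m = len(labels)
    counts = Counter(labels)
    # p_{U_t}(L_n) is the relative frequency of label L_n in the window;
    # labels with zero probability contribute nothing to the sum.
    return -sum((c / m) * math.log2(c / m) for c in counts.values())

# A perfectly consistent window has zero entropy; a mixed window is positive.
print(mth_order_entropy(['L3'] * 9))                      # 0.0
print(mth_order_entropy(['L1', 'L3', 'L3', 'L5'] * 2))    # 1.5
```

A window of identical trends (no inconsistency) thus maps to indeterminacy 0, while evenly mixed trends approach the maximum entropy log2(g).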

Neutrosophic Fluctuation Time Series
Definition 3. (Smarandache [12]) Let W be a space of points (objects), with a generic element in W denoted by w. A neutrosophic set A in W is characterized by a truth-membership function T_A(w), an indeterminacy-membership function I_A(w), and a falsity-membership function F_A(w). The functions T_A(w), I_A(w), and F_A(w) are real standard or nonstandard subsets of ]⁻0, 1⁺[. There is no restriction on the sum of T_A(w), I_A(w), and F_A(w).

Definition 4. Let {U_t | t = 2, 3, ..., T} be the fluctuation time series of a stock time series as defined in Definition 1. A number U_t in U is characterized by an upward-trend function T(U_t), a fluctuation-inconsistency function I(U_t), and a downward-trend function F(U_t), which can be mapped to the truth-membership, indeterminacy-membership, and falsity-membership of a neutrosophic set, respectively. The upward-trend function T(U_t) and downward-trend function F(U_t) are defined according to the number U_t as follows:

T(U_t) = min(max(U_t / m_1 + o_1, 0), 1),
F(U_t) = min(max(-U_t / m_2 + o_2, 0), 1),

where m_j and o_j (j = 1, 2) are parameters determined from the fluctuation time series.
The fluctuation-inconsistency function I(U t ) can be represented by the information entropy E(U t ) as defined in Equation (2).

Neutrosophic Logical Relationship
Definition 5. Let {X_t | t = 1, 2, 3, ..., T} be a fluctuation time series. If there exists a relation R(t, t + 1) such that X_{t+1} = X_t ∘ R(t, t + 1), where ∘ is a max-min composition operator, then X_{t+1} is said to be derived from X_t, denoted by the neutrosophic logical relationship (NLR) X_t → X_{t+1}. X_t and X_{t+1} are called the left-hand side (LHS) and the right-hand side (RHS) of the NLR, respectively. X_{t+1} can also be represented by D_t; therefore, X_t → X_{t+1} can also be written as X_t → D_t.
The Jaccard index, also known as the Jaccard similarity coefficient, is used to compare the similarities and differences between finite sample sets [29]. The larger the Jaccard similarity value, the higher the similarity.

Definition 6. Let X_t and X_j be two NSs. The Jaccard similarity between X_t and X_j in vector space can be expressed as follows:

S_{X_{t,j}} = (T_{X_t} T_{X_j} + I_{X_t} I_{X_j} + F_{X_t} F_{X_j}) / (T_{X_t}^2 + I_{X_t}^2 + F_{X_t}^2 + T_{X_j}^2 + I_{X_j}^2 + F_{X_j}^2 - T_{X_t} T_{X_j} - I_{X_t} I_{X_j} - F_{X_t} F_{X_j}).
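A minimal sketch of this similarity, assuming the vector-space (Ye-style) Jaccard formulation for single-valued neutrosophic numbers represented as (T, I, F) triples:

```python
def jaccard_similarity(x, y):
    """Vector-space Jaccard similarity of two single-valued neutrosophic
    numbers x = (T, I, F) and y = (T, I, F): dot(x, y) / (|x|^2 + |y|^2 - dot(x, y))."""
    dot = sum(a * b for a, b in zip(x, y))
    norm2 = sum(a * a for a in x) + sum(b * b for b in y)
    return dot / (norm2 - dot)

# Identical numbers yield similarity 1; orthogonal ones yield 0.
print(jaccard_similarity((0.6, 0.2, 0.1), (0.6, 0.2, 0.1)))  # 1.0
```

The measure is bounded in [0, 1] for non-negative memberships, which makes it usable as a weight in the aggregation step below.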

Aggregation Operator for NLRs
Definition 7. Let {X_1, X_2, ..., X_n} and {D_1, D_2, ..., D_n} be the LHSs and RHSs of a group of NLRs, respectively, and let the Jaccard similarities between X_i (i = 1, 2, ..., n) and a current state X_j be S_{X_{i,j}} (i = 1, 2, ..., n). The corresponding D_j can be calculated by an aggregation operator [30] as:

D_j = \sum_{i=1}^{n} w_i D_i, where w_i = S_{X_{i,j}} / \sum_{i=1}^{n} S_{X_{i,j}}.

According to the definition of an NLR, D_j can be represented by X_{j+1}.
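The WAA aggregation can be sketched as follows, assuming weights proportional to the Jaccard similarities (normalized to sum to one) and RHSs represented as (T, I, F) triples:

```python
def waa_forecast(rhs_list, similarities):
    """Weighted arithmetic averaging of RHS neutrosophic numbers: each
    rhs in `rhs_list` is a (T, I, F) triple, weighted by its similarity."""
    total = sum(similarities)
    weights = [s / total for s in similarities]
    # Aggregate each of the three membership components separately.
    return tuple(
        sum(w * rhs[k] for w, rhs in zip(weights, rhs_list))
        for k in range(3)
    )

# Two equally similar rules average component-wise.
print(waa_forecast([(0.8, 0.1, 0.1), (0.4, 0.3, 0.3)], [1.0, 1.0]))
```

Note that the plain component-wise weighted mean shown here is one common reading of the WAA operator; the cited reference [30] defines the exact variant used in the paper.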

Research Methodology
In this section, we introduce a neutrosophic forecasting model for time series based on the first-order state and the information entropy of high-order fluctuations. The detailed steps are described below and shown in Figure 1.


[Figure 1. Flowchart of the proposed model: the stock index database is neutrosophicated (historical fluctuations, their information entropy, and current values are described by neutrosophic fluctuation sets); logical relationships between X and D are established for the training data; the similarities between X_t and the historical values X_j are calculated; future data are forecasted using the aggregation operator; and deneutrosophication yields the forecasted value.]

Step 1: Using Neutrosophic Fluctuation Sets to Describe a Time Series
Let {V_t | t = 1, 2, 3, ..., T} be a stock index time series and {U_t | t = 2, 3, ..., T} be its fluctuation time series, where U_t = V_t - V_{t-1} (t = 2, 3, ..., T). Then, we can calculate len = (\sum_{t=2}^{T} |U_t|) / (T - 1), which is the benchmark for interval division when calculating memberships. Let {X_t | t = m, m + 1, m + 2, ..., T} be the m-th-order neutrosophic expression of the fluctuation time series {U_t | t = 2, 3, ..., T}. The conversion rules for the truth-membership T_{X_t} and falsity-membership F_{X_t} of X_t are defined as follows:

T_{X_t} = min(max(U_t / ((3/2) × len) + 1/3, 0), 1),
F_{X_t} = min(max(-U_t / ((3/2) × len) + 1/3, 0), 1).
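Step 1 can be sketched as follows; the specific linear mapping U_t / ((3/2) × len) + 1/3, clipped to [0, 1], is inferred from the worked TAIEX example later in the text rather than quoted directly:

```python
def neutrosophic_trends(prices):
    """Step 1 sketch: convert a price series V_t into fluctuations
    U_t = V_t - V_{t-1}, compute the benchmark len, and map each U_t
    to clipped (truth, falsity) memberships."""
    u = [b - a for a, b in zip(prices, prices[1:])]
    benchmark = sum(abs(x) for x in u) / len(u)   # len = sum|U_t| / (T - 1)

    def clip(v):
        return min(max(v, 0.0), 1.0)

    return [
        (clip(x / (1.5 * benchmark) + 1 / 3),     # truth-membership T_{X_t}
         clip(-x / (1.5 * benchmark) + 1 / 3))    # falsity-membership F_{X_t}
        for x in u
    ]

# A sharp rise saturates the truth-membership; a flat day gives T = F = 1/3.
print(neutrosophic_trends([100.0, 110.0, 105.0, 105.0]))
```

A zero fluctuation thus maps to the neutral point (1/3, 1/3), and fluctuations beyond ±(3/2) × len × (2/3) saturate at 0 or 1.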
Step 2: Using Information Entropy to Represent the Complexity of Historical Fluctuations

{U_t | t = 2, 3, ..., T} can be fuzzified according to a linguistic set L = {l_1, l_2, l_3, l_4, l_5}. The conversion rule for the indeterminacy-membership I_{X_t} is defined as follows:

I_{X_t} = -\sum_{n=1}^{g} p_{X_t}(L_n) \log_2 p_{X_t}(L_n),

where g = 5 and p_{X_t}(L_n) indicates the probability of occurrence of the label l_n in the past m days.

Step 3: Establishing Logical Relationships for Training Data
According to Definition 5, NLRs were established as a training dataset.

Step 4: Calculating the Similarities between Current Data and Training Data
According to Definition 6, the similarities between the current data and the training data were calculated. Let t denote the current data point; S_{X_{t,j}} is then the NFTS similarity between the current point t and training data point j.

Step 5: Forecasting Neutrosophic Value Using the Aggregation Operator
According to Definition 7, the future neutrosophic fluctuation number X_{t+1} can be generated based on the training dataset and the similarities with X_t. In order to eliminate data with very low similarity, valid NLRs must satisfy S_{X_{t,j}} ≥ w, where w is a similarity threshold.

Step 6: Deneutrosophication for the Neutrosophic Fluctuation Set and Calculating the Forecasted Value
Calculating the expected value of the forecasted neutrosophic set X_{t+1}, the forecasted fluctuation value can be calculated by:

U'_{t+1} = (T_{X_{t+1}} - F_{X_{t+1}}) × (3/4) × len,

and the forecasted value is obtained from the previous actual value as V'_{t+1} = V_t + U'_{t+1}.

This study needs to select the parameters of the model and estimate its performance. Many studies in the field of fuzzy forecasting have used the data from January to October as the training set and the data from November to December as the test set; to facilitate comparison with these existing studies, we also selected the data from November to December as the test dataset. Considering the characteristics of time series, traditional cross-validation methods (such as k-fold cross-validation) adapt poorly, because a subset of data after the training subset must be retained for validating model performance. Therefore, we chose a special nested cross-validation, whose outer layer was used to estimate model performance and whose inner layer was used to select the parameters. Specifically, in this paper, we used TAIEX's 1999 data as an example. The closing prices from 1 January to 31 October were used as the training dataset; within it, January to August formed a training subset, and September to October were used for validation. Logical relationships were constructed between each data point and its closest ninth-order historical values. The closing prices from 1 November to 31 December were used as forecast data, and performance was evaluated by comparing the forecasts with the realistic data.
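The deneutrosophication in Step 6 can be sketched as follows; the inverse mapping U' = (T - F) × (3/4) × len is an inference from the Step 1 membership definitions (it exactly inverts T - F in the unclipped region), not a formula quoted from the paper:

```python
def deneutrosophicate(t_mem, f_mem, benchmark, prev_price):
    """Step 6 sketch: recover a fluctuation value from the forecasted
    truth/falsity memberships, then add it to the previous day's price.
    `benchmark` is the len value computed in Step 1."""
    u_forecast = (t_mem - f_mem) * 0.75 * benchmark
    return prev_price + u_forecast

# Balanced memberships (T == F) imply no predicted change.
print(deneutrosophicate(0.5, 0.5, 80.0, 7000.0))  # 7000.0
```

As a sanity check, plugging in the memberships from the worked example below (T = 0.5584, F = 0.1082) with an illustrative benchmark of len ≈ 85 recovers a fluctuation of about 28.7, matching the example's U_12.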
The information entropy of the fluctuation time series proposed in this paper is the intermediate term of the NS. In order to maintain consistency with the other two terms, the above results must be normalized. The normalized information entropy, based on the maximum value of the information entropy, is calculated as follows:

E'(U_12) = 1.8911 / 3.7000 = 0.5111, (13)
E'(U_13) = 1.5307 / 3.7000 = 0.4137, (14)
E'(U_14) = 1.3923 / 3.7000 = 0.3763, (15)
...
In order to convert the numerical data of the stock market fluctuation time series into an NS, it is necessary to calculate the elements corresponding to the truth-membership term and the falsity-membership term of the NS. According to Equation (7), the neutrosophic set memberships can be calculated. For example, when the fluctuation value is U_12 = 28.7, the truth-membership of X_12 is T_{X_12} = 28.7 / ((3/2) × len) + 1/3 = 0.5584 and the falsity-membership of X_12 is F_{X_12} = -28.7 / ((3/2) × len) + 1/3 = 0.1082. Then, the fluctuation can be represented by the neutrosophic set as follows: ...
This step requires establishing neutrosophic logical relationships based on the feature and target sets, where X_12 is the feature item of X_13.

Step 3: Calculating the Jaccard Similarity
Jaccard similarity is usually used to compare the similarities and differences of a limited set of samples; the higher the value, the higher the similarity. We used it to compare the current logical group with the logical groups in the training set in order to identify similar groups. For example, S_{X_{223,12}} indicates the similarity between the 223rd and 12th groups. First, we applied the Jaccard similarity measure to locate similar LHSs of NLRs. We tested different threshold values on the training data; in this example, the threshold was set to 0.89, and we identified 65 groups that met the criterion.
Furthermore, we calculated the forecasting NFTS using the aggregation operator, and from it the predicted fuzzy fluctuation and the corresponding real fluctuation value. Finally, the predicted value was obtained by adding the predicted fluctuation value to the actual value of the previous day. For the sample dataset, the complete prediction results for the stock fluctuation trends and the actual values are shown in Table 1 and Figure 2.
Table 1 and Figure 2 show that NFM-IE was able to successfully forecast TAIEX data from 1 November 1999 to 30 December 1999 based on the logical rules derived from the training data.

Performance Assessments
During the experimental analysis, several measures were used to quantify the model's prediction accuracy. These measures are widely used in the prediction field and include the mean squared error (MSE), the root mean squared error (RMSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE).
These expressions are respectively illustrated by Equations (26)-(29):

MSE = (1/n) \sum_{t=1}^{n} (forecast_t - actual_t)^2, (26)
RMSE = \sqrt{(1/n) \sum_{t=1}^{n} (forecast_t - actual_t)^2}, (27)
MAE = (1/n) \sum_{t=1}^{n} |forecast_t - actual_t|, (28)
MAPE = (1/n) \sum_{t=1}^{n} (|forecast_t - actual_t| / actual_t) × 100%, (29)

where forecast_t represents the predicted observations and actual_t represents the actual observations. Theil's U index [31] is primarily used to measure the deviation between predicted and actual values. It yields a relative value between zero and one, where zero means that the actual values equal the predicted values, that is, the prediction model is perfect, while one indicates that the model's prediction effect is not satisfactory. Theil's U index is expressed as follows:

U = \sqrt{(1/n) \sum_{t=1}^{n} (forecast_t - actual_t)^2} / (\sqrt{(1/n) \sum_{t=1}^{n} forecast_t^2} + \sqrt{(1/n) \sum_{t=1}^{n} actual_t^2}). (30)

From Table 2, the results of the different error statistics show that NFM-IE can successfully forecast the different TAIEX time series from 1997-2005.
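The error measures can be computed directly; the bounded form of Theil's U used here is an assumption consistent with the [0, 1] range described above:

```python
import math

def forecast_errors(forecast, actual):
    """Compute MSE, RMSE, MAE, MAPE (Equations (26)-(29)) and the
    bounded-form Theil's U for paired forecast/actual series."""
    n = len(actual)
    diffs = [f - a for f, a in zip(forecast, actual)]
    mse = sum(d * d for d in diffs) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(d) for d in diffs) / n
    mape = sum(abs(d) / abs(a) for d, a in zip(diffs, actual)) / n * 100
    # Theil's U: 0 for a perfect forecast, approaching 1 for a poor one.
    theil_u = rmse / (
        math.sqrt(sum(f * f for f in forecast) / n)
        + math.sqrt(sum(a * a for a in actual) / n)
    )
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "MAPE": mape, "TheilU": theil_u}

# A perfect forecast yields zero for every measure.
print(forecast_errors([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

Reporting several measures together, as the paper does, guards against any single metric favoring a particular error distribution.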

Taiwan Stock Exchange Capitalization Weighted Stock Index
In general, TAIEX is a widely used dataset in stock market forecasting. In order to facilitate comparison with other forecasting models, this paper also uses it as the main dataset to verify the model. Using non-stationary data can lead to spurious regressions, so we first performed a stationarity test based on the unit root test using the EViews software (version 10.0, Enterprise Edition). The results indicate that the first-order difference of TAIEX 1997-2005 was stationary, which means that the fluctuation data used in this study were stationary. The other datasets in this study were likewise stationary.
The model in this paper is based on high-order information, and thus different orders may affect prediction accuracy. The experimental analysis showed that when the order of the fuzzy fluctuation information entropy was 9-11, the stability of the model was most ideal. Table 3 shows the experimental errors for different years under different orders. Not surprisingly, accurate fluctuation trend predictions are very important and needed. Therefore, the performance of different methods must be compared and evaluated, thus verifying the superiority or deficiency of the model. In order to verify the effects of model prediction, this section focuses on comparing this model's experimental results with those from other models. Comparing the errors across models showed that the current model had certain advantages in prediction accuracy. Table 4 shows the prediction errors for the different methods between 1997 and 2005. The NFM-IE hybrid model achieved better prediction accuracy compared to the traditional regression model, autoregressive model, neural network model, and fuzzy model (Table 4). In addition, NFM-IE exhibited better predictive power in some years compared to other hybrid models based on fuzzy theory.

Forecasting Shanghai Stock Exchange Composite Index
SHSECI is one of the most typical stock indices in China, with certain representativeness. We selected it as an experimental dataset to verify the model's applicability.
Recently, scholars have proposed more comprehensive models based on traditional prediction methods. For example, Guan et al. [39] proposed a two-factor autoregressive moving average model based on fuzzy logical relationships (ARMA-FR), and Guan et al. [40] proposed a model based on a back-propagation neural network and high-order fuzzy-fluctuation trends (BPNN-HFT). This section compares several typical prediction methods; the results indicate that the model can also effectively predict this stock index. Table 5 and Figure 3 show a comparison of the different prediction methods.

The comparison shows that NFM-IE outperformed the other methods in predicting SHSECI from 2007-2015.
Comparing the average value of the SHSECI prediction error showed that NFM-IE had better prediction accuracy and stability compared to the neural network-based BPNN-HFT model and the statistical-based ARMA-FR model.

Forecasting Hong Kong-Hang Seng Index
Finally, the Hong Kong-Hang Seng Index (HSI) was selected as the experimental dataset. By comparing several authoritative prediction methods, we can verify the universality of the model in other stock markets. Table 6 and Figure 4 show a comparison of the different prediction methods from 1998-2012.

To further evaluate the validity of the proposed model, we used Friedman's test to perform a significance test based on the study of Demšar [44]. For reference, Friedman's test is a non-parametric statistical test proposed by Milton Friedman [45,46]. To further illustrate the significance of the model's predictions compared to other prediction methods, this section uses Friedman's test and a post-hoc test for significance analysis. In the Friedman test phase, SPSS was used for statistical testing; the post-hoc test phase was based on manual calculations.
In the first stage, Friedman's test compares the average ranking of the different algorithms, R_j = (1/N) \sum_i r_i^j, where r_i^j is the rank of the j-th of k algorithms on the i-th of N datasets. The ranking of each method, based on the analysis of the HSI forecast results, is shown in Table 7 (e.g., Wan (2017) [42]: 4.40; Ren (2016) [43]: 4.20; Cheng (2018) [10]: 1.53; NFM-IE: 1.47). Through software analysis, we concluded that the proposed method had the best comprehensive ranking. In addition, according to the Chi-square distribution, there were significant differences between these methods.
In the second stage, in order to further compare the different methods, we used the Nemenyi test [47], in which the critical difference is:

CD = q_α \sqrt{k(k + 1) / (6N)}. (31)

According to Equation (31), with α = 0.05, CD = 1.575. Upon further comparison, we found that the method proposed in this study had significant advantages over Yu (2005) [41], Wan (2017) [42], and Ren (2016) [43], among others. Although the difference compared with Cheng's method (2018) [10] was not significant, NFM-IE had certain advantages in terms of the error mean and average rank.
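The critical difference can be reproduced in a few lines; the values k = 5, N = 15, and q_0.05 ≈ 2.728 (from Demšar's table for five methods) are assumptions that are consistent with the five compared methods, the fifteen HSI years (1998-2012), and the reported CD ≈ 1.575:

```python
import math

def nemenyi_cd(k, n, q_alpha):
    """Nemenyi post-hoc critical difference: CD = q_alpha * sqrt(k(k+1) / (6N)),
    where k is the number of methods and N the number of datasets."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))

# k = 5 methods over N = 15 yearly datasets at alpha = 0.05.
print(round(nemenyi_cd(5, 15, 2.728), 3))  # 1.575
```

Two methods whose average ranks differ by more than CD are considered significantly different at the chosen alpha.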

Discussion
The research mainly focused on two issues. The first was whether the uncertainty of stock market volatility can be used as a key forecasting feature in a complex environment. The other was whether a prediction method considering both uncertainty and trend is effective. We first used the inconsistency of historical fluctuations as a stock forecasting feature and further characterized and quantified it. Then, we applied the neutrosophic set to represent this information and established neutrosophic logical relationships based on fluctuation inconsistency. Through experimental analysis, the proposed model achieved robustness and stability with relatively few parameters. In addition, it was also shown that predictions that consider inconsistency are meaningful and effective. The advantages are embodied in the following aspects: First, NFM-IE did not need to establish complex assumptions, unlike traditional regression-based prediction models. Second, the NFM-IE prediction process was more interpretable than a neural network. Finally, compared with fuzzy prediction methods, NFM-IE effectively utilized data inconsistency as key information. All in all, the model showed satisfactory performance. However, it also showed certain limitations: First, the model used single stock market data as the system input and failed to fully consider multiple factors. Second, using information entropy as a key tool for uncertainty measurement requires further optimization in characterizing the data.

Conclusions
In this paper, we presented the concept of NFTS and proposed a prediction model based on the neutrosophic set and the information entropy of high-order fuzzy fluctuation time series. This model showed significant performance advantages over existing fuzzy time series models, machine learning prediction models, and traditional economic prediction models. We applied three typical test datasets to show that the model has a certain universality and stability. In addition, this paper made scientific contributions in the following aspects: First, the concept of NFTS was proposed. Second, this paper proposed information entropy based on high-order fluctuation time series. Finally, this paper established NLRs based on NFTS and information entropy. This paper discussed the first-order neutrosophic time series to characterize the historical state of uncertainty and the high-order information fluctuation entropy to measure the complexity of historical fluctuations. Other types of time series will be tested in the future. Meanwhile, future research should aim to establish detailed high-order neutrosophic time series models indicating the uncertainty of historical trends. In this study, we considered the Jaccard similarity measure for comparing X_t and X_j; further work could consider the Jensen-Shannon distance [20], which satisfies the triangle inequality. Furthermore, in order to verify the robustness of the forecast in longer forecast scenarios, we will extend the model to forecast 2, 3, or 4 periods ahead.