Distinguish between Stochastic and Chaotic Signals by a Local Structure-Based Entropy

As a measure of complexity, information entropy is frequently used to categorize time series, for example in machinery fault diagnostics and biological signal identification, and is regarded as a characteristic of dynamic systems. Many entropies, however, are ineffective in multivariate scenarios due to correlations between components. In this paper, we propose a local structure entropy (LSE) based on the idea of a recurrence network. Given a certain tolerance and scale, LSE values can distinguish multivariate chaotic sequences from stochastic signals. Three financial market indices are used to evaluate the proposed LSE. The results show that the LSE values of the FTSE100 and S&P500 are higher than that of the SZI, which indicates that the European and American stock markets are more sophisticated than the Chinese stock market. Additionally, using decision trees as the classifiers, LSE is employed to detect bearing faults. Compared to permutation entropy, LSE achieves higher recognition accuracy.


Introduction
The local structure and its integration are crucial components in defining a vast system as a whole. For instance, the individual abilities of football players on the field and their teamwork both play key roles in a competition [1], and the seismic capacity of a building relies on its materials and frame structure [2]. In machine learning, people often use local structures to identify objects [3,4]. Numerous examples can be found in feature selection [5,6], pattern recognition [7][8][9], and so on. Is a local structure alone enough to accomplish identification? If so, other methods, such as global-local structure-based learning [10], would be unnecessary. Clearly, the answer depends on the specific problem: local structure-based strategies can take effect only when the local structures contain enough information for judgment. Therefore, quantifying the amount of information contained in the local structures of samples is important for further exploration.
In the time series analysis field, entropies are used to reflect the amount of information contained in observed data. The first one can be traced back to Shannon entropy [11], which is defined as

H(X) = −∑_x p(x) log p(x),    (1)

where X is a random variable and p(x) is its probability density. For a time series, Equation (1) is valid under the assumption of stationarity, regardless of whether X is univariate or multivariate.
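For a discrete distribution, Equation (1) can be computed directly; a minimal Python sketch (the function name and the example probabilities are ours, not from the paper):

```python
import math

def shannon_entropy(probs, base=math.e):
    """Shannon entropy H(X) = -sum_x p(x) log p(x); zero-probability
    terms contribute nothing and are skipped."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair coin carries exactly 1 bit of information (base-2 logarithm).
print(shannon_entropy([0.5, 0.5], base=2))
```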

Method
In [50], Donner introduced the construction of the recurrence network (RN), which takes individual values as vertices and indicators of recurrences as edges. Here, a pair of states whose values are close enough can be regarded as a recurrence. Thus, the RN represents the reappearance of system states over a long period. It is natural to (hope to) use the recurrent numbers of the initial states in sliding windows of different lengths to characterize the complexities of the systems. Let x_t = (x_1t, x_2t, ..., x_mt)^T, t ≤ N, be an m-dimensional time series, and let τ and i be positive integers with i + τ − 1 ≤ N. Denote by Y_i = (x_i, x_{i+1}, ..., x_{i+τ−1}) the m × τ matrix formed by the window of τ consecutive states starting at time i.
Let j ∈ {0, 1, 2, ..., τ − 1}, and let y_ij denote the (j + 1)th column vector of Y_i. Define

dis_j = |y_i0, y_ij|,

where |·, ·| is a distance in R^m (in this paper, we use the Euclidean distance). Given a tolerance threshold r, the number of indices j such that dis_j ≤ r is called the recurrent number of Y_i with scale τ and tolerance r, and is denoted by #(Y_i; τ, r). That is,

#(Y_i; τ, r) = ∑_{j=0}^{τ−1} sign(r − dis_j),

where sign(·) is the indicator function. Two examples of the above procedure are demonstrated in Figure 1. Now, the LSE of x_t under scale τ and tolerance r is defined as

LSE(x_t; τ, r) = −∑ g(·) log g(·),

where g(·) = p(·)/a_{τ,r}, p(·) represents the density function of the recurrent numbers, and a_{τ,r} is the average of #(Y_i; τ, r).
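The computation above can be sketched in Python. This is one reading of the definitions (function names are ours; the entropy is taken over the mean-scaled empirical density of the recurrent numbers), intended as an illustration rather than a reference implementation:

```python
import numpy as np

def recurrent_numbers(x, tau, r):
    """#(Y_i; tau, r): for each length-tau window Y_i of the m-dimensional
    series x (shape N x m), count the columns y_ij lying within Euclidean
    distance r of the initial state y_i0."""
    counts = []
    for i in range(len(x) - tau + 1):
        window = x[i:i + tau]
        dists = np.linalg.norm(window - window[0], axis=1)
        counts.append(int(np.sum(dists <= r)))
    return np.array(counts)

def lse(x, tau, r):
    """Local structure entropy: entropy of the empirical density p(k) of the
    recurrent numbers, scaled by their average a_{tau,r} as g = p / a."""
    counts = recurrent_numbers(x, tau, r)
    a = counts.mean()
    _, freq = np.unique(counts, return_counts=True)
    p = freq / freq.sum()          # empirical density p(k)
    g = p / a                      # g(k) = p(k) / a_{tau,r}
    g = g[g > 0]
    return float(-np.sum(g * np.log(g)))

rng = np.random.default_rng(0)
wgn = rng.standard_normal((500, 3))   # multivariate white Gaussian noise
print(lse(wgn, tau=10, r=1.5))
```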

NOTE 1
The value of LSE depends on the selection of τ and r. When their ranges are given, they can be chosen by maximizing the LSE, namely

(τ*, r*) = arg max_{τ, r} LSE(x_t; τ, r).

In general, the range of τ can be decided by the length of the time series, such as 1/100 to 1/10 of the raw data. The range of r can be selected by referring to the standard deviation of the multivariate standard Gaussian series. See Appendix A.1.

NOTE 2

Given sets of values for τ and r, such as τ ∈ {τ_1, τ_2, ..., τ_q} and r ∈ {r_1, r_2, ..., r_s}, the values LSE(x_t; τ_i, r_j) form a matrix, which presents the variation of the complexity of x_t over different scales and tolerances. Therefore, it can be used to characterize x_t.

NOTE 3

We adopt two strategies for further exploration, and denote the corresponding LSE values by LSE_A and LSE_B. METHOD A: normalize each component of x_t and specify a tolerance range as described above, which makes LSE_A immune to linear transformations. METHOD B: do not normalize x_t but set the tolerance range by the sum of the component variances, thus taking into account the extents of the volatilities of the various channels. Broadly speaking, LSE_A and LSE_B are not equivalent, and they can be seen as two different ways of describing the initial sequence. See Appendix A.2 for more information.
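The parameter selection of NOTE 1 and the LSE matrix of NOTE 2 amount to a grid search; a small sketch (the helper name `select_parameters` and the toy objective are ours, standing in for any LSE implementation):

```python
import numpy as np

def select_parameters(lse_fn, x, taus, rs):
    """Evaluate lse_fn over the grid of scales and tolerances (the NOTE 2
    matrix) and pick (tau*, r*) by the maximal value (NOTE 1)."""
    M = np.array([[lse_fn(x, t, r) for r in rs] for t in taus])
    i, j = np.unravel_index(np.argmax(M), M.shape)
    return (taus[i], rs[j]), M

# Toy objective peaking at tau = 7, r = 1.0, just to exercise the search.
toy = lambda x, t, r: -(t - 7) ** 2 - (r - 1.0) ** 2
(best_tau, best_r), M = select_parameters(toy, None, list(range(5, 16)), [0.5, 1.0, 1.5])
print(best_tau, best_r)  # 7 1.0
```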

Numerical Simulations
In this section, we test LSE on several multivariate deterministic and stochastic signals. First, we analyze fractional-order chaotic systems and random series and determine ranges of τ and r in Section 3.1; then, we test LSE on integer-order chaos, fractional-order chaos, and random sequences in Section 3.2.

Fractional-Order Chaotic Systems and Random Series
Derivatives and integrals of fractional order are employed to describe objects with memory properties, such as power-law nonlocality or long-range dependence [51], and can thus model real-world systems more accurately than classical integer-order calculus. Many fractional-order dynamical systems [52][53][54] with a total order of less than three can exhibit chaos, whereas continuous nonlinear systems cannot when the total order is less than three in the usual integer-order sense. In this section, we simulate three multivariate fractional-order dynamical models and three random signals to test LSE.
For many classes of functions, the three most well-known fractional-order derivatives (the Grünwald-Letnikov (GL), Riemann-Liouville, and Caputo definitions) are equivalent under some conditions [55]. Here, we use the GL definition, i.e.,

D^α f(t) = lim_{h→0} h^{−α} ∑_{j=0}^{[(t−a)/h]} (−1)^j (α choose j) f(t − jh).

Then, for the fractional-order differential equation D^α x(t) = f(x(t)), a general numerical solution has the form [56]

x(t_k) = f(x(t_{k−1})) h^α − ∑_{j=1}^{k} c_j^(α) x(t_{k−j}),

where the coefficients c_j^(α) satisfy the recursion c_0^(α) = 1, c_j^(α) = (1 − (1 + α)/j) c_{j−1}^(α). Then, several different dynamic systems, including fractional-order chaotic systems, a multivariate vector autoregression moving-average (VARMA) process, white Gaussian noise (WGN), and 1/f noise, are generated and analyzed by LSE to showcase its effectiveness. The method in [56] is adopted to generate numerical solutions of Equations (11) to (13); they are chaotic time series, which is cross-validated with the largest Lyapunov exponent [54]. The simulation timespan is 0:0.005:30 (start at 0, end at 30, step 0.005). To avoid the influence of the initial values, the first 10% of the data are discarded, and the sampled series (500 × 3, i.e., 500 simulated three-dimensional vectors) start from a random position. Figure 2 displays examples of these signals.
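The GL numerical scheme above can be sketched in Python. This is a generic sketch under the stated recursion, illustrated with a scalar fractional relaxation equation D^α x = −x rather than the paper's Equations (11)–(13) (function names are ours):

```python
import numpy as np

def gl_coeffs(alpha, n):
    """Coefficients c_j^(alpha) of the GL scheme via the recursion
    c_0 = 1, c_j = (1 - (1 + alpha)/j) c_{j-1}."""
    c = np.empty(n + 1)
    c[0] = 1.0
    for j in range(1, n + 1):
        c[j] = (1.0 - (1.0 + alpha) / j) * c[j - 1]
    return c

def gl_solve(f, x0, alpha, h, steps):
    """Numerical solution of D^alpha x = f(x) on a uniform grid:
    x_k = f(x_{k-1}) h^alpha - sum_{j=1}^{k} c_j x_{k-j}."""
    x = [np.asarray(x0, dtype=float)]
    c = gl_coeffs(alpha, steps)
    for k in range(1, steps + 1):
        memory = sum(c[j] * x[k - j] for j in range(1, k + 1))
        x.append(f(x[-1]) * h ** alpha - memory)
    return np.array(x)

# Fractional relaxation D^0.9 x = -x: the trajectory decays toward zero.
traj = gl_solve(lambda x: -x, x0=1.0, alpha=0.9, h=0.01, steps=1000)
```

A useful sanity check: for α = 1 the coefficients collapse to c_1 = −1 and c_j = 0 for j ≥ 2, so the scheme reduces to the explicit Euler method.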

1.

Fractional-order Chen system [52] (Equation (11)).

2.

Fractional-order Rössler system [53] (Equation (12)).

3.

Fractional-order financial system [54] (Equation (13)).

4.

Multivariate vector autoregression moving-average (VARMA) process: the coefficient matrix is the unit matrix, the innovation is a standard Gaussian white noise, and the length equals that of the above numerical solutions. This procedure is completed with the ARMA2AR and VARMA functions of MATLAB 2020b.

5.

White Gaussian noise: we use the NORMRND function of MATLAB 2020b to simulate WGN (500 × 3). Its components are independent with zero mean and unit standard deviation.

6.

1/f noise: on the basis of the algorithm in [57], the procedure for generating 1/f noise consists of three basic steps: (i) simulate white noise of length 1500 to obtain y_0; (ii) apply the DFT (discrete Fourier transform) to y_0, multiply by f^(−1/2), symmetrize for a real function, apply the IDFT (inverse discrete Fourier transform), and adjust the mean and standard deviation, yielding y_1; (iii) resize y_0 and y_1 to 500 × 3 matrices; finally, the 1/f noise series is composed as [y_0(:, 1), y_1(:, 2), (1/2)(y_0(:, 3) + y_1(:, 3))].

Now, we set the simulation parameters as follows: τ ∈ {5, 6, ...}.

Figure 2. Examples of the chaotic and stochastic series. In fact, we view the multivariate time series as a vector sequence. The first axis is time, and (t, 0, 0) is regarded as the origin for plotting the vectors (x(t), y(t), z(t)). The length is 300, and the time step is 0.005. Note that the series of the Rössler and financial systems appear to be very simple variations.

From Figure 3, we found that both METHOD A and METHOD B can distinguish multivariate chaotic signals from stochastic ones. For a fixed (τ, r), the LSE of a random time series is higher than that of a chaotic one. As shown in Figure 3a, WGN has the most complex signals, the 1/f noise and VARMA series appear to have similar trends, and the other three chaotic series lie under 0.1. They keep the same order for most (τ, r). In Figure 3b, the LSE values are significantly different and hardly overlap, except for the Chen and Rössler systems. The reason these two systems are indistinguishable may be that the range of coefficients of r is rather small. Next, we change the lower and upper bounds to 0.01 and 1. Moreover, the above simulation is repeated 100 times (100 trajectories for each system) to check the robustness of LSE. In each simulation, we add small disturbances to the initial values (Disturb_initial = 0.3 × ...). The time series generated from Equations (11) to (13) are still chaotic with these disturbances; refer to [56].
Note that the intersections of the LSE surfaces with τ = τ_i (or r = r_i), rather than the surfaces themselves, are utilized to verify the validity of our methods. In Figure 4, the LSE of random series is higher than that of chaotic ones; thus, we believe LSE can be treated as an efficient tool to characterize multivariate time series. When τ is fixed, the LSE values of different dynamics show their own features (Figure 4a). Moreover, METHOD A shows better discrimination among the chaotic systems. This is because of the standardization process of METHOD A, which weights all components at an equal level. It may amplify the LSE if the system's variation is caused by only one or a small group of components.
Based on the simulation findings mentioned above, a reference range of parameters can be provided. In Figure 4a-d and Figure 4i-l, when r is taken appropriately, the curves have essentially the same shape, making it possible to distinguish between different signals. This suggests that the distinction does not depend on the value of τ, and the corresponding time complexity can be taken into consideration when choosing this parameter.
For example, when r = 0.53399, LSE_chaos is less than 0.1 but LSE_random is higher than 0.1. Therefore, the stochastic sequences can be distinguished from the chaotic ones. See Figure 4e. If r is too small, LSE has a large standard deviation, as shown in Figure 4a,i. If r is too large, LSE_chaos and LSE_random overlap, and it is thus difficult to tell the various signals apart. See Figure 4g,o.
However, one problem is that LSE_Rössler and LSE_financial cannot be distinguished, as they coincide in Figure 4. The reason for this is that the simulated series of Equations (12) and (13) are short and the time step is too small; thus, they cannot reflect the features of the whole systems. So, we change the time step to 0.05 and the timespan to [0:0.05:300] for the above two systems in order to verify whether LSE functions or not. Examples of the above six kinds of signals are shown in Figure 5. After repeating 100 times, as in the previous tasks, we observe that LSE works for many tolerance values; some results are drawn in Figure 6. Here, we just show several LSE-τ figures for fixed tolerances or coefficients, such as Figure 4e,n, because the LSE-r charts appear too similar to separate the different series. From Figure 6, one can easily distinguish the six series. For example, in Figure 6d, the LSE_A of the Rössler system lies at the top; in Figure 6b, the LSE values of the other five series do not intersect, and the order is WGN, 1/f noise, VARMA, Chen, and financial, from top to bottom. However, the coefficient should be small, around 0.4 (Figure 6b,f). Otherwise, the LSE curves overlap (Figure 6d,h).
Moreover, to test the dependence on the distance function, we replace Equation (3) with the distance derived from the l_∞ norm, i.e.,

dis_j = ||y_i0 − y_ij||_∞ = max_{1≤k≤m} |y_i0(k) − y_ij(k)|.

Repeating the above simulations, the corresponding results are drawn in Figure 7. Comparing Figure 7 to Figure 6, one can see that LSE with the same range of parameters can still effectively distinguish between different signals when using the l_∞ norm-derived distance, as Figure 7a,b,f show. Meanwhile, the range of r should be chosen carefully, or the LSE curves may overlap, as in Figure 7d,h.
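Swapping the distance only changes the window comparison; a sketch of the recurrent-number count under the l_∞ (Chebyshev) distance (the function name is ours, hypothetical):

```python
import numpy as np

def recurrent_numbers_linf(x, tau, r):
    """Variant of the recurrent-number count #(Y_i; tau, r) using the
    l-infinity distance max_k |y_i0(k) - y_ij(k)| instead of the
    Euclidean norm."""
    counts = []
    for i in range(len(x) - tau + 1):
        window = x[i:i + tau]
        dists = np.max(np.abs(window - window[0]), axis=1)
        counts.append(int(np.sum(dists <= r)))
    return np.array(counts)

# All-zero series: every column of every window recurs.
print(recurrent_numbers_linf(np.zeros((5, 2)), tau=3, r=0.1))
```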
To compare with the multivariate permutation entropy (PE) [17,58], Figure 8 shows PE curves for the above six series. Chaotic signals can be distinguished from stochastic ones via PE, since PE_chaos < PE_random for all τ in Figure 8. However, PE cannot discriminate between random series with different degrees of correlation. Moreover, it fails in the Rössler and financial cases. This means that PE is not sensitive to tiny differences in the complexity of systems. LSE, however, can distinguish those systems, which indicates that LSE is a more accurate measure of complexity.

Integer-Order Chaos, Fractional-Order Chaos, and Random Series
In this subsection, we test LSE on integer-order chaotic systems, fractional-order chaotic systems, and stochastic sequences. For integer-order chaos, we use the Chen, Rössler, and financial systems. The timespans are 0:0.005:30, 0:0.05:300, and 0:0.05:300, respectively. In each simulation, small disturbances are added to the initial values. See Equations (16) to (18).
For fractional-order chaos, Equations (11) to (13) are used to generate chaotic time series, as in Section 3.1. Additionally, in each simulation, we add a small disturbance (Disturb_α = ±0.02 × ...) to the parameters (α_1, α_2, and α_3). Moreover, we use the same methods as in Section 3.1 to generate the random signals. All the simulated series have the same size (500 × 3).
Since we aim to illustrate the effectiveness of LSE, here we only employ METHOD A to distinguish between the different types of signals. As in Section 3.1, we repeat the simulation procedure 100 times, with τ ∈ {6, 7, ..., 14} and r ∈ [0.5, 1.81]. The results are drawn in Figure 9, where chaotic signals can be distinguished from random ones. For instance, LSE_random is greater than 0.1 but LSE_chaos is less than 0.1 in Figure 9a. A similar situation can be seen in the images corresponding to the other r values.
The return map (RM) can be used to characterize the nonlinear process by the transformation of the local maxima (or minima) of the signal [59][60][61][62]. We tested RM on the length sequences associated with the above nine multivariate time series and the results are drawn in Figure 10.
The difference between the Rössler system and the other signals is relatively obvious in both the integer-order and fractional-order cases, as shown in Figure 10a,b. However, it is difficult to distinguish between the Chen system and the financial system. In addition, both systems are mostly located in [0.5, 1] × [0.5, 1], overlapping with the region where the random signals are located; see Figure 10c. Thus, Figure 10 shows that several signals are muddled together and difficult to differentiate. This failure could be related to the length of the time series, since RM commonly calls for a sizable number of data points. The three shortcomings noted in Section 1 are somewhat mitigated by the LSE method.

1.
Multi-scale entropy and complex networks often necessitate a substantial number of data points to obtain useful findings, but LSE can process relatively short time series; the length of the test cases in this section is only 500.

2.
According to the calculation process in Section 2, it is straightforward to see that the time complexity of the LSE method is linear (O(τn)), which makes it appropriate for handling real-time jobs.

3.

LSE is more effective at depicting the short-term autocorrelation and the correlations between the various components of a multidimensional time series because it takes advantage of the similarity of vectors in sliding windows.
In Section 3.1, the time steps for the Rössler and financial systems are 0.005 and 0.05, and the LSE values vary remarkably. Here, we test LSE on the Rössler system with different steps. The results are shown in Figure 11. It can be seen that an increase in the step length causes LSE to increase, but the gap between LSE_Rössler and LSE_random is still clear. To assess the robustness of the proposed method, the original signal is supplemented with Gaussian white noise at different levels, and the LSE values are recorded accordingly. The original time series is produced by the chaotic Rössler system. The signal-to-noise ratio (SNR) is used to measure the level of the background noise, which is defined by

SNR = 10 log_10 (P_S / P_N),

where P_S is the power of the signal and P_N is that of the noise. See Figure 12. Noise tolerance is present in LSE to some extent: LSE_Rössler is less than 0.1 when the SNR is higher than 10 dB, so it still differs significantly from the random signals. However, LSE is unable to properly discriminate between chaotic and random signals as the SNR continues to drop.
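Corrupting a signal at a prescribed SNR follows directly from the definition above: choose the noise power as P_N = P_S / 10^(SNR/10). A sketch (the function name is ours; powers are taken as mean squared values):

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, rng=None):
    """Add white Gaussian noise so that SNR = 10 log10(P_S / P_N)
    equals snr_db."""
    rng = rng or np.random.default_rng()
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / 10 ** (snr_db / 10)
    noise = rng.standard_normal(signal.shape) * np.sqrt(p_noise)
    return signal + noise

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 20 * np.pi, 5000))
noisy = add_noise_at_snr(clean, snr_db=10, rng=rng)
```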
The above analysis indicates that the proposed LSE, based on its variations in different scales for certain tolerances (or coefficients), can be regarded as an efficient tool to identify multivariate time series. In the next section, we attempt to apply LSE to real-world data, such as the financial market index and machinery data.

Application on Real-World Data
As we introduced in Section 1, multivariate time series conceal their characteristics in the autocorrelation of each component, as well as in the cross-correlation between channels, which makes it difficult to extract suitable features for further exploration. Two real-world applications, one for the financial market and another for fault diagnosis, are discussed below.

Financial Market Index
Financial time series are typical signals with high complexity. How do we illustrate the discrimination between financial markets in different regions? Here, we use LSE to quantify the complexities of three important indices: the S&P500, FTSE100, and Shenzhen Securities Component Index (SZI). Their values can be obtained from Yahoo.com [63]. The time period is between 25/06/2017 and 24/06/2022. In this experiment, the daily OHLC (open, high, low, close) prices and volume are considered. Because the volume values have much higher standard deviations than the OHLC prices, METHOD A is employed to explore the complexity of the three indices so that the result is not dominated by the volume alone. For more details, see Table 1. The scales are {5, 6, ..., 15} and the tolerance is between 0.63 and 3.16. The results are drawn in Figure 13. From Figure 13, we can see that LSE_FTSE100 lies at the top, LSE_SZI at the bottom, and LSE_S&P500 between them when r < 1.8. That is, the European stock market shows higher complexity than the other two; the mature European and American financial markets display more complexity and stability than the Chinese stock market, which is in accordance with Cao [64]. Note that LSE_SZI lies between LSE_FTSE100 and LSE_S&P500 for 1.8 < r < 2.4, which contradicts the above result. The reason for this can be attributed to the value of r: the range of r is determined by multidimensional WGN, but since the financial market indices actually have significant correlations, the tolerance should be lower than usual.

Machinery Fault Recognition
In this part, we examine the ability of LSE to recognize vibration signals produced by normal or faulty mechanical systems. The bearing dataset is from the machinery fault database (MAFAULDA), which is kindly provided by the Signals, Multimedia, and Telecommunications Laboratory (SMT) of the Federal University of Rio de Janeiro (UFRJ) [65]. MAFAULDA collects multivariate time series recorded by sensors on a SpectraQuest machinery fault simulator (MFS) alignment-balance-vibration (ABVT) rig and comprises six different simulated states. We tested LSE on three states: normal function, as well as horizontal and vertical misalignment faults. The data acquisition system is composed of several sensors: one Monarch Instrument MT-190 analog tachometer, three IMI Sensors industrial accelerometers (Model 601A01), one IMI Sensors triaxial accelerometer (model 604b31), and a Shure SM81 microphone. Each sequence has 8 columns sampled at 50 kHz for 5 s, namely a 250,000 × 8 matrix. We randomly intercept 100 fragments (each with 3000 rows) for every state from the database as the test dataset. Figure 14 shows the distribution of each component of the test data (first 100 rows). Then, we employed METHOD A to compute the LSE values. The scale set is {5, 6, ..., 15} and the tolerance is 0.2457. Moreover, the decision tree method is applied to classify the LSE values of the different states. A decision tree is a decision support tool that employs a tree-like paradigm to represent options and their outcomes. It consists of nodes, branches, and leaves, each of which represents a property, a rule, or an outcome [66]. In this study, three alternative classification algorithms based on decision tree theory, the fine tree (about 100 leaves, making fine distinctions between classes), medium tree (fewer than 20 leaves, with medium flexibility), and coarse tree (fewer than 4 splits), were used to categorize the vibration signals [67].
In addition, an ensemble bagged trees classifier was added, which works by creating numerous decision trees during training and outputting the majority vote of these trees for classification tasks [68]. Ten-fold cross-validation was employed in this test, and the confusion matrix is plotted in Figure 15a. From Figure 15, the bagged trees based on LSE had the highest accuracy at 98%, while that of PE was 92.3%. Moreover, the accuracies of the LSE-based fine tree, medium tree, and coarse tree (95.3%, 95.3%, and 88.7%) were higher than those based on PE (89.3%, 89.3%, and 88%). In short, as the LSE values (as features) yield higher accuracies than PE, the proposed LSE can be an efficient tool for distinguishing multivariate real-world time series.
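The bagged-trees pipeline with ten-fold cross-validation can be sketched with scikit-learn. The feature matrix below is a hypothetical stand-in for the LSE features (100 fragments per state, one value per scale in {5, ..., 15}), not the MAFAULDA data; the original work used MATLAB classifiers:

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in for the LSE feature matrix: three states, 100
# fragments each, 11 features (one per scale).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=mu, scale=0.05, size=(100, 11))
               for mu in (0.1, 0.3, 0.5)])
y = np.repeat(["normal", "horizontal", "vertical"], 100)

# Bagged decision trees with ten-fold cross-validation, as in the text
# (the default base estimator of BaggingClassifier is a decision tree).
model = BaggingClassifier(n_estimators=30, random_state=0)
scores = cross_val_score(model, X, y, cv=10)
print(round(scores.mean(), 3))
```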

Conclusions
In this paper, we proposed a local structure-based entropy (LSE), which reflects recurrence conditions at certain scales and can be regarded as an index of complexity for multivariate time series. Depending on whether or not the components are normalized, we suggest two strategies for using LSE: one shows greater discrimination, while the other is more stable but can overlook the effects of slightly varying components. When the tolerance is small, the LSE values of fractional-order chaotic time series are significantly lower than those of stochastic ones. Moreover, the LSE method has some resilience to noise: when the SNR is higher than 10 dB, accurate classification is obtained for the task of differentiating chaotic signals from random ones, but the accuracy declines as the SNR drops. With a suitable tolerance, LSE (at certain scales) can be considered a feature of a dynamical system. Regarding real-world data, LSE was applied to financial market indices, indicating that the European and American financial markets are more complex and stable than the Chinese market. Furthermore, we tested LSE on MAFAULDA; it resulted in a higher accuracy than PE-based classification.
The results are sensitive to the parameters (especially r), but the optimal range is provided by simulation rather than theoretical calculation. Therefore, a more comprehensive examination of the parameters and LSE values based on basic information, such as dimensionality, correlation, and time series length, is required. A distance function that can eliminate the impact of correlation could also be utilized to increase the applicability of LSE.

Appendix A.1

... then z_i ∼ N(0, 1), where Σ is introduced in Proposition A2.

Appendix A.2
Proposition A2. Let x = (x_i)_{i≤n}, where x_i ∼ N(μ_i, σ_i²), be an n-dimensional random variable with independent components, and let x and y be i.i.d. Denote SS = ∑_{i=1}^n (x_i − y_i)². Then Σ = D(SS) = 8 ∑_{i=1}^n σ_i⁴, since x_i − y_i ∼ N(0, 2σ_i²) and D((x_i − y_i)²) = 2(2σ_i²)² = 8σ_i⁴.
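Proposition A2 can be verified numerically; a Monte Carlo sketch (the sample size and the μ_i, σ_i values are illustrative):

```python
import numpy as np

# Monte Carlo check of Proposition A2: for independent x_i ~ N(mu_i, sigma_i^2)
# and x, y i.i.d., Var(SS) = 8 * sum(sigma_i^4) with SS = sum_i (x_i - y_i)^2.
rng = np.random.default_rng(42)
mu = np.array([1.0, -2.0, 0.5])
sigma = np.array([1.0, 0.5, 2.0])

trials = 200_000
x = rng.normal(mu, sigma, size=(trials, 3))
y = rng.normal(mu, sigma, size=(trials, 3))
ss = np.sum((x - y) ** 2, axis=1)

theoretical = 8 * np.sum(sigma ** 4)   # 8 * (1 + 0.0625 + 16) = 136.5
print(ss.var(), theoretical)
```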