Inferring Weighted Directed Association Networks from Multivariate Time Series with the Small-Shuffle Symbolic Transfer Entropy Spectrum Method

Complex network methodology is very useful for complex system exploration. However, the relationships among variables in complex systems are usually unclear. Therefore, inferring association networks among variables from their observed data has been a popular research topic. We propose a method, named small-shuffle symbolic transfer entropy spectrum (SSSTES), for inferring association networks from multivariate time series. The method can solve four problems in inferring association networks, i.e., strong correlation identification, correlation quantification, direction identification and temporal relation identification. The method can be divided into four layers. The first layer is the so-called data layer, where data are input and processed. In the second layer, we symbolize the model data, i.e., the original data and the shuffled data, from the previous layer and cyclically calculate the transfer entropy with different time lags for each pair of time series variables. Thirdly, we compose transfer entropy spectra for pairwise time series with the previous layer's output, a list of transfer entropy matrices. We also identify the correlation level between variables in this layer. In the last layer, we build a weighted adjacency matrix, the value of each entry representing the correlation level between pairwise variables, and then get the weighted directed association network. Three sets of numerically simulated data, from a linear system, a nonlinear system and a coupled Rössler system, are used to show how the proposed approach works. Finally, we apply SSSTES to a real industrial system and get a better result than with two other methods.


Problem Statement
Association networks are found in many domains, such as networks of citation patterns across scientific articles [1][2][3], networks of actors co-starring in movies [4][5][6], networks of regulatory influence among genes [7,8], and networks of functional connectivity between regions of the brain [9,10]. The rules defining edges in association networks are not the same. In general, if the relationships among nodes are explicit, we can define a rule for their connectivity and establish the network easily. However, the relationships among components are unknown in many real complex systems, so association network inference has become a popular research topic. Many complex systems belong to the industrial field, and the datasets obtained from these complex systems are multivariate time series. Therefore, we aim at studying association network inference from multivariate time series and also attempt to deal appropriately with the problems of edge direction and weight in the network.

Related Works
Association network inference has been a research topic for several years. We will review some methods that have been proposed so far to address the undetermined relationships among variables. The most classical approach is based on correlation. For instance, Guo et al. [7] incorporated the distance correlation into inferring gene regulatory networks from gene expression data without any underlying distribution assumptions. Maucher et al. [11] used Pearson correlation as an elementary correlation measure to detect regulatory dependencies in a gene regulatory network. The association networks generated by basic correlation approaches usually include many indirect relationships, which need to be detected and removed to increase the power of the network inference approach. Therefore, a major challenge in inferring association networks is the identification of direct relationships between variables. The classical approach to detecting indirect relationships is based on partial correlations, which impose the control of one gene on the relationship of others. Han and Zhu [12] proposed a method based on the matrix of thresholding partial correlation coefficients (MTPCC) for network inference from expression profiles. The corresponding undirected dependency graph (UDG) was obtained as a model of the regulatory network of S. cerevisiae. Yuan et al. [8] proposed a directed partial correlation (DPC) method as an efficient and effective solution to regulatory network inference. It combines the efficiency of partial correlation for setting up network topology by testing conditional independence with the concept of Granger causality to assess topology change under induced interruptions. Wang et al. [13] focused on gene group interactions and inferred these interactions using appropriate partial correlations between genes, that is, the conditional dependencies between genes after removing the influences of a set of other functionally related genes.
Moreover, Gaussian graphical models also perform well for inferring association networks in specific experimental datasets. Schäfer and Strimmer [14] introduced a framework for small-sample inference of graphical models from gene expression data to detect conditionally dependent genes. Huynh-Thu et al. [15] proposed an algorithm using tree-based ensemble methods such as random forests or extra-trees for the inference of GRNs (genetic regulatory networks) that was the best performer in the DREAM4 In Silico Multifactorial Challenge. Some approaches to inferring association networks rely on information theory-based similarity measures. Margolin et al. [16] described a computational protocol for the ARACNE algorithm, an information-theoretic method for identifying transcriptional interactions between gene products using microarray expression profile data. Faith et al. [17] developed and applied the context likelihood of relatedness (CLR) algorithm, which also uses mutual information as a metric of similarity between the expression profiles of two genes. Zoppoli et al. [18] proposed a method called TimeDelay-ARACNE. It tries to extract dependencies between two genes at different time delays, providing a measure of these dependencies in terms of mutual information. TimeDelay-ARACNE can infer small local networks of time-regulated gene-gene interactions, detecting their direction and also discovering cyclic interactions, when only a medium-small number of measurements are available. Villaverde et al. [19] reviewed some of the existing information-theoretic methodologies for network inference and clarified their differences.
In addition, approaches rooted in Bayesian networks (BN) employ probabilistic graphical models in order to infer causal relationships between variables. Aliferis et al. [20] presented an algorithmic framework for learning local causal structure around target variables of interest, in the form of direct causes/effects and Markov blankets, applicable to very large data sets with relatively small samples. The selected feature sets can be used for causal discovery and classification. Dondelinger et al. [21] introduced a novel information sharing scheme to infer gene regulatory networks from multiple sources of gene expression data. They illustrated and tested this method on a set of synthetic data, using three different measures to quantify network reconstruction accuracy. As a review paper, Lian et al. [22] first discussed the evolution of molecular biology research from reductionism to holism, followed by a brief insight into various computational and statistical methods used in GRN inference, before focusing on reviewing the current development and applications of DBN-based methods.
Granger causality (GC) is also a very popular tool for association network inference. It can assess the presence of directional association between two time series of a multivariate data set. GC was introduced originally by Wiener [23], and later formalized by Granger [24] in terms of linear vector autoregressive (VAR) modeling of multivariate stochastic processes. Tilghman and Rosenbluth [25] presented Granger causality as a method for inferring communications links among a collection of wireless transmitters from externally measurable features. The link inference method was applicable to inferring the link topology of broad classes of wireless networks, regardless of the nature of the Medium Access Control (MAC) protocol used. Cecchi et al. [9] presented a scalable method, based on the Granger causality analysis of multivariate linear models, to compute the structure of causal links over large-scale dynamical systems; it achieves high efficiency in discovering actual functional connections. The method was shown to handle autoregressive models of more than 10,000 variables. Schiatti et al. [26] compared GC with a novel measure, termed extended GC (eGC), able to capture instantaneous causal relationships. The practical estimation of eGC works with a two-step procedure, first detecting the existence of zero-lag correlations, and then assigning them to one of the two possible causal directions using pairwise measures of non-Gaussianity. Montalto et al. [27] introduced a new Granger causality measure called neural networks Granger causality. Instead of fitting predefined models (linear ones in the original proposal by Granger), they trained a neural network to estimate the target using only the past states that can better explain the target series, by using the non-uniform embedding technique.
To date, we have not seen a method that can deal with every association network inference problem. Although each of the abovementioned approaches has advantages proved in different settings, it is not always suitable for one's particular network inference problem. Because each strategy applies different assumptions, they each have different strengths and limitations and highlight complementary aspects of the network. Some of these popular tools are non-directional, e.g., correlation or partial correlation, mutual information measures and Bayesian networks; thus these measures cannot support the inference of directed association networks [36]. Granger causality has acquired preeminent status in the study of interactions and is able to detect asymmetry in the interaction. However, its limitation is that the model must be appropriately matched to the underlying dynamics of the examined system, otherwise model misspecification may lead to spurious causalities [37]. Some of the proposed methods cannot detect indirect relationships, such as basic correlation, mutual information and Bayesian networks. Some of the proposed methods mainly deal with linear problems, e.g., Pearson correlation and Spearman correlation, and are not appropriate for nonlinear problems.

Primary Contribution of This Work
To address the issues mentioned above, we propose an approach called the small-shuffle symbolic transfer entropy spectrum (SSSTES). This work addresses five challenges: (1) Non-stationary time series: the probabilities are estimated from observations of a single instance over a long time series. It is very important that the time series is statistically stationary over the period of interest, which can be a practical problem with transfer entropy calculations [38]. In most cases the time series from real industrial complex systems are non-stationary. (2) Continuous time series: it is problematic to calculate the transfer entropy on continuous-valued time series, so we resort to a symbolization solution. (3) Strong relationship identification: in general, when carrying out correlation analysis, we are more interested in strong correlations than in weak correlations. Because the relationships among the variables are unknown, strong correlations are more convincing, while weak correlations have a greater probability of misidentification, which may bring serious consequences. Indirect correlations are not taken into account in this paper. (4) The direction and quantity of influence: causality identification is crucial for network prediction and evolution, but it is difficult to detect the directional influence that one variable exerts on another. (5) Temporal relation identification: we attempt to detect the specific temporal relation based on time lags, namely the functional relation in time.
In the next section, we propose a method for inferring association networks from multivariate time series, with emphasis on how to solve the five challenges mentioned above. Section 3 applies the proposed method to two numerical examples whose component coupling relationships are clear and whose values are time-varying. We summarize the results of this paper and identify some topics for further study in Section 4.

Methods
In this section, we explain the proposed approach in detail. First, we present an integrated framework of the approach, and then give a detailed description built around that framework.

Main Principle
The approach designed for association network inference takes both exploration and application into account, thus minimizing human intervention during modelling. The approach starts with inputting data and ends with outputting a network inferred from the multivariate time series, and the modelling process is transparent to users. The main principle of the proposed approach is shown in Figure 1.
The integrated framework has four layers. The first layer, the so-called Data Layer, is the interface for interaction with users. One thing to do in this layer is to input the original multivariate time series and the modelling parameters; the other is to shuffle the original data several times with a surrogate data method. What comes next are the data operations and the modelling of the approach, and we divide this complicated modelling work into two layers. The tasks in the Model I Layer are time series symbolization and transfer entropy calculation. We will review some theories that have been put forward and explain how to synthesize them for our purpose. The output of this layer is a complicated list of transfer entropy matrices. The task in the Model II Layer is strong correlation identification with an approach called the transfer entropy spectrum. The output of this layer is a 0-1 matrix. In the last layer, we get a weighted directed network. The start node of an arrowed edge represents a driving variable and the end node represents its corresponding response variable. The weight of an edge quantifies the correlation between two nodes, i.e., time series variables.
As shown in Figure 1, there are six key processing operations, represented by rounded rectangles, to accomplish association network inference; we will introduce these six steps one by one in the rest of this section. When the value of a condition expression is false, the corresponding process is carried out. Each rounded rectangle represents a key processing operation using a specific method and each hexagon represents a staged result.

Small-Shuffle Surrogate Data Method
The technique of surrogate data analysis is a randomization test method [39].Given time series data, surrogate time series are constructed consistent with the original data and some null hypothesis.The random-shuffle surrogate (RSS) method proposed in [39] can test whether data can be fully described by independent and identically distributed random variables.As summarized in [38,39], the limitation of the RSS method is that it destroys any correlation structure in the data.That is, not only the short-term relationships but also the long-trend relationships between two variables are destroyed.The RSS method assumes global stationarity and performs a pairwise linear decoupling between channels, but in many typical examples the individual channels are also influenced by other nonstationary variations, so we prefer to use the small-shuffled surrogate (SSS) method proposed in [40][41][42].
The SSS method destroys local structures or correlations in irregular fluctuations (short-term variabilities) and preserves the global behavior by shuffling the data index on a small scale. Let the original data be x(t), let i(t) be the index of x(t) [that is, i(t) = t, and so x(i(t)) = x(t)], let g(t) be Gaussian random numbers, and let s(t) be the surrogate data. The surrogate is generated in three steps:

(i) Perturb the index: i′(t) = i(t) + A·g(t), where A is an amplitude.
(ii) Sort i′(t) by rank order and let the index of i′(t) be î(t).
(iii) Obtain the surrogate data: s(t) = x(î(t)).

Parameter A reflects the extent of data shuffling. The higher the value of A, the greater the difference between the surrogate data and the original data; conversely, the smaller the value of A, the smaller the difference. The parameter A is input at the beginning of the method and its empirical value is 1.0.
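
The three steps above can be sketched in a few lines of code. This is a minimal illustration, not the authors' implementation; the function name and signature are ours, and `random.gauss` supplies the Gaussian numbers g(t).

```python
import random

def small_shuffle_surrogate(x, A=1.0, rnd=None):
    """One small-shuffle surrogate of x (a sketch of the SSS method).

    (i)   perturb the index:  i'(t) = i(t) + A * g(t)
    (ii)  sort i'(t) by rank order to obtain i_hat(t)
    (iii) read the data out in that order: s(t) = x(i_hat(t))
    """
    rnd = rnd or random.Random()
    n = len(x)
    i_prime = [t + A * rnd.gauss(0.0, 1.0) for t in range(n)]
    i_hat = sorted(range(n), key=lambda t: i_prime[t])
    return [x[t] for t in i_hat]
```

With a small A only nearby indices can swap, so the global trend of x survives while short-term structure is destroyed, which is exactly the property the null hypothesis in Section 2.6 relies on.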

Time Series Symbolization
The time series symbolization technique was introduced with the concept of permutation entropy [43,44]. This technique has helped many other lines of research on time series achieve new progress and has brought us some new techniques, e.g., permutation entropy [43] and symbolic transfer entropy (STE) [44]. It is helpful for dealing with continuous and non-linear time series. The principle of time series symbolization is described as follows.
For the original multivariate time series, let two time series V1, V2 be {v1,t}, {v2,t} respectively, t = 1, 2, ..., k. The embedding parameters used to form the reconstructed vectors of the time series V1 are the embedding dimension m1 and the time delay τ1. Accordingly, m2 and τ2 are the embedding parameters defined for V2. The reconstructed vector of V1 is defined as:

ν1,t = [v1,t, v1,t+τ1, ..., v1,t+(m1−1)τ1]

where t = 1, 2, ..., k − (m1 − 1)τ1. For each vector ν1,t, the ranks of its components assign a rank-point ν̂1,t = [r1,t, r2,t, ..., rm1,t], where rj,t ∈ {1, 2, ..., m1} for j = 1, 2, ..., m1. ν̂2,t is defined accordingly.
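
The symbolization step can be sketched as follows. This is an illustrative implementation under our own naming; ties are broken by position, which is adequate for a sketch.

```python
def symbolize(v, m, tau):
    """Rank-vector (ordinal pattern) symbolization of a scalar series.

    Each reconstructed vector [v_t, v_{t+tau}, ..., v_{t+(m-1)tau}]
    is replaced by the ranks of its components, with values in 1..m."""
    n = len(v) - (m - 1) * tau
    symbols = []
    for t in range(n):
        vec = [v[t + j * tau] for j in range(m)]
        order = sorted(range(m), key=lambda j: vec[j])  # indices by value
        ranks = [0] * m
        for r, j in enumerate(order):
            ranks[j] = r + 1                            # rank of component j
        symbols.append(tuple(ranks))
    return symbols
```

For example, with m = 2 and τ = 1 the series [3, 1, 2, 5] yields the rank-points (2, 1), (1, 2), (1, 2): the continuous values disappear and only their local ordering remains, which is why slow drifts do not affect the symbols.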

Symbolic Transfer Entropy Calculation with Different Time Lags
Symbolic transfer entropy means that the transfer entropy calculation is based on the symbolic time series data presented in Section 2.3. Symbolic transfer entropy is defined as follows [44]:

STE2→1(τ) = Σ p(ν̂1,t+τ, ν̂1,t, ν̂2,t) log [ p(ν̂1,t+τ | ν̂1,t, ν̂2,t) / p(ν̂1,t+τ | ν̂1,t) ]

where τ is the time delay, and p(ν̂1,t+τ, ν̂1,t, ν̂2,t), p(ν̂1,t+τ | ν̂1,t, ν̂2,t) and p(ν̂1,t+τ | ν̂1,t) are the joint and conditional distributions estimated on the rank vectors as relative frequencies. Symbolic transfer entropy uses a convenient rank transform to estimate the transfer entropy on continuous data without the need for kernel density estimation. Since slow drifts do not have a direct effect on the ranks, it still works well for non-stationary time series [36]. Because the appropriate time delay is not known in advance, the symbolic transfer entropy is calculated n times, once per candidate delay, for each pair of time series. This process is described using Algorithm 1 below.
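
A plug-in estimator of this quantity can be sketched as below. It is a minimal sketch under our own naming, using relative frequencies as the probability estimates and no bias correction; the symbol sequences are assumed to be hashable (e.g. the rank-vector tuples of Section 2.3).

```python
from collections import Counter
import math

def symbolic_transfer_entropy(sy, sx, tau=1):
    """STE from driver series sy to target series sx at delay tau (sketch).

    STE = sum p(x_{t+tau}, x_t, y_t) *
          log2[ p(x_{t+tau}|x_t, y_t) / p(x_{t+tau}|x_t) ]
    with all probabilities estimated as relative frequencies."""
    triples, pairs_xy, pairs_xx, singles = Counter(), Counter(), Counter(), Counter()
    n = len(sx) - tau
    for t in range(n):
        triples[(sx[t + tau], sx[t], sy[t])] += 1
        pairs_xy[(sx[t], sy[t])] += 1
        pairs_xx[(sx[t + tau], sx[t])] += 1
        singles[sx[t]] += 1
    ste = 0.0
    for (x1, x0, y0), c in triples.items():
        p_joint = c / n                              # p(x_{t+tau}, x_t, y_t)
        p_cond_xy = c / pairs_xy[(x0, y0)]           # p(x_{t+tau} | x_t, y_t)
        p_cond_x = pairs_xx[(x1, x0)] / singles[x0]  # p(x_{t+tau} | x_t)
        ste += p_joint * math.log2(p_cond_xy / p_cond_x)
    return ste
```

When the driver carries no extra information about the target's next symbol, the two conditional distributions coincide and the estimate is zero; when the target copies the driver with a lag, the estimate is positive.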

Algorithm 1: Symbolic Transfer Entropy Calculation with Different Time Lags
Input: STS, the symbolic time series; tm, the maximum time delay.
Output: STEML, a list of symbolic transfer entropy matrices.
Method: for each time delay τ = 1, 2, ..., tm, compute the symbolic transfer entropy between every ordered pair of symbolic time series in STS and store the results in a matrix; append the matrix for each delay to STEML.

We first use Algorithm 1 to get a list of symbolic transfer entropy matrices on the original time series. Then we shuffle the original data the number of times specified at the beginning of our method, and repeat Algorithm 1 on each shuffled data set accordingly.
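
The loop structure of Algorithm 1 can be sketched as follows, assuming any pairwise estimator with the signature `ste_fn(source, target, tau)`; the function and variable names are ours.

```python
def ste_matrix_list(symbol_series, tm, ste_fn):
    """Algorithm 1 as a sketch: one STE matrix per delay 1..tm.

    symbol_series: list of symbol sequences, one per variable.
    ste_fn(source, target, tau): any pairwise STE estimator.
    Returns STEML, a list of tm matrices; STEML[tau-1][i][j]
    holds the STE from variable i to variable j at delay tau."""
    k = len(symbol_series)
    steml = []
    for tau in range(1, tm + 1):
        M = [[0.0] * k for _ in range(k)]
        for i in range(k):
            for j in range(k):
                if i != j:  # self-transfer is not needed for the network
                    M[i][j] = ste_fn(symbol_series[i], symbol_series[j], tau)
        steml.append(M)
    return steml
```

Running this once on the original data and once per shuffled copy yields the list of matrix lists that the spectrum composition step consumes.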

Symbolic Transfer Entropy Spectrum Composition
The Symbolic Transfer Entropy Spectrum (STES) is defined as follows.
The symbolic transfer entropy spectrum between time series Y and X is composed of their many symbolic transfer entropy curves drawn in a rectangular coordinate system.The horizontal axis represents different time delays and the vertical axis represents transfer entropy.One of the transfer entropy curves comes from the original data and other curves come from shuffled data.
Let L^o_{Y→X} be the transfer entropy curve of the original data and L^s_{Y→X} (s = 1, 2, ..., sm) be the transfer entropy curves of the shuffled data; then the symbolic transfer entropy spectrum between Y and X can be denoted as:

STES_{Y→X} = { L^o_{Y→X}, L^1_{Y→X}, L^2_{Y→X}, ..., L^sm_{Y→X} }

In order to form the transfer entropy spectrum, we must understand the structure of the output of Section 2.4. The output is a complicated list of transfer entropy matrices. For each data set, original or shuffled, a list of transfer entropy matrices with different delays is returned after applying Algorithm 1. Thus, for all data sets, the returned result of the last step is a list of transfer entropy matrix lists. The parameters input at the beginning of the method are the maximum time delay tm and the shuffling times sm. Let tm = 10, sm = 99; then the output of the last step is a list of 100 elements and each element is a list of 10 transfer entropy matrices. Moreover, each entry of a transfer entropy matrix reflects the correlation strength of a pair of time series. Thus, according to the definition of the transfer entropy spectrum, we first split the output of Section 2.5 into pieces and then form the transfer entropy spectrum for each ordered pair of variables.
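
The splitting step can be sketched as an index rearrangement; the name `compose_spectrum` and the data layout (a list of matrix lists, original first) are our illustrative assumptions.

```python
def compose_spectrum(steml_all, i, j):
    """Sketch: pull out the STE curves for one ordered pair (i -> j).

    steml_all[0] is the matrix list computed on the original data;
    steml_all[1:] are the matrix lists from the shuffled copies.
    Returns (original_curve, shuffled_curves), each curve indexed by delay."""
    curves = [[M[i][j] for M in steml] for steml in steml_all]
    return curves[0], curves[1:]
```

With tm = 10 and sm = 99 this turns the list of 100 matrix lists into one original curve of 10 points plus 99 surrogate curves for every ordered pair, which is exactly what gets plotted as a spectrum.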

Strong Correlation Identification
The target of the method proposed in this paper is strong correlation identification, not the identification of all correlations among the multivariate time series. The scenario for this method is that we do not know the relationships in the complex system. We pay more attention to the precision of correlation identification than to the recall, because the misidentification of relationships among variables may have serious consequences for our data analysis.
Our decision on whether a strong correlation exists between two variables is made from the characteristics of the transfer entropy spectrum. This is based on the theory of hypothesis testing, which is often used in surrogate data methods [31,36,39,42]. Discriminating statistics are necessary for surrogate data hypothesis testing. The cross correlation and average mutual information were selected as discriminating statistics in [41,42], and partial symbolic transfer entropy in [36]. In this paper, we use the transfer entropy as the discriminating statistic. The surrogate data method also needs a null hypothesis. Applying a statistical hypothesis test can result in two outcomes, i.e., the null hypothesis is rejected or not. There are two types of errors in hypothesis testing: if the null hypothesis is rejected when it is true, this is called a type I error; if we fail to reject the null hypothesis when it is in fact false, this is called a type II error. The null hypothesis in our proposed method is that there is no short-term correlation structure between the data, or that the irregular fluctuations are independent. In the symbolic transfer entropy spectrum, if the symbolic transfer entropy of the original data falls outside the distribution of the SSS data, i.e., there exists an outlier point whose value is greater than that of any other point, we can reject the null hypothesis. As a result, we consider that there is a short-term correlation structure between the data and that this correlation is a strong correlation. Otherwise, we accept the null hypothesis and consider that there is no strong correlation between the data. The output of this step is an adjacency matrix whose entry a_ij is defined as:

a_ij = 1 if max_t STE^o_{i→j}(t) > max_{s,t} STE^s_{i→j}(t), and a_ij = 0 otherwise,

where t ∈ {1, 2, ..., tm}, s ∈ {1, 2, ..., sm}, STE^o_{i→j}(t) is the symbolic transfer entropy from variable i to variable j with time delay t on the original data, and STE^s_{i→j}(t) is the symbolic transfer entropy from variable i to variable j on the s-th shuffled data set.
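
The decision rule reduces to a comparison of maxima over the spectrum curves. A minimal sketch, with the curve layout and function name as our own illustrative conventions:

```python
def strong_correlation(original_curve, shuffled_curves):
    """Sketch of the identification rule for one entry a_ij:
    1 iff the peak original-data STE exceeds every surrogate STE value,
    i.e. the original curve has an outlier point above the SSS distribution."""
    surrogate_max = max(v for curve in shuffled_curves for v in curve)
    return 1 if max(original_curve) > surrogate_max else 0
```

Requiring the original peak to clear *every* surrogate value (rather than, say, a quantile) is what biases the method toward precision over recall, as discussed above.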

Association Network Inference
The association network inferred from multivariate time series can be denoted as G = (V, E), where V is the set of vertices, i.e., the time series variables, and E is the set of edges, i.e., the strong correlations identified in Section 2.6 between each pair of vertices in V.
From the 0-1 adjacency matrix of the last step, we have determined the direction of the network. In this step, we assign a weight to the edges in E. The selected measure for the weight is the corresponding maximum symbolic transfer entropy of the original data calculated in Section 2.4, and Equation (6) is transformed as follows:

w_ij = a_ij · max_t STE^o_{i→j}(t)    (7)

where i is the driving variable and j is the response variable. Finally, we can plot the association network based on the weighted adjacency matrix denoted by Equation (7) and carry out deeper network analysis.
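
This final weighting step can be sketched directly from the two previous outputs; the function name and data layout are our illustrative assumptions.

```python
def weighted_adjacency(adj, orig_steml):
    """Sketch of the final step: each identified edge (i, j) keeps the
    maximum original-data STE over all delays as its weight; entries
    with adj[i][j] == 0 stay at weight 0."""
    k = len(adj)
    W = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if adj[i][j]:
                W[i][j] = max(M[i][j] for M in orig_steml)
    return W
```

The resulting matrix W is exactly the weighted adjacency matrix from which the directed network is plotted, with edge widths proportional to the weights.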

Results
In this section, we demonstrate the application of the proposed method to simulated time series data from two types of complex system, i.e., a linear system and a nonlinear system. The relationships among the variables in these two examples are clear, and therefore we can assess our method with several measures.
In all the following cases, the parameters for modelling with the SSSTES method are: shuffling amplitude A = 1.0, dimension of the symbolic time series m = 2, maximum time delay tm = 10, maximum shuffling times sm = 99, and time points t = 1, 2, ..., 1000. These parameters are input in the Data Layer shown in Figure 1.

Numerical Example from Linear System
First, we apply our method to a linear system with four time series variables, i.e., x1(t), x2(t), x3(t), x4(t). The relationships among these variables are modelled by Equations (8)-(11) [42], where r_i(t) (i = 1, 2, 3, 4) are random noise terms, independent and identically distributed Gaussian random variables with mean zero and standard deviation 1.0. These four time series are shown in Figure 2.
It is difficult to find the relationships among the four time series variables from Figure 2. Their fluctuations seem irregular and show no obvious trend, but in reality they have linear relationships. If the variable y is a linear combination of variables x1, x2, ..., xn, we say y is a response variable and x1, x2, ..., xn are the drive variables. In the network, we denote the drive-response relationship between y and x1 as an arrowed edge from x1 to y. Therefore, the corresponding network of the above linear system is shown in Figure 3a.

As shown in Figure 3a, variable x1 is driven by the two other variables x2 and x4, variable x3 is driven by variables x1 and x4, and x4 is driven by x1. However, x2 is not driven by any other variable; it only drives x1. Note that there are autocorrelations in Equations (8)-(11), but we do not show the autocorrelations in Figure 3a. In this paper, we focus on the relationships among different variables and are not concerned with the autocorrelations.
After generating the simulated data by Equations (8)-(11) in the Data Layer shown in Figure 1, we model the data with the proposed SSSTES method. This process has been described in detail in Sections 2.3-2.6. The shuffled data used in the modelling process are generated with the method described in Section 2.2. One output of the Model Layer is the set of symbolic transfer entropy spectra shown in Figure 4. Since the STE values are rather small, they are multiplied by 100 for ease of plotting. There are twelve ordered pairs of relationships among the four time series and they are all shown in Figure 4, two pairs per row: one is the relationship from time series x to y and the other is the reverse relationship. For example, the left plot in the first row is the symbolic transfer entropy spectrum from time series x1 to time series x2 and the right one is the spectrum from time series x2 to time series x1. The other symbolic transfer entropy spectra are shown similarly in Figure 4.
Then, we need to identify the strong relationships in Figure 4. The method to identify a strong relationship between a pair of time series is described in Section 2.6. With this method we identify the strong relationships v21, v13, v14 and v43. From v21 we conclude that time series x1 is influenced by time series x2, which is consistent with Equation (8). Comparing against Equations (8)-(11), we find that the other three identified strong relationships v13, v14 and v43 are also correct.
The other output of the Model Layer is a 0-1 adjacency matrix corresponding to the symbolic transfer entropy spectra. For instance, the adjacency matrix resulting from Figure 4 is described by Equation (12). Finally, we infer a weighted directed association network in the last layer. From Equation (12) we can get a directed network, and then we quantify the correlation strength of the relationships identified above. Therefore, we introduce a correlation measure into the adjacency matrix C and get a new weighted adjacency matrix C′, whose entries are described by Equation (7). The selected measure is the maximum symbolic transfer entropy over the different time lags of the original data. Then, we get the weighted adjacency matrix C′ as in Equation (13).
The association network corresponding to the matrix C′ is shown in Figure 3b. Each time series is mapped to a node, each arrowed edge stands for a drive-response relationship, and each edge carries a weight, i.e., the maximum symbolic transfer entropy value, which is mapped to the width of the line. As we can see, the relationship from x4 to x3 is the strongest one.
In Figure 3, the original network (a) has five directed edges and the inferred network (b) has four edges. By comparison, we find that all four edges of the inferred network exist in the original network, so we obtain a high precision. Next, we assess the correlation identification of the proposed method using two indicators, i.e., precision and sensitivity (or recall, the true positive rate) [45,46]. Precision is defined in Equation (14) and sensitivity in Equation (15). Here, TP is the number of edges in the intersection of the original edge set and the inferred edge set, FP is the number of edges in the inferred edge set but not in the original edge set, and FN is the number of edges in the original edge set but not in the inferred edge set. To test whether the model is sensitive to the system noise, we generate ten groups of data using Equations (8)-(11) and apply the proposed method to each of them. As a result, we get ten precision values and ten sensitivity values; their averages are shown in Table 1. From Table 1, the average precision of our model reaches 0.92 and the average sensitivity reaches 0.74, although the latter is inferior to the precision. Next, we discuss the temporal relation identification of the proposed method. Please note that the following discussion is based on the edges that were inferred correctly. The time lag we assign to two correlated variables is the time point at which the STE of the original data achieves its maximum value. Based on this definition, we define a measure, the precision of time lags (PTL), to assess the temporal relation identification of the proposed method; it is defined in Equation (16). Here, TPL is the number of correct temporal relation identifications among the correctly identified edges, and FPL is the number of erroneous temporal relation identifications among the correctly identified edges. The PTL results are shown in Table 1; we obtain a high PTL of 0.98.
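The precision and sensitivity of Equations (14) and (15) can be computed directly from the two directed edge sets. The sketch below uses the single run of Section 3.1 (five true edges, four inferred edges, all four correct) as a worked check; the edge encoding is our own.

```python
def assess(original_edges, inferred_edges):
    """Precision (Equation (14)) and sensitivity (Equation (15)) of an
    inferred directed edge set against the original one."""
    tp = len(original_edges & inferred_edges)  # edges present in both sets
    fp = len(inferred_edges - original_edges)  # inferred but spurious
    fn = len(original_edges - inferred_edges)  # true but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return precision, sensitivity

# Section 3.1: x1 is driven by x2 and x4, x3 by x1 and x4, x4 by x1.
original = {("x2", "x1"), ("x4", "x1"), ("x1", "x3"), ("x4", "x3"), ("x1", "x4")}
inferred = {("x2", "x1"), ("x1", "x3"), ("x1", "x4"), ("x4", "x3")}
precision, sensitivity = assess(original, inferred)  # -> 1.0, 0.8
```

With four correct edges out of four inferred and one true edge missed, precision is 1.0 and sensitivity is 0.8, matching the single-run behaviour described above.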
These results show that the proposed method is good at inferring association networks from linear multivariate time series. In Section 3.2, we validate our method on a nonlinear system.
Here, ri(t) (i = 1, 2, 3, 4) are random noise terms: independent and identically distributed Gaussian random variables with mean zero and standard deviation 1.0. Two changes are made, compared with Equations (8)-(11), to obtain a nonlinear system. One change concerns x1(t): in Equation (17) there is a product x2(t − 4)x4(t − 7), which is a nonlinear term. In addition, in Equation (20) there is the term 0.2x1(t − 2), which makes x4(t) also nonlinear. The other change can be found in Equation (19). The function h(·) is a static monotonic nonlinear function, denoted as follows [47], with ρ = 3, a = −2, b = 10. The time series generated by Equations (17)-(20) are shown in Figure 5. The original network of this nonlinear system is the same as in Figure 3a. We apply the proposed method to this nonlinear system; the process is the same as described in Section 3.1. The resulting symbolic transfer entropy spectra are shown in Figure 6. From this figure, we conclude that time series x1 is influenced by time series x2 and x4, and time series x3 is influenced by time series x1 and x4. These identified relationships are correct when checked against Equations (17)-(20). We can denote the output of the symbolic transfer entropy spectra as the adjacency matrix in Equation (22).
Entropy 2016, 18, 328 13 of 21
This is a 0-1 adjacency matrix. Since we aim to obtain a weighted directed network, we assign a weight to each edge following the method described in Section 2.7. Then, we get the weighted adjacency matrix denoted by Equation (23). From this matrix, we get the association network shown in Figure 3c. The inferred network has four edges, all of which are contained in the original network shown in Figure 3a. Therefore, we consider that the proposed method works well for nonlinear systems.
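Turning a weighted adjacency matrix such as C′ into a directed, weighted edge list is mechanical; the sketch below does it in plain Python. The weight values in the example are made up for illustration only, not the entries of Equation (23); only the edge pattern mirrors the inferred network of Section 3.1.

```python
def network_from_weighted_adjacency(C, names):
    """Turn a weighted adjacency matrix into a directed edge list.
    C[i][j] > 0 means 'names[i] drives names[j]' with weight C[i][j]."""
    edges = []
    for i, row in enumerate(C):
        for j, w in enumerate(row):
            if i != j and w > 0:
                edges.append((names[i], names[j], w))
    # strongest first, mirroring the edge-width mapping used in Figure 3b
    return sorted(edges, key=lambda e: -e[2])

names = ["x1", "x2", "x3", "x4"]
C = [
    [0,    0, 0.02, 0.01],  # hypothetical weights: x1 -> x3, x1 -> x4
    [0.03, 0, 0,    0   ],  # x2 -> x1
    [0,    0, 0,    0   ],
    [0,    0, 0.05, 0   ],  # x4 -> x3, the strongest, as in Figure 3b
]
edges = network_from_weighted_adjacency(C, names)
# edges[0] -> ("x4", "x3", 0.05)
```

Diagonal entries are skipped because, as noted above, autocorrelations are not drawn in the inferred networks.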
To conclude this section, we assess the performance of the proposed method on the nonlinear system using the same indicators as in Section 3.1: precision, sensitivity [45,46] and PTL. The results, measured on ten groups of data, are shown in Table 2. From Table 2, we see that the average precision of our model reaches 0.90, the average sensitivity reaches 0.72 and the precision of time lag identification is 0.98.
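The exact form of the static monotonic nonlinearity h(·) from [47] is not reproduced in this excerpt. A common choice with the stated parameters ρ = 3, a = −2, b = 10 is a shifted, scaled logistic function; the sketch below is an assumption for illustration, not necessarily the function the authors used.

```python
import math

def h(u, rho=3.0, a=-2.0, b=10.0):
    """Hypothetical static monotonic nonlinearity: a logistic curve
    scaled by b, steepened by rho and shifted by a.
    Monotonically increasing in u, with range (a, a + b)."""
    return a + b / (1.0 + math.exp(-rho * u))
```

Any strictly monotonic choice preserves the ordinal patterns used by the symbolization step, which is why a static nonlinearity of this kind is a natural test case for a symbolic method.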

Numerical Example from the Coupled Rossler Systems
The linear and nonlinear systems illustrated in Sections 3.1 and 3.2 are common systems in which the relationships among the time series variables are determined. In this section, we validate whether the proposed method works well for a coupled chaotic system, in which the relationships between components are uncertain and complicated. The coupled Rossler systems are two symmetrically coupled chaotic systems, given by Equations (24)-(29).
Here, the parameters are a1 = 0.2, a2 = 0.4, b1 = b2 = 0.2, w1 = w2 = 5.7 and c1 = c2 = 0.2, with c1, c2 the coupling control parameters. We generate data from this coupled chaotic system using the fourth-order Runge-Kutta method with a sampling interval of 0.01. The coupled Rossler system has six components, whose time series are shown in Figure 7. The relationships among these six time series are shown in Figure 8a. From Equation (24), we know x1 is driven by y1 and z1; from Equation (25), y1 is driven by y2; from Equation (26), z1 is driven by x1. In the same way, we can obtain the other relationships in Figure 8a from Equations (27)-(29). We emphasize that we do not take indirect relationships into account, in this example or in the others. After applying the proposed method to the coupled Rossler system, we get the inferred association network shown in Figure 8b. There are ten edges in Figure 8a and eight edges in Figure 8b. By comparison, the precision is 0.875 and the sensitivity is 0.7 according to their definitions in Section 3.1.
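The explicit form of Equations (24)-(29) was lost in extraction, so the sketch below integrates a commonly used symmetrically coupled Rossler pair with a fourth-order Runge-Kutta step at the stated interval of 0.01. The right-hand side is an assumption (coupling through the y variables, consistent with y1 being driven by y2 in the text), not necessarily the authors' exact system.

```python
def rossler_coupled(state, a=(0.2, 0.4), b=(0.2, 0.2), w=(5.7, 5.7), c=(0.2, 0.2)):
    """Assumed right-hand side of two symmetrically coupled Rossler systems.
    x_i is driven by y_i, z_i; y_i by x_i, itself and the other y; z_i by x_i."""
    x1, y1, z1, x2, y2, z2 = state
    return (
        -y1 - z1,
        x1 + a[0] * y1 + c[0] * (y2 - y1),
        b[0] + z1 * (x1 - w[0]),
        -y2 - z2,
        x2 + a[1] * y2 + c[1] * (y1 - y2),
        b[1] + z2 * (x2 - w[1]),
    )

def rk4_step(f, state, h):
    """One fourth-order Runge-Kutta step of size h for dstate/dt = f(state)."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + (h / 6.0) * (p + 2.0 * q + 2.0 * r + u)
                 for s, p, q, r, u in zip(state, k1, k2, k3, k4))
```

Sampling the six state components after each step of size 0.01 yields multivariate series of the kind shown in Figure 7.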
Next, we also assess the performance of the proposed method on the coupled Rossler system. For the time series shown in Figure 7, the experiments are carried out ten times. By comparison and calculation, ten groups of results are obtained, shown in Table 3. The assessment indicators are precision and sensitivity. From Table 3, we see that the average precision of our model reaches 0.84 and the average sensitivity is 0.62. This shows that our method is also effective for chaotic time series.

Application
In this section, we apply the proposed method to a real industrial system that is an important part of a whole power generation system, i.e., the steam turbine system. The basic principle of a steam turbine system is shown in Figure 9. The eight monitored parameters are main steam pressure left (MSPL), main steam pressure right (MSPR), reheat steam pressure left (RSPL), reheat steam pressure right (RSPR), governing stage pressure (GSP), IP turb exh pressure (IPTEP), condenser vacuum A (CVA) and condenser vacuum B (CVB); they are labelled in red in the figure. In addition, the main components of the turbine are also shown in Figure 9, i.e., high pressure turb (HPT), intermediate pressure turb (IPT), low pressure turb A (LPTA), low pressure turb B (LPTB) and the dynamo. The other symbols are switches or valves.
The data set was obtained from the distributed control system (DCS) of a power plant belonging to China Huadian Corporation, one of the five state-owned sole-proprietorship power generation corporations. We use the eight monitored parameters mentioned above and 5000 records for analysis. The time series of these monitored parameters are shown in Figure 10. Figure 11 shows the original network and the network inferred from the time series. The original network is drawn according to the working principle of the steam turbine shown in Figure 9.

Following the experiments on the simulated numerical examples in Sections 3.1-3.3, we apply the proposed method to the multivariate time series from the DCS of the power plant. The inferred association network is shown in Figure 11b. Comparing with Figure 11a, we can see that there are nine edges in the inferred network, six of which exist in both the original and inferred networks. Therefore, the precision is 0.67 and the sensitivity is 0.75. The result in this application is not as good as the results obtained in Sections 3.1 and 3.2. The erroneous relationships are MSPL-IPTEP, MSPR-IPTEP and GSP-IPTEP. However, this does not mean that there are no correlations between them. Indeed, we can conclude from the original network that these three pairs are all correlated. The reason we count them as errors is that they are all indirect correlations, and we only consider direct correlations as correct. Therefore, although the precision is a bit low, the method still performs well in this application.
Next, we compare SSSTES with two other methods, i.e., random-shuffle symbolic transfer entropy (RSSTE) [36] and Granger causality (GC) [24,48]. We apply the three methods to the power data set and obtain the results shown in Table 4. For the two surrogate data methods, SSSTES and RSSTE, the number of shuffles is 99 and the reported precision and sensitivity are average values. The average precision of our proposed SSSTES method is 0.67 and the average sensitivity is 0.75. The average precision of RSSTE is 0.27 and its average sensitivity is 0.38. The precision of GC is 0.17 and its sensitivity is 1.00. The comparison shows that our proposed method performs better than the other two methods in this application. Although the sensitivity of GC is the highest, the cost is the lowest precision: GC identifies all the correlations but incurs a high rate of erroneous identifications. SSSTES, by contrast, is a moderate method that maintains a high precision while keeping the sensitivity in an acceptable range.

Conclusions
In order to infer a weighted directed association network from multivariate time series, we have proposed a method named Small-Shuffle Symbolic Transfer Entropy Spectrum (SSSTES), which synthesizes the Symbolic Transfer Entropy (STE) and Small-Shuffle Surrogate (SSS) methods. We first proposed the framework of the method. It is composed of four layers, i.e., the Data Layer, the Network Layer and two Model Layers. We then described the six main processes of SSSTES in Sections 2.2-2.7. The most complicated and important processes are the STES composition and the strong correlation identification. After describing the method, we applied it to numerically simulated linear, nonlinear and coupled chaotic systems, using three indicators, i.e., precision, sensitivity and PTL, for assessment. The method shows good performance on the linear system, the nonlinear system and the coupled chaotic system. Finally, we applied the proposed method in an empirical investigation of a steam turbine system in a power plant. For comparison, we also used two other methods that can deal with directional correlation, i.e., RSSTE and GC. The results show that our proposed method outperforms both. In general, the method performs well on both simulated numerical examples and a real complex system. It can not only identify the strong correlations but also find the time delay between pairwise time series once the temporal correlation is determined.
Although the proposed method has been shown to be good at inferring association networks from multivariate time series, some topics are still worth studying in the future. First, in this paper we consider that the misidentification of relationships may have serious consequences, so we aim at strong correlation identification and pay less attention to the proportion of identified relationships among all relationships existing in the complex system. We will therefore attempt to improve the sensitivity of SSSTES. Second, although the method deals with many issues, such as correlation identification, correlation quantification, direction identification, indirect correlation and temporal relation identification, some of these abilities are not yet good enough and affect the results, e.g., the sensitivity in Section 3.3 and the precision in Section 3.4. We should make further efforts on these topics. Nevertheless, the proposed method can serve as a heuristic tool for inferring association networks from multivariate time series so as to study the underlying system more deeply with complex network methods.

Figure 1 .
Figure 1. Transfer entropy-based framework to infer association networks from multivariate time series. A black solid arrowed line in the flow diagram represents a determined sequential process; a blue dashed arrowed line, together with a Boolean condition, represents a potential process, carried out when the value of the condition expression is false. Each rounded rectangle represents a key processing operation using a specific method and each hexagon represents a staged result.

In the following cases, the parameters for modelling with the SSSTES method are: shuffling amplitude A = 1.0, symbolic dimension m = 2, maximum time delay tm = 10, maximum shuffling times sm = 99 and time points t = 1, 2, ..., 1000. The noise terms are independent and identically distributed Gaussian random variables with mean zero and standard deviation 1.0. The four time series are shown in Figure 2. It is difficult to find the relationships among the four time series variables from Figure 2: their fluctuations seem irregular and show no obvious trend, yet in reality they have linear relationships. If the variable y is a linear combination of variables x1, x2,
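The small-shuffle surrogate generation of Section 2.2 (after Nakamura and Small) perturbs each time index with Gaussian noise of amplitude A and reorders the data by the perturbed indices, so each value moves only a short distance from its original position. A minimal sketch with the amplitude A = 1.0 used above:

```python
import random

def small_shuffle_surrogate(x, A=1.0, rng=None):
    """Small-shuffle surrogate: add A * Gaussian noise to each time index,
    then reorder the data by the perturbed indices. Values keep roughly
    their local positions, which destroys short-range temporal structure
    between series while preserving the series' local behaviour."""
    rng = rng or random.Random()
    perturbed = sorted((t + A * rng.gauss(0.0, 1.0), t) for t in range(len(x)))
    return [x[t] for _, t in perturbed]
```

The surrogate contains exactly the same values as the original series; only their order is locally perturbed, which is what makes it a suitable null model for the strong-correlation test.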

Figure 3 .
Figure 3. The original network and the inferred networks. (a) The original association network constructed from Equations (8)-(11); (b) the inferred association network in Section 3.1; (c) the inferred association network in Section 3.2.

Figure 4 .
Figure 4. Symbolic transfer entropy spectra of the linear system. Plot v12 is the symbolic transfer entropy spectrum between time series 1 and 2. Plots v21, v13, v31, v14, v41, v23, v32, v24, v42, v34 and v43 are the spectra between the other time series pairs, respectively.

Figure 6 .
Figure 6. Symbolic transfer entropy spectra of the nonlinear system. Plot v12 is the symbolic transfer entropy spectrum between time series 1 and 2. Plots v21, v13, v31, v14, v41, v23, v32, v24, v42, v34 and v43 are the spectra between the other time series pairs, respectively.

Figure 7 .
Figure 7. The multivariate time series in the coupled Rossler system.


Figure 8 .
Figure 8. The original network and the inferred network from the coupled Rossler systems. (a) The original association network constructed from Equations (24)-(29); (b) the inferred association network.

Figure 10 .
Figure 10. The time series of the eight monitored parameters from the power system DCS.


Figure 11 .
Figure 11. The original network and the inferred network from the DCS time series. (a) The original association network constructed from the DCS time series; (b) the inferred association network from the DCS time series.

Table 1 .
The model assessment on 10 groups of data generated by Equations (8)-(11). The values of precision, sensitivity and precision of time lags (PTL) in the table are rounded to two decimals.

Table 2 .
The model assessment on 10 groups of data generated by Equations (17)-(20). The values of precision, sensitivity and PTL in the table are rounded to two decimals.

Table 3 .
The model assessment on coupled chaotic data generated by Equations (24)-(29). The values of precision and sensitivity in the table are rounded to two decimals.

Table 4 .
The algorithm assessment of the three different methods. The values of precision and sensitivity in the table are rounded to two decimals.