Construction of Complex Network with Multiple Time Series Relevance

Multivariate time series data, which comprise a set of ordered observations for multiple variables, are pervasively generated in weather, traffic, financial stock markets, etc. It is therefore of great significance to analyze the correlation between multiple time series. Financial stocks generate a significant amount of multivariate time series data that can be used to build networks that reflect market behavior. However, traditional complex networks of financial stocks cannot fully utilize the multiple attributes of stocks, filter out redundant relationships, or reveal a more authentic picture of the stock market. We propose a fusion similarity for multiple time series and construct a threshold network from this similarity. Furthermore, we define the connectivity efficiency to choose the best threshold, establishing a network with high connectivity efficiency at the optimal threshold. By searching for the central nodes in the threshold network, we find that the central nodes of the network constructed by our proposed method cover a more comprehensive range of industries than those of networks built with traditional techniques, which supports the superiority of this method.


Introduction
In the real world, a large number of systems can be described by complex networks [1][2][3][4]. The financial market is naturally a complex system, and economic and financial systems exhibit self-organized complexity [5]. Many studies have shown that complex networks have been successfully used to describe financial systems [6]. Researchers have proposed many network construction algorithms from different points of view [7][8][9]. In the U.S. stock market, a power-law degree distribution [10] has been observed in the minimum spanning tree (MST) [11]. The minimum spanning tree and the hierarchical tree have also been used to study the topology of correlation networks among major currencies [12,13] and of financial crises [14]. Kwapień et al. [15] introduced a family of q-dependent minimum spanning trees (qMSTs) that are selective for cross-correlations between different fluctuation amplitudes and different time scales of multivariate data. Djauhari et al. [16] introduced a set of optimality criteria and proposed a process for selecting the optimal MST. Through an analysis of the planar maximally-filtered graphs (PMFG) of a portfolio of 300 traded stocks, Tumminello et al. [17] confirmed that the selected stocks compose a hierarchical system that becomes progressively more structured as the sampling time horizon increases. The correlation network method has been applied to the structural transition of financial systems during a crisis in a local market [18]. In the Tehran stock market and the DJIA, a scale-free threshold network has been observed within a restricted range of thresholds [19]. Longfeng Zhao et al. [20] found some long-duration edges that serve as the backbone of the stock market during crises. Huang Weiqiang et al. [21] used the threshold method (the traditional method) to construct Chinese stock correlation networks and then studied the structural properties and topological stability of the network.
In many cases, information about parts of a dynamical system is expressed in the form of time series, and time series analysis is a classic means of data analysis [22]. Recently, there has been growing interest in applying complex network theory to time series analysis [23,24]. The time series is first transformed into a network and then analyzed with various complex network tools [25][26][27][28]. Shirazi et al. [29] demonstrated that a time series can be reconstructed with high precision by means of a simple random walk on its corresponding network. The interaction of financial markets naturally constitutes a complex system in which the stock data form a multivariate time series (MTS) [30] carrying information about each part of the system. R.N. Mantegna [31] mapped the financial market to a network whose vertices are stocks and whose edges are the relationships between stocks. Kazemilari et al. [32] constructed a multivariate correlation network of stocks in which each stock was represented by a multivariate time series, and vector correlation was used to measure the similarity among the multivariate time series in the stock network.
Although networks based on price fluctuations can help us understand the complex correlations between stocks, the direct conversion implicitly assumes the Markov property (first-order dependency) and loses important information about dependencies in the raw data. Besides, a stock network constructed from the volatility relationship of closing price data ignores the other attributes of the stock, which together form a multivariate time series. In practice, price fluctuations alone do not capture other data, such as trading volume, so such a network cannot effectively represent how a fluctuation in one stock is passed on to related stocks. Therefore, the stock price fluctuation network constructed from the closing price time series can neither fully portray the interactions between the time series themselves nor make full use of the stock's multi-attribute information.
Because of the limitations of the stock price index correlation threshold network, we introduce an improved method of building a stock network. First, we select representative sub-time series from the multivariate time series of each stock and convert each sub-time series into a Gramian angular field (GAF) [33] grayscale image. We then merge multiple GAF grayscale images into an MGAF color image and construct a multivariate stock network by calculating image similarity. Finally, we find that the network established by our method has a high connectivity efficiency, and that its central nodes have wider industry coverage than those of the network constructed by the stock price index correlation threshold method. In the following sections, we introduce the network construction method in detail and then present the empirical study and its results.

Image Fusion Network Construction Method
In this section, we introduce how to construct complex networks from multivariate time series. The process is shown in Figure 1.

Feature Selection
An MTS contains a set of ordered observations at discrete times for multiple variables. To reduce the amount of computation while retaining as much information as possible, we use the Pearson correlation coefficient to perform feature selection on the MTS. The Pearson correlation coefficient between subsequences X_i and X_j of a multivariate time series X is given by Equation (1).
$$\rho_{x_i,x_j} = \frac{\mathrm{cov}(X_i, X_j)}{\sigma_{x_i}\,\sigma_{x_j}} \qquad (1)$$
where X_i is the sequence of the i-th attribute of the time series X, σ_{x_i} is the standard deviation of X_i and cov(X_i, X_j) is the covariance of X_i and X_j. The elements ρ_{x_i,x_j} are restricted to the interval [−1, 1], where ρ_{x_i,x_j} = 1 corresponds to perfect correlation, ρ_{x_i,x_j} = −1 corresponds to perfect anti-correlation and ρ_{x_i,x_j} = 0 corresponds to an uncorrelated pair of sequences.
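The feature selection step can be sketched as follows. This is a minimal illustration rather than the procedure used in the paper: the greedy grouping rule, the function name `select_representatives` and the cutoff of 0.8 are assumptions introduced here, and the three toy sequences are synthetic. The sketch computes the Pearson correlation matrix between attribute sequences and keeps one representative per group of highly correlated attributes.

```python
import numpy as np

def select_representatives(X, names, cutoff=0.8):
    """Greedy grouping of attributes by absolute Pearson correlation.

    X      : array of shape (n_attributes, n_timesteps)
    names  : attribute names, one per row of X
    cutoff : hypothetical grouping threshold (an assumption, not from the paper)
    """
    rho = np.corrcoef(X)  # rho[i, j] is the Pearson coefficient of rows i and j
    kept = []
    for i in range(len(names)):
        # Keep attribute i only if it is not strongly correlated
        # with any representative that has already been kept.
        if all(abs(rho[i, j]) < cutoff for j in kept):
            kept.append(i)
    return [names[i] for i in kept], rho

# Synthetic example: "high" is almost a copy of "close", "turnover" is not.
t = np.arange(50, dtype=float)
close = np.sin(t / 5.0)
high = close + 0.01 * np.cos(t)
turnover = np.cos(t / 3.0) ** 2
chosen, rho = select_representatives(
    np.vstack([close, high, turnover]), ["close", "high", "turnover"]
)
print(chosen)
```

In this toy case, "high" is dropped because it is almost perfectly correlated with "close", which mirrors the way one attribute is chosen to represent each correlated group.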

Imaging Multivariate Time Series and Similarity Calculation
The GAF [33] converts a time series X_i = (x_{i1}, x_{i2}, ..., x_{in}) into a grayscale image. GAFs provide a way to preserve temporal dependency, since time increases as the position moves from top-left to bottom-right. GAFs contain temporal correlations because G_{(i,j | |i−j|=k)} represents the relative correlation by superposition of directions for time interval k. However, a GAF is essentially a grayscale image that represents the time series of only a single attribute. Considering that many time series, such as stocks, have multiple attributes, we define the multiple GAF (MGAF) as in Equation (2).
where G_i is an n × n pixel GAF image converted from subsequence X_i and w_i is the fusion weight corresponding to the grayscale image G_i. MGAF_m is an image fused from m GAF images.
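As an illustration of the imaging step, the sketch below computes a Gramian angular summation field in the usual way: the series is rescaled to [−1, 1], mapped to angles with arccos, and the image entry (i, j) is the cosine of the sum of the two angles. The choice of the summation variant is an assumption based on the phrase "superposition of directions". The `fuse` helper is also an assumption: Equation (2) is not reproduced in this text, so the helper simply scales each GAF by its weight and stacks the results as image channels; a weighted sum into a single channel would be another possible reading.

```python
import numpy as np

def gasf(x):
    """Gramian angular summation field of a 1-D series (n x n array in [-1, 1])."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        raise ValueError("a constant series cannot be rescaled to [-1, 1]")
    scaled = (2.0 * x - hi - lo) / (hi - lo)   # min-max rescaling to [-1, 1]
    scaled = np.clip(scaled, -1.0, 1.0)        # guard against rounding error
    phi = np.arccos(scaled)                    # polar-coordinate angle
    return np.cos(phi[:, None] + phi[None, :])

def fuse(gafs, weights):
    """Hypothetical fusion: weight each GAF and stack them as channels."""
    if len(gafs) != len(weights):
        raise ValueError("need exactly one weight per GAF image")
    return np.stack([w * g for g, w in zip(gafs, weights)], axis=-1)

series = [1.0, 2.0, 3.0, 4.0]
g = gasf(series)
image = fuse([g, g, g], (0.299, 0.587, 0.114))
print(g.shape, image.shape)
```

The main diagonal of the GAF depends only on the rescaled values themselves, which is why the image retains the original series along the top-left to bottom-right direction.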
In the construction of correlation threshold networks, we need to know the size of the differences between individuals in order to evaluate the similarity between them. Using the method described above, we converted each stock into an image. We then calculated the image similarity to determine the network edges. The structural similarity index (SSIM) [34] is well suited to measuring image similarity. From the perspective of image composition, SSIM defines structural information as a property that reflects the structure of the objects in the scene independently of brightness and contrast, and it models distortion as a combination of three factors: brightness, contrast and structure. The mean is used as the estimate of brightness, the standard deviation as the estimate of contrast and the covariance as the measure of structural similarity. We use Equation (3) to evaluate the overall image similarity.
where the images M_i and M_j are divided into N sub-images in the same manner. Suppose S_i = (S_{i1}, S_{i2}, ...) represents stock i, where S_{i1} is the closing price sequence and S_{i2} is the turnover rate sequence. S_1, S_2, S_3 and S_4 are four different stocks, and Figure 2 shows the different correlations between them. Under Pearson cross-correlation methods, which usually consider only the closing price of a stock [21], the Pearson similarity of S_3 and S_1 has the same absolute value as that of S_3 and S_2. GAF images take the time dependence of the time series into account, so the MSSIM of S_3 and S_1 differs from that of S_3 and S_2: the MSSIM of S_3 and S_1 is 0.308, whereas the MSSIM of S_3 and S_2 is 0.265. Considering only the closing price, the MSSIM of S_3 and S_4 is 1.0. Taking both the closing price and the turnover rate into account, the MSSIM of S_3 and S_4 is 0.832. With an appropriate threshold, the fused images can therefore distinguish these stocks, whereas the traditional method [21] cannot. This is the advantage of converting a multivariate time series into a fused image.
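The similarity computation can be sketched as below, assuming that Equation (3) is the usual mean SSIM, $\mathrm{MSSIM}(M_i, M_j) = \frac{1}{N}\sum_{k=1}^{N}\mathrm{SSIM}(m_{ik}, m_{jk})$, where $m_{ik}$ denotes the k-th sub-image of $M_i$ (this notation is introduced here). The per-block formula is the standard SSIM with constants K1 = 0.01 and K2 = 0.03. The non-overlapping 8 × 8 blocks, the data range of 2.0 (for pixel values in [−1, 1]) and the function names are assumptions made for illustration.

```python
import numpy as np

def ssim_block(a, b, data_range=2.0, k1=0.01, k2=0.03):
    """Standard SSIM of two equally sized blocks, using global block statistics."""
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()                # brightness estimates
    var_a, var_b = a.var(), b.var()                # contrast estimates
    cov = ((a - mu_a) * (b - mu_b)).mean()         # structure estimate
    num = (2.0 * mu_a * mu_b + c1) * (2.0 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den

def mssim(img_a, img_b, block=8, data_range=2.0):
    """Mean SSIM over non-overlapping blocks (the block layout is an assumption)."""
    img_a = np.asarray(img_a, dtype=float)
    img_b = np.asarray(img_b, dtype=float)
    if img_a.shape != img_b.shape:
        raise ValueError("images must have the same shape")
    h, w = img_a.shape[:2]
    scores = [
        ssim_block(img_a[r:r + block, c:c + block],
                   img_b[r:r + block, c:c + block], data_range)
        for r in range(0, h - block + 1, block)
        for c in range(0, w - block + 1, block)
    ]
    if not scores:
        raise ValueError("image is smaller than one block")
    return float(np.mean(scores))

rng = np.random.default_rng(0)
img = rng.uniform(-1.0, 1.0, size=(32, 32))
print(mssim(img, img), mssim(img, -img))
```

An image compared with itself scores 1, and any other pair scores lower, so thresholding the score gives a natural rule for deciding whether two stocks are linked.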

The Connection Efficiency
In order to understand the features of the similar structure of financial stocks better, we establish the stocks' associated networks (SAN) using complex network theory.To observe the network attributes under different thresholds in a more intuitive way, we define the connection efficiency in addition to the common network global attributes such as the average clustering coefficient [35], the average efficiency [36], density and the fraction of the coverage.
Network density characterizes how densely the nodes in the network are connected by edges. For an undirected network, it is defined as in Equation (4).
$$D = \frac{2m}{n(n-1)} \qquad (4)$$
where n is the number of nodes and m is the number of edges in the network.
In an undirected graph G, v i and v j are connected if there is a path from vertex v i to vertex v j .A maximally-connected subgraph of an undirected graph G is called a connected component of G.
The ratio of the number of nodes in the largest connected component to the number of stocks in the network is defined as the fraction of the coverage of the network, as in Equation (5).
$$F = \frac{\max_i |V_i|}{|V|} \qquad (5)$$
where V is the set of stocks in the network and V_i is the set of stocks in the i-th connected component of the stock network.
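The two quantities above can be computed with a few lines of standard-library Python. The sketch assumes that Equation (4) is the usual density of an undirected simple graph, 2m / (n(n − 1)), and that Equation (5) is the size of the largest connected component divided by the number of nodes; the function names and the five-node toy graph are hypothetical.

```python
from collections import defaultdict

def network_density(n_nodes, edges):
    """Density of an undirected simple graph: 2m / (n (n - 1))."""
    if n_nodes < 2:
        return 0.0
    return 2.0 * len(edges) / (n_nodes * (n_nodes - 1))

def coverage_fraction(nodes, edges):
    """Size of the largest connected component divided by the number of nodes."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, largest = set(), 0
    for start in nodes:
        if start in seen:
            continue
        # Depth-first search over the component that contains `start`.
        stack, size = [start], 0
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for v in adjacency[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        largest = max(largest, size)
    return largest / len(nodes)

nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("D", "E")]
print(network_density(len(nodes), edges))   # 0.3
print(coverage_fraction(nodes, edges))      # 0.6
```

Here three edges out of a possible ten give a density of 0.3, and the largest component {A, B, C} covers three of the five nodes.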
The connection efficiency characterizes the strength with which edges connect nodes. According to Equation (6), to maximize the connection efficiency, we need to maximize the fraction of the coverage (F) of the network and minimize the network density (D). Looking back at Equations (4) and (5), this means building a network whose edges link as many nodes as possible without becoming too dense. In other words, we use fewer edges to bring more nodes into the network. The higher the connection efficiency, the stronger the ability of the edges to connect nodes. This means that, compared with previous network construction methods, the edges found by our proposed method are more important, and the relationships between stocks are more significant.
where D is the network density and F is the fraction of the coverage of the network.

The Network Overview
We calculate the Pearson correlation coefficient between the stocks' attributes. From Figure 3, we can see that the attributes fall roughly into three groups, so we choose m = 3. The MGAF_3 formula is given in Equation (7).
where G = (G_1, G_2, G_3) are the GAF images converted from the price change, the closing price and the turnover rate, respectively, and w = (0.299, 0.587, 0.114) is the vector of fusion weights. We used the standard weights for converting an RGB image to a grayscale image as the fusion weights, but the method is not limited to these weights; they can be modified as needed to obtain different results.
Then, we calculated the image similarity MSSIM and established SAN models with different thresholds. We selected the network at an appropriate threshold as a representative, as shown in Figure 4. The size of each node depends on the degree of the node.

Threshold Selection
When building a correlation threshold network, choosing a reasonable threshold is the key. At present, the threshold is either determined by comparing the attributes of networks built at different thresholds or set on the basis of expert experience. To explore how to determine the threshold of the network, we constructed networks with thresholds ranging from 0.01 to 0.86 and computed their statistics; the global properties of the network at each threshold are shown in Figure 5. In terms of the local features of the network, the roles of the nodes differ greatly: a few critical nodes play a leading role in the operation of the network. We compared the node centrality of the networks within the threshold interval and ranked the top ten nodes according to their centrality, as shown in Tables 1 and 2.
From Figure 5, we can see that as the threshold increases, the network density, the average clustering coefficient, etc., decrease continuously. When the threshold is too small, there is an edge between almost every pair of nodes, so the network is close to a complete graph. When the threshold is too large, almost all nodes are isolated. To reduce redundant edges while incorporating as many nodes as possible into the network, we defined the connection efficiency and chose the threshold that maximizes it as the network's threshold. As shown in Figure 6, the connection efficiency rises steeply as the threshold increases, reaches a maximum and then drops rapidly.
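The threshold search can be sketched as a sweep over candidate thresholds. Because Equation (6) is not reproduced in this text, the scoring function is passed in as a parameter; the stand-in used in the example, F − D, is purely illustrative and should not be read as the paper's definition. The density and coverage are assumed to follow the standard definitions, and the function names and the 4 × 4 similarity matrix are hypothetical.

```python
import numpy as np

def density_and_coverage(adj):
    """Density and largest-component fraction of a boolean adjacency matrix."""
    n = adj.shape[0]
    m = int(np.triu(adj, k=1).sum())            # count each undirected edge once
    density = 2.0 * m / (n * (n - 1))
    seen = np.zeros(n, dtype=bool)
    largest = 0
    for start in range(n):
        if seen[start]:
            continue
        stack, size = [start], 0
        seen[start] = True
        while stack:                            # depth-first search
            u = stack.pop()
            size += 1
            for v in np.flatnonzero(adj[u] & ~seen):
                seen[v] = True
                stack.append(int(v))
        largest = max(largest, size)
    return density, largest / n

def best_threshold(sim, thresholds, efficiency):
    """Return (threshold, score) maximizing a user-supplied efficiency(D, F)."""
    best = None
    for th in thresholds:
        adj = sim >= th
        np.fill_diagonal(adj, False)            # no self-loops
        d, f = density_and_coverage(adj)
        if d == 0.0:
            continue                            # skip networks without edges
        score = efficiency(d, f)
        if best is None or score > best[1]:
            best = (th, score)
    return best

sim = np.array([[1.0, 0.9, 0.5, 0.1],
                [0.9, 1.0, 0.6, 0.2],
                [0.5, 0.6, 1.0, 0.4],
                [0.1, 0.2, 0.4, 1.0]])
# Illustrative stand-in for Equation (6): reward coverage, penalize density.
print(best_threshold(sim, [0.05, 0.3, 0.55, 0.8], lambda d, f: f - d))
```

With this stand-in, very low thresholds are penalized for producing a nearly complete graph and very high thresholds for leaving most nodes isolated, so the selected threshold lies in between.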
In Tables 1 and 2, the first row gives the threshold, and each column lists the codes of the nodes with the largest degree centrality at that threshold. Reading the tables horizontally, we find that however the threshold changes, some nodes always remain among the most central, and these are the more important nodes. Reading the tables vertically, we find that some thresholds contain fewer important nodes, whereas others contain more. We consider a threshold more appropriate the more important nodes it contains. To make the frequency of each stock easier to observe, we mark the stocks in the tables in red, purple, green, blue and white in order of frequency. From the tables, we can see that the high-frequency center nodes of the network constructed by our method concentrate around a threshold of 0.3. Similarly, the high-frequency center nodes of the network built by the stock price index correlation threshold method (the traditional method) concentrate around a threshold of 0.4. From Figure 6 and Tables 1 and 2, we can conclude that as the connectivity efficiency becomes larger, the number of high-frequency center nodes increases, and that this number is highest where the connection efficiency reaches its maximum. This shows that the network constructed near the maximum of the connectivity efficiency better characterizes the relationships between stocks and, indirectly, that our threshold selection criterion, the connectivity efficiency, helps to select an appropriate threshold. Moreover, as can be seen from Figure 6, the connection efficiency of the network constructed by our method is higher overall than that of the traditional method. However, as can be seen from Tables 1 and 2, the number of high-frequency center nodes in the network constructed by our method is generally lower than in the traditional method. This means that the network we have built uses fewer edges to achieve higher connection efficiency and better reflects the essential interactions between stocks.
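The ranking behind Tables 1 and 2 can be sketched as follows, assuming the usual normalized degree centrality (the degree of a node divided by n − 1). The function name and the four-node toy graph are hypothetical; ties are broken by node code so that the ranking is deterministic.

```python
from collections import Counter

def top_degree_centrality(nodes, edges, k=10):
    """Return the k nodes with the largest normalized degree centrality."""
    n = len(nodes)
    if n < 2:
        return [(node, 0.0) for node in nodes][:k]
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    scores = {node: degree[node] / (n - 1) for node in nodes}
    # Sort by descending centrality, then by node code for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]

nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
for code, centrality in top_degree_centrality(nodes, edges, k=2):
    print(code, round(centrality, 3))
```

Repeating this ranking for the network at each threshold and counting how often each stock code appears gives the frequencies used to compare thresholds.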
In addition, we found that the vast majority of the stocks (000166, 000686, 601555, 600369, 002736, 601099, 002500, 600909, 000783) corresponding to the nodes with relatively high degrees in the network constructed by the traditional method belong to the financial industry. The stocks (600118, 601958, 000166, 601866, 601099, 000415, 600482, 600219) corresponding to the nodes with relatively high degrees in the network constructed by our method belong to a wider range of industries: communication equipment, mining, finance, transportation and logistics, non-ferrous metals, metals, shipping equipment, and leasing and commerce. This has important implications for studying the relationships between different industries across the entire stock market. Although the model presented in this paper has achieved good experimental results, it still has some shortcomings. Because image similarity must be calculated, the algorithm is not efficient. Moreover, the network is undirected and therefore does not reflect the asymmetric interactions that are typical in the real world.

Conclusions
This paper selects features from multivariate stock sequences, converts the key attribute sequences into grayscale images, composes them into color images and calculates the similarity between the images. In effect, we propose a way to calculate the correlation of multivariate time series. We treat each stock as a node, treat the relationship between the images of two stocks as a connecting edge and build stock networks at different thresholds. Furthermore, we define the connectivity efficiency to choose the best threshold and establish a network with high connectivity efficiency at the optimal threshold. By searching for the central nodes in the threshold network, we find that the central nodes of the network constructed by our proposed method cover a more comprehensive range of industries than those of networks built with traditional techniques, which supports the superiority of this method. Constructing a directed network and extending our model to more realistic systems will be our future research directions.

Figure 1. Architecture of our proposed method. The whole process of our method contains five steps. (1) Select representative sub-time series to form a new multivariate time series (MTS). (2) Convert each sub-time series to a Gramian angular field (GAF) grayscale image. (3) Fuse multiple grayscale images into a color image. (4) Add nodes to the complex network. (5) Determine edges between nodes by the similarity between fused images.

Figure 2. Correlation of time series and images. (A) X_1, X_2 and X_3 represent different time series; G_1, G_2 and G_3 are GAF images converted from the time series X_1, X_2 and X_3, respectively. S_1 is a multivariate time series in which the first element is X_1 and the second element is X_2. Similarly, S_2, S_3 and S_4 are multivariate time series. M_1 and M_2 are MGAF images converted from S_3 and S_4, respectively. (B) Only the first element of S_1, S_2 and S_3 is used. In this situation, the Pearson similarity of S_1 and S_3 is 0.389, and that of S_2 and S_3 has the same absolute value. (C) In previous methods, the MSSIM of the corresponding GAF images of S_1 and S_3 was 0.308, and the MSSIM of the corresponding GAF images of S_2 and S_3 was 0.265. The MSSIM of the corresponding GAF images of S_3 and S_4 was 1.0. (D) Taking the first and second elements of S_3 and S_4 into consideration, the MSSIM of the corresponding MGAF images of S_3 and S_4 is 0.832.

Figure 3. The correlation between the various attributes of the stock. The color from light to dark in the figure indicates the Pearson similarity from small to large. The deeper the color, the greater the similarity. In the figure, open, high, close, low, ma5, ma10 and ma20 are highly correlated, so we chose close to represent this group of attributes. In the same way, we chose p_change to represent p_change and price_change, and turnover to represent volume, turnover, v_ma5, v_ma10 and v_ma20.

Figure 4. The image fusion network. Each node in the network corresponds to a stock. An edge indicates that the correlation between the fused images of the two corresponding stocks is greater than or equal to 0.3, which is roughly the threshold corresponding to the maximum value of the connection efficiency in Figure 6. The size of a node represents its degree: the larger the node, the greater its degree.

Figure 5. The global network attributes as a function of the threshold. (A) shows the results of the proposed method. (B) shows the results of the stock price index correlation threshold method [21] (traditional method). The purple, blue, green and orange curves in the graph represent network density, efficiency, clustering coefficient and the fraction of the coverage of the network, respectively.

Figure 6. The connectivity efficiency of the proposed method and the traditional method. The connectivity efficiency, which measures the strength with which edges connect nodes, is defined in Equation (6). The connection efficiency is a unimodal function of the threshold: it first increases as the threshold increases and then decreases after reaching its maximum.

Table 1. Degree centrality in the image similarity stock network.

Table 2. Degree centrality in the Pearson stock network.