Stability Analysis of Company Co-Mention Network and Market Graph Over Time Using Graph Similarity Measures

Abstract: The aim of the paper is to provide an analysis of news and financial data using their network representation. The network structures are formed from the data sources using two different approaches: by building the so-called market graph, in which nodes represent financial assets (e.g., stocks) and the edges between nodes stand for the correlation between the corresponding assets, and by constructing a company co-mention network, in which any two companies are connected by an edge if a news item mentioning both companies has been published in a certain period of time. Topological changes of the networks over the period 2005–2010 are investigated using a sliding window of six-month duration. We study the stability of the market graph and the company co-mention network over time and establish which of the two networks was more stable during the period. In addition, we examine the impact of the crisis of 2008 on the stability of both networks. The networks considered in this paper have a fixed set of nodes (companies) and can change over time only through the addition or removal of links between these nodes. Different graph similarity measures are used to evaluate these changes. Each measure acts as a distance: if a network is stable over time, its value for two graphs constructed for two different time windows should be close to zero, whereas a sharp change between the graphs constructed for two adjacent periods should lead to a sharp increase in its value. The graph similarity measures used in this paper were proposed relatively recently. In addition, to estimate how the networks evolve over time we use the Quadratic Assignment Procedure (QAP).
While there is a substantial body of work studying the dynamics of graphs (including the use of graph similarity metrics), in this paper the dynamics of the company co-mention network is examined, for the first time, both individually and in comparison with the dynamics of market graphs.


Introduction
The modern economy is a complex system consisting of an enormous number of companies that interact with each other to achieve their own goals. Modeling the aggregate as well as the local behavior of such systems is an extremely important, albeit complex, problem. One of the modern approaches to modeling economic or financial systems relies on graph models, which transform empirical data into a network representation under additional reasonable assumptions. In such graphs, nodes usually correspond to companies, and edges between nodes reflect the relations between them. The following may serve as examples of such relationships:
• direct links between companies, for example, a supplier-consumer relationship in which one company supplies goods or services to another [1][2][3][4][5];
• relations between banks (lenders and borrowers in the interbank loan market) [6][7][8][9][10];
• connections reflecting investments of one company in another [11][12][13][14].
Unfortunately, such information is often confidential and not always available to researchers. Therefore, network models of economic interaction are often built on links between economic agents that are not directly observable:
• One possibility for discovering connections between companies is to use correlations between the returns of companies' assets. In accordance with the efficient market hypothesis, it is assumed that the stock prices of companies and their joint behavior reflect all publicly available information about the companies. Thus, economic and financial connections between companies may be reflected by the correlation of the log returns of company assets [15][16][17][18][19][20][21][22][23][24][25][26][27][28][29][30][31][32].
• Some researchers assume that a connection between companies arises if both companies have common members of the board of directors [33][34][35][36][37].
• Several papers deal with company co-mention networks, in which a connection between companies is reflected by the mention of both companies in one news item [38][39][40][41].
• In various applications it can also be useful to build graphs of the industrial or spatial affiliation of companies [40].
It should be noted that in the course of their activities, economic agents generate a large amount of publicly available information, which includes news reports on companies published by news agencies. The news flow also includes SEC reports, court documents, reports of various government agencies, business resources, company reports, announcements, and industrial and macroeconomic statistics. The flow of financial and economic news is extremely intense (thousands of news items per second) and highly unstructured. The analysis of the characteristics of the time series corresponding to this flow is an important and interesting task. The study of such characteristics would allow a deeper understanding of the features of the news background and its dependence on the current situation in financial markets.
News analytics data providers such as Thomson Reuters and RavenPack collect data from different sources, including news agencies and social media (blogs, social networks, etc.), and process such data in real time [42,43]. The paper [44] studies structural characteristics of news flows generated by news agencies, enterprises, organizations, social networks, etc.
Following the ideas of [38,39], the current paper represents a company co-mention network as a graph whose nodes are the world's major companies mentioned in the financial, business-related, and economic news flow. If two companies are mentioned in the same news item, the co-mention graph contains an edge between the corresponding nodes. Papers [38,39] studied important characteristics of company co-mention networks using different SNA metrics, such as eigenvector centrality, degree centrality, betweenness centrality, closeness centrality, frequency, etc., and identified the key companies of the networks. It was demonstrated that the degree distribution as well as the clustering-degree relation of the analyzed graphs follow a power law, but with a non-typical exponent value. The subgraph analysis in terms of industrial and spatial affiliation made it possible to identify key companies in the network and to study the analogous power-law distributions for subgraphs of the company co-mention network. The QAP analysis method was used in [40] to examine the correlations between the company co-mention network and graphs describing sectoral (and spatial) affiliation. Papers [38,39] also explore how structural characteristics of the company co-mention graph change over time, such as the vertex degree distribution, the distribution of the average clustering coefficient, the edge density, the size of the maximum clique, and connectivity. It was found that the power-law structure of the co-mention graph is quite stable. The degree exponent as well as the clustering-degree coefficient were at their lowest values during the 2007-2008 financial crisis. All maximum cliques comprise a large number of companies from the banking sector. The maximum independent set was the largest at the peak of the 2008 financial crisis.
The aim of the paper is to provide a joint analysis of news and financial data using their network representation. The network structures are formed from different data sources using two different approaches:
• by building the so-called market graphs, in which nodes represent financial assets (e.g., stocks) and the edges between nodes stand for the correlation between the corresponding assets;
• based on companies' co-mention in the news flow. The company co-mention network is constructed as follows: two companies are connected by an edge if a news item mentioning both companies has been published in a certain period of time.
The market graph as well as the company co-mention network change over time through the addition or removal of links. Two co-mention networks built for two consecutive periods contain the same nodes (companies). However, the presence or absence of an edge (a link between two companies) depends on whether a news item mentioning these two companies was published during the period. The news generating process is random, so the presence of a given edge may vary from one period to another. We could not find any papers in which this behavior of the news flow is modeled. In this paper we construct the company co-mention network as well as the market graph using empirical data.
The main research questions that we would like to answer in this paper are the following:
1. Does the market graph remain stable over time? How significantly do the market graphs constructed for two consecutive 6-month windows differ? How did the crisis of 2008 change the stability of the market graph? Were the changes of the market graph during the crisis minor or noticeable?
2. Does the company co-mention network remain stable over time? How significantly do the company co-mention networks constructed for two consecutive windows differ? How did the crisis of 2008 change the stability of the company co-mention network? Were the changes of the company co-mention network during the crisis small or large?
3. Which of the two networks was more stable over time: the market graph or the company co-mention network?
4. How do the market graph and the company co-mention network constructed for the same time window differ?
To examine how the networks have actually evolved over time we employ the approach described in [45], which is based on the use of different graph similarity metrics. To avoid one-sided results caused by a poor choice of graph similarity metric, we use four different measures:
• the Hamming distance (h) between graphs;
• the network similarity measure d proposed in [46], which quantifies how the set of central nodes (their ranking) has changed in a network;
• the D-measure proposed in [47], which has proved to be discriminative and computationally efficient in identifying and quantifying topological differences between graphs;
• the graph diffusion distance (GDD) [48], based on measuring the average similarity of heat diffusion on each graph.
Note that the last three measures have appeared relatively recently, but have already demonstrated their advantages in several empirical studies.
In addition, we use the Quadratic Assignment Procedure (QAP), which was introduced in [49] and developed in [50,51].
The paper has the following structure. Section 2 describes the procedures for constructing the market graph and the company co-mention network based on empirical data. In the same section we describe a methodology for assessing the stability of graphs over time based on graph similarity measures. Section 3 contains a brief description of the graph similarity measures used in our study. In Section 4, we describe the empirical data from which we construct the market graph and the company co-mention network. Finally, Section 5 presents our results on the stability of the graphs over time: a new empirical study of the dynamics of the company co-mention network, both separately and in comparison with the dynamics of the market graph, based on graph similarity metrics estimated for graphs constructed for successive time periods. For the convenience of the reader, we list some notations in Abbreviations.

Market Network Construction
The study of the properties of market networks has intensified in the past few years. The notion of the market graph appears to have been first studied in [15]. In that work, Boginski defined the market graph as a complete weighted graph in which the nodes (or vertices) represent stocks and the edge weights reflect the similarity between the behavior of the stocks. The simplest way to quantify this similarity is to use the Pearson correlation coefficient. Accordingly, Boginski [15] suggests that an edge between two nodes (assets) is included in the market graph if the corresponding value of the Pearson correlation coefficient is greater than a fixed threshold.
The market graph approach proposed in [15] has received much interest in the recent decade. In particular, many papers have obtained empirical results using real market data while exploring various structural features of the market graph, such as maximum cliques, maximum independent sets, and degree distribution [16][17][18][19], clustering in Pearson correlation [20], the dynamics of the US market graphs [21], and the complexity of the market graph [22]. The papers [17][23][24][25][26] examine distinct financial markets to find differences between them. Market graphs with similarity measures other than correlation are investigated in [23][27][28][29][30]. An analysis of the reliability of the results of the market graph approach was presented in [31].
We construct the market network using the Pearson correlation: two companies are connected if the value of the Pearson correlation between the two assets is above a given threshold (in the given period of time).
More precisely, the market graph is constructed as follows. We denote by $P_i(t)$ the price of asset $i$ on day $t$. Then
$$R_i(t) = \ln \frac{P_i(t)}{P_i(t-1)}$$
is the logarithm of the ratio of the price of asset $i$ on day $t$ to its price on the previous day $t-1$. Let $n$ be the number of assets and $N$ the number of observations. We suppose that the random variable $R_i(t)$, $t = 1, 2, \ldots, N$, has the distribution of $R_i$, $i = 1, 2, \ldots, n$, and that the joint distribution of $R_1, R_2, \ldots, R_n$ is not known.
The Pearson correlation coefficient between the random variables $R_i$ and $R_j$ is defined by
$$r_{ij} = \frac{\sum_{t=1}^{N} \left(R_i(t) - \bar{R}_i\right)\left(R_j(t) - \bar{R}_j\right)}{\sqrt{\sum_{t=1}^{N} \left(R_i(t) - \bar{R}_i\right)^2 \, \sum_{t=1}^{N} \left(R_j(t) - \bar{R}_j\right)^2}},$$
where $\bar{R}_i = \frac{1}{N} \sum_{t=1}^{N} R_i(t)$ denotes the mean value of $R_i$. The Pearson correlation is the most popular measure used in the analysis of financial markets. Its main shortcoming is weak robustness to deviations from the assumptions on the distribution of the random variables in question.
We use the Pearson correlation as the pairwise similarity measure for stocks $i$ and $j$. The edge between the vertices $i$ and $j$ is added to the graph if $r_{ij} \geq \theta$, which means that the prices of these two assets behave similarly over time; the degree of this similarity is determined by the corresponding value of the Pearson correlation coefficient.
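The construction above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the toy prices, the threshold value `theta=0.5`, and the function names are hypothetical and not taken from the paper, and the sample form of the Pearson coefficient is used.

```python
import math

def log_returns(prices):
    """Daily log returns R(t) = ln(P(t) / P(t-1)) for one asset."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

def pearson(x, y):
    """Sample Pearson correlation of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return cov / math.sqrt(vx * vy)

def market_graph(price_series, theta):
    """Adjacency matrix with an edge (i, j) iff r_ij >= theta."""
    returns = [log_returns(p) for p in price_series]
    n = len(returns)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if pearson(returns[i], returns[j]) >= theta:
                adj[i][j] = adj[j][i] = 1
    return adj

# Hypothetical toy prices: assets 0 and 1 move together, asset 2 moves against them.
prices = [
    [100, 102, 101, 105, 107],
    [50, 51, 50.5, 52.5, 53.5],
    [80, 79, 80, 77, 76],
]
adj = market_graph(prices, theta=0.5)
print(adj)  # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```

Only the pair (0, 1) exceeds the threshold here, so the resulting graph has a single edge.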
The market graph constructed using a measure linearly dependent on the sign correlation was studied in [23,32], where this measure was shown to be suitable for the analysis of market graphs. As pointed out in [23], the sign correlation has a few important differences from the Pearson correlation, which makes it more applicable to our analysis than the classical correlation.
It is worth noting that for large graphs it would be more computationally efficient to use a clustering algorithm to find clusters of highly connected nodes, and then compute the similarity between pairs of nodes within each cluster and then between clusters [52], since computing the node similarity for all pairs over the entire graph may be much more time-consuming for large networks. In our research the graph sizes do not exceed 1053 nodes, and the graphs can be constructed in reasonable time by calculating all pairwise correlations.
To expose the evolution of market structures we use a dynamic approach, which is particularly useful for comparing the calm periods before the financial crisis of 2008 with crashes. For every pair of stocks, we take the log return time series in a time window of six months, computed from the price values included in the window. Using a sliding window approach, we calculate the correlation matrix for each six-month window, shifting each subsequent window by one month. Thus, in the dynamic approach with sliding windows, we calculate a sequence of correlation matrices and the corresponding market graphs. We chose the window length of six months for the following reasons. Too small a window would lead to unreliable correlation estimates, since the number of assets is more than 1000. On the other hand, a larger window would cause the shocks of one local period to be reflected in the correlation matrix constructed for the whole long interval.
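As a small check of the windowing scheme described above, the following sketch enumerates six-month windows shifted by one month. The helper name `sliding_windows` and the representation of a window as a pair of (year, month) tuples are assumptions made for illustration.

```python
def sliding_windows(first_year, last_year, length=6, step=1):
    """Return (start, end) month pairs, each as (year, month), end inclusive."""
    months = [(y, m) for y in range(first_year, last_year + 1) for m in range(1, 13)]
    return [
        (months[s], months[s + length - 1])
        for s in range(0, len(months) - length + 1, step)
    ]

windows = sliding_windows(2005, 2010)
print(len(windows))  # 67
print(windows[0])    # ((2005, 1), (2005, 6))
print(windows[-1])   # ((2010, 7), (2010, 12))
```

With 72 months, a window length of 6, and a step of 1, the count is 72 - 6 + 1 = 67, which matches the number of intervals used in the study.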

Network Representation of News Analytics Data
In the company co-mention network, a node represents a company, while a relationship between nodes is indicated by an edge. If a company was mentioned in the same news report as other companies, it is connected with them. The company co-mention network can be viewed as an undirected weighted graph and can therefore be treated as a social network.
We conduct the analysis of the company co-mention network according to the pattern outlined in [38]:
1. we assemble all economic, business-related, and financial news published over six years (2005-2010);
2. we clean the data;
3. we select the companies cited in news reports during this time;
4. we divide the 6-year period into overlapping semiannual intervals, each obtained by shifting the previous one 1 month ahead, which yields 67 intervals of the same 6-month size (approximately 125 trading days);
5. we calculate the number of co-mentions (the link weight) for every two companies cited together in at least one news item over each time interval; if the companies are not co-mentioned in the given interval, the link weight is 0;
6. we use these weights to obtain a symmetric co-mention matrix for each interval;
7. we explore the evolution of the co-mention matrices over time, and visualize and interpret the results.
The first operation is executed by news analytics providers such as RavenPack, Media Sentiments, and Thomson Reuters. They assemble news items in real time from various news providers and sources, and employ AI algorithms to analyze each news item in real time for their subscribers. As a result, each news report is transformed into a set of metadata including the time of publication, the name of the company or asset, news relevance, novelty, etc. A comprehensive characterization of news analytics and its application in the finance industry may be found in [42,43].
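Steps 5 and 6 of the procedure above can be illustrated with a minimal sketch. It assumes that each news item of a window has already been reduced to the set of companies it mentions; the company labels and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def co_mention_weights(news_items):
    """Count, for every unordered company pair, the news items mentioning both."""
    weights = Counter()
    for companies in news_items:
        for a, b in combinations(sorted(set(companies)), 2):
            weights[(a, b)] += 1
    return weights

def co_mention_matrix(news_items, companies):
    """Symmetric weighted co-mention matrix over a fixed company list."""
    index = {c: k for k, c in enumerate(companies)}
    n = len(companies)
    matrix = [[0] * n for _ in range(n)]
    for (a, b), w in co_mention_weights(news_items).items():
        if a in index and b in index:
            matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = w
    return matrix

# Hypothetical news items of one window, each given as the set of companies mentioned.
news = [{"A", "B"}, {"A", "B", "C"}, {"C"}, {"B", "C"}]
matrix = co_mention_matrix(news, ["A", "B", "C"])
print(matrix)  # [[0, 2, 1], [2, 0, 2], [1, 2, 0]]
```

Pairs that are never co-mentioned keep the weight 0, and an unweighted network is obtained by replacing every positive weight with 1.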

Methodology
Many real networks evolve over time through the addition or removal of nodes or of links between the nodes. The network at time t and the network at time t + 1 may differ from each other, even if the set of nodes has not changed. If these changes are negligible, then the network remains stable over time.
In addition, a network that has remained stable for a period of time may change sharply at some point in time due to some unexpected reasons.
The networks that are considered in this paper and that are the objects of our study (the market graph and the company co-mention network) have a non-changing set of nodes (companies), and can change over time by adding/removing links between these nodes:

• We construct the market graphs based on the correlations between assets for a 6-month window, moving the sliding window one month ahead to construct each subsequent graph.
• We construct the company co-mention network (for the same companies that form the market graph), adding an edge between two companies if a news item mentioning both these companies was published during a 6-month window, and shifting the sliding window one month forward to construct each subsequent network.
To evaluate these changes, a graph similarity measure can be used. The measures used here act as distances: if a network is stable over time, the value of the measure for two graphs constructed for two different time windows should be close to zero. A sharp change between the graphs constructed for two adjacent periods should lead to a sharp increase in the value of the measure for these two graphs.
Currently, there are a large number of graph similarity measures. Each such measure has both positive characteristics and several drawbacks. Therefore, to avoid one-sided results caused by the wrong choice of the graph similarity measure, we will use different measures that have worked well in applied research and that evaluate the similarity of graphs in respect of various aspects and characteristics (topology, node ranking, etc.).
In this section, we describe two methods that we use to analyze the dynamics of graphs (both the market graph and the company co-mention network). Let $G_1, G_2, \ldots, G_T$ be the sequence of graphs representing the states of a complex system at time slots $1, 2, \ldots, T$. Let $\rho(G_{t_1}, G_{t_2})$ be the value of a graph similarity measure calculated for two states $G_{t_1}$ and $G_{t_2}$. As such a measure $\rho$, it is possible to use various metrics that estimate the distance between graphs (e.g., the difference in the $L_2$-metric, the graph-edit distance, or measures based on the presence of isomorphic subgraphs). Unfortunately, these simple measures did not allow us to obtain interpretable results for either the market graph or the company co-mention network. Therefore, in our study we used the similarity measures described in Section 3.

Dynamics Analysis Based on the Assessment of the Neighboring Graphs Similarity
The essence of this method is simple: it uses two different similarity measures $\rho_1$ and $\rho_2$. It is desirable that these measures evaluate different types of graph dissimilarity (for example, topological and structural dissimilarities). As such measures, in the following sections we use the Hamming distance and the d-measure, defined below by Equations (2) and (3). The first quantifies the closeness of the local structural properties of the graphs, while the second quantifies the similarity of the centrality indices of the vertices. We then find the values of $\rho_1$ and $\rho_2$ for all neighboring pairs of graphs, which gives $T - 1$ points on the plane $(\rho_1, \rho_2)$. Visualizing these points can help a researcher
• to find the periods in which the greatest changes occurred during the transition from one time interval to another;
• to find periods of stability in which there were no changes between adjacent graphs in terms of the measures $\rho_1$ and $\rho_2$;
• to understand which characteristics of the graphs have changed more: those evaluated by $\rho_1$ or those related to $\rho_2$.
In particular, this approach was applied in [46] to analyze the dynamics of immigration flows between countries.
It should be noted that the proximity on the plane $(\rho_1, \rho_2)$ of the points corresponding to the neighboring pairs $(G_{i-1}, G_i), (G_i, G_{i+1}), \ldots, (G_{j-1}, G_j)$ does not guarantee the proximity of the initial graph $G_{i-1}$ and the last graph $G_j$. Therefore, despite the simplicity and clarity of the resulting visualizations, this approach has its limitations.
We apply this approach for visualization of the dynamics of both market graphs and company co-mention networks in Section 5.1.
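The neighboring-pairs method can be sketched generically as follows. The two measures passed in are simple stand-ins chosen only to keep the example runnable (a Jaccard distance on edge sets and a relative change in edge count); they are assumptions and not the Hamming distance and d-measure used in the study, and the graphs are hypothetical.

```python
def trajectory(graphs, rho1, rho2):
    """One point (rho1, rho2) per neighboring pair (G_t, G_{t+1}); T - 1 points in total."""
    return [(rho1(g, h), rho2(g, h)) for g, h in zip(graphs, graphs[1:])]

# Stand-in measures on edge sets (assumptions, used only for illustration).
def edge_turnover(g, h):
    """Jaccard distance between two edge sets."""
    union = g | h
    return len(g ^ h) / len(union) if union else 0.0

def size_change(g, h):
    """Relative change in the number of edges."""
    return abs(len(g) - len(h)) / max(len(g), len(h), 1)

# Hypothetical sequence of three graphs, each given as a set of edges.
graphs = [
    {(0, 1), (1, 2)},
    {(0, 1), (1, 2)},
    {(0, 1), (0, 2), (2, 3)},
]
points = trajectory(graphs, edge_turnover, size_change)
# Index of the transition with the largest combined change.
most_changed = max(range(len(points)), key=lambda t: sum(points[t]))
print(points[0], most_changed)  # (0.0, 0.0) 1
```

Plotting `points` on the plane would show the first transition at the origin (no change) and the second one far from it.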

Multidimensional Scaling Analysis Approach
Another idea that we apply in our research is to use multidimensional scaling analysis. First, we calculate the distances between all pairs of graphs from the sequence $G_1, G_2, \ldots, G_T$ using the measure $\rho$, and form the matrix of pairwise distances
$$A = \left( \rho(G_i, G_j) \right)_{i,j=1}^{T}.$$
Then, applying multidimensional scaling analysis (MSA) to the matrix $A$, we can derive the underlying factors that influence the graph dynamics (with respect to the measure $\rho$). In particular, MSA may expose essential underlying dimensions that help the researcher to interpret the observed similarities or dissimilarities (distances) between the graphs. This approach is applied to market graphs and company co-mention networks in Section 5.3.
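The paper does not state which variant of multidimensional scaling is used. As one possible illustration, the sketch below implements classical (Torgerson) scaling with power iteration in standard-library Python; it assumes the distances are approximately Euclidean, and the four-point distance matrix is a hypothetical stand-in for the matrix of distances between graphs.

```python
import math
import random

def classical_mds(dist, dims=2, iters=500, seed=0):
    """Classical (Torgerson) MDS of a symmetric distance matrix via power iteration."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    total = sum(row) / n
    # Double centering: B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - row[i] - row[j] + total) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [rng.random() - 0.5 for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # nothing left to explain in the remaining dimensions
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so that the next pass finds the next eigenpair.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords

# Hypothetical distances: the corners of a 2 x 1 rectangle.
corners = [(0, 0), (2, 0), (2, 1), (0, 1)]
dist = [[math.hypot(p[0] - q[0], p[1] - q[1]) for q in corners] for p in corners]
coords = classical_mds(dist)
```

In the setting of this section, `dist[i][j]` would hold the value of the chosen measure for the graphs of windows i and j, and the rows of `coords` would be the points to plot.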

Graph Similarity Measurement
The problem of finding adequate network stability and similarity measures has been the focus of research in recent decades.
Paper [46] pointed out that the main shortcoming of many methods for quantifying graph similarity is that they do not take the topological structure of the networks into consideration. All edges are treated equally, regardless of whether they link two disconnected components or two vertices in a dense network. To reflect and quantify topological similarities of networks, several approaches have been developed in [46,67,68].
In this section, we briefly describe the well-known Hamming distance and the graph similarity measures proposed in [46][47][48] that we will use in Section 5.

The Hamming Distance: Similarity of Local Structure
The Hamming distance is a special case of the graph-edit distance and measures the number of edge deletions and insertions necessary to transform one graph into another. It can be used to analyze network dynamics, showing how a network evolved over time in terms of its local structure. A brief description of this approach is given in this subsection.
Let $A^{t}$ denote the adjacency matrix of graph $G$ at time $t$, and let $n$ be the number of nodes. The Hamming distance between the networks at two time slots $t_1$ and $t_2$ is defined as follows:
$$h(G_{t_1}, G_{t_2}) = \frac{1}{n(n-1)} \sum_{i \neq j} \left| A^{t_1}_{ij} - A^{t_2}_{ij} \right|. \qquad (2)$$
The Hamming distance $h(G_1, G_2)$ is symmetric and varies from 0 to 1. If $h(G_1, G_2) = 1$ then the networks are completely different. If $h(G_1, G_2) = 0$ then the networks are identical.
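A minimal sketch of a normalized Hamming distance for two unweighted graphs on the same node set follows. Dividing by the number of ordered node pairs, n(n - 1), is an assumption chosen so that the value lies between 0 and 1 as stated above; the function name and the example graphs are illustrative.

```python
def hamming_distance(a, b):
    """Share of ordered node pairs (i != j) whose adjacency differs; lies in [0, 1]."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("graphs must share the same node set with at least two nodes")
    differing = sum(
        1 for i in range(n) for j in range(n) if i != j and a[i][j] != b[i][j]
    )
    return differing / (n * (n - 1))

empty = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
complete = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
one_edge = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]

print(hamming_distance(empty, complete))  # 1.0
print(hamming_distance(empty, empty))     # 0.0
h_one = hamming_distance(empty, one_edge)  # one undirected edge out of three possible
```

For a weighted co-mention matrix, the entries would first be binarized (positive weight means an edge) before this function is applied.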

d-Measure: Node Similarity Measure Based on Interval Orders
Paper [46] proposes a measure that describes the distance between two graphs $G_1$ and $G_2$. The measure $d(G_1, G_2)$ applies the idea of interval orders to networks by evaluating how the central nodes of the network have changed.
Let $G_1$ and $G_2$ be two graphs that we would like to compare using the sets of their most important nodes. Let the graphs have the same set of $n$ nodes. Let $c^t_i$ be the centrality of node $i$ in graph $G_t$, $t = 1, 2$. In our study we rank the nodes of the graphs based on the PageRank measure.
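Let $R^t = [\mathrm{rank}^t_{ij}]$ represent our knowledge about the comparative ranking of the vertices in graph $G_t$, formed by means of their centrality evaluation at time $t$:
$$\mathrm{rank}^t_{ij} = \begin{cases} 1, & c^t_i - c^t_j > \varepsilon, \\ 0, & \text{otherwise}. \end{cases}$$
Paper [46] pointed out that the selection of the parameter $\varepsilon$ should be based on the problem under consideration. In our study we chose $\varepsilon = 0.00001$, so that even relatively small changes in the data are taken into account.
Then the d-measure between $G_1$ and $G_2$ (the distance between the two rankings for the networks $G_1$ and $G_2$) is defined in [46] using the Hamming distance formula:
$$d(G_1, G_2) = \frac{1}{n(n-1)} \sum_{i \neq j} \left| \mathrm{rank}^1_{ij} - \mathrm{rank}^2_{ij} \right|. \qquad (3)$$
The d-measure is symmetric and varies from 0 to 1.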
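The ranking-based comparison can be sketched as follows. Several details are assumptions rather than the definition from [46]: the rank matrix is taken to be the binary relation "node i is more central than node j by more than ε", the normalization is n(n - 1), and PageRank is computed by a plain power iteration with damping 0.85.

```python
def pagerank(adj, damping=0.85, iters=200):
    """PageRank by power iteration on an undirected 0/1 adjacency matrix."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    score = [1.0 / n] * n
    for _ in range(iters):
        # Nodes without neighbors spread their score uniformly (a common convention).
        dangling = sum(score[i] for i in range(n) if deg[i] == 0)
        new = []
        for j in range(n):
            inflow = sum(score[i] / deg[i] for i in range(n) if adj[i][j])
            new.append((1 - damping) / n + damping * (inflow + dangling / n))
        score = new
    return score

def rank_matrix(centrality, eps):
    """rank[i][j] = 1 if node i is more central than node j by more than eps."""
    n = len(centrality)
    return [[1 if centrality[i] - centrality[j] > eps else 0 for j in range(n)]
            for i in range(n)]

def d_measure(adj1, adj2, eps=1e-5):
    """Normalized Hamming distance between the two rank matrices."""
    n = len(adj1)
    r1 = rank_matrix(pagerank(adj1), eps)
    r2 = rank_matrix(pagerank(adj2), eps)
    diff = sum(r1[i][j] != r2[i][j] for i in range(n) for j in range(n) if i != j)
    return diff / (n * (n - 1))

# Two hypothetical paths on three nodes with different central nodes.
center_0 = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
center_1 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
d_paths = d_measure(center_0, center_1)
```

In this example four of the six ordered pairs change their relation when the central node moves from node 0 to node 1, so `d_paths` equals 2/3.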

D-Measure
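The D-measure (dissimilarity measure) was proposed in [47]. Let the distance distribution of each node $i$ of a graph $G$ with $n$ nodes, $P_i = \{p_i(j)\}$, be given, where $p_i(j)$ denotes the proportion of nodes that are connected to node $i$ at distance $j$. The set of $n$ node-distance distributions $\{P_1, \ldots, P_n\}$ contains comprehensive information about the network topology in a compressed form.
For an $n$-node network, the network node dispersion (NND), which measures the heterogeneity of the set of $n$ distance distributions $\{P_1, \ldots, P_n\}$, is defined by the following equation:
$$\mathrm{NND}(G) = \frac{J(P_1, \ldots, P_n)}{\log(\operatorname{diam}(G) + 1)}, \qquad (4)$$
where
$$J(P_1, \ldots, P_n) = \frac{1}{n} \sum_{i,j} p_i(j) \log \frac{p_i(j)}{\mu_j}$$
is the Jensen-Shannon divergence of the $n$ distributions, $\mu_j = \frac{1}{n} \sum_{i} p_i(j)$ is their average, and $\operatorname{diam}(G)$ is the diameter of $G$. The D-measure was defined in [47] as follows:
$$D(G_1, G_2) = w_1 \sqrt{\frac{J(\mu_{G_1}, \mu_{G_2})}{\log 2}} + w_2 \left| \sqrt{\mathrm{NND}(G_1)} - \sqrt{\mathrm{NND}(G_2)} \right| + \frac{w_3}{2} \left( \sqrt{\frac{J(P_{\alpha G_1}, P_{\alpha G_2})}{\log 2}} + \sqrt{\frac{J(P_{\alpha G_1^c}, P_{\alpha G_2^c})}{\log 2}} \right),$$
where $\mu_{G_1}$, $\mu_{G_2}$ are the averaged node-distance distributions of the graphs, NND is defined in (4), $P_{\alpha G}$ denotes the distribution of the α-centrality values of graph $G$, and $G_1^c$, $G_2^c$ are the complements of $G_1$ and $G_2$. The last term compares the α-centrality values of the graphs and of their complements through the Jensen-Shannon divergence.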
In our paper we use the weights $w_1 = w_2 = 0.45$, $w_3 = 0.1$, which were suggested in [47] as the most appropriate way to quantify structural dissimilarities in graphs.
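The full D-measure involves three terms. The sketch below covers only its building blocks, the node-distance distributions and the network node dispersion, and assumes a connected unweighted graph with at least two nodes and proportions taken over the other n - 1 nodes. The function names are illustrative and not taken from [47].

```python
import math
from collections import deque

def distance_distributions(adj):
    """P_i(j): share of the other nodes at shortest-path distance j from node i."""
    n = len(adj)
    dists = []
    for source in range(n):
        seen = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source node
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        if len(seen) < n:
            raise ValueError("this sketch assumes a connected graph")
        dists.append(seen)
    diameter = max(max(d.values()) for d in dists)
    table = [
        [sum(1 for v in d.values() if v == j) / (n - 1) for j in range(1, diameter + 1)]
        for d in dists
    ]
    return table, diameter

def nnd(adj):
    """Jensen-Shannon divergence of the P_i divided by log(diameter + 1)."""
    table, diameter = distance_distributions(adj)
    n = len(table)
    mu = [sum(p[j] for p in table) / n for j in range(diameter)]
    js = sum(
        p[j] * math.log(p[j] / mu[j]) for p in table for j in range(diameter) if p[j] > 0
    ) / n
    return js / math.log(diameter + 1)

complete = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(nnd(complete))  # 0.0
nnd_path = nnd(path)
```

In the complete graph every node sees the same distance distribution, so the dispersion is zero; in the path the central node and the end nodes differ, which gives a positive value.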

Graph Diffusion Distance
Graph diffusion distance (GDD) was proposed in [48]. GDD evaluates the dissimilarity between two graphs with the same number of nodes and is based on quantifying the average similarity of heat diffusion in the graphs. To compute the value of GDD it is necessary to find, for each graph, the Laplacian exponential kernel matrices that arise in solving the heat diffusion problem with initial conditions restricted to single vertices. The value of GDD is then defined in [48] as the Frobenius norm of the difference of the kernels at the diffusion time at which this difference reaches its maximum.
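For small graphs the idea can be sketched without external libraries. The matrix exponential below uses scaling and squaring with a truncated Taylor series, and the maximization over the diffusion time is replaced by a search over a user-supplied grid of times; both choices are simplifying assumptions, and the function names are illustrative.

```python
import math

def laplacian(adj):
    """Graph Laplacian L = D - A (the adjacency diagonal is assumed to be zero)."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else -adj[i][j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def expm(m, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    norm = max(sum(abs(x) for x in row) for row in m)
    squarings = max(0, math.ceil(math.log2(norm / 0.5))) if norm > 0 else 0
    scaled = [[x / 2 ** squarings for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result

def gdd(adj1, adj2, times):
    """Largest Frobenius norm of exp(-t L1) - exp(-t L2) over a grid of diffusion times."""
    l1, l2 = laplacian(adj1), laplacian(adj2)
    best = 0.0
    for t in times:
        k1 = expm([[-t * x for x in row] for row in l1])
        k2 = expm([[-t * x for x in row] for row in l2])
        frob = math.sqrt(sum((a - b) ** 2 for r1, r2 in zip(k1, k2) for a, b in zip(r1, r2)))
        best = max(best, frob)
    return best

edge = [[0, 1], [1, 0]]
no_edge = [[0, 0], [0, 0]]
value = gdd(edge, no_edge, times=[0.5, 1.0, 2.0, 5.0])
```

For this two-node example the norm of the kernel difference at time t is 1 - exp(-2t), so the grid search returns the value at the largest time. For graphs of the size used in the study, a numerical library routine for the matrix exponential would be the practical choice.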

Combined Similarity Metric
It is possible to visualize the overall changes in a graph as a point $(h(G_1, G_2), d(G_1, G_2))$ on the plane. If the obtained point is the origin, $(h(G_1, G_2), d(G_1, G_2)) = (0, 0)$, we treat the networks as identical. When $(h(G_1, G_2), d(G_1, G_2)) = (1, 0)$, the network structure differs entirely yet the central elements are the same (e.g., a complete graph compared with an empty graph). If $(h(G_1, G_2), d(G_1, G_2)) = (1, 1)$ then the two networks differ both in local structure and in their sets of key elements (for example, a node chain compared to the inverse chain). If $(h(G_1, G_2), d(G_1, G_2)) \to (0, 1)$ then the network is completely unstable in terms of its central elements.
Paper [46] suggests combining the two measures into one similarity measure
$$\rho_{\alpha}(G_1, G_2) = (1 - \alpha)\, h(G_1, G_2) + \alpha\, d(G_1, G_2),$$
where $\alpha$ is the relative importance of the ranking distance. When $\alpha = 0$, the measure coincides with classical measures based on the network structure. If $\alpha = 1$, the similarity of the networks is judged by their node rankings alone, disregarding the local structure. If $\alpha = 0.5$ then both measures are weighted equally. Applying the two measures reveals the main trends in a network; they can also be used to compare pairs of temporal networks with a clustering procedure, which finds the homogeneous periods or life cycles of a network.

QAP Procedure
One of the methods for estimating graph similarity is quadratic assignment procedure (QAP) regression. In our research we use the QAP procedure to examine the stability of the market graph and the company co-mention network over time. It should be noted that standard OLS regression would provide incorrect results, because it relies on the assumption that the observations are independent and identically distributed. Since many vertices of the network are connected by links, the directly or indirectly linked vertices are potentially dependent. Thus, the precondition for the ordinary least squares method is not met.
For this reason, the QAP regression proposed by D. Krackhardt in [49] uses nonparametric permutation tests. The QAP procedure permutes the rows and columns of the graph matrices in the same order, and then the correlation coefficient between the independent adjacency matrices and the dependent adjacency matrix is calculated. The procedure repeats the permutations of rows and columns of the adjacency matrices many times to obtain the reference distribution of the test statistic for testing the null hypothesis of the regression.
It was shown in [69] that in the case of high autocorrelation the QAP procedure leads to a much lower proportion of type 1 error than OLS regression.
In our research, we would like to find
• the dependence between the adjacency matrix of the market graph constructed in a given period and matrices constructed for other periods;
• the dependence between the adjacency matrix of the company co-mention network constructed in a given period and matrices constructed for other periods;
• the dependence between the adjacency matrix of the market graph constructed in a given period and the adjacency matrix of the company co-mention network constructed for the same period.
Autocorrelation may occur in such network matrices. For this reason we employ the QAP regression procedure.
The QAP method has proved successful in many applied problems: identifying significant factors for predicting social relations [70], finding important factors that influence web citation among universities [71], studying the job mobility of scientists [72], and recognizing patterns in patent network analysis [73].

Financial Data
The data for constructing and analyzing the market graph were taken from Yahoo Finance. We retrieved daily historical prices of companies traded on the largest stock exchanges for the period from 1 January 2005 to 31 December 2010 (i.e., 1500 trading days). To study the dynamics of the market graph, the 1500-trading-day interval was divided into 67 consecutive overlapping 125-day periods. The dates corresponding to each period are presented in Table 1. The market network is formed from correlations: a company is connected to those companies whose assets have a significant positive correlation with its own in the given period. The market graphs were constructed as described in Section 2.
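The count of 67 periods follows from sliding a six-month window one month at a time over 72 months: 72 − 6 + 1 = 67. The sketch below enumerates such windows by calendar month. It assumes month-aligned windows; the exact start and end dates used in the study are those listed in Table 1, and the function name is hypothetical.

```python
def six_month_windows(start_year=2005, end_year=2010, length=6):
    """List (first_month, last_month) pairs of a sliding window.

    ASSUMPTION: windows are aligned to calendar months and shifted by
    one month; each month is represented as a (year, month) tuple.
    """
    months = [(y, m) for y in range(start_year, end_year + 1) for m in range(1, 13)]
    return [
        (months[i], months[i + length - 1])
        for i in range(len(months) - length + 1)
    ]


windows = six_month_windows()
print(len(windows))   # number of overlapping six-month windows
print(windows[0])     # first window
print(windows[-1])    # last window
```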

News Analytics Data
The paper analyzes the entire scope of financial, business-related and economic news published over six years (72 months) from 1 January 2005 to 31 December 2010. The news analytics data were cleaned to eliminate all messages about the beginning and end of exchange trading sessions and analytical reports with tabular data. Overall, the cleaned data set contained over 8,550,000 messages for the six-year period. The intensity of the news flow remained rather stable over the time interval. The news count increased by an average of 2% per year. The monthly news count ranged from 90,000 to 145,000. The peaks in co-mentions may correlate with the early period of the 2007-2008 financial crisis (Figure 1).
More than 24,000 companies were mentioned at least once in 5 years. Moreover, 18,500 enterprises had at least one joint mention in the same time interval. Table 2 shows that 92.2% of all news items mentioned only one enterprise, 7.1% mentioned two companies, and 0.5% mentioned three companies. The share of news items containing co-mentions (i.e., related to more than one firm) ranged between 5.5% and 11.4% in different months. Less than 0.05% of the messages co-mentioned four or more enterprises. News reports mentioning ten or more firms simultaneously were fairly rare (fewer than 50 news items over a 6-year period). The highest number of enterprises cited in one news item was 14. Table 3 shows that the total number of co-mentioned pairs over five years was more than 1,757,000. Over 50% of news reports and 45% of co-mentions were associated with firms (stocks) traded in the United States. Over 90% of news and co-mentions were connected to companies (stocks) traded on the 15 largest exchanges. Table 4 shows the number of news items mentioning a given number of companies in each year from 2005 to 2010.
For each of the 72 months, the number of co-mentions of each pair of companies was calculated, and the corresponding adjacency matrices of the co-mention graphs were created. At the next step, we ranked the companies by their average number of co-mentions per month. The leader (the most frequently co-mentioned company) appeared in the news stream together with other enterprises in 220 messages per month on average. Over 4 years, more than 4000 assets were cited together with the leader. However, only about 200 companies were co-mentioned with the leader more than once per year. The company co-mention networks were constructed as described in Section 2.

Empirical Result
We divide the 6-year interval into 67 overlapping half-year intervals and choose the 1053 companies with the highest density of news mentioning them during the period under review. We excluded news with relevance under 80 (i.e., news with 80% or less probability of being connected with the company). Then, for each time interval, we check each pair of companies for co-mentions: if two companies are both mentioned in one article during the interval, the weight of the link is set to 1; if they were not co-mentioned during the interval, the weight of the link is set to 0. We then form unweighted symmetric co-mention matrices for each time interval from these binary weights.
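The construction of a binary co-mention matrix for one time interval can be sketched as follows. The input format (one list of company identifiers per news item, already filtered by relevance) and the function name are assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def comention_matrix(news_items, companies):
    """Build an unweighted symmetric co-mention adjacency matrix.

    news_items: iterable of lists of company identifiers, one list per
                news item published in the time interval (assumed format).
    companies:  ordered list of the companies that form the node set.
    """
    index = {c: i for i, c in enumerate(companies)}
    a = np.zeros((len(companies), len(companies)), dtype=int)
    for mentioned in news_items:
        # Keep only companies from the fixed node set, without duplicates.
        ids = sorted({index[c] for c in mentioned if c in index})
        for i, j in combinations(ids, 2):
            a[i, j] = a[j, i] = 1  # at least one joint mention
    return a


news = [["A", "B"], ["B", "C", "D"], ["A"], ["A", "B"]]
print(comention_matrix(news, ["A", "B", "C", "D", "E"]))
```

Repeated co-mentions of the same pair do not change the matrix, which matches the unweighted definition given above.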
The market graph is based on correlations between the 1053 shares chosen when forming the co-mention network. A company is connected to those companies whose assets have a significant positive correlation with its own in the given period. To make the market graphs comparable to the co-mention graphs, the correlation threshold is set to 0.6. Figure 2 shows the dynamics of the edge density of the resulting graphs for the chosen periods. The edge density reached its highest values during the 2008 financial crisis (the dark fragment in the middle of the figure). Note that the density of the co-mention network was reasonably stable and rose only insignificantly before the 2008 financial crisis, while the edge density of the market graph rose noticeably during the major events of the financial crisis.
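A minimal sketch of the thresholding step and of the edge density is given below. It assumes that an edge is drawn when the Pearson correlation of returns is at least 0.6 and that the input is a matrix of returns with one column per asset; the exact construction is the one described in Section 2, and the function names are hypothetical.

```python
import numpy as np


def market_graph(returns, threshold=0.6):
    """Adjacency matrix of a market graph for one time window.

    returns: array of shape (n_days, n_assets) with asset returns.
    ASSUMPTION: two assets are linked when their Pearson correlation
    is greater than or equal to the threshold.
    """
    corr = np.corrcoef(returns, rowvar=False)
    a = (corr >= threshold).astype(int)
    np.fill_diagonal(a, 0)  # no self-loops
    return a


def edge_density(a):
    """Share of node pairs that are connected by an edge."""
    n = a.shape[0]
    return a.sum() / (n * (n - 1))


rng = np.random.default_rng(0)
factor = rng.normal(size=125)  # common driver of the first two assets
returns = np.column_stack([
    factor + 0.1 * rng.normal(size=125),
    factor + 0.1 * rng.normal(size=125),
    rng.normal(size=125),       # unrelated asset
])
graph = market_graph(returns)
print(graph)
print(edge_density(graph))
```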

Similarity Analysis Using Measures h and d
We apply the proposed model to the co-mention network and to the market graph. The information about how the structure of the market graph changed over the adjacent half-years regarding ranking distance d and local structure distance h is shown in Figure 3.
For each six-month window (period) we constructed a market graph in accordance with the approach described in Section 2.1. The IDs of the periods and their starting and ending dates are given in Table 1. Thus, we obtained 67 market graphs $M_1, M_2, \ldots, M_{67}$ corresponding to the 67 six-month periods. Similarly, we obtained 67 company co-mention networks $C_1, C_2, \ldots, C_{67}$ corresponding to each of the 67 periods (see Table 1) using the methodology described in Section 2.2.
We found the values of the h- and d-metrics for each pair of market graphs constructed for consecutive 6-month periods, i.e., $h(M_i, M_{i+1})$ and $d(M_i, M_{i+1})$, $i = 1, \ldots, 66$. Figure 3 shows the evolution of the ranking and local structure distances between each pair of market graphs constructed for consecutive six-month periods, i.e., between periods 1 and 2, between 2 and 3, ..., between 66 and 67. Thus, the i-th point on the (h, d)-plane has coordinates $(h(M_i, M_{i+1}), d(M_i, M_{i+1}))$, $i = 1, \ldots, 66$. Each point on the plane characterizes the differences between the graphs of two consecutive time windows, evaluated by both the Hamming distance h and the d-measure. This visualization allows one to distinguish periods with higher or lower intensity of graph changes. Figure 3 shows that the local structure of the market graph changed very little until the beginning of the 2008 crisis (blue points). However, during the crisis (red points), the values of the similarity measure h (i.e., the Hamming distance) between consecutive graphs increased sharply (more than ten-fold). Moreover, after the peak of the crisis had passed, the instability of the network local structure remained at the same high level (green points). On the other hand, the value of the measure d, which measures the proximity of the vertex rankings of two consecutive graphs, did not increase during the crisis.

Figure 3. The i-th point shows the (h, d)-similarity of the i-th and (i + 1)-th market graphs constructed for the corresponding consecutive 6-month intervals defined in Table 1.

It should be noted that the local structure of the market graph changed greatly at the beginning of and during the financial crisis. Figure 3 shows that the structure of significant correlations between asset returns changed only slightly before the crisis, while the turbulence in financial markets during the crisis induced visible transformations of the market graphs.
Structural changes slowed down for several periods and then resumed. The list of central vertices of the market graphs was updated more intensively before and after the crisis than during the crisis, i.e., the ranking order of the companies was more stable during the crisis. Perhaps this was because many of the vulnerable companies during the crisis came from the same economic sectors that were exposed to risk.
It is well known that if the edge densities of two graphs are very different, then the Hamming distance between these graphs will be large. Thus, the main contribution to the change of the market graph structure came from the increase and decrease in the edge density of the graph, which can be seen in Figure 2. Please note that the closeness of the "blue" points to the "green" ones does not imply that the corresponding graphs are (h, d)-close. To understand how much the graphs from the starting "blue" period differ from the "green" graphs, we conduct the multidimensional scaling analysis in Section 5.3.
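The link between edge density and the Hamming distance can be made concrete with a short sketch. It assumes a normalized Hamming distance (the share of node pairs whose edge status differs), which is one common convention and may differ in scaling from the definition in (2); the function names are hypothetical. Under this convention the distance is never smaller than the absolute difference of the edge densities.

```python
import numpy as np


def hamming_distance(a, b):
    """Share of node pairs whose edge status differs between two graphs.

    ASSUMPTION: graphs are undirected, share the same node set, and the
    distance is normalized by the number of node pairs.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    iu = np.triu_indices(a.shape[0], k=1)
    return float(np.mean(a[iu] != b[iu]))


def edge_density(a):
    """Share of node pairs connected by an edge (undirected graph)."""
    a = np.asarray(a)
    iu = np.triu_indices(a.shape[0], k=1)
    return float(np.mean(a[iu] != 0))


# Path graph on 4 nodes versus the complete graph on 4 nodes.
path = np.zeros((4, 4), dtype=int)
for i in range(3):
    path[i, i + 1] = path[i + 1, i] = 1
complete = np.ones((4, 4), dtype=int) - np.eye(4, dtype=int)

h = hamming_distance(path, complete)
gap = abs(edge_density(path) - edge_density(complete))
print(h, gap)  # the distance is bounded below by the density gap
```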
Similarly, we found the values of the h- and d-metrics for each pair of company co-mention networks built for consecutive six-month periods, i.e., $h(C_i, C_{i+1})$ and $d(C_i, C_{i+1})$, $i = 1, \ldots, 66$; they are shown in Figure 4. Unlike the market graph, the node ranking and the structure of the co-mention networks did not change significantly over time. However, the network local structure was changing in the periods from April 2007 to March 2008 (Figure 4). This interval falls before and during the financial crisis of 2008. Figure 4 shows that the local structure of the co-mention network changed slightly in 2007 (blue points). However, in the period before the crisis (red points), the values of the similarity measure h (i.e., the Hamming distance) between consecutive graphs increased by a factor of 1.5-2. What caused the changes in the local structure of the company co-mention network, and whether such changes in the characteristics of the news flow may be a forerunner of crisis phenomena on the financial market, remain open questions. Surprisingly, at the very beginning of the crisis, the network local structure became more stable than in 2007, and remained stable in subsequent periods (green points). On the other hand, the value of the measure d, which measures the similarity of the vertex rankings of two consecutive graphs, did not increase during the crisis. The values of the measures d and h obtained for consecutive market graphs (Figure 3) significantly exceed the values of the measures d and h for consecutive company co-mention networks (Figure 4). Some values of the measure d differ by more than a factor of 2, while the values of the h-measure differ by an order of magnitude. In this sense, the company co-mention network is more stable than the market graph.
Figure 5 shows how the market graph differs from the co-mention network in each half-year period. The ranking distance has increased significantly, while the local structure distance has been stable and not high. So, from the local structure point of view, the market graph and the co-mention network are similar in many ways. The only exception are the Financial and economic news that affects an industry or a sector often mentions the key companies of that industry or sector. Therefore, the connection between companies reflected by their co-mention in a news item may result from their belonging to the same economic sector. It is known that correlations between returns on assets in the same sector are quite high. Therefore, it can be assumed that the market graph, constructed from correlations between asset returns, and the company co-mention network, constructed from co-mentions in the news, should be similar. However, as Figure 5 shows, this is not quite true: the differences are significant both with respect to the network local structure (h) and with respect to the node ranking (d).

QAP Correlation and Regression Analysis
Using the company co-mention networks and the market graphs, we carry out a QAP correlation analysis, since standard correlation analysis is not suitable for such data: the observations are not independent of each other, which contradicts one of the basic assumptions of linear regression analysis. QAP (Quadratic Assignment Procedure) was proposed and developed in [49][50][51][74]. We use QAP correlation analysis to determine the significance of correlations
• between time-adjacent co-mention networks;
• between time-adjacent market graphs.
With the market graph as the main network, the Pearson correlation coefficient is computed from the corresponding cells of the two matrices. The computation is then repeated many times after randomly rearranging the rows and columns of one matrix. If the correlations obtained for the random permutations are lower than the observed one, the relationship between the respective matrices is significant.
For the correlation analysis, we used R. We apply QAP regression to find the factors that influence the market graph and the company co-mention network. For networks represented by binary data, OLS should not be used to build the regression, since this method requires the observations to be independent and identically distributed. Connections between nodes in the network imply a potentially dependent relationship between directly or indirectly connected nodes. Hence, the assumption does not hold and the OLS method cannot be used. In QAP, the rows and columns of the network matrices are rearranged, and the correlations are then calculated between the independent matrices and the dependent matrix. The test statistics obtained over many permutations are used to test the null hypothesis of the regression.
In our study, we wanted to find a connection between market graphs and company co-mention networks in adjacent periods of time. To investigate how the market graph is related to the company co-mention network, we used QAP regression with MarketGraph$_t$ at time t as the dependent variable. Market graph matrices in previous periods and company co-mention networks in the current period were used as independent variables for the QAP regression.
The results of the analysis are presented in Tables 5 and 6. The rows and columns of the dependent variable matrix were rearranged 1000 times. The matrices of independent variables are shown in Table 6. The QAP results showed that the market graph matrix is closely related to the market graph in the previous period of time. The exceptions are periods 37-43 (April 2008-October 2008), the peak of the financial crisis. Company co-mention networks had a smaller impact on the market graph, though they are also significant in all the models built. It can be seen (Table 5) that there is a significant correlation both between adjacent co-mention networks and between adjacent market graphs. The estimated density over the repeated QAP runs shows that, in all runs, the correlations for randomly permuted graphs were lower than the test statistic, and therefore the obtained correlation values can be considered statistically significant.
The estimated correlation coefficients are quite high. At the same time, the company co-mention network is stably reproduced from period to period. As for the market graphs, the correlation values vary widely, and it can be argued that they decreased at the beginning of the global financial crisis.
Since we have data for several types of graphs and periods of time, we can also construct a linear regression on graphs. The market graph in the current period was taken as the dependent variable (Market graph$_t$). The independent variables were the market graph at the previous point in time (Market graph$_{t-6}$) and the company co-mention network in the current period (Co-mention$_t$).
The QAP regression analysis of the dependence of the current market graph on the previous one, as well as on the current company co-mention graph, is given in Table 6. All coefficients of the models are statistically significant.
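A regression of this kind can be sketched as follows. The sketch uses the simple variant in which the rows and columns of the dependent matrix are permuted jointly; it is an illustration under that assumption, not the R routine used in the study, and the function names and the two-sided p-values are choices made for the example.

```python
import numpy as np


def _vec(a):
    """Vectorize the off-diagonal entries of a square matrix."""
    a = np.asarray(a, dtype=float)
    mask = ~np.eye(a.shape[0], dtype=bool)
    return a[mask]


def _ols(y, xs):
    """Least-squares coefficients: intercept first, then one per predictor."""
    design = np.column_stack([np.ones_like(y)] + xs)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def qap_regression(y, predictors, n_perm=1000, seed=0):
    """QAP regression of matrix y on a list of predictor matrices.

    ASSUMPTION: significance is assessed by jointly permuting the rows
    and columns of the dependent matrix (Y-permutation).
    Returns (coefficients, two-sided permutation p-values).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    xs = [_vec(x) for x in predictors]
    observed = _ols(_vec(y), xs)
    n = y.shape[0]
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        p = rng.permutation(n)
        beta = _ols(_vec(y[np.ix_(p, p)]), xs)
        exceed += np.abs(beta) >= np.abs(observed)
    p_values = (exceed + 1) / (n_perm + 1)
    return observed, p_values
```

In the setting above, `y` would be the market graph of the current period and `predictors` would hold the earlier market graph and the current co-mention network; index 0 of the returned arrays refers to the intercept.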
We also note that the coefficient for Co-mention$_t$ has its highest value for t = 43. This period corresponds exactly to the beginning of the 2008 crisis. This indicates that during the crisis the market graph had a special structure, which can be explained by the structure of the corresponding co-mention network.

Multidimensional Scaling
In this subsection we use the multidimensional scaling procedure to visually represent the matrix of pairwise distances between graphs (both market graphs and company co-mention networks). Multidimensional scaling was developed in [75] and aims at a graphical representation of distances between sets of objects [76]. Given a small number of dimensions, k, and a distance matrix with the distances between each pair of objects (graphs), the multidimensional scaling algorithm places every object (graph) into k-dimensional Euclidean space in such a way that the between-object distances obtained by the graph similarity measures are preserved as closely as possible.
The best-known methods of multidimensional scaling are metric, non-metric and generalized multidimensional scaling methods. Please note that metric multidimensional scaling algorithm finds a linear relationship, while non-metric multidimensional scaling algorithm is characterized by a set of nonparametric monotonic curves. Since we used quantitative rather than ordinal scales, the preference was given to the classical multidimensional scaling (MDS) which is also known as principal coordinates analysis [77].
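Classical MDS can be sketched in a few lines: the squared distance matrix is double-centered and the leading eigenvectors of the result give the coordinates. The sketch below is a generic illustration; the function name is hypothetical, and reporting the share of each component as its eigenvalue divided by the sum of the positive eigenvalues is an assumption about how "variance explained" is computed.

```python
import numpy as np


def classical_mds(dist, k=2):
    """Classical MDS (principal coordinates analysis).

    dist: symmetric matrix of pairwise distances between objects.
    Returns (coords, explained), where coords has shape (n, k) and
    explained[i] is the share of the i-th component among the
    positive eigenvalues.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering  # double centering
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    positive = np.clip(eigvals, 0.0, None)       # drop negative eigenvalues
    coords = eigvecs[:, :k] * np.sqrt(positive[:k])
    explained = positive[:k] / positive.sum()
    return coords, explained


# Five points on a line are recovered exactly by one component.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
dist = np.abs(x[:, None] - x[None, :])
coords, explained = classical_mds(dist, k=1)
print(coords.ravel(), explained)
```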
Since we consider two sequences of graphs (market graphs and company co-mention networks) and use five measures for calculating distances between graphs, the results form ten matrices of pairwise distances between graphs (five for market graphs and five for co-mention graphs). Therefore, we apply the multidimensional scaling procedure to the ten distance matrices.
Let ρ be a similarity measure which finds the distance (similarity) $\rho(G_1, G_2)$ between two graphs $G_1$ and $G_2$. In our study we will use as ρ:
• the Hamming distance h;
• the network similarity measure d proposed in [46];
• the linear combination of d and h defined in (5);
• D-measure [47];
• graph diffusion distance (GDD) [48].
Using the measure ρ, we can find the distance matrix (adjacency matrix) $(\rho(M_i, M_j))_{i,j=1}^{67}$ between all pairs of market graphs from our sequence $M_1, M_2, \ldots, M_{67}$. Also, using the measure ρ, we can calculate the distance matrix (adjacency matrix) $(\rho(C_i, C_j))_{i,j=1}^{67}$ between all pairs of company co-mention networks from the sequence $C_1, C_2, \ldots, C_{67}$.
Multidimensional scaling analysis allows us • to visualize the dynamics of changes in the sequence of graphs; • to find the number of components (factors) explaining the dynamics which is determined by adjacency matrices.
Therefore, the multidimensional scaling analysis can provide an important insight into the dynamics of both market graphs and company co-mention networks. Figure 6a presents the results of multidimensional scaling applied to the distance matrix between market graphs which is calculated using the h-measure defined in (2). Figure 6a shows that the local structure of the market graph is stable over time. During the financial crisis of 2008 (periods 38-50), the topological dissimilarity increases significantly and quickly returns to its previous level. Redundancy analysis shows that 56% of the variance is explained by the first principal component, which is good enough. Figure 6b presents the results of multidimensional scaling applied to the distance matrix between the co-mention graphs which is calculated using the h-measure defined in (2). Figure 6b shows that the topological dissimilarity of the co-mention graphs decreases markedly before the beginning of the crisis and quickly returns to its previous level after that. Only 20% of the variance is explained by the first principal component. Figure 6c presents the results of multidimensional scaling applied to the market graph for the distance matrix obtained using the d-measure defined in (3). A significant shift of the central nodes (companies) of the market graph can be seen during the crisis. 28% of the variance is explained by the first principal component. Figure 6d presents the results of multidimensional scaling applied to the company co-mention graph for the distance matrix obtained using the d-measure defined in (3). Figure 6d shows that for the co-mention graph there is a monotone increase in the rank distance, which accelerates after the crisis. Thus, the crisis led to significant changes in the ranking order of the co-mention graph companies. Only 31% of the variance is explained by the first principal component.
The results of multidimensional scaling to the market graph and the co-mention graph based on the distance matrix obtained using the linear combination of d and h defined in (5) are presented in Figure 6e (with α = 0.5) and Figure 6f (with α = 0.05). Figure 6g presents the results of multidimensional scaling applied to the market graph for distance matrix obtained using D-measure. It should be noted that the results are quite similar to the results shown in Figure 6c. 59% of the variance is explained by the first principal component. Figure 6h presents the results of multidimensional scaling applied to the co-mention graph for distance matrix obtained using D-measure. There can be seen a significant decrease before the beginning of the crisis and an increase to higher level after that. 75% of the variance is explained by the first principal component. Figure 6k,l present the results of multidimensional scaling applied to the market graph and to the co-mention graph respectively for distance matrix obtained using Graph Diffusion Distance. The results are similar to the results shown in Figure 6a. 39% of the variance is explained by the first principal component for the market graph and 13% for the co-mention graph.
The graph similarity measures (D-measure, Graph Diffusion Distance, d, h) showed similar results for the market graph in terms of the principal component method. In the case of the D-measure and the h-metric it suffices to use only the first principal component. For the co-mention network, different measures gave different results. Except for the D-measure, the first principal component explains less than 32% of the total variance.
However, it seems that the calculation of the D-measure is the most time-consuming in comparison with the other similarity measures. In our study, we used the corresponding R functions to estimate the similarity between graphs with 1053 nodes. Computing the similarity for each pair using the D-measure took about 5 times longer (and even more as the edge density of the graphs increased) than using the d- and h-metrics and GDD.

Figure 6. Graphic representation of the results of calculating the difference matrices between graphs. Figures (a,c,e,g,k) present the results of multidimensional scaling applied to the distance matrix between market graphs which is calculated using the Hamming distance, d-measure, l α -measure with α = 0.05, D-measure and GDD, respectively. Figures (b,d,f,h,l) present the results of multidimensional scaling applied to the distance matrix between company co-mention networks which is calculated using the Hamming distance, d-measure, l α -measure with α = 0.05, D-measure and GDD, respectively.
Below we draw some conclusions on the results of the multidimensional scaling (MDS). We found that the one-factor model can explain a significant part of the change dynamics in the structure of both the market graph and the co-mention graph. However, the reliability of the conclusion essentially depends on the choice of a graph similarity measure.
The one-factor estimates obtained by the MDS from the distance matrices for the market graphs turned out to differ slightly across graph similarity measures. In particular, the h-measure and GDD give very similar results, which differ from the results obtained for the d- and D-measures. The one-factor estimates obtained by the MDS for the co-mention graphs are more sensitive to the choice of the graph similarity measure.
We would like to note that visual representations of the evolution of the market graph constructed using the Hamming distance and GDD-measure (Figure 6a,k), show very similar temporal dynamics.
The visual representations of the evolution of the company co-mention network constructed using these two measures (Figure 6b,l) show also quite similar temporal dynamics, which differ only in sign.
The apparent similarity of the edge density dynamics (Figure 2) with the dynamics shown in Figure 6a,k indicates that the main factor, that has been identified by the MDS when using the Hamming distance or GDD-measure, is the graph edge density. In other words, the dynamics of graph changes obtained using the Hamming distance or GDD-measure can be easily explained by such a simple factor as the graph edge density.
On the other hand, the use of d-measures allowed us to identify almost identical dynamics for both the market graph and the co-mention network over time (Figure 6c,d). The figures show that these changes took place smoothly and continuously, while the ranking of the central nodes during the entire period under consideration changed quite significantly in both graphs.
The results obtained using the D-measure are more ambiguous. Figure 6g,h show that one factor is not sufficient to explain the dynamics of the market graph. It seems that the D-measure is a more adequate tool for network comparison, since it uses more factors to explain the differences between the graphs.
One method out of five (the d-measure) shows a significant difference in the structure of the graphs between the pre-crisis and post-crisis periods. The dynamics of changes for the market graph turned out to be dissimilar to the dynamics of the company co-mention network. However, we obtained the closest similarity when applying the d-measure.

Conclusions
In this paper, we applied the methods of graph similarity analysis to study the network structures that describe the correlation relationship between the profitability of financial assets (market graphs) and the co-mentions of companies in the news flow (co-mention networks) during 2005-2010. In order to analyze the variability of the network structures over time, different methods were used to calculate the graphs similarity (graph diffusion distance, D-measure, node ranking similarity-based metric and the Hamming distance). In addition, QAP correlation and regression analysis were used to examine graphs similarity. The results of applying different methods for measuring differences in network structures turned out to be generally consistent with each other. The structures of graphs in adjacent periods are quite similar. However, the Hamming distance has shown great sensitivity to differences in market graphs, based on the data for the half year during the financial crisis of 2008, and the preceding and subsequent periods as well. On the other hand, nodes similarity-based metric better reflects the migration of the position of the central nodes in the co-mention graph. In addition, the use of the QAP procedure confirmed the presence of significant correlations between the adjacency matrices of the market graph and the company co-mention graph.
Our study analyzes changes in the graph properties corresponding to two parallel processes as well as similarities or differences in their dynamics. Moreover, we examine how stable the results of this analysis are with respect to the choice of the graph similarity metric. To do this, we calculated distance matrices of graphs constructed from data for successive periods, and analyzed the distance matrices using the multidimensional scaling (MDS) method. The results of applying five different graph similarity measures are compared. We can draw the following conclusions:
• We found that the market graph constructed from correlations between financial asset returns was significantly less stable over time than the company co-mention network in the period 2005-2010. In fact, the value of the Hamming distance between two consecutive market graphs reached about 0.1 in some periods, i.e., about 10% of links were added or removed in the graph when the six-month sliding window was shifted one month ahead. At the same time, the value of the Hamming distance between any two consecutive company co-mention networks did not exceed 0.06. In addition, the values of the d metric for the market graph were two to three times as large as for the co-mention network.
• A common and quite intuitive point of view is that changes in the intensity and structure of the news flow may cause volatility in financial markets. On the other hand, sharply increased volatility can cause a sharp surge in the number of news items published by news agencies. According to these ideas, the structure of the news flow and the level of volatility should be correlated. However, as our results show, the structure and intensity of the news flow are extremely stable and can be neither the cause nor the result of changes in the volatility of the financial market.
• According to the empirical data, the structure of the co-mention network changed slightly approximately one year before the crisis began. However, these changes are minor and cannot explain the appearance of the global financial crisis that broke out a year later.
• Please note that changes of the market graph structure are related either to the increase in volatility caused by the fall in financial asset prices during the crisis (the first peak in Figure 2), or to the volatility associated with the subsequent increase in asset prices (the second hump in Figure 2). These changes of the market graph structure are also well reflected in Figure 6a,c,e,g,k.
Perhaps the market graph could be made more stable over time by choosing the threshold θ dynamically.
In this paper, we examined the similarity in the evolution of two network structures reflecting the same fundamental process, namely the pricing of financial assets. Obviously, company co-mentions are only a small part of the news flow background, but they are observable and available in real time, while correlations between asset prices are available with a delay. If the information contained in company co-mentions in the financial and economic news flow is significant for stock market participants, then it should be reflected in asset prices, and similar trends should be present in the dynamics of the market graphs. Therefore, directions for further research may include:
• the development of methods for joint analysis of trends in the evolution of two simultaneously formed networks;
• the development of models and methods for the detection of local mutual causality in the evolution of the company co-mention network and market graphs.

Conflicts of Interest:
The authors declare no conflict of interest.