Dynamics of Information Flow between the Chinese A-Share Market and the U.S. Stock Market: From the 2008 Crisis to the COVID-19 Pandemic Period

The relationship between the Chinese market and the US market has attracted wide attention from researchers and investors. This paper uses transfer entropy and local random permutation (LRP) surrogates to detect the information flow dynamics between the two markets. We provide a detailed analysis of the relationship between the two markets using long-term daily and weekly data. Calculations show that there is an asymmetric information flow between the two markets, in which the US market significantly affects the Chinese market. Dynamic analysis based on weekly data shows that the information flow evolves over time and includes three significant periods between 2004 and 2021. We also use daily data to analyze the dynamics of information flow in detail over the three periods and find that changes in the intensity of information flow are accompanied by major events affecting the market, such as the 2008 financial crisis and the COVID-19 pandemic period. In particular, we analyze the impact of the S&P500 index on different industry indices in the Chinese market and find that the dynamics of information flow exhibit multiple patterns. This study reveals the complex information flow between two markets from the perspective of nonlinear dynamics, thereby helping to analyze the impact of major events and providing quantitative analysis tools for investment practice.


Introduction
Both China and the United States are large and highly influential economies, so analyzing the relationship between the two stock markets is of great significance for investment practice and risk management. Many studies have analyzed the relationship between the two markets based on different datasets and with multiple empirical tools. For example, Goh et al. found that some economic variables in the United States can improve the forecast performance of the Chinese stock market [1]. Chen et al. found that US economic variables can help predict the monthly volatility of the Chinese market [2]. In addition, previous studies have also shown that US economic policy uncertainty can significantly explain the returns of China's A-share market [3]. Researchers not only pay attention to how economic variables and policies in the US market affect the Chinese market but also directly discuss the relationship between the returns of the two markets. These studies focus on the impact of special events on the relationship between markets, such as the 2007-2009 crisis [4][5][6][7][8][9][10], the Eurozone crisis [11], and the COVID-19 pandemic [12][13][14]. Empirical analysis shows that the financial crisis had a significant impact on the BRICS and emerging markets, for example by causing significant risk spillover effects and risk contagion [5][6][7]. In particular, the Chinese market was significantly affected by the US market during the crisis [4][5][6][8][10][11]. These studies used many types of quantitative tools, such as multivariate GARCH models [4][6][10], the Granger causality test [5], latent factor models [9], and correlation contagion tests [8]. More recently, the COVID-19 pandemic has significantly impacted the Chinese market and the US market [12]. For example, research reveals that China's stock market and the US stock market had significant spillover effects during the pandemic [13]. In addition, the pandemic has also impacted the correlation structure between market indexes [14].
Existing studies have revealed that there are extensive links between markets, and the Chinese market is significantly affected by the US market, especially under the impact of major events. However, previous studies have not fully analyzed the long-term relationship between the two markets. For example, most studies only analyze the static relationship in a specified period but do not focus on dynamics. This study analyzes the long-term relationship between the two markets from the perspective of nonlinear information flow dynamics. We use transfer entropy (TE) to quantitatively characterize nonlinear relationships [15], and analyze the dynamics in detail by surrogate time series [16]. The surrogate method is an effective technique for analyzing nonlinear time series, which generates simulated time series that preserve some characteristics of the original time series [17][18][19].
At present, transfer entropy has expanded into a "toolbox" that includes multiple tools. For example, Marschinski et al. introduced effective transfer entropy by shuffling the source time series, where the shuffling process is used to destroy the correlation [20]. Further, the group transfer entropy and the effective group transfer entropy can be constructed by considering multiple time series [21]. Jizba et al. constructed the Rényian transfer entropies by extending Shannon entropy to Rényi entropy [22]. Papana et al. extended the transfer entropy method to analyze non-stationary time series [23]. Recently, Nie proposed a method to detect local information flow [16]. This method obtains the baseline TE value from local surrogates so that it is possible to observe the contribution of observations within a small computational window to the original TE value. An advantage of this method is that changes in the intensity of the information flow can be clearly observed, so that we can identify the impact of events on the information flow [16].
Since transfer entropy is a non-parametric method that effectively extracts the causal relationship between variables, it is suitable for analyzing high-complexity financial time series. For example, methods based on transfer entropy have been widely used to analyze financial time series, such as the relationship between stock markets [24][25][26][27][28], the price-volume relationship [16][29][30][31], and the foreign exchange market [32]. In particular, early empirical research shows that the US market is a core source of information for global financial markets [28]. Kim et al. considered 10 important indexes and found that the structure of the information transfer network was affected by the crisis [33].
This study focuses on analyzing the evolutionary characteristics of the information flow between China's A-share market and the US market. We use nonlinear analysis methods to characterize the complex information flow dynamics between the two markets and, in particular, identify periods of high information flow intensity by TE and LRP. This study reveals the details of information transfer between markets based on long-term data and from multiple time scales. In the following, we first describe the data set and analyze basic descriptive statistics. Secondly, we review the concept and calculation method of transfer entropy, and the steps of LRP-based analysis. Third, we use weekly data and daily data to analyze the information flow between the two markets. Finally, we analyze the information flow between the S&P500 index and the industry indices of the Chinese A-share market.

Data
This paper uses three datasets, which include closing indices for two market composite indices and some industry indices. The composite index data includes daily and weekly closing data of S&P500 and CSI300 indices, as well as weekly closing data of some industry indices in the Chinese A-share market. All data are extracted from the iFinD database of Tonghuashun company. For the original closing index time series P = {P(t)}, we calculate the series R = {R(t)}, where R(t) = log(P(t + 1)) − log(P(t)).
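The return definition above can be illustrated with a minimal sketch. The function name `log_returns` and the sample closing values are hypothetical and are not taken from the dataset.

```python
import math

def log_returns(prices):
    """Return R(t) = log(P(t+1)) - log(P(t)) for a list of positive closing values."""
    if any(p <= 0 for p in prices):
        raise ValueError("closing values must be positive")
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]

# Hypothetical closing values, for illustration only.
closes = [100.0, 102.0, 101.0, 105.0]
returns = log_returns(closes)

# A series of n closing values yields n - 1 returns.
print(len(closes), len(returns))
```

The same relation explains why a closing series with 890 observations gives a return series with 889 observations.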
For weekly data, we use $R_{us}$ and $R_{cn}$ to represent the return series of the S&P500 index and the CSI300 index, respectively. We considered the data from 31 December 2004 to 17 June 2022, a total of 890 observations, so that each preprocessed series includes 889 observations. Table 1 lists some descriptive statistics of the two series. In addition, the Pearson correlation coefficient of the two time series is 0.2118, which implies a weak correlation. In addition to the weekly data, we also analyzed the daily data of the two indexes. To clearly analyze the impact of major events, we divide the daily data into three periods. For the preprocessed index data, we use $R^{1}_{cn}$ and $R^{1}_{us}$ to represent the data of the first period and use similar symbols for the other periods. The time intervals corresponding to the three data sets are 4 January 2007-31 December 2010, 2 January 2014-29 December 2017, and 2 January 2018-31 December 2021, respectively. Since the trading days of the two markets are not synchronized, we remove the missing values in the two series, leaving observations with the same time stamp. The time series of the three periods without missing values include 944, 948, and 942 observations, respectively. Table 1 lists the basic descriptive statistics of each return series. Here, we always round values to four decimal places.
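The alignment step and the correlation coefficient can be sketched as follows. This is an illustration under stated assumptions: the helper names `align_on_common_dates` and `pearson` are hypothetical, dates are assumed to be ISO-formatted strings, and the text does not specify whether alignment is applied before or after computing returns, so the sketch simply aligns whichever series is passed in.

```python
import math

def align_on_common_dates(series_a, series_b):
    """Keep only dates present in both {date: value} mappings, in date order."""
    common = sorted(set(series_a) & set(series_b))
    return common, [series_a[d] for d in common], [series_b[d] for d in common]

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

# Hypothetical observations: each market is closed on one day the other trades.
us = {"2020-01-02": 0.010, "2020-01-03": -0.004, "2020-01-06": 0.003, "2020-01-07": 0.001}
cn = {"2020-01-02": 0.006, "2020-01-06": 0.002, "2020-01-07": -0.001, "2020-01-08": 0.004}
dates, us_aligned, cn_aligned = align_on_common_dates(us, cn)
print(dates, round(pearson(us_aligned, cn_aligned), 4))
```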
Table 1 shows that all the averages are close to zero. In three of the four pairs of time series, the standard deviation of the series in the Chinese market is greater than that in the US market. In addition, all skewness values are less than zero, and all excess kurtosis values are greater than zero. Here, we report the excess kurtosis obtained by subtracting 3 from the original kurtosis value. The positive excess kurtosis values imply that all distributions are more sharply peaked than the normal distribution and are therefore not normal.
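The descriptive statistics discussed above can be computed as in the following sketch. It assumes population (biased) moment estimators, which may differ from the estimators used for Table 1; the function name `describe` and the sample values are hypothetical.

```python
def describe(x):
    """Return (mean, standard deviation, skewness, excess kurtosis).

    Population moment estimators are used here; this is an assumption.
    """
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # subtract 3, the kurtosis of a normal distribution
    return mean, m2 ** 0.5, skewness, excess_kurtosis

# Hypothetical return values with one large negative observation.
sample = [0.01, -0.02, 0.005, 0.0, -0.08, 0.015, 0.01, -0.005]
print([round(v, 4) for v in describe(sample)])  # rounded to four decimal places
```

A single large loss in the sample is enough to produce negative skewness and positive excess kurtosis, the pattern reported for all series.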
In addition to the market composite index, we used 31 industry indices compiled by SWS Research Co., Ltd. Here, we used three of them for dynamic analysis. Table 1 lists the descriptive statistics of the return series for the three indices. The symbols $R^{1}_{ind}$, $R^{2}_{ind}$ and $R^{3}_{ind}$ correspond to the industry index codes 801040.SL (Steel industry), 801180.SL (Real estate) and 801960.SL (Petrochemical industry), respectively. It can be found that the average of $R^{2}_{ind}$ is significantly larger than that of the other two indices, while the difference between the standard deviations is small. In particular, all skewness values are less than 0, and all excess kurtosis values are greater than 0.

Transfer Entropy
We consider two random variables $I$ and $J$ with marginal probability distributions $p_I(\cdot)$ and $p_J(\cdot)$ and joint distribution $p_{I,J}(\cdot)$. We assume that the underlying dynamic structure conforms to a stationary Markov process, where $I$ and $J$ correspond to orders $k$ and $l$, respectively. For $I$, the conditional distribution of the state at time $t+1$ ($i_{t+1}$) is independent of the state at time $t-k$ ($i_{t-k}$), that is, $p_I(i_{t+1} \mid i_t, \ldots, i_{t-k+1}) = p_I(i_{t+1} \mid i_t, \ldots, i_{t-k})$. Similarly, for $J$, $p_J(j_{t+1} \mid j_t, \ldots, j_{t-l+1}) = p_J(j_{t+1} \mid j_t, \ldots, j_{t-l})$, where $j_t$ represents the state at time $t$. The transfer entropy of $J \to I$ is defined as Equation (1) [15], where $i_t^{(k)} = (i_t, \ldots, i_{t-k+1})$ and $j_t^{(l)} = (j_t, \ldots, j_{t-l+1})$. In this article, we set $l = k = 1$.

$$TE_{J \to I}(k, l) = \sum p\left(i_{t+1}, i_t^{(k)}, j_t^{(l)}\right) \log \frac{p\left(i_{t+1} \mid i_t^{(k)}, j_t^{(l)}\right)}{p\left(i_{t+1} \mid i_t^{(k)}\right)} \qquad (1)$$
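For $l = k = 1$, the definition can be illustrated by a plug-in estimate in which all probabilities are replaced by relative frequencies of already discretized states. The sketch below is an illustration only and is not the estimator of the R package used in this paper; the function name `transfer_entropy` and the base-2 logarithm are assumptions.

```python
from collections import Counter
from math import log2

def transfer_entropy(source, target):
    """Plug-in estimate of TE_{source -> target} with k = l = 1, in bits (assumed base)."""
    if len(source) != len(target):
        raise ValueError("series must have the same length")
    # Each triple is (i_{t+1}, i_t, j_t) with I = target and J = source.
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)
    c_full = Counter(triples)
    c_cond = Counter((i, j) for _, i, j in triples)       # counts of (i_t, j_t)
    c_own = Counter((nxt, i) for nxt, i, _ in triples)    # counts of (i_{t+1}, i_t)
    c_past = Counter(i for _, i, _ in triples)            # counts of i_t
    te = 0.0
    for (nxt, i, j), c in c_full.items():
        p_joint = c / n
        p_given_both = c / c_cond[(i, j)]        # p(i_{t+1} | i_t, j_t)
        p_given_own = c_own[(nxt, i)] / c_past[i]  # p(i_{t+1} | i_t)
        te += p_joint * log2(p_given_both / p_given_own)
    return te

# Toy example: the target copies the source with a lag of one step.
src = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
tgt = [0] + src[:-1]
print(transfer_entropy(src, tgt), transfer_entropy(tgt, src))
```

In the toy example the past of the source fully determines the next state of the target, so the estimate in the source-to-target direction is positive.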

The Method Used to Estimate the Transfer Entropy
For a time series, we can encode it into a time series including states and then estimate the transfer entropy. Here, we use a method proposed in previous studies to encode the time series [25,26,34], which encodes the original time series through a group of quantiles. One of the advantages of this method is that a large weight can be assigned to the tail of the distribution, thereby focusing on the influence of extreme values in the financial market [25,26]. In addition, we use the R package RTransferEntropy developed by Behrendt et al. [34].
If the time series $s = \{s(t)\}$ is coded into $n$ states, $n - 1$ quantiles $Q = \{q_1, q_2, \ldots, q_{n-1}\}$ are required, as shown in Equation (2). For example, if the observation $s(t)$ is between the quantiles $q_2$ and $q_3$ of the distribution, then $s'(t) = 2$. The converted time series $s' = \{s'(t)\}$ includes $n$ states, and can be used to estimate the conditional distribution and the joint distribution.
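The coding step can be sketched as follows. Several details are assumptions: state labels start at 0 so that an observation between $q_2$ and $q_3$ receives the label 2 as in the example above, quantiles are obtained by linear interpolation, and the probabilities (0.05, 0.95) are a hypothetical parameter set that puts separate states on both tails.

```python
from bisect import bisect_left

def empirical_quantile(sorted_x, p):
    """Quantile by linear interpolation (one common convention, assumed here)."""
    pos = p * (len(sorted_x) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])

def encode(series, probs):
    """Map each observation to the number of quantile thresholds strictly below it."""
    xs = sorted(series)
    cuts = [empirical_quantile(xs, p) for p in probs]
    return [bisect_left(cuts, v) for v in series]

# Hypothetical data: the values 1, 2, ..., 100.
values = [float(v) for v in range(1, 101)]
codes = encode(values, (0.05, 0.95))
print(codes.count(0), codes.count(1), codes.count(2))
```

With two thresholds the series is coded into three states, and the two tail states each collect about 5% of the observations.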
To comprehensively analyze the significance of information flow at different quantile parameters, we set five sets of parameters $Q_k$ ($k = 1, \ldots, 5$).
Here, we use the method proposed by Dimpfl et al. to test the significance of TE [25,34]. First, the method estimates the dynamics between the two variables through the Markov process. Second, based on the Markov process, a pair of simulated time series whose dependence has been eliminated is generated, and the TE value is estimated. Third, we repeatedly generate the simulated time series $N_{boot}$ times to obtain a distribution of TE values for comparison. Finally, the p-value is generated by comparing the original TE value with the quantiles of this baseline distribution. In this article, we always set $N_{boot} = 1000$.
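The logic of the significance test can be sketched in simplified form. The sketch below is not the implementation in RTransferEntropy: it assumes that only the source series is re-simulated from a first-order Markov chain fitted to it, which keeps the dynamics of the source but removes its dependence on the target, and it takes the p-value as the share of simulated TE values that are at least as large as the original one. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict
from math import log2

def transfer_entropy(source, target):
    """Plug-in TE estimate with k = l = 1 on discrete states, in bits."""
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)
    c_full = Counter(triples)
    c_cond = Counter((i, j) for _, i, j in triples)
    c_own = Counter((nxt, i) for nxt, i, _ in triples)
    c_past = Counter(i for _, i, _ in triples)
    return sum(
        (c / n) * log2((c / c_cond[(i, j)]) / (c_own[(nxt, i)] / c_past[i]))
        for (nxt, i, j), c in c_full.items()
    )

def simulate_markov(series, rng):
    """Draw a series of equal length from the first-order chain fitted to `series`."""
    successors = defaultdict(list)
    for a, b in zip(series, series[1:]):
        successors[a].append(b)
    state = rng.choice(series)
    out = [state]
    while len(out) < len(series):
        nxt = successors.get(state)
        # A state seen only at the very end has no successor; restart from a random state.
        state = rng.choice(nxt) if nxt else rng.choice(series)
        out.append(state)
    return out

def bootstrap_p_value(source, target, n_boot=1000, seed=0):
    """Return (original TE, p-value) using simulated source series as the baseline."""
    rng = random.Random(seed)
    observed = transfer_entropy(source, target)
    baseline = [transfer_entropy(simulate_markov(source, rng), target) for _ in range(n_boot)]
    return observed, sum(te >= observed for te in baseline) / n_boot

# Toy example with a strong lagged dependence of the target on the source.
demo_rng = random.Random(1)
src = [demo_rng.randint(0, 2) for _ in range(400)]
tgt = [0] + src[:-1]
te_value, p_value = bootstrap_p_value(src, tgt, n_boot=100)
print(round(te_value, 4), p_value)
```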

LRP-Based Analysis
The local random permutation (LRP) method uses RP surrogates locally to estimate the intensity of the information flow within a period [16]. Previous studies have confirmed that this method can effectively describe changes in information flow through toy models [16]. In particular, to maintain the correlation between series, LRP provides surrogates that keep the Pearson correlation unchanged.
We consider the information flow $s_1 \to s_2$ between the series $s_1 = \{s_1(t)\}$ and $s_2 = \{s_2(t)\}$, where each series includes $N$ observations. We assume that the length of the calculation window for constructing local RP surrogates is $L_w$, and that the $TE_{s_1 \to s_2}$ value is significant.
1. We extract the observations in the time interval $[1, L_w]$ to obtain the sub-time series of $s_1$ and $s_2$, and scramble them. In the scrambling step, the correspondence between the timestamps of the two series does not change, that is, the series $\{(s_1(k), s_2(k))\}$ composed of pairs is scrambled.
2. We calculate the transfer entropy value between the time series obtained by local scrambling in step 1, and denote it as $TE_{s_1 \to s_2}(1)$.
3. We repeat steps 1 and 2 a total of $M$ times to obtain a distribution of TE values used as a benchmark, $TE^{1}_{ben} = \{TE_{s_1 \to s_2}(k) \mid k = 1, 2, \ldots, M\}$, with mean $m_{[1,L_w]}$ and standard deviation $s_{[1,L_w]}$.
4. We calculate the Z-score $Z_{[1,L_w]} = (TE - m_{[1,L_w]}) / s_{[1,L_w]}$, where $TE$ is the original value $TE_{s_1 \to s_2}$.
5. We move the calculation window one observation each time, and repeat steps 1-4 to obtain the benchmark distribution and the Z-score of each window. For example, the second Z-value is obtained by scrambling all observations in the interval $[2, L_w + 1]$.
If the Z-score of the $i$-th window exceeds 1.645 and $TE^{i}_{ben}$ is a normal distribution, then the original TE value is greater than most of the values (95%) in $TE^{i}_{ben}$, which implies that there is a significant information flow in the period $[i, i + L_w - 1]$. In this article, we always set $M = 1000$.
The estimation of the TE value requires setting the parameter $Q_k$, so the parameter may affect the calculation result. Here, we use the following method to synthesize the calculation results of different parameters. For a sequence of Z-scores $\{z_k(t)\}$ generated by a parameter $Q_k$, we construct a 0-1 sequence using Equation (3), where $Z_{th}$ is the threshold. We set the threshold $Z_{th} = 1.645$ here.

$$I_k(t) = \begin{cases} 1, & z_k(t) > Z_{th} \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$

In this way, we obtain a series $I_k = \{I_k(t)\}$ for each parameter $Q_k$. Then, we calculate the series $I = \{I(t)\}$, where $I(t) = \sum_{k=1}^{5} I_k(t)$.
Since each Z-score corresponds to an interval, in this article, we use the following convention to plot the figures: the right end of the interval corresponding to the value is used as the time stamp in the figure. For example, for the value $Z_{[1,L_w]}$, the time label in the figure is the timestamp corresponding to $L_w$. Similarly, we plot the time series $\{I(t)\}$.
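The LRP steps and the synthesis of Z-scores can be sketched as follows. This is a minimal illustration under several assumptions: the series are already coded into discrete states (scrambling pairs of coded values is equivalent to scrambling the raw pairs when the quantiles are computed on the full series), the TE estimate is a plug-in estimate with $l = k = 1$, the sample standard deviation is used, and windows are indexed from 0. All function names are hypothetical.

```python
import random
from collections import Counter
from math import log2
from statistics import mean, stdev

def transfer_entropy(source, target):
    """Plug-in TE estimate with k = l = 1 on discrete states, in bits."""
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)
    c_full = Counter(triples)
    c_cond = Counter((i, j) for _, i, j in triples)
    c_own = Counter((nxt, i) for nxt, i, _ in triples)
    c_past = Counter(i for _, i, _ in triples)
    return sum(
        (c / n) * log2((c / c_cond[(i, j)]) / (c_own[(nxt, i)] / c_past[i]))
        for (nxt, i, j), c in c_full.items()
    )

def lrp_z_scores(source, target, window, n_surrogates=1000, seed=0):
    """One Z-score per window position; z[i] belongs to the window [i, i + window - 1]."""
    rng = random.Random(seed)
    te_original = transfer_entropy(source, target)
    z_scores = []
    for start in range(len(source) - window + 1):
        idx = list(range(start, start + window))
        benchmark = []
        for _ in range(n_surrogates):
            perm = idx[:]
            rng.shuffle(perm)
            s1, s2 = list(source), list(target)
            for dst, src_pos in zip(idx, perm):
                # Pairs (s1(k), s2(k)) move together, so timestamps stay matched.
                s1[dst], s2[dst] = source[src_pos], target[src_pos]
            benchmark.append(transfer_entropy(s1, s2))
        sd = stdev(benchmark)
        z_scores.append((te_original - mean(benchmark)) / sd if sd > 0 else float("nan"))
    return z_scores

def indicator(z_series, threshold=1.645):
    """0-1 series: 1 where the Z-score exceeds the threshold."""
    return [1 if z > threshold else 0 for z in z_series]

def aggregate(z_by_parameter, threshold=1.645):
    """Sum the 0-1 series over all parameters to obtain the I series."""
    return [sum(col) for col in zip(*(indicator(z, threshold) for z in z_by_parameter))]

def count_levels(i_series, n_parameters):
    """Number of observations of the I series equal to 0, 1, ..., n_parameters."""
    counts = Counter(i_series)
    return [counts.get(level, 0) for level in range(n_parameters + 1)]

# Toy example with small sizes so that it runs quickly.
demo_rng = random.Random(3)
s1_demo = [demo_rng.randint(0, 1) for _ in range(60)]
s2_demo = [0] + s1_demo[:-1]
z_demo = lrp_z_scores(s1_demo, s2_demo, window=20, n_surrogates=20, seed=1)
print(len(z_demo), sum(indicator(z_demo)))
```

A series of $N$ observations and a window of length $L_w$ gives $N - L_w + 1$ Z-scores, and under the plotting convention above the value `z[i]` is labelled with the timestamp at the right end of its window.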

TE Values of Different Parameters
We use a group of parameters $Q_k$ ($k = 1, \ldots, 5$) to calculate the information flow between $R_{us}$ and $R_{cn}$. Table 2 shows the calculated results. All $TE_{us \to cn}$ values are significant at the significance level $\alpha = 0.05$, and three of the p-values are less than 0.01. In contrast, none of the $TE_{cn \to us}$ values is significant. Table 2 suggests that there is an asymmetric information flow between the Chinese market and the US market, and the US market is a dominant information source. The minimum p-value of 0.005 corresponds to the parameter $Q_2$, which means that significant information flow can be detected when a large weight is assigned to extreme values. In addition, the p-value corresponding to the equally divided quantile interval is also less than 0.05. This means that significant us → cn information flow can be detected regardless of whether a greater weight is given to the extreme values.
In summary, there is sufficient evidence to support the existence of information flow us → cn. The change of the p-value with the change of the parameter implies that the analysis of the information flow depends on the choice of the parameter, and thus the calculation results of multiple parameters need to be integrated.

Information Flow Dynamics for Weekly Data
In the previous section, we analyzed the information flow between the two markets globally. However, the static analysis only shows the existence of information flow in the considered period, thus lacking local dynamic analysis. Below, we use LRP to analyze the dynamics of information flow, in particular, to detect periods with localized strong information flow.
We set five sets of parameters $Q_k$ ($k = 1, \ldots, 5$) and $L_w = 96$, where the calculation window corresponds to two years of trading weeks. The five subfigures of Figure 1 show the Z-score time series. We observe that all subfigures include local peaks larger than 1.645, implying non-trivial information flow dynamics between the series. These peaks correspond to periods of localized strong information flow. For example, in Figure 1a, the Z-score increased rapidly in 2008 and dropped significantly thereafter. Furthermore, the number of peaks in every Z-score series is greater than 1. For example, three localized peaks can be observed in Figure 1b, implying the existence of three periods of strong information flow.
The numbers of local peaks for the five Z-score series are 2, 3, 2, 3, and 3, respectively. Table 3 lists the periods corresponding to these peaks. We use $T_1$, $T_2$, and $T_3$ to represent the time intervals in three periods, where each period corresponds to two endpoints. We find that most of the local peaks correspond to periods that are close to each other across parameters. Table 3 shows that the 2008 crisis significantly affected the information flow of us → cn. In addition, the periods corresponding to several local peaks also include major events. For example, the third peak of $Q_5$ includes the oil crisis and the collapse of the US stock market. In summary, the LRP-based method identifies some local peaks, indicating that the dynamics of information flow are non-trivial.

Comprehensive Analysis of Z-Score Series
We roughly identified three important periods by Z-scores in the previous subsection. Below, we calculate the $I$ series from the $I_k$ series of the five Z-score series. The series $I$ takes values in the set $\{0, 1, 2, 3, 4, 5\}$. If $I(t) = 0$, the observations in the time period $[t - L_w + 1, t]$ do not significantly affect the analysis results of the information flow. On the contrary, if $I(t) = 5$, the information flow within the period $[t - L_w + 1, t]$ is significant and does not depend on the choice of parameters. A large $I(t)$ value implies that there is information flow for multiple quantile parameters $\{Q_k \mid k = 1, \ldots, 5\}$. Figure 2 shows the $I$ series. There are three periods where most observations are greater than or equal to 1. In particular, the analysis shows that there is a period where most observations are greater than or equal to 3, that is, 8 February 2008-24 September 2010. During the 2008-2009 crisis in the US market, there was strong information flow us → cn. We use $N_i$ to represent the number of observations equal to $i$ in the $I$ series, as shown in Table 4. It can be found that if the calculation results of the five parameters are integrated, nearly half of the observations are greater than or equal to 1. This implies that information flows widely exist between the two markets during the period under consideration. Table 5 lists the three periods identified by Figure 2. Each period includes some major events that affect the market. The first period corresponds to the 2007-2009 crisis in the US market. The second period includes the crisis in the Chinese stock market in 2015. In addition, before the stock market crash in 2015, a bubble formed in the second half of 2014. The second period therefore includes not only the bursting of the bubble, but also its formation. The third period includes a series of major events, such as the circuit breaker in the US market in March 2020 and the COVID-19 pandemic.
Based on Table 5, we divide the daily data into three periods for the analysis in the next section.

In this section, we analyze the relationship between the daily data of the two indices. Table 6 lists the transfer entropy values $TE_{us \to cn}$ of different parameters. For the first and third periods, all TE values are significant at significance level $\alpha = 0.05$. For the second period, the TE values of parameters $Q_1$ and $Q_2$ were not significant. However, the p-value for the TE value of $Q_4$ was less than 0.01. This implies that extreme values have little effect on information flow in the second period. In summary, we still found significant information flows in the three periods of daily data. Below, we use LRP-based analysis to discuss the dynamics of information flow in each period.

In the next subsection, for the first period and the third period, we choose the parameter $Q$ corresponding to the maximum TE value in Table 6, so the parameter $Q$ is set to $Q_4$ and $Q_2$, respectively. The second period is a special period, in which both $Q_1$ and $Q_2$ correspond to insignificant TE values, suggesting that extreme values in this period only weakly affect the level of information flow. For the second period, we set $Q_3$ and $Q_4$ for calculation.

For the first period, the calculation results clearly show that the 2007-2009 crisis in the US market significantly affected the Chinese market. In addition, we also observe that the us → cn information flow persists in the post-crisis period. These results imply not one peak with strong information flow, but multiple peaks with different intensities.

Figure 4 shows the information flow dynamics for the 2014-2017 data. Figure 4a includes a major significant period: 6 January 2016 (6 August 2015)-10 May 2016. The results show that the information flow in the us → cn direction is enhanced after the Chinese market bubble burst. Figure 4b shows the calculation results corresponding to $Q_4$. We found that it also includes a major significant period: 30 December 2016 (1 August 2016)-1 March 2017. The period included an important event that affected US markets: the 2016 presidential election. A previous study showed that the event significantly affected correlation dynamics in the US stock market [35]. The analysis here suggests that there was a significant information flow from the US stock market to the Chinese stock market around this event. The two results in Figure 4 show that from 2 January 2014 to 29 December 2017, the US market significantly influenced the Chinese market in two periods, and both periods included major events.

For the third period, we label the different subperiods with $T_i$. It can be found that $T_2$, $T_3$, $T_4$ and $T_5$ can be combined into an important period. There were several major events in this period: Sino-US trade friction, the oil crisis, the COVID-19 pandemic, and the US stock market crisis in March 2020. In addition, periods $T_1$ and $T_6$ still belong to the period of Sino-US trade friction and the period of the COVID-19 pandemic, respectively. In particular, the calculation window corresponding to the maximum value (3.4339) is 21 October 2019-23 March 2020. Several major events are included in this period, such as the oil crisis, the COVID-19 pandemic, and the March 2020 market crisis. To sum up, it is reasonable to speculate that the impact of multiple major events led to the active information flow during the period.

Global Information Flow Analysis between Industry and Market Index
Since the composite index includes stocks belonging to various industries, in this section, we analyze the relationship between the S&P500 index and the industry index in the Chinese market. Here, we consider the case where a larger weight is given to the tail of the distribution, i.e., Q = Q 1 . Table 7 lists the TE values and p-values, where the symbol "ind" represents the industry index, and the code is listed in the second column.
The fourth and sixth columns show that there are several industry indices that are closely related to the S&P500 index. For example, there is a significant bidirectional flow of information between the petrochemical industry and the S&P500 index. In addition, some unidirectional effects can also be found, such as in real estate. It should be noted that not all industry indices are affected by the S&P500 index, such as the public utilities index and the architectural ornament index. To sum up, Table 7 shows that the information flow from S&P500 to Chinese market industry indices is heterogeneous.

The Dynamics of Information Flow between Industry Index and S&P500 Index
In the previous section, we found diversity in the relationship between the S&P500 index and industry indices, with some industry indices having a stronger flow of information between them and the S&P500 index. A related issue is whether the dynamics of information flow for different industry indices have similar patterns. Here, we select three industry indices and compare information flow dynamics. The three indices are 801040.SL (Steel industry), 801180.SL (Real estate) and 801960.SL (Petrochemical industry), and all have significant us → ind information flow.
We analyze dynamics using weekly data and always set $Q = Q_1$ and $L_w = 96$ weeks.

The information flow analysis result of us → 801960.SL is shown in Figure 8. We find that it includes a period with most values greater than the critical threshold: 10 October 2014-9 June 2017. This period is close to that of the second peak in Figure 6. However, the series shown in Figure 8 includes some values less than the critical threshold within this period. The results in Figures 6-8 show that the information flow dynamics of us → ind exhibit multiple patterns, thus indicating a high complexity of information flow between markets.

Discussion
This study examines periods of strong information flow between two important markets, where these periods are accompanied by major events. This implies that long-lasting major events may affect market relations more deeply. This provides a forward-looking perspective for analyzing the relationship between the two markets. In particular, we can analyze the dynamics of information flow in detail to identify changes in the relationship between markets when markets are impacted by major events. A related topic that can be studied is whether non-linear information flow helps predict future returns in the market.
Several previous studies have revealed that major events affect the relationship between markets [4][5][6][7][8][9][10][11][12][13][14]. Unlike previous studies, we analyze long-term data and focus on the dynamics of information flow. In particular, LRP-based analysis characterizes the strength of the local information flow, so that the time-varying characteristics of the information flow in a period can be observed in detail.

Conclusions
We discussed the long-term relationship between the Chinese market and the US market and found that there is an asymmetric information flow, where the US market is the source of information flow. Calculations show that the information flow between the Chinese market and the US market is not stable, but there are some significant periods. The LRP-based analysis identified three periods of significant information flow. All three periods include some major events affecting the market, such as the 2008 crisis, the oil crisis, and the COVID-19 pandemic, thus suggesting that the dynamics of information flow may be impacted by major events. In addition, we conducted a detailed dynamic analysis with daily data. Calculations show that each period includes nontrivial dynamics, some of which appear as local peaks, suggesting dramatic changes in information flow.
We also analyzed the information flow between the S&P500 index and some industry indices in the Chinese A-share market. Calculations show that the S&P500 index does not significantly affect every industry index. Some industries are significantly affected, such as the steel industry and real estate. However, analysis based on weekly data shows that the dynamics of information flow between the industry indices and the S&P500 index exhibit multiple patterns. In addition, there are also some significant information flows from the Chinese market to the US market, thus demonstrating the heterogeneity of information flow.
This study demonstrates the high complexity of nonlinear causal relationships between two important markets and facilitates quantitative studies of the impact of major events. In particular, empirical analysis suggests that the heterogeneity of information flow in industry indices needs to be taken into account when discussing risk spillovers between markets. In this article, we only consider the information flow between the two markets. A related and important topic is the study of how major events affect information flow networks that include multiple markets. Since LRP-based methods involve two variables, for the information flow network, we need to construct a method that includes multiple variables. An alternative approach is to characterize changes in network structure by shuffling observations. In addition, the relationship between industry indices within a market can be analyzed in detail through LRP-based methods and high-frequency data. For example, for a trading day, we can analyze how important announcements impact the information flow network constructed by multiple industry indices.

Conflicts of Interest:
The authors declare no conflict of interest.