Improving the Process of Early-Warning Detection and Identifying the Most Affected Markets: Evidence from Subprime Mortgage Crisis and COVID-19 Outbreak—Application to American Stock Markets

Stock-market-crash predictability is of particular interest in the field of financial time-series analysis. Famous examples of major stock-market crashes are those associated with the real-estate bubble in 2008 and COVID-19 in 2020. Several studies have examined the prediction process without taking into consideration which markets might be falling into a crisis. To this end, a combined analysis is utilized in this manuscript. Firstly, the auto-regressive estimation (ARE) algorithm, which has been successfully applied to electroencephalography (EEG) brain data for detecting diseases, is employed. The ARE algorithm is based on state-space modelling, which applies the expectation-maximization algorithm and the Kalman filter. This manuscript introduces its application, for the first time, to stock-market data. For this purpose, a time-evolving interaction surface is constructed to observe the change in the surface topology. This enables tracking of the stock market's behavior over time and differentiates between different states. This provides a deep understanding of the underlying system behavior before, during, and after a crisis. Different patterns of the stock-market movements are recognized, providing novel information for detecting an early-warning sign. Secondly, a Granger-causality time-domain technique, called directed partial correlation, is employed to infer the underlying interconnectivity structure among markets. This information is crucial for investors and market players, enabling them to differentiate between those markets which will fall into a catastrophic loss and those which will not. Consequently, they can make successful decisions towards selecting less risky portfolios, which guarantees lower losses. The results show the effectiveness of this methodology in the framework of early-warning detection.


Introduction
A stock price crash is a phenomenon in which a stock index or an individual stock price falls sharply within a short time period [1]. Predicting crashes in the stock-market system has therefore been the focus of numerous studies [1][2][3][4][5][6][7][8], and the existing literature on stock-market crashes is extensive. Numerous studies have focused on the detection of early-warning signals of market distress from option contracts [9][10][11][12][13]. In addition, some studies have focused on predicting stock crashes at the firm level [4,7], while others have studied the construction of generic indicators to capture critical transitions in the system, as in ecology and climate science [2,3,6,8]; however, the milestone research was conducted by Scheffer et al. [14]. Some studies have employed the concept of capturing critical transitions in complex systems for the purpose of constructing correlation indicators [2,15]. Numerous studies have applied multi-fractal methods to financial time-series data [16][17][18][19]. In addition, there is interest in the use of In particular, financial markets are well-known for being characterized by non-Gaussian distributions, as their fluctuations typically present tails in the case of short returns. However, the long-time returns follow Gaussian distributions. They are often obscured by a large amount of observational noise, which is assumed to be Gaussian noise. This observational noise is not a part of the dynamics of the process; in contrast, dynamic noise is added to the dynamics of the process. Both types of noise can be taken into consideration in stochastic models by using a state-space model (SSM). The SSM consists of two equations: one that describes the dynamics of the process, and an observation equation that models the observation function and the observational noise.
Such models require accurate parameter estimates; however, existing naïve parameter estimators neglect observational noise, resulting in biased estimates being obtained. Therefore, there is an essential need to determine more robust estimators.
By analyzing the resulting interaction surface topology together with the resulting interaction network structure, the following can be concluded: an early-warning sign of a potential crisis can be detected relatively far in advance, as a consequence of capturing critical transitions. Such transitions may signal an unwanted collapse. In addition, the reconstruction of the interaction networks helps to identify the markets which are most strongly affected by the crisis. This analysis, in turn, enables investors and market players to differentiate between those markets which will fall into a catastrophic loss and those which will not. As a consequence, they can make successful decisions towards selecting less risky portfolios, which guarantees lower losses.
In summary, various algorithms, as well as DPC techniques, can be applied to international stock-market time-series data for the purpose of reconstructing the time-dependent stock-market interaction parameter spaces. This indicates how the topology of the resultant interaction parameter spaces may change from a non-crisis state into a state in which a crisis occurs. Furthermore, DPC analysis provides a clear picture of how markets interact. This gives a clear warning sign which confirms the tendency for a potential crisis. This, in turn, allows investors to manage their losses, before their occurrence.
The remainder of this manuscript is structured as follows. The methods applied in this work are presented in Section 2. In Section 3, the applications of the methods to American stock markets are discussed.

Methods
This section presents the methods used in this manuscript, which are applied to American stock-market time-series data. The work is split into two parts, according to the aim of the manuscript. The first part introduces the methods used regarding the aim of constructing the interaction parameter spaces as density heat maps, in order to track the market motions through their topological changes during different periods of time. For this aim, the ARE algorithm is used. A diagram illustrating the underlying mechanism of the EM-KF algorithm is shown in Figure 1. The second part presents the method used for the aim of reconstructing the interaction network structure between the strongly interconnected market indices (i.e., DPC). For simplicity, an illustrative diagram of the methodology utilized in the manuscript is shown, step-wise, in Figure 2.
In the following, the SSM is presented in the first sub-section. In the second sub-section, the model order-selection criterion $\mathrm{AIC}_i$ is detailed. The EM-KF scheme is discussed and illustrated in the third sub-section. Finally, the DPC technique is presented in the fourth sub-section.

Figure 1. Kalman filter in the expectation-maximization algorithm. The Kalman filter is utilized to obtain conditional means, using the parameters P(r) in every iteration r. Maximization of the expected value of the likelihood function leads to a new set of parameters P(r + 1) [26].

State Space Model (SSM)
The state space model (SSM) is a method for modeling both observed and hidden processes in a given system. The SSM model is used in the Kalman filter (KF) to model the data under analysis. This model contains two equations. The first equation models the dynamics or the state of the process and the Gaussian distributed driving noise, and it is called the state equation. The second equation models the observations with Gaussian distributed observational noise, and is called the output equation [26].
The dynamics of the underlying process are modeled by a linear stochastic equation, namely the vector autoregressive process of order p (VAR[p]):

$$x_t = \sum_{\tau=1}^{p} A(\tau)\, x_{t-\tau} + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, Q),$$

where $x_t$ is the current state vector, which depends on the past p state vectors and on the Gaussian driving noise $\varepsilon_t$ with zero mean and co-variance matrix Q. The transition matrix $A(\tau)$ varies over time, which, in turn, determines the dynamics of the process. The state and noise vectors $x_t$ and $\varepsilon_t$ are of dimension d, while $A(\tau)$ and Q are d × d matrices.
On the other hand, the observation $y_t$ is modelled by the b × pd observation matrix C together with the observational noise $\eta_t$. The dimension of $y_t$ and $\eta_t$ is b × 1. The observational noise is assumed to be Gaussian-distributed with zero mean and b × b co-variance matrix R. In general, $b \neq pd$, as not all components of the underlying process are observed. A special case arises when reformulating a VAR[p] as a VAR[1], where only the first d hidden states are observed. Then, the state space model (SSM) can be written as [26,31]

$$x_t = A\, x_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, Q),$$
$$y_t = C\, x_t + \eta_t, \qquad \eta_t \sim \mathcal{N}(0, R).$$

To sum up, the SSM consists of two equations: an equation that describes the dynamics of the process, and an observation equation that models the observation function and the observational noise.
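As an illustration of the two SSM equations, the following minimal sketch simulates a VAR[p] process in its VAR[1] (stacked) form and observes only the first d hidden states through additive Gaussian noise. The function names `companion` and `simulate_ssm`, the zero initial state, and all parameter values are assumptions made for this example; they are not taken from the manuscript.

```python
import numpy as np

def companion(A_list):
    """Stack VAR[p] matrices A(1..p), each d x d, into one pd x pd VAR[1] matrix."""
    p = len(A_list)
    d = A_list[0].shape[0]
    A = np.zeros((p * d, p * d))
    A[:d, :] = np.hstack(A_list)           # first block row holds A(1), ..., A(p)
    A[d:, :-d] = np.eye((p - 1) * d)       # shift blocks copy past states down
    return A

def simulate_ssm(A_list, Q, R, n, seed=0):
    """Simulate hidden states x_t and noisy observations y_t (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    p, d = len(A_list), A_list[0].shape[0]
    A = companion(A_list)
    # Observation matrix C: only the first d components of the stacked state are seen.
    C = np.hstack([np.eye(d), np.zeros((d, (p - 1) * d))])
    x = np.zeros(p * d)                    # assumed zero initial state
    xs, ys = np.empty((n, d)), np.empty((n, d))
    for t in range(n):
        eps = np.zeros(p * d)
        eps[:d] = rng.multivariate_normal(np.zeros(d), Q)      # driving noise
        x = A @ x + eps                                        # state equation
        xs[t] = x[:d]
        ys[t] = C @ x + rng.multivariate_normal(np.zeros(d), R)  # observation equation
    return xs, ys
```

Calling `simulate_ssm([0.5 * np.eye(2), -0.3 * np.eye(2)], np.eye(2), 0.1 * np.eye(2), 500)` would give a two-dimensional VAR[2] example in which the difference between `xs` and `ys` is exactly the observational noise.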

Model Order Selection Criterion AIC i
In time-series applications, before conducting the analysis, an appropriate model order must be chosen to characterize the collected data. The most widely used criterion for this determination is the so-called Akaike information criterion (AIC), introduced by Akaike in 1974 [64]. This criterion is considered a data-driven selection method. The AIC can be obtained by evaluating:

$$\mathrm{AIC} = -2 \log f(y \mid \hat{\Theta}_k) + 2k,$$

where $\hat{\Theta}_k$ denotes the parameter estimate that is obtained by maximizing the likelihood function for the model, $f(y \mid \hat{\Theta}_k)$ is the corresponding maximum, and k is the number of estimated parameters. Therefore, $f(y \mid \hat{\Theta}_k)$ represents the resulting empirical likelihood. In addition, the AIC provides insight into how close a fitted model is to the underlying generating (or true) model; this approach might suit some models, but not all. To this end, extending the original work of Akaike, Sugiura (1978) [65] proposed the $\mathrm{AIC}_c$, which is a corrected version of AIC developed in the context of regression models with normal errors. In such a setting, the $\mathrm{AIC}_c$ can be obtained by evaluating:

$$\mathrm{AIC}_c = -2 \log f(y \mid \hat{\Theta}_k) + \frac{2Tk}{T-k-1},$$

where $\frac{2Tk}{T-k-1}$ is the bias correction term and T is the sample size. The effectiveness of the $\mathrm{AIC}_c$ motivates the need for an improved variant of AIC for state-space models, as has been demonstrated in [66]. This variant is based on an idea presented by Hurvich, Shumway, and Tsai (1990) [67], in the context of autoregressive models. It is known as the "improved" Akaike information criterion ($\mathrm{AIC}_i$), which can be obtained by evaluating:

$$\mathrm{AIC}_i = -2 \log f(y \mid \hat{\Theta}_k) + \hat{B}_T(k, \Theta_s),$$

where the penalty term $\hat{B}_T(k, \Theta_s)$ serves as a Monte-Carlo approximation [67,68]. The development of the $\mathrm{AIC}_i$ for state-space applications, as well as its performance, has been investigated in [68] through a simulation study. That study also compared the performance of the AIC, $\mathrm{AIC}_c$, $\mathrm{AIC}_i$, and other criteria, and found that the $\mathrm{AIC}_i$ outperformed the others in the context of SSM, where it provides the true model order.
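The two closed-form criteria above can be evaluated directly once the maximized log-likelihood is available. The sketch below is only illustrative: the function names are hypothetical, and the $\mathrm{AIC}_i$ is deliberately not implemented, because its penalty term requires a Monte-Carlo simulation whose details are given in [67,68] rather than in this section.

```python
def aic(log_lik, k):
    """AIC = -2 log f(y | Theta_hat_k) + 2k, with log_lik the maximized log-likelihood."""
    return -2.0 * log_lik + 2.0 * k

def aicc(log_lik, k, T):
    """AICc = -2 log f(y | Theta_hat_k) + 2Tk / (T - k - 1), with T the sample size."""
    if T - k - 1 <= 0:
        raise ValueError("AICc requires T > k + 1")
    return -2.0 * log_lik + 2.0 * T * k / (T - k - 1)
```

For small samples the correction term is noticeably larger than 2k, and it approaches 2k as T grows, so the two criteria agree asymptotically.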
As the $\mathrm{AIC}_i$ is utilized in this manuscript, the log returns are used. The price of an asset at time 0 is denoted by $P_0$, and the price of an asset at time T is denoted by $P_T$. The log-return formula is given by:

$$r = \ln\!\left(\frac{P_T}{P_0}\right).$$

To this end, in this manuscript, for state-space models, the $\mathrm{AIC}_i$ is utilized to match the requirements of such models and provide a more accurate model-order selection.
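A short sketch of the log-return computation for a whole price series is given below. The function name `log_returns` is hypothetical, and applying the formula to consecutive observations (one-step returns) is an assumption of this example.

```python
import numpy as np

def log_returns(prices):
    """One-step log returns ln(P_t / P_{t-1}) for a strictly positive price series."""
    prices = np.asarray(prices, dtype=float)
    if np.any(prices <= 0):
        raise ValueError("prices must be strictly positive")
    return np.diff(np.log(prices))
```

Because logarithms add, the sum of the one-step log returns over a window equals the log return between the first and last price of that window.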

Expectation-Maximisation (EM) Algorithm and Kalman Filter (KF)
In this subsection, the expectation-maximisation (EM) algorithm and the Kalman filter, which are used to estimate the state-space model parameters, are presented [27,69,70]. The EM algorithm is an iterative scheme which consists of two steps: the expectation step and the maximization step (see Figure 1). In the expectation step, conditional expected values of the hidden state x(t) and its covariance P(t) are obtained using the Kalman filter. In the maximization step, based on these values, the expected value of the likelihood is maximized with respect to the parameters, which results in a new set of parameters that is used in the next iteration of the EM algorithm [26]. In the first iteration of the EM algorithm, the parameters P(1) need to be initialized. For instance, least-squares parameter estimates can be used for this purpose.
In other words, the expectation-maximization (EM) algorithm provides an iterative maximum likelihood estimator for the parameters in the state-space model (SSM) [28,30]. This EM algorithm for SSM is based on the so-called Kalman filter [29]. This filter is utilized to obtain estimates of the hidden states. The state estimates are then used to improve the estimates of the process parameters [26,27].
To introduce the Kalman filter, a measurement time series containing n observations is assumed, with time t = 1, . . . , n used to reference these observations [26]. The conditional expectations are defined as [31]

$$x_t^{s} = E[x_t \mid y_1, \dots, y_s],$$
$$P_{t_1,t_2}^{s} = E\!\left[(x_{t_1} - x_{t_1}^{s})(x_{t_2} - x_{t_2}^{s})^{T}\right] = E\!\left[(x_{t_1} - x_{t_1}^{s})(x_{t_2} - x_{t_2}^{s})^{T} \mid y_1, \dots, y_s\right], \tag{8}$$

with $P_t^{s}$ written for $t_1 = t_2 = t$. The subscript denotes the estimation time point, while the superscript denotes the measurement up to which the estimate is conditioned. The equality in Equation (8) holds if the underlying process is Gaussian, which is assumed here. The Kalman filter is described in terms of a set of equations which provide an efficient recursive way to estimate the state of the SSM process while minimizing the mean of the squared error [26]. The Kalman-filter equations are [71]

$$x_t^{t-1} = A\, x_{t-1}^{t-1}, \tag{10}$$
$$x_t^{t} = x_t^{t-1} + K_t \left(y_t - C\, x_t^{t-1}\right), \tag{11}$$
$$P_t^{t-1} = A\, P_{t-1}^{t-1} A^{T} + Q, \tag{12}$$
$$P_t^{t} = \left(I - K_t C\right) P_t^{t-1}, \tag{13}$$
$$K_t = P_t^{t-1} C^{T} \left(C\, P_t^{t-1} C^{T} + R\right)^{-1}, \tag{14}$$

with initial values $x_0^{0} = \mu$ and $P_0^{0} = \Sigma$. The Kalman filter works as a recursive cycle of a time-update and a measurement-update step [72]. The time update, in Equations (10) and (12), predicts the state from time t − 1 to t, which results in the prior estimate $x_t^{t-1}$ and its co-variance $P_t^{t-1}$ [26]. The measurement-update step consists of Equations (11) and (13), and it corrects the prior estimates by taking into account the current prediction $x_t^{t-1}$, the measurement $y_t$ and the Kalman-filter gain, in Equation (14); this leads to the posterior estimates [26]. The Kalman-filter equations form a recursive scheme, as only the observations and estimates from the past and the present are used [73]. Applying the steps of the EM algorithm together with the Kalman filter iteratively ensures convergence to the best estimator of the underlying dynamics and the parameters of the process [27].
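The recursive time-update and measurement-update cycle described above can be sketched in a few lines. This is a minimal illustration under the linear Gaussian SSM with known parameters; the function name `kalman_filter` and its argument layout are assumptions of this example, and the sketch does not include the smoothing pass or the EM parameter updates.

```python
import numpy as np

def kalman_filter(y, A, C, Q, R, mu, Sigma):
    """Run the Kalman filter on observations y of shape (n, b).

    Returns filtered means/covariances and one-step predictions.
    mu and Sigma are the initial state mean and covariance.
    """
    n, m = y.shape[0], A.shape[0]
    x_filt, P_filt = np.empty((n, m)), np.empty((n, m, m))
    x_pred, P_pred = np.empty((n, m)), np.empty((n, m, m))
    x_prev, P_prev = mu, Sigma
    for t in range(n):
        # Time update: predict the state and its covariance from t-1 to t.
        xp = A @ x_prev
        Pp = A @ P_prev @ A.T + Q
        # Kalman gain from the predicted covariance.
        K = Pp @ C.T @ np.linalg.inv(C @ Pp @ C.T + R)
        # Measurement update: correct the prediction with the new observation.
        xf = xp + K @ (y[t] - C @ xp)
        Pf = (np.eye(m) - K @ C) @ Pp
        x_pred[t], P_pred[t], x_filt[t], P_filt[t] = xp, Pp, xf, Pf
        x_prev, P_prev = xf, Pf
    return x_filt, P_filt, x_pred, P_pred
```

As a sanity check on the recursion, a constant hidden state (A = 1, Q = 0) observed with unit noise and a very diffuse prior should yield filtered means close to the running average of the observations.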
Maximum-likelihood estimation (MLE) is one of the most effective approaches to fit model parameters to data. The likelihood is a function which describes the probability of the recorded data given the model parameters. Maximizing the likelihood function yields the model parameters under which the observed time series is most likely. An iterative maximum-likelihood estimator of the SSM parameters is derived in [26], starting from the complete-data log-likelihood. Since the hidden states $x_t$ are unknown, only the expected value of the log-likelihood conditioned on $y_1, \dots, y_n$, denoted $G(\Theta)$, is accessible. The quantities required in Equation (17) are the results of the Kalman filter of the r-th EM iteration. To maximize $G(\Theta)$, its derivative is set to zero, which leads to the update rules for the parameters. The update of µ is $x_0^{n}$ of the last iteration of EM. If the measurement is corrected for the mean, then the initial value of µ for the first EM iteration is set to zero. In addition, the initial value of the co-variance of the process, Σ, can be estimated or set to a reasonable baseline value [71]. Moreover, the likelihood never decreases from one iteration to the next; therefore, no adjustment of the step size is needed [28].
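For orientation, the standard textbook form of these quantities for the linear Gaussian SSM (as in the Shumway and Stoffer formulation) is sketched below. It is offered as an assumption about the general shape of the derivation, not as a reproduction of the manuscript's own equations: the abbreviations $S_{11}$, $S_{10}$, $S_{00}$ are introduced here for illustration, and the smoothed quantities $x_t^{n}$, $P_t^{n}$, $P_{t,t-1}^{n}$ denote estimates conditioned on all n observations.

```latex
% Complete-data log-likelihood (up to an additive constant)
\ln L_{X,Y}(\Theta) =
  -\tfrac{1}{2}\ln|\Sigma| - \tfrac{1}{2}(x_0-\mu)^{T}\Sigma^{-1}(x_0-\mu)
  -\tfrac{n}{2}\ln|Q| - \tfrac{1}{2}\sum_{t=1}^{n}(x_t - A x_{t-1})^{T} Q^{-1}(x_t - A x_{t-1})
  -\tfrac{n}{2}\ln|R| - \tfrac{1}{2}\sum_{t=1}^{n}(y_t - C x_t)^{T} R^{-1}(y_t - C x_t)

% Expected value given all observations
G(\Theta) = E\left[\ln L_{X,Y}(\Theta) \mid y_1,\dots,y_n\right]

% Abbreviations (names assumed for this sketch)
S_{11} = \sum_{t=1}^{n}\left(x_t^{n}\,x_t^{n\,T} + P_t^{n}\right), \quad
S_{10} = \sum_{t=1}^{n}\left(x_t^{n}\,x_{t-1}^{n\,T} + P_{t,t-1}^{n}\right), \quad
S_{00} = \sum_{t=1}^{n}\left(x_{t-1}^{n}\,x_{t-1}^{n\,T} + P_{t-1}^{n}\right)

% Update rules obtained by setting the derivative of G to zero
A^{(r+1)} = S_{10}\,S_{00}^{-1}, \qquad
Q^{(r+1)} = \tfrac{1}{n}\left(S_{11} - S_{10}\,S_{00}^{-1}\,S_{10}^{T}\right),
R^{(r+1)} = \tfrac{1}{n}\sum_{t=1}^{n}\left[(y_t - C x_t^{n})(y_t - C x_t^{n})^{T} + C\,P_t^{n}\,C^{T}\right], \qquad
\mu^{(r+1)} = x_0^{n}
```

In this form, each EM iteration only needs the conditional means and covariances of the hidden states, which is why the filtering (and smoothing) pass and the parameter update can simply alternate.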
In summary, applying the EM algorithm together with the Kalman filter is a robust iterative procedure to estimate model parameters in the SSM, in addition to de-noising the time-series data. The main drawback of this approach is its high computational burden.

Granger Causality in the Time Domain: Directed Partial Correlation (DPC) [74]
In order to provide a time-domain measure based on the concept of Granger causality, the directed partial correlation (DPC) was introduced by Eichler (2005) [56]. One of the most useful features of DPC is that it can be used as a measure of causal-effect strength [56]. When inferring causal relationships from time-series data, VAR[p] models can be fitted using least-squares estimation [75], which is utilized in this manuscript. For observations $x_V(1), \dots, x_V(T)$ from a d-dimensional multiple time series $x_V$, we obtain a vector autoregressive (VAR) model with the following representation:

$$x_V(t) = \sum_{h=1}^{p} A(h)\, x_V(t-h) + \varepsilon(t),$$

where $x_V(t)$ is the vector that represents the entire set of observed processes. Now, let $\hat{R}_p = (\hat{R}_p(h, \nu))_{h,\nu=1,\dots,p}$ be the pd × pd matrix composed of the sub-matrices [56]

$$\hat{R}_p(h, \nu) = \frac{1}{T-p} \sum_{t=p+1}^{T} x_V(t-h)\, x_V(t-\nu)^{T}.$$

Similarly, $\hat{r}_p$ is set to be $\hat{r}_p = (\hat{R}_p(0, 1), \dots, \hat{R}_p(0, p))$. Then, the least-squares estimates of the autoregressive coefficients are given by

$$\hat{A}(h) = \sum_{\nu=1}^{p} \hat{R}_p(0, \nu)\, \left(\hat{R}_p^{-1}\right)(\nu, h),$$

where h = 1, . . . , p, and the covariance matrix Σ is estimated by

$$\hat{\Sigma} = \frac{1}{T} \sum_{t=p+1}^{T} \hat{\varepsilon}(t)\, \hat{\varepsilon}(t)^{T}, \qquad \hat{\varepsilon}(t) = x_V(t) - \sum_{h=1}^{p} \hat{A}(h)\, x_V(t-h),$$

where $\hat{\varepsilon}(t)$ are the least-squares residuals. However, the coefficients $A_{ij}(h)$ depend on the unit of measurement of $x_i$ and $x_j$; thus, they are not suitable for comparisons of the strength of causal relationships between variables [56]. Therefore, Eichler (2005) [56] proposed DPC as a measure of the strength of causal relationships. For h > 0, the DPC $\pi_{ij}(h)$ is defined as the correlation between $x_i(t)$ and $x_j(t-h)$ after removing the linear effects of the other variables included in the vector $x_V$. For h < 0, $\pi_{ij}(h) = \pi_{ij}(-h)$. In addition, it has been shown in [56] that, for h > 0, estimates for the DPC $\pi_{ij}(h)$ can be obtained from the parameter estimates of a VAR[p] model by re-scaling the coefficients $\hat{A}_{ij}(h)$ with a normalisation term $\hat{\rho}$ (see [56]), in which the matrix $\hat{K} = \hat{\Sigma}^{-1}$ is the inverse of the estimated covariance matrix $\hat{\Sigma}$ of the residual noise processes.
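The least-squares step that precedes the DPC re-scaling can be sketched as follows. This example solves the regression with `numpy.linalg.lstsq` rather than by forming the moment matrices explicitly, which gives the same coefficient estimates; the function name `fit_var_least_squares` is hypothetical, the 1/T normalisation of the residual covariance is an assumption of this sketch, and the DPC re-scaling itself is not implemented here.

```python
import numpy as np

def fit_var_least_squares(x, p):
    """Least-squares fit of a VAR[p] model to data x of shape (T, d).

    Returns A_hat with shape (p, d, d), where A_hat[h-1] is the lag-h matrix,
    the residual covariance Sigma_hat (d, d), and the residuals (T-p, d).
    """
    T, d = x.shape
    # Each regressor row is [x(t-1), ..., x(t-p)] for the target x(t).
    Z = np.hstack([x[p - h: T - h] for h in range(1, p + 1)])
    Y = x[p:]
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)          # shape (p*d, d)
    # Rearrange so that A_hat[h-1][i, j] multiplies x_j(t-h) in the equation for x_i(t).
    A_hat = B.T.reshape(d, p, d).transpose(1, 0, 2)
    resid = Y - Z @ B
    Sigma_hat = resid.T @ resid / T                    # assumed 1/T normalisation
    return A_hat, Sigma_hat, resid
```

The inverse of `Sigma_hat` would then play the role of the matrix K in the re-scaling described in [56].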
To decide the significance of an estimated causal influence, we use a statistical evaluation procedure based on bootstrapping to construct the confidence interval, as follows [74]. First, B bootstrap surrogates (resamples) with the same length as the original data are generated. A rough minimum of 1000 bootstrap surrogates is often sufficient to compute accurate confidence intervals, as has been suggested by Efron and Tibshirani [76]. Here, B is set to 10,000. The surrogates are generated using a non-parametric method, the amplitude-adjusted Fourier transform (AAFT), which was originally proposed by Theiler et al. (1992) [77,78]. This method works under the null hypothesis that the original data are generated from a stationary, Gaussian and linear stochastic process [79]. The algorithm for generating the surrogates is described as follows [79,80]:
(1) The original data are re-scaled to a normal distribution. This is based on a simple rank ordering: a time series with Gaussian distribution is generated and then sorted according to the ranks of the original data.
(2) A Fourier-transformed surrogate of the re-scaled data is constructed.
(3) The final surrogate is scaled to the distribution of the original data by sorting the original data according to the ranking of the Fourier-transformed surrogate.
The use of this algorithm is advantageous as it preserves the distribution, as well as approximately preserving the power spectrum (i.e., the autocorrelation structure), of the original data [79,80]. For the implementation of the AAFT method, we used the Tisean package (for details about the Tisean package, we refer to http://www.mpipksdresden.mpg.de/tisean/) [78]. Note that the Tisean program performs the algorithm described above, iteratively, until no further improvement can be made [78]. If the DPC value estimated from the original time series lies outside the confidence interval, then the value is considered to be significantly different from zero.
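The three surrogate-generation steps can be sketched in NumPy as shown below. This is a single-pass AAFT written for illustration only; it is not the Tisean implementation used in the manuscript, which additionally iterates the procedure, and the function name `aaft_surrogate` is hypothetical.

```python
import numpy as np

def aaft_surrogate(x, rng):
    """Return one amplitude-adjusted Fourier transform surrogate of the 1-D series x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Step 1: Gaussian series re-ordered so that it has the same rank order as x.
    ranks_x = np.argsort(np.argsort(x))
    gauss = np.sort(rng.standard_normal(n))[ranks_x]
    # Step 2: phase-randomised Fourier surrogate of the Gaussianised series.
    spec = np.fft.rfft(gauss)
    phases = rng.uniform(0.0, 2.0 * np.pi, spec.size)
    phases[0] = 0.0                      # keep the mean (zero-frequency term) real
    if n % 2 == 0:
        phases[-1] = 0.0                 # keep the Nyquist term real for even n
    shuffled = np.fft.irfft(spec * np.exp(1j * phases), n)
    # Step 3: re-order the original values to follow the ranking of the surrogate.
    ranks_s = np.argsort(np.argsort(shuffled))
    return np.sort(x)[ranks_s]
```

Since the output is a re-ordering of the original values, the amplitude distribution is preserved exactly, while the power spectrum is preserved only approximately. Repeating the call B times with the same `rng` would give the surrogate ensemble from which the confidence interval is built.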

Degree-Centrality Measures
In this subsection, degree-centrality measures are described. Degree centrality corresponds to the total number of connections linked to a node of a network [82]. Degree centrality has two main measures: in-degree and out-degree. In-degree refers to the number of connections that point inward at a node, while out-degree refers to the number of connections that originate at a node and point outward to other nodes [83]. In this manuscript, both measures are used. The in-degree measure identifies the most affected market indices, while the out-degree measure identifies the most influential market indices. This differentiation is crucial for investors and market players in the decision-making process related to investment portfolios.
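A minimal sketch of the two measures for a weighted, directed interaction matrix is given below. The convention that `W[i, j]` holds the strength of the link from node i to node j, the function name `degree_centrality`, and the example matrix are assumptions of this illustration; the default threshold of 0.65 mirrors the strength cut-off used for the networks shown in this manuscript.

```python
import numpy as np

def degree_centrality(W, threshold=0.65):
    """Out- and in-degree of each node, counting only links with strength >= threshold.

    W[i, j] is taken to be the strength of the directed link i -> j (assumed convention).
    """
    W = np.asarray(W, dtype=float)
    adj = W >= threshold
    np.fill_diagonal(adj, False)          # ignore self-links
    out_degree = adj.sum(axis=1)          # links leaving each node
    in_degree = adj.sum(axis=0)           # links arriving at each node
    return out_degree, in_degree
```

Under this convention, nodes with the largest out-degree would be read as the most influential indices, and nodes with the largest in-degree as the most affected ones.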

Application to American Stock Markets-Subprime Mortgage Crisis (2007-2008)
This section presents the results of applying the ARE algorithm to American stock markets. American stock-market time-series data are introduced in the first part of this section. Before estimating the parameters, the model order must be obtained. To this end, a model order selection criterion is utilized in the second part of this section. The final part of this section presents the results and our conclusions.

Data
The data comprised 41 American stock-market indices for 14 countries. Therefore, the sample consisted of 41 data sets, each of which had 1417 observations. The indices for the markets of respective countries are displayed in

Model-Order Selection Criterion AIC i
In this subsection, the results of employing the $\mathrm{AIC}_i$ criterion to calculate the SSM model order are presented in Table 1. It can be seen that the true model order corresponded to the largest $\mathrm{AIC}_i$ value, which means that the optimal chosen order was three for the estimation process of the autoregressive coefficients for the SSM. Knowing the true model order enables accurate estimation of the autoregressive coefficients (i.e., $\alpha_1$, $\alpha_2$, and $\alpha_3$) of the SSM by the EM-KF scheme. These three autoregressive coefficients were estimated for each time period studied.

Table 1. Results for order selection using $\mathrm{AIC}_i$. The table shows that the optimal order for the SSM is three. More precisely, the true model order corresponds to the largest $\mathrm{AIC}_i$ value.

Results
The numerical algorithm ARE was applied to the 41 stock-market time-series data sets. The main objective of utilizing the ARE algorithm was to observe the pattern and the tendency of the market's movements, in order to distinguish between different crisis states.
In other words, the focus of the ARE algorithm was to observe the general pattern of the market flow and how the markets move from one state to another over time (2006 to 2010). This allows for tracking market motions, for the purpose of early-warning detection of any unusual specific pattern. This tool is well suited to knowledge discovery in data sets, as it determines the grouping structure in time-series data [26,27,35,69]. The ARE analysis was conducted for each of the above-mentioned periods (detailed in Section 3.1) separately, with no overlap between them, in order to demonstrate how the topology of the constructed interaction surfaces of the stock markets under study changed from one state to another. Furthermore, the DPC technique was employed to identify the most affected markets, as well as to determine the entire causal interaction structure. In the following, the discussion of each reconstructed interaction space, as well as the corresponding causal interaction structure, is presented. Note that the main interest of conducting DPC analysis was to draw conclusions regarding the interaction structures among the most affected markets, which are strongly interconnected. As only strong interactions were of interest here, only the interconnectivity links with a strength larger than or equal to 0.65 are shown. For the ARE constructed surface, each interaction surface was constructed based on the three leading estimated autoregressive parameters (i.e., $\alpha_1$, $\alpha_2$, and $\alpha_3$). These estimated parameters are the coordinates of the point that corresponds to each stock market, at every point in time. The three-dimensional parameter spaces are shown as snapshots (i.e., time frames) representing the motion and the behavior of the markets over each period separately.
More precisely, the estimation process resulted in a sequence of different sets of parameter values describing the state of each point, which represents each market in the parameter space. For a smoother view of the constructed surfaces, they are presented as heat maps, according to the density reflecting the interaction levels among markets.
The results of conducting the ARE algorithm, when there was no crisis, are presented in Figure 3, which shows the inferred interaction parameter space during the time period 1 January 2006 to 30 June 2006. This space was reconstructed based on the three auto-correlation coefficients estimated using the EM-KF scheme. This estimate determines the coordinates of the position of each market. This, in turn, provides the pattern of the market's movements. The color bar shows the heat map, representing the density where markets are positioned in the same place. In other words, the redder the color, the higher the density. As such, the strong interactions among markets are found only in the yellow-red regions presented in Figure 3, while the low and the medium interactions are found in the blue regions. Finally, the white regions represent no interaction.
On the other hand, in order to identify which markets were strongly interacting, DPC analysis was conducted. Figures 4 and 5 show the inferred interaction network structures corresponding to the two yellow-red regions in Figure 3. Figure 4 reflects the underlying constructed interaction network structure corresponding to the first yellow-red region, which is located on the lower left side of the surface in Figure 3, while Figure 5 reflects the underlying constructed interaction network structure corresponding to the second yellow-red region, which is located on the top-right side of the surface in Figure 3. In Figures 4 and 5, the color of the nodes corresponds to the country indices (see Table A1 in Appendix A). In addition, the thickness of the arrows refers to the strength of interactions among markets. Note that, in this study, we focus only on interaction parameters equal to 0.65 and above, which reflect the strongest interactions. Each node represents the name of the market index. Here, there were four U.S. markets (nodes 3, 4, 5 and 6), one Panama market (node 2) and one Canada market (node 1). According to Figure 4, which represents region 1 of Figure 3, there was a strong interaction between U.S. market indices. This formed a community of strongly interacting U.S. market indices with an influence on one of Brazil's market indices. It can be observed that the link (8 (Brazil) → 5 (U.S.)) was present only due to the strong interaction among U.S. markets which, in turn, affects Brazil. This led to Brazil influencing one of the U.S. markets in return. On the other hand, Figure 5, which represents region 2 of Figure 3, demonstrates that the U.S. market indices strongly influenced both Panama and Canada markets. The link (2 (Panama) → 1 (Canada)) is present as a result of the strong influence of U.S. market indices on node 2 (Panama). The same situation occurs for the link (2 (Panama) → 6 (U.S.)), as this link is present due to the strong influence of U.S. markets on Panama markets. This also occurred for the link (1 (Canada) → 6 (U.S.)), appearing as a consequence of the strong influence of the U.S. market indices on node 1 (Canada).

Figure 3. The constructed stock-market three-dimensional interaction parameter space, which corresponds to period (a), representing the first half of 2006. This space was reconstructed based on the three estimated auto-correlation coefficients of the SSM, where the estimate determines the coordinates of the position of each market. The figure demonstrates the level of interaction, which differs from one region to another. Note that the density bar is divided into three parts (low, medium, and high), with the corresponding interaction coefficients for each part. Therefore, the high-density spots correspond to high interaction among markets.

Figure 4. The constructed stock-market interaction network structure. The constructed network reflects the interaction structure among markets corresponding to region 1 in the constructed space presented in Figure 3. The red-colored nodes correspond to U.S. stock markets, while the green-colored node corresponds to a stock market belonging to Brazil. The causal strength of interest to be represented in this manuscript is above 0.65, reflecting strong interactions. More precisely, three kinds of strongly connected causal links are presented here. The first are the dashed links, which correspond to a causal strength between 0.65 and 0.74; the second are the light-colored links, which correspond to a causal strength between 0.75 and 0.84; the third are the dark-colored links, which correspond to a causal strength between 0.85 and 0.95. The network shows that the majority of connected indices belong to U.S. markets.

Figure 5. The constructed stock-market interaction network structure. The constructed network reflects the interaction structure among markets which corresponds to region 2 in the constructed space presented in Figure 3. The red-colored nodes correspond to U.S. stock markets, the blue-colored node corresponds to a stock market belonging to Canada and the purple-colored node corresponds to a stock market belonging to Panama. Recall that the causal strength of interest to be represented in this manuscript is above 0.65. The network shows that the strong connectivity structure is captured between Panamanian and Canadian markets with U.S. markets.
In general, for the first time period, it can be observed that a small number of the markets were strongly interconnected, whereas the rest moved in a distributed manner over the constructed surface. In particular, the interaction surface formed two small communities of markets which were very close to each other.
For the second time period (1 July 2006 to 30 June 2007), ARE and DPC analyses were also conducted. The ARE results are presented in Figure 6, and it can be seen that almost all markets were settled in one particular region with high density. In addition, there were a small number of markets that did not belong to the high-density cluster. Furthermore, it can be seen that the density of the collective motion becomes lower in the middle of the surface and almost zero at the end of it. This indicates a special pattern, which can be considered a warning sign regarding a crisis that will happen at some point in the future. Furthermore, to identify which markets are the most strongly interconnected, a DPC analysis was conducted. The inferred interaction network structure that corresponds to the yellow-red region is presented in Figure 7. The strong interconnectivity structure among U.S. markets was clearly detected. The reason behind the appearance of links going out to nodes 13 (Brazil), 14 (Brazil), 15 (Canada), 16 (Canada), and 17 (Colombia) was the strong influence of all U.S. market indices on Brazilian, Canadian, and Colombian markets. This indicates that the Brazilian, Canadian, and Colombian markets will potentially be the most affected markets due to the U.S. home mortgage crisis. This conclusion is evidenced in Figure 10.

Figure 7. The constructed stock-market interaction network structure. The constructed network reflects the interaction structure among markets corresponding to the high-density region in the constructed space presented in Figure 6. The red-colored nodes correspond to U.S. stock markets, the blue-colored nodes correspond to stock markets belonging to Canada, the green-colored nodes correspond to stock markets belonging to Brazil and the brown-colored nodes correspond to stock markets belonging to Colombia. The network shows that nodes 14, 15, 16 and 17 are the most interacting market indices with U.S. markets.
For the third time period (1 July 2007 to 31 December 2007), Figure 8 presents the reconstructed surface, where the high-density cluster of markets moves collectively from one state to another. The direction of this motion is indicated by an arrow. Interestingly, this collective motion is known as "herding behavior" in the literature [85,86]. In order to identify the markets which were moving collectively, a DPC analysis was conducted, and the corresponding interaction network structure was reconstructed (see Figure 9). To distinguish between the most and least affected markets, the degree centrality measure was utilized. Table 2 provides the out-degree and the in-degree of each node presented in Figure 9. According to Table 2, the most influential nodes were 1, 5, 10, and 16, while nodes 3 and 8 were the most affected markets.
Table 2. Degree centrality. This table presents the out-degree and in-degree calculated for each node separately, following the node numbering of Figure 9. The table shows that nodes 1, 5, 10 and 16 are the most influential nodes in the interaction network, having the highest out-degree, while nodes 3 and 8 are the most affected nodes, having the highest in-degree.
Figure 8. The topology of the constructed stock-market three-dimensional interaction parameter space, which corresponds to period (c), the second half of 2007. Here, the whole market is in a state moving towards a crisis. The figure shows that there is a region in the middle of the space where the density is very high and wide, and which moves in a specific direction, as indicated with an arrow.
Figure 9. The constructed stock-market interaction network structure. The constructed network reflects the interaction structure among markets corresponding to the high-density region, which spans the red region along the directed arrow in the constructed space presented in Figure 8. The red-colored nodes correspond to U.S. stock markets, the blue-colored nodes correspond to stock markets belonging to Canada, the green-colored nodes correspond to stock markets belonging to Brazil and the brown-colored nodes correspond to stock markets belonging to Colombia. The network shows that all other markets are strongly interconnected with U.S. market indices.
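The degree-centrality step described above can be illustrated with a minimal sketch. It assumes the DPC network is available as a binary adjacency matrix in which entry (i, j) equals 1 when node i influences node j; the function name `degree_centrality` and the four-node toy network are hypothetical and are not taken from the paper's data.

```python
import numpy as np

def degree_centrality(adjacency):
    """Return (out_degree, in_degree) for a directed 0/1 adjacency matrix.

    Assumed convention: adjacency[i, j] == 1 means an edge from node i to node j.
    """
    a = np.array(adjacency, dtype=int)  # np.array copies, so the input is not modified
    np.fill_diagonal(a, 0)              # ignore self-loops
    out_degree = a.sum(axis=1)          # edges leaving each node (influence exerted)
    in_degree = a.sum(axis=0)           # edges entering each node (influence received)
    return out_degree, in_degree

# Hypothetical toy network with edges 0->1, 0->2, 1->2, 3->2.
toy_adjacency = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
out_degree, in_degree = degree_centrality(toy_adjacency)
most_influential = int(np.argmax(out_degree))  # node with the highest out-degree
most_affected = int(np.argmax(in_degree))      # node with the highest in-degree
print(out_degree, in_degree, most_influential, most_affected)
```

In this toy network, node 0 has the highest out-degree and node 2 the highest in-degree, mirroring how the most influential and the most affected markets are read off a degree-centrality table.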
The analysis for the fourth time period (1 January 2008 to 31 December 2008), in which the crisis broke out and reached its peak, is shown in Figure 10. It can be observed that most markets moved to a different state, with only a few exceptions. The figure also shows that a high density of markets settled at another position on the right side of the surface. This illustrates that the high-density cluster contained the vast majority of the markets, which were strongly interconnected with each other. In order to distinguish the most and the least affected markets, a DPC analysis was conducted; the result is shown in Figure 11, which shows that the markets which strongly interacted before the crisis happened (see Figure 6) are those that are affected here, forming a cluster (see Figure 10). It can be observed that nodes 3 and 6 were both influenced by nodes 5 and 11, while nodes 7 and 15 influenced nodes 3 and 6. In addition, node 16 (Canada) was seen to influence other markets; namely, nodes 9 (U.S.), 15 (Canada), and 17 (Colombia). To identify the most central influencing market indices, the degree-centrality measure was applied. The out-degree and in-degree of each node were calculated, and the results are given in Table 3. According to Table 3, which corresponds to Figure 11, nodes 1, 5, and 10 were the U.S. markets with the most influence on all the other markets. Furthermore, node 8 (U.S.) was the most affected U.S. market during the crisis, and it influenced node 15 (Canada) through nodes 2 and 4. In addition, node 3 (U.S.) was the second-most affected U.S. market, and it transmitted the crisis into Brazil (node 14) via node 10 (U.S.); node 14, in turn, affected node 13 (Brazil).
In summary, the crisis was transmitted from U.S. markets to Brazilian markets, which were the most affected markets during the crisis, followed by Canada and (the least affected) Colombia.
Table 3. Degree centrality. This table presents the out-degree and in-degree calculated for each node separately, following the node numbering of Figure 11. The table shows that nodes 1, 5, 10 and 16 are the most influential nodes in the interaction network, having the highest out-degree, while nodes 3, 6 and 8 are the most affected nodes, having the highest in-degree.
Figure 11. The constructed stock-market interaction network structure. The constructed network reflects the interaction structure among markets corresponding to the high-density region, which spans the red region along the directed arrow in the constructed space presented in Figure 10. The red-colored nodes correspond to U.S. stock markets, the blue-colored nodes correspond to stock markets belonging to Canada, the green-colored nodes correspond to stock markets belonging to Brazil and the brown-colored nodes correspond to a stock market belonging to Colombia. The network shows that nodes 13, 14, 15 and 16 are the market indices interacting most with U.S. markets.
For the final time period (1 January 2009 to 31 December 2010), after the crisis had finished, the surface returned to a state where no obvious pattern could be captured, and the markets were distributed all over the surface again (see Figure 12). Figure 12 shows that the topological structure of the surface changed and formed different clusters. To identify these clusters, the reconstructed interaction network structure based on DPC is presented in Figure 13. The first cluster contains nodes 2, 3, 7, and 9 (U.S. market indices), the second cluster contains nodes 5 and 6 (Canadian market indices), the third cluster contains nodes 12 and 13 (Brazilian market indices), and the final cluster contains nodes 1, 4, 8, 10, and 11 (U.S. market indices).
In comparison, Figures 6 and 10 show that all the markets were entirely connected to each other, forming obvious clusters, in contrast to the connectivity structure presented in Figure 12. Based on this connectivity structure, no specific pattern could be captured. However, U.S. markets continued their influence on Brazilian and Canadian markets after the crisis.
Figure 12. The constructed stock-market three-dimensional interaction parameter space after the crisis had ended, for period (e) (2009-2010). The figure shows that there is a region, presented as a red curve, where the topology of the density structure has changed from the ones observed before the crisis.
Figure 13. The constructed stock-market interaction network structure. The constructed network reflects the interaction structure among markets corresponding to the high-density region, which spans the red region along the directed arrow in the constructed space presented in Figure 12. The red-colored nodes correspond to U.S. stock markets, the blue-colored nodes correspond to stock markets belonging to Canada and the green-colored nodes correspond to stock markets belonging to Brazil. The network shows that there is no interesting pattern to be captured.
Based on the observed interaction pattern, as well as its corresponding structure in Figures 7 and 11, the following can be concluded. The market indices observed before the crisis, towards the crisis, and during the crisis were the same for the three phases, as confirmed by the same clustering pattern being observed in these figures.
In summary, the states in which the markets were not falling into a crisis, or where no potential crisis existed, are shown in Figures 3 and 12. Interestingly, comparing Table 2 (which refers to Figure 9) with Table 3 (which refers to Figure 11) shows that the most affected markets corresponded to nodes 3 and 8 in both cases. This conclusion indicates the importance of conducting DPC analysis together with calculating out-degree and in-degree measures, in order to provide a warning sign and identify which markets may be the most affected.
To sum up, Figure 14 illustrates the transition between the time before the crisis, period (b) (second half of 2006 to first half of 2007), and the time during the crisis, period (d) (2008); compare Figures 6 and 10. It shows that a transition occurs between two states, namely the time before the crisis and the crisis time, forming two holes of clusters. This indicates when the interactions among markets reach their peak, which, in turn, can be considered an indication of a potential crisis.

Application to American Stock Markets: COVID-19 (2020)
To show the robustness of the methodology presented in this manuscript, in this section further analysis is performed to cover one more crisis. We take the COVID-19 outbreak as another example to conduct the same analysis.

Data
The data sets utilized in this section are the same 41 American stock-market indices for 14 countries. The data were collected from the Yahoo Finance database, on the basis of daily closing prices [84]. It is known in the literature that the COVID-19 outbreak started on 20 February 2020 and reached its peak on 7 April 2020 [87-93]. For this reason, the analysis covered the years 2018-2021, which were divided into five periods, including (c) towards crisis: 1 November 2019 to 28 February 2020.

Model-Order Selection Criterion AIC_i
In this subsection, the results of employing the AIC_i criterion to calculate the SSM model order are presented in Table 4. It can be seen that the true model order corresponded to the largest AIC_i value, which means that the optimal chosen order was three for the estimation process of the autoregressive coefficients for the SSM model. Knowing the true model order enables accurate estimation of the autoregressive coefficients (i.e., α_1, α_2, and α_3) of the SSM by the EM-KF scheme. These three autoregressive coefficients were estimated for each time period studied.
Table 4. Results for order selection using AIC_i. The table shows that the optimal order for the SSM is three. More precisely, the true model order corresponds to the largest AIC_i value.
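To make the idea of order selection concrete, the following minimal sketch fits autoregressive models of increasing order to a simulated series and scores each order. It rests on several assumptions that are not from the paper: ordinary least squares stands in for the EM-KF scheme, the textbook AIC (n·log σ² + 2p, where the smallest value is preferred) stands in for the AIC_i criterion, whose exact definition and sign convention may differ, and the simulated AR(3) coefficients and function names are hypothetical.

```python
import numpy as np

def fit_ar_least_squares(x, order, max_order):
    """Least-squares AR(order) fit; all orders share one window so scores are comparable."""
    x = np.asarray(x, dtype=float)
    target = x[max_order:]
    # Column k-1 holds the series delayed by k steps, aligned with `target`.
    lags = np.column_stack(
        [x[max_order - k : len(x) - k] for k in range(1, order + 1)]
    )
    coeffs, *_ = np.linalg.lstsq(lags, target, rcond=None)
    residuals = target - lags @ coeffs
    return coeffs, float(np.mean(residuals ** 2))

def select_order_by_aic(x, max_order=6):
    """Return (best_order, scores) using the textbook AIC, where smaller is better."""
    n = len(x) - max_order
    scores = {}
    for p in range(1, max_order + 1):
        _, sigma2 = fit_ar_least_squares(x, p, max_order)
        scores[p] = n * np.log(sigma2) + 2 * p
    return min(scores, key=scores.get), scores

# Simulated AR(3) series with hypothetical coefficients (not the paper's estimates).
rng = np.random.default_rng(0)
true_alpha = np.array([0.4, -0.2, 0.3])
x = np.zeros(3000)
for t in range(3, len(x)):
    # x[t-3:t][::-1] is [x[t-1], x[t-2], x[t-3]]
    x[t] = true_alpha @ x[t - 3 : t][::-1] + rng.standard_normal()

best_order, aic_scores = select_order_by_aic(x)
alpha_hat, _ = fit_ar_least_squares(x, 3, 3)
print(best_order, alpha_hat)
```

Using one common estimation window for every candidate order keeps the residual variances directly comparable, so the penalty term alone decides between orders that fit almost equally well.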

Results
The same methodology is employed for each period separately, to show the possibility of detecting a warning sign of a potential crisis in the framework of COVID-19. The results are presented in Figure 15. The figure shows that the surfaces for state (a), no crisis, and state (e), after crisis, have completely different topologies from the rest of the surfaces. The surface presented in state (b), before crisis, can be considered a warning sign that there is a potential for crisis. The evidence is shown in state (d), crisis time, when the cluster has moved from one state into another. Thus, the conclusion drawn for the financial crisis resulting from the subprime mortgage crisis, presented in Section 3, also holds for early-stage detection of the financial crisis resulting from the COVID-19 outbreak.
These results provide evidence that there is the possibility of detecting a warning sign some time before the actual crisis happens. Further analysis can be carried out by conducting DPC analysis similar to the one conducted in Section 3.

Discussion and Conclusions
The prediction of stock-market crashes has attracted interest over the years. Several researchers have studied this phenomenon using different approaches; however, the identification of which markets will be affected during the crisis has not been studied properly in the literature.
In this manuscript, the behavior of stock markets was demonstrated using the ARE algorithm. Based on the estimated SSM model order, the three-dimensional interaction parameter spaces were reconstructed, and a change in the topology of these spaces served to identify state transitions. Specifically, the EM-KF algorithm provides a means of constructing a space which shows how close each market is to others. When observing a cluster of markets smoothly presented as a heat map, a high density refers to strong interactions among markets. This, in turn, means that these markets may be the most strongly affected during a crisis. This approach provides an insight into the idea of collective motions of large numbers of entities. In other words, the use of this algorithm is beneficial and advantageous in the case of having big data, as it presents the results in the sense of pattern recognition.
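The notion of a high-density region in the three-dimensional parameter space can be illustrated with a minimal sketch. It assumes each market is represented by a point (α_1, α_2, α_3) and measures density by counting neighbours within a fixed radius; this neighbour count is a simple stand-in for the smoothed heat map used in the paper, and the coefficient values, radius, and threshold below are hypothetical.

```python
import numpy as np

def neighbour_density(points, radius):
    """Count, for every point, how many other points lie within `radius`."""
    p = np.asarray(points, dtype=float)
    diff = p[:, None, :] - p[None, :, :]        # pairwise coordinate differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))    # pairwise Euclidean distances
    return (dist <= radius).sum(axis=1) - 1     # subtract 1 to exclude the point itself

# Hypothetical AR(3) coefficient triples for six markets (illustrative values only).
coefficients = [
    [0.50, 0.10, 0.05],
    [0.52, 0.11, 0.04],
    [0.49, 0.09, 0.06],
    [0.51, 0.10, 0.05],
    [0.10, 0.60, -0.20],
    [-0.30, 0.20, 0.40],
]
density = neighbour_density(coefficients, radius=0.05)
dense_markets = np.flatnonzero(density >= 2)    # markets inside the dense cluster
print(density, dense_markets)
```

Here the first four markets sit close together and form the dense cluster, while the last two are isolated; in the paper's terms, the clustered markets are the ones whose interactions would be examined further with DPC.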
To validate the methodology introduced, two crisis examples were studied. According to the analysis and results for both crises, there were two obvious state transitions; the first state refers to the time period before the crisis and the second state refers to the time period during the crisis. More precisely, the first state (before crisis) can be considered a warning sign of a potential crisis. In addition, in the corresponding interaction parameter spaces, only the high-density regions were analyzed, in order to identify the most strongly interacting markets specifically for the first state (before crisis). This means that both the most affected and most influential markets could be distinguished. To this end, DPC was utilized, such that interaction networks corresponding to these high-density regions could be reconstructed. As a first step, identifying the markets which could potentially succumb to a crisis is crucial. Furthermore, to distinguish between the most affected and the most influential markets, degree-centrality measures were used to calculate the in-degree and out-degree of each node in the reconstructed network.
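The step from time series to a directed interaction network can be sketched as follows. This is not the DPC estimator used in the paper: as a deliberately simpler stand-in in the same Granger-causality spirit, it fits a first-order vector autoregression by least squares and draws an edge from series j to series i when the corresponding lagged coefficient exceeds a threshold. The function name, the threshold, and the simulated three-series system are hypothetical.

```python
import numpy as np

def var1_network(series, threshold):
    """Directed network from a least-squares VAR(1) fit (a stand-in for DPC).

    Returns a boolean matrix where adjacency[j, i] is True when the lagged
    value of series j has a coefficient larger than `threshold` (in absolute
    value) in the equation for series i.
    """
    x = np.asarray(series, dtype=float)          # shape (T, n_series)
    past, present = x[:-1], x[1:]
    # lstsq solves past @ b = present, so b[j, i] links x_j(t-1) to x_i(t).
    b, *_ = np.linalg.lstsq(past, present, rcond=None)
    adjacency = np.abs(b) > threshold
    np.fill_diagonal(adjacency, False)           # ignore each series' own memory
    return adjacency

# Simulated system in which series 0 drives series 1; series 2 is independent.
rng = np.random.default_rng(1)
n_steps = 2000
noise = rng.standard_normal((n_steps, 3))
x = np.zeros((n_steps, 3))
for t in range(1, n_steps):
    x[t, 0] = 0.5 * x[t - 1, 0] + noise[t, 0]
    x[t, 1] = 0.3 * x[t - 1, 1] + 0.5 * x[t - 1, 0] + noise[t, 1]
    x[t, 2] = 0.4 * x[t - 1, 2] + noise[t, 2]

adjacency = var1_network(x, threshold=0.2)
print(adjacency.astype(int))
```

The resulting matrix follows the convention that a True entry in row j, column i is an edge from j to i, so row sums give out-degrees and column sums give in-degrees.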
These analysis results allow investors and market players to track those markets that are going through a potential crisis. In addition, they provide a warning sign of the potential time at which a crisis might occur. These results are expected to aid investors in improving the decision-making process in portfolio selection, allowing them to reduce the risk exposure associated with their portfolios. Furthermore, investors can also exclude or withdraw their investments from markets which are expected to go through a potential crisis, in order to protect their investments against loss. To sum up, this methodology allows for early-stage detection of a financial crisis.
Such analysis can not only be carried out for financial markets, but also for other systems. For example, in neuroscience, recognizing certain patterns can provide early warnings for brain diseases, one of the main objectives in this field. Another example is the study of climatic changes to observe and detect certain patterns, which can be useful in predicting a potential catastrophe.
Funding: This research received no external funding.

Data Availability Statement:
The dataset utilized in this manuscript is available and can be obtained from the Yahoo Finance website [84]. Note that, after clicking on the specified index, historical data should be chosen and the dates under study determined.

Country Index
United