Uncovering the Origins of Instability in Dynamical Systems: How Can the Attention Mechanism Help?

The behavior of a network and its stability are governed both by the dynamics of the individual nodes and by their topological interconnections. The attention mechanism, an integral part of many neural network models, was initially designed for natural language processing (NLP) and has so far shown excellent performance in combining the dynamics of individual nodes with the coupling strengths between them within a network. Despite its undoubted impact, it is not yet clear why some nodes of a network obtain higher attention weights. To arrive at more explainable solutions, we examined the problem from a stability perspective. Based on stability theory, negative connections in a network can create feedback loops or other complex structures by allowing information to flow in the opposite direction. These structures play a critical role in the dynamics of a complex system and can contribute to abnormal synchronization, amplification, or suppression. We hypothesized that the nodes involved in organizing such structures could push the entire network into instability modes and therefore need more attention during analysis. To test this hypothesis, the attention mechanism, along with spectral and topological stability analyses, was applied to a real-world numerical problem, i.e., the state-space model of a piezoelectric tube actuator.


Introduction
In many networks, specific nodes at critical positions act as drivers that push the system into particular modes of action [1]. Observing large-scale network catastrophes in sociological and biological systems, such as the widespread effects of epilepsy in brain networks, raises several questions: How does a chaotic regime start in a complex network? Where should we look for the origins or initiators of spreading in the network? Which nodes are most influential in driving changes in the network's dynamics? Why do these particular nodes have the ability to facilitate changes in the state of a system? Can imminent shifts in the network's dynamics be predicted before their onset to enhance preparedness? Answering these questions motivated us to explore how local structures within a network degrade stability and push the network into a catastrophic regime. This study leverages principles of stability theory and connects them to attention mechanisms in neural networks. An attention-enhanced graph convolutional network (AGCN) is used to classify the nodes of a network constructed from a dynamical system. After the AGCN is trained, an attention coefficient is extracted for each pair of nodes, and the nodes with higher attention coefficients are identified. To test our hypothesis, namely that the nodes with the potential to move the entire network into an unstable mode are those with higher attention coefficients, three different stability analyses are performed. Finally, the nodes with higher instability risk are identified and compared with those having higher attention coefficients.

Simulated Dynamical System
Dynamical systems can be stabilized by state feedback, which uses the state vector to control the system dynamics. This feedback mechanism can be applied to controllable states. Identifying the most important states can be very helpful in designing an optimal closed-loop control system. This study identifies important states using the attention mechanism and shows that these important nodes are the ones with a greater tendency toward instability.
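As a toy illustration of state feedback, the sketch below applies the control law u = -Kx to a hypothetical two-state system and compares open-loop and closed-loop eigenvalues. The matrices A and B and the gain K are invented for illustration and are not the actuator model; the sketch also assumes a continuous-time system, for which stability requires every eigenvalue to have a negative real part.

```python
import numpy as np

def closed_loop_eigenvalues(A, B, K):
    """Eigenvalues of A - B K, the closed-loop matrix under state feedback u = -K x."""
    return np.linalg.eigvals(A - B @ K)

# Hypothetical 2-state system (not the actuator model); its open loop is unstable.
A = np.array([[0.0, 1.0],
              [2.0, -1.0]])
B = np.array([[0.0],
              [1.0]])
K = np.array([[4.0, 2.0]])  # hand-picked gain that places the poles at -1 and -2

open_loop = np.sort(np.linalg.eigvals(A).real)
closed_loop = np.sort(closed_loop_eigenvalues(A, B, K).real)
print("open loop:", open_loop)
print("closed loop:", closed_loop)
```

Here the open-loop matrix has eigenvalues 1 and -2 (unstable), while A - BK has eigenvalues -1 and -2. In a real design, K would come from a pole-placement or LQR procedure rather than being picked by hand.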
One of the dynamical systems that requires a feedback mechanism to reach stability is the piezoelectric tube actuator, whose model serves as the real-world numerical example in this study. These actuators are frequently used in micro/nano-scale applications and are highly sensitive to uncertainties, including environmental variations. A piezoelectric tube actuator can be expressed by a linear multi-input multi-output (MIMO) state-space model [24]:

$$\dot{x}(t) = A\,x(t) + B\,u(t), \qquad y(t) = C\,x(t) + D\,u(t) \tag{1}$$

where A, B, C, and D are, respectively, the state, input, output, and feedforward matrices, and x, u, and y are, respectively, the state, input, and output vectors. Figure 1 shows the graph created from the dynamical system (1) by taking A as the adjacency matrix of a network. The random feature set in Table 1 was assigned to each node as its attributes. The matrices B, C, and D in (1) were not used to construct the graph; the adjacency matrix representing the network topology was built from A alone. This choice reflects the central role of A in the state-space representation: it describes how the state evolves over time, whereas the input, output, and feedforward matrices describe the relationships between the inputs, outputs, and states of the system and may not directly determine its dynamics.
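The graph construction can be sketched as reading every nonzero entry of A as a weighted edge. This minimal sketch assumes that entry A[i, j] is treated as an edge from state i to state j; dropping self-loops and rounding weights to two decimals only mirror how Figure 1 is drawn. The matrix A_toy is hypothetical; the actuator's actual A is not reproduced here.

```python
import numpy as np

def state_matrix_to_edges(A, drop_self_loops=True, decimals=2):
    """Weighted edge list (i, j, weight) for every nonzero entry A[i, j].

    Assumption: A[i, j] is read as an edge from state i to state j.
    Self-loop removal and rounding only mirror how Figure 1 is drawn.
    """
    A = np.asarray(A, dtype=float)
    edges = []
    for i, j in zip(*np.nonzero(A)):
        if drop_self_loops and i == j:
            continue
        edges.append((int(i), int(j), round(float(A[i, j]), decimals)))
    return edges

# Hypothetical 2-state matrix, not the actuator's A.
A_toy = np.array([[-1.0, 0.123],
                  [2.0, -3.0]])
print(state_matrix_to_edges(A_toy))  # [(0, 1, 0.12), (1, 0, 2.0)]
```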

Figure 1. Graph representation of the dynamical system in (1), taking A as the adjacency matrix. Matrix A represents the dynamics of the hidden states in the piezoelectric tube actuator model, and each node corresponds to one state. The output vector y and the matrices C and D have no impact on this graph. (Note: two decimal numbers and the self-loops were removed from the graph to make it easier to explore visually.)

Table 1. The random feature set assigned to each node as attributes (columns: Node Id, Feature Set).

Attention Mechanism
In this study, an attention-enhanced graph convolutional network (AGCN) composed of four modules was used for node classification. These modules are explained in Sections 2.2.1-2.2.4.

First Module: Initial Node Feature Embedding
The first module applies a self-attention operator to the nodes. This operator is a simple dot product (the node feature matrix multiplied by its transpose) that represents the relationships among node features; the intuition is to express how two feature vectors are related in the input space. The resulting weights are then used to take a weighted average over all the input vectors, as illustrated in Figure 2. The dot product of each pair of feature vectors gives their corresponding weight: features with matching signs contribute positive terms, features with opposite signs contribute negative terms, and the magnitude of the weight indicates how much that pair should contribute to the total score. Because these weights can take any value between negative and positive infinity, a Leaky ReLU followed by a row-wise SoftMax is applied so that the weights lie between zero and one and sum to one.
$$\omega_{\text{self}} = X X^{\top}, \qquad Y = \omega_{\text{self}}\, X, \qquad Y' = \operatorname{SoftMax}\big(\operatorname{LeakyReLU}(\omega_{\text{self}})\big)\, X$$
where X is the matrix of node features, ω_self denotes the self-attention weights, Y is the weighted average of the node features, and Y′ is the weighted average of features passed through the activation functions. This weighted average produces a new set of node features as the output of the self-attention operator, which forms the input to the next module.
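A minimal NumPy sketch of the first module follows. It assumes one reading of the text: the Leaky ReLU and SoftMax are applied row by row to the self-attention weights ω_self = XXᵀ before those weights are used to average the node features. The Leaky ReLU slope of 0.01 is also an assumption.

```python
import numpy as np

def leaky_relu(z, slope=0.01):
    """Leaky ReLU; the slope 0.01 is an assumed value."""
    return np.where(z > 0, z, slope * z)

def softmax_rows(z):
    """Row-wise SoftMax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def self_attention(X, slope=0.01):
    """Return (w_self, Y, Y_prime) for a node-feature matrix X of shape (n_nodes, n_features)."""
    w_self = X @ X.T                                       # pairwise dot products of feature vectors
    Y = w_self @ X                                         # raw weighted sum of node features
    Y_prime = softmax_rows(leaky_relu(w_self, slope)) @ X  # weights in [0, 1], each row sums to 1
    return w_self, Y, Y_prime
```

Each row of Y_prime is a convex combination of the node feature vectors, so it stays within the range spanned by the input features.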

Second Module: Learnable Attention Mechanism
Dynamics 2023, 3 219
The second module is a single-layer feedforward neural network parameterized by the attention weight vector ω_Att. In this module, the feature vectors of each pair of nodes in the new set (produced by the previous module) are concatenated and passed through Leaky ReLU and SoftMax operators. The goal is to extract an attention coefficient for each pair of nodes, which represents the importance of one node's features to those of another [26].
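$$\omega_{\alpha,ij} = \operatorname{SoftMax}_j\Big(\operatorname{LeakyReLU}\big(\omega_{\text{Att}}^{\top}\,[\,y_i \,\Vert\, y_j\,]\big)\Big)$$

where ω_Att is the attention weight vector, y is the weighted average of features passed through the activation functions (y_i and y_j are its rows for nodes i and j, and ∥ denotes concatenation), and ω_α is the attention coefficient matrix.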
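The second module resembles the attention layer of graph attention networks, and the sketch below assumes that form: a score ω_Attᵀ[y_i ∥ y_j] is computed for every ordered pair of nodes, passed through a Leaky ReLU (slope 0.2, an assumed value), and normalized with a SoftMax over j. Splitting ω_Att into two halves avoids explicitly concatenating every pair; no masking by the adjacency matrix is applied here.

```python
import numpy as np

def attention_coefficients(Y_prime, w_att, slope=0.2):
    """Attention coefficient matrix for all ordered node pairs (i, j).

    Y_prime: (n, f) node features produced by the first module.
    w_att:   (2f,) attention weight vector of the single-layer feedforward network.
    """
    f = Y_prime.shape[1]
    # w_att^T [y_i || y_j] = w_att[:f] . y_i + w_att[f:] . y_j
    left = Y_prime @ w_att[:f]                               # contribution of node i, shape (n,)
    right = Y_prime @ w_att[f:]                              # contribution of node j, shape (n,)
    scores = left[:, None] + right[None, :]                  # (n, n) raw pair scores
    scores = np.where(scores > 0, scores, slope * scores)    # Leaky ReLU
    scores = scores - scores.max(axis=1, keepdims=True)      # stabilize the exponentials
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)      # SoftMax over j for each i
```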

Third Module: Graph Convolution
The third module aggregates features from the neighbors of each node, which can be computed by multiplying the adjacency matrix by the feature matrix. The features of the node itself are as important as those of its neighbors, so an identity matrix is added to the adjacency matrix A to obtain a new adjacency matrix Ã = A + I. To prevent exploding or vanishing gradients caused by high-degree or low-degree nodes, and to reduce the sensitivity of the network to the scale of the input data, the matrix product is scaled according to the node degrees, once across rows and once across columns. This scaling places more weight on low-degree nodes and reduces the impact of high-degree nodes; the motivation is that nodes with low degrees have a greater influence on their neighbors, whereas nodes with high degrees have a lower effect because they spread their influence over many neighbors. Because the scaling is applied twice, each factor uses the inverse square root of the node degree. The influence of one node's features on the other nodes is also reflected by the product of the new adjacency matrix with the attention coefficient matrix. Finally, the graph convolution is completed by combining these modules into a forward model with a learnable weight matrix W.
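$$\tilde{A} = A + I, \qquad \tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}$$

where D̃ is the degree matrix of Ã, Ã is the adjacency matrix with added self-loops, and the degree-normalized adjacency matrix is D̃^(-1/2) Ã D̃^(-1/2).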
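The degree scaling in the third module can be sketched as the symmetric normalization D̃^(-1/2) Ã D̃^(-1/2). Because the state matrix A contains signed weights and its own self-loops, this sketch assumes that degrees are computed from the 0/1 off-diagonal connectivity pattern of A, with exactly one self-loop per node added through I. How the attention coefficients are combined with this matrix is deliberately left out.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetrically normalized adjacency D^{-1/2} (A_bin + I) D^{-1/2}.

    Assumption: only the 0/1 off-diagonal connectivity pattern of A is used,
    so signed weights and A's own diagonal do not affect the degrees.
    """
    A = np.asarray(A, dtype=float)
    A_bin = (A != 0).astype(float)
    np.fill_diagonal(A_bin, 0.0)               # drop existing self-loops
    A_tilde = A_bin + np.eye(A.shape[0])       # add exactly one self-loop per node
    deg = A_tilde.sum(axis=1)                  # row degrees, always >= 1
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Scale rows and columns: equivalent to diag(d) @ A_tilde @ diag(d)
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]
```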

Final Module: Backpropagation and Training
Backpropagation is used to update each weight in the attention layer (ω_Att) and the convolution layer (W) so that the actual output moves closer to the target output. To do this, the partial derivative of the error (the gradient) with respect to these weights is calculated. Note that the partial derivative of a SoftMax output with respect to its own input is output × (1 − output), while its derivative with respect to any other input j is −output × output_j. In the resulting gradients, W is the learnable weight matrix, Ã is the new adjacency matrix, ω_Att is the attention weight vector of the single-layer feedforward neural network, ω_α is the attention coefficient matrix for each pair of nodes, y_target is the target output, and y is the actual output.
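The SoftMax derivative can be checked numerically. This sketch computes the full SoftMax Jacobian, whose diagonal entries are output × (1 − output) and whose off-diagonal entries are −output_i × output_j, and compares it with a central finite-difference estimate on an arbitrary input vector.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def softmax_jacobian(z):
    """J[i, j] = d s_i / d z_j = s_i * (delta_ij - s_j)."""
    s = softmax(z)
    return np.diag(s) - np.outer(s, s)

# Central finite-difference check on an arbitrary input vector.
z = np.array([0.5, -1.0, 2.0])
eps = 1e-6
J_fd = np.column_stack([
    (softmax(z + eps * e_j) - softmax(z - eps * e_j)) / (2 * eps)
    for e_j in np.eye(len(z))
])
J = softmax_jacobian(z)
s = softmax(z)
print(np.allclose(J, J_fd, atol=1e-8))       # True
print(np.allclose(np.diag(J), s * (1 - s)))  # True: diagonal = output * (1 - output)
```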

Spectral Stability Analysis
The spectral stability of a network is governed by the eigenvalue of its adjacency matrix with the largest real part [27]: the network is stable as long as this largest eigenvalue remains negative. Our hypothesis is that the nodes that need more attention are those that can push the entire network into an unstable mode. To test this hypothesis and check the effect of each node on network stability, we examined how a perturbation in one column of the adjacency matrix [9] is reflected in its largest eigenvalue. The perturbation level ∆ was initially set to 0.5 and gradually increased to 3; Â denotes the adjacency matrix obtained after perturbing node 1 by ∆. Nodes for which the largest negative eigenvalue of Â moves toward zero as their perturbation level increases have the potential to push the entire network into an unstable mode.
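The spectral test can be sketched as a sweep over nodes and perturbation levels that tracks the eigenvalue with the largest real part (the spectral abscissa). The exact form of the column perturbation is not specified in the text, so this sketch assumes a hypothetical additive one: ∆ is added to every nonzero entry in column k, which keeps the graph's sparsity pattern. The levels run from 0.5 to 3 in steps of 0.5; the step size is also an assumption.

```python
import numpy as np

def spectral_abscissa(M):
    """Largest real part among the eigenvalues of M; negative means dx/dt = M x is stable."""
    return float(np.max(np.linalg.eigvals(M).real))

def perturb_column(A, k, delta):
    """Hypothetical perturbation: add `delta` to every nonzero entry in column k of A."""
    A_hat = np.array(A, dtype=float)          # copy, so A itself is untouched
    nonzero = A_hat[:, k] != 0
    A_hat[nonzero, k] += delta
    return A_hat

def perturbation_sweep(A, levels=np.arange(0.5, 3.01, 0.5)):
    """Spectral abscissa of A_hat for each node k and each perturbation level."""
    return {k: [spectral_abscissa(perturb_column(A, k, d)) for d in levels]
            for k in range(A.shape[0])}
```

Nodes whose lists climb toward (or past) zero as ∆ grows are the candidates for pushing the network toward instability.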

Topological Stability Analysis
How are the connections with positive and negative signs arranged within the network, and how do such arrangements affect network stability? Positive and negative signs are referred to, respectively, as synchronous and anti-synchronous correlations. According to structural balance theory [28], the stability of a three-entity system can be investigated through the signed association between two entities in the presence of a third party. This idea can be generalized to any signed network by considering its motifs (subgraphs) and the signed links within them. A motif is a recurring pattern of interconnections within the graph, formed by a subset of nodes with a path between each pair of nodes. The collective behavior of imbalanced motifs may push the network toward an unstable state. A motif is structurally imbalanced when the product of the signs on its edges is negative; for example, a triangle with two positive edges and one negative edge is imbalanced. In a signed graph, counting the number of imbalanced motifs therefore indicates how stable the network is. Figure 3 shows some examples of imbalanced arrangements. The influence of each node on network stability can be determined by the number of times the node appears in imbalanced motifs. To quantify this influence more precisely, a measure is defined that considers not only imbalanced motifs of different orders but also the weights of the paths that form a cycle within these motifs. For each node, and for each imbalanced motif of size 3 that includes that node, the weights along the motif's paths are multiplied, and these products are summed. The same procedure is repeated for the imbalanced motifs of sizes 4, 5, and 6.
The cube root of the absolute value of the product of these three calculations then gives the total cost associated with that node. In this measure, ω denotes the weight of the path between each pair of nodes within an imbalanced motif.
The corresponding terms refer, respectively, to the subsets of all possible imbalanced motifs of sizes 3, 4, 5, and 6 that include a specific node. D is the degree of the corresponding node, and W is the normalized sum of the products of motif path weights calculated for each node.
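The motif-based score can be sketched by treating motifs as simple cycles of a given size and enumerating them by brute force, which is feasible for a graph as small as the eight-state actuator network. Assumptions: the weight matrix w is symmetric (undirected edges, 0 meaning no edge), a cycle is imbalanced when the product of its edge weights is negative, each node's sum is normalized by the square of its degree as described in the Results, and total_cost simply takes the cube root of the absolute product of whatever per-size scores are passed in.

```python
import itertools
import numpy as np

def imbalanced_cycles(w, size):
    """Return (nodes, weight_product) for every simple cycle of `size` nodes whose
    edge-weight product is negative (an odd number of negative edges).

    Assumes `w` is a symmetric weight matrix in which 0 means "no edge".
    """
    n = w.shape[0]
    cycles = []
    for nodes in itertools.permutations(range(n), size):
        # Keep one representative per undirected cycle: start at the smallest
        # node and fix the traversal direction.
        if nodes[0] != min(nodes) or nodes[1] > nodes[-1]:
            continue
        edges = [(nodes[i], nodes[(i + 1) % size]) for i in range(size)]
        if any(w[i, j] == 0 for i, j in edges):
            continue
        product = float(np.prod([w[i, j] for i, j in edges]))
        if product < 0:
            cycles.append((nodes, product))
    return cycles

def motif_score(w, node, size):
    """Sum of weight products over imbalanced cycles containing `node`,
    normalized by the square of the node's degree (self-loops excluded)."""
    degree = int(np.count_nonzero(w[node])) - int(w[node, node] != 0)
    if degree == 0:
        return 0.0
    total = sum(p for nodes, p in imbalanced_cycles(w, size) if node in nodes)
    return total / degree**2

def total_cost(scores):
    """Cube root of the absolute value of the product of per-size scores."""
    return float(np.cbrt(abs(np.prod(scores))))
```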

Symmetry-Breaking Stability Analysis
In complex networks, symmetry breaking means that some nodes attract or transmit the flow of information more than other nodes because of the network dynamics or the presence of external stimuli. This can lead to the emergence of instability within a network. The phenomenon can occur through self-organization, when the nodes in a network interact in ways that form specific patterns or structures [29]. If a network experiences symmetry breaking, some nodes may begin to differentiate from other nodes and form distinct sub-networks. This process of differentiation can be thought of as a bifurcation, as it represents a sudden and significant change in the structure and behavior of the network. Symmetry breaking can be observed in nature, for example, when branching networks such as vascular systems and river basins evolve [30]. The differentiation can trigger a cascade of further differentiations within those sub-networks, and as these differentiations continue to cascade through the network, they can lead to the emergence of a chaotic regime.
As network dysfunction can depend on microscale structures and flow distributions [31], and spatial symmetry breaking is one way of studying patterns of information flow, this subsection aims to identify spreaders of instability in the network by exploring spatial symmetry-breaking behavior in local flow structures.
Inspired by flabellate (fan-shaped) forms [32], we allow more than two paths to branch off from each bifurcation point; in this study, these are called flabellate-shaped bifurcations. Depending on the polarity and strength of individual connections within this symmetry-breaking structure, a polarity transition can occur to form a fractal dipole (Figure 4). This topological polarity transition breaks the balance and has the potential to spread instability across the network. To find bifurcation nodes where symmetry breaking occurs along with a polarity transition, the hidden structure of information flow must first be extracted. Because the topological properties of a system affect its dynamics, extracting the hidden information flow structures provides a useful tool for understanding the dynamical behavior of the network. Graph-based random walks, as used in graph-embedding algorithms inspired by natural language processing, can reveal these local structures of information flow [33]. Walking on the graph means moving from one node to another in the direction of the edge, so the flow of information within the network corresponds to the walker stepping between nodes. In addition to information flow, activity dynamics on networks can also be modeled by a graph-based random walk [34]. Considering that random walks on a network can model information spreading and capture network dynamics [35], we used a graph-based random walk algorithm to investigate symmetry-breaking structures that are not visible in the network and ranked the nodes of the network by their ability to push the network into unstable modes. These random walks represent the local distribution of information flow and show how information from one node spreads to its neighboring nodes.
Our goal is to understand whether hidden local structures of information flow can push the network into unstable modes. We hypothesized that the emergence of local polarized flabellate-shaped bifurcations in the information flow pathway causes symmetry breaking and marks the initiator of instability within the network.
Each branch of a bifurcation can itself branch off into nested projections accompanied by a polarity transition. These polarized, fractal-like structures of information flow tend to propagate perturbations faster across the network.
Inspired by the electric dipole moment of a pair of charges, computed as the magnitude of the charges multiplied by the distance between them, a measure was introduced to represent the overall moment generated by a potential symmetric fractal dipole. The individual nodes of the graph are treated as charges of unit magnitude, and the edge weight represents the distance between two charges. In this study, this measure is called the normalized summation of transition cost (NSTC). Given the weights of the edges traversed in each two-step random walk starting from node k, the product of the edge weights along each path is computed. All these products for paths starting at node k are then summed and normalized by dividing by N_k, the total number of paths traversed from the starting node k. The normalized summation of the transition cost, as a measure of the overall moment generated by the potential symmetric fractal dipole, is

$$\mathrm{NSTC}_k = \frac{1}{N_k} \sum_{k \to i \to j} \omega_{ki}\,\omega_{ij}$$

where k is the starting node, i is the node visited in the first step, and j is the node visited in the second step. The more negative NSTC_k is, the stronger the topological polarity transition. Nodes become more unstable as the topological polarity transition strengthens, and unstable nodes have a higher potential to spread instability across the network. The spreading ability of nodes is therefore ranked by how negative NSTC_k is.
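The NSTC measure can be sketched as follows. Instead of sampling random walks, this sketch enumerates every two-step walk k → i → j along nonzero, non-self-loop edges (so N_k is the number of such walks) and reads w[u, v] as the weight of the edge u → v; both choices are assumptions, as is returning 0.0 for a node with no two-step walks. Walks may return to the starting node. Nodes are then ranked from the most negative NSTC upward.

```python
import numpy as np

def nstc(w, k):
    """Normalized summation of transition cost for start node k.

    Enumerates every two-step walk k -> i -> j along nonzero, non-self-loop
    edges (w[u, v] is read as the edge u -> v) and averages the products
    w[k, i] * w[i, j] over the N_k walks found.
    """
    w = np.asarray(w, dtype=float)
    n = w.shape[0]
    products = [w[k, i] * w[i, j]
                for i in range(n) if i != k and w[k, i] != 0
                for j in range(n) if j != i and w[i, j] != 0]
    if not products:
        return 0.0
    return float(np.sum(products)) / len(products)

def rank_spreaders(w):
    """Nodes sorted from most negative NSTC (strongest polarity transition) upward."""
    scores = {k: nstc(w, k) for k in range(np.asarray(w).shape[0])}
    return sorted(scores, key=scores.get)
```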

Theoretical Justification for Analysis Approaches
Various theories provide mathematical and conceptual tools for comprehending complex systems across different domains. Drawing on these theories, we assess the stability of our complex system from spectral, topological, and symmetry-breaking perspectives. For example, polarity-driven structural instabilities within a network can be theoretically justified by bipolar fuzzy set theory, which captures the bipolar nature of real-world systems and allows a more accurate representation [36,37]. According to equilibrium-energy and stability measures for bipolar dynamics [38][39][40], networks can attain a stable equilibrium state by balancing opposing interactions, such as attraction and repulsion, positive and negative feedback, or excitation and inhibition. When this balance is disrupted by a change in the strength, sign, or topology of the interactions between network components caused by external stimuli, the network may experience structural instability that results in a new equilibrium state or even a bifurcation to an entirely different regime.

Results
Our hypothesis was tested on the state-space model of the actuator represented by Equation (1). First, the attention coefficients were extracted for all the nodes using an AGCN. Then, three different stability analyses were performed, and the nodes with a higher instability risk were identified in each analysis: (1) spectral stability analysis, (2) topological stability analysis, and (3) symmetry-breaking stability analysis.

Attention Mechanism
Considering the connections between the nodes in Figure 1, nodes 0, 3, 4, and 7 form one cluster (4-degree nodes), and nodes 1, 2, 5, and 6 form another cluster (2-degree nodes). Two scenarios were tested. In the first scenario, an AGCN model was trained to classify these two clusters. In the second scenario, the feature set of node 0 was perturbed by multiplying it by a factor of 2, and an AGCN model was trained to classify the two clusters in the presence of this feature perturbation. Labels of 0.01 and 0.2 were assigned to the first and second clusters, respectively. Figures 5 and 6 show the training loss as a function of the number of iterations for the two scenarios, without and with perturbation. In both scenarios, the training loss converged to approximately 0.0028 after 500 iterations, suggesting that the AGCN model is robust to feature perturbation. Table 2 compares the model predictions against the true labels for both scenarios. The model predictions summarized in Table 2 were not affected by the feature perturbation of node 0, further indicating the robustness of the AGCN model to feature perturbation. Figure 7 shows that nodes 2 and 6 have the highest attention coefficients.

Figure 7. Comparison of the attention coefficients of each node for the scenarios with and without perturbation on node 0. Nodes 2 and 6 are the ones that need more attention.

Spectral Stability Analysis
To test our hypothesis and check the effect of each node on network stability, we examined how a perturbation in one column of the adjacency matrix [9] is reflected in its largest eigenvalue. To verify whether the unstable nodes are the ones that need more attention, we performed a spectral stability analysis and calculated the change in the largest eigenvalue of the adjacency matrix as the perturbation level increased. Figure 8 shows how the different nodes in the graph responded to an increase in the perturbation level. As seen in Figure 8, nodes 2 and 6 may move the system toward instability, because the largest eigenvalue approaches zero as the perturbation level on these nodes increases.

Topological Stability Analysis
To check to what extent unstable nodes are involved in imbalanced motifs within the network, a topological stability analysis was performed. The goal was to detect the nodes that lie on the paths of imbalanced motifs of different orders. Figure 9 shows a trajectory that starts at node 2 and traverses three sample motifs of different orders. In the network under study, all the imbalanced motifs of size 3 that pass through a given node were first extracted. The product of the edge weights along each motif was then computed and stored as a single score. Similar scores were computed for the other motifs passing through the same node, and all these scores were summed to obtain the total score for that node. The total score of each node was then normalized by the square of the node's degree. The same procedure was repeated for imbalanced motifs of sizes 4, 5, and 6. Table 3 summarizes the total scores for each node and each motif order. The last column of Table 3 shows the total cost, obtained by multiplying these three scores and taking the cube root of the absolute value of the product. Figure 10 visualizes the total cost for each node and reflects the potential role of nodes 2 and 6 in moving the network into an unstable mode.
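The per-node motif score described above can be sketched in Python under three assumptions. A motif of size k is taken to be a simple cycle of length k through the node in an undirected, signed, weighted graph. A motif counts as imbalanced when the product of its edge weights is negative, that is, when it has an odd number of negative edges (the usual structural-balance criterion). The helper names are hypothetical. The combined cost takes the 1/len(orders) root of the absolute product, which equals the cube root described above when three orders are combined.

```python
from math import prod

def cycles_through(adj, s, k):
    """Yield each simple cycle of length k (k >= 3) through node s exactly once.

    adj: dict node -> {neighbour: signed weight}, undirected, integer node labels.
    Each cycle is returned as [s, v1, ..., v_{k-1}]; the edge v_{k-1} -> s closes it.
    """
    def extend(path):
        u = path[-1]
        if len(path) == k:
            # close the cycle; keep only one of its two orientations
            if s in adj[u] and path[1] < path[-1]:
                yield list(path)
            return
        for v in adj[u]:
            if v not in path:
                path.append(v)
                yield from extend(path)
                path.pop()
    yield from extend([s])

def imbalanced_score(adj, s, k):
    """Sum of weight products over imbalanced k-cycles through s, divided by deg(s)**2.

    Assumption: a cycle is imbalanced when the product of its edge weights is negative.
    """
    total = 0.0
    for cyc in cycles_through(adj, s, k):
        w = prod(adj[u][v] for u, v in zip(cyc, cyc[1:] + cyc[:1]))
        if w < 0:
            total += w
    deg = len(adj[s])
    return total / deg**2 if deg else 0.0

def total_cost(adj, s, orders):
    """|product of per-order scores| ** (1 / number of orders)."""
    p = prod(imbalanced_score(adj, s, k) for k in orders)
    return abs(p) ** (1.0 / len(orders))
```

Enumerating cycles by depth-first search is exponential in k. That is acceptable for the small motif orders discussed here but would not scale to long cycles in large networks.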

Figure 10. The contribution of each node to forming imbalanced motifs of different orders. This contribution reflects the overall influence of each node in moving the network toward an unstable state.

Symmetry-Breaking Stability Analysis
To determine whether the unstable nodes contribute to polarized structures within the network, a symmetry-breaking stability analysis was performed. To do this, the local structure of the information flow distribution was extracted for each node. The extraction of these information flow distributions for two individual nodes is illustrated in Figures 11 and 12. Figure 11 shows all the paths that start at node 0 and traverse a two-step random walk, and Figure 12 shows the corresponding paths for random walks starting from node 2. When the random walks of all nodes are plotted together (Figure 13), a clear pattern of flabellate-shaped (fan-shaped) bifurcation appears at nodes 2 and 6. As shown in Figure 14, nodes 2 and 6 are the influential spreaders able to push the network into an unstable mode.
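The enumeration of two-step walks behind the information flow distributions in Figures 11 and 12 might look like the sketch below. It rests on two assumptions made for illustration. The transition probability to a neighbor is taken as proportional to the absolute edge weight. Each walk is then summarized by the sign of the product of its edge weights. The function names are hypothetical, and NSTC and Equation (7) are not reproduced here.

```python
from collections import defaultdict

def step_probabilities(adj, x):
    """Hypothetical transition rule: move to neighbour y with probability |w_xy| / sum |w|."""
    total = sum(abs(w) for w in adj[x].values())
    if total == 0:
        return {}
    return {y: abs(w) / total for y, w in adj[x].items()}

def two_step_walks(adj, s):
    """Enumerate every two-step walk s -> u -> v (backtracking allowed).

    Returns (path, probability, signed_weight) triples, where signed_weight is
    the product of the edge weights along the walk.
    """
    walks = []
    for u, p1 in step_probabilities(adj, s).items():
        for v, p2 in step_probabilities(adj, u).items():
            walks.append(((s, u, v), p1 * p2, adj[s][u] * adj[u][v]))
    return walks

def polarity_split(walks):
    """Probability mass of walks whose accumulated weight is positive vs negative."""
    mass = defaultdict(float)
    for _, p, w in walks:
        # edges are assumed non-zero, so w is never exactly 0
        mass["+" if w > 0 else "-"] += p
    return dict(mass)
```

Consider node 0 in a triangle with a single negative edge. Three quarters of the walk probability mass ends on positively weighted walks and one quarter on negatively weighted ones.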

Figure 13. Information flow tree rooted in each node. Flabellate-shaped bifurcations are observed at nodes 2 and 6. To determine whether these bifurcations form a fractal dipole, the polarity transition should be checked.

Figure 14. NSTC as the measure of the spreading ability of each node. A negative NSTC score corresponds to a node where a topological polarity transition occurs. Nodes with more negative NSTC values have a higher ability to spread instability across the network. A detailed explanation of how the spreading ability of nodes is computed can be found in Appendix A.

Discussion and Conclusions
This study provides a proof of concept for the triangular relationship among the attention mechanism, instability, and structural dynamics in networks. We showed that the mechanism that enables a machine learning model to focus on relevant nodes can be explained from the perspective of structural dynamics and their inherent instability. Here, we studied this triangular relationship in a linear dynamical system, and the outcomes helped compensate for the lack of explainability in the attention mechanism. In future studies, we aim to extend our investigations to nonlinear and nonstationary dynamical systems.
The contributions of this study offer several insights. First, this study provided evidence for a relationship among the attention mechanism, network dynamics, and unstable nodes: the most relevant parts of the input data in graph neural networks were found to be those able to change the network dynamics. In this sense, the study attempts to explain the attention mechanism through the lens of instability analysis. Second, the collective behavior of imbalanced motifs in the network was also found to be decisive in changing network dynamics, which suggests that these motifs deserve more attention. Third, we observed polarity-driven instabilities in hidden fractal patterns in the network, which shifts the analytic focus toward hidden structures of polarity transition.
Our results suggest that stability analysis offers a promising way to implement the attention mechanism in a graph convolutional network faster and more efficiently, by reducing computational complexity, increasing interpretability, and eliminating sensitivity to hyperparameters. Ranking nodes by their stability properties can make attention models more transparent and explainable, and such a ranking can be applied to a wide range of tasks, including weight pruning [41], sparsification and reducing the number of non-zero weights in the network [42], and introducing structural bias [43].
The aim of these contributions is to open the door to explainable tools that can speed up training in graph machine learning. We want to know whether graph machine learning can be made more adaptive by incorporating knowledge from stability analysis. Can prior knowledge be incorporated into a graph attention network through stability analysis? How could this improve the accuracy of graph attention networks? If stability analysis already tells us which nodes need attention before any learning takes place, how could this speed up the aggregation of information in node embeddings? Can the attention mechanism be replaced with stability analysis? Can hyperparameter tuning be avoided in mechanisms such as biased random walks by setting the transition probabilities according to the nodes' spreading ability and stability? These are the kinds of questions we aim to answer in future work.
An important direction for future work is to apply bipolar fuzzy set theory as a theoretical framework for better understanding polarity-driven structural instabilities within networks. By capturing the bipolar nature of real-world systems, this approach allows for a more accurate representation of complex network dynamics [36,37]. Furthermore, stability measures for bipolar dynamics provide a means of studying how networks reach a stable equilibrium by balancing positive and negative interactions [38,39,40]. These measures can help us investigate how external stimuli that change the strength, sign, or topology of the interactions between network components can disrupt this balance and lead to structural instabilities. We believe that applying these theories can deepen our understanding of the mechanisms behind the emergence of structural instabilities in networks.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
In the symmetry-breaking stability analysis, the following calculations illustrate how the spreading ability of each node is computed based on Equation (7). Figure A1 shows an example of the path weights traversed during a single random walk.