Rumor Detection in Social Media Based on Multi-Hop Graphs and Differential Time Series

Abstract: The widespread dissemination of rumors (fake information) on online social media has had a detrimental impact on public opinion and the social environment, creating an urgent need for efficient rumor detection methods. In recent years, deep learning techniques, including graph neural networks (GNNs) and recurrent neural networks (RNNs), have been employed to capture the spatiotemporal features of rumors. However, existing research has largely overlooked the limitations of traditional GNNs based on the message-passing framework when they are applied to rumor propagation graphs. Because of oversmoothing and vanishing gradients, traditional GNNs struggle to capture the interactions among high-order neighbors in deep graphs such as rumor propagation graphs. Furthermore, previous methods for learning the temporal features of rumors, whether based on dynamic graphs or time series, have overlooked the importance of differential temporal information. To address these issues, this paper proposes a rumor detection model based on multi-hop graphs and differential time series. Specifically, the model consists of two components: a structural feature extraction module and a temporal feature extraction module. The former uses a multi-hop graph and an enhanced message-passing framework to learn the high-order structural features of rumor propagation graphs. The latter explicitly models differential time series to learn the temporal features of rumors. Extensive experiments on multiple real-world datasets demonstrate that the proposed model outperforms previous state-of-the-art methods.


Introduction
Since the advent of the Internet era, online social networks have become an indispensable part of our lives. Platforms such as Twitter, Facebook, and Sina Weibo, which focus on social networking or have social networking features, have become primary channels for people to access and share information daily. However, the exponential growth of content on social media platforms has been accompanied by a proliferation of rumors (fake information), which has harmed the online social environment [1]. The widespread dissemination of rumors distorts facts and leads individuals toward erroneous positions, thereby undermining public opinion within social networks and posing a serious threat to society [2].
Detection methods and intervention strategies for rumors on social networking platforms have received considerable attention. Facebook encourages users to actively flag suspicious information, while Sina Weibo has established a dedicated Weibo Community Management Center to handle user reports of fake information. However, these approaches rely solely on manual verification, which, although typically accurate, is limited in practice by the complexity of the identification process and the constraints of human resources. Consequently, a growing number of researchers have been developing algorithms that identify rumors on the Internet automatically, aiming to cope with a volume of rumors that exceeds the capacity of manual verification.
Early automatic rumor detection methods primarily relied on traditional machine learning. Researchers used feature engineering to model information from various dimensions of rumor events and then trained supervised classifiers to distinguish rumors from non-rumors. For instance, the authors of [3] employed decision trees, those of [4] used random forests, and those of [5,6] employed support vector machines (SVMs). These methods demonstrated some rumor detection capability but depended heavily on feature engineering, which limited them noticeably. In recent years, deep neural networks (DNNs) have gained popularity because they remove the need for intricate feature engineering. Trained directly on raw data, DNNs can achieve strong performance, which has made them widely used in rumor detection. For instance, the authors of [7] employed recurrent neural networks (RNNs) to learn the textual content of rumors, while those of [8] used convolutional neural networks (CNNs) to extract textual information. Furthermore, the development of graph neural networks (GNNs) and RNNs has made it feasible to model the spatiotemporal features of rumors effectively. For instance, the authors of [9] used a tree-structured recursive neural network to extract propagation features, and those of [10,11] employed graph convolutional networks (GCNs) to extract structural features. The authors of [12] captured text and propagation features with a graph encoder-decoder model, and those of [13] used gated recurrent units (GRUs) to extract a rumor's temporal and propagation features. These methods have performed well on rumor detection tasks.
Despite the progress made in previous work, several issues remain. First, existing methods mainly rely on GNNs based on the message-passing framework to learn the structural features of rumors. However, rumor propagation follows a tree structure that unfolds according to time and interaction relationships, with the information source as the root node. The connectivity of rumor propagation graphs is relatively simple, but their depth exceeds that of typical graphs. Figure 1 illustrates the distinction between typical graphs and rumor propagation graphs. These characteristics limit the application of existing GNN frameworks to rumor detection. Specifically, to reach deeper nodes, traditional GNNs can only aggregate information from high-order neighbors by stacking multiple layers. For example, information from a reply ten hops below the source tweet can reach the root's representation only after ten rounds of one-hop message passing. However, stacking layers leads to oversmoothing of node features and vanishing gradients, which degrade network performance [14,15]. Therefore, a key challenge is how to better extract the interaction relationships among multi-hop neighbors in rumor propagation graphs.
Furthermore, current approaches for extracting the temporal features of rumors, whether based on dynamic graphs [16][17][18][19] or time series [20,21], learn features only at the level of the original semantic information. However, several studies indicate that word embeddings, which represent semantic information, have distinctive properties. By analyzing arithmetic operations on word embeddings, the authors of [22] found that certain word embedding models encode linguistic relational patterns. Moreover, the authors of [23][24][25] conducted case studies on the meanings of individual neurons in word embeddings and found systematic distributions of different linguistic attributes within the embeddings. This allows us to treat word embeddings as relatively stable signals and to obtain their changes through a differential operation. The advantage of explicitly modeling differential time series is that rumor features can be extracted from how the series changes over time, rather than only from its raw values.
To effectively capture the spatiotemporal features of rumors and achieve better detection performance, this paper proposes a novel self-connected multi-hop graph neural network and differential temporal perception (SMGaDTP) model. The model consists of two main components: the self-connected multi-hop graph attention network (SC-MGAT) module, based on multi-hop graphs, and the differential temporal perception (DTP) module, based on differential time series. The former captures the structural features of rumors during propagation, while the latter extracts features from the temporal aspects of rumors. Additionally, the data augmentation techniques DropEdge [26] and TemporalDrop are applied to the SC-MGAT and DTP modules, respectively. The main contributions of our work can be summarized as follows:

Figure 1. Rumor propagation graphs often exhibit a large distance from the root node to the leaf nodes, while typical graphs lack a discernible root node, and all of their nodes maintain a high level of connectivity.

Related Work
Numerous scholars have proposed rumor detection methods that employ different feature types and architectures. The main features used include text, visuals, user profiles, statistics, structures, and time sequences. The primary architectures include traditional machine learning, CNNs, RNNs, GNNs, and dynamic graph neural networks (DGNNs). In this section, we focus on reviewing existing work that uses text and propagation features. Table 1 summarizes previous work and points out the issues that this paper aims to overcome.

Table 1. Summary of previous work and its limitations.
Foundation | Method | Limitation
Machine learning | [3,27,28] | Only shallow features can be expressed
Text-based | [7,8,29-37] | Lack of propagation features
RNN-based | [9,13,20,21,38] | Weaker ability to model structural features
GNN-based | [11,39- |

Text-Based Rumor Detection
Text is the core feature of rumors. In the era of traditional machine learning, researchers primarily relied on feature engineering techniques to extract lexical, symbolic, and sentiment features from text. For instance, the authors of [3] subdivided text features into string length, the presence of emoticons, and personal pronouns. The authors of [27] incorporated the word distribution ratios of rumor and non-rumor information as text features. The authors of [28] included features such as tags, links, and questions present in the text. However, these methods used only shallow information and had limited generalization capability. Subsequent deep learning algorithms overcame the limitations of traditional machine learning and enabled the modeling of deep semantic information. The authors of [7] used RNNs to capture long-range dependencies in text. The authors of [8] employed CNNs to extract deep features from text. The authors of [29] introduced a word-sentence-document structure to extract hierarchical text features while preserving the text's structural hierarchy. Attention mechanisms automatically capture the dependencies between words, which gives them an advantage over CNNs and RNNs in modeling text content. Consequently, researchers such as those of [30,31] have employed attention mechanisms to model the tweet information in rumors. Building on this, some scholars [32,33] recognized that different domains have distinct forms of linguistic expression and incorporated domain-specific terminology into text features. Additionally, other researchers [34][35][36][37] noted the presence of emotional and thematic information in rumor events and extracted features such as sentiment and topics from text for rumor detection.

Propagation-Based Rumor Detection
To capture the multidimensional features of rumors and achieve better detection performance, recent works have explored the differences between rumors and non-rumors in the propagation process. They model events from the perspective of propagation, including structural and temporal aspects, to identify rumors more accurately. The authors of [38] modeled rumor propagation as a propagation tree and employed kernel learning to extract features from it. Similarly, the authors of [9] modeled rumor propagation as a tree and used recursive units to learn propagation features in both top-down and bottom-up directions. The authors of [13] considered the influence of temporal relationships in addition to the tree structure and proposed a deep spatiotemporal network that simultaneously learns the structural and temporal features of rumors. The authors of [20] combined an RNN with attention mechanisms to capture how the semantic information of an event changes over time. The authors of [21] modeled rumors as dynamic time series and used GRU units to learn temporal information.
Apart from using RNN architectures to extract propagation features, another mainstream approach models events as graphs and uses GNNs under the message-passing framework to learn the propagation features of rumors. For instance, the authors of [39] proposed a GNN-based semi-supervised method for fake news detection. The authors of [11] employed a bidirectional graph convolutional network to learn the propagation and aggregation structures of rumors and added a root node enhancement mechanism to each GCN layer to strengthen the influence of the rumor source on the entire rumor event. The authors of [43] proposed a GCN-based source identification method that uses spectral-domain convolution to obtain the multi-hop neighbor information of nodes and locates multiple rumor sources without prior knowledge of the underlying propagation model. In addition to these methods based on homogeneous graphs, the authors of [40] modeled the global relationships among all source tweets, retweets, and users as a heterogeneous graph to capture richer structural information. The authors of [41] constructed a word-user heterogeneous graph from the textual content of rumors and the propagation of source tweets, and proposed a metapath-based heterogeneous graph attention network to capture the global semantic relationships of text content and the global structural information of source tweet propagation. The authors of [42] introduced a joint graph that integrates the propagation structures of all tweets to mitigate sparsity, and used network embeddings to learn the representations of nodes in the joint graph.
Because static graph structures cannot model the temporal features of rumor propagation, recent research has extended events to dynamic graph structures. The authors of [16] represented rumor posts and their response posts as discrete dynamic graphs and used graph snapshot representation learning with attention mechanisms to capture the structural and temporal information of rumor propagation. The authors of [17] introduced a fake news detection framework based on temporal propagation, modeling the temporal evolution patterns of real-world news as graph evolution patterns in a continuous-time dynamic diffusion network. The authors of [18] modeled each news propagation graph as a series of graph snapshots recorded at discrete time steps and used a GCN and attention mechanisms to extract temporal information. The authors of [19] proposed a dual dynamic graph convolutional network that models both the dynamics of message propagation and the dynamics of the background knowledge graph, learning the two types of structural information in a unified framework.

Problem Statement
The propagation of an event in social network space can be viewed as a set of interacting temporal signals. It can therefore be represented by an undirected graph with temporal relationships, denoted as $T = (V(t), E(t))$, where $V(t) = \{v_1^{(t_1)}, v_2^{(t_2)}, \dots, v_n^{(t_n)}\}$ and $E(t) = \{e_1^{(t_1)}, e_2^{(t_2)}, \dots, e_m^{(t_m)}\}$. Here, $n$ is the number of nodes (i.e., the number of tweets in an event), and $m$ is the number of edges (i.e., the number of interaction relationships in an event). Each node $v_i^{(t_i)}$ represents a tweet that appeared at time $t_i$, and each edge $e_j^{(t_j)}$ represents a response relationship between a tweet $v_j$ that appeared at time $t_j$ and a previous tweet $v_p$, $p \in [1, j-1]$. Note that the response relationship is undirected and that nodes and edges are temporally ordered sequences. Each tweet $v_i$ in the graph has an initial feature representation $h_i$, obtained by processing the information of the original tweet; together these form the node feature set $H = \{h_1, h_2, \dots, h_n\}$, where $h_i \in \mathbb{R}^d$ and $d$ is the dimensionality of the node embeddings. The edge set is represented by the adjacency matrix $A = (a_{ij})_{n \times n}$, where $a_{ij} = 1$ if $v_i$ and $v_j$ are connected by a response relationship and $a_{ij} = 0$ otherwise.

The task of rumor detection is to accurately determine which events in a collection are rumors and which are not. A rumor detection dataset can be represented as $C = \{c_1, c_2, \dots, c_N\}$, where $c_i$ is a specific event in $C$, $N$ is the total number of events in the dataset, and $c_i = (T_i, y_i)$, with $T_i$ the temporal propagation graph of $c_i$ and $y_i \in \{0, 1\}$ its label (zero indicates a non-rumor and one indicates a rumor). The objective of this study is to learn a mapping function $F$ from the dataset $C$ such that, given the propagation graph $T_i$ of any other event, $F$ uses the interactions in $T_i$ to produce a predicted label $\hat{y}_i$ that classifies the event accurately. In other words, the objective is $F(T_i) \rightarrow \hat{y}_i$.
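To make the adjacency matrix $A$ concrete, here is a minimal pure-Python sketch that builds it from reply relations. It assumes a hypothetical input format, a list `parents` in which tweets are indexed 0 to n-1 in time order and `parents[j]` is the index of the earlier tweet that tweet j replies to (None for the source tweet); this format is an illustration, not the paper's data pipeline.

```python
# Hypothetical input: parents[j] is the index of the earlier tweet that tweet j
# replies to (None for the source tweet); tweets are indexed 0..n-1 in time order.
def build_adjacency(parents):
    """Return the symmetric n x n adjacency matrix A = (a_ij) of a propagation graph."""
    n = len(parents)
    A = [[0] * n for _ in range(n)]
    for j, p in enumerate(parents):
        if p is None:
            continue  # the source tweet (root) replies to nobody
        if not 0 <= p < j:
            raise ValueError(f"tweet {j} must reply to an earlier tweet, got {p}")
        A[p][j] = A[j][p] = 1  # response relations are treated as undirected
    return A


# Source tweet 0; tweets 1 and 2 reply to it; tweet 3 replies to tweet 1.
A = build_adjacency([None, 0, 0, 1])
for row in A:
    print(row)
```

The check `0 <= p < j` is the zero-based form of the condition $p \in [1, j-1]$: a tweet can only respond to a tweet that appeared before it.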

Model

Model Framework
This section briefly introduces the proposed model, whose overall framework is illustrated in Figure 2. The input of the model represents a specific event as a rumor propagation temporal graph $T$, in which each node corresponds to a tweet and each edge represents a reply relationship between tweets. The output of the model is the probability that the event is a rumor. For an input $T$, we consider two components: the adjacency matrix and the node set. The adjacency matrix is first decomposed into two directed adjacency matrices representing the propagation and diffusion paths:

$$\overrightarrow{A}, \overleftarrow{A} = \mathrm{ToDirect}(A),$$

where $\mathrm{ToDirect}(\cdot)$ decomposes an undirected adjacency matrix into two distinct directed adjacency matrices, and $\overrightarrow{A}$ and $\overleftarrow{A}$ are the upper triangular and lower triangular matrices, respectively. Matrix exponentiation is then applied to compute $N$ multi-hop adjacency matrices, each containing neighbors at a different distance:

$$A_k = \mathrm{Filter}_{(\cdot = 1)}\left(A^k\right), \quad k = 1, 2, \dots, N, \qquad \mathcal{A} = \{A_1, A_2, \dots, A_N\},$$

where $\mathrm{Filter}_{(\cdot = 1)}$ keeps the matrix elements that equal one, $A_k$ is the adjacency matrix that contains only $k$-hop neighbors, $\mathcal{A}$ is the collection of multi-hop adjacency matrices, and $N$ is the number of multi-hop matrices sampled. For the node set, the initial embeddings $H = \{h_1, h_2, \dots, h_n\}$ are obtained through an embedding layer that incorporates the text, timestamp, and structural characteristics of each node. After obtaining the node embeddings and multi-hop adjacency matrices, we learn the features of the rumor from both structural and temporal perspectives. Specifically, the initial node embeddings $H$ and the collection of multi-hop adjacency matrices $\mathcal{A}$ are fed into the SC-MGAT module for in-depth structural feature learning. In parallel, the node embeddings $H$ are sorted in ascending order of their timestamps,

$$S = \mathrm{Sort}(H),$$

and the resulting time series $S = \{s_1, s_2, \dots, s_n\}$ is fed into the DTP module for temporal feature learning. Finally, the structural and temporal features are combined and passed to a classifier composed of linear layers, which yields the final classification result.
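The ToDirect split and the filtered matrix powers can be sketched in a few lines of pure Python. This is an illustration under stated assumptions: nodes are indexed in time order (so the upper-triangular part points from an earlier parent tweet to a later reply), matrices are plain lists of lists, and the helper names `to_direct`, `matmul`, and `multi_hop` are hypothetical. In a directed tree, each entry of $A^k$ counts directed paths of length $k$, of which there is at most one, so the filter changes nothing here; it matters when entries can exceed one.

```python
def to_direct(A):
    """ToDirect: split a symmetric adjacency matrix into its upper- and lower-triangular parts."""
    n = len(A)
    upper = [[A[i][j] if j > i else 0 for j in range(n)] for i in range(n)]
    lower = [[A[i][j] if j < i else 0 for j in range(n)] for i in range(n)]
    return upper, lower


def matmul(X, Y):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def multi_hop(A_dir, N):
    """Return [A_1, ..., A_N]: A_k keeps only the entries of A_dir^k that equal 1."""
    hops, power = [], A_dir
    for _ in range(N):
        hops.append([[1 if x == 1 else 0 for x in row] for row in power])  # Filter(. = 1)
        power = matmul(power, A_dir)
    return hops


# Toy tree in time order: 0 -> {1, 2}, 1 -> 3.
A = [[0, 1, 1, 0],
     [1, 0, 0, 1],
     [1, 0, 0, 0],
     [0, 1, 0, 0]]
up, low = to_direct(A)
up_hops, low_hops = multi_hop(up, 2), multi_hop(low, 2)
print(up_hops[1])   # 2-hop descendants: only 0 -> 3
print(low_hops[1])  # 2-hop ancestors: only 3 -> 0
```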

Embedding Layer
The embedding layer creates the original feature representations for each input graph. For an input graph $T$ with $n$ nodes, each node $v_i$ contributes three kinds of information, namely tweet text, timestamp, and structure, which are encoded separately. For the textual content, pretrained word embeddings are used to obtain robust text feature representations $H_{word} = [w_1, w_2, \dots, w_n]$, where $w_i \in \mathbb{R}^{w}$ is the word embedding and $w$ is the embedding dimension. The timestamp is decomposed and encoded to extract information such as the year, month, day, hour, minute, and second, giving $H_{time} = [t_1, t_2, \dots, t_n]$, where $t_i \in \mathbb{R}^{t}$ is the time embedding and $t$ is the embedding dimension. Most existing works do not encode the structural information of nodes. However, previous research [44] has shown that the expressive power of classical GNNs is bounded by the 1-Weisfeiler-Lehman (1-WL) graph isomorphism test. Furthermore, the authors of [45] highlighted the importance of structural information for graph classification tasks. Encoding structural features for nodes can alleviate these limitations. Additionally, in the propagation of rumor events, the local neighborhood structure of a node reflects rich social information and can, to some extent, indicate the node's influence on the entire event. Therefore, drawing inspiration from [45,46], structural information encoding is applied to the nodes of graph $T$, where $A \in \mathbb{R}^{n \times n}$ is the adjacency matrix of $T$, $\mathbf{1} \in \mathbb{R}^n$ is a vector of ones, $\mathrm{diag}(\cdot)$ returns the vector of a matrix's diagonal elements, $q$ is the number of recursive encodings, and $H_{struct} = [u_1, u_2, \dots, u_n]$, with $u_i \in \mathbb{R}^{q}$ the structural embedding. Finally, the textual, temporal, and structural embeddings are concatenated to obtain the initial embedding of each node:

$$h_i = w_i \,\Vert\, t_i \,\Vert\, u_i, \qquad H = [h_1, h_2, \dots, h_n],$$

where $h_i \in \mathbb{R}^{d}$ is the node embedding and $d = w + t + q$ is the embedding dimension.
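The timestamp decomposition and the final concatenation can be sketched as follows. This is a minimal illustration assuming Unix timestamps in seconds interpreted as UTC; it returns the six raw calendar fields, whereas the text does not specify how those fields are subsequently encoded (a real model would normalize or embed them). The text and structural vectors below are made-up stand-ins, and the helper names are hypothetical.

```python
from datetime import datetime, timezone


def time_features(ts):
    """Decompose a Unix timestamp (seconds, interpreted as UTC) into six calendar fields."""
    t = datetime.fromtimestamp(ts, tz=timezone.utc)
    return [t.year, t.month, t.day, t.hour, t.minute, t.second]


def node_embedding(w_i, t_i, u_i):
    """Concatenate text, time, and structural parts: h_i = w_i || t_i || u_i (length w + t + q)."""
    return list(w_i) + list(t_i) + list(u_i)


w_i = [0.12, -0.40, 0.33]          # stand-in for a pretrained text embedding (w = 3)
t_i = time_features(86400 + 3661)  # 1970-01-02 01:01:01 UTC (t = 6)
u_i = [2.0, 1.0]                   # stand-in for a structural encoding (q = 2)
h_i = node_embedding(w_i, t_i, u_i)
print(len(h_i))
```

Because the parts are simply concatenated, the node dimension is the sum of the three part dimensions, here 3 + 6 + 2 = 11.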

Self-Connected Multi-Hop Graph Attention Network (SC-MGAT)
A few researchers have noticed the problem of passing messages from high-order neighbors in GNNs. Inspired by the deep residual network (ResNet) [47], researchers such as those of [15] have used residual connections to alleviate vanishing gradients and oversmoothing. However, residual connections are not efficient at aggregating information from multi-hop neighbors. Another approach, proposed in [48], concatenates features from multiple hops of neighbors and aggregates them with attention mechanisms. However, this approach flattens the hierarchical structure of multi-hop neighbor aggregation. Therefore, building on these works, we propose the self-connected multi-hop graph attention network (SC-MGAT), which consists of two components: the multi-hop graph attention network (Multi-hop GAT) and self-connected aggregation (SCA). Figure 3 illustrates the workflow of SC-MGAT.

Multi-Hop Graph Attention Network (Multi-Hop GAT)
The Multi-hop GAT builds on and improves the traditional message-passing paradigm. Specifically, for a given graph $G = (V, E)$, the input consists of a set of node representations $\{h_i \in \mathbb{R}^{d} \mid i \in V\}$ and the corresponding edges $E$. The output is a new set of node representations $\{h'_i \in \mathbb{R}^{d'} \mid i \in V\}$. The nodes are updated using the following function:

$$h_v^{(l)} = \phi\left(h_v^{(l-1)},\; f\left(\left\{h_u^{(l-1)} \mid u \in \mathcal{N}^{(k)}(v)\right\}\right)\right), \qquad k = \zeta(l), \quad \zeta: \mathbb{Z}_{\ge 1} \to \mathbb{Z}_{\ge 1}.$$

Unlike the approach presented in [44], here $\mathcal{N}^{(k)}(v)$ denotes the $k$-hop neighbors of node $v$, so the $l$th layer of message passing uses the features of the $k$-hop neighbors from layer $l - 1$. The function $f$ is the aggregation function, $\phi$ is the update function, $\zeta$ maps each layer to a hop distance, and $\mathbb{Z}_{\ge 1}$ is the set of positive integers. For a general GNN, $\zeta(l) \equiv 1$, meaning that one-hop message passing is used at every layer. Choosing different mapping functions $\zeta$ yields different receptive fields for message passing, which enables efficient aggregation of global graph information.
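The role of the layer-to-hop mapping $\zeta$ can be illustrated with a minimal pure-Python sketch. The mean aggregation used for $f$ and the additive update used for $\phi$ are simple stand-ins chosen for illustration, not the attention-based aggregation of the Multi-hop GAT, and the hop matrices are assumed to be precomputed lists of lists; `multi_hop_layer` and `multi_hop_gnn` are hypothetical names.

```python
def multi_hop_layer(H, A_k):
    """One message-passing layer over the k-hop matrix A_k.
    f (aggregate): mean of neighbour features; phi (update): own features + aggregate.
    Both are illustrative stand-ins for learned functions."""
    out = []
    for i, h_i in enumerate(H):
        nbrs = [j for j, a in enumerate(A_k[i]) if a]
        if nbrs:
            agg = [sum(H[j][c] for j in nbrs) / len(nbrs) for c in range(len(h_i))]
        else:
            agg = [0.0] * len(h_i)  # no k-hop neighbours: nothing to aggregate
        out.append([x + y for x, y in zip(h_i, agg)])
    return out


def multi_hop_gnn(H, hops, zeta, L):
    """Run L layers; layer l aggregates over zeta(l)-hop neighbours, i.e. hops[zeta(l) - 1]."""
    for l in range(1, L + 1):
        H = multi_hop_layer(H, hops[zeta(l) - 1])
    return H


# Directed chain 0 -> 1 -> 2 with its 1-hop and 2-hop matrices.
A1 = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
A2 = [[0, 0, 1], [0, 0, 0], [0, 0, 0]]
H0 = [[0.0], [0.0], [1.0]]

standard = multi_hop_gnn(H0, [A1, A2], zeta=lambda l: 1, L=2)  # zeta(l) = 1: ordinary GNN
growing = multi_hop_gnn(H0, [A1, A2], zeta=lambda l: l, L=2)   # layer l uses l-hop neighbours
print(standard, growing)
```

With $\zeta(l) = l$, node 0 receives the signal of node 2 directly in the second layer through the 2-hop matrix, instead of only through node 1's already-updated representation.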
Inspired by [49,50], the neighbor aggregation and update process in this paper is as follows. First, for any node $i$ and its $k$-hop neighbor $j$, a scoring function $\vartheta$ computes the edge score $e_{ij}$ between the two nodes, which represents the importance of neighbor $j$ to node $i$; its trainable parameters are $a \in \mathbb{R}^{d'}$ and $W_1 \in \mathbb{R}^{d' \times d}$. After the edge scores of all neighbors $j \in \mathcal{N}(i)$ are obtained, they are normalized with a softmax function:

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{j' \in \mathcal{N}(i)} \exp(e_{ij'})}.$$

Finally, node $i$ obtains its new representation $h'_i$ by aggregating its neighbors' features weighted by the normalized edge scores $\alpha_{ij}$. For graph-level classification tasks, after a GNN updates the node features, a readout operation is typically needed to extract information that represents the entire graph:

$$h_g = f_{read}\left(\{h_i \mid i \in V\}\right),$$

where $h_g$ is the global representation of graph $G$ and $f_{read}$ is the method used to extract global information from the graph. Common readout methods include global average pooling, Top-K pooling [51], DiffPool [52], and ASAPooling [53]. However, previous methods perform the readout only after the final iteration, which is unfavorable for the proposed Multi-hop GAT. Specifically, the primary objective of the Multi-hop GAT is to hierarchically aggregate high-order neighbor information into rich graph representations, and performing the readout only in the final layer would discard the information from earlier layers. Meanwhile, as pointed out in [54], generating a graph-level representation is equivalent to adding a virtual super node to the graph, toward which the real nodes send information along virtual edges:

$$h_s = f\left(\{h_i \mid i \in V\}\right),$$

where $h_s$ is the representation of the virtual super node. When the readout is performed only in the final layer, the self-loop of the super node is always ignored, which, as mentioned in [55], leaves the super node with insufficient representational capacity. Therefore, inspired by [54], to better aggregate multi-hop neighbor information hierarchically and enhance the expressive power of global graph representations, we introduce the self-connected aggregation (SCA) module into the Multi-hop GAT. In this module, the global information at each layer is determined by both the node information of the current layer and the global information of the previous layer. Specifically, $h_g^{(l)}$ is computed from the current node representations $\{h_i^{(l)}\}$ together with $f_{nn}(h_g^{(l-1)})$, where $l$ denotes the GNN layer and $f_{nn}$ is a linear projection. Integrating the Multi-hop GAT with SCA yields the algorithmic flow of SC-MGAT (see Algorithm 1).
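The attention step and the self-connected readout can be sketched in pure Python as below. Two parts are assumptions, because the exact formulas are not stated in the text above: the score is taken in a GATv2-like form $e_{ij} = a^\top \mathrm{LeakyReLU}(W_1 h_i + W_1 h_j)$, chosen only because its shapes match $a \in \mathbb{R}^{d'}$ and $W_1 \in \mathbb{R}^{d' \times d}$, and the SCA update is taken as $h_g^{(l)} = f_{nn}(h_g^{(l-1)}) + \mathrm{mean}_i\, h_i^{(l)}$. All function names are hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def attention_aggregate(H, A_k, W1, a):
    """For every node i: score each k-hop neighbour j, softmax-normalise the scores,
    and return h'_i = sum_j alpha_ij * (W1 h_j).
    Assumed score: e_ij = a^T LeakyReLU(W1 h_i + W1 h_j)."""
    proj = [matvec(W1, h) for h in H]
    out = []
    for i in range(len(H)):
        nbrs = [j for j, x in enumerate(A_k[i]) if x]
        if not nbrs:
            out.append(proj[i])  # assumption: a node without k-hop neighbours keeps its projection
            continue
        scores = [sum(ac * leaky_relu(pi + pj) for ac, pi, pj in zip(a, proj[i], proj[j]))
                  for j in nbrs]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
        alphas = [e / sum(exps) for e in exps]
        out.append([sum(al * proj[j][c] for al, j in zip(alphas, nbrs)) for c in range(len(W1))])
    return out


def sca_readout(H_l, h_g_prev, W_nn):
    """Assumed self-connected readout: h_g^(l) = f_nn(h_g^(l-1)) + mean_i h_i^(l)."""
    mean = [sum(col) / len(H_l) for col in zip(*H_l)]
    return [x + y for x, y in zip(matvec(W_nn, h_g_prev), mean)]


I2 = [[1.0, 0.0], [0.0, 1.0]]
H = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
A_k = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]
H1 = attention_aggregate(H, A_k, W1=I2, a=[1.0, 1.0])
h_g = sca_readout(H1, h_g_prev=[0.0, 0.0], W_nn=I2)
print(H1, h_g)
```

Calling `sca_readout` after every layer, rather than once at the end, is what lets the global vector accumulate information from each hop distance in turn.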
Algorithm 1. SC-MGAT
...
4: A_k = 𝒜[k]  // obtain the adjacency matrix containing only the k-hop neighbors
5: DropEdge(A_k)
6: for all h_v ∈ H do
7:     // message passing
...
9: end for
10: ...

Differential Temporal Perception (DTP)

Rumor events in social networks evolve chronologically. When a hot topic emerges, more and more users participate and contribute information. This information exhibits rich cyclical and trend variations, such as changes in sentiment polarity and topic shifts. Inspired by [56,57], this paper represents the temporal propagation process of an event as a multivariate time series $S = \{s_1, s_2, \dots, s_n\}$, where $n$ is the number of tweets related to the event and $s_i \in \mathbb{R}^d$ is the feature representation at each time step. On this basis, the differential temporal perception (DTP) module is proposed to capture the evolutionary features of events at the temporal level. The process of the DTP module is shown in Figure 4.

The first step of DTP is differential time series modeling. First, to simulate temporal changes in events, a dropout operation is applied to the initial series $S$ while preserving the temporal relationships:

$$\bar{S} = \mathrm{TemporalDrop}(S),$$

where $\bar{S} = \{\bar{s}_1, \bar{s}_2, \dots, \bar{s}_m\}$. Then, the differential time series $\Delta = \{d_1, d_2, \dots, d_m\}$ is constructed from $\bar{S}$, where $d_1 = \mathbf{0}$ and

$$d_i = \bar{s}_i - \bar{s}_{i-1}, \qquad i = 2, \dots, m.$$

Similar to [58], positional encoding is applied to the differential series to retain the positional information of the sequence, resulting in a series $P = \{p_1, p_2, \dots, p_m\}$ with the same dimensions as $\Delta$:

$$PE_{(pos, 2i)} = \sin\left(pos / 10000^{2i/c}\right), \qquad PE_{(pos, 2i+1)} = \cos\left(pos / 10000^{2i/c}\right),$$

where $pos$ is the position in the series and $i$ indexes the dimension of the $c$-dimensional encoding. The positional encoding series $P$ is added to the differential series $\Delta$ to obtain a new series $\tilde{\Delta} = \{\tilde{d}_1, \tilde{d}_2, \dots, \tilde{d}_m\}$, where $\tilde{d}_i = d_i + p_i$. The series $\tilde{\Delta}$ then undergoes local window attention (LWA). Specifically, to preserve temporal and local dependencies, a fixed window size $\omega$ is used: for any $\tilde{d}_i \in \tilde{\Delta}$, a corresponding subsequence $D_i = [\tilde{d}_i, \tilde{d}_{i-1}, \dots, \tilde{d}_{i-\omega}]$ is extracted when $i > \omega$; if fewer than $\omega$ elements precede $\tilde{d}_i$, all preceding elements are used. Subsequently, similar to [58], multi-head attention is computed between $\tilde{d}_i$ and all elements of its subsequence $D_i$:

$$\mathrm{MultiHead}(\tilde{d}_i, D_i) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^O, \qquad \mathrm{head}_p = \mathrm{softmax}\left(\frac{q_p K_p^{\top}}{\sqrt{\varepsilon_p}}\right) V_p,$$

where $q_p$ is the query derived from $\tilde{d}_i$, and $K_p, V_p \in \mathbb{R}^{(\omega+1) \times \varepsilon_p}$ are the keys and values derived from $D_i$ for head $p$. In this paper, the input is split along the feature dimension across the heads of the multi-head attention. Hence, $\varepsilon_p = d/h$, where $h$ is the number of heads.
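The differential modeling and local window attention steps can be sketched in pure Python. Several details are assumptions made for illustration: TemporalDrop is modeled as dropping each time step independently with probability p, the learned query, key, value, and output projections of the attention are replaced by identities, and every function name is hypothetical. The positional encoding follows the sinusoidal formula of [58].

```python
import math
import random


def temporal_drop(S, p, rng):
    """Drop each time step with probability p, keeping the survivors in their original order."""
    kept = [s for s in S if rng.random() >= p]
    return kept if kept else S[:1]  # assumption: always keep at least one step


def differential(S):
    """Delta: d_1 = 0 and d_i = s_i - s_{i-1} for i >= 2."""
    D = [[0.0] * len(S[0])]
    for prev, cur in zip(S, S[1:]):
        D.append([c - q for c, q in zip(cur, prev)])
    return D


def positional_encoding(m, c):
    """Sinusoidal encoding: dimension 2i uses sin, 2i + 1 uses cos, both of pos / 10000^(2i/c)."""
    P = []
    for pos in range(m):
        row = []
        for k in range(c):
            angle = pos / 10000 ** (2 * (k // 2) / c)
            row.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
        P.append(row)
    return P


def local_windows(X, omega):
    """D_i = [x_i, x_{i-1}, ..., x_{i-omega}], truncated at the start of the series."""
    return [[X[j] for j in range(i, max(i - omega, 0) - 1, -1)] for i in range(len(X))]


def window_attention(query, window, heads):
    """Multi-head scaled dot-product attention of `query` over `window`; the heads split the
    feature dimension into blocks of size d / heads. Learned projections are omitted (identity)."""
    d = len(query)
    assert d % heads == 0, "feature dimension must be divisible by the number of heads"
    eps = d // heads
    out = []
    for h in range(heads):
        lo, hi = h * eps, (h + 1) * eps
        scores = [sum(a * b for a, b in zip(query[lo:hi], k[lo:hi])) / math.sqrt(eps)
                  for k in window]
        top = max(scores)
        w = [math.exp(s - top) for s in scores]
        z = sum(w)
        out.extend(sum(wi / z * k[lo + c] for wi, k in zip(w, window)) for c in range(eps))
    return out


S = [[1.0, 2.0], [2.0, 2.0], [4.0, 1.0], [4.0, 3.0]]
S_bar = temporal_drop(S, p=0.0, rng=random.Random(0))  # p = 0 keeps every step
D = differential(S_bar)
P = positional_encoding(len(D), 2)
D_tilde = [[x + y for x, y in zip(d, p)] for d, p in zip(D, P)]
D_prime = [window_attention(q, win, heads=2)
           for q, win in zip(D_tilde, local_windows(D_tilde, omega=2))]
print(D)
```

Each window holds the current element plus at most $\omega$ predecessors, so attention never looks ahead in time.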
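For all $\tilde{d}_i \in \tilde{\Delta}$, we compute $d'_i = \mathrm{MultiHead}(\tilde{d}_i, D_i)$. This yields a new sequence $\Delta' = \{d'_1, d'_2, \dots, d'_m\}$ that incorporates local information. The new sequence is then fed into a GRU to learn temporal information. The forward propagation process is as follows:

$$r_t = \sigma\left(W_r [h_{t-1}, d'_t]\right), \qquad z_t = \sigma\left(W_z [h_{t-1}, d'_t]\right),$$
$$\tilde{h}_t = \tanh\left(W_h [r_t \odot h_{t-1}, d'_t]\right), \qquad h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t,$$

where $\sigma$ is the sigmoid function, $r_t$ is the reset gate, $z_t$ is the update gate, $W_r, W_z, W_h \in \mathbb{R}^{d' \times (d + d')}$ are trainable parameters, $d'_t$ is the input at time $t$, $h_{t-1} \in \mathbb{R}^{d'}$ is the hidden state at time $t - 1$, and $d'$ is the output dimension of the GRU. The output at the final time step serves as the temporal feature representation of the entire sequence.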
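A single GRU step, with each weight matrix acting on the concatenation $[h_{t-1}, x_t]$ and biases omitted, can be sketched in pure Python as follows. This is a didactic sketch with hypothetical names and toy weights, not a trainable implementation. Note that libraries differ in how the update gate is used; PyTorch's `torch.nn.GRU`, for example, writes the interpolation as $h_t = (1 - z_t) \odot n_t + z_t \odot h_{t-1}$, so $z$ plays the opposite role there.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def gru_step(h_prev, x_t, W_r, W_z, W_h):
    """One GRU step; each W is d' x (d' + d) and acts on the concatenation [h_{t-1}, x_t].
    Bias terms are omitted."""
    hx = h_prev + x_t                          # list concatenation [h_{t-1}, x_t]
    r = [sigmoid(v) for v in matvec(W_r, hx)]  # reset gate
    z = [sigmoid(v) for v in matvec(W_z, hx)]  # update gate
    cand = [math.tanh(v) for v in matvec(W_h, [ri * hi for ri, hi in zip(r, h_prev)] + x_t)]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]


def gru_last(X, W_r, W_z, W_h, hidden):
    """Run the GRU over the whole sequence and return the final hidden state."""
    h = [0.0] * hidden
    for x_t in X:
        h = gru_step(h, x_t, W_r, W_z, W_h)
    return h


# d' = 1 hidden unit, d = 1 input feature.
W_r, W_z, W_h = [[0.0, 0.0]], [[0.0, 0.0]], [[0.0, 1.0]]
h = gru_last([[1.0]], W_r, W_z, W_h, hidden=1)
print(h)
```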
Furthermore, stacking multiple layers makes it possible to learn temporal features over a broader range. Specifically, after LWA produces the sequence $\Delta'$ that fuses local information, information from a larger range can be fused on top of $\Delta'$:

$$\Delta'^{(k)} = \mathrm{LWA}\left(\Delta'^{(k-1)}\right), \qquad \Delta'^{(0)} = \tilde{\Delta}.$$

Stacking LWA layers expands the receptive field linearly: with window size $\omega$, each additional layer extends it by $\omega$ time steps. Finally, all resulting sequences $\{\Delta'^{(1)}, \Delta'^{(2)}, \dots, \Delta'^{(N)}\}$ are processed in parallel by a GRU for temporal feature extraction, and the resulting features are summed to obtain the final temporal features:

$$h^{t}_{sum} = \sum_{k=1}^{N} \mathrm{GRU}\left(\Delta'^{(k)}\right).$$
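The linear growth of the receptive field under stacked local windows can be checked with a small pure-Python sketch. It tracks only which positions of the original series can influence a given position, assuming each layer lets position p attend to positions p - ω through p; `receptive_field` is a hypothetical helper, not part of the model.

```python
def receptive_field(i, omega, layers):
    """Positions of the original series that can influence position i after stacking
    `layers` local-window layers, each looking back at most omega steps."""
    field = {i}
    for _ in range(layers):
        field = {j for p in field for j in range(max(p - omega, 0), p + 1)}
    return field


for k in (1, 2, 3):
    f = receptive_field(20, omega=3, layers=k)
    print(k, min(f), max(f), len(f))
```

Away from the start of the series, k stacked layers cover k·ω + 1 positions (4, 7, and 10 for ω = 3), which is the linear expansion described above.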

Classification Layer
The output of the SC-MGAT module is the structural feature representation $h_g^{(l)}$, and the output of the DTP module is the temporal feature representation $h^{t}_{sum}$. The two features are combined by addition to obtain the final feature representation:

$$h = h_g^{(l)} + h^{t}_{sum}.$$

Subsequently, several linear layers, each followed by a nonlinear activation, are applied to obtain the final predicted label $\hat{y}$, where $W_f \in \mathbb{R}^{1 \times d}$ denotes the trainable weights of the output layer and $b_f$ is its bias term. Finally, the model is trained with the cross-entropy loss:

$$\mathcal{L}(\Theta_m) = -\sum_{i} \left[ y_i \log \hat{y}_i + (1 - y_i) \log\left(1 - \hat{y}_i\right) \right],$$

where $\Theta_m$ denotes the trainable parameters of the model, $y_i$ is the true label of event $i$, and $\hat{y}_i$ is the corresponding predicted label.
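The feature fusion and the binary cross-entropy objective can be sketched in pure Python. This sketch averages the loss over events, which only rescales the summed form, and clips probabilities away from 0 and 1 so the logarithm stays finite; both choices, and the helper names, are assumptions for illustration.

```python
import math


def combine(h_g, h_t):
    """h = h_g + h_t element-wise; the structural and temporal features must share a dimension."""
    if len(h_g) != len(h_t):
        raise ValueError("structural and temporal features must have the same dimension")
    return [a + b for a, b in zip(h_g, h_t)]


def bce_loss(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy over events; y_prob holds predicted rumor probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


print(combine([0.5, -1.0], [0.25, 2.0]))
print(bce_loss([1, 0], [0.5, 0.5]))
```

Adding rather than concatenating the two features requires the SC-MGAT and DTP outputs to have the same dimension, which is why `combine` checks the lengths.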

Experiments
This section evaluates the proposed model extensively to demonstrate its effectiveness for rumor detection. Section 5.1 introduces the datasets and preprocessing methods. Section 5.2 presents the baseline models used for comparison. Section 5.3 outlines the experimental parameter settings, and Section 5.4 presents the experimental results and analysis.

Datasets and Preprocessing
Two highly representative, publicly available real-world datasets for rumor detection have been constructed from Twitter and Weibo. Each dataset consists of a series of news events, each belonging to a category such as real news or fake news. The datasets are introduced below:

• Weibo: Initially proposed in [7], this dataset was collected from Sina Weibo, a popular online social media platform in China. It contains comprehensive event information, including text, timestamps, and user profiles, all stored in JSON format. It consists of a total of 2351 real news and 2312 fake news instances.
• Twitter 15 and Twitter 16: First introduced in [38], these datasets were collected from the widely used online social networking platform Twitter. Each consists of news events categorized into four classes: unverified, non-rumor, true, and false. These datasets include only the IDs of the tweets, so we collected additional information such as reply texts and timestamps using the Twitter API.
We performed some simple preprocessing on the Weibo, Twitter 15, and Twitter 16 datasets. The preprocessed Twitter 15 and Twitter 16 datasets were then merged into a single dataset named Twitter; the final datasets are described in Table 2.
For Twitter 15 and Twitter 16, we first removed tweets that had become invalid because of user deletions or account suspensions. When constructing propagation graphs from the reply relationships, we linked tweets whose parent nodes were missing directly to their source tweets. Next, we extracted the events in Twitter 15 and Twitter 16 that belong to the non-rumor and true rumor categories, treating them as true news and fake news, respectively. These events were combined into a larger dataset called Twitter, containing 562 fake news and 575 true news instances. For the Weibo dataset, limited computational resources required us to remove events with more than 2000 nodes. We likewise constructed the corresponding propagation graphs from the reply relationships, which resulted in a dataset of 2133 fake news and 2209 true news instances.
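The graph construction step above (reply edges, reattaching orphaned replies to the source post, and the node-count cut-off used for Weibo) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the input format of `(post_id, parent_id)` pairs and the function name `build_propagation_graph` are hypothetical, and invalid tweets are assumed to have been filtered out beforehand.

```python
from collections import defaultdict

def build_propagation_graph(source_id, replies, max_nodes=None):
    """Build a reply-based propagation graph for one event.

    source_id: id of the source post (the root).
    replies:   list of (post_id, parent_id) pairs (hypothetical format); parent_id
               may refer to a post that was deleted or never collected.
    max_nodes: if set, events with more nodes are discarded (returns None),
               mirroring the 2000-node cut-off applied to Weibo.
    Returns a dict mapping each parent id to the list of its child ids.
    """
    known = {source_id} | {post_id for post_id, _ in replies}
    if max_nodes is not None and len(known) > max_nodes:
        return None
    children = defaultdict(list)
    for post_id, parent_id in replies:
        if post_id == source_id:
            continue  # the root has no incoming reply edge
        # Missing parent: attach the reply directly to the source post.
        if parent_id not in known:
            parent_id = source_id
        children[parent_id].append(post_id)
    return dict(children)
```

For example, a reply `("c", "x")` whose parent `"x"` was deleted becomes a direct child of the source post instead of being dropped, so its text still contributes to the graph.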

Baselines
The baseline models compared with the proposed model are listed below, and Table 3 presents the feature types used by each model:
• BiGCN [11]: A GCN-based model that represents each rumor event with separate propagation and diffusion structures and extracts features from both with a bidirectional GCN.

Experimental Set-up
In the experiments in this section, all SVM-based methods were implemented with scikit-learn, and all deep learning-based methods were implemented with PyTorch. The parameters were optimized with the Adam optimizer [60] using an initial learning rate of 5 × 10⁻⁵ and a weight decay of 5 × 10⁻⁴. Following previous work [13], the entire dataset was divided into training, validation, and test sets at an 8:1:1 ratio, and the model with the best performance on the validation set was selected for testing. To ensure a fair comparison, all models that use textual information represented each tweet as the average of all token embeddings from the first and last encoder layers of the pretrained BERT model (first-to-last layer average), yielding a 768-dimensional tweet embedding. We observed that the first-to-last layer average yielded significant performance improvements over using the CLS token or averaging only the last layer, particularly on the English datasets.
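The first-to-last layer average and the 8:1:1 split can be sketched in plain Python. The sketch assumes the two layers' hidden states are already available as lists of token vectors with a padding mask; with Hugging Face Transformers they could be taken from `outputs.hidden_states` when the model is called with `output_hidden_states=True`, but whether "first layer" means the embedding output or the first Transformer block is not stated here and is an assumption left to the reader. The function names `first_last_avg` and `split_8_1_1` are hypothetical.

```python
import random

def first_last_avg(first_layer, last_layer, mask):
    """Sentence embedding: mean over real tokens of (first-layer + last-layer) / 2.

    first_layer, last_layer: seq_len x hidden lists of token vectors.
    mask: 1 for real tokens, 0 for padding.
    """
    hidden = len(first_layer[0])
    total = [0.0] * hidden
    n_tokens = 0
    for f, l, m in zip(first_layer, last_layer, mask):
        if not m:
            continue  # skip padding positions
        for j in range(hidden):
            total[j] += (f[j] + l[j]) / 2.0
        n_tokens += 1
    return [t / n_tokens for t in total]

def split_8_1_1(items, seed=0):
    """Shuffle and split into train/validation/test sets at an 8:1:1 ratio."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]
```

Integer arithmetic is used for the split sizes so that any rounding remainder falls into the test set rather than being lost.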
In the SC-MGAT module, the parameter ζ(l) was set to 2l − 1, where l ∈ {1, 2, ..., 6}. Thus, each of the six message-passing layers aggregated neighbors at an odd hop distance (1-hop, 3-hop, 5-hop, ..., 11-hop), skipping one hop between consecutive layers. For the DTP module, the window size ω was set to four, and the number of layers in LWA was set to two.

For all datasets, the evaluation metrics were precision (Prec.), recall (Rec.), F1 score (F1), and accuracy (Acc.) computed from the predicted results. For each sample $c_i \in C$, the agreement between the predicted and true labels was counted as a true positive (TP), false positive (FP), false negative (FN), or true negative (TN), and the metrics were calculated with the following formulas:

$$\mathrm{Prec.} = \frac{TP}{TP + FP}, \qquad \mathrm{Rec.} = \frac{TP}{TP + FN}$$

$$\mathrm{F1} = \frac{2 \cdot \mathrm{Prec.} \cdot \mathrm{Rec.}}{\mathrm{Prec.} + \mathrm{Rec.}}, \qquad \mathrm{Acc.} = \frac{TP + TN}{TP + FP + FN + TN}$$

Table 4 presents the evaluation results of all methods on the Weibo and Twitter datasets, with the best results marked in bold. The following conclusions can be drawn from the table:

• Overall, our model outperformed all baseline methods on both datasets. On the Weibo dataset, it improved Acc. and F1 by 1.39% and 1.42%, respectively, over the best baseline; on the Twitter dataset, the improvements were 3.51% and 3.57%, respectively. This indicates that our model extracted more informative features than the baseline models and supports the importance of high-order neighbor interaction features and differential temporal information in rumor detection.

• The traditional machine learning-based methods performed worse on all datasets than the deep learning-based methods. This is because traditional methods rely on manually selected features, whereas deep learning algorithms can capture complex high-order features. Moreover, traditional machine learning methods represent text content only with statistical features, which makes it difficult for them to model semantic information.

• Consistent with the findings of [9] and others, BU-RvNN performed worse than TD-RvNN. This is because BU-RvNN compresses features into a single node representation, resulting in significant information loss, whereas TD-RvNN pools all leaf nodes to obtain the final features and thereby retains more useful information. STS-NN uses both temporal and structural features but still failed to achieve satisfactory results, partly because it compresses all the information into the last node.
• Among all the baseline models, BiGCN demonstrated the strongest performance. Despite not using temporal information, its ability to extract propagation and diffusion features enabled better structural learning than STS-NN and TD-RvNN. However, its performance was restricted by the lack of high-order neighbor information (it stacks only two GCN layers and therefore captures interactions with at most two-hop neighbors) and by the lack of temporal information.

Comparison of Early Detection
Early detection of rumors can significantly mitigate the damage caused by their spread. Numerous scholars have used differential equations and numerical simulation methods to study the rate of rumor propagation and the corresponding control strategies [61][62][63]. These studies all indicate that rumor dissemination exhibits a pronounced outbreak period and that early intervention can greatly reduce the scale and destructiveness of rumors. Additionally, some researchers have studied sentiment on social media and found that fake news generates intense negative emotions that are highly aggressive and persistent, and that it tends to spread widely through the social space over time [64][65][66].
Together, these studies demonstrate that early intervention against rumors is essential.
We therefore tested the early rumor detection capability of SMGaDTP and all well-performing baseline models. Early detection testing sets a series of truncation times; for each truncation time, only tweets published before it are used to evaluate the models. To evaluate the early detection capabilities of all models comprehensively, the truncation times were set to {10 min, 20 min, 30 min, 40 min, 50 min, 1 h, 2 h, 3 h} on the Twitter dataset and to {2 h, 4 h, 6 h, 8 h, 10 h, 12 h, 24 h, 36 h} on the Weibo dataset.
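The truncation procedure can be sketched as a simple time filter applied to each event before its propagation graph is rebuilt. The sketch assumes that truncation times are measured from the publication time of the source post and that each post is a dict with a `"time"` key holding a `datetime`; both the data format and the function names `truncate_event` and `truncated_views` are hypothetical.

```python
from datetime import datetime, timedelta

# Truncation times used in the early detection experiments.
TWITTER_DEADLINES = [timedelta(minutes=m) for m in (10, 20, 30, 40, 50, 60, 120, 180)]
WEIBO_DEADLINES = [timedelta(hours=h) for h in (2, 4, 6, 8, 10, 12, 24, 36)]

def truncate_event(posts, source_time, deadline):
    """Keep only the posts published before source_time + deadline."""
    cutoff = source_time + deadline
    return [p for p in posts if p["time"] < cutoff]

def truncated_views(posts, source_time, deadlines):
    """One truncated copy of the event per deadline, keyed by the deadline."""
    return {d: truncate_event(posts, source_time, d) for d in deadlines}
```

Because a reply is normally published after the post it answers, filtering by time typically keeps every surviving reply's parent, so the truncated posts still form a connected propagation tree.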
The early detection results are shown in Figure 6. Models that use both structural and temporal features are drawn with solid lines, and models that use only structural features are drawn with dashed lines. On all datasets, our model generally outperformed the other baseline models as time progressed. In addition, the models that use temporal features were more sensitive to the amount of available temporal information, with identification accuracy fluctuating considerably as the truncation time advanced. Nevertheless, every model achieved its optimal detection performance after 1 h on the Twitter dataset and after 12 h on the Weibo dataset, indicating that all models can effectively identify early-stage rumors.

Comparison of Deep Graph Detection
In this section, 818 events (409 positive and 409 negative samples) with relatively long node paths were selected from the Weibo dataset to demonstrate the differences between SC-MGAT and GAT on deep graphs. These events were used to train and test SC-MGAT and GAT, and all models used the same training parameters and experimental settings. The results are shown in Figure 7, where GAT denotes a single-layer GAT and 6-GAT denotes six stacked GAT layers. Simply stacking multiple GAT layers decreased performance compared with a single GAT layer, whereas SC-MGAT outperformed both the single-layer GAT and 6-GAT on all evaluation metrics. This further validates the effectiveness of our proposed approach.
[Figure 7: Acc., Prec., Rec., and F1 of each model; y-axis from 0.88 to 1.00]
In addition, we randomly selected a sample from the test set and applied multilayer message passing with SC-MGAT and GAT. After message passing, we computed the cosine similarity between each node and every other node, and we obtained the final similarity of each node by summing its cosine similarities with all other nodes. Figures 8 and 9 visualize the results, where nodes with higher similarity are shown in colors closer to deep red. The traditional GAT exhibited oversmoothing after only three layers of message passing, which is consistent with the performance decrease observed when stacking multiple GAT layers. In contrast, SC-MGAT maintained a higher level of node discrimination even after six layers of message passing, effectively avoiding the oversmoothing problem.
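The per-node similarity score used in this visualization can be sketched directly from its description: for each node, sum the cosine similarities between its embedding and those of all other nodes. The node embeddings are assumed to be the outputs of a given message-passing layer, given as plain lists of floats; the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def node_similarity_scores(node_embeddings):
    """For each node, sum its cosine similarity with every other node."""
    n = len(node_embeddings)
    return [
        sum(cosine(node_embeddings[i], node_embeddings[j]) for j in range(n) if j != i)
        for i in range(n)
    ]
```

With n nodes, each score lies between −(n − 1) and n − 1. Under oversmoothing all embeddings point in nearly the same direction, so every score approaches n − 1, which is why an oversmoothed graph appears uniformly deep red.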

Conclusions
This paper proposes SMGaDTP, a rumor detection model based on multi-hop graphs and differential time series that consists of two parallel parts: SC-MGAT and DTP. SC-MGAT learns the structural features of rumors, while DTP learns their temporal features. We extensively tested the proposed model on the widely used real-world Weibo and Twitter datasets, and the encouraging results demonstrate its effectiveness. This indicates that high-order structural information and differential temporal information can serve as effective features for rumor identification. Meanwhile, the ablation study shows the contribution of each component of the model, and the early detection experiments show that the model can effectively recognize early-stage rumors. Finally, a comparison of SC-MGAT and GAT on deep graphs confirmed that SC-MGAT substantially mitigates the oversmoothing issue.
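The idea of learning from differential temporal information can be illustrated with a small sketch. The exact construction of DTP's differential time series is not given in this section, so the sketch assumes first-order differences between consecutive per-step feature vectors and a causal local window of ω steps for the attention, with no learned projections and no GRU; the function names `differential_series` and `local_window_attention` are hypothetical.

```python
import math

def differential_series(seq):
    """First-order differences x_t - x_{t-1} between consecutive feature vectors."""
    return [[b - a for a, b in zip(prev, cur)] for prev, cur in zip(seq, seq[1:])]

def local_window_attention(seq, window=4):
    """Scaled dot-product self-attention restricted to a causal window.

    Step t attends only to itself and the previous window - 1 steps
    (an assumed window shape; no learned query/key/value projections).
    """
    d = len(seq[0])
    out = []
    for t, q in enumerate(seq):
        keys = seq[max(0, t - window + 1):t + 1]
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * k[j] for w, k in zip(weights, keys)) / z for j in range(d)])
    return out
```

Differencing emphasizes how an event's features change between time steps rather than their absolute values, and restricting attention to a window of ω = 4 steps keeps each output focused on recent changes.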
However, this study also has some limitations. First, the SC-MGAT module is built on a homogeneous graph and has not been extended to heterogeneous graphs, which limits the model's ability to extract richer heterogeneous information. Second, it models only static graphs, which makes it difficult to capture how event propagation structures change over time. Third, the DTP module focuses solely on temporal differences and neglects structural aspects; future work could extend it to the structural dimension for a more comprehensive understanding. Lastly, to fuse temporal and structural features, this paper employed only a simple fully connected layer, which may have resulted in insufficient feature integration. Future work could explore more advanced fusion methods, such as attention mechanisms and CNNs, which may improve performance and offer better insight into the underlying dynamics of the data.

• We propose SMGaDTP, a novel model for rumor detection. Compared with previous works, it can simultaneously learn both the deep structural features and the temporal features of rumors.
• We propose SC-MGAT, which builds on the multi-hop graph and incorporates an enhanced message-passing framework to aggregate extensive neighborhood information. Additionally, a self-connected readout mechanism is introduced to extract global information hierarchically.
• We propose DTP, which models events from the perspective of differential time series to characterize their temporal variations. On this basis, a novel local window attention mechanism and a GRU are employed to learn temporal features.
• Extensive experiments on real-world datasets demonstrate that the proposed methods outperform the previous state-of-the-art approaches. Further experiments also indicate that SC-MGAT substantially improves over traditional GNNs in addressing the oversmoothing problem.

The outline of this paper is as follows. Section 2 presents an overview of relevant previous work. Section 3 provides a formal description of the problem. Section 4 provides a comprehensive introduction to the proposed model. Section 5 conducts extensive experiments to analyze the effectiveness of the model. Section 6 summarizes the findings and discusses the limitations of this paper.
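The multi-hop graph that SC-MGAT builds on can be sketched as follows: layer l aggregates neighbors ζ(l) hops away, with ζ(l) = 2l − 1 in the experiments. This sketch assumes that "k-hop neighbors" means nodes at a shortest-path distance of exactly k in the propagation graph treated as undirected; whether SC-MGAT uses exact or cumulative hop distances is an assumption here. The function names are hypothetical.

```python
from collections import deque

def hop_distances(adj, start):
    """BFS shortest-path hop counts from start in an undirected adjacency dict."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def k_hop_neighbors(adj, k):
    """Map each node to the sorted list of nodes exactly k hops away."""
    nodes = set(adj) | {v for nbrs in adj.values() for v in nbrs}
    return {
        u: sorted(v for v, d in hop_distances(adj, u).items() if d == k)
        for u in nodes
    }

def hop_schedule(num_layers):
    """zeta(l) = 2l - 1: layer l aggregates neighbors 2l - 1 hops away."""
    return [2 * l - 1 for l in range(1, num_layers + 1)]
```

With six layers, `hop_schedule(6)` gives the hop distances 1, 3, 5, 7, 9, and 11, so a six-layer model can reach nodes 11 hops away without stacking 11 ordinary message-passing layers.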

Figure 1.
(a) A sample from the real-world Weibo dataset representing the propagation process of a specific rumor event.(b) A sample from the Les Misérables Co-Occurrence Network dataset, constructed based on Victor Hugo's novel Les Misérables, representing the network graph of character relationships.

Figure 3. The workflow of SC-MGAT (illustrated with ζ(l) = l).

• SVM-RBF [5]: A method based on an SVM with a radial basis function (RBF) kernel that uses a range of statistical features from tweets to identify fake news.
• SVM-TS [6]: A linear SVM-based classifier that uses time series modeling techniques to capture the temporal features of rumors.
• GCN [59]: A graph representation learning method that uses message passing to aggregate information from neighboring nodes for feature extraction.
• GAT [49]: An advanced graph representation learning framework that, like a GCN, aggregates information from neighboring nodes but uses attention mechanisms to weight the importance of different neighbors.
• BU-RvNN [9]: A rumor classification method based on bottom-up recursive neural networks. It integrates text content and propagation structure features using GRUs and performs classification based on the state of the root node.
• TD-RvNN [9]: A rumor classification method based on top-down recursive neural networks. It integrates text content and propagation structure features using GRUs and performs classification based on the states of the leaf nodes.
• STS-NN [13]: A rumor detection method based on deep spatiotemporal neural networks. It integrates rumor propagation and temporal features within a GRU-like unit for learning.

Figure 5.
(a) Results of ablation experiments on the Twitter dataset. (b) Results of ablation experiments on the Weibo dataset.

Figure 6.
(a) Results of early detection on the Twitter dataset (x-axis: detection deadline in minutes). (b) Results of early detection on the Weibo dataset (x-axis: detection deadline in hours).

Figure 8. Node similarity after message passing with 6-GAT, where (a-f) represent the results from the first layer to the sixth layer, respectively.
Figure 9. Node similarity after message passing with SC-MGAT, where (a-f) represent the results from the first layer to the sixth layer, respectively.

Table 1.
Summary of previous work.

Table 3.
Types of features used by different algorithms.

Table 4.
Rumor detection results of all algorithms.
while the w/o DTP variant performed the best. This suggests that the structural features of rumors are more informative than their temporal features and that the SC-MGAT module effectively captured the structural features of rumors.
• By comparing the w/o DTP and w/o DTP + SCA variants, it can be inferred that the SCA module plays an important role in SC-MGAT by enhancing its ability to learn high-order neighbor information.