Rumor Detection on Social Media via Fused Semantic Information and a Propagation Heterogeneous Graph

Abstract: Social media has had a revolutionary impact because it provides an ideal platform for sharing information; however, it also leads to the publication and spreading of rumors. Existing rumor detection methods have relied on finding cues from only user-generated content, user profiles, or the structures of wide propagation. However, previous works have ignored the organic combination of wide dispersion structures and text semantics in rumor detection. To this end, we propose KZWANG, a framework for rumor detection that provides sufficient domain knowledge to classify rumors accurately, and in which semantic information and a propagation heterogeneous graph are symmetrically fused. We utilize an attention mechanism to learn a semantic representation of text and introduce a GCN to capture the global and local relationships among all the source microblogs, reposts, and users. An organic combination of text semantics and propagation heterogeneous graphs is then used to train a rumor detection classifier. Experiments on the Sina Weibo, Twitter15, and Twitter16 rumor detection datasets demonstrate the proposed model's superiority over baseline methods. We also conduct an ablation study to understand the relative contributions of the various components of the proposed method.


Introduction
With the rapid development of large-scale social network platforms such as Sina Weibo, Jinri Toutiao, and Tik Tok, rumor identification on social media has become a challenging topic. Thanks to the convenience of social media, rumors can spread quickly and affect people's opinions; they can cause significant harm to society and result in huge economic losses. Therefore, to address the panic and threats that rumors may cause, it is of high practical significance to propose a method that can efficiently identify rumors in social media content.
Previous research on automatic rumor detection has concentrated largely on extracting effective features from many different types of information sources, including text content [1][2][3], user profiles [1,4], and propagation patterns [5][6][7]. However, the complexity and scale of social media data pose many technical challenges. First, social media language is casual and informal, and is often dynamic or ungrammatical; thus, traditional NLP techniques cannot be applied directly. Second, when using handcrafted features, there are many instances where individual or multiple handcrafted features are unavailable, inadequate, or manipulated. Inspired by recent achievements in deep learning, more recent studies have applied different neural networks to rumor-related tasks, including rumor detection itself [8], the identification of candidate rumors, and rumor verification [9,10], whose goal is to assess the veracity of a rumor. However, these models ignore the global structural information among different users and microblogs, which has been verified to provide helpful clues for node classification [7].
The way rumors spread also indicates that rumors have some particular spreading behaviors. Therefore, some studies have tried to include information about rumor dispersion structures by invoking CNN-based methods [11,12]. CNN-based methods can extract correlated characteristics among local neighbors but cannot deal with the structural relationships that exist in trees or graphs [13]. Therefore, these approaches ignore the structural features of rumor dispersion. In fact, a CNN is not designed to learn high-level representations from structured data, whereas a graph convolutional network (GCN) is [14]. Social networks are typically structured as heterogeneous graphs containing entities such as users, posts, geographic locations, and hashtags, connected by relationships such as followers, friendships, retweets, and spatial neighborhoods. These heterogeneous networks therefore provide new and different perspectives on the relationships among microblogs and contain rich information that can improve rumor detection performance. However, most previous studies on rumor detection have treated each source microblog as independent, so that one does not affect another. Thus, they have not fully exploited the correlations between different node types.
Therefore, in this study, we introduce a GCN to capture the structural information of spreading and diffusion among all the source microblogs, reposts, and users. To clarify our motivation, Figure 1 shows a global heterogeneous graph containing three source microblogs with corresponding user responses. In this example, two users (User 1 and User 2) have no friendship relation and do not follow each other, but they both repost the same Weibo content, weibo 1. In addition, the three Weibo posts are not related in content, but weibo 2 and weibo 3 share similar neighbors, which suggests that they probably have the same tags. Based on these observations, we build a global heterogeneous graph to capture the local and global relationships among all the Weibo sources, reposts, and users. To achieve rumor detection, we first learn word embeddings from the text contents of microblogs posted by users. Then, we construct a graph to model the potential interactions between users, and use a GCN to learn a representation of repost propagation and a graph-aware representation of user interactions. Finally, we symmetrically fuse the semantic information with the propagation heterogeneous graph. The proposed method also achieves much better effectiveness in early rumor detection, which is valuable for identifying rumors and limiting their spread.
The main contributions of this work are as follows:
• We utilize a deep integration of rumor propagation along relationship chains and text semantic information via the heterogeneous network to detect rumors.
• We apply multi-head attention to fuse the local semantic information to generate a better-integrated representation for each microblog.
• We concatenate the source microblog features with other microblogs at each graph convolutional layer to comprehensively use the root feature information and obtain excellent rumor detection performance.

Related Works
Automatic rumor detection aims to identify whether a microblog text on a social networking platform is a rumor based on its related information, including comments, text content, communication mode, propagation patterns, etc. Ref. [15] provides a survey of research into social media rumors. The most recent techniques proposed to date can be grouped into three main categories: (1) deep learning methods, (2) graph neural networks, and (3) propagation tree-related methods.

Deep Learning Methods
Recent advances in deep learning have attained state-of-the-art performance on natural language processing tasks, and researchers have applied deep learning models to automatically learn effective features for detecting rumors. Ref. [16] presented a method that learns successive representations of microblog events to identify rumors; the model is based on a recurrent neural network (RNN), which learns hidden representations that capture how the contextual information of relevant posts varies over time. Ref. [9] proposed a model with a shared layer and two task-specific layers, and incorporated user credibility information into the rumor identification layer. In this study, we also introduce an attention mechanism into the rumor identification process. Ref. [17] utilized a task-specific character-based bidirectional language model and stacked LSTMs to represent the textual and social-temporal information of input source tweets and to model the dissemination patterns of rumors during the early stages of their development. They also introduced multilayered attention models to jointly learn context embeddings from multiple context inputs. However, the above methods ignore the spreading mode of microblogs, model the diffusion path as a well-aligned structure, and fail to make the best of the diffusion information of microblogs. Moreover, these methods pay little attention to the early discovery of rumors.

Graph Neural Network
Traditional deep learning methods consider only the patterns of deep propagation; they ignore the structures of wide dispersion in rumor detection. To address this challenge, a bi-directional graph convolutional network was proposed to explore both characteristics through top-down and bottom-up rumor propagation [18]. This method uses a GCN on a top-down directed graph of rumor spreading to learn the patterns of rumor propagation, and a GCN on the graph with the opposite direction to capture the structures of rumor dispersion. Ref. [19] proposed the graph-aware co-attention network (GCAN), which uses RNNs and convolutional networks to learn a representation of repost diffusion from user features. A graph is first constructed to model the potential interactions between users, and a GCN is then used to learn a graph-aware representation of user interactions. A global-local attention network (GLAN) for rumor detection was proposed in [20], which jointly encodes local semantic and global structural information. In this study, we first fuse the semantic information of related tweets with an attention mechanism to produce a better-integrated representation of the content of each tweet. Then, we model the relationships among all source microblogs, reposts, and users as a heterogeneous graph to obtain rich structural information for identifying rumors.

Propagation Tree Related Methods
Unlike previous works that focused on microblog text content, propagation tree-related methods focus on the differences in how real and fake information spreads. Ref. [21] proposed learning discriminative features from microblog posts by following their non-sequential diffusion structure to generate stronger representations for detecting rumors. They use a propagation tree to represent the spread of microblog posts, which provides useful clues as to how the claims in the original posts spread and develop over time. Ref. [8] utilized two recursive neural models, based on top-down and bottom-up tree-structured neural networks, for rumor representation learning and classification; these models learn discriminative features from tweet content by following the non-sequential propagation structure and generate more helpful representations for detecting different types of rumors. Refs. [6,7,22] applied propagation tree methods to rumor detection, providing useful clues on how a microblog diffuses and develops over time.
Message propagation on social networks essentially follows a heterogeneous graph structure, in which users post or forward messages so that they propagate quickly and widely. The techniques above depend on propagation trees to investigate differences in the structure of information transmission; however, they do not consider the connections among different propagation trees.

Problem Statement
Let $\{m_1, m_2, m_3, \ldots, m_i\}$ be a set of source microblogs and $U = \{u_1, u_2, u_3, \ldots, u_i\}$ be a set of users. Each source microblog $m_i$ has $n$ repost microblogs $\{r_1, r_2, \ldots, r_n\}$. For every microblog, we use the notation $t_i$ to denote its post time. We treat the repost microblogs as the neighbor nodes of the source tweet or microblog in the heterogeneous graph, formulated as $N(m_i) = [r_1, r_2, \ldots, r_n]$.
We aim to learn a function $P(c = 1 \mid m_i, N(m_i); U; \theta)$ that predicts whether a tweet or Sina Weibo post is a rumor. Here, $c$ is the class label, and $\theta$ represents all the model parameters.

Methodology
The rumor detection model proposed in this paper consists of three main parts: microblog representation, propagation and dispersion representation, and rumor detection. The microblog representation module maps microblogs from word embeddings into a semantic space. The propagation and dispersion representation module uses a GCN to encode user propagation. The rumor detection module trains a classification function that predicts the labels of tweets or Sina Weibo posts. Figure 2 shows the framework of the proposed KZWANG model. In the following, we introduce each major component in detail.

Microblog Representation
We utilize multi-head attention [23] to represent a microblog by learning the context of the tweet or Sina Weibo post. The multi-head attention module takes three sentences as input: a query sentence, a key sentence, and a value sentence, namely, $Q \in \mathbb{R}^{n_q \times d}$, $K \in \mathbb{R}^{n_k \times d}$, and $V \in \mathbb{R}^{n_v \times d}$, respectively, where $n_q$, $n_k$, and $n_v$ denote the number of words in each sentence, and $d$ is the embedding dimension. The attention module is the most significant module in the encoding unit and is defined as in [23]. In this way, the attention module captures the dependencies between the query and key sentences and uses this relational information to aggregate the elements of the value sentence into a representation for each element of the query sentence. To extend the model's ability to focus on different positions and to improve the representation learning ability of the attention subspaces, the transformer applies a "multi-head" mode [23]. We obtain microblog representations from word embeddings in the same way.
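For reference, and assuming the module follows the standard formulation of [23], the attention operation and its multi-head extension described above can be written as below. The projection matrices $W_i^Q$, $W_i^K$, $W_i^V$, $W^O$ and the number of heads $h$ are not named in the text; they are introduced here only for illustration.

```latex
\begin{align}
\mathrm{Attention}(Q, K, V) &= \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right) V \\
\mathrm{head}_i &= \mathrm{Attention}\!\left(Q W_i^{Q},\; K W_i^{K},\; V W_i^{V}\right), \qquad i = 1, \ldots, h \\
\mathrm{MultiHead}(Q, K, V) &= \mathrm{concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O}
\end{align}
```

In this form the product $Q K^{\top}$ has shape $n_q \times n_k$, so the output has one row per query word, which requires $n_k = n_v$.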

Construct Propagation and Dispersion Graphs
We aim to create a graph to model the potential interactions among users who repost the source story. The idea is that the correlations between users with specific features can reveal the likelihood that a Weibo post is a rumor. Specifically, the propagation graph contains the source Weibo node (called a "Weibo" hereafter), the publisher node of the source Weibo, and the repost nodes. Each edge represents either a publishing relationship or a repost relationship. The edge weight is calculated from the time difference $t$ between the repost Weibo and the source Weibo as $\frac{1}{1+t}$. Then, all the nodes are uniformly numbered so that a single adjacency matrix contains all the relationships.
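The construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_propagation_graph`, the symmetric (undirected) edges, the weight of 1.0 on the publishing edge, and the use of hours as the time unit are all assumptions made for the example.

```python
def build_propagation_graph(source_id, publisher_id, reposts):
    """Build a weighted adjacency matrix for one source post.

    reposts: list of (repost_id, delay) pairs, where delay >= 0 is the time
    difference t between the repost and the source post (unit assumed: hours).
    """
    # Uniformly number all nodes so one matrix holds every relationship.
    node_ids = [source_id, publisher_id] + [rid for rid, _ in reposts]
    index = {node: i for i, node in enumerate(node_ids)}
    n = len(node_ids)
    adj = [[0.0] * n for _ in range(n)]

    def add_edge(u, v, weight):
        # Assumption: edges are stored symmetrically.
        adj[index[u]][index[v]] = weight
        adj[index[v]][index[u]] = weight

    # Publishing relationship; a weight of 1.0 is an assumption (t = 0).
    add_edge(publisher_id, source_id, 1.0)
    # Repost relationships weighted by 1 / (1 + t).
    for rid, delay in reposts:
        add_edge(source_id, rid, 1.0 / (1.0 + delay))
    return index, adj


index, adj = build_propagation_graph(
    "weibo_1", "user_0", [("repost_a", 0.0), ("repost_b", 3.0)]
)
```

With this weighting, a repost made immediately gets weight 1, and later reposts contribute progressively less to the graph.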

Propagation and Dispersion Encoding
A GCN is a multi-layer neural network that processes graph data and generates an embedding vector for each node according to its neighborhood. By stacking convolutional layers, a GCN can capture information from both the direct and indirect neighbors of a node. Many types of message propagation function, $M$, exist for GCNs [13,24]; the message propagation function defined by the first-order approximation of ChebNet [14] is as follows:
$$H_k = M(A, H_{k-1}; W_{k-1}) = \sigma(\hat{A} H_{k-1} W_{k-1})$$
In the above formula, $H_k \in \mathbb{R}^{n \times v_k}$ is the hidden feature matrix, and $M$ is the message propagation function, which depends on the adjacency matrix $A$, the hidden feature matrix $H_{k-1}$, and the trainable parameters $W_{k-1}$. $\hat{A}$ is the normalized adjacency matrix, and $\sigma(\cdot)$ is an activation function.
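A single propagation step of the form $\sigma(\hat{A} H_{k-1} W_{k-1})$ can be sketched in plain Python as below. The text does not say how $\hat{A}$ is normalized or which activation is used; the sketch assumes the common choice $\hat{A} = D^{-1/2}(A + I)D^{-1/2}$ with self-loops and a ReLU activation, and the helper names are hypothetical.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} (assumed normalization with self-loops)."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in a_tilde]  # >= 1 for non-negative weights
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(a_hat, h_prev, w_prev):
    """One step H_k = ReLU(A_hat @ H_{k-1} @ W_{k-1})."""
    z = matmul(matmul(a_hat, h_prev), w_prev)
    return [[max(0.0, v) for v in row] for row in z]


# Two connected nodes with one-hot input features.
a_hat = normalize_adjacency([[0.0, 1.0], [1.0, 0.0]])
h1 = gcn_layer(a_hat, [[1.0, 0.0], [0.0, 1.0]], [[1.0, -1.0], [1.0, -1.0]])
```

Stacking two such layers lets each node see its two-hop neighborhood, which is how information from indirect neighbors is captured.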

Root Feature Enhancement
The source post of a rumor event usually contains abundant information and therefore has a broad effect. It is important to make better use of the information in the source post and to learn more precise node representations from the relationship between the source and the other nodes. Therefore, besides the hidden features from TD-GCN and BU-GCN, we introduce a root feature enhancement operation to improve rumor detection performance, as shown in Figure 3, with $H_0^{TD} = X$. We express TD-GCN with root feature enhancement by substituting $H_1$ in Equation (5) with $\hat{H}_1^{TD} = \mathrm{concat}(H_1^{TD}, X^{root})$, and then obtain $\hat{H}_2^{TD}$ from the next graph convolutional layer in the same manner. Similarly, as shown in Figure 4, the root-enhanced hidden features of BU-GCN, $\hat{H}_1^{BU}$ and $\hat{H}_2^{BU}$, are obtained in the same manner as Equations (6) and (7).
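The concatenation step $\mathrm{concat}(H_1^{TD}, X^{root})$ can be illustrated with a short sketch. It assumes that $X^{root}$ is the input feature row of the source post and that this row is appended to every node's hidden vector; the function name and the convention that the root is node 0 are assumptions for the example.

```python
def enhance_with_root(hidden, root_features):
    """Append the root (source post) features to every node's hidden vector."""
    return [list(row) + list(root_features) for row in hidden]


# Three nodes with 2-dimensional input features; node 0 is assumed to be the root.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
h1 = [[0.1], [0.2], [0.3]]  # hidden features after the first layer
h1_enhanced = enhance_with_root(h1, x[0])
```

Each row of `h1_enhanced` now has the hidden dimension plus the root feature dimension, so the weight matrix of the following layer must be sized accordingly.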

Rumor Classification
After the above procedures, we have obtained a text representation $t_j$ and a propagation representation $p_j$. Both representations are important for rumor detection, so we concatenate them to form the final features for classification. The final representation is then projected into the target probability space by a fully connected layer, where $W \in \mathbb{R}^{2d \times |c|}$ is the weight parameter and $b \in \mathbb{R}$ is a bias term. Finally, we adopt the cross-entropy loss as the optimization objective for rumor identification, where $y_i$ is the gold-standard probability of the rumor class and $\theta$ represents all the model parameters.
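The classification step can be sketched as below, under stated assumptions: the two representations are concatenated into a vector of length $2d$, the fully connected layer is followed by a softmax, and the bias has one entry per class. None of these details are given explicitly in the text, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(t_j, p_j, weight, bias):
    """Project concat(t_j, p_j) to class probabilities.

    weight has shape (2d, |c|); bias has one entry per class (an assumption).
    """
    features = list(t_j) + list(p_j)
    logits = [
        sum(f * weight[k][c] for k, f in enumerate(features)) + bias[c]
        for c in range(len(bias))
    ]
    return softmax(logits)


def cross_entropy(probs, gold_class):
    """Cross-entropy loss for one example with a one-hot gold label."""
    return -math.log(probs[gold_class])


# d = 1 and two classes; zero weights give a uniform prediction.
probs = classify([1.0], [0.0], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
loss = cross_entropy(probs, 1)
```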

Experiments
First, we assess the empirical performance of the KZWANG model and compare it with several baseline models. Then, we conduct ablation studies to explore the influence of the essential components of KZWANG. Finally, we test the ability of both KZWANG and the compared methods to perform early rumor detection. The code for replicating the experiments is available at https://github.com/shanmon110/rumordetection.

Datasets
The experiments on the KZWANG model were conducted on three social media datasets: Twitter15, Twitter16 [7], and Sina Weibo [16], collected from the most popular social media sites in the United States and China, respectively. The Sina Weibo dataset is annotated with binary labels, either "false rumor" or "non-rumor". The Twitter15 and Twitter16 datasets are each annotated with four labels, i.e., "false rumor" (FR), "non-rumor" (NR), "unverified rumor" (UR), and "true rumor" (TR). The label "true rumor" denotes a tweet that tells people that a certain tweet is a rumor. For every dataset, we build a heterogeneous graph from the source tweets, the response tweets, and the relevant users. Table 1 shows summary statistics for the three datasets.

Baselines
We compare our method with the following baseline rumor detection models:
• DTR [26]: A decision-tree-based method that detects and ranks rumors using query phrases.
• GRU [16]: An RNN-based model that learns temporal-linguistic patterns from user comments.
• RFC [27]: A random forest classifier that utilizes linguistic, user, and structural features.
• PTK [7]: An SVM classifier with a propagation tree kernel that detects rumors by learning the temporal-structural patterns of propagation trees.
• RvNN [28]: A tree-structured model based on top-down and bottom-up recursive neural networks for fake news identification on Twitter.
• PPC [29]: A rumor detection model that classifies propagation paths using a combination of recurrent and convolutional networks.
• GLAN [20]: A rumor detection model with a global-local attention network (GLAN) that jointly encodes global structural information and local semantic information.

Setup
We randomly selected 10% of the instances as a validation set and split the rest into training and test sets at a ratio of 3:1 for all three datasets. All word embeddings in the model are initialized with 300-dimensional word vectors trained on a domain-specific corpus with the skip-gram algorithm. Words that do not exist in the pre-trained word vector set are initialized from a uniform distribution. During training, we keep the word vectors trainable so that they can be fine-tuned for each task. In the Twitter15/16 datasets, words are segmented by spaces, while the words in the Weibo dataset are segmented with the Jieba library. We removed words that had more than two occurrences since those words might be stop words. Our model was implemented in PyTorch, and the Adam algorithm [30] was used to update the parameters, with $\beta_1 = 0.9$ and $\beta_2 = 0.999$; the learning rate was initialized to $1 \times 10^{-3}$ and gradually decreased during training. We selected the best parameters based on model performance on the training dataset and evaluated them on the test datasets. The batch size of the training set was set to 64.
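The split described above (10% validation, the remainder divided 3:1 into training and test) can be sketched as follows. The function name, the fixed seed, and the use of integer division for the boundary sizes are assumptions made for illustration.

```python
import random


def split_dataset(instances, seed=0):
    """Return (train, validation, test) lists following a 10% / 3:1 scheme."""
    items = list(instances)
    random.Random(seed).shuffle(items)  # fixed seed is an assumption
    n_val = len(items) // 10            # 10% held out for validation
    validation, rest = items[:n_val], items[n_val:]
    n_train = (3 * len(rest)) // 4      # remaining data split 3:1
    return rest[:n_train], validation, rest[n_train:]


train, validation, test = split_dataset(range(200))
```

For 200 instances this yields 135 training, 20 validation, and 45 test instances.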

Evaluation Metrics
To evaluate the classification performance of the model, we use precision (P), recall (R), and the F1 score as the evaluation criteria. These metrics are calculated as follows:
$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \times P \times R}{P + R}$$
In the above formulas, TP is the number of positive-class instances correctly predicted as positive, FN is the number of positive-class instances predicted as negative, FP is the number of negative-class instances predicted as positive, and TN is the number of negative-class instances correctly predicted as negative. Tables 2-4 show the performance of all compared models. For a fair comparison, the experimental results of the baseline models are cited directly from a previous study [20].
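As a worked example of these metrics, the sketch below computes precision, recall, and F1 from confusion-matrix counts using the standard definitions $P = TP/(TP+FP)$, $R = TP/(TP+FN)$, and $F1 = 2PR/(P+R)$. The function name and the convention of returning 0.0 when a denominator is zero are assumptions.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# 8 rumors correctly flagged, 2 non-rumors wrongly flagged, 8 rumors missed.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
```

Here precision is 8/10 = 0.8, recall is 8/16 = 0.5, and F1 is 2 × 0.8 × 0.5 / 1.3 = 8/13, roughly 0.615.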

Results and Analysis
From Tables 2-4, we can see that the proposed KZWANG performs better than all the baselines on the Twitter and Sina Weibo datasets. Specifically, KZWANG attains an accuracy of 95.0% on the Sina Weibo dataset and accuracies of 91.1% and 90.7% on the two Twitter datasets. These results indicate the adaptability of KZWANG to different types of datasets. In addition, they show that a GCN over a heterogeneous graph can effectively learn node representations using both semantic and structural information.
The models based on handcrafted features (DTR, DTC, RFC, SVM-RBF, and SVM-TS) perform poorly, which shows that they fail to generalize across the datasets because they cannot capture sufficiently helpful features. Among these baselines, SVM-TS and RFC obtain relatively better performance since they utilize extra structural or temporal features. Still, they clearly perform worse than models that do not rely on feature engineering.
Of the two propagation models based on tree structure, PTK depends on linguistic features and structural features extracted from propagation trees. The RvNN-based model is inherently tree-structured and capitalizes on representation learning that follows the propagation structure, so it outperforms PTK. However, when modeling the propagation process, these tree-based methods lose too much information, since information spreads through a graph structure rather than a tree structure.
Among the deep learning models, GRU and PPC perform better than the traditional classifiers that use handcrafted features. This result shows that neural network models can learn representative deep latent features automatically. Meanwhile, we can also notice that PPC is more effective than GRU, for the following reasons: (1) GRU relies on temporal-linguistic patterns, while PPC relies on the fixed user features of repost sequences; (2) the PPC model combines a CNN and an RNN to capture the variations in user features.
Table 2. Experimental results on the Twitter15 dataset. The best and second-best results in each metric are bold and underlined, respectively.
Table 3. Experimental results on the Twitter16 dataset. The best and second-best results in each metric are bold and underlined, respectively.
In conclusion, KZWANG performs better than both the neural network-based and feature-based models, and its accuracy and precision on the rumor recognition task improve on previous models. Specifically, on the Weibo dataset, KZWANG improves the accuracy of the diffusion path classification model (the best baseline method) from 94.6% to 95.0%, and it boosts the accuracy on the two Twitter datasets from 90.5% to 91.1% and from 90.2% to 90.7%, respectively. These results show that both the semantic information of the text and the structural information of propagation are important for describing the differences between rumors and non-rumors.

Ablation Study
To determine the relative significance of each module in KZWANG, we conducted a series of ablation studies on different modules of the model. The experimental results are shown in Figure 5. The ablation studies were conducted as follows:
• GCN: This experiment compared an LSTM with a GCN for encoding the propagation graph for rumor classification.
• Attention: This experiment compared an LSTM with an attention mechanism for extracting text features from source tweets for rumor classification.
• Only Text: This experiment removed the propagation and dispersion encoding modules and used only text information for rumor classification.
From Figure 5, we can observe the following. We first examined the influence of the GCN encoding module. Replacing the GCN with an LSTM significantly affects the performance on all the datasets. The GCN captures the semantic relations between the source microblog and the corresponding repost microblogs; thus, these results show that it is critical to explicitly encode the propagation and dispersion relations.
Intuitively, the attention mechanism brings distant but semantically relevant information closer together, which leads to high cohesion within the rumor and non-rumor groups and low coupling between them, and thus improves the performance.
Finally, we evaluate the influence of the propagation heterogeneous graph encoding module. As shown in Figure 5, using only text results in significant performance declines on all the datasets.
In general, the performances obtained improve significantly after combining the text and propagation information, which shows that combining the information from these two aspects provides complementary effects from both local and global perspectives.

Early Detection
Early detection aims to detect rumors at their early propagation stages, as early as possible, so that interventions can be made quickly [26].
To construct an early detection task, we set detection deadlines and used only the posts published before those deadlines to evaluate the accuracy of KZWANG and the benchmark models. The accuracy of several competitive models at varying repost time delays is shown in Figures 6-8. In the first few hours, KZWANG using only the data within 4 h of the source microblog already outperforms the tree-structured classification models (the best baselines) using the full data, which clearly indicates the superior early detection ability of our model. In particular, KZWANG achieves an accuracy of 92% on Weibo, 90% on Twitter15, and 85% on Twitter16 within the initial 4-hour period, which is dramatically faster than the other benchmark methods. As the time delay increases from 4 to 12 h, the accuracy of KZWANG drops slightly, but it remains superior to the state-of-the-art (SOTA) methods. The reason is that as microblogs propagate, more semantic and structural information becomes available, but the noise also increases. The results indicate that KZWANG is insensitive to the amount of data and is robust. The experimental results on the Twitter and Sina Weibo datasets show that the proposed KZWANG model can significantly enhance detection capability while also detecting rumors as early as possible.
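The deadline-based truncation used to build the early detection task can be sketched as follows. The function name, the (id, delay) representation of reposts, and the strict "before the deadline" comparison are assumptions made for illustration.

```python
def truncate_at_deadline(reposts, deadline_hours):
    """Keep only reposts published strictly before the detection deadline.

    reposts: list of (repost_id, delay_hours) pairs, where delay_hours is
    the time elapsed since the source post was published.
    """
    return [(rid, delay) for rid, delay in reposts if delay < deadline_hours]


reposts = [("r1", 0.5), ("r2", 3.9), ("r3", 4.0), ("r4", 11.0)]
early_4h = truncate_at_deadline(reposts, 4)
early_12h = truncate_at_deadline(reposts, 12)
```

A model is then evaluated on the graph and text built from each truncated list, so that accuracy can be compared across deadlines.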

Conclusions and Future Work
In this paper, we have proposed a rumor detection framework that combines the contextual semantics of text with propagation structure information to identify fake news and rumors. Unlike most previous studies, which use manually extracted features or automatic feature extraction by deep learning, we fuse the contextual information from the source Weibo and the corresponding reposts via multi-head attention, which generates a better semantic representation for each source Weibo. To capture the complex spread of information from different Weibo sources, we utilize a GCN to learn from a heterogeneous graph constructed from propagation structure information. Extensive experiments on real-world Weibo and Twitter datasets demonstrate the superiority of the proposed model over baseline models on rumor detection tasks.
The complex spread of information is an important feature of rumors, so constructing a heterogeneous graph that adequately represents propagation behavior is critical to the ability of a GCN to learn propagation features. Graph neural networks can better realize association mining across spatial clues; combining the hidden clues in the target data, such as attributes, structures, and behaviors, to mine complete clue information remains an important research direction. At the same time, how to introduce the large amount of linguistic and world knowledge accumulated by human beings into the rumor detection model is an important direction for improving the performance of deep learning models for fake news detection, and it also faces significant challenges.