A Novel Hybrid Deep Learning Model for Detecting COVID-19-Related Rumors on Social Media Based on LSTM and Concatenated Parallel CNNs

Spreading rumors on social media is considered a cybercrime that affects people, societies, and governments. For instance, some criminals create rumors and post them on the internet, and other people then help spread them. Spreading rumors can also be a form of cyber abuse, in which rumors or lies about a victim are posted online to send threatening messages or to share the victim's personal information. During pandemics, large numbers of rumors spread very quickly on social media and can have dramatic effects on people's health. Detecting these rumors manually is very difficult for the authorities on such open platforms. Therefore, several researchers have studied intelligent methods for detecting such rumors. These detection methods can be classified mainly into machine learning-based and deep learning-based methods. Deep learning methods have a comparative advantage over machine learning methods because they require less preprocessing and no manual feature engineering, and they have shown superior performance in many fields. Therefore, this paper proposes a Novel Hybrid Deep Learning Model for Detecting COVID-19-related Rumors on Social Media (LSTM–PCNN), based on Long Short-Term Memory (LSTM) and Concatenated Parallel Convolutional Neural Networks (PCNN). The experiments were conducted on the ArCOV-19 dataset, which includes 3157 tweets: 1480 rumors (46.88%) and 1677 non-rumors (53.12%). The proposed model outperformed the other methods in terms of accuracy, recall, precision, and F-score.


Introduction
One of the cybercrimes that has emerged recently is the spreading of rumors on social media. According to [1], people help criminals spread false rumors because of the insufficient credibility of governments and mainstream media. Some web media use attention-grabbing titles, and some even spread unconfirmed rumors, so that people cannot clearly distinguish between facts and rumors. For example, during pandemics, rumors often spread on social networks about people suspected of being infected [2]. Several infectious diseases and pandemics have emerged so far, and they are by nature rapidly evolving. In this context, COVID-19 is one of the most rapidly spreading pandemics, having reached almost every country in the world. In Saudi Arabia, since the first outbreak of the COVID-19 epidemic on 2 March 2020,

• A new LSTM-PCNN architecture is proposed, and extensive experiments are presented to demonstrate its performance.
• The impact of the word embedding layer is investigated in order to select the appropriate scheme. For this purpose, we investigated the influence of static word embeddings, namely word2vec, GloVe, and FastText, on the proposed model.
The paper is organized as follows. Section 2 reviews state-of-the-art techniques that address the rumor detection problem. Section 3 presents the architecture of the proposed model. Section 4 describes the methodology of this study, including the dataset, preprocessing methods, evaluation metrics, and experimental design. Section 5 details the experimental results that highlight our contribution. Section 6 concludes the paper by summarizing the contributions.

Related Studies
In the following subsections, we briefly present some notable works published during the COVID-19 pandemic that used ML and DL methods to detect COVID-19-related misinformation and rumors, and we summarize their results and limitations. This section also gives the reader the background needed to understand the main characteristics of the DL models investigated in this paper.

Rumor Detection Approaches
During the outbreak of COVID-19, especially when countries around the world began imposing bans and full lockdowns, a wave of panic spread rapidly among citizens. As a result, the WHO emphasized the need to fight misinformation about the virus and its treatment methods [12]. To this end, health authorities focused on debunking such rumors. However, verifying all of these rumors required considerable human effort, so some researchers proposed developing AI techniques to fight COVID-19-related misinformation and rumors on social media. Some notable works that follow this approach are described below.
Chen [13] combined the pre-trained BERT model with TextCNN and TextRNN models. The models were trained on data containing 3737 rumors collected from different Chinese platforms. The results showed that the BERT-based model outperformed the other methods; nevertheless, all three models achieved good results and could be used to counter COVID-19-related rumors.
Alqurashi et al. [14] conducted extensive experiments on a dataset of Arabic COVID-19 misinformation spread on Twitter. They employed n-gram TF-IDF and word-level TF-IDF feature representations, as well as word2vec and FastText word embeddings, with several traditional ML and DL methods. The traditional ML methods investigated were the random forest classifier, XGB, naïve Bayes, SGD, and SVM, while CNN, Bi-LSTM, and CRNN models were used as DL methods. The findings showed that word-level TF-IDF performed better than n-gram TF-IDF when used with traditional ML methods. FastText produced better results with the ML methods and the CNN. Word2vec yielded some improvement with the CNN before the AUC score was optimized, whereas the RNN benefited more after AUC optimization.

In [15], Wang et al. proposed combining text content [16], propagation patterns [17], and user feedback, and analyzed the influence of these combinations on a deep attention model. The model was tested on a set of publicly available datasets. They reported that about 15% of the Twitter data had been lost when they tried to re-collect the content of some tweets. The results showed that this approach slightly improved rumor detection over the propagation cycle and achieved 94.2% accuracy.
In [18], Alsudias and Rayson collected around one million Arabic tweets related to COVID-19. Their aim was not only to detect rumors but also to identify the topics discussed during the period and to find the sources of such rumors. For rumor detection, the authors sampled only 2000 tweets and labeled them manually. SVM, LR, and NB classifiers were then used to distinguish rumor tweets from non-rumors. The highest accuracy, 84.03%, was achieved both by LR with count vectors and by SVM with TF-IDF. They also examined the influence of word2vec and FastText and reported that these word embeddings did not improve the classifiers' performance.
Beyond COVID-19-related rumors, many studies in the literature address rumor detection on social media in general, such as [3,5]. The following subsection briefly presents some DL techniques that are widely used for detecting rumors on online social networks (OSN).
Table 1 summarizes the main existing methods that use machine learning and deep learning to detect rumors. The proposed model extends these methods by introducing a new LSTM-PCNN architecture and conducting extensive experiments to demonstrate its performance. In addition, this study investigates the impact of the word embedding layer, in particular the influence of static word embeddings (word2vec, GloVe, and FastText) on the proposed model, in order to select the appropriate scheme. The proposed model outperformed the other investigated models in terms of accuracy, recall, precision, and F-score.

Deep Learning Techniques
Rumor detection on OSN has improved significantly through the application of DL. According to [19], the main advantage of DL-based techniques is that they do not require manual feature engineering: the DL classifier learns useful features directly from the input data during training. Since many DL models have been proposed, we focus on those used to build the proposed model. First, we present an overview of the LSTM and CNN architectures; then, we present the word embeddings used as text representations.

Long Short-Term Memory
Long Short-Term Memory (LSTM) networks are a special class of recurrent neural networks (RNNs). Standard RNNs struggle to learn dependencies in the input data, especially when the gap between related items is large; thanks to its gate functions, the LSTM handles this problem well [20]. In practice, its powerful learning capacity has made the LSTM one of the most widely used DL architectures, applied in many fields such as sentiment analysis [15,21,22], question answering systems [23], sentence embedding [24], and text classification [25].
A typical LSTM has three main gates: an input gate, a forget gate, and an output gate. In addition to the gates, an LSTM maintains a memory cell state, and the gates decide which information is stored in or discarded from it. Figure 1 shows the original LSTM proposed by [26], which several researchers have since modified. Variations include LSTM without a forget gate, LSTM with a forget gate [27], LSTM with peephole connections, the gated recurrent unit (GRU) [28], stacked LSTM [29], and Bi-LSTM [30].
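The gate mechanism described above can be illustrated with a minimal sketch of one time step of an LSTM with a forget gate. It follows the standard formulation rather than code from this paper, and to stay within the Python standard library it uses a single hidden unit and scalar inputs. The parameter names `W_*` (input weights), `U_*` (recurrent weights), and `b_*` (biases) are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x_t, h_prev, c_prev, p):
    """One time step of a single-unit LSTM with input, forget, and output gates.

    x_t, h_prev, and c_prev are scalars; p is a dict of hypothetical parameters.
    """
    i_t = sigmoid(p["W_i"] * x_t + p["U_i"] * h_prev + p["b_i"])    # input gate
    f_t = sigmoid(p["W_f"] * x_t + p["U_f"] * h_prev + p["b_f"])    # forget gate
    o_t = sigmoid(p["W_o"] * x_t + p["U_o"] * h_prev + p["b_o"])    # output gate
    g_t = math.tanh(p["W_c"] * x_t + p["U_c"] * h_prev + p["b_c"])  # candidate memory
    c_t = f_t * c_prev + i_t * g_t  # keep part of the old memory, add gated new content
    h_t = o_t * math.tanh(c_t)      # hidden state: gated view of the memory cell
    return h_t, c_t


def run_lstm(xs, p):
    """Feed a sequence through the cell from a zero state; return the last hidden state."""
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, p)
    return h


# Example: all-zero parameters versus a strongly open forget gate.
names = ["W_i", "U_i", "b_i", "W_f", "U_f", "b_f",
         "W_o", "U_o", "b_o", "W_c", "U_c", "b_c"]
zeros = {n: 0.0 for n in names}
print(lstm_step(0.0, 0.0, 1.0, zeros))                  # memory halves: c_t = 0.5
print(lstm_step(0.0, 0.0, 1.0, dict(zeros, b_f=50.0)))  # memory kept: c_t = 1.0
```

With all parameters at zero, every gate outputs 0.5 and the candidate memory is 0, so the cell halves its previous memory; a large positive forget-gate bias instead keeps the memory unchanged, which is how an LSTM can carry information across long gaps.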

Convolutional Neural Network
The CNN is another type of DL architecture that has gained increasing attention in recent years. It is a multilayer feed-forward neural network consisting of an input layer, an output layer, and hidden layers that can include any combination of convolutional layers, nonlinearities, pooling layers, fully connected layers, and regularization. Figure 2 illustrates a typical CNN architecture for binary rumor detection.
CNNs have proven effective in image classification, and researchers have found them powerful in natural language processing as well, for tasks such as text classification [31][32][33]. In [3], the authors investigated the use of CNNs for the rumor detection task and found that a CNN can capture rumor features well when the hidden layer is tuned gradually.

− Convolutional layer: For textual data, a convolutional layer is connected to the input layer to extract features from a window of h words using a filter. The filter slides across the input to capture useful features; its length is called the kernel size or window size. Once the features are extracted, the output is passed forward to the next layer.
− Nonlinearity: This layer introduces nonlinear properties into the network. The most commonly used nonlinearity functions in CNNs are tanh, sigmoid, and ReLU. Alsaeedi and Al-Sarem [3] found that the tanh activation function yielded better results than sigmoid and ReLU; in this paper, we therefore followed their recommendation and assessed the results empirically.
− Pooling layer: The convolutional layer often generates high-dimensional feature maps, so the pooling layer reduces the dimensionality by applying a function such as max pooling, average pooling, or stochastic pooling.
− Regularization layers: Like traditional ML methods, DL models suffer from overfitting. Regularization methods such as early stopping, dropout, and weight penalties are used to reduce the test error [34].
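The layer descriptions above can be sketched in plain Python: one filter with window size h slides over a sequence of word vectors, a tanh nonlinearity is applied, and max pooling shrinks the resulting feature map. This is an illustrative sketch under simple assumptions (a single filter, no padding, stride 1, non-overlapping pooling), not the block configuration used in the paper.

```python
import math


def conv1d_tanh(seq, kernel, bias=0.0):
    """Slide one filter of window size h over a sequence of word vectors.

    seq:    T word vectors, each of length D
    kernel: h rows of length D (h is the kernel/window size)
    Returns T - h + 1 activations (no padding, stride 1), each passed through tanh.
    """
    h = len(kernel)
    feature_map = []
    for start in range(len(seq) - h + 1):
        s = bias
        for k in range(h):                      # the h consecutive words in the window
            for d, weight in enumerate(kernel[k]):
                s += weight * seq[start + k][d]
        feature_map.append(math.tanh(s))        # tanh nonlinearity
    return feature_map


def max_pool(feature_map, pool_size):
    """Non-overlapping max pooling: keep the largest value in each window."""
    return [max(feature_map[i:i + pool_size])
            for i in range(0, len(feature_map) - pool_size + 1, pool_size)]


# Four 2-dimensional word vectors and one filter with window size h = 2.
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
filt = [[1.0, 0.0], [0.0, 1.0]]
fmap = conv1d_tanh(words, filt)   # three window positions
print(fmap, max_pool(fmap, 2))
```

In Keras, the same steps roughly correspond to a `Conv1D` layer with `activation="tanh"` followed by `MaxPooling1D` or `GlobalMaxPooling1D`; with global max pooling, a block with n filters yields an n-dimensional vector.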


Word Embeddings
Word embedding (WE) is a text representation technique in which words with similar meanings have similar representations. Several word embeddings are now widely used in ML and DL models. The many pre-trained WEs in the literature can be categorized into two groups [10]: static representation models and contextual models. Word2vec, GloVe, and FastText are static WEs, which map each word to a single fixed vector with a meaningful representation.

− Word2vec works as a language model [35] and is widely used for many NLP tasks. In general, word2vec embeddings can be obtained using either the skip-gram or the continuous bag-of-words (CBOW) model [36]. The skip-gram model computes the conditional probability of the surrounding context words given the central target word. CBOW does the opposite: it computes the conditional probability of a target word given the context words surrounding it within a window of c words on each side [23]. Mathematically, the CBOW (Equation (1)) and skip-gram (Equation (2)) models are trained as follows:
$$J = \frac{1}{V}\sum_{t=1}^{V} \log p\left(w_t \mid w_{t-c}, \ldots, w_{t-1}, w_{t+1}, \ldots, w_{t+c}\right) \qquad (1)$$
$$J = \frac{1}{V}\sum_{t=1}^{V} \sum_{-c \le j \le c,\ j \ne 0} \log p\left(w_{t+j} \mid w_t\right) \qquad (2)$$
where $J$ is the objective function (the average log-likelihood maximized during training), $[-c, c]$ is the context window of the target word $w_t$, and $V$ is the vocabulary size. In this work, we used both models and examined their influence on the proposed model. The pre-trained word2vec embeddings have 300 dimensions and were trained on 100 billion words.
− GloVe is an unsupervised, "count-based" model [23]. Unlike word2vec, GloVe builds the embedding vectors from word co-occurrence counts. Formally, the vectors are learned with a weighted least-squares objective (Truşcǎ et al., 2020):
$$J = \sum_{i=1}^{V}\sum_{j=1}^{V} f\left(X_{ij}\right)\left(w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\right)^2$$
where $V$ is the vocabulary size, $X_{ij}$ is the number of times word $j$ occurs in the context of word $i$, $w_i$ and $\tilde{w}_j$ are the word and context vectors, $b_i$ and $\tilde{b}_j$ are bias terms, and $f(X_{ij})$ is a weighting function. The smallest package of the embeddings, "glove.6B.zip", is 822 MB. This GloVe model was trained on a corpus of six billion tokens with a vocabulary of 400 thousand words and is available with embedding vectors of 50, 100, 200, and 300 dimensions. In this paper, we used the 100-dimensional version.
− FastText is an extension of the word2vec approach in which each word is represented as a bag of character n-grams [37]. Once the words are represented using n-grams, a skip-gram or CBOW model is trained to learn the embeddings. Pre-trained FastText word vectors are now available for 157 languages. The main parameters to adjust before using FastText embeddings are the dimension and the range of subword sizes. By default, 100 dimensions are used, although values in the 100-300 range are allowed. In this paper, we set the dimensionality of the word embeddings to 300.
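The following sketch shows which training examples the static embedding models derive from a tokenized sentence: skip-gram (target, context) pairs and CBOW (context, target) examples for a window of c words, as in Equations (1) and (2), and FastText's character n-grams with the boundary symbols `<` and `>`. The function names are hypothetical, and the n-gram range of 3 to 6 is assumed here as FastText's usual default; the paper does not state the range it used.

```python
def skipgram_pairs(tokens, c):
    """Skip-gram examples: (target w_t, context w_{t+j}) for 0 < |j| <= c."""
    pairs = []
    for t, target in enumerate(tokens):
        for j in range(-c, c + 1):
            if j != 0 and 0 <= t + j < len(tokens):
                pairs.append((target, tokens[t + j]))
    return pairs


def cbow_examples(tokens, c):
    """CBOW examples: (context words around w_t, target w_t)."""
    examples = []
    for t, target in enumerate(tokens):
        context = [tokens[t + j] for j in range(-c, c + 1)
                   if j != 0 and 0 <= t + j < len(tokens)]
        examples.append((context, target))
    return examples


def char_ngrams(word, n_min=3, n_max=6):
    """FastText-style subwords: character n-grams of the word wrapped in '<' and '>'."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


print(skipgram_pairs(["a", "b", "c"], 1))
print(cbow_examples(["a", "b", "c"], 1))
print(char_ngrams("where", 3, 3))   # the 3-grams of "<where>"
```

FastText also keeps a vector for the whole word, and a word's embedding is combined from that vector and its n-gram vectors, which is what lets it produce vectors for unseen or misspelled words.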

The Proposed Method
In this work, we propose LSTM-PCNN, a hybrid deep learning model for detecting rumors on Twitter. The proposed model combines an LSTM architecture with three parallel CNN blocks. The structure of the LSTM-PCNN model is shown in Figure 3.

Input Layer
Several datasets are publicly available, and Table 2 lists them. In this work, we used the ArCOV-19 dataset. The original dataset contains 95,000 tweets, of which 3612 were annotated (the dataset is fully described in Section 4.1). Since the maximum length of a tweet written in Arabic is 280 characters, the input layer was set to cover this maximum length, as shown in the input layer in Figure 3. Before the tweets were fed into the next layer, a set of preprocessing techniques was applied; the complete process is discussed in Section 4.2.
Appl. Sci. 2021, 11, 7940
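Feeding variable-length tweets to a fixed-size input layer is usually done by mapping tokens to integer ids and padding or truncating to a fixed length. The sketch below assumes whitespace tokenization, a hypothetical word-to-id vocabulary with ids 0 and 1 reserved for padding and unknown words, and padding at the end; Keras's `pad_sequences` pads at the front by default, so the padding side is a design choice rather than the paper's documented setting.

```python
def encode_and_pad(tokens, vocab, max_len, pad_id=0, oov_id=1):
    """Map tokens to integer ids, then truncate or post-pad to exactly max_len ids.

    vocab is a hypothetical word -> id dict; ids 0 and 1 are reserved for
    padding and out-of-vocabulary words.
    """
    ids = [vocab.get(tok, oov_id) for tok in tokens][:max_len]  # keep the first max_len tokens
    return ids + [pad_id] * (max_len - len(ids))                # pad at the end


vocab = {"كورونا": 2, "لقاح": 3}
print(encode_and_pad("كورونا لقاح جديد".split(), vocab, 5))  # the unknown word maps to 1
```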

Embedding Layer
As shown in Figure 3, we employed three different pre-trained embedding layers, namely word2vec, GloVe, and FastText. Each word embedding was fed separately into the LSTM layer. Table 3 shows the tuned hyperparameters of each model. It is important to note that the GloVe word embedding is a pre-trained model, while the other models were trained on the training data.
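Plugging pre-trained vectors into an embedding layer usually amounts to building a matrix whose row i is the vector of the word with id i, then looking rows up by id. The sketch below is one common convention, not necessarily the paper's: it assumes a hypothetical vocabulary whose ids start at 2 (rows 0 and 1 stay zero for padding and unknown words) and gives words missing from the pre-trained vectors small random values.

```python
import random


def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Return a list of rows where row i is the vector for the word with id i.

    vocab maps word -> id (ids 0 and 1 assumed reserved for padding/OOV and left
    as zeros); pretrained maps word -> list of dim floats.
    """
    rng = random.Random(seed)
    size = max(vocab.values()) + 1
    matrix = [[0.0] * dim for _ in range(size)]
    for word, idx in vocab.items():
        vec = pretrained.get(word)
        if vec is not None and len(vec) == dim:
            matrix[idx] = list(vec)                                       # copy pre-trained vector
        else:
            matrix[idx] = [rng.uniform(-0.05, 0.05) for _ in range(dim)]  # not covered
    return matrix


def embed(ids, matrix):
    """Embedding lookup: replace each id with its row of the matrix."""
    return [matrix[i] for i in ids]


m = build_embedding_matrix({"a": 2, "b": 3}, {"a": [1.0, 2.0]}, dim=2)
print(embed([2, 0], m))   # [[1.0, 2.0], [0.0, 0.0]]
```

In Keras, such a matrix is typically given to an `Embedding` layer as its initial weights, which can then be frozen or fine-tuned.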

Long Short-Term Memory Layer
The output of the embedding layer was a sequence of vectors of predefined size, one for each word $w_t$ of the tweet. Before feeding this output to the LSTM, we applied a spatial dropout layer [38]. Spatial dropout has been shown to improve the performance of CNN architectures [39] and to reduce overfitting in LSTMs [40,41]. In this work, we therefore added one spatial dropout layer between the embedding layer and the LSTM layer. Table 4 presents the layered architecture of the LSTM model.
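Spatial dropout differs from ordinary dropout in that it drops entire embedding dimensions (channels) across all time steps of a tweet rather than individual values. The following is a minimal training-mode sketch in plain Python; the function name is hypothetical, and it uses the inverted-dropout scaling that Keras's `SpatialDropout1D` also applies during training.

```python
import random


def spatial_dropout_1d(seq, rate, rng):
    """Training-mode spatial dropout over a sequence of word vectors.

    One keep/drop decision is drawn per embedding channel and reused at every
    time step, so a dropped channel is zero for the whole tweet. Kept values are
    scaled by 1 / (1 - rate) (inverted dropout); rate must be in [0, 1).
    """
    dim = len(seq[0])
    keep = [rng.random() >= rate for _ in range(dim)]  # one decision per channel
    scale = 1.0 / (1.0 - rate)
    return [[x * scale if keep[d] else 0.0 for d, x in enumerate(vec)]
            for vec in seq]


tweet = [[1.0] * 6 for _ in range(4)]   # 4 words, 6-dimensional embeddings
for row in spatial_dropout_1d(tweet, 0.5, random.Random(1)):
    print(row)                          # the same channels are zero in every row
```

Because neighbouring word vectors in a tweet are strongly correlated, zeroing single entries lets the next layer recover them from adjacent positions; dropping whole channels removes that shortcut, which is the usual argument for spatial dropout before recurrent or convolutional layers.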

Convolutional Neural Network Layer
As shown in Figure 3, the LSTM layer was followed by three parallel CNN blocks. Each block generated a 150-dimensional vector $F_t^i$ representing word features, where $i = 1, 2, 3$ is the index of the CNN block. The configuration of each CNN block is given in Table 5.

Concatenation Layer
As described earlier, each CNN block generated a 150-dimensional vector. We concatenated the feature vectors $F_t^i$ obtained by the three blocks, which yielded a 450-dimensional vector $F$:
$$F = F_t^1 \oplus F_t^2 \oplus F_t^3$$
where $\oplus$ denotes vector concatenation.

Output Layer
Finally, the vector $F$ was passed to the output layer. Since rumor detection is a binary classification task, the output layer applies the sigmoid function, whose output lies between 0 and 1 and is thresholded to obtain the class label:
$$p = \sigma(WF + b) = \frac{1}{1 + e^{-(WF + b)}}$$
$$y = \begin{cases} 1, & p \ge 0.5 \\ 0, & p < 0.5 \end{cases}$$
where $p$ is the probability that the tweet is a rumor, $W$ and $b$ are the weights and bias of the output layer, and $y$ is the classification result: $y = 0$ indicates a non-rumor tweet and $y = 1$ indicates a rumor tweet.
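The concatenation and output steps reduce to a few lines. The sketch below joins three 150-dimensional block outputs into a 450-dimensional vector and applies a single sigmoid output unit; the weights, bias, and 0.5 decision threshold are assumptions for illustration, not trained values from the paper.

```python
import math


def concatenate(*blocks):
    """F = F^1 ⊕ F^2 ⊕ F^3: join the block feature vectors end to end."""
    return [x for block in blocks for x in block]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(features, weights, bias, threshold=0.5):
    """Single sigmoid output unit followed by a decision threshold (assumed 0.5)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    p = sigmoid(z)                      # probability that the tweet is a rumor
    y = 1 if p >= threshold else 0      # 1 = rumor, 0 = non-rumor
    return p, y


# Three hypothetical 150-dimensional block outputs give a 450-dimensional F.
F = concatenate([0.1] * 150, [0.2] * 150, [0.3] * 150)
print(len(F), predict(F, [0.0] * 450, 0.0))   # 450 (0.5, 1)
```

In training, the same output unit is normally paired with the binary cross-entropy loss, which is the standard choice for a single sigmoid output.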

Experimental Design
In this paper, experiments were conducted to evaluate the performance of the proposed LSTM-PCNN model. For comparison, we implemented two baseline DL-based models: (i) LSTM and (ii) parallel CNN. The experiments were performed with the Keras 2.2.4 API and the TensorFlow backend, using Python 3.6 on the Windows 10 operating system. This section also presents the dataset, preprocessing methods, and evaluation metrics.

Data Sets
The ArCOV-19 dataset is a collection of Arabic tweets about the COVID-19 pandemic. It is considered the most commonly used public dataset of its kind and covers the period from 27 January to 30 April 2020. The Twitter API was used to collect the Arabic tweets based on manually entered queries targeting COVID-19 topics, including keywords such as "Corona," hashtags such as "#coronavirus," and phrases such as "COVID-19 pandemic." The search queries were customized to exclude retweets, avoid duplicate tweets, and return only Arabic tweets in chronological order. The ArCOV-19 dataset comprised 94K tweets, and the original annotated subset contained 3612 tweets (last accessed on 10 March 2021; the number of tweets may have increased since then). To comply with Twitter's content redistribution policy, only the tweet IDs were published. Therefore, the full tweet objects were retrieved in JSON format for the given tweet IDs using the Hydrator tool. Because some tweets were inaccessible (deleted tweets or deactivated accounts), the number of tweets we retrieved was reduced to 3157, including 1480 rumors (46.88%) and 1677 non-rumors (53.12%). The dataset includes several types of COVID-19-related rumors, such as social, political, health, and religious rumors. The main motivations for distributing these rumors were to provide health and social awareness and information about COVID-19. Some rumors were political and used to spread misinformation against specific countries, while others circulated claims about the treatment of COVID-19 in the form of religious advice. A review of the rumors in this dataset showed that most of them fell into the social and political types. Table 6 shows examples of these rumors.
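The class balance and the effect of re-collecting tweets can be checked with simple arithmetic on the counts reported above. This sketch (function name hypothetical) computes the class proportions, the accuracy of a trivial classifier that always predicts the larger class, and the share of annotated tweets that could no longer be retrieved.

```python
def class_stats(n_rumor, n_non_rumor, n_annotated):
    """Class balance, majority-class baseline, and hydration loss (percentages, 2 d.p.)."""
    total = n_rumor + n_non_rumor
    missing = n_annotated - total
    return {
        "total": total,
        "rumor_pct": round(100 * n_rumor / total, 2),
        "non_rumor_pct": round(100 * n_non_rumor / total, 2),
        # Accuracy of a classifier that always predicts the larger class.
        "majority_baseline_pct": round(100 * max(n_rumor, n_non_rumor) / total, 2),
        "unavailable": missing,
        "unavailable_pct": round(100 * missing / n_annotated, 2),
    }


print(class_stats(1480, 1677, 3612))
```

For the reported counts this gives 46.88% rumors and 53.12% non-rumors, with 455 tweets (about 12.6% of the 3612 annotated ones) no longer retrievable. A classifier that always predicts "non-rumor" would therefore already reach 53.12% accuracy, which is a useful reference point when reading accuracy figures on this dataset.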

Table 6. Examples of COVID-19 rumors in the ArCOV-19 dataset (English translations of the Arabic tweets).
Translated Text | Type of Rumor
An Indonesian village uses ghosts to force people to stay indoors during the Corona virus | Social
A fatal epidemic every 100 years and a dangerous secret in number 20 | Social
The cause of the spread of the Corona virus is due to the spread of Chinese 5G networks | Political
In Ecuador, due to the collapse of the health system and the lack of resources, as well as the corruption and weakness of officials, the corpses of people infected with the Corona virus are thrown in the streets and in hospitals, they are placed in rubbish bags and thrown to the point that crows and birds are eating them | Political
Gargling with salt and cleaning the nose is an effective way to protect you from the Corona virus | Health

Data Preprocessing
Several preprocessing steps were performed to prepare the tweet texts before feeding them into the embedding layer and the proposed deep learning classification models. First, we replaced URLs with "رابط" (meaning "hyperlink"), removed the mention character (@) and the hashtag character (#), handled words with repeated characters, removed numbers, and handled emoticons by replacing positive emoticons with the word "إيجابي" (meaning "positive") and negative emoticons with the word "سلبي" (meaning "negative"). In addition, we removed punctuation and extra white space. We also normalized non-Arabic letters by converting them into Arabic using a manually crafted translator. After that, we used the PyArabic library to normalize "hamza" and ligatures and to strip "tatweel" and "tashkeel". In addition, we performed stemming using the Snowball stemmer and removed stop words from the text. Figure 4 shows an illustrative example of a tweet after applying some of the preprocessing techniques.

Trump suggests using antiseptics to inject Corona patients! Health
3,157 tweets, including 1,480 rumors (46.87%) and 1677 non-rumors (53.12%). The dataset included several types of rumors related to COVID-19 such as social, political, health, and religious rumors. The main motivations for distributing these rumors were to provide health and social awareness and information about COVID-19. Some of these rumors were political and used to distribute misinformation against specific countries, while others tried to circulate rumors about the treatment of COVID-19 by taking the form of religious advice. By reviewing the rumors in this dataset, the majority of these rumors fell into the sociological and political types. Table 6 shows examples of these rumors.

Data Preprocessing
Several preprocessing steps were performed to prepare the tweets' texts before feeding them into the embedding layer and the proposed deep learning classification models. First, we handled URLs by replacing them with ‫"ﺭﺍﺑﻂ"‬ meaning "hyperlink, mention character (@) removal, hashtag character (#) removal, handling words with repeating characters, numbers removal, and emoticon handling by replacing positive emoticons with ‫'ﺇﻳﺠﺎﺑﻲ'‪a‬‬ meaning 'positive' word and negative emoticons with a ‫'ﺳﻠﺒﻲ'‬ meaning 'negative' word. In addition, we removed punctuation and additional white spaces. We also normalized non-Arabic letters by converting them into Arabic using manually crafted translator. After that, we utilized the PyArabic library to normalize both "hamza" and ligature, and to strip both "tatweel" and "tashkeel". In addition to the above steps, we performed the stemming process using the snowball stemmer and removed stop words from the text. Figure 4 shows an illustrative example of a tweet after applying some of the preprocessing techniques.
Ablution is what protects a person from infectious diseases Religious
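The class proportions above can be recomputed directly from the two counts (1,480 / 3,157 rounds to 46.88%). The same arithmetic gives the accuracy of a trivial majority-class baseline that always predicts "non-rumor", which is a useful floor when reading the accuracies reported later. This is a plain arithmetic sketch; the variable names are illustrative.

```python
# Class counts stated for the dataset.
RUMORS = 1480
NON_RUMORS = 1677
TOTAL = RUMORS + NON_RUMORS  # 3,157 tweets

# Percentage of each class in the whole dataset.
rumor_share = 100 * RUMORS / TOTAL
non_rumor_share = 100 * NON_RUMORS / TOTAL

# A classifier that always predicts the larger class (non-rumor)
# is right exactly as often as that class occurs.
majority_baseline_accuracy = max(RUMORS, NON_RUMORS) / TOTAL

print(f"total={TOTAL}, rumors={rumor_share:.2f}%, non-rumors={non_rumor_share:.2f}%")
print(f"majority-class baseline accuracy = {majority_baseline_accuracy:.2%}")
```

Because the two classes are close to balanced, accuracy remains a meaningful measure here, but any model should clearly exceed the roughly 53% majority baseline.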

Data Preprocessing
Several preprocessing steps were performed to prepare the tweet texts before feeding them into the embedding layer and the proposed deep learning classification models. First, we replaced URLs with the word "رابط" ("hyperlink"). We then removed the mention character (@), the hashtag character (#), and numbers; handled words with repeated characters; and handled emoticons by replacing positive emoticons with the word "إيجابي" ("positive") and negative emoticons with the word "سلبي" ("negative"). In addition, we removed punctuation and extra white space. We also normalized non-Arabic letters by converting them into Arabic letters using a manually crafted translator. After that, we used the PyArabic library to normalize hamza forms and ligatures and to strip tatweel (the elongation character) and tashkeel (diacritics). In addition to the above steps, we applied the Snowball stemmer and removed stop words from the text. Figure 4 shows an illustrative example of a tweet after applying some of these preprocessing techniques.
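A minimal, standard-library sketch of such a pipeline follows. It is not the authors' code: the emoticon lists, the stop-word list, the rule that collapses three or more repeated characters to one, and the step order are assumptions. Unicode NFKC normalization (which folds Arabic presentation forms and lam-alef ligatures back to base letters) plus simple character maps stand in for PyArabic's ligature/hamza normalization and tashkeel/tatweel stripping. The manual non-Arabic-to-Arabic translator and Snowball stemming are omitted; NLTK provides an Arabic Snowball stemmer (`SnowballStemmer("arabic")`) that could be applied per token.

```python
import re
import unicodedata

# Replacement words used in the paper.
URL_TOKEN = "رابط"    # "hyperlink"
POS_TOKEN = "إيجابي"  # "positive"
NEG_TOKEN = "سلبي"    # "negative"

# Hypothetical emoticon inventories (longer forms listed first).
POSITIVE_EMOTICONS = (":-)", ":)", ":D", "\U0001F600", "\U0001F60A")  # ... grinning, smiling faces
NEGATIVE_EMOTICONS = (":-(", ":(", "\U0001F622", "\U0001F61E")        # ... crying, disappointed faces

# Tiny hypothetical stop-word list, written in hamza-normalized form.
STOP_WORDS = {"في", "من", "على", "عن", "الى"}

TASHKEEL_RE = re.compile("[\u064B-\u0652]")  # Arabic diacritics, fathatan ... sukun
TATWEEL = "\u0640"                           # elongation character
HAMZA_ALEF = str.maketrans({"أ": "ا", "إ": "ا", "آ": "ا"})
URL_RE = re.compile(r"https?://\S+|www\.\S+")
DIGITS_RE = re.compile(r"[0-9\u0660-\u0669]+")  # Western and Arabic-Indic digits
REPEAT_RE = re.compile(r"(.)\1{2,}")            # any character repeated 3+ times
PUNCT_RE = re.compile(r"[^\w\s]|_")             # punctuation, symbols, underscore


def normalize_arabic(text: str) -> str:
    """Rough stand-in for the PyArabic normalization steps."""
    text = unicodedata.normalize("NFKC", text)  # presentation forms / ligatures -> base letters
    text = TASHKEEL_RE.sub("", text)            # strip diacritics (tashkeel)
    text = text.replace(TATWEEL, "")            # strip tatweel
    return text.translate(HAMZA_ALEF)           # unify hamza-carrying alef variants


def preprocess(tweet: str) -> str:
    # Normalize first so the inserted replacement words keep their spelling.
    text = normalize_arabic(tweet)
    text = URL_RE.sub(f" {URL_TOKEN} ", text)
    for emo in POSITIVE_EMOTICONS:
        text = text.replace(emo, f" {POS_TOKEN} ")
    for emo in NEGATIVE_EMOTICONS:
        text = text.replace(emo, f" {NEG_TOKEN} ")
    text = text.replace("@", " ").replace("#", " ")  # keep the user name / hashtag word
    text = DIGITS_RE.sub(" ", text)
    text = REPEAT_RE.sub(r"\1", text)                # e.g. "!!!" -> "!"
    text = PUNCT_RE.sub(" ", text)
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)                          # also removes extra white space
```

For example, a tweet containing a mention, a repeated "!", a smiley, a link, and a number reduces to the mention's user name, the Arabic words, and the two replacement words "إيجابي" and "رابط".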

Evaluation Metrics
To evaluate the performance of the proposed model, we used the following measures: classification accuracy, precision, recall, and F1 score. In addition, we present the confusion matrix for each fold (see Table 7). These measures are commonly used to evaluate rumor detection systems. To assess the proposed method reliably, all experiments were validated using fivefold cross-validation.
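All four measures follow directly from the confusion-matrix counts. The sketch below treats the rumor class (label 1) as the positive class; that choice, and the function names, are assumptions, since this section does not state how precision, recall, and F1 were averaged over the two classes.

```python
from typing import Dict, List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int], positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating `positive` (assumed: 1 = rumor) as the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for one fold's confusion matrix."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall (equivalently 2TP / (2TP + FP + FN)).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / total, "precision": precision, "recall": recall, "f1": f1}
```

For instance, a fold with TP = 40, FP = 10, FN = 20, TN = 30 gives accuracy 0.70, precision 0.80, recall about 0.667, and F1 = 80/110, about 0.727.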

Experimental Results
The results reported in this section are averages over the five independent runs of each experiment (the five cross-validation folds described above).
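One way to realize the fivefold protocol is sketched below: the indices of each class are shuffled and dealt round-robin into five folds, so every fold keeps roughly the dataset's rumor/non-rumor balance, and a score is averaged over the five train/test rounds. Stratification, the fixed seed, and the `train_and_score` callback are assumptions for illustration; the text only states that fivefold cross-validation was used.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence


def stratified_folds(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split example indices into k folds that preserve the class proportions."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal this class's examples round-robin
    return folds


def cross_validate(labels: Sequence[int],
                   train_and_score: Callable[[List[int], List[int]], float],
                   k: int = 5, seed: int = 0) -> float:
    """Average the score returned by a hypothetical train_and_score(train_idx, test_idx)."""
    folds = stratified_folds(labels, k, seed)
    scores = []
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        scores.append(train_and_score(train_idx, test_idx))
    return mean(scores)
```

With 1,480 rumors and 1,677 non-rumors, each fold receives exactly 296 rumors and 335 or 336 non-rumors.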

Evaluation of the Embeddings
To choose the appropriate embedding extractor for the LSTM-PCNN model, we applied three static word embedding models: word2vec, GloVe, and FastText. The structures of the other parts of the DL models remained unchanged; the effect of adding more dense layers is examined in the next section. The performance of the baseline models with the different embeddings is shown in Table 8. For a fair comparison, all word embedding models were trained (without fine-tuning) on the ArCOV-19 data. As shown in Table 8, FastText outperformed both word2vec and GloVe. The PCNN model benefited more from the FastText and GloVe embeddings than from word2vec, whereas LSTM improved when the word2vec skip-gram model was used. At this stage, therefore, it was difficult to decide which embedding model to use, so we investigated the impact of each embedding on the proposed LSTM-PCNN model.
As shown in Figure 5, contrary to expectations, the proposed LSTM-PCNN achieved its highest performance with the word2vec skip-gram model. Moreover, LSTM-PCNN achieved the best result among all the compared models. Figures 6 and 7 present the structures of the LSTM and PCNN models, respectively.
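As background for the skip-gram result, the two word2vec variants differ in the training examples they draw from a sentence: skip-gram predicts each context word from the center word, whereas CBOW predicts the center word from its surrounding context. The sketch below only generates those examples; it is not a trainer. Real word2vec training adds negative sampling or hierarchical softmax, frequent-word subsampling, and a randomly shrunk window. In gensim 4, for example, skip-gram is selected with `Word2Vec(..., sg=1)`. The window size here is an arbitrary assumption.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """(center, context) pairs: skip-gram predicts each context word from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """(context, center) examples: CBOW predicts the center word from its context words."""
    examples = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        examples.append((context, center))
    return examples
```

Because each center word yields one training pair per context word, skip-gram gives rare words more updates than CBOW. The original word2vec documentation cites this as the reason skip-gram tends to work well on small training sets; the experiments here do not test that explanation.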

Evaluation of Adding More Dense Layers
As in the previous section, we investigated the influence of adding dense layers on the performance of the implemented DL models. To evaluate the contribution of these layers, we added them to the model one at a time. Tables 9 and 10 show the median values obtained by adding more layers to the proposed LSTM-PCNN. The remaining results are given in Appendix A (see Tables A1 and A2).
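Tables 9 and 10 summarize each configuration by its median score, and the conclusions mention the Mann-Whitney-Wilcoxon and Wilcoxon signed-rank tests. The standard-library sketch below computes per-configuration medians, the signed-rank statistics W+ and W- for paired per-fold scores, and the Mann-Whitney U statistic. It does not compute p-values (SciPy's `scipy.stats.wilcoxon` and `scipy.stats.mannwhitneyu` do). The function names and the idea of feeding per-fold scores are assumptions for illustration.

```python
from statistics import median
from typing import Dict, List, Tuple


def medians(scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Median score per configuration (e.g. per number of added dense layers)."""
    return {name: median(vals) for name, vals in scores.items()}


def signed_rank_statistics(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Wilcoxon signed-rank W+ and W- for paired samples x and y.

    Zero differences are dropped; tied absolute differences receive their average rank.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    start = 0
    while start < len(order):
        end = start
        # Extend the block of tied absolute differences.
        while end + 1 < len(order) and abs(d[order[end + 1]]) == abs(d[order[start]]):
            end += 1
        avg_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    w_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    return w_plus, w_minus


def mann_whitney_u(x: List[float], y: List[float]) -> float:
    """U statistic for x: pairs with x > y count 1, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
```

With only five paired folds there are 2^5 = 32 equally likely sign patterns under the null hypothesis, so the smallest attainable exact two-sided signed-rank p-value is 2/32 = 0.0625.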

Conclusions
The paper proposed a novel hybrid deep learning model for detecting COVID-19-related rumors on social media based on long short-term memory and concatenated parallel convolutional neural networks (LSTM-PCNN). The experiments used three static word embedding models: word2vec, GloVe, and FastText. The results showed that the proposed LSTM-PCNN model achieved its highest performance with the word2vec skip-gram model and outperformed the other baseline models, reaching a detection accuracy of 86.37%. The experiments also investigated adding more dense layers to the architecture of the proposed model and found that, in most cases, the additional layers degraded the overall performance. Statistical analysis using the Mann-Whitney-Wilcoxon test and the Wilcoxon signed-rank test showed that adding more dense layers did not improve the performance of the proposed model. Because such rumors have a negative impact on the social and political life of many countries, the proposed model can help health and other governmental authorities to automatically detect fake information about COVID-19 on social media and mitigate this impact. In future work, other datasets of Arabic tweets could be used, and other deep learning-based methods could be proposed and investigated to enhance the detection of health-related rumors in Arabic and other languages.

Conflicts of Interest:
The authors declare no conflict of interest.