An Improved Multiple Features and Machine Learning-Based Approach for Detecting Clickbait News on Social Networks

The widespread usage of social media has led to the increasing popularity of online advertisements, which has been accompanied by a disturbing spread of clickbait headlines. Clickbait dissatisfies users because the article content does not match their expectations. Detecting clickbait posts in online social networks is an important task in fighting this issue. Clickbait posts use phrases that are mainly designed to attract a user's attention and induce a click on a specific fake link/website. In other words, clickbait headlines use misleading titles that may hide important information about the target website. It is very difficult to recognize these clickbait headlines manually. Therefore, there is a need for an intelligent method to detect clickbait and fake advertisements on social networks. Several machine learning methods have been applied for this detection purpose. However, the obtained performance (accuracy) only reached 87% and still needs to be improved. In addition, most of the existing studies were conducted on English headlines and contents; few studies focused specifically on detecting clickbait headlines in Arabic. Therefore, this study constructed the first Arabic clickbait headline news dataset and presents an improved multiple-feature-based approach for detecting clickbait news on social networks in the Arabic language. The proposed approach includes three main phases: data collection, data preparation, and machine learning model training and testing. The collected dataset included 54,893 Arabic news items from Twitter (after pre-processing). Among these news items, 23,981 were clickbait news (43.69%) and 30,912 were legitimate news (56.31%). This dataset was pre-processed, and the most important features were then selected using the ANOVA F-test. Several machine learning (ML) methods were then applied, with hyper-parameter tuning to find the optimal settings.
Finally, the ML models were evaluated, and the overall performance is reported in this paper. The experimental results show that the Support Vector Machine (SVM) with the top 10% of ANOVA F-test features (user-based features (UFs) and content-based features (CFs)) obtained the best performance and achieved a detection accuracy of 92.16%.


Introduction
Currently, social networks have become the main environment for communicating, sharing, and posting news on the Internet. Twitter, Facebook, and Instagram are the main social networks used to share opinions and news. With this development, a huge amount of textual data is posted on these media, which is becoming increasingly difficult to process manually. Although social networks provide an easy way to express opinions, these platforms can also be used to share misinformation in the form of news and advertisements. This is a very serious issue, because misinformation has the power to influence individuals and sway their opinions. Therefore, it is very important to protect the users of social networks from the spread of this misinformation and to develop a reliable mechanism to detect it. This misinformation can take the form of clickbait, which aims at enticing users into clicking a link to news items or advertisements whose titles (headlines) do not completely reflect the content inside. According to Chen et al. [1], clickbait is defined as "Content whose main purpose is to attract attention and encourage visitors to click on a link to a particular web page".
The automatic detection of clickbait headlines from the huge volume of news on social networks has become a difficult research issue in the field of data science. Some previous efforts have utilized machine learning to detect clickbait headlines automatically. For instance, Biyani et al. [2] applied Gradient Boosted Decision Trees (GBDT) on a dataset drawn from news sites such as Huffington Post, New York Times, CBS, Associated Press and Forbes. The dataset contains 1349 clickbait and 2724 non-clickbait webpages. The best results achieved were an F1-score of 61.9% with five-fold cross-validation for the clickbait class and an F1-score of 84.6% for the non-clickbait category. Potthast et al. [3] applied linear regression, Naïve Bayes, and random forest methods on a dataset gathered from Twitter. The dataset contained 2992 data points. The results recorded were relatively close, with an approximate precision of 75%.
Khater et al. [5] proposed the use of logistic regression and linear SVM. They extracted 28 features from a dataset provided by Bauhaus-Universität Weimar for a clickbait detection challenge. The most commonly extracted features were Bag of Words (BOW), noun extraction, similarity, readability, and formality. The best results achieved were 79% and 78% precision for logistic regression and linear SVM, respectively. Since the methods of the first category require extracting and labeling each feature before feeding the data into the machine learning tool, researchers have found deep learning techniques useful for bypassing the feature engineering phase. For instance, López-Sánchez et al. [6] combined metric learning with a CNN deep learning algorithm by integrating them with a case-based reasoning methodology. For feature selection, they used TF-IDF, n-grams, and 300-dimensional Word2vec on the dataset provided by [4]. The proposed approach achieved average areas under the ROC curve of 99.4%, 95%, and 90% using Word2vec, TF-IDF, and n-gram counts, respectively. Agrawal [7] also used a CNN model to classify a manually constructed news corpus obtained from the Reddit, Facebook, and Twitter social networks into clickbait and non-clickbait. As feature selection methods, they used Click-Word2vec and Click-scratch. The highest results achieved were 89% accuracy with an 87% ROC-AUC score for the Click-scratch features and 90% when Click-Word2vec was used.
Kaur et al. [8] also proposed a hybrid model in which a CNN is combined with an LSTM. They found that the CNN-LSTM model, when implemented with pre-trained GloVe embeddings, yields the best results based on the accuracy, recall, precision, and F1-score performance metrics.
Although several machine learning approaches have been proposed to detect clickbait headlines, most of these recent methods are not very robust. Previous studies used categorization techniques such as Gradient Boosted Decision Trees, linear regression, Naïve Bayes, random forest, SVM, decision tree, logistic regression, and convolutional neural networks. Most of these studies used datasets with headlines written in English. In contrast, this paper uses an Arabic-language dataset and proposes a comprehensive approach that includes three main phases: data collection, data preparation, and machine learning model training and testing. The dataset was pre-processed, and the most important features were then selected using the ANOVA F-test. Several machine learning methods were then applied, including random forest (RF), stochastic gradient descent (SGD), Support Vector Machine (SVM), logistic regression (LR), multinomial Naïve Bayes (NB), and k-nearest neighbor (k-NN). Hyper-parameter tuning was applied to find the optimal settings. Finally, the ML models were evaluated, and the overall performance is reported here. The key contributions of this paper are as follows:

• We constructed the first Arabic clickbait headline news dataset. The raw dataset is publicly available for research purposes.
• We extracted a set of user-based features and content-based features for the constructed Arabic clickbait dataset.
• We proposed an effective approach for enhancing the detection process using a feature selection technique, namely a one-way ANOVA F-test.
• We conducted extensive experiments, and the results show that the proposed model enhances the performance of some classifiers in terms of accuracy, precision, and recall.

Characteristics of Clickbait News
Biyani et al. [2] define eight types of clickbait: exaggeration, teasing, inflammatory, formatting, graphic, bait-and-switch, ambiguous, and wrong. In exaggeration, the title overstates the content on the target page. Teasing means hiding details from the title to build suspense. In the inflammatory type, inappropriate or vulgar words are used. Formatting means overusing capitalization or punctuation in the headlines, for instance using ALL CAPS or exclamation points. In the graphic type, the subject matter is disturbing or unbelievable. Bait-and-switch means the news promised in the title is not found on the target page. Ambiguous means the title is unclear or confusing, while wrong means the article is plainly incorrect. Kaur et al. [8] identify eight further types of clickbait headlines: reaction, reasoning, revealing, number, hypothesis/guess, questionable, forward referencing, and shocking/unbelievable. They also found shocking/unbelievable, hypothesis/guess, and reaction to be the most frequently occurring types of clickbait headlines published online.
According to Zheng et al. [9], different ways of attracting users' attention are used by the headlines of different article types, which means the characteristics of clickbait vary between article types. This is different from traditional text-analysis issues. For instance, the headlines of forums or blogs are more colloquial than the headlines of other traditional news. The main difference between these two types of headlines is the use of functional linguistic characteristics such as wondering, exaggerating, and questioning. In [9], two types of characteristics were used: general clickbait, and the type-related characteristics, while the main characteristics used by Naeem et al. [10] for detection of clickbait were sensationalism, mystery, notions of curiosity, and shock.
In another approach, Potthast et al. [3] used three types of features for clickbait headlines, which are: the teaser message, the linked web page, and meta information. The first type includes basic text statistics and dictionary features, while the second type analyses the web pages linked from a tweet, and the third type includes meta information about the tweet's sender, medium, and time.
Bazaco, Redondo, and Sánchez-García [11] describe the characteristics of clickbait using six variables in two categories: presentation variables and content variables. The first category includes incomplete information, appealing expressions, repetition and serialisation, and exaggeration, while the second includes the use of soft news, sensationalist content, and striking audiovisual elements. According to [1], the characteristics of the curiosity exploited by clickbait are its intensity, tendency to disappoint, transience, and association with impulsivity. These create a knowledge gap that clickbait headlines exploit to encourage readers to click through to read the whole article.

Machine Learning and Deep Learning Methods for Clickbait Detection
Several machine learning and deep learning methods have been applied to detect clickbait headlines on different social networks, including Twitter, Facebook, Instagram, Reddit, and others. Table 1 summarizes recent studies on clickbait detection methods. The results in the table show that the performance of machine learning methods still needs to be improved: in the best case, the highest accuracy obtained reached 0.87 [12]. In contrast, the use of deep learning showed a good improvement in performance, where the accuracy obtained by [13] reached 0.97. Most of the existing studies used headlines written in English or other languages; only a few focused on clickbait headlines in Arabic. Although the Arabic and English scripts have some similarities, a number of characteristics make Arabic script unique. These include the direction of Arabic, which is written from right to left, and the fact that neither upper nor lower case exists in Arabic, which is written cursively. In Arabic, all letters are connected from both sides, except six letters that can be connected from the right side only. Each of the 28 letters of Arabic script has different shapes depending on its position in the word, and some letters are very similar, differing only in the number and/or position of dots [14,15]. In addition, there are other special features unique to Arabic script, such as elongation, morphological characteristics, word meters, and morphemes [16].

Table 1. Summary of recent studies on clickbait detection methods.

Ref. — Dataset — Method(s) and accuracy — Issues/future directions:

[2] Dataset: 1349 clickbait and 2724 non-clickbait webpages from different news websites whose pages surfaced on the Yahoo homepage. Issues/future directions: (1) include non-textual features (e.g., images and videos) and users' comments on articles; (2) find the most effective types of clickbait that attract clicks and propose methods to block them; (3) apply deep learning to obtain more clickbait indicators. The obtained performance needs to be improved.

[3] Dataset: 2992 tweets from Twitter, 767 of which are clickbait. Methods: logistic regression, naive Bayes, and random forest (accuracy 0.79). Issues/future directions: the first evaluation corpus was proposed with baseline detection methods; however, detecting clickbait across different social media and improving detection performance need more investigation. The obtained performance needs to be improved.

[17] Dataset: Clickbait Challenge 2017 dataset with over 21,000 headlines. Issues/future directions: the maximum headline length is limited, so long headlines might cause information loss; solving the information-loss problem and including user-behavior analysis need more investigation. The obtained performance needs to be improved.

(no reference recoverable) Dataset: 14,922 headlines, half of which are clickbait, taken from four famous Chinese news websites. Issues/future directions: the maximum headline length is limited, so long headlines might cause information loss.

[12] Datasets: CLDI dataset from Instagram (7769 instances) and WCC dataset from Twitter (19,538 instances); accuracy 0.87. Issues/future directions: the obtained performance needs to be improved.

[10] Dataset: headlines from Reddit; 16,000 legitimate news and 16,000 clickbait samples. Method: LSTM using word2vec word embeddings (accuracy 0.94). The good accuracy is due to the loop-back approach employed by the LSTM, which allows a better understanding of the context and thus better classification of headlines.

[6] Issues/future directions: other features, such as image information, were not considered in this work; the obtained performance needs to be improved.
To address the lack of study of clickbait detection in Arabic texts, this paper focuses on improving the performance of machine learning methods for detecting clickbait headlines on social networks in the Arabic language.

Problem Formulation for Clickbait Detection
The clickbait detection problem is a subset of natural language processing that can be represented as a binary classification task as follows. Given a set of posts (tweets) shared via social networking platforms, T = {t_1, t_2, ..., t_n}, let t ∈ T denote a post that is classified into a class C = {C+, C−}, where C+ is the class of the tweets t_i ∈ T that are considered legitimate news and C− is the class of the clickbait news t_j ∉ C+.

To solve the problem, let D be the dataset of all posts, D = {V1, V2, C}, where V1 = {v1_1, v1_2, v1_3, ..., v1_n} is a vector of features extracted from the user profile (user-based features (UFs)) and V2 = {v2_1, v2_2, v2_3, ..., v2_n} is a vector of features extracted from the post/tweet content (content-based features (CFs)). Let v1_i and v2_i be the values of a specific feature i, where v1_i ∈ V1 and v2_i ∈ V2.

Let D′ be a training set and D″ be a testing set, where D′, D″ ∈ D. Let ξ be a function that generates the instances I from D′ and D″ based on the feature space V: ξ: T × V → I. As the vector space can be high-dimensional, the clickbait detection problem is formulated as follows. Let χ be a function that maps a post t_i ∈ T to C = {C+, C−}: χ: I → O, where O = ⟨C, r⟩ and r is a binary relation that takes the value 1 if a post t_i ∈ T is a legitimate post (t_i ∈ C+) and 0 otherwise.

The function χ can now be set as an optimization problem as follows: optimize f = χ(V1, V2) subject to c(V1, V2), where c is a set of constraints on the search space.

Materials and Methods
The proposed multiple-feature-based approach for detecting clickbait news is presented in this section. Since the difference between clickbait and normal news can be distinguished directly by analyzing the linguistic character of the news content [20], the proposed approach takes into consideration both the headlines and the content of the news (content-based features, CFs). In addition, to overcome the limitations of such an approach, these are combined with user-based features (UFs). Figure 1 presents the methodology followed in this study, which consists of the following phases: data collection, data preparation, and machine learning model training and testing. For detecting clickbait news on social networks, both the investigated news and the profile of the user who shared the post are collected. We first constructed a baseline dataset from the raw dataset by labelling the news as clickbait or legitimate. Since the amount of collected data was huge, and in order to build a sufficiently large labelled dataset, we used a pseudo-labelling learning (PLL) technique [21]. In the next phase, both the news headlines and contents are pre-processed, including text cleansing, normalization, stemming, stop-word removal, and tokenization. These steps are necessary to enhance the overall performance of the ML-based model. We concatenated the processed text with the user-based features and then applied feature reduction using a one-way ANOVA test. The selected features were fed to the ML model. A set of ML models was tested, and their hyper-parameters were tuned to find the optimal settings. Finally, the ML models were evaluated, and the overall performance is reported.

Data Collection
We collected 72,321 Arabic news items from Twitter. The dataset can be obtained from github.com (https://github.com/Moh-Sarem/Clickbait-Headlines#clickbait-headlines) (accessed on 1 October 2021). For this purpose, we implemented a special crawler that accesses breaking news on social networks when fed the names of public breaking-news agencies. Twitter APIs return tweets in JSON format; however, because many fields are not helpful for the proposed model, the crawler filters them out and saves the collected information from the user profile and the shared content in comma-separated values (CSV) format. The details of the collection process through multiple-feature analysis are shown in Algorithm 1. In addition, the full description of the features used is presented in Tables 2 and 3.

Algorithm 1 Pseudocode of dataset collection process for extracting UFs and CFs
Input: A list of public Twitter breaking news agencies' profiles N
Output: Unlabelled dataset with UFs and CFs
For each profile p ∈ N do:
    Access the public page of p
    Retrieve all shared tweets t_p
    Pull out the tweet's features (UFs) using the Twitter APIs
    If t_p contains an external URL then:
        Visit the external webpage p_e
        For all HTML tags in p_e do:
            Find the HTML tag that contains the news full text (CFs)
            Compute the similarity score between t_p and p_e
        End
    End if
    Store the extracted features in CSV format
End
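The field-filtering and storage step of Algorithm 1 can be sketched in Python; the field names below (user_id, followers_count, etc.) are illustrative placeholders, not the exact UF/CF names of Tables 2 and 3:

```python
import csv

# Illustrative subset of user-based (UF) and content-based (CF) fields;
# the real crawler extracts the full UF/CF sets described in Tables 2 and 3.
KEPT_FIELDS = ["user_id", "followers_count", "lang", "hashtags",
               "tweet_text", "external_url"]

def filter_tweet(raw: dict) -> dict:
    """Keep only the fields used by the model, dropping the rest of the JSON."""
    return {field: raw.get(field, "") for field in KEPT_FIELDS}

def save_as_csv(raw_tweets, path):
    """Store the filtered records in CSV format, as Algorithm 1 does."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=KEPT_FIELDS)
        writer.writeheader()
        for raw in raw_tweets:
            writer.writerow(filter_tweet(raw))
```

Fields absent from a given tweet's JSON are stored as empty strings, so every CSV row has the same columns.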

Data Annotation
Once we obtained the final dataset by using the implemented crawler, we prepared a baseline dataset from the retrieved dataset. Every shared tweet was labelled as clickbait or legitimate by asking three media professionals to volunteer to judge 12,321 tweets and their associated news. They were asked to access the external webpage by following the URL link provided with the tweet and to compare the tweet's body and headline with the full text of the destination webpage. To facilitate this job, we provided them with examples showing what clickbait news looks like. Table 4 shows a guideline for how to classify the content of the shared tweets. As shown in Table 4, there are seven categories that the volunteers could use to label each post as clickbait news. In case of unclearness or doubt about which class the post belongs to, the post is labelled as "incomplete". Every content text in the baseline dataset has three labels, one provided by each annotator. To assign the final class label, we applied the majority voting algorithm and labelled the content as clickbait or legitimate news. Table 5 shows the details of the baseline dataset, which includes 4325 items of clickbait news and 6743 legitimate items.
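The majority-voting step over the three annotator labels can be sketched as follows; the fallback to "incomplete" when all three annotators disagree is an assumption of this sketch:

```python
from collections import Counter

def final_label(labels):
    """Majority vote over the three annotator labels for one tweet.

    Returns the label chosen by at least two annotators; falls back to
    'incomplete' (assumed here) when there is no majority agreement.
    """
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label if votes >= 2 else "incomplete"
```

Items that end up labelled "incomplete" are the ones removed from the baseline dataset in the next step.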
The news items labelled as incomplete were later removed from the dataset; the remaining baseline dataset contained 11,068 items. As the size of our final baseline dataset was quite small (17% of the original dataset), we decided to apply a pseudo-labelling learning technique to enhance the performance of the ML model. PLL is an efficient semi-supervised technique that can be applied to utilize unlabeled data while training ML models. As shown in Figure 1, the ML model is first trained on the labeled data (in this case, the baseline dataset). The model then predicts the labels of the unlabeled data. The predicted pseudo-labels are assigned as target classes for the unlabeled data and combined with the original baseline dataset (labeled data). Finally, the new dataset produced is used to train the proposed ML models. After applying the PLL technique, the size of the labeled dataset increased to around 54,893 instances. Table 6 shows the details of the final dataset after applying the PLL technique to 71.54% of the remaining unlabeled data.
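The PLL procedure described above can be sketched with scikit-learn; the logistic-regression base model is illustrative (any of the paper's classifiers could be plugged in), and this minimal variant keeps all pseudo-labels rather than filtering them by prediction confidence:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pseudo_label(clf, X_labeled, y_labeled, X_unlabeled):
    """Pseudo-labelling: train on the labelled baseline set, predict labels
    for the unlabelled posts, then retrain on the combined dataset."""
    clf.fit(X_labeled, y_labeled)              # 1. train on baseline dataset
    pseudo = clf.predict(X_unlabeled)          # 2. predict pseudo-labels
    X_all = np.vstack([X_labeled, X_unlabeled])
    y_all = np.concatenate([y_labeled, pseudo])
    clf.fit(X_all, y_all)                      # 3. retrain on combined data
    return clf, y_all
```

Many PLL variants keep only pseudo-labels whose predicted probability exceeds a threshold (e.g., via predict_proba); that refinement is omitted here for brevity.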

Pre-Processing and Numeric Representation
Besides the UFs and CFs described above in Tables 2 and 3, the "headline" (CF3), "tweet text" (CF4), and "body text" (CF5) features required additional treatment.

Pre-Processing
For many text classification systems, pre-processing is considered as an essential step to improve the quality of data as well as the efficiency and accuracy of ML models [22,23]. The common pre-processing steps include text cleansing, tokenization, removing stop words, stemming, and normalization. Since the obtained data is pulled out from Twitter and by accessing the external web pages following the URL links associated with the body of the tweets, additional pre-processing steps were performed, such as deletion of unnecessary, insignificant items from texts (e.g., digits, punctuation marks, URLs, special characters, non-Arabic characters, diacritics), and removal of emojis and hashtags.
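A minimal sketch of the Arabic-specific cleansing steps using Python regular expressions; the exact Unicode ranges and the order of operations are assumptions of this sketch, not the paper's implementation:

```python
import re

# Unicode ranges (assumed): Arabic letters U+0621-U+064A,
# diacritics (tashkeel) U+064B-U+0652, tatweel (elongation) U+0640.
URLS = re.compile(r"https?://\S+|www\.\S+")
DIACRITICS = re.compile(r"[\u064B-\u0652]")
TATWEEL = re.compile(r"\u0640")
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")  # digits, punctuation, emojis, Latin

def clean_arabic(text: str) -> str:
    """Apply the cleansing steps described above: strip URLs, diacritics,
    elongation, and non-Arabic characters, then collapse whitespace."""
    text = URLS.sub(" ", text)
    text = DIACRITICS.sub("", text)
    text = TATWEEL.sub("", text)
    text = NON_ARABIC.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()
```

Stemming, normalization of letter variants, and stop-word removal (done here with nltk, per the Experimental Design section) would follow this cleansing step.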

Numeric Representation
By numeric representation, we mean converting the textual content into a form that can be fed into the ML model. In this work, the term frequency-inverse document frequency (TF-IDF) is used as the numeric representation. Mathematically, TF-IDF can be calculated as in Equations (1)-(3):

TF(t, d) = n(t, d) / Σ_k n(k, d)    (1)

IDF(t) = log(|D| / |{d ∈ D : t ∈ d}|)    (2)

TF-IDF(t, d) = TF(t, d) × IDF(t)    (3)

where n(t, d) is the number of occurrences of term t in document d and |D| is the number of documents. After applying the TF-IDF technique to the final dataset, the training time of the ML models was long because of the high dimensionality: the number of extracted features reached 10,230.
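The TF-IDF vectorization can be sketched with scikit-learn (which the Experimental Design section reports using); note that scikit-learn's TfidfVectorizer applies a smoothed IDF variant rather than the plain textbook form. The toy headlines are illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Two toy pre-processed Arabic headlines (illustrative, not from the dataset).
docs = ["خبر عاجل الان", "خبر مهم جدا"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)  # sparse matrix of shape (n_docs, n_terms)
```

On the full dataset this step is what produced the 10,230-dimensional feature space, motivating the feature selection described next.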

Feature Selection
Feature selection (FS) is an effective way to reduce large, high-dimensional data [23]. The main purpose of FS is to remove irrelevant and noisy data. It also enables a representative subset of the data to be chosen, minimizing the complexity of the classification process. Several FS techniques can be found in the literature, including Mutual Information (MI), Information Gain (IG), improved Chi-square, and the one-way ANOVA F-test [24] (referred to hereafter as FV-ANOVA). This paper proposes FV-ANOVA as a feature selection method that statistically selects the important features according to their F-values. The features are sorted in descending order of F-value, so the most relevant features appear at the top. Finding the best cut-point value is a challenge; thus, we divided the features into a set of groups based on a given percentile (p%) of the original number of features. This step allows us to find the top-scoring features. Only the p% top-scoring features were then used to train the ML classifiers. The process of selecting features for FV-ANOVA is presented in Algorithm 2. For tuning the hyper-parameters of the ML classifiers, the grid search algorithm with k-fold cross-validation is used. The values of hyper-parameters that yield the highest performance measure are set as the final values for each classifier. The set of hyper-parameter values used in this work is presented in Table 7.
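The FV-ANOVA selection with a percentile cut-point and the grid-search tuning can be sketched with scikit-learn's f_classif, SelectPercentile, and GridSearchCV; the percentile candidates and the SVM parameter grid below are illustrative, not the values of Table 7:

```python
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Pipeline: keep the p% top-scoring features by ANOVA F-value, then classify.
pipe = Pipeline([
    ("anova", SelectPercentile(f_classif)),
    ("svm", LinearSVC(dual=False)),
])

param_grid = {
    "anova__percentile": [5, 10, 15],  # candidate cut-points p% (illustrative)
    "svm__C": [0.1, 1, 10],            # illustrative SVM hyper-parameter grid
}

# Grid search with k-fold cross-validation picks the best (p%, C) pair.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
```

Putting the selector inside the pipeline ensures the F-values are recomputed on each training fold, avoiding selection leakage into the validation folds.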

Model Evaluation
To evaluate the performance of the classifiers, we computed the accuracy (Acc), recall (R), precision (P), and F1-score (F1) of each classifier with the features selected by the proposed F-values of the one-way ANOVA test. These metrics are defined in Equations (4)-(7), respectively:

Acc = (TP + TN) / D    (4)

P = TP / (TP + FP)    (5)

R = TP / (TP + FN)    (6)

F1 = 2 × P × R / (P + R)    (7)

where TP + TN is the number of accurately predicted items (clickbait or not), D is the total number of samples in the dataset, TP + FP is the total number of items predicted as clickbait, and TP + FN is the total number of actual clickbait items.
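Equations (4)-(7) can be computed directly from the confusion-matrix counts; a minimal sketch:

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts,
    as in Equations (4)-(7); D = tp + tn + fp + fn is the dataset size."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return acc, precision, recall, f1
```

In practice these values are obtained from each classifier's predictions on the test set (e.g., via sklearn.metrics), but the formulas are exactly these.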

Experimental Design
The experiments in this study were performed with Python 3.8 on the Windows 10 operating system. We used numerous Python packages, including sklearn 0.22.2 for implementing the classifiers, nltk 3.6.2 for pre-processing the Arabic text, and Beautiful Soup 4.9.0 for scraping data from the external web pages. The user-based features and content-based features were first fed into the classifiers separately. Later, we merged both types and measured the performance of the ML classifiers based on the top-scoring p% of features selected by the F-values of the one-way ANOVA test. To ensure a fair comparison between classifiers, the same pre-processing steps and the same set of features were used for each classifier. In addition, we considered four experimental scenarios per feature type, as illustrated in Table 8.

Results and Findings
This section describes and discusses the results of each experiment shown in Table 8. First, we present the findings obtained when only the user-based features were used; the accuracy of each classifier is presented in Table 9. The second type of features, content-based features, was then investigated, as shown in Table 10. Finally, we combined both types of features, and the performance of the classifiers is presented in Table 11. Based on the results presented in Tables 9-11, the findings can be summarized as follows:

• When the content-based features were used, the classifiers performed well, and SVM, NB, and RF achieved notable results using 10% of the top-scoring features compared to their results in the baseline experiment. Among these methods, SVM obtained the best accuracy (91.83%) for content-based features.
• In most experiments with content-based features, all classifiers except k-NN and LR showed good results when the one-way ANOVA method was used for feature selection. Notably, k-NN had the worst performance when the number of selected features increased to 10% and 15%.
• Increasing the percentage of the top-scoring features beyond 10% reduces the performance of the ML classifiers.
• RF and SVM benefited more when the user-based features were used, compared to their results in the baseline experiment.
• The result for LR remained constant, and no change was observed when user-based features were fed into the classifier.
• k-NN and SGD do not benefit from the ANOVA-based feature selection at all for user-based features.
• Combining user-based and content-based features improves the performance of the ML classifiers; only the LR and k-NN classifiers did not show any improvement.
• SVM outperforms all other classifiers and benefited most when the proposed feature selection method was used on the combination of user-based and content-based features. The highest accuracy achieved was 92.16%.
• As the total number of features for the combination of user-based and content-based features is 10,251, selecting the top 10% of these features (2194) was more suitable for SVM, which performs well with low-dimensionality data.
• As the results show, the user-based features achieved lower performance than the content-based features for all ML methods. Therefore, the proposed model relies more on the content-based features and the combined set.

Conclusions
This paper has proposed a comprehensive approach that includes three main phases: data collection, data preparation, and machine learning modeling. After collecting the dataset, which is considered the first Arabic clickbait headline news dataset, the pre-processing tasks were performed, including text cleansing, normalization, stemming, stop-word removal, and tokenization. The features of the processed text (content-based features) were then combined with the user-based features, and feature selection was applied using the one-way ANOVA test. Finally, the ML models were applied, including Random Forest (RF), Stochastic Gradient Descent (SGD), Support Vector Machine (SVM), Logistic Regression (LR), Multinomial Naïve Bayes (NB), and k-Nearest Neighbor (k-NN). Hyper-parameter tuning was applied to find the optimal settings. The experimental results showed a great enhancement when the CFs were used and also when a combination of UFs and CFs was used. The accuracy achieved reached 92.16% using 10% of the top-scoring features, which is better than that reported in many previous studies (discussed in the related work). This enhancement is particularly interesting, as we are dealing with Arabic content. Future work will investigate the application of several deep learning methods to this Arabic dataset in order to enhance detection performance. Moreover, collecting more Arabic content to extend the dataset would further strengthen the analysis.