Abstract
Currently, with significant developments in technology and social networks, people gain rapid access to news without considering its reliability. Consequently, the proportion of fake news has increased. Fake news is a significant problem for societies today, as it negatively affects many areas, including politics, the economy, and society. It is widely disseminated via social media and other modern digital platforms. In this paper, we present a comprehensive review of fake news detection using machine learning and deep learning. The review also provides a brief survey and evaluation, discusses gaps, and explores future perspectives, and it is organized around a set of research questions. In addition, it highlights the importance of machine learning and deep learning for fake news detection by comparing and discussing how each is used to detect fake news. The review covers studies published between 2018 and 2025, most commonly by IEEE, Intelligent Systems, EMNLP, ACM, Springer, Elsevier, JAIR, and others, and its results can be used to determine the most effective algorithms in terms of performance. Accordingly, articles that did not demonstrate the use of algorithms or report performance were excluded.
1. Introduction
In recent years, the world has become increasingly fast-paced, and this rapid development, especially in the digital world, brings both advantages and disadvantages. Because news can be accessed easily without verifying its reliability, the prevalence of fake news has increased; the rapid spread of misinformation is one of the major drawbacks of the digital era. Individuals can unintentionally or deliberately disseminate fake news, potentially causing harm or offense to other people or to organizations. Moreover, fake news can serve as a tool for propaganda against individuals through various online platforms [1,2,3]. On the other hand, machine learning and deep learning algorithms, both branches of artificial intelligence, have recently been used to detect and predict fake news. The algorithms are first trained on a dataset containing both fake and legitimate news. The trained models are then validated and tested, and finally deployed to perform further tasks, such as predicting whether news is fake or revealing clues that help identify it [1,2,3,4,5]. Online platforms prioritize delivering news in a convenient, accessible, and rapid manner; however, this speed and ease of access also create greater opportunities for disseminating fake news. As a result, individuals and organizations have made efforts to verify and expose false information. Detecting fake news nevertheless remains a significant challenge. Numerous researchers address this issue by training machine learning and deep learning algorithms to identify fake content. Once adequately trained, these algorithms can detect fake news automatically with a certain degree of accuracy [6,7,8].
The accuracy of a fake news classifier must be monitored to ensure that it functions properly, as failing to detect fake news can harm many people. Popular machine learning classifiers used for this purpose include naïve Bayes, support vector machines (SVMs), random forests, k-nearest neighbors (KNN), decision trees, and logistic regression. Common deep learning algorithms used for this purpose include convolutional neural networks (CNNs), bidirectional long short-term memory networks (Bi-LSTMs), recurrent neural networks (RNNs), and graph neural networks (GNNs) [9,10,11,12,13,14,15,16]. Figure 1 shows the concept of detecting fake news using machine or deep learning algorithms.
Figure 1.
Detecting fake news using machine or deep learning algorithms.
This literature review answers its research questions by focusing on machine learning and deep learning for fake news detection. By examining the relevant work in the literature, it also addresses how machine learning and deep learning can be used to detect fake news, which can serve as a stepping stone toward developing a methodology for this research. Papers from various databases are selected using inclusion and exclusion criteria, which are discussed in this literature review [17,18,19,20,21].
The quality of each collected paper is evaluated based on the research it presents. Papers in which the researchers demonstrated the use of machine learning and deep learning to detect fake news are considered high-quality papers and are included in this research.
Qualitative research methods are used to collect data. Qualitative research uses non-numerical data; here, it is used to understand and interpret experiences of fake news detection with machine learning and deep learning by comparing previous scientific papers and extracting results such as algorithms, datasets, years of publication, features, and accuracy. The rest of the paper is structured as follows. Section 2 reviews related works. Section 3 explains the methodology and research questions. Section 4 presents the results and discussion, and Section 5 presents the conclusion. Finally, references are provided for the papers discussed in this literature review.
2. Related Works
In this section, we classify previous studies according to whether they detect fake news using machine learning, deep learning, both, or optimization techniques.
2.1. Machine Learning
Aphiwongsophon and Chongstitvatana [1] employed three machine learning methods to detect fake news: naïve Bayes, neural networks, and support vector machines (SVMs). Using the Twitter API, they extracted twenty-two features. Naïve Bayes achieved an accuracy of over 96%, while neural networks and SVMs each achieved an accuracy of 99.90%.
In [2], natural language processing (NLP) techniques were employed to distinguish real news from “fake news” originating from unreliable sources. The authors built models based on a count vector (raw word counts) or a TF-IDF (term frequency–inverse document frequency) matrix, which weights each word by how often it appears in an article relative to how many other articles in the dataset contain it. However, these representations ignore important information such as word order and context, so two articles with similar word counts may still differ completely in meaning. The dataset used was the Kaggle “Fake News Challenge” dataset. The authors preprocessed its fake and real news articles and employed a naïve Bayes classifier to build a binary, word-based model, which achieved an accuracy of 92.20% [2].
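The loss of word order described above can be seen directly. The sketch below builds both representations in plain Python; it assumes a lowercase whitespace tokenizer and the unsmoothed weighting TF × log(N/df), whereas libraries such as scikit-learn apply smoothing and normalization by default. It is an illustration, not the pipeline used in [2].

```python
import math
from collections import Counter

def tokenize(doc):
    """Lowercase whitespace tokenizer (assumption: real pipelines also strip punctuation)."""
    return doc.lower().split()

def count_vectors(docs):
    """Bag-of-words counts: one row per document over a shared sorted vocabulary."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[w] for w in vocab])
    return vocab, rows

def tfidf(docs):
    """TF-IDF rows with TF = raw count and IDF = log(N / df) (no smoothing)."""
    vocab, counts = count_vectors(docs)
    n_docs = len(docs)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    rows = [[row[j] * math.log(n_docs / df[j]) for j in range(len(vocab))] for row in counts]
    return vocab, rows

docs = ["dog bites man", "man bites dog", "senate passes budget"]
vocab, counts = count_vectors(docs)
_, weights = tfidf(docs)
# Opposite meanings, identical vectors: neither representation encodes word order.
print(counts[0] == counts[1], weights[0] == weights[1])  # True True
```

A term that appears in every document gets an IDF of log(1) = 0, so TF-IDF suppresses ubiquitous words that raw counts would keep.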
In the study by Ni et al. [3], the features of fake news were examined to detect sudden changes in the news context, using propensity score matching (PSM) to extract document frequency features while taking all variables into account in order to mitigate the effects of unwanted (confounding) variables. The experimental data came from the open-source FakeNewsNet repository, which consists of data from PolitiFact and GossipCop. The results demonstrated that PSM-based feature selection is better suited to fake news detection than relying on raw frequency alone, and the authors achieved an accuracy of 68%. On the PolitiFact dataset, various fake news classifiers, including logistic regression, random forest, and support vector machine, were evaluated to observe the performance improvements [3].
Singh et al. [4] compared ensemble learning models for classifying fake news by analyzing the quality of news reports and assessing their veracity. The aim of the paper was to use natural language processing (NLP) and machine learning (ML) algorithms to detect fake news based on its context. They employed decision trees, random forests, AdaBoost, and XGBoost as classifiers, fed with TF, TF-IDF, and word embedding features. The authors also developed a web application to help users distinguish fake news.
In [8], the authors approached fake news analysis as a two-dimensional classification problem using content and context features. Experiments were performed with a tree-based ensemble machine learning framework (gradient boosting) using full content-based modeling. The results showed higher accuracy than existing benchmarks, with the gradient boosting algorithm achieving 86% accuracy in multi-class fake news classification [8].
Albahr and Albahr [9] examined several traditional machine learning algorithms, namely random forests, naïve Bayes, neural networks, and decision trees, to compare their classification performance in detecting fake news based on unigram, bigram, and trigram features. Training was performed on the popular LIAR dataset, and the results showed that naïve Bayes significantly outperformed its counterparts, achieving an accuracy of 99.0%.
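As an illustration of the unigram, bigram, and trigram features mentioned above, the following sketch counts contiguous word n-grams for one statement. The tokenizer and the example sentence are assumptions for demonstration only; they are not taken from [9].

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_features(text, max_n=3):
    """Count unigram, bigram, ..., max_n-gram features for one text."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenization
    feats = Counter()
    for n in range(1, max_n + 1):
        for gram in ngrams(tokens, n):
            feats[" ".join(gram)] += 1
    return feats

feats = ngram_features("Says the economy grew faster")
print(feats["the economy"], feats["the economy grew"])  # 1 1
```

Each feature dictionary would then be mapped onto a fixed vocabulary so that any of the classifiers above can consume it.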
Goldani et al. [10] used capsule neural networks for fake news detection, with embedding models chosen according to news length. For short news items, fixed (static) word embeddings and n-gram features were used, whereas for medium-length or long news items, non-fixed word embeddings that are updated during training were used. Different levels of n-grams were also applied to extract features. The models were evaluated on two well-known datasets in this field, ISOT and LIAR. The proposed methods outperformed previous approaches by 7.8% on ISOT, and on LIAR by more than 3% on the validation set and 1% on the test set.
Birunda and Devi [13] built a textual feature model from authentic and fake news texts using term frequency. To calculate the credibility rating of sources, they relied on characteristics of each website’s URL and top-level domain. The credibility of a news item was then estimated by combining TF-IDF, site_URL, and text-based features with the credibility ratings of multiple sources. The experimental dataset, collected from Kaggle, contains 2050 news articles. The model was applied with several machine learning (ML) classifiers to test its effectiveness in detecting fake news, and the experimental results indicated a maximum effectiveness of approximately 99.5%.
Mugdha et al. [14] demonstrated a model that detects fake news from news headlines, constructing a new dataset for the Bengali language. The model used a Gaussian naïve Bayes classifier with TF-IDF text features and an extra-trees classifier for feature selection. It achieved an accuracy of 87%, higher than the other algorithms evaluated in the study.
Jardaneh et al. [16] introduced new features based on the user sentiment expressed in text to detect fake news in Arabic, and sentiment analysis improved the prediction process. Several machine learning algorithms were used to train classification models, including random forests, decision trees, AdaBoost, and logistic regression. The system detected fake news with an accuracy of 76%.
Tiwari and Jain [22] compared several machine learning algorithms, namely decision trees, random forests, and logistic regression, on the HSpam14 dataset, which contains a collection of 400,000 tweets, together with semantic features. The algorithms identified the selected news items with accuracy rates between 98% and 99%.
Rampurkar and D.R [23] preprocessed the input texts to extract their features, using TF-IDF to estimate the importance of words in each article. The news items were then classified with a naïve Bayes algorithm to distinguish true news from fake news. The ISOT dataset, containing 23,481 data items, was used. Naïve Bayes estimates the probability of each class for an article under the assumption that words are conditionally independent given the class. The performance of the algorithms was then assessed using a confusion matrix. The results showed that logistic regression performed well in detecting fake news, with an accuracy of 98.31%.
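Under the conditional-independence assumption, a class score is simply the log prior plus a sum of per-word log likelihoods. The sketch below is a minimal multinomial naïve Bayes with Laplace (add-one) smoothing, trained on a toy corpus; the variant, the smoothing, and the training sentences are assumptions, not details taken from [23].

```python
import math
from collections import Counter

class MultinomialNB:
    """Minimal multinomial naive Bayes over whitespace tokens with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # alpha = 1 gives Laplace smoothing

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(class_counts[c] / len(labels)) for c in self.classes}
        return self

    def _log_likelihood(self, word, c):
        # Smoothed P(word | class); words are treated as independent given the class.
        num = self.word_counts[c][word] + self.alpha
        den = self.totals[c] + self.alpha * len(self.vocab)
        return math.log(num / den)

    def predict(self, doc):
        tokens = [w for w in doc.lower().split() if w in self.vocab]  # skip unseen words
        scores = {c: self.log_prior[c] + sum(self._log_likelihood(w, c) for w in tokens)
                  for c in self.classes}
        return max(scores, key=scores.get)

train = ["shocking miracle cure doctors hate", "you won't believe this miracle",
         "parliament passes budget bill", "central bank holds interest rates"]
labels = ["fake", "fake", "real", "real"]
nb = MultinomialNB().fit(train, labels)
print(nb.predict("miracle cure revealed"))  # fake
```

Working in log space avoids the numerical underflow that multiplying many small word probabilities would cause.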
Mutri et al. [24] developed a method for detecting fake news by sorting and analyzing past data using machine learning. Various machine learning methods were used, with KNN and SVM proposed as effective solutions. KNN classifies a sample according to the labels of the most similar (nearest) known samples in feature space; here, the features included categorical and datetime attributes. The method was chosen for its ability to handle nonlinear data and its ease of use, since it leverages the characteristics of nearby examples to identify fake news. On the FakeNewsDetection dataset, KNN outperformed the other models, achieving a mean absolute error (MAE) of 0.011, the average magnitude of the prediction errors regardless of their direction, and a root mean square error (RMSE) of 0.077, the square root of the mean squared difference between predicted and observed values.
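A minimal k-nearest-neighbors classifier and the two error measures can be written as follows. The Euclidean distance, k = 3, and the two-dimensional feature vectors are hypothetical choices for illustration, not the configuration used in [24].

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    nearest = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def mae(y_true, y_pred):
    """Mean absolute error: average error magnitude, ignoring sign."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared difference."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical features; label 1 = fake, 0 = real.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.85, 0.95)]
train_y = [0, 0, 1, 1, 1]
test_X, test_y = [(0.15, 0.15), (0.9, 0.9)], [0, 1]
preds = [knn_predict(train_X, train_y, x) for x in test_X]
print(preds, mae(test_y, preds), rmse(test_y, preds))  # [0, 1] 0.0 0.0
```

For 0/1 labels, MAE reduces to the misclassification rate and RMSE to its square root, so the two measures are most informative when predictions are scores rather than hard labels.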
2.2. Deep Learning
Gereme et al. [6] presented an Amharic fake news detection model together with several resources: an Amharic language corpus (GPAC), the ETH_FAKE dataset, and Amharic FastText word embeddings (AMFTWE). The model developed using the ETH_FAKE dataset achieved accuracy above 99% with 300- and 200-dimensional embeddings.
Detecting fake news is a challenge for many researchers, especially when news circulates through social media platforms, and detection helps identify false and misleading stories across social media. One of the key challenges in this area is the limited availability of data for training detection models. Nagoudi et al. [25] presented a novel method for automatically generating misleading (and possibly fake) Arabic news stories, using part-of-speech (POS) tagging and word embedding features. To facilitate future research, they removed the need to build such data manually by releasing a ready-to-use dataset called AraNews. Finally, they developed models for Arabic fake news detection that achieved an accuracy exceeding 70% [25].
Hamed et al. [26] focused on feature extraction, specifically sentiment analysis features of news articles and of user comments about them, as well as emotion analysis features. These features, together with the news content features, were fed into a bidirectional long short-term memory (Bi-LSTM) model for fake news detection. The standard Fakeddit dataset of published headlines was used to train and test the proposed model, which achieved a detection accuracy of 96.77%, the highest compared with other recent studies.
Verma et al. [27] proposed WELFake, a two-phase model based on word embedding (WE) over linguistic features, to detect fake news using machine learning classification. The first phase preprocesses the dataset and verifies the news content using linguistic features. The second phase combines the linguistic feature sets with WE and applies voting classification. To validate their approach, the authors built a new WELFake dataset of many articles merged from different datasets to produce unbiased classification. The WELFake model achieved an accuracy of 96.73% in fake news detection.
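The voting step in WELFake combines the outputs of several classifiers. The sketch below assumes hard (majority) voting over predicted labels; [27] may weight or combine votes differently, and the classifier names in the example are placeholders.

```python
from collections import Counter

def hard_vote(predictions):
    """Majority vote across classifiers.

    predictions: one list of predicted labels per classifier, all the same length.
    Ties go to the label seen first in the column, i.e. the earlier classifier's vote,
    because Counter.most_common keeps insertion order for equal counts.
    """
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]

# Hypothetical outputs of three classifiers on four articles.
svm_pred = ["fake", "real", "fake", "real"]
lr_pred = ["fake", "fake", "real", "real"]
dt_pred = ["real", "fake", "fake", "real"]
print(hard_vote([svm_pred, lr_pred, dt_pred]))  # ['fake', 'fake', 'fake', 'real']
```

An odd number of voters avoids ties in binary classification; soft voting, which averages predicted probabilities, is a common alternative.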
Ivancova et al. [28] detected fake news in Slovak-language news articles using Word2Vec, GloVe, and morphological analysis features. A dataset of political news was created to train the models. Two architectures, a CNN and an LSTM network, were trained on the generated training data. The first model (Model 1), a CNN, achieved an overall accuracy of 92.38%. The second model (Model 2) was a recurrent neural network in which an LSTM layer with 128 units received the output of the embedding layer; it achieved an accuracy of 93.56% on the Slovak dataset.
Wang et al. [11] presented SemSeq4FD, a novel graph-based neural network model for the early detection of fake news that models text structure. SemSeq4FD uses graphs to model global semantic relations among sentences, and these global sentence representations are learned with a graph convolutional network. Local features within each sentence were captured with a one-dimensional convolutional network. An LSTM-based network then combined the refined sentence representations into the final document representation used for fake news recognition. The model was trained on data in both English (including the SLN and LUN datasets) and Chinese, and achieved an accuracy of 92.6%.
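To make the graph-convolution step concrete, the following sketch implements one layer of the widely used propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), with sentences as nodes. The adjacency matrix, features, and weights are toy values, and SemSeq4FD's actual layer and graph construction may differ.

```python
import math

def matmul(A, B):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # degrees including the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(adj, H, W):
    """One graph-convolution step: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(normalized_adjacency(adj), H), W)
    return [[max(0.0, v) for v in row] for row in out]

# Hypothetical graph: sentence 1 is semantically linked to sentences 0 and 2.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-d sentence features
W = [[1.0, 0.0], [0.0, 1.0]]              # identity weights for readability
print(gcn_layer(adj, H, W))
```

Each output row mixes a sentence's own features with those of its neighbors, which is how global relations between sentences enter the representation.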
Subramanian et al. [29] detected fake content in Malayalam on social media platforms. The task consisted of two subtasks. The first classified content as fake or non-fake using contextual embedding and sequential features; the second expanded the classification to five categories (false, half-true, mostly false, partially false, and mostly true) using multilingual contextual embedding features. For the first subtask, machine learning methods such as SVM, naïve Bayes, and SGD were used alongside BERT-based algorithms, among which XLM-RoBERTa achieved a high performance of 89.80%. For the second subtask, LSTM, GRU, XLM-RoBERTa, and SVM models were used, and XLM-RoBERTa again outperformed the other algorithms, achieving the highest overall F1 score of 62.83%.
Jingyuan et al. [30] focused on advancing graph-based detection through improvements in language models, frameworks, and training methods from the fake news literature. Building on several successful approaches, they highlighted the potential for real-time, cross-platform fake news detection. Context and semantic features were used for knowledge integration in misinformation detection, fake news detection with multimodal large language models, domain-adaptive few-shot fake news detection, and a style-agnostic detection framework, all built on graph neural networks (GNNs). Their experiments used the FakeNewsNet, PolitiFact, PAN2020, and COVID-19 datasets. Fake news detection using multimodal large language models on the PolitiFact dataset yielded a high accuracy of 95.10%.
Tan and Bakir [31] presented a model based on the transformer architecture, which can process longer texts more reliably, combined in a hybrid model with a bidirectional long short-term memory (Bi-LSTM) unit. To facilitate the identification of fake tweets in the TruthSeeker dataset, the researchers used word embedding and added a class-specific balancing factor. The Tomek links algorithm was used to enhance prediction performance, and a grid search over a parameter set was performed to identify the parameters yielding optimal results. On the test set, the model achieved 99.91% accuracy.
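Tomek links pair samples of different classes that are each other's nearest neighbor; removing the majority-class member of each pair cleans the class boundary. The sketch below assumes Euclidean distance and this common variant of the rule; it is not the exact configuration used in [31].

```python
import math

def tomek_links(X, y):
    """Return index pairs (i, j), i < j, of opposite-class mutual nearest neighbors."""
    def nearest(i):
        others = [j for j in range(len(X)) if j != i]
        return min(others, key=lambda j: math.dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i and y[i] != y[j]]

def remove_tomek_majority(X, y, majority):
    """Undersample by dropping the majority-class member of each Tomek link."""
    drop = {i if y[i] == majority else j for i, j in tomek_links(X, y)}
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]

# Toy 1-D data: class 0 is the majority; points 2 and 3 straddle the boundary.
X = [(0.0,), (1.0,), (2.0,), (2.1,), (5.0,)]
y = [0, 0, 0, 1, 1]
print(tomek_links(X, y))                     # [(2, 3)]
print(remove_tomek_majority(X, y, majority=0))
```

Because only boundary pairs are removed, Tomek links undersample gently; they are often combined with other balancing steps, such as the class-specific weighting mentioned above.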
Alsuwat, E. and Alsuwat, H. [32] proposed a new fake news detection approach termed Multi-Modal Fake News Detection (MM-FND). Their experiments relied on three datasets: the ISOT fake news dataset, the LIAR dataset, and the COVID-19 fake news dataset. For feature generation, they employed Word2Vec and term frequency–inverse document frequency (TF–IDF), and temporal features were extracted using bidirectional long short-term memory (Bi-LSTM) networks. Furthermore, spatial features were extracted using named entity recognition (NER) combined with global vector embeddings for word representation (GloVe). The approach achieved 96.3% accuracy on the ISOT dataset, 95.6% on the LIAR dataset, and 97.1% on the COVID-19 fake news dataset.
2.3. Machine Learning and Deep Learning
Jiang et al. [5] applied two approaches: first evaluating five machine learning models, then testing three deep learning models. Cross-validation was conducted on two fake news datasets of distinctly different sizes. Term frequency–inverse document frequency (TF–IDF) features and word embeddings were extracted as inputs for the machine learning and deep learning models, respectively. The authors then proposed a stacking model that achieved accuracies of nearly 99.95% and 96% on the ISOT and KDnuggets datasets, respectively.
Pardamean and Pardede [7] identified inaccurate news using Bidirectional Encoder Representations from Transformers (BERT), a deep learning language model that is highly effective in language processing. Their experiments showed that BERT representations with tuned hyperparameters achieved an accuracy of 99.23% on the Kaggle dataset.
Mouratidis et al. [33] conducted a comparative experiment on traditional machine learning classifiers, including naïve Bayes, SVMs, and random forests, and deep learning models such as CNNs, LSTMs, and BERT. The study generated features including TF-IDF, Word2Vec, and contextual embeddings, and tests were conducted on multiple datasets. The researchers found that BERT-based models performed strongly, improving the accuracy of fake news detection; applying BERT yielded a performance of 98.40%.
Al-Tarawneh et al. [34] found that TF-IDF can extract discriminative features from text content. Furthermore, on the TruthSeeker dataset, which contains news articles and social blog posts labeled for this purpose, TF-IDF improved CNNs, which effectively extract local features and patterns within text. In contrast, Word2Vec and FastText embeddings, which aim to capture semantic and syntactic nuances, did not perform well, and such embeddings were not always beneficial for traditional models such as multilayer perceptrons (MLPs) or SVMs. The study highlights the importance of choosing embedding techniques suited to the model algorithm to achieve strong prediction performance on the fake news detection task. With TF-IDF features, CNN 1 and CNN 3 performed comparably, with accuracies of 98.77% and 98.99%, respectively, underscoring the suitability of these two models for this representation.
Shen et al. [35] developed GAMED, a multimodal modeling approach that focuses on generating distinctive and discriminative features through modality decoupling, enhancing cross-modal interconnections and thus improving overall detection performance. Multiple parallel expert networks are used to extract these features and to incorporate semantic knowledge into GAMED, after which the feature distribution is adjusted systematically. GAMED also explains its decisions and uses a novel classification mechanism to dynamically manage the contributions of different modalities. Experimental results on the Fakeddit and Yang datasets showed that GAMED outperformed state-of-the-art models, with an accuracy of 93.90%.
2.4. Optimization Techniques
Ozbay and Alatas [12] proposed a new approach to fake news detection (FND) on social media in which the FND problem is formulated as an optimization problem, supported by features such as term frequency (TF) and document vectors. To solve it, the authors adapted two metaheuristic algorithms, Grey Wolf Optimization (GWO) and Salp Swarm Optimization (SSO). The approach has three stages: data preprocessing; adapting GWO and SSO to train a new FND model; and testing the resulting FND model. In the evaluation, which used the public LIAR benchmark dataset, GWO outperformed SSO and other AI algorithms, achieving an accuracy of 96.5%.
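The GWO update moves every wolf toward positions estimated from the three best wolves (alpha, beta, and delta). In [12], each wolf encodes a candidate FND model; to keep this sketch self-contained, it instead minimizes a toy sphere function, and the population size, iteration count, and bounds are arbitrary assumptions.

```python
import random

def _toward(rng, a, leader_d, w_d):
    """One leader-guided estimate: X_leader - A * |C * X_leader - X|."""
    A = 2 * a * rng.random() - a  # |A| > 1 explores, |A| < 1 exploits
    C = 2 * rng.random()
    return leader_d - A * abs(C * leader_d - w_d)

def grey_wolf_optimize(f, dim, bounds, n_wolves=20, n_iter=200, seed=0):
    """Minimize f over a box with the standard Grey Wolf Optimizer update."""
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best_pos, best_val = None, float("inf")
    for t in range(n_iter):
        ranked = sorted(wolves, key=f)
        if f(ranked[0]) < best_val:
            best_pos, best_val = ranked[0][:], f(ranked[0])
        leaders = ranked[:3]  # alpha, beta, delta
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 toward 0
        new_wolves = []
        for w in wolves:
            pos = []
            for d in range(dim):
                estimate = sum(_toward(rng, a, leader[d], w[d]) for leader in leaders) / 3
                pos.append(min(hi, max(lo, estimate)))  # clip to the search box
            new_wolves.append(pos)
        wolves = new_wolves
    for w in wolves:  # the final population may hold a better wolf
        if f(w) < best_val:
            best_pos, best_val = w[:], f(w)
    return best_pos, best_val

sphere = lambda x: sum(v * v for v in x)  # toy stand-in for an FND training error
pos, val = grey_wolf_optimize(sphere, dim=5, bounds=(-10.0, 10.0))
print(val)
```

In an FND setting, f would instead score a candidate model (for example, its classification error on the training set), and each dimension would correspond to one model parameter.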
Al-Ahmad et al. [15] presented a model that incorporates a feature selection process to reduce redundancy among similar features, with features generated using bag of words (BOW), term frequency (TF), and term frequency–inverse document frequency (TF-IDF). They employed metaheuristic algorithms in the classification process, namely particle swarm optimization (PSO), genetic algorithms (GAs), and the salp swarm algorithm (SSA). Evaluated on the Koirala dataset, the generated models achieved an accuracy of 75.4%.
3. Methodology
This section presents a comprehensive discussion of the research methodology, covering the research strategy, the purpose of the research, how data was collected and analyzed, quality standards, and the ethical considerations of the research. Qualitative research methods are used, based on the analysis of literature extracted from various available research databases. Qualitative research is an approach with a deep, interpretive focus on phenomena that relies on the context and complexity of the situations under study. Its aim is not only to answer specific questions but also to understand more deeply the meanings, expectations, and experiences of the individuals or groups concerned. Qualitative methods often collect data through observation or document analysis, which allows researchers and participants to interact directly. Systematic literature reviews (SLRs) have become increasingly common in management research. They involve comprehensive searches of scientific databases across journals and researchers and the application of inclusion and exclusion criteria, leading to theoretically and methodologically sound results that provide a reliable foundation for scholars and researchers.
To ensure comprehensive coverage of the relevant work, this review follows the guidelines of Kitchenham et al. [19], which comprise several stages, including defining the research questions and the search process, and it complies with the PRISMA 2020 guidelines [36]. The flow diagram is presented in Figure 2, and the completed checklist is provided in the Supplementary Materials.
Figure 2.
PRISMA flow diagram of the papers included in this review.
In this study, key results are presented through summary tables showing the characteristics and outcomes of included studies. Moreover, current challenges and future trends are highlighted based on the identification of research gaps.
3.1. Research Questions
This section outlines the research questions that defined the direction of this study:
- RQ1: What is the accuracy of the primary techniques employed to detect fake news?
- RQ2: What datasets are used?
- RQ3: Do gaps affect model performance?
3.2. Search Process
The search was conducted by manually searching for research papers in scientific journals published from 2018 to 2025. The search process used in this review is detailed as follows:
3.2.1. Sources and Data Collection
- The search covered journal articles and conference proceedings published between 2018 and 2025. It was not limited to a single publisher and included leading sources such as IEEE, Intelligent Systems, EMNLP, ACM, Springer, Elsevier, JAIR, AAAI, and ACL. We also extended the search to research-oriented databases, including Scopus, Web of Science, DBLP, and Google Scholar, to ensure comprehensive coverage of the relevant literature. Finally, the reference lists of all selected articles were checked to find relevant papers that the search had not yet captured.
3.2.2. Search Keywords
The search keywords, derived from the research questions of this study, are as follows:
Fake news, detection, machine learning, algorithms, deep learning, accuracy, features, dataset.
3.2.3. Expression of Research
The following procedure was used to construct the search terms for this review. Keywords were extracted from the research questions on fake news detection. Each search expression combines groups of target terms with the AND logical operator, while terms and synonyms within each group are combined with the OR logical operator [19].
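The AND/OR structure described above can be sketched as a small query builder. The synonym groups below, such as "misinformation" and "classification", are illustrative assumptions rather than the exact terms used in this review.

```python
def build_query(concept_groups):
    """Join synonyms with OR inside each group, and the groups with AND."""
    clauses = []
    for group in concept_groups:
        # Quote multi-word phrases so databases match them as exact phrases.
        terms = " OR ".join(f'"{t}"' if " " in t else t for t in group)
        clauses.append(f"({terms})")
    return " AND ".join(clauses)

query = build_query([
    ["fake news", "misinformation"],        # hypothetical synonym group
    ["detection", "classification"],        # hypothetical synonym group
    ["machine learning", "deep learning"],
])
print(query)
# ("fake news" OR misinformation) AND (detection OR classification) AND ("machine learning" OR "deep learning")
```

Exact syntax (field tags, wildcard characters) differs between databases such as Scopus and Web of Science, so the generated string would need adapting per source.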
3.2.4. Inclusion and Exclusion Standards
For articles published between 2018 and 2025, we focused on the following topics:
- Detecting fake news;
- Using machine learning to detect fake news;
- Using deep learning to detect fake news.
Articles consisting only of a literature review, or whose main contribution was a literature review, were not included in this review. Articles were also excluded if:
- They did not present the use of algorithms to detect fake news.
- They did not report performance in identifying fake news.
3.2.5. Quality Valuation
Each selected paper was assessed for quality before inclusion in the review. The quality valuation questions were based on the following criteria:
- QV1: Did the study demonstrate the use of machine learning and deep learning methods/algorithms together to detect fake news?
- QV2: Is the dataset used in the model sufficient to achieve high performance?
- QV3: Does the model demonstrate high performance?
Each question was answered as follows:
- QA1 as described in QV1: Y (yes) if the study demonstrated both machine learning and deep learning methods for detecting fake news; P (partially) if it demonstrated either machine learning or deep learning methods; N (no) if it did not demonstrate clear methods for detecting fake news.
- QA2 as described in QV2: Y (yes) if the dataset is sufficient; P (partially) if the dataset is partially sufficient; N (no) if the study did not state a clear dataset.
- QA3 as described in QV3: Y (yes) if the study showed a high performance of at least 98%, with an RMSE of at most 0.75 and an MAE below 0.5; P (partially) if it showed a performance below 98% but at least 95%, with an RMSE above 0.75 and at most 1 and an MAE above 0.5 and at most 0.75; LP (less than partial) if it showed a performance below 95%, with an RMSE above 1 and at most 2 and an MAE above 0.75 and at most 1.5.
Answers were scored as follows: Y = 1, P = 0.5, LP = 0.25, and N = 0. When assessments conflicted, the reviewers discussed them until an appropriate evaluation of the paper was reached [19].
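The scoring scheme can be expressed as a small function. This sketch grades QA3 from the reported performance percentage alone and ignores the RMSE and MAE thresholds for brevity; the example paper's answers are hypothetical.

```python
SCORES = {"Y": 1.0, "P": 0.5, "LP": 0.25, "N": 0.0}

def qa3_grade(performance):
    """Grade QA3 from a performance percentage using the thresholds above."""
    if performance >= 98:
        return "Y"
    if performance >= 95:
        return "P"
    return "LP"

def quality_score(qa1, qa2, performance):
    """Total quality score of one paper: QA1 + QA2 + QA3 (maximum 3.0)."""
    return SCORES[qa1] + SCORES[qa2] + SCORES[qa3_grade(performance)]

# Hypothetical paper: both ML and DL (Y), partially sufficient dataset (P), 96.3%.
print(quality_score("Y", "P", 96.3))  # 2.0
```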
Figure 2 displays the PRISMA flow diagram of the study. Of the 2746 citations retrieved by the electronic search, 30 documents were found eligible. A total of 66 full-text articles were excluded for the following reasons: 50 were review articles, and 16 were published in venues with low impact factors. A journal’s impact factor reflects how often the articles it publishes are cited over a given period; a lower impact factor therefore corresponds to a lower journal ranking, which is why this metric was adopted in our analysis.
This research focused on gaps in previous studies and compared algorithms, features, and performance, as well as datasets and performance. This is in contrast to previous literature reviews that did not focus on these points. Therefore, this research helps researchers quickly leverage machine learning and deep learning techniques for detecting fake news.
4. Results and Discussion
Most studies that classify news as fake or real have used the following algorithms, drawn from machine learning, deep learning, or optimization techniques. The machine learning algorithms include logistic regression, decision tree, gradient boosting, random forest, k-nearest neighbor, naïve Bayes, and support vector machine classifiers, while the deep learning algorithms include CNN, RNN, Bi-LSTM, and GNN [18].
4.1. Machine and Deep Learning Algorithms
4.1.1. Logistic Regression Classification Algorithm
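Logistic regression is typically used for two-class (binary) classification problems. It estimates the probability that an instance belongs to the positive class (for example, fake news) by passing a linear combination of its features, weighted by learned coefficients, through the sigmoid function σ(z) = 1/(1 + e^(−z)). The relationship between the sigmoid function and these coefficients is therefore what approximates the dependent variable, and an instance is assigned to the positive class when its estimated probability exceeds a threshold, usually 0.5 [18].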
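As an illustration only, the following Python sketch trains a logistic regression classifier by batch gradient descent on the log-loss, using only the standard library. The two features and the toy data are hypothetical; practical studies typically rely on a library implementation and on TF-IDF or similar text features.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Estimated P(fake | x): the sigmoid of the weighted sum of the features."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(X, y, lr=0.5, epochs=2000):
    """Fit the coefficients by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, target in zip(X, y):
            error = predict_proba(weights, bias, x) - target  # d(log-loss)/dz
            for j in range(d):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical features: [sensational-word ratio, source credibility]; 1 = fake, 0 = real.
X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.1, 0.9], [0.2, 0.8], [0.3, 0.7]]
y = [1, 1, 1, 0, 0, 0]
w, b = train(X, y)
print([int(predict_proba(w, b, x) >= 0.5) for x in X])  # [1, 1, 1, 0, 0, 0]
```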
4.1.2. Decision Tree Algorithm
Decision trees are a commonly used machine learning algorithm that works well on both classification and regression problems, and their tree structure is easy for users to understand and interpret. A tree is built by recursively splitting the training data on the feature test that best separates the classes (for example, the split that minimizes Gini impurity) and building a sub-model for each resulting subset, until the subsets are pure or a stopping criterion is met; each leaf then assigns a class, such as fake or real. The trained model is evaluated on test data, and a classification report summarizes the results [18].
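The recursive splitting described above can be sketched in Python as follows. This minimal version is an assumption for illustration, not the implementation used in [18]: it chooses splits by Gini impurity, stops at pure nodes or a maximum depth, and labels each leaf with its majority class. The toy features are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y):
    """Return (feature, threshold) minimizing the weighted Gini impurity, or None."""
    best, best_impurity = None, float("inf")
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [lbl for x, lbl in zip(X, y) if x[j] <= t]
            right = [lbl for x, lbl in zip(X, y) if x[j] > t]
            if not right:  # this threshold would not separate anything
                continue
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if impurity < best_impurity:
                best, best_impurity = (j, t), impurity
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Grow the tree recursively; a leaf is the majority label of its samples."""
    if depth < max_depth and gini(y) > 0.0:
        split = best_split(X, y)
        if split is not None:
            j, t = split
            left = [i for i, x in enumerate(X) if x[j] <= t]
            right = [i for i, x in enumerate(X) if x[j] > t]
            return (j, t,
                    build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
                    build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))
    return Counter(y).most_common(1)[0][0]


def predict(tree, x):
    """Follow the feature tests from the root down to a leaf label."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree


# Hypothetical features: [exclamation marks, cites a source (0/1)]
X = [[5, 0], [3, 0], [4, 1], [0, 1], [1, 1], [0, 0]]
y = ["fake", "fake", "fake", "real", "real", "real"]
tree = build_tree(X, y)
print(tree)  # (0, 1, 'real', 'fake')
```

Here a single test on the first feature already separates the two classes, so the tree has one internal node and two leaves.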
4.1.3. Random Forest Classification Algorithm
The random forest classification algorithm is an ensemble learning technique built from decision trees. Each tree is trained separately, typically on a bootstrap sample of the training data with a random subset of features considered at each split, and the forest's prediction combines the trees' predictions: by majority vote for classification, or by averaging for regression and class probabilities. Combining many trees reduces the tendency of a single decision tree to overfit, yielding a more reliable model. The model's performance is then evaluated on test data [18].
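A minimal Python sketch of the two ingredients of a random forest, bootstrap sampling with a random feature subset per tree and majority voting, is shown below. To stay short, it uses depth-one trees (decision stumps) chosen by training error rather than full decision trees; the data, the number of trees, and the helper names are illustrative assumptions.

```python
import random
from collections import Counter


def train_stump(X, y, features):
    """Depth-one tree: the (feature, threshold) with the fewest training errors,
    searched only over the given feature subset."""
    default = Counter(y).most_common(1)[0][0]
    best, best_errors = None, len(y) + 1
    for j in features:
        for t in sorted({x[j] for x in X}):
            left = Counter(lbl for x, lbl in zip(X, y) if x[j] <= t)
            right = Counter(lbl for x, lbl in zip(X, y) if x[j] > t)
            l_lbl = left.most_common(1)[0][0] if left else default
            r_lbl = right.most_common(1)[0][0] if right else default
            errors = (sum(left.values()) - left[l_lbl]) + (sum(right.values()) - right[r_lbl])
            if errors < best_errors:
                best, best_errors = (j, t, l_lbl, r_lbl), errors
    return best


def stump_predict(stump, x):
    j, t, l_lbl, r_lbl = stump
    return l_lbl if x[j] <= t else r_lbl


def train_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        feats = rng.sample(range(len(X[0])), n_features)       # random feature subset
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def forest_predict(forest, x):
    """Majority vote over the individual trees."""
    return Counter(stump_predict(s, x) for s in forest).most_common(1)[0][0]


# Hypothetical features: [sensational-word count, number of cited sources]
X = [[5, 0], [4, 0], [6, 1], [0, 5], [1, 4], [0, 6]]
y = ["fake", "fake", "fake", "real", "real", "real"]
forest = train_forest(X, y)
print(forest_predict(forest, [6, 0]), forest_predict(forest, [0, 6]))  # fake real
```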
4.1.4. Gradient Boosting Classification Algorithm
The gradient boosting algorithm is based on ensemble learning: it combines weak learners, typically shallow decision trees, into a more accurate model. The trees are added sequentially, each one fitted to reduce the errors left by the current ensemble, so the model improves through a sequential error reduction strategy. The algorithm can be applied to both classification and regression problems. The model's performance is evaluated and presented as a classification report [18].
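The sequential error reduction can be illustrated with a small Python sketch of gradient boosting for binary classification under the log-loss. Each round fits a regression stump to the current residuals (the negative gradient, that is, the label minus the predicted probability) and adds a damped copy of it to the model's score. This simplified version, with hypothetical data and settings, omits refinements found in library implementations, such as optimized leaf values and deeper trees.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_regression_stump(X, residuals):
    """Split minimizing the squared error; each leaf predicts its mean residual."""
    best, best_sse = None, float("inf")
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [r for x, r in zip(X, residuals) if x[j] <= t]
            right = [r for x, r in zip(X, residuals) if x[j] > t]
            l_mean = sum(left) / len(left)
            r_mean = sum(right) / len(right) if right else l_mean
            sse = sum((r - l_mean) ** 2 for r in left) + sum((r - r_mean) ** 2 for r in right)
            if sse < best_sse:
                best, best_sse = (j, t, l_mean, r_mean), sse
    return best


def stump_value(stump, x):
    j, t, l_mean, r_mean = stump
    return l_mean if x[j] <= t else r_mean


def train_gb(X, y, n_rounds=50, learning_rate=0.3):
    """Labels are 1 (fake) / 0 (real); both classes must be present."""
    p = sum(y) / len(y)
    f0 = math.log(p / (1 - p))  # initial score: log-odds of the fake class
    scores, stumps = [f0] * len(X), []
    for _ in range(n_rounds):
        # Negative gradient of the log-loss with respect to each score.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_regression_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_value(stump, x) for s, x in zip(scores, X)]
    return f0, learning_rate, stumps


def predict_gb(model, x):
    f0, learning_rate, stumps = model
    score = f0 + learning_rate * sum(stump_value(s, x) for s in stumps)
    return int(sigmoid(score) >= 0.5)


X = [[1, 5], [2, 4], [3, 6], [7, 1], [8, 2], [9, 3]]  # hypothetical features
y = [0, 0, 0, 1, 1, 1]
model = train_gb(X, y)
print([predict_gb(model, x) for x in X])  # [0, 0, 0, 1, 1, 1]
```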
4.1.5. K-Nearest Neighbor (KNN) Algorithm
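The K-nearest neighbor (KNN) algorithm is a machine learning algorithm used in classification and regression problems. It classifies a new sample by a majority vote among the k training samples closest to it, for example in Euclidean distance. KNN is simple, can achieve high performance, and is particularly practical for small datasets, because each prediction requires comparing the new sample with all stored training samples. The model's performance is evaluated, and a classification report is generated from the results [18].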
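A KNN classifier needs no training phase beyond storing the data, as the following Python sketch illustrates. Euclidean distance, k = 3, and the two-dimensional toy features are assumptions chosen for illustration.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, x, k=3):
    """Majority label among the k training points nearest to x (Euclidean distance)."""
    by_distance = sorted(zip((math.dist(xt, x) for xt in X_train), y_train))
    return Counter(label for _, label in by_distance[:k]).most_common(1)[0][0]


# Hypothetical 2-D features: [sensational-word ratio, source credibility]
X_train = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.2], [0.1, 0.9], [0.2, 0.7], [0.3, 0.8]]
y_train = ["fake", "fake", "fake", "real", "real", "real"]
print(knn_predict(X_train, y_train, [0.85, 0.2]))  # fake
```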
4.1.6. Naïve Bayes Classification Algorithm
The naïve Bayes classifier applies Bayes' theorem to compute the probability of each class given the observed features. It is called “naïve” because it assumes that the features are conditionally independent given the class, so the presence or absence of one feature does not affect the others. For text, the documents are first converted into features, which can be either word frequencies or term frequency–inverse document frequency (TF-IDF) values. When classifying test data, the naïve Bayes model calculates the probability that each document belongs to each class and assigns it to the class with the highest probability. The model's performance is then evaluated, and a classification report is printed accordingly [18].
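The following Python sketch implements a multinomial naïve Bayes classifier on raw word counts with add-one (Laplace) smoothing, working with log probabilities to avoid numerical underflow. The toy headlines are invented for illustration, and the sketch uses word frequencies rather than TF-IDF values.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Multinomial naive Bayes: class log-priors and per-class word counts."""
    vocab = {w for d in docs for w in d.split()}
    model = {}
    for c in set(labels):
        class_docs = [d for d, lbl in zip(docs, labels) if lbl == c]
        counts = Counter(w for d in class_docs for w in d.split())
        model[c] = (math.log(len(class_docs) / len(docs)), counts, sum(counts.values()))
    return vocab, model


def predict_nb(trained, doc):
    """Return the class with the highest log-posterior (up to a shared constant)."""
    vocab, model = trained
    scores = {}
    for c, (log_prior, counts, total) in model.items():
        score = log_prior
        for w in doc.split():
            if w in vocab:  # words never seen in training are ignored
                # Add-one smoothing keeps an unseen (class, word) pair from zeroing the product.
                score += math.log((counts[w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)


docs = [  # invented toy headlines
    "shocking miracle cure revealed",
    "you won't believe this shocking secret",
    "miracle secret doctors hate",
    "government publishes annual budget report",
    "central bank announces interest rate decision",
    "ministry releases official report",
]
labels = ["fake", "fake", "fake", "real", "real", "real"]
nb = train_nb(docs, labels)
print(predict_nb(nb, "miracle cure secret"))  # fake
```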
4.1.7. Support Vector Machine (SVM) Algorithm
The SVM algorithm is widely used for classification and regression problems in machine learning, including text and news classification. It constructs a hyperplane that separates the classes in a given dataset; in a binary classification task, the SVM seeks the hyperplane with the largest margin, that is, the greatest distance to the nearest data points of each class. A data point is assigned to a class according to the side of the hyperplane on which it falls, and its distance from the hyperplane indicates how confidently it is classified. The model's performance is evaluated, and a classification report is printed based on the results [18].
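A minimal Python sketch of a linear soft-margin SVM is shown below. It is trained by stochastic subgradient descent on the regularized hinge loss, a training procedure swapped in for illustration (the reviewed studies typically rely on library solvers). The labels are +1 (fake) and −1 (real), and the data and hyperparameters are hypothetical. The signed distance to the hyperplane gives both the predicted class and a measure of confidence.

```python
import math


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Soft-margin linear SVM via stochastic subgradient descent on
    lam/2 * ||w||^2 + hinge loss. Labels must be +1 or -1."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            margin = t * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge term is active
                w = [wi - lr * (lam * wi - t * xi) for wi, xi in zip(w, x)]
                b += lr * t
            else:  # only the regularization term contributes
                w = [wi * (1 - lr * lam) for wi in w]
    return w, b


def signed_distance(w, b, x):
    """Distance of x from the hyperplane w.x + b = 0; its sign is the predicted class."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


# Hypothetical 2-D features; +1 = fake, -1 = real.
X = [[2, 3], [3, 3], [3, 4], [-2, -1], [-3, -2], [-1, -3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([1 if signed_distance(w, b, x) > 0 else -1 for x in X])  # [1, 1, 1, -1, -1, -1]
```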
4.1.8. Convolutional Neural Network (CNN) Algorithm
CNNs, well known for their effectiveness in sentiment analysis, have been adapted to text classification tasks such as fake news detection. In a text CNN, convolutional filters slide over windows of consecutive word embeddings to detect local patterns, and pooling layers keep the strongest responses, so that successive layers extract the most informative features from the text [17].
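The core operation of a text CNN can be sketched in plain Python: each filter slides over windows of consecutive word vectors, a ReLU is applied, and max-over-time pooling keeps the strongest response. The embeddings and filter weights below are hypothetical hand-set values; in a real model they are learned, and the pooled features are passed to a classification layer.

```python
def text_cnn_features(embeddings, filters, biases):
    """Convolution over word vectors + ReLU + max-over-time pooling.

    embeddings: one vector per word (the text must be at least as long as each filter)
    filters:    each filter is a list of `width` weight vectors of the embedding size
    Returns one pooled feature per filter.
    """
    features = []
    for filt, bias in zip(filters, biases):
        width = len(filt)
        responses = []
        for start in range(len(embeddings) - width + 1):
            window = embeddings[start:start + width]
            s = bias + sum(fw * ew
                           for f_vec, e_vec in zip(filt, window)
                           for fw, ew in zip(f_vec, e_vec))
            responses.append(max(0.0, s))  # ReLU
        features.append(max(responses))    # keep the strongest response in the text
    return features


# Hypothetical 2-D embeddings for a 4-word text and two hand-set width-2 filters.
embeddings = [[1, 0], [0, 1], [1, 1], [0, 0]]
filters = [
    [[1, 0], [0, 1]],      # responds most to [1, 0] followed by [0, 1]
    [[-1, -1], [-1, -1]],  # never fires on these non-negative inputs
]
print(text_cnn_features(embeddings, filters, [0.0, 0.0]))  # [2.0, 0.0]
```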
4.1.9. Recurrent Neural Network (RNN) Algorithm
RNNs are now widely used for identifying fake news. An RNN represents a text as a fixed-size vector by processing its tokens one at a time and updating a recurrent hidden state at each step, which allows it to capture the sequential nature of language [17].
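The following Python sketch shows how a simple (Elman) RNN turns a variable-length sequence of token vectors into one fixed-size vector. The weights are arbitrary illustrative values, not trained parameters.

```python
import math


def matvec(M, v):
    """Matrix-vector product with plain lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rnn_encode(tokens, W_x, W_h, b):
    """Simple RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b), starting from h_0 = 0.
    The final hidden state is a fixed-size vector summarizing the whole sequence."""
    h = [0.0] * len(b)
    for x in tokens:
        h = [math.tanh(u + r + bi) for u, r, bi in zip(matvec(W_x, x), matvec(W_h, h), b)]
    return h


# Arbitrary illustrative weights: 2-D token vectors, 3-D hidden state.
W_x = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]
W_h = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
b = [0.0, 0.0, 0.0]
short_text = rnn_encode([[1, 0], [0, 1]], W_x, W_h, b)
long_text = rnn_encode([[1, 0], [0, 1], [1, 1], [0, 0], [1, 0]], W_x, W_h, b)
print(len(short_text), len(long_text))  # 3 3
```

Because the hidden state is updated in order, the same tokens in a different order generally yield a different vector.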
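4.1.10. Bidirectional Long Short-Term Memory (Bi-LSTM) Algorithm
Bi-LSTM is an extension of the LSTM network that reads the input sequence in two directions, forward and backward, and combines the hidden states of both passes at each position. This gives the model access to both preceding and following context, allowing a richer understanding of the data, especially in tasks such as detecting fake news [17].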
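A compact Python sketch of the bidirectional idea follows: one LSTM reads the sequence left to right, a second reads it right to left, and their hidden states are concatenated position by position. The gate equations are the standard LSTM ones; the parameter layout and the weights in the usage example are assumptions for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def gate(params, name, x, h, act):
    """act(W x + U h + b) for one gate, where params[name] = (W, U, b)."""
    W, U, b = params[name]
    return [act(p + q + bi) for p, q, bi in zip(matvec(W, x), matvec(U, h), b)]


def lstm_states(tokens, params):
    """Run an LSTM over the tokens and return the hidden state at each position."""
    size = len(params["i"][2])
    h, c, states = [0.0] * size, [0.0] * size, []
    for x in tokens:
        i = gate(params, "i", x, h, sigmoid)    # input gate
        f = gate(params, "f", x, h, sigmoid)    # forget gate
        o = gate(params, "o", x, h, sigmoid)    # output gate
        g = gate(params, "g", x, h, math.tanh)  # candidate cell content
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        states.append(h)
    return states


def bilstm(tokens, fwd_params, bwd_params):
    """Concatenate forward and backward hidden states at every position."""
    forward = lstm_states(tokens, fwd_params)
    backward = lstm_states(tokens[::-1], bwd_params)[::-1]  # realign with input order
    return [hf + hb for hf, hb in zip(forward, backward)]


def toy_params(scale):
    """Hypothetical weights: 2-D inputs, 2-D hidden state, same values for all gates."""
    W = [[scale, 0.0], [0.0, scale]]
    U = [[0.1, 0.0], [0.0, 0.1]]
    return {name: (W, U, [0.0, 0.0]) for name in "ifog"}


out = bilstm([[1, 0], [0, 1], [1, 1]], toy_params(0.5), toy_params(-0.5))
print(len(out), len(out[0]))  # 3 4
```

At the first position, the forward half has seen only the first token, while the backward half already reflects the whole sequence.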
4.1.11. Graph Neural Network (GNN) Algorithm
GNNs are neural network models capable of working with graph-structured data. They build on ideas from CNNs and graph embedding and are applied to node prediction, edge prediction, and other graph-based tasks [30].
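The basic operation shared by many GNNs, message passing, is sketched below in Python: each node averages its own feature vector with those of its neighbours and applies a shared linear map followed by a ReLU. This mean-aggregation layer is one simple variant chosen for illustration; the graph and the weights are hypothetical (for instance, nodes could stand for news articles and the users who share them).

```python
def gnn_layer(features, edges, W):
    """One mean-aggregation message-passing layer on an undirected graph:
    h_v' = ReLU(W * mean{h_u : u = v or u adjacent to v})."""
    neighbours = {v: {v} for v in range(len(features))}  # include self-loops
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    out = []
    for v, nbrs in neighbours.items():
        mean = [sum(features[u][k] for u in nbrs) / len(nbrs) for k in range(len(features[v]))]
        out.append([max(0.0, sum(w * m for w, m in zip(row, mean))) for row in W])
    return out


# Hypothetical graph: node 1 (a user) is connected to nodes 0 and 2 (two news items).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 2)]
W = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, for readability
print(gnn_layer(features, edges, W))
```

Stacking several such layers lets information travel further along the graph, one hop per layer.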
4.2. Features Extraction
4.2.1. Term Frequency (TF)
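TF measures how often a term appears in a text. It is the ratio of the number of times a term appears in a document to the total number of terms in that document, as shown in the TF formula [37]:
$$\mathrm{TF}(t,d) = \frac{f_{t,d}}{\sum_{t' \in d} f_{t',d}}$$
where $f_{t,d}$ is the number of times term t occurs in document d.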
4.2.2. Term Frequency–Inverse Document Frequency (TF-IDF)
Inverse document frequency (IDF) scales down terms that appear in many documents across the corpus. The IDF of a term t is
$$\mathrm{IDF}(t) = \log\frac{N}{\mathrm{df}(t)}$$
where N is the total number of documents in the collection, and df(t) is the number of documents containing term t. The TF-IDF score of a term in a document is the product of its TF and IDF scores [37]:
$$\mathrm{TF\text{-}IDF}(t,d) = \mathrm{TF}(t,d) \times \mathrm{IDF}(t)$$
where t stands for the term, and d for the document.
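The TF, IDF, and TF-IDF formulas can be checked with a short Python sketch. It assumes the natural logarithm and a toy tokenized corpus; note that a term occurring in every document receives an IDF, and hence a TF-IDF score, of zero.

```python
import math
from collections import Counter


def tf(term, doc):
    """Occurrences of `term` in `doc` divided by the number of tokens in `doc`."""
    return Counter(doc)[term] / len(doc)


def idf(term, corpus):
    """log(N / df(t)); assumes the term occurs in at least one document."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df)


def tf_idf(term, doc, corpus):
    return tf(term, doc) * idf(term, corpus)


corpus = [  # toy tokenized documents
    ["the", "shocking", "cure", "shocking"],
    ["the", "budget", "report"],
    ["the", "shocking", "report"],
]
print(tf("shocking", corpus[0]))         # 0.5
print(tf_idf("the", corpus[0], corpus))  # 0.0
```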
4.2.3. Word2Vec Embedding
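Word2Vec is a widely used technique for learning word embeddings from text. It scans a corpus with a sliding context window and learns a vector for each word from the words that frequently occur around it, so that words appearing in similar contexts receive similar vectors. Mikolov et al. proposed two architectures for this purpose, continuous bag-of-words (CBOW) and skip-gram [38].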
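The context-window idea behind the skip-gram architecture can be illustrated by how its training pairs are generated. The Python sketch below covers only this data preparation step; the embedding training itself (for example, with negative sampling) is not shown, and the window size is an illustrative choice.

```python
def skipgram_pairs(tokens, window=2):
    """(target, context) pairs: every word within `window` positions of the target."""
    pairs = []
    for i, target in enumerate(tokens):
        start, end = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((target, tokens[j]) for j in range(start, end) if j != i)
    return pairs


print(skipgram_pairs(["fake", "news", "spreads", "fast"], window=1))
# [('fake', 'news'), ('news', 'fake'), ('news', 'spreads'),
#  ('spreads', 'news'), ('spreads', 'fast'), ('fast', 'spreads')]
```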
4.2.4. FastText
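FastText is a lightweight library for learning text representations and training text classifiers [38]. It extends the Word2Vec approach by representing each word as a bag of character n-grams, which allows it to build vectors for rare and out-of-vocabulary words.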
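The subword idea can be sketched in Python. Following the fastText approach, each word is wrapped in the boundary markers < and >, split into character n-grams (3 to 6 characters here), and represented through its n-gram vectors. This is a simplified sketch: the n-gram vectors come from a hypothetical lookup table and are summed, whereas the library itself hashes n-grams into buckets and combines learned vectors.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of the word wrapped in the boundary markers < and >."""
    wrapped = f"<{word}>"
    return [wrapped[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(wrapped) - n + 1)]


def word_vector(word, ngram_vectors, dim):
    """Sum the vectors of the word's known n-grams (hypothetical lookup table).
    Because it relies only on subwords, it also yields a vector for unseen words."""
    vec = [0.0] * dim
    for gram in char_ngrams(word):
        for k, value in enumerate(ngram_vectors.get(gram, [0.0] * dim)):
            vec[k] += value
    return vec


print(char_ngrams("news", 3, 3))  # ['<ne', 'new', 'ews', 'ws>']
```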
4.3. Performance
This review examines the identification of fake news using machine learning, deep learning, and optimization techniques. Do et al. [20] introduced a scheme for comparing the evaluation measures and datasets reported by different studies. Overall accuracy (OA, reported as accuracy A%), precision (P), recall (R), and the F-score (F1) are all ratios computed from the entries of the confusion matrix shown in Figure 3 [17,39]:
$$A = \frac{TP + TN}{TP + TN + FP + FN}, \qquad P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \cdot P \cdot R}{P + R}$$
where TP: true positive; TN: true negative; FP: false positive; and FN: false negative.
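The four ratios can be computed directly from true and predicted labels, as in the Python sketch below. Treating "fake" as the positive class is an assumption, and the sketch assumes at least one positive prediction and one positive example, since otherwise precision or recall is undefined. The labels are invented.

```python
def confusion_counts(y_true, y_pred, positive="fake"):
    """Return (TP, TN, FP, FN), treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def scores(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from the confusion-matrix entries."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


y_true = ["fake", "fake", "fake", "real", "real", "real", "real", "fake"]
y_pred = ["fake", "fake", "real", "real", "fake", "real", "fake", "fake"]
counts = confusion_counts(y_true, y_pred)
print(counts)  # (3, 2, 2, 1)
```

With these counts, accuracy is 5/8 = 0.625, precision 3/5 = 0.6, recall 3/4 = 0.75, and F1 = 2/3.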
Figure 3.
Confusion matrix.
Machine learning models may also be evaluated using the mean absolute error (MAE) and root mean square error (RMSE) to give a clearer picture of their predictive performance. For n predictions $\hat{y}_i$ with true values $y_i$,
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}.$$
MAE measures the average absolute difference between the predicted and true values, indicating the typical size of the error regardless of its direction. RMSE squares the differences before averaging, so it penalizes large errors more heavily and better reveals occasional significant errors [24].
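The following Python sketch computes both error measures and illustrates the difference between them: two sets of predictions with the same MAE can have different RMSE values when one of them contains a single large error. The values are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


y_true = [0, 0, 0, 0]
spread_errors = [1, 1, 1, 1]  # four errors of size 1
one_big_error = [0, 0, 0, 4]  # a single error of size 4
print(mae(y_true, spread_errors), rmse(y_true, spread_errors))  # 1.0 1.0
print(mae(y_true, one_big_error), rmse(y_true, one_big_error))  # 1.0 2.0
```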
Tables 1–4 show that deep learning algorithms achieve superior performance on average; however, in certain cases, some traditional machine learning algorithms outperform deep learning in detecting fake news.
Table 1.
Performance comparison based on the machine learning algorithms. The Performance column indicates the performance measure used in each study, followed by its corresponding value.
Table 2.
Performance comparison based on the deep learning algorithms. The Performance column indicates the performance measure used in each study, followed by its corresponding value.
Table 3.
Performance comparison based on combined machine learning and deep learning algorithms. The Performance column indicates the performance measure used in each study, followed by its corresponding value.
Table 4.
Performance comparison based on optimization techniques. The Performance column indicates the performance measure used in each study, followed by its corresponding value.
Tables 5–8 show that studies using datasets such as LIAR and ISOT, which contain a larger volume of news articles for both training and testing, achieved higher accuracy in fake news detection. A complete list of all studies (S1–S30) and their results is provided in Appendix A (Tables A1–A3).
Table 5.
Performance comparison based on the Twitter/X API dataset. The Performance column indicates the performance measure used in each study, followed by its corresponding value.
Table 6.
Performance comparison based on Kaggle datasets. The Performance column indicates the performance measure used in each study, followed by its corresponding value.
Table 7.
Performance comparison based on the LIAR and ISOT datasets. The Performance column indicates the performance measure used in each study, followed by its corresponding value.
Table 8.
Performance comparison based on other datasets. The Performance column indicates the performance measure used in each study, followed by its corresponding value.
4.4. Current Challenges and Future Perspectives
This study helps raise awareness about the spread of fake news. The main goal of detecting fake news is to maintain the credibility of news in general. Previous studies have used machine learning, deep learning, and optimization techniques to develop models that improve the identification of misleading news. However, each study still leaves challenges and gaps, the most notable of which are the following:
A major gap identified in several studies (S1 [1], S2 [2], S5 [5], S12 [28], S18 [14], S21 [22], S29 [24], and S30 [32]) concerns whether the results generalize to real-world news, given the limited data used for training. As these studies note, it is therefore important to expand data collection and apply the algorithms more widely in the future, since in machine learning problems obtaining sufficient data often improves performance significantly. For example, the model in S29 does not include datasets from different social media platforms and therefore lacks a large dataset [24].
Furthermore, dataset issues are not limited to size: as the gap identified in S13 shows, the proper selection of datasets and of their label categories also matters [9]. Building the model therefore requires several fine-tuning operations on different datasets during testing to obtain highly accurate results, which future studies can then rely on [9].
A further dataset issue, identified in S10, is the difficulty of handling an imbalanced dataset, in which one or more categories contain fewer examples than the others [26].
Studies S19 [15] and S20 [16] do not leverage Twitter responses to improve overall accuracy. Closing this gap and achieving high performance also requires larger datasets.
In S25 [30], current models were found unable to adapt to the dynamic trends of social media because they lack the features described in that study. Consequently, some models may provide inaccurate information and are difficult to scale to all types of fake news.
A research challenge in S24 [29] is the need to improve the model's natural language processing (NLP) capabilities by adding features that enhance accuracy. The gaps in studies [14,15,16,29] highlight the importance of expanding the feature extraction and generation process when building datasets [14]. In addition, study S3 [3] observed that its propensity score matching (PSM) model accounts only for biases arising from observed variables and does not consider unobserved variables.
A challenge in S4 [4] is that the AdaBoost algorithm used an excessively large number of iterations, causing the model to overfit the training data.
A limitation of S6 [6] is its limited choice of word embedding algorithm. This gap could be addressed with other word embedding algorithms, such as BERT (Bidirectional Encoder Representations from Transformers), which may learn better word embeddings than AMFTWE. However, BERT requires a large amount of data, and creating a sufficiently large Amharic fake news dataset with its transcripts would be a significant challenge. Similarly, S26 [34] did not sufficiently consider word embedding, even though the choice of word embedding technique significantly affects a model's accuracy in detecting fake news.
A gap in S15 is the need to extract more of the structural information in the text. Its text modeling methods also require further accuracy improvements to achieve the desired results and to be applied to other tasks [11].
A challenge in S27 [35] is that the model does not cover fake news in all media formats, such as audio or video, which a systematic and comprehensive analysis would require.
A limitation observed in S7 is that BERT is computationally expensive and slow to train, so its computational load needs to be reduced [7].
Studies S8 [8], S14 [10], S17 [13], and S22 [23] did not achieve high accuracy when classifying fake news into multiple categories, and the chosen models were not highly efficient; further training is therefore needed [8]. In addition, S14 reported a loss of accuracy in the location and pose of objects in an image when the image was not fully classified, where location and pose were classified based on the image content and the perspective from which it was captured [10].
The gap in S9 is that the model was limited to a single language and faced significant text processing difficulties during training; it should therefore be applied to languages other than Arabic [25].
A limitation of S11 is that the WELFake model does not address knowledge graph factors, such as the number of labels [27].
Most supervised learning algorithms applied to fake news detection are black-box approaches, as observed in S16 [12]; such approaches do not make it easy to interpret the key factors behind the model's predictions.
A challenge in S23 is the limited range of machine learning algorithms used, which negatively affected the model's performance. Adding more labels and leveraging transfer learning techniques are therefore necessary [33].
Given the limitations of S28 [31], a more comprehensive study is required to strengthen the model's ability to counter fake news on social media.
Regarding future directions, this review's analysis of the previous literature shows that fake news detection algorithms based on machine learning and deep learning require large datasets to obtain highly accurate results, leaving significant scope for further research in this area.
A key recommendation is to expand the feature extraction and feature generation process to capture features that may provide useful clues for predicting fake news. For example, when analyzing Twitter/X posts, incorporating replies and related features can improve fake news detection.
The combination of sufficient data, effective feature extraction and generation, and appropriate machine learning techniques is a major contributor to fake news detection. An essential future direction is the development of interpretable prediction models, which can clarify the significance of the features selected or generated in the detection process. Few studies have addressed ambiguous information, whereas many have used explicit information as the criterion for assessing fake news. One approach is to select features carefully and train on larger datasets. Table 9 summarizes the gaps identified in each study.
Table 9.
The gaps for each study.
Table 10 presents the bibliometric analysis in terms of authors' names, institutions, countries, citations, and accessibility.
Table 10.
Bibliometric analysis in terms of author.
Each study included in the review was assessed for quality using quality evaluation questions defined according to several criteria, as shown in Table 11.
Table 11.
The quality evaluation for each study.
Figure 4 shows the quality rating of each study in the literature review. It indicates that studies S14, S23, S25, and S26 used both deep learning and machine learning algorithms, had datasets sufficient to train models with the features used, and each reported an accuracy above 98%.
Figure 4.
The quality evaluation.
5. Conclusions
This research provided a review of machine learning and deep learning algorithms for detecting fake news. It also presented the datasets used in the reviewed studies, along with the features used to extract important information, and identified the gaps in each study and how they might be filled. Studies S14, S23, S25, and S26 used both deep learning and machine learning algorithms with datasets sufficient for training with the chosen features, and each achieved high accuracy. The performance and quality evaluation of each study were also presented. Finally, the review concluded with a discussion of challenges and future perspectives on fake news detection.
Supplementary Materials
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/computers14090394/s1, Table S1: PRISMA 2020 Checklist.
Author Contributions
Conceptualization, F.A.A. (Faisal A. Alshuwaier) and F.A.A. (Fawaz A. Alsulaiman); formal analysis, F.A.A. (Faisal A. Alshuwaier); methodology, F.A.A. (Faisal A. Alshuwaier) and F.A.A. (Fawaz A. Alsulaiman); project administration, F.A.A. (Fawaz A. Alsulaiman); resources, F.A.A. (Faisal A. Alshuwaier); supervision, F.A.A. (Fawaz A. Alsulaiman); writing—original draft, F.A.A. (Faisal A. Alshuwaier); writing—review and editing, F.A.A. (Faisal A. Alshuwaier) and F.A.A. (Fawaz A. Alsulaiman). All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
No new data were created or analyzed in this study. Data sharing is not applicable to this article.
Conflicts of Interest
The authors declare no conflicts of interest.
Appendix A
Table A1 lists the analyzed studies (S1–S30) together with the datasets and algorithms they report.
Table A1.
Results obtained through the analyzed articles.
| Study | Author | Year | Dataset | Algorithms/Methods |
|---|---|---|---|---|
| S1 | Aphiwongsophon and Chongstitvatana [1] | 2018 | Twitter API | |
| S2 | Krishna and Kumar [2] | 2021 | Kaggle | |
| S3 | Ni et al. [3] | 2021 | Open-source FakeNewsNet (PolitiFact and GossipCop) | |
| S4 | Singh et al. [4] | 2023 | Kaggle | |
| S5 | Jiang et al. [5] | 2021 | ISOT, KDnugget | |
| S6 | Gereme et al. [6] | 2021 | GPAC, ETH_FAKE, AMFTWE | |
| S7 | Pardamean and Pardede [7] | 2021 | Kaggle | |
| S8 | Kaliyar et al. [8] | 2019 | Multi-class | |
| S9 | Nagoudi et al. [25] | 2020 | Arabic TreeBank, AraNews | |
| S10 | Hamed et al. [26] | 2023 | Fakeddit news | |
| S11 | Verma et al. [27] | 2021 | WELFake articles | |
| S12 | Ivancova et al. [28] | 2021 | Articles from Slovak websites | |
| S13 | Albahr and Albahar [9] | 2020 | LIAR | |
| S14 | Goldani et al. [10] | 2021 | ISOT, LIAR | |
| S15 | Wang et al. [11] | 2021 | LUN (English), SLN (English), Weibo (Chinese), RCED (Chinese) | |
| S16 | Ozbay and Alatas [12] | 2019 | BuzzFeed political news, Random political news, LIAR | |
| S17 | Birunda and Devi [13] | 2021 | Kaggle | |
| S18 | Mugdha et al. [14] | 2020 | Bengali news | |
| S19 | Al-Ahmad et al. [15] | 2021 | Koirala | |
| S20 | Jardaneh et al. [16] | 2019 | Twitter API | |
| S21 | Tiwari and Jain [22] | 2024 | Articles | |
| S22 | Rampurkar and D.R [23] | 2024 | ISOT | |
| S23 | Mouratidis et al. [33] | 2025 | | |
| S24 | Subramanian et al. [29] | 2025 | | |
| S25 | Jingyuan et al. [30] | 2025 | | |
| S26 | Al-Tarawneh et al. [34] | 2024 | | |
| S27 | Shen et al. [35] | 2025 | | |
| S28 | Tan and Bakir [31] | 2025 | | |
| S29 | Murti et al. [24] | 2025 | | |
| S30 | Alsuwat, E. and Alsuwat, H. [32] | 2025 | | |
Table A2 lists the features analyzed and the languages used in each study of the literature review.
Table A2.
Results obtained through the features analyzed and languages.
| Study | Features/Attributes | Language |
|---|---|---|
| S1 | | Thai |
| S2 | | English |
| S3 | | English |
| S4 | | English |
| S5 | | English |
| S6 | | Amharic (African) |
| S7 | | English |
| S8 | | English |
| S9 | | Arabic |
| S10 | | English |
| S11 | | English |
| S12 | | Slovak |
| S13 | | English |
| S14 | | English |
| S15 | | English + Chinese |
| S16 | | English |
| S17 | | English |
| S18 | | Bengali |
| S19 | | English |
| S20 | | Arabic |
| S21 | | English |
| S22 | | English |
| S23 | | English |
| S24 | | Malayalam |
| S25 | | English |
| S26 | | |
| S27 | | English |
| S28 | | English |
| S29 | | English |
| S30 | | English |
Table A3 lists the models used in each study and their reported performance.
Table A3.
The models and performances. The Performance column indicates the performance measure used in each study, followed by its corresponding value.
| Study | Model | Performance |
|---|---|---|
| S1 | | |
| S2 | | |
| S3 | | |
| S4 | | |
| S5 | | |
| S6 | | |
| S7 | | |
| S8 | | |
| S9 | | |
| S10 | | |
| S11 | | |
| S12 | | |
| S13 | | |
| S14 | | |
| S15 | | |
| S16 | | |
| S17 | | |
| S18 | | |
| S19 | | |
| S20 | | |
| S21 | | |
| S22 | | |
| S23 | | |
| S24 | | |
| S25 | | |
| S26 | | |
| S27 | | |
| S28 | | |
| S29 | | |
| S30 | | |
References
- Aphiwongsophon, S.; Chongstitvatana, P. Detecting Fake News with Machine Learning Method. In Proceedings of the 2018 15th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), Chiang Rai, Thailand, 18–21 July 2018; pp. 528–531. [Google Scholar]
- Krishna, I.; Kumar, S. Fake News Detection using Naïve Bayes Classifier. Int. J. Creat. Res. Thought (IJCRT) 2021, 9, e757–e761. Available online: https://ijcrt.org/papers/IJCRT2106550.pdf (accessed on 26 May 2025).
- Ni, B.; Guo, Z.; Li, J.; Jiang, M. Improving Generalizability of Fake News Detection Methods using Propensity Score Matching. arXiv 2020, arXiv:2002.00838. [Google Scholar] [CrossRef]
- Singh, D.; Khan, A.H.; Meena, S. Fake News Detection Using Ensemble Learning Models. In Proceedings of the Data Analytics and Management. ICDAM 2023; Lecture Notes in Networks and Systems. Swaroop, A., Polkowski, Z., Correia, S.D., Virdee, B., Eds.; Springer: Berlin/Heidelberg, Germany, 2023; Volume 78, pp. 55–63. [Google Scholar]
- Jiang, T.; Li, J.P.; Haq, A.U.; Saboor, A.; Ali, A. A novel stacking approach for accurate detection of fake news. IEEE Access 2021, 9, 22626–22639. [Google Scholar] [CrossRef]
- Gereme, F.; Zhu, W.; Ayall, T.; Alemu, D. Combating fake news in “low-resource” languages: Amharic fake news detection accompanied by resource crafting. Information 2021, 12, 20. [Google Scholar] [CrossRef]
- Pardamean, A.; Pardede, H.F. Tuned bidirectional encoder representations from transformers for fake news detection. Indones. J. Electr. Eng. Comput. Sci. 2021, 22, 1667–1671. [Google Scholar] [CrossRef]
- Kaliyar, R.K.; Goswami, A.; Narang, P. Multiclass Fake News Detection using Ensemble Machine Learning. In Proceedings of the 2019 IEEE 9th International Conference on Advanced Computing (IACC), Tiruchirappalli, India, 13–14 December 2019; pp. 103–107. [Google Scholar] [CrossRef]
- Albahr, A.; Albahar, M. An Empirical Comparison of Fake News Detection using different Machine Learning Algorithms. Int. J. Adv. Comput. Sci. Appl. 2020, 11, 146–152. [Google Scholar] [CrossRef]
- Goldani, M.H.; Momtazi, S.; Safabakhsh, R. Detecting fake news with capsule neural networks. Appl. Soft Comput. 2021, 101, 106991. [Google Scholar] [CrossRef]
- Wang, Y.; Wang, L.; Yang, Y.; Lian, T. Sem-Seq4FD: Integrating global semantic relationship and local sequential order to enhance text representation for fake news detection. Expert Syst. Appl. 2021, 166, 114090. [Google Scholar]
- Ozbay, F.A.; Alatas, B. A novel approach for detection of fake news on social media using metaheuristic optimization algorithms. Elektron. Ir. Elektrotechnika 2019, 25, 62–67. [Google Scholar]
- Birunda, S.S.; Devi, R.K. A Novel Score-Based Multi-Source Fake News Detection using Gradient Boosting Algorithm. In Proceedings of the 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS), Coimbatore, India, 25–27 March 2021; pp. 406–414. [Google Scholar] [CrossRef]
- Mugdha, S.B.S.; Ferdous, S.M.; Fahmin, A. Evaluating machine learning algorithms for Bengali fake news detection. In Proceedings of the 23rd International Conference on Computer and Information Technology (ICCIT), Dhaka, Bangladesh, 19–21 December 2020; pp. 1–6. [Google Scholar]
- Al-Ahmad, B.; Al-Zoubi, A.M.; Abu Khurma, R.; Aljarah, I. An evolutionary fake news detection method for COVID-19 pandemic information. Symmetry 2021, 13, 1091. [Google Scholar] [CrossRef]
- Jardaneh, G.; Abdelhaq, H.; Buzz, M.; Johnson, D. Classifying Arabic tweets based on credibility using content and user features. In Proceedings of the 2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology (JEEIT), Amman, Jordan, 9–11 April 2019; pp. 596–601. [Google Scholar]
- Alshuwaier, F.; Areshey, A.; Poon, J. Applications and Enhancement of Document-Based Sentiment Analysis in Deep learning Methods: Systematic Literature Review. Intell. Syst. Appl. 2022, 15, 200090. [Google Scholar] [CrossRef]
- Battal, B.; Yıldırım, B.; Dinçaslan, Ö.F.; Cicek, G. Fake News Detection with Machine Learning Algorithms. Celal Bayar Univ. J. Sci. 2024, 20, 65–83. [Google Scholar]
- Kitchenham, B.; Brereton, O.; Budgen, D.; Turner, M.; Bailey, J.; Linkman, S. Systematic Literature Reviews in Software Engineering: A Systematic Literature Review. Inf. Softw. Technol. 2009, 51, 7–15. [Google Scholar]
- Do, H.H.; Prasad, P.; Maag, A.; Alsadoon, A. Deep learning for aspect-based sentiment analysis: A comparative review. Expert Syst. Appl. 2019, 118, 272–299. [Google Scholar] [CrossRef]
- Toyer, S.; Thiebaux, S.; Trevizan, F.; Xie, L. Asnets: Deep learning for generalised planning. J. Artif. Intell. Res. (JAIR) 2020, 68, 1–68. [Google Scholar] [CrossRef]
- Tiwari, S.; Jain, S. Fake News Detection Using Machine Learning Algorithms. In Proceedings of the KILBY 100 7th International Conference on Computing Sciences 2023 (ICCS 2023), Phagwara, India, 5 May 2024. [Google Scholar]
- Rampurkar, M.V.; Thirupurasundari, D.D. An Approach towards Fake News Detection using Machine Learning Techniques. Int. J. Intell. Syst. Appl. Eng. 2024, 12, 2868–2874. [Google Scholar]
- Murti, H.; Sulastri, S.; Santoso, D.B.; Diartono, D.A.; Nugroho, K. Design of Intelligent Model for Text-Based Fake News Detection Using K-Nearest Neighbor Method. Sinkron 2025, 9, 1–7. [Google Scholar] [CrossRef]
- Nagoudi, E.M.; Elmadany, A.; Abdul-Mageed, M.; Alhindi, T.; Cavusoglu, H. Machine Generation and Detection of Arabic Manipulated and Fake News. In Workshop on Arabic Natural Language Processing; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020. [Google Scholar]
- Hamed, S.K.; Ab Aziz, M.J.; Yaakub, M.R. Fake News Detection Model on Social Media by Leveraging Sentiment Analysis of News Content and Emotion Analysis of Users’ Comments. Sensors 2023, 23, 1748. [Google Scholar] [CrossRef]
- Verma, P.K.; Agrawal, P.; Amorim, I.; Prodan, R. WELFake: Word Embedding Over Linguistic Features for Fake News Detection. IEEE Trans. Comput. Soc. Syst. 2021, 8, 881–893. [Google Scholar] [CrossRef]
- Ivancova, K.; Sarnovsky, M.; Krešňáková, V. Fake news detection in Slovak language using deep learning techniques. In Proceedings of the SAMI 2021, IEEE 19th World Symposium on Applied Machine Intelligence and Informatics, Herl’any, Slovakia, 21–23 January 2021; pp. 000255–000260. [Google Scholar]
- Subramanian, M.; Premjith, B.; Shanmugavadivel, K.; Pandiyan, S.; Palani, B.; Chakravarthi, B. Overview of the Shared Task on Fake News Detection in Dravidian Languages-DravidianLangTech@NAACL 2025. In Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages, Acoma, The Albuquerque Convention Center, Albuquerque, NM, USA, 3 May 2025; Association for Computational Linguistics (ACL): Stroudsburg, PA, USA, 2025; pp. 759–767. [Google Scholar]
- Jingyuan, Y.; Zeqiu, X.; Tianyi, H.; Peiyang, Y. Challenges and Innovations in LLM-Powered Fake News Detection: A Synthesis of Approaches and Future Directions. Comput. Lang. 2025, 87–93. [Google Scholar] [CrossRef]
- Tan, M.; Bakır, H. Fake News Detection Using BERT and Bi-LSTM with Grid Search Hyperparameter Optimization. Bilişim Teknolojileri Dergisi 2025, 18, 11–28. [Google Scholar]
- Alsuwat, E.; Alsuwat, H. An improved multi-modal framework for fake news detection using NLP and Bi-LSTM. J. Supercomput. 2025, 81, 177. [Google Scholar] [CrossRef]
- Mouratidis, D.; Kanavos, A.; Kermanidis, K. From Misinformation to Insight: Machine Learning Strategies for Fake News Detection. Information 2025, 16, 189. [Google Scholar] [CrossRef]
- Al-Tarawneh, M.A.B.; Al-Irr, O.; Al-Maaitah, K.S.; Kanj, H.; Aly, W.H.F. Enhancing Fake News Detection with Word Embedding: A Machine Learning and Deep Learning Approach. Computers 2024, 13, 239. [Google Scholar] [CrossRef]
- Shen, L.; Long, Y.; Cai, X.; Razzak, I.; Chen, G.; Liu, K.; Jameel, S. GAMED: Knowledge Adaptive Multi-Experts Decoupling for Multimodal Fake News Detection. In Proceedings of the Eighteenth ACM International Conference on Web Search and Data Mining (WSDM ’25), Hannover, Germany, 10–14 March 2025; pp. 586–595. [Google Scholar]
- Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71. [Google Scholar] [CrossRef] [PubMed]
- Aizawa, A. An information-theoretic perspective of tf–idf measures. Inf. Process. Manag. 2003, 39, 45–65. [Google Scholar] [CrossRef]
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. arXiv 2013, arXiv:1301.3781. [Google Scholar] [CrossRef]
- H2O.ai. Confusion Matrix. Available online: https://h2o.ai/wiki/confusion-matrix/ (accessed on 26 May 2025).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).