This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

Detecting Fake News in Urdu Language Using Machine Learning, Deep Learning, and Large Language Model-Based Approaches

by Muhammad Shoaib Farooq 1, Syed Muhammad Asadullah Gilani 1, Muhammad Faraz Manzoor 1 and Momina Shaheen 2,*
1 Department of Artificial Intelligence, University of Management and Technology, Lahore 54700, Pakistan
2 Department of Computing, University of Roehampton, London SW15 5PJ, UK
* Author to whom correspondence should be addressed.
Information 2025, 16(7), 595; https://doi.org/10.3390/info16070595
Submission received: 20 May 2025 / Revised: 23 June 2025 / Accepted: 8 July 2025 / Published: 10 July 2025

Abstract

Fake news is false or misleading information that is presented as real news and spreads through both traditional and social media. It has a significant impact on social life, particularly in politics. In Pakistan, where Urdu is the main language, detecting fake news in Urdu is difficult because few effective systems exist for the task. This study addresses the problem by building a detailed pipeline and training models based on machine learning, deep learning, and large language models (LLMs). The research uses methods that examine document-level and class-level features to detect fake news in Urdu. The models tested include machine learning models such as Naïve Bayes and Support Vector Machine (SVM), and deep learning models such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, which used embedding techniques. The study also used advanced models such as BERT and GPT to improve detection. The models were first evaluated on the Bend-the-Truth dataset, where CNN achieved an F1 score of 72%, Naïve Bayes achieved 78%, and the BERT Transformer achieved the highest F1 score of 79%. To further validate the approach, the models were tested on a more diverse dataset, Ax-to-Grind, where both SVM and LSTM achieved an F1 score of 89%, while BERT outperformed them with an F1 score of 93%.
Keywords: artificial intelligence; natural language processing; deep learning; machine learning; large language model; word embedding; fake news; Urdu
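
The F1 scores reported in the abstract combine precision and recall into a single number. As a minimal sketch of how such a score is computed, the snippet below assumes a binary fake/real labelling with "fake" as the positive class. The abstract does not state whether the reported values are binary, macro, or weighted F1, so that choice, the function name, and the toy labels are illustrative assumptions rather than the authors' evaluation code.

```python
def f1_score(y_true, y_pred, positive="fake"):
    """Return the F1 score (harmonic mean of precision and recall) for one class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives for the class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    if tp == 0:
        # No correct positive prediction: F1 is defined as 0 here by convention.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels, not taken from Bend-the-Truth or Ax-to-Grind.
y_true = ["fake", "fake", "real", "real", "fake"]
y_pred = ["fake", "real", "real", "fake", "fake"]

score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2%}")
```

In this toy case there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the F1 score equals 2/3 as well.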
