This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

Fake News Detection Through LLM-Driven Text Augmentation Across Media and Languages

1 Jožef Stefan Institute, Jamova Cesta 39, 1000 Ljubljana, Slovenia
2 Faculty of Mechanical Engineering, University of Ljubljana, Aškerčeva Cesta 6, 1000 Ljubljana, Slovenia
* Author to whom correspondence should be addressed.
Mach. Learn. Knowl. Extr. 2026, 8(4), 103; https://doi.org/10.3390/make8040103
Submission received: 2 March 2026 / Revised: 8 April 2026 / Accepted: 9 April 2026 / Published: 15 April 2026

Abstract

The proliferation of fake news across social media, headlines, and news articles poses major challenges for automated detection, particularly in multilingual and cross-media settings affected by data imbalance. We propose a fake news detection framework based on LLM-driven, feature-guided text augmentation. The method generates realistic synthetic samples across languages, media types, and text granularities while preserving meaning and stylistic coherence. Experiments with classical and transformer-based models (Random Forest, Logistic Regression, BERT, XLM-R) across social media, headline, and multilingual news datasets show consistent improvements in performance. For inherently balanced datasets (e.g., social media), synthetic augmentation yields negligible but stable performance changes. In imbalanced scenarios, synthetic augmentation substantially improves minority-class recall and F1-score (e.g., fake news recall rises from 0.57 to 0.86) while preserving majority-class performance, leading to more balanced and reliable classifiers. Oversampling, by contrast, significantly degrades results because models overfit to duplicated language patterns. Overall, a hybrid semantic- and style-based model proves to be the most robust strategy, outperforming oversampling and matching or exceeding baseline performance across datasets.
Keywords: fake news detection; low-resource languages; data imbalance; synthetic data generation; prompt engineering; style-based features; semantic features
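
The abstract contrasts oversampling, which duplicates existing minority-class text, with synthetic augmentation, and reports gains in minority-class recall and F1-score. The following minimal sketch illustrates only those two generic ideas on toy data; it is not the authors' implementation. The function names `random_oversample` and `recall_and_f1` and the example texts and labels are hypothetical and chosen purely for illustration.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class samples until every class matches the largest one.

    Illustrative baseline only: no new text is generated, so the added
    samples repeat wording that is already in the data.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for cls, n in counts.items():
        pool = [t for t, y in zip(texts, labels) if y == cls]
        for _ in range(target - n):
            out_texts.append(rng.choice(pool))
            out_labels.append(cls)
    return out_texts, out_labels


def recall_and_f1(y_true, y_pred, positive):
    """Recall and F1-score for one class, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, f1


# Toy, hypothetical data: four "real" items and a single "fake" item.
texts = ["real a", "real b", "real c", "real d", "fake a"]
labels = ["real"] * 4 + ["fake"]

bal_texts, bal_labels = random_oversample(texts, labels)
# The classes are now balanced in count, but the "fake" class still
# contains only one distinct text, repeated several times.
distinct_fake = {t for t, y in zip(bal_texts, bal_labels) if y == "fake"}

# Minority-class metrics for a hypothetical set of predictions.
recall, f1 = recall_and_f1(
    ["fake", "fake", "real", "real"],
    ["fake", "real", "real", "real"],
    positive="fake",
)
```

In this toy case the oversampled set balances the label counts without adding any new wording, which is the situation the abstract associates with overfitting on duplicated language patterns. An augmentation approach would instead add distinct minority-class texts.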
Share and Cite

MDPI and ACS Style

Sittar, A.; Smiljanic, M.; Guček, A.; Grobelnik, M. Fake News Detection Through LLM-Driven Text Augmentation Across Media and Languages. Mach. Learn. Knowl. Extr. 2026, 8, 103. https://doi.org/10.3390/make8040103

AMA Style

Sittar A, Smiljanic M, Guček A, Grobelnik M. Fake News Detection Through LLM-Driven Text Augmentation Across Media and Languages. Machine Learning and Knowledge Extraction. 2026; 8(4):103. https://doi.org/10.3390/make8040103

Chicago/Turabian Style

Sittar, Abdul, Mateja Smiljanic, Alenka Guček, and Marko Grobelnik. 2026. "Fake News Detection Through LLM-Driven Text Augmentation Across Media and Languages" Machine Learning and Knowledge Extraction 8, no. 4: 103. https://doi.org/10.3390/make8040103

APA Style

Sittar, A., Smiljanic, M., Guček, A., & Grobelnik, M. (2026). Fake News Detection Through LLM-Driven Text Augmentation Across Media and Languages. Machine Learning and Knowledge Extraction, 8(4), 103. https://doi.org/10.3390/make8040103
