Pretrained Models Against Traditional Machine Learning for Detecting Fake Hadith
Abstract
1. Introduction
- To implement and compare several machine learning (ML) models for classifying Hadith as Genuine or Fake, with the aim of improving accuracy and robustness.
- To conduct an ablation study that measures how including the chain of narrators (Isnad) affects classification performance for both ML and deep learning (DL) models, clarifying the computational significance of Isnad.
- To evaluate, for the first time, the effectiveness of pretrained language models (PLMs) for Hadith authenticity classification, and to assess their applicability and performance in this specialized and culturally sensitive domain.
2. Related Work
2.1. Overview of Previous Research on Fake News Detection
2.2. Overview of PLMs
2.3. Problem Statement
3. Experiments
3.1. Experimental Setup
3.2. Data Collection
3.2.1. Genuine Hadith
3.2.2. Fake Hadith
3.3. Evaluation Metrics
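The four reported metrics, accuracy (A), precision (P), recall (R), and F1, can be illustrated with a minimal sketch. It assumes support-weighted averaging over the two classes, which is an assumption rather than something stated here; under that averaging, weighted recall always equals accuracy. The function name `weighted_metrics` and the 1/0 label encoding are hypothetical.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1 as fractions.

    Assumption: per-class scores are averaged with weights proportional to
    each class's share of the true labels.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    n = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0  # per-class precision
        r_c = tp / count                            # per-class recall
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / n
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels only (1 = Genuine, 0 = Fake is an assumed encoding).
scores = weighted_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 0])
```

Multiplying each value by 100 gives percentages comparable in form to those reported in the results tables.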
4. Results and Discussion
4.1. Key Innovations
- Systematic Investigation of Isnad: We systematically investigated the combined effect of linguistic content (Matn) and contextual features (Isnad) on Hadith authentication, an aspect largely overlooked in prior research.
- First-time Evaluation of PLMs: We evaluated pretrained language models (PLMs), namely AraBERT, CamelBERT, mBERT, and XLM-RoBERTa, on the explicit task of Hadith authenticity verification, which had not previously been done in this domain.
- Novel Ablation Study: We performed an ablation study that quantifies the impact of removing the chain of narrators (Isnad) on classification performance, showing how much each model depends on it.
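The ablation described above amounts to training and evaluating each model on two views of the same records: one with the Isnad prepended to the Matn, and one with the Matn alone. The following is a minimal sketch of how those two views could be built; the `HadithRecord` fields, the helper names, the separator, and the label encoding are all hypothetical and not taken from the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HadithRecord:
    isnad: str  # chain of narrators (hypothetical field name)
    matn: str   # main text of the Hadith (hypothetical field name)
    label: int  # assumed encoding: 1 = Genuine, 0 = Fake


def build_input(record, include_isnad=True, sep=" "):
    """Return the classifier input text, with or without the Isnad."""
    parts = [record.isnad, record.matn] if include_isnad else [record.matn]
    return sep.join(p.strip() for p in parts if p.strip())


def make_ablation_views(records):
    """Build (text, label) pairs for the full-input and Matn-only conditions."""
    with_isnad = [(build_input(r, include_isnad=True), r.label) for r in records]
    matn_only = [(build_input(r, include_isnad=False), r.label) for r in records]
    return with_isnad, matn_only


# Placeholder strings stand in for real text.
records = [HadithRecord(isnad="<isnad text>", matn="<matn text>", label=1)]
with_isnad, matn_only = make_ablation_views(records)
```

Keeping the labels and the train/test split identical across both views means any difference in scores can be attributed to the presence or absence of the Isnad.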
4.2. Performance Comparison and Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Hyperparameter | Value |
---|---|
Maximum sequence length | 512 |
Learning rate | 5 |
Optimizer | AdamW |
Loss function | Binary cross-entropy (BCE) |
Batch size | 4 |
Number of epochs | 3 |
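As a rough illustration of how the settings in this table fit together, the sketch below collects them in a small configuration object and shows the per-example BCE loss. The class and function names are hypothetical, the learning rate is passed in explicitly rather than hard-coded, and the step count assumes one optimizer update per batch with no gradient accumulation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Fine-tuning settings; defaults mirror the hyperparameter table."""
    learning_rate: float        # supplied by the caller, not fixed here
    max_seq_length: int = 512
    optimizer: str = "AdamW"
    loss: str = "BCE"
    batch_size: int = 4
    num_epochs: int = 3

    def total_steps(self, num_train_examples):
        """Optimizer updates over all epochs (assumes one update per batch)."""
        return math.ceil(num_train_examples / self.batch_size) * self.num_epochs


def bce_loss(prob_positive, label, eps=1e-12):
    """Binary cross-entropy for one example; label is 0 or 1."""
    p = min(max(prob_positive, eps), 1 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


# The learning rate below is an illustrative placeholder only.
cfg = FineTuneConfig(learning_rate=5e-5)
steps = cfg.total_steps(num_train_examples=10)
loss = bce_loss(0.5, 1)
```

With a batch size of 4, ten training examples give three batches per epoch, so three epochs correspond to nine updates in this sketch.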
Model | A (%) | P (%) | R (%) | F1 (%) |
---|---|---|---|---|
LR | 99.36 | 99.35 | 99.36 | 99.35 |
SVM | 99.65 | 99.41 | 99.41 | 99.41 |
RF | 98.89 | 98.88 | 98.89 | 98.88 |
NB | 94.50 | 94.80 | 94.50 | 94.07 |
DT | 99.59 | 99.59 | 99.59 | 99.59 |
KNN | 94.62 | 94.81 | 94.62 | 94.24 |
AraBERT | 99.94 | 99.94 | 99.94 | 99.94 |
mBERT-uncased | 99.41 | 99.43 | 99.41 | 99.42 |
CamelBERT | 99.71 | 99.71 | 99.71 | 99.71 |
XLM-RoBERTa | 99.71 | 99.71 | 99.71 | 99.71 |
Model | A (%) | P (%) | R (%) | F1 (%) |
---|---|---|---|---|
LR | 93.79 | 93.68 | 93.79 | 93.53 |
SVM | 93.48 | 93.32 | 93.48 | 93.23 |
RF | 90.16 | 89.68 | 90.16 | 89.72 |
NB | 91.91 | 91.83 | 91.91 | 91.31 |
DT | 87.77 | 88.03 | 87.77 | 87.89 |
KNN | 90.34 | 90.17 | 90.34 | 89.43 |
AraBERT | 98.24 | 98.24 | 98.24 | 98.23 |
mBERT-uncased | 98.12 | 98.16 | 98.12 | 98.08 |
CamelBERT | 98.56 | 98.56 | 98.56 | 98.54 |
XLM-RoBERTa | 98.22 | 98.26 | 98.22 | 98.19 |
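The two results tables can be compared row by row. The sketch below copies the F1 columns from both tables and computes, for each model, the difference between the first and the second table in percentage points. The dictionary and function names are hypothetical.

```python
# F1 (%) values copied from the first and second results tables.
f1_first = {
    "LR": 99.35, "SVM": 99.41, "RF": 98.88, "NB": 94.07, "DT": 99.59,
    "KNN": 94.24, "AraBERT": 99.94, "mBERT-uncased": 99.42,
    "CamelBERT": 99.71, "XLM-RoBERTa": 99.71,
}
f1_second = {
    "LR": 93.53, "SVM": 93.23, "RF": 89.72, "NB": 91.31, "DT": 87.89,
    "KNN": 89.43, "AraBERT": 98.23, "mBERT-uncased": 98.08,
    "CamelBERT": 98.54, "XLM-RoBERTa": 98.19,
}


def f1_gap(first, second):
    """Per-model F1 difference (first minus second), in percentage points."""
    return {model: round(first[model] - second[model], 2) for model in first}


gaps = f1_gap(f1_first, f1_second)
for model, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{model:15s} {gap:6.2f}")
```

In these numbers the four pretrained models differ by less than 2 points between the tables, while the traditional classifiers differ by roughly 3 to 12 points, with DT showing the largest gap.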
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Alghamdi, J.; Albukhari, A.; Al-Dala’in, T. Pretrained Models Against Traditional Machine Learning for Detecting Fake Hadith. Electronics 2025, 14, 3484. https://doi.org/10.3390/electronics14173484