Fake News Detection in Short Videos by Integrating Semantic Credibility and Multi-Granularity Contrastive Learning
Abstract
1. Introduction
- We propose a multi-dimensional credibility assessment framework for detecting hidden manipulation and supporting fact-checking. The approach combines intrinsic multimodal content (text, audio, video, user profile information, and comment threads) with external fact-checking signals, which together enable the identification of subtle manipulation tactics. The framework evaluates nine dimensions: the level of expertise reflected in the content, consistency with physical laws and commonsense knowledge, the likelihood of AI generation, the presence of editing artifacts, the alignment between title and content, the extent of emotional bias, the use of misleading cues, the reliability of the source, and the intention underlying information propagation. This fine-grained analysis quantifies news credibility along each dimension. Compared with traditional single-modality or shallow-feature methods, the proposed framework more effectively detects misleading short videos that appear credible on the surface, improving both detection accuracy and robustness.
- We propose a multi-granularity contrastive learning mechanism, conducting feature comparisons at global, modal, temporal, and spatial levels. This enhances cross-modal consistency and discrimination, making the model more sensitive to cross-modal contradictions and implicit manipulations, and thereby improving its generalization to complex short-video scenarios.
- We present an explainable detection framework for fake news in short videos. A neural-symbolic rule engine provides logical explanations for model decisions and outputs each video's score on every credibility assessment dimension together with the rule-matching status. Combining this engine with the multimodal feature fusion module makes decisions traceable, which improves system transparency and trustworthiness.
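
As a rough illustration of the multi-granularity contrastive mechanism in the second contribution, the sketch below sums an InfoNCE-style loss over the four levels named there (global, modal, temporal, spatial). InfoNCE is swapped in here as a common contrastive objective; the function names, the per-level weights, and the temperature default are assumptions made for this example and may differ from the loss actually used in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: positives[i] is the match for anchors[i],
    every other row of `positives` acts as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


def multi_granularity_loss(pairs_by_level, weights, temperature=0.1):
    """Weighted sum of per-level contrastive losses.

    pairs_by_level maps a level name (hypothetical keys: "global", "modal",
    "temporal", "spatial") to an (anchors, positives) tuple."""
    return sum(
        weights[level] * info_nce(anchors, positives, temperature)
        for level, (anchors, positives) in pairs_by_level.items()
    )


if __name__ == "__main__":
    text_features = [[1.0, 0.0], [0.0, 1.0]]
    video_features = [[0.9, 0.1], [0.2, 0.8]]
    levels = {name: (text_features, video_features)
              for name in ("global", "modal", "temporal", "spatial")}
    equal_weights = {name: 0.25 for name in levels}
    print(multi_granularity_loss(levels, equal_weights))
```

The loss shrinks when matched pairs are more similar to each other than to the other items in the batch, which is the property that would make a model sensitive to cross-modal contradictions.
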
2. Related Work
2.1. Single-Modal Fake News Detection
2.2. Multimodal Fake News Detection
2.3. Fake News Datasets
3. Preliminaries
3.1. Contrastive Learning
3.2. Multi-Dimensional Credibility Assessment
3.3. Neural-Symbolic Rules
4. Methodology
4.1. Overview
4.2. Framework Design Principles
4.3. Multimodal Feature Extraction
4.4. Intra-Modal Semantic Enhancement Mechanism
4.5. Cross-Modal Viewpoint Interaction Model
4.6. Multi-Granularity Contrastive Learning Loss Integration
4.7. Multimodal Decision Fusion
4.8. Explainability
5. Results
5.1. Experimental Setup
5.2. Research Questions
5.3. Experimental Results and Analysis
6. Discussion
7. Conclusions
8. Ethical Considerations
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest







| Dimension | Description |
|---|---|
| Professionalism | Are professional terms used? Are authoritative sources cited? Are academic terms used correctly and argumentation logically rigorous? Does it conform to domain knowledge? |
| Physical/Common-sense Consistency | Does the content violate natural laws or basic physical/social common sense? |
| AI-Generated Likelihood | Are there unnatural human motions, scene discontinuities, synthesized voice, or abnormal facial expressions indicative of AI generation? |
| Editing Artifacts | Are there unnatural jumps, audio-video desynchronization, repeated frames, or selective editing artifacts? |
| Title-Content Consistency | Compared to the content, does the title exaggerate, take things out of context, or mismatch? |
| Emotional Bias | Is fear, anger, or divisive emotion deliberately evoked through music, tone, or imagery? |
| Misleading Content | Are techniques like cherry-picking, bait-and-switch, or inverted causality used to induce misunderstanding, even if parts are true? |
| Source Reliability | Are information sources clearly cited? Is the publishing account credible? Is it from a mainstream or authoritative platform? |
| Intent to Spread | Are there obvious political or commercial motives or a clear stance, indicating manipulative agenda rather than objective reporting? |
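
One way to picture how the nine dimensions in the table could be combined into a single credibility value is sketched below. It is an illustration only: the snake_case dimension keys, the assumption that every score lies in [0, 1], the split into trust-type and risk-type dimensions, and the unweighted mean are all choices made for this example, not the aggregation described in the paper.

```python
# Dimensions where a higher score suggests MORE credibility (assumed grouping).
TRUST_DIMENSIONS = (
    "professionalism",
    "commonsense_consistency",
    "title_content_consistency",
    "source_reliability",
)
# Dimensions where a higher score suggests LESS credibility (assumed grouping).
RISK_DIMENSIONS = (
    "ai_generated_likelihood",
    "editing_artifacts",
    "emotional_bias",
    "misleading_content",
    "intent_to_spread",
)
ALL_DIMENSIONS = TRUST_DIMENSIONS + RISK_DIMENSIONS


def credibility_score(scores):
    """Average the nine dimension scores after inverting the risk-type ones.

    `scores` maps each dimension key to a float in [0, 1]."""
    missing = [d for d in ALL_DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for name in ALL_DIMENSIONS:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    trust_part = [scores[d] for d in TRUST_DIMENSIONS]
    risk_part = [1.0 - scores[d] for d in RISK_DIMENSIONS]
    return sum(trust_part + risk_part) / len(ALL_DIMENSIONS)


if __name__ == "__main__":
    example = {d: 0.8 for d in TRUST_DIMENSIONS}
    example.update({d: 0.3 for d in RISK_DIMENSIONS})
    print(round(credibility_score(example), 3))
```

Keeping the per-dimension scores in a dictionary also makes it easy to report them individually, which is what an explainable output would need.
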

| Rule | Weight |
|---|---|
| Logical Inconsistency | |
| Factual Contradiction | |
| Source Unreliability | |
| Emotional Manipulation | |
| Timeline Inconsistency | |
| Statistical Anomaly | |
| Authoritative Source | |
| Scientific Evidence | |
| Cross-Verification | |
| Expert Endorsement | |
| Official Documentation | |
| Peer Review | |
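
The rule table can be read as a set of weighted checks that adjust a model score and report which rules fired. The sketch below shows that pattern under stated assumptions: the `Rule` class, the `apply_rules` function, the feature key, the thresholds, and the signed weights are all hypothetical placeholders, since the weight values are not available in the table above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    weight: float  # signed: negative lowers credibility, positive raises it
    predicate: Callable[[Dict[str, float]], bool]


def apply_rules(
    base_score: float, features: Dict[str, float], rules: List[Rule]
) -> Tuple[float, Dict[str, bool]]:
    """Return the adjusted score (clamped to [0, 1]) and each rule's match status."""
    status = {rule.name: bool(rule.predicate(features)) for rule in rules}
    adjustment = sum(rule.weight for rule in rules if status[rule.name])
    adjusted = min(1.0, max(0.0, base_score + adjustment))
    return adjusted, status


# Two rule names taken from the table; weights and thresholds are placeholders.
EXAMPLE_RULES = [
    Rule("Source Unreliability", -0.2, lambda f: f["source_reliability"] < 0.3),
    Rule("Authoritative Source", 0.1, lambda f: f["source_reliability"] > 0.8),
]

if __name__ == "__main__":
    score, matched = apply_rules(0.5, {"source_reliability": 0.1}, EXAMPLE_RULES)
    print(score, matched)
```

Returning the match status next to the score is what makes the decision traceable: a reader can see which rules moved the score and in which direction.
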

| Model | Acc (%) | F1 (%) | Rec (%) | Pre (%) |
|---|---|---|---|---|
| Keyframes | 68.62 | 69.94 | 70.20 | 68.63 |
| Video motion | 68.62 | 69.90 | 70.11 | 68.63 |
| Audio | 67.76 | 67.74 | 67.78 | 68.27 |
| User | 78.83 | 78.40 | 80.48 | 79.70 |
| Comments | 63.61 | 63.78 | 65.82 | 65.87 |
| SCMG-FND (ours) | 89.11 | 89.53 | 88.27 | 90.73 |
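
For readers who want to reproduce the four column metrics, the sketch below computes accuracy, precision, recall, and F1 as percentages for a binary task. It assumes that label 1 marks the positive class and computes positive-class metrics; the averaging scheme used for the reported numbers is not stated in this section, so the sketch may not match it exactly.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 (all in percent) for binary labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "acc": 100 * accuracy,
        "pre": 100 * precision,
        "rec": 100 * recall,
        "f1": 100 * f1,
    }


if __name__ == "__main__":
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```
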

| Model | Acc (%) | F1 (%) | Rec (%) | Pre (%) |
|---|---|---|---|---|
| MultiEMO | 82.05 | 81.87 | 82.30 | 82.58 |
| FANVM | 82.32 | 81.97 | 83.12 | 82.84 |
| SV-FEND-SNEED | 81.67 | 81.03 | 81.65 | 82.66 |
| SV-FEND | 81.69 | 81.78 | 84.63 | 81.92 |
| OpEvFake | 87.90 | 88.01 | | |
| SCMG-FND (ours) | 89.11 | 89.53 | 88.27 | 90.73 |

| Model | Acc (%) | F1 (%) | Rec (%) | Pre (%) |
|---|---|---|---|---|
| w/o Transformer | 85.62 | 85.94 | 85.20 | 85.63 |
| w/o Capsule | 84.40 | 84.21 | 85.73 | 84.76 |
| w/o Enhance | 81.53 | 81.87 | 82.20 | 82.32 |
| w/o Prompt | 83.84 | 82.79 | 83.12 | 83.27 |
| w/o Neuro | 87.92 | 88.27 | 87.67 | 87.84 |
| w/o OpiEvo | 85.49 | 85.34 | 85.11 | 85.76 |
| w/o Mgcl | 86.80 | 86.76 | 86.15 | 86.33 |
| SCMG-FND (ours) | 89.11 | 89.53 | 88.27 | 90.73 |

| Parameter value | Acc (%) | F1 (%) |
|---|---|---|
| 0.05 | 88.20 | 88.37 |
| 0.10 | 89.11 | 89.53 |
| 0.20 | 88.42 | 88.61 |
| 0.30 | 87.95 | 88.05 |

| Model | Acc (%) | F1 (%) | Rec (%) | Pre (%) |
|---|---|---|---|---|
| w/o Global | 87.47 | 87.94 | 88.20 | 87.63 |
| w/o Modal | 88.12 | 89.10 | 88.11 | 88.27 |
| w/o Temporal | 87.66 | 88.27 | 87.75 | 88.16 |
| w/o Spatial | 88.78 | 88.19 | 87.68 | 88.43 |
| SCMG-FND (ours) | 89.11 | 89.53 | 88.27 | 90.73 |

| Case | Analysis Results |
|---|---|
| True Case | Title: Male driver violently beat female driver due to lane-change violation. Trust score: 0.88. Modality contributions: Text 55%, Video 25%, Audio 12%. Rule triggers: Title-content consistency verified through internet search. |
| Fake Case | Title: Male driver with road rage beat female driver for slow driving. Trust score: 0.86. Modality contributions: Text 48%, Video 32%, Audio 10%. Rule triggers: Title-content inconsistency and internet search verification. |

| Case | Real Event Background | Video Cues | Prediction (Conf.) | Main Misjudgment Reasons |
|---|---|---|---|---|
| Case 1 | Real incident: 16-year-old student accidentally fell at Taihe No. 2 High School, Anhui; police verified authenticity. | Blurred slideshow photos, synthetic narration, no official watermark. | true → fake (0.32) | (1) Blurring resembled intentional tampering; (2) Static slides failed to trigger temporal contrast, so cross-modal checks weakened; (3) Title keywords were not promptly cross-verified with authoritative sources. |
| Case 2 | Rumor: alleged patriotic activity in a Xi’an community, later confirmed staged and not linked to the cited outbreak. | Walls draped with flags, “epidemic recovery” overlay, patriotic slogans and cheering audio. | fake → true (0.78) | (1) Visual-text alignment biased the model toward authenticity; (2) Search retrieved few reliable sources about the scene, leaving credibility unchecked; (3) Emotional rhetoric dominated and the plausibility module failed to flag staged settings. |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yang, Y.; Shi, X.; Li, H.; Fan, B.; Xu, Y. Fake News Detection in Short Videos by Integrating Semantic Credibility and Multi-Granularity Contrastive Learning. Appl. Sci. 2025, 15, 12621. https://doi.org/10.3390/app152312621

