- Article
Video Deepfake Detection Based on Multimodality Semantic Consistency Fusion
- Fang Sun,
- Xiaoxuan Guo,
- Jing Zhang,
- and 2 other authors
Deepfake detection in video data typically relies on mining deep embedded representations across multiple modalities to obtain discriminative fused features and thereby improve detection accuracy. However, existing approaches predominantly focus on how to exploit complementary information across modalities to ensure effective fusion, while often overlooking the impact of noise and interference present in the data. For instance, issues such as small objects, blurring, and occlusions in the visual modality can disrupt the semantic consistency of the fused features. To address this, we propose a Multimodality Semantic Consistency Fusion model for video forgery detection. The model introduces a semantic consistency gating mechanism to enhance the embedding of semantically aligned information across modalities, thereby improving the discriminability of the fused representations. Furthermore, we incorporate an event-level weakly supervised loss to strengthen the global semantic discrimination of the video data. Extensive experiments on standard video forgery detection benchmarks demonstrate the effectiveness of the proposed method, achieving superior performance in both forgery event detection and localization compared to state-of-the-art approaches.
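The abstract does not give implementation details for the semantic consistency gating mechanism. As a rough illustration only, one common way to realize such a gate is to score cross-modal agreement (e.g. cosine similarity between visual and audio embeddings) and use it to down-weight fused features from inconsistent, noisy inputs; the function name, sigmoid form, and averaging fusion below are all assumptions, not the authors' method:

```python
import numpy as np

def semantic_consistency_gate(vis, aud, tau=1.0):
    """Hypothetical sketch of consistency-gated fusion.

    vis, aud : 1-D feature vectors from the visual and audio modalities.
    tau      : temperature controlling gate sharpness (assumed).

    The gate is high when the modality embeddings agree (cosine
    similarity), suppressing fused features from frames where, e.g.,
    blur or occlusion breaks cross-modal semantic consistency.
    """
    vis = vis / (np.linalg.norm(vis) + 1e-8)   # L2-normalize each modality
    aud = aud / (np.linalg.norm(aud) + 1e-8)
    sim = float(vis @ aud)                     # semantic agreement in [-1, 1]
    gate = 1.0 / (1.0 + np.exp(-sim / tau))    # sigmoid gate in (0, 1)
    fused = gate * (vis + aud) / 2.0           # down-weight inconsistent pairs
    return fused, gate
```

With this sketch, identical embeddings yield a gate near 0.73 (sigmoid of 1) while orthogonal embeddings yield 0.5, so semantically aligned pairs contribute more strongly to the fused representation.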
23 January 2026


