Search Results (98)

Search Parameters:
Keywords = deepfake video

17 pages, 10311 KB  
Article
DeepFakeX: A Comprehensive Multimodal Deepfake Dataset for Research and Analysis
by Sonia Salman, Jawwad Ahmed Shamsi and Rizwan Qureshi
Data 2026, 11(6), 141; https://doi.org/10.3390/data11060141 - 11 Jun 2026
Viewed by 587
Abstract
The expanding capabilities of deep learning-based media synthesis have intensified concerns regarding the authenticity of digital content and the reliability of forensic analysis tools. In response to these challenges, this work introduces DeepFakeX, a collection of 800 synthetically generated videos available under controlled access for research purposes. The dataset encompasses four distinct categories of AI-driven synthesis: facial identity replacement, audio track substitution, neural voice cloning, and combined audiovisual alteration. Unlike existing deepfake datasets that predominantly focus on facial synthesis, DeepFakeX covers a broader range of manipulation modalities, reflecting the diversity of synthetic media encountered in real-world settings. All deepfakes were generated using state-of-the-art, publicly available tools. Standardized post-processing procedures were applied to each video to ensure uniformity in terms of quality, duration and encoding format. DeepFakeX also emphasizes diversity in gender, age, ethnicity, and language. Video contexts span speeches, informational videos, movie clips, news broadcasts, and interviews that reflect content scenarios commonly encountered in real-world online environments. The dataset includes videos in both English and Urdu. The dataset’s quality and structural variability were assessed through visual and audio analyses using the Structural Similarity Index Measure (SSIM), Mel-Frequency Cepstral Coefficients (MFCCs), and Principal Component Analysis (PCA). The evaluation results revealed substantial variability within each manipulation category, along with clearly distinguishable patterns specific to each modality. DeepFakeX has been developed to facilitate rigorous and transparent research in deepfake detection, cross-modal forensic analysis, and AI-driven media forensics. It is hosted on Zenodo under controlled access for research use. Full article

31 pages, 30018 KB  
Article
Sensors-Driven Multimodal Deepfake Detection: A Cross-Attention Fusion Approach with Adaptive Modality Gating
by Syeda Sitara Waseem, Noman Shabbir, Syed Rizwan Hassan and KangYoon Lee
Sensors 2026, 26(12), 3695; https://doi.org/10.3390/s26123695 - 10 Jun 2026
Viewed by 212
Abstract
Deepfakes threaten sensor-based authentication systems, including biometric sensors, surveillance cameras, and IoT edge devices. Unimodal detectors remain vulnerable to modality-specific attacks. We propose a multimodal deepfake detection framework optimized for resource-constrained edge devices, featuring a novel cross-modal attention fusion mechanism with adaptive gating. The architecture combines enhanced Res2Net for audio, temporal 3D CNN with SE attention for video, and bidirectional cross-modal attention with quality-based gates. On our benchmark (5472 audio + 1842 video samples), the fusion model achieves 96.7% accuracy, 96.6% F1-score, 0.988 AUC-ROC, and 3.3% EER. Adversarial testing shows 92.3% accuracy under the Fast Gradient Sign Method (FGSM) attack. The model has a 30.3 MB footprint and runs at 20 FPS on edge hardware. Modality contribution analysis reveals adaptive weighting (72% audio for TTS forgery, 78% video for lip-synced attacks). Cross-dataset evaluation on FakeAVCeleb achieves 92.3% overall accuracy, confirming generalization. Full article
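The equal error rate (EER) reported above is the operating point at which the false-positive and false-negative rates coincide. The following is a minimal sketch of how such a value can be computed from detector scores; the function name, the threshold sweep over observed scores, and the convention that higher scores mean "more likely fake" are assumptions for illustration, not the authors' evaluation code.

```python
from typing import Sequence


def equal_error_rate(real_scores: Sequence[float], fake_scores: Sequence[float]) -> float:
    """Approximate the EER by sweeping every observed score as a decision threshold.

    Assumed convention: a sample is flagged as fake when its score >= threshold.
    """
    if not real_scores or not fake_scores:
        raise ValueError("both score lists must be non-empty")

    best_gap = float("inf")
    eer = 1.0
    for threshold in sorted(set(real_scores) | set(fake_scores)):
        # False-positive rate: real samples wrongly flagged as fake.
        fpr = sum(s >= threshold for s in real_scores) / len(real_scores)
        # False-negative rate: fake samples that slip under the threshold.
        fnr = sum(s < threshold for s in fake_scores) / len(fake_scores)
        gap = abs(fpr - fnr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap = gap
            eer = (fpr + fnr) / 2.0
    return eer
```

With finite samples the two error rates rarely match exactly, so the sketch reports their average at the closest threshold; interpolating along the ROC curve is a common alternative.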

25 pages, 1735 KB  
Article
WAFF: A Synergetic Face Forgery Video Detection Method via Weakly Supervised EfficientNet
by Zhengzhuo Pan, Bohan Chen, Longxiang Ma, Dawei Jin, Yu Zhou and Yudi Huang
J. Imaging 2026, 12(6), 240; https://doi.org/10.3390/jimaging12060240 - 29 May 2026
Viewed by 317
Abstract
Deepfake detection has become an essential task for ensuring the authenticity and security of digital media. Although recent approaches have achieved notable progress, most existing detectors still exhibit limited generalization to unseen forgery techniques and remain vulnerable to common perturbations such as compression, noise, and adversarial attacks. To overcome these issues, we propose Weakly Supervised EfficientNet Augmented Face Forgery Detector (WAFF), a novel framework that integrates fine-grained per-frame analysis with adaptive video-level fusion. Specifically, WAFF integrates WSEffiNet, an EfficientNet-B3-based backbone enhanced with a Weakly Supervised Data Augmentation Network (WS-DAN). This design generates attention maps to emphasize subtle facial forgery artifacts while encouraging complementary local–global feature learning. At the video level, WAFF incorporates a multi-strategy fusion scheme that combines fake-frame counting, confidence averaging, and attention-guided voting to strike a balance between sensitivity and stability. Extensive experiments on FaceForensics++, Celeb-DF v2, DFD, DFDC, and FFIW-10K demonstrate that WAFF can achieve state-of-the-art performance under both high- and low-quality compression, while also enhancing cross-dataset generalization. Full article
(This article belongs to the Special Issue AI-Driven Image and Video Understanding)
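The video-level fusion named in the WAFF abstract (fake-frame counting, confidence averaging, and attention-guided voting) can be illustrated with a minimal sketch. The abstract does not say how the three strategies are combined, so the unweighted mean, the 0.5 frame threshold, and every function and parameter name below are assumptions for illustration only.

```python
from typing import Sequence


def fuse_video_score(
    frame_probs: Sequence[float],
    attention: Sequence[float],
    frame_threshold: float = 0.5,
) -> float:
    """Combine per-frame fake probabilities into one video-level score in [0, 1]."""
    if not frame_probs or len(frame_probs) != len(attention):
        raise ValueError("frame_probs and attention must be non-empty and of equal length")
    total_attention = sum(attention)
    if total_attention <= 0:
        raise ValueError("attention weights must sum to a positive value")

    n = len(frame_probs)
    # Strategy 1: fraction of frames individually classified as fake.
    count_score = sum(p > frame_threshold for p in frame_probs) / n
    # Strategy 2: plain average of the per-frame confidences.
    mean_score = sum(frame_probs) / n
    # Strategy 3: attention-weighted vote over the per-frame decisions.
    vote_score = (
        sum(a for p, a in zip(frame_probs, attention) if p > frame_threshold)
        / total_attention
    )
    # Assumed combination rule: unweighted mean of the three strategies.
    return (count_score + mean_score + vote_score) / 3.0
```

Counting is sensitive to a few strongly forged frames, averaging is stable under per-frame noise, and the weighted vote lets frames with salient artifacts dominate, which is one way to read the abstract's "balance between sensitivity and stability".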

20 pages, 2215 KB  
Article
Frame Selection Strategies for Video Deepfake Detection: Benchmarking Accuracy and Runtime Trade-Offs
by Artūras Serackis, Mindaugas Jankauskas, Anastasija Grubinskienė and Vytautas Abromavičius
Appl. Sci. 2026, 16(11), 5364; https://doi.org/10.3390/app16115364 - 27 May 2026
Viewed by 348
Abstract
This study evaluates frame selection during inference as an independent factor in video deepfake detection while keeping the downstream detectors fixed. We compare twelve frame selection strategies, ranging from simple temporal and quality baselines to landmark-aware policies, using four validated pretrained detectors: Self-Blended Images (SBIs), Frequency-Enhanced Self-Blended Images (FSBIs), Generative Convolutional Vision Transformer (GenConViT), and GenD. The primary experiment is a complete factorial benchmark with 300 videos and five frame budgets (2, 4, 8, 16, and 32 selected frames), which provides the reference results at 32 frames. To address sample size limitations, an additional validation experiment uses a deduplicated split of 1180 Celeb-DF++ and FaceForensics++ videos, with complete results for 2, 4, and 8 selected frames and a reported subset for 16 selected frames. In the complete 300-video benchmark, 32 frames achieved the strongest average AUC, while 8 and 16 frames recovered most of the attainable performance with lower runtime. The best single validated configuration was GenD with Shot-aware sampling at 32 frames, yielding an AUC of 0.9607 and a balanced accuracy of 0.9133. The study therefore does not claim that smaller budgets universally outperform 32 frames; instead, it quantifies the trade-off between accuracy and runtime and shows that frame selection remains a meaningful design variable under constrained inference budgets. Full article
(This article belongs to the Special Issue Integration of AI in Signal and Image Processing)
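As a point of reference for the "simple temporal" baselines mentioned in the abstract, uniform sampling under a fixed frame budget can be sketched as follows. The abstract does not describe the twelve strategies in detail, so the function name and the segment-midpoint rule are assumptions rather than the study's implementation.

```python
from typing import List


def uniform_frame_indices(num_frames: int, budget: int) -> List[int]:
    """Pick up to `budget` frame indices spread evenly over `num_frames` frames."""
    if num_frames <= 0 or budget <= 0:
        raise ValueError("num_frames and budget must be positive")
    # A video shorter than the budget simply contributes all of its frames.
    budget = min(budget, num_frames)
    # Split the video into `budget` equal segments and take each segment's midpoint.
    return [int((i + 0.5) * num_frames / budget) for i in range(budget)]
```

For example, `uniform_frame_indices(300, 4)` selects one frame from the middle of each quarter of a 300-frame clip; the budgets 2, 4, 8, 16, and 32 from the benchmark would be passed as `budget`.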

19 pages, 1771 KB  
Article
Dynamic Spatial-Temporal Inconsistency Learning for General Deepfake Detection in Visual Understanding
by Jicheng Li, Guangjun Liao, Yufei Wang, Xing Liu and Beibei Liu
Mathematics 2026, 14(10), 1612; https://doi.org/10.3390/math14101612 - 9 May 2026
Viewed by 376
Abstract
Generalizable deepfake detection is essential for trustworthy visual understanding in real-world computer vision applications. This paper presents a dynamic spatial-temporal inconsistency learning algorithm designed to achieve high generalization in deepfake video detection. Current video-based detection approaches tend to either isolate spatial artifacts or merely exploit coarse temporal inconsistencies when identifying deepfake videos, which impedes the acquisition of fine-grained spatial-temporal clues and consequently limits their generalization capability. To this end, we propose the dynamic spatial-temporal network (DST-Net), a deep architecture that systematically mines comprehensive inconsistency cues through three synergistic modules. The short-term temporal modality extraction (STME) module captures temporal dynamics from adjacent frames. The short-term spatial-temporal inconsistency extraction (SSTIE) module with pixel-wise supervision learns semantically meaningful inconsistency features resistant to perturbations. The dynamic-term spatial-temporal inconsistency extraction (DSTIE) module adaptively aggregates these features across timescales, building robust multi-scale representations. This design ensures that the learned representations capture intrinsic forgery patterns, enhancing generalization and robustness. Comprehensive evaluations conducted on five widely adopted benchmark datasets reveal that our method surpasses nine representative competitors, with superior robustness to common image perturbations. This work advances the application of deep learning algorithms to reliable visual understanding in multimedia forensics. Full article

18 pages, 741 KB  
Review
A Review of Tools and Technologies to Combat Deepfakes
by Dmitry Erokhin and Nadejda Komendantova
Information 2026, 17(4), 347; https://doi.org/10.3390/info17040347 - 3 Apr 2026
Cited by 1 | Viewed by 2755
Abstract
Deepfakes and adjacent synthetic-media capabilities have become a systemic challenge for information integrity, security, and digital trust. Countermeasures now span passive detection methods that infer manipulation from content traces, active provenance systems that cryptographically bind metadata to media, and watermarking approaches that embed detectable signals into content or generative processes. This review presents a rigorous synthesis of tools and technologies to combat deepfakes across modalities (image, video, audio, and selected multimodal settings), drawing primarily from the peer-reviewed literature, standardized benchmarks, and official technical specifications and reports. The review analyzes detection methods; provenance and authentication technologies, with emphasis on cryptographic manifests and threat models; watermarking and content provenance, including diffusion-era watermarking and industrial deployments; adversarial robustness and attacker adaptation; datasets and benchmarks; evaluation metrics across tasks; and deployment and scalability constraints. A dedicated section addresses legal, ethical, and policy issues, focusing on emerging transparency obligations and platform governance. The review finds that no single countermeasure is sufficient in realistic adversarial settings. The strongest practical approach is a layered defense that combines provenance, watermarking, content-based detection, and human oversight. The study concludes with limitations of the current evidence base and prioritized research directions to improve generalization, interoperability, and trustworthy user experiences. Full article
(This article belongs to the Special Issue Surveys in Information Systems and Applications)

10 pages, 375 KB  
Entry
Deepfakes
by Sean William Maher
Encyclopedia 2026, 6(4), 80; https://doi.org/10.3390/encyclopedia6040080 - 2 Apr 2026
Viewed by 87371
Definition
Deepfakes have emerged as one of the most significant developments in contemporary computational media, representing a sophisticated convergence of machine learning, computer vision, and audiovisual synthesis. Enabled primarily by deep neural networks such as generative adversarial networks (GANs) and transformer-based architectures, Deepfakes are realistic video fabrications, produced through sound and image alteration and substitution, that synthesise human likeness, speech, and behaviours. Deepfakes function simultaneously as creative tools, political instruments, security risks, and epistemic disruptors. They have generated widespread scholarly, regulatory, and public concern by contributing to the reshaping of visual communication and posing significant challenges to established norms of authenticity. This entry defines Deepfakes, outlines their technological foundations, synthesises insights from current research, and assesses implications for media industries, journalism, documentary, disinformation, governance, and digital culture. Full article
(This article belongs to the Section Social Sciences)

18 pages, 1850 KB  
Article
AT-HSTNet: An Efficient Hierarchical Action-Transformer Framework for Deepfake Video Detection
by Sameena Javaid, Marwa Chendeb El Rai, Abeer Elkhouly, Obada Al-Khatib, Aicha Beya Far and May El Barachi
Appl. Sci. 2026, 16(7), 3450; https://doi.org/10.3390/app16073450 - 2 Apr 2026
Viewed by 480
Abstract
The rapid advancement of deepfake generation technologies presents significant challenges to the verification of digital video authenticity. The time-dependent artifacts that these technologies leave behind are difficult to detect using conventional frame-based detection approaches. This paper introduces AT-HSTNet, an Action-Transformer-based Hierarchical Spatiotemporal Network designed for robust and computationally efficient deepfake video detection. The proposed framework adopts a multi-stage hierarchical architecture in which frame-level visual features are extracted using an EfficientNet-B0 backbone, short- and medium-range temporal patterns are modeled through Bidirectional Long Short-Term Memory (BiLSTM) networks, and long-range temporal dependencies are captured using an action-aware Transformer operating on temporally aggregated representations. Unlike conventional video transformers that apply self-attention directly to raw frame-level features, the proposed action-aware attention mechanism reduces redundant computation and improves stability in temporal reasoning. Extensive experiments on the balanced FFIW-10K dataset demonstrate that AT-HSTNet achieves an accuracy of 98.7%, with 98.0% precision, 96.0% recall, and a 96.9% F1-score, outperforming representative CNN–BiLSTM and CNN–Transformer baseline architectures. In addition, AT-HSTNet is highly efficient, requiring only 0.45 GFLOPs and achieving an inference speed of approximately 30 FPS on consumer-grade GPU hardware. The results indicate that hierarchical temporal modeling is more effective for deepfake video detection when combined with action-aware attention. Full article

22 pages, 3493 KB  
Article
Deepfake Detection Using Multimodal CLIP-Based SigLIP-2 Vision Transformers
by Joe Soundararajan and Dong Xu
AI 2026, 7(3), 115; https://doi.org/10.3390/ai7030115 - 19 Mar 2026
Viewed by 3570
Abstract
Background: Deepfakes pose a growing threat to the integrity of visual media, motivating detectors that remain reliable as forgeries become increasingly realistic. Methods: We propose a deepfake detection framework built on CLIP-derived SigLIP-2 vision transformers and a multi-task design that jointly performs (i) classification and (ii) manipulated-region localization when pixel-level supervision is available. We evaluated the approach on three public benchmarks of increasing complexity—HiDF, SID_Set (SIDA), and CiFake—using each dataset’s official partitions where provided (SID_Set uses the predefined train/validation split) and a standardized preprocessing and training pipeline across experiments. Results: On HiDF, our model achieved strong performance on both video and image tracks (AUC up to 0.931 on video and 0.968 on images), yielding large gains relative to previously reported HiDF baselines under their published settings. On SID_Set, the model achieved 99.1% three-class accuracy (real/synthetic/tampered) and produced accurate localization masks for many tampered regions, while we explicitly documented the split protocol and leakage checks to support the validity of the evaluation. On CiFake, the model exceeded 95% accuracy and attained an AUC of 0.986. Conclusions: Overall, the results indicate that SigLIP-2 representations combined with multi-task training can deliver high detection accuracy and interpretable localization on challenging, realistic forgeries, while highlighting the importance of clearly stated evaluation protocols for fair comparison. Full article
(This article belongs to the Section AI Systems: Theory and Applications)

18 pages, 5241 KB  
Viewpoint
The Generative AI Paradox: GenAI and the Erosion of Trust, the Corrosion of Information Verification, and the Demise of Truth
by Emilio Ferrara
Future Internet 2026, 18(2), 73; https://doi.org/10.3390/fi18020073 - 1 Feb 2026
Cited by 2 | Viewed by 4146
Abstract
Generative AI (GenAI) now produces text, images, audio, and video that can be perceptually convincing at scale and at negligible marginal cost. While public debate often frames the associated harms as “deepfakes” or incremental extensions of misinformation and fraud, this view misses a broader socio-technical shift: GenAI enables synthetic realities—coherent, interactive, and potentially personalized information environments in which content, identity, and social interaction are jointly manufactured and mutually reinforcing. We argue that the most consequential risk is not merely the production of isolated synthetic artifacts, but the progressive erosion of shared epistemic ground and institutional verification practices as synthetic content, synthetic identity, and synthetic interaction become easy to generate and hard to audit. This paper (i) formalizes synthetic reality as a layered stack (content, identity, interaction, institutions), (ii) expands a taxonomy of GenAI harms spanning personal, economic, informational, and socio-technical risks, (iii) articulates the qualitative shifts introduced by GenAI (cost collapse, throughput, customization, micro-segmentation, provenance gaps, and trust erosion), and (iv) synthesizes recent risk realizations (2023–2025) into a compact case bank illustrating how these mechanisms manifest in fraud, elections, harassment, documentation, and supply-chain compromise. We then propose a mitigation stack that treats provenance infrastructure, platform governance, institutional workflow redesign, and public resilience as complementary rather than substitutable, and outline a research agenda focused on measuring epistemic security. We conclude with the Generative AI Paradox: as synthetic media becomes ubiquitous, societies may rationally discount digital evidence altogether, raising the cost of truth for everyday life and for democratic and economic institutions. Full article

25 pages, 2900 KB  
Article
SDEQ-Net: A Deepfake Video Anomaly Detection Method Integrating Stochastic Differential Equations and Hermitian-Symmetric Quantum Representations
by Ruixing Zhang, Bin Li and Degang Xu
Symmetry 2026, 18(2), 259; https://doi.org/10.3390/sym18020259 - 30 Jan 2026
Viewed by 689
Abstract
With the rapid advancement of deepfake generation technologies, forged videos have become increasingly realistic in visual quality and temporal consistency, posing serious threats to multimedia security. Existing detection methods often struggle to effectively model temporal dynamics and capture subtle inter-frame anomalies. To address these challenges, we propose a Stochastic Differential Equation and Quantum Uncertainty Network (SDEQ-Net), a novel deepfake video anomaly detection framework that integrates continuous-time stochastic modeling with quantum uncertainty mechanisms. First, a Continuous Time Neural Stochastic Differential Filtering Module (CNSDFM) is introduced to characterize the continuous evolution of latent inter-frame states using neural stochastic differential equations, enabling robust temporal filtering and uncertainty estimation. Second, a Quantum Uncertainty Aware Fusion Module (QUAFM) incorporates Hermitian-symmetric density matrix representations and von Neumann entropy to enhance feature fusion under uncertainty, leveraging the mathematical symmetry properties of quantum state representations for principled uncertainty quantification. Third, a Fractional Order Temporal Anomaly Detection Module (FOTADM) is proposed to generate fine-grained temporal anomaly scores based on fractional-order residuals, which are used as dynamic weights to guide attention toward anomalous frames. Extensive experiments on three benchmark datasets, including FaceForensics++, Celeb-DF, and DFDC, demonstrate the effectiveness of the proposed method. SDEQ-Net achieves AUC scores of 99.81% on FF++ (c23) and 97.91% on FF++ (c40). In cross-dataset evaluations, it obtains 89.55% AUC on Celeb-DF and 86.21% AUC on DFDC, consistently outperforming existing state-of-the-art methods in both detection accuracy and generalization capability. Full article
(This article belongs to the Section Computer)
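The von Neumann entropy mentioned in the abstract has a standard definition: for a density matrix with eigenvalues λ_i (non-negative, summing to 1), S = -Σ λ_i ln λ_i. The sketch below computes it from a given spectrum; the function name and the validation tolerance are assumptions, and the abstract does not state how QUAFM actually uses the quantity.

```python
import math
from typing import Sequence


def von_neumann_entropy(eigenvalues: Sequence[float], tol: float = 1e-9) -> float:
    """Entropy (in nats) of a density matrix, given its eigenvalue spectrum."""
    if any(lam < -tol for lam in eigenvalues):
        raise ValueError("density-matrix eigenvalues must be non-negative")
    if abs(sum(eigenvalues) - 1.0) > tol:
        raise ValueError("density-matrix eigenvalues must sum to 1 (unit trace)")
    # Terms with eigenvalue 0 contribute nothing, since x * ln(x) -> 0 as x -> 0.
    return -sum(lam * math.log(lam) for lam in eigenvalues if lam > tol)
```

A pure state (spectrum `[1.0, 0.0]`) has zero entropy, while the maximally mixed two-level state (`[0.5, 0.5]`) reaches ln 2, so the value can serve as a scalar measure of uncertainty. For a full Hermitian matrix, the spectrum would first be obtained with an eigenvalue routine such as NumPy's `numpy.linalg.eigvalsh`.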

19 pages, 1747 KB  
Article
Video Deepfake Detection Based on Multimodality Semantic Consistency Fusion
by Fang Sun, Xiaoxuan Guo, Tong Zhang, Yang Liu and Jing Zhang
Future Internet 2026, 18(2), 67; https://doi.org/10.3390/fi18020067 - 23 Jan 2026
Cited by 1 | Viewed by 1256
Abstract
Deepfake detection in video data typically relies on mining deep embedded representations across multiple modalities to obtain discriminative fused features and thereby improve detection accuracy. However, existing approaches predominantly focus on how to exploit complementary information across modalities to ensure effective fusion, while often overlooking the impact of noise and interference present in the data. For instance, issues such as small objects, blurring, and occlusions in the visual modality can disrupt the semantic consistency of the fused features. To address this, we propose a Multimodality Semantic Consistency Fusion model for video forgery detection. The model introduces a semantic consistency gating mechanism to enhance the embedding of semantically aligned information across modalities, thereby improving the discriminability of the fused representations. Furthermore, we incorporate an event-level weakly supervised loss to strengthen the global semantic discrimination of the video data. Extensive experiments on standard video forgery detection benchmarks demonstrate the effectiveness of the proposed method, achieving superior performance in both forgery event detection and localization compared to state-of-the-art approaches. Full article

21 pages, 1055 KB  
Article
FAIR-VID: A Multimodal Pre-Processing Pipeline for Student Application Analysis
by Algirdas Laukaitis, Diana Kalibatienė, Dovilė Jodenytė, Kęstutis Normantas, Julius Jancevičius, Mindaugas Jankauskas and Artūras Serackis
Appl. Sci. 2025, 15(24), 13127; https://doi.org/10.3390/app152413127 - 13 Dec 2025
Cited by 1 | Viewed by 1452
Abstract
The shift toward remote and automated admission processes in higher education introduces new challenges, including evaluator subjectivity and risks of applicant fraud. The FAIR-VID project addresses these issues by developing an artificial intelligence system that integrates multimodal data fusion with semi-supervised deep learning to assess applicant video interviews, submitted documents, and form data. This paper presents the project’s data preprocessing pipeline, designed to fuse heterogeneous modalities and to support seamless interaction between AI agents and human decision-makers throughout the admission workflow. The proposed process is intentionally general, making it applicable not only to international university admissions but also to broader human resource management and hiring contexts. Emphasis is placed on the need for robust and transparent AI adoption in admission and recruitment, supported by open-source modules and models at every stage of interaction between applicants and institutions. As a proof of concept, we provide open-source solutions for the analysis of video interviews, images, and documents enriched with semantic descriptions generated by large multimodal and complementary AI models. The paper details the multi-phase implementation of this pipeline to create structured, semantically rich datasets suitable for training advanced deep learning systems for comprehensive applicant assessment and fraud detection. Full article
(This article belongs to the Section Computing and Artificial Intelligence)

18 pages, 1001 KB  
Article
Artificial Intelligence Physician Avatars for Patient Education: A Pilot Study
by Syed Ali Haider, Srinivasagam Prabha, Cesar Abraham Gomez-Cabello, Ariana Genovese, Bernardo Collaco, Nadia Wood, Mark A. Lifson, Sanjay Bagaria, Cui Tao and Antonio Jorge Forte
J. Clin. Med. 2025, 14(23), 8595; https://doi.org/10.3390/jcm14238595 - 4 Dec 2025
Cited by 3 | Viewed by 3125
Abstract
Background: Generative AI and synthetic media have enabled realistic human Embodied Conversational Agents (ECAs) or avatars. A subset of this technology replicates faces and voices to create realistic likenesses. When combined with avatars, these methods enable the creation of “digital twins” of physicians, offering patients scalable, 24/7 clinical communication outside the immediate clinical environment. This study evaluated surgical patient perceptions of an AI-generated surgeon avatar for postoperative education. Methods: We conducted a pilot feasibility study with 30 plastic surgery patients at Mayo Clinic, USA (July–August 2025). A bespoke interactive surgeon avatar was developed in Python using the HeyGen IV model to reproduce the surgeon’s likeness. Patients interacted with the avatar through natural voice queries, which were mapped to predetermined, pre-recorded video responses covering ten common postoperative topics. Patient perceptions were assessed using validated scales of usability, engagement, trust, eeriness, and realism, supplemented by qualitative feedback. Results: The avatar system reliably answered 297 of 300 patient queries (99%). Usability was excellent (mean System Usability Scale score = 87.7 ± 11.5) and engagement high (mean 4.27 ± 0.23). Trust was the highest-rated domain, with all participants (100%) finding the avatar trustworthy and its information believable. Eeriness was minimal (mean = 1.57 ± 0.48), and 96.7% found the avatar visually pleasing. Most participants (86.6%) recognized the avatar as their surgeon, although many still identified it as artificial; voice resemblance was less convincing (70%). Interestingly, participants with prior exposure to deepfakes demonstrated consistently higher acceptance, rating usability, trust, and engagement 5–10% higher than those without prior exposure. Qualitative feedback highlighted clarity, efficiency, and convenience, while noting limitations in realism and conversational scope. 
Conclusions: The AI-generated physician avatar achieved high patient acceptance without triggering uncanny valley effects. Transparency about the synthetic nature of the technology enhanced, rather than diminished, trust. Familiarity with the physician and institutional credibility likely played a key role in the high trust scores observed. When implemented transparently and with appropriate safeguards, synthetic physician avatars may offer a scalable solution for postoperative education while preserving trust in clinical relationships. Full article

26 pages, 2820 KB  
Article
Forensic Analysis of Manipulated Images and Videos
by Sergio A. Falcón-López, Llanos Tobarra, Antonio Robles-Gómez and Rafael Pastor-Vargas
Appl. Sci. 2025, 15(23), 12664; https://doi.org/10.3390/app152312664 - 29 Nov 2025
Cited by 1 | Viewed by 2483
Abstract
The transition from Industry 4.0 to Industry 5.0 emphasizes the need for ethical, transparent, and human-centric artificial intelligence systems. In this context, ensuring the authenticity of digital information has become crucial for maintaining societal trust. This study addresses the challenge of detecting manipulated multimedia content, including synthetic images, videos, and audio generated by artificial intelligence, commonly known as Deepfakes. We analyze and compare general-purpose and Deepfake-specific detection methods to assess their effectiveness in real-world scenarios. This work introduces a refined reference model that integrates both application-oriented and methodological criteria, grouping tools into Blind Forensic, Handcrafted Machine Learning, Deep Learning-based methods, and Toolkits. This structured taxonomy provides a clearer comparative framework than existing works, which typically classify detectors using only one of these dimensions. To ensure reproducible evaluation, all experiments were performed using the SAFL dataset, which consolidates real and synthetic multimedia content generated with publicly available tools under a unified protocol. Among the tested tools, Forensically achieved the highest accuracy in image forgery detection (86.9%), while Autopsy reached 69.5% among Deepfake-specific image detectors. In video analysis, Forensically obtained 98.6% accuracy, whereas Deepware Scanner achieved 91.2% as the most effective Deepfake-focused tool. These results highlight that general-purpose methods remain robust for images, while specialized detectors perform competitively in videos. Overall, the proposed model and dataset establish a consistent foundation for advancing hybrid detection strategies aligned with the ethical and transparent AI principles envisioned in Industry 5.0. Full article
(This article belongs to the Special Issue AI from Industry 4.0 to Industry 5.0: Engineering for Social Change)