Search Results (93)

Search Parameters:
Keywords = video editing

17 pages, 685 KiB  
Article
Food Safety and Waste Management in TV Cooking Shows: A Comparative Study of Turkey and the UK
by Kemal Enes, Gülbanu Kaptan and Edgar Meyer
Foods 2025, 14(15), 2591; https://doi.org/10.3390/foods14152591 - 24 Jul 2025
Viewed by 411
Abstract
This study examines food safety and waste behaviours depicted in the televised cooking competition MasterChef, a globally franchised series that showcases diverse culinary traditions and influences viewers’ practices. The research focuses on the MasterChef editions aired in Turkey and the United Kingdom, two countries with distinctly different social and cultural contexts. Video content analysis, based on predefined criteria, was employed to assess observable behaviours related to food safety and waste. Additionally, content analysis of episode transcripts identified verbal references to these themes. Principal Component Analysis was employed to categorise patterns in the observed behaviours. The findings revealed frequent lapses in food safety, with personal hygiene breaches more commonly observed in MasterChef UK, while cross-contamination issues were more prevalent in MasterChef Turkey. In both versions, the use of disposable materials and the discarding of edible food parts emerged as the most common waste-related practices. These behaviours appeared to be shaped by the cultural and culinary norms specific to each country. The study highlights the importance of cooking shows in promoting improved food safety and waste management practices. It recommends involving relevant experts during production and clearly communicating food safety and sustainability messages to increase viewer awareness and encourage positive behaviour change. Full article
(This article belongs to the Special Issue Food Policy, Strategy and Safety in the Middle East)
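As a rough illustration of the pattern-categorisation step described in the abstract above, the sketch below applies Principal Component Analysis to a small, made-up matrix of behaviour counts per episode (rows are episodes, columns are coded behaviour categories). The variable names and numbers are hypothetical and are not taken from the study.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical behaviour-count matrix: one row per coded episode, one column per
# observed behaviour category (e.g. hygiene breach, cross-contamination,
# disposable-material use, edible-part discard).
behaviour_counts = np.array([
    [4, 1, 6, 3],
    [2, 5, 4, 2],
    [5, 0, 7, 4],
    [1, 6, 3, 1],
    [3, 2, 5, 3],
])

# Standardise columns, then project onto two principal components to look for
# groupings of behaviour patterns across episodes.
X = (behaviour_counts - behaviour_counts.mean(axis=0)) / behaviour_counts.std(axis=0)
pca = PCA(n_components=2)
scores = pca.fit_transform(X)

print("explained variance ratio:", pca.explained_variance_ratio_)
print("episode scores on PC1/PC2:\n", scores)
```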

35 pages, 2865 KiB  
Article
eyeNotate: Interactive Annotation of Mobile Eye Tracking Data Based on Few-Shot Image Classification
by Michael Barz, Omair Shahzad Bhatti, Hasan Md Tusfiqur Alam, Duy Minh Ho Nguyen, Kristin Altmeyer, Sarah Malone and Daniel Sonntag
J. Eye Mov. Res. 2025, 18(4), 27; https://doi.org/10.3390/jemr18040027 - 7 Jul 2025
Viewed by 485
Abstract
Mobile eye tracking is an important tool in psychology and human-centered interaction design for understanding how people process visual scenes and user interfaces. However, analyzing recordings from head-mounted eye trackers, which typically include an egocentric video of the scene and a gaze signal, is a time-consuming and largely manual process. To address this challenge, we develop eyeNotate, a web-based annotation tool that enables semi-automatic data annotation and learns to improve from corrective user feedback. Users can manually map fixation events to areas of interest (AOIs) in a video-editing-style interface (baseline version). Further, our tool can generate fixation-to-AOI mapping suggestions based on a few-shot image classification model (IML-support version). We conduct an expert study with trained annotators (n = 3) to compare the baseline and IML-support versions. We measure the perceived usability, annotations’ validity and reliability, and efficiency during a data annotation task. We asked our participants to re-annotate data from a single individual using an existing dataset (n = 48). Further, we conducted a semi-structured interview to understand how participants used the provided IML features and assessed our design decisions. In a post hoc experiment, we investigate the performance of three image classification models in annotating data of the remaining 47 individuals. Full article
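The fixation-to-AOI suggestion step lends itself to a simple illustration: a nearest-prototype, few-shot classifier that assigns a fixation's image crop to the AOI whose few labelled example embeddings are closest. The embedding function, AOI names, and data below are placeholders for illustration only, not the eyeNotate implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(crop: np.ndarray) -> np.ndarray:
    """Placeholder embedding: a real system would use a pretrained vision
    backbone; here we just flatten and L2-normalise the crop."""
    v = crop.astype(np.float32).ravel()
    return v / (np.linalg.norm(v) + 1e-8)

# A few labelled example crops per AOI (the few-shot support set).
support = {
    "whiteboard": [rng.random((8, 8)) for _ in range(3)],
    "laptop":     [rng.random((8, 8)) for _ in range(3)],
}

# Class prototypes = mean embedding of each AOI's support crops.
prototypes = {
    aoi: np.mean([embed(c) for c in crops], axis=0)
    for aoi, crops in support.items()
}

def suggest_aoi(fixation_crop: np.ndarray) -> str:
    """Suggest the AOI whose prototype is most similar (dot product)."""
    q = embed(fixation_crop)
    return max(prototypes, key=lambda aoi: float(q @ prototypes[aoi]))

print(suggest_aoi(rng.random((8, 8))))
```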

21 pages, 2816 KiB  
Article
AutoStageMix: Fully Automated Stage Cross-Editing System Utilizing Facial Features
by Minjun Oh, Howon Jang and Daeho Lee
Appl. Sci. 2025, 15(13), 7613; https://doi.org/10.3390/app15137613 - 7 Jul 2025
Viewed by 313
Abstract
StageMix is a video compilation of multiple stage performances of the same song, edited seamlessly together using appropriate editing points. However, generating a StageMix requires specialized editing techniques and is a considerably time-consuming process. To address this challenge, we introduce AutoStageMix, an automated StageMix generation system designed to perform all processes automatically. The system is structured into five principal stages: preprocessing, feature extraction, transition point identification, editing path determination, and StageMix generation. The initial stage of the process involves audio analysis to synchronize the sequences across all input videos, followed by frame extraction. After that, the facial features are extracted from each video frame. Next, transition points are identified, which form the basis for face-based transitions, inter-stage cuts, and intra-stage cuts. Subsequently, a cost function is defined to facilitate the creation of cross-edited sequences. The optimal editing path is computed using Dijkstra’s algorithm to minimize the total cost of editing. Finally, the StageMix is generated by applying appropriate editing effects tailored to each transition type, aiming to maximize visual appeal. Experimental results suggest that our method generally achieves lower NME scores than existing StageMix generation approaches across multiple test songs. In a user study with 21 participants, AutoStageMix achieved viewer satisfaction comparable to that of professionally edited StageMixes, with no statistically significant difference between the two. AutoStageMix enables users to produce StageMixes effortlessly and efficiently by eliminating the need for manual editing. Full article
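The "optimal editing path via Dijkstra's algorithm" step can be pictured with a tiny graph in which nodes are (stage, time-segment) states and weighted edges are candidate stays or cuts. The graph and costs below are invented for illustration and are not the paper's cost function; they only show how a shortest-path search yields a minimum-cost editing path.

```python
import heapq

# Hypothetical graph: keys are (stage, segment) nodes, values are lists of
# (next_node, cost). Costs stand in for the paper's transition/cut cost function.
graph = {
    ("A", 0): [(("A", 1), 1.0), (("B", 1), 0.4)],   # stay on stage A or cut to B
    ("B", 0): [(("B", 1), 1.0), (("A", 1), 0.7)],
    ("A", 1): [(("A", 2), 1.0), (("B", 2), 0.3)],
    ("B", 1): [(("B", 2), 1.0), (("A", 2), 0.6)],
    ("A", 2): [], ("B", 2): [],
}

def dijkstra(graph, start, goals):
    """Return the minimum-cost path from start to any goal node."""
    heap = [(0.0, start, [start])]
    seen = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node in seen:
            continue
        seen.add(node)
        if node in goals:
            return cost, path
        for nxt, w in graph[node]:
            if nxt not in seen:
                heapq.heappush(heap, (cost + w, nxt, path + [nxt]))
    return float("inf"), []

cost, path = dijkstra(graph, ("A", 0), {("A", 2), ("B", 2)})
print(f"editing path {path} with total cost {cost:.1f}")
```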

42 pages, 3407 KiB  
Review
Interframe Forgery Video Detection: Datasets, Methods, Challenges, and Search Directions
by Mona M. Ali, Neveen I. Ghali, Hanaa M. Hamza, Khalid M. Hosny, Eleni Vrochidou and George A. Papakostas
Electronics 2025, 14(13), 2680; https://doi.org/10.3390/electronics14132680 - 2 Jul 2025
Viewed by 565
Abstract
The authenticity of digital video content has become a critical issue in multimedia security due to the significant rise in video editing and manipulation in recent years. The detection of interframe forgeries is essential for identifying manipulations, including frame duplication, deletion, and insertion. These are popular techniques for altering video footage without leaving visible evidence. This study provides a detailed review of various methods for detecting video forgery, with a primary focus on interframe forgery techniques. The article evaluates approaches by assessing key performance measures. According to a statistical overview, machine learning has traditionally been used more frequently, but deep learning techniques are gaining popularity due to their outstanding performance in handling complex tasks and robust post-processing capabilities. The study highlights the significance of interframe forgery detection for forensic analysis, surveillance, and content moderation, as demonstrated through both evaluation and case studies. It aims to summarize existing studies and identify limitations to guide future research towards more robust, scalable, and generalizable methods, such as the development of benchmark datasets that reflect real-world video manipulation diversity. This emphasizes the necessity of creating large public datasets of manipulated high-resolution videos to support reliable integrity evaluations in dealing with widespread media manipulation. Full article
(This article belongs to the Section Computer Science & Engineering)
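As a toy illustration of the kind of cue interframe methods exploit, the sketch below flags candidate frame duplications by looking for near-identical consecutive frames (normalised correlation close to 1) and candidate deletions by unusually large inter-frame jumps. The thresholds and synthetic data are invented; the detectors surveyed in the review are far more sophisticated.

```python
import numpy as np

def interframe_anomalies(frames, dup_thresh=0.999, jump_factor=3.0):
    """Rough baseline: flag suspiciously similar or dissimilar consecutive
    frame pairs using normalised correlation of pixel intensities."""
    corrs = []
    for a, b in zip(frames[:-1], frames[1:]):
        a = (a - a.mean()) / (a.std() + 1e-8)
        b = (b - b.mean()) / (b.std() + 1e-8)
        corrs.append(float((a * b).mean()))
    corrs = np.array(corrs)
    dup_candidates = np.where(corrs > dup_thresh)[0]
    deletion_candidates = np.where(corrs < np.median(corrs) - jump_factor * corrs.std())[0]
    return dup_candidates, deletion_candidates

# Synthetic example: 10 random frames with frame 4 duplicated at position 5.
rng = np.random.default_rng(1)
frames = [rng.random((32, 32)) for _ in range(10)]
frames.insert(5, frames[4].copy())  # simulated duplication forgery
print(interframe_anomalies(frames))
```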

20 pages, 2080 KiB  
Article
Spatially-Precise Video Editing with Reference Imitation: A Region-Aware Cross-Modal Framework
by Qibin Cui, Kai Yang, Lei Yang and Ying Yang
Appl. Sci. 2025, 15(8), 4349; https://doi.org/10.3390/app15084349 - 15 Apr 2025
Viewed by 659
Abstract
Video content is widely applied in the multimedia field. Meanwhile, artificial intelligence generative content (AIGC) is continuously developing. As a result, video editing techniques in the field of computer vision are constantly evolving. However, at present, there are still numerous challenges in achieving fine-grained local editing of videos and maintaining temporal consistency. In this paper, we propose a novel framework for local video editing that integrates image-based local editing techniques with CoDeF (Content Deformation Field). Unlike existing methods, our approach explicitly models temporal deformation, ensuring spatial precision in local edits while maintaining high temporal consistency. Our framework first utilizes CoDeF to generate canonical images that aggregate static video content while preserving semantic integrity. A dual diffusion-based local editing module is then applied to modify these canonical images with user-specified editing regions and reference images. Finally, the learned temporal deformation field propagates the edits back to the entire video, achieving coherent and seamless video editing. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches in both editing quality and user satisfaction, offering a significant improvement in local precision and temporal stability. Full article
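The propagation step, warping an edited canonical image back into each frame through a per-frame deformation field, can be sketched with torch.nn.functional.grid_sample. The identity-plus-offset deformation below is a stand-in for CoDeF's learned field, not the paper's model, and the tensors are random placeholders.

```python
import torch
import torch.nn.functional as F

# Edited canonical image: 1 x 3 x H x W tensor (random here for illustration).
H, W = 64, 64
canonical_edit = torch.rand(1, 3, H, W)

# Base sampling grid in [-1, 1] (identity warp), shape 1 x H x W x 2.
ys, xs = torch.meshgrid(
    torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij"
)
identity_grid = torch.stack((xs, ys), dim=-1).unsqueeze(0)

frames = []
for t in range(4):
    # Stand-in for a learned per-frame deformation field: a small,
    # time-varying offset added to the identity grid.
    offset = 0.02 * torch.sin(torch.tensor(t / 4.0)) * torch.ones_like(identity_grid)
    grid = identity_grid + offset
    # Warp the edited canonical image into frame t's coordinates.
    frames.append(F.grid_sample(canonical_edit, grid, align_corners=True))

print(len(frames), frames[0].shape)
```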

23 pages, 1716 KiB  
Article
Knowledge Translator: Cross-Lingual Course Video Text Style Transform via Imposed Sequential Attention Networks
by Jingyi Zhang, Bocheng Zhao, Wenxing Zhang and Qiguang Miao
Electronics 2025, 14(6), 1213; https://doi.org/10.3390/electronics14061213 - 19 Mar 2025
Cited by 1 | Viewed by 481
Abstract
Massive Open Online Courses (MOOCs) have been growing rapidly in the past few years. Video content is an important carrier for cultural exchange and education popularization, and needs to be translated into multiple language versions to meet the needs of learners from different countries and regions. However, current MOOC video processing solutions rely excessively on manual operations, resulting in low efficiency and difficulty in meeting the urgent requirement for large-scale content translation. Key technical challenges include the accurate localization of embedded text in complex video frames, maintaining style consistency across languages, and preserving text readability and visual quality during translation. Existing methods often struggle with handling diverse text styles, background interference, and language-specific typographic variations. In view of these challenges, this paper proposes an innovative cross-language style transfer algorithm that integrates advanced techniques such as attention mechanisms, latent space mapping, and adaptive instance normalization. Specifically, the algorithm first utilizes attention mechanisms to accurately locate the position of each piece of text in the image, ensuring that subsequent processing can be targeted at specific text areas. Subsequently, by extracting features corresponding to this location information, the algorithm can ensure accurate matching of styles and text features, achieving an effective style transfer. Additionally, this paper introduces a new color loss function aimed at ensuring the consistency of text colors before and after style transfer, further enhancing the visual quality of edited images. Through extensive experimental verification, the algorithm proposed in this paper demonstrated excellent performance on both synthetic and real-world datasets. Compared with existing methods, the algorithm exhibited significant advantages in multiple image evaluation metrics, achieving a 2% improvement in the FID metric and a 20% improvement in the IS metric on relevant datasets compared to SOTA methods. Additionally, both the proposed method and the introduced dataset, PTTEXT, will be made publicly available via the project URL upon acceptance of the paper. Full article
(This article belongs to the Special Issue Applications of Computational Intelligence, 3rd Edition)
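The color loss idea, penalising differences between the colour statistics of the text region before and after style transfer, might look roughly like the PyTorch sketch below. The paper's exact formulation is not given here, so treat this mean-colour variant as an assumption-laden approximation with hypothetical tensor names.

```python
import torch

def color_consistency_loss(src, stylised, text_mask):
    """Mean-colour discrepancy inside the text region.

    src, stylised: B x 3 x H x W images in [0, 1].
    text_mask:     B x 1 x H x W soft mask locating the text pixels.
    """
    eps = 1e-6
    weights = text_mask / (text_mask.sum(dim=(2, 3), keepdim=True) + eps)
    src_mean = (src * weights).sum(dim=(2, 3))        # B x 3 mean colour
    sty_mean = (stylised * weights).sum(dim=(2, 3))   # B x 3 mean colour
    return torch.nn.functional.l1_loss(src_mean, sty_mean)

# Toy usage with random tensors.
src = torch.rand(2, 3, 32, 32)
stylised = torch.rand(2, 3, 32, 32)
mask = (torch.rand(2, 1, 32, 32) > 0.8).float()
print(color_consistency_loss(src, stylised, mask))
```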

24 pages, 5798 KiB  
Article
Research on Personalized Course Resource Recommendation Method Based on GEMRec
by Enliang Wang and Zhixin Sun
Appl. Sci. 2025, 15(3), 1075; https://doi.org/10.3390/app15031075 - 22 Jan 2025
Cited by 1 | Viewed by 1266
Abstract
With the rapid growth of online educational resources, existing personalized course recommendation systems face challenges in multimodal feature integration and limited recommendation interpretability when dealing with complex and diverse instructional content. This paper proposes a graph-enhanced multimodal recommendation method (GEMRec), which effectively integrates text, video, and audio features through a graph attention network and differentiable pooling. Innovatively, GEMRec introduces graph edit distance into the recommendation system to measure the structural similarity between a learner’s knowledge state and course content at the knowledge graph level. Additionally, it combines SHAP (SHapley Additive exPlanations) value computation with large language models to generate reliable and personalized recommendation explanations. Experiments on the MOOCCubeX dataset demonstrate that the GEMRec model exhibits strong convergence and generalization during training. Compared with existing methods, GEMRec achieves 0.267, 0.265, and 0.297 on the Precision@10, Recall@10, and NDCG@10 metrics, respectively, significantly outperforming traditional collaborative filtering and other deep learning models. These results validate the effectiveness of multimodal feature integration and knowledge graph enhancement in improving recommendation performance. Full article
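For reference, the top-10 ranking metrics reported above can be computed as in the short sketch below. The recommended and relevant item lists are made up, and the implementation is a generic single-user version, not GEMRec's evaluation code.

```python
import math

def precision_recall_ndcg_at_k(recommended, relevant, k=10):
    """Generic Precision@k, Recall@k and NDCG@k for one user."""
    top_k = recommended[:k]
    hits = [1 if item in relevant else 0 for item in top_k]
    precision = sum(hits) / k
    recall = sum(hits) / max(len(relevant), 1)
    dcg = sum(h / math.log2(i + 2) for i, h in enumerate(hits))
    ideal_hits = min(len(relevant), k)
    idcg = sum(1 / math.log2(i + 2) for i in range(ideal_hits))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return precision, recall, ndcg

# Hypothetical example: 10 recommended course resources, some of which are
# in the learner's relevant set.
recommended = ["c3", "c7", "c1", "c9", "c5", "c2", "c8", "c4", "c6", "c0"]
relevant = {"c7", "c9", "c4", "c11"}
print(precision_recall_ndcg_at_k(recommended, relevant, k=10))
```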

24 pages, 3388 KiB  
Article
An Audiovisual Correlation Matching Method Based on Fine-Grained Emotion and Feature Fusion
by Zhibin Su, Yiming Feng, Jinyu Liu, Jing Peng, Wei Jiang and Jingyu Liu
Sensors 2024, 24(17), 5681; https://doi.org/10.3390/s24175681 - 31 Aug 2024
Cited by 1 | Viewed by 1923
Abstract
Most existing intelligent editing tools for music and video rely on cross-modal matching based on affective consistency or the similarity of feature representations. However, these methods are not fully applicable to complex audiovisual matching scenarios, resulting in low matching accuracy and suboptimal audience perceptual effects due to ambiguous matching rules and associated factors. To address these limitations, this paper focuses on both the similarity and integration of affective distribution for artistic audiovisual works that pair film and television video with music. Based on the rich emotional perception elements, we propose a hybrid matching model based on feature canonical correlation analysis (CCA) and fine-grained affective similarity. The model refines KCCA fusion features by analyzing both matched and unmatched music–video pairs. Subsequently, the model employs XGBoost to predict relevance and to compute similarity by considering fine-grained affective semantic distance as well as affective factor distance. Ultimately, the matching prediction values are obtained through weight allocation. Experimental results on a self-built dataset demonstrate that the proposed affective matching model balances feature parameters and affective semantic cognitions, yielding relatively high prediction accuracy and a better subjective experience of audiovisual association. This paper is crucial for exploring the affective association mechanisms of audiovisual objects from a sensory perspective and improving related intelligent tools, thereby offering a novel technical approach to retrieval and matching in music–video editing. Full article
(This article belongs to the Special Issue Recent Advances in Smart Mobile Sensing Technology)
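The final "matching prediction via weight allocation" step can be pictured as a simple convex combination of a learned relevance score and an affective-similarity score. The weights, scores, and track names below are invented; in the actual model these quantities come from KCCA fusion features, XGBoost predictions, and fine-grained affective distances.

```python
def match_score(relevance_pred, affective_similarity, w_relevance=0.6):
    """Weighted fusion of two normalised scores in [0, 1].

    relevance_pred:       e.g. a classifier's probability that the pair matches.
    affective_similarity: e.g. 1 - normalised affective semantic/factor distance.
    """
    w_affect = 1.0 - w_relevance
    return w_relevance * relevance_pred + w_affect * affective_similarity

# Hypothetical candidate music tracks for one video clip: (relevance, affect similarity).
candidates = {"track_a": (0.82, 0.64), "track_b": (0.55, 0.91), "track_c": (0.40, 0.38)}
ranked = sorted(candidates, key=lambda t: match_score(*candidates[t]), reverse=True)
print(ranked)
```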

14 pages, 2930 KiB  
Article
Editable Co-Speech Gesture Synthesis Enhanced with Individual Representative Gestures
by Yihua Bao, Dongdong Weng and Nan Gao
Electronics 2024, 13(16), 3315; https://doi.org/10.3390/electronics13163315 - 21 Aug 2024
Viewed by 1676
Abstract
Co-speech gesture synthesis is a challenging task due to the complexity and uncertainty between gestures and speech. Gestures that accompany speech (i.e., Co-Speech Gesture) are an essential part of natural and efficient embodied human communication, as they work in tandem with speech to convey information more effectively. Although data-driven approaches have improved gesture synthesis, existing deep learning-based methods use deterministic modeling, which could lead to averaging out predicted gestures. Additionally, these methods lack control over gesture generation, such as user editing of generated results. In this paper, we propose an editable gesture synthesis method based on a learned pose script, which disentangles gestures into individual representative and rhythmic gestures to produce high-quality, diverse and realistic poses. Specifically, we first detect the time of occurrence of gestures in video sequences and transform them into pose scripts. Regression models are then built to predict the pose scripts. Next, learned pose scripts are used for gesture synthesis, while rhythmic gestures are modeled using a variational auto-encoder and a one-dimensional convolutional network. Moreover, we introduce a large-scale Chinese co-speech gesture synthesis dataset with multimodal annotations for training and evaluation, which will be publicly available to facilitate future research. The proposed method allows for the re-editing of generated results by changing the pose scripts for applications such as interactive digital humans. The experimental results show that this method generates higher-quality, more diverse, and more realistic gestures than other existing methods. Full article
(This article belongs to the Section Artificial Intelligence)

18 pages, 37868 KiB  
Article
3D Character Animation and Asset Generation Using Deep Learning
by Vlad-Constantin Lungu-Stan and Irina Georgiana Mocanu
Appl. Sci. 2024, 14(16), 7234; https://doi.org/10.3390/app14167234 - 16 Aug 2024
Viewed by 3047
Abstract
Besides video content, a significant part of entertainment is represented by computer games and animations such as cartoons. Creating such entertainment is based on two fundamental steps: asset generation and character animation. The main problem stems from their repetitive nature and the amounts of concentration and skill they require. The latest advances in deep learning and generative techniques have provided a set of powerful tools which can be used to alleviate these problems by facilitating the tasks of artists and engineers and providing a better workflow. In this work we explore practical solutions for facilitating and hastening the creative process: character animation and asset generation. In character animation, the task is to either move the joints of a subject manually or to correct the noisy data coming out of motion capture. For the animation case, we propose two decoder-only transformer-based solutions, inspired by the current success of GPT. The first, AnimGPT, targets the original animation workflow by predicting the next pose of an animation based on a set of previous poses, while the second, DenoiseAnimGPT, tackles the motion capture case by predicting the clean current pose based on all previous poses and the current noisy pose. Both models obtained good performances on the CMU motion dataset, with the generated results being imperceptible to the untrained human eye. Quantitative evaluation was performed using the mean absolute error between the ground truth and predicted motion vectors. For AnimGPT and DenoiseAnimGPT, the errors were 0.345 and 0.2513, respectively (for 50 frames), indicating better performance compared with other solutions. For asset generation, diffusion models were used. Using image generation and outpainting, we created a method that generates good backgrounds by combining the idea of text-conditioned generation and text-conditioned image editing. A time-coherent algorithm that creates animated effects for characters was also obtained. Full article
(This article belongs to the Special Issue Applications of Artificial Intelligence and Machine Learning in Games)
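The reported quantitative evaluation is a plain mean absolute error over motion vectors, which the small sketch below reproduces on random stand-in data. The tensor shapes (50 frames, 24 joints, 3 channels) are hypothetical and not the CMU experiment itself.

```python
import numpy as np

def motion_mae(ground_truth, predicted):
    """Mean absolute error between ground-truth and predicted motion vectors."""
    return float(np.mean(np.abs(ground_truth - predicted)))

# Hypothetical: 50 frames x 24 joints x 3 rotation channels.
rng = np.random.default_rng(42)
gt = rng.normal(size=(50, 24, 3))
pred = gt + rng.normal(scale=0.3, size=gt.shape)  # imperfect prediction
print(f"MAE over 50 frames: {motion_mae(gt, pred):.4f}")
```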

14 pages, 2115 KiB  
Article
IV-SSIM—The Structural Similarity Metric for Immersive Video
by Adrian Dziembowski, Weronika Nowak and Jakub Stankowski
Appl. Sci. 2024, 14(16), 7090; https://doi.org/10.3390/app14167090 - 13 Aug 2024
Cited by 4 | Viewed by 2264
Abstract
In this paper, we present a new objective quality metric designed for immersive video applications—IV-SSIM. The proposed IV-SSIM metric is an evolution of our previous work—IV-PSNR (immersive video peak signal-to-noise ratio)—which became a commonly used metric in research and ISO/IEC MPEG standardization activities on immersive video. IV-SSIM combines the advantages of IV-PSNR and metrics based on the structural similarity of images, being able to properly mimic the subjective quality perception of immersive video with its characteristic distortions induced by the reprojection of pixels between multiple views. The effectiveness of IV-SSIM was compared with 16 state-of-the-art quality metrics (including other metrics designed for immersive video). Tested metrics were evaluated in an immersive video coding scenario and against a commonly used image quality database—TID2013—showing their performance in both immersive and typical, non-immersive use cases. As presented, the proposed IV-SSIM metric clearly outperforms other metrics in immersive video applications, while also being highly competitive for 2D image quality assessment. The authors of this paper have provided a publicly accessible, efficient implementation of the proposed IV-SSIM metric, which is used by ISO/IEC MPEG video coding experts in the development of the forthcoming second edition of the MPEG immersive video (MIV) coding standard. Full article
(This article belongs to the Section Electrical, Electronics and Communications Engineering)
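The IV-SSIM formulation itself is not reproduced here, but the structural-similarity family it builds on is readily available. The sketch below computes plain SSIM between two synthetic views with scikit-image purely as a point of reference; the immersive-video-specific extensions are in the paper and the authors' public implementation.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

# Two synthetic grayscale "views": a reference and a distorted version
# (standing in for a reprojected immersive-video view).
rng = np.random.default_rng(7)
reference = rng.random((128, 128))
distorted = np.clip(reference + rng.normal(scale=0.05, size=reference.shape), 0, 1)

score = ssim(reference, distorted, data_range=1.0)
print(f"plain SSIM (not IV-SSIM): {score:.4f}")
```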

89 pages, 16650 KiB  
Review
Video and Audio Deepfake Datasets and Open Issues in Deepfake Technology: Being Ahead of the Curve
by Zahid Akhtar, Thanvi Lahari Pendyala and Virinchi Sai Athmakuri
Forensic Sci. 2024, 4(3), 289-377; https://doi.org/10.3390/forensicsci4030021 - 13 Jul 2024
Cited by 12 | Viewed by 9380
Abstract
The revolutionary breakthroughs in Machine Learning (ML) and Artificial Intelligence (AI) are extensively being harnessed across a diverse range of domains, e.g., forensic science, healthcare, virtual assistants, cybersecurity, and robotics. On the flip side, they can also be exploited for negative purposes, like producing authentic-looking fake news that propagates misinformation and diminishes public trust. Deepfakes pertain to audio or visual multimedia content that has been artificially synthesized or digitally modified through the application of deep neural networks. Deepfakes can be employed for benign purposes (e.g., refinement of face pictures for optimal magazine cover quality) or malicious intentions (e.g., superimposing faces onto explicit images/videos to harm individuals, or producing fake audio recordings of public figures making inflammatory statements to damage their reputation). With mobile devices and user-friendly audio and visual editing tools at hand, even non-experts can effortlessly craft intricate deepfakes and digitally altered audio and facial features. This presents challenges to contemporary computer forensic tools and human examiners, including common individuals and digital forensic investigators. There is a perpetual battle between attackers armed with deepfake generators and defenders utilizing deepfake detectors. This paper first comprehensively reviews existing image, video, and audio deepfake databases with the aim of propelling next-generation deepfake detectors for enhanced accuracy, generalization, robustness, and explainability. Then, the paper delves deeply into open challenges and potential avenues for research in the audio and video deepfake generation and mitigation field. The aspiration for this article is to complement prior studies and assist newcomers, researchers, engineers, and practitioners in gaining a deeper understanding and in the development of innovative deepfake technologies. Full article
(This article belongs to the Special Issue Human and Technical Drivers of Cybercrime)

19 pages, 4243 KiB  
Article
Refining Localized Attention Features with Multi-Scale Relationships for Enhanced Deepfake Detection in Spatial-Frequency Domain
by Yuan Gao, Yu Zhang, Ping Zeng and Yingjie Ma
Electronics 2024, 13(9), 1749; https://doi.org/10.3390/electronics13091749 - 1 May 2024
Cited by 4 | Viewed by 2346
Abstract
The rapid advancement of deep learning and large-scale AI models has simplified the creation and manipulation of deepfake technologies, which generate, edit, and replace faces in images and videos. This gradual ease of use has turned the malicious application of forged faces into a significant threat, complicating the task of deepfake detection. Despite the notable success of current deepfake detection methods, which predominantly employ data-driven CNN classification models, these methods exhibit limited generalization capabilities and insufficient robustness against novel data unseen during training. To tackle these challenges, this paper introduces a novel detection framework, ReLAF-Net. This framework employs a restricted self-attention mechanism that applies self-attention to deep CNN features flexibly, facilitating the learning of local relationships and inter-regional dependencies at both fine-grained and global levels. This attention mechanism has a modular design that can be seamlessly integrated into CNN networks to improve overall detection performance. Additionally, we propose an adaptive local frequency feature extraction algorithm that decomposes RGB images into fine-grained frequency domains in a data-driven manner, effectively isolating fake indicators in the frequency space. Moreover, an attention-based channel fusion strategy is developed to amalgamate RGB and frequency information, achieving a comprehensive facial representation. Tested on the high-quality version of the FaceForensics++ dataset, our method attained a detection accuracy of 97.92%, outperforming other approaches. Cross-dataset validation on Celeb-DF, DFDC, and DFD confirms the robust generalizability, offering a new solution for detecting high-quality deepfake videos. Full article
(This article belongs to the Special Issue Deep Learning in Image Processing and Computer Vision)
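The frequency-domain branch, decomposing a face crop into frequency bands to isolate forgery cues, can be illustrated with a 2D DCT split into low/mid/high bands. The band boundaries below are arbitrary and hand-picked; this is not ReLAF-Net's adaptive, data-driven decomposition, only a sketch of the general idea.

```python
import numpy as np
from scipy.fft import dctn, idctn

def band_split(image, low=8, mid=32):
    """Split a grayscale image into low/mid/high frequency components
    using a 2D DCT with hard, hand-picked band boundaries."""
    coeffs = dctn(image, norm="ortho")
    h, w = coeffs.shape
    yy, xx = np.mgrid[0:h, 0:w]
    radius = yy + xx  # crude measure of spatial frequency
    bands = {}
    for name, mask in {
        "low": radius < low,
        "mid": (radius >= low) & (radius < mid),
        "high": radius >= mid,
    }.items():
        bands[name] = idctn(coeffs * mask, norm="ortho")
    return bands

rng = np.random.default_rng(3)
face_crop = rng.random((64, 64))  # stand-in for a real face crop
bands = band_split(face_crop)
print({k: float(np.abs(v).mean()) for k, v in bands.items()})
```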

18 pages, 2214 KiB  
Article
SupCon-MPL-DP: Supervised Contrastive Learning with Meta Pseudo Labels for Deepfake Image Detection
by Kyeong-Hwan Moon, Soo-Yol Ok and Suk-Hwan Lee
Appl. Sci. 2024, 14(8), 3249; https://doi.org/10.3390/app14083249 - 12 Apr 2024
Viewed by 1796
Abstract
Recently, there has been considerable research on deepfake detection. However, most existing methods face challenges in adapting to the advancements in new generative models within unknown domains. In addition, the emergence of new generative models capable of producing and editing high-quality images, such as diffusion, consistency, and LCM, poses a challenge for traditional deepfake training models. These advancements highlight the need for adapting and evolving existing deepfake detection techniques to effectively counter the threats posed by sophisticated image manipulation technologies. In this paper, our objective is to detect deepfake videos in unknown domains using unlabeled data. Specifically, our proposed approach employs Meta Pseudo Labels (MPL) with supervised contrastive learning, so-called SupCon-MPL, allowing the model to be trained on unlabeled images. MPL involves the simultaneous training of both a teacher model and a student model, where the teacher model generates pseudo labels utilized to train the student model. This method aims to enhance the adaptability and robustness of deepfake detection systems against emerging unknown domains. Supervised contrastive learning utilizes labels to compare samples within similar classes more intensively, while encouraging greater distinction from samples in dissimilar classes. This facilitates the learning of features in a diverse set of deepfake images by the model, consequently contributing to the performance of deepfake detection in unknown domains. When utilizing the ResNet50 model as the backbone, SupCon-MPL exhibited an improvement of 1.58% in accuracy compared with traditional MPL in known domain detection. Moreover, in the same generation of unknown domain detection, there was a 1.32% accuracy enhancement, while in the detection of post-generation unknown domains, there was an 8.74% increase in accuracy. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
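A minimal version of a supervised contrastive loss of the kind used here (following the general SupCon formulation, not the paper's exact training code) is sketched below in PyTorch with random embeddings and binary real/fake labels.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic SupCon-style loss: pull same-label embeddings together,
    push different-label embeddings apart.

    features: B x D embeddings (L2-normalised inside the function).
    labels:   B integer class labels (e.g. real = 0, fake = 1).
    """
    features = F.normalize(features, dim=1)
    logits = features @ features.T / temperature              # B x B similarities
    mask_same = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    mask_self = torch.eye(len(labels))
    mask_pos = mask_same - mask_self                           # positives, excluding self
    logits = logits - 1e9 * mask_self                          # exclude self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_count = mask_pos.sum(dim=1).clamp(min=1)
    return -(mask_pos * log_prob).sum(dim=1).div(pos_count).mean()

# Toy batch: 8 embeddings with binary labels.
feats = torch.randn(8, 128)
labels = torch.tensor([0, 0, 1, 1, 0, 1, 1, 0])
print(supervised_contrastive_loss(feats, labels))
```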

11 pages, 3130 KiB  
Article
A System for Mixed-Reality Holographic Overlays of Real-Time Rendered 3D-Reconstructed Imaging Using a Video Pass-through Head-Mounted Display—A Pathway to Future Navigation in Chest Wall Surgery
by Jan Arensmeyer, Benedetta Bedetti, Philipp Schnorr, Jens Buermann, Donatas Zalepugas, Joachim Schmidt and Philipp Feodorovici
J. Clin. Med. 2024, 13(7), 2080; https://doi.org/10.3390/jcm13072080 - 3 Apr 2024
Cited by 4 | Viewed by 2457
Abstract
Background: Three-dimensional reconstructions of state-of-the-art high-resolution imaging are progressively being used more for preprocedural assessment in thoracic surgery. It is a promising tool that aims to improve patient-specific treatment planning, for example, for minimally invasive or robotic-assisted lung resections. Increasingly available mixed-reality hardware based on video pass-through technology enables the projection of image data as a hologram onto the patient. We describe a novel method of real-time 3D surgical planning in a mixed-reality setting by presenting three representative cases utilizing volume rendering. Materials: A mixed-reality system was set up using a high-performance workstation running a video pass-through-based head-mounted display. Image data from computed tomography were imported and volume-rendered in real-time to be customized through live editing. The image-based hologram was projected onto the patient, highlighting the regions of interest. Results: Three oncological cases were selected to explore the potential of the mixed-reality system. Two of them presented large tumor masses in the thoracic cavity, while a third case presented an unclear lesion of the chest wall. We aligned real-time rendered 3D holographic image data onto the patient, allowing us to investigate the relationship between anatomical structures and their respective body position. Conclusions: The exploration of holographic overlay has proven to be promising in improving preprocedural surgical planning, particularly for complex oncological tasks in the thoracic surgical field. Further studies on outcome-related surgical planning and navigation should therefore be conducted. Ongoing technological progress of extended reality hardware and intelligent software features will most likely enhance applicability and the range of use in surgical fields within the near future. Full article
(This article belongs to the Special Issue Latest Advances in Thoracic Surgery)