Article
Peer-Review Record

Neural Network Ensemble Method for Deepfake Classification Using Golden Frame Selection

Big Data Cogn. Comput. 2025, 9(4), 109; https://doi.org/10.3390/bdcc9040109
by Khrystyna Lipianina-Honcharenko 1,*, Nazar Melnyk 1, Andriy Ivasechko 1, Mykola Telka 1 and Oleg Illiashenko 2,3
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3:
Submission received: 5 March 2025 / Revised: 7 April 2025 / Accepted: 11 April 2025 / Published: 21 April 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The research of deepfake detection has important application significance in network security, international relations and personnel security. This paper constructs a technical framework of deepfake classification detection based on the neural network. This method detects the change intensity of the scene by detecting the gray difference between two images in the video, and takes the frame that exceeds the threshold as the golden frame. ResNet50, EfficientNetB0, Xception, InceptionV3 and Facenet are combined in the deep learning model. Public datasets are used to verify the accuracy of the method. While the paper is technically sound, the shortcomings in the work prevent me from recommending its acceptance in its current form.

  1. russia should be Russia.
  2. Reference numbers should appear in order. For example, [1, 30].
  3. The change of scene intensity between two images may be caused by factors such as the object or camera movement. Is it reasonable to use the gray level change between two images as the index of the golden frame for the deepfake detection?
  4. In the “2. Method”, there are many incorrect expressions. 3.1, 3.2, 4.1, 4.2 and other expressions are very strange. There is no equation serial number behind the equation, which further leads to the lack of references between equations and unclear logic. The expression “D = cv2.cvtColor(cv2.absdiff(f_prev, f_i), cv2.COLOR_BGR2GRAY)” is more like a program than an equation.
  5. where 𝑚 × should be left aligned.
  6. Please use equations, flow charts, etc. to avoid the following programming details.

Convert the lists 𝑋 and 𝑌 into NumPy arrays:

np.array(X), np.array(Y).

  7. Please check whether all variables are defined. E.g. “R^d” in Z = Flatten(F_Model(X)) ∈ R^d.
  8. Figure 8 should be deleted.
  9. Please add a comparison between the results with and without golden frames.
  10. Please add the ablation test.
  11. Please combine the conclusion into one paragraph.

Author Response

We sincerely thank Reviewer 1 for the thorough and insightful review of our manuscript. Your comments and suggestions were very helpful in improving the overall quality and clarity of our work. Below, we provide detailed responses to each point raised and indicate how the manuscript was revised accordingly. All modifications have been highlighted in blue in the revised version.

Answers:

Comments 1: [russia should be Russia]

Response 1: Thank you for pointing this out. We agree with this comment. We have corrected the word “russia” to “Russia”.

 

Comments 2: [Reference numbers should appear in order. For example, [1, 30]]

Response 2: Thank you for pointing this out. We agree with this comment. We have reordered the references.

 

Comments 3: [The change of scene intensity between two images may be caused by factors such as the object or camera movement. Is it reasonable to use the gray level change between two images as the index of the golden frame for the deepfake detection?]

Response 3: [We thank the reviewer for this insightful observation regarding the potential variability in scene intensity due to object or camera movement. Indeed, the change in grayscale intensity between consecutive frames can be influenced by several factors, including lighting changes, object motion, and camera shake, which may not necessarily indicate deepfake artifacts.

However, in our method, the purpose of using grayscale difference as a heuristic for golden frame selection is not to directly detect deepfakes at this stage, but rather to identify key transitional moments in the video—those that are likely to carry the most information due to visual scene variation. This approach is designed to reduce redundant frames and ensure that our neural network models are trained on a diverse and informative subset of the video, rather than on arbitrarily selected frames.

Moreover, we found empirically (as shown in Figures 3 and 4 of the paper) that the frames selected using this method often contain critical facial anomalies or artifacts that are useful for deepfake classification. These anomalies are captured more effectively in moments where the face undergoes motion, which often aligns with frames of higher grayscale intensity change. Thus, this strategy serves as a computationally efficient pre-filtering mechanism to extract potentially informative frames, not as a direct indicator of deepfakeness.

To strengthen our argument and address the reviewer’s concern, we have added the following clarification to Section 3 (Method), Step 2:

“While grayscale intensity change between frames may be influenced by factors such as object or camera motion, our goal in using this metric is to identify visually dynamic frames likely to contain facial deformations or artifacts. These transitions often coincide with the manifestation of deepfake inconsistencies, thereby increasing the chances of capturing critical evidence in the selected golden frames.”

We hope this addresses the reviewer’s concern and better communicates the rationale behind our design choice.]
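
The pre-filtering idea in Response 3 can be illustrated with a minimal sketch. It assumes frames are BGR arrays, that the per-frame change is the mean grayscale intensity of the absolute difference between consecutive frames, and that a frame is kept when this value exceeds a threshold. The function names, the mean aggregation, and the threshold value are hypothetical and are not taken from the manuscript.

```python
import numpy as np

# Luma weights in BGR channel order (0.299 R + 0.587 G + 0.114 B),
# the standard weighting for color-to-grayscale conversion.
_BGR_TO_GRAY = np.array([0.114, 0.587, 0.299])


def frame_change(prev_frame, frame):
    """Mean grayscale intensity of |frame - prev_frame| for two BGR frames."""
    diff = np.abs(frame.astype(np.float64) - prev_frame.astype(np.float64))
    gray = diff @ _BGR_TO_GRAY  # (H, W, 3) @ (3,) -> (H, W)
    return float(gray.mean())


def select_golden_frames(frames, threshold):
    """Indices i >= 1 whose change relative to frame i-1 exceeds the threshold.

    Assumption: the first frame has no predecessor and is never selected.
    """
    return [
        i
        for i in range(1, len(frames))
        if frame_change(frames[i - 1], frames[i]) > threshold
    ]


if __name__ == "__main__":
    # Synthetic 4x4 clip: static, static, large jump, small flicker.
    levels = [0, 0, 100, 102]
    clip = [np.full((4, 4, 3), v, dtype=np.uint8) for v in levels]
    print(select_golden_frames(clip, threshold=10.0))
```

As the reviewer notes, a threshold on frame differences responds to any motion or lighting change, so a sketch like this only selects visually dynamic frames; it says nothing by itself about whether a frame is manipulated.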

 

Comments 4: [In the “2. Method”, there are many incorrect expressions. 3.1, 3.2, 4.1, 4.2 and other expressions are very strange. There is no equation serial number behind the equation, which further leads to the lack of references between equations and unclear logic. The expression “D = cv2.cvtColor(cv2.absdiff(f_prev, f_i), cv2.COLOR_BGR2GRAY)” is more like a program than an equation]

Response 4: [We sincerely thank the reviewer for the careful reading and valuable feedback regarding the mathematical expressions in Section 2 (“Method”), particularly subsections 3.1, 3.2, 4.1, and 4.2. We acknowledge that some parts of the method were presented in a format that may appear unusual or unclear, especially due to the inclusion of expressions resembling code rather than formal mathematical notation.

To address this, we have made the following improvements:

  1. Refinement of mathematical expressions. Expressions that were previously written in a code-like syntax have now been reformulated using proper mathematical notation.
  2. Equation numbering. All equations of significance have been numbered consistently throughout the “Method” section to allow for proper referencing and to enhance logical flow within the text (e.g., "as shown in Equation (3)").
  3. Clarification of subsections. The content of subsections 3.1, 3.2, 4.1, 4.2 and 5.1–5.3 has been revised for clarity and coherence. We restructured the explanations to better reflect the step-by-step logic of the proposed approach, focusing on mathematical precision rather than implementation details.

We believe these revisions have significantly improved the clarity and academic rigor of the manuscript and hope they meet the reviewer’s expectations.]
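
For illustration only, the code-like expression quoted in Comment 4 could be restated in mathematical notation roughly as follows. This is a sketch under assumptions, not the equation from the revised manuscript: $\mathcal{G}$ denotes BGR-to-grayscale conversion, $H \times W$ is the frame size, and the mean aggregation $s_i$, the threshold $\tau$, and the set name $\mathcal{K}$ are hypothetical symbols introduced here.

```latex
% Assumed formalization; s_i, \tau and \mathcal{K} are illustrative symbols.
D_i = \mathcal{G}\bigl(\lvert f_i - f_{i-1} \rvert\bigr), \qquad
s_i = \frac{1}{HW} \sum_{x=1}^{W} \sum_{y=1}^{H} D_i(x, y), \qquad
\mathcal{K} = \{\, f_i : s_i > \tau \,\}
```

Here $f_{i-1}$ plays the role of `f_prev` in the quoted expression, and $\mathcal{K}$ would be the set of golden frames.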

 

Comments 5: [Figure 8 should be deleted]

Response 5: [Thank you for pointing this out. We fully agree with this comment. We have removed Figure 8.]

 

Comments 6: [Please add a comparison between the results with and without golden frames]

Response 6: [Thank you for pointing this out. We fully agree with this comment. We added Section 4.4.]

 

Comments 7: [Please add the ablation test]

Response 7: [Thank you for pointing this out. We fully agree with this comment. We added Section 4.4.]

 

Comments 8: [Please combine the conclusion into one paragraph]

Response 8: [Thank you for pointing this out. We fully agree with this comment. We have revised the Conclusions section, which now consists of a single paragraph.]

Reviewer 2 Report

Comments and Suggestions for Authors

This paper presents a neural network ensemble method for deepfake classification using golden frame selection. However, several issues need to be addressed to improve the quality and clarity of the work.

  1. The structure of the paper is not properly formatted. The introduction should be numbered as section 1, not 0. Ensuring correct numbering will enhance readability and alignment with standard academic writing conventions.

  2.  The mathematical equations in the paper lack proper indexing, making it difficult for readers to reference them clearly. Additionally, the equations are hard to understand, and their application within the proposed method is not well explained. It is essential to provide a clearer explanation of their role, novelty, and significance.

  3. The contributions of the paper should be explicitly stated in the introduction section. Currently, the manuscript does not effectively highlight the novelty and unique aspects of the proposed approach. A clearer articulation of the contributions is necessary to strengthen the impact of the work.

  4. A more comprehensive literature review is required, particularly focusing on related approaches that employ golden ratio or golden frame selection methods. This will help situate the proposed method within the existing body of knowledge and demonstrate how it improves upon previous works.

  5. Originality and Novelty: The novelty of the proposed method is not sufficient for a journal publication. The originality of the approach appears to be limited, and further innovation is needed to justify its contribution to the field. The paper should provide a stronger argument for its uniqueness and significance.

  6. The paper lacks comprehensive benchmarking results. Stronger empirical evidence is needed to validate the effectiveness of the proposed method. A more extensive comparison with existing techniques will help support the hypothesis and demonstrate the superiority of the approach.

Author Response

We are grateful to Reviewer 2 for the valuable feedback and thoughtful observations regarding our manuscript. Your suggestions have guided us in strengthening the methodology and presentation of our research. Please find below our point-by-point responses to each of your comments, along with references to the corresponding changes in the manuscript (highlighted in blue).

Answers:

Comments 1: [The structure of the paper is not properly formatted. The introduction should be numbered as section 1, not 0. Ensuring correct numbering will enhance readability and alignment with standard academic writing conventions]

Response 1: [Thank you for pointing this out. We agree with this comment. We have corrected the section numbering, and the “Introduction” is now Section 1, not Section 0.]

 

Comments 2: [The mathematical equations in the paper lack proper indexing, making it difficult for readers to reference them clearly. Additionally, the equations are hard to understand, and their application within the proposed method is not well explained. It is essential to provide a clearer explanation of their role, novelty, and significance]

Response 2: [Thank you for pointing this out. We agree with this comment and have made the appropriate changes, which are highlighted in blue.]

 

Comments 3: [The contributions of the paper should be explicitly stated in the introduction section. Currently, the manuscript does not effectively highlight the novelty and unique aspects of the proposed approach. A clearer articulation of the contributions is necessary to strengthen the impact of the work.]

Response 3: [The Introduction section has been revised to explicitly highlight the key contributions of the study and to include a brief overview of the paper structure.]

 

Comments 4: [A more comprehensive literature review is required, particularly focusing on related approaches that employ golden ratio or golden frame selection methods. This will help situate the proposed method within the existing body of knowledge and demonstrate how it improves upon previous works.]

Response 4: [The literature review has been expanded with five additional studies that specifically focus on golden frame selection techniques to better contextualize the proposed approach.]

 

Comments 5: [Originality and Novelty: The novelty of the proposed method is not sufficient for a journal publication. The originality of the approach appears to be limited, and further innovation is needed to justify its contribution to the field. The paper should provide a stronger argument for its uniqueness and significance.]

Response 5: [The manuscript has been revised to better emphasize the originality of the proposed method, particularly the integration of golden frame selection based on scene intensity with a neural ensemble and meta-classification strategy, and its advantages over existing approaches have been clarified in the introduction and literature review.]

 

Comments 6: [The paper lacks comprehensive benchmarking results. Stronger empirical evidence is needed to validate the effectiveness of the proposed method. A more extensive comparison with existing techniques will help support the hypothesis and demonstrate the superiority of the approach]

Response 6: [Thank you for pointing this out. We fully agree with this comment. We added Section 4.4.]

Reviewer 3 Report

Comments and Suggestions for Authors

This manuscript presents a neural network for deepfake video detection that utilizes a golden frame selection technique. The approach extracts the most informative frames from videos to reduce extra data and computational costs, and it integrates multiple pre-trained models for final classification. Experimental results show an accuracy of 91%, and at the same time, the Grad-CAM analysis reveals that different architectures focus on distinct facial regions, enhancing model interpretability. This work contributes to digital forensics, media verification, and cybersecurity. However, there are still some problems that need to be solved.

(1) In Figs. 4 and 5, it seems that some neural networks haven’t achieved the convergence state, and the accuracies are still in a state of growth (Like the EfficientNet). So more epochs are needed to make accuracies of all models to reach a convergence state. Meanwhile, it is recommended that Figs. 4 and 5 should be combined into one figure, for they are a series of figures.

(2) Besides, some format problems need to be addressed. For example, the formats of references are messy.

(3) The hyperparameter tuning methods are recommended to be mentioned in the main text or the supporting information (Like the hyperparameter tuning of XGBoost). This would improve reproducibility for other researchers.

The authors are recommended to cite the papers about neural network methods, such as Sens. Actuators B: Chem. 136206 (2024), IEEE Sens. J. 24, 671–678 (2024), and so on.

Author Response

We appreciate Reviewer 3 for the constructive critique and careful reading of our manuscript. Your remarks have helped us refine key aspects of the study and clarify several parts of the text. Our detailed responses to your comments are presented below, and all related changes have been incorporated and marked in blue in the revised document.

Answers:

Comments 1: [In Figs. 4 and 5, it seems that some neural networks haven’t achieved the convergence state, and the accuracies are still in a state of growth (Like the EfficientNet). So more epochs are needed to make accuracies of all models to reach a convergence state. Meanwhile, it is recommended that Figs. 4 and 5 should be combined into one figure, for they are a series of figures.]

Response 1: [Thank you for pointing this out. We agree with the comment to some extent, so we have increased the number of epochs to reach a state of convergence (see Figs. 4 and 5).]

 

Comments 2: [Besides, some format problems need to be addressed. For example, the formats of references are messy.]

Response 2: Thank you for pointing this out. We agree with this comment. We have reordered the references.

 

Comments 3: [The hyperparameter tuning methods are recommended to be mentioned in the main text or the supporting information (Like the hyperparameter tuning of XGBoost). This would improve reproducibility for other researchers. ]

Response 3: [Thank you for pointing this out. We fully agree with this comment. We added Section 4.4.]

 

Comments 4: [The authors are recommended to cite the papers about neural network methods, such as Sens. Actuators B: Chem. 136206 (2024), IEEE Sens. J. 24, 671–678 (2024), and so on.]

Response 4: [Thank you for the valuable suggestions; the recommended articles have been reviewed and appropriately cited in the revised manuscript.]

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The research novelty and contributions presented in the paper are not sufficiently strong for a journal publication, even if the authors make significant improvements compared to the previous version. While enhancements may improve certain aspects, the overall quality of the paper does not meet the standards required for a journal publication.

Author Response

Comments: [The research novelty and contributions presented in the paper are not sufficiently strong for a journal publication, even if the authors make significant improvements compared to the previous version. While enhancements may improve certain aspects, the overall quality of the paper does not meet the standards required for a journal publication.]

Response: [We sincerely thank you for your thoughtful review and for taking the time to provide feedback on our manuscript. We understand and respect your critical position regarding the strength of the paper’s contribution. However, we would like to gently draw your attention to several aspects that may not have been fully evident or may warrant reconsideration:

  • Combination of novel techniques: Our approach combines a new golden frame selection method with an ensemble of neural networks and an XGBoost meta-model. This is not merely a technical aggregation, but a purposeful integration aimed at increasing efficiency and interpretability, which we believe is a valuable addition to the deepfake detection field.

  • Experimental robustness and clarity: The study includes comprehensive ablation experiments, comparison with baseline approaches, and interpretability analysis using Grad-CAM. These provide transparent insight into model behavior and strengths.

  • Practical significance: The method offers a meaningful improvement in processing efficiency (reducing computational cost) without sacrificing accuracy, which is particularly relevant for real-time applications. In addition, it addresses an urgent societal challenge — the manipulation of information using deepfakes in geopolitical contexts.

We are grateful for your constructive perspective, which has motivated us to further strengthen the clarity and positioning of our contributions. We respectfully submit this clarification in the hope that it may be helpful in reconsidering the potential value of the work.

With sincere thanks,
The Authors]
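
The combination described in the first bullet of this response (base networks feeding a meta-model) follows the general stacking pattern, which can be sketched as below. This is a hedged illustration, not the authors' pipeline: the base-model outputs are made-up scores, and a plain logistic regression trained by gradient descent stands in for the XGBoost meta-model named in the response. All function names are hypothetical.

```python
import numpy as np


def stack_features(base_outputs):
    """Concatenate per-model outputs column-wise into one meta-feature matrix.

    Each entry holds one base model's output for all samples, either a
    1-D score per sample or a 2-D (n_samples, d_k) feature array.
    """
    cols = [np.asarray(o, dtype=np.float64).reshape(len(o), -1) for o in base_outputs]
    return np.concatenate(cols, axis=1)


def fit_meta(Z, y, lr=1.0, steps=2000):
    """Stand-in meta-classifier: logistic regression via batch gradient descent."""
    n, d = Z.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))  # predicted P(fake)
        grad = p - y                              # gradient of log-loss w.r.t. logits
        w -= lr * (Z.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def predict(Z, w, b):
    """Label 1 (fake) where the meta-model's logit is positive, else 0 (real)."""
    return (Z @ w + b > 0).astype(int)


if __name__ == "__main__":
    # Hypothetical fake-probabilities from two base networks on six videos.
    model_a = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
    model_b = [0.2, 0.1, 0.4, 0.6, 0.9, 0.8]
    labels = np.array([0, 0, 0, 1, 1, 1])
    Z = stack_features([model_a, model_b])
    w, b = fit_meta(Z, labels)
    print(predict(Z, w, b))
```

In a real stacking setup the meta-model would be fitted on held-out base-model predictions rather than on the same data used to train the base networks, to avoid leaking training labels into the meta-features.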
