Open Access Article

Peer-Review Record

Face Swapping Consistency Transfer with Neural Identity Carrier

Future Internet 2021, 13(11), 298; https://doi.org/10.3390/fi13110298

by Kunlin Liu¹, Ping Wang¹, Wenbo Zhou¹,*, Zhenyu Zhang², Yanhao Ge², Honggu Liu¹, Weiming Zhang¹ and Nenghai Yu¹

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 8 November 2021 / Revised: 18 November 2021 / Accepted: 19 November 2021 / Published: 22 November 2021

(This article belongs to the Special Issue Digital and Social Media in the Disinformation Age)

Round 1

Reviewer 1 Report

In the reviewed paper, the authors propose using CNNs such as U-Net for face analysis. Moreover, the paper presents a framework architecture called NICe. The paper shows some interesting ideas, but some issues should be addressed:
1) Add some information about the results to the abstract.
2) The introduction section should be extended to cover the latest solutions in the literature. Add some discussion of where your research can be applied. Discuss some security issues. See, for example, the interesting paper "Agent architecture of an intelligent medical system based on federated learning and blockchain technology".
3) Add pseudocode for your proposal.
4) Explain how you chose the architecture.
5) The experimental section should be extended with a comparison with the state of the art.

Author Response

Thanks for your review. We have taken your suggestions and updated the manuscript. Here are the changes:
1. Following your suggestion 1, we have added information about the results to the revised abstract, describing our method's ability to produce higher-quality videos and to improve current video-level deepfake detectors.
2. Following your suggestion 2, we have extended the introduction with a discussion of future applications of our work.
3. Pseudocode is not well suited to describing our pipeline. Instead, we introduce the whole pipeline in the Methods section. Our method consists of three stages, and each stage uses a different neural network.
4. We chose the architecture based on previous research (Blind Video Temporal Consistency via Deep Video Prior, NeurIPS 2020), in which the authors conduct experiments showing that U-Net is more effective than a standard ResNet. Related descriptions have been added to the Methods section.
5. The main idea of our method is to eliminate the inconsistency of current deepfake datasets. We demonstrate its effectiveness by applying it to the most widely used deepfake detection datasets, where the results outperform previous methods both qualitatively and quantitatively. Current state-of-the-art deepfake generation methods (e.g., HiFiFace, IJCAI 2021) neither provide paired datasets nor release their code, so we cannot conduct experiments on them.

Reviewer 2 Report

This paper proposes a new framework to improve face-swapping systems by increasing the consistency between consecutive frames. The framework can be applied to several systems, and the paper demonstrates the improvement when the method is applied to them. Quantitative and qualitative metrics are provided to demonstrate the improvement.

I have some minor comments to improve the paper:

In Section 3, I'd suggest including more mathematical details for a more thorough description.
Regarding the results (Tables 1 and 2), I'd suggest including confidence intervals in order to assess the significance of the results.
Editing aspect: within a section, there must be an introductory paragraph before a subsection title.
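One common way to obtain the confidence intervals the reviewer asks for is a percentile bootstrap over per-video scores. The following is only a minimal sketch under stated assumptions: the function name `bootstrap_ci`, its parameters, and the premise that one metric value per test video is available are all hypothetical, since this record does not say how the values in Tables 1 and 2 were aggregated.

```python
import random
import statistics


def bootstrap_ci(scores, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `scores`.

    Hypothetical helper: `scores` is assumed to hold one metric value
    per test video (e.g., one temporal-consistency value per clip).
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(scores)
    # Resample the scores with replacement and record each resample's mean.
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    # Read off the alpha/2 and 1 - alpha/2 empirical quantiles.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return means[lo_idx], means[hi_idx]
```

For example, `bootstrap_ci(per_video_scores)` would return the lower and upper bounds of a 95% interval for the mean score. The percentile bootstrap is used here because it does not assume that the per-video scores are normally distributed.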

Author Response

Thanks for your review. We have taken your suggestions and updated the manuscript. Here are the responses:
1. In Section 3, we describe the 3D rendering process and give an overview of our method. However, we cannot provide much more mathematical detail for these parts. For example, the number of FLAME parameters is fixed because each dimension represents unique information; these are already defined by the authors of FLAME (Learning a model of facial shape and expression from 4D scans, SIGGRAPH Asia 2017). The 3D rendering process was also introduced in previous work. Since our method aims at increasing the consistency between consecutive frames, these components are not the main contributions of our paper. Nevertheless, we have improved our description of 3D rendering to make it easier to understand.
2. Thank you very much for your suggestion. Table 1 uses the temporal coherence metric e_stab to measure temporal consistency, and our method outperforms previous work by a large margin. In Table 2, our method also outperforms previous work substantially. We believe the significance of our method is evident from these two tables.
3. Thank you for your editorial suggestion. We have added introductory paragraphs before the subsection titles within each section.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.

Round 1

Reviewer 1 Report

The paper is well written and interesting; however, there are some unclear points.

Equation 1 should be clarified. Symbols y and f are not explained (of course, a reader may guess their meaning).
3: What does the “circled dot” symbol denote here? Is it a kind of element-wise product?
It is not clear to me how Nd can be easily derived from Md.
How is the model noise parameter σ estimated?
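The circled-dot symbol ⊙ is conventionally used for the element-wise (Hadamard) product, which is what the reviewer suspects. This record does not confirm that the manuscript uses it in this sense. For reference, the standard definition for two matrices of the same size is:

```latex
(A \odot B)_{ij} = A_{ij}\,B_{ij}, \qquad A, B \in \mathbb{R}^{m \times n},\; 1 \le i \le m,\; 1 \le j \le n
```

For example, with A = [1 2; 3 4] and B = [5 6; 7 8], A ⊙ B = [5 12; 21 32], which differs from the ordinary matrix product AB.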

Reviewer 2 Report

The paper is an improvement on face-swapping algorithms. Overall, the paper is of good quality, and small adjustments could further increase its value.

The abstract contains many details about the proposed method but very few about the general context. Rebalancing the abstract in this respect is advised.

There are incorrect section and subsection numberings that should be addressed. Also, the page indentation should be checked. Adding a short description of the paper (its structure and what is presented in each section) at the end of the Introduction section is also advised.

In the Methods section, the existing and proposed methods are blended together. The paper would be easier to read if the two were clearly delimited and highlighted in additional paragraphs.

In the Experiments section, the proposed method is evaluated against others and shows better performance. The provided auxiliary materials help in evaluating the method.

It is good that the authors highlight the ethical problems of face swapping and how the current paper helps in this respect. This valuable information could also be added to the abstract.

The authors are advised to highlight in the conclusion the specific aspects of the paper that constitute their contribution, referring to sections and subsections, for greater clarity and ease of reading.
