Submit to this Journal Review for this Journal Propose a Special Issue

Article Menu

Share Help Cite Discuss in SciProfiles

Open AccessArticle

Peer-Review Record

Learning Facial Motion Representation with a Lightweight Encoder for Identity Verification

Electronics 2022, 11(13), 1946; https://doi.org/10.3390/electronics11131946

by Zheng Sun

, Andrew W. Sumsion

, Shad A. Torrie

and Dah-Jye Lee^*

Reviewer 1: Anonymous

Reviewer 2:

Rajchandar Padmanaban

Electronics 2022, 11(13), 1946; https://doi.org/10.3390/electronics11131946

Submission received: 28 May 2022 / Revised: 16 June 2022 / Accepted: 20 June 2022 / Published: 22 June 2022

(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, Volume II)

Round 1

Reviewer 1 Report

The paper presents a method for identity verification relying on facial motion. In particular, the paper focuses on learning facial motion features using a spatial encoder followed by a sequence encoder. The usage of a triplet contrastive loss allows the model to learn discriminative features to represent facial motions tied to identities.

I have a few comments I would like the authors to address:

- The sequence encoder performs a sort of temporal max pooling, which has the advantage of making the descriptor invariant to the number of frames. However pooling is known to remove ordering information, which might be relevant for verification via motion cues. Does this have some impact on the model? I would like the authors to discuss this in the manuscript

- The paper mentions (L111) inference times of milliseconds. On which architecture? Model complexity could be better characterized.

- Literature review should be expanded including works that model temporal cues of adopt motion features related to faces. A few examples:

* Liu, Yong-Jin, et al. "A main directional mean optical flow feature for spontaneous micro-expression recognition." IEEE Transactions on Affective Computing 7.4 (2015): 299-310.

* Becattini, Federico, et al. "PLM-IPE: A Pixel-Landmark Mutual Enhanced Framework for Implicit Preference Estimation." ACM Multimedia Asia. 2021. 1-5.

*Liu, Xin, et al. "Region based parallel hierarchy convolutional neural network for automatic facial nerve paralysis evaluation." IEEE Transactions on Neural Systems and Rehabilitation Engineering 28.10 (2020): 2325-2332.

- A conclusion section should be added to the manuscript.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

1. From line 21 - 23 need reference.

2. Figure 1 (face image) please mention the source in the figure title.

3. Line 46 GPU or ASIC need abbreviation.

4. Mention source of the face images in all figure titles.

5. follow same style in all references.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Article Menu

Learning Facial Motion Representation with a Lightweight Encoder for Identity Verification

Further Information

Guidelines

MDPI Initiatives

Follow MDPI