Peer-Review Record

Learning Facial Motion Representation with a Lightweight Encoder for Identity Verification

Electronics 2022, 11(13), 1946;
by Zheng Sun, Andrew W. Sumsion, Shad A. Torrie and Dah-Jye Lee *
Reviewer 1: Anonymous
Submission received: 28 May 2022 / Revised: 16 June 2022 / Accepted: 20 June 2022 / Published: 22 June 2022
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, Volume II)

Round 1

Reviewer 1 Report

The paper presents a method for identity verification relying on facial motion. In particular, the paper focuses on learning facial motion features using a spatial encoder followed by a sequence encoder. The usage of a triplet contrastive loss allows the model to learn discriminative features to represent facial motions tied to identities.


I have a few comments I would like the authors to address:


- The sequence encoder performs a sort of temporal max pooling, which has the advantage of making the descriptor invariant to the number of frames. However, pooling is known to remove ordering information, which might be relevant for verification via motion cues. Does this have some impact on the model? I would like the authors to discuss this in the manuscript.


- The paper mentions (L111) inference times of milliseconds. On which architecture? Model complexity could be better characterized.


- The literature review should be expanded to include works that model temporal cues or adopt motion features related to faces. A few examples:

* Liu, Yong-Jin, et al. "A main directional mean optical flow feature for spontaneous micro-expression recognition." IEEE Transactions on Affective Computing 7.4 (2015): 299-310.

* Becattini, Federico, et al. "PLM-IPE: A Pixel-Landmark Mutual Enhanced Framework for Implicit Preference Estimation." ACM Multimedia Asia. 2021. 1-5.

* Liu, Xin, et al. "Region based parallel hierarchy convolutional neural network for automatic facial nerve paralysis evaluation." IEEE Transactions on Neural Systems and Rehabilitation Engineering 28.10 (2020): 2325-2332.


- A conclusion section should be added to the manuscript.
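
The first comment's concern about temporal max pooling can be illustrated with a minimal sketch. This is not the authors' sequence encoder. It assumes a plain element-wise maximum over per-frame feature vectors, and the function name `temporal_max_pool` and the feature values are hypothetical. It shows that the pooled descriptor has a fixed length regardless of the number of frames, and that it does not change when the frames are reordered.

```python
import random


def temporal_max_pool(frames):
    """Element-wise max over the time axis of per-frame feature vectors.

    Assumption: `frames` is a non-empty list of equal-length lists of floats.
    """
    return [max(values) for values in zip(*frames)]


# Hypothetical per-frame features: 4 frames, 3 dimensions each.
frames = [
    [0.1, 0.9, 0.3],
    [0.7, 0.2, 0.4],
    [0.5, 0.6, 0.8],
    [0.2, 0.1, 0.0],
]

pooled = temporal_max_pool(frames)

# Reversing or shuffling the frame order leaves the descriptor unchanged,
# which is the loss of ordering information the comment refers to.
reversed_pooled = temporal_max_pool(frames[::-1])
shuffled = frames[:]
random.Random(0).shuffle(shuffled)
shuffled_pooled = temporal_max_pool(shuffled)

# A shorter clip still yields a descriptor of the same length.
short_pooled = temporal_max_pool(frames[:2])

print(pooled, reversed_pooled == pooled, shuffled_pooled == pooled, len(short_pooled))
```

Whether this invariance matters in practice depends on how much ordering information the per-frame features already carry, which is what the comment asks the authors to discuss.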

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

1. Lines 21-23 need a reference.

2. Figure 1 (face image): please mention the source in the figure title.

3. Line 46: the abbreviations GPU and ASIC need to be spelled out.

4. Mention the source of the face images in all figure titles.

5. Follow the same style in all references.




Author Response

Please see the attachment.

Author Response File: Author Response.pdf
