Article
Peer-Review Record

Facial Identity Verification Robust to Pose Variations and Low Image Resolution: Image Comparison Based on Anatomical Facial Landmarks

Electronics 2022, 11(7), 1067; https://doi.org/10.3390/electronics11071067
by Yu-Jin Hong
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 4 February 2022 / Revised: 1 March 2022 / Accepted: 17 March 2022 / Published: 28 March 2022
(This article belongs to the Special Issue Face Recognition: Latest Trends and Future Perspectives)

Round 1

Reviewer 1 Report

The author discusses the face mapping technique, a face biometrics method, to verify whether the faces in a pair of images belong to the same person or to different persons. In the proposed study, major facial landmarks are computed from 2D and 3D facial images to obtain standard values across diverse facial angles and image resolutions. The proposed method is applied to obtain 3D face images from multiple 2D images taken at different angles. The author claims that the proposed method could enable future studies to perform face-to-face analysis to determine whether two different images show the same person. The detailed comments are given below:

  1. The term “deep learning facial features” is used vaguely. Please correct it.

  2. The overall framework of the proposed method, given in Figure 1, is not explained. The author should clearly explain each stage of the proposed method.

  3. For full transparency and replicability, the authors should attach or make available the computer code and data needed to replicate the study easily. Commented code that generates the exact figures and results in the paper is highly recommended.

  4. What is the computational complexity of the proposed method?

Author Response

Response to Reviewer 1 Comments


Point 1: The term “deep learning facial features” is vaguely used. It is suggested to correct it.

Response 1: Thank you for your valuable comments. We corrected the term “deep learning facial features” on Line 45 in Page 2 (shown in red) as follows:

  • not only visually easier to explain than deep facial features, but also objective…

Point 2: The overall framework of the proposed method, given in Figure 1, is not explained. The author should clearly explain each stage of the proposed method.

Response 2: Thank you for your valuable suggestions. We had explained the overall framework of Figure 1 on Line 90~102 in Page 3, but the explanation was not sufficient or clear enough to convey the whole procedure, for example the 3D face reconstruction and super-resolution process. Following the reviewer’s comments, we added an explanation of Figure 1 on Line 73-87 in Page 3 as below:

  • First, to estimate the 3D geometry of the comparison face, we used a photogrammetric range imaging technique on a sequence of facial images. The reconstructed 3D face model is rotated into the same pose as the comparison image obtained by security cameras such as CCTV. In addition, super-resolution technology is applied to the low-resolution comparison image to extract facial landmarks accurately. Next, facial landmarks within each image are detected by an elastic model-based facial landmark detector; a professional facial analyst may adjust the detected landmarks for increased accuracy. Based on these facial landmarks, size normalization is performed to generate two images of the same size and enable comparison of the locations of landmarks. The size of the facial image is adjusted by setting the interpupillary distance (IPD) to 100 pixels for a frontal presentation of the image. For other views, the distance between the nasion (midline depth of the nasal root) and the gnathion (lowest median landmark on the lower border of the chin) is set to 100 pixels. Finally, we analyze the distribution of distances for the same individual and different individuals by calculating the distance between reciprocal landmarks of normalized facial images. The whole process is described in Figure 1.
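
The size normalization and landmark comparison described in this passage can be illustrated with a minimal Python sketch. It is not the author's C++ implementation. The landmark names (`left_pupil`, `right_pupil`, `nasion`, `gnathion`) and both function names are hypothetical, landmarks are assumed to be 2D pixel coordinates, and the sketch only rescales: any translation or rotation alignment between the two images is assumed to have been done beforehand.

```python
import math

TARGET_PX = 100.0  # reference distance after normalization, as stated in the response


def normalize_landmarks(landmarks, view="frontal"):
    """Scale 2D landmarks so that the reference distance equals TARGET_PX.

    landmarks: dict mapping a (hypothetical) landmark name to an (x, y) tuple.
    view: "frontal" uses the interpupillary distance (IPD); any other value
          uses the nasion-gnathion distance.
    """
    if view == "frontal":
        a, b = landmarks["left_pupil"], landmarks["right_pupil"]
    else:
        a, b = landmarks["nasion"], landmarks["gnathion"]
    ref = math.dist(a, b)
    if ref == 0:
        raise ValueError("reference landmarks coincide; cannot normalize")
    scale = TARGET_PX / ref
    return {name: (x * scale, y * scale) for name, (x, y) in landmarks.items()}


def landmark_distances(face_a, face_b):
    """Euclidean distance between reciprocal (same-name) landmarks of two faces."""
    shared = face_a.keys() & face_b.keys()
    return {name: math.dist(face_a[name], face_b[name]) for name in shared}
```

Both faces would be passed through `normalize_landmarks` with the same `view` before calling `landmark_distances`, so that the per-landmark distances are expressed in the same 100-pixel reference scale.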

 

Point 3: For full transparency and replicability, the authors should attach / make available computer code and data to replicate the study easily. Highly recommended is code with comments which precisely generates the exact figures and result as in the paper.

Response 3: We used open-source code for the 3D face restoration, the super-resolution method, and the facial landmark extraction. Following the reviewer’s suggestion, we will put links to these parts on our GitHub page. The core algorithm, which calculates the distance between the feature points and the distribution of each landmark, was written in C++ and will also be disclosed. However, so that anyone can apply the proposed algorithms and obtain the same results, we first need to clean up our code. We will organize it with comments and post it on the GitHub page as soon as possible. We apologize for the delay. Also, following the reviewer’s comments, we included pseudocode for the entire process as Algorithm 1 in Page 13 as follows:

Algorithm 1. Proposed method for selecting reliable landmark indices

Input : Multi-view references images, comparison image

Output : ED and landmark indices

1. Repeat:

2. Apply SfM [41] on multi-view images, generate a 3D face mesh

2.1. Rotate the 3D mesh to a comparison facial pose and acquire rendered image

3. Conduct VDSR [47] on the comparison image, acquire high-definition image

4. Conduct wild-feature detector [54] on the rendered and comparison image, obtain landmark points

5. Select the landmark indices for frontal and profile face images following [2, 20, 21, 55]

6. Normalize the images so that the IPD or the distance between the nasion and gnathion is 100 pixels

7. Calculate distances between the selected landmarks following Eq.1

8. Calculate d’ measure following Eq.2

9. Continue Until: all paired (genuine and imposter) images are processed

10. Analyze the distribution of distances for same and different individuals

11. Select reliable landmark indices
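
Steps 7 to 11 of Algorithm 1 can be sketched in Python as follows. Eq. 1 and Eq. 2 are not reproduced in this record, so the sketch assumes a commonly used form of the d' measure: the absolute difference of the genuine and imposter means divided by the square root of the average of their variances. The function names and the `min_d_prime` threshold are hypothetical, and the default threshold is illustrative only, since the response describes the threshold selection as heuristic.

```python
import math
import statistics


def d_prime(genuine, imposter):
    """Separation between two distance distributions (assumed form of d')."""
    mu_g = statistics.fmean(genuine)
    mu_i = statistics.fmean(imposter)
    var_g = statistics.pvariance(genuine)
    var_i = statistics.pvariance(imposter)
    pooled = math.sqrt((var_g + var_i) / 2.0)
    if pooled == 0:
        # Degenerate case: both distributions have zero spread.
        return math.inf if mu_g != mu_i else 0.0
    return abs(mu_g - mu_i) / pooled


def select_reliable_landmarks(genuine_by_landmark, imposter_by_landmark, min_d_prime=1.0):
    """Rank landmark indices by d' and keep those at or above the threshold.

    Each argument maps a landmark index to the list of Euclidean distances
    measured for that landmark over genuine or imposter image pairs.
    """
    scores = {
        idx: d_prime(genuine_by_landmark[idx], imposter_by_landmark[idx])
        for idx in genuine_by_landmark
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(idx, score) for idx, score in ranked if score >= min_d_prime]
```

A landmark whose genuine and imposter distance distributions barely overlap receives a large d' and is kept; a landmark whose distributions overlap heavily is dropped.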

 

 

Point 4: What is the computational complexity of the proposed method?

Response 4: Thank you for your valuable question. In our study, the SfM and VDSR methods were tested to solve the low-resolution and face pose problems that impede face recognition performance. The wild-feature detector that we used is a real-time detector that has been verified in practice. As stated in our manuscript (on Line 163-164 in Page 4), the 3D model restoration takes less than 1 minute with GPU acceleration, and the VDSR process also takes less than 1 minute (however, training takes about 2-3 days on four NVIDIA 2080 Ti GPUs when fine-tuning on various DBs). In addition, the distances and d' measurements of landmarks are calculated in 0.01/s.

 

Author Response File: Author Response.docx

Reviewer 2 Report

This paper proposes a novel approach to face biometrics, known as face mapping/face comparison. It has a smaller workload than traditional methods. This paper mainly generates a 3D face model from 2D faces from different angles, proposes an authentication method, and uses a super-resolution method to solve the problem of inaccurate evaluation. This is an interesting research paper. There are some suggestions for revision.

  1. The motivation is not clear. Please specify the importance of the proposed solution.
  2. Please highlight the contributions/innovations of the proposed solution in the introduction.
  3. The discussion of related work is a little bit weak. The authors ignore some relevant papers, such as "Deep convolutional neural network for real and fake face discrimination", Chinese Intelligent Systems Conference, 590-598, 2020 and "Camera style transformation with preserved self-similarity and domain-dissimilarity in unsupervised person re-identification", Journal of Visual Communication and Image Representation 80, 103303, 2021. The authors should compare the proposed method with them carefully.
  4. The article contains many acronyms whose meaning is not given, for example CCTV and SIFT, so readers may not understand what the article is discussing. The full name should be given at first use, followed by the abbreviation.
  5. In the article, the 2D image is reconstructed into a 3D image, and it is mentioned that the feature points are extracted from the video. How are the feature points extracted? Are there any relevant references?
  6. The method part of the article introduces the super-resolution method. The related contents should not be described in the method part but should be placed in the introduction or related work. The method part should introduce the method you use.
  7. Have you tried faces from other angles to calculate relevant indicators during the experimental verification? Do I need to re-select the marker points if I give a photo from a different angle?
  8. More technical details of the proposed solution should be given, such as mathematical analysis and related equations.
  9. The experiments verify the proposed standard only on the authors' own dataset. They do not show whether the verification method has been validated on other datasets.
  10. The experimental results are not convincing. Please compare the proposed solution with more recently published solutions.

Author Response

Response to Reviewer 2 Comments

Point 1: The motivation is not clear. Please specify the importance of the proposed solution. 

Response 1: Thank you for your insightful comments. As the reviewer noted, our manuscript lacked the motivation for the research. We have specified the importance of the proposed solution on Line 97-107 in Page 3 as follows:

  • In general, large-scale datasets, such as MS-Celeb-1M [48] with 10 million face images, are required for face identification, and complex algorithms and deep learning architectures need to be designed to process them. The novelty of our study is that it provides effective landmarks for face identification by analyzing traditional facial landmarks that represent the structure of a face, as a way to determine whether two input images show the same person. This can provide visual clues for identification because morphometric features are simpler and more interpretable than deep learning features. Also, images of suspects acquired from security cameras contain variations in facial pose and expression, low image resolution, and occlusion of the face by hair or accessories. These factors can degrade the comparison performance of experts such as investigators [49,50].

 

Point 2: Please highlight the contributions/innovations of the proposed solution in introduction.

 

Response 2: Thank you for your valuable comments. We added the contributions/innovations of the proposed method as highlights on Line 120-128 in Page 3 as follows:

 

  • The contributions of this study can be summarized as follows:
    • We present a landmark-based face mapping method that represents the morphology of the face and provides visual, interpretable cues more readily than deep features.
    • We provide the landmark indices and associated thresholds by which to determine whether input images have the same identity.
    • To cope with the images of low resolution and various poses, this study extracts more accurate facial landmarks from the input faces through restoring the 3D model to correct the poses and improving the image quality.

 

Point 3: The discussion of related work is a little bit weak. The authors ignore some relevant papers, such as "Deep convolutional neural network for real and fake face discrimination", Chinese Intelligent Systems Conference, 590-598, 2020 and "Camera style transformation with preserved self-similarity and domain-dissimilarity in unsupervised person re-identification", Journal of Visual Communication and Image Representation 80, 103303, 2021. The authors should compare the proposed method with them carefully.

 

Response 3: Thank you for your valuable comments. We added the two important related works that you have suggested. The first paper is cited as [4] on Line 21-23 in Page 1 as follows:

 

  • Currently, with increasing utilization of security and surveillance systems, acquired images are commonly used in public safety and image forgery detection [1-4].

The second paper is cited as [27] on Line 63-66 in Page 2 as follows:

  • Finding a similar face is related to the face verification that selects one ID that is most likely to match the input in that it infers the face image that is most perceptually similar to the input [22,27,28].

 

Point 4: The article contains many acronyms of unknown meaning. For example, CCTV, SIFT, etc. in the article, people do not understand what the article is talking about, so the full name should be added first, and then the abbreviation should be used.

 

Response 4: Thank you for your insightful comments. As you suggested, we added the full names of the abbreviations in our manuscript to help readers, as follows:

 

On Line 68 in Page 2,

  • SIFT (Scale-Invariant Feature Transform)

On Line 111-112 in Page 3,

  • CCTV (Closed-Circuit Television)

On Line 163 in Page 4,

  • GPU (Graphics Processing Unit)

 

Point 5: In the article, the 2D image is reconstructed into a 3D image, and it is mentioned that the feature points are extracted from the video. How are the feature points extracted? Are there any relevant references?

 

Response 5: We appreciate the good comments. We used the wild-feature detector (Line 184 in Page 5 of our manuscript) for the landmark detection process. The wild-feature detector is a face detector for still images and real-time videos based on the Viola-Jones algorithm. The landmarks reported in this study were estimated from image sequences, but our method can also be applied to video such as CCTV footage, which is a set of image sequences. However, as you pointed out, the term video may confuse readers, so it has been changed to image. The corrected term appears on Line 162 in Page 4 as follows:

 

  • …2D feature points from images…

 

Point 6: The method part of the article introduces the super-resolution method. The related contents should not be described in the method part but should be placed in the introduction or related work. The method part should introduce the method you use.

 

Response 6: We appreciate your valuable comments. Thanks to your advice, we have been able to improve our manuscript. We added 8 reference works on 3D face reconstruction and moved the related work on super resolution to Line 73-87 in Page 2 as follows:

 

  • To handle the pose problem, a promising method is to use 3D facial information, because the restored 3D face model can be used to render target facial poses. Because generating 3D information from a single image is an ill-posed problem [37], prior-model-based methods have been proposed, which require 3D scan data of a large number of people to reproduce the various face shapes of the input [38-40]. In addition, 3D geometry can be estimated directly from images using SfM (Structure from Motion)-based methods, which estimate camera movements from image sequences with various viewpoints [41,42]. Also, if there are several photos of the object illuminated from different directions, the depth can be estimated more accurately by combining this information. These photometric stereo techniques have traditionally been applied for 3D reconstruction [43,44]. Recently, deep learning-based super-resolution studies have brought great progress in image quality improvement. There are many deep learning-based methods for super resolution, such as super resolution using deep convolutional network (SRCNN), coupled deep convolutional auto-encoder (CDCA), and very deep convolutional network-based super resolution (VDSR) [45-47].

 

Point 7: Have you tried faces from other angles to calculate relevant indicators during the experimental verification? Do I need to re-select the marker points if I give a photo from different angle?

 

Response 7: Thank you for the good questions. Unfortunately, we did not apply our method to various face poses. We would appreciate it if you could consider our study, which still has many shortcomings, as a starting point for proposing landmarks that can be used for recognition in profile and frontal faces. This study is intended to aid forensic investigations, because suspect mugshots include the frontal and side poses. As the reviewer pointed out, different landmarks may be selected for different angles of the face. Our future work includes discovering landmarks that can be used effectively for identification across various face poses.

 

Point 8: More technical details of the proposed solution should be given, such as mathematical analysis and related equations.

 

Response 8: Thank you for pointing out this important part. As the reviewer knows, we provided a standard ED value and more reliable facial landmarks for identity verification. By using the d' measure between the two distributions, which come from the same identity and from different identities, we could confirm the reliable landmarks for each facial pose. Algorithms such as SfM and VDSR were used in the image preprocessing part, but mathematical descriptions of these methods were not included because we thought they were beyond the scope of the paper. If this part is needed, please point it out again and we will add it. The mathematical analysis that we used to obtain the results of this paper is to calculate the Euclidean distance (ED) between the feature points and, based on this, to calculate the mean and standard deviation of the genuine distribution (same identity) and of the imposter distribution (different identities) in order to compute the d' measure. As the distance between the two distributions increases, the value of d' also becomes greater. Since we used heuristic threshold selection, we would like to provide the numerical results of our manuscript (Figure 8~9, Table 3, Table 5~6) that show how the landmarks were chosen, instead of a mathematical explanation.
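
For readers who prefer symbols, the two quantities named in this response can be written in one common form, shown below. This is a hedged reconstruction rather than a copy of the manuscript: Eq. 1 and Eq. 2 are not reproduced in this record, and the symbols (landmark index k, images A and B, superscripts gen and imp for the genuine and imposter distributions) are assumptions introduced here.

```latex
% Euclidean distance between reciprocal landmark k in normalized images A and B
ED_k = \sqrt{\left(x_k^{A} - x_k^{B}\right)^2 + \left(y_k^{A} - y_k^{B}\right)^2}

% d' for landmark k: separation of the genuine and imposter distributions of ED_k
d'_k = \frac{\left| \mu_k^{\mathrm{gen}} - \mu_k^{\mathrm{imp}} \right|}
            {\sqrt{\tfrac{1}{2}\left[ \left(\sigma_k^{\mathrm{gen}}\right)^2 + \left(\sigma_k^{\mathrm{imp}}\right)^2 \right]}}
```

Under this form, d' grows as the means move apart or as either spread shrinks, which matches the statement that a larger distance between the two distributions gives a larger d'.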

 

Point 9: The standard verified in the experimental part can be effectively differentiated and verified in the data set proposed by itself. It does not show whether the verification method has been verified on other datasets.

 

Response 9: Thank you for pointing this out. Since our experiment was evaluated on the dataset that we built, we did not test whether the method is useful on other datasets. This remains to be dealt with in our future work and was not explained in our manuscript. Based on the reviewer's suggestion, we specified this in the future work part of Section 4 on Line 313-315 in Page 14 as follows:

 

  • In future work, we plan to study standard measures by applying the proposed methods to various face datasets and, further, to study identity verification based on image comparisons that are robust to different facial expressions and to occlusion of facial areas by hair or accessories.

 

Point 10: The experimental results are not convincing. Please compare the proposed solution with more recently published solutions.

 

Response 10: Thank you for your insightful comments. Unfortunately, there are not yet many recent studies similar to our work. Our study arose from the needs of forensic investigators, and from that point of view a landmark-based face comparison method can be useful in a forensic context. The landmarks that describe the shape of the face have already been defined and are widely used, and our research also utilizes them. We suggest landmark indices and measures that are useful for identification by using the ED and the d' measure. A work similar to our study [1] begins by selecting landmark indices based on the previous studies that we have also referenced. Values of the landmark indices, such as averages and SDs, are computed for the reference image, and the probability of these values is calculated from the probability distribution for the suspect as well as from that of the population to give the likelihood ratio (LR). It thus determines whether the input images show the same person by applying a heuristic threshold to the LR values of the landmarks. Therefore, a direct comparison does not seem appropriate, as this is very different from our method, in which we used the d' measure to identify effective landmarks for recognition. Although [2] is also related to our study, it is difficult to compare it quantitatively with ours because of the number of identities used in the experiment (5 persons) and because the data set used is not publicly available. The reference work [3] is similar to our study, but it used only frontal faces, unlike our work, which also uses profile images. We ask for the reviewer’s generous understanding in this regard. To make the results more convincing for reviewers and readers, we supplemented the related work section on Line 33-44 in Page 1-2 as follows:

 

  • In a forensic context, landmark-based face comparison methods can be useful. A method that uses likelihood ratios (LRs) to assess the grouping of facial images based on morphometric indices has been presented [6]. Landmark values such as averages and SDs are estimated for the reference image, and the probability of these values is calculated from the probability distributions for the suspect as well as the population to obtain the likelihood ratio. Another study used facial ratios from inter-landmark distances to perform intra- and inter-sample comparisons by mean absolute value, Euclidean distance, and cosine distance between ratios [7]. The statistics of the two frontal faces were examined to determine the calculation that best identifies a detectable correlation difference between faces belonging to the same person. In addition, a series of area triplets was constructed for geometric invariance, including area ratio and angle, and then used as feature vectors in face identification based on the detected landmarks [8].

 

References

[1] Rajesh V. et al., Towards facial recognition using likelihood ratio approach to facial landmark indices from images (Forensic Sci. Int. Rep., 2022).

[2] Juhong A. et al., Face recognition based on facial landmark detection (BMEiCON, 2017).

[3] Kleinberg, K.F. et al., A study of quantitative comparisons of photographs and video images based on landmark derived feature vectors (Forensic Sci. Int., 2012).

Author Response File: Author Response.docx

Reviewer 3 Report

The authors should add a section with the description of similar works.
The authors should describe with pseudocode the whole algorithm.
The authors should compare their methodology with other similar works.
The authors should better explain why the proposed methodology seems to work well and present some information about the time efficiency of their method.

Author Response

Response to Reviewer 3 Comments

Point 1: The authors should add a section with the description of similar works.

 

Response 1: Thank you for your insightful comments. Following the reviewer's comment, we added related work for our study on Line 33-34 in Page 1-2 as follows:

 

  • In a forensic context, landmark-based face comparison methods can be useful. A method that uses likelihood ratios (LRs) to assess the grouping of facial images based on morphometric indices has been presented [6]. Landmark values such as averages and SDs are estimated for the reference image, and the probability of these values is calculated from the probability distributions for the suspect as well as the population to obtain the likelihood ratio. Another study used facial ratios from inter-landmark distances to perform intra- and inter-sample comparisons by mean absolute value, Euclidean distance, and cosine distance between ratios [7]. The statistics of the two frontal faces were examined to determine the calculation that best identifies a detectable correlation difference between faces belonging to the same person. In addition, a series of area triplets was constructed for geometric invariance, including area ratio and angle, and then used as feature vectors in face identification based on the detected landmarks [8].

 

Point 2: The authors should describe with pseudocode the whole algorithm.

 

Response 2: Thank you for your insightful suggestions. To make our results more convincing, we added pseudocode for our whole procedure. The pseudocode is as follows:

 

Algorithm 1. Proposed method for selecting reliable landmark indices

 

Input : Multi-view references images, comparison image

Output : ED and landmark indices

1. Repeat:

2. Apply SfM [41] on multi-view images, generate a 3D face mesh

2.1. Rotate the 3D mesh to a comparison facial pose and acquire rendered image

3. Conduct VDSR [47] on the comparison image, acquire high-definition image

4. Conduct wild-feature detector [54] on the rendered and comparison image, obtain landmark points

5. Select the landmark indices for frontal and profile face images following [2, 20, 21, 55]

6. Normalize the images so that the IPD or the distance between the nasion and gnathion is 100 pixels

7. Calculate distances between the selected landmarks following Eq.1

8. Calculate d’ measure following Eq.2

9. Continue Until: all paired (genuine and imposter) images are processed

10. Analyze the distribution of distances for same and different individuals

11. Select reliable landmark indices

 

Point 3: The authors should compare their methodology with other similar works.

 

Response 3: Thank you for your insightful comments. Unfortunately, there are not yet many recent studies similar to our work. Our study arose from the needs of forensic investigators, and from that point of view a landmark-based face comparison method can be useful in a forensic context. The landmarks that describe the shape of the face have already been defined and are widely used, and our research also utilizes them. We suggest landmark indices and measures that are useful for identification by using the ED and the d' measure. A work similar to our study [1] begins by selecting landmark indices based on the previous studies that we have also referenced. Values of the landmark indices, such as averages and SDs, are computed for the reference image, and the probability of these values is calculated from the probability distribution for the suspect as well as from that of the population to give the likelihood ratio (LR). It thus determines whether the input images show the same person by applying a heuristic threshold to the LR values of the landmarks. Therefore, a direct comparison does not seem appropriate, as this is very different from our method, in which we used the d' measure to identify effective landmarks for recognition. Although [2] is also related to our study, it is difficult to compare it quantitatively with ours because of the number of identities used in the experiment (5 persons) and because the data set used is not publicly available. The reference work [3] is similar to our study, but it used only frontal faces, unlike our work, which also uses profile images. We ask for the reviewer’s generous understanding in this regard. To make the results more convincing for reviewers and readers, we supplemented the related work section on Line 33-44 in Page 1-2 as follows:

 

  • In a forensic context, landmark-based face comparison methods can be useful. A method that uses likelihood ratios (LRs) to assess the grouping of facial images based on morphometric indices has been presented [6]. Landmark values such as averages and SDs are estimated for the reference image, and the probability of these values is calculated from the probability distributions for the suspect as well as the population to obtain the likelihood ratio. Another study used facial ratios from inter-landmark distances to perform intra- and inter-sample comparisons by mean absolute value, Euclidean distance, and cosine distance between ratios [7]. The statistics of the two frontal faces were examined to determine the calculation that best identifies a detectable correlation difference between faces belonging to the same person. In addition, a series of area triplets was constructed for geometric invariance, including area ratio and angle, and then used as feature vectors in face identification based on the detected landmarks [8].

 

Point 4: The authors should better explain why the proposed methodology seems to work well and present some information about the time efficiency of their method.

 

Response 4: Thank you for your valuable comments. In our study, many algorithms were tested to solve the low-resolution and face pose problems that impede face recognition performance, and the accuracy of the results was increased by using the SfM and VDSR methods, which showed the best performance and speed. The wild-feature detector that we used is a real-time detector that has been verified in practice. We used landmarks that have been verified in many studies, and we studied mathematical probabilistic models to increase the reliability of the landmark selection required for face recognition. As stated in our manuscript (on Line 163-164 in Page 4), the 3D model restoration takes less than 1 minute using GPU acceleration, and the VDSR process also takes less than 1 minute (however, training takes about 2-3 days on four NVIDIA 2080 Ti GPUs when fine-tuning on various DBs). In addition, the distances and d' measurements of landmarks are calculated in 0.01/s.

 

References

[1] Rajesh V. et al., Towards facial recognition using likelihood ratio approach to facial landmark indices from images (Forensic Sci. Int. Rep., 2022).

[2] Juhong A. et al., Face recognition based on facial landmark detection (BMEiCON, 2017).

[3] Kleinberg, K.F. et al., A study of quantitative comparisons of photographs and video images based on landmark derived feature vectors (Forensic Sci. Int., 2012).

 

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

I accept the paper. 

Reviewer 2 Report

All my concerns have been addressed. I recommend this paper for publication.

Reviewer 3 Report

The paper can be accepted in its current form.
