Historical Blurry Video-Based Face Recognition
Abstract
1. Introduction
2. Related Works
2.1. Face Detection Using Deep Learning Algorithms
2.2. Face Recognition Algorithms
2.3. Face-Tracking Algorithms
2.4. Image Restoration
3. Face Recognition Structure
3.1. Face Detection
- P-Net (Proposal Network): P-Net is the first stage in the TB-MTCNN cascade and generates preliminary face candidates across the image. It scans the image with a sliding-window approach, detecting faces at various locations.
- Multi-Scale Detection: To capture faces of different sizes, P-Net operates on multiple scaled versions of the input image, enabling the detection of both small and large faces within a single frame.
- Probabilistic Assessment: Along with proposing candidate bounding boxes, P-Net assigns each box a probability score indicating the likelihood that it contains a face. This score drives the filtering performed in later stages.
- Input Image Size: The network processes input images of 12 × 12 pixels, optimized for rapid detection and low computational overhead, making it suitable for real-time applications.
- R-Net (Refinement Network):
- Refinement of Proposals: After the initial detection by P-Net, R-Net refines the candidate bounding boxes, discerning more accurately which ones contain faces and significantly reducing the number of false positives.
- Enhanced Spatial Resolution: R-Net processes higher-resolution inputs (24 × 24 pixels) than P-Net’s 12 × 12 pixels, enabling more effective discernment of facial features and improved localization of face boundaries.
- Probability Filtering: R-Net computes a second-level confidence score for each box and discards boxes with low scores, reducing computational load and improving the precision of detections.
- O-Net (Output Network):
- Final Bounding Box Adjustments: O-Net makes the final adjustments to the bounding boxes, ensuring they fit tightly around the detected faces. This fine-tuning is crucial for applications requiring precise facial recognition or analysis.
- Facial Landmark Detection: Beyond identifying faces, O-Net detects facial landmarks such as the eyes, nose, and mouth, supporting advanced facial analysis tasks like emotion recognition, facial alignment, and augmented reality.
- High Receptive Field: With the largest input size (48 × 48 pixels), O-Net has the widest receptive field, allowing it to integrate contextual information from a larger area of the image. This aids accurate face and landmark localization even in complex scenarios.
- Additional Outputs: O-Net processes candidates much as R-Net does, but in addition to performing face classification it also outputs bounding-box regressions and landmark locations. (A simplified sketch of the three-stage inference loop follows this list.)
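To ground the three stages, the Python sketch below shows one plausible way the cascade could be chained at inference time. It is an illustration under assumptions, not the paper’s implementation: the `pnet`, `rnet`, and `onet` callables stand in for the trained TB-MTCNN stages, and the score thresholds and the 0.709 pyramid factor are conventional MTCNN defaults rather than values taken from the paper.

```python
import cv2  # used only for resizing pyramid levels and crops
import numpy as np


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)


def nms(dets, thresh=0.7):
    """Greedy non-maximum suppression over (box, score) pairs."""
    dets = sorted(dets, key=lambda d: d[1], reverse=True)
    kept = []
    for d in dets:
        if all(iou(d[0], k[0]) <= thresh for k in kept):
            kept.append(d)
    return kept


def clipped_crop(image, box, size):
    """Resize the clipped box region to the stage's input resolution."""
    h, w = image.shape[:2]
    x1, y1 = max(int(box[0]), 0), max(int(box[1]), 0)
    x2, y2 = min(int(box[2]), w), min(int(box[3]), h)
    return cv2.resize(image[y1:y2, x1:x2], (size, size))


def detect_faces(image, pnet, rnet, onet, thresholds=(0.6, 0.7, 0.8)):
    """Three-stage cascade; pnet/rnet/onet are injected stand-ins for the
    trained stages and return (box, score[, landmarks]) per candidate."""
    # Stage 1: P-Net on an image pyramid gives multi-scale proposals.
    candidates, scale = [], 1.0
    while min(image.shape[:2]) * scale >= 12:       # 12 x 12 receptive field
        level = cv2.resize(image, None, fx=scale, fy=scale)
        for box, score in pnet(level):
            if score >= thresholds[0]:              # probabilistic filter
                candidates.append((tuple(v / scale for v in box), score))
        scale *= 0.709                              # conventional pyramid step
    candidates = nms(candidates)

    # Stage 2: R-Net rescores 24 x 24 crops, pruning false positives.
    refined = []
    for box, _ in candidates:
        new_box, score = rnet(clipped_crop(image, box, 24))
        if score >= thresholds[1]:
            refined.append((new_box, score))
    refined = nms(refined)

    # Stage 3: O-Net tightens boxes and predicts landmarks on 48 x 48 crops.
    outputs = []
    for box, _ in refined:
        new_box, score, landmarks = onet(clipped_crop(image, box, 48))
        if score >= thresholds[2]:
            outputs.append((new_box, score, landmarks))
    return outputs
```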
3.2. Face-Tracking Algorithm
3.3. Face Classifier
4. Experiments
4.1. Effectiveness of the Proposed TB-MTCNN Model
4.2. Fine-Tuning the Face Classifier
4.3. Training the Deep CNN as a Face Appearance Descriptor
4.4. Face Recognition and Tracking Experiments
5. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Correction Statement
References
- Carstensen, L.L. The Influence of a Sense of Time on Human Development. Science 2006, 312, 1913–1915. [Google Scholar] [CrossRef] [PubMed]
- Best, J.J. Who Talked to the President When? A Study of Lyndon B. Johnson. Political Sci. Q. 1988, 103, 531–545. [Google Scholar] [CrossRef]
- Sun, X.; Wu, P.; Hoi, S.C. Face detection using deep learning: An improved faster RCNN approach. Neurocomputing 2018, 299, 42–50. [Google Scholar] [CrossRef]
- Ding, C.; Tao, D. Trunk-Branch Ensemble Convolutional Neural Networks for Video-Based Face Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 1002–1014. [Google Scholar] [CrossRef]
- Hadid, A.; Pietikainen, M. From still image to video-based face recognition: An experimental analysis. In Proceedings of the 6th IEEE International Conference on Automatic Face and Gesture Recognition, Seoul, Republic of Korea, 19 May 2004; pp. 813–818. [Google Scholar] [CrossRef]
- Li, Z.; Tie, Y.; Qi, L. Face Recognition in Real-world Internet Videos Based on Deep Learning. In Proceedings of the 2019 8th International Symposium on Next Generation Electronics (ISNE), Zhengzhou, China, 9–10 October 2019; pp. 1–3. [Google Scholar] [CrossRef]
- Huang, Z.; Shan, S.; Wang, R.; Zhang, H.; Lao, S.; Kuerban, A.; Chen, X. A Benchmark and Comparative Study of Video-Based Face Recognition on COX Face Database. IEEE Trans. Image Process. 2015, 24, 5967–5981. [Google Scholar] [CrossRef]
- Ong, E.P.; Loke, M.H.; Lin, W.; Lu, Z.; Yao, S. Video Quality Metrics—An Analysis for Low Bit Rate Videos. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing—ICASSP ’07, Honolulu, HI, USA, 15–20 April 2007; Volume 1, pp. I-889–I-892. [Google Scholar] [CrossRef]
- Li, M.; Jianbin, S.; Hui, L. A Determining Method of Frame Rate and Resolution to Boost the Video Live QoE. In Proceedings of the 2nd International Conference on Multimedia and Image Processing (ICMIP), Wuhan, China, 17–19 March 2017; pp. 206–209. [Google Scholar] [CrossRef]
- Kharchevnikova, A.; Savchenko, A.V. Efficient video face recognition based on frame selection and quality assessment. PeerJ Comput. Sci. 2021, 7, e391. [Google Scholar] [CrossRef] [PubMed]
- Taskiran, M.; Kahraman, N.; Eroglu Erdem, C. Hybrid face recognition under adverse conditions using appearance-based and dynamic features of smile expression. IET Biom. 2021, 10, 99–115. [Google Scholar] [CrossRef]
- Handa, A.; Agarwal, R.; Kohli, N. Incremental approach for multi-modal face expression recognition system using deep neural networks. Int. J. Comput. Vis. Robot. 2021, 11, 1–20. [Google Scholar] [CrossRef]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar] [CrossRef]
- Xu, Y.; Yan, W.; Sun, H.; Yang, G.; Luo, J. CenterFace: Joint Face Detection and Alignment Using Face as Point. arXiv 2019, arXiv:1911.03599. [Google Scholar] [CrossRef]
- He, Y.; Xu, D.; Wu, L.; Jian, M.; Xiang, S.; Pan, C. LFFD: A Light and Fast Face Detector for Edge Devices. arXiv 2019, arXiv:1904.10633. [Google Scholar]
- Zhang, S.; Zhu, X.; Lei, Z.; Shi, H.; Wang, X.; Li, S.Z. S3FD: Single Shot Scale-invariant Face Detector. arXiv 2017, arXiv:1708.05237. [Google Scholar]
- Yang, S.; Luo, P.; Loy, C.C.; Tang, X. WIDER FACE: A Face Detection Benchmark. arXiv 2015, arXiv:1511.06523. [Google Scholar]
- Yang, S.; Luo, P.; Loy, C.C.; Tang, X. From facial parts responses to face detection: A deep learning approach. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 3676–3684. [Google Scholar]
- Zhu, C.; Zheng, Y.; Luu, K.; Savvides, M. CMS-RCNN: Contextual Multi-Scale Region-based CNN for Unconstrained Face Detection. arXiv 2016, arXiv:1606.05413. [Google Scholar]
- Li, J.; Wang, Y.; Wang, C.; Tai, Y.; Qian, J.; Yang, J.; Wang, C.; Li, J.; Huang, F. DSFD: Dual Shot Face Detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5055–5064. [Google Scholar] [CrossRef]
- Albiero, V.; Chen, X.; Yin, X.; Pang, G.; Hassner, T. img2pose: Face Alignment and Detection via 6DoF, Face Pose Estimation. arXiv 2021, arXiv:2012.07791. [Google Scholar]
- Chou, K.; Cheng, Y.; Chen, W.; Chen, Y. Multi-task Cascaded and Densely Connected Convolutional Networks Applied to Human Face Detection and Facial Expression Recognition System. In Proceedings of the 2019 International Automatic Control Conference (CACS), Keelung, Taiwan, 13–16 November 2019; pp. 1–6. [Google Scholar] [CrossRef]
- Wojke, N.; Bewley, A.; Paulus, D. Simple Online and Realtime Tracking with a Deep Association Metric. arXiv 2017, arXiv:1703.07402. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385. [Google Scholar]
- Wan, Z.; Zhang, B.; Chen, D.; Zhang, P.; Chen, D.; Liao, J.; Wen, F. Bringing Old Photos Back to Life. arXiv 2020, arXiv:2004.09484. [Google Scholar]
- Høye, T.T.; Ärje, J.; Bjerge, K.; Hansen, O.L.; Iosifidis, A.; Leese, F.; Mann, H.M.; Meissner, K.; Melvad, C.; Raitoharju, J. Deep learning and computer vision will transform entomology. Proc. Natl. Acad. Sci. USA 2021, 118, e2002545117. [Google Scholar] [CrossRef]
- LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1989, 1, 541–551. [Google Scholar] [CrossRef]
- Sadiku, M.N.O.; Zhou, Y.; Musa, S.M. Smart Computing. Int. J. Eng. Res. Adv. Technol. 2019, 5, 26–29. [Google Scholar] [CrossRef]
- Nandal, P.; Bura, D.; Singh, M. Emerging Trends of Big Data in Cloud Computing. In Applications of Big Data in Large-and Small-Scale Systems; IGI Global: Hershey, PA, USA, 2021; pp. 38–55. [Google Scholar]
- Lim, H.-I. A Study on Dropout Techniques to Reduce Overfitting in Deep Neural Networks. In Advanced Multimedia and Ubiquitous Engineering: MUE-FutureTech; Park, J.J., Loia, V., Pan, Y., Sung, Y., Eds.; Springer: Singapore, 2021; pp. 133–139. [Google Scholar]
- Chen, T.; Zhang, Z.; Liu, S.; Chang, S.; Wang, Z. Robust overfitting may be mitigated by properly learned smoothening. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 4 May 2021; Volume 1. [Google Scholar]
- Shi, X.; Liu, Y. Sample Contribution Pattern Based Big Data Mining Optimization Algorithms. IEEE Access 2021, 9, 32734–32746. [Google Scholar] [CrossRef]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar] [CrossRef]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
- Hu, P.; Ramanan, D. Finding Tiny Faces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1522–1530. [Google Scholar] [CrossRef]
- Yoo, Y.; Han, D.; Yun, S. EXTD: Extremely Tiny Face Detector via Iterative Filter Reuse. arXiv 2019, arXiv:1906.06579. [Google Scholar]
- Li, H.; Lin, Z.; Shen, X.; Brandt, J.; Hua, G. A convolutional neural network cascade for face detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 5325–5334. [Google Scholar] [CrossRef]
- Zhang, F.; Fan, X.; Ai, G.; Song, J.; Qin, Y.; Wu, J. Accurate Face Detection for High Performance. arXiv 2019, arXiv:1905.01585. [Google Scholar]
- Zhang, C.; Xu, X.; Tu, D. Face Detection Using Improved Faster RCNN. arXiv 2018, arXiv:1802.02142. [Google Scholar]
- Wang, Y.; Ji, X.; Zhou, Z.; Wang, H.; Li, Z. Detecting Faces Using Region-based Fully Convolutional Networks. arXiv 2017, arXiv:1709.05256. [Google Scholar]
- Taigman, Y.; Yang, M.; Ranzato, M.; Wolf, L. DeepFace: Closing the Gap to Human-Level Performance in Face Verification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1701–1708. [Google Scholar] [CrossRef]
- Schroff, F.; Kalenichenko, D.; Philbin, J. FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 815–823. [Google Scholar] [CrossRef]
- Yi, D.; Lei, Z.; Liao, S.; Li, S.Z. Learning Face Representation from Scratch. arXiv 2014, arXiv:1411.7923. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. arXiv 2014, arXiv:1409.4842. [Google Scholar]
- Kim, C.; Li, F.; Ciptadi, A.; Rehg, J.M. Multiple Hypothesis Tracking Revisited. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 4696–4704. [Google Scholar] [CrossRef]
- Rezatofighi, S.H.; Milan, A.; Zhang, Z.; Shi, Q.; Dick, A.; Reid, I. Joint Probabilistic Data Association Revisited. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 3047–3055. [Google Scholar] [CrossRef]
- Kim, C.; Fuxin, L.; Alotaibi, M.; Rehg, J.M. Discriminative Appearance Modeling with Multi-track Pooling for Real-time Multi-object Tracking. arXiv 2021, arXiv:2101.12159. [Google Scholar]
- Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; Upcroft, B. Simple online and realtime tracking. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016. [Google Scholar] [CrossRef]
- Giakoumis, I.; Nikolaidis, N.; Pitas, I. Digital image processing techniques for the detection and removal of cracks in digitized paintings. IEEE Trans. Image Process. 2006, 15, 178–188. [Google Scholar] [CrossRef] [PubMed]
- Chang, R.C.; Sie, Y.L.; Chou, S.M.; Shih, T. Photo Defect Detection for Image Inpainting. In Proceedings of the 7th IEEE International Symposium on Multimedia (ISM’05), Irvine, CA, USA, 14 December 2005; p. 5. [Google Scholar] [CrossRef]
- Whyte, O.; Sivic, J.; Zisserman, A.; Ponce, J. Non-uniform deblurring for shaken images. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 491–498. [Google Scholar] [CrossRef]
- Noroozi, M.; Chandramouli, P.; Favaro, P. Motion Deblurring in the Wild. arXiv 2017, arXiv:1701.01486. [Google Scholar]
- Fan, S.; Luo, Y. Deblurring Processor for Motion-Blurred Faces Based on Generative Adversarial Networks. arXiv 2021, arXiv:2103.02121. [Google Scholar]
- Kupyn, O.; Budzan, V.; Mykhailych, M.; Mishkin, D.; Matas, J. DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks. arXiv 2018, arXiv:1711.07064. [Google Scholar]
- Lenka, M.K.; Pandey, A.; Mittal, A. Blind Deblurring Using GANs. arXiv 2019, arXiv:1907.11880. [Google Scholar]
- Ghosh, S.S.; Hua, Y.; Mukherjee, S.S.; Robertson, N.M. Improving Detection And Recognition Of Degraded Faces By Discriminative Feature Restoration Using GAN. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Abu Dhabi, UAE, 25–28 October 2020; pp. 2146–2150. [Google Scholar] [CrossRef]
- Wojke, N.; Bewley, A. Deep Cosine Metric Learning for Person Re-Identification. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 748–756. [Google Scholar] [CrossRef]
- Koonce, B. ResNet 50. In Convolutional Neural Networks with Swift for Tensorflow; Springer: Berlin/Heidelberg, Germany, 2021; pp. 63–72. [Google Scholar]
- Huang, G.B.; Ramesh, M.; Berg, T.; Learned-Miller, E. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments; Technical Report 07-49; University of Massachusetts: Amherst, MA, USA, 2007. [Google Scholar]
P-Net convolutional layers

In Channels | Out Channels | Kernel Size | Stride | Padding
---|---|---|---|---
3 | 10 | 3 | 1 | 0
10 | 10 | 2 | 2 | 1
10 | 16 | 3 | 1 | 0
10 | 16 | 1 | 1 | 0
16 | 32 | 3 | 1 | 0
16 | 32 | 5 | 1 | 0
32 | 2 | 1 | 1 | 0
32 | 4 | 1 | 1 | 0
32 | 10 | 1 | 1 | 0
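The spatial size of each feature map in these tables follows from the standard convolution arithmetic: a layer with kernel size $k$, stride $s$, and padding $p$ maps an input side length $n_{\text{in}}$ to

$$ n_{\text{out}} = \left\lfloor \frac{n_{\text{in}} + 2p - k}{s} \right\rfloor + 1. $$

For example, the first P-Net convolution ($k = 3$, $s = 1$, $p = 0$) maps the 12 × 12 input to a 10 × 10 feature map, since $\lfloor (12 - 3)/1 \rfloor + 1 = 10$.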
R-Net convolutional layers

In Channels | Out Channels | Kernel Size | Stride | Padding
---|---|---|---|---
3 | 28 | 3 | 1 | 0
28 | 28 | 3 | 2 | 1
28 | 48 | 3 | 1 | 0
48 | 48 | 3 | 2 | 0
28 | 48 | 5 | 1 | 0
28 | 48 | 7 | 1 | 0
48 | 64 | 2 | 1 | 0
48 | 64 | 5 | 1 | 0
48 | 64 | 3 | 1 | 0

R-Net fully connected layers

In Units | Out Units
---|---
192 × 3 × 3 | 128
128 | 1
128 | 4
128 | 10
O-Net convolutional layers

In Channels | Out Channels | Kernel Size | Stride | Padding
---|---|---|---|---
3 | 32 | 3 | 1 | 0
32 | 32 | 2 | 2 | 1
32 | 64 | 3 | 1 | 0
64 | 64 | 3 | 2 | 0
64 | 64 | 2 | 1 | 0
64 | 64 | 2 | 2 | 0
64 | 128 | 5 | 1 | 0
128 | 128 | 2 | 2 | 0
64 | 128 | 7 | 1 | 0
128 | 128 | 2 | 1 | 0
64 | 128 | 2 | 1 | 0

O-Net fully connected layers

In Units | Out Units
---|---
384 × 3 × 3 | 256
256 | 1
256 | 4
256 | 10
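To make the trunk-branch idea concrete, below is a minimal PyTorch sketch of an R-Net-style stage reconstructed from the tables above: a shared trunk followed by three parallel branches whose 64-channel outputs concatenate into the 192 × 3 × 3 feature consumed by the 128-unit fully connected layer. The channel counts, kernel sizes, strides, and paddings come from the table; the use of max pooling for the stride-2 rows, the PReLU activations, the branch grouping, and the sigmoid on the face score are assumptions, not the authors’ confirmed implementation.

```python
import torch
import torch.nn as nn


class RNetTrunkBranch(nn.Module):
    """Illustrative R-Net stage as a trunk-branch network, reconstructed
    from the layer tables. Pooling, activations, and branch grouping are
    assumed; channels/kernels and the 192 x 3 x 3 concatenated feature
    agree with the tables."""

    def __init__(self):
        super().__init__()
        # Shared trunk: (3, 24, 24) -> (28, 11, 11)
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 28, kernel_size=3), nn.PReLU(28),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )
        # Three parallel branches, each ending at (64, 3, 3)
        self.branch_a = nn.Sequential(                 # 11 -> 9 -> 4 -> 3
            nn.Conv2d(28, 48, kernel_size=3), nn.PReLU(48),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(48, 64, kernel_size=2), nn.PReLU(64),
        )
        self.branch_b = nn.Sequential(                 # 11 -> 7 -> 3
            nn.Conv2d(28, 48, kernel_size=5), nn.PReLU(48),
            nn.Conv2d(48, 64, kernel_size=5), nn.PReLU(64),
        )
        self.branch_c = nn.Sequential(                 # 11 -> 5 -> 3
            nn.Conv2d(28, 48, kernel_size=7), nn.PReLU(48),
            nn.Conv2d(48, 64, kernel_size=3), nn.PReLU(64),
        )
        # Concatenated 192 x 3 x 3 feature -> 128-unit embedding
        self.fc = nn.Sequential(nn.Flatten(),
                                nn.Linear(192 * 3 * 3, 128), nn.PReLU(128))
        self.face_score = nn.Linear(128, 1)    # face / non-face
        self.bbox_reg = nn.Linear(128, 4)      # bounding-box regression
        self.landmarks = nn.Linear(128, 10)    # five (x, y) landmark pairs

    def forward(self, x):                      # x: (N, 3, 24, 24)
        t = self.trunk(x)
        f = self.fc(torch.cat(
            [self.branch_a(t), self.branch_b(t), self.branch_c(t)], dim=1))
        return (torch.sigmoid(self.face_score(f)),
                self.bbox_reg(f), self.landmarks(f))
```

A quick shape check: `RNetTrunkBranch()(torch.randn(1, 3, 24, 24))` yields outputs of shape (1, 1), (1, 4), and (1, 10) for the face score, box regression, and landmarks, consistent with the fully connected table.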
Method | Precision | Recall | F1 Score |
---|---|---|---|
Two-Stage CNN [17] | 0.589 | 0.496 | 0.539 |
Faceness [18] | 0.604 | 0.617 | 0.610 |
LFFD [15] | 0.865 | 0.693 | 0.769 |
CMS-RCNN [19] | 0.874 | 0.704 | 0.779 |
img2pose [21] | 0.891 | 0.735 | 0.805 |
MTCNN [22] | 0.820 | 0.636 | 0.716 |
TB-MTCNN (present) | 0.883 | 0.713 | 0.789 |
Method | Precision | Recall | F1 Score |
---|---|---|---|
Two-Stage CNN [17] | 0.750 | 0.631 | 0.686 |
Faceness [18] | 0.807 | 0.716 | 0.759 |
LFFD [15] | 0.938 | 0.940 | 0.939 |
CMS-RCNN [19] | 0.943 | 0.937 | 0.940 |
img2pose [21] | 0.962 | 0.950 | 0.956 |
MTCNN [22] | 0.925 | 0.892 | 0.908 |
TB-MTCNN (present) | 0.964 | 0.942 | 0.945 |
Method | Precision | Recall | F1 Score |
---|---|---|---|
Two-Stage CNN [17] | 0.517 | 0.192 | 0.280 |
Faceness [18] | 0.548 | 0.206 | 0.299 |
LFFD [15] | 0.829 | 0.315 | 0.457 |
CMS-RCNN [19] | 0.831 | 0.319 | 0.461 |
img2pose [21] | 0.842 | 0.336 | 0.480 |
MTCNN [22] | 0.750 | 0.223 | 0.344 |
TB-MTCNN (present) | 0.836 | 0.325 | 0.468 |
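For reference, the F1 scores in these tables are the harmonic mean of precision ($P$) and recall ($R$),

$$ F_1 = \frac{2PR}{P + R}, $$

so, for example, TB-MTCNN’s precision of 0.883 and recall of 0.713 in the first table give $2 \times 0.883 \times 0.713 / (0.883 + 0.713) \approx 0.789$, matching the tabulated value.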
Method | Recall Rate | Precision |
---|---|---|
TB-MTCNN + ResNet | 0.685 | 0.835 |
TB-MTCNN + ResNet + SORT | 0.754 | 0.812 |
TB-MTCNN + ResNet + Deep SORT | 0.863 | 0.834 |
TB-MTCNN + ResNet + Deep SORT + Image Restoration | 0.879 | 0.845 |
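The rows above add one component at a time. As a hypothetical illustration of how the fullest configuration could fit together per frame, consider the sketch below; `restore`, `detect`, `embed`, the tracker interface, and the track attributes are stand-in names, not the authors’ code.

```python
def crops(image, boxes):
    """Cut each (x1, y1, x2, y2) box out of the frame."""
    return [image[y1:y2, x1:x2] for (x1, y1, x2, y2) in boxes]


def run_pipeline(frames, restore, detect, embed, tracker, classify):
    """Per-frame loop for the last table row: restoration -> TB-MTCNN
    detection -> ResNet appearance embedding -> Deep SORT association
    -> identity classification. All five callables are hypothetical
    stand-ins for the corresponding components."""
    results = []
    for frame in frames:
        clean = restore(frame)                   # deblur/restore first so the
                                                 # detector sees a sharper frame
        boxes = detect(clean)                    # face bounding boxes
        feats = [embed(c) for c in crops(clean, boxes)]
        tracks = tracker.update(boxes, feats)    # Kalman motion + appearance
                                                 # matching, as in Deep SORT
        results.append([(t.track_id, classify(t.crop)) for t in tracks])
    return results
```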
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).