Search Results (13)

Search Parameters:
Keywords = video face verification

21 pages, 3599 KiB  
Article
Using Deep Learning to Identify Deepfakes Created Using Generative Adversarial Networks
by Jhanvi Jheelan and Sameerchand Pudaruth
Computers 2025, 14(2), 60; https://doi.org/10.3390/computers14020060 - 10 Feb 2025
Cited by 4 | Viewed by 2241
Abstract
Generative adversarial networks (GANs) have revolutionised various fields by creating highly realistic images, videos, and audio, thus enhancing applications such as video game development and data augmentation. However, this technology has also given rise to deepfakes, which pose serious challenges due to their potential to create deceptive content. Thousands of media reports have documented such occurrences, highlighting the urgent need for reliable detection methods. This study addresses the issue by developing a deep learning (DL) model capable of distinguishing between real face images and fake ones generated by StyleGAN. Using a subset of the 140K real and fake face dataset, we explored five models: a custom CNN, ResNet50, DenseNet121, MobileNet, and InceptionV3. We leveraged the pre-trained models for their robust feature extraction and computational efficiency, which are essential for distinguishing between real and fake features. Through extensive experimentation with various dataset sizes, preprocessing techniques, and split ratios, we identified the optimal configuration. The 20k_gan_8_1_1 dataset produced the best results, with MobileNet achieving a test accuracy of 98.5%, followed by InceptionV3 at 98.0%, DenseNet121 at 97.3%, ResNet50 at 96.1%, and the custom CNN at 86.2%. All of these models were trained on only 16,000 images and validated and tested on 2000 images each. The custom CNN was built with a simpler architecture of two convolutional layers and hence lagged in accuracy due to its limited feature extraction capabilities compared with the deeper networks. This work also included the development of a user-friendly web interface, with a backend built in Flask, that enables real-time deepfake detection: users upload images for analysis, demonstrating a practical tool for platforms in need of quick verification, such as social media sites, where the model can help prevent the spread of fake content by flagging suspicious images for review. This study makes several contributions: it compares different deep learning models, including a custom CNN, to understand the balance between model complexity and accuracy in deepfake detection; it identifies the dataset setup that improves detection while keeping computational costs low; and it introduces a user-friendly web tool for real-time deepfake detection, making the research useful for social media moderation, security, and content verification. Nevertheless, identifying specific features of GAN-generated deepfakes remains challenging due to their high realism. Future work will aim to expand the dataset to all 140,000 images, refine the custom CNN to increase its accuracy, and incorporate more advanced techniques, such as Vision Transformers and diffusion models. The outcomes of this study contribute to ongoing efforts to counteract the negative impacts of GAN-generated images. Full article
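The transfer-learning recipe this abstract describes (a pre-trained backbone used as a frozen feature extractor with a small binary head) can be sketched in a few lines of Keras. This is a minimal illustration, not the authors' code: the directory layout, input size, and hyperparameters are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_detector(input_shape=(224, 224, 3)):
    # MobileNet pre-trained on ImageNet, used as a frozen feature extractor
    backbone = tf.keras.applications.MobileNet(
        include_top=False, weights="imagenet",
        input_shape=input_shape, pooling="avg")
    backbone.trainable = False

    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNet expects [-1, 1]
        backbone,
        layers.Dropout(0.3),
        layers.Dense(1, activation="sigmoid"),     # P(image is fake)
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

if __name__ == "__main__":
    # Hypothetical directory layout: data/train/real and data/train/fake
    train_ds = tf.keras.utils.image_dataset_from_directory(
        "data/train", image_size=(224, 224), batch_size=32,
        label_mode="binary")
    model = build_detector()
    model.fit(train_ds, epochs=5)
```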

19 pages, 9180 KiB  
Article
Accurate Real-Time Live Face Detection Using Snapshot Spectral Imaging Method
by Zhihai Wang, Shuai Wang, Weixing Yu, Bo Gao, Chenxi Li and Tianxin Wang
Sensors 2025, 25(3), 952; https://doi.org/10.3390/s25030952 - 5 Feb 2025
Cited by 3 | Viewed by 1494
Abstract
Traditional facial recognition is realized by algorithms based on 2D or 3D digital images; it is well developed and has found wide application in identity verification. In this work, we propose a novel live face detection (LFD) method that utilizes snapshot spectral imaging technology, which takes advantage of the distinctive reflected spectra of human faces. By employing a computational spectral reconstruction algorithm based on Tikhonov regularization, rapid and precise spectral reconstruction with a fidelity of over 99% was achieved for color checkers and various types of “face” samples. Flat face areas were extracted from the “face” images with the Dlib face detector and a Euclidean-distance selection algorithm. A large quantity of spectra were rapidly reconstructed from the selected areas and compiled into an extensive database. The convolutional neural network model trained on this database demonstrates an excellent capability for predicting different types of “faces”, with an accuracy exceeding 98%, and, across a series of evaluations, the system’s detection time consistently remained under one second, much faster than other spectral imaging LFD methods. Moreover, a pixel-level liveness detection test system was developed, and an LFD experiment shows good agreement with theoretical results, demonstrating the potential of our method in other recognition fields. The superior performance and compatibility of our method provide an alternative solution for accurate, highly integrated video LFD applications. Full article
(This article belongs to the Special Issue Advances in Optical Sensing, Instrumentation and Systems: 2nd Edition)
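The Tikhonov-regularized reconstruction step has a simple closed form: given the filters' spectral response matrix A and a pixel's filtered readings y, the spectrum estimate is x = (AᵀA + λI)⁻¹Aᵀy. A minimal sketch follows, with illustrative matrix sizes and regularization weight rather than the paper's calibrated values.

```python
import numpy as np

def reconstruct_spectrum(A, y, lam=1e-3):
    """Solve min_x ||A x - y||^2 + lam ||x||^2 in closed form."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_bands, n_filters = 61, 16           # e.g. 400-700 nm in 5 nm steps
    A = rng.random((n_filters, n_bands))  # assumed filter response matrix
    # synthetic smooth "skin" spectrum and noisy snapshot measurement
    x_true = np.exp(-0.5 * ((np.linspace(0, 1, n_bands) - 0.6) / 0.1) ** 2)
    y = A @ x_true + 1e-3 * rng.standard_normal(n_filters)
    x_hat = reconstruct_spectrum(A, y)
    fidelity = x_hat @ x_true / (np.linalg.norm(x_hat) * np.linalg.norm(x_true))
    print(f"spectral fidelity: {fidelity:.4f}")
```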

20 pages, 25584 KiB  
Article
LIDeepDet: Deepfake Detection via Image Decomposition and Advanced Lighting Information Analysis
by Zhimao Lai, Jicheng Li, Chuntao Wang, Jianhua Wu and Donghua Jiang
Electronics 2024, 13(22), 4466; https://doi.org/10.3390/electronics13224466 - 14 Nov 2024
Cited by 2 | Viewed by 2407
Abstract
The proliferation of AI-generated content (AIGC) has empowered non-experts to create highly realistic Deepfake images and videos using user-friendly software, posing significant challenges to the legal system, particularly in criminal investigations, court proceedings, and accident analyses. The absence of reliable Deepfake verification methods threatens the integrity of legal processes. In response, researchers have explored deep forgery detection, proposing various forensic techniques. However, the swift evolution of deep forgery creation and the limited generalizability of current detection methods impede practical application. We introduce a new deep forgery detection method that utilizes image decomposition and lighting inconsistency. By exploiting inherent discrepancies in imaging environments between genuine and fabricated images, this method extracts robust lighting cues and mitigates disturbances from environmental factors, revealing deeper-level alterations. A crucial element is the lighting information feature extractor, designed according to color constancy principles, to identify inconsistencies in lighting conditions. To address lighting variations, we employ a face material feature extractor using Pattern of Local Gravitational Force (PLGF), which selectively processes image patterns with defined convolutional masks to isolate and focus on reflectance coefficients, rich in textural details essential for forgery detection. Utilizing the Lambertian lighting model, we generate lighting direction vectors across frames to provide temporal context for detection. This framework processes RGB images, face reflectance maps, lighting features, and lighting direction vectors as multi-channel inputs, applying a cross-attention mechanism at the feature level to enhance detection accuracy and adaptability. Experimental results show that our proposed method performs exceptionally well and is widely applicable across multiple datasets, underscoring its importance in advancing deep forgery detection. Full article
(This article belongs to the Special Issue Deep Learning Approach for Secure and Trustworthy Biometric System)
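The PLGF step can be approximated with two fixed convolution masks whose entries fall off with distance like a gravitational field; normalizing the resulting force magnitude by local intensity suppresses illumination and leaves a reflectance-dominated pattern. A hedged sketch of this idea follows; the mask radius and normalization are generic PLGF conventions, not necessarily the paper's exact variant.

```python
import numpy as np
from scipy.ndimage import convolve

def plgf(image, radius=2):
    # gravitational-force masks: components fall off as x/r^3 and y/r^3
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    r = np.hypot(xs, ys)
    r[radius, radius] = np.inf          # exclude the center pixel (no self-force)
    mx, my = xs / r**3, ys / r**3

    img = image.astype(float) + 1.0     # avoid division by zero
    fx = convolve(img, mx, mode="reflect")
    fy = convolve(img, my, mode="reflect")
    # normalizing the force magnitude by local intensity makes the
    # pattern approximately illumination-invariant (reflectance-dominated)
    return np.arctan(np.hypot(fx, fy) / img)

if __name__ == "__main__":
    face = np.random.rand(128, 128) * 255   # stand-in for a grayscale face crop
    pattern = plgf(face)
    print(pattern.shape, float(pattern.min()), float(pattern.max()))
```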

25 pages, 8352 KiB  
Article
Real-Time Deepfake Video Detection Using Eye Movement Analysis with a Hybrid Deep Learning Approach
by Muhammad Javed, Zhaohui Zhang, Fida Hussain Dahri and Asif Ali Laghari
Electronics 2024, 13(15), 2947; https://doi.org/10.3390/electronics13152947 - 26 Jul 2024
Cited by 16 | Viewed by 8774
Abstract
Deepfake technology uses artificial intelligence to create realistic but false audio, images, and videos. It poses a significant threat to the authenticity of visual content, particularly in live-stream scenarios where immediacy of detection is crucial. Existing Deepfake detection approaches have limitations and challenges, prompting the need for more robust and accurate solutions. This research proposes an innovative approach: combining eye movement analysis with a hybrid deep learning model for real-time Deepfake detection. The proposed hybrid model integrates two deep neural network architectures, MesoNet4 and ResNet101, to leverage their respective strengths for effective Deepfake classification. MesoNet4 is a lightweight CNN designed explicitly to detect subtle manipulations in facial images, while ResNet101 handles complex visual data and robust feature extraction. By combining the localized feature learning of MesoNet4 with the deeper, more comprehensive feature representations of ResNet101, our hybrid model achieves enhanced performance in distinguishing between manipulated and authentic videos, a distinction that cannot be made with the naked eye or traditional methods. The model is evaluated on diverse datasets, including FaceForensics++, CelebV1, and CelebV2, attaining an accuracy of 0.9873 on FaceForensics++, 0.9689 on CelebV1, and 0.9790 on CelebV2, showcasing its robustness and potential for real-world deployment in content integrity verification and video forensics applications. Full article
(This article belongs to the Special Issue Artificial Intelligence in Image and Video Processing)
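The hybrid design reduces to a two-branch network: a small Meso-style CNN for local manipulation cues and a ResNet101 trunk for deep features, fused before a binary classifier. The sketch below is an assumption-laden outline (branch sizes, fusion by concatenation) rather than the authors' exact architecture.

```python
import torch
import torch.nn as nn
from torchvision import models

class MesoBranch(nn.Module):
    """Small CNN for subtle, local manipulation cues (Meso-style)."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 5, padding=2), nn.BatchNorm2d(16), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(16, out_dim)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

class HybridDetector(nn.Module):
    def __init__(self):
        super().__init__()
        resnet = models.resnet101(weights=None)  # load pre-trained weights in practice
        self.resnet = nn.Sequential(*list(resnet.children())[:-1])  # 2048-d features
        self.meso = MesoBranch()
        self.head = nn.Linear(2048 + 128, 2)     # real vs. fake logits

    def forward(self, x):
        deep = self.resnet(x).flatten(1)         # global deep features
        local = self.meso(x)                     # localized manipulation cues
        return self.head(torch.cat([deep, local], dim=1))

if __name__ == "__main__":
    logits = HybridDetector()(torch.randn(2, 3, 224, 224))
    print(logits.shape)   # torch.Size([2, 2])
```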

15 pages, 11302 KiB  
Article
Digital Image Identification and Verification Using Maximum and Preliminary Score Approach with Watermarking for Security and Validation Enhancement
by Shrikant Upadhyay, Mohit Kumar, Aditi Upadhyay, Sahil Verma, Kavita, A. S. M. Sanwar Hosen, In-Ho Ra, Maninder Kaur and Satnam Singh
Electronics 2023, 12(7), 1609; https://doi.org/10.3390/electronics12071609 - 29 Mar 2023
Cited by 5 | Viewed by 2480
Abstract
Digital face approaches have recently received great attention because of their wide variety of digital audio and visual applications. Digitized images are increasingly communicated over unsecured media such as cyberspace. Consequently, defence, clinical, medical, and other supervised images must be protected against tampering, since such manipulation could corrupt decisions based on those images. To protect the originality of digital audio/visual images, several approaches have been proposed, including traditional encoding, fragile and semi-fragile watermarking, and digital signatures based on image content. Over the last few decades, various holistic approaches have been proposed for improving image identification and verification. In this paper, a combination of feature-level and score-level techniques is used. The face is an identity of a person that reflects emotions, feelings, age, etc., and helps gather information about a person without knowing their name, caste, or age. The objective of this research article is to identify and verify real-time video images at the feature and score levels using watermarking, which helps judge the authenticity of an image at an early stage. Features are extracted with the Viterbi algorithm: the input data are first transformed into an embedded state, the matrix of the resulting transformation is evaluated, and a preliminary score estimate is generated over many iterations for each image to support validation. Finally, the tested image is verified using several approaches to protect and secure the original image being verified. This approach may be useful in surveillance applications for real-time image identification and verification. Accuracy was measured by reconfiguring the HMM; consistent segmentation and feature extraction were established by initializing the parameters and training the image features with the Viterbi algorithm. Full article
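The Viterbi algorithm at the heart of the scoring stage is standard dynamic programming over an HMM: it returns the most likely state path and its score for an observation sequence. A self-contained sketch with a toy model follows; the paper's actual states and observations come from image features that the abstract does not specify.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """pi: initial probs (S,), A: transitions (S,S), B: emissions (S,O)."""
    S, T = len(pi), len(obs)
    delta = np.zeros((T, S))            # best log-score ending in each state
    psi = np.zeros((T, S), dtype=int)   # back-pointers
    with np.errstate(divide="ignore"):  # log(0) -> -inf is fine here
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # (from-state, to-state)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(delta[-1].argmax())]             # backtrack the best path
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1], float(delta[-1].max())

if __name__ == "__main__":
    pi = np.array([0.6, 0.4])
    A = np.array([[0.7, 0.3], [0.4, 0.6]])
    B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
    states, score = viterbi(pi, A, B, obs=[0, 1, 2, 2])
    print(states, score)
```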

19 pages, 5181 KiB  
Article
Deep Clustering Efficient Learning Network for Motion Recognition Based on Self-Attention Mechanism
by Tielin Ru and Ziheng Zhu
Appl. Sci. 2023, 13(5), 2996; https://doi.org/10.3390/app13052996 - 26 Feb 2023
Cited by 5 | Viewed by 1951
Abstract
Multi-person behavior event recognition has become an increasingly challenging research field in human–computer interaction. With the rapid development of deep learning and computer vision, it plays an important role in the inference and analysis of real sports events: given video footage of a sports event, a system asked to analyze and judge the behavior trends of the athletes often runs up against the limitations of large-scale datasets and hardware, consumes a great deal of time, and yields results of limited accuracy. We therefore propose a deep clustering learning network for motion recognition under the self-attention mechanism, which efficiently addresses the accuracy and efficiency problems of sports event analysis. By using a long short-term memory network (LSTM), the method not only avoids the gradient vanishing and explosion problems of recurrent neural networks (RNNs) but also captures the internal correlations among multiple people on the field; combining the motion-coding information in key frames with deep embedded clustering (DEC) allows it to better analyze and judge the complex behavior changes of athletes. In addition, the self-attention mechanism lets us analyze the whole sports video macroscopically while focusing on specific attributes of the movement, extracting and enhancing the athletes' key posture features; it also effectively reduces the number of parameters and the computational complexity of the self-attention computation while retaining the ability to capture detail, improving the accuracy and efficiency of reasoning and judgment. Through verification on large video datasets of mainstream sports, we achieved high accuracy and improved the efficiency of inference and prediction, showing that the method is effective and feasible for the analysis of sports videos. Full article
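The core pipeline (per-frame features, LSTM temporal encoding, self-attention over time steps, and a pooled embedding that can feed both a classifier and a DEC-style clustering head) can be outlined briefly. Dimensions and head counts below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttnLSTMRecognizer(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, n_classes=10):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4,
                                          batch_first=True)
        self.cls = nn.Linear(hidden, n_classes)

    def forward(self, frames):            # frames: (B, T, feat_dim)
        h, _ = self.lstm(frames)          # temporal encoding, (B, T, hidden)
        a, _ = self.attn(h, h, h)         # self-attention re-weights time steps
        z = a.mean(dim=1)                 # pooled clip embedding
        return self.cls(z), z             # logits + embedding for DEC clustering

if __name__ == "__main__":
    model = AttnLSTMRecognizer()
    logits, emb = model(torch.randn(2, 16, 512))  # 2 clips, 16 frames each
    print(logits.shape, emb.shape)
```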

21 pages, 2071 KiB  
Article
FRMDB: Face Recognition Using Multiple Points of View
by Paolo Contardo, Paolo Sernani, Selene Tomassini, Nicola Falcionelli, Milena Martarelli, Paolo Castellini and Aldo Franco Dragoni
Sensors 2023, 23(4), 1939; https://doi.org/10.3390/s23041939 - 9 Feb 2023
Cited by 7 | Viewed by 3697
Abstract
Although face recognition technology is currently integrated into industrial applications, it has open challenges, such as verification and identification from arbitrary poses. Specifically, there is a lack of research about face recognition in surveillance videos using, as reference images, mugshots taken from multiple Points of View (POVs) in addition to the frontal picture and the right profile traditionally collected by national police forces. To start filling this gap and tackling the scarcity of databases devoted to the study of this problem, we present the Face Recognition from Mugshots Database (FRMDB). It includes 28 mugshots and 5 surveillance videos taken from different angles for 39 distinct subjects. The FRMDB is intended to analyze the impact of using mugshots taken from multiple points of view on face recognition on the frames of the surveillance videos. To validate the FRMDB and provide a first benchmark on it, we ran accuracy tests using two CNNs, namely VGG16 and ResNet50, pre-trained on the VGGFace and VGGFace2 datasets for the extraction of face image features. We compared the results to those obtained from a dataset from the related literature, the Surveillance Cameras Face Database (SCFace). In addition to showing the features of the proposed database, the results highlight that the subset of mugshots composed of the frontal picture and the right profile scores the lowest accuracy result among those tested. Therefore, additional research is suggested to understand the ideal number of mugshots for face recognition on frames from surveillance videos. Full article
(This article belongs to the Special Issue Biometric Recognition System Based on Iris, Fingerprint and Face)
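The benchmark's matching step follows a common pattern: a pre-trained CNN embeds mugshots and video frames, and identity is assigned by cosine similarity against each subject's gallery. A sketch under stated assumptions; torchvision's ResNet50 stands in for the VGGFace/VGGFace2-pretrained extractor, whose weights would be loaded in practice.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# backbone with the classification layer removed -> 2048-d embeddings
resnet = models.resnet50(weights=None)   # load VGGFace2 weights in practice
embedder = torch.nn.Sequential(*list(resnet.children())[:-1])
embedder.eval()

@torch.no_grad()
def embed(batch):                        # batch: (N, 3, 224, 224)
    return F.normalize(embedder(batch).flatten(1), dim=1)

@torch.no_grad()
def identify(frame, galleries):
    """galleries: {subject_id: (K, 3, 224, 224) tensor of K mugshot POVs}."""
    q = embed(frame.unsqueeze(0))                    # (1, 2048) query embedding
    scores = {sid: (embed(m) @ q.T).max().item()     # best-matching mugshot POV
              for sid, m in galleries.items()}
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    galleries = {"subj_01": torch.randn(5, 3, 224, 224),
                 "subj_02": torch.randn(5, 3, 224, 224)}
    who, scores = identify(torch.randn(3, 224, 224), galleries)
    print(who, scores)
```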

19 pages, 7914 KiB  
Article
Analysis of Real-Time Face-Verification Methods for Surveillance Applications
by Filiberto Perez-Montes, Jesus Olivares-Mercado, Gabriel Sanchez-Perez, Gibran Benitez-Garcia, Lidia Prudente-Tixteco and Osvaldo Lopez-Garcia
J. Imaging 2023, 9(2), 21; https://doi.org/10.3390/jimaging9020021 - 18 Jan 2023
Cited by 6 | Viewed by 3416
Abstract
In the last decade, face-recognition and -verification methods based on deep learning have increasingly used deeper and more complex architectures to obtain state-of-the-art (SOTA) accuracy. Hence, these architectures are limited to powerful devices that can handle heavy computational resources. Conversely, lightweight and efficient methods have recently been proposed to achieve real-time performance on limited devices and embedded systems. However, real-time face-verification methods struggle with problems usually solved by their heavy counterparts—for example, illumination changes, occlusions, face rotation, and distance to the subject. These challenges are strongly related to surveillance applications that deal with low-resolution face images under unconstrained conditions. Therefore, this paper compares three SOTA real-time face-verification methods for coping with specific problems in surveillance applications. To this end, we created an evaluation subset from two available datasets consisting of 3000 face images presenting face rotation and low-resolution problems. We defined five groups of face rotation with five levels of resolutions that can appear in common surveillance scenarios. With our evaluation subset, we methodically evaluated the face-verification accuracy of MobileFaceNet, EfficientNet-B0, and GhostNet. Furthermore, we also evaluated them with conventional datasets, such as Cross-Pose LFW and QMUL-SurvFace. When examining the experimental results of the three mentioned datasets, we found that EfficientNet-B0 could deal with both surveillance problems, but MobileFaceNet was better at handling extreme face rotation over 80 degrees. Full article
(This article belongs to the Special Issue Image Processing and Biometric Facial Analysis)
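Comparisons like this one typically score verification as follows: cosine similarities are computed for genuine (same-identity) and impostor pairs, and a decision threshold is swept to find the best accuracy. A minimal sketch of that protocol on synthetic scores; the paper's exact metric (e.g., EER or accuracy at a fixed threshold) may differ.

```python
import numpy as np

def verification_accuracy(genuine, impostor):
    """genuine/impostor: arrays of cosine similarities for same/different ids."""
    scores = np.concatenate([genuine, impostor])
    labels = np.concatenate([np.ones_like(genuine), np.zeros_like(impostor)])
    best = 0.0
    for t in np.unique(scores):           # sweep every candidate threshold
        acc = np.mean((scores >= t) == labels)
        best = max(best, acc)
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    genuine = np.clip(rng.normal(0.7, 0.1, 1000), -1, 1)
    impostor = np.clip(rng.normal(0.3, 0.1, 1000), -1, 1)
    print(f"best accuracy: {verification_accuracy(genuine, impostor):.3f}")
```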

14 pages, 12975 KiB  
Article
Learning Facial Motion Representation with a Lightweight Encoder for Identity Verification
by Zheng Sun, Andrew W. Sumsion, Shad A. Torrie and Dah-Jye Lee
Electronics 2022, 11(13), 1946; https://doi.org/10.3390/electronics11131946 - 22 Jun 2022
Cited by 5 | Viewed by 2310
Abstract
Deep learning became an important image classification and object detection technique more than a decade ago. It has since achieved human-like performance for many computer vision tasks. Some of them involve the analysis of the human face for applications like facial recognition, expression recognition, and facial landmark detection. In recent years, researchers have generated and made publicly available many valuable datasets that allow for the development of more accurate and robust models for these important tasks. Exploiting the information contained inside these pretrained deep structures could open the door to many new applications and provide a quick path to their success. This research focuses on a unique application that analyzes short facial motion videos for identity verification. Our proposed solution leverages the rich information in those deep structures to provide accurate face representations for facial motion analysis. We have developed two strategies to employ the information contained in existing models for image-based face analysis to learn facial motion representations for our application. Combined with those pretrained spatial feature extractors for face-related analyses, our customized sequence encoder generates accurate facial motion embeddings for the identity verification application. The experimental results show that the facial geometry information from those feature extractors is valuable and helps our model achieve an impressive average precision of 98.8% for identity verification using facial motion. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, Volume II)
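The two-stage idea (a frozen pretrained extractor turns each frame into a feature vector, and a lightweight sequence encoder turns the sequence into a fixed-length motion embedding compared by cosine similarity) can be sketched as follows. The GRU encoder, dimensions, and threshold are illustrative assumptions, not the authors' design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionEncoder(nn.Module):
    """Lightweight sequence encoder over per-frame face features."""
    def __init__(self, feat_dim=512, emb_dim=128):
        super().__init__()
        self.gru = nn.GRU(feat_dim, emb_dim, batch_first=True)

    def forward(self, frame_feats):        # (B, T, feat_dim) from a frozen CNN
        _, h = self.gru(frame_feats)       # final hidden state summarizes motion
        return F.normalize(h[-1], dim=1)   # unit-norm motion embedding

def verify(enc, seq_a, seq_b, threshold=0.8):
    """Same person iff the two motion embeddings are close enough."""
    sim = (enc(seq_a) * enc(seq_b)).sum(dim=1)   # cosine similarity
    return sim, sim >= threshold

if __name__ == "__main__":
    enc = MotionEncoder().eval()
    with torch.no_grad():
        sim, same = verify(enc, torch.randn(1, 30, 512), torch.randn(1, 30, 512))
    print(sim.item(), bool(same))
```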

15 pages, 559 KiB  
Article
Multimodal Diarization Systems by Training Enrollment Models as Identity Representations
by Victoria Mingote, Ignacio Viñals, Pablo Gimeno, Antonio Miguel, Alfonso Ortega and Eduardo Lleida
Appl. Sci. 2022, 12(3), 1141; https://doi.org/10.3390/app12031141 - 21 Jan 2022
Viewed by 2361
Abstract
This paper describes a post-evaluation analysis of the system developed by the ViVoLAB research group for the IberSPEECH-RTVE 2020 Multimodal Diarization (MD) Challenge. This challenge focuses on the study of multimodal systems for the diarization of audiovisual files and the assignment of an identity to each segment where a person is detected. In this work, we implemented two different subsystems to address this task using the audio and the video from audiovisual files separately. To develop our subsystems, we used state-of-the-art speaker and face verification embeddings extracted from publicly available deep neural networks (DNN). Different clustering techniques were also employed in combination with the tracking and identity assignment process. Furthermore, we included a novel back-end approach in the face verification subsystem to train an enrollment model for each identity, which we have previously shown to improve the results compared to averaging the enrollment data. Using this approach, we trained a learnable vector to represent each enrollment character. The loss function employed to train this vector was an approximated version of the detection cost function (aDCF), inspired by the DCF metric widely used to measure performance in verification tasks. In this paper, we also focused on exploring and analyzing the effect of training this vector with several configurations of this objective loss function. This analysis allows us to assess the impact of the configuration parameters of the loss on the number and type of errors produced by the system. Full article
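An aDCF-style objective can be built by smoothing the detection cost function: the hard accept/reject decision is replaced with a sigmoid, so the weighted sum of soft miss and false-alarm rates becomes differentiable and an enrollment vector can be trained against it directly. A hedged sketch; the cost weights and sigmoid steepness are stand-ins for the configuration parameters the paper analyzes.

```python
import torch
import torch.nn.functional as F

def adcf_loss(scores, labels, c_miss=0.75, c_fa=0.25, alpha=10.0):
    """scores: similarity to the enrollment vector; labels: 1 target, 0 non-target."""
    soft_accept = torch.sigmoid(alpha * scores)   # smooth "accepted" decision
    target = labels == 1
    p_miss = 1.0 - soft_accept[target].mean()     # soft miss rate
    p_fa = soft_accept[~target].mean()            # soft false-alarm rate
    return c_miss * p_miss + c_fa * p_fa

if __name__ == "__main__":
    emb = torch.randn(32, 128)                    # face embeddings from a DNN
    labels = torch.tensor([1] * 16 + [0] * 16)    # half target, half non-target
    enroll = torch.randn(128, requires_grad=True) # learnable identity vector
    opt = torch.optim.SGD([enroll], lr=0.1)
    for _ in range(100):
        scores = F.cosine_similarity(emb, enroll.unsqueeze(0))
        loss = adcf_loss(scores, labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(float(loss))
```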

11 pages, 1665 KiB  
Article
Libraries Fight Disinformation: An Analysis of Online Practices to Help Users’ Generations in Spotting Fake News
by Paula Herrero-Diz and Clara López-Rufino
Societies 2021, 11(4), 133; https://doi.org/10.3390/soc11040133 - 1 Nov 2021
Cited by 13 | Viewed by 7333
Abstract
The work of libraries during the COVID-19 pandemic, as facilitators of reliable information on health issues, has shown that these entities can play an active role as verification agents in the fight against disinformation (false information that is intended to mislead), focusing on media and information literacy. To help citizens, these entities have developed a wide range of actions, from online seminars on how to evaluate the quality of a source to video tutorials and repositories of resources of various kinds. To identify the most common media literacy practices in the face of fake news (news that conveys or incorporates false, fabricated, or deliberately misleading information), this exploratory study designed an ad hoc analysis sheet, validated by the inter-judge method, which was used to classify the practices of N = 216 libraries from all over the world. The results reveal that the libraries most involved in this task are those belonging to public universities. Among the actions carried out to counteract misinformation, open-access materials that favor self-learning stand out. These resources, intended primarily for university students and adults in general, target skills related to fact-checking and critical thinking. Libraries thus vindicate their role as one component of the literacy triad, together with professors and communication professionals. Full article
(This article belongs to the Special Issue Fighting Fake News: A Generational Approach)

12 pages, 1677 KiB  
Article
A Novel Video Face Verification Algorithm Based on TPLBP and the 3D Siamese-CNN
by Yu Wang, Shuyang Ma and Xuanjing Shen
Electronics 2019, 8(12), 1544; https://doi.org/10.3390/electronics8121544 - 14 Dec 2019
Cited by 3 | Viewed by 3366
Abstract
In order to reduce the computational cost of the training and testing phases of video face recognition methods based on global statistics or deep learning networks, a novel video face verification algorithm based on a three-patch local binary pattern (TPLBP) and a 3D Siamese convolutional neural network is proposed in this paper. The proposed method takes the TPLBP texture feature, which has excellent performance in face analysis, as the input of the network. In order to extract the inter-frame information of the video, the texture feature maps of multiple frames are stacked, and a shallow Siamese 3D convolutional neural network is then used for dimension reduction. The similarity of the high-level features of the video pair is computed by the shallow Siamese 3D convolutional neural network and then mapped to the interval 0 to 1 by a linear transformation; the classification result is obtained with a threshold of 0.5. In an experiment on the YouTube Faces database, the proposed algorithm achieved higher accuracy with less computational consumption than baseline methods and deep learning methods. Full article
(This article belongs to the Section Computer Science & Engineering)
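The final stage reduces to a weight-sharing 3D-CNN over stacked texture maps, a distance between the two branch embeddings mapped linearly into [0, 1], and a 0.5 decision threshold, as the abstract describes. The sketch below substitutes random tensors for the TPLBP maps and uses illustrative layer sizes.

```python
import torch
import torch.nn as nn

class Siamese3D(nn.Module):
    def __init__(self, emb_dim=64):
        super().__init__()
        self.branch = nn.Sequential(        # shared weights for both inputs
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(16, emb_dim),
        )
        self.out = nn.Linear(1, 1)          # learned linear map: distance -> score

    def forward(self, a, b):                # (B, 1, T, H, W) stacked TPLBP maps
        ea, eb = self.branch(a), self.branch(b)
        d = (ea - eb).pow(2).sum(dim=1, keepdim=True).sqrt()
        return torch.sigmoid(self.out(d))   # similarity mapped into [0, 1]

if __name__ == "__main__":
    net = Siamese3D()
    a, b = torch.rand(2, 1, 8, 64, 64), torch.rand(2, 1, 8, 64, 64)
    score = net(a, b)
    print(score.squeeze(), (score >= 0.5).squeeze())   # 0.5 decision threshold
```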

13 pages, 4571 KiB  
Article
Multi-Task Learning Using Task Dependencies for Face Attributes Prediction
by Di Fan, Hyunwoo Kim, Junmo Kim, Yunhui Liu and Qiang Huang
Appl. Sci. 2019, 9(12), 2535; https://doi.org/10.3390/app9122535 - 21 Jun 2019
Cited by 4 | Viewed by 6252
Abstract
Face attribute prediction has an increasing number of applications in human–computer interaction, face verification, and video surveillance. Various studies show that dependencies exist among face attributes. A multi-task learning architecture can build a synergy among correlated tasks through parameter sharing in the shared layers. However, the dependencies between tasks have been ignored in the task-specific layers of most multi-task learning architectures. Thus, how to further boost the performance of individual tasks by using task dependencies among face attributes is quite challenging. In this paper, we propose a multi-task learning architecture that uses task dependencies for face attribute prediction and evaluate its performance on the tasks of smile and gender prediction. The attention modules designed in the task-specific layers of our proposed architecture are used for learning task-dependent disentangled representations. The experimental results demonstrate the effectiveness of our proposed network in comparison with a traditional multi-task learning architecture and state-of-the-art methods on the Faces of the World (FotW) and Labeled Faces in the Wild-a (LFWA) datasets. Full article
(This article belongs to the Special Issue Computer Vision and Pattern Recognition in the Era of Deep Learning)
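The architecture the abstract outlines (shared layers plus task-specific heads whose attention modules re-weight the shared features per task) can be sketched with a generic gating module. The squeeze-and-excitation-style gate below is an assumption standing in for the authors' attention design.

```python
import torch
import torch.nn as nn
from torchvision import models

class AttnHead(nn.Module):
    """Task-specific head: attention gate over shared features + classifier."""
    def __init__(self, dim=512):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim // 8), nn.ReLU(),
                                  nn.Linear(dim // 8, dim), nn.Sigmoid())
        self.fc = nn.Linear(dim, 2)

    def forward(self, f):                    # f: (B, dim) shared features
        return self.fc(f * self.gate(f))     # task-dependent re-weighting

class MultiTaskFace(nn.Module):
    def __init__(self):
        super().__init__()
        resnet = models.resnet18(weights=None)   # pre-train in practice
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])  # shared layers
        self.smile, self.gender = AttnHead(512), AttnHead(512)

    def forward(self, x):
        f = self.backbone(x).flatten(1)
        return self.smile(f), self.gender(f)

if __name__ == "__main__":
    smile_logits, gender_logits = MultiTaskFace()(torch.randn(2, 3, 224, 224))
    print(smile_logits.shape, gender_logits.shape)
```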
