The Eyes: A Source of Information for Detecting Deepfakes
Abstract
1. Introduction
- We introduce a mechanism to identify forged images by leveraging two robust physiological cues: pupil shape and the consistency of corneal reflections between the two eyes.
- We present a novel deepfake detection framework that focuses on the unique properties of the eyes, which are among the most challenging facial features for GANs to replicate accurately. The dual detection layers work in an end-to-end manner to produce comprehensive and effective detection outcomes.
- Our extensive experiments demonstrate that our method outperforms existing approaches in detection accuracy, generalization, and robustness.
2. Related Work
2.1. Structure of the Human Eyes
2.2. Generation of Human Faces Using GAN
2.3. Detection Method Based on Physical Properties
3. Motivation
4. Proposed Method
4.1. Processing and Verification Step
4.2. Detection Step
4.2.1. Verification of Pupil Shape
Algorithm 1 Sum of squared distances (pseudocode): for each epoch, accumulate the squared-distance terms and return the resulting sum.
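To make the pupil-shape verification concrete, here is a minimal Python sketch of one plausible reading of Algorithm 1; it is our illustrative reconstruction, not the authors' implementation. It assumes pupil-boundary points have already been extracted from a segmentation mask, fits an ellipse with OpenCV's cv2.fitEllipse, and returns the sum of squared distances from the boundary points to the fitted ellipse (approximated by densely sampling the ellipse). The function name, the sampling approximation, and the parameter choices are assumptions.

```python
import numpy as np
import cv2


def pupil_ellipse_residual(boundary_pts, n_samples=360):
    """Fit an ellipse to pupil-boundary points and return the sum of
    squared distances from each point to the fitted ellipse."""
    pts = np.asarray(boundary_pts, dtype=np.float32)  # shape (N, 2), N >= 5

    # cv2.fitEllipse returns ((cx, cy), (axis_1, axis_2), angle_in_degrees).
    (cx, cy), (ax1, ax2), angle = cv2.fitEllipse(pts.reshape(-1, 1, 2))

    # Densely sample the fitted ellipse; the distance to the nearest sample
    # approximates the point-to-ellipse distance.
    t = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    theta = np.deg2rad(angle)
    ex = (cx + 0.5 * ax1 * np.cos(t) * np.cos(theta)
          - 0.5 * ax2 * np.sin(t) * np.sin(theta))
    ey = (cy + 0.5 * ax1 * np.cos(t) * np.sin(theta)
          + 0.5 * ax2 * np.sin(t) * np.cos(theta))
    ellipse_pts = np.stack([ex, ey], axis=1)  # (n_samples, 2)

    # Squared distance from every boundary point to its closest ellipse sample.
    d2 = ((pts[:, None, :] - ellipse_pts[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())
```

A pupil whose boundary deviates strongly from the fitted ellipse yields a large residual, which is the kind of irregular-pupil artifact the detection step looks for in GAN-generated faces.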
4.2.2. Verification of Corneal Light Reflections
- The eyes are pointed straight ahead, ensuring that the line joining the centers of both eyes is parallel to the camera.
- The eyes are positioned at a specific distance from the light source.
- Both eyes have a clear line of sight to all the light sources or reflective surfaces within the environment.
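Under these assumptions, the specular highlights on the two corneas should be nearly identical, so they can be compared directly. The following is a minimal Python sketch of such a comparison (an assumed pipeline, not the authors' exact code): it isolates the bright highlight pixels in each eye crop with a simple fixed threshold (standing in for an automatic thresholding scheme such as Yen et al.'s), resizes both masks to a common grid, and scores their agreement with IoU. The function name and the `size` and `thresh` parameters are illustrative.

```python
import numpy as np
import cv2


def highlight_iou(left_eye_gray, right_eye_gray, size=64, thresh=220):
    """Compare the corneal specular highlights of both eyes via IoU.

    Inputs are grayscale crops of the two corneas (already localized);
    bright pixels above `thresh` are treated as the highlight."""
    masks = []
    for crop in (left_eye_gray, right_eye_gray):
        crop = cv2.resize(crop, (size, size), interpolation=cv2.INTER_AREA)
        _, mask = cv2.threshold(crop, thresh, 1, cv2.THRESH_BINARY)
        masks.append(mask.astype(bool))

    inter = np.logical_and(masks[0], masks[1]).sum()
    union = np.logical_or(masks[0], masks[1]).sum()
    return inter / union if union > 0 else 0.0
```

For genuine photographs taken under the conditions listed above, the two highlights overlap strongly and the score is high; GAN-generated faces tend to produce mismatched or inconsistent highlights and therefore a noticeably lower score.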
4.2.3. Global Approach of the Method
5. Experiments
5.1. Real Images Dataset
5.2. Fake Images Dataset
6. Results and Discussion
Comparison with Current Physiological Techniques
- FFHQ and StyleGAN2: The model achieves a mean IoU of 0.85 (variance 0.03) and a mean BIoU of 0.75 (variance 0.02), indicating stable, precise, and reliable separation of real images from deepfakes.
- CelebA and ProGAN: The mean IoU of 0.72 shows that the model captures the similarity between corneal reflections reasonably well, though with room for improvement, and the higher variance of 0.08 indicates some variability across samples. The mean BIoU of 0.78 reflects fairly accurate recovery of the elliptical pupil shape, albeit less effective than on FFHQ and StyleGAN2, and its variance of 0.09 reveals some inconsistency that calls for further refinement.
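For reference, both scores discussed above can be computed directly from binary masks. The sketch below shows plain IoU and a Boundary IoU in the spirit of Cheng et al., where the boundary of a mask is the band left after subtracting its erosion; the band width `d` is an illustrative parameter, not necessarily the value used in the paper.

```python
import numpy as np
import cv2


def iou(a, b):
    """Plain IoU between two boolean masks of identical shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0


def boundary_iou(a, b, d=2):
    """IoU restricted to a thin band around each mask's contour."""
    kernel = np.ones((2 * d + 1, 2 * d + 1), np.uint8)

    def boundary(mask):
        m = mask.astype(np.uint8)
        eroded = cv2.erode(m, kernel, iterations=1)
        return (m - eroded).astype(bool)  # mask minus its erosion = boundary band

    return iou(boundary(a), boundary(b))
```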
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680. [Google Scholar]
- Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and improving the image quality of stylegan. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8110–8119. [Google Scholar]
- Karras, T.; Aila, T.; Laine, S.; Lehtinen, J. Progressive growing of GANs for improved quality, stability, and variation. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Guo, H.; Hu, S.; Wang, X.; Chang, M.-C.; Lyu, S. Eyes tell all: Irregular pupil shapes reveal gan-generated faces. In Proceedings of the ICASSP 2022—2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 22–27 May 2022; pp. 2904–2908. [Google Scholar]
- Hu, S.; Li, Y.; Lyu, S. Exposing gan-generated faces using inconsistent corneal specular highlights. In Proceedings of the ICASSP 2021—2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 2500–2504. [Google Scholar]
- Tchaptchet, E.; Tagne, E.F.; Acosta, J.; Danda, R.; Kamhoua, C. Detecting Deepfakes Using GAN Manipulation Defects in Human Eyes. In Proceedings of the 2024 International Conference on Computing, Networking and Communications (ICNC), Big Island, HI, USA, 19–22 February 2024; pp. 456–462. [Google Scholar]
- Marra, F.; Gragnaniello, D.; Verdoliva, L.; Poggi, G. Do gans leave artificial fingerprints? In Proceedings of the 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), San Jose, CA, USA, 28–30 March 2019; pp. 506–511. [Google Scholar]
- Wang, S.-Y.; Wang, O.; Zhang, R.; Owens, A.; Efros, A.A. CNN-generated images are surprisingly easy to spot... for now. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8695–8704. [Google Scholar]
- Matern, F.; Riess, C.; Stamminger, M. Exploiting visual artifacts to expose deepfakes and face manipulations. In Proceedings of the 2019 IEEE Winter Applications of Computer Vision Workshops (WACVW), Waikoloa Village, HI, USA, 7–11 January 2019; pp. 83–92. [Google Scholar]
- Nirkin, Y.; Wolf, L.; Keller, Y.; Hassner, T. Deepfake detection based on discrepancies between faces and their context. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 6111–6121. [Google Scholar] [CrossRef]
- Wang, J.; Wu, Z.; Ouyang, W.; Han, X.; Chen, J.; Jiang, Y.-G.; Lim, S.-N. M2tr: Multi-modal multi-scale transformers for deepfake detection. In Proceedings of the 2022 International Conference on Multimedia Retrieval, Newark, NJ, USA, 27–30 June 2022; pp. 615–623. [Google Scholar]
- Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4401–4410. [Google Scholar]
- Yang, X.; Li, Y.; Qi, H.; Lyu, S. Exposing gan-synthesized faces using landmark locations. In Proceedings of the ACM Workshop on Information Hiding and Multimedia Security (IHMMSec), Paris, France, 3–5 July 2019. [Google Scholar]
- Cozzolino, D.; Verdoliva, L. Noiseprint: A CNN-based camera model fingerprint. IEEE Trans. Inf. Forensics Secur. 2019, 15, 144–159. [Google Scholar] [CrossRef]
- Cozzolino, D.; Poggi, G.; Verdoliva, L. Extracting camera-based fingerprints for video forensics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
- Cozzolino, D.; Verdoliva, L. Camera-based Image Forgery Localization using Convolutional Neural Networks. In Proceedings of the 2018 26th European Signal Processing Conference (EUSIPCO), Rome, Italy, 3–7 September 2018. [Google Scholar]
- Li, L.; Bao, J.; Zhang, T.; Yang, H.; Chen, D.; Wen, F.; Guo, B. Face X-ray for more general face forgery detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 5001–5010. [Google Scholar]
- Ciftci, U.A.; Demir, I.; Yin, L. Fakecatcher: Detection of synthetic portrait videos using biological signals. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 1. [Google Scholar] [CrossRef] [PubMed]
- Agarwal, S.; Farid, H.; El-Gaaly, T.; Lim, S.N. Detecting deep-fake videos from appearance and behavior. In Proceedings of the 2020 IEEE International Workshop on Information Forensics and Security (WIFS), New York, NY, USA, 6–11 December 2020; pp. 1–6. [Google Scholar]
- Peng, B.; Fan, H.; Wang, W.; Dong, J.; Lyu, S. A Unified Framework for High Fidelity Face Swap and Expression Reenactment. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 3673–3684. [Google Scholar] [CrossRef]
- Mittal, T.; Bhattacharya, U.; Chandra, R.; Bera, A.; Manocha, D. Emotions don’t lie: An audio-visual deepfake detection method using affective cues. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 2823–2832. [Google Scholar]
- Zhang, Y.; Goh, J.; Win, L.L.; Thing, V.L. Image Region Forgery Detection: A Deep Learning Approach. In Proceedings of the Singapore Cyber-Security Conference (SG-CRC), Singapore, 14–15 January 2016; Volume 2016, pp. 1–11. [Google Scholar]
- Salloum, R.; Ren, Y.; Kuo, C.C.J. Image splicing localization using a multi-task fully convolutional network (MFCN). J. Vis. Commun. Image Represent. 2018, 51, 201–209. [Google Scholar] [CrossRef]
- Cheng, B.; Girshick, R.; Dollár, P.; Berg, A.C.; Kirillov, A. Boundary IoU: Improving object-centric image segmentation evaluation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 15334–15342. [Google Scholar]
- Xue, Z.; Jiang, X.; Liu, Q.; Wei, Z. Global–local facial fusion based gan generated fake face detection. Sensors 2023, 23, 616. [Google Scholar] [CrossRef] [PubMed]
- He, Y.; Yu, N.; Keuper, M.; Fritz, M. Beyond the spectrum: Detecting deepfakes via re-synthesis. arXiv 2021, arXiv:2105.14376. [Google Scholar]
- Luo, Y.; Zhang, Y.; Yan, J.; Liu, W. Generalizing face forgery detection with high-frequency features. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 16317–16326. [Google Scholar]
- Brock, A.; Donahue, J.; Simonyan, K. Large scale gan training for high fidelity natural image synthesis. arXiv 2018, arXiv:1809.11096. [Google Scholar]
- Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 214–223. [Google Scholar]
- Bulat, A.; Yang, J.; Tzimiropoulos, G. To learn image super-resolution, use a gan to learn how to do image degradation first. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 185–200. [Google Scholar]
- Alqahtani, H.; Kavakli-Thorne, M.; Kumar, G. Applications of generative adversarial networks (gans): An updated review. Arch. Comput. Methods Eng. 2021, 28, 525–552. [Google Scholar] [CrossRef]
- King, D.E. Dlib-ml: A machine learning toolkit. J. Mach. Learn. Res. 2009, 10, 1755–1758. [Google Scholar]
- Wang, C.; Wang, Y.; Zhang, K.; Muhammad, J.; Lu, T.; Zhang, Q.; Tian, Q.; He, Z.; Sun, Z.; Zhang, Y.; et al. Nir iris challenge evaluation in non-cooperative environments: Segmentation and localization. In Proceedings of the 2021 IEEE International Joint Conference on Biometrics (IJCB), Shenzhen, China, 4–7 August 2021; pp. 1–10. [Google Scholar]
- Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019; pp. 6105–6114. [Google Scholar]
- Yen, J.-C.; Chang, F.-J.; Chang, S. A new criterion for automatic multilevel thresholding. IEEE Trans. Image Process. 1995, 4, 370–378. [Google Scholar]
- Liu, Z.; Luo, P.; Wang, X.; Tang, X. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 3730–3738. [Google Scholar]
Method | Real Images | Fake Images | AUC |
---|---|---|---|
Hu et al. [5] | 500 (FFHQ) | 500 (StyleGAN2) | 0.94 |
Guo et al. [4] | 1000 (FFHQ) | 1000 (StyleGAN2) | 0.91 |
Yang et al. [13] | 50,000 (CelebA) | 25,000 (ProGAN) | 0.94 |
Xue et al. [25] | 1000 (FFHQ) | 1000 (StyleGAN2) | 0.96 |
Xue et al. [25] | 1000 (CelebA) | 1000 (ProGAN) | 0.88 |
Our method | 1000 (FFHQ) | 1000 (StyleGAN2) | 0.968 |
Our method | 1000 (CelebA) | 1000 (ProGAN) | 0.870 |
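The AUC values in the table summarize how well each method's per-image scores rank fake images above real ones. As a quick illustration (toy labels and scores, not the paper's data), such an AUC can be computed with scikit-learn:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy example: label 1 = fake, 0 = real; scores are the detector's fakeness scores.
labels = np.array([0, 0, 0, 1, 1, 1])
scores = np.array([0.10, 0.35, 0.42, 0.55, 0.80, 0.91])

print(f"AUC = {roc_auc_score(labels, scores):.3f}")  # 1.000: every fake outranks every real
```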
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).