The experiments were designed to answer the following research questions.
4.1. Deep Learning-Based Gaze Estimation under Low and Dark Illumination Conditions
In this section, we answer our first research question: Can a method based on a deep neural network provide a good estimation of gaze points under low-light and dark illumination conditions?
First, we computed the gaze angular error of the MPIIGaze protocol on the test set. Afterward, we evaluated the performance of the MPIIGaze protocol on each of the seven aforementioned low and dark illuminated test sets. In this experiment, no enhancement method was applied when testing the network. The results enable us to determine the effect of low and dark illumination conditions on the performance of deep learning-based gaze estimation.
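Throughout these experiments, performance is measured as the mean angular error between predicted and ground-truth 3D gaze vectors. As a point of reference, the metric can be sketched as follows (the function name and array layout are our assumptions, not the paper's code):

```python
import numpy as np

def mean_angular_error_deg(pred, gt):
    """Mean angle, in degrees, between predicted and ground-truth gaze vectors.

    pred, gt: float arrays of shape (N, 3); rows need not be unit length.
    """
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=1, keepdims=True)
    # Clip the dot product to guard against values just outside [-1, 1].
    cos = np.clip(np.sum(pred * gt, axis=1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())
```

Averaging this score over all samples of all participants in a test set yields per-set values such as those reported in Table 1.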
The performance of the MPIIGaze protocol on the low and dark illuminated test sets is reported in Table 1
. The reported values are the angular error averaged over all the participants in each test set. The results in the table show that low and dark illumination conditions drastically degrade the performance of the MPIIGaze protocol. Compared to the
test set (i.e.,
), the average gaze angular error on the low illuminated sets increased to 9.71–10.4 degrees (i.e., 9.71 for
, 9.9 for
, 10.4 for
, and 10.2 for
). In addition, the mean angular error on the dark illuminated test sets increased to approximately 10 degrees (i.e., 10.56 for
, 10.4 for
, and 10.22 for
). This implies that low and dark illumination conditions increase the angular error of ResNet by as much as 80.5% (i.e., from 5.76 to 10.4 on
) and 83.3% (i.e., from 5.76 to 10.56 on
), respectively. These results confirm H1. Furthermore, the findings for the test sets containing images with directional light reveal that the brightness level does not affect performance. For example, both
caused angular errors of
. Similarly, both
caused angular errors of
. For these test sets, the direction of the light mainly affected the gaze estimation performance.
By contrast, the performance on
is largely dependent on the brightness level. As indicated in Table 1
, when tested with
, the average angular error increased by
degrees compared to
, respectively. The MPIIGaze protocol experienced the highest performance reduction on
. This result is consistent with our observation that, in the binary images of the eye (Figure 6
), the appearance of the eyeball in
is much more noisy and thus, less recognizable than in
. Finally, considering the performance difference between
(i.e., 9.9 on
and 9.71 on
), it can be inferred that gamma correction retains more of the available gaze information.
4.2. Performance Improvement of the Proposed Approach
The next experiment was carried out to answer our second research question: Could the proposed approach improve the performance of gaze estimation under low-illumination conditions?
For this experiment, we evaluated the performance of the proposed approach, which applies GAN-based image enhancement before the image preprocessing step and then performs gaze estimation using ResNet (see Figure 2).
First, we describe the enhancement of low and dark illuminated images of the eye with the GAN-based enhancement method. Illustrative examples of low-light eye images and their enhanced versions are shown in Figure 10
. As shown in Figure 10
a, the images enhanced by the generative approach clearly exhibit brighter illumination and more well-defined eye and pupil boundaries compared with the original low and dark illuminated images shown in Figure 10
b. Figure 11
represents the difference between the binary images of the enhanced eye images (Figure 11
a) and the original low/dark illuminated images (Figure 11
b). These results show that the binary images of the original low/dark illuminated images are generally noisier than those of the enhanced eye images. Specifically, the binary versions of the original images contain more dark holes, which may cause the loss of information that would be useful for gaze estimation. In particular, the binary image of the original image in
appears to be partially cropped, with the result that the eye boundary is less well-defined compared to the binary version of the enhanced image of the eye. A similar phenomenon is observed in the case of the binary images of test sets with directional lights. Based on this qualitative analysis, gaze estimation using images enhanced by the generative approach would be expected to yield superior performance.
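The binary eye images compared in this analysis come from thresholding the grayscale eye image; a minimal sketch, assuming a simple fixed threshold (the paper's exact binarization procedure may differ):

```python
import numpy as np

def binarize_eye(gray, threshold=60):
    """Mark dark pixels (pupil/iris region) as 1 and the rest as 0.

    gray: 2-D uint8 array. In a low-light image, most pixels fall near or
    below the threshold, which is why its binary version appears noisy and
    full of holes.
    """
    return (gray < threshold).astype(np.uint8)
```

Under this view, enhancement helps precisely because it restores the contrast between the dark eyeball region and its brighter surroundings before thresholding.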
Next, we present our quantitative analysis of the performance of the proposed approach. The performance of the MPIIGaze protocol and that of the proposed approach on low and dark illuminated test sets is compared in Table 2
. First, as summarized in the third and fourth row of the table, the performance of the proposed approach on
sets improved by 4.53%–8.9% compared to that of the MPIIGaze protocol. Specifically, the proposed method achieved a performance gain of 0.44–0.48 degrees (4.53%–4.8%) on the low illuminated test sets (i.e., from 9.9 to 9.42 on
and from 9.71 to 9.27 on
) and 0.95 degrees (8.9%) on the dark illuminated test set (i.e., from 10.56 to 9.61 on
), which confirms H2. This also indicates that the performance improvement with the proposed method increases as the illumination of images decreases; therefore, our hypothesis H3 can be confirmed. This is consistent with our analysis of the binary images of the eye obtained from the original images and those obtained from the images enhanced by EnlightenGAN. As mentioned earlier, we found that the binary version of the original image in
appears to be partially cropped, thereby rendering the appearance of the eye ambiguous and incomplete. We believe that this adversely affected the performance of the MPIIGaze protocol.
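The percentage gains quoted here are relative reductions of the mean angular error; for instance, using the dark illuminated set's 10.56 → 9.61 degrees:

```python
def relative_gain_pct(baseline_err, new_err):
    """Relative reduction of angular error, in percent."""
    return 100.0 * (baseline_err - new_err) / baseline_err

dark_gain = relative_gain_pct(10.56, 9.61)  # ~9.0% (reported as 8.9%)
low_gain = relative_gain_pct(9.71, 9.27)    # ~4.53%
```

The small discrepancy on the dark set (9.0% computed vs. 8.9% reported) presumably comes from truncation in the reported figure.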
Second, the proposed method was also able to effectively process the directional lighting test sets. The proposed method achieved a performance improvement of 6.8%–7.15% for the low-light directional sets (i.e., from 10.4 to 9.69 on the
test set and from 10.2 to 9.47 on the
test set). In addition, a performance improvement of 6.75%–7.01% was obtained for the dark-light directional sets (i.e., from 10.22 to 9.53 on the
test set and from 10.4 to 9.67 on the
test set). These results validated our hypothesis H2; however, hypothesis H3 was not met. There was no significant difference in performance improvement between low-light directional sets and dark-light directional sets. Similar to the experimental result reported in Section 4.1
, however, the performance differs slightly depending on the direction of the light. As with the MPIIGaze protocol, the proposed method is more effective on the right directional test sets than on the left directional test sets. As shown by the results in Table 2
, the mean angular error of the proposed method on the left directional test sets is 9.67–9.69 degrees (i.e.,
), which is larger than on the right directional test sets of 9.47–9.53 degrees (i.e.,
). This result implies that the binary eye images of the right directional sets preserve more informative data than those of the left directional sets, even though the appearance of the eyeball is still not clear. We also found that the brightness level does not affect the performance of the proposed method on the directional light test sets. For example, the proposed method has an angular error of 9.67–9.69 on
vs. 9.47–9.53 on
. This result is also consistent with our previous observation that the performance of the MPIIGaze protocol is not dependent on the brightness level when processing the directional light test sets.
In summary, we found that the proposed method outperformed the MPIIGaze protocol under various challenging conditions. The performance improvement can be attributed to the capability of the proposed approach to recover missing information from low and dark images of the eye as a result of enhancement in a generative way. More importantly, the performance improvement could be achieved without any additional hardware set-up or training.
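Structurally, the approach summarized above composes three stages. The sketch below shows that composition; `enhancer` stands in for a pretrained EnlightenGAN generator and `gaze_net` for the ResNet estimator (both callables are placeholders of ours, not APIs from the paper):

```python
def estimate_gaze(image, enhancer, preprocess, gaze_net):
    """Enhance a low/dark eye image, preprocess it, then regress the gaze.

    enhancer:   low-light image -> enhanced image (e.g., a GAN generator)
    preprocess: the usual normalization/cropping applied before the network
    gaze_net:   preprocessed image -> gaze angles (pitch, yaw)
    """
    # Enhancement runs purely on the input side, before preprocessing,
    # so the downstream gaze network needs no retraining.
    return gaze_net(preprocess(enhancer(image)))
```

This input-side composition is what allows the reported improvement without any additional hardware set-up or training.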
4.3. Comparison with Baseline Methods
In this section, we provide the answer to our final research question: Would a GAN-based adaptive enhancement method be more effective than a simple method involving manual brightness adjustment?
Our experimental results showed that the gaze estimation performance can be improved under challenging conditions when using the proposed method with EnlightenGAN, which can recover the features of low and dark illuminated eye images. However, it would also be reasonable to consider simple image processing methods, such as intensity adjustment or gamma correction, to enhance the low-light images. To address this consideration, we conducted experiments comparing the gaze estimation performance of the proposed method with that of the baseline methods, which consist of intensity adjustment and gamma correction methods. The baseline methods are as follows.
Gamma correction methods: set the gamma value to 1.5, 2, and 3.
Intensity adjustment methods: increase the intensity by 40, 70, and 110.
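Both baseline families are one-line mappings over the grayscale image. A sketch, assuming the common brightening convention out = 255·(in/255)^(1/γ) for gamma correction (the paper's exact convention is not stated here):

```python
import numpy as np

def gamma_correct(gray, gamma):
    """Brighten a uint8 image via gamma correction (gamma = 1.5, 2, or 3 above)."""
    return (255.0 * (gray / 255.0) ** (1.0 / gamma)).astype(np.uint8)

def adjust_intensity(gray, delta):
    """Add a fixed offset (40, 70, or 110 above), clipped to the valid range."""
    return np.clip(gray.astype(np.int16) + delta, 0, 255).astype(np.uint8)
```

Unlike these fixed mappings, EnlightenGAN conditions its enhancement on the input image itself; that adaptivity is precisely what the comparison in this section probes.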
Examples of images modified by the baseline methods can be seen in Figure 12
. The illumination of modified images of the eye has been changed; however, the quality of the result differs significantly depending on the particular method that was used. For example, the images that were modified by adjusting their intensity (i.e.,
) are more blurry than those processed with gamma correction (i.e.,
). In particular, the images modified by adjusting their intensity appear merely hazy rather than brightened. By contrast, the gamma correction methods successfully enhanced the images in
, such that the resultant images of the eye contained little noise and had well-defined appearances (see row (1) and (2) in Figure 12
). However, these methods failed to enhance the images in the remaining test sets (see row (3)–(7) in Figure 12
). In addition, for all the gamma correction methods, the larger the gamma value, the noisier the appearance became. Based on this qualitative result, certain baseline methods might be expected to outperform the MPIIGaze protocol.
Next, we report the quantitative results of the comparative experiments. The performance of the proposed method is compared with that of the baseline methods in Table 3
. We first discuss the results of the gamma correction methods and then analyze the performance of the intensity-based methods.
First, the average angular error of the baseline methods that employ gamma correction decreased by 0.07–0.13 degrees compared with the results obtained by using . In particular, achieved the highest performance gain among the gamma correction methods. However, it should be noted that the average angular error of the different gamma correction methods does not vary much (i.e., the average error ranges from 10.06 to 10.12). In addition, as observed from our qualitative analysis, the gamma correction methods performed well on the and test sets (i.e., the angular error was in the range 9.56 to 9.88); however, these methods were ineffective on the test set (i.e., the angular error was in the range of 10.39 to 10.48). The gamma correction methods were similarly ineffective on the directional light test sets (i.e., the angular error was 10.1–10.36). Nevertheless, the methods based on gamma correction outperformed the MPIIGaze protocol. Specifically, the average angular error decreased by 0.02–0.15 degrees, 0.08–0.17 degrees, and 0.06–0.25 degrees for and , , and the directional light test sets, respectively.
Even though the gamma correction methods slightly outperformed , they still performed worse than the proposed method. The proposed method achieved an average performance improvement of 5.67%–6.3% over the gamma correction methods in all cases. For example, gamma correction produced errors in the range 9.56 to 9.88 degrees on the and test sets, whereas the errors of the proposed method were between 9.27 and 9.42. Furthermore, the performance of the gamma correction methods became much worse on the and directional light sets (i.e., 10.25 degrees on average), whereas the errors of the proposed method increased only slightly, to 9.59 degrees on average. This means that the proposed method can recover information lost under low-light conditions more successfully than the gamma correction methods.
Second, we discuss the performance of the methods based on intensity adjustment (i.e.,
). The results in Table 3
indicate that the methods based on intensity adjustment delivered the lowest performance (i.e., the angular error was in the range 10.11 to 10.75) among the enhancement methods. In particular, the proposed method outperformed the intensity adjustment methods with decreased angular errors of 0.59–1.23 degrees (i.e., performance improvement of 5.83%–11.44%). In the case of intensity-based methods, we found that the larger the intensity value, the lower the angular error. Specifically, the performance of
(i.e., an error of 10.11) was comparable with that of
(i.e., errors of 10.12 and 10.11, respectively). On the other hand, the performance of the intensity adjustment methods also decreased as the illumination became darker, similar to the other methods. These results can be interpreted to mean that the intensity adjustment methods failed to enhance the appearance of the eye images, such that the overall gaze estimation performance was degraded compared with the other enhancement methods. Finally, we could confirm our hypothesis H4 that the baseline methods cannot be applied to practical use cases, such as a digital gallery or an art exhibition hall, where lighting conditions vary. This is because the baseline methods adjust the illumination in a fixed way (e.g., by increasing the intensity by a particular value), whereas the proposed method adaptively enhances the illumination of each image.
Next, we provide subject-level details of the experimental results for each method. The error score reported here is the mean angular error averaged across all test sets. Figure 13
presents the subject-level results of the MPIIGaze protocol and the proposed method. Overall, the proposed method outperformed the MPIIGaze protocol for most of the subjects. However, there were no significant performance differences between the MPIIGaze protocol and the proposed method for subjects S2, S3, and S10 (i.e., the errors differ by approximately 0.12 degrees). For the remaining subjects, the proposed method outperformed the MPIIGaze protocol with a moderate difference of 0.6–1.3 degrees. In the case of subject S15, the MPIIGaze protocol outperformed the proposed method by a slight margin (i.e., 10.23 for
vs. 10.38 for the proposed method).
presents the averaged mean angular errors of the gamma correction methods (referred to as
, the average of errors from
), the intensity adjustment methods (referred to as
, the average errors from
), and the proposed method. The results closely resembled the findings presented in the previous Figure 13
. First, similar to the previous result, the proposed method outperformed the baseline methods for most of the subjects. Second, significant performance differences did not exist between the baseline methods and the proposed method in the case of subjects S2, S3, and S10 (the difference between the proposed method and
is 0.01–0.05, and between
it is 0.06–0.18). For the remaining subjects, the proposed method outperformed both
with a moderate difference of 0.77 and 1.23 degrees on average, respectively. The baseline methods also outperformed the proposed method for subject S15: the angular error of the baseline methods was lower than that of the proposed method, with differences of 0.12 and 0.21 degrees, respectively.
Finally, we report the changes in the mean angular errors of each method according to training epochs in Figure 15
. In this figure,
denotes the performance of the original MPIIGaze protocol on the original MPIIGaze test set. The errors produced by all the methods gradually decreased during the first 30 epochs and then converged. At the beginning of training, the MPIIGaze protocol and the baseline methods (i.e.,
) produced high errors of 14.12, 13.82, and 12.49, respectively, whereas the error of the proposed method was relatively low (10.15). In the early stage of training, the average error of
was lower than that of
. However, at the end of training,
outperformed both the MPIIGaze protocol and
, as already summarized in Table 3
, which confirms our hypothesis H5. Another noteworthy result is that the error curve of
and that of the proposed method behaved similarly. This can be interpreted to signify that the appearance of an image enhanced by the EnlightenGAN framework closely resembles that of the original normal-light images from the MPIIGaze data set.