Low-Complexity Pupil Tracking for Sunglasses-Wearing Faces for Glasses-Free 3D HUDs

Abstract: This study proposes a pupil-tracking method applicable to drivers both with and without sunglasses on, which has greater compatibility with augmented reality (AR) three-dimensional (3D) head-up displays (HUDs). Performing real-time pupil localization and tracking is complicated by drivers wearing facial accessories such as masks, caps, or sunglasses. The proposed method fulfills two key requirements: low complexity and algorithm performance. Our system assesses both bare and sunglasses-wearing faces by first classifying images according to these modes and then assigning the appropriate eye tracker. For bare faces with unobstructed eyes, we applied our previous regression-algorithm-based method that uses scale-invariant feature transform features. For eyes occluded by sunglasses, we propose an eye position estimation method: our eye tracker uses nonoccluded face area tracking and a supervised regression-based pupil position estimation method to locate pupil centers. Experiments showed that the proposed method achieved high accuracy and speed, with a precision error of <10 mm in <5 ms for bare and sunglasses-wearing faces on both a 2.5 GHz CPU and a commercial 2.0 GHz CPU vehicle-embedded system. Coupled with its performance, the low CPU consumption (10%) demonstrated by the proposed algorithm highlights its promise for implementation in AR 3D HUD systems.


Introduction
Three-dimensional (3D) displays provide realistic visual experiences with an enhanced sense of image depth [1,2]. Recent developments in holographic optical element (HOE) technologies have increased the possibilities for commercialization of augmented reality (AR) devices, including wearable AR glasses [3]. Additionally, autostereoscopic 3D displays offer the full benefits of the 3D experience without requiring the observer to wear 3D glasses. This is enabled via the eye-tracking-based autostereoscopic 3D method, which overcomes 3D viewing zone limitations, thereby allowing users a seamless 3D experience with higher-resolution 3D content [1,2]. The eye-tracking-based autostereoscopic 3D method can be adopted in automobile head-up displays (HUDs) for drivers, which display realistic 3D navigation information about the road via combiners placed on the windshield [4][5][6]. Eye-tracking-based autostereoscopic 3D display systems require accurate and fast 3D measurements of the eye positions of the viewer to display 3D images with low 3D crosstalk and high 3D resolution. However, owing to the limited capabilities of automobile computing systems, recent deep-learning-based algorithms are not used because they typically require powerful graphics processing units (GPUs).
Fast and accurate eye-gaze-tracking technologies are required not only in glasses-free 3D HUDs, but also in many AR display systems [7]. Other applications such as antispoofing in face recognition [8] and interaction systems with virtual contents [9] also utilize eye-gaze-tracking technologies. To date, many studies of eye tracking have focused on eye-gaze tracking for wearable devices such as head-mounted devices. These methods calculate the vector between the pupil center and the corneal reflection using near-infrared (NIR) light sources to estimate the viewer's looking direction. In most methods, the NIR camera and light sources must be placed in specific locations to achieve clear bright pupil and corneal reflection images [10][11][12][13][14][15][16]. Remote eye-tracking methods have adopted recent computer vision techniques to detect and track viewers' eyes at greater distances for various consumer electronics such as 3D televisions and monitors, gaming devices, smartphones, driver monitoring systems, and HUDs in automobiles [17][18][19]. These methods usually adopt NIR light sources to capture high-quality eye images under various light conditions [20][21][22].
The goal of gaze tracking is to estimate the viewer's gaze direction, while our eye position tracking aims to detect and track 3D eye positions using red-green-blue (RGB) web cameras at remote distances. In previous studies, we published real-time computation-based eye-tracking methods for bare faces [23,24]. This accurate and fast pupil position tracking provided clear 3D images with high 3D resolution using limited embedded system resources, even when the head movements of the users were considered in real time. However, localization and tracking of the real-time pupil position of a driver can be affected by various occlusion factors, such as the driver wearing a mask, cap, or sunglasses. If the eye-tracking precision error is larger than the 3D margin of the HUD, users can see the 3D crosstalk and will feel 3D fatigue. As the 3D crosstalk margin of our AR 3D HUD prototype is 12 mm [2], the eye-tracking precision must be less than this threshold to allow users to enjoy 3D-crosstalk-free content (Figure 1). Furthermore, a user will experience 3D crosstalk when they move their head if the overall system latency, including eye tracking and 3D rendering, is high. Complex state-of-the-art deep-learning-based algorithms for detecting facial features [25][26][27][28][29] are very accurate, but incompatible with 3D autostereoscopic display systems, which have limited GPU capacity.
The purpose of this study was to realize pupil center localization for drivers wearing sunglasses. Our proposed system consists of face detection, eye-nose shape keypoint alignment, a tracker checker, and tracking mode switching. For scenarios such as the user wearing sunglasses, corresponding shape aligners are applied using an image classification approach for pupil center localization. Occlusion, which can be exacerbated by sunlight reflection, makes pupil localization quite challenging in such scenarios. To tackle this problem, we use nonoccluded areas to infer the pupil center. The contributions of our method can be summarized by the following points:

• We propose a pupil center localization system applicable to both bare faces and sunglasses-wearing faces. We classified facial images into these two categories and performed pupil tracking accordingly.
• For sunglasses-wearing faces, we inferred the eye position behind the sunglasses by applying a supervised regression method to the non-occluded areas.

Methods
The proposed eye-tracking method, which works for bare faces and sunglasses-wearing faces, comprises two different modes. Two different machine-learning-based eye trackers, which include facial shape alignment and tracker checker modules, are adopted to deduce the eye center positions. The eye-tracking mode selection is based on an eye-area classifier, which determines whether a person is wearing sunglasses or not.
Both eye-tracking modes use the face region as their input, which can be obtained via face detection or face region extraction using the frame-by-frame tracking of facial points. Depending on facial classification differentiators, such as whether a person is wearing sunglasses, corresponding shape aligners are applied to obtain the pupil center localization. In our previous work [23,24], we developed an eye-tracking method for bare faces that involves 11-point eye-nose shape tracking based on the supervised descent method (SDM) [30] (for details of the algorithms for bare-face eye tracking, see References [23,24]). For sunglasses-wearing faces, we propose an eye-position estimation method: the eye tracker uses nonoccluded face area tracking (nose-mouth-face boundary) combined with a supervised regression-based pupil position estimation method to identify the pupil centers. The basic components of our proposed eye-tracking method for people wearing sunglasses can be divided into two main stages: (1) nonoccluded face area tracking and (2) eye-center position estimation for occluded areas. A flowchart describing the proposed eye-tracking method for bare and sunglasses-wearing faces is presented in Figure 2.
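To make this two-mode control flow concrete, the following is a minimal C++ sketch of the per-frame dispatch. The function names are illustrative stubs, not the actual implementation from [23,24]:

```cpp
#include <opencv2/core.hpp>

// Estimated pupil centers for one frame.
struct EyeCenters { cv::Point2f left, right; bool valid = false; };

// Stub hooks standing in for the real modules (illustrative only;
// the actual trackers follow References [23,24]).
static bool wearsSunglasses(const cv::Mat& /*face*/)           { return false; }
static EyeCenters trackBareFace(const cv::Mat& /*face*/)       { return {}; }
static EyeCenters trackSunglassesFace(const cv::Mat& /*face*/) { return {}; }

// Per-frame dispatch: classify the eye area of the detected face region,
// then run whichever eye-tracking mode matches the classification.
EyeCenters processFace(const cv::Mat& faceRegion)
{
    return wearsSunglasses(faceRegion) ? trackSunglassesFace(faceRegion)
                                       : trackBareFace(faceRegion);
}
```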

Figure 2. Flowchart of the proposed eye-tracking method for bare and sunglasses-wearing faces.

Whole-Face Detection and Classification of Faces with Sunglasses
We utilized the error-based learning (EBL) method published in our previous work [23], which utilizes the AdaBoost classifier [31][32][33] and local binary pattern (LBP) features [31][32][33] for facial detection tasks. The EBL framework trains on only a small fraction (less than 5%) of the detection training image databases in much shorter training times, while improving the detection rate through three stages [23]. We used a cascaded classifier with N-boosting substages for each stage in EBL [23]. Multiscale block LBP feature space was adopted for the cascaded AdaBoost classifier [23]. Unlike our previous method, which used an eye-nose detector, for this study we constructed an entire face detector to address the issue of eye occlusion by sunglasses. The cascade-AdaBoost algorithm is used extensively for facial detection tasks. Although deep-learning-based facial detection approaches such as SqueezeDet [26], region-based convolutional neural networks (R-CNN) [34], and multitask CNNs [27] have been developed recently, such techniques require considerable GPU resources and, therefore, require network lightening for integration with automobiles, for example as part of AR 3D HUD systems, which adopt commercial vehicle-embedded computing boards with limited GPU resources. Therefore, fast processing must be married with low complexity to maximize the performance of the limited computational resources of AR 3D HUD systems.
Our proposed detector is both practical and simple, requiring only a CPU. For the bare-face training dataset, we reutilized samples constructed for our previous study [23], which outlines the details of the EBL algorithm used for training involving the bare-face image datasets. For sunglasses-wearing faces, we constructed a new training image database containing 30,000 images of 37 people wearing sunglasses according to our own capturing and labeling annotations (Figure 3a). For the final face-detection training step, the 50,000 bare faces and 30,000 sunglasses-wearing faces from the training samples were used. To run two different eye-tracking modes according to the presence of sunglasses, we performed an image classification routine to separate the bare and sunglasses-wearing faces. The region surrounding the eyes was cropped using the statistical mean shape from our training dataset labeling (Figure 3b). In addition, the eye region classifications were assisted by the cascade-AdaBoost classifier and LBP features.
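As a rough sketch of this stage, the code below uses OpenCV's LBP-cascade machinery, which mirrors the cascade-AdaBoost-plus-LBP construction described above. The cascade models and the eye-band crop ratios are placeholders, not the detector and mean shape trained in this study:

```cpp
#include <opencv2/objdetect.hpp>
#include <vector>

// Detect the whole face with a cascade-AdaBoost/LBP detector, crop the eye
// band using fixed ratios standing in for the statistical mean shape, and
// classify the band as bare vs. sunglasses-wearing.
bool detectAndClassify(const cv::Mat& gray,
                       cv::CascadeClassifier& faceCascade,       // whole-face model
                       cv::CascadeClassifier& sunglassesCascade, // eye-band model
                       cv::Rect& faceOut, bool& sunglassesOut)
{
    std::vector<cv::Rect> faces;
    faceCascade.detectMultiScale(gray, faces, 1.1, 3, 0, cv::Size(80, 80));
    if (faces.empty()) return false;
    faceOut = faces[0];

    // Eye-band crop relative to the face box (illustrative mean-shape ratios).
    cv::Rect band(faceOut.x + int(0.10 * faceOut.width),
                  faceOut.y + int(0.20 * faceOut.height),
                  int(0.80 * faceOut.width),
                  int(0.30 * faceOut.height));
    cv::Mat eyeRegion = gray(band & cv::Rect(0, 0, gray.cols, gray.rows));

    // The eye-band classifier fires when dark lenses are present.
    std::vector<cv::Rect> hits;
    sunglassesCascade.detectMultiScale(eyeRegion, hits, 1.1, 2);
    sunglassesOut = !hits.empty();
    return true;
}
```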

Nose-Mouth-Face Boundary Tracking: Alignment and Tracker Checker
After the detector identifies the eye-nose region, the tracking mode starts to extract the coordinates of the pupil centers based on eye-nose shape alignments using the SDM [30] and the scale-invariant feature transform (SIFT) [35]. The SDM trains a sequence of descent directions that minimize the mean of the nonlinear least-squares functions from each landmark point [30]. SIFT extracts feature descriptors from landmark points, is invariant to translation, rotation, and scaling transformations, and is widely used in face recognition technologies [35]. This non-CNN-based method has the advantages of low CPU consumption and fast speed compared to state-of-the-art CNN-based landmark point alignment methods.
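For reference, the SDM alignment can be written as a learned cascade of linear updates on the landmark vector (the standard formulation of [30]), where $\phi(d, x_k)$ denotes the SIFT descriptors extracted at the current landmark estimate $x_k$ in image $d$:

$$x_{k+1} = x_k + R_k\,\phi(d, x_k) + b_k, \qquad k = 0, 1, \ldots, K-1.$$

Each pair $(R_k, b_k)$ is obtained by linear regression of the training-set displacements $x_* - x_k$ (with $x_*$ the ground-truth shape) onto the features $\phi(d, x_k)$, so no descent direction needs to be computed numerically at run time.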
To prevent erroneous detection or tracking, we propose a novel tracker checker. The proposed tracker checker guarantees that the aligned results contain the eyes. For each frame, after the nose-mouth-face boundary points are aligned, the proposed nose-mouth-face boundary tracker checker performs a final examination of the tracking results, determining whether the target shape is being tracked correctly or not. Thus, more efficient, faster eye-tracking system computations can be achieved.
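A minimal sketch of this validation loop, under the assumption that the checker returns a binary accept/reject decision, is shown below; the hook functions are illustrative stubs for the modules described in the text:

```cpp
#include <opencv2/core.hpp>
#include <vector>

using Shape = std::vector<cv::Point2f>;

// Illustrative stubs standing in for the real alignment, detection, and
// checker modules.
static Shape alignShape(const cv::Mat&, const Shape& prev)       { return prev; }
static Shape alignFromDetection(const cv::Mat&, const cv::Rect&) { return {}; }
static bool  detectFace(const cv::Mat&, cv::Rect&)               { return false; }
static bool  trackerChecker(const cv::Mat&, const Shape&)        { return true; }

struct TrackState { Shape shape; bool tracking = false; };

// Per-frame loop: while tracking, align from the previous shape and let the
// tracker checker validate the result; on rejection, fall back to full
// detection instead of propagating a bad track to the next frame.
void step(const cv::Mat& frame, TrackState& st)
{
    if (st.tracking) {
        st.shape    = alignShape(frame, st.shape);
        st.tracking = trackerChecker(frame, st.shape);
    }
    if (!st.tracking) {
        cv::Rect face;
        if (detectFace(frame, face)) {
            st.shape    = alignFromDetection(frame, face);
            st.tracking = trackerChecker(frame, st.shape);
        }
    }
}
```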
A greedy search algorithm was used to find the best nonoccluded area points for estimating the eye center positions. Among the 23 candidate points from nonoccluded areas, such as the nose-mouth-face boundary, we identified the 11 points that best suggested the eye position via the greedy algorithm with end-to-end eye position calculation (further details are provided in Section 2.3). Because the computational complexity of the SIFT feature extraction increases with the number of points involved, we utilized only 11 points to ensure reliability and manageable complexity. Among these points, the nose-mouth and nose-mouth-face boundary points were identified as the best two positions (Figure 4).
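The greedy forward selection over the 23 candidates can be sketched as follows; evalError() is a placeholder for "fit the eye-position regression on this landmark subset and return its validation error in mm":

```cpp
#include <functional>
#include <vector>

// Greedy forward selection: repeatedly add the candidate landmark whose
// inclusion most reduces the end-to-end eye-position estimation error,
// stopping once `target` of the `numCandidates` points are chosen
// (here, 11 of 23).
std::vector<int> greedySelect(
    int numCandidates, int target,
    const std::function<double(const std::vector<int>&)>& evalError)
{
    std::vector<int> chosen;
    std::vector<bool> used(numCandidates, false);

    while ((int)chosen.size() < target) {
        int best = -1;
        double bestErr = 1e300;
        for (int c = 0; c < numCandidates; ++c) {
            if (used[c]) continue;
            std::vector<int> trial = chosen;
            trial.push_back(c);
            double err = evalError(trial);   // regression fit + validation pass
            if (err < bestErr) { bestErr = err; best = c; }
        }
        chosen.push_back(best);
        used[best] = true;
    }
    return chosen;
}
```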


Eye Center Position Estimation from Nose-Mouth-Face Boundary Points Using a Supervised Linear Regression Algorithm
Using the 11 points representing nonoccluded face areas identified in Section 2.2, we estimated the eye center positions using a supervised linear regression method; that is, we constructed a training image database with ground-truth eye positions on the sunglasses annotated relative to the other points (Figure 5). All the ground truths of the eye positions were annotated as precisely as possible via comparison with the corresponding bare-face images. Ultimately, 32,400 sunglasses-wearing images with a head pose of less than 20° were constructed with eye position labeling and used for eye position estimation via supervised linear regression.

In this paper, we set $E_{left}$ as the left eye center position and let $X$ represent the set of points corresponding to nonoccluded areas, i.e., the nose-mouth-face boundary points introduced in Section 2.1. If $\beta_{left}$ is a linear regression matrix, then we can write

$$E_{left} = X \beta_{left}. \tag{1}$$

Similarly, the right eye position, $E_{right}$, can be formulated using $\beta_{right}$:

$$E_{right} = X \beta_{right}. \tag{2}$$

Solving Equations (1) and (2) leads to

$$\beta_{left} = X^{+} E_{left}, \qquad \beta_{right} = X^{+} E_{right}, \tag{3}$$

where $X^{+}$ is the pseudo-inverse of $X$ when assuming a linear relationship between $E$ and $X$. Collecting more data for $X$ and $E$ produces a more accurate value of $\beta$. This process of eye position estimation using a supervised linear regression method from $n$ samples of the 11 nose-mouth-face boundary (NMB) points is illustrated in detail below (Box 1):

$$X = \begin{bmatrix}
x_{1,1} & y_{1,1} & x_{2,1} & y_{2,1} & \cdots & x_{11,1} & y_{11,1} \\
x_{1,2} & y_{1,2} & x_{2,2} & y_{2,2} & \cdots & x_{11,2} & y_{11,2} \\
\vdots  & \vdots  & \vdots  & \vdots  &        & \vdots   & \vdots  \\
x_{1,n} & y_{1,n} & x_{2,n} & y_{2,n} & \cdots & x_{11,n} & y_{11,n}
\end{bmatrix}$$
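A minimal sketch of the fit and the run-time estimate using OpenCV's SVD-based least-squares solver follows; the matrix layout matches Box 1, and the function names are illustrative:

```cpp
#include <opencv2/core.hpp>

// Fit beta = X+ * E by SVD-based least squares. X is n x 22 (the 11 NMB
// points per training sample, x/y interleaved); E is n x 2 (the labeled
// eye center). One beta is fitted per eye.
cv::Mat fitBeta(const cv::Mat& X, const cv::Mat& E)
{
    cv::Mat beta;                              // 22 x 2 on return
    cv::solve(X, E, beta, cv::DECOMP_SVD);     // pseudo-inverse solution
    return beta;
}

// Run time: one 1 x 22 row of tracked NMB coordinates yields the estimated
// pupil center as a 1 x 2 row vector (x, y).
cv::Mat estimateEyeCenter(const cv::Mat& nmbRow, const cv::Mat& beta)
{
    return nmbRow * beta;
}
```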

Results
The proposed algorithm yielded successful real-time detection (~60 fps) and tracking (~200 fps) for a range of different environments, users, and system challenges based only on CPU computations. When the tracking mode was considered, the execution time was approximately 2-5 ms when using a standard PC with a 2.5 GHz CPU (consumption 10%, see Table 1) running Windows 7. The algorithm was constructed in C++. Additionally, when tested using a commercial embedded computing board (Samsung Exynos-auto evt1 for 3D HUD), our proposed method achieved comparable eye-tracking speed and CPU usage results. Figure 6 shows some examples of real-time seamless pupil tracking using the proposed method with a stereo camera. The camera image resolution was 640 × 480 pixels with a recording speed of 60 fps and a field of view of 60° × 40°. This proved effective for capturing various images of sunglasses-wearing subjects, provided head movements remained below 250 mm/s. We tested the performance of the proposed algorithm on image and video databases captured inside and outside an office environment, featuring 10 people wearing different styles of sunglasses. The average aligner precision error was 2 mm for the bare-face images and 10 mm for the sunglasses-wearing images.
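For reference, a plausible reading of the reported precision error is the mean Euclidean distance between estimated and ground-truth pupil positions; a sketch of that computation, assuming 3D positions in millimeters from the stereo camera, is:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Pos3 { double x, y, z; };   // pupil position in mm

// Mean Euclidean distance between estimated and ground-truth pupil centers.
double meanPrecisionErrorMm(const std::vector<Pos3>& est,
                            const std::vector<Pos3>& gt)
{
    double sum = 0.0;
    const std::size_t n = std::min(est.size(), gt.size());
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = est[i].x - gt[i].x;
        const double dy = est[i].y - gt[i].y;
        const double dz = est[i].z - gt[i].z;
        sum += std::sqrt(dx * dx + dy * dy + dz * dz);
    }
    return n ? sum / n : 0.0;
}
```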

The two databases were trained using the SIFT and SDM algorithms. Moreover, both bare-face and sunglasses-wearing-face image databases were used to inform the eye position estimation algorithm based on supervised linear regression. Our method demonstrated high accuracy and speed, estimating the pupil center position with less than 10 mm error at a speed of 9 ms for sunglasses-wearing faces.

Discussion
This study proposes a low-complexity technique for the detection and tracking of bare and sunglasses-wearing faces. The algorithm results showed high accuracy and fast speed even when eyes were occluded by sunglasses, while the CPU consumption was only 10%. In previous studies, we reported on the performance of a bare-face eye tracker [23]. Although this tracker identified and tracked eyes with high accuracy and speed, its performance deteriorated for users wearing sunglasses. The algorithms proposed herein for eye position estimation using nonoccluded shape points on sunglasses-wearing faces address this issue and showed a marked improvement. Furthermore, the proposed algorithm retains the advantages of low complexity and high speed demonstrated by our algorithms for analyzing bare faces [23]. Moreover, when compared to state-of-the-art deep-learning-based methods such as the practical facial landmark detector (PFLD) [36], which has a precision of 8 mm, our proposed method offers comparable precision (10 mm), reduced CPU consumption (10% down from 25%), and is faster (4 ms down from 12 ms). Figure 7 presents a performance comparison between our proposed algorithm and those reported in other studies.
Figure 7. Performance comparison between the proposed algorithm and other eye-tracking approaches. (a) Direct eye tracking for bare faces using the SDM+SIFT algorithm [23]. It showed 2 mm precision with 10% CPU consumption at a speed of 2-5 ms per frame. (b) Direct eye tracking for sunglasses-wearing faces using the SDM+SIFT algorithm [23]. It showed 25 mm precision with 10% CPU consumption at a speed of 2-5 ms per frame. (c) Direct eye tracking using a CNN-based PFLD algorithm [36]. It showed 8 mm precision with 25% CPU consumption at a speed of 12 ms per frame. (d) Proposed indirect pupil position estimation method, which includes an indirect eye-tracking algorithm that estimates pupil location based on other facial feature points. The proposed method showed 10 mm precision with 10% CPU consumption at a speed of 2-5 ms per frame.
One of the main advances of our method compared with previously published works is its low complexity, with only low CPU computation required to provide real-time eye tracking. This is particularly valuable when considering the limited system resources of current vehicles. Additionally, our method offers different mode-switching systems according to user conditions. In that sense, our proposed system structures are highly compatible with other algorithms, including deep neural algorithms.
We also conducted experiments involving 20 test subjects to investigate the effects of 3D crosstalk when viewing 3D content generated by the AR 3D HUD prototype while wearing sunglasses (Table 2). None of the users reported experiencing static 3D crosstalk (i.e., user head movements < 250 mm/s) while wearing sunglasses. However, when user head movements exceeded 250 mm/s (i.e., dynamic 3D crosstalk), 11 of the 20 participants reported experiencing 3D crosstalk while wearing sunglasses, while only 2 of the 20 participants reported experiencing dynamic 3D crosstalk while not wearing sunglasses. Considering that the 3D margin of our AR 3D HUD prototype is 12 mm, the 10 mm precision error associated with the proposed method is near the 3D margin boundaries, which could cause delayed left and right 3D view conversion owing to limited system latency. Therefore, further improvements to the eye-tracking precision are required in the future to eliminate the potential for dynamic 3D crosstalk. To validate 3D crosstalk with a large eye-tracking error of 10 mm, we added a random eye position error of <8 mm to the bare-face eye tracker, which had an average mean error of 2 mm (Figure 8). The addition of this random error to our bare-face eye tracker did not lead to users reporting static 3D crosstalk when the eye-tracking precision was <10 mm.

Table 2. Experimental results for 3D crosstalk with the proposed eye-tracking algorithms for bare and sunglasses-wearing faces. All experiments were performed using a commercial embedded system with an AR 3D HUD prototype.

Figure 8. To validate 3D crosstalk with a large eye-tracking error of 10 mm, we added a random eye-position error (right). Users experienced 3D crosstalk when the eye-tracking precision error was >15 mm. For the current 3D HUD prototype, the 3D crosstalk was low with an eye-tracking precision error of <10 mm.

This study has a few limitations that we hope to address in subsequent work. Here, we assumed that only the eyes are occluded by sunglasses, leaving the nose, mouth, and face boundaries unobstructed by other accessories such as masks or caps. To handle a greater variety of user face-shape occlusions, the proposed system structure requires multiple different eye trackers, each trained using a separate image database, which requires enhanced multiclass classification capabilities. Additionally, the number of publicly available images of drivers wearing sunglasses for use in face occlusion training image datasets is limited, presenting an obstacle for state-of-the-art deep-learning-based algorithm training. Therefore, data augmentation is required to construct large face occlusion datasets for the designated eye trackers. Finally, to handle various user face occlusion cases, further study is necessary to develop and optimize our algorithm.

Conclusions
In this paper, we propose a new pupil center detection system that is capable of analyzing both bare faces and those obstructed by sunglasses. The proposed method deploys different eye-tracking mode-switching systems according to the user conditions. For bare faces, we propose an SDM-based pupil segmentation method and utilize a coarse-to-fine strategy for multiclass shape detection. For sunglasses-wearing faces, pupil centers are located indirectly by estimating the pupil positions based on the nose-mouth-face boundary points using a supervised linear regression algorithm. The proposed system is fast, accurate, robust, and computationally undemanding compared to state-of-the-art CNN-based algorithms. Our system has the potential to accelerate AR 3D HUD commercialization, with simplicity and low processing power demands compatible with the limited computing resources of commercial embedded systems. Further study is necessary to extend our algorithm to handle various user face occlusion cases, such as masks or caps, with multiclass classification and training data augmentation.