2D Gaze Estimation Based on Pupil-Glint Vector Using an Artificial Neural Network

Abstract: Gaze estimation methods play an important role in a gaze tracking system. A novel 2D gaze estimation method based on the pupil-glint vector is proposed in this paper. First, the circular ring rays location (CRRL) method and Gaussian fitting are utilized for pupil and glint detection, respectively. Then the pupil-glint vector is calculated through subtraction of the fitted pupil and glint centers. Second, a mapping function is established according to the corresponding relationship between pupil-glint vectors and actual gaze calibration points. In order to solve the mapping function, an improved artificial neural network (DLSR-ANN) based on direct least squares regression is proposed. When the mapping function is determined, gaze estimation can be realized by calculating gaze point coordinates. Finally, error compensation is implemented to further enhance the accuracy of gaze estimation. The proposed method achieves an accuracy of 1.29°, 0.89°, 0.52°, and 0.39° when a model with four, six, nine, or 16 calibration markers is utilized for calibration, respectively. Considering error compensation, gaze estimation accuracy can reach 0.36°. The experimental results show that the gaze estimation accuracy of the proposed method is better than that of linear regression (direct least squares regression) and nonlinear regression (generic artificial neural network). The proposed method contributes to enhancing the total accuracy of a gaze tracking system.


Introduction
Human beings acquire 80%-90% of outside information through the eyes. Humans' visual perception of information can be acquired through eye gaze tracking [1][2][3][4]. With the increasing development of computer/machine vision technology, gaze tracking technology has been more and more widely applied in the fields of medicine [5], production tests [6], human-machine interaction [7,8], military aviation [9,10], etc.
For 2D gaze estimation methods, a mapping function between gaze points and the target plane or regions of interest is first established. The solved mapping function is then utilized to calculate the gaze point on certain targets or regions. For 3D gaze estimation methods, a human eyeball [...] that of linear regression. Multi-layer perceptrons (MLPs) are utilized by Coughlin et al. [68] to calculate gaze point coordinates based on the electro-oculogram (EOG). The number of input nodes depends on the number of data points chosen to represent the saccadic waveforms. The output nodes of the network provide the horizontal and vertical 2D spatial coordinates of the line-of-sight on a particular training or test trial. In order to determine the number of nodes that provides the optimal outputs, hidden layers containing different numbers of nodes are selected to train the MLP ANN. Initial weights trained on another person are used to reduce training time. The experimental results show that using MLPs for calibration appears to overcome some of the disadvantages of the EOG and provides an accuracy not significantly different from that obtained with the infrared tracker. In addition, Sesin et al. [69] find that MLPs can produce positive effects: jitter reduction in gaze point estimation and improved stability of gaze point calculation. Gneo et al. [70] utilize multilayer neural feedforward networks (MFNNs) to calculate gaze point coordinates based on pupil-glint vectors. Two separate MFNNs (each one having the same eye features as inputs, with one single output neuron directly estimating one of the X and Y coordinates of the point of gaze (POG)), each containing 10 neurons in the hidden layer, are employed for training to acquire the outputs. The use of MFNNs overcomes the drawbacks of model-based eye-gaze tracking systems (EGTSs) and the potential reasons for their failure, which sometimes give ANNs an undeservedly poor reputation.
Zhu and Ji [71] utilize generalized regression neural networks (GRNNs) to calculate a mapping function from pupil parameters to screen coordinates in a calibration procedure. The GRNN topology consists of four layers: input layer, hidden layer, summation layer, and output layer. Six factors including pupil-glint vector, pupil ellipse orientation, etc. are chosen as the input parameters of GRNNs. The output nodes represent the horizontal and vertical coordinates of the gaze point. Though the use of hierarchical classification schemes simplifies the calibration procedure, the gaze estimation accuracy of this method is not perfect. Kiat and Ranganath [72] utilize two single radial basis function neural networks (RBFNNs) to map the complex and non-linear relationship between the pupil and glint parameters (inputs) to the gaze point on the screen (outputs). Both of the networks have 11 inputs including x and y coordinates of left and right pupils, pupil-to-glint vectors of the left and right eyes, etc. The number of network output nodes depends on the number of calibration regions in the horizontal and vertical direction. The weights of the network are stored as calibration data for every subsequent time the user operates the system. As is the case with GRNNs, the gaze estimation accuracy of RBFNNs is not high enough. Wu et al. [73] employ the Active Appearance Model (AAM) to represent the eye image features, which combines the shape and texture information in the eye region. The support vector machine (SVM) is utilized to classify 36 2D eye feature points set (including eye contour, iris and pupil parameters, etc.) into eye gazing direction. The final results show the independence of the classifications and the accurate estimation of the gazing directions.
In this paper, considering the high speed of direct least squares regression and the high accuracy of artificial neural network, we propose an improved artificial neural network based on direct least squares regression (DLSR-ANN) to calculate the mapping function between pupil-glint vectors and actual gaze points. Different from general artificial neural networks, coefficient matrix elements of direct least squares regression are employed as connection coefficients in the input and hidden layers of DLSR-ANN. The error cost function and continuous-time learning rule of DLSR-ANN are defined and calculated according to the constraint condition of solving direct least squares regression. The initial condition of an integrator associated with the learning rule of DLSR-ANN is acquired through linear polynomial calculation of direct least squares regression. The learning rate parameter is limited to a range determined by the maximal eigenvalue of auto-correlation matrix composed by input vector of direct least squares regression. The proposed method contains advantages of both direct least squares regression and artificial neural network.
The remainder of this paper is organized as follows: Section 2 presents the proposed neural network method for gaze estimation in detail. Section 3 describes the experimental system and shows the results. Section 4 concludes the whole work. The experimental results show that the training process of the proposed method is stable. The gaze estimation accuracy of the proposed method is better than that of conventional linear regression (direct least squares regression) and nonlinear regression (generic artificial neural network). The proposed method contributes to enhancing the total accuracy of a gaze tracking system.

Proposed Methods for Gaze Estimation
According to the respective characteristics of linear and nonlinear regression, a novel 2D gaze estimation method based on pupil-glint vector is proposed in this paper. An improved artificial neural network (DLSR-ANN) based on direct least squares regression is developed to solve the mapping function between pupil-glint vector and gaze point and then calculate gaze direction. The flow-process of gaze direction estimation is shown in Figure 1. First, when gazing at the calibration markers on the screen, corresponding eye images of subjects are acquired through a camera fixed on the head-mounted gaze tracking system. Second, through preprocessing such as Otsu optimal threshold binarization and opening-and-closing operation, pupil and glint centers are detected by utilizing circular ring rays location (CRRL) method. As inputs of the proposed DLSR-ANN, pupil-glint vector is calculated through the subtraction of pupil and glint center coordinates. Third, a three-layer DLSR-ANN (input layer, hidden layer, and output layer) is developed to calculate the mapping function between pupil-glint vectors and corresponding gaze points. Finally, gaze points on the screen can be estimated according to the mapping function determined.
Appl. Sci. 2016, 6, 174

The pupil-glint vector is calculated through the subtraction of the pupil and glint center coordinates. The 2nd-order linear gaze mapping function based on the pupil-glint vector is expressed as Equation (1):

$$\begin{cases} x_{ci} = a_1 + a_2 x_{ei} + a_3 y_{ei} + a_4 x_{ei} y_{ei} + a_5 x_{ei}^2 + a_6 y_{ei}^2 \\ y_{ci} = b_1 + b_2 x_{ei} + b_3 y_{ei} + b_4 x_{ei} y_{ei} + b_5 x_{ei}^2 + b_6 y_{ei}^2 \end{cases} \quad (1)$$

where $i = 1, 2, \cdots, N$. $N$ is the number of calibration markers. $(x_{ci}, y_{ci})$ is the coordinate of the gaze calibration markers in the screen coordinate system. $(x_{ei}, y_{ei})$ is the coordinate of the pupil-glint vector in the image coordinate system. Least squares, as a conventional linear method, is utilized to solve the gaze mapping function shown in Equation (1). The residual error is defined as:

$$R^2 = \sum_{i=1}^{N} \left[ x_{ci} - \left( a_1 + a_2 x_{ei} + a_3 y_{ei} + a_4 x_{ei} y_{ei} + a_5 x_{ei}^2 + a_6 y_{ei}^2 \right) \right]^2. \quad (2)$$

By calculating the partial derivative with respect to $a_j$ ($j = 1, 2, 3, 4, 5, 6$) in Equation (2), the constraint condition can be obtained as in Equation (3):

$$\frac{\partial R^2}{\partial a_j} = -2 \sum_{i=1}^{N} \sigma_j \left[ x_{ci} - \left( a_1 + a_2 x_{ei} + a_3 y_{ei} + a_4 x_{ei} y_{ei} + a_5 x_{ei}^2 + a_6 y_{ei}^2 \right) \right] = 0, \quad (3)$$

where $\sigma_1 = 1$, $\sigma_2 = x_{ei}$, $\sigma_3 = y_{ei}$, $\sigma_4 = x_{ei} y_{ei}$, $\sigma_5 = x_{ei}^2$, $\sigma_6 = y_{ei}^2$. The value of $a_j$ can be calculated according to Equation (4).
As with $a_j$, the value of $b_j$ ($j = 1, 2, 3, 4, 5, 6$) can be calculated. In fact, the relationship between the number of coefficients in the mapping function ($r$) and the polynomial order ($s$) is as follows:

$$r = \frac{(s+1)(s+2)}{2}. \quad (5)$$

According to Equation (5), when an $s$-order polynomial is utilized to solve the gaze mapping function, at least $r$ gaze calibration markers are required. For a head-mounted (intrusive) gaze tracking system, the relative position of the monitor screen and the user's head and eyes remains nearly fixed. In this case, the higher-order terms in the mapping function are mainly utilized to compensate for error between the estimated and actual gaze direction. The higher the polynomial order, the higher the calculation accuracy. However, the number of polynomial coefficients to be solved increases at the same time (Equation (5)), and so does the number of calibration markers required. This not only lengthens the calibration time; the cumbersome calibration process also adds to the user's burden. Users are prone to fatigue, which in turn affects the calibration accuracy. In order to further enhance the mapping accuracy and realize precise estimation of gaze direction, a novel artificial neural network (DLSR-ANN) based on direct least squares regression is proposed to solve the mapping function between pupil-glint vectors and calibration markers.
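The direct least squares step described above can be illustrated with a minimal sketch. It fits the six coefficients of the second-order mapping for each screen axis and counts the coefficients of a full polynomial of order s. The function names (`num_coefficients`, `design_matrix`, `fit_mapping`, `predict`) and the synthetic nine-marker data are hypothetical and not taken from the paper; `numpy.linalg.lstsq` is used here instead of explicitly forming the normal equations, which yields the same least-squares solution.

```python
import numpy as np

def num_coefficients(s):
    """Number of coefficients of a full bivariate polynomial of order s."""
    return (s + 1) * (s + 2) // 2

def design_matrix(xe, ye):
    """One row [1, x, y, x*y, x^2, y^2] per pupil-glint vector (xe, ye)."""
    xe = np.asarray(xe, dtype=float)
    ye = np.asarray(ye, dtype=float)
    return np.column_stack([np.ones_like(xe), xe, ye, xe * ye, xe**2, ye**2])

def fit_mapping(xe, ye, xc, yc):
    """Least-squares estimates of the a (screen x) and b (screen y) coefficients."""
    S = design_matrix(xe, ye)
    a, *_ = np.linalg.lstsq(S, np.asarray(xc, dtype=float), rcond=None)
    b, *_ = np.linalg.lstsq(S, np.asarray(yc, dtype=float), rcond=None)
    return a, b

def predict(a, b, xe, ye):
    """Evaluate the fitted mapping at new pupil-glint vectors."""
    S = design_matrix(xe, ye)
    return S @ a, S @ b

# Hypothetical nine-marker calibration on a 3 x 3 grid of pupil-glint vectors.
gx, gy = np.meshgrid([-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0])
xe, ye = gx.ravel(), gy.ravel()
a_true = np.array([37.5, 30.0, 1.0, 0.5, -2.0, 0.3])   # made-up coefficients
b_true = np.array([25.0, 0.8, 20.0, -0.4, 0.2, -1.5])  # made-up coefficients
xc = design_matrix(xe, ye) @ a_true
yc = design_matrix(xe, ye) @ b_true

a_hat, b_hat = fit_mapping(xe, ye, xc, yc)
x_hat, y_hat = predict(a_hat, b_hat, xe, ye)
```

With noise-free synthetic data the fitted coefficients coincide with the ones used to generate the markers; with real calibration data the residual of Equation (2) is minimized instead.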
We rewrite the matrix equation in Equation (4) as:

$$\mathbf{Q}\mathbf{a} = \mathbf{p}, \quad (6)$$

where $\mathbf{a} = [a_1\ a_2\ a_3\ a_4\ a_5\ a_6]^{T}$, and $\mathbf{Q}$ and $\mathbf{p}$ are the coefficient matrix and the right-hand-side vector of Equation (4), respectively.

Figure 2 shows the scheme framework of an improved artificial neural network based on direct least squares regression. The DLSR-ANN is a three-layer neural network with input layer, hidden layer, and output layer. Elements of matrix $\mathbf{p}$, which include the pupil-glint vectors acquired while gazing at calibration markers, are determined as the input of the neural network. Elements of matrix $\mathbf{a}$ are determined as the output of the neural network. The input, output, and hidden layers contain one, one, and three nodes, respectively.

As shown in Figure 2, coefficient matrix elements of direct least squares regression are employed as connection coefficients in the input and hidden layers of DLSR-ANN. According to the respective characteristics of the input, hidden, and output layers and the relationship among them, appropriate weighting functions $g_1(t)$, $g_2(t)$, $g_3(t)$ are determined. The derivatives of $g_1(t)$, $g_2(t)$, and $g_3(t)$ are calculated ($f_1(t)$, $f_2(t)$, $f_3(t)$, with $f(t) = dg(t)/dt$) as the transfer functions of the neurons. The selection of specific parameters is described in Section 3.4. As a three-layer neural network, its output layer carries an integrator. The integrator's initial condition $\mathbf{a}(0) = \mathbf{a}_0 = [a_1(0)\ a_2(0)\ a_3(0)\ a_4(0)\ a_5(0)\ a_6(0)]^{T}$ is calculated through a linear polynomial solution utilizing direct least squares regression.

In the proposed method, to solve the mapping function in Equation (6), the steepest gradient descent method [74] is adopted as the training method of the neural network. To determine the relationship between the hidden layer and the output layer, the error cost function and continuous-time learning rule of DLSR-ANN are defined according to the constraint condition of solving direct least squares regression. According to the error distribution characteristics of gaze estimation, the Euclidean norm ($L_2$ norm) is selected to acquire the minimal error cost function, which is in the same form as the error solving criterion of direct least squares regression, as defined in Equation (7), where $\mathbf{e} = \mathbf{Q}\mathbf{a} - \mathbf{p}$ is the solution error of Equation (6) in direct least squares regression. Equation (7) can be further expressed as Equation (8). According to an error cost function based on the constraint condition of direct least squares regression, the learning rule of a continuous-time neural network is set as Equation (9). The function of the learning rule is to modify the weights of DLSR-ANN adaptively to acquire the optimal solution.

In Equation (9), $\mu$ is the learning rate parameter. As a positive-definite matrix, $\mu$ ($\mu = [\mu_{vw}]$, $v, w = 1, 2, \cdots, n$) is generally selected as a diagonal matrix. In general, $\mu$ is determined by experience. If $\mu$ is set too small, the learning rule modifies the weights of the neural network slowly, and more iterations are needed to reach the error minimum. If $\mu$ is set too large, the learning rule shows numerical instability. To ensure the stability of the differential equation in Equation (9) and the convergence of its solution, a small enough $\mu$ is chosen according to Equation (10), where $\lambda_{\max}$ is the maximal eigenvalue of the auto-correlation matrix composed by input vector $\mathbf{p}$ in direct least squares regression. When the eigenvalue is unavailable, the auto-correlation matrix can replace it. By calculating the partial derivative with respect to variable $\mathbf{a}$ in Equation (8), the learning rule of a continuous-time neural network for solving the matrix equation $\mathbf{Q}\mathbf{a} = \mathbf{p}$ can be deduced as Equation (11).
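The role of the learning rate in a gradient-based solver for Qa = p can be illustrated with a minimal sketch. Since the explicit forms of Equations (7)-(11) are not reproduced here, the sketch assumes the cost ½‖Qa − p‖², whose gradient flow is da/dt = −µQᵀ(Qa − p), and integrates it with forward Euler steps. The scalar µ is chosen below 2/λ_max(QᵀQ), a standard sufficient condition for this discretized iteration that may differ from the bound used in the paper. The nonlinear transfer functions of DLSR-ANN are omitted, and the function name `solve_by_gradient_flow` and the 2 × 2 example system are hypothetical.

```python
import numpy as np

def solve_by_gradient_flow(Q, p, a0, mu=None, steps=5000):
    """Euler-discretized gradient flow for min 0.5 * ||Q a - p||^2 (assumed cost)."""
    Q = np.asarray(Q, dtype=float)
    p = np.asarray(p, dtype=float)
    a = np.array(a0, dtype=float)
    if mu is None:
        # Largest eigenvalue of Q^T Q; mu = 1 / lam_max stays below 2 / lam_max.
        lam_max = np.linalg.eigvalsh(Q.T @ Q).max()
        mu = 1.0 / lam_max
    for _ in range(steps):
        e = Q @ a - p          # solution error of Q a = p
        a -= mu * (Q.T @ e)    # one steepest-descent step
    return a

# Hypothetical small system standing in for the 6 x 6 regression system.
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
p = np.array([1.0, 2.0])
a = solve_by_gradient_flow(Q, p, a0=np.zeros(2))
```

Starting the iteration from a direct least squares estimate instead of zeros, as the integrator's initial condition does in the text, would leave the loop unchanged and only reduce the number of steps needed.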

Experimental System
In this study, we develop a wearable gaze tracking system composed of a helmet, a monitor, an array of four near-infrared light emitting diodes (NIR LEDs), and a microspur camera, as shown in Figure 3. The screen size of the monitor is 75 mm × 50 mm. Considering that the imaging distance is limited to between 3 cm and 7 cm, a microspur camera is adopted to acquire the eye image. The image resolution is 640 × 480 pixels (CCD sensor). As described in [75], when the wavelength of the NIR LED is within the range of 760 nm-1400 nm, the pupil absorbs nearly all the near-infrared light and the iris obviously reflects it. The wavelength of the NIR LEDs employed in this paper is 850 nm and the power is less than 5 mW. The experimental system brings no harm to human eyes [76]. An NVIDIA Jetson TK1 embedded development board (Figure 4) is utilized for image acquisition and processing (NVIDIA: NVIDIA Corporation, Santa Clara, CA, USA; TK1: Tegra K1; Jetson TK1 is the model name of an embedded development board manufactured by NVIDIA Corporation).

Figure 4. NVIDIA Jetson TK1 embedded development board.

Pupil Detection
The circular ring rays location (CRRL) method [77] is utilized for pupil center detection because it is more robust and accurate than conventional detection methods. As shown in Figure 5, in the CRRL method, improved Otsu optimal threshold binarization is applied to a gray-scale eye image to eliminate the influence of illumination change. Through an opening-and-closing operation, rough location of the pupil area, and circular ring rays, the pupil boundary points and center can be detected accurately even when interference factors such as eyelashes, glint, and natural light reflection are located on the pupil contour. The CRRL method contributes to enhancing the stability, accuracy, and real-time quality of a gaze tracking system.
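The binarization step mentioned above can be illustrated with a minimal sketch. It implements the standard Otsu threshold (maximizing between-class variance), not the improved variant used in the CRRL method, and it does not reproduce the opening-and-closing operation or the circular ring rays. The function names and the synthetic eye image (a dark disc on a bright background) are hypothetical.

```python
import numpy as np

def otsu_threshold(gray):
    """Threshold t maximizing between-class variance; pixels <= t form class 0."""
    hist = np.bincount(np.asarray(gray, dtype=np.uint8).ravel(), minlength=256)
    prob = hist.astype(float) / hist.sum()
    omega = np.cumsum(prob)                    # probability of class 0 for each t
    mu = np.cumsum(prob * np.arange(256))      # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0                  # thresholds with an empty class
    return int(np.argmax(sigma_b))

def binarize_dark_region(gray):
    """Mask of pixels at or below the Otsu threshold (candidate pupil area)."""
    t = otsu_threshold(gray)
    return np.asarray(gray) <= t, t

# Synthetic eye image: bright background (200) with a dark pupil disc (30).
yy, xx = np.mgrid[0:120, 0:160]
image = np.full((120, 160), 200, dtype=np.uint8)
disc = (xx - 80) ** 2 + (yy - 60) ** 2 <= 20 ** 2
image[disc] = 30

mask, t = binarize_dark_region(image)
cy, cx = np.argwhere(mask).mean(axis=0)        # rough pupil location (row, col)
```

The centroid of the mask only gives the rough pupil location; refining the boundary points and center is the part handled by the circular ring rays in the text.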


Glint Detection
Because the glint's illumination intensity approximately follows a Gaussian distribution, a Gaussian function deformation solved by improved total least squares [77] is utilized to calculate the glint center. The detection result of the glint is shown in Figure 6.
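The idea of locating the glint center by Gaussian fitting can be sketched as follows. The sketch assumes an axis-aligned 2D Gaussian intensity profile and fits the logarithm of the intensity with ordinary least squares; this is a simpler stand-in for the improved total least squares solution cited in the text, not a reproduction of it. The function name `glint_center_gaussian`, the intensity cutoff, and the synthetic patch are hypothetical.

```python
import numpy as np

def glint_center_gaussian(patch, min_intensity=1e-3):
    """Estimate (x0, y0) of an axis-aligned Gaussian spot from an image patch.

    ln I = c0 + c1*x + c2*y + c3*x^2 + c4*y^2, so x0 = -c1/(2*c3), y0 = -c2/(2*c4).
    """
    patch = np.asarray(patch, dtype=float)
    yy, xx = np.mgrid[0:patch.shape[0], 0:patch.shape[1]]
    keep = patch > min_intensity               # avoid log of (near-)zero pixels
    x = xx[keep].astype(float)
    y = yy[keep].astype(float)
    z = np.log(patch[keep])
    A = np.column_stack([np.ones_like(x), x, y, x**2, y**2])
    c, *_ = np.linalg.lstsq(A, z, rcond=None)
    x0 = -c[1] / (2.0 * c[3])
    y0 = -c[2] / (2.0 * c[4])
    return x0, y0

# Synthetic glint: Gaussian spot centered at (10.3, 9.6) in a 21 x 21 patch.
yy, xx = np.mgrid[0:21, 0:21]
true_x0, true_y0, sigma = 10.3, 9.6, 2.5
patch = 255.0 * np.exp(-((xx - true_x0) ** 2 + (yy - true_y0) ** 2) / (2 * sigma**2))

x0, y0 = glint_center_gaussian(patch)
```

On noisy images an unweighted log-intensity fit over-weights dim pixels, which is one reason a more careful solver such as the one cited in the text may be preferred.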

As a sample, some of the pupil and glint centers detected are shown in Table 1.

Calibration Model
As expressed in Equation (5), at least three, six, and 10 polynomial coefficients must be calculated when a 1st-, 2nd-, and 3rd-order linear polynomial is utilized for calibration, respectively, which means that at least three, six, and 10 calibration markers are required. When the number of calibration markers needed is too large, nonessential input items can be removed according to principal component analysis to reduce the number of polynomial coefficients to be solved. Generally, based on an overall consideration of the real-time quality and accuracy of a gaze tracking system, four- and five-marker calibration models are most widely employed for 1st-order calculation, while six- and nine-marker calibration models are most widely employed for 2nd-order calculation [78,79].
Considering that there is some motion between the wearable gaze tracking system and the user's head, error of gaze point data will occur along with a drifting motion. In this paper, position coordinates of quadrangular NIR LEDs are considered as inputs of gaze estimation model to compensate for error caused by drifting motion. As shown in Figure 7, for the purpose of comparison, four-, six-, nine-, and 16-marker calibration models are employed in the process of calculating mapping function. Gaze direction is estimated with and without error compensation. The gaze tracking accuracy of the two cases is compared.


Gaze Point Estimation
An improved artificial neural network (DLSR-ANN) based on direct least squares regression is developed to calculate the mapping function between pupil-glint vectors and calibration markers. For four-, six-, nine-, and 16-marker calibration models, the number of training samples is selected as 180. The number of hidden nodes is equal to the number of training samples. The $x$ (or $y$) coordinate is set as the output of the neural network. Two separate DLSR-ANNs are utilized to estimate the $x$ and $y$ coordinates of the gaze point on the screen. Each separate neural network has the same inputs. The weighting functions $g_1(t)$, $g_2(t)$, $g_3(t)$ are respectively determined as

$$g_1(t) = \frac{1 - e^{-\beta_1 t^2}}{1 + e^{-\beta_1 t^2}}, \quad g_2(t) = \begin{cases} t^2/2, & |t| \le \beta_2 \\ \beta_2 |t| - \beta_2^2/2, & |t| > \beta_2 \end{cases}, \quad g_3(t) = \beta_3^2 \ln\cosh\left(\frac{t}{\beta_3}\right).$$

The transfer functions for the input, hidden, and output layers are selected as the derivatives of $g_1(t)$, $g_2(t)$, $g_3(t)$, which are respectively calculated as

$$f_1(t) = \frac{4 \beta_1 t\, e^{-\beta_1 t^2}}{\left(1 + e^{-\beta_1 t^2}\right)^2}, \quad f_2(t) = \begin{cases} t, & |t| \le \beta_2 \\ \beta_2\, \mathrm{sgn}(t), & |t| > \beta_2 \end{cases}, \quad f_3(t) = \beta_3 \tanh\left(\frac{t}{\beta_3}\right).$$

The learning rate parameter $\mu$ is determined by $\mu = \mu_j = 0.0025$ (when a four-marker calibration model is employed, $j = 1, 2, 3, 4$; when a six-, nine-, or 16-marker calibration model is employed, $j = 1, 2, 3, 4, 5, 6$). In order to acquire optimal learning and training results, $\beta_1$, $\beta_2$, $\beta_3$ are respectively determined as $\beta_1 = 0.8$, $\beta_2 = 0.7$, $\beta_3 = 0.7$ through a process of trial and error. The initial condition $\mathbf{a}(0)$ of the integrator associated with the learning rules is acquired through linear polynomial calculation in direct least squares regression.
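The relationship between the weighting functions and their derivatives (the transfer functions) can be checked numerically with a short sketch. The form of g1 and the values β1 = 0.8, β2 = 0.7, β3 = 0.7 follow the text; g2 is taken here as a Huber-type function and g3 as a log-cosh function, which should be treated as assumptions, as should all function names.

```python
import numpy as np

BETA1, BETA2, BETA3 = 0.8, 0.7, 0.7

def g1(t, b=BETA1):
    u = np.exp(-b * t**2)
    return (1 - u) / (1 + u)

def f1(t, b=BETA1):
    """Analytic derivative of g1."""
    u = np.exp(-b * t**2)
    return 4 * b * t * u / (1 + u) ** 2

def g2(t, b=BETA2):
    """Assumed Huber-type weighting function."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= b, 0.5 * t**2, b * np.abs(t) - 0.5 * b**2)

def f2(t, b=BETA2):
    """Derivative of g2: identity inside [-b, b], saturated outside."""
    return np.clip(t, -b, b)

def g3(t, b=BETA3):
    """Assumed log-cosh weighting function."""
    return b**2 * np.log(np.cosh(t / b))

def f3(t, b=BETA3):
    """Derivative of g3."""
    return b * np.tanh(t / b)

def numeric_derivative(g, t, h=1e-6):
    """Central finite difference used to cross-check the analytic derivatives."""
    return (g(t + h) - g(t - h)) / (2 * h)

t = np.linspace(-3.0, 3.0, 13)   # sample points; none falls on the kink |t| = 0.7
max_err = max(
    np.max(np.abs(numeric_derivative(g, t) - f(t)))
    for g, f in [(g1, f1), (g2, f2), (g3, f3)]
)
```

All three transfer functions are bounded for large |t|, which limits the influence of large residuals on the weight updates.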
In the developed wearable gaze tracking system, an array of four near-infrared light emitting diodes (NIR LEDs) is employed instead of the conventional single one. The NIR LED array forms well-distributed illumination around the human eye, which helps extract pupil and glint characteristics more stably and precisely. In addition, the center position coordinates of the quadrangular NIR LEDs, considered as inputs of the neural network, can further compensate for error caused during the process of gaze point calculation. When the calibration process is accomplished, a model with 8 × 8 test markers is employed to validate the calculation accuracy of the gaze point. Figure 8a-d shows the gaze points estimated through the proposed method with and without error compensation, utilizing a four-, six-, nine-, or 16-marker calibration model, respectively. The cyan symbols represent actual reference gaze points on the monitor screen. The magenta "+" symbols represent gaze points estimated through the proposed method without considering error compensation. The blue "*" symbols represent gaze points estimated through the proposed method considering error compensation.

Gaze Estimation Accuracy Comparison of Different Methods
As shown in Figure 9, gaze estimation accuracy is expressed as the intersection angle θ between the actual gaze direction (A as gaze point) and the estimated gaze direction (A' as gaze point). Angle θ can be calculated through Equation (12), where L is the distance between the human eye and the monitor screen.
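Equation (12) is not reproduced in this text. As a hedged illustration of how such an angle is commonly computed, the sketch below assumes that the line of sight to A is approximately perpendicular to the screen, so that $\theta = \arctan(|AA'| / L)$. This form, the function name, and the argument names are assumptions and may differ from the paper's exact expression.

```python
import math


def angular_error_deg(actual, estimated, eye_to_screen):
    """Angle (degrees) subtended at the eye by the on-screen offset A -> A'.

    Assumes the line of sight to A is perpendicular to the screen, so
    theta = arctan(|AA'| / L). All lengths must share one unit.
    """
    dx = estimated[0] - actual[0]
    dy = estimated[1] - actual[1]
    offset = math.hypot(dx, dy)  # Euclidean distance |AA'| on the screen
    return math.degrees(math.atan2(offset, eye_to_screen))
```

For example, with a hypothetical viewing distance of 600 mm, an on-screen offset of 10 mm corresponds to an angular error of roughly 0.95°.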

The standard deviation of gaze estimation accuracy θ is defined as Equation (13), where $\bar{\theta}$ represents the mean value of $\theta_j$ ($j = 1, 2, \cdots, K$) and $K$ is the total number of gaze points estimated.
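The mean accuracy and its standard deviation over the $K$ estimated gaze points can be sketched as follows. Whether Equation (13) normalises by $K$ or by $K - 1$ is not recoverable from the text, so the `ddof` parameter below is an assumption that exposes both options; the function name is likewise illustrative.

```python
import math


def mean_and_std(thetas, ddof=0):
    """Return (mean, standard deviation) of per-point angular errors.

    ddof=0 divides by K (population form); ddof=1 divides by K - 1.
    Which normalisation Equation (13) uses is an assumption here.
    """
    k = len(thetas)
    if k - ddof <= 0:
        raise ValueError("need more samples than ddof")
    mean = sum(thetas) / k
    variance = sum((t - mean) ** 2 for t in thetas) / (k - ddof)
    return mean, math.sqrt(variance)
```

With the 8 × 8 test model described above, `thetas` would hold 64 angular errors, one per test marker.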

3.5.1. Gaze Estimation Accuracy without Considering Error Compensation
Figure 10 shows a comparison of gaze estimation accuracy and standard deviation calculated through the proposed method and other neural network methods, respectively, without considering error compensation. The proposed method can provide an accuracy of 1.29°, 0.89°, 0.52°, and 0.39° when a four-, six-, nine-, or 16-marker calibration model is utilized for calibration, respectively.
The maximum gaze estimation error through the proposed method for a four-, six-, nine-, or 16-marker calibration model is, respectively, 2.45°, 1.98°, 1.21°, and 0.82°. The specific results are shown in Table A1 of the Appendix.
Figure 11 shows the comparison of gaze estimation accuracy and standard deviation calculated respectively through the proposed method and other neural network (NN) methods considering error compensation. The proposed method can provide an accuracy of 1.17°, 0.79°, 0.47°, and 0.36°, respectively, when a four-, six-, nine-, or 16-marker calibration model is utilized for calibration. When considering error compensation, the improvement percentage of gaze estimation accuracy for four-, six-, nine-, and 16-marker calibration models is 9.3%, 11.2%, 9.6%, and 7.6%, respectively. The specific results are shown in Table A2 of the Appendix.
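The improvement percentages can be checked against the reported mean accuracies with a short calculation. The sketch below assumes the improvement is defined as the relative reduction in angular error, (before − after) / before; that definition and the variable names are assumptions, while the accuracy values are those reported in the text.

```python
# Mean accuracies in degrees, as reported in the text, keyed by marker count.
WITHOUT_COMPENSATION = {4: 1.29, 6: 0.89, 9: 0.52, 16: 0.39}
WITH_COMPENSATION = {4: 1.17, 6: 0.79, 9: 0.47, 16: 0.36}


def improvement_percent(before: float, after: float) -> float:
    """Relative reduction in angular error, in percent."""
    return (before - after) / before * 100.0


improvements = {
    markers: round(
        improvement_percent(WITHOUT_COMPENSATION[markers],
                            WITH_COMPENSATION[markers]),
        1,
    )
    for markers in WITHOUT_COMPENSATION
}
```

From these rounded accuracies, the four-, six-, and nine-marker cases reproduce the reported 9.3%, 11.2%, and 9.6%. The 16-marker case comes out at 7.7% rather than 7.6%, which would be consistent with the reported percentages having been computed from unrounded accuracies.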

Conclusions
In this paper, a novel 2D gaze estimation method based on the pupil-glint vector is proposed on the basis of conventional gaze tracking methods. In order to realize the accurate estimation of gaze direction, an improved artificial neural network (DLSR-ANN) based on direct least squares regression is developed. The learning rate parameter, weighting functions, and corresponding coefficients are determined according to trial and experience. Detected coordinates of pupil-glint vectors are applied as inputs to train the improved neural network. The mapping function model is solved and then utilized to calculate gaze point coordinates. An array of four NIR LEDs is employed to form quadrangular glints. The NIR LED array can generate well-distributed illumination around the human eye, which contributes to extracting pupil and glint characteristics more stably and precisely. In addition, the center coordinates of the quadrangular NIR LEDs, considered as additional inputs of the neural network, can further compensate for the error caused during the process of calculating the gaze point, which can enhance the accuracy of gaze point coordinates. When the gaze tracking system is established, calibration models with different numbers of markers are utilized to validate the proposed method. When a four-, six-, nine-, or 16-marker calibration model is employed for the calibration process, the proposed method can achieve an accuracy of 1.29°, 0.89°, 0.52°, and 0.39°, respectively. Taking into account error compensation, the proposed method can achieve an accuracy of 1.17°, 0.79°, 0.47°, and 0.36°, respectively, when a four-, six-, nine-, or 16-marker calibration model is employed. When considering error compensation, the improvement percentage of gaze estimation accuracy for a four-, six-, nine-, or 16-marker calibration model is 9.3%, 11.2%, 9.6%, and 7.6%, respectively. The experimental results show that the training process of the proposed method is stable.
The gaze estimation accuracy of the proposed method is better than that of conventional linear regression (direct least squares regression) and nonlinear regression (generic artificial neural network). The proposed method contributes to enhancing the total accuracy of a gaze tracking system.

Author Contributions: All authors made significant contributions to this article. Jianzhong Wang was mainly responsible for deployment of the system and revision of the paper; Guangyue Zhang was responsible for developing gaze estimation method, performing the experiments, and writing the paper; Jiadong Shi, the corresponding author, was responsible for conceiving and designing the experiments.

Conflicts of Interest:
The authors declare no conflict of interest.

Table A1. Comparison of gaze estimation accuracy between proposed method and other NN methods without considering error compensation.