Real-Time Head Orientation and Eye-Tracking Algorithm Using Adaptive Feature Extraction and Refinement Mechanisms

Ye, Ming-Chang; Ding, Jian-Jiun

doi:10.3390/engproc2025092043

Ming-Chang Ye and Jian-Jiun Ding *

Graduate Institute of Communication Engineering, National Taiwan University, Taipei 106319, Taiwan

* Author to whom correspondence should be addressed.

Presented at the 2024 IEEE 6th Eurasia Conference on IoT, Communication and Engineering, Yunlin, Taiwan, 15–17 November 2024.

Eng. Proc. 2025, 92(1), 43; https://doi.org/10.3390/engproc2025092043

Published: 30 April 2025

(This article belongs to the Proceedings of 2024 IEEE 6th Eurasia Conference on IoT, Communication and Engineering)


Abstract

We propose a fast eye-tracking method that takes a depth image and a gray-scale infrared (IR) image as input and relies on traditional image processing algorithms. Given an IR image that contains one face and the corresponding depth image, the method locates the real-world coordinate of the eyes relative to the camera at a high speed (>90 frames per second) and with an acceptable error. The method uses the depth information to locate the face quickly and to shrink the region searched for the eyeballs, which both decreases the error rate and accelerates the operation. After the face region is found, less complicated computer vision algorithms can be used at a high execution speed. Refinement mechanisms for extracting features and determining the edge distribution are used to locate the eyeballs and to transform the pixel coordinate in the image to the real-world coordinate.

Keywords:

eye tracking; real time algorithm; head orientation; interactive machine; infrared images

1. Introduction

Eye tracking is used to find the position of the eyeballs. The proposed eye-tracking method is designed for applications that use the coordinate (x, y, z) of the eyes [1]. The coordinate is determined immediately after the camera captures an image that contains the eyes and is updated for every frame. The frame rate of the camera is therefore the lower bound of the required execution speed. The camera used in this study captures 90 frames per second, whereas the method processes 270–320 frames per second, which is about three times the capture rate.

The flow chart and the visualization of the method in this study are shown in Figure 1 and Figure 2. The output three-dimensional coordinate for the sample images was (1.43514, 4.08796, 58.1) cm. The method achieves a high execution speed by shrinking the image in the preprocessing step. Because rotation of the face and the presence of a mask both affect the detection of the eyes, head-state classification was used to determine the orientation of the head and whether a mask was worn. Finally, we determined the position of the eyes on the shrunken image by detection and postprocessing. The midpoint of the detected pair of eyes was then transformed into the coordinate system defined by the camera.

2. Method

2.1. Input Images

In the method, two input images were captured at a speed of 90 frames per second. The pixel values of the depth image ranged from 0 to 65,535 (16 bits), and those of the gray-scale infrared image ranged from 0 to 255. Because the two images were captured at the same resolution and at the same time, the positions and sizes of the objects in them were identical. Therefore, if the midpoint of the eyeballs is at (r, c) on the IR image, the depth value at (r, c) can be used to compute its 3D coordinate. This property also makes it possible to locate the face and to decrease the detection region in image preprocessing.

2.2. Preprocessing

The input image had 848 columns by 480 rows, which was too large for the system to achieve a high execution speed. Apart from increasing the speed, the error rate also had to be reduced. By isolating the facial region in the images, confusing objects such as the background or the user's clothes were omitted. Figure 3 shows the flow chart of image preprocessing. The region of interest (ROI) contained the facial region. Other parameters such as the average intensity were also recorded (Table 1). The eye-tracking method is intended for distances between the camera and the user of less than 700 mm; therefore, the depth image D was thresholded using (1).

F_{0} = D < 800

(1)

Then, we used binary opening and closing to remove noise and fill holes in the image by using (2) and (3).

F_{0} \leftarrow (F_{0} \ominus k_{o}) \oplus k_{o}

(2)

F_{0} \leftarrow (F_{0} \oplus k_{c}) \ominus k_{c}

(3)
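The thresholding in (1) and the opening and closing in (2) and (3) can be sketched in plain Python as follows. This is only an illustrative sketch: the kernel sizes of k_o and k_c are not given in the text, so the 3 × 3 rectangles used in the checks are assumptions, as are the function names and the border handling (pixels outside the image are ignored by erosion and treated as 0 by dilation).

```python
def threshold_depth(depth, limit=800):
    """Eq. (1): F0 = D < limit, returned as a 0/1 image."""
    return [[1 if d < limit else 0 for d in row] for row in depth]


def _morph(img, kh, kw, reduce_fn, pad):
    """Apply min (erosion) or max (dilation) over a kh x kw rectangular window."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                img[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else pad
                for rr in range(r - kh // 2, r + kh // 2 + 1)
                for cc in range(c - kw // 2, c + kw // 2 + 1)
            ]
            out[r][c] = reduce_fn(window)
    return out


def erode(img, kh, kw):
    # Assumption: pixels outside the image count as 1, so borders are not eroded.
    return _morph(img, kh, kw, min, 1)


def dilate(img, kh, kw):
    # Assumption: pixels outside the image count as 0.
    return _morph(img, kh, kw, max, 0)


def opening(img, kh, kw):
    """Eq. (2): erosion followed by dilation; removes small isolated noise."""
    return dilate(erode(img, kh, kw), kh, kw)


def closing(img, kh, kw):
    """Eq. (3): dilation followed by erosion; fills small holes."""
    return erode(dilate(img, kh, kw), kh, kw)
```

Opening removes components smaller than the kernel, and closing fills holes smaller than the kernel, which is why the two are applied in sequence to the thresholded depth mask.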

We then searched the image obtained as shown in Figure 4 to determine the ROI and the other outputs (Table 1). The resulting face image is denoted “F” and has M rows and N columns (Figure 5).

2.3. Head State Classification

Whether the face is rotated and whether a mask is worn both change how the detection process must be conducted. For example, if the head is tilted, the two eyeballs do not lie at the same horizontal position. Therefore, it is necessary to check the state of the face before detecting the eyeballs. We represented each type of state as a state variable. The names and brief descriptions of the state variables are listed in Table 2. The variables pitch, roll, and yaw are represented as “char”, and the variable mask is represented as “bool”. Figure 6 shows examples of four head states.

2.3.1. Roll

In step 7 of Procedure 1 in Table 1, a tilting angle θ was determined, and the image was rotated by −θ so that the head looked as if it were not tilted. Figure 7 shows the result of aligning the orientation of the head with the vertical axis by rotation.
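As a rough illustration of how a tilting angle can be obtained by PCA on a set of pixel coordinates P, the sketch below computes the direction of the first principal axis of the 2 × 2 covariance matrix in closed form. The function name and the sign convention (angle measured from the vertical row axis, positive when the axis leans toward increasing columns as the row index grows) are assumptions, not details taken from the paper.

```python
import math


def tilt_angle(points):
    """Angle (radians) between the first principal axis of `points` and the row axis.

    `points` is a list of (r, c) pixel coordinates, such as the set P of Table 1.
    """
    n = len(points)
    mean_r = sum(r for r, _ in points) / n
    mean_c = sum(c for _, c in points) / n
    # Entries of the 2 x 2 covariance matrix of the coordinates.
    s_rr = sum((r - mean_r) ** 2 for r, _ in points) / n
    s_cc = sum((c - mean_c) ** 2 for _, c in points) / n
    s_rc = sum((r - mean_r) * (c - mean_c) for r, c in points) / n
    # Closed-form orientation of the dominant eigenvector of a symmetric 2 x 2 matrix.
    return 0.5 * math.atan2(2.0 * s_rc, s_rr - s_cc)
```

For an upright head the sampled points spread mainly along the rows, so the angle is close to zero; rotating the image by the negative of the returned angle then cancels the roll.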

2.3.2. Pitch-Up, Yaw, and Mask

We checked whether the head was lifted or turned, or whether a mask was worn, using the edge distribution of the lower part of F. We grouped the mask together with the rotation types in the head state because they all affect the edge distribution in the bottom part of the face.

To obtain the edges of F, we applied the Sobel operator [3] to F using (4) and computed the gradient magnitude using (5). I is the gray-scale image whose edges are to be found, and S is the integer-valued edge distribution. In (6), Eg is the edge distribution represented as a binary image, and $p_{80}$ is the 80th percentile of S, which was found from a histogram. Using a histogram avoids sorting, whose cost has a lower bound of O(n log n). Therefore, the 1-pixels of Eg represent the top 20% of pixels that are most likely to be edges.

G_{X} = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix} * I, \quad G_{Y} = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} * I

(4)

S(r, c) \leftarrow \mathrm{round}\left(\sqrt{G_{X}(r, c)^{2} + G_{Y}(r, c)^{2}}\right)

(5)

Eg = (S \geq p_{80})

(6)
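A minimal sketch of (4) to (6) in plain Python is given below. It is illustrative only: border pixels are simply left at zero, the kernels are applied by correlation (which yields the same magnitude as convolution for these kernels), and the percentile is defined as the smallest value that at least 80% of the pixels do not exceed. These choices and all function names are assumptions.

```python
import math

KX = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
KY = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]


def sobel_magnitude(img):
    """Eqs. (4)-(5): rounded gradient magnitude; border pixels are left at 0."""
    rows, cols = len(img), len(img[0])
    S = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    p = img[r + i - 1][c + j - 1]
                    gx += KX[i][j] * p
                    gy += KY[i][j] * p
            S[r][c] = round(math.sqrt(gx * gx + gy * gy))
    return S


def percentile_by_histogram(S, q=80):
    """Find the q-th percentile with a counting histogram instead of sorting."""
    flat = [v for row in S for v in row]
    hist = [0] * (max(flat) + 1)
    for v in flat:
        hist[v] += 1
    target = (q * len(flat) + 99) // 100  # ceil(q% of the pixel count)
    seen = 0
    for value, count in enumerate(hist):
        seen += count
        if seen >= target:
            return value
    return len(hist) - 1


def edge_map(S, q=80):
    """Eq. (6): Eg = S >= p_q, returned as a 0/1 image."""
    p = percentile_by_histogram(S, q)
    return [[1 if v >= p else 0 for v in row] for row in S]
```

The histogram pass is linear in the number of pixels plus the number of intensity levels, which is the reason for preferring it over a sort in a real-time setting.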

The edge distributions of the lower part of different cases are shown in Table 3.

When the mask was not worn and the face was not lifted, there were “thick” edges in the bottom part of the image because the mouth and nose were located there. When the mask was not worn and the head was lifted, there were no edges in the bottommost region, but several edges of the mouth or nose were visible in the lower half of the image. When the mask was worn and the head was not rotated, a horizontal edge was found across the face, which was caused by the border between the mask and the face; the number of edges in the lower part was significantly smaller than that in the upper part. When the mask was worn and the head was turned, a thin edge on the border between the mask and the cheeks was visible. When the mask was worn and the head was lifted, no edges were visible except in the side regions. The thick edges of the mouth extended in the x direction, whereas the vertical edge between the mask and the cheek was thinner. We therefore excluded the thin edges by simply opening the image with a kernel with one row.

In the method, we determined the three state variables by Procedure 2 in Table 4 using the above principle.

Steps 1 and 2 kept the edges of the mask or the face and removed the edge between the background and the user. Steps 3 to 5 checked whether thick edges were present at the bottom of the face. If so, the system judged that the user did not wear a mask and did not lift the head, and then determined whether the head was turned from the position of the thick edges. Otherwise, steps 6 to 8 were applied to judge the state of the face from the edges that were close to the center and located in the bottom half of F. If there were no or few thick edges in this region, the system judged that the user lifted the head while wearing a mask.

$∆_{ub}$ in step 6 was computed to determine whether the number of edges in the upper part was significantly larger than that in the lower part.

In step 8, the standard deviation of the x coordinates of the thick edges and that of the y coordinates of all edges were used. The standard deviation measures the dispersion of a dataset in the x or y direction.

Then, state classification was performed according to the criteria in step 8 of Procedure 2 (Table 4). The edges of the mouth and the nose were concentrated compared with the horizontal edge of the mask. Therefore, if the standard deviation of the x coordinates was small, the edges were relatively concentrated, and the system judged that the face was lifted and no mask was worn.

When the standard deviation of the x coordinates was large, the dispersion of the y coordinates was examined. When the mask was worn and the face was not turned, the vertical edge of the mask was invisible; although the edges spread along the x-axis, they were concentrated along the y-axis. When the mask was worn and the head was turned, on the other hand, the vertical edge was visible and spread along the entire y-axis.
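The decision rule described above (step 8 of Procedure 2 in Table 4) can be written compactly as follows. The function name, argument names, and the returned tuple layout are hypothetical; the thresholds are the ones listed in Table 4, and the inputs are assumed to have been computed in steps 3, 6, and 7.

```python
def classify_step8(sigma_tc, sigma_r, delta_ub, mu_c, L, R, M, N):
    """Return (mask, pitch, yaw) following step 8 of Procedure 2.

    sigma_tc: std of column coordinates of thick-edge pixels (step 7)
    sigma_r:  std of row coordinates of all edge pixels (step 6)
    delta_ub: (n_u - n_b) / number of all 1-pixels (step 6)
    mu_c:     mean column coordinate of all edge pixels (step 6)
    L, R:     reference columns from step 3; M, N: rows and columns of F
    """
    if sigma_tc > 0.125 * N:
        # Thick edges are spread horizontally: a mask is worn.
        if sigma_r < 0.1 * M or delta_ub > 0.8:
            return True, "\0", "\0"          # mask on, head not turned
        yaw = "R" if abs(mu_c - L) < abs(mu_c - R) else "L"
        return True, "\0", yaw               # mask on, head turned
    # Thick edges are concentrated (mouth and nose): no mask, head lifted.
    return False, "U", "\0"
```

Evaluating the first condition before the second mirrors the order of the table, so a case that satisfies both (large row dispersion but a very large ∆_ub) is treated as not turned.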

2.3.3. Pitch Down

To detect if the head was pitched down, the edge distribution of the upper part of the face was determined. When a person pitched the head down, the hair covered the upper part of the head. Therefore, the number of edges in the upper part decreased.
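The pitch-down check (Procedure 3 in Table 5) amounts to measuring the density of thick-edge pixels in a band near the top of the face image. The sketch below assumes that the thick-edge image is available as a 0/1 list of rows; the function name is hypothetical, and the band limits and the 0.02 threshold are those of Table 5.

```python
def is_pitched_down(eg_t, ratio_limit=0.02):
    """Return True if the band floor(0.1M) < r < floor(0.4M) has too few edge pixels."""
    M = len(eg_t)
    lo, hi = M // 10, (4 * M) // 10
    band = [eg_t[r] for r in range(lo + 1, hi)]
    total = sum(len(row) for row in band)
    if total == 0:
        return False  # band is empty for very small images; no decision is made
    ones = sum(sum(row) for row in band)
    return ones / total < ratio_limit
```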

2.4. Detection and Postprocessing

After size reduction and determination of the head state, the system detected the eyeballs from F. The first step was to find Mids[i], the mid-column coordinate of the non-zero pixels in the i-th row. Then, the ROI of the eyes was bounded in four directions (Table 6), and candidate pixels were extracted by thresholding using (7), with the threshold thr chosen as in Table 7.

E_{0} = \mathrm{bitwise\_and}(F < thr, F' \ominus k_{e})

(7)

where F′ denotes the non-zero region of F.

After thresholding, we viewed each connected component of 1-pixels as a candidate for an eye, and we removed all candidates whose area was larger than 0.015MN or that were not located in the ROI of the eyes.
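The candidate extraction described above can be sketched with a breadth-first connected-component search. Several details are assumptions made for the illustration: 8-connectivity is used, a candidate counts as “located in the ROI” when its centroid lies inside it, and the ROI is passed as a hypothetical (up, low, left, right) tuple.

```python
from collections import deque


def connected_components(mask):
    """Return a list of components, each a list of (r, c) pixels (8-connectivity)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps


def filter_candidates(mask, roi, max_area_ratio=0.015):
    """Drop components larger than max_area_ratio * M * N or centered outside roi."""
    M, N = len(mask), len(mask[0])
    up, low, left, right = roi
    kept = []
    for pixels in connected_components(mask):
        if len(pixels) > max_area_ratio * M * N:
            continue
        cy = sum(p[0] for p in pixels) / len(pixels)
        cx = sum(p[1] for p in pixels) / len(pixels)
        if up <= cy <= low and left <= cx <= right:
            kept.append(pixels)
    return kept
```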

Finally, we selected the pair of candidates that was most likely to be the eyes. Every pair was composed of a candidate $L_{i}$ on the left of the midline of the face and a candidate $R_{j}$ on the right of it, and every pair had a score calculated by using (8).

\mathrm{score}(L_{i}, R_{j}) = V A_{L_{i}} A_{R_{j}} w_{L_{i}} w_{R_{j}}

(8)

where $A_{L_{i}}$ and $A_{R_{j}}$ are the areas of the candidates in the pair; $w_{L_{i}}$ and $w_{R_{j}}$ are weights that are explained below; and V is the validity of the pair. If the pair cannot be a pair of eyes, V = 0; Table 8 lists the conditions under which V = 1.

The uppermost and bottommost row coordinates of the two elements $L_{i}$ and $R_{j}$ in the pair are denoted $u_{L_{i}}, u_{R_{j}}, b_{L_{i}}, b_{R_{j}}$. Two candidates are said to overlap in rows by range “k” when the two integer intervals ($u_{L_{i}} - k, b_{L_{i}} + k$) and ($u_{R_{j}} - k, b_{R_{j}} + k$) overlap. “$w_{L_{i}}$” and “$w_{R_{j}}$” are the weights, each of which is either 1 or 4 when pitch != ‘D’. The weight of a candidate was set to 4 when it was the lowest candidate on its side (the one with the largest row coordinate) among those forming a pair with a nonzero score, or when it overlapped in rows, by range 2, with the lowest candidate on its side. Otherwise, the weight was 1. Because eyebrows are dark objects, they also became candidates, and their positions and areas were close to those of the eyeballs, which caused misdetection. Because eyebrows are positioned higher than the eyeballs, the weight was used to prevent eyebrows from being chosen. In the pitch-down cases, the eyeballs’ region shrank compared with that in the non-rotated case; so, the weight became higher.
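The row-overlap test and the score in (8) can be illustrated with two small helpers. The names are hypothetical, the intervals are treated as closed integer intervals (an assumption, since the text does not say whether the end points are included), and the validity V and the weights are taken as inputs rather than derived from Table 8.

```python
def rows_overlap(u_a, b_a, u_b, b_b, k):
    """True if [u_a - k, b_a + k] and [u_b - k, b_b + k] share at least one row."""
    return max(u_a - k, u_b - k) <= min(b_a + k, b_b + k)


def pair_score(valid, area_l, area_r, w_l=1, w_r=1):
    """Eq. (8): score = V * A_L * A_R * w_L * w_R, with V in {0, 1}."""
    return (1 if valid else 0) * area_l * area_r * w_l * w_r
```

With these definitions, a pair of slightly larger eyebrow candidates with weight 1 (for example, areas 14 and 13, score 182) loses to a pair of eye candidates with weight 4 on both sides (areas 12 and 10, score 1920), which is the effect the weighting is meant to achieve.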

A candidate could also consist of an eye connected to an eyebrow; this problem was solved by postprocessing. When the difference in intensity, $∆_{I}$, within a candidate was larger than 3, the pixels whose intensity was greater than $\max(\mathrm{intensity}) - \frac{∆_{I}}{2}$ were reset to 0. Then, because this operation could break a candidate into several pieces, the system selected two candidates again (Table 8).
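The intensity-based splitting can be sketched as below. The sketch assumes that ∆_I is the difference between the maximum and minimum intensity inside the candidate and that “reset to 0” means removing the pixel from the candidate; both are interpretations, and the function name is hypothetical.

```python
def split_candidate(pixels, intensity, min_diff=3):
    """Drop the brighter pixels of a candidate whose intensity range exceeds min_diff.

    pixels:    list of (r, c) coordinates belonging to one candidate
    intensity: gray-scale image as a list of rows
    """
    values = [intensity[r][c] for r, c in pixels]
    delta = max(values) - min(values)
    if delta <= min_diff:
        return list(pixels)
    cutoff = max(values) - delta / 2
    return [(r, c) for r, c in pixels if intensity[r][c] <= cutoff]
```

After this step the remaining pixels would be relabeled into connected components, since removing the brighter bridge between an eye and an eyebrow can leave several separate pieces.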

2.5. Coordinate Transformation and Correction

After determining the two candidates, we transformed the midpoint of the centroids of the two candidates to the global coordinate by using the pinhole camera model. The raw sequence of coordinates might contain many non-smooth peaks (oscillations). To avoid this, a correction step was applied.
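For reference, the pinhole back-projection mentioned above has the following standard form. The intrinsic parameters fx, fy, cx, and cy (focal lengths and principal point in pixels) are not given in the paper, so the values used in any call are placeholders, and the pixel position is assumed to have already been mapped from the shrunken image back to the full-resolution image.

```python
def pixel_to_camera(r, c, depth, fx, fy, cx, cy):
    """Back-project pixel (r, c) with measured depth into camera coordinates.

    Returns (x, y, z) in the same unit as `depth`; lens distortion is ignored.
    """
    x = (c - cx) * depth / fx
    y = (r - cy) * depth / fy
    return x, y, depth
```

A pixel at the principal point maps to x = y = 0, and the lateral offset grows linearly with both the pixel offset and the depth.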

A coordinate that is the nth sample, f[n], is regarded as an oscillation if (9) is satisfied.

|f''[n]| \geq 0.3 \ \text{and} \ m > 9

(9)

where $f''[n]$ is the second-order difference and the variable m is the number of samples since the last reset of the correction step. The correction was reset (m was reset to 0) if the number of consecutive corrections was larger than 5. This was necessary because, otherwise, the error presented in Figure 8 occurred.

When a coordinate was an oscillation, we fitted a linear regression [4] to the nine samples preceding it and then replaced the value of the coordinate, treated as the tenth sample, with the value predicted by the linear regression model. The result after correction is shown as the orange curve in Figure 9, where the blue curve is the x-curve in Figure 10.
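The correction step can be sketched as a small stateful filter. The sketch makes several assumptions that the text does not settle: the second-order difference is taken as f[n] − 2f[n−1] + f[n−2], the history holds already corrected values, and a reset simply clears the history. The class and function names are hypothetical.

```python
def predict_next(samples):
    """Fit a least-squares line to samples at x = 0..n-1 and evaluate it at x = n."""
    n = len(samples)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    return mean_y + (sxy / sxx) * (n - mean_x)


class OscillationCorrector:
    def __init__(self, threshold=0.3, window=9, max_run=5):
        self.threshold = threshold  # limit on |f''[n]| in Eq. (9)
        self.window = window        # number of past samples used for the fit
        self.max_run = max_run      # allowed number of consecutive corrections
        self.history = []           # outputs since the last reset (its length is m)
        self.run = 0                # current number of consecutive corrections

    def update(self, value):
        if len(self.history) > self.window:  # m > 9
            second_diff = value - 2 * self.history[-1] + self.history[-2]
            if abs(second_diff) >= self.threshold:
                self.run += 1
                if self.run > self.max_run:
                    # Too many corrections in a row: trust the measurement again.
                    self.history.clear()
                    self.run = 0
                else:
                    value = predict_next(self.history[-self.window:])
            else:
                self.run = 0
        self.history.append(value)
        return value
```

The reset matters when the user really moves: without it, a genuine jump in position would be replaced by extrapolated values indefinitely, which is the kind of error caused by continuous corrections.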

3. Discussions

The execution time of the method was measured. We used an Intel Core i7-6700K @ 4.00 GHz and ISO C++14 on the platform Visual Studio 2022 v143. The developed method processed 270–320 frames per second, which is about three times the camera’s capture rate.

3.1. Detection Rates in Different Cases

A camera was placed in front of a user who showed different head states, and images were captured at a rate of 90 frames per second. A detection was regarded as successful if the pair of candidates overlapped the eyeballs on F. Figure 11 shows an example of a successful detection. The detection rates are listed in Table 9. Because a detection was successful only if the head-state classification was correct, the detection rate was regarded as the lower bound of the success rate of the head orientation detection.

3.2. Accuracy and Stability

We evaluated the method using random face images. A printed face with a mask and glasses was placed at 27 different positions in front of the camera: at distances of 50, 60, and 70 cm, with 9 angles per distance. Then, 9000 frames (100 s) were captured at each position. Three parameters were defined using (10) to (12), and the results are shown in Table 10, Table 11 and Table 12.

\mathrm{accuracy} = \mathrm{mean}(\mathrm{data}) - \mathrm{reference\ position}

(10)

\mathrm{precision} = \mathrm{std}(\mathrm{data})

(11)

\mathrm{maxshift} = \max(\max(\mathrm{data}) - \mathrm{mean}(\mathrm{data}), \ \mathrm{mean}(\mathrm{data}) - \min(\mathrm{data}))

(12)
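The three measures in (10) to (12) are straightforward to compute for one axis of one test position. In the sketch below the function name is hypothetical, and the population standard deviation is used because the text does not state whether the population or the sample form was intended.

```python
import statistics


def evaluate(data, reference):
    """Return accuracy, precision, and maxshift (Eqs. (10)-(12)) for one axis."""
    mean = sum(data) / len(data)
    return {
        "accuracy": mean - reference,
        "precision": statistics.pstdev(data),  # assumption: population std
        "maxshift": max(max(data) - mean, mean - min(data)),
    }
```

Accuracy captures the systematic offset from the reference position, whereas precision and maxshift describe how much the output jitters around its own mean for a static face.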

4. Conclusions

We proposed a fast eye-tracking method based on computer vision and image processing algorithms. We shrank the input image with the assistance of depth information to significantly increase the speed of the system and remove interference. To handle eyes in a rotated face, we adopted edge detection and PCA to determine the state of the face. After these two preparatory processes, we detected the eyes by using connected component analysis. Then, we transformed the midpoint of the centers of the eyes to the coordinate in the camera model. The system ran stably at 270 to 320 frames per second. The standard deviation (precision) of the method in detecting the eyes on a static face was lower than 1 mm, and the maximum deviation (maxshift) was lower than 5 mm in most cases and lower than 10 mm in all cases (Tables 10–12). We did not eliminate the influence of the camera distortion; nevertheless, the mean error (accuracy) seldom exceeded 10 mm. The detection rate was about 90% or higher when the face was not rotated (Table 9).

Author Contributions

Conceptualization, M.-C.Y. and J.-J.D.; methodology, M.-C.Y.; software, M.-C.Y.; validation, M.-C.Y.; formal analysis, J.-J.D.; investigation, J.-J.D.; resources, M.-C.Y.; data curation, M.-C.Y.; writing—original draft preparation, M.-C.Y.; writing—review and editing, J.-J.D.; visualization, M.-C.Y.; supervision, J.-J.D.; project administration, J.-J.D.; funding acquisition, J.-J.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the AUO Corporation.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

No new data were created.

Acknowledgments

The authors thank the AUO Corporation for its support.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Satoh, K.; Kitahara, I.; Ohta, Y. 3D Image Display with Motion Parallax by Camera Matrix Stereo. In Proceedings of the Third IEEE International Conference on Multimedia Computing and Systems, Hiroshima, Japan, 17–23 June 1996; pp. 349–357.
2. Wijewickrema, S.N.R.; Papliński, A.P. Principal Component Analysis for The Approximation of an Image as an Ellipse. In Proceedings of the 13th International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision 2005, Plzen, Czech Republic, 31 January–4 February 2005; pp. 69–70.
3. Vincent, O.R.; Folorunso, O. A Descriptive Algorithm for Sobel Image Edge Detection. Informing Sci. IT Educ. Conf. 2009, 40, 97–107.
4. Jiang, B.N. On The Least-Squares Method. Comput. Methods Appl. Mech. Eng. 1998, 152, 239–257.

Figure 1. Flow chart of the developed method.

Figure 2. Depth input and IR input, after preprocessing, after detection, and postprocessing from left to right.

Figure 3. Flow chart of preprocessing.

Figure 4. (a) Input depth image, (b) after thresholding, and (c) after morphology.

Figure 5. Sample outputs of the preprocessing step.

Figure 6. (a) Roll right, (b) pitch up, (c) yaw left, and (d) mask on.

Figure 7. Canceled effect of “roll”.

Figure 8. Error caused by continuous “corrections”.

Figure 9. Corrected x position in Figure 10.

Figure 10. Sample output of the 3D coordinates.

Figure 11. Successful detection.

Table 1. Searching step of preprocessing.

Step 1	Find the uppermost 1-pixel, $F_{0}(r_{u}, c_{u})$.
Step 2	For each 1-pixel $F_{0}(r, c)$, if $|c - c_{u}| < 150$, we call the 1-pixel “valid”.
Step 3	When both r and c are multiples of 5, push (r, c) into a set of pixel coordinates, P.
Step 4	Stop condition: the number of rows that have been searched exceeds $\max(10, \frac{75{,}000}{\text{average depth to that point}})$.
Step 5	“roi” is the smallest rectangle that contains all “valid” 1-pixels.
Step 6	The average depth and intensity are also recorded.
Step 7	Perform PCA on P to find the tilting angle θ [2].
Step 8	Resize the image by $\frac{\text{average depth}}{70}$.

Table 2. State variables.

State	Meaning	Possible Values	Meaning of Values
Pitch	Whether the head is lifted or lowered.	‘\0’, ‘U’, ‘D’	None, Up, Down
Roll	Whether the head is tilted left or right.	‘\0’, ‘L’, ‘R’	None, Left, Right
Yaw	Whether the head is turned left or right.	‘\0’, ‘L’, ‘R’	None, Left, Right
Mask	Whether a mask is worn.	False, True	No, Yes

Table 3. Edge distribution in different cases.

(Mask, Pitch, Yaw)	IR Image	Edge Distribution
(0, ‘\0’, ‘\0’)
(0, ‘\0’, ‘L’ or ‘R’)
(0, ‘U’, ‘\0’)
(1, ‘\0’, ‘\0’)
(1, ‘\0’, ‘L’ or ‘R’)
(1, ‘U’, ‘\0’)

Table 4. Procedure 2: state classification part 2.

Step 1	Erode (F != 0) by a kernel of 1 row by 23 columns to obtain a binary image $F'$.
Step 2	$Eg \leftarrow \mathrm{bitwise\_and}(Eg, F')$, $Eg_{T} \leftarrow \mathrm{bitwise\_and}(Eg_{T}, F')$
Step 3	Find $Cen = \frac{1}{M} \sum_{i = 0}^{M - 1} cen_{i}$, $L = \frac{Cen}{2}$, $R = \frac{M + Cen}{2}$, where $cen_{i}$ is the midpoint of the 1-pixels in the i-th row of $F'$ and M is the number of rows of F.
Step 4	Search $Eg_{T}(r, c)$ in three regions: (a) $r > \lfloor \frac{4}{5} M \rfloor$ and $Cen - 5 \leq c \leq Cen + 5$; (b) $r > \lfloor \frac{4}{5} M \rfloor$ and $L \leq c \leq L + 5$; (c) $r > \lfloor \frac{4}{5} M \rfloor$ and $R - 5 \leq c \leq R$. Then, calculate $\frac{\text{number of 1-pixels}}{\text{number of all pixels}}$ in each region, and denote the results as $e_{c}$, $e_{l}$, and $e_{r}$, respectively.
Step 5	(1) If $e_{c} > 0.1$ && ($e_{l}$ and $e_{r}$ < 0.15 || abs($e_{l} - e_{r}$) < 0.1): Mask ← False, Yaw ← ‘\0’, Pitch ← ‘\0’, procedure stops. (2) If ($e_{l}$ or $e_{r}$ ≥ 0.15) && abs($e_{l} - e_{r}$) ≥ 0.1: Mask ← False, Pitch ← ‘\0’. If $e_{l} \geq e_{r}$: Yaw ← ‘L’; otherwise Yaw ← ‘R’. Procedure stops.
Step 6	Search $Eg(r, c)$ for $Cen - 0.3N \leq c \leq Cen + 0.3N$ and $r > \lfloor \frac{1}{2} M \rfloor$, where N is the number of columns of $F'$. Then calculate $σ_{r}$ and $μ_{c}$. The former is the standard deviation of the row coordinates of all found 1-pixels, and the latter is the average value of the column coordinates of all found 1-pixels. In addition, calculate $n_{u}$ and $n_{b}$, the numbers of 1-pixels located in the upper half and bottom half of the search region, then calculate $∆_{ub} = \frac{n_{u} - n_{b}}{\text{number of all 1-pixels}}$.
Step 7	Search $Eg_{T}(r, c)$ over the same range as that in step 6. If $\frac{\text{number of 1-pixels}}{\text{number of all pixels}} \leq 0.01$: Mask ← True, Pitch ← ‘U’, Yaw ← ‘\0’, procedure stops. Also, calculate $σ_{Tc}$, which is the standard deviation of the column (x) coordinates of all 1-pixels found in this step.
Step 8	(1) If $σ_{Tc}$ > 0.125N && ($σ_{r}$ < 0.1M || $∆_{ub}$ > 0.8): Mask ← True, Pitch ← ‘\0’, Yaw ← ‘\0’, procedure stops. (2) If $σ_{Tc}$ > 0.125N && $σ_{r}$ ≥ 0.1M: Mask ← True, Pitch ← ‘\0’. If $μ_{c}$ is closer to L: Yaw ← ‘R’; otherwise Yaw ← ‘L’. Procedure stops. (3) If $σ_{Tc}$ ≤ 0.125N: Mask ← False, Pitch ← ‘U’, Yaw ← ‘\0’, procedure stops.

Table 5. Procedure 3: state classification part 3.

Step 1	Search $Eg_{T}(r, c)$ for $\lfloor \frac{4}{10} M \rfloor > r > \lfloor \frac{1}{10} M \rfloor$ and all c.
Step 2	Check whether $\frac{\text{number of 1-pixels}}{\text{number of all pixels}} < 0.02$. If so, Pitch ← ‘D’.

Table 6. Procedure 4: find the ROI of eyes.

Step 1	Search $Eg$ along Mids[i]. i starts at 0.1M, but if Yaw != ‘\0’ or Pitch == ‘D’, it starts at 0.2M.
Step 2	When $Eg$(i, Mids[i]) != 0, the search stops and i is recorded.
Step 3	Pitch == ‘U’	up ← i + 0.05M; if up > 0.3M, set it to 0.3M.
	Pitch == ‘D’	up ← i + 0.1M; if up > 0.7M or up < 0.5M, set it to 0.7M or 0.5M, respectively.
	Yaw != ‘\0’	up ← i + 0.1M; if up > 0.4M or up < 0.3M, set it to 0.4M or 0.3M, respectively.
	No pitch & yaw	up ← i + 0.1M; if up > 0.4M or up < 0.25M, set it to 0.4M or 0.25M, respectively.
Step 4	Pitch == ‘U’	low ← up + 0.25M
	Not pitch up	low ← up + 0.3M
Step 5	left ← Mids[$\frac{up + low}{2}$] − 0.35N, right ← Mids[$\frac{up + low}{2}$] + 0.35N

Table 7. Thresholds in different cases.

Mask off	No Yaw&Pitch	$thr = p_{11} \text{ in roi} + 1$
	Pitch up	$thr = p_{11} \text{ in roi} + 2$
	Pitch down or Yaw	$thr = \min(\text{roi}) + 5$
Mask on	No Pitch&Yaw	$thr = p_{11} \text{ in roi} + 1 + a$
	Pitch up	$thr = p_{11} \text{ in roi} + 3 + a^{2}$
	Pitch down	$thr = \min(\text{roi}) + 5$
	Yaw	$thr = \min(\text{roi}) + 5 + a$

Table 8. Requirements for V = 1.

State	Distance	Y-Position	Other
Yaw != ‘\0’	0.15N~0.4N	Overlapped by range 2	Width both < 0.1N
Pitch == ‘U’	0.3N~0.5N	Overlapped by range 0
else	0.3N~0.5N		$|\text{Horizontal angle}| < 20°$

Table 9. Detection rates in different cases.

Mask	Orientation	Successful Frame/All Frame	Detection Rate
off	Normal	957/988	0.968623
	Yaw	820/902	0.909091
	Pitch up	843/938	0.898721
	Pitch down	476/581	0.819277
on	Normal	898/999	0.898899
	Yaw	772/911	0.84742
	Pitch up	777/901	0.862375
	Pitch down	326/407	0.800983

Table 10. Results in 50 cm.

Degree	acc_x (mm)	acc_y (mm)	acc_z (mm)	pre_x (mm)	pre_y (mm)	pre_z (mm)	shift_x (mm)	shift_y (mm)	shift_z (mm)
−18.88	−2.4492	−3.4629	0.8525	0.6194	0.6164	0.555	2.5382	3.7481	1.6015
−14.51	−2.6547	−4.1006	−0.1107	0.5741	0.5359	0.6304	2.4783	3.0624	2.5647
−9.85	−1.8524	−4.0964	0.1772	0.6375	0.5396	0.6323	3.0258	3.2125	2.2768
−4.98	−0.431	−4.3965	−0.1065	0.7668	0.7185	0.6496	3.037	8.8755	2.5605
0	0.1263	−4.5644	−0.2726	0.6634	0.6055	0.5406	4.6565	2.8621	1.7316
4.98	0.4748	−5.0363	−0.17	0.6156	0.5305	0.5882	2.6783	2.4536	2.624
9.85	1.512	−5.292	−0.6269	0.6749	0.5539	0.6578	2.9627	3.0016	3.0809
14.51	2.4336	−5.2404	−2.8617	0.7007	0.6676	0.7664	2.9136	3.1101	2.3307
18.88	3.2321	−5.7841	−0.4962	0.6968	0.5597	0.6014	3.8441	3.5357	3.0198

Table 11. Results in 60 cm.

Degree	acc_x (mm)	acc_y (mm)	acc_z (mm)	pre_x (mm)	pre_y (mm)	pre_z (mm)	shift_x (mm)	shift_y (mm)	shift_z (mm)
−18.88	−3.8767	−6.0773	−0.4985	0.6471	0.6978	0.5193	5.3883	2.6787	3.5215
−14.51	−2.8117	−6.5566	−0.1195	0.635	0.5305	0.6568	6.9853	2.3665	2.9055
−9.85	−2.5407	−6.4374	−0.9341	0.5689	0.5563	0.5425	5.1153	9.8616	2.0909
−4.98	−1.879	−6.9238	−0.0603	0.5661	0.5834	0.4523	2.4781	2.9914	2.0093
0	−0.3749	−7.5495	−0.4131	0.8067	0.7756	0.5818	2.6884	2.7945	2.6119
4.98	0.3135	−7.5778	−0.8175	0.701	0.5564	0.596	2.7689	2.3434	2.2075
9.85	1.3503	−8.0553	−0.8716	0.5376	0.556	0.6401	2.4033	2.3747	2.8206
14.51	2.9756	−8.915	−0.4168	0.7161	0.5543	0.6403	3.0826	2.753	2.6082
18.88	4.7939	−9.1908	−0.0723	0.8076	0.6773	0.6404	4.5021	3.1392	4.9427

Table 12. Results in 70 cm.

Degree	acc_x (mm)	acc_y (mm)	acc_z (mm)	pre_x (mm)	pre_y (mm)	pre_z (mm)	shift_x (mm)	shift_y (mm)	shift_z (mm)
−18.88	−3.57	−7.8049	1.938	0.5845	0.595	0.7452	2.739	2.4911	2.492
−14.51	−1.2507	−8.3596	0.6858	0.4244	0.4474	0.7636	5.9303	2.1004	2.7492
−9.85	−1.9073	−8.8737	1.919	0.6381	0.6587	0.7262	2.2437	2.5373	2.511
−4.98	0.6765	−8.9088	2.012	0.5839	0.5578	0.6365	2.3947	2.5722	2.557
0	0.3695	−9.1156	1.2228	0.5503	0.5205	0.8112	2.5646	2.7014	2.2122
4.98	0.5622	−9.9922	1.5319	0.4461	0.4612	0.8425	2.0999	2.1098	2.0769
9.85	2.1014	−10.3007	0.1522	0.7928	0.6795	0.8221	4.5986	2.6173	3.2828
14.51	2.9638	−10.4852	0.8016	0.7589	0.6964	0.8911	2.5768	2.7318	2.6334
18.88	4.5946	−11.1624	1.2776	0.8217	0.8634	0.9154	3.1266	3.3616	2.8176

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
