Exploring Gaze Movement Gesture Recognition Method for Eye-Based Interaction Using Eyewear with Infrared Distance Sensor Array

With the spread of eyewear devices, people are increasingly using information devices in a variety of everyday situations. In these situations, it is important for eyewear devices to offer eye-based interaction functions that provide simple, low-cost hands-free input. This paper proposes a gaze movement recognition method for simple hands-free interaction that uses eyewear equipped with infrared distance sensors. The proposed method measures eyelid skin movement with infrared distance sensors inside the eyewear and applies machine learning to the time-series sensor data to recognize gaze movements (e.g., up, down, left, and right). We implemented a prototype system and evaluated it with gaze movements that varied in movement direction (at 45-degree intervals) and in movement distance along the same direction. The results showed the feasibility of the proposed method: it recognized 5 to 20 types of gaze movements with an F-value of 0.96 to 1.0. In addition, the proposed method remained usable with a limited number of sensors, such as two or three, and was robust against disturbances under some usage conditions (e.g., body vibration and facial expression change). This paper provides useful findings for the design of low-cost gaze movement recognition methods for simple hands-free interaction using eyewear devices.


Introduction
In recent years, eyewear-type information devices have become popular as information devices that can be used anytime and anywhere. Examples include smart glasses/optical see-through head-mounted displays (e.g., Epson MOVERIO, Google Glass), AR/XR glasses (e.g., HoloLens, Nreal Air), and smart audio glasses (e.g., Bose Frames). Because these devices are used in many situations in which it is difficult to use the hands for input, it has become essential to provide simple hands-free input methods.
In this context, gaze movement input is a promising option for hands-free input on eyewear devices. Gaze movement directly reflects the user's intention and has been used as a hands-free gesture input method [1][2][3][4][5]. In addition, gaze movement recognition sensors are easy to integrate into eyewear devices. Furthermore, gaze movement input is useful in situations where other promising input methods are unsuitable, for reasons including psychological pressure related to privacy and embarrassment when inputting (e.g., voice input [6,7], facial expression input [8]), physical fatigue (e.g., mid-air hand input [9]), and the need to carry and wear additional equipment (e.g., ear accessory devices [10], finger ring devices [11]). Therefore, a gaze movement recognition method suitable for hands-free input on eyewear devices in continuous daily use is required. Several methods are promising. For example, mounting a small camera on eyewear allows eye movements to be recognized with high accuracy; however, such methods are not suitable for continuous daily wear by the general public, because the price of such consumer products is high [12,13] and real-time processing of image data demands high processor performance and power consumption [14][15][16]. Although methods that mount an EOG sensor on the nasal bridge of eyewear are currently the least expensive, they have issues with recognition accuracy [16,17]. There is still a need to explore methods that enable continuous daily use at a low cost.
One existing approach uses an infrared distance sensor on eyewear to continuously sense eyelid skin movements. This approach has been used, for example, for blink recognition (e.g., Google Glass [18], Dual Blink [19]) and facial expression recognition [20,21]. Infrared distance sensors are low-cost and small enough to be installed in various eyewear devices (e.g., vision correction glasses and AR/VR glasses). In addition, previous studies have demonstrated that infrared distance sensors can effectively recognize skin movements while requiring little processing power and energy for data processing [11,19,22]. Since gaze movements have been found to be visible through the skin of the eyelids (e.g., eye movements during sleep appear through the eyelids [23]), we assume that gaze movements can be recognized with this approach. Although its recognition accuracy is expected to be lower than that of camera-based methods, it is important to know how accurate this approach is and for which applications it can be used. If the approach is effective, it could help add simple gaze movement interaction functions to many everyday eyewear devices at a low cost.
Therefore, we propose a gaze movement recognition method for simple hands-free interaction that uses eyewear equipped with an infrared distance sensor. The proposed method measures eyelid skin movement using an infrared distance sensor inside the eyewear, applies machine learning to the time-series sensor data, and recognizes gaze movements (e.g., up, down, left, and right). We implemented a prototype system of the proposed method and conducted three evaluations. In Evaluation 1, we evaluated 20 types of gaze movement with 14 subjects to verify the feasibility of the proposed method as a simple hands-free input interface. In Evaluation 2, we investigated the necessary sensor positions when the number of sensors is small, such as two or three; the results were intended to help adapt the proposed method to fewer sensors. In Evaluation 3, we evaluated whether the recognition accuracy of the proposed method changed under certain conditions, such as reattachment of the device, body vibration, and facial expression change. The results of this evaluation help clarify the robustness of the proposed method in situations typical of wearable device use.
We published the concept of the proposed method in work-in-progress papers in 2019 [24] and 2021 [25]. This paper extends Evaluation 1 and adds Evaluation 2 and Evaluation 3. Although Evaluation 1 was described in a previous paper, here we increase the number of subjects from 5 to 14 and add an analysis of the recognition accuracy for each gesture. This paper also adds Section 2.2, Section 2.3 and Section 7.

Eye Activity Sensing Technology Using Eyewear
In recent years, various methods have been proposed to sense eye activity using eyewear in different situations.
One existing method uses a camera attached to eyewear to recognize eye activity by processing images of the eye. Representative consumer devices include the Tobii Pro Glasses [12] (Tobii), Pupil Core [13] (Pupil Labs), and the SMI eye-tracking glasses (SMI). These methods can achieve highly accurate recognition; however, their accuracy can be reduced by various factors, such as lighting conditions [14,15]. There are also issues related to cost and hardware. Because the data are acquired at a high frequency, the processor performance and power consumption needed to process the camera data are high. Since such processing requires substantial computing power, the data must often be processed on a computer after collection, which makes real-time applications (e.g., live notifications) difficult to realize. Therefore, these methods are mostly used for expert purposes, such as research, and are not used regularly by general consumers. A related approach that does not use a camera is the infrared corneal limbus tracker, which uses a light source (infrared LED) and an optical sensor (phototransistor) to detect the contour of the cornea. This method has also been used with eyewear to sense gaze [26].
Another method uses an electrooculogram (EOG) sensor mounted on eyewear. This method senses the change in electric potential that occurs when the eye moves, using electrodes placed on the skin near the eye; this works because the human eye acts as an electric dipole. Such EOG methods have been used to estimate fatigue by sensing eyelid and eye movements [27,28]. Recently, some studies have attempted to reduce the size and number of electrodes, assuming constant use in daily life [29]. Although it does not involve eyewear, another method recognizes the gaze movement direction using an EOG sensor attached to other devices, such as earphones and headphones [2,30]. In addition, JINS MEME [31] is a consumer device that uses an EOG sensor on the nasal bridge and can be worn at all times [32,33]. However, this method has difficulty recognizing vertical gaze movement and distinguishing gaze movements from eyelid movements, and its ability to recognize eye activity at high resolution is limited [16,17].
Another method, similar to our study, uses an infrared distance sensor on eyewear. In this method, an infrared distance sensor installed in front of the eye is used to recognize blinks. In addition, Google Glass has a function that recognizes exaggerated winks (blinks) as a gesture input [34]. The Dual Blink application [19] has shown that this method can recognize blinks and is suitable for constant use from various perspectives, including power consumption. Dual Blink can also induce blinks by providing a stimulus, e.g., blowing air onto the cornea. Futami et al. [24] showed examples of using three sensors to recognize four types of gaze movements (up, down, left, right) and the gaze point among nine segments of the field of view. Masai et al. [35] also provided examples of using 16 sensors to recognize 7 types of gestures, including blinks and gaze movements in directions at 90° intervals, and the gaze point among 25 segments of the field of view. In addition, this method has been used to recognize facial expressions by sensing eyelid and cheek movements [21,22,36]. Based on these previous studies, eyewear equipped with infrared distance sensors is expected to become more common in the future.

Recognition Method of Skin Movement Using Infrared Distance Sensor
Infrared distance sensors attached to the body have been used to recognize skin movement. For example, many studies have applied infrared distance sensors to eyeglass-type wearable devices, as outlined below.
Fukumoto et al. [21] proposed a method that recognizes a smile by sensing the movements around the cheeks and outer corners of the eyes. Masai et al. [22] proposed a method that recognizes facial expressions, such as smiles and expressions of surprise, in daily life by sensing the movements of the eyelids and cheeks. Masai et al. [37] proposed a method that recognizes the gesture input of rubbing the cheeks with the hands by sensing the movements of the cheeks. Regarding blink detection, Google Glass [18] recognizes intentional blink gestures, and Dual Blink [19] recognizes natural blinks. Dual Blink [19] also has a function that physically induces the user's blinks by hitting the eyelids with an air cannon. Futami et al. [24,25] and Masai et al. [35] proposed methods that recognize gaze movements. In addition, some studies have shown that infrared distance sensors are suitable from multiple perspectives (e.g., power consumption) for wearable devices that are used continuously [19,22]. To improve the VR experience, infrared distance sensors have been used in a head-mounted display (HMD) to map the skin movement of the user's face onto the facial expression of an avatar in virtual space [36]. Infrared distance sensors have also been used in earphones to recognize input gestures such as pulling the ear [38] and moving the tongue [39]. Other examples include ear accessory devices that recognize facial gestures [10], a wristband that recognizes hand-shape gestures [40], a ring that recognizes finger movement gestures [11], and a mouthpiece that recognizes tongue gestures [41].
These studies have shown that an infrared distance sensor can recognize skin movements with high accuracy and robustness. In addition, infrared distance sensors are inexpensive, lightweight, compact, have low power consumption, and use little data. This reduces the cost and size of the battery and processor, making them suitable for wearable devices that are continuously used. Based on previous studies, an infrared distance sensor is considered to be suitable for the proposed method.

Simple Hands-Free Input Method
This paper investigates the feasibility of the proposed method as a simple hands-free input method. Hands-free input is useful for people and in situations for which a general input interface is not available, such as people with disabilities or situations in which neither hand can be used for input. Previous studies have shown that various hands-free input methods expand the usage scenarios of information devices and make them more comfortable to use. Similarly, our method is expected to expand the usage scenarios of information devices and make them more comfortable to use.
Many hands-free input methods have been proposed. Methods that recognize face or body movements (e.g., face [8,10], ear [38], tongue [39], finger [11]) are often used for simple hands-free input. Postures of the wrist [42] and torso [43] have also been used for navigation input. Speech recognition with a microphone is used for applications such as text input [6] and navigation [7]; one disadvantage of this approach is that its recognition accuracy decreases in situations with significant environmental sound [44]. Gaze movement is also used as an input method [2,3,24,25,35] because gaze can reflect the user's intention. Other methods use head movement for purposes such as turning pages when browsing [45] and operating a cursor (e.g., on desktop devices [46] and mobile devices [47]). There are also methods that combine head movement and gaze [48] and brain activity and gaze [49].

Method
The proposed method recognizes gaze movements (e.g., up, down, left, and right) based on the movement of the skin around the eyes that accompanies them. A flowchart of the proposed method is shown in Figure 1. The method comprises three main steps: (1) First, in the sensing step, multiple infrared distance sensors measure the skin movements of the eyelids. Skin movement is sensed as the change in distance between the skin and the infrared distance sensors, which are installed in front of the eyes (i.e., on the inner rim of the glasses) and use infrared light to measure distance. (2) Second, feature amounts are extracted from the time-series data. (3) Finally, in the recognition step, machine learning is applied to the data to recognize the gaze movement.

Recognition Mechanism
To recognize gaze movements, DTW and kNN were applied to the time-series data. DTW (dynamic time warping) is an algorithm that computes the similarity between two time series that may be locally shifted or stretched in time.
The details are as follows: (1) First, DTW is used to calculate the similarity between the acquired data and each training sample. The training data include all the gesture data prepared in advance. The similarity is calculated separately for each sensor. (2) Next, kNN selects the k training samples most similar to the acquired data. From the proportion of each gesture label among the selected samples, we calculate the affiliation probability, i.e., the probability that the acquired data belong to each gesture label. For example, if kNN (k = 3) selects three training samples that all have gesture label 1, the affiliation probability of gesture label 1 is 100%. The affiliation probability is calculated for each sensor. (3) Finally, the gesture label with the highest sum of affiliation probabilities over all sensors is taken as the recognition result for the acquired data. For example, if there are two sensors and the affiliation probabilities of gesture label 1 for sensor 1 and sensor 2 are 0.3 and 0.4, the summed affiliation probability of gesture label 1 is 0.7.
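A minimal Python sketch of this per-sensor DTW + kNN scheme with summed affiliation probabilities may help make the three steps concrete. It assumes a plain DTW with absolute-difference local cost and no warping window, and one one-dimensional series per sensor; the paper does not specify these details, and the function names are hypothetical. The prototype used k = 7; smaller values of k are convenient for toy data.

```python
from collections import Counter


def dtw_distance(a, b):
    """Classic dynamic time warping distance with |x - y| as the local cost.

    Smaller values mean the two series are more similar. No warping window
    is applied (an assumption; the paper does not specify one).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # advance in a only
                                     cost[i][j - 1],      # advance in b only
                                     cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


def sensor_probabilities(query, train, k):
    """kNN for one sensor: share of each label among the k nearest samples.

    `train` is a list of (series, label) pairs recorded by this sensor.
    """
    ranked = sorted(train, key=lambda item: dtw_distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return {label: count / k for label, count in votes.items()}


def recognize(query_by_sensor, train_by_sensor, k=7):
    """Sum per-sensor affiliation probabilities and return the best label."""
    totals = Counter()
    for query, train in zip(query_by_sensor, train_by_sensor):
        for label, prob in sensor_probabilities(query, train, k).items():
            totals[label] += prob
    return max(totals, key=totals.get)
```

Because DTW yields a distance, "most similar" here means "smallest DTW distance". If two sensors gave gesture label 1 probabilities of 0.3 and 0.4, `recognize` would accumulate 0.7 for that label, as in the worked example above. The quadratic-time DTW is adequate for 200 to 400 samples per gesture; a warping window would reduce its cost.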

Implementation
We implemented a prototype system of the proposed method. The prototype system consisted of a sensor device, a microcontroller (Arduino), a laptop, and software implemented with Processing and Python. Figure 2 shows the system configuration. The prototype device, shown in Figure 3B, consisted of 16 infrared distance sensors (TRP-105, SANYO Electric Co., Ltd., Moriguchi, Japan) installed inside the spectacle frame. The sampling rate was 200 Hz. The k of kNN was set to 7, the optimum value based on the experimenter's data, although the recognition accuracy does not change significantly when k is changed. We prepared two types of feature amounts to determine which one yields higher recognition accuracy. The first was a 16-dimensional pattern consisting of the values of the 16 infrared distance sensors. The second was a 40-dimensional pattern consisting of three components: the 16 values of the 16 infrared distance sensors, 16 difference values between adjacent sensors, and eight difference values between diagonally positioned sensors.
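The 40-dimensional pattern can be sketched per sample frame as below. The paper does not give the physical sensor layout, so the adjacency (sensor i next to sensor i + 1, wrapping around) and the diagonal pairing (sensor i with sensor i + 8) used here are hypothetical assumptions, chosen only to reproduce the stated 16 + 16 + 8 = 40 dimensions.

```python
def build_40d_features(frame):
    """Expand one 16-sensor sample into the 40-dimensional pattern.

    Hypothetical layout: sensors 0..15 form a ring, so sensor i is adjacent
    to sensor (i + 1) % 16 and diagonally paired with sensor i + 8.
    """
    n = len(frame)
    if n != 16:
        raise ValueError("expected readings from 16 sensors")
    raw = list(frame)                                                  # 16 values
    adjacent = [frame[i] - frame[(i + 1) % n] for i in range(n)]       # 16 values
    diagonal = [frame[i] - frame[i + n // 2] for i in range(n // 2)]   # 8 values
    return raw + adjacent + diagonal


def build_40d_series(frames):
    """Apply the expansion to every sample of a recorded gesture."""
    return [build_40d_features(frame) for frame in frames]
```

With a real device, the two index rules would be replaced by the actual neighbour and diagonal pairs of the frame geometry; the 16-dimensional pattern is simply the `raw` slice.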

Evaluation 1: Gaze Movement Recognition
In this experiment, we evaluated the eyeball movement recognition accuracy of the proposed method. This experiment focused on two points: (1) First, we evaluated the proposed method's accuracy and limitations in terms of gaze movement recognition. Here, multiple gaze movement patterns were prepared, and the recognition accuracy of each pattern was evaluated. (2) Second, we evaluated the feasibility of a gaze input interface based on gaze movement.
The subjects included 14 college students (average age: 21 years, maximum age: 32 years, minimum age: 20 years; 10 males, 4 females). This study was approved by the research ethics committee of Kobe University (Permission number: 03-19) and was carried out according to the guidelines of the committee.

Types of Gaze Movement
We prepared 20 types of gaze movement gestures, as shown in Figure 4. There were 10 basic gaze movement gestures (G1 to G10, i.e., Gesture 1 to Gesture 10), each performed with a small and a large movement pattern, e.g., G1S and G1L (i.e., G1 Small and G1 Large). The 10 basic movement patterns are as follows: G1 (up and down movement), G2 (up and down movement in the right diagonal direction), G3 (right and left movement), G4 (down and up movement in the left diagonal direction), G5 (down and up movement), G6 (down and up movement in the right diagonal direction), G7 (left and right movement), G8 (up and down movement in the left diagonal direction), G9 (hourglass-shaped movement), and G10 (square movement).
The detailed sizes of each gesture were as follows. Figure 5A shows the spacing of the marks used to guide the gaze. The marks were placed on a transparent shield, which was positioned in front of the subject's face using a visor, as shown in Figure 5B. In the following explanation, the letter P denotes a point P in Figure 5A. For movement patterns G1 to G8, a small movement was between the start point P13 and the adjacent point in the target direction (e.g., P12, P7, or P8). For example, pattern G1S involved the movement order P13, P12, P13, and pattern G2S involved the movement order P13, P7, P13. A large movement was between the start point P13 and the point two steps away (e.g., P11, P1, or P3). For example, pattern G1L involved the movement order P13, P11, P13, and G2L involved the movement order P13, P1, P13. For movement patterns G9 and G10, a small movement started at P17 and spanned two steps between points (e.g., to P7). For example, G9S involved the movement order P17, P7, P19, P9, P17, and G10S involved the movement order P17, P7, P9, P19, P17. A large movement started at P21 and covered a wider span (e.g., to P1). For example, G9L involved the movement order P21, P1, P25, P5, P21, and G10L involved the movement order P21, P1, P5, P25, P21. Each movement between two points was set to take about 0.5 s. For example, moving from P13 to P8 and from P13 to P3 each took about 0.5 s; thus, gestures G1 to G8 took about 1 s, and gestures G9 and G10 took about 2 s.
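The timing rule above (about 0.5 s per point-to-point segment) determines each gesture's duration and, at the prototype's 200 Hz sampling rate, the number of samples to record. The sketch below uses only the point sequences given in the text; the dictionary and function names are hypothetical.

```python
SEGMENT_SECONDS = 0.5     # time per point-to-point movement (from the text)
SAMPLING_RATE_HZ = 200    # prototype sampling rate

# Point sequences stated explicitly in the text (other gestures omitted).
GESTURE_PATHS = {
    "G1S": ["P13", "P12", "P13"],
    "G1L": ["P13", "P11", "P13"],
    "G2S": ["P13", "P7", "P13"],
    "G2L": ["P13", "P1", "P13"],
    "G9S": ["P17", "P7", "P19", "P9", "P17"],
    "G9L": ["P21", "P1", "P25", "P5", "P21"],
    "G10S": ["P17", "P7", "P9", "P19", "P17"],
    "G10L": ["P21", "P1", "P5", "P25", "P21"],
}


def gesture_duration(path):
    """Seconds needed: one 0.5 s segment between each pair of consecutive points."""
    return (len(path) - 1) * SEGMENT_SECONDS


def samples_to_record(path):
    """Number of 200 Hz samples covering the whole gesture."""
    return round(gesture_duration(path) * SAMPLING_RATE_HZ)
```

Three-point gestures (G1 to G8) give two segments, i.e., 1 s and 200 samples; five-point gestures (G9, G10) give four segments, i.e., 2 s and 400 samples, matching the recording windows used in the experiment.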

Experimental Procedure
In the experimental task, subjects performed designated gaze movements. Each subject wore the prototype device and sat on a chair, with a shield (Figure 5B) attached to the head. Visible points were arranged on the shield to guide the subject's gaze movements. Based on these points, the subject was instructed on how to perform each eye movement gesture, and at what speed, using a video and the experimenter's explanation. One trial consisted of performing all 20 types of gaze movements in random order. Ten trials were performed, so the data for each subject consisted of 10 repetitions of each of the 20 gestures. For gaze movements G1S to G8S and G1L to G8L, time-series data were recorded for 1 s (i.e., 200 samples) because a single movement took approximately 1 s. For G9S, G10S, G9L, and G10L, time-series data were recorded for 2 s (i.e., 400 samples) because a single movement took approximately 2 s. Because this experiment assumed intentional gaze input gestures, subjects did not blink while performing the gaze movements. Ten-fold cross-validation was performed on the data acquired in the 10 trials. Recognition accuracy was evaluated for the following three patterns.
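The paper states that 10-fold cross-validation was run on the 10 trials but does not say how the folds were formed. One natural reading, assumed here, is that each fold holds out one trial (one sample of each of the 20 gestures) and trains on the other nine. A minimal sketch of that split, using a hypothetical sample layout of `(trial, gesture, data)` tuples:

```python
def leave_one_trial_out(samples):
    """Yield (held_out_trial, train, test) with one fold per trial.

    `samples` is a list of (trial_index, gesture_label, data) tuples; this
    layout and the one-trial-per-fold assumption are illustrative only.
    """
    trials = sorted({trial for trial, _, _ in samples})
    for held_out in trials:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test
```

Splitting by trial keeps all gestures recorded in the same trial together, so the test fold never shares a trial with the training folds.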

• (1) The 5-movement pattern comprised G1S, G3S, G5S, G7S, and G9S. It was set to evaluate the feasibility of a gaze input interface using the proposed method. An example application for a hands-free input interface is a media player (e.g., for music, video, and still images). Approximately five types of commands (e.g., play, stop, forward, and back) are sufficient to operate such applications. In fact, the effectiveness of hands-free input methods has previously been evaluated using approximately five gestures [8]. In addition, if five gestures can be recognized, dozens of different inputs can be produced by combining them. This pattern was used to evaluate whether the proposed method could recognize differences in movement direction at 90° intervals.
• (2) The 10-movement pattern comprised G1S to G10S. It was set to evaluate whether the proposed method could recognize differences in movement direction at 45° intervals.
• (3) The 20-movement pattern comprised all movements, including both the small and large movement patterns. It was set to evaluate whether differences in the degree of movement in the same direction could be recognized.
Figure 6 shows the F-value results, and Table 1 shows the F-value, precision, and recall for each type of feature amount. Figure 7 shows the confusion matrix for each gaze movement; the values shown are F-values for the 16-dimensional pattern. This figure helps us understand how misrecognition of each gaze direction tends to change as the number of gaze movements increases. Figure 8 and Table 2 show the results for each individual; the values shown are F-values for the 16-dimensional pattern. These results help us understand individual differences in the recognition accuracy of the proposed method.
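Because the results are reported as F-values (the harmonic mean of precision and recall), a minimal sketch of how per-gesture precision, recall, and F-value can be computed from true and predicted labels may be useful. The paper does not state how per-gesture values were averaged; the unweighted (macro) average and the function names here are assumptions.

```python
def per_gesture_scores(y_true, y_pred):
    """Return {label: (precision, recall, f_value)} for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_value = (2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
        scores[label] = (precision, recall, f_value)
    return scores


def macro_f_value(y_true, y_pred):
    """Unweighted mean of the per-gesture F-values (assumed averaging scheme)."""
    scores = per_gesture_scores(y_true, y_pred)
    return sum(f for _, _, f in scores.values()) / len(scores)
```

For example, if two G1S trials are predicted as G1S and G3S and two G3S trials are both predicted as G3S, G1S has precision 1.0 and recall 0.5 (F about 0.67), G3S has precision 2/3 and recall 1.0 (F = 0.8), and the macro average is about 0.73.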

Result
The 5-movement pattern gave an average F-value of 1.0. Because the 5-movement pattern consisted of gaze movements with vertical and horizontal directions at 90° intervals, this result indicates that the proposed method recognized movement directions at 90° intervals with high accuracy.
The 10-movement pattern gave an average F-value of approximately 0.99, and all 14 subjects (100%) had F-values of 0.9 or higher. Because the 10-movement pattern consisted of gaze movements with directions at 45° intervals, this result indicates that the proposed method was effective for all subjects and could recognize movement directions at 45° intervals with high accuracy.
The 20-movement pattern gave an average F-value of approximately 0.96, and 86% of subjects (12 of 14) had F-values of 0.9 or higher. The 20-movement pattern consisted of gaze movements with directions at 45° intervals, each with small and large variants. These results indicate that our method could distinguish, with high accuracy, movements in the same direction whose sizes differed by a factor of about two.
As the number of gaze movement types increased, misrecognition showed the following tendency: in the 20-movement pattern, misrecognition between the small and large variants of the same movement (e.g., between G5S and G5L) increased. Therefore, misrecognition is expected to increase as the difference in the degree of movement in the same direction decreases. There was almost no difference between the results for the 16-dimensional and 40-dimensional patterns; therefore, adopting the 16-dimensional pattern is appropriate considering the computational cost.
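The distinction drawn above between size errors (e.g., G5S recognized as G5L) and direction errors (e.g., G1S recognized as G3S) can be made concrete with a small helper that sorts misrecognitions into those two kinds. The label format (base gesture name followed by S or L) follows the naming used in this paper; the function names are hypothetical.

```python
from collections import Counter


def split_label(label):
    """'G5S' -> ('G5', 'S'); assumes labels end in S (small) or L (large)."""
    return label[:-1], label[-1]


def classify_errors(y_true, y_pred):
    """Count misrecognitions as 'size' (same gesture, other size) or 'direction'."""
    errors = Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            continue  # correct recognition, not an error
        true_base, _ = split_label(true_label)
        pred_base, _ = split_label(pred_label)
        errors["size" if true_base == pred_base else "direction"] += 1
    return errors
```

Applied to the per-fold predictions of the 20-movement pattern, a growing "size" count relative to "direction" would reflect the tendency described above.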

Discussion
The results showed the feasibility of the proposed method as a simple hands-free input method. Gaze movement appears on the eyelid skin, and the proposed method recognized the resulting patterns. The proposed method recognized 5 to 20 types of gaze movement patterns with an F-value of 0.9 or higher. Movement directions at 90° intervals were recognized with high accuracy, and movement directions at 45° intervals and differences in movement distance in the same direction were also recognized. A previous study showed that about five recognizable commands are necessary and sufficient for simple hands-free input [8]; for example, approximately five commands (e.g., play, stop, forward, and back) are sufficient to operate a media player (e.g., for music, video, and still images). In addition, previous studies of simple hands-free input methods reported that five to seven types of face or gaze input gestures were recognized with F-values of 0.85 to 0.9 [8,35,50], although these studies did not use the same experimental setup. Based on these findings, the recognition accuracy of the proposed method appears comparable to that of previous studies, and the proposed method can be used as a simple hands-free input method.

Evaluation 2: Recognition Accuracy with a Small Number of Sensors
In this experiment, we investigated whether the proposed method could recognize gaze movement gestures with a small number of sensors, such as two or three. In addition, we investigated which sensor positions are necessary for recognition when the number of sensors is small. Implementing the proposed method with the minimum number of sensors is desirable in practice; thus, this evaluation provides an example of such a minimal implementation.
Here, the proposed method was evaluated with the same five gestures as in the previous section. We searched for combinations of two or three sensors with high recognition accuracy. We restricted the candidate sensors to the eight on the upper side of the eyeglass frame, because the sensors on the lower side cannot be used for gaze movement recognition when the lower part of the frame is wide. This setting was sufficient to assess whether the proposed method works with a small number of sensors. For two sensors, the recognition accuracy was calculated for all 28 combinations of two sensors; for three sensors, it was calculated for all 56 combinations of three sensors. Since this evaluation aimed to assess whether the proposed method works with a small number of sensors, we used only one type of feature amount, namely the raw sensor values used in the 16-dimensional pattern.
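The exhaustive subset search described above can be sketched with `itertools.combinations`. Here `score_subset` is a hypothetical stand-in for running the 10-fold cross-validation restricted to the chosen sensors and returning the F-value, and the numbering of the upper-frame sensors as 1 to 8 is assumed for illustration.

```python
from itertools import combinations


def rank_sensor_subsets(sensor_ids, size, score_subset, top=5):
    """Score every `size`-sensor subset and return (count, best `top` subsets).

    `score_subset(subset)` is a hypothetical callback returning a score
    (e.g., the cross-validated F-value) for one tuple of sensor ids.
    """
    subsets = list(combinations(sensor_ids, size))
    scored = [(score_subset(subset), subset) for subset in subsets]
    scored.sort(key=lambda item: item[0], reverse=True)  # highest score first
    return len(subsets), scored[:top]
```

With eight candidate sensors, `size=2` enumerates C(8, 2) = 28 subsets and `size=3` enumerates C(8, 3) = 56, which matches the counts above; the top five of each correspond to the kind of ranking reported in Tables 3 and 4.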

Result
Ten-fold cross-validation was performed on the data acquired in the 10 trials. Tables 3 and 4 show the top five sensor combinations. For two sensors, the highest F-value was 0.918, obtained with the combination of the 1st and 8th sensors. For three sensors, the highest F-value was 0.966, obtained with the combination of the 4th, 7th, and 8th sensors. Figure 9 shows examples of the proposed method using two or three sensors.

Discussion
The results showed that the proposed method could be used with a small number of sensors, such as two or three sensors. Sensor positions where gaze movement was likely to be sensed were also identified. The proposed method with a small number of sensors was assumed to be adequate for a simple hands-free input method since it had an F-value of 0.9 or more. Although reducing the number of sensors can decrease the recognition accuracy, reducing the number of sensors is an appropriate design for implementing the proposed method for reasons such as lower cost and lighter weight of the whole system, if the necessary recognition accuracy can be obtained.

Evaluation 3: Robustness in Gaze Movement Recognition
This experiment evaluated whether the recognition accuracy of the proposed method changed under various conditions assumed in the usage scenarios of wearable devices: reattachment, body vibration, and facial expression change. Each condition was as follows:

Condition 1. Reattachment condition
This condition evaluated the accuracy of the proposed method after the sensor device was reattached, to assess whether training data must be reacquired when the proposed method is used after reattaching the device. Thirteen subjects participated. First, each subject performed ten trials of the 5-movement pattern task, identical to and under the same conditions as Evaluation 1. Subjects who had already participated in Evaluation 1 did not repeat this data acquisition. The data from these ten trials were used as training data. Each subject then reattached the sensor device and performed five trials of the 5-movement pattern task. The data from these five trials were used as test data.

Condition 2. Body vibration condition
This condition evaluated the accuracy of the proposed method when body vibration occurs (e.g., walking). Twelve subjects participated. In this evaluation, walking vibration was considered. Strong body vibration (e.g., vibration that occurs when dashing or dancing) was not assessed, because such vibration is expected to clearly decrease the accuracy of the proposed method. First, each subject performed ten trials of the 5-movement pattern task, identical to and under the same conditions as Evaluation 1. Subjects who had already participated in Evaluation 1 did not repeat this data acquisition. The data from these ten trials were used as training data. Each subject then performed five trials of the 5-movement pattern task while walking in place at their own natural pace. The data from these five trials were used as test data.

Condition 3. Facial expression change condition
Although the infrared distance sensors return the same values for the same facial expression, it is necessary to verify whether the learning data must be reacquired when the facial expression changes. Therefore, this condition evaluated whether learning data acquired with a neutral facial expression can be used for a different facial expression. A smile was chosen because a previous study [19] on blink detection with an infrared distance sensor reported a case in which smiling reduced blink recognition accuracy. Thirteen subjects participated. First, each subject performed ten trials of the 5-movement pattern task, using the same task and conditions as Evaluation 1. Subjects who had already participated in Evaluation 1 did not repeat this data acquisition. The data for these ten trials were used as training data. Each subject then performed five trials of the 5-movement pattern task while smiling. To smile, subjects intentionally raised both corners of the mouth so that the teeth were visible. The data for these five trials were used as test data.
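All three conditions use one protocol. The model is trained on ten baseline trials and tested on five trials recorded under the condition. A minimal Python sketch of that split follows. This section does not specify the classifier, so the sketch substitutes a hypothetical nearest-centroid classifier. It also assumes that each trial has already been reduced to a fixed-length feature vector. The function names are illustrative and do not represent the authors' implementation.

```python
from collections import defaultdict
from math import dist


def train_centroids(train):
    """Hypothetical stand-in classifier: average the feature vectors per gesture.

    train: list of (feature_vector, label) pairs from the baseline trials.
    """
    sums = {}
    counts = defaultdict(int)
    for x, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, x):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


def evaluate_condition(baseline_trials, condition_trials):
    """Train on baseline trials only, then test on trials recorded under a
    condition (after reattachment, while walking, or while smiling)."""
    model = train_centroids(baseline_trials)
    y_true = [label for _, label in condition_trials]
    y_pred = [predict(model, x) for x, _ in condition_trials]
    return y_true, y_pred
```

No condition-specific data enter training. This is what allows each condition to test whether the baseline learning data can be reused.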

Result
Figure 10 and Table 5 show the recognition results for each condition as F-value, precision, and recall. These results are for the 16-dimensional pattern.
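The F-value is the harmonic mean of precision P and recall R: F = 2PR / (P + R). The sketch below computes these values per gesture class from true and predicted labels and then averages them over classes. Macro-averaging is an assumption here, because this section does not state how the per-class scores were aggregated.

```python
def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f_value)} for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)  # correctly predicted c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly predicted as c
        fn = sum(t == c and p != c for t, p in pairs)  # c predicted as something else
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f)
    return scores


def macro_average(scores):
    """Unweighted mean of precision, recall, and F-value over classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))
```

For example, one "up" trial misread as "down" out of four trials gives an F-value of 2/3 for "up" and 0.8 for "down", for a macro F-value of about 0.73.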
The results for the reattachment condition were as follows. Compared with Evaluation 1, recognition accuracy after reattachment decreased slightly when the learning data recorded before reattachment were used. The average F-value for the 5-movement pattern was 0.95, which was lower than in Evaluation 1. At the individual level, 77% of subjects (10 of 13) achieved an F-value of 0.9 or higher. These results show that the 5-movement pattern can be recognized with high accuracy even after reattachment.
The results for the body vibration condition were as follows. Overall, recognition accuracy decreased. The average F-value for the 5-movement pattern was 0.89. At the individual level, 58% of subjects (7 of 12) achieved an F-value of 0.9 or higher. These results suggest that the accuracy of our method decreases when the body vibrates. However, more than half of the subjects still achieved high recognition accuracy, which indicates individual differences in how body vibration affects accuracy.
The results for the facial expression change condition were as follows. Overall, recognition accuracy decreased slightly. The average F-value for the 5-movement pattern was 0.91. At the individual level, 77% of subjects (10 of 13) achieved an F-value of 0.9 or higher. These results suggest that the accuracy of our method decreases when the facial expression changes, but more than half of the subjects still achieved high recognition accuracy.
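The reported percentages follow directly from the subject counts, taking the number of subjects with an F-value of 0.9 or higher over all subjects in each condition:

```latex
\begin{aligned}
\text{Reattachment:} &\quad \tfrac{10}{13} \approx 0.769 \approx 77\% \\
\text{Body vibration:} &\quad \tfrac{7}{12} \approx 0.583 \approx 58\% \\
\text{Facial expression change:} &\quad \tfrac{10}{13} \approx 0.769 \approx 77\%
\end{aligned}
```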

Discussion
The results showed that the proposed method could recognize the gaze movement patterns, although recognition accuracy decreased under the disturbance in each condition. Accuracy did not decrease substantially after reattachment. The proposed method can therefore be expected to maintain high recognition accuracy as long as the eyewear is worn in nearly the same position. However, if the mounting position shifts significantly after remounting and recognition accuracy decreases, the mounting position should be corrected. If accuracy does not recover after this correction, the learning data must be acquired again. Because the learning data can be acquired quickly (e.g., about 1 min for five types of gestures with five trials each), the burden on the user is small. Under the body vibration and facial expression change conditions, the F-value decreased to about 0.9, which seems sufficient for the proposed method. However, users may need to adapt how they perform hands-free input to obtain high recognition accuracy under such conditions. For example, they can return to a neutral facial expression (i.e., the expression used when acquiring the learning data) and reduce body vibration (e.g., by walking more slowly).
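The remounting procedure described in the preceding paragraph can be summarized as a small decision rule. This hypothetical Python sketch assumes that the user's current F-value is estimated from a short check session of known gestures. The 0.9 threshold is borrowed from the cutoff used in the Results and is not specified by the method.

```python
def next_step(f_after_reattach, f_after_adjust=None, threshold=0.9):
    """Hypothetical decision rule for reusing learning data after remounting.

    f_after_reattach: F-value measured right after putting the eyewear back on.
    f_after_adjust: F-value after correcting the mounting position, if tried.
    threshold: assumed cutoff for "high" recognition accuracy.
    """
    if f_after_reattach >= threshold:
        return "keep_learning_data"
    if f_after_adjust is None:
        # Accuracy dropped: first try correcting the mounting position.
        return "correct_mounting_position"
    if f_after_adjust >= threshold:
        return "keep_learning_data"
    # Accuracy did not recover, so record new training trials (about 1 min).
    return "reacquire_learning_data"
```

Reacquisition is the last step because it is the only one that costs the user new training trials.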

Limitations and Future Work
This paper showed the following. Evaluation 1 showed that the proposed method could recognize gaze movement patterns and is feasible as a simple hands-free input method. Evaluation 2 provided an example in which the proposed method recognized gaze movement patterns with only two or three sensors, which can reduce the cost and weight of the entire system. Evaluation 3 showed that the proposed method could recognize gaze movement patterns under disturbances typical of wearable-device use, although recognition accuracy decreased. These results are helpful for designing gaze movement recognition methods that use eyewear with infrared distance sensors. The remainder of this section describes future work.
Individual differences that may affect recognition accuracy: We plan to investigate individual characteristics that reduce the recognition accuracy of the proposed method. For example, Sub. 14 in Evaluation 1 had lower recognition accuracy than the other subjects. Several factors may be involved. Recognition accuracy will decrease if the eyelid skin moves little when the gaze moves. In addition, recognition accuracy may decrease for people with very dark skin, because infrared distance sensors often respond poorly to dark surfaces. Because the subjects were mainly young Asians, we plan to evaluate subjects with a wider range of attributes.
Evaluation in the natural environment: We plan to evaluate the proposed method in natural environments. For example, ambient light intensity affects the recognition accuracy of the proposed method. Because ambient light intensity differs between outdoor and indoor environments, the infrared distance sensor outputs different values in each. The sensors return the same values under the same lighting intensity, but we should verify whether the learning data must be reacquired when the user enters an environment with a different lighting intensity.
Investigation of the limits of recognizable gaze movement patterns and expansion of the number of input commands while maintaining recognition accuracy: This paper examined up to 20 types of gaze movement patterns. These patterns are assumed to be sufficient for many hands-free input scenarios. Nevertheless, we plan to investigate how small a difference between movement patterns can still be recognized (e.g., differences in movement direction at 10°, 20°, and 30° intervals). In addition, each gesture was performed independently in this experiment, but combining gestures that are recognized with high accuracy can increase the number of input commands while maintaining high recognition accuracy. For example, if users want 10 input commands with higher recognition accuracy, combining multiple gaze movements at 90° intervals is expected to be more useful than using the gaze movements at 45° intervals from Evaluation 1. Examples of two-gesture commands include G1 followed by G3, or G1 repeated twice.
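The arithmetic behind combining gestures is simple. With four directions at 90° intervals, ordered two-gesture sequences (repeats allowed) give 4 × 4 = 16 commands, which exceeds the 10 in the example. The minimal Python sketch below enumerates these sequences. The direction names are hypothetical labels, because this section does not give the mapping of gesture labels such as G1 and G3 onto directions.

```python
from itertools import product

# Hypothetical labels for four gaze directions at 90-degree intervals.
DIRECTIONS_90 = ["up", "right", "down", "left"]


def two_gesture_commands(directions):
    """All ordered pairs of gestures, including repeats such as (up, up)."""
    return list(product(directions, repeat=2))


commands = two_gesture_commands(DIRECTIONS_90)
print(len(commands))  # 16
```

A designer could then choose 10 of these pairs, preferably those built from the most reliably recognized gestures.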
Application of the proposed method: We plan to verify whether the gaze movement recognition of the proposed method can be applied beyond hands-free input interfaces. One such application is recognizing and characterizing human traits. For example, eye activity has revealed medically important individual characteristics in conditions such as ADHD [51], autism [52], Williams syndrome [53], schizophrenia, and Parkinson's disease. In addition, the characteristic gaze movement patterns of highly skilled players could be investigated (e.g., highly skilled basketball players make small gaze movements before a shot [54]). Recognition accuracy is expected to be lower if gaze movements contain random disturbances. Even so, the proposed method can recognize such patterns if similar gaze movements occur consistently.

Conclusions
In this study, we proposed a method for recognizing gaze movements using eyewear equipped with multiple infrared distance sensors as a simple way to add gaze interaction functions to eyewear. We implemented a prototype system and evaluated gaze movements that included movement directions at 45° intervals and differences in movement distance in the same direction. The results showed that the proposed method could recognize gaze movement patterns and is feasible as a simple hands-free input method. The proposed method recognized five types of movement with an F-value of 1.0, 10 types with an F-value of 0.99, and 20 types with an F-value of 0.96. In addition, the proposed method recognized gaze movement patterns under reattachment, body vibration, and facial expression change, although recognition accuracy decreased. The results also showed that the proposed method recognized gaze movements with only two or three sensors. Previous studies on simple hands-free input methods reported that five to seven types of facial or gaze input gestures were recognized with an F-value of 0.85 to 0.9. The proposed method therefore appears to achieve recognition accuracy comparable to those studies. These results are helpful for designing gaze movement recognition methods for continuous daily use with eyewear equipped with infrared distance sensors.