Development and Application of a Human–Machine Interface Using Head Control and Flexible Numeric Tables for the Severely Disabled

Abstract: The human–machine interface with head control can be applied in many domains. This technology has valuable applications in helping people who cannot use their hands, enabling them to use a computer or to speak. This study combines several image processing and computer vision technologies, a digital camera, and software to develop the following system: image processing technologies are adopted to capture the features of head motion; the recognized head gestures include forward, upward, downward, leftward, rightward, right-upper, right-lower, left-upper, and left-lower; and corresponding sound modules are used so that patients can communicate with others through a phonetic system and numeric tables. Innovative skin color recognition technology obtains head features from images. The barycenter of the pixels in the feature area is then quickly calculated, and the offset of the barycenter is observed to judge the direction of head motion. This architecture can substantially reduce the distraction of non-targeted objects and enhance the accuracy of systematic judgment.


Introduction
For some people with limb impairments, accurate control of hand movement is impossible. Common assistive devices for them include head-controlled stick devices, mouth-held sticks, and mouth-controlled blow devices. However, these assistive tools are not sanitary, comfortable, or convenient, because users have to wear or touch mechanical sensing devices [1][2][3]. Given the inconvenience of the existing assistive systems for the disabled, a system combining computer-based vision technology and movement detection was developed [4][5][6]. Based on information about the head, it serves as a household control system for the disabled and includes color segmentation and head recognition [7].
When the head is held straight without any deflection, the eyes are almost on the same horizontal line; when the head tilts right or left, the angle between the line linking the canthi of the eyes and the horizontal line changes. Therefore, a tilt of the head can be judged according to the angle. There are many algorithms and configurations for head-control devices and sensing human activities [8]. In an

Conceptual Design
This research used image-sensing equipment to detect head rotation position information and compare changes in head posture. Because a telephoto lens was used, the user does not need to be very close to the camera to obtain a clear head image, which saves processing time for face or head detection. A personal computer and a C++ object-oriented development platform were used for program development. Under Microsoft Windows, simple operation methods improved the user's control speed. To search for the target object (the head), we first adjusted the skin color threshold value for the input image and, at the same time, calculated the coordinates of the center of the head (Figure 1).
The binarization rule can be written as

g(x, y) = 255, if Rmin ≤ R_xy ≤ Rmax, Gmin ≤ G_xy ≤ Gmax, and Bmin ≤ B_xy ≤ Bmax; otherwise g(x, y) = 0, (1)

where R_xy, G_xy, and B_xy are the values of the RGB components; Rmax, Rmin, Gmax, Gmin, Bmax, and Bmin are the upper and lower bounds of the RGB channels in the skin image; and g(x, y) is the new gray level. Equation (1) assumed that skin color occupies a cuboid in RGB space, which was a simplification. The user had to move along a specific path, according to the coordinates indicated by the system, to confirm that the system had retrieved the feature object. If not, the system had to return to the initial setting to retrieve the feature object again. According to the head direction judgment rule of this study, the skin color barycenter point of the front-side image was taken as the datum point [18].
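As an illustration, the per-pixel thresholding described by Equation (1) can be sketched as below. The bound values in the test are placeholders, not the calibrated values used in the study.

```cpp
#include <cstdint>

// Calibrated RGB bounds for skin color (placeholder values; the study
// adjusted these per user and lighting condition).
struct SkinBounds {
    uint8_t rMin, rMax, gMin, gMax, bMin, bMax;
};

// Equation (1): a pixel is marked as skin (gray level 255) when each RGB
// component lies inside the calibrated cuboid; otherwise background (0).
uint8_t classifyPixel(uint8_t r, uint8_t g, uint8_t b, const SkinBounds& s) {
    bool skin = r >= s.rMin && r <= s.rMax &&
                g >= s.gMin && g <= s.gMax &&
                b >= s.bMin && b <= s.bMax;
    return skin ? 255 : 0;
}
```

Applying this rule to every pixel yields the binary skin image from which the barycenter is computed in the next step.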
When the head moved, the system recorded the position of skin color points and then judged if the head moved upwards or downwards, according to the change in the position of the skin color points. If the number of skin color points after head movement was smaller than that of the front-side position, the result would be taken as an additional condition for subsequent direction judgment. The barycenter after object movement was then sought in the dynamic image search box. The relevance between the barycenter point after the movement and the new datum point (Ux2, Uy2) was obtained in the judgment equation, and its corresponding status was displayed (Figure 2) [19].
The total number of skin color highlights of the front side of the head was Ua, while Ua2 was the total number of skin color highlights after head movement. When Ua2 was smaller than Ua, the judgment parameter Uc was set as 0; otherwise it was set as 1. We then activated the dynamic image search method and set up a dynamic image search frame by taking the coordinates of the barycenter point of the skin color part in the image (Figure 1) as the initial value. After finding the barycenter point (Ux, Uy) in the image, the system proceeded to the validation process.
The barycenter was computed as

Ux = (1/a) Σ x_f, Uy = (1/a) Σ y_f, (2)

where x_f and y_f are the coordinates of the points of the skin color part in the image, and a is the number of pixels in the skin image.
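The barycenter computation above amounts to averaging the coordinates of the skin pixels, which can be sketched as:

```cpp
#include <vector>
#include <utility>

// Barycenter of the skin-colored pixels:
//   Ux = (1/a) * sum(x_f), Uy = (1/a) * sum(y_f),
// where a is the number of skin pixels found by the threshold stage.
std::pair<double, double>
barycenter(const std::vector<std::pair<int, int>>& skinPixels) {
    double sx = 0.0, sy = 0.0;
    for (const auto& p : skinPixels) {
        sx += p.first;   // accumulate x_f
        sy += p.second;  // accumulate y_f
    }
    double a = static_cast<double>(skinPixels.size());
    return { sx / a, sy / a };
}
```

Because the average is taken over all skin pixels, isolated noise pixels shift the barycenter only slightly, which is one reason the barycenter algorithm tested stably in the experiments.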
Here (Ux, Uy) are, respectively, the x and y components of the datum point, and (Ux2, Uy2) are the x and y components of the barycenter coordinates of the image after movement. When the head was raised or lowered, light reflection left many pixels belonging to the face unrecognized as skin, so the system needed a fault-tolerant design. After the above-mentioned judgment of head direction, the system was able to judge the upward, downward, leftward, and rightward movements of the head. In Figure 3, the blue parts show the facial positions traced by the system. The background was normally the wall of an office room. The judgment rule compares the barycenter offset against two adjustment parameters, w and v; for initialization, we usually set w = 20 and v = 10. The rule could be extended to judge the right-upward, right-downward, left-upward, and left-downward movements of the head (Figure 4). The skin color area would be much larger when the neck was visible; as the head moved downwards, the area would be smaller. The adjustment parameters could be applied to compensate for this. The situations in Figure 4 did not show significant differences in blue-area size, but through the innovative judgment method of this system, head movement in eight directions could still be correctly judged.
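The fault-tolerant count check described in the text is a simple comparison of skin-pixel counts before and after movement, which can be sketched as:

```cpp
// Fault-tolerance flag from the text: Ua is the skin-pixel count of the
// frontal head image, and Ua2 is the count after head movement. The
// judgment parameter Uc is set to 0 when pixels were lost (Ua2 < Ua),
// e.g. when raising or lowering the head changes the lighting; it is
// then used as an additional condition in the direction judgment.
int judgmentParameter(int ua, int ua2) {
    return (ua2 < ua) ? 0 : 1;
}
```

The flag lets the direction judgment distinguish a genuine vertical head movement from a mere loss of recognized skin pixels.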
Appl. Sci. 2020, 10, x FOR PEER REVIEW

Figure 5 shows an example of a Chinese sentence table [20] frequently used in numeric human–machine interfaces. This research established two different Chinese code tables. For the disabled, digitalized Chinese phonetic alphabet tables are far more complicated and difficult to use than the English ones. Selecting the eight-direction model or the four-direction model would lead users to the two-numeric-code procedure (Figure 6) or the three-numeric-code procedure (Figure 7), respectively. The rules in Chinese reduced the problems that users had with the flexible phonetic alphabet. The system could detect the eight postures of up, down, left, right, top right, bottom right, top left, and bottom left, and then convert the detected feature point position into a selection. For example, the English letter "E" has the two-numeric code 15: the user needed to complete a first head-action selection, "1", and then a second head-action selection, "5", to input the letter "E". The system did not need an extra activation command, so the user simply moved their head to select a sentence. When a command was made by mistake, the user needed only to shake their head and try again.
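The two-numeric-code selection can be sketched as a table lookup. Only the code "E" = 15 is taken from the text; the rest of the mapping here is a placeholder, not the study's actual table.

```cpp
#include <map>

// Two-digit numeric code table (illustrative). Each head gesture selects
// one digit; two successive gestures form the code that indexes a letter.
// Only {15, 'E'} comes from the example in the text; other entries would
// fill out the real table.
char lookupLetter(int firstGesture, int secondGesture) {
    static const std::map<int, char> table = {
        {15, 'E'},  // from the worked example in the text
    };
    int code = firstGesture * 10 + secondGesture;
    auto it = table.find(code);
    return it == table.end() ? '?' : it->second;  // '?' for an unknown code
}
```

A shake of the head would simply discard the pending first digit, matching the error-recovery behavior described above.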

User Testing
The test results of computing position stability in this study are shown in Figure 8. For the analysis of the barycenter algorithm, the experiment placed a black center point in the center of the image. The actual position coordinates in the image captured by the camera were compared with the barycenter position coordinates.
The experiment was repeated 50 times, and, according to Figure 8, the accuracy of the barycenter algorithm was high; hence, the human-machine interface of the system resulted in a low probability of misjudgment and was convenient to operate.

The scope of the direction judgment in the experiment was as follows: up, y < 220; down, y > 370; left, x > 375; right, x < 310; lower right, x < 310 and y > 370; upper right, x < 310 and y < 220; lower left, x > 375 and y > 370; upper left, x > 375 and y < 220. Here, x and y are the horizontal and vertical coordinates of the barycenter, respectively. According to the statistical results, the accuracy of one movement in the eight directions was 100% within 10 s, 97.5% within 5 s, and 92.5% within 2 s. The accuracy of one movement in the four directions was 100% within 10 s and 5 s, and 97.5% within 2 s. This demonstrated that if users were required to finish a head turn within 2 s with the system's help, there might be errors due to inadequate time for procedural judgment. Nonetheless, the accuracy was still over 90%. Therefore, the system was quite stable.
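The direction scope above can be sketched as a threshold classifier over the barycenter coordinates, assuming (consistently with the left boundary x > 375) that the diagonal regions are the pairwise combinations of the four axis tests:

```cpp
#include <string>

// Eight-direction judgment using the experiment's barycenter thresholds:
//   up: y < 220, down: y > 370, left: x > 375, right: x < 310.
// Diagonals are combinations of the above; anything else is "center".
std::string judgeDirection(int x, int y) {
    bool up = y < 220, down = y > 370;
    bool left = x > 375, right = x < 310;
    if (up && left)    return "upper-left";
    if (up && right)   return "upper-right";
    if (down && left)  return "lower-left";
    if (down && right) return "lower-right";
    if (up)    return "up";
    if (down)  return "down";
    if (left)  return "left";
    if (right) return "right";
    return "center";  // barycenter inside the neutral band
}
```

The neutral band between the thresholds (310 ≤ x ≤ 375, 220 ≤ y ≤ 370) keeps small involuntary movements from triggering a command.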
The experiment of coding time count consisted of two parts: 1. The average time for each word in the word-by-word sound-generating system; 2. The time for a sentence in the frequently used sentence system.
The two parts included the four-direction and eight-direction codes, respectively. Six users and their family members were invited to participate in the experiment. In the word-by-word sound-generating system, words with three different syllables were taken as the testing words, and "Please help me." was taken as the testing sentence for the pronunciation of frequently used sentences [20][21][22]. The experiment results showed the average time the system required to spell a word and to finish a frequently used sentence. Recording of the time started with the user's head turn and did not stop until the system finished the coding and generated sound.

Experimental Results and Discussion
According to the experimental results, the average time for the word-by-word sound-generating system to spell a word was as follows: (1) eight directions = 11.17 ± 0.85 s; (2) four directions = 16.27 ± 0.74 s. The average time for the frequently used sentence system to finish a sentence was as follows: (1) eight directions = 6.00 ± 0.63 s; (2) four directions = 8.86 ± 0.58 s.
With the users' experience and their suggestions, this study improved the head-controlled system. Aside from testing the system, the patients also participated in the experiment of coding time count. With head-turn images filmed by the CCD and image scanning, this study obtained the parameters of a head turn and facilitated position detection. The obtained parameters were used as coding elements in a phonetic system. As head feature colors could be captured, it was possible to locate the head without the parameters set in the procedure. This not only simplified parameter adjustment, but also enabled the camera to locate the head at any time.
In this study, tests were conducted in different environments and scenes, and the test results were used as adjustment coefficients in image processing. After the tests, the image processing and environment control methods most suitable for the system were found. These methods led to the correct judgment of head-turn directions. Repeated experiments with the system showed a misjudgment rate of less than 3%, which meant that the system was highly stable.
For disabled users who cannot accurately perform the two-stage coding of four or eight directions, there was a probability of misjudgment caused by their inability to control their bodies. In the future, it is suggested that image judgment be upgraded to make the program more complete. Additionally, the existing four-direction coding is too cumbersome for Chinese grammatical coding. If it could be extended to eight-direction judgment for coding, the combination of the recognized data and the phonetic system would allow users to operate the system more conveniently.
The detection and coding of the system could also be applied to a rehabilitation system for the disabled. Unlike the traditional method, where the architecture is specifically designed for different movement control training, the combination of computer vision technology and movement control training enables users to train themselves with head turn games, makes rehabilitation training more attractive, and equips rehabilitation with both training and entertainment.

Conclusions
This study developed a system in which machine vision technology was integrated with action control for display on a numeric human–machine interface and applied it to a phonetic system for patients with general paralysis. The system used machine vision technology to detect eight directions of head motion and displayed the obtained direction data on the human–machine interface. This was then combined with a phonetic module to develop software that generated sound according to head control. Unlike traditional supporting devices for those with physical impairments, the interactive head-control device, which combines machine vision technology and action recognition, was both practical and convenient. The device had many applications and required only economical equipment, as confirmed by its use in environmental control. We hope our head-control device can become more useful for human–machine interfaces in the future.