Article

Gaze Point Tracking Based on a Robotic Body–Head–Eye Coordination Method

1 Army Academy of Armored Forces, Beijing 100072, China
2 Research Center of Precision Sensing and Control, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Sensors 2023, 23(14), 6299; https://doi.org/10.3390/s23146299
Submission received: 1 June 2023 / Revised: 29 June 2023 / Accepted: 6 July 2023 / Published: 11 July 2023
(This article belongs to the Special Issue Mobile Robots: Navigation, Control and Sensing)

Abstract:
When the magnitude of a gaze is too large, human beings change the orientation of their head or body to assist their eyes in tracking targets because saccade alone is insufficient to keep a target at the center region of the retina. To make a robot gaze at targets rapidly and stably (as a human does), it is necessary to design a body–head–eye coordinated motion control strategy. A robot system equipped with eyes and a head is designed in this paper. Gaze point tracking problems are divided into two sub-problems: in situ gaze point tracking and approaching gaze point tracking. In the in situ gaze tracking state, the desired positions of the eye, head and body are calculated on the basis of minimizing resource consumption and maximizing stability. In the approaching gaze point tracking state, the robot is expected to approach the object at a zero angle. In the process of tracking, the three-dimensional (3D) coordinates of the object are obtained by the bionic eye and then converted to the head coordinate system and the mobile robot coordinate system. The desired positions of the head, eyes and body are obtained according to the object’s 3D coordinates. Then, using sophisticated motor control methods, the head, eyes and body are controlled to the desired position. This method avoids the complex process of adjusting control parameters and does not require the design of complex control algorithms. Based on this strategy, in situ gaze point tracking and approaching gaze point tracking experiments are performed by the robot. The experimental results show that body–head–eye coordination gaze point tracking based on the 3D coordinates of an object is feasible. This paper provides a new method that differs from the traditional two-dimensional image-based method for robotic body–head–eye gaze point tracking.

1. Introduction

When the magnitude of a gaze is too large, human beings change the orientation of their head or body to assist their eyes in tracking targets because saccade alone is insufficient to keep a target at the center region of the retina. Studies on body–head–eye coordination gaze point tracking are still rare because the body–head–eye coordination mechanism of humans is prohibitively complex. Multiple researchers have investigated the eye–head coordination mechanism, binocular coordination mechanism and bionic eye movement control. In addition, researchers have validated eye–head coordination models on eye–head systems. This work is significant for the development of intelligent robots for human–robot interaction. However, most of these methods are based on the principles of neurology, and their further development and application may be limited by our incomplete understanding of human physiology. In contrast, binocular coordination based on the 3D coordinates of an object is simple and practical, as verified by our previous paper [1].
When the fixation point transfers greatly, the head and eyes should move in coordination to accurately shift the gaze to the target. Multiple studies have built models of eye–head coordination based on the physiological characteristics of humans. For example, Kardamakis A A et al. [2] researched eye–head movement and gaze shifting. The best balance between eye movement speed and the duration time was sought, and the optimal control method was used to minimize the loss of motion. Freedman E G et al. [3] studied the physiological mechanism of coordinated eye–head movement. However, they did not establish an engineering model. Nakashima et al. [4] proposed a method for gaze prediction that combines information on the head direction with a saliency map. In another study [5], the authors presented a robotic head for social robots to attend to scene saliency with bio-inspired saccadic behaviors. The scene saliency was determined by measuring low-level static scene information, motion, and prior object knowledge. Law et al. [6] described a biologically constrained architecture for developmental learning of eye–head gaze control on an iCub robot. They also identified stages in the development of infant gaze control and proposed a framework of artificial constraints to shape the learning of the robot in a similar manner. Other studies have investigated the mechanisms of eye–head movement for robots and achieved satisfactory performance [7,8].
Some application studies based on coordinated eye–head movement have been carried out in addition to the mechanism research. For example, Kuang et al. [9] developed a method for egocentric distance estimation based on the parallax that emerges during compensatory head–eye movements. This method was tested in a robotic platform equipped with an anthropomorphic neck and two binocular pan–tilt units. The model of Reference [10] is capable of reaching static targets placed at a starting distance of 1.2 m in approximately 250 control steps. Hülse et al. [11] introduced a computational framework that integrates robotic active vision and reaching. Essential elements of this framework are sensorimotor mappings that link three different computational domains relating to visual data, gaze control and reaching.
Some researchers have applied the combined movement of the eyes, head and body in mobile robots. In one study [12], large reorientations of the line of sight, involving combined rotations of the eyes, head, trunk and lower extremities, were executed either as fast single-step or as slow multiple-step gaze transfers. Daye et al. [13] proposed a novel approach for the control of linked systems with feedback loops for each part. The proximal parts had separate goals. In addition, an efficient and robust human tracker for a humanoid robot was implemented and experimentally evaluated in another study [14].
On the one hand, human eyes can obtain three-dimensional (3D) information from objects. This 3D information is useful for humans to make decisions. Humans can shift their gaze stably and approach a target using the 3D information of the object. When the human gaze shifts to a moving target, the eyes first rotate to the target, and then the head and even the body rotate if the target leaves the sight of the eyes [15]. Therefore, the eyes, head and body move in coordination to shift the gaze to the target with minimal energy expenditure. On the other hand, when a human approaches a target, the eyes, head and body rotate to face the target and the body moves toward the target. The two movements are typically executed with the eyes, head and body acting in conjunction. A robot that can execute these two functions will be more intelligent. Such a robot would need to exploit the smooth pursuit of the eyes [16], coordinated eye–head movement [17], target detection and the combined movement of the eyes, head and robot body to carry out these two functions. Studies have achieved many positive results in these aspects.
Mobile robots can track and locate objects according to 3D information. Some special cameras such as deep cameras and 3D lasers have been applied to obtain the 3D information of the environment and target. In one study [18], a nonholonomic under-actuated robot with bounded control was described that travels within a 3D region. A single sensor provided the value of an unknown scalar field at the current location of the robot. Nefti-Meziani S et al. [19] presented the implementation of a stereo-vision system integrated in a humanoid robot. The low cost of the vision system is one of the main aims, avoiding expensive investment in hardware when used in robotics for 3D perception. Namavari A et al. [20] presented an automatic system for the gauging and digitalization of 3D indoor environments. The configuration consisted of an autonomous mobile robot, a reliable 3D laser rangefinder and three elaborated software modules.
The main forms of motion of bionic eyes include saccade [1], smooth pursuit, vergence [21], vestibule–ocular reflex (VOR) [22] and optokinetic reflex (OKR) [23]. Saccade and smooth pursuit are the two most important functions of the human eye. Saccade is used to move eyes voluntarily from one point to another by rapid jumping, while smooth pursuit can be applied to track moving targets. In addition, binocular coordination and eye–head coordination are of high importance to realize object tracking and gaze control.
It is of great significance for robots to be able to change their fixation point quickly. In control models, the saccade control system should be implemented using a position servo controller to change and keep the target at the center region of the retina with minimum time consumption. Researchers have been studying the implementation of saccade on robots over the last twenty years. For example, in 1997, Bruske et al. [24] incorporated saccadic control into a binocular vision system by using the feedback error learning (FEL) strategy. In 2013, Wang et al. [25] designed an active vision system that can imitate saccade and other eye movements. The saccadic movements were implemented with an open-loop controller, which ensures faster saccadic eye movements than a closed-loop controller can accommodate. In 2015, Antonelli et al. [26] achieved saccadic movements on a robot head by using a model called recurrent architecture (RA). In this model, the cerebellum is regarded as an adaptive element used to learn an internal model, while the brainstem is regarded as a fixed-inverse model. The experimental results on the robot showed that this model is more accurate and less sensitive to the choice of the inverse model relative to the FEL model.
The smooth pursuit system acts as a velocity servo controller to rotate eyes at the same angular rate as the target while keeping them oriented toward the desired position or in the desired region. In Robinson’s model of smooth pursuit [27], the input is the velocity of the target’s image across the retina. The velocity deviation is taken as the major stimulus to pursue and is transformed into an eye velocity command. Based on Robinson’s model, Brown [28] added a smooth predictor to accommodate time delays. Deno et al. [29] applied a dynamic neural network, which unified two apparently disparate models of smooth pursuit and dynamic element organization to the smooth pursuit system. The dynamic neural network can compensate for delays from the sensory input to the motor response. Lunghi et al. [30] introduced a neural adaptive predictor that was previously trained to accomplish smooth pursuit. This model can explain a human’s ability to compensate for the 130 ms physiological delay when they follow external targets with their eyes. Lee et al. [31] applied a bilateral OCS model on a robot head and established rudimentary prediction mechanisms for both slow and fast phases. Avni et al. [32] presented a framework for visual scanning and target tracking with a set of independent pan–tilt cameras based on model predictive control (MPC). In another study [33], the authors implemented smooth pursuit eye movement with prediction and learning in addition to solving the problem of time delays in the visual pathways. In addition, some saccade and smooth pursuit models have been validated on bionic eye systems [34,35,36,37]. Santini F et al. [34] showed that the oculomotor strategies by which humans scan visual scenes produce parallaxes that provide an accurate estimation of distance. Other studies have realized the coordinated control of eye and arm movements through configuration and training [35]. Song Y et al. [36] proposed a binocular control model, which was derived from a neural pathway, for smooth pursuit. In their smooth pursuit experiments, the maximum retinal error was less than 2.2°, which is sufficient to keep a target in the field of view accurately. An autonomous mobile manipulation system was developed in the form of a modified image-based visual servo (IBVS) controller in a study [37].
The above-mentioned work is significant for the development of intelligent robots. However, there are some shortcomings. First, most of the existing methods are based on the principles of neurology, and further developments and applications may be limited by our incomplete understanding of human physiology. Second, only two-dimensional (2D) image information is applied when gaze shifts to targets are implemented, while 3D information is ignored. Third, the studies of smooth pursuit [16], eye–head coordination [17], gaze shift and approach are independent and have not been integrated. Fourth, bionic eyes are different from human eyes; for example, some have two eyes that are fixed or move with only 1 DOF, whereas others use special cameras or a single camera. Fifth, the movements of bionic eyes and heads are performed separately, without coordination.
To overcome the shortcomings mentioned above to a certain extent, a novel control method that implements the gaze shift and approach of a robot according to 3D coordinates is proposed in this paper. A robot system equipped with bionic eyes, a head and a mobile robot is designed to help nurses deliver medicine in hospitals. In this system, both the head and each eye have 2 DOF (namely, tilt and pan [38]), and the mobile robot can rotate and move forward over the ground. When the robot gaze shifts to the target, the 3D coordinates of the target are acquired by the bionic eyes and transferred to the eye coordinate system, head coordinate system and robot coordinate system. The desired positions of the eyes, head and robot are calculated based on the 3D information of the target. Then, the eyes, head and mobile robot are driven to the desired positions. When the robot approaches the target, the eyes, head and mobile robot first rotate to the target and then move to the target. This method allows the robot to achieve the above-mentioned functions with minimal resource consumption and can separate the control of the eyes, head and mobile robot, which can improve the interactions between robots, human beings and the environment.
The rest of the paper is organized as follows. In Section 2, the robot system platform is introduced, and the control system is presented. In Section 3, the desired position is discussed and calculated. Robot pose control is described in Section 4. The experimental results are given and discussed in Section 5; finally, conclusions are drawn in Section 6.

2. Platform and Control System

To study the gaze point tracking of the robot, this paper designs a robot experiment platform including the eye–head subsystem and the mobile robot subsystem.

2.1. Robot Platform

The physical robot platform is shown in Figure 1. With the mobile robot as a carrier, a head with two degrees of freedom is fixed on the mobile robot, and the horizontal and vertical rotations of the head are controlled by Mhu and Mhd, respectively. The bionic eye system is fixed to the head. The mobile robot is driven by two wheels, each of which is individually controlled by a servo motor. The angle and displacement of the robot platform can be determined by controlling the distance and speed of each wheel's movement. The output shaft of each stepper motor of the head and eye is equipped with a rotary encoder to detect the position of the motor. Using the frequency multiplication technique, the resolution of the rotary encoder is 0.036°. The purpose of using rotary encoders is to prevent lost motor steps from affecting the 3D coordinate calculations. The movement of each motor is limited by a limit switch. The initial positioning of the eye system is based on the visual positioning plate [39].
The robot system includes two eyes and one mobile robot. To simulate the eyes and the head, six DOFs are designed in this system. The left eye’s pan and tilt are controlled by motors Mlu and Mld, respectively. The right eye’s pan and tilt are controlled by motors Mru and Mrd, respectively. The head’s pan and tilt are controlled by motors Mhu and Mhd, respectively. The mobile robot has two driving wheels and can perform rotation and forward movement. When the mobile robot needs to rotate, two wheels are set to turn the same amount in different directions. When the mobile robot needs to go forward, two wheels are set to turn the same amount in the same direction.
A diagram of the robot system’s organization is shown in Figure 2. The host computer and the mobile robot motion controller, the head motion controller and the eye motion controller all communicate through the serial ports. For satisfactory communication quality and stability, the baud rate of serial communication is 9600 bps. The camera communicates with the host computer via a GigE Gigabit Network. The camera’s native resolution is 1600 × 1200 pixels. To increase the calculation speed, the system uses an image downsampled to 400 × 300 pixels.
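As a minimal illustration of this preprocessing step (a sketch only, assuming OpenCV is used for image handling; the frame below is a synthetic placeholder rather than a real GigE capture), the downsampling can be written as:

```python
import cv2
import numpy as np

# Placeholder for a 1600 x 1200 frame; the real system grabs it from the GigE camera.
frame = np.zeros((1200, 1600, 3), dtype=np.uint8)

# Downsample to 400 x 300 before further processing to speed up the 3D calculation.
small = cv2.resize(frame, (400, 300), interpolation=cv2.INTER_AREA)
```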

2.2. Control System

Figure 3 shows the control block diagram of the gaze point tracking of the mobile robot. First, based on binocular stereo-vision perception, the binocular pose and the left and right images are used to calculate the 3D coordinates of the target [40], and the coordinates of the target in the eye coordinate system are converted to the head and mobile robot coordinate system. Then, the desired poses of the eyes, head and mobile robot are calculated according to the 3D coordinates of the target. Finally, according to the desired pose, the motor is controlled to move to the desired position, and the change in the position of the motor is converted into changes in the eyes, head and mobile robot.
The tracking and approaching motion control problem based on the target 3D coordinates [1] is equivalent to solving the index J minimization problem of Equation (1), where fi is the current state vector of the joint pose of the eye, head and mobile robot and fq is the desired state vector:
$$
J = \left\| f_i - f_q \right\| \tag{1}
$$
where J is the index to be minimized.
Figure 4a shows the definition of each coordinate system of the robot. The coordinate system of the eye is OeXeYeZe, which coincides with the left motion module’s base coordinate system at the initial position. The head coordinate system is OhXhYhZh, and the coordinates Ph (xh, yh, zh) of the point P in the head coordinate system can be calculated using the coordinates Pe (xe, ye, ze) in the eye coordinate system. The definitions of dx and dy are shown in Figure 4b. The robot coordinate system OwXwYwZw coincides with the head coordinate system of the initial position. In the bionic eye system, the axis of rotation of the robot approximately coincides with Yw.
Figure 4b,c show the definition of each system parameter. lθp and lθt are the pan and tilt of the left eye, respectively. rθp and rθt are the pan and tilt of the right eye, respectively. hθp and hθt are the pan and tilt of the head, respectively. The angle of the robot that rotates around the Yw axis is wθp. The robot can not only rotate around Yw but can also shift in the XwOwZw plane. When the robot moves, the robot coordinate system at time i is the base coordinate system, and the position of the robot at time i + 1 relative to the base coordinate system is wPm (wxm, wzm). When the robot performs gaze point tracking or approaches the target, the 3D coordinates of the target are first calculated at time i, and then the desired posture fq of each part of the robot at time i + 1 is calculated according to the 3D coordinates of the target. When the current pose fi of the robot system is equal to the desired pose, the robot maintains the current pose; when not equal, the system controls the various parts of the robot to move to the desired pose. The current pose vector of the robot system is fi = (wxmi, wzmi, wθpi, hθpi, hθti, lθpi, lθti, rθpi, rθti), and the desired pose is fq = (wxmq, wzmq, wθpq, hθpq, hθtq, lθpq, lθtq, rθpq, rθtq). When performing in situ gaze point tracking, the robot performs only pure rotation and does not move forward. When the robot approaches the target, it first turns to the target and then moves straight toward the target. Therefore, the definition of fq in the two tasks is different. Let gfq be the desired pose when the gaze point is tracked and afq be the desired pose of the robot when approaching the target.
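As a minimal sketch of this pose-error formulation (illustrative values only; in the real system the vectors come from the encoders and wheel odometry), the index J of Equation (1) can be evaluated as follows:

```python
import numpy as np

# Pose vector layout: (wxm, wzm, wθp, hθp, hθt, lθp, lθt, rθp, rθt)
f_i = np.zeros(9)                                                   # current pose (example)
f_q = np.array([0.0, 0.0, 0.1, 0.2, 0.0, 0.05, 0.02, 0.05, 0.02])  # desired pose (example)

J = np.linalg.norm(f_i - f_q)   # index of Equation (1)
if J > 1e-3:
    # Drive each joint/wheel toward its entry in f_q (Section 4);
    # otherwise the robot simply holds its current pose.
    pass
```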
After analyzing the control system, we found that the most important step in solving this control problem is to determine the desired pose.

3. Desired Pose Calculation

When performing in situ gaze point tracking, the robot performs only pure rotation and does not move forward. When the robot approaches the target, it first turns to the target and then moves straight toward the target. Therefore, the calculation of the desired pose can be divided into two sub-problems: (1) desired pose calculation for in situ gaze point tracking and (2) desired pose calculation for approaching gaze point tracking.
The optimal observation position is used for the accurate acquisition of 3D coordinates. The 3D coordinate accuracy is related to the baseline, the time difference and image distortion. In the bionic eye platform, the baseline changes with the cameras' positions because the optical center is not coincident with the center of rotation. The 3D coordinate error of the target is smaller when the baseline of the two cameras is longer. Therefore, it is necessary to keep the baseline unchanged. On the other hand, there is a time difference caused by imperfect synchronization between image acquisition and camera position acquisition. In addition, it is necessary to keep the target in the center areas of the two camera images to obtain accurate 3D coordinates of the target.

3.1. Optimal Observation Position of Eyes

In the desired pose of the robot, the most important component is the expected pose of the bionic eye [40]. Once this pose is defined, the calculation of the desired pose of the robot system is greatly simplified; thus, we present an engineering definition of the desired pose of the bionic eye here.
As shown in Figure 5, lmi (lui, lvi) and rmi (rui, rvi) are the image coordinates of point eP in the left and right cameras at time i. lmo and rmo are the image centers of the left and right cameras, respectively. lP is the vertical point of eP along the line lOclZc, and rP is the vertical point of eP along the line rOcrZc. lΔm is the offset of lmi from lmo, and rΔm is the offset of rmi from rmo. Db is the baseline length. The pan angles of the left and right cameras in the optimal observation position are lθp and rθp, respectively. The tilt angles of the left and right cameras in the optimal observation position are lθt and rθt, respectively. Pob (lθp, lθt, rθp, rθt) is the optimal observation position.
When the two eyeballs of the bionic eye move relative to each other, the 3D coordinates of the target obtained by the bionic eye contain a large error. To characterize this error, we give a detailed analysis of its origins in Appendix A. Through this analysis, we obtain the following conclusions for reducing the measurement error of the bionic eye:
(1) Make the length of Db long enough, and maintain as much length as possible during the movement;
(2) Try to observe the target closer to the target so that the depth error is as small as possible;
(3) During the movement of the bionic eye, control the two cameras so that they move at the same angular velocity;
(4) Try to keep the target imaged symmetrically, making lΔm and rΔm as equal as possible in the left and right camera images.
Based on these four methods, the motion strategy of the motor is designed, and the measurement accuracy of the target’s 3D information can be effectively improved.
According to these conclusions, we can give a definition of the optimal observation pose of the bionic eye that reduces the measurement error.
The optimal observation position needs to meet the conditions listed in Equation (2). When the target is very close to the eyes, the optimal observation position cannot be obtained because the image of the target cannot be kept in the image center region. It is challenging to obtain the optimal solution of the observation position directly from Equation (2). However, a suboptimal solution can be obtained by using a simplified calculation method. First, lθt and rθt are calculated in the case that lθp and rθp are equal to zero; then, lθp and rθp are calculated while lθt and rθt are kept equal to the calculated values. Trial-and-error methods can then be used to obtain the optimal solution starting from the suboptimal one.
$$
{}^{l}\theta_{pq} = {}^{r}\theta_{pq} = \theta_p, \quad {}^{l}\theta_{tq} = {}^{r}\theta_{tq} = \theta_t, \quad {}^{l}\Delta m = -\,{}^{r}\Delta m \tag{2}
$$
where
$$
{}^{l}\Delta m = \begin{bmatrix} {}^{l}\Delta u \\ {}^{l}\Delta v \end{bmatrix} = \begin{bmatrix} {}^{l}u_i - {}^{l}u_0 \\ {}^{l}v_i - {}^{l}v_0 \end{bmatrix} \tag{3}
$$
$$
{}^{r}\Delta m = \begin{bmatrix} {}^{r}\Delta u \\ {}^{r}\Delta v \end{bmatrix} = \begin{bmatrix} {}^{r}u_i - {}^{r}u_0 \\ {}^{r}v_i - {}^{r}v_0 \end{bmatrix} \tag{4}
$$

3.2. Desired Pose Calculation for In Situ Gaze Point Tracking

When the range of target motion is large and the desired posture of the eyeball exceeds its reachable posture, the head and mobile robot move to keep the target in the center region of the image. In robotic systems, eye movements tend to consume the least amount of resources and do not have much impact on the stability of the head and mobile robot during motion. Head rotation consumes more resources than the eyeball but fewer resources than trunk rotation. At the same time, the rotation of the head affects the stability of the eyeball but does not have much impact on the stability of the trunk. Mobile robot rotation consumes the most resources and has a large impact on the stability of the head and eyeball. When tracking the target, one needs only to keep the target in the center region of the binocular image. Therefore, when performing gaze point tracking, the movement mechanism of the head, eyes and mobile robot is designed with the principle of minimal resource consumption and maximum system stability. When the eyeball can perceive the 3D coordinates of the target in a reachable and optimal viewing posture, only the eyes are rotated; otherwise, the head is rotated. The head also has an attainable range of poses. When the desired pose exceeds this range, the mobile robot needs to be turned so that the bionic eye always perceives the 3D coordinates of the target in the optimal viewing position. Let hγp and hγt be the angles between the head and the gaze point in the XhOhZh and YhOhZh planes, respectively. The range of binocular rotation in the horizontal direction is [−eθpmax, eθpmax], and the range of binocular rotation in the vertical direction is [−eθtmax, eθtmax]. The range of head rotation in the horizontal direction is [−hθpmax, hθpmax], and the range of head rotation in the vertical direction is [−hθtmax, hθtmax]. For the convenience of calculation, the allowed ranges of the angles between the head and the fixation point in the horizontal and vertical directions are designated as [−hγpmax, hγpmax] and [−hγtmax, hγtmax], respectively. When the angle between the head and the target exceeds the set threshold, the head needs to be rotated to the positions hθp and hθt in the horizontal and vertical directions, respectively. When hθp exceeds the angle that the head can attain, the angle by which the mobile robot needs to compensate is wθp. In the in situ gaze point tracking task, the mobile robot does not need to translate in the XwOwZw plane, so wxmq = 0 and wzmq = 0. Furthermore, according to the definition of the optimal observation pose of the bionic eye, the conditions that gfq should satisfy are
$$
{}^{g}f_q = \begin{cases}
{}^{w}x_{mq} = 0 \\
{}^{w}z_{mq} = 0 \\
{}^{w}\theta_{pq} = \{\theta \mid |\theta| \le 2\pi,\ {}^{h}\theta_{pq} + \theta = {}^{h}\theta_p\} \\
{}^{h}\theta_{pq} = \{\theta \mid |\theta| \le {}^{h}\theta_{p\max},\ |{}^{h}\gamma_p| \le {}^{h}\gamma_{p\max}\} \\
{}^{h}\theta_{tq} = \{\theta \mid |\theta| \le {}^{h}\theta_{t\max},\ |{}^{h}\gamma_t| \le {}^{h}\gamma_{t\max}\} \\
{}^{l}\theta_{pq} = {}^{r}\theta_{pq} = \{\theta \mid |\theta| \le {}^{e}\theta_{p\max},\ {}^{l}\Delta m = -{}^{r}\Delta m\} \\
{}^{l}\theta_{tq} = {}^{r}\theta_{tq} = \{\theta \mid |\theta| \le {}^{e}\theta_{t\max},\ {}^{l}\Delta m = -{}^{r}\Delta m\}
\end{cases} \tag{5}
$$
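A minimal sketch of this allocation principle is given below (hypothetical helper, not the authors' implementation): the eyes track alone while the head-to-target angles stay within their thresholds, the head rotates when they are exceeded, and the mobile robot absorbs whatever exceeds the head's own range, as formalized later in Equations (37) and (38).

```python
import numpy as np

def allocate_in_situ(h_gamma_p, h_gamma_t, h_gamma_pmax, h_gamma_tmax,
                     h_theta_p_des, h_theta_pmax):
    """Decide which parts move for in situ gaze point tracking (logic of Eq. (5))."""
    head_moves = abs(h_gamma_p) > h_gamma_pmax or abs(h_gamma_t) > h_gamma_tmax
    if not head_moves:
        return {"eyes": True, "head": False, "robot": 0.0}
    # Clamp the required head pan and hand the excess to the mobile robot.
    h_theta_pq = float(np.clip(h_theta_p_des, -h_theta_pmax, h_theta_pmax))
    w_theta_pq = h_theta_p_des - h_theta_pq
    return {"eyes": True, "head": True, "robot": w_theta_pq}
```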
The desired pose needs to be calculated based on the 3D coordinates of the target. Therefore, to obtain the desired pose, it is necessary to acquire the 3D coordinates of the target according to the current pose of the robot.

3.2.1. Three-Dimensional Coordinate Calculation

The mechanical structure and coordinate settings of the system are shown in Figure 6a. The principle of binocular stereoscopic 3D perception is shown in Figure 6b. E is the eye coordinate system, El is the left motion module's end coordinate system, Er is the right motion module's end coordinate system, Bl is the left motion module's base coordinate system, Br is the right motion module's base coordinate system, Cl is the left camera coordinate system and Cr is the right camera coordinate system. In the initial position, El coincides with Bl, and Er overlaps with Br. When the binocular system moves, the base coordinate systems do not change. lT represents the transformation matrix of the eye coordinate system E to the left motion module's base coordinate system Bl, rT represents the transformation matrix of E to Br, lTe represents the transformation matrix of Bl to El, rTe represents the transformation matrix of Br to Er, lTm represents the transformation matrix of the left motion module's end coordinate system to the left camera coordinate system and rTm represents the transformation matrix of the right motion module's end coordinate system to the right camera coordinate system. lTr represents the transformation matrix of the right camera coordinate system to the left camera coordinate system at the initial position.
The origin lOc of Cl lies at the optical center of the left camera, the lZc axis points in the direction of the object parallel to the optical axis of the camera, the lXc axis points horizontally to the right along the image plane and the lYc axis points vertically downward along the image plane. The origin rOc of Cr lies at the optical center of the right camera, rZc is aligned with the direction of the object parallel to the optical axis of the camera, rXc points horizontally to the right along the image plane and rYc points vertically downward along the image plane. El’s origin lOe is set at the intersection of the two rotation axes of the left motion module, lZe is perpendicular to the two rotation axes and points to the front of the platform, lXe coincides with the vertical rotation axis and lYe coincides with the horizontal rotation axis. Similarly, the origin rOe of the coordinate system Er is set at the intersection of the two rotation axes of the right motion module, rZe is perpendicular to the two rotation axes and points toward the front of the platform, rXe coincides with the vertical rotation axis and rYe coincides with the horizontal rotation axis.
The left motion module's base coordinate system Bl coincides with the eye coordinate system E; thus, lT is an identity matrix. To calculate the 3D coordinates of the feature points in real time from the camera pose, it is necessary to calculate rT. At the initial position of the system, the external parameters lTr of the left and right cameras are calibrated offline, as are the hand–eye parameters lTm and rTm relating each motion module's end coordinate system to its camera coordinate system.
When the system is in its initial configuration, the coordinates of point P in the eye coordinate system are Pe (xe, ye, ze). Its coordinates in Bl are lPe (lxe, lye, lze), and its coordinates lPc (lxc, lyc, lzc) in Cl are
$$
{}^{l}P_c = {}^{l}T_m^{-1}\,P_e \tag{6}
$$
The coordinates rPe (rxe, rye, rze) of point P in Br are
$$
{}^{r}P_e = {}^{r}T\,P_e \tag{7}
$$
The coordinates rPc (rxc, ryc, rzc) of point P in Cr are
$$
{}^{r}P_c = {}^{r}T_m^{-1}\,{}^{r}T\,P_e \tag{8}
$$
The point in Cr is transformed into Cl:
$$
{}^{l}P_c = {}^{l}T_r\,{}^{r}T_m^{-1}\,{}^{r}T\,P_e \tag{9}
$$
Based on the Equations (6) and (9), rT is available:
$$
{}^{r}T = {}^{r}T_m\,{}^{l}T_r^{-1}\,{}^{l}T_m^{-1} \tag{10}
$$
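For example (a sketch with identity placeholders; the calibrated values used on the platform are those listed in Section 5), Equation (10) can be evaluated directly with homogeneous matrices:

```python
import numpy as np

# Calibrated 4 x 4 homogeneous transforms (placeholders here):
# l_Tm, r_Tm are the left/right hand-eye matrices and l_Tr is the initial
# right-camera-to-left-camera transform.
l_Tm = np.eye(4)
r_Tm = np.eye(4)
l_Tr = np.eye(4)

# Equation (10): transform from the eye coordinate system E to the right base Br.
r_T = r_Tm @ np.linalg.inv(l_Tr) @ np.linalg.inv(l_Tm)
```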
During the movement of the system, when the left motion module rotates by lθp and lθt in the horizontal and vertical directions, respectively, the transformation relationship between Bl and El is
$$
{}^{l}T_e = \begin{bmatrix} \mathrm{Rot}(Y,\,{}^{l}\theta_p)\,\mathrm{Rot}(X,\,{}^{l}\theta_t) & \mathbf{0} \\ \mathbf{0}^{T} & 1 \end{bmatrix} \tag{11}
$$
The coordinates of point P in Cl are
$$
{}^{l}P_c = {}^{l}T_m^{-1}\,{}^{l}T_e\,P_e = {}^{l}T_d\,P_e \tag{12}
$$
Assume that
$$
{}^{l}T_d = \begin{bmatrix} {}^{l}n_x & {}^{l}o_x & {}^{l}a_x & {}^{l}p_x \\ {}^{l}n_y & {}^{l}o_y & {}^{l}a_y & {}^{l}p_y \\ {}^{l}n_z & {}^{l}o_z & {}^{l}a_z & {}^{l}p_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{13}
$$
The point lP1c (lx1c, ly1c) at which the line P–lOc intersects the plane lZc = 1 is
$$
\begin{bmatrix} {}^{l}x_{1c} \\ {}^{l}y_{1c} \end{bmatrix} = \begin{bmatrix} \dfrac{{}^{l}n_x x_e + {}^{l}o_x y_e + {}^{l}a_x z_e + {}^{l}p_x}{{}^{l}n_z x_e + {}^{l}o_z y_e + {}^{l}a_z z_e + {}^{l}p_z} \\[2ex] \dfrac{{}^{l}n_y x_e + {}^{l}o_y y_e + {}^{l}a_y z_e + {}^{l}p_y}{{}^{l}n_z x_e + {}^{l}o_z y_e + {}^{l}a_z z_e + {}^{l}p_z} \end{bmatrix} \tag{14}
$$
The image coordinates of lP1c in the left camera are ml (ul, vl); (lx1c, ly1c) and (ul, vl) can be converted into each other using the camera parameters. According to the camera's internal parameter model, the following can be obtained:
$$
\begin{bmatrix} {}^{l}x_{1c} \\ {}^{l}y_{1c} \\ 1 \end{bmatrix} = {}^{l}M_{in}^{-1} \begin{bmatrix} u_l \\ v_l \\ 1 \end{bmatrix} \tag{15}
$$
where lMin is the internal parameter matrix of the left camera. The value of (lx1c, ly1c) can be obtained from the image coordinates of lP1c and the parameters of the left camera, and the following can be obtained by substituting (15) into (14):
$$
\begin{cases} ({}^{l}n_x - {}^{l}x_{1c}\,{}^{l}n_z)x_e + ({}^{l}o_x - {}^{l}x_{1c}\,{}^{l}o_z)y_e + ({}^{l}a_x - {}^{l}x_{1c}\,{}^{l}a_z)z_e + {}^{l}p_x - {}^{l}x_{1c}\,{}^{l}p_z = 0 \\ ({}^{l}n_y - {}^{l}y_{1c}\,{}^{l}n_z)x_e + ({}^{l}o_y - {}^{l}y_{1c}\,{}^{l}o_z)y_e + ({}^{l}a_y - {}^{l}y_{1c}\,{}^{l}a_z)z_e + {}^{l}p_y - {}^{l}y_{1c}\,{}^{l}p_z = 0 \end{cases} \tag{16}
$$
During the motion of the system, when the right motion module rotates through rθp and rθt in the horizontal and vertical directions, respectively, the transformation relationship between Br and Er is
$$
{}^{r}T_e = \begin{bmatrix} \mathrm{Rot}(Y,\,{}^{r}\theta_p)\,\mathrm{Rot}(X,\,{}^{r}\theta_t) & \mathbf{0} \\ \mathbf{0}^{T} & 1 \end{bmatrix} \tag{17}
$$
The coordinates of point P in Cr are
$$
{}^{r}P_c = {}^{r}T_m^{-1}\,{}^{r}T_e\,{}^{r}T\,P_e = {}^{r}T_d\,P_e \tag{18}
$$
Assume that
$$
{}^{r}T_d = \begin{bmatrix} {}^{r}n_x & {}^{r}o_x & {}^{r}a_x & {}^{r}p_x \\ {}^{r}n_y & {}^{r}o_y & {}^{r}a_y & {}^{r}p_y \\ {}^{r}n_z & {}^{r}o_z & {}^{r}a_z & {}^{r}p_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{19}
$$
The point rP1c (rx1c, ry1c) at which the line P–rOc intersects the plane rZc = 1 is
$$
\begin{bmatrix} {}^{r}x_{1c} \\ {}^{r}y_{1c} \end{bmatrix} = \begin{bmatrix} \dfrac{{}^{r}n_x x_e + {}^{r}o_x y_e + {}^{r}a_x z_e + {}^{r}p_x}{{}^{r}n_z x_e + {}^{r}o_z y_e + {}^{r}a_z z_e + {}^{r}p_z} \\[2ex] \dfrac{{}^{r}n_y x_e + {}^{r}o_y y_e + {}^{r}a_y z_e + {}^{r}p_y}{{}^{r}n_z x_e + {}^{r}o_z y_e + {}^{r}a_z z_e + {}^{r}p_z} \end{bmatrix} \tag{20}
$$
The image coordinates of rP1c in the right camera are mr (ur, vr); (rx1c, ry1c) and (ur, vr) can be converted into each other using the camera parameters. According to the camera's internal parameter model, the following can be obtained:
$$
\begin{bmatrix} {}^{r}x_{1c} \\ {}^{r}y_{1c} \\ 1 \end{bmatrix} = {}^{r}M_{in}^{-1} \begin{bmatrix} u_r \\ v_r \\ 1 \end{bmatrix} \tag{21}
$$
where rMin is the internal parameter matrix of the right camera. The value of (rx1c, ry1c) can be obtained from the image coordinates of rP1c and the parameters of the right camera, and the following can be obtained by substituting (21) into (20):
$$
\begin{cases} ({}^{r}n_x - {}^{r}x_{1c}\,{}^{r}n_z)x_e + ({}^{r}o_x - {}^{r}x_{1c}\,{}^{r}o_z)y_e + ({}^{r}a_x - {}^{r}x_{1c}\,{}^{r}a_z)z_e + {}^{r}p_x - {}^{r}x_{1c}\,{}^{r}p_z = 0 \\ ({}^{r}n_y - {}^{r}y_{1c}\,{}^{r}n_z)x_e + ({}^{r}o_y - {}^{r}y_{1c}\,{}^{r}o_z)y_e + ({}^{r}a_y - {}^{r}y_{1c}\,{}^{r}a_z)z_e + {}^{r}p_y - {}^{r}y_{1c}\,{}^{r}p_z = 0 \end{cases} \tag{22}
$$
Four equations can be obtained from Equations (16) and (22) for xe, ye and ze, and the 3D coordinates of point Pe can be calculated by the least squares method.
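The least-squares step can be sketched as follows (an illustrative implementation, not the authors' code): the four linear equations of Equations (16) and (22) are stacked into an overdetermined system and solved for (xe, ye, ze).

```python
import numpy as np

def triangulate(l_Td, r_Td, l_pt, r_pt):
    """Least-squares 3D point in the eye frame from Eqs. (16) and (22).

    l_Td, r_Td: 4 x 4 transforms of Eqs. (12) and (18) (eye frame -> camera frame).
    l_pt, r_pt: normalized coordinates (x1c, y1c) on the z = 1 plane of each
    camera, obtained from pixel coordinates via Eq. (15)/(21).
    """
    rows, rhs = [], []
    for Td, (x1c, y1c) in ((l_Td, l_pt), (r_Td, r_pt)):
        r0, r1, r2 = Td[0], Td[1], Td[2]
        for img, row in ((x1c, r0), (y1c, r1)):
            rows.append(row[:3] - img * r2[:3])   # coefficients of (xe, ye, ze)
            rhs.append(img * r2[3] - row[3])      # constant term moved to the RHS
    A, b = np.vstack(rows), np.array(rhs)
    P_e, *_ = np.linalg.lstsq(A, b, rcond=None)
    return P_e

# Example: a point at (50, 20, 500) seen by two cameras 100 mm apart (hypothetical values).
r_Td = np.eye(4)
r_Td[0, 3] = -100.0
print(triangulate(np.eye(4), r_Td, (0.1, 0.04), (-0.1, 0.04)))   # ~[50, 20, 500]
```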
The 3D coordinates Ph (xh, yh, zh) in the head coordinate system can be obtained by Equation (23). dx and dy are illustrated in Figure 4.
$$
\begin{bmatrix} x_h \\ y_h \\ z_h \end{bmatrix} = \begin{bmatrix} x_e - d_x \\ y_e - d_y \\ z_e \end{bmatrix} \tag{23}
$$
Let the angles at which the current moment of the head rotate relative to the initial position be hθpi and hθti; the coordinates of the target in the robot coordinate system are
$$
\begin{bmatrix} {}^{w}x_m \\ {}^{w}y_m \\ {}^{w}z_m \\ 1 \end{bmatrix} = \begin{bmatrix} \mathrm{Rot}(X,\,{}^{h}\theta_{ti})\,\mathrm{Rot}(Y,\,{}^{h}\theta_{pi}) & \mathbf{0} \\ \mathbf{0}^{T} & 1 \end{bmatrix}^{-1} \begin{bmatrix} x_h \\ y_h \\ z_h \\ 1 \end{bmatrix} \tag{24}
$$
According to the 3D coordinates of the target in the head coordinate system, the angle between the target and Zh in the horizontal direction and the vertical direction can be obtained as follows:
$$
{}^{h}\gamma_p = \arctan\!\left(\frac{x_h}{z_h}\right) \tag{25}
$$
$$
{}^{h}\gamma_t = \arctan\!\left(\frac{y_h}{z_h}\right) \tag{26}
$$
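As a small worked sketch of Equations (23), (25) and (26) (helper names are assumptions; arctan2 is used for numerical safety), the eye-to-head conversion and the head-to-target angles can be computed as:

```python
import numpy as np

def eye_to_head(P_e, d_x, d_y):
    """Eq. (23): express an eye-frame point in the head frame (d_x, d_y as in Figure 4b)."""
    x_e, y_e, z_e = P_e
    return np.array([x_e - d_x, y_e - d_y, z_e])

def head_target_angles(P_h):
    """Eqs. (25)-(26): horizontal and vertical angles between the head Zh axis and the target."""
    x_h, y_h, z_h = P_h
    return np.arctan2(x_h, z_h), np.arctan2(y_h, z_h)

P_h = eye_to_head((300.0, 150.0, 2000.0), d_x=150.0, d_y=200.0)
h_gamma_p, h_gamma_t = head_target_angles(P_h)
# If |h_gamma_p| > h_gamma_pmax or |h_gamma_t| > h_gamma_tmax, the head (and, beyond the
# head's range, the mobile robot) is rotated as derived in Sections 3.2.2 and 3.2.3.
```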
When hγp and hγt exceed a set threshold, the head needs to rotate. To leave a certain margin for the rotation of the eyeball and for the convenience of calculation, the angles required for the head to rotate in the horizontal direction and the vertical direction are calculated by the principles shown in Figure 7a,b, respectively. Figure 7a shows the calculation principle of the horizontal direction angle when the target's x coordinate in the head coordinate system is greater than zero. After the head is rotated to hθp, the target point is on the lZe axis of the left motion module's end coordinate system, and the left motion module reaches the maximum rotatable threshold eθpmax. Figure 7b shows the calculation principle of the vertical direction when the target's y coordinate in the head coordinate system is greater than dy. After the head is rotated to hθt, the target point is on the Ze axis of the eye coordinate system, and the eye reaches its maximum rotatable threshold eθtmax.

3.2.2. Horizontal Rotation Angle Calculation

Let the current angle of the head in the horizontal direction be hθpi. When the head is rotated in the horizontal direction to h θ p , the 3D coordinates of the target in the new head coordinate system are
$$
\begin{bmatrix} x_h' \\ y_h' \\ z_h' \\ 1 \end{bmatrix} = \begin{bmatrix} \cos({}^{h}\theta_p - {}^{h}\theta_{pi}) & 0 & -\sin({}^{h}\theta_p - {}^{h}\theta_{pi}) & 0 \\ 0 & 1 & 0 & 0 \\ \sin({}^{h}\theta_p - {}^{h}\theta_{pi}) & 0 & \cos({}^{h}\theta_p - {}^{h}\theta_{pi}) & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_h \\ y_h \\ z_h \\ 1 \end{bmatrix} \tag{27}
$$
Therefore,
$$
\begin{bmatrix} x_h' \\ y_h' \\ z_h' \end{bmatrix} = \begin{bmatrix} x_h\cos({}^{h}\theta_p - {}^{h}\theta_{pi}) - z_h\sin({}^{h}\theta_p - {}^{h}\theta_{pi}) \\ y_h \\ x_h\sin({}^{h}\theta_p - {}^{h}\theta_{pi}) + z_h\cos({}^{h}\theta_p - {}^{h}\theta_{pi}) \end{bmatrix} \tag{28}
$$
The coordinates of the target in the new eye coordinate system are
$$
\begin{bmatrix} {}^{e}x_h \\ {}^{e}y_h \\ {}^{e}z_h \end{bmatrix} = \begin{bmatrix} d_x + x_h\cos({}^{h}\theta_p - {}^{h}\theta_{pi}) - z_h\sin({}^{h}\theta_p - {}^{h}\theta_{pi}) \\ y_h + d_y \\ x_h\sin({}^{h}\theta_p - {}^{h}\theta_{pi}) + z_h\cos({}^{h}\theta_p - {}^{h}\theta_{pi}) \end{bmatrix} \tag{29}
$$
After turning, the left motion module reaches the maximum threshold eθpmax that can be rotated, so that
$$
\tan({}^{e}\theta_{p\max}) = \frac{{}^{e}z_h}{{}^{e}x_h} = \frac{x_h\sin({}^{h}\theta_p - {}^{h}\theta_{pi}) + z_h\cos({}^{h}\theta_p - {}^{h}\theta_{pi})}{d_x + x_h\cos({}^{h}\theta_p - {}^{h}\theta_{pi}) - z_h\sin({}^{h}\theta_p - {}^{h}\theta_{pi})} \tag{30}
$$
Simplifying Equation (30), we have
$$
\sin({}^{h}\theta_p - {}^{h}\theta_{pi}) = \frac{d_x\tan({}^{e}\theta_{p\max})}{x_h + z_h\tan({}^{e}\theta_{p\max})} + \frac{x_h\tan({}^{e}\theta_{p\max}) - z_h}{x_h + z_h\tan({}^{e}\theta_{p\max})}\cos({}^{h}\theta_p - {}^{h}\theta_{pi}) \tag{31}
$$
Assume that
$$
k_1 = \frac{x_h\tan({}^{e}\theta_{p\max}) - z_h}{x_h + z_h\tan({}^{e}\theta_{p\max})}, \qquad k_2 = \frac{d_x\tan({}^{e}\theta_{p\max})}{x_h + z_h\tan({}^{e}\theta_{p\max})} \tag{32}
$$
According to the trigonometric identity,
$$
\left[k_1\cos({}^{h}\theta_p - {}^{h}\theta_{pi}) + k_2\right]^2 + \cos^2({}^{h}\theta_p - {}^{h}\theta_{pi}) = 1 \tag{33}
$$
The solution of Equation (33) is
$$
\cos({}^{h}\theta_p - {}^{h}\theta_{pi}) = \frac{-k_1k_2 \pm \sqrt{k_1^2 - k_2^2 + 1}}{k_1^2 + 1} \tag{34}
$$
Therefore,
$$
{}^{h}\theta_p = {}^{h}\theta_{pi} + \arccos\!\left(\frac{-k_1k_2 \pm \sqrt{k_1^2 - k_2^2 + 1}}{k_1^2 + 1}\right) \tag{35}
$$
Equation (35) has two solutions; therefore, we choose the solution in which the deviation e of Equation (36) is minimized:
$$
e = \left|\tan({}^{e}\theta_{p\max}) - \frac{x_h\sin({}^{h}\theta_p - {}^{h}\theta_{pi}) + z_h\cos({}^{h}\theta_p - {}^{h}\theta_{pi})}{d_x + x_h\cos({}^{h}\theta_p - {}^{h}\theta_{pi}) - z_h\sin({}^{h}\theta_p - {}^{h}\theta_{pi})}\right| \tag{36}
$$
When the obtained hθp is outside of the range [−hθpmax, hθpmax], the value of hθpq is
$$
{}^{h}\theta_{pq} = \begin{cases} {}^{h}\theta_{p\max}, & {}^{h}\theta_p \ge {}^{h}\theta_{p\max} \\ -{}^{h}\theta_{p\max}, & {}^{h}\theta_p \le -{}^{h}\theta_{p\max} \\ {}^{h}\theta_p, & \text{else} \end{cases} \tag{37}
$$
Finally, one can obtain the wθpq value:
$$
{}^{w}\theta_{pq} = \begin{cases} {}^{h}\theta_p - {}^{h}\theta_{p\max}, & {}^{h}\theta_p > {}^{h}\theta_{p\max} \\ {}^{h}\theta_p + {}^{h}\theta_{p\max}, & {}^{h}\theta_p < -{}^{h}\theta_{p\max} \\ 0, & \text{else} \end{cases} \tag{38}
$$
Based on the same principle, when the x coordinate of the target in the head coordinate system is less than 0, the coordinates of the target in the right motion module's base coordinate system after the rotation are
$$
\begin{bmatrix} {}^{r}x_e \\ {}^{r}y_e \\ {}^{r}z_e \\ 1 \end{bmatrix} = \begin{bmatrix} x_h\cos({}^{h}\theta_p - {}^{h}\theta_{pi}) - z_h\sin({}^{h}\theta_p - {}^{h}\theta_{pi}) - d_x \\ y_h + d_y \\ x_h\sin({}^{h}\theta_p - {}^{h}\theta_{pi}) + z_h\cos({}^{h}\theta_p - {}^{h}\theta_{pi}) \\ 1 \end{bmatrix} \tag{39}
$$
After turning, the right motion module reaches −eθpmax, and the following can be obtained:
$$
\tan(-{}^{e}\theta_{p\max}) = \frac{{}^{r}z_e}{{}^{r}x_e} = \frac{x_h\sin({}^{h}\theta_p - {}^{h}\theta_{pi}) + z_h\cos({}^{h}\theta_p - {}^{h}\theta_{pi})}{x_h\cos({}^{h}\theta_p - {}^{h}\theta_{pi}) - z_h\sin({}^{h}\theta_p - {}^{h}\theta_{pi}) - d_x} \tag{40}
$$
We simplify Equation (40) as follows:
$$
\sin({}^{h}\theta_p - {}^{h}\theta_{pi}) = \frac{d_x\tan({}^{e}\theta_{p\max})}{x_h - z_h\tan({}^{e}\theta_{p\max})} - \frac{x_h\tan({}^{e}\theta_{p\max}) + z_h}{x_h - z_h\tan({}^{e}\theta_{p\max})}\cos({}^{h}\theta_p - {}^{h}\theta_{pi}) \tag{41}
$$
Let
$$
k_1' = -\frac{x_h\tan({}^{e}\theta_{p\max}) + z_h}{x_h - z_h\tan({}^{e}\theta_{p\max})}, \qquad k_2' = \frac{d_x\tan({}^{e}\theta_{p\max})}{x_h - z_h\tan({}^{e}\theta_{p\max})} \tag{42}
$$
The same two solutions are available:
$$
{}^{h}\theta_p = {}^{h}\theta_{pi} + \arccos\!\left(\frac{-k_1'k_2' \pm \sqrt{(k_1')^2 - (k_2')^2 + 1}}{(k_1')^2 + 1}\right) \tag{43}
$$
Select the solution in which the deviation e of Equation (44) is minimized:
$$
e = \left|\tan({}^{e}\theta_{p\max}) + \frac{x_h\sin({}^{h}\theta_p - {}^{h}\theta_{pi}) + z_h\cos({}^{h}\theta_p - {}^{h}\theta_{pi})}{x_h\cos({}^{h}\theta_p - {}^{h}\theta_{pi}) - z_h\sin({}^{h}\theta_p - {}^{h}\theta_{pi}) - d_x}\right| \tag{44}
$$
Using Equations (37) and (38), hθpq and wθpq can be obtained.
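The closed-form step of Equations (31)–(38) amounts to solving sin Δ = k1 cos Δ + k2 for Δ = hθp − hθpi and then clamping the result. A compact sketch is given below (helper names are hypothetical; the deviation check of Equation (36) or (44) is supplied as a callback rather than re-derived here):

```python
import numpy as np

def solve_delta(k1, k2, deviation):
    """Solve sin(D) = k1*cos(D) + k2 (Eqs. (31)-(35)) and keep the root with the
    smaller deviation e (Eq. (36)); `deviation(D)` is assumed to return that error."""
    disc = k1 ** 2 - k2 ** 2 + 1.0
    if disc < 0.0:
        raise ValueError("no real solution")
    candidates = []
    for sign in (1.0, -1.0):
        c = (-k1 * k2 + sign * np.sqrt(disc)) / (k1 ** 2 + 1.0)
        if abs(c) <= 1.0:
            candidates.append(np.arccos(c))   # D = h_theta_p - h_theta_pi
    return min(candidates, key=deviation)

def clamp_head_and_robot(h_theta_p, h_theta_pmax):
    """Eqs. (37)-(38): clamp the head pan and hand the excess to the mobile robot."""
    h_theta_pq = float(np.clip(h_theta_p, -h_theta_pmax, h_theta_pmax))
    w_theta_pq = h_theta_p - h_theta_pq
    return h_theta_pq, w_theta_pq
```

The same helper applies to the vertical case of Section 3.2.3, with k1 and k2 taken from Equation (50) and the clamp of Equation (57).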

3.2.3. Vertical Rotation Angle Calculation

When the target's y coordinate in the head coordinate system is greater than dy, let the current angle of the head in the vertical direction be hθti. When the head is rotated in the vertical direction to hθt, the 3D coordinates of the target in the new head coordinate system are
$$
\begin{bmatrix} x_h' \\ y_h' \\ z_h' \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos({}^{h}\theta_t - {}^{h}\theta_{ti}) & \sin({}^{h}\theta_t - {}^{h}\theta_{ti}) & 0 \\ 0 & -\sin({}^{h}\theta_t - {}^{h}\theta_{ti}) & \cos({}^{h}\theta_t - {}^{h}\theta_{ti}) & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_h \\ y_h \\ z_h \\ 1 \end{bmatrix} \tag{45}
$$
Therefore,
$$
\begin{bmatrix} x_h' \\ y_h' \\ z_h' \end{bmatrix} = \begin{bmatrix} x_h \\ y_h\cos({}^{h}\theta_t - {}^{h}\theta_{ti}) + z_h\sin({}^{h}\theta_t - {}^{h}\theta_{ti}) \\ z_h\cos({}^{h}\theta_t - {}^{h}\theta_{ti}) - y_h\sin({}^{h}\theta_t - {}^{h}\theta_{ti}) \end{bmatrix} \tag{46}
$$
Using Equation (29), the coordinates of the target in the eye coordinate system after the rotation can be calculated:
$$
\begin{bmatrix} {}^{e}x_h \\ {}^{e}y_h \\ {}^{e}z_h \end{bmatrix} = \begin{bmatrix} d_x + x_h \\ y_h\cos({}^{h}\theta_t - {}^{h}\theta_{ti}) + z_h\sin({}^{h}\theta_t - {}^{h}\theta_{ti}) + d_y \\ z_h\cos({}^{h}\theta_t - {}^{h}\theta_{ti}) - y_h\sin({}^{h}\theta_t - {}^{h}\theta_{ti}) \end{bmatrix} \tag{47}
$$
After rotation, the left and right motion modules reach the rotatable maximum value eθtmax in the vertical direction, so that
$$
\tan({}^{e}\theta_{t\max}) = \frac{{}^{e}y_h}{{}^{e}z_h} = \frac{y_h\cos({}^{h}\theta_t - {}^{h}\theta_{ti}) + z_h\sin({}^{h}\theta_t - {}^{h}\theta_{ti}) + d_y}{z_h\cos({}^{h}\theta_t - {}^{h}\theta_{ti}) - y_h\sin({}^{h}\theta_t - {}^{h}\theta_{ti})} \tag{48}
$$
Simplifying Equation (48), we obtain
$$
\sin({}^{h}\theta_t - {}^{h}\theta_{ti}) = \frac{z_h\tan({}^{e}\theta_{t\max}) - y_h}{z_h + y_h\tan({}^{e}\theta_{t\max})}\cos({}^{h}\theta_t - {}^{h}\theta_{ti}) - \frac{d_y}{z_h + y_h\tan({}^{e}\theta_{t\max})} \tag{49}
$$
Let
$$
k_1 = \frac{z_h\tan({}^{e}\theta_{t\max}) - y_h}{z_h + y_h\tan({}^{e}\theta_{t\max})}, \qquad k_2 = -\frac{d_y}{z_h + y_h\tan({}^{e}\theta_{t\max})} \tag{50}
$$
Therefore,
$$
{}^{h}\theta_t = {}^{h}\theta_{ti} + \arccos\!\left(\frac{-k_1k_2 \pm \sqrt{k_1^2 - k_2^2 + 1}}{k_1^2 + 1}\right) \tag{51}
$$
Equation (51) has two solutions; therefore, we choose the solution in which the deviation e of Equation (52) is minimized:
$$
e = \left|\tan({}^{e}\theta_{t\max}) - \frac{y_h\cos({}^{h}\theta_t - {}^{h}\theta_{ti}) + z_h\sin({}^{h}\theta_t - {}^{h}\theta_{ti}) + d_y}{z_h\cos({}^{h}\theta_t - {}^{h}\theta_{ti}) - y_h\sin({}^{h}\theta_t - {}^{h}\theta_{ti})}\right| \tag{52}
$$
Similarly, when the target's y coordinate in the head coordinate system is less than dy, we have
$$
\tan(-{}^{e}\theta_{t\max}) = \frac{{}^{e}y_h}{{}^{e}z_h} = \frac{y_h\cos({}^{h}\theta_t - {}^{h}\theta_{ti}) + z_h\sin({}^{h}\theta_t - {}^{h}\theta_{ti}) + d_y}{z_h\cos({}^{h}\theta_t - {}^{h}\theta_{ti}) - y_h\sin({}^{h}\theta_t - {}^{h}\theta_{ti})} \tag{53}
$$
$$
\sin({}^{h}\theta_t - {}^{h}\theta_{ti}) = -\frac{z_h\tan({}^{e}\theta_{t\max}) + y_h}{z_h - y_h\tan({}^{e}\theta_{t\max})}\cos({}^{h}\theta_t - {}^{h}\theta_{ti}) - \frac{d_y}{z_h - y_h\tan({}^{e}\theta_{t\max})} \tag{54}
$$
$$
k_1' = -\frac{z_h\tan({}^{e}\theta_{t\max}) + y_h}{z_h - y_h\tan({}^{e}\theta_{t\max})}, \qquad k_2' = -\frac{d_y}{z_h - y_h\tan({}^{e}\theta_{t\max})} \tag{55}
$$
$$
e = \left|\tan({}^{e}\theta_{t\max}) + \frac{y_h\cos({}^{h}\theta_t - {}^{h}\theta_{ti}) + z_h\sin({}^{h}\theta_t - {}^{h}\theta_{ti}) + d_y}{z_h\cos({}^{h}\theta_t - {}^{h}\theta_{ti}) - y_h\sin({}^{h}\theta_t - {}^{h}\theta_{ti})}\right| \tag{56}
$$
When the obtained hθt is outside of the range [−hθtmax, hθtmax], the value of hθtq is
$$
{}^{h}\theta_{tq} = \begin{cases} {}^{h}\theta_{t\max}, & {}^{h}\theta_t \ge {}^{h}\theta_{t\max} \\ -{}^{h}\theta_{t\max}, & {}^{h}\theta_t \le -{}^{h}\theta_{t\max} \\ {}^{h}\theta_t, & \text{else} \end{cases} \tag{57}
$$
After obtaining hθpq, hθtq and wθpq, let Pe′ (xe′, ye′, ze′) denote the coordinates of the target in the eye coordinate system after the mobile robot and the head have been rotated:
$$
\begin{bmatrix} x_e' \\ y_e' \\ z_e' \\ 1 \end{bmatrix} = \begin{bmatrix} \mathrm{Rot}(X,\,{}^{h}\theta_{tq})\,\mathrm{Rot}(Y,\,{}^{h}\theta_{pq}) & \mathbf{0} \\ \mathbf{0}^{T} & 1 \end{bmatrix} \begin{bmatrix} \mathrm{Rot}(Y,\,{}^{w}\theta_{pq}) & \mathbf{0} \\ \mathbf{0}^{T} & 1 \end{bmatrix} \begin{bmatrix} {}^{w}x_m \\ {}^{w}y_m \\ {}^{w}z_m \\ 1 \end{bmatrix} + \begin{bmatrix} d_x \\ d_y \\ 0 \\ 0 \end{bmatrix} \tag{58}
$$
The desired observation pose of the eye, characterized by lθtq, lθpq, rθtq and rθpq, can be obtained using the method described in the following section.

3.2.4. Calculation of the Desired Observation Poses of the Eye

According to Formula (2), lθtq = rθtq = θt, and lθpq = rθpq = θp.
The inverse of the hand–eye matrix of the left camera and left motion module end coordinate system is
$$
{}^{l}T_m^{-1} = \begin{bmatrix} {}^{l}n_x & {}^{l}o_x & {}^{l}a_x & {}^{l}p_x \\ {}^{l}n_y & {}^{l}o_y & {}^{l}a_y & {}^{l}p_y \\ {}^{l}n_z & {}^{l}o_z & {}^{l}a_z & {}^{l}p_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{59}
$$
The coordinates lPc (lxc, lyc, lzc) of Pe′ (xe′, ye′, ze′) in the left camera coordinate system satisfy the following relationship:
$$
\begin{bmatrix} {}^{l}P_c \\ 1 \end{bmatrix} = {}^{l}T_m^{-1} \begin{bmatrix} \mathrm{Rot}(X,\,{}^{l}\theta_t) & \mathbf{0} \\ \mathbf{0}^{T} & 1 \end{bmatrix} \begin{bmatrix} \mathrm{Rot}(Y,\,{}^{l}\theta_p) & \mathbf{0} \\ \mathbf{0}^{T} & 1 \end{bmatrix} \begin{bmatrix} P_e' \\ 1 \end{bmatrix} \tag{60}
$$
According to the pinhole imaging model, the imaging coordinates of the point Pe′ (xe′, ye′, ze′) in the left camera are
$$
\begin{bmatrix} {}^{l}u \\ {}^{l}v \\ 1 \end{bmatrix} = {}^{l}M_{in}\,{}^{l}P_{1c} = \begin{bmatrix} {}^{l}k_x & 0 & {}^{l}u_0 \\ 0 & {}^{l}k_y & {}^{l}v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} {}^{l}x_c/{}^{l}z_c \\ {}^{l}y_c/{}^{l}z_c \\ 1 \end{bmatrix} = \begin{bmatrix} {}^{l}k_x\,{}^{l}x_c/{}^{l}z_c + {}^{l}u_0 \\ {}^{l}k_y\,{}^{l}y_c/{}^{l}z_c + {}^{l}v_0 \\ 1 \end{bmatrix} \tag{61}
$$
Substituting Equation (61) into Equation (2), we obtain
$$
\begin{bmatrix} {}^{l}\Delta u \\ {}^{l}\Delta v \end{bmatrix} = \begin{bmatrix} {}^{l}k_x\,{}^{l}x_c/{}^{l}z_c \\ {}^{l}k_y\,{}^{l}y_c/{}^{l}z_c \end{bmatrix} \tag{62}
$$
Based on the same principle, the coordinates rPc (rxc, ryc, rzc) of Pe′ (xe′, ye′, ze′) in the right camera coordinate system are
$$
\begin{bmatrix} {}^{r}P_c \\ 1 \end{bmatrix} = {}^{r}T_m^{-1} \begin{bmatrix} \mathrm{Rot}(X,\,{}^{r}\theta_t) & \mathbf{0} \\ \mathbf{0}^{T} & 1 \end{bmatrix} \begin{bmatrix} \mathrm{Rot}(Y,\,{}^{r}\theta_p) & \mathbf{0} \\ \mathbf{0}^{T} & 1 \end{bmatrix} {}^{r}T \begin{bmatrix} P_e' \\ 1 \end{bmatrix} \tag{63}
$$
The imaging coordinates of point Pe′ (xe′, ye′, ze′) in the right camera are
$$
\begin{bmatrix} {}^{r}u \\ {}^{r}v \\ 1 \end{bmatrix} = {}^{r}M_{in}\,{}^{r}P_{1c} = \begin{bmatrix} {}^{r}k_x & 0 & {}^{r}u_0 \\ 0 & {}^{r}k_y & {}^{r}v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} {}^{r}x_c/{}^{r}z_c \\ {}^{r}y_c/{}^{r}z_c \\ 1 \end{bmatrix} = \begin{bmatrix} {}^{r}k_x\,{}^{r}x_c/{}^{r}z_c + {}^{r}u_0 \\ {}^{r}k_y\,{}^{r}y_c/{}^{r}z_c + {}^{r}v_0 \\ 1 \end{bmatrix} \tag{64}
$$
$$
\begin{bmatrix} {}^{r}\Delta u \\ {}^{r}\Delta v \end{bmatrix} = \begin{bmatrix} {}^{r}k_x\,{}^{r}x_c/{}^{r}z_c \\ {}^{r}k_y\,{}^{r}y_c/{}^{r}z_c \end{bmatrix} \tag{65}
$$
By Equations (2), (62) and (65), two equations related to θt and θp (see Appendix C for the complete equations) can be obtained. It is challenging to calculate the values of θt and θp directly from these two equations. To obtain a solution, we first compute a suboptimal observation pose and use it as the initial value; then, we use the trial-and-error method to obtain the optimal observation pose. When θt is calculated, let θp = 0; the solution of θt can then be obtained from Δvl = −Δvr. When θp is calculated, θt is fixed at the obtained value, and the solution of θp is obtained from Δul = −Δur. The resulting pose Pob is a suboptimal observation pose. Starting from the suboptimal observation pose, the trial-and-error method can be used to obtain the optimal solution with the smallest error. The range of θt is [−θtmax, θtmax]. The range of θp is [−θpmax, θpmax].
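A sketch of this two-stage procedure plus trial-and-error refinement is given below (assumptions: `residuals(theta_t, theta_p)` is a placeholder callback that would return (|Δul + Δur|, |Δvl + Δvr|) computed from Equations (62) and (65); the step size and iteration limit are arbitrary choices, not values from the paper):

```python
import numpy as np

def refine_observation_pose(residuals, theta_t0, theta_p0,
                            theta_tmax, theta_pmax,
                            step=np.radians(0.1), iters=100):
    """Trial-and-error refinement around the suboptimal pose (theta_t0, theta_p0)."""
    th_t, th_p = theta_t0, theta_p0
    best = sum(residuals(th_t, th_p))
    for _ in range(iters):
        improved = False
        for dt, dp in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            t = float(np.clip(th_t + dt, -theta_tmax, theta_tmax))
            p = float(np.clip(th_p + dp, -theta_pmax, theta_pmax))
            err = sum(residuals(t, p))
            if err < best:
                th_t, th_p, best, improved = t, p, err, True
        if not improved:
            break
    return th_t, th_p

# Dummy usage: a quadratic stand-in for the true image-offset residuals.
dummy = lambda t, p: (abs(t - 0.05), abs(p + 0.02))
print(refine_observation_pose(dummy, 0.0, 0.0, np.radians(45), np.radians(45)))
```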
According to Equations (60) and (63), let θp be equal to 0 to obtain
$$
\begin{bmatrix} {}^{l}P_c \\ 1 \end{bmatrix} = ({}^{l}T_m)^{-1} \begin{bmatrix} \mathrm{Rot}(X,\theta_t) & \mathbf{0} \\ \mathbf{0}^{T} & 1 \end{bmatrix} {}^{l}T \begin{bmatrix} P_e' \\ 1 \end{bmatrix} \tag{66}
$$
The following result is also available:
$$
\begin{bmatrix} {}^{r}P_c \\ 1 \end{bmatrix} = ({}^{r}T_m)^{-1} \begin{bmatrix} \mathrm{Rot}(X,\theta_t) & \mathbf{0} \\ \mathbf{0}^{T} & 1 \end{bmatrix} {}^{r}T \begin{bmatrix} P_e' \\ 1 \end{bmatrix} \tag{67}
$$
The base coordinate system of the left motion module is the world coordinate system; therefore, lT is a unit matrix. To simplify the calculation, we write
$$
\begin{bmatrix} {}^{r}P_e' \\ 1 \end{bmatrix} = \begin{bmatrix} {}^{r}x_e' \\ {}^{r}y_e' \\ {}^{r}z_e' \\ 1 \end{bmatrix} = {}^{r}T \begin{bmatrix} P_e' \\ 1 \end{bmatrix} \tag{68}
$$
According to the calculation principle of Section 3.2.1, we have the following:
$$
\begin{aligned} \Delta u_l &= {}^{l}f_x\,\frac{({}^{l}o_x y_e' + {}^{l}a_x z_e')\cos\theta_t + ({}^{l}o_x z_e' - {}^{l}a_x y_e')\sin\theta_t + ({}^{l}p_x + {}^{l}n_x x_e')}{({}^{l}o_z y_e' + {}^{l}a_z z_e')\cos\theta_t + ({}^{l}o_z z_e' - {}^{l}a_z y_e')\sin\theta_t + ({}^{l}p_z + {}^{l}n_z x_e')} \\ \Delta v_l &= {}^{l}f_y\,\frac{({}^{l}o_y y_e' + {}^{l}a_y z_e')\cos\theta_t + ({}^{l}o_y z_e' - {}^{l}a_y y_e')\sin\theta_t + ({}^{l}p_y + {}^{l}n_y x_e')}{({}^{l}o_z y_e' + {}^{l}a_z z_e')\cos\theta_t + ({}^{l}o_z z_e' - {}^{l}a_z y_e')\sin\theta_t + ({}^{l}p_z + {}^{l}n_z x_e')} \end{aligned} \tag{69}
$$
$$
\begin{aligned} \Delta u_r &= {}^{r}f_x\,\frac{({}^{r}o_x\,{}^{r}y_e' + {}^{r}a_x\,{}^{r}z_e')\cos\theta_t + ({}^{r}o_x\,{}^{r}z_e' - {}^{r}a_x\,{}^{r}y_e')\sin\theta_t + ({}^{r}p_x + {}^{r}n_x\,{}^{r}x_e')}{({}^{r}o_z\,{}^{r}y_e' + {}^{r}a_z\,{}^{r}z_e')\cos\theta_t + ({}^{r}o_z\,{}^{r}z_e' - {}^{r}a_z\,{}^{r}y_e')\sin\theta_t + ({}^{r}p_z + {}^{r}n_z\,{}^{r}x_e')} \\ \Delta v_r &= {}^{r}f_y\,\frac{({}^{r}o_y\,{}^{r}y_e' + {}^{r}a_y\,{}^{r}z_e')\cos\theta_t + ({}^{r}o_y\,{}^{r}z_e' - {}^{r}a_y\,{}^{r}y_e')\sin\theta_t + ({}^{r}p_y + {}^{r}n_y\,{}^{r}x_e')}{({}^{r}o_z\,{}^{r}y_e' + {}^{r}a_z\,{}^{r}z_e')\cos\theta_t + ({}^{r}o_z\,{}^{r}z_e' - {}^{r}a_z\,{}^{r}y_e')\sin\theta_t + ({}^{r}p_z + {}^{r}n_z\,{}^{r}x_e')} \end{aligned} \tag{70}
$$
Assume the following:
$$
E_{sv} = \left|\Delta v_l + \Delta v_r\right| \tag{71}
$$
The solution to θt that keeps the target at the center of the two cameras needs to satisfy the following conditions:
$$
\begin{cases} \Delta v_l + \Delta v_r = 0 \\ -\theta_{t\max} \le \theta_t \le \theta_{t\max} \\ \theta_t = \arg\min(E_{sv}) \end{cases} \tag{72}
$$
Substituting the second equation of Equations (69) and (70) into Equation (72) and solving the equation, we have
$$
k_1\cos^2\theta_t + k_2\sin^2\theta_t + k_3\sin\theta_t\cos\theta_t + k_4\cos\theta_t + k_5\sin\theta_t + k_6 = 0 \tag{73}
$$
where k1, k2, k3, k4, k5 and k6 are
$$
k_1 = {}^{l}f_y({}^{l}o_y y_e' + {}^{l}a_y z_e')({}^{r}o_z\,{}^{r}y_e' + {}^{r}a_z\,{}^{r}z_e') + {}^{r}f_y({}^{l}o_z y_e' + {}^{l}a_z z_e')({}^{r}o_y\,{}^{r}y_e' + {}^{r}a_y\,{}^{r}z_e') \tag{74}
$$
$$
k_2 = {}^{l}f_y({}^{l}o_y z_e' - {}^{l}a_y y_e')({}^{r}o_z\,{}^{r}z_e' - {}^{r}a_z\,{}^{r}y_e') + {}^{r}f_y({}^{l}o_z z_e' - {}^{l}a_z y_e')({}^{r}o_y\,{}^{r}z_e' - {}^{r}a_y\,{}^{r}y_e') \tag{75}
$$
$$
k_3 = {}^{l}f_y({}^{l}o_y y_e' + {}^{l}a_y z_e')({}^{r}o_z\,{}^{r}z_e' - {}^{r}a_z\,{}^{r}y_e') + {}^{l}f_y({}^{l}o_y z_e' - {}^{l}a_y y_e')({}^{r}o_z\,{}^{r}y_e' + {}^{r}a_z\,{}^{r}z_e') + {}^{r}f_y({}^{l}o_z y_e' + {}^{l}a_z z_e')({}^{r}o_y\,{}^{r}z_e' - {}^{r}a_y\,{}^{r}y_e') + {}^{r}f_y({}^{l}o_z z_e' - {}^{l}a_z y_e')({}^{r}o_y\,{}^{r}y_e' + {}^{r}a_y\,{}^{r}z_e') \tag{76}
$$
$$
k_4 = {}^{l}f_y({}^{l}o_y y_e' + {}^{l}a_y z_e')({}^{r}p_z + {}^{r}n_z\,{}^{r}x_e') + {}^{l}f_y({}^{l}p_y + {}^{l}n_y x_e')({}^{r}o_z\,{}^{r}y_e' + {}^{r}a_z\,{}^{r}z_e') + {}^{r}f_y({}^{l}o_z y_e' + {}^{l}a_z z_e')({}^{r}p_y + {}^{r}n_y\,{}^{r}x_e') + {}^{r}f_y({}^{l}p_z + {}^{l}n_z x_e')({}^{r}o_y\,{}^{r}y_e' + {}^{r}a_y\,{}^{r}z_e') \tag{77}
$$
$$
k_5 = {}^{l}f_y({}^{l}o_y z_e' - {}^{l}a_y y_e')({}^{r}p_z + {}^{r}n_z\,{}^{r}x_e') + {}^{l}f_y({}^{l}p_y + {}^{l}n_y x_e')({}^{r}o_z\,{}^{r}z_e' - {}^{r}a_z\,{}^{r}y_e') + {}^{r}f_y({}^{l}o_z z_e' - {}^{l}a_z y_e')({}^{r}p_y + {}^{r}n_y\,{}^{r}x_e') + {}^{r}f_y({}^{l}p_z + {}^{l}n_z x_e')({}^{r}o_y\,{}^{r}z_e' - {}^{r}a_y\,{}^{r}y_e') \tag{78}
$$
$$
k_6 = {}^{l}f_y({}^{l}p_y + {}^{l}n_y x_e')({}^{r}p_z + {}^{r}n_z\,{}^{r}x_e') + {}^{r}f_y({}^{l}p_z + {}^{l}n_z x_e')({}^{r}p_y + {}^{r}n_y\,{}^{r}x_e') \tag{79}
$$
According to the trigonometric identity,
$$
\cos^2\theta_t + \sin^2\theta_t = 1 \tag{80}
$$
Replacing cos θt in Equation (73) with sin θt via Equation (80), we obtain the following:
$$
k_1'\sin^4\theta_t + k_2'\sin^3\theta_t + k_3'\sin^2\theta_t + k_4'\sin\theta_t + k_5' = 0 \tag{81}
$$
where k1′, k2′, k3′, k4′ and k5′ are
$$
k_1' = (k_2 - k_1)^2 + k_3^2 \tag{82}
$$
$$
k_2' = 2(k_2 - k_1)k_5 + 2k_3k_4 \tag{83}
$$
$$
k_3' = 2(k_2 - k_1)k_6 + k_5^2 + k_4^2 - k_3^2 \tag{84}
$$
$$
k_4' = 2k_5k_6 - 2k_3k_4 \tag{85}
$$
$$
k_5' = k_6^2 - k_4^2 \tag{86}
$$
Four solutions can be obtained using Equation (81). The optimal solution is a real number, and the most suitable solution can be selected by the condition of Equation (72).
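A sketch of this root-selection step is given below (illustrative only, not the authors' implementation; numpy's polynomial solver is used, and the residual of Equation (72) is supplied as a callback):

```python
import numpy as np

def theta_t_candidates(k1p, k2p, k3p, k4p, k5p, theta_tmax):
    """Real roots of Eq. (81) in sin(theta_t), mapped back to in-range angles."""
    roots = np.roots([k1p, k2p, k3p, k4p, k5p])
    cands = []
    for s in roots:
        if abs(s.imag) < 1e-9 and abs(s.real) <= 1.0:
            for th in (np.arcsin(s.real), np.pi - np.arcsin(s.real)):
                th = np.arctan2(np.sin(th), np.cos(th))   # wrap to (-pi, pi]
                if abs(th) <= theta_tmax:
                    cands.append(float(th))
    return cands

def pick_theta_t(cands, esv):
    """Select the candidate minimizing E_sv = |dv_l + dv_r| (Eq. (72)); `esv` is a callback."""
    return min(cands, key=esv)
```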
After θt is obtained, θp can be solved based on the obtained θt.
According to Equations (60) and (63), with θ̄t denoting the solution obtained above, we have
$$
\begin{bmatrix} {}^{l}P_c \\ 1 \end{bmatrix} = ({}^{l}T_m)^{-1} \begin{bmatrix} \mathrm{Rot}(X,\bar{\theta}_t) & \mathbf{0} \\ \mathbf{0}^{T} & 1 \end{bmatrix} \begin{bmatrix} \mathrm{Rot}(Y,\theta_p) & \mathbf{0} \\ \mathbf{0}^{T} & 1 \end{bmatrix} {}^{l}T \begin{bmatrix} P_e' \\ 1 \end{bmatrix} \tag{87}
$$
The following result is also available:
$$
\begin{bmatrix} {}^{r}P_c \\ 1 \end{bmatrix} = ({}^{r}T_m)^{-1} \begin{bmatrix} \mathrm{Rot}(X,\bar{\theta}_t) & \mathbf{0} \\ \mathbf{0}^{T} & 1 \end{bmatrix} \begin{bmatrix} \mathrm{Rot}(Y,\theta_p) & \mathbf{0} \\ \mathbf{0}^{T} & 1 \end{bmatrix} {}^{r}T \begin{bmatrix} P_e' \\ 1 \end{bmatrix} \tag{88}
$$
Since θ̄t is known, for convenience of calculation, we set
$$
{}^{l}\bar{T}_m = ({}^{l}T_m)^{-1} \begin{bmatrix} \mathrm{Rot}(X,\bar{\theta}_t) & \mathbf{0} \\ \mathbf{0}^{T} & 1 \end{bmatrix} = \begin{bmatrix} {}^{l}\bar{n}_x & {}^{l}\bar{o}_x & {}^{l}\bar{a}_x & {}^{l}\bar{p}_x \\ {}^{l}\bar{n}_y & {}^{l}\bar{o}_y & {}^{l}\bar{a}_y & {}^{l}\bar{p}_y \\ {}^{l}\bar{n}_z & {}^{l}\bar{o}_z & {}^{l}\bar{a}_z & {}^{l}\bar{p}_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{89}
$$
$$
{}^{r}\bar{T}_m = ({}^{r}T_m)^{-1} \begin{bmatrix} \mathrm{Rot}(X,\bar{\theta}_t) & \mathbf{0} \\ \mathbf{0}^{T} & 1 \end{bmatrix} = \begin{bmatrix} {}^{r}\bar{n}_x & {}^{r}\bar{o}_x & {}^{r}\bar{a}_x & {}^{r}\bar{p}_x \\ {}^{r}\bar{n}_y & {}^{r}\bar{o}_y & {}^{r}\bar{a}_y & {}^{r}\bar{p}_y \\ {}^{r}\bar{n}_z & {}^{r}\bar{o}_z & {}^{r}\bar{a}_z & {}^{r}\bar{p}_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{90}
$$
The following results are obtained:
$$
\begin{aligned} \Delta u_l &= {}^{l}f_x\,\frac{({}^{l}\bar{n}_x x_e' + {}^{l}\bar{a}_x z_e')\cos\theta_p + ({}^{l}\bar{a}_x x_e' - {}^{l}\bar{n}_x z_e')\sin\theta_p + ({}^{l}\bar{p}_x + {}^{l}\bar{o}_x y_e')}{({}^{l}\bar{n}_z x_e' + {}^{l}\bar{a}_z z_e')\cos\theta_p + ({}^{l}\bar{a}_z x_e' - {}^{l}\bar{n}_z z_e')\sin\theta_p + ({}^{l}\bar{p}_z + {}^{l}\bar{o}_z y_e')} \\ \Delta v_l &= {}^{l}f_y\,\frac{({}^{l}\bar{n}_y x_e' + {}^{l}\bar{a}_y z_e')\cos\theta_p + ({}^{l}\bar{a}_y x_e' - {}^{l}\bar{n}_y z_e')\sin\theta_p + ({}^{l}\bar{p}_y + {}^{l}\bar{o}_y y_e')}{({}^{l}\bar{n}_z x_e' + {}^{l}\bar{a}_z z_e')\cos\theta_p + ({}^{l}\bar{a}_z x_e' - {}^{l}\bar{n}_z z_e')\sin\theta_p + ({}^{l}\bar{p}_z + {}^{l}\bar{o}_z y_e')} \end{aligned} \tag{91}
$$
$$
\begin{aligned} \Delta u_r &= {}^{r}f_x\,\frac{({}^{r}\bar{n}_x\,{}^{r}x_e' + {}^{r}\bar{a}_x\,{}^{r}z_e')\cos\theta_p + ({}^{r}\bar{a}_x\,{}^{r}x_e' - {}^{r}\bar{n}_x\,{}^{r}z_e')\sin\theta_p + ({}^{r}\bar{p}_x + {}^{r}\bar{o}_x\,{}^{r}y_e')}{({}^{r}\bar{n}_z\,{}^{r}x_e' + {}^{r}\bar{a}_z\,{}^{r}z_e')\cos\theta_p + ({}^{r}\bar{a}_z\,{}^{r}x_e' - {}^{r}\bar{n}_z\,{}^{r}z_e')\sin\theta_p + ({}^{r}\bar{p}_z + {}^{r}\bar{o}_z\,{}^{r}y_e')} \\ \Delta v_r &= {}^{r}f_y\,\frac{({}^{r}\bar{n}_y\,{}^{r}x_e' + {}^{r}\bar{a}_y\,{}^{r}z_e')\cos\theta_p + ({}^{r}\bar{a}_y\,{}^{r}x_e' - {}^{r}\bar{n}_y\,{}^{r}z_e')\sin\theta_p + ({}^{r}\bar{p}_y + {}^{r}\bar{o}_y\,{}^{r}y_e')}{({}^{r}\bar{n}_z\,{}^{r}x_e' + {}^{r}\bar{a}_z\,{}^{r}z_e')\cos\theta_p + ({}^{r}\bar{a}_z\,{}^{r}x_e' - {}^{r}\bar{n}_z\,{}^{r}z_e')\sin\theta_p + ({}^{r}\bar{p}_z + {}^{r}\bar{o}_z\,{}^{r}y_e')} \end{aligned} \tag{92}
$$
Assume that
$$
E_{su} = \left|\Delta u_l + \Delta u_r\right| \tag{93}
$$
The solution to θp that keeps the target at the center of the two cameras needs to satisfy the following conditions:
$$
\begin{cases} \Delta u_l + \Delta u_r = 0 \\ -\theta_{p\max} \le \theta_p \le \theta_{p\max} \\ \theta_p = \arg\min(E_{su}) \end{cases} \tag{94}
$$
Substituting the first equation of Equations (91) and (92) into Equation (94) and solving the resulting equation, we obtain
$$
k_1'\cos^2\theta_p + k_2'\sin^2\theta_p + k_3'\sin\theta_p\cos\theta_p + k_4'\cos\theta_p + k_5'\sin\theta_p + k_6' = 0 \tag{95}
$$
where
$$
k_1' = {}^{l}f_x({}^{l}\bar{n}_x x_e' + {}^{l}\bar{a}_x z_e')({}^{r}\bar{n}_z\,{}^{r}x_e' + {}^{r}\bar{a}_z\,{}^{r}z_e') + {}^{r}f_x({}^{r}\bar{n}_x\,{}^{r}x_e' + {}^{r}\bar{a}_x\,{}^{r}z_e')({}^{l}\bar{n}_z x_e' + {}^{l}\bar{a}_z z_e') \tag{96}
$$
$$
k_2' = {}^{l}f_x({}^{l}\bar{n}_x z_e' - {}^{l}\bar{a}_x x_e')({}^{r}\bar{n}_z\,{}^{r}z_e' - {}^{r}\bar{a}_z\,{}^{r}x_e') + {}^{r}f_x({}^{r}\bar{n}_x\,{}^{r}z_e' - {}^{r}\bar{a}_x\,{}^{r}x_e')({}^{l}\bar{n}_z z_e' - {}^{l}\bar{a}_z x_e') \tag{97}
$$
$$
k_3' = {}^{l}f_x({}^{l}\bar{n}_x x_e' + {}^{l}\bar{a}_x z_e')({}^{r}\bar{n}_z\,{}^{r}z_e' - {}^{r}\bar{a}_z\,{}^{r}x_e') + {}^{l}f_x({}^{l}\bar{n}_x z_e' - {}^{l}\bar{a}_x x_e')({}^{r}\bar{n}_z\,{}^{r}x_e' + {}^{r}\bar{a}_z\,{}^{r}z_e') + {}^{r}f_x({}^{r}\bar{n}_x\,{}^{r}z_e' - {}^{r}\bar{a}_x\,{}^{r}x_e')({}^{l}\bar{n}_z x_e' + {}^{l}\bar{a}_z z_e') + {}^{r}f_x({}^{r}\bar{n}_x\,{}^{r}x_e' + {}^{r}\bar{a}_x\,{}^{r}z_e')({}^{l}\bar{n}_z z_e' - {}^{l}\bar{a}_z x_e') \tag{98}
$$
$$
k_4' = {}^{l}f_x({}^{l}\bar{n}_x x_e' + {}^{l}\bar{a}_x z_e')({}^{r}\bar{o}_z\,{}^{r}y_e' + {}^{r}\bar{p}_z) + {}^{l}f_x({}^{l}\bar{o}_x y_e' + {}^{l}\bar{p}_x)({}^{r}\bar{n}_z\,{}^{r}x_e' + {}^{r}\bar{a}_z\,{}^{r}z_e') + {}^{r}f_x({}^{r}\bar{o}_x\,{}^{r}y_e' + {}^{r}\bar{p}_x)({}^{l}\bar{n}_z x_e' + {}^{l}\bar{a}_z z_e') + {}^{r}f_x({}^{r}\bar{n}_x\,{}^{r}x_e' + {}^{r}\bar{a}_x\,{}^{r}z_e')({}^{l}\bar{o}_z y_e' + {}^{l}\bar{p}_z) \tag{99}
$$
$$
k_5' = {}^{l}f_x({}^{l}\bar{n}_x z_e' - {}^{l}\bar{a}_x x_e')({}^{r}\bar{o}_z\,{}^{r}y_e' + {}^{r}\bar{p}_z) + {}^{l}f_x({}^{l}\bar{o}_x y_e' + {}^{l}\bar{p}_x)({}^{r}\bar{n}_z\,{}^{r}z_e' - {}^{r}\bar{a}_z\,{}^{r}x_e') + {}^{r}f_x({}^{r}\bar{o}_x\,{}^{r}y_e' + {}^{r}\bar{p}_x)({}^{l}\bar{n}_z z_e' - {}^{l}\bar{a}_z x_e') + {}^{r}f_x({}^{r}\bar{n}_x\,{}^{r}z_e' - {}^{r}\bar{a}_x\,{}^{r}x_e')({}^{l}\bar{o}_z y_e' + {}^{l}\bar{p}_z) \tag{100}
$$
$$
k_6' = {}^{l}f_x({}^{l}\bar{o}_x y_e' + {}^{l}\bar{p}_x)({}^{r}\bar{o}_z\,{}^{r}y_e' + {}^{r}\bar{p}_z) + {}^{r}f_x({}^{r}\bar{o}_x\,{}^{r}y_e' + {}^{r}\bar{p}_x)({}^{l}\bar{o}_z y_e' + {}^{l}\bar{p}_z) \tag{101}
$$
Replacing cos θp in Equation (95) with sin θp via Equation (80), we obtain
$$
k_1''\sin^4\theta_p + k_2''\sin^3\theta_p + k_3''\sin^2\theta_p + k_4''\sin\theta_p + k_5'' = 0 \tag{102}
$$
where
$$
k_1'' = (k_2' - k_1')^2 + (k_3')^2 \tag{103}
$$
$$
k_2'' = 2(k_2' - k_1')k_5' + 2k_3'k_4' \tag{104}
$$
$$
k_3'' = 2(k_2' - k_1')k_6' + (k_5')^2 + (k_4')^2 - (k_3')^2 \tag{105}
$$
$$
k_4'' = 2k_5'k_6' - 2k_3'k_4' \tag{106}
$$
$$
k_5'' = (k_6')^2 - (k_4')^2 \tag{107}
$$
Four solutions can be obtained using Equation (102). The optimal solution must be a real number, and the most suitable solution can be selected using the condition of Equation (94). If none of the four solutions satisfies Equation (94), the position of the target is beyond the range that the bionic eye can reach; in this case, compensation through the head or torso is required. The values θ̄t and θ̄p obtained at this stage are suboptimal solutions close to the optimal solution, while θt and θp denote the optimal solutions.
Through the above steps, the desired observation pose can be calculated. The calculation steps of gfq can be summarized by the flow chart shown in Figure 8.

3.3. Desired Pose Calculation for Approaching Gaze Point Tracking

The mobile robot approaches the target in two steps: the first step is that the robot and the head rotate in the horizontal direction until the robot and the head are facing the target, and the second step is that the robot moves straight toward the target. The desired position of the approaching motion should satisfy the following conditions: (1) the target should be on the Z axis of the robot and the head coordinate system, (2) the distance between the target and the robot should be less than the set threshold DT and (3) the eye should be in the optimal observation position. afq can be defined as
$$
{}^{a}f_q = \begin{cases}
{}^{w}x_{mq} = 0 \\
{}^{w}z_{mq} = \{z \mid 0 < {}^{w}z_m - z \le D_T\} \\
{}^{w}\theta_{pq} = \{\theta \mid |\theta| \le 2\pi,\ {}^{w}\gamma_p = 0\} \\
{}^{h}\theta_{pq} = 0 \\
{}^{h}\theta_{tq} = \{\theta \mid |\theta| \le {}^{h}\theta_{t\max},\ |{}^{h}\gamma_t| \le {}^{h}\gamma_{t\max}\} \\
{}^{l}\theta_{pq} = {}^{r}\theta_{pq} = \{\theta \mid |\theta| \le {}^{e}\theta_{p\max},\ {}^{l}\Delta m = -{}^{r}\Delta m\} \\
{}^{l}\theta_{tq} = {}^{r}\theta_{tq} = \{\theta \mid |\theta| \le {}^{e}\theta_{t\max},\ {}^{l}\Delta m = -{}^{r}\Delta m\}
\end{cases} \tag{108}
$$
The desired rotation angle wθpq of the mobile robot is the same as the angle wγp between the robot and the target and can be obtained by
$$
{}^{w}\theta_{pq} = {}^{w}\gamma_p = \arctan\!\left(\frac{{}^{w}x_m}{{}^{w}z_m}\right) \tag{109}
$$
hθtq can be obtained using the method described in Section 3.2. The optimal observation pose described in Section 3.2.4 can be used to obtain lθtq, lθpq, rθtq and rθpq.
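A compact sketch of the approaching-target computation is given below (assumed geometry: the target's planar robot-frame coordinates are (wxm, wzm) and DT is the stop distance; the function name is hypothetical):

```python
import numpy as np

def approach_pose(w_x_m, w_z_m, D_T):
    """Turn toward the target (Eq. (109)), then plan the straight travel so that
    at most D_T remains between the robot and the target (condition in Eq. (108))."""
    w_theta_pq = np.arctan2(w_x_m, w_z_m)        # rotation that puts the target on Zw
    dist = float(np.hypot(w_x_m, w_z_m))
    w_z_mq = max(0.0, dist - D_T)                # forward travel; stop within D_T
    return w_theta_pq, w_z_mq
```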

4. Robot Pose Control

After obtaining the desired pose of the robot system, the control block diagram shown in Figure 9 is used to control the robot to move to the desired pose.
The desired pose is converted to the desired positions of the motors. Δθlt, Δθlp, Δθrt, Δθrp, Δθht and Δθhp are the deviations of the desired angle from the current angle of motor Mlu, motor Mld, motor Mru, motor Mrd, motor Mhu and motor Mhd, respectively. lθm and rθm are the angles through which each wheel of the mobile robot needs to be rotated. During the in situ gaze point tracking process, the mobile robot performs only in situ rotation, and the angle of the robot movement can be calculated according to the desired angle of the robot. When the robot rotates, the two wheels move in opposite directions at the same speed. Let the distance between the two wheels of the mobile robot be Dr; when the robot rotates through an angle wθpq, the distance that each wheel needs to move is
$$
S = \frac{{}^{w}\theta_{pq}\,D_r}{2} \tag{110}
$$
The diameter of each wheel is dw, and the angle of rotation of each wheel is (where counterclockwise is positive)
$$
{}^{r}\theta_m = -\,{}^{l}\theta_m = \frac{2S}{d_w} \tag{111}
$$
In the process of approaching the target, the moving robot follows a straight line, and the angle of rotation of each wheel is
$$
{}^{r}\theta_m = {}^{l}\theta_m = \frac{2\,{}^{w}z_{mq}}{d_w} \tag{112}
$$
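A sketch of the wheel-angle conversion of Equations (110)–(112) (angles in radians; Dr is the wheel separation and dw the wheel diameter; function names are assumptions):

```python
def wheel_angles_for_turn(w_theta_pq, D_r, d_w):
    """Eqs. (110)-(111): in situ rotation, wheels turn equal amounts in opposite directions."""
    S = w_theta_pq * D_r / 2.0       # arc length each wheel must travel
    r_theta_m = 2.0 * S / d_w        # wheel rotation angle
    return -r_theta_m, r_theta_m     # (left, right)

def wheel_angles_for_straight(w_z_mq, d_w):
    """Eq. (112): straight travel, both wheels turn the same amount in the same direction."""
    theta = 2.0 * w_z_mq / d_w
    return theta, theta
```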
The movement of the moving robot is achieved by controlling the rotation of each wheel. Each wheel is equipped with a DC brushless motor, and a DSP2000 controller is used to control the movement of the DC brushless motor. Position servo control is implemented in the DSP2000 controller.
In the robot system, the weight of the camera and lens is approximately 80 g, the weight of the camera and the fixed mechanical parts is approximately 50 g and the motor that controls the vertical rotation of the camera (rotating around the horizontal axis of rotation) and the corresponding encoder weighs approximately 250 g. The mechanical parts of the fixed vertical rotating motor and encoder weigh approximately 100 g. The radius of the rotation of the camera in the vertical direction is approximately 1 cm, and the rotation in the horizontal direction (rotation about the vertical axis of rotation) has a radius of approximately 2 cm. Therefore, when the gravitational acceleration is 9.8 m/s2, the torque required for the vertical rotating electric machine is approximately 0.013 N·m, and the torque required for the horizontal rotating electric machine is approximately 0.043 N·m. The vertical rotating motor uses a 28BYG5401 stepping motor with a holding torque of 0.1 N·m and a positioning torque of 0.008 N·m. The driver is HSM20403A. The horizontal rotating motor is a 57BYGH301 stepping motor with a holding torque of 1.5 N·m, a positioning torque of 0.07 N·m and drive model HSM20504A. The four stepping motors of the eye have a step angle of 1.8° and are all subdivided by 25, so the actual step angle of each motor is 0.072°, and the minimum pulse width that the driver can receive is 2.5 µs. The stepper motor has a maximum angular velocity of 200°/s.
The head vertical rotation motor is a 57BYGH401 stepper motor with a holding torque of 2.2 N·m, a positioning torque of 0.098 N·m and an HSM20504A driver. The head horizontal rotation motor is an 86BYG350B three-phase AC stepping motor with a holding torque of 5 N·m, a positioning torque of 0.3 N·m and an HSM30860M driver. The step angle of the head motors after microstepping is also 0.072°. The head vertical motor carries a load of approximately 5 kg with a radius of rotation of less than 1 cm, and the head horizontal rotation motor carries a load of approximately 9.5 kg with a radius of rotation of approximately 5 cm. In the experiments, we found that the maximum pulse frequency that the head horizontal rotation motor can receive is 0.6 kpps, which corresponds to a maximum angular velocity of 43.2°/s.

5. Experiments and Discussion

Using the robot platform introduced in Section 2, experiments on in situ gaze point tracking and approaching gaze point tracking were performed.
Each camera has a resolution of 400 × 300 pixels. The range of rotation of each eye is [−45°, 45°], and the range of rotation of the head is [−30°, 30°]. dx and dy are 150 mm and 200 mm, respectively. The intrinsic and extrinsic parameters, distortion parameters, initial position parameters and the hand–eye parameters of the left and right cameras are calibrated as follows:
{}^{l}M_{in} = \begin{bmatrix} 341.58 & 0 & 201.6 \\ 0 & 341.97 & 147.62 \\ 0 & 0 & 1 \end{bmatrix}
K_l = \begin{bmatrix} 0.1905 & 0.2171 & 0.0018 & 0.0005 & 0.0823 \end{bmatrix}
{}^{l}T_m = \begin{bmatrix} 1.0 & 0.0078 & 0.0022 & 58.4172 \\ 0.0001 & 0.9954 & 0.0959 & 3.6042 \\ 0.0013 & 0.0959 & 0.9954 & 51.9366 \\ 0 & 0 & 0 & 1 \end{bmatrix}
{}^{r}M_{in} = \begin{bmatrix} 335.13 & 0 & 184.32 \\ 0 & 335.5 & 141.26 \\ 0 & 0 & 1 \end{bmatrix}
K_r = \begin{bmatrix} 0.1861 & 0.1987 & 0.004 & 0.0011 & 0.0739 \end{bmatrix}
{}^{r}T_m = \begin{bmatrix} 0.9999 & 0.0086 & 0.0125 & 45.0147 \\ 0.0190 & 0.9969 & 0.0782 & 24.5528 \\ 0.0097 & 0.0784 & 0.9970 & 42.9270 \\ 0 & 0 & 0 & 1 \end{bmatrix}
{}^{l}T_r = \begin{bmatrix} 0.9998 & 0.0099 & 0.0193 & 189.5922 \\ 0.0095 & 0.9997 & 0.0215 & 0.0426 \\ 0.0195 & 0.0213 & 0.9996 & 8.9671 \\ 0 & 0 & 0 & 1 \end{bmatrix}
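For illustration, the sketch below projects a 3D point expressed in the left camera frame to pixel coordinates using the calibrated intrinsic matrix lMin and the five distortion coefficients Kl, assuming the common (k1, k2, p1, p2, k3) radial–tangential ordering; note that the signs of the printed coefficients may have been lost in typesetting, and the test point is ours.

```python
import numpy as np

def project(point_cam, M_in, K):
    """Pinhole projection with radial-tangential distortion,
    assuming K = (k1, k2, p1, p2, k3)."""
    x, y = point_cam[0] / point_cam[2], point_cam[1] / point_cam[2]
    k1, k2, p1, p2, k3 = K
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    u = M_in[0, 0] * x_d + M_in[0, 2]
    v = M_in[1, 1] * y_d + M_in[1, 2]
    return u, v

l_M_in = np.array([[341.58, 0, 201.6], [0, 341.97, 147.62], [0, 0, 1]])
K_l = (0.1905, 0.2171, 0.0018, 0.0005, 0.0823)
print(project(np.array([0.05, 0.02, 1.0]), l_M_in, K_l))
```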
The experimental in situ gaze point tracking scene is shown in Figure 10, with a checkerboard used as the target. In the in situ gaze point tracking experiment, the target is held by a person; in the approaching gaze point tracking experiment, the target is fixed in front of the robot.

5.1. In Situ Gaze Point Tracking Experiment

In the in situ gaze experiment, the target moves at a low speed within a certain range, and the robot combines the movements of the eyes, the head and the mobile robot so that binocular vision can always perceive the 3D coordinates of the target from the optimal observation pose. This experiment requires the robot to find the target and gaze at it. In the gaze point tracking process, binocular stereo vision is used to calculate the 3D coordinates of the target in the eye coordinate system in real time. Through the positional relationship between the eyes and the head, the coordinates of the target in the eye coordinate system can be converted to the head coordinate system; similarly, the 3D coordinates of the target in the robot coordinate system can be obtained. From these 3D coordinates, the desired poses of the eyes, head and mobile robot are calculated according to the method proposed in this paper. Then, the cameras are driven to the desired positions by the stepping motors; after the desired positions are reached, the images and the motor position information are collected again, and the 3D coordinates of the target are recalculated.
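The coordinate conversion chain described above (eye frame → head frame → robot frame) can be sketched with homogeneous transforms as follows; the transform names are ours, and in the real system they are built from the calibration results and the current head and eye motor angles.

```python
import numpy as np

def homogeneous(R, t):
    """Assemble a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

def eye_to_robot(p_eye, T_head_eye, T_robot_head):
    """Convert the target's 3D coordinates from the eye frame to the robot frame
    by chaining the eye-to-head and head-to-robot transforms."""
    p = np.append(p_eye, 1.0)                 # homogeneous coordinates
    return (T_robot_head @ T_head_eye @ p)[:3]
```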
In the experiment, the maximum allowed angles between the head and the target, hγpmax and hγtmax, are each set to 30°. The method described in Section 3 is used to calculate the desired pose of each joint of the robot based on the 3D coordinates of the target. During the experiment, the actual and desired coordinates of the target in the binocular image space, the actual and desired positions of the eye and head motors, the angles between the head and the target and between the robot and the target, and the coordinates of the target in the robot coordinate system are stored. Figure 11a,b show the u and v coordinates of the target on the left image, respectively, and Figure 11c,d show the u and v coordinates of the target on the right image, respectively. The desired image coordinates are recalculated based on the optimal observation pose. Figure 11e–h show the positions of the tilt motor (Mlu) of the left eye, the pan motor (Mld) of the left eye, the tilt motor (Mru) of the right eye and the pan motor (Mrd) of the right eye, respectively. Figure 11i shows the position of the pan motor (Mhd) of the head. Since the target moves with small amplitude in the vertical direction, the motor Mhu does not rotate; its motion principle is similar to that of the motor Mhd, so only the result of the motor Mhd is provided for the head. Figure 11j shows the angle deviations and rotations: T-h is the angle between the head and the target, T-r is the angle between the robot and the target, R-r is the angle of the robot's rotation from the origin location and T-o is the angle of the target relative to the origin location. Figure 11k shows the coordinates (wx, wz) of the target in the world coordinate system. Figure 11l shows the coordinates (ox, oz) of the target in the world coordinate system of the origin location.
As shown in Figure 11, the image coordinates of the target stay substantially within ±40 pixels of the central region of the left and right images in the x direction and within ±10 pixels in the y direction. Throughout the experiment, the target was moved around the robot by approximately 200°; the robot rotated approximately 140°, the head rotated 30° and the target was kept in the central region of the binocular images. The motor position curves show that the operating position of each motor tracks its desired position very well. The angle curves show that the changes in the angle between the target and the head, the angle between the target and the robot and the robot's turning angle are consistent with one another. The coordinates of the target in the robot coordinate system and in the world coordinate system of the initial position, shown in Figure 11, closely follow the actual change in the target's position.
Through the above analysis, we can conclude the following: (1) it is feasible to realize gaze point tracking of a robot based on 3D coordinates; (2) using the combined movement of the head, eyes and mobile robot described in this paper, gaze point tracking of the target can be achieved while ensuring minimum resource consumption.

5.2. Approaching Gaze Point Tracking Experiment

The approaching gaze point tracking experimental scene is shown in Figure 12.
The robot approaches the target without obstacles and reaches an area in which it can operate on the target, where the target can be grasped or observed closely. In the approaching gaze experiment, the target is fixed at a position 2.2 m from the robot; when the robot reaches a position where the distance to the target is 0.6 m, the motion is stopped. The maximum speed of the mobile robot is 1 m/s. The approaching movement is realized in two steps: first, the head, the eyes and the mobile robot chassis are rotated so that the head and the mobile robot face the target and the head observes the target in the optimal observation pose; second, the robot moves in a straight line toward the target. During the movement, the angles of the head and the eyes are fine-tuned, and the 3D coordinates of the target are detected in real time until the z coordinate of the target in the robot coordinate system falls below the set threshold, at which point the motion stops.
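A minimal sketch of this two-step procedure is given below. The four callables (target measurement and the three actuation commands) are hypothetical placeholders for the platform's own routines; the 0.1 m per-cycle advance is likewise our illustrative choice.

```python
import math

STOP_DISTANCE = 0.6  # m, threshold on the target's z coordinate in the robot frame

def approach_target(measure_target, turn_in_place, drive_forward, fine_tune_gaze):
    """Two-step approach: (1) turn to face the target, (2) drive straight toward
    it while fine-tuning the gaze, stopping when the measured depth is small."""
    # Step 1: rotate the robot (and head) toward the target.
    x, _, z = measure_target()
    turn_in_place(math.atan2(x, z))

    # Step 2: move in a straight line, re-measuring the 3D coordinates each cycle.
    while True:
        x, _, z = measure_target()
        if z < STOP_DISTANCE:
            break
        fine_tune_gaze(x, z)
        drive_forward(min(z - STOP_DISTANCE, 0.1))
```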
Figure 13 shows the results of the approaching gaze point tracking experiment. Figure 13a,b show the u and v coordinates of the target on the left image, respectively, and Figure 13c,d show the u and v coordinates of the target on the right image, respectively. The desired image coordinates are recalculated based on the optimal observation position. Figure 13e–h show the positions of the tilt motor (Mlu) of the left eye, the pan motor (Mld) of the left eye, the tilt motor (Mru) of the right eye and the pan motor (Mrd) of the right eye, respectively. Figure 13i shows the positions of the pan motor (Mhd) of the head. Figure 13j shows the angle deviation and rotation. T-h is the angle between the head and the target, T-r is the angle between the robot and the target, R-r is the angle of the robot’s rotation from the origin location and T-o is the angle of the target to the origin location. Figure 13k shows the coordinates (wx, wz) of the target in the world coordinate system. Figure 13l shows the robot’s forward distance and the distance between the target and the robot.
The change in the image coordinate curves indicates that the coordinates of the target in the left and right images move from their initial positions to the central region of the images and remain stable there during the approach. In the first step, while the robot turns toward the target, the target coordinates in the images fluctuate because the head motor rotates through a large angle and vibrates somewhat during the rotation; this could be avoided with a more stable platform. The motor position curves in Figure 13 show that the motors track the desired poses well; because no prediction of the 3D coordinates is used during tracking, the tracking lags the target by one control cycle. The angle curves in Figure 13 show that the robot system completes the turn toward the target within the first few control cycles and then moves toward the target at a stable angle. Figure 13k shows the change in the coordinates of the target in the robot coordinate system; when the robot rotates, the measured x coordinate fluctuates, mainly because of measurement errors caused by the shaking of the system. The results in Figure 13l show that the robot moves toward the target very consistently. During the approach, the target is kept within ±50 pixels of the desired position in the horizontal direction of the image and within ±20 pixels in the vertical direction. The eye motors achieve fast tracking of the target within 1.5 s. The angle between the target and the head is reduced from 20° to 0°, and the angle between the target and the robot is reduced from 35° to 0°; the robot turns through approximately 34°, matching the target's 34° offset from the initial orientation.
Through the above analysis, it can be found that by using the combination of the head, the eye and the trunk in the present method, the approach toward the target can be achieved while ensuring that the robot is gazing at the target.

6. Conclusions

This study achieved gaze point tracking based on the 3D coordinates of the target. First, a robot experiment platform was designed. Based on the bionic eye experiment platform, a head with two degrees of freedom was added, using the mobile robot as a carrier.
Based on the characteristics of the robot platform, this paper proposed a gaze point tracking method. To achieve in situ gaze point tracking, the combination of the eyes, head and trunk is designed based on the principles of minimum resource consumption and maximum system stability. Eye rotation consumes the least resources and has minimal impact on the stability of the overall system during motion. Head rotation consumes more resources than eye rotation but fewer than trunk rotation; at the same time, the rotation of the head affects the stability of the eyes but only minimally affects the stability of the entire robot system. The rotation of the trunk generally consumes the most resources and tends to affect the stability of both the head and the eyes. Therefore, when the eyes can observe the target in the optimal observation pose, only the eyes are rotated; otherwise, the head is rotated, and when the angle through which the head would need to move exceeds its threshold, the mobile robot rotates. When approaching gaze point tracking is performed, the robot and head first turn to face the target and then move straight toward the vicinity of the target. Based on the proposed gaze point tracking method, this paper provides an expected pose calculation method for the horizontal and vertical rotation angles.
Based on the experimental robot platform, a series of experiments was performed, and the effectiveness of the gaze point tracking method was verified. In future work, we will apply the method to a practical hospital medicine-delivery task and carry out more detailed comparative experiments and discussions with respect to other similar studies.

Author Contributions

Conceptualization, Q.W. and M.Q.; Methodology, X.F.; Software, X.F.; Validation, X.F.; Formal analysis, H.C.; Investigation, M.Q.; Data curation, Q.W. and Y.Z.; Writing—original draft, X.F. and Q.W.; Writing—review & editing, M.Q.; Project administration, H.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data was created.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Measurement Error Analysis of Binocular Stereo Vision

It is assumed that the angle at which the left motion module rotates in the horizontal direction with respect to the initial position is lθp and that the angle at which the right motion module rotates in the horizontal direction with respect to the initial position is rθp. When using the bionic eye platform for 3D coordinate calculation, we find that the measured 3D coordinates are more accurate when lθp = rθp (Figure A1a,b) than when lθp > 0 and rθp < 0 (Figure A1c). This appendix analyzes the measurement error of the 3D coordinates, explains the reason for this phenomenon and proposes a method to improve the accuracy of 3D coordinate measurement.
Figure A1. Binocular motion mode, (a) initial position lθp = rθp, (b) lθp= rθp, (c) lθp > 0 and rθp < 0.
For the convenience of calculation, according to the characteristics of the bionic eye platform, the binocular stereo vision measurement model shown in Figure A2 is used to analyze the error of the three-dimensional coordinate measurement.
Figure A2. The principle of the vision system of bionic eyes.

Appendix A.1. Vision System of Bionic Eyes

Two cameras are used to imitate human eyes, and the principle of the vision system of the bionic eyes is shown in Figure A2. We suppose that the optical axes of the two cameras are coplanar. As shown in Figure A2, OwXwZw is the world coordinate system; the Xw axis is along the baseline of the two cameras, and the Zw axis lies in the plane formed by the two cameras' optical axes. lOclXclZc is the coordinate system of the left camera, where lZc is along the optical axis of the left camera and the lXc axis lies in the same plane. rOcrXcrZc is the coordinate system of the right camera, where rZc is along the optical axis of the right camera and the rXc axis lies in the same plane. The two cameras move cooperatively to imitate the movement of human eyes.

Appendix A.2. Depth Measurement Model

In Figure A2, the position vector of point lOc in OwXwZw is wOl = [−di/2, 0]T and the position vector of point rOc in OwXwZw is wOr = [di/2, 0]T. di is the length of the baseline. Let lP = [xl, zl]T and rP = [xr, zr]T be the position vectors of the object point P in the left and right camera coordinate systems, respectively. Let wP = [xw, zw]T be the position vector of point P in OwXwZw; then, lP can be obtained as follows:
{}^{l}P = {}^{l}R_w\left({}^{w}P - {}^{w}O_l\right) \qquad (A1)
where lRw is the rotation transformation matrix from the world coordinate system to the left camera coordinate system. lRw can be expressed as
{}^{l}R_w = \begin{bmatrix} \cos\theta_l & \sin\theta_l \\ -\sin\theta_l & \cos\theta_l \end{bmatrix} \qquad (A2)
where θ l is the rotation angle which is defined as the angle between camera optical axis and the Zw axis.
Based on the same principle, we can obtain
{}^{r}P = {}^{r}R_w\left({}^{w}P - {}^{w}O_r\right) \qquad (A3)
where
{}^{r}R_w = \begin{bmatrix} \cos\theta_r & \sin\theta_r \\ -\sin\theta_r & \cos\theta_r \end{bmatrix} \qquad (A4)
As shown in Figure A2, Pl1 is the intersection point of the line lOcP and the normalized image plane of the left camera, and Pr1 is the intersection point of the line rOcP and the normalized image plane of the right camera. From the geometric relationship shown in Figure A2, the position vector of point Pl1 in lOclXclZc can be expressed as lPl1 = [xl/zl, 1]T, and the position vector of point Pr1 in rOcrXcrZc can be expressed as rPr1 = [xr/zr, 1]T. Let wPl1 = [wxl1, wzl1]T be the position vector of point Pl1 in OwXwZw and wPr1 = [wxr1, wzr1]T be the position vector of point Pr1 in OwXwZw. wPl1 and wPr1 can be calculated by coordinate transformation according to (A1) and (A3), as given in (A5) and (A6).
{}^{w}P_{l1} = {}^{l}R_w^{-1}\,{}^{l}P_{l1} + {}^{w}O_l \qquad (A5)
{}^{w}P_{r1} = {}^{r}R_w^{-1}\,{}^{r}P_{r1} + {}^{w}O_r \qquad (A6)
From Equations (A5) and (A6), we can obtain
{}^{w}P_{l1} = \begin{bmatrix} {}^{w}x_{l1} \\ {}^{w}z_{l1} \end{bmatrix} = \begin{bmatrix} \frac{x_l}{z_l}\cos\theta_l - \sin\theta_l - \frac{d_i}{2} \\ \frac{x_l}{z_l}\sin\theta_l + \cos\theta_l \end{bmatrix} \qquad (A7)
{}^{w}P_{r1} = \begin{bmatrix} {}^{w}x_{r1} \\ {}^{w}z_{r1} \end{bmatrix} = \begin{bmatrix} \frac{x_r}{z_r}\cos\theta_r - \sin\theta_r + \frac{d_i}{2} \\ \frac{x_r}{z_r}\sin\theta_r + \cos\theta_r \end{bmatrix} \qquad (A8)
According to wOl = [−di/2, 0]T and Equation (A7), the line lOcP can be expressed as Equation (A9) in the world coordinate system OwXwZw.
z = \frac{{}^{w}z_{l1}}{\frac{d_i}{2} + {}^{w}x_{l1}}\,x + \frac{\frac{d_i}{2}\,{}^{w}z_{l1}}{\frac{d_i}{2} + {}^{w}x_{l1}} \qquad (A9)
Based on wOr = [di/2, 0]T and Equation (A8), the line rOcP in the world coordinate system OwXwZw can be expressed as follows:
z = -\frac{{}^{w}z_{r1}}{\frac{d_i}{2} - {}^{w}x_{r1}}\,x + \frac{\frac{d_i}{2}\,{}^{w}z_{r1}}{\frac{d_i}{2} - {}^{w}x_{r1}} \qquad (A10)
The intersection point of the lines rOcP and lOcP is the point P, so the depth zw of point P can be obtained by Equations (A9) and (A10).
z_w = \frac{d_i\,{}^{w}z_{l1}\,{}^{w}z_{r1}}{-{}^{w}x_{r1}\,{}^{w}z_{l1} + \frac{d_i}{2}\,{}^{w}z_{l1} + \frac{d_i}{2}\,{}^{w}z_{r1} + {}^{w}x_{l1}\,{}^{w}z_{r1}} \qquad (A11)
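The triangulation in Equations (A1)–(A11) can be condensed into the short numerical sketch below, which back-projects the two normalized image abscissae into the world frame and intersects the resulting rays; variable names are ours, and the slope–intercept intersection is algebraically equivalent to Equation (A11).

```python
import numpy as np

def depth_from_rays(m_l, m_r, theta_l, theta_r, d_i):
    """Depth z_w of a point from normalized image abscissae m_l = x_l/z_l and
    m_r = x_r/z_r, the camera rotation angles and the baseline d_i."""
    def world_ray_point(m, theta, x_cam):
        # Inverse of the 2D rotation used in (A2)/(A4), then shift to the camera center.
        R_inv = np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]])
        return R_inv @ np.array([m, 1.0]) + np.array([x_cam, 0.0])

    xl1, zl1 = world_ray_point(m_l, theta_l, -d_i / 2)
    xr1, zr1 = world_ray_point(m_r, theta_r, d_i / 2)
    a_l = zl1 / (xl1 + d_i / 2)   # slope of the left ray
    a_r = zr1 / (xr1 - d_i / 2)   # slope of the right ray
    return d_i * a_l * a_r / (a_r - a_l)

# A symmetric check: with d_i = 0.2 m and a point 2 m ahead on the Z axis,
# m_l = 0.05 and m_r = -0.05 (parallel cameras), the function returns ~2.0.
print(depth_from_rays(0.05, -0.05, 0.0, 0.0, 0.2))
```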

Appendix A.3. Measurement Error Analysis

In the real situation, there are rotation angle errors of the eyes (usually caused by the time difference between image acquisition and motor angle acquisition when the two cameras move, the stepper motors' backlash and the encoders' resolution) and image errors (usually caused by image distortion, image resolution and image feature extraction errors). Let Δml and Δmr be the image errors of the two cameras, respectively; then lPl1 = [xl/zl, 1]T and rPr1 = [xr/zr, 1]T can be revised as lP′l1 = [xl/zl + Δml, 1]T and rP′r1 = [xr/zr + Δmr, 1]T. Let Δθl and Δθr be the errors of the two cameras' rotation angles, respectively. Therefore, we can rewrite Equations (A7) and (A8) as follows:
{}^{w}P_{l1}' = \begin{bmatrix} {}^{w}x_{l1}' \\ {}^{w}z_{l1}' \end{bmatrix} = \begin{bmatrix} \left(\frac{x_l}{z_l} + \Delta m_l\right)\cos(\theta_l + \Delta\theta_l) - \sin(\theta_l + \Delta\theta_l) - \frac{d_i}{2} \\ \left(\frac{x_l}{z_l} + \Delta m_l\right)\sin(\theta_l + \Delta\theta_l) + \cos(\theta_l + \Delta\theta_l) \end{bmatrix} \qquad (A12)
{}^{w}P_{r1}' = \begin{bmatrix} {}^{w}x_{r1}' \\ {}^{w}z_{r1}' \end{bmatrix} = \begin{bmatrix} \left(\frac{x_r}{z_r} + \Delta m_r\right)\cos(\theta_r + \Delta\theta_r) - \sin(\theta_r + \Delta\theta_r) + \frac{d_i}{2} \\ \left(\frac{x_r}{z_r} + \Delta m_r\right)\sin(\theta_r + \Delta\theta_r) + \cos(\theta_r + \Delta\theta_r) \end{bmatrix} \qquad (A13)
Based on the same principle, the revised depth z′w of point P can be obtained as follows:
z_w' = \frac{d_i\,{}^{w}z_{l1}'\,{}^{w}z_{r1}'}{-{}^{w}x_{r1}'\,{}^{w}z_{l1}' + \frac{d_i}{2}\,{}^{w}z_{l1}' + \frac{d_i}{2}\,{}^{w}z_{r1}' + {}^{w}x_{l1}'\,{}^{w}z_{r1}'} \qquad (A14)
According to the cooperative movement pattern of human eyes, the absolute values of θl and θr are restricted to a limited range and assumed to be equal as follows:
-\theta_l = \theta_r = \theta, \quad \text{s.t.} \quad 0 \le \theta < \frac{\pi}{2} \qquad (A15)
Since Δmr, Δml, Δθl and Δθr are usually close to 0, the simplified expression of the error Δz between the actual value zw and the measured value z′w can be derived from (A7), (A8) and (A11)–(A14).
\Delta z = z_w - z_w' \approx \frac{z_w^2\left(\Delta m_l\cos^2\theta - \Delta m_r\cos^2\theta - \Delta\theta_l + \Delta\theta_r\right)}{d_i + z_w\left(\Delta m_l\cos^2\theta - \Delta m_r\cos^2\theta - \Delta\theta_l + \Delta\theta_r\right)} \qquad (A16)
In addition, the relative error of zw is
\Delta z_r = \frac{\Delta z}{z_w} \approx \frac{z_w\left(\Delta m_l\cos^2\theta - \Delta m_r\cos^2\theta - \Delta\theta_l + \Delta\theta_r\right)}{d_i + z_w\left(\Delta m_l\cos^2\theta - \Delta m_r\cos^2\theta - \Delta\theta_l + \Delta\theta_r\right)} \qquad (A17)
Let
\varepsilon = \Delta m_l\cos^2\theta - \Delta m_r\cos^2\theta - \Delta\theta_l + \Delta\theta_r \qquad (A18)
In practice, ε has a very small value and zwε ≪ di, so Δzr can be simplified as follows:
\Delta z_r \approx \frac{z_w\,\varepsilon}{d_i} \qquad (A19)
From Equation (A19), it can be seen that the relative error of zw is proportional to zwε and inversely proportional to di. Thus, we can adopt the following strategies to reduce the depth error:
(1) Keep di long enough and constant when the bionic eyes move.
(2) Observe the target from as close a distance as possible, since the depth error is smaller when the bionic eyes observe the target at a close distance.
(3) Control the two cameras of the bionic eyes with the same angular velocity during the process of the eyes’ movement. In this way, Δθl and Δθr will be approximately equal to each other, and ɛ can be reduced.
(4) Keep the target on the Zw axis if possible, so that Δml and Δmr are close to each other.
These strategies can be used to design effective motion control methods so that bionic eyes can perceive the target’s 3D information accurately.
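To make these strategies concrete, the following sketch evaluates the approximate relative depth error of Equations (A18) and (A19) for a set of illustrative error values; the chosen numbers (one pixel of image error on one camera, a roughly 190 mm baseline as suggested by the hand–eye calibration above, and a 1 m target distance) are ours.

```python
import math

def relative_depth_error(z_w, d_i, theta, dm_l, dm_r, dth_l, dth_r):
    """Approximate relative depth error: Δz_r ≈ z_w * ε / d_i,
    with ε = Δm_l cos²θ − Δm_r cos²θ − Δθ_l + Δθ_r (Equation (A18))."""
    eps = (dm_l - dm_r) * math.cos(theta) ** 2 - dth_l + dth_r
    return z_w * eps / d_i

# One pixel of image error on the left camera only (≈ 1/342 in normalized
# coordinates for the calibrated focal length), target 1 m away, 0.19 m baseline:
print(relative_depth_error(z_w=1.0, d_i=0.19, theta=0.0,
                           dm_l=1 / 342, dm_r=0.0, dth_l=0.0, dth_r=0.0))
# ≈ 0.015, i.e., roughly a 1.5% depth error, which shrinks as d_i grows or z_w shrinks.
```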

Appendix B. Proof of Equation (A16)

Let
a = d_i\,{}^{w}z_{l1}'\,{}^{w}z_{r1}' \qquad (A20)
b = -{}^{w}x_{r1}'\,{}^{w}z_{l1}' + \frac{d_i}{2}\,{}^{w}z_{l1}' + \frac{d_i}{2}\,{}^{w}z_{r1}' + {}^{w}x_{l1}'\,{}^{w}z_{r1}' \qquad (A21)
Then, z w in Equation (A14) can be expressed as
z_w' = \frac{a}{b} \qquad (A22)
From Equations (A12) and (A13), we can obtain
{}^{w}z_{l1}' = \left(\frac{x_l}{z_l} + \Delta m_l\right)\sin(\theta_l + \Delta\theta_l) + \cos(\theta_l + \Delta\theta_l) \qquad (A23)
{}^{w}z_{r1}' = \left(\frac{x_r}{z_r} + \Delta m_r\right)\sin(\theta_r + \Delta\theta_r) + \cos(\theta_r + \Delta\theta_r) \qquad (A24)
From (A20), (A23) and (A24), we can obtain
a = d i x r z r x l z l [ sin θ r cos Δ θ r + cos θ r sin Δ θ r ] [ sin θ l cos Δ θ l + cos θ l sin Δ θ l ] d i x r z r [ sin θ r cos Δ θ r + cos θ r sin Δ θ r ] [ sin θ l cos Δ θ l + cos θ l sin Δ θ l ] Δ m l d i x r z r [ sin θ r cos Δ θ r + cos θ r sin Δ θ r ] [ cos θ l cos Δ θ l sin θ l sin Δ θ l ] d i x l z l [ sin θ r cos Δ θ r + cos θ r sin Δ θ r ] [ sin θ l cos Δ θ l + cos θ l sin Δ θ l ] Δ m r d i [ sin θ r cos Δ θ r + cos θ r sin Δ θ r ] [ sin θ l cos Δ θ l + cos θ l sin Δ θ l ] Δ m l Δ m r d i [ sin θ r cos Δ θ r + cos θ r sin Δ θ r ] [ cos θ l cos Δ θ l sin θ l sin Δ θ l ] Δ m r d i x l z l [ sin θ l cos Δ θ l + cos θ l sin Δ θ l ] [ cos θ r cos Δ θ r sin θ r sin Δ θ r ] d i [ sin θ l cos Δ θ l + cos θ l sin Δ θ l ] [ cos θ r cos Δ θ r sin θ r sin Δ θ r ] Δ m l d i [ cos θ r cos Δ θ r sin θ r sin Δ θ r ] [ cos θ l cos Δ θ l sin θ l sin Δ θ l ]
Δmr, Δml, Δθl and Δθr are usually close to 0, so
\cos\Delta\theta_r \approx 1, \quad \cos\Delta\theta_l \approx 1, \quad \sin\Delta\theta_r\sin\Delta\theta_l \approx 0, \quad \Delta m_l\Delta m_r \approx 0, \quad \Delta m_l\sin\Delta\theta_l \approx 0, \quad \Delta m_r\sin\Delta\theta_l \approx 0, \quad \Delta m_l\sin\Delta\theta_r \approx 0, \quad \Delta m_r\sin\Delta\theta_r \approx 0 \qquad (A26)
From (A25) and (A26), we can obtain
a d i 1 z r z l [ x l x r sin θ l sin θ r x r z l cos θ l sin θ r x l z r sin θ l cos θ r z r z l cos θ r cos θ l + sin Δ θ l ( x l x r sin θ r cos θ l + x r z l sin θ r sin θ l x l z r cos θ r cos θ l + z r z l cos θ r sin θ l ) + sin Δ θ r ( x l z r sin θ l sin θ r + z r z l cos θ l sin θ r x l x r sin θ l cos θ r x r z l cos θ l cos θ r ) Δ m l ( x r z l sin θ l sin θ r l + z r z l sin θ l cos θ r ) Δ m r ( x l z r sin θ r sin θ l z r z l sin θ r cos θ l ) ]
From (A7), (A8) and (A15), we can obtain
x_l = \left(x_w + \frac{d_i}{2}\right)\cos\theta - z_w\sin\theta, \quad z_l = \left(x_w + \frac{d_i}{2}\right)\sin\theta + z_w\cos\theta, \quad x_r = \left(x_w - \frac{d_i}{2}\right)\cos\theta + z_w\sin\theta, \quad z_r = -\left(x_w - \frac{d_i}{2}\right)\sin\theta + z_w\cos\theta \qquad (A28)
Equation (A29) can be derived from (A15), (A27) and (A28):
a z w 2 d i + [ d i 2 sin 2 θ + d i ( x w + d i 2 ) sin 2 θ z w ] Δ m l z l z r + z w 2 [ d i ( x w d i 2 ) sin 2 θ z w d i 2 sin 2 θ ] Δ m r z l z r + z w 2 d i ( x w + d i 2 ) sin Δ θ l z w + d i ( x w d i 2 ) sin Δ θ r z w z l z r
Δmr, Δml, Δθl and Δθr are usually close to 0, xw ≪ zw and di ≪ zw. Thus,
a \approx \frac{z_w^2\,d_i}{z_l z_r} \qquad (A30)
From (A21), (A23) and (A24), we can obtain
b = x l z l x r z r ( sin θ l cos Δ θ l + cos θ l sin Δ θ l ) ( cos θ r cos Δ θ r r sin θ r sin Δ θ r ) + x l z l ( sin θ l cos Δ θ l + cos θ l sin Δ θ l ) ( cos θ r cos Δ θ r sin θ r sin Δ θ r ) Δ m r x l z l ( sin θ l cos Δ θ l + cos θ l sin Δ θ l ) ( sin θ r cos Δ θ r + cos θ r sin Δ θ r ) + x r z r ( sin θ l cos Δ θ l + cos θ l sin Δ θ l ) ( cos θ r cos Δ θ r sin θ r sin Δ θ r ) Δ m l + ( sin θ l cos Δ θ l + cos θ l sin Δ θ l ) ( cos θ r cos Δ θ r sin θ r sin Δ θ r ) Δ m l Δ m r ( sin θ l cos Δ θ l + cos θ l sin Δ θ l ) ( sin θ r cos Δ θ r + cos θ r sin Δ θ r ) Δ m l + x r z r ( cos θ l cos Δ θ l sin θ l sin Δ θ l ) ( cos θ r cos Δ θ r sin θ r sin Δ θ r ) + ( cos θ l cos Δ θ l sin θ l sin Δ θ l ) ( cos θ r cos Δ θ r sin θ r sin Δ θ r ) Δ m r ( cos θ l cos Δ θ l sin θ l sin Δ θ l ) ( sin θ r cos Δ θ r + cos θ r sin Δ θ r ) x r z r x l z l ( sin θ r cos Δ θ r + cos θ r sin Δ θ r ) ( cos θ l cos Δ θ l sin θ l sin Δ θ l ) x r z r ( sin θ r cos Δ θ r + cos θ r sin Δ θ r ) ( cos θ l cos Δ θ l sin θ l sin Δ θ l ) Δ m l + x r z r ( sin θ r cos Δ θ r + cos θ r sin Δ θ r ) ( sin θ l cos Δ θ l + cos θ l sin Δ θ l ) x l z l ( sin θ r cos Δ θ r + cos θ r sin Δ θ r ) ( cos θ l cos Δ θ l sin θ l sin Δ θ l ) Δ m r ( sin θ r cos Δ θ r + cos θ r sin Δ θ r ) ( cos θ l cos Δ θ l sin θ l sin Δ θ l ) Δ m l Δ m r + ( sin θ r cos Δ θ r + cos θ r sin Δ θ r ) ( sin θ l cos Δ θ l + cos θ l sin Δ θ l ) Δ m r x l z l ( cos θ r cos Δ θ r sin θ r sin Δ θ r ) ( cos θ l cos Δ θ l sin θ l sin Δ θ l ) ( cos θ r cos Δ θ r sin θ r sin Δ θ r ) ( cos θ l cos Δ θ l sin θ l sin Δ θ l ) Δ m l + ( cos θ r cos Δ θ r sin θ r sin Δ θ r ) ( sin θ l cos Δ θ l + cos θ l sin Δ θ l )
From (A26) and (A31), we can obtain:
b 1 z l z r [ x l x r sin θ l cos θ r x l z r sin θ l sin θ r + z l x r cos θ l cos θ r z l z r cos θ l sin θ r x l x r sin θ r cos θ l + z l z r sin θ l cos θ r x l z r cos θ l cos θ r + z l x r sin θ l sin θ r + sin Δ θ l ( x l x r cos θ l cos θ r x l z r cos θ l sin θ r z l x r cos θ r sin θ l + x l z r sin θ l cos θ r + z l z r cos θ r cos θ l + z l z r sin θ l sin θ r + x l x r sin θ l sin θ r + z l x r sin θ r cos θ l ) + sin Δ θ r ( x l x r sin θ l sin θ r x l z r sin θ l cos θ r z l x r cos θ l sin θ r z l z r cos θ l cos θ r x l x r cos θ l cos θ r + z l x r sin θ l cos θ r + x l z r cos θ l sin θ r z l z r sin θ l sin θ r ) + Δ m l ( z l x r sin θ l cos θ r z l z r sin θ l sin θ r z l x r cos θ l sin θ r z l z r cos θ l cos θ r ) + Δ m r ( x l z r sin θ l cos θ r + z l z r cos θ l cos θ r x l z r cos θ l sin θ r + z l z r sin θ l sin θ r ) ]
Equation (A33) can be derived from (A15), (A28) and (A32):
b z w d i Δ m l [ 2 x w sin θ cos θ + ( x w 2 d i 2 4 ) z w sin 2 θ + z w cos 2 θ ] z l z r + z w Δ m r [ z w cos 2 θ + ( x w 2 d i 2 4 ) z w sin 2 θ 2 x w sin θ cos θ ] z l z r + z w sin Δ θ l [ z w + x w 2 d i 2 4 z w ] sin Δ θ r [ z w + x w 2 d i 2 4 z w ] z l z r
Δmr, Δml, Δθl and Δθr are usually close to 0, xw ≪ zw and di ≪ zw. Thus,
b z w d i + z w Δ m r cos 2 θ Δ m l cos 2 θ + sin Δ θ l sin Δ θ r z l z r
From (A22), (A29) and (A34), we can obtain
z_w' \approx \frac{z_w d_i}{d_i - z_w\left(\Delta m_r\cos^2\theta - \Delta m_l\cos^2\theta + \sin\Delta\theta_l - \sin\Delta\theta_r\right)} \qquad (A35)
So,
\Delta z = z_w - z_w' \approx \frac{z_w^2\left(\Delta m_l\cos^2\theta - \Delta m_r\cos^2\theta - \sin\Delta\theta_l + \sin\Delta\theta_r\right)}{d_i + z_w\left(\Delta m_l\cos^2\theta - \Delta m_r\cos^2\theta - \sin\Delta\theta_l + \sin\Delta\theta_r\right)} \qquad (A36)
Δθl and Δθr are usually close to 0, so
\Delta z \approx \frac{z_w^2\left(\Delta m_l\cos^2\theta - \Delta m_r\cos^2\theta - \Delta\theta_l + \Delta\theta_r\right)}{d_i + z_w\left(\Delta m_l\cos^2\theta - \Delta m_r\cos^2\theta - \Delta\theta_l + \Delta\theta_r\right)} \qquad (A37)
The proof is completed.

Appendix C. Two Equations Related to θt and θp

Substituting Equation (59) into Equation (60), we can obtain:
\begin{pmatrix} {}^{l}x_c \\ {}^{l}y_c \\ {}^{l}z_c \\ 1 \end{pmatrix} = \begin{pmatrix} {}^{l}n_x & {}^{l}o_x & {}^{l}a_x & {}^{l}p_x \\ {}^{l}n_y & {}^{l}o_y & {}^{l}a_y & {}^{l}p_y \\ {}^{l}n_z & {}^{l}o_z & {}^{l}a_z & {}^{l}p_z \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos{}^{l}\theta_t & \sin{}^{l}\theta_t & 0 \\ 0 & -\sin{}^{l}\theta_t & \cos{}^{l}\theta_t & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos{}^{l}\theta_p & 0 & -\sin{}^{l}\theta_p & 0 \\ 0 & 1 & 0 & 0 \\ \sin{}^{l}\theta_p & 0 & \cos{}^{l}\theta_p & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_w \\ y_w \\ z_w \\ 1 \end{pmatrix} \qquad (A38)
Equation (A38) can be factored out:
\begin{pmatrix} {}^{l}x_c \\ {}^{l}y_c \\ {}^{l}z_c \end{pmatrix} = \begin{pmatrix} {}^{l}n_x x_w\cos{}^{l}\theta_p - {}^{l}n_x z_w\sin{}^{l}\theta_p + {}^{l}o_x x_w\sin{}^{l}\theta_p\sin{}^{l}\theta_t + {}^{l}o_x y_w\cos{}^{l}\theta_t + {}^{l}o_x z_w\cos{}^{l}\theta_p\sin{}^{l}\theta_t + {}^{l}a_x x_w\sin{}^{l}\theta_p\cos{}^{l}\theta_t - {}^{l}a_x y_w\sin{}^{l}\theta_t + {}^{l}a_x z_w\cos{}^{l}\theta_p\cos{}^{l}\theta_t + {}^{l}p_x \\ {}^{l}n_y x_w\cos{}^{l}\theta_p - {}^{l}n_y z_w\sin{}^{l}\theta_p + {}^{l}o_y x_w\sin{}^{l}\theta_p\sin{}^{l}\theta_t + {}^{l}o_y y_w\cos{}^{l}\theta_t + {}^{l}o_y z_w\cos{}^{l}\theta_p\sin{}^{l}\theta_t + {}^{l}a_y x_w\sin{}^{l}\theta_p\cos{}^{l}\theta_t - {}^{l}a_y y_w\sin{}^{l}\theta_t + {}^{l}a_y z_w\cos{}^{l}\theta_p\cos{}^{l}\theta_t + {}^{l}p_y \\ {}^{l}n_z x_w\cos{}^{l}\theta_p - {}^{l}n_z z_w\sin{}^{l}\theta_p + {}^{l}o_z x_w\sin{}^{l}\theta_p\sin{}^{l}\theta_t + {}^{l}o_z y_w\cos{}^{l}\theta_t + {}^{l}o_z z_w\cos{}^{l}\theta_p\sin{}^{l}\theta_t + {}^{l}a_z x_w\sin{}^{l}\theta_p\cos{}^{l}\theta_t - {}^{l}a_z y_w\sin{}^{l}\theta_t + {}^{l}a_z z_w\cos{}^{l}\theta_p\cos{}^{l}\theta_t + {}^{l}p_z \end{pmatrix} \qquad (A39)
Substituting Equation (A39) into Equation (62), we can obtain:
( Δ u l Δ v l ) = ( ( l k x l n x x w cos l θ p l k x l n x z w sin l θ p + l k x l o x x w sin l θ p sin l θ t + l k x l o x y w cos l θ t + l k x l o x z w cos l θ p sin l θ t + l k x l a x x w sin l θ p cos l θ t l k x l a x y w sin l θ t + l k x l a x z w cos l θ p cos l θ t + l k x l p x ) ( l n z x w cos l θ p l n z z w sin l θ p + l o z x w sin l θ p sin l θ t + l o z y w cos l θ t + l o z z w cos l θ p sin l θ t + l a z x w sin l θ p cos l θ t l a z y w sin l θ t + l a z z w cos l θ p cos l θ t + l p z ) ( l k y l n y x w cos l θ p l k y l n y z w sin l θ p + l k y l o y x w sin l θ p sin l θ t + l k y l o y y w cos l θ t + l k y l o y z w cos l θ p sin l θ t + l k y l a y x w sin l θ p cos l θ t l k y l a y y w sin l θ t + l k y l a y z w cos l θ p cos l θ t + l k y l p y ) ( l n z x w cos l θ p l n z z w sin l θ p + l o z x w sin l θ p sin l θ t + l o z y w cos l θ t + l o z z w cos l θ p sin l θ t + l a z x w sin l θ p cos l θ t l a z y w sin l θ t + l a z z w cos l θ p cos l θ t + l p z ) )
Based on the same principle, substituting into each matrix and factoring the value of ∆mr, we can obtain
( Δ u r Δ v r ) = ( ( r k x r n x r x w cos r θ p r k x r n x r z w sin r θ p + r k x r o x r x w sin r θ p sin r θ t + r k x r o x r y w cos r θ t + r k x r o x r z w cos r θ p sin r θ t + r k x r a x r x w sin r θ p cos r θ t r k x r a x r y w sin r θ t + r k x r a x r z w cos r θ p cos r θ t + r k x r p x ) ( r n z r x w cos r θ p r n z r z w sin r θ p + r o z r x w sin r θ p sin r θ t + r o z r y w cos r θ t + r o z r z w cos r θ p sin r θ t + r a z r x w sin r θ p cos r θ t r a z r y w sin r θ t + r a z r z w cos r θ p cos r θ t + r p z ) ( r k y r n y r x w cos r θ p r k y r n y r z w sin r θ p + r k y r o y r x w sin r θ p sin r θ t + r k y r o y r y w cos r θ t + r k y r o y r z w cos r θ p sin r θ t + r k y r a y r x w sin r θ p cos r θ t r k y r a y r y w sin r θ t + r k y r a y r z w cos r θ p cos r θ t + r k y r p y ) ( r n z r x w cos r θ p r n z r z w sin r θ p + r o z r x w sin r θ p sin r θ t + r o z r y w cos r θ t + r o z r z w cos r θ p sin r θ t + r a z r x w sin r θ p cos r θ t r a z r y w sin r θ t + r a z r z w cos r θ p cos r θ t + r p z ) )
From Equations (2), (A40) and (A41), Equation (A42), which relates θt and θp, can be obtained. It can be seen from Equation (A42) that both θt and θp appear in the form of trigonometric functions, and it is difficult to obtain the values of θt and θp directly from these two equations. To obtain a solution usable in practice, we first compute a sub-optimal observation pose, use it as the initial value and then apply a trial-and-error method to obtain the optimal observation pose.
{ ( l k x l n x x w cos θ p l k x l n x z w sin θ p + l k x l o x x w sin θ p sin θ t + l k x l o x y w cos θ t + l k x l o x z w cos θ p sin θ t + l k x l a x y w sin θ t l k x l a x x w sin θ p cos θ t + l k x l a x z w cos θ p cos θ t + l k x l p x ) ( r n z r x w cos θ p r n z r z w sin θ p + r o z r x w sin θ p sin θ t + r p z + r o z r y w cos θ t + r o z r z w cos θ p sin θ t + r a z r x w sin θ p cos θ t r a z r y w sin θ t + r a z r z w cos θ p cos θ t ) = ( r k x r n x r x w cos θ p r k x r n x r z w sin θ p + r k x r o x r x w sin θ p sin θ t + r k x r o x r y w cos θ t + r k x r o x r z w cos θ p sin θ t + r k x r a x r x w sin θ p cos θ t + r k x r p x r k x r a x r y w sin θ t + r k x r a x r z w cos θ p cos θ t ) ( l n z x w cos θ p l n z z w sin θ p + l o z x w sin θ p sin θ t + l o z y w cos θ t + l o z z w cos θ p sin θ t + l a z x w sin θ p cos θ t l a z y w sin θ t + l a z z w cos θ p cos θ t + l p z ) ( l k y l n y x w cos θ p l k y l n y z w sin θ p + l k y l o y x w sin θ p sin θ t + l k y l o y y w cos θ t + l k y l o y z w cos θ p sin θ t l k y l a y y w sin θ t + l k y l a y x w sin θ p cos θ t + l k y l a y z w cos θ p cos θ t + l k y l p y ) ( r n z r x w cos θ p r n z r z w sin θ p + r o z r x w sin θ p sin θ t + r p z + r o z r y w cos θ t + r o z r z w cos θ p sin θ t + r a z r x w sin θ p cos θ t r a z r y w sin θ t + r a z r z w cos θ p cos θ t ) = ( r k y r n y r x w cos θ p r k y r n y r z w sin θ p + r k y r o y r x w sin θ p sin θ t + r k y r o y r y w cos θ t + r k y r o y r z w cos θ p sin θ t + r k y r a y r x w sin θ p cos θ t + r k y r p y r k y r a y r y w sin θ t + r k y r a y r z w cos θ p cos θ t ) ( l n z x w cos θ p l n z z w sin θ p + l o z x w sin θ p sin θ t + l o z y w cos θ t + l o z z w cos θ p sin θ t + l a z x w sin θ p cos θ t l a z y w sin θ t + l a z z w cos θ p cos θ t + l p z )

References

  1. Wang, Q.; Zou, W.; Xu, D.; Zhu, Z. Motion control in saccade and smooth pursuit for bionic eye based on three-dimensional coordinates. J. Bionic Eng. 2017, 14, 336–347. [Google Scholar] [CrossRef]
  2. Kardamakis, A.A.; Moschovakis, A.K. Optimal control of gaze shifts. J. Neurosci. 2009, 29, 7723–7730. [Google Scholar] [CrossRef] [PubMed]
  3. Freedman, E.G.; Sparks, D.L. Coordination of the eyes and head: Movement kinematics. Exp. Brain Res. 2000, 131, 22–32. [Google Scholar] [CrossRef] [PubMed]
  4. Nakashima, R.; Fang, Y.; Hatori, Y.; Hiratani, A.; Matsumiya, K.; Kuriki, I.; Shioiri, S. Saliency-based gaze prediction based on head direction. Vis. Res. 2015, 117, 59–66. [Google Scholar] [CrossRef] [PubMed]
  5. He, H.; Ge, S.S.; Zhang, Z. A saliency-driven robotic head with bio-inspired saccadic behaviors for social robotics. Auton. Robot. 2014, 36, 225–240. [Google Scholar] [CrossRef]
  6. Law, J.; Shaw, P.; Lee, M. A biologically constrained architecture for developmental learning of eye–head gaze control on a humanoid robot. Auton. Robot. 2013, 35, 77–92. [Google Scholar] [CrossRef]
  7. Wijayasinghe, I.B.; Aulisa, E.; Buttner, U.; Ghosh, B.K.; Glasauer, S.; Kremmyda, O. Potential and optimal target fixating control of the human head/eye complex. IEEE Trans. Control Syst. Technol. 2015, 23, 796–804. [Google Scholar] [CrossRef]
  8. Ghosh, B.K.; Wijayasinghe, I.B.; Kahagalage, S.D. A geometric approach to head/eye control. IEEE Access 2014, 2, 316–332. [Google Scholar] [CrossRef]
  9. Kuang, X.; Gibson, M.; Shi, B.E.; Rucci, M. Active vision during coordinated head/eye movements in a humanoid robot. IEEE Trans. Robot. 2012, 28, 1423–1430. [Google Scholar] [CrossRef]
  10. Vannucci, L.; Cauli, N.; Falotico, E.; Bernardino, A.; Laschi, C. Adaptive visual pursuit involving eye-head coordination and prediction of the target motion. In Proceedings of the IEEE-RAS International Conference on Humanoid Robots, Madrid, Spain, 18–20 November 2014; pp. 541–546. [Google Scholar]
  11. Huelse, M.; McBride, S.; Law, J.; Lee, M. Integration of active vision and reaching from a developmental robotics perspective. IEEE Trans. Auton. Ment. Dev. 2010, 2, 355–367. [Google Scholar] [CrossRef] [Green Version]
  12. Anastasopoulos, D.; Naushahi, J.; Sklavos, S.; Bronstein, A.M. Fast gaze reorientations by combined movements of the eye, head, trunk and lower extremities. Exp. Brain Res. 2015, 233, 1639–1650. [Google Scholar] [CrossRef] [Green Version]
  13. Daye, P.M.; Optican, L.M.; Blohm, G.; Lefèvre, P. Hierarchical control of two-dimensional gaze saccades. J. Comput. Neurosci. 2014, 36, 355–382. [Google Scholar] [CrossRef] [PubMed]
  14. Rajruangrabin, J.; Popa, D.O. Robot head motion control with an emphasis on realism of neck–eye coordination during object tracking. J. Intell. Robot. Syst. 2011, 63, 163–190. [Google Scholar] [CrossRef]
  15. Schulze, L.; Renneberg, B.; Lobmaier, J.S. Gaze perception in social anxiety and social anxiety disorder. Front. Hum. Neurosci. 2013, 7, 1–5. [Google Scholar] [CrossRef] [PubMed]
  16. Liu, Y.; Zhu, D.; Peng, J.; Wang, X.; Wang, L.; Chen, L.; Li, J.; Zhang, X. Real-time robust stereo visual SLAM system based on bionic eyes. IEEE Trans. Med. Robot. Bionics 2020, 2, 391–398. [Google Scholar] [CrossRef]
  17. Guitton, D. Control of eye–head coordination during orienting gaze shifts. Trends Neurosci. 1993, 15, 174–179. [Google Scholar] [CrossRef]
  18. Matveev, A.S.; Hoy, M.C.; Savkin, A.V. 3D environmental extremum seeking navigation of a nonholonomic mobile robot. Automatica 2014, 50, 1802–1815. [Google Scholar] [CrossRef]
  19. Nefti-Meziani, S.; Manzoor, U.; Davis, S.; Pupala, S.K. 3D perception from binocular vision for a low cost humanoid robot NAO. Robot. Auton. Syst. 2015, 68, 129–139. [Google Scholar] [CrossRef]
  20. Surmann, H.; Nüchter, A.; Hertzberg, J. An autonomous mobile robot with a 3D laser range finder for 3D exploration and digitalization of indoor environments. Robot. Auton. Syst. 2003, 45, 181–198. [Google Scholar] [CrossRef]
  21. Song, W.; Minami, M.; Shen, L.Y.; Zhang, Y.N. Bionic tracking method by hand & eye-vergence visual servoing. Adv. Manuf. 2016, 4, 157–166. [Google Scholar]
  22. Li, H.Y.; Luo, J.; Huang, C.J.; Huang, Q.Z.; Xie, S.R. Design and control of 3-DoF spherical parallel mechanism robot eyes inspired by the binocular vestibule-ocular reflex. J. Intell. Robot. Syst. 2015, 78, 425–441. [Google Scholar] [CrossRef]
  23. Masseck, O.A.; Hoffmann, K.P. Comparative neurobiology of the optokinetic reflex. Ann. N. Y. Acad. Sci. 2009, 1164, 430–439. [Google Scholar] [CrossRef] [PubMed]
  24. Bruske, J.; Hansen, M.; Riehn, L.; Sommer, G. Biologically inspired calibration-free adaptive saccade control of a binocular camera-head. Biol. Cybern. 1997, 77, 433–446. [Google Scholar] [CrossRef]
  25. Wang, X.; Van De Weem, J.; Jonker, P. An advanced active vision system imitating human eye movements. In Proceedings of the 2013 16th International Conference on Advanced Robotics, Montevideo, Uruguay, 25–29 November 2013; pp. 5–10. [Google Scholar]
  26. Antonelli, M.; Duran, A.J.; Chinellato, E.; Pobil, A.P. Adaptive saccade controller inspired by the primates’ cerebellum. In Proceedings of the IEEE International Conference on Robotics and Automation, Seattle, WA, USA, 26–30 May 2015; pp. 5048–5053. [Google Scholar]
  27. Robinson, D.A.; Gordon, J.L.; Gordon, S.E. A model of the smooth pursuit eye movement system. Biol. Cybern. 1986, 55, 43–57. [Google Scholar] [CrossRef] [PubMed]
  28. Brown, C. Gaze controls with interactions and delays. IEEE Trans. Syst. Man Cybern. 1990, 20, 518–527. [Google Scholar] [CrossRef]
  29. Deno, D.C.; Keller, E.L.; Crandall, W.F. Dynamical neural network organization of the visual pursuit system. IEEE Trans. Biomed. Eng. 1989, 36, 85–92. [Google Scholar] [CrossRef] [PubMed]
  30. Lunghi, F.; Lazzari, S.; Magenes, G. Neural adaptive predictor for visual tracking system. In Proceedings of the 20th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Hong Kong, China, 1 November 1998; Volume 20, pp. 1389–1392. [Google Scholar]
  31. Lee, W.J.; Galiana, H.L. An internally switched model of ocular tracking with prediction. IEEE Trans. Neural Syst. Rehabil. Eng. 2005, 13, 186–193. [Google Scholar] [CrossRef]
  32. Avni, O.; Borrelli, F.; Katzir, G.; Rivlin, E.; Rotstein, H. Scanning and tracking with independent cameras-a biologically motivated approach based on model predictive control. Auton. Robot. 2008, 24, 285–302. [Google Scholar] [CrossRef]
  33. Zhang, M.; Ma, X.; Qin, B.; Wang, G.; Guo, Y.; Xu, Z.; Wang, Y.; Li, Y. Information fusion control with time delay for smooth pursuit eye movement. Physiol. Rep. 2016, 4, e12775. [Google Scholar] [CrossRef]
  34. Santini, F.; Rucci, M. Active estimation of distance in a robotic system that replicates human eye movement. Robot. Auton. Syst. 2007, 55, 107–121. [Google Scholar] [CrossRef]
  35. Chinellato, E.; Antonelli, M.; Grzyb, B.J.; Del Pobil, A.P. Implicit sensorimotor mapping of the peripersonal space by gazing and reaching. IEEE Trans. Auton. Ment. Dev. 2011, 3, 43–53. [Google Scholar] [CrossRef]
  36. Song, Y.; Zhang, X. An active binocular integrated system for intelligent robot vision. In Proceedings of the IEEE International Conference on Intelligence and Security Informatics, Washington, DC, USA, 11–14 June 2012; pp. 48–53. [Google Scholar]
  37. Wang, Y.; Zhang, G.; Lang, H.; Zuo, B.; De Silva, C.W. A modified image-based visual servo controller with hybrid camera configuration for robust robotic grasping. Robot. Auton. Syst. 2014, 62, 1398–1407. [Google Scholar] [CrossRef]
  38. Lee, Y.C.; Lan, C.C.; Chu, C.Y.; Lai, C.M.; Chen, Y.J. A pan-tilt orienting mechanism with parallel axes of flexural actuation. IEEE-ASME Trans. Mechatron. 2013, 18, 1100–1112. [Google Scholar] [CrossRef]
  39. Wang, Q.; Zou, W.; Zhang, F.; Xu, D. Binocular initial location and extrinsic parameters real-time calculation for bionic eye system. In Proceedings of the 11th World Congress on Intelligent Control and Automation, Shenyang, China, 29 June–4 July 2014; pp. 74–80. [Google Scholar]
  40. Fan, D.; Liu, Y.Y.; Chen, X.P.; Meng, F.; Liu, X.L.; Ullah, Z.; Cheng, W.; Liu, Y.H.; Huang, Q. Eye gaze based 3D triangulation for robotic bionic eyes. Sensors 2020, 20, 5271. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Physical implementation of the robot system. (a) The front side. (b) The left side. (c) The right side.
Figure 2. Robot system’s organization diagram.
Figure 3. Block diagram of the gaze point tracking control system.
Figure 4. Robot coordinates system and system parameter definition, (a) coordinate system definition, (b) eye–head system parameters and (c) mobile robot parameters.
Figure 5. Schematic of the relationship between a Cartesian point and its image point.
Figure 6. (a) Mechanical structure and coordinate systems of the bionic eye platform and (b) binocular 3D perception principle of bionic eyes.
Figure 7. Principle of head rotation calculation in fixation point tracking: (a) horizontal rotation angle and (b) vertical rotation angle.
Figure 8. Steps for calculating the desired pose of the fixation point.
Figure 9. Robot pose control block diagram.
Figure 10. Experimental in situ gaze point tracking scene.
Figure 11. Experimental results of gaze shifting to the target: (a) U coordinates of the target on the left image. (b) V coordinates of the target on the left image. (c) U coordinates of the target on the right image. (d) V coordinates of the target on the right image. (e) Left camera tilt. (f) Left camera pan. (g) Right camera tilt. (h) Right camera pan. (i) Head pan. (j) Angle deviation and rotation. (k) Coordinates (wx, wz) of the target in the world coordinate system. (l) Coordinates (ox, oz) of the target in the world coordinate system based on the origin location. The “+” in the subfigures (k,l) represents the position of the target in the coordinate system, and the “☆” represents the position of the robot in the coordinate system.
Figure 12. Experimental approaching gaze point tracking scene.
Figure 13. Experimental results of gaze shifting to the target: (a) U coordinates of the target on the left image. (b) V coordinates of the target on the left image. (c) U coordinates of the target on the right image. (d) V coordinates of the target on the right image. (e) Left camera tilt. (f) Left camera pan. (g) Right camera tilt. (h) Right camera pan. (i) Head pan. (j) Angular deviation and rotation. (k) Coordinates (wx, wz) of the target in the world coordinate system. (l) Robot forward distance and the distance between the target and robot.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
