A Study on Vision-Based Backstepping Control for a Target Tracking System

Abstract: This paper proposes a new method to control the pose of a camera mounted on a two-axis gimbal system for visual servoing applications. In these applications, the camera should be stable while its line-of-sight points at a target located within the camera's field of view. One of the most challenging aspects of these systems is the coupling in the gimbal kinematics as well as in the imaging geometry. Such factors must be considered in the control system design process to achieve better control performance. The novelty of this study is that the couplings in both the mechanism's kinematics and the imaging geometry are decoupled simultaneously by a new technique, so popular control methods can be easily implemented and good tracking performance is obtained. The proposed control configuration includes a calculation of the gimbal's desired motion that takes the coupling influence into account, and a control law derived by the backstepping procedure. Simulation and experimental studies were conducted, and their results validate the efficiency of the proposed control system. Moreover, comparison studies were conducted between the proposed control scheme, the image-based pointing control, and the decoupled control. These show the superiority of the proposed approach, which requires fewer measurements and results in smoother transient responses.


Introduction
Vision-based motion control refers to the use of computer vision as feedback to control the motion of a system. The topic is becoming more and more popular, thanks to the development of intelligent computer vision techniques. In general, the objective of vision-based control is to minimize the error between the actual image measurement of the visual features vector and its desired value. Numerous studies have reviewed several aspects of this topic, especially the two-part tutorial series by Chaumette and Hutchinson [1,2]. Two control approaches are distinguished according to the nature of the image measurement: image-based visual servo if the value is immediately available in the visual data, or position-based visual servo if the value contains a set of 3D parameters estimated from the visual data. The visual data are acquired from a camera or a set of cameras, which is either fixed in the workspace or directly mounted on the controlled system. Applications of visual servoing are becoming popular in robotics, where the pose of a robot manipulator end-effector is estimated and controlled with 6 degrees-of-freedom (DoF), such as in [3][4][5].
In outdoor applications, thanks to their ability to perceive the surrounding environment, camera systems have been widely applied for controlling the motion of wheeled robots [6,7] and the attitudes of unmanned vehicles [8][9][10]. Gimbaled mechanisms with multiple rotational axes are preferred to carry the camera if its pose needs controlling. Since a gimbal system can only control the angular motions of the camera (the number of controllable DoFs is equal to the number of independent gimbal channels), the control objectives are different from those of classical robotics. Instead, the fundamental goal is to direct the camera's line-of-sight (LOS) to a target and keep the target in the camera's field of view (FOV). Thus, two control actions are required: detecting the target in optical imagery and controlling the gimbal for target tracking. Additionally, the camera-based visual data can be used to control the behavior of related systems, for example, a stereo camera unit carried by a gimbal on a vessel that keeps track of the target and measures the distance for a berthing aid system [11]. Although many aspects of visual servoing and gimbal motion control have been studied individually for decades [12][13][14], the issue of vision-based control of gimbals is seldom tackled.
To detect targets in optical imagery, an image tracker is required. In particular, the tracker detects the presence of a target in the camera's FOV and then tracks its projection in the image plane frame-by-frame. Thus, the tracker measures the error between the LOS orientation and the target location. Image processing techniques and computer vision algorithms are implemented to fulfill these objectives. Traditional visual tracking approaches rely on deterministic feature search. Examples include the Kanade-Lucas-Tomasi tracker [15][16][17], which prefers corner features in an image patch, and the Continuously Adaptive Mean Shift (CAMSHIFT) algorithm [18,19], which is based on the color distribution within a video. However, they have difficulty tracking a target when it appears temporarily or when the background is contaminated. To cope with natural image changes, modern tracking methods are trained online with sample patches updated at every new detection. In particular, discriminative tracking methods, such as Tracking-Learning-Detection (TLD) [20] and the Kernelized Correlation Filter (KCF) [21], have improved tracking performance significantly. However, limitations in the reliability and robustness of the tracker heavily affect the efficiency of the vision-based control system. For instance, when the transient response is fast, the image may become blurred and the visual tracking may simply fail to measure the location of the target, resulting in missing data samples. From a control engineering standpoint, the tracker is a sampled device [13]; therefore, missing data samples generate delays in the tracking control loop, and the system stability may be affected.
To achieve the desired performance, the motion controller should overcome not only the tracker's limitations but also the nonlinearities in the gimbal dynamics and the imaging geometry. Besides, the gimbal's kinematics are complex due to the intertwined trigonometric functions [12]. The system dynamics are highly nonlinear; the nonlinearities are mainly induced by unbalanced masses, parameter uncertainties, and torque disturbances [22][23][24][25][26][27]. Additionally, difficulties may also come from the imaging geometry couplings along with the gimbal system's lack of controllable directions and measurable variables [28,29]. Furthermore, there are non-ideal practical factors, such as delay time, measurement noise, and the camera shutter speed.
Meanwhile, most of the available studies using visual data to control the gimbal are rather simple. For instance, decoupled approaches, where all the coupling terms are neglected, were used in [11,13,30]. In [31], desired orientations of the gimbal channels were calculated from their forward kinematics, and then a piece-wise linear controller was implemented. However, due to the nature of the underactuated gimbal system, these approaches are unable to achieve effective results. One of the significant studies in vision-based motion control is the one by Hurák et al. ( [28,29]). The authors proposed an image-based pointing controller designed based on the image-based visual servo approach, in which the camera's angular rates were the control inputs. The designed control law took into account the couplings and limitations of the gimbal channels, but at the same time, it required a lot of measurements. Besides, the stability of the designed control system is questionable, which may result in unwanted responses. In one of the latest studies related to the topic, X. Liu et al. [32] treated unknown imaging geometry, angular rate errors, and uncalibrated camera parameters as lumped disturbances. The authors proposed a combination of a disturbance observer and a model predictive controller to reduce the number of required measurements; disturbance rejection and good tracking performances were obtained. It is worth noting that any estimation from an observer is always lagging behind the true values for some time. Along with the delay in image streaming and computer vision processing, the observer may lead to performance deterioration and system instability. Therefore, in this paper, a new vision-based tracking control scheme for a two-axis gimbal system was designed. A visual camera served as a payload for the gimbal and provides visual information for the control system. This type of system is mainly used in aerial surveillance. 
The control objective was to bring a target to the center of the image plane with zero steady-state error and a smooth transient response so that effective visual tracking was achieved. To fulfill these tasks, the proposed control scheme was designed following these steps. First, the complete model of the system was derived, where the gimbaled mechanism dynamics, imaging kinematics, and the actuation model were all taken into consideration. Secondly, the rotation angles required to track a predefined target were determined from a new perspective in which the coupling effects were fully isolated. Then, the proposed controller was designed based on the backstepping procedure, ensuring the system stability and tracking performance. Finally, simulations and experiments were conducted for validation. The proposed controller requires fewer measurements, and it is expected to perform better than the benchmark controller in Hurák's research. In short, the contributions of the paper are summarized as follows:

• A complete model is derived representing the target tracking system with the gimbaled mechanism and image measurements.
• A new control scheme for visual servoing systems is proposed. The novelty of this approach is that the couplings in both the gimbal kinematics and imaging geometry are decoupled using a new technique, namely the calculation of additional orientation. Then, the vision-based target tracking system can be expressed with recursive structures of separate SISO systems. Therefore, conventional control schemes can be easily implemented.
• The stability of the closed-loop system is analyzed. Simulation and experimental results are presented and discussed; thus, the effectiveness of the proposed system is validated.
Accordingly, the remainder of the paper is organized as follows. Characteristics of the gimbal motion and the imaging geometry are analyzed and presented in Section 2. Then, to improve the control performance of gimbal systems, a new control scheme is introduced in Section 3. Section 4 details the implementation of the visual tracking algorithm KCF for the image tracker. Simulation and experimental comparison studies were conducted, and the results are presented in Section 4. Finally, conclusions are drawn in Section 5.

System Model
In this study, a two-axis gimbal system mounted on an aerial vehicle is considered. The camera's aperture was assumed to coincide with the rotational center of the inner channel. Five coordinate frames are assigned to the system as follows. First, a coordinate frame (B) was fixed to the platform carrying the gimbal with its Z-axis pointing upward and its X-axis pointing forward. This frame was brought into the rotational center of the tilt channel by the three dashed orthogonal vectors, as depicted in Figure 1. Two more frames, namely (P) and (T), were associated with the pan and tilt channels, respectively. The $Y_p$- and $Y_t$-axes of the P and T frames coincided, while the $Z_p$-axis of the P frame coincided with the Z-axis of frame B. The camera was carried by the tilt channel, and so the camera frame (C) was fixed to frame T. However, its $Z_c$-axis denotes the camera's LOS, as illustrated by the coordinate system $OX_cY_cZ_c$ in Figure 1. Finally, the image coordinate frame was placed at the focal distance along the optical axis from the origin of the camera frame. The plane orthogonal to the LOS is called the image plane, in which the x- and y-axes are the image coordinates, as shown in Figure 1.

System Kinematics
The kinematic relation of the gimbal system results from the transformation from frame B to frame T and is given by Equation (1), where $\omega_t = [\omega_{tx}\ \omega_{ty}\ \omega_{tz}]^T$ and $\omega_b = [\omega_{bx}\ \omega_{by}\ \omega_{bz}]^T$ are the angular rate vectors of the tilt channel and the platform, respectively. $\psi$ is the relative rotation between the outer gimbal and the platform about the Z-axis, whereas $\theta$ is the relative rotation between the inner and the outer gimbal about the $Y_p$-axis; $\omega_\psi$ and $\omega_\theta$ are their respective rates. The rotation between the camera frame C and the tilt channel frame T is given by the rotation matrix $R^{T}_{C}$ in Equation (2). Pre-multiplying Equation (1) by $R^{T}_{C}$ results in the transformation from the base to the camera frame. Similarly, by a translation equal to the lens focal length along the $Z_c$-axis, the kinematics of the image frame is obtained. However, it is preferable to consider the relationship between the projected point on the image plane and the gimbal motion.
In the pinhole camera model, the camera aperture is described as a point and no lenses are used to focus light. That is, a first-order approximation of the mapping from an object in three-dimensional space to its projection on the image plane is obtained. If the location of a 3-D point in the camera frame is given by a vector $P = [X_c\ Y_c\ Z_c]^T$ and its projection on the image plane is $p = [x\ y]^T$, the pinhole camera model yields the projective transformation in Equation (3), where $f$ is the focal distance. The time derivatives of $x$ and $y$ give the velocity of the projected point, which depends strongly on the focal length, the motion of the observed target, and the spatial velocity of the camera. Assuming that the focal length is constant, the time variation of the projected point is related to the spatial motion of the camera through an interaction matrix [1,29]. Considering the fixed rotation between the camera frame C and the tilt channel frame T (Equation (2)), Equation (4) is derived, where $L$ is the interaction matrix between the camera motions and the projected point, $\upsilon_t$ is the spatial velocity vector of the tilt channel, which includes the instantaneous linear velocity of the tilt frame origin $\nu_t = [v_{tx}\ v_{ty}\ v_{tz}]^T$ and the angular velocity $\omega_t$, and $e_d$ is the term expressing the influence of the target motion on the velocity of the projected point. The matrix and vectors mentioned above are given in Equation (5). The last three columns of the interaction matrix represent the relation between the gimbal orientation and the motion of the projected point. Equations (1) and (5) show the total kinematic coupling between the gimbal motion and the projection in the image plane. The first row of Equation (1) indicates a redundant roll motion generated by the two other gimbal orientations, and the third column in Equation (5) reveals the influence of this motion on the projection onto the image plane.
The third row of Equation (1), especially, shows that the required motion of the outer gimbal increases in proportion with the secant of the angle θ. Therefore, the gimbal mechanism cannot be globally asymptotically stabilized because when the angle θ approaches 90 [deg] or −90 [deg], the gimbal is completely out of control (this phenomenon is known as the gimbal lock, which has been thoroughly analyzed in [12,14]). Common solutions for this problem are feedforward approaches, additional channels, or safety mechanisms to prevent the tilt channel from reaching 90 [deg] or −90 [deg]. The two-axis gimbal system used for this study is equipped with the latter.
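The growth of the required pan motion near gimbal lock can be illustrated numerically. The following is a minimal sketch, assuming a stationary platform so that the yaw rate of the tilt frame reduces to the pan rate scaled by cos θ; the function name `pan_rate_gain` is hypothetical and not part of the original study.

```python
import math

def pan_rate_gain(theta_deg):
    """Secant factor 1/cos(theta): pan-axis rate needed per unit yaw rate
    of the tilt frame (assumes a stationary platform)."""
    return 1.0 / math.cos(math.radians(theta_deg))

if __name__ == "__main__":
    # The gain grows without bound as the tilt angle approaches 90 deg.
    for theta in (0, 40, 60, 85, 89):
        print(f"theta = {theta:2d} deg -> gain = {pan_rate_gain(theta):7.2f}")
```

Under this assumption, the pan actuator must turn twice as fast at a tilt of 60 [deg] as at zero tilt, which is why a mechanical limit on the tilt channel is a practical safeguard.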

System Dynamics and Actuation Model
Applying Euler's equation of motion for rigid body dynamics, the torque relationships of the inner and outer gimbals are derived as Equation (6). The subscript t denotes the tilt channel, while the subscript p denotes the pan channel. $J$ is the inertia matrix, $T$ is the applied torque, and $T_{t/p}$ is the tilt gimbal's torque as observed from the coordinate frame of the pan gimbal. Equation (6) represents the dynamics of both channels expressed in their respective coordinate axes. As mentioned above, the pan channel rotates about the Z-axis, whereas the tilt channel rotates about the $Y_p$-axis. Thus, only two motion equations are considered (Equation (7)). The external torques acting on each channel consist of the driving torques ($T_{ty}$ and $T_{pz}$) and the friction torques. The latter are assumed to be proportional to the relative speed, with coefficients of proportionality $K_t$ and $K_p$. In a typical gimbal system, the driving torques are generated by the stabilization controllers whose references are generated by the tracking loops [12,13,20]. The actuators in this study, in particular, are servo systems: each one of them has an integrated controller responsible for speed control. The closed-loop configuration of this structure is illustrated in Figure 2, and it is used as the stabilization loop for the gimbal control system. Then, the tracking loop model is derived taking into consideration both the system dynamics and the speed control loops. Proportional controllers for speed control are used in the stabilization loops; thus, the driving torques are given by Equation (8), where $P_t$ and $P_p$ are the proportional control gains, $\omega_{\theta d}$ and $\omega_{\psi d}$ are the desired rate commands, and $d_t$ and $d_p$ are the matched disturbances due to measurement errors, actuator saturation, etc. $\omega_{\theta d}$ and $\omega_{\psi d}$ are now the control signals that the tracking controllers compute.
From Equations (1), (4), (7), and (8), the system model is rearranged and given by: where ϕ = [ ϕ ty ϕ tz ] T is the angular position of the tilt and pan channels, and ω = [ ω ty ω tz ] T is the corresponding angular rates vector. L ω1 , L ω2 and L v are elements of the interaction matrix L compatible with ω, ω tx and ν y , respectively. u = [ ω θd ω ψd ] T is the rate commands vector, B = diag{B 1 , B 2 } and K = diag{K 1 , K 2 } are system parameter matrices, and d = [d 1 d 2 ] T is the vector of unpredictable disturbances. The L matrix components and the system parameter matrices are respectively expressed as: and the elements of the disturbances vector are: θ cos θ − . θ cos 2 θ ω tz − ω px sin θ +(J tz − J ty )ω ty ω tz sin θ cos θ − cos θ(J py − J px )ω px ω py − (J ty − J tx )ω tx ω ty cos 2 θ + d p − cos θP p ω tz (12) B is nonsingular within the operating range of the system, thus invertible. Neglecting the disturbances term, the proportional gains of the integrated controllers should be chosen such that K = B. Thus, the actuation system-consisting of the gimbaled mechanism and the two integrated actuators-works as a low pass filter ensuring the actual speeds are the filtered version of the command rates.
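The low-pass behavior of the actuation stage can be sketched for one channel. This is only an illustration under stated assumptions: the rate dynamics are taken as the scalar first-order model ω̇ = −Kω + Bu with K = B and no disturbance, and the gain value, step size, and function name are hypothetical rather than taken from the study.

```python
def simulate_rate_loop(u_cmd, K=10.0, B=10.0, dt=1e-3, duration=0.5):
    """Explicit-Euler simulation of omega_dot = -K*omega + B*u for a
    constant rate command u_cmd (assumed model, disturbance-free)."""
    omega = 0.0
    history = []
    for _ in range(int(round(duration / dt))):
        omega += dt * (-K * omega + B * u_cmd)
        history.append(omega)
    return history

if __name__ == "__main__":
    rates = simulate_rate_loop(u_cmd=1.0)
    # With K = B the DC gain is one and the time constant is 1/K = 0.1 s.
    print(f"rate after 0.1 s: {rates[99]:.3f}, after 0.5 s: {rates[-1]:.3f}")
```

With K = B the steady-state rate equals the command, and the response lags it with a time constant of 1/K, which is the filtered-command behavior described above.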

Vision-Based Tracking Control System
In general, the mapping from the 3D to 2D coordinates using the pinhole camera model allows the determination of the target location in the camera coordinates, as in Equation (3). Then, using the inverse tangent function, the desired rotations of the gimbal camera are derived. Nevertheless, Equations (1) and (5) show the unwanted effects of the motion of one actuator on the other orientations. In this section, the coupling effect in visual servoing is analyzed from a new viewpoint, where the required rotation of the gimbal camera is fully determined. Then, the proposed controller is designed based on the backstepping procedure. Due to many practical limitations, the actuation is not an ideal low pass filter. The backstepping technique is well known to be effective in dealing with uncertainties, hence ensures system performance and stability. On the other hand, an image-based pointing and tracking control scheme ( [28,29]), and a decoupled approach [13] are implemented for a comparison study.
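The naive mapping mentioned at the start of this section can be sketched as follows. The sketch assumes the standard pinhole form x = f X/Z, y = f Y/Z with the optical axis along Z, and a per-axis inverse tangent for the LOS offsets; the function names and numerical values are illustrative only. It deliberately ignores the coupling that the rest of this section addresses.

```python
import math

def project(point_c, f):
    """Pinhole projection of a 3-D point (X, Y, Z) in the camera frame,
    optical axis along Z (standard textbook form, assumed here)."""
    X, Y, Z = point_c
    if Z <= 0:
        raise ValueError("the point must lie in front of the camera")
    return f * X / Z, f * Y / Z

def naive_los_offsets(x, y, f):
    """Per-axis inverse-tangent offsets [rad] between the LOS and the target,
    neglecting the kinematic and imaging couplings."""
    return math.atan2(x, f), math.atan2(y, f)

if __name__ == "__main__":
    x, y = project((1.0, 2.0, 10.0), f=0.01)
    print(x, y, naive_los_offsets(x, y, f=0.01))
```

At a nonzero tilt angle these per-axis angles are not the rotations the two gimbal channels actually need, which motivates the calculation of additional orientation that follows.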

Calculation of Additional Orientation
Let $\phi_a = [\phi_{tya}\ \phi_{tza}]^T$ be the additional angles about the $Y_t$- and $Z_t$-axes through which the inner gimbal must rotate to point at the predefined target. This value is the difference between the angular positions of the target and the camera's LOS. The computer vision tracker defines $\phi_a$ from the location of the projected point in the image plane. Thus, when the LOS is pointing at the target, its projection is at the center of the image plane, that is, $\phi_a = [0\ 0]^T$. Otherwise, tilt and/or pan motions of the gimbal are needed.
In Figure 3, point A, expressed by the vector $[x_A\ y_A]^T$, is the projection of the target onto the horizontal axis of the image plane. From the pinhole camera model, the image plane is tangent to a spherical surface whose radius equals the focal length of the camera. Moreover, the image plane's center point is the intersection point of the LOS and the spherical surface. Besides, the relative angle between the tilt and the pan gimbals about the $Y_p$-axis is $\theta$. This makes the rotation planes of the pan gimbal and the camera different. Thus, an additional rotation about the camera's $Y_c$-axis brings the center of the image plane to point A. On the other hand, a pan motion places the image plane's center point below point A; hence, a tilt motion is required even though $y_A = 0$.
As shown in Figure 3, the additional orientation of the camera's plane is easily obtained with Equation (13). From the coordinate allocation, this orientation is equivalent to the rotation about the $Z_t$-axis of the tilt channel frame (Equation (14)), while the additional rotation about the $Y_t$-axis of the tilt channel is required to be:

$$\phi_{tya} = \theta - \arcsin(\cos\phi_a \sin\theta) \qquad (15)$$

The third subscript a of the left-hand terms in Equations (14) and (15) indicates the additional rotations, while the first and second subscripts denote the channel and the axis about which the values are measured. In general, when the target is expressed by its position $p = [x\ y]^T$ in the image plane, the rotation angles of both channels are computed by Equation (16).
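As a numerical illustration of Equation (15), the short sketch below evaluates the additional tilt rotation for a target on the horizontal image axis. Only Equation (15) is taken from the text; the function name and the sample angles are hypothetical.

```python
import math

def additional_tilt(theta, phi_a):
    """Equation (15): additional rotation about the Y_t-axis [rad] for a
    current tilt angle theta and an additional camera-plane angle phi_a."""
    return theta - math.asin(math.cos(phi_a) * math.sin(theta))

if __name__ == "__main__":
    theta = math.radians(40.0)   # tilt angle used in the simulation studies
    phi_a = math.radians(20.0)   # illustrative horizontal offset
    print(f"additional tilt: {math.degrees(additional_tilt(theta, phi_a)):.2f} deg")
```

The result is zero when either the tilt angle or the horizontal offset is zero, and nonzero otherwise, which matches the observation that a tilt motion is required even when the target lies on the horizontal axis of the image plane.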

Backstepping Controller Design
For a target located within the camera's FOV and the gimbal's operating range, the control objective is to bring its projection to the center of the image frame, which is equivalent to $[\phi_{tya}\ \phi_{tza}]^T \to [0\ 0]^T$. Note that $\phi_a = \phi - \phi_d$ is the difference between $\phi$ and $\phi_d$, which are the angular position of the tilt channel and the target position that the LOS should point at, respectively. Interestingly, the value of $\phi_d$ depends not only on the location of the target but also on the translational motion of the camera. However, due to the characteristics of outdoor visual servoing applications, it is reasonable to assume that $\phi_d$ is a slowly time-varying value, i.e., $\dot{\phi}_d \approx 0$. Besides, the additional angular rate is defined in Equation (17). Accordingly, the control law in Equation (18) is proposed, where $\Lambda$ and $\Gamma$ are positive definite diagonal matrices. Equation (18) gives the rate command vector to send to the gimbal system. Note that the inverse of matrix $B$ contains a secant gain correction, $1/\cos\theta$. The proposed control scheme is illustrated in Figure 4. To analyze the stability of the proposed control scheme, let us consider a positive definite function $V$ and its time derivative $\dot{V}$. By substituting Equation (20) and the control law in Equation (18) into the expression of $\dot{V}$, we obtain Equation (21). Applying Young's inequality for products (22) to Equation (21) results in Equation (23), where $\delta$ is an arbitrary value chosen such that $\Gamma - \frac{1}{2\delta^2} I$ is positive definite ($I$ is the identity matrix). Hence, the system is proved to be input-to-state stable (ISS) with $V$ an ISS-Lyapunov function for the system [33]. That is, the system is asymptotically stable without the disturbances, and the control errors are bounded if the disturbances are bounded.
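The backstepping idea can be sketched for a single channel. The sketch below is a textbook two-step backstepping design, not a reproduction of Equation (18): it assumes the disturbance-free scalar plant φ̇ = ω, ω̇ = −Kω + Bu with a static target, and the gains `lam` and `gam`, the plant parameters, and the function name are hypothetical.

```python
def simulate_backstepping(phi0, lam=2.0, gam=5.0, K=10.0, B=10.0,
                          dt=1e-3, steps=5000):
    """Regulate the pointing error phi to zero for the assumed plant
    phi_dot = omega, omega_dot = -K*omega + B*u (explicit Euler)."""
    phi, omega = phi0, 0.0
    lyapunov = []
    for _ in range(steps):
        omega_ref = -lam * phi          # step 1: virtual rate that stabilizes phi
        z = omega - omega_ref           # step 2: deviation from the virtual rate
        omega_ref_dot = -lam * omega    # derivative of omega_ref (phi_dot = omega)
        # Cancel the plant term, feed forward omega_ref_dot, damp both errors.
        u = (K * omega + omega_ref_dot - phi - gam * z) / B
        lyapunov.append(0.5 * (phi ** 2 + z ** 2))   # V = (phi^2 + z^2) / 2
        omega_dot = -K * omega + B * u
        phi += dt * omega
        omega += dt * omega_dot
    return phi, omega, lyapunov

if __name__ == "__main__":
    phi, omega, V = simulate_backstepping(phi0=0.3)
    print(f"final error: {phi:.2e} rad, final rate: {omega:.2e} rad/s")
```

In this simplified setting the candidate function V satisfies V̇ = −λφ² − γz², so it decreases monotonically along the simulated trajectory; adding a bounded disturbance to ω̇ would leave a bounded residual error, in line with the ISS argument above.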

Image-Based Pointing Control
The image-based pointing and tracking control implemented in this section has been previously discussed in the work of Hurák et al. [28,29]. The objective of the pointing-tracking controller is to set the reference rates for the motion system. A classical image-based motion control uses the inverse of the interaction matrix to ensure a decoupled decrease of the error between the current value of the visual feature and its desired value.
Since only the rotations are controllable in the gimbal system, Equation (4) can be rewritten as Equation (24), where $L_v$ contains the first three columns and $L_\omega$ the last three columns of the interaction matrix $L$ in Equation (5). Thus, the reference rates of the tilt channel are computed by Equation (25). Its first term guarantees an asymptotically stable projection, where $\alpha$ is the exponential rate and $L_\omega^{*}$ is the right pseudoinverse of $L_\omega$. However, since the two-axis gimbal can only control the pitch and yaw motions, the parameter $k$ is chosen such that the reference rate for the roll motion is equal to the measured velocity $\omega_{tx}$. Following Hurák et al. [28,29], $k$ is given by Equation (26). Then, from Equation (25), the reference rates for the inner gimbal orientations are given by Equation (27). From the system kinematics in Equation (1), the rate commands for the two gimbal channels are given by Equation (28). Generally, the distance to the target and the gimbal translations and rotations are required to generate the control law. In the case of no translational motion, the third terms of the reference vector elements in Equation (27) are equal to zero. Thus, the control law depends only on the rotations of the gimbal. However, the stability of the system remains questionable since disturbances are neglected and $k$ is chosen as a function of both the camera motion and the projection's location.
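The principle of inverting the rotational part of the interaction matrix can be sketched as follows. This is a simplified variant under stated assumptions, not the controller of Equations (25) to (28): it uses the classical point-feature interaction matrix in normalized image coordinates (optical axis along z, unit focal length), treats the roll rate about the optical axis as a measured quantity, ignores translation and target motion, and inverts the remaining 2 × 2 block directly instead of using a pseudoinverse with the parameter k. All names are hypothetical.

```python
def ibvs_rates(x, y, omega_z, alpha):
    """Reference rates (omega_x, omega_y) that impose x_dot = -alpha*x and
    y_dot = -alpha*y, given a measured roll rate omega_z (no translation)."""
    # Rotational columns of the classical point interaction matrix.
    a, b = x * y, -(1.0 + x * x)      # row for x_dot: omega_x, omega_y terms
    c, d = 1.0 + y * y, -x * y        # row for y_dot: omega_x, omega_y terms
    lz_x, lz_y = y, -x                # roll (omega_z) column
    # Desired image velocity minus the part already produced by the roll.
    rx = -alpha * x - lz_x * omega_z
    ry = -alpha * y - lz_y * omega_z
    det = a * d - b * c               # equals 1 + x^2 + y^2, never zero
    omega_x = (d * rx - b * ry) / det
    omega_y = (-c * rx + a * ry) / det
    return omega_x, omega_y

if __name__ == "__main__":
    print(ibvs_rates(x=0.2, y=-0.1, omega_z=0.05, alpha=1.5))
```

Because the determinant of the 2 × 2 block is 1 + x² + y², this inversion is always defined; the sensitivity noted in the text arises from the neglected disturbances and the dependence on measured motion, not from singularity of this block.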
Additionally, the simplistic decoupled controller studied in [13] is implemented for comparison. This control scheme uses a single-input-single-output loop for every channel of the gimbal. Each channel receives the corresponding vertical or horizontal locations of the target in the image plane. That is:

Implementation
In this study, the image tracker uses the KCF algorithm. This tracker initializes with a target patch cropped from the target location in the initial frame and gives back its location in each frame of the sequence. The details of the KCF algorithm are presented in [21]. In this paper, a combination of the KCF tracker and a Gaussian kernel working on the histogram of oriented gradients (HOG) was implemented.
In the following simulations and experiments, the sampling time of the control systems and the image tracker was set at 0.05 [s]. From the camera to the control system, the image sequences were streamed using the real-time streaming protocol (RTSP) with a resolution of 640 × 360 (pixel) and a framerate of 20 (fps). The image streaming was significantly influenced by the communication delays-the tracker received an image that was a delayed version of the one taken by the camera. With our experimental apparatus, the average overall delay time was 0.3 [s], and in the worst case, the delay was 0.45 [s]. Besides, motion sensors and gimbal actuators were also connected to the controller via an RS-232 serial port. However, the resulting delay time was relatively small compared to the image streaming delay, so it was neglected. The detailed control parameters are shown in Table 1. A fast response was not the priority when tuning parameters due to the limitations of the camera's shutter speed. All the controllers were tuned such that from the zero orientation, the controllers could bring a projected point from an arbitrary location to the center of the image plane with a linear tracking path and reasonable rotational speeds.
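For simulation purposes, the image streaming delay can be represented as a whole number of tracker samples. The sketch below uses the sampling time and delay figures quoted above; the `DelayLine` class itself is a hypothetical helper and not part of the experimental software.

```python
from collections import deque

def delay_in_samples(delay_s, sample_s):
    """Number of whole samples closest to the given delay."""
    return round(delay_s / sample_s)

class DelayLine:
    """Returns the measurement pushed delay_in_samples() steps earlier."""

    def __init__(self, delay_s, sample_s, initial=0.0):
        n = delay_in_samples(delay_s, sample_s)
        self._buffer = deque([initial] * n)

    def step(self, value):
        self._buffer.append(value)
        return self._buffer.popleft()

if __name__ == "__main__":
    print(delay_in_samples(0.3, 0.05), delay_in_samples(0.45, 0.05))  # average, worst case
    line = DelayLine(delay_s=0.3, sample_s=0.05)
    print([line.step(k) for k in range(10)])
```

With a 0.05 [s] sampling time, the average delay of 0.3 [s] corresponds to six samples and the worst case of 0.45 [s] to nine, so the controller always acts on a measurement that is several frames old.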

Simulation Studies
The control systems were simulated with a target located away from the center of the image plane, and the performance of the three control schemes was compared. Moreover, to highlight the coupling effects of the gimbal mechanism, the tilt angle was initially set at 40 [deg]. In the first simulation, the target projection is on the horizontal line of the image plane, while in the second simulation the target is located at the bottom-left corner. Ideally, the controllers should follow a linear path in the image plane since it is the shortest way, and the responses in both directions should converge smoothly to zero at the same time with no overshoot. Figures 5 and 6 show these simulation results. In each figure, the responses with the proposed controller (Equation (18)), the image-based pointing controller (Equations (27) and (28)), and the decoupled control system (Equation (29)) are plotted and compared. Although the unit of x and y is (m), in the following figures they are displayed in [pixel], which corresponds to the displayed images in the user interface. Converting from one unit to the other is simply done by multiplication by the pixel dimensions. All the control actions take place from the 5th second of the simulation time.
Both simulations show that all the controllers struggle to follow a linear path in the image plane. The main reason is the combination of time delay and kinematic couplings acting as disturbances on the system. In the first simulation, the initial control signal from the decoupled control scheme is 0, as shown in Figure 5c. Since the decoupled controller cannot anticipate the influence of the coupling at a nonzero tilt angle, it leads to a high peak in the y-coordinate of the projected point, which results in a pronounced curve in the image plane. In contrast, nonzero control signals are generated by the proposed controller and the image-based pointing controller; however, a large overshoot can be seen in the response of the latter (Figure 5b,d). The image-based pointing controller is sensitive to disturbances and uncertainties, while the backstepping technique is well known to be robust against them. Thus, the proposed controller performs well with an almost linear tracking path and smooth transient gimbal motions.
The results of the second simulation provide similar findings, that is, the proposed controller performs the best among the three control schemes. For instance, an overshoot is recorded in the response of the decoupled controller, and non-smooth motions are obtained with the image-based pointing controller. Meanwhile, good step responses in both directions of the image plane are achieved with the proposed controller.

Experiments
The experimental studies were conducted with the 2-axis gimbal system illustrated in Figure 7. In this section, the experimental results of several scenarios are presented. Three studies were conducted in total. The first two were similar to the above-mentioned simulations, and the third scenario was a vision-based tracking in the presence of disturbances. The following figures, Figures 8-10, illustrate their results. The system specifications and the control parameters were addressed in the previous section. The frame-by-frame visualization of the three controllers' tracking performances is shown in Figure 8a. The horizontal green line indicates the horizontal axis, which cuts through the center of the image (the red circle). The position of the tracked target is represented by its bounding box (the red square) on the image frame. Although all the controllers could bring the target's projection to the center of the image (frame 200), the orientation of the pan actuator created a curvilinear path in the image plane resulting in an upward motion at the beginning of the control action (frame 112). The decoupled control performed worst since the LOS projection continued to deviate further away from the horizontal axis. Both tracking paths resulting from the two other controllers were close to the horizontal line, and the performance of the proposed controller remained better. Meanwhile, with the image-based pointing controller, the location of the projection went above and below the horizontal line indicating the overaction of the control system. The projection's paths on the image frame as shown in Figure 8b and the time responses along the Y t -axis direction in Figure 8c prove this remark. Moreover, the final angular positions of the inner gimbal were not the same even though all the image errors were converging (Figure 8e). 
The two vector elements in Equations (4) and (5) were determined by three angular rates of the camera's motion (without the translational motion). Since only two rotations were controllable, there were infinitely many combinations that bring a projected point to the center of the image plane.
Furthermore, Figure 9 shows that in the second experiment the proposed controller provided the smoothest response among the three control schemes. Although its tracking path was curvilinear (Figure 9a), the transient responses were good in both directions (Figure 9b). A slow response in the horizontal direction and an overshoot in the vertical direction were seen in the performance of the decoupled controller. Moreover, the image-based pointing controller generated bumpy actions in the $Y_t$-axis direction despite the good horizontal response. On the other hand, delay systems usually have bad reputations regarding system stability and performance. The delay puts a fundamental limitation on the achievable bandwidth of the system, which is recommended not to exceed $\pi/2T$ or $2/T$, where $T$ is the delay time [34,35]. In the third experiment, the system needed to track a target initially situated at a corner of the image plane from zero initial tilt orientation. Disturbances in experiment 3 were induced by a manual rotation of the gimbal platform. The disturbances were a combination of high-frequency damped sinusoidal oscillations and a continuously increasing rotation. Experimental results in Figure 10 show that all controllers could effectively attenuate the latter, but not the former due to limited bandwidths. In practical scenarios when the gimbal system needs to track a moving target, the unpredicted target motion will result in an additional term in the time derivative of its projection (Equation (4)). This can be considered as a disturbance to the system, for which the proposed controller can still be effective if it is within the controllable bandwidth.
Thus, the LOS follows the motion of the target estimated by the image tracker.
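To put the delay-induced bandwidth limitation into numbers, the quoted rules of thumb can be evaluated for the measured streaming delays. The sketch assumes that the first bound reads π/(2T); the function name is hypothetical.

```python
import math

def bandwidth_bounds(delay_s):
    """Rule-of-thumb upper bounds [rad/s] on the closed-loop bandwidth for a
    loop delay T, read here as pi/(2T) and 2/T (interpretation assumed)."""
    return math.pi / (2.0 * delay_s), 2.0 / delay_s

if __name__ == "__main__":
    for label, delay in (("average", 0.3), ("worst case", 0.45)):
        b1, b2 = bandwidth_bounds(delay)
        print(f"{label}: T = {delay} s -> {b1:.2f} rad/s or {b2:.2f} rad/s")
```

For the average delay of 0.3 [s], both bounds are below about 7 rad/s (roughly 1 Hz), which is consistent with the observation that the high-frequency oscillatory disturbances could not be attenuated.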

Conclusions
In this paper, a novel control scheme was designed and implemented for visual servoing with a two-axis gimbal mechanism. The design procedure was presented systematically, with the couplings in the gimbal kinematics as well as in the imaging geometry addressed as the main difficulties. To overcome these difficulties, the proposed controller computes the required motions from the visual data and then generates the control signals using the backstepping technique. In simulation and experimental studies, the proposed controller was challenged by the non-ideal specifications of the system through different tracking scenarios. Its effectiveness was thereby validated, and it outperformed the other control schemes in the comparison studies. On the other hand, practical limitations of the experimental apparatus were also highlighted.
Visual servoing is still a wide-open research field with the purpose of enhancing the reliability of visual tracking algorithms and system robustness. Moreover, deep learning and 3D imaging are bringing new perspectives and rapid changes to computer vision. Meanwhile, vision-based motion control systems will still be affected by moving targets, disturbances, and delays. For gimbal-based tracking systems, in particular, the effects of the target motion, as well as the disturbances, are complicated because they are highly dependent on the area of application. Therefore, predicting the target motion, attenuating the disturbances, and diminishing the time delay influence are control objectives that require further studies. In future works, the enhancement of the visual servoing controller is necessary to keep up with the developing visual tracking algorithms, whereas the system performance and robustness while tracking moving targets and facing external disturbances should be improved.