Visual Servoing for an Autonomous Hexarotor Using a Neural Network Based PID Controller

In recent years, unmanned aerial vehicles (UAVs) have gained significant attention. However, two major drawbacks arise when working with UAVs: their dynamics are highly nonlinear, and their position in 3D space is unknown, since they are not equipped with on-board sensors that can measure position with respect to a global coordinate system. In this paper, we present a real-time implementation of a servo control that integrates vision sensors with a neural proportional integral derivative (PID) controller, in order to develop a hexarotor image based visual servo (IBVS) control that determines the position of the robot by using a velocity vector as a reference to control the hexarotor position. This integration requires tight coordination between control algorithms, models of the system to be controlled, sensors, hardware and software platforms and well-defined interfaces to allow the real-time implementation, as well as the design of different processing stages with their respective communication architecture. These issues, among others, make real-time implementation a difficult task. To show the effectiveness of the sensor integration and control algorithm in addressing these issues on a highly nonlinear system with noisy sensors such as cameras, experiments were performed on the Asctec Firefly on-board computer, and both simulation and experimental results are reported.


Introduction
The use of Unmanned Aerial Vehicles (UAVs) has increased over the last few decades. UAVs have shown satisfactory flight and navigation capabilities, which are very important in applications such as surveillance, mapping, and search and rescue. The ability to move freely in 3D space is a great advantage over ground vehicles, especially when the robot must travel long distances or move in dangerous environments, as in search and rescue tasks. UAVs commonly have four rotors; however, having more than four gives them a higher lifting capacity. The hexarotor has some advantages over the highly popular quadrotor, such as increased load capacity, higher speed and greater safety, because the two extra rotors allow the UAV to land even if it loses one of its motors. However, the hexarotor is a highly nonlinear and underactuated system, because it has fewer control inputs than degrees of freedom and its Lagrangian dynamics contain feedforward nonlinearities; in other words, some acceleration directions can only be produced by a combination of the actuators.
Unlike ground vehicles, a UAV cannot rely on sensors such as encoders to estimate its position. A good alternative is to use visual information as a reference, because a camera provides a large amount of information while having low power consumption and low weight. Since it is not possible to know the position of a hexarotor with common on-board sensors such as Inertial Measurement Units (IMUs), some works use off-board sensor systems [1][2][3][4][5]; however, this kind of control limits the application to indoor navigation and adds noise and delays because of the communication between the robot and the ground station.
For this reason, visual control of UAVs has been widely studied. Although stereo vision is extensively used in mapping applications [6,7], when it is used in UAV navigation, as in [8,9], it requires 3D reconstruction or optical flow, which are computationally expensive algorithms. In our approach, monocular vision is used, and the feature position error in the image coordinate plane is related to the robot velocity vector that reduces this error [10][11][12][13][14][15]. Consequently, we can set the position of the robot based on the camera information and control its navigation in environments that are not limited to indoor settings [16,17]. Classical Image Based Visual Servo (IBVS) control stabilizes attitude and position separately [18], which is not possible for underactuated systems. In [18], an image based approach is used for an underactuated system, but the depth of the features is approximated.
In [19], a PID controller is implemented on a hexarotor, and quaternions and Euler angles are compared. In [20], the authors propose a visual servoing algorithm combined with a proportional derivative (PD) controller. However, PID approaches are not effective on highly nonlinear systems with model uncertainties such as the hexarotor [21,22]; therefore, another approach is required. In this paper, we propose the use of a Neural Network based PID. The advantage of using neural networks to control nonlinear systems is that the controller inherits the adaptability and learning capabilities of the neural network [23]. This makes the system able to adapt to actuator faults such as loss of effectiveness, which is described in [24], and overcomes disadvantages of the traditional PID [25] in the presence of system uncertainties, communication time-delay, parametric uncertainties, external disturbances, actuator saturation and unmodeled system dynamics, among others. In addition to these issues, integrating servo control algorithms with vision sensors and a neural PID in a real-time implementation is complex: it requires well-designed coordination between all of the elements of the implementation, with different processing stages and their respective communication architecture (software and hardware).
The rest of the paper is structured as follows: Section 2 describes the robot and its dynamics. In Section 3, the visual servo control approach is introduced. Section 4 presents the relationship between the error signals from the visual algorithm and the control signals of the hexarotor. In Section 5, the design of the PID controller and the adjustment of its weights are shown. Sections 6 and 7 present the simulation and experimental results of the proposed approach and its comparison with the conventional PID controller. Finally, the conclusions are given in Section 8.

Hexarotor Dynamic Modeling
The hexarotor consists of six arms connected symmetrically to the central hub. At the end of each arm, a propeller driven by a brushless Direct Current (DC) motor is attached. Each propeller produces an upward thrust and, since the propellers are located away from the center of gravity, differential thrust is used to rotate the hexarotor. In addition, the rotation of each propeller produces a torque in the direction opposite to the rotation of its motor; therefore, there must be two groups of rotors spinning in opposite directions so that the net reaction torque is zero.
The pose of a hexarotor is given by its position $\zeta = [x, y, z]^T$ and its orientation $\eta = [\phi, \theta, \psi]^T$, expressed in the three Euler angles roll, pitch and yaw, respectively. For the sake of simplicity, sin(·) and cos(·) will be abbreviated s· and c·. The transformation from world frame O to body frame (Figure 1) is given by the rotation matrix in (1). The dynamic model of the robot expressed in the body frame in Newton-Euler formalism is obtained as in [26] and stated in (2), where $I$ is the 3 × 3 inertia matrix, $V$ the linear speed vector and $\omega$ the body angular speed.
The equations of motion for the helicopter of Figure 1 can be written as in [27], giving (3), where $\zeta$ is the position vector, $R$ the rotation matrix from the body frame to the world frame, $\dot{R}$ represents the rotation dynamics, $\hat{\omega}$ represents the skew symmetric matrix of $\omega$, $\Omega$ is the rotor speed, $I$ the body inertia, $J_r$ the rotor inertia, $b$ the thrust factor and $\tau$ the torque applied to the body frame due to the rotors. Since we are dealing with a hexarotor, this torque vector differs from the well-known quadrotor torque vector; for a structure like the one in Figure 2, it is given by (4), where $l$ is the distance from the center of gravity of the robot to the rotor. The full dynamic model is given in (5), where $U_1$, $U_2$, $U_3$, $U_4$ and $\Omega$ represent the system inputs; for the hexarotor, they are obtained as in (6), where $d$ is the drag factor.
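As a small illustration of the s·/c· shorthand, the following sketch builds a rotation matrix from roll, pitch and yaw. It assumes the common Z-Y-X (yaw-pitch-roll) Euler convention mapping body-frame vectors to the world frame; the text above does not state which convention its matrix follows, and the function name `rotation_body_to_world` is hypothetical. Under this assumption, the world-to-body transformation is the transpose of the returned matrix.

```python
import numpy as np


def rotation_body_to_world(phi, theta, psi):
    """Rotation matrix for roll phi, pitch theta and yaw psi (radians).

    Assumption: Z-Y-X Euler convention, R = Rz(psi) @ Ry(theta) @ Rx(phi).
    """
    s, c = np.sin, np.cos  # same shorthand as in the text
    return np.array([
        [c(psi) * c(theta),
         c(psi) * s(theta) * s(phi) - s(psi) * c(phi),
         c(psi) * s(theta) * c(phi) + s(psi) * s(phi)],
        [s(psi) * c(theta),
         s(psi) * s(theta) * s(phi) + c(psi) * c(phi),
         s(psi) * s(theta) * c(phi) - c(psi) * s(phi)],
        [-s(theta),
         c(theta) * s(phi),
         c(theta) * c(phi)],
    ])


if __name__ == "__main__":
    R = rotation_body_to_world(0.1, -0.2, 0.3)
    # A rotation matrix is orthonormal, so R @ R.T should be the identity.
    print(np.round(R @ R.T, 6))
```

Because the matrix is orthonormal, inverting it only requires a transpose, which is convenient on an on-board computer.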

Visual Servo Control
In this paper, we use an Image Based Visual Servo control approach in the eye-in-hand configuration: the camera is mounted on the robot, so the movement of the hexarotor induces camera motion [28].
The purpose of the vision based control is to minimize the error
$$e(t) = s(m(t), a) - s^* \tag{7}$$
where $s$ is the vector of captured features, computed as a function of $m(t)$, a vector of 2D point coordinates in the image plane, and $a$, a set of known parameters of the camera (e.g., camera intrinsic parameters). Vector $s^*$ contains the desired values. Since the error $e(t)$ is defined in the image space and the robot moves in 3D space, it is necessary to relate changes in the image features to the hexarotor displacement. The image Jacobian [29] (also known as interaction matrix) captures the relation between features and robot velocities as
$$\dot{s} = L_s \mathbf{v}_c \tag{8}$$
where $\dot{s}$ is the variation of the features position, $L_s$ is the interaction matrix and $\mathbf{v}_c = (v_c, \omega_c)$ denotes the camera translational ($v_c$) and rotational ($\omega_c$) velocities. Considering $\mathbf{v}_c$ as the control input, we can try to ensure an exponential decrease of the error with
$$\mathbf{v}_c = -\lambda L_s^{+} e \tag{9}$$
where $\lambda$ is a positive constant, $L_s^{+} \in \mathbb{R}^{6 \times k}$ is the pseudo-inverse of $L_s$, $k$ is the number of features and $e$ the feature error.
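The control law in (9) can be sketched in a few lines. The example below is a minimal illustration, not the authors' implementation: the function name `ibvs_velocity` and the default gain value are hypothetical, and the interaction matrix is simply passed in as a NumPy array with one row per feature coordinate and six columns.

```python
import numpy as np


def ibvs_velocity(L_s, s, s_star, lam=0.5):
    """Camera velocity command v_c = -lam * pinv(L_s) @ (s - s_star).

    L_s    : (k, 6) interaction matrix
    s      : (k,) current feature vector
    s_star : (k,) desired feature vector
    lam    : positive gain (0.5 is an arbitrary illustrative value)
    """
    e = np.asarray(s, dtype=float) - np.asarray(s_star, dtype=float)
    # Moore-Penrose pseudo-inverse, shape (6, k)
    return -lam * np.linalg.pinv(L_s) @ e


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L = rng.normal(size=(8, 6))   # four points give eight feature rows
    s_star = np.zeros(8)
    s = rng.normal(size=8)
    print(ibvs_velocity(L, s, s_star))
```

When the error is zero the command is zero, and a larger `lam` gives a faster but less damped convergence of the feature error.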
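To calculate the interaction matrix, consider a 3D point $\mathbf{X}$ with coordinates $(X, Y, Z)$ in the camera frame. The projected point in the image plane, $\mathbf{x}$ with coordinates $(x, y)$, is defined as
$$x = \frac{X}{Z} = \frac{u - c_u}{f\alpha}, \qquad y = \frac{Y}{Z} = \frac{v - c_v}{f} \tag{10}$$
where $(u, v)$ are the coordinates of the point in the image space expressed in pixel units, $(c_u, c_v)$ are the coordinates of the principal point, $\alpha$ is the ratio of pixel dimensions and $f$ the focal length. If we differentiate (10), we have
$$\dot{x} = \frac{\dot{X} - x\dot{Z}}{Z}, \qquad \dot{y} = \frac{\dot{Y} - y\dot{Z}}{Z} \tag{11}$$
The relation between a fixed 3D point and the camera spatial velocity is stated as follows:
$$\dot{\mathbf{X}} = -v_c - \omega_c \times \mathbf{X} \tag{12}$$
Then, we can write the derivatives of the 3D coordinates as
$$\dot{X} = -v_x - \omega_y Z + \omega_z Y, \qquad \dot{Y} = -v_y - \omega_z X + \omega_x Z, \qquad \dot{Z} = -v_z - \omega_x Y + \omega_y X \tag{13}$$
Substituting (13) in (11), we can state the pixel coordinates variation as follows:
$$\dot{x} = -\frac{v_x}{Z} + \frac{x v_z}{Z} + xy\,\omega_x - (1 + x^2)\,\omega_y + y\,\omega_z, \qquad \dot{y} = -\frac{v_y}{Z} + \frac{y v_z}{Z} + (1 + y^2)\,\omega_x - xy\,\omega_y - x\,\omega_z \tag{14}$$
which can be written
$$\dot{\mathbf{x}} = L_x \mathbf{v}_c \tag{15}$$
with
$$L_x = \begin{bmatrix} -\frac{1}{Z} & 0 & \frac{x}{Z} & xy & -(1 + x^2) & y \\ 0 & -\frac{1}{Z} & \frac{y}{Z} & 1 + y^2 & -xy & -x \end{bmatrix} \tag{16}$$
where $Z$ is the actual distance from the vision sensor to the feature; for this reason, most IBVS algorithms need to approximate this depth. In our case, we use an RGB-D sensor and this distance is known. To control the six degrees of freedom (DoF), at least three points are necessary [28]. In that particular case, we would have three interaction matrices $L_{x_1}$, $L_{x_2}$, $L_{x_3}$, one for each feature, and the complete interaction matrix is now
$$L_x = \begin{bmatrix} L_{x_1} \\ L_{x_2} \\ L_{x_3} \end{bmatrix} \tag{17}$$
When using three points, there are some configurations for which $L_x$ is singular, and there are four global minima [30]. More precisely, there are four poses for the camera such that $\dot{s} = 0$, and these four poses are impossible to differentiate [31]. With this in mind, it is usual to consider more points [28].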
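The construction of the interaction matrix for point features can be sketched as follows. This is a minimal illustration that assumes the standard point-feature interaction matrix from the visual servoing literature [28], with the velocity ordered as (v_x, v_y, v_z, ω_x, ω_y, ω_z); the function names are hypothetical, and the depth `Z` is taken as known, as with the RGB-D sensor mentioned above.

```python
import numpy as np


def pixel_to_normalized(u, v, cu, cv, f, alpha):
    """Convert pixel coordinates (u, v) to normalized image coordinates (x, y)."""
    return (u - cu) / (f * alpha), (v - cv) / f


def interaction_matrix_point(x, y, Z):
    """2 x 6 interaction matrix of one point at normalized (x, y) and depth Z."""
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x ** 2), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y ** 2, -x * y, -x],
    ])


def stacked_interaction_matrix(points, depths):
    """Stack the per-point matrices; n points give a (2n, 6) matrix."""
    return np.vstack([
        interaction_matrix_point(x, y, Z)
        for (x, y), Z in zip(points, depths)
    ])


if __name__ == "__main__":
    # Four corners of a pattern, all at an illustrative depth of 1.5 m
    pts = [(-0.1, -0.1), (0.1, -0.1), (0.1, 0.1), (-0.1, 0.1)]
    L = stacked_interaction_matrix(pts, [1.5] * 4)
    print(L.shape)
```

With four points the stacked matrix has eight rows, so the pseudo-inverse yields a least-squares velocity rather than an exact solve.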
On the other hand, only one pose achieves $s = s^*$ when using four points. Moreover, either the pseudo-inverse or the transpose of the interaction matrix can be used to solve for $\mathbf{v}_c$ in (15) [32,33].
In this paper, four points are used. In addition, our pattern does not move and, because of the nature of the hexarotor, we assume that the pattern will never appear rotated, since any rotation in roll or pitch produces a translation. In other words, the hexarotor is an underactuated system, which means that some acceleration directions can only be produced by a combination of the actuators. Most of the time, underactuation is due to a lower number of actuators than degrees of freedom of the system; in the hexarotor, however, it arises because no actuator can produce by itself a translational acceleration in the x and y directions. Consequently, this kind of robot cannot be static and tilted at the same time, and the rotational velocities related to roll and pitch in $\mathbf{v}_c$ are 0.

Control of Hexarotor
The hexarotor has four control inputs $U_i$: $U_1$, which represents the translation in the z-axis; $U_2$, which represents the roll torque; $U_3$, the pitch torque; and $U_4$, the yaw torque. The visual algorithm acts as a proportional controller, where $\lambda$ in (9) works as a proportional gain. When combined with the Artificial Neural Network (ANN) based PID, we can adapt not only this proportional gain but also the derivative and integral gains. Since the system is underactuated, we can use the translational velocities $[\dot{x}, \dot{y}]$ computed by IBVS as input roll and pitch torques, and the error will be reduced. This is shown in Figure 3.
Figure 3. Velocity mapping. The output of the IBVS block is the velocity vector, with angular velocity equal to 0 for roll and pitch since we assume the pattern will never be rotated in those angles. The output of the IBVS control block consists of translational velocities $v_x$, $v_y$, $v_z$, which must be mapped to $U_2$, $U_3$ and $U_1$, respectively, in order to be added to the PID-ANN output. These $U_i$ control actions will be translated into translational displacement of the robot.
The velocity mapping block in Figure 3 translates the velocity vector from IBVS into hexarotor inputs, i.e., $v_x$ = roll, $v_y$ = thrust, $v_z$ = pitch and $\omega_\psi$ = yaw. In our case, $\omega_\phi = 0$ and $\omega_\theta = 0$, since we assume the pattern will never be rotated because the system is underactuated.

Neural Network Based PID
Consider the control loop with unity feedback shown in Figure 4. The conventional digital Proportional-Integral-Derivative (PID) controller with unity feedback is described in [34], and the control law is given by
$$U(z) = \left[K_P + \frac{K_I}{1 - z^{-1}} + K_D\left(1 - z^{-1}\right)\right] E(z) \tag{18}$$
where $E(z)$ is the error, calculated as the difference between the reference signal and the system output, $R(z) - Y(z)$. The terms $K_P$, $K_I$ and $K_D$ are the proportional, integral and derivative gains, respectively. These gains are related as follows:
$$K_P = K - \frac{KT}{2T_i}, \qquad K_I = \frac{KT}{T_i}, \qquad K_D = \frac{KT_d}{T} \tag{19}$$
where $K$ is the gain, $T$ is the sample time, $T_i$ is the integration time and $T_d$ the derivative time.
Applying the inverse Z-transform to (18), the PID sequence $u(k)$ is given by
$$u(k) = u(k-1) + K_P\left[e(k) - e(k-1)\right] + K_I\, e(k) + K_D\left[e(k) - 2e(k-1) + e(k-2)\right] \tag{20}$$
Although the conventional PID is widely used to control these vehicles due to its simplicity and performance, it is not intended to control highly nonlinear systems, such as hexarotors.
In order to handle these nonlinearities, an ANN based PID controller is used. The purpose of the ANN is not only to deal with the nonlinear system, but also to adjust the PID controller gains so that it can also handle uncertainties in the model. The topology of the PID-ANN used is shown in Figure 5. There is one PID-ANN module for every DoF. Here, $e(k)$ represents the error in that DoF, $e_i(k)$ is a vector that contains the error, the derivative of the error and the integral of the error, and $\eta_i$ is the learning rate, which must be heuristically selected to be small enough to avoid saturation of the neuron and is not necessarily the same for every gain (proportional, derivative and integral). The Euclidean norm block normalizes the neuron weights $w_i$ in order to avoid divergence, since the neuron weights are the PID controller gains. The output of the neuron is the control input $u(k)$, which will be translated into thrust ($U_1$), roll ($U_2$), pitch ($U_3$) or yaw ($U_4$), depending on the state error at the input of the ANN.
From Figure 5, the vector $e_i(k)$ collects the error terms that multiply the proportional, integral and derivative gains. They are defined as follows:
$$e_1(k) = e(k) - e(k-1), \qquad e_2(k) = e(k), \qquad e_3(k) = e(k) - 2e(k-1) + e(k-2) \tag{21}$$
Accordingly, the control law of the conventional PID can be rewritten as
$$u(k) = u(k-1) + K_P\, e_1(k) + K_I\, e_2(k) + K_D\, e_3(k) \tag{22}$$
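The rewritten control law u(k) = u(k − 1) + K_P e_1(k) + K_I e_2(k) + K_D e_3(k) can be sketched as a small stateful class. The sketch assumes the usual incremental (velocity-form) definitions e_1(k) = e(k) − e(k − 1), e_2(k) = e(k) and e_3(k) = e(k) − 2e(k − 1) + e(k − 2), with past errors and past output initialized to zero; the class name `IncrementalPID` is hypothetical.

```python
class IncrementalPID:
    """Incremental digital PID: u(k) = u(k-1) + Kp*e1 + Ki*e2 + Kd*e3."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.e_prev = 0.0    # e(k-1), assumed zero at start
        self.e_prev2 = 0.0   # e(k-2), assumed zero at start
        self.u_prev = 0.0    # u(k-1), assumed zero at start

    def step(self, e):
        """Return the control input u(k) for the current error e(k)."""
        e1 = e - self.e_prev                         # first difference
        e2 = e                                       # current error
        e3 = e - 2.0 * self.e_prev + self.e_prev2    # second difference
        u = self.u_prev + self.kp * e1 + self.ki * e2 + self.kd * e3
        # Shift the stored history for the next sample.
        self.e_prev2, self.e_prev, self.u_prev = self.e_prev, e, u
        return u


if __name__ == "__main__":
    pid = IncrementalPID(kp=1.0, ki=0.1, kd=0.05)  # illustrative gains
    for err in (1.0, 0.8, 0.5, 0.2):
        print(pid.step(err))
```

The incremental form only stores two past errors and one past output, which is also the structure that the single-neuron controller reuses for its inputs.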
The neuron input is defined in (23), where the vector $w_i(k)$ represents the weights of the network, which are incremented by
$$\Delta w_i = \eta_i\, e_i(k)\, e(k)\, u(k) \tag{24}$$
with learning factor $\eta_i$. The new value of $w_i(k)$ is given by (25), and the Euclidean norm is used to limit the values of $w_i(k)$ as in (26). The activation function of the neuron is the hyperbolic tangent; therefore, the output is given by (27), where $A$ is a gain factor that scales the maximum value of the activation function, whose range is [−1, 1], and $b$ is a scalar to avoid saturation of the neuron. The control law of the ANN based PID is expressed in (28), and there is one PID-ANN module for every $U_i$ to control in (6). Each $U_i$ carries the information of the rotor speed combination necessary to achieve a specific rotation, i.e., if the robot needs to move in more than one direction at the same time, more than one $U_i$ will be different from zero.
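The single-neuron controller described above can be sketched as follows. Only the weight increment (24), the Euclidean normalization and the hyperbolic tangent activation come from the text; everything else is an assumption made for illustration: the neuron input is taken as the weighted sum of the three error terms, the scalar b multiplies that sum inside the tanh, the neuron output is used directly as u(k), and the class name `NeuronPID` and its default values are hypothetical.

```python
import numpy as np


class NeuronPID:
    """Single-neuron PID sketch with normalized, adaptive weights."""

    def __init__(self, eta, w0=(0.1, 0.1, 0.1), A=1.0, b=1.0):
        self.eta = np.array(eta, dtype=float)   # one learning rate per gain
        self.w = np.array(w0, dtype=float)      # weights act as PID gains
        self.w /= np.linalg.norm(self.w)        # start with unit norm
        self.A, self.b = A, b
        self.e_prev = 0.0                       # e(k-1)
        self.e_prev2 = 0.0                      # e(k-2)

    def step(self, e):
        """Return u(k) for error e(k), then adapt the weights."""
        # Error terms, assumed to follow the incremental PID definitions.
        e_vec = np.array([e - self.e_prev,
                          e,
                          e - 2.0 * self.e_prev + self.e_prev2])
        v = self.w @ e_vec                      # assumed neuron input
        u = self.A * np.tanh(self.b * v)        # bounded output in [-A, A]
        # Weight increment from (24): dw_i = eta_i * e_i(k) * e(k) * u(k)
        self.w = self.w + self.eta * e_vec * e * u
        norm = np.linalg.norm(self.w)
        if norm > 0.0:
            self.w = self.w / norm              # Euclidean normalization
        self.e_prev2, self.e_prev = self.e_prev, e
        return u


if __name__ == "__main__":
    ctrl = NeuronPID(eta=(0.01, 0.01, 0.01), A=2.0)
    for err in (1.0, 0.8, 0.5, 0.2, 0.0):
        print(ctrl.step(err), ctrl.w)
```

Normalizing after every update keeps the weight vector on the unit sphere, so the adaptation changes the ratio between the three gains while the factor A sets the overall control authority.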

Simulation Results
Simulations are implemented in Matlab (Matlab R2016a, The MathWorks Inc., Natick, MA, USA) using the Robotics Toolbox [35]. For the visual servo algorithm, four points are used. In the first experiment, the robot starts on the ground and has to reach a certain position given by these 2D points. To test the algorithms, we simulate system uncertainty by changing two parameters separately, in two simulations.
In the first simulation, the mass of the robot is increased by 50% at t = 10 s. It can be seen that the conventional PID controller is unable to keep the position. The results are shown in Figure 6.
In the second simulation, the mass of the system remains constant, but the moment of inertia $I_x$ is increased from $I_x = 0.0820$ to $I_x = 0.550$. Figure 7 shows the results of using the conventional PID when the moment of inertia is increased. The results show that the control input is excessively high for the robot (Figure 7b), making the system unable to follow the reference (Figure 7c).
In the following simulations, the PID-ANN controls the system under the same conditions. The mass is increased at t = 10 s. It can be seen in Figure 8 that the controller adapts to this mass increment and keeps the reference.
Finally, Figure 9 shows the results of changing the moment of inertia $I_x$ while the mass remains constant. In contrast with the conventional PID, the PID-ANN is able to keep its position.

Experimental Results
The hexarotor used in the experiments is the Asctec Firefly (Ascending Technologies, Krailling, Germany). The actual configuration of the experiment is shown in Figure 10. The vision sensor has been changed: we use the Intel RealSense R200 camera (Intel, Santa Clara, CA, USA), with RGB and infrared depth sensing features and an indoor range from 0.4 m to 2.8 m. This change in the vision sensor modifies the robot mass and moment of inertia. This uncertainty can be absorbed by the neural network.
Vision information is highly noisy and has a high computational cost, even when working with low resolution images (in this case, 640 × 480). The more time the algorithm spends in the image capture and processing phase, the larger the error between what the robot sees and its actual position. Computer vision algorithms such as optical flow approaches require tracking a set of n features using some kind of descriptor. Other approaches use stereo vision, but that requires a 3D reconstruction. We propose to use only four points to reduce this time.
It is important to note that coordination between the vision sensors, the neural network and the system model, which work at different processing stages and communicate through their respective architectures, is crucial to achieve real-time implementation. A QR code is used, and we track it with the ZBar bar code reader library. This pattern is chosen because of its robustness to rotations and illumination changes. The algorithms are implemented on the on-board computer of the hexarotor. In a first experiment, the moment of inertia and mass of the system are changed, and a previously tuned conventional PID controller is compared with the proposed algorithm. Figure 11 shows the results when the pattern is fixed at a certain position and the hexarotor is hovering. As can be seen in Figure 11b, the conventional controller cannot stabilize the system at a fixed position when the model changes. Table 1 shows the Root Mean Square Error (RMSE) and the Average Absolute Deviation (AAD) in pixel units. The pair $(x_i, y_i)$ is the location of feature $i$ ($i = 1, 2, 3, 4$) in image coordinates.
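For reference, the two metrics named above can be computed from a logged feature trajectory as in the sketch below. The definitions are assumptions, since the text does not spell them out: RMSE is taken with respect to the desired pixel coordinate, and AAD is taken as the mean absolute deviation of the samples from their own mean. The function names and the sample values are hypothetical.

```python
import numpy as np


def rmse(values, reference):
    """Root mean square error of a pixel trajectory w.r.t. a reference value."""
    err = np.asarray(values, dtype=float) - reference
    return float(np.sqrt(np.mean(err ** 2)))


def aad(values):
    """Average absolute deviation of the samples around their mean (assumed definition)."""
    v = np.asarray(values, dtype=float)
    return float(np.mean(np.abs(v - v.mean())))


if __name__ == "__main__":
    # Illustrative x-coordinate of one feature (pixels) and its desired value
    x1 = [318.0, 322.0, 325.0, 319.0, 321.0]
    print(rmse(x1, reference=320.0), aad(x1))
```

Applying both functions to each coordinate $x_i$, $y_i$ of the four features gives one row per controller in a table such as Table 1.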
In Figure 11b, the solid increasing lines represent the x position of the four features in image coordinates (pixel units). As can be seen, if a conventional PID is not correctly tuned for this specific system, its position diverges. On the other hand, when the system is controlled by the PID-ANN, its position does not diverge (Figure 11d), even though the controller has not been previously tuned. Having shown the effectiveness of the PID-ANN over the PID controller, we repeat the experiment with a moving QR pattern. As shown in Figure 12, the hexarotor does not lose sight of the objective.

Conclusions
In this paper, we propose a Neural Network based PID controller with visual feedback to control a hexarotor. The hexarotor is equipped with an RGB-D sensor that allows for estimating the feature error, and this error is used to compute the camera velocities. The proposed approach is able to deal with delays due to image processing, system uncertainties, noise and changes in the model, since the ANN is continuously adapting the PID gains. In contrast with conventional PID controllers, which must be tuned for a specific system, the ANN can deal with nonlinearities and changes in the system.

Acknowledgments:
The authors thank the support of CONACYT Mexico, through Projects CB256769 and CB258068 (Project supported by Fondo Sectorial de Investigacion para la Educacion).
Author Contributions: All of the experiments reported in this paper were designed and performed by Javier Gomez-Avila and Carlos Villaseñor. The data and results presented in this work were analyzed and validated by Carlos Lopez-Franco, Alma Y. Alanis and Nancy Arana-Daniel. All authors are credited for their contribution to the writing and editing of the presented manuscript.

Conflicts of Interest:
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Abbreviations
The following abbreviations are used in this manuscript: