Neural PD Controller for an Unmanned Aerial Vehicle Trained with Extended Kalman Filter

: Flying robots have gained great interest because of their numerous applications. For this reason, the control of Unmanned Aerial Vehicles (UAVs) is one of the most important challenges in mobile robotics. These kinds of robots are commonly controlled with Proportional-Integral-Derivative (PID) controllers; however, traditional linear controllers have limitations when controlling highly nonlinear and uncertain systems such as UAVs. In this paper, a control scheme for the pose of a quadrotor is presented. The scheme presented has the behavior of a PD controller and it is based on a Multilayer Perceptron trained with an Extended Kalman Filter. The Neural Network is trained online in order to ensure adaptation to changes in the presence of dynamics and uncertainties. The control scheme is tested in real time experiments in order to show its effectiveness


Introduction
Unmanned Aerial Vehicles (UAVs) have become very popular thanks to recent progress in propulsion technologies and small sensors with low power consumption [1]. These kinds of vehicles surpass other types of robotics platforms in both military [2] and civilian applications [3], including surveillance, agriculture, traffic monitoring, fire detection and high social impact activities, such as search and rescue in disaster zones. Figure 1 shows the classification of several types of aerial vehicles based on [4]. This paper is focused on Vertical Take-Off and Landing (VTOL) vehicles, such as multirotors.  VTOL vehicles are more cost-efficient than bigger drones like Medium Altitude Long Endurance (MALE) or High Altitude Long Endurance (HALE) and present the ability to hover above a reference position [5]. In contrast, multirotors have limited energy consumption and payload; consequently, light

Related Work
Multirotors are commonly controlled by PID [21]. PID controllers have been widely used in industry, mainly because of the trade-off between efficiency and simplicity [22], and numerous offline tuning techniques have been reported in the literature [19]. The main problem with conventional approaches is that physical systems present parametric changes and uncertainties, and they are perturbed by external disturbances; consequently, an online, continuous tuning approach is needed.
Another common approach is the Linear Quadratic Regulator (LQR) for attitude stabilization [23][24][25]. While this may be applicable for some configurations, multirotors are non-linear systems that present uncertainties, such as unknown physical parameters, actuator degradation, unknown delays in-process communication and unmodeled dynamics [26]. Hence, an approach considering these characteristics must be used [27,28]. Nonlinear control techniques provide better performance [29], and one of the most applied methods is feedback linearization, which relies on cancellation of all nonlinearities to convert the non-linear system into a system with linear dynamics [30][31][32]. In [33][34][35], the authors use a backstepping control, which is designed to stabilize the whole system based on the Lyapunov stability theory, showing good results. However, backstepping requires the analytic calculation of the partial derivatives of the stabilizing functions, which becomes impractical as the order of the system grows [36], and in general, most of the nonlinear techniques require complete knowledge of the nonlinearities present in the plant and are vulnerable to modeling errors or parametric uncertainty [29]. In this paper, an adaptive controller based on an Artificial Neural Network (ANN) is proposed. The ANN has been used to control complex nonlinear systems [9,[37][38][39]. The controller used in this scheme has the same simple structure and easy implementation of a PD controller but with the adaptability and learning capabilities of a neural network [9]. The system will be able to adapt to actuator faults, such as loss of effectiveness [40] and solve the principal disadvantages of traditional PID [41].
On the other hand, there is no onboard sensor to read the absolute position of the UAV. There are two common solutions to solve this; the first one consists of an external motion detection system, which has to be mounted, thereby limiting the applications to known indoor environments.
The second approach is based on solving the SLAM problem. For this, different sensors can be used, such as laser range scanners [42], stereo vision systems [43,44], RGB-Depth sensors [45][46][47] and monocular cameras [48][49][50]. In this work, a monocular vision system is preferred because of its lower power consumption and compact size, compared to the amount of information it delivers. The advantage is that the range of a camera is virtually unlimited [6], making it possible flying in both small and large environments. Despite the advantages of monocular vision, it is impossible to determine the scale of the environment using only one view, and it is necessary to fuse this information with inertial information provided by an Inertial Measurement Unit (IMU). To solve this, Parallel Tracking and Mapping (PTAM) for robot localization (introduced in [6,51,52]) will be implemented.

Multirotor Dynamic Model
The multirotor used for this work was the quadcopter. There are two principal configurations for quadcopters; in this case, the configuration selected is described in Figure 2. The robotic platform where the algorithm was tested is the Parrot AR.Drone 2.0 R quadrotor. Multirotor systems have an even number of rotors divided into two groups rotating in opposite directions. The configuration of the selected platform is depicted in Figure 2. For this specific configuration, where the robot structure does not match with the x and y axis from the body frame ( Figure 3), the movement of the vehicle is given by the following combination of rotor actions: increasing or decreasing the speed of the four rotors in the same proportion changes the altitude of the system. Then, because the multirotor is an underactuated system, there is no actuator that generates movement in x and y directions directly; instead, these displacements are achieved by changes in the attitude due to combinations of the two pairs of propellers: the rotation in y axis (pitch θ) results in translational movement in x; contrarily, a rotation in x (roll φ) results in translational movement in y. Similarly, the orientation (yaw ψ) needs a combination of the four propellers and it is the result of the difference of the counter-torque between both pairs. In general, the center of mass is considered to be at the origin of the body fixed frame. The quadrotor orientation in space is given by a rotation matrix R ∈ SO(3) from the quadrotor B to the inertial frame E. As in [4], the dynamics of the quadrotor may be expressed as follows where I is the inertia matrix, V is the body linear speed vector and ω is the angular speed. The quadrotor equations of motion can be expressed aṡ where the gravity g is acting on z axis (e 3 ),ζ is the linear velocity vector, the rotation matrix is represented by R,ω is the skew symmetric matrix of the vector of angular velocity and Ω i represents the speed of rotor i. I and J r are the body and the rotor inertia respectively. m is the mass of the system, b is the thrust factor, l is the distance from the center of mass to the rotor (it is assumed the same distance to all rotors) and τ a represents the torque applied to the quadrotor. For this configuration, τ a is denoted as follows with a drag factor d.
The quadrotor dynamics, as described in [4], are given bÿ where Ω represents the gyroscopic effects induced by the propellers, which are usually neglected [53], and they are given by U i are the system inputs: U 1 is related to its translation and U 2 to U 4 are related to its attitude and orientation. The relation between both subsystems can be seen in Figure 4. The inputs U i for this specific configuration of multirotor are given by and they are calculated by the neural network. In this work, a Multilayer Perceptron trained with the Extended Kalman Filter is selected.

MLP Trained with the EKF
The architecture of an ANN was inspired by biological neurons. Like synaptic connection in a regular biological neuron, each node in an ANN receives a signal from an adjacent node and its output is computed by some nonlinear function of the sum of the inputs. These connections between adjacent neurons have weights that adjusts as the network learns [19].
The training process of the MLP can be seen as an estimation problem for a nonlinear system that can be solved with the EKF [54][55][56]. Neural networks trained with the EKF have demonstrated faster learning speeds and convergence times than networks trained with algorithms based on backpropagation, which is ideal for real time applications [57,58]. The objective is to find the optimal weights that minimize the prediction error [59].
Consider an MLP with L weights and m output nodes. The neural network can be modeled as follows where w(k) is the state vector, u(k) is the input vector,ŷ is the output vector and h is the nonlinear output function. From Kalman Filter equations, it is known that where L is the total number of weights, η is the learning rate, P(k) ∈ L×L and P(k + 1) ∈ L×L are the prediction error covariance matrices in k and k + 1. K(k) ∈ L×m is the Kalman gain matrix. y ∈ m is the system output, andŷ is the network output. Q ∈ L×L is the process noise covariance matrix, and R ∈ m×m represents the measurement covariance error. w ∈ L is the weights (state) vector, and H ∈ m×L contains the partial derivatives of each output of the neural networkŷ with respect to each weight w j and it is defined as Consider the MLP in Figure 5 with one hidden layer and one node at the output layer, where p denotes the number of inputs to the network and h the number of nodes in the hidden layer. The output of the neural network is defined by Finally, vector H can be expressed as Let us define a new variable γ as follows then, the vector H for the MLP shown in Figure 5 with sigmoid activation functions for the hidden layers and linear function for the output node can be expressed as follows The network designed for this work has two inputs which are the error and the derivative of the error between the desired pose and the pose of the system which is calculated using monocular visual odometry.

Monocular Visual Odometry
Multirotors are equipped with inertial sensors which are capable of measuring the attitude and orientation of the system. Unfortunately, most of the quadrotors do not have any onboard sensors to measure position in indoor environments; in contrast, position is usually measured by a GPS sensor which has an error between 1 and 5 meters depending on the quality of the GPS signal.
This approach is based on Parallel Tracking and Mapping (PTAM) algorithm [60] to solve the localization problem. The algorithm is named this way because tracking and mapping are separated and run in parallel; it creates a keyframe based map using Bundle Adjustment (BA). This map is initialized from a stereo pair, and new points are initialized with epipolar searches.
It is well known that the scale of the environment cannot be determined using only monocular vision [52]. Once the map is initialized, the visual map is rotated such that the xy plane corresponds to the horizontal plane according to the accelerometer, and it is scaled with an average keypoint depth of 1. During the tracking, the scale of the map λ ∈ R must be estimated as in [52].
The quadrotor measures in regular intervals, the distance traveled according to the visual SLAM x i ∈ R d , and uses the metric sensors available, denoted by y i ∈ R d . To each interval, a pair (x i , y i ) is given, where x i is scaled according to the visual map and y i is in metric units. Both x i and y i measure motion of the UAV, and they are related by x i ≈ λy i . If Gaussian noise in the measurements is assumed, then a set of sample pairs {(x 1 , y 1 ) · · · (x n , y n )} is given with where µ 1 · · · µ i ∈ R 2 are the unknown distances, σ 2 x , σ 2 y ∈ R + are the variances of the measurements error and I is the identity matrix of dimension d. To estimate the unknown parameters, maximum-likelihood estimation is used, minimizing the negative log-likelihood.
The global minimum of (23) is unique; its derivation can be found in [52,61], and it is stated as follows λ * = s xx − s yy + sign(s xy ) (s xx − s yy ) 2 + 4s 2 xy 2σ −1 x σ y s xy (25) with Assuming σ 2 x and σ 2 y are known, (25) gives a closed solution for the estimation of λ. For the estimation of the measurement variances, the authors refer to [52]. To generate the sample pairs, for each visual altitude measurement a v (t i ) ∈ R there is a corresponding metric altitude measurement a m (t i ) ∈ R over a window of sensor measurements, and then a visual and metric distance traveled within a period is computed as follows.
For visual odometry, EKF is used to identify and reject falsely tracked frames and compensate for time delays [6]. The filter includes the motion model of the quadrotor; the state space of the EKF is given by where (x, y, z) is the position of the quadrotor; (ẋ,ẏ,ż) its velocity; (Φ, Θ, Ψ) are the roll, pitch and yaw angle respectively (in degrees); andΨ is the yaw rotational speed. For each sensor, an observation function is defined as below and the respective observation vector, derived from sensor measurements is defined as shown where v x , t and v y , t are velocities in xy plane directly measured by the quadrotor. The velocity in z direction can be computed since the altitude h t is given by an ultrasonic or an air pressure sensor and δ t is the time from time step t and t + 1. Equally, for rotation readings, roll Φ and pitch Θ are directly measured and the yaw rotation speed can be calculated since Ψ is given by the IMU. If PTAM tracks a video frame, the pose estimation is scaled with λ and transformed to the coordinate system of the quadrotor, leading to direct observation of its pose as follows.
where E C,t is the estimated camera pose scaled with λ, E DC is the rigid transformation from the camera to the quadrotor coordinate system and f is the function that maps from SE(3) to roll-pitch-yaw representation. The Kalman filter predicts how the state vector x t evolves to the next time step. The horizontal acceleration is proportional to the horizontal force and is given by where F acc is the drag force and F acc denotes the accelerating force which is proportional to the projection of the z axis of the quadrotor onto the horizontal plane, which leads tö where coefficients c 1 and c 2 are estimated from data collected from test flights. The influence of the control inputs u t = Φ ,Θ,z,¯Ψ is described bẏ where coefficients (c 3 , · · · , c 8 ) are estimated from flight tests. The overall state transition is given by For a complete derivation of the state transition and delay compensation for a quickly reacting system to avoid oscillations and unstable behavior, the authors refer to [52].

Quadrotor Control Scheme
In this section, the control scheme is described. An MLP is used with an architecture as shown in Figure 5. The ANN has two inputs: the error between the desired value and the output measured from the system, and the derivative of this error to get information about the rate of change of the process output. PTAM estimates the positional observations (x, y, z), and the yaw angle ψ can be directly measured by the IMU.
The network has one hidden layer with four nodes and one neuron at the output layer, and there are four MLP modules, one for each controllable Degree of Freedom ( Figure 6). The weights are randomly selected and uniformly distributed between [−1, 1]. The outputs of the four MLPs represent the control inputs U i from (6). Despite all Degrees of Freedom being internally coupled, the advantage of using the MLP modules separately (one for every DoF) is an easier implementation and interpretation of the control scheme behavior.

Results
The platform selected to test the algorithm was the Parrot Ar.Drone 2.0, which is a quadrotor equipped with a 3-axis gyroscope, 3-axis accelerometer, a magnetic compass, an ultrasound altimeter, one camera looking in x direction and another camera looking downward. For these experiments, the camera looking forward was used; it has a field of view of 92 • and a resolution of 640 × 360. Its video is streamed at 30 fps to the ground station. Sensor measurements of the Ar.Drone 2.0 can be sent to the ground station at a frequency of 200 Hz; however, due to integration of the ANN training, control, localization and mapping, this transmission rate is decreased to 30 Hz in the following experiments.

Simulation Results
Simulation experiments were carried out in Linux Ubuntu with Gazebo, which is an open-source simulator with easy integration with a Robot Operating System (ROS). The world scene used for the test and the camera view are shown in Figure 7. In the first experiment, a trajectory in the xy plane is selected to train weights for both directions at the same time. For each degree, all the weights of an MLP module with ten nodes in the hidden layer are selected. As can be seen in Figure 8, the overshoot is reduced and eventually eliminated after some iterations. After some iterations of x and y training, the z axis is added and a new test is performed training the three Degrees of Freedom. Again, the z axis MLP module is initialized with random weights. The results of these tests are shown in Figure 9. In Figure 10 the PTAM output can be seen. It can be seen that the overshoot in z is reduced after some iterations, and eventually, the path is followed correctly. Table 1 shows the Root Mean Squared Error (RMSE) for each axis.   Figure 10. x, y and z axes are denoted red, green and white respectively. It can be seen that the quadrotor follows the reference correctly.

Experimental Results
The experimental tests were carried out in indoor unknown environment; however, since the localization of the system is estimated using only onboard sensors, it can also navigate in outdoor conditions. It is preferable that the scene includes sufficient objects (features) to be seen by the camera in a range of 0-7 m. The weights were initialized with the last value of the simulation. Since the Ar.Drone 2.0 driver for ROS was used, the parameters should be close to the real system. The data were received at 30 Hz (the time interval of each iteration was 1/30 s). Figure 11 shows the camera view and mapping output of the experiment. In Figure 12 are the results of controlling the quadrotor using the MLP using online training. As shown, oscillations and overshot were eliminated.  For the yaw angle ψ, in contrast with the other three modules, the weights were initialized randomly. The fourth MLP module was trained online to control the yaw angle, and the results are shown in Figure 13. As can be seen, the neural network adapts its weights to follow the reference.
For a second experiment, a mass is added to the system during the flight in order to test its adaptability; the mass has a value of 45 g, which represents, approximately, 1/3 of the maximum payload of the Ar.Drone 2.0. The results of this experiment are shown in Figure 14. As can be seen, at iteration 750 approximately, the mass is added on the x axis of the quadrotor. After some iterations, the oscillations are eliminated.

Conclusions
In this paper, a direct control approach for a quadrotor was presented. The controller was based in a Multilayer Perceptron trained with the Extended Kalman Filter. Each MLP module consists of two inputs, ten nodes in the hidden layer and one output, which is the control action. Each controllable degree of freedom required an MLP module that was trained online to adapt its control law to uncertainties, loss of rotor efficiency, time delays and changes in the dynamic model of the system. The results were first validated via simulation using the Ar.Drone 2.0 ROS driver with Gazebo simulator. The neural network was initialized with random weights for x, y and z axes, and once the overshoot and oscillations were reduced, the weights were used to initialize the MLP modules in the real system. In the simulation, every degree of freedom was separately trained. For experimental tests, x, y, and z were initialized with the weights calculated in simulation, and all of them continued their training online. As can be seen in Figure 12, even using the simulation results, the neural networks must continue the training, since they present oscillations around the reference location. After some iteration steps, the oscillations were eliminated. For the experimental tests on the yaw angle (ψ), the weights were initialized randomly; that is, without previous training in a simulation. Figure 13 can be seen as the erratic behavior at the beginning of the experiment, and it is shown that the MLP adapt its weights to control the yaw of the quadrotor. Finally, in Figure 14, it has been shown that the controller adapts its weights when changing the dynamics during the flight.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: