UAV Payload Transportation via RTDP Based Optimized Velocity Proﬁles

: This paper explores the application of a real-time dynamic programming (RTDP) algorithm to transport a payload using a multi-rotor unmanned aerial vehicle (UAV) in order to optimize journey time and energy consumption. The RTDP algorithm is developed by discretizing the journey into distance interval horizons and applying the RTDP sweep to the current horizon to get the optimal velocity decision. RTDP sweep requires the current state of the UAV to generate the next best velocity decision. To the best of the authors knowledge, this is the ﬁrst time that such real-time optimization algorithm is applied to multi-rotor based transportation. The algorithm was ﬁrst tested in simulations and then experiments were performed. The results show the effectiveness and applicability of the proposed algorithm.


Introduction
Unmanned aerial vehicles (UAVs) have been widely used by researchers, security and law enforcement agencies, search and rescue operators, firefighters, farmers, filmmakers, photographers and delivery companies.UAVs can perform tasks in confined spaces or in hazardous environments.UAVs are now evolving from just being a sensor to air-borne manipulators.Researchers have started to focus on attaching manipulators to the UAVs, which enables them to perform tasks such as opening a valve, and picking and placing objects.Other than manipulation, researchers claim that UAVs, specifically small UAVs, have the potential to significantly improve the research in remote sensing [1].Recently, a Canadian firm has started drone based payload delivery services [2]; another firm started coffee delivery via drones in Australia [3].Recently, a drone was tested for its ability to transport organs for organ transplants [4].UAVs are used to transport deform-able linear objects such as hose and goods [5,6].The Swiss post used UAVs to transport medical samples, which helped them to reduce a 45 min car journey to a few minutes of flight [6].A recent study showed that UAVs delivery might even help with reducing greenhouse emissions caused by the freight industry [7].
Motivated by the potential of UAV based payload transportation, we have investigated the possibility of applying a real-time multi-criteria optimization approach.Continuous improvement of the computational capabilities and reduction of the size of computational platforms have allowed for installing them on-board small UAVs.Therefore, dynamic programming based optimization techniques can now be applied for controlling UAVs.Specifically, the contribution of this paper is the development, numerical and hardware testing of a real-time dynamic programming algorithm for achieving velocity optimized energy efficient aerial transportation using a multi-rotor platform.The optimization algorithm can be applied in real-time scenarios, and it considers the aerodynamic influences.

Relevant Work
Energy consumption in multi-rotors is critical since they are powered by batteries that have limited endurance.The major power consumption results from the motors that are rotating the propellers to generate thrust that keeps the UAV airborne.The electrical energy consumed by the motors depends on the thrust requirements, and also includes the electrical losses due to heat and friction and overall propulsion system efficiency.These include losses of motors and electronic speed controllers.Although several studies have focused on improving battery charging methods for multi-rotors such as wireless charging [8,9], still, on average, the flight time of multi-rotors after every charge is around 20 to 30 min [10], which limits the applications of the multi-rotors.Other causes of energy consumption include the autopilot or any companion computer attached to the aerial platform, sensors such as a camera for visual servoing or communication links.The lateral motion of hex-rotors also deals with parasitic drags, which results in a higher requirement of rotor torque that results in more energy consumption.The energy constraints in aerial transportation can be addressed via two approaches.One approach is the design stage of the aerial transportation.The second stage is the energy savings by the efficient planning of the operation.Energy savings in the design stage can be achieved in various ways, including reducing the amount of weight carried by the UAV.Batteries account for up to 50% of total UAV mass for small UAVs [11], the addition of further payload mass drastically affects the already burdened energy budget.Flying with an optimum mass can maximize the endurance of the UAV [12].In some cases, an aerial platform could utilize a hybrid design such as [13] that travels on the ground when the flight is not necessary to save energy.
The second energy saving approach is related to efficient motion planning.It is possible in these cases to choose a minimum energy consumption path from multiple available paths, generated by a path planner [14] for a single UAV.The method developed by [14] performs an offline energy efficient path planning based on Dijkstra's Algorithm.
A paper by [15] showed that energy consumption increases with forward velocity after a first decrease, when compared to the energy consumption in a hover condition.Another study by [16] evaluated the direct relation between energy efficiency and the speed of a multi-rotor system.It was argued and proved by [16] that a hex-rotors system have to consume energy to continuously support their weight and minimizing the time in the air could result in overall energy savings.Experiments showed that there was a 29% difference between the hex-rotor transportation with two different speeds.A novel path-following controller was presented by [16] in which the speed of the rotor-craft is a dynamic profile that varies with the geometric requirements of the desired path.
Minimizing time in the air to save energy is significant in case of moderate speeds up to 10 m/s where parasitic drag can be ignored [17].However, for higher speeds and for bigger hex-rotors, this parasitic drag can be significant and would lead to more energy consumption.In some cases, a cage surrounds the multi-rotor, which also adds to the parasitic drag [18].It can be inferred from previous studies that energy efficiency can also be increased by incorporating the aerodynamic factors and making velocity decisions that result in less overall energy consumption.It is also possible to generate a minimum energy consumption path for a single UAV as performed by [10].The proposed method in [10] takes the brush-less DC motor model into account and finds the optimal path by using a predefined initial and final configuration of the multi-rotor.However, the method proposed by [10] was performed offline and it does not take into account the uncertainties present between actuation and expected motion.Furthermore, a lack of feedback in the proposed strategy makes it unfeasible for real-time multi-rotor transportation.
This research study presents an optimization algorithm for a platform's velocity based on a real-time dynamic programming algorithm (RTDP).The RTDP algorithm is developed utilizing the original dynamic programming principle presented by [19].The developed optimization algorithm is inspired by a real-time dynamic programming introduced in [20] for the optimization of velocity to minimize energy consumption in automobiles.To the best of the authors' knowledge, RTDP based energy optimization for multi-rotors energy savings has not been attempted before.
In Section 2, we will describe the fundamentals of the RTDP algorithm used for energy optimization.After that, Section 3 will discuss the experimental methodology used in this study, and results of the numerical experiments will be presented and discussed in Section 3.7, followed by software in the loop simulation results in Section 4. Finally, hardware experiments are also discussed in Section 5 followed by conclusions and references.

Algorithm Description
The optimization algorithm is based on minimization of a cost function as described in Equations ( 1) and (2).The optimization algorithm is applied in two steps as shown in Figure 1.The first step is applied on a pre-defined horizon length, while not considering the end of the journey.The cost function for first step with a non-finite stage is described in Equation ( 1): where J N is the cumulative cost at the last stage, V is the reference velocity of the UAV and decision variable, J k is the cost at a particular time step k which is a function of the decision variable V, J e (x k , u k , z k ) is the energy cost of a particular step k, and is further explained in Section 2.1, is the journey time cost of the step k and λ is the weighing factor as described further in Section 2.1; g(x N ) is the terminal cost.The terminal cost is used to approximate the cost incurred after the horizon up to the end of the journey at an unknown distance, required to minimize the impact on future horizons of decisions made for the current horizon.The second step is when the aerial vehicle enters the last horizon; during this step, the end of the journey is considered.The main difference between the first step and the second step is that, in the former, the cost function includes a terminal cost, while the second step defines a final state of the system.The second step of the optimization is activated when the vehicle enters the last horizon.This step is called the Dynamic programming (DP) algorithm with the finite stage.In this step, the DP algorithm is applied at the start of each stage, but the number of stages is reduced for each DP computation, based on the proximity of the vehicle to the goal position.The final state is predefined and, in this case, it is considered to be a velocity state equal to zero.Equation (2) describes the second step of the optimization algorithm: ( As described in [21], apart from the minimization of cumulative cost function, we also need to describe the system in a discrete time model.The overall model is discretized such that it can be written as Equation (3): where k = 0, 1, 2, 3.....N − 1 is the time index showing the stages.Here, x k+1 represents the current state of the system and f k is the transition relationship, as a function of the current state x k and u k , the control decisions which are defined further below.x k is the state vector defined as where V is the velocity of the payload.u k is the decision variable defined as where V 1 is the reference velocity of the UAVs in the transport direction.As a simple case, this optimization is performed for the transportation from point A to B having the same altitude.The disturbance can be defined as which is based on wind speed and ρ is the air density at the transportation altitude.
In order to apply dynamic programming in real time, the problem is divided into horizons.If the DP sweep is performed on the complete journey from point A to point B, then the computational costs would increase, rendering the method to be incapable of working in real time.The application of DP sweep on the horizon length reduced the computational time, therefore the DP sweep is performed frequently.The DP sweep is performed every time the vehicle moves a predefined distance step.Each horizon includes a pre-defined set of distance intervals.The current horizon is the one that includes the current state of the agent.The application of the DP algorithm can be seen in Figure 1.It can be seen in Figure 1 that there are two types of DP sweeps performed: one type of DP sweep with terminal costs and the other type without terminal costs.The second step of DP sweep is required because the final state of the UAV is in hovering mode in order to drop the payload at goal position.
During the start of the journey, the DP sweep with terminal costs is performed as follows.The transition costs are calculated for all stages of current horizon only.Starting with the origin or taking the current state of the agent, the transition costs of all possible states will be calculated and stored for the first stage.Now, for the second stage, at each node, the cumulative transition costs of reaching this node from all states of the first stage will be calculated.This process will be repeated from k = 0 to k = N − 1 until the terminal node.After reaching the terminal node of the current horizon, the transition cost to the terminal node will include the terminal cost as shown in Figure 1 and explained in Equation (1).Starting from the terminal node k = N with minimum cost, the optimal solution will be traced back to node k = 0.After obtaining the optimal policies, control actions will be immediately utilized until the agent reaches the next stage.When the agent reaches the next stage, we will take the current state of the agent and consider it as the origin and run the algorithm again until the end of the rolling horizon.After reaching the terminal node of the current horizon, we will again trace back the solution and immediately implement the optimal policies.The process will be repeated until the vehicle enters the last horizon.
When the vehicle enters the last horizon, the DP sweep is performed following the cost function Equation (2) and as shown in the end part of the journey, as shown in Figure 1.In the last horizon, cumulative transition costs will be calculated while considering the final state, which is pre-defined.Hence, the terminal cost will not be calculated.After each stage, the number of stages will be reduced, until the vehicle reaches its destination.

Definition of Cost Function
The cost function is defined based on two parameters: energy consumption and time spent.A parameter λ is defined as the weighting factor to select the preference between the time spent to reach the goal or the energy consumption to reach the goal.If λ is close to one, the energy consumption is given a priority and, if λ is close to zero, then minimization of time spent is of higher priority.In order to make these factors comparable to each other, a normalization is required.The combined cost function can be written as follows: where J e and J t are the normalized energy and journey time cost functions, respectively.

Normalization of Time Spent
The time spent to reach the goal position can be optimized by minimizing the following cost function at each backup operation, when the algorithm is going through the options within the search space.In order to make it compatible with the other part of the cost function, the time cost function is normalized and made dimensionless using J t = T a T n , where T a is the time spent by following an option from search space and T n is the normalization factor to make it compatible with the power consumption factor in the combined cost function.This normalization factor is carefully selected to make sure that the normalized time spent is less than one.This allows for comparison between the energy and time.This dimensionless component of the cost function reduces the time spent to reach a goal position because of the cost function's minimization.

Normalization of Energy Consumption
Using the weighting factor, the cost function for the energy can be written as J e = E c E n , where E c is the energy consumed during the transition between two states, whereas E n is the normalization factor used to make it compatible with the time factor.The factor E n is selected such that the normalized energy consumption factor turns out to be less than one.

Terminal Cost
The terminal cost is calculated during first step of the optimization.The terminal cost is calculated during DP computation; when the cost function of the last node is calculated, the terminal cost is considered a multiple of the last transition cost to reach the last node.Since a DP sweep is repeatedly performed at each distance step for the length of the horizon, the terminal cost penalizes the decisions that can benefit the current horizon at the cost of the future horizons.An example of such is the reduction of velocity significantly at the end of current horizon, which would result in high acceleration in the beginning of the next horizon.

Model of the Evaluated System
The system consists of a hex-rotor as shown in Figure 2. The UAV model was simplified by assuming that the UAV structure is rigid and CoG (center of gravity) and geometric centre is at origin B of the body frame and coincide.Let P = [x, y, z] T be the position vector of the center of mass of the hex-rotor relative to the fixed inertial frame ε = [X , Y , Z ] T .The orientation of the hex-rotor is expressed in Euler angles as Φ = [φ, θ, ψ] T , where φ is the roll angle about the x-axis, θ is the pitch angle about the y-axis, and ψ is the yaw angle about the z-axis of the hex-rotor UAV.Six rotors attached to identical brush-less DC motors are rotating with a speed ω N .The following equations best describe the translational and rotational dynamics model used in this paper for the hex-rotor UAVs [22]: where m = M + m p , which is sum of mass of the platform M and mass of the payload m p , g is the gravitational acceleration and ẍ, ÿ, z are the translational accelerations of the hex-rotor UAV in x-, yand z-axes.T is the total thrust produced by all motors calculated as ), k t and k b are the torque and thrust coefficients, The rotational inertia of the UAV is expressed as (I x , I y , I z ).The model detailed above is suitable for evaluating the optimization algorithm as shown in Section 3, but it cannot be used for cost function calculation because of the high computational costs.If we ignore complex maneuvers and only point A to point B transportation, we can then ignore the energy consumption variation during attitude changes.Therefore, only the Equations ( 8)-( 10) describing the translational dynamics of the multi-rotor are used to develop a reduced model.The optimization algorithm developed considers the transportation from point A to point B without changing altitude or yaw.Keeping yaw angle fixed, and assuming a constant altitude allows the cost function calculations to be simple; further studies, however, will be done without these assumptions.
Therefore, in this case of no vertical motion, z = 0.The total thrust required to stay airborne would be T = mg (cos θ cos φ) .If the yaw angle is fixed to zero and T from the above equation is substituted in equations of motion, the multi-rotor equations of motion can be reduced to Equations ( 14) and (15).
Equations ( 14) and ( 15) can be used to find the roll and pitch angles that can then be used to find the required thrust.The equations must be discretized in order to calculate the thrust required, which in turn is needed to calculate the power consumption as described in Section 2.2.2.A step by step description of power consumption calculation is shown in Algorithm 1 Algorithm 1: Calculation of average power consumption for a state transition Result: Power consumption The distance based translational dynamics model can be discretized using the forward or backward Euler method.The equations for the forward Euler approach are described as below:

Motor Energy Consumption
A thrust measurement stand is used to obtain the rotor speed, thrust and energy consumption plots for the DJI E310 propulsion system, which includes 2312 motors with the DJI 9450 rotors.All values obtained were compared with the values given by [23] for the same propulsion system.A power law based empirical equation between the rotor speed and the power consumption is obtained using curve fitting as shown in Equation ( 20): where P N is the power consumed by rotor N of the UAV, and ω iN is the speed of the rotor in rad/s.The speed of the rotors is found using the relation where k b = 9.85 × 10 −6 .The energy consumption during a particular step can be calculated by which can be written in the discretized form as follows: where t is the time spent in that distance interval, P elec = ∑ N 1 P N and E s is the energy consumed during that distance interval.

Calculation of Step Travel Time
The time spent for that particular distance step can be calculated using the following equation: Equation ( 23) is continuous and we need to discretize this equation in order to evaluate the time consumed in that particular step:

Parameter Selection
There are several parameters that have to be determined before executing the optimization algorithm.These parameters include: the horizon length; the distance interval; and the velocity interval.These parameters can influence the complexity of the algorithm and the computational time for one DP search space sweep.The complexity of the algorithm O as shown in Equation (25), and calculated and plotted in Figure 3a, depends on the number of distance intervals N s between the start and goal position, and the number of velocity intervals N v : The complexity of the algorithm increases by increasing the number of velocity and altitude intervals.The selection of distance size and the velocity interval also depends on the hex-rotor's ability to reach that velocity in a particular distance step.Sample computation times and number of computations were plotted in Figure 3.The analysis in Figure 3 is based on distance interval of 10 m and velocity interval ranging from 0.25 m/s to 2 m/s.The computation is done using a 2.4 GHz computer with 8 GB RAM.An open-source autopilot hardware "pixhawk" is used with "PX4" firmware.The position controller in this system works at a rate of 100 Hz, which means it takes 0.01 s for one position control cycle.Currently, the computation time for one DP sweep for only the x-axis takes between 0.002 s to 0.03 s depending on the number of velocity intervals.If we consider matching the frequency of position controller of PX4 with this DP sweep, we can consider a 0.5 m/s velocity interval.Ideally, if the DP sweep is to replace the position controller, we should try to optimize the computation time so that it is equal to or below 100 Hz.This could be achieved with very small horizon lengths.

Distance Interval Selection
An ideal distance interval would have constant acceleration for the application of discretized translational dynamics equation.The change in velocity states is performed via PID velocity controller.Hence, the ideal case of constant acceleration can only be approximated to the response of velocity controller, if the distance interval is such that the time spent in this distance interval is equal to or greater than the rise time (90% of steady state) of the velocity response.In cases where the velocity change in the two states is such that, for that distance interval, it will take a time equal to the rise time to reach the desired velocity, then the assumption of constant acceleration can be applied.If the required velocity change in the two states will take less time than the time vehicle travels a particular distance step, it would result in the vehicle accelerating and then coasting in one distance step.This simply means that, in such a case, a constant acceleration assumption cannot be applied.In such a case, the transition cost must be calculated in two steps.The first step is to use the rise time to calculate the transition cost and the second step is to calculate the transition cost from rise time to the end of the distance interval.In other words, in this case, the distance interval is divided into two parts: the first part with an acceleration and the second part without any acceleration.In cases where the time required to travel the distance interval and reach the desired velocity is greater than the rise time, this desired velocity is then not feasible and should be discarded from the search-space.The distance interval should have a value high enough that for the minimum velocity intervals it has possible states to explore.

DP Sweep Trigger
Each time the UAV wants to travel a distance equal to the distance interval, the DP algorithm is activated.In order to achieve this, the following equation is used to convert the position vector into a periodic function ς = sin (P + ), where P is the position(currently considering only x), and is used to select the distance interval value to trigger the DP algorithm.The values of ς range from −1 to 1, and whenever the value ς crosses zero, the DP algorithm is triggered.
When the DP sweep is not triggered, the last set-point velocity is continuously being sent to the drone.This is important since, apart from sending the velocity commands for the transport direction, the algorithm is also continuously sending the velocity commands for vertical and lateral position control.The lowest allowable publish rate for position controller is 2 Hz since the firmware used in autopilot of software in loop simulations and hardware experiments triggers the fail-safe mode if the velocity commands are publishing below the threshold frequency.

DP Sweep Sample Plots
Two sample DP sweeps are plotted in Figure 4.These plots are developed using the DP code with the following specifications: the DP sweep consists of only one horizon length with 10 distance intervals.The velocity states ranged from 0 m/s to 14 m/s.In Figure 4, the weighted cost function is represented by colored lines with varying thickness.The cost function values were normalized and, based on the cost function value, the thickness of the line was given.Figure 4 shows the cost function values distribution for all possible states.We can see that the cost function has a cumulative increasing effect since the first stages show thin blue lines and the last stages show thick and red lines.After doing this search space sweep, the DP algorithm traces back the the optimal path by starting from the last transition costs and selects the minimum cost all the way back to the first node.

Numerical Simulation Experiments
The algorithm was tested on a complete hex-rotor model developed in MATLAB Simulink.The purpose of developing a simulation tool is two-fold: first to asses the proposed algorithm, secondly, the simulation tool can also be used to asses the mission completion feasibility prior to the actual flight or software in loop simulations.It also assists in deciding the parameters of the algorithm, which otherwise would be very difficult to do with the real flight.The Simulink model consisted of a hex-rotor model, control block and optimization algorithm block as shown in Figure 5.The hex-rotor model uses Equations ( 8)-( 13) as described in Section 2.2 previously in detail.The optimization block consists of the real-time dynamic programming optimization algorithm.Several factors affect the energy consumption during an aerial transport using a UAV.Some of these factors include the drag caused by wind speed and the density of air.Wind speed and air density are known to vary with elevation.Air density decreases at higher altitudes and wind speed increases with elevation.Several blocks were added to the hex-rotor model, which include a power consumption estimation block, atmospheric density calculation block, wind speed estimation block, and the payload model block.The interactions of all these blocks and information inflow and outflows from these blocks are explained in Figure 5.The controller block, atmospheric density model, wind speed model, and payload model are described further below in Sections 3.1-3.4,respectively.The inertial parameters of the DJI-F550 UAVs were found by using the developed CAD model and a CAD software was used to find the inertial parameters.The mass of the UAV was calculated after taking measurements from a weight scale.The altitude, attitude and position controllers were added to the Simulink model and tuned.All parameters used in the simulation are presented in Table 1.The controller gains mentioned in Table 1 were obtained by tuning the system manually.

Controller Block
The controller block consists of a position controller, which generates velocity set-points and a velocity controller, which generates the attitude set-points, the attitude controller, generates the required motor speeds.

Position Controller
The position controller is simply a proportional controller, which generates the velocity set-points to maintain the desired position: V sp = e p K pp , where K pp is the proportional gain for position controller and e p = P des − P current .

Velocity Controller
The velocity controller is a PID controller, which generates the attitude set-points: where K pv is the proportional gain for velocity controller, and K iv is the integral gain for velocity controller, K dv is the derivative gain for velocity controller, and

Atmospheric Density Model
The atmospheric density model is based on a US standard atmosphere [24].It is implemented in this study using an international standard atmosphere block, which takes the altitude as an input and provides the density as an output.

Wind Speed Model
Wind speed is an important factor which can affect the power consumption of a multi-rotor during the flight.Off-line energy assessment would require the measurements of wind speed.Wind speed measurements are usually available online from the meteorological department or can be obtained via an on-board wind speed sensor, and it is measured at a certain altitude.However, by using a power law, wind speed can be estimated at the required altitude, provided that we have measurements of wind speed at the near ground altitude.The power law is explained as Equation ( 26) and is obtained from [25]: where V w is the estimated wind velocity at the elevation z, and v g is the wind velocity at near ground elevation z g .α is called the Hellman exponent, whose values can be found using Table 2.

Payload Model
Payload is currently considered as a block with a surface area A p , and hence a drag co-efficient CD, and a mass m p .The drag force acting on the payload is calculated by using Equation ( 27): where F Dp is the vector representing the drag forces acting on the payload, C Dp is the co-efficient of drag for payload, A p is the payload surface area exposed, V v is the velocity of the hex-rotor, V w is the wind speed and ρ a is the density of air at an altitude a.

Thrust Irregularity in Forward Flight
The thrust is generally modeled as T = k b ω 2 , which is called the thrust model at hover.This model, however, does not accurately predict the thrust and motor speed relation during high speed forward flights as it is observed in [26,27].It was also concluded in [28] that, for the same rotor speed, the thrust decreases greatly at higher angles of attack and higher speeds.Another experimental study by [29] showed that the total lift produced by various multi-rotors decreases by decreasing pitch angles from 0 • to −40 • when subject to 6 m/s wind speed.In another study [30], a controller is proposed to be able to compensate for thrust irregularity during forward flights.This thrust deficit in high speed forward flight means that the propellers have to rotate faster than usual to produce the thrust required.
This simply means that more energy is consumed during forward flight to maintain the same thrust.In [30], this thrust irregularity is modeled using Equations ( 28)- (30).
The induced air velocity due to rotation of rotors is shown in Equation ( 28) where v h is the air velocity caused by the rotation of propellers in hovering conditions, T h is the thrust in hovering condition, ρ is air density and A is the propeller sweep area: where T is the actual thrust produced, v i is the induced air velocity from the propellers caused by forward flight.v α is the upstream air velocity and β is the attitude angle.The induced velocity v i of the air from propellers can be calculated using the following equation as [31] The above formulation was applied on results from [29] and was found to be matching with experimental results.
The effects of forward velocity and the pitch angle can be further seen via plotting the ratio T T h from Equation ( 29) in Figure 6.Forward velocities v α ranging from 0-12 m/s and pitch angles β ranging from 0-40 • were plugged in Equation ( 30) along with the hover air velocity v h as calculated from Equation ( 28) to calculate induced velocity v i .The induced velocity v i , pitch angle β, forward velocity v α and air velocity in hover v h were plugged in Equation ( 29) to obtain the ratio T T h for each pitch angle value and forward velocity and plotted in Figure 6.At very low pitch angles and higher velocities, the ratio T T h remains above one.However, at higher pitch angles and higher velocities, the ratio stays below one.This simply shows that, for the same power, the thrust produced in the hovering condition will be higher than the thrust produced in forward velocity.The energy consumption calculation in the numerical model accounts for this thrust deficit in high speed forward flight using the method described in Figure 7.

Assumptions
The attitude dynamics of the UAV is decoupled with the payload, therefore any drag force acting on the payload will not effect the attitude angle of the UAV.This assumption is supported by using a multi degree of freedom based free joints arm in software in the loop simulations.In hardware experiments, the payload attachment point can be a ball joint or a small cable; this will decouple the attitude dynamics of the multi-rotor with payload.We do not model the thrust irregularity happening at higher pitch angles and higher forward velocities in the dynamic model.Therefore, a thrust irregularity compensator is not required as it is suggested by [30,32].The energy consumption calculation is, however, performed while considering the thrust irregularity in forward flight as discussed in Section 3.5.We also assume that the attitude, altitude and velocity controller of the UAV is robust enough to deal with extreme conditions (high velocities, high attitude angles).The velocity controller is not a model based controller as it is shown in [32], therefore drag force compensation during high speed forward flight is not required.The RTDP algorithm receives the current state of the UAV and provides a velocity decision.This velocity decision can be implemented using the controller presented in this paper or more calibrated ones such as [33].We assume that the UAV is capable of lifting a cube with a frontal area of 0.12 m 2 .Energy loses during fast maneuvers are not considered.These losses include the braking of the motors to change the motor speeds to achieve differential thrust required for attitude change.This assumption can be supported by the fact that frequent attitude angle changes are not happening between point A to point B during the transport.Side wind is neglected in this analysis.Based on the lateral frame drag, it is necessary for the multi-rotor to balance the drag force to be able to maintain the lateral position.This drag can also influence the energy consumption in addition to the drag caused due to forward flight.A sudden change in air density is out of the scope of the current paper.The sudden change in air density can occur if there is a dust devil vortex or a thermal plume in the path.The horizontal length of such thermal plumes is considerably less than the journey length of our simulations.In real applications, the air density should be updated based on the elevation of the UAV.

Numerical Simulation Results
A numerical simulation test was conducted where the UAV was carrying a cubic payload of 0.12 m 2 surface area and negligible weight from [0, 0, 10 m] to [250 m, 0, 10].The initial velocity at point of origin [0, 0, 10 m] and at goal position [250 m, 0, 10] were 0 m/s.The velocity interval of 0.1 m/s was considered.Distance intervals of 1, 2, 3, 4 m were considered, whereas the horizon considered consisted of 10 distance interval steps.The DP sweep was triggered at every distance interval step.Three test cases with values of λ = (0.2, 0.5, 0.7) were performed.The results of the test cases are shown in Tables 3-6.These results show the percentage of energy savings when the λ is increased from 0.2 to 0.7.Table 5 shows that the journey time decreases when we decrease the λ from 0.7 to 0.2.Another test was performed with an increment of velocity interval of 0.5 m/s and the difference in energy saving was found to be less than 1%.Extended simulations were performed with distance interval ranging from 1 m-4 m for a velocity interval of 0.1 m/s.The results of the extended simulations are tabulated in Tables 3, 4 and 6.The results of the numerical simulation are also presented in Figure 8.These results include the transport trajectory, the velocity profiles, the pitch angles, the opposing drag forces, and the energy consumption profiles.The plots include the numerical simulation results for a value of λ = (0.2, 0.5, 0.7).The velocity profile for λ = 0.2 shows the highest value, which results in more drag forces and thus requires higher pitch angles to maintain the velocity.The higher pitch angles and higher forward velocities supplement the increase of thrust deficit, which means that more power is consumed.Therefore, although λ = 0.2 results in the fastest transportation, it results in more power consumption.However, for λ = 0.7, the velocity is lower, which results in lower drag forces, which in turns results in a lower pitch angle.The lower pitch angle and smaller forward velocities result in a reduction of thrust deficit, hence the total energy consumption during this transportation is low although the mission took more time to complete.

Software in the Loop Simulations
Examples of Software in the loop (SITL) simulations in literature are [34][35][36][37].More accurate simulation frameworks are also being developed [38] for multi-UAV simulations.Software in the loop simulations (SITL) are performed to verify the functionality of the code before performing the hardware experiments.In general, our SITL setup can be explained by the following Figure 9.Using the Table 1, the DJI F-550 drone was modeled in Gazebo as shown in Figure 10, and the same PX4 flight stack was simulated as it is used in the real autopilot.The ROS, MAVROS package, PX4 flight stack simulator, Gazebo simulator and the RTDP algorithm were running on the same Linux computer.The computer has a 2.4 GHz processor and 32 GB RAM.All communication between MAVROS, Gazebo and gripper node were performed via the local ROS MASTER.
The RTDP algorithm was communicating with the flight stack through ROS Master using the MAVROS package.The flight stack was communicating with the DJI F550 drone in the Gazebo simulator.It is a widely accepted practice to test new algorithms in SITL simulations before testing them on real hardware.The autopilot firmware is developed by hundreds of researchers and it is available online with the source code.The RTDP algorithm developed can use MAVROS via the robotics toolbox of MATLAB to be able to communicate with the autopilot.A multi-link arm was fixed at the base of the DJI-F550 model as shown in Figure 10.All the joints in the multi-link arm were free and were mimicking the behavior of the cable.A gripper ROS node was running which can create and delete links between the payload and the gripper attached at the end of the multi-link arm.

SITL Experiments
Similar parameters were used in the SITL simulation experiments as discussed in Section 3.7.RTDP algorithm was developed in Simulink, which was running on the same laptop where the Gazebo simulator was running.A velocity interval of 0.1 m/s was used.Similar to the autopilot, the simulated flight stack also records the flight data including the actuator signals, attitude, velocity and position estimates.The autopilot logs were used to plot the performance plots.The publish rates of the topics during the aerial transport in SITL simulations were observed using an ROSTOPIC tool and were plotted in Figure 11.A delay in publishing of the topic can trigger the fail-safe activation for off-board failure.The frequency of publishing is very high when the DP sweep is not triggered.When the DP sweep is triggered, we expect the publishing frequency to go down.The lowest publishing frequency is 4.8 Hz.The results of the simulations are plotted in Figure 12.The numerical simulation results and software in the loop simulation results are similar.In both cases, at λ = 0.2, the DP sweeps resulted in high velocities, which created higher parasitic drag.This resulted in higher pitch angles, which lead to high power consumption.The drone behavior was similar for numerical simulations and SITL simulations for values of λ = 0.5, 0.7.This shows that the algorithm developed is compatible with the autopilots' firmware and it satisfies the constraints imposed by the parameters of the firmware.These parameters include the multi-rotor constraints such as max velocity, max altitude, fail-safe mode activation conditions, etc.It also proves the applicability and usefulness of the algorithm.

Real-Time Flight Experiments
The system hardware as shown in Figure 13 consists of an external computer, and a DJI-F550 with Pixhawk autopilot with PX4 firmware for low level control.DJI-F550 also had an on-board companion computer, which was connected to the autopilot via MAVLINK.A cable-based gripper inspired by Gough-Stewart platform was developed and attached to the Hex-rotors.The gripper is a payload attachment mechanism that allows for automation of attachment and release of the payload.
The electro-permanent magnet was activated and deactivated by using an Arduino Nano connected to the companion computer via USB link.The gripper can be activated and deactivated during autonomous operation, using an ROS node.The electro-permanent magnet required power for activation and deactivation only; therefore, it was powered by the same battery that was powering the drone.A current and voltage sensor called Elogger V4 is connected in series between the batteries and the ESCs of the UAVs while bypassing the power supply of the onboard computer as shown in Figure 13.Elogger was also used by [39,40] for logging the current and voltages of a UAV during the flight.The cable material was chosen to withstand the weight of the payload requirements.The software for the experiment is based on ROS interface MAVROS.An Optitrack system is used for position estimation.The algorithm is written in MATLAB Simulink and it communicates to the drone via MAVROS through ROS MASTER.Further details of experimental procedure are provided in "Supplementary Materials".The videos of all experiments are available at [41].

Baseline Experiment
An initial baseline experiment was performed, where the Drone01 was kept in hover at fixed altitude set-point.There was no payload attached to the Drone01.The drone01 is the same drone that was modeled in SITL and Simulink simulations.The power consumption was recorded and compared with the SITL and Simulink model hover power consumption.The results are plotted in Figure 14.It can be seen in Figure 14 that power consumption in SITL and Simulink models at hovering condition matches with the hardware experiments.However, there are minor discrepancies during the takeoff, which can be ignored for this study since the optimization is performed for transportation from point A to point B.  The experiment showed that the drone01 moved from point A to point B successfully.The numerical simulation results and experimental results are in close agreement.This proves the compatibility and experimental applicability of the algorithm.
After comparing the simulation and experiments for λ = 0.7, two more experiments were performed to compare λ = 0.7 with λ = 0.5, 0.2.The horizon length, distance interval, start point and goal point were kept the same and fixed.The results of the experiments are plotted in Figure 16 showing the comparison of velocity profiles and trajectories between λ = 0.7, 0.5, 0.2.As expected, the drone01 reaches the destination in the order λ = 0.2, 0.5, 0.7, which is also reflected in the velocity profiles.This test was performed in the lab environment where, due to space constraints and safety limitations, high speed flights were not achieved.However, these tests confirmed that the algorithm is capable of transporting the drone01 from point A to point B successfully, while manipulating the journey time.Minimization of journey time is useful in energy savings when the opposing drag forces are not significant, which is the case in these lab experiments.

Variable Goal Position Experiment
The real-time DP algorithm's effectiveness in changing the velocity profile based on goal position change is tested in the lab environment.The results of the simulation and experiments for the variable goal position are plotted in Figure 17.In this simulation and experimental test, the goal position is initially provided as 2.5 m, whereas, after 3.5 s, the goal location is changed to 5 m.This is reflected in the velocity profile and the position-time profile.The x-trajectory of the drone shows a change at 3.5 s.The velocity profile also shows a sudden change between 3 and 4 s.It shows that the velocity profile was first selected for a goal position at 3 m, but then, because of the goal position change, the velocity profile was also changed.The experiments and the numerical simulations are in good agreement.

Conclusions
In this research study, a real-time dynamic programming algorithm is proposed for achieving energy efficient motion for aerial UAV transportation.The proposed algorithm is based on minimization of cost function.The cost function is the weighted sum of the energy cost and journey time cost.A factor λ is used to assign weight-age to favor either the time of the journey or energy consumption.After the description of the algorithm, a numerical model was developed and explained, which is used for the testing of the algorithm.After testing in the numerical model, the algorithm was also tested in Software-in-the-loop Gazebo simulation environment.The lab scale experiments were also conducted to validate the simulation results.After that, low speed flight tests were conducted for λ = 0.2, 0.5, 0.7, which showed that, by changing λ to a lower value, we can reduce the journey time for low speed flights.The real-time decision-making of the algorithm was tested by changing the goal position mid-flight in another experiment.The algorithm is capable of altering the journey time or energy spent based on the provided value of λ.Due to the definition of λ, it was found that increasing the value of λ decreases the energy consumption, while the journey time increases.The future works include expansion of cost function to include the 3D motion.The cost function will be modified to remove current assumptions such as assumption of constant altitude.After that, we plan to expand the current work to dual UAV payload transportation to test the scalability of the developed algorithms.

Figure 1 .
Figure 1.Process flow diagram; first, the multi-rotor takes off and reaches the transportation altitude, the optimization algorithm then sweeps through search-space in the rolling horizon to find an optimum policy.The multi-rotor moves with this optimum policy for this distance step.This repeats for every distance step, until the final horizon is activated.During the final horizon, at each computation, the number of stages is reduced and the final state is given.

Figure 2 .
Figure 2. Multi-rotor schematic, with body frame and inertial frame.

2. 2
.1.Discretized Model Since the stage corresponds to the distance travelled by the agent, it is necessary to convert the translational dynamics equations from time domain to the distance domain.Chain rule can be used to perform such conversion.The chain rule d ẋ dt = dx dt d ẋ dx was applied to obtain the translational dynamics equation as follows.In order to reflect the change from time domain to distance domain, the index k in the cost function definition is replaced by index s in Equations (1)-(6):

Figure 3 .
Figure 3.The effects of a decrease in velocity interval that results in an exponential increase in (a) number of computations O and (b) computation time.

Figure 4 .
Figure 4.A Dynamic programming (DP) sweep with velocity interval 2 m/s and horizon length 10 m, initial velocity 2 m/s and final velocity 0 m/s.Width of the lines and intensity in redness shows a high value of combined cost function (a) λ = 0.5; (b) λ = 0.9.

Figure 5 .
Figure 5. Description of the complete system for simulation tests of the algorithm, including the controller block, the hex-rotor block, and the optimization algorithm block.

Figure 6 .
Figure 6.The variation of T T h with respect to the pitch angle and the forward velocity is shown using the color-bar.

Figure 7 .
Figure 7. Corrected power consumption calculation based on thrust irregularity.

Figure 8 .
Figure 8. Flight plots for numerical simulations performed for λ = (0.2, 0.5, 0.7).(a) trajectory of the UAV; (b) velocity of the UAV; (c) pitch angle of the UAV; (d) opposing drag force of the UAV; (e) energy consumption of the UAV.

Figure 9 .
Figure 9. Software in the loop simulation architecture.

Figure 10 .
Figure 10.Software in the loop Gazebo DJI F550 model along with the description of the manipulator; (a) DJI-F550 model in Gazebo simulator with PX4 flight stack simulation; (b) the multi-link free joints based arm.

Figure 11 .Figure 12 .
Figure 11.Publish rates and time as obtained by the ROSTOPIC tool with a sample of two consecutive messages.(a) publish rate of the velocity command topic; (b) minimum and maximum time between two commands.

Figure 13 .
Figure 13.A description of the hardware used in the testing of the RTDP algorithm.(a) software architecture of experimental setup; (b) DJI-F550 drone used in the experiments; (c) power source and route for various components of the drone.

Figure 14 .
Figure 14.Hover power consumption in real hardware and the SITL and Simulink experiments.

5. 2 .Figure 15 .
Figure 15.Lab scale RTDP based aerial transportation.(a) drone01 trajectory in simulation and experiment; (b) drone01 velocity profiles in experiments and simulations; (c) drone01 energy profiles in experiments and simulations; (d) drone01 trajectory in simulation and experiments also showing DP sweep with terminal costs in first horizons and DP sweep without terminal cost in the last horizon.

Figure 17 .
Figure 17.Lab scale RTDP based aerial transportation while the goal position changed at 3.5 s.(a) drone01 trajectory in simulation and experiment; (b) drone01 velocity profiles in experiment and simulations; (c) drone01 trajectory during experiment and numerical simulation, initial and final drop location are shown.

Table 1 .
List of all the parameters for DJI F-550 drone model.

Table 2 .
Hellman exponents for different locations

Table 3 .
Numerical simulation results with distance interval 1 m.

Table 4 .
Numerical simulation results with distance interval 2 m.

Table 5 .
Numerical simulation results with distance interval 3.12 m.

Table 6 .
Numerical simulation results with distance interval 4 m.