Stochastic Drift Counteraction Optimal Control of a Fuel Cell-Powered Small Unmanned Aerial Vehicle

: This paper investigates optimal power management of a fuel cell hybrid small unmanned aerial vehicle (sUAV) from the perspective of endurance (time of ﬂight) maximization in a stochastic environment. Stochastic drift counteraction optimal control is exploited to obtain an optimal policy for power management that coordinates the operation of the fuel cell and battery to maximize the expected ﬂight time while accounting for the limits on the rate of change of fuel cell power output and the orientation dependence of fuel cell efﬁciency. The proposed power management strategy accounts for known statistics in transitions of propeller power and climb angle during the mission, but does not require the exact preview of their time histories. The optimal control policy is generated ofﬂine using value iterations implemented in Cython , demonstrating an order of magnitude speedup as compared to MATLAB . It is also shown that the value iterations can be further sped up using a discount factor, but at the cost of decreased performance. Simulation results for a 1.5 kg sUAV are reported that illustrate the optimal coordination between the fuel cell and the battery during aircraft maneuvers, including a turnpike in the battery state of charge ( SOC ) trajectory. As the fuel cell is not able to support fast changes in power output, the optimal policy is shown to charge the battery to the turnpike value if starting from a low initial SOC value. If starting from a high SOC value, the battery energy is used till a turnpike value of the SOC is reached with further discharge delayed to later in the ﬂight. For the speciﬁc scenarios and simulated sUAV parameters considered, the results indicate the capability of up to 2.7 h of ﬂight time. investigation, J.Z., I.K., and M.R.A.; resources, J.Z., I.K., and M.R.A.; data curation, J.Z., I.K., and M.R.A.; writing—original draft preparation, J.Z., I.K., and M.R.A.; writing—review and editing, I.K. and M.R.A.; visualization, J.Z.; supervision, I.K.; project administration, I.K. All authors


Introduction
With the growing market for unmanned aerial vehicles (UAVs), a wide range of industries and organizations, including military, government, industrial, and recreational users, deploy this technology across the globe [1][2][3]. Among different types of UAVs, small unmanned aerial vehicles (sUAVs) [4] are attractive for military, aerial photography, and environmental monitoring applications due to their small size and flexible operation [5]. Considering the (i) hardware and weight constraints, (ii) limited onboard energy storage, and (iii) performance requirements for sUAVs, improving their endurance (maximizing their flight time) is of great importance for extending the duration of their missions, which could involve surveillance, search and rescue, disaster relief, traffic control, and precision agriculture, thereby motivating the development of novel propulsion systems and the implementation of optimal control policies for power and energy management. Among different propulsion systems for such sUAVs, a hybrid propulsion system consisting of a polymer electrolyte membrane fuel cell (PEMFC) and a battery has been proposed for long duration missions, e.g., in [6][7][8][9]. Other propulsion systems may incorporate energy harvesters, such as in [10]. In this paper we focus on novel approaches to the energy management of sUAVs through optimal coordination between the PEMFC and battery for the previously proposed fuel cell hybrid propulsion system.
Rule-based (e.g., thermostat-like on-off control [11]), dynamic programming-based [12], and model predictive control (MPC) [13] methods have been considered for the energy management of hybrid aircraft. As in automotive energy management applications [14], the use of simple rule-based strategies may not provide optimal performance, while the conventional formulations of MPC and dynamic programming do not directly address the flight time maximization objective. Furthermore, deterministic variants of MPC and dynamic programming may require an accurate preview of the propeller power and climb angle over a long horizon and are computationally demanding if optimization has to be performed online. Similarly, the Pontryagin maximum principle (PMP)-based guidance solutions [15] need accurate characterization of the flight environment.
In this paper, we consider a different approach to the problem of endurance maximization for a hybrid UAV with a polymer electrolyte membrane fuel cell (PEMFC) based on an application of stochastic drift counteraction optimal control (SDCOC) [16], which directly addresses the problem of maximizing the time to constraint violation in a stochastic environment. In our case, the objective is to maintain the vehicle flying for a maximum amount of time by coordinating the fuel cell and the battery to provide the requested propeller power subject to the limited amount of fuel and battery state of charge (SOC) onboard the vehicle. The transitions in aircraft climb angle and propeller power are modeled stochastically by a Markov chain with the transition probabilities determined from historical data representing typical missions of an sUAV. Then, a control policy that minimizes a cost functional reflective of expected time-to-violate constraints is determined offline through value iterations; this control policy is then deployed onboard for the online coordination of the fuel cell and the battery in the sUAV.
In a preliminary conference paper [17] by the second author of this paper, the application of SDCOC for power management of a hybrid sUAV with a direct methanol fuel cell (DMFC) was considered. While the DMFC is often considered as a suitable power source for ground vehicles [18] and has certain advantages, PEMFCs are more appealing for air mobility applications [6,7] due to their relatively lower operating temperature, allowing for a quick start-up [19], higher efficiency (up to 60% [18,20]) and power density, and higher safety due to the use of the solid electrolyte [18].
Different from [17], in this paper, we consider the application of SDCOC to the power management of a hybrid sUAV with a PEMFC rather than a DMFC. To accommodate a different fuel cell and an sUAV, the fuel cell model is changed and improvements are made to the models used to compute the propeller power and thrust, as well as the evolution of the SOC.
More importantly, the lack of the ability of the PEMFC to rapidly change its power output imposes a stringent operating constraint (rate limit on PEMFC power output), which was not treated in [17], but is treated in this paper. This rate limit increases the complexity of the problem as an extra state needs to be introduced in the model and handled in SDCOC, and it also changes the optimal policies and the optimal response of the system. For instance, as the fuel cell is not able to support fast changes in power output, the optimal policy is shown to charge the battery to a turnpike value if starting from a low initial state of charge value. If starting from a high SOC, the battery energy is used till a turnpike value of the state of charge is reached with further discharge delayed to a later phase of the flight. In either case, the high frequency chattering of fuel cell load demand power in [17], which cannot be supported by the PEMFC, is eliminated.
Additionally, in this paper, the value iterations are implemented in Cython rather than MATLAB, with an order of magnitude speedup as compared to the MATLAB implementation. As value iterations are frequently used to solve dynamic programming problems in different applications and Python is becoming increasingly popular, our results on the ten-fold speedup with Cython without a substantive increase of the code complexity are of reference value to other researchers considering the computational implementation of dynamic programming.
Furthermore, a discount factor is introduced into the cost function of SDCOC, and its impact on the convergence speed of the value iterations is illustrated. It is shown that this discount factor results in the faster convergence of value iterations, but the performance of the control policy (in terms of exit time) is decreased.
While the SDCOC theory was developed in [16], that reference did not address the fuel cell or sUAV application studied in this paper. Our approach to representing motor power demand and climb angle by a Markov chain with a finite number of states follows [21], which is the first (to the authors' knowledge) paper proposing the use of stochastic dynamic programming for automotive powertrain control applications; that paper also did not address the fuel cell or sUAV application studied in the present paper, nor the drift counteraction problem formulation.
The remainder of this paper is organized as follows: Section 2 describes the sUAV sub-systems and their models. Section 3 presents an integrated model of the hybrid system and defines the problem in a form suitable for SDCOC. Section 4 summarizes SDCOC, and Section 5 reports the results. Finally, Section 6 presents concluding remarks.

Physical Description of the Systems and Model
An sUAV with a series hybrid propulsion system, shown in Figure 1, was chosen in which the power supplied by the battery and the power supplied by the PEMFC are combined to meet the propeller motor power demand. The PEMFC uses hydrogen as the fuel, which is stored in the tank, and air from the atmosphere. A fraction of the energy generated by the PEMFC can be used to charge the battery. The fuel cell pack and battery pack are sized large enough so that they are able to meet the sUAV's mean power demand individually, should either one not be operating properly. The model used in this paper for generating the SDCOC policies captures the battery's SOC dynamics, the fuel cell's hydrogen rate dynamics, and the fuel cell load power dynamics. Thus, the states of this model are the SOC, the mass of hydrogen remaining in the gas tank, and the fuel cell load demand power. The motor power of the sUAV and climb angle are treated as operating variables, and the SDCOC controller determines changes in the fuel cell load demand power. This system level model has been implemented by combining component submodels and characterizations available from the literature; our methodology is generic and can accommodate changes in these component models.

sUAV Dynamics
A control-oriented dynamic model of the sUAV is used for SDCOC law development. The sUAV is constrained to a longitudinal flight path in a vertical plane [22]. Table 1 defines the notations for the variables used in the model. Table A1 in Appendix A lists the model parameter values, partly based on [23][24][25]. The development of lightweight electric components (batteries, fuel cells, motors) for sUAVs is an active area of research; see, e.g., [26,27]. In our model, we assumed that such lightweight components are available to be consistent with the assumed sUAV's weight. Using a flat Earth coordinate system, the longitudinal equations of motion of the sUAV are given by: where v is the velocity of the sUAV and γ is the climb angle. The lift L and drag force D are characterized as: where Neglecting vertical acceleration (i.e., with L = mg), solving Equations (1) and (2) yields the thrust required by the sUAV, Here, ρ air is a function of altitude. The power required by the sUAV is then given by:

Propeller Model
The propeller model is used to relate the torque and angular velocity generated by the electric motor to the power required by the sUAV and the velocity of the sUAV, respectively [22]. With the propulsive efficiency given by η P , the power required to drive the propeller is: According to the disk actuator theory, the ideal propeller power is: In general, the actual power required would be about 15% greater than this [28], which means P P = 1.15P P,ideal . Thus, η P can be calculated as: Combing Equation (7) with (5) and (6) yields:

Electric Motor Model
Electric motors used in sUAV applications exhibit high speed and high torque, as well as high power-to-weight ratios [29]. Assuming the power factor is equal to unity and the magnetic losses can be neglected, the output power of the motor is given by: The angular velocity of the motor in revolutions per minute (RPM) can be expressed as: which should be equal to the RPM of propeller N = v Jd P . From Equations (8) and (9), the motor current, I M , is: The motor power and motor efficiency are given by, respectively,

Fuel Cell Model
A PEMFC system is the primary power source for the sUAV. The total power generated by the fuel cell stack is calculated as: This power must cover the load demand P FC,load and the power required for auxiliaries [18], P aux , P FC,total = P FC,load + P aux , (11) where P aux is the total power required for the compressor motor, the hydrogen circulation pump, the humidifier water circulation pump, the coolant pump, the cooling fan motor, and the bias power, P 0 . After simplifications, P aux could be written as a function of the fuel cell current [30], The fuel cell current is a function of the current density and the fuel cell area, where i FC could be obtained by solving the equation, The reversible cell potential U rev is related to the molar specific Gibbs free energy ∆g f and the number of ions passed in the reaction n e [24], The activation polarization U act is a result of the energy required to initiate the reaction, which can be described by the semi-empirical Tafel equation [31][32][33], where c 0 and c 1 depend on temperature. When the current density is small, this equation can be modified [34] as: where c 0 = −5.8 × 10 −4T + 0.5736 and c 1 = RT n e α FC F . The ohmic losses U ohm are due to the resistance to the flow of (i) ions in the membrane and in the catalyst layers and (ii) electrons through the electrodes [18], The concentration polarization U conc , is given by With the parameters given in Appendix A, the polarization curve of a single cell is plotted in Figure 2. In reality, the current density could be controlled within a certain range. After excluding the very low current densities (i FC < 0.1 A/cm 2 ), (13) could be linearized [34,35] as: where U OC is the voltage at which the linearized curve crosses the y-axis, which should not be confused with U rev . Unlike ground vehicles, the sUAV changes its orientation during the flight, which would change the inner resistance of the fuel cell by about five times [36] from horizontal to vertical. To this end, (17) is modified to account for this effect as: whereR FC =R FC (1 + k 0 sin(k 1 |γ|)). Combining Equation (18) with Equations (10)-(12) yields: where R FC =R FC /A FC . Overall, I FC can be expressed as:

Battery Model
The battery model represents a pack of Model 21700 lithium polymer battery cells. The battery pack is assembled in such a way that the cells are connected in series. According to [37], the open-circuit voltage of the battery can be estimated as: The battery power and the fuel cell load demand power sum up to provide the electrical power to the motor such that: Further, the current drawn from the battery set is obtained by solving: which should not exceed its maximum discharge current I B,max . The battery Coulombic efficiency in the battery model is assumed to be 100%. Thus, the SOC is satisfied as: where t, t 0 , and SOC 0 are the current time, initial time, and initial SOC, respectively.

Hybrid System
The fuel cell load demand power, which will be indicated as P FC in the following section, is the only variable under control. Due to the output characteristic of the PEMFC, the change of P FC is chosen to be 5% of its maximum power, which depends on γ according to (18). The fuel cell load demand power dynamics are then: where u ∈ {−1, 0, 1}, ∆P FC = 5%P FC,max , and P FC (t n ) is the fuel cell load demand power at t = t n . Here, three different values of u correspond to decreasing, sustaining, or increasing P FC . According to Equation (19), the maximum load cell power can be calculated as: Using Equations (25) and (26), the final expression representing the fuel cell load demand power dynamics is given by: The SOC dynamics are obtained by differentiating both sides of (24) with respect to time, Combing (28) with (21) and (23) yields, where U B,OC = SOC(U B,max − U B,min ) + U B,min . The motor power and battery power are related by: where S f is referred to as the split fraction, which could be calculated from (22) as: Using Equations (29) and (30), the final expression representing the SOC dynamics is given by: where the internal resistance R B,int and the battery capacity C B are assumed to be constant [38]. The mass of remaining fuel dynamics is obtained from Faraday's law as: where I FC is calculated from P FC as shown in (20).
Equations (25), (29), and (32) Table A1, the maximum fuel cell output power is 795 W at γ = 0 deg, 496.14 W at γ = ±10 deg, and 335.71 W at γ = ±20 deg. The theoretical maximum power for the battery series (of eight batteries) is 2940 W, due to the limitation of the discharge current (35 A); the maximum power of the battery is 1176 W at any climb angle.

Problem Formulation
The forward Euler method is used in this paper to approximate the time derivatives. During each time segment ∆t, the motor power of the sUAV is w 1 and the climb angle is w 2 .
The following updated equations approximately model the sUAV hybrid propulsion system: where SOC(t n ) and M FR (t n ) are the state of charge and the mass of hydrogen remaining at t = t n . The system is controlled by the change of the fuel cell load demand power ∆P FC at each discrete time instant. Thus, the fuel cell power is modeled as: The motor power and climb angle are typically unknown a priori. In this paper, a Markov chain model is used to describe the evolution of w 1 and w 2 with the transition probabilities identified from the historical data. Once particular w 1 and w 2 values are encountered, a prediction of their probability distribution over the next time segment will be made using the Markov chain model.
The objective of the stochastic endurance maximization problem is to determine a control law that maximizes the time the sUAV can travel before the system states exit a prescribed set, The constraints on the SOC and M FR in Equation (33) reflect the minimum and maximum values of the battery state of charge and the mass of fuel, respectively. The constraints on P FC are reflective of the fact that the fuel cell load demand power cannot (i) exceed the maximum power of the fuel cell and (ii) be negative.
The optimal control policy developed in this paper through the application of DCOC specifies the change in fuel cell load power over one step, ∆P FC (t) = P FC (t + 1) − P FC (t), as a function of SOC(t), the mass of hydrogen fuel left, M FR (t), and the current fuel cell load power, P FC (t). The battery power complements fuel cell power in matching propeller requested power.

Markov Chain Modeling
A Markov chain model [39] is used to represent the evolution of w (in our case, w = [w 1 , w 2 ]). The transition probabilities of the Markov chain are defined as: where W i and W j (i, j = 1, · · · , N) are cells partitioning the feasible range of the operating conditions. The state dependence of the transition probabilities adds flexibility in reflecting the typical motor power and climb angle profiles of an sUAV. The p ij 's can be obtained from the statistical analysis of the historical flight data, where M ij is the total number of transitions from the cell W i to the cell W j (i.e., w(t n ) ∈ W i , w(t n+1 ) ∈ W j ), while M i is the total number of transitions from W i to any other cell, including W i [21].

Control Law Construction
Here, we adopt the SDCOC framework from [16], which is applied to a discrete-time model with the following form, where x(t n ) is the state vector, u(t n ) is the control vector, and w(t n ) is the vector of operating variables, which is not known until the time instant t n . The system has both control constraints and state constraints imposed as u(t n ) ∈ U and {x(t n ), w(t n )} ∈ G, respectively, where U and G are specified sets. A Markov chain with a finite number of states is used to represent transitions in w(t n ) ∈ W = {w p : p ∈ P}. Here, P is the size of the grid for w. The transition probability from w(t n ) = w i ∈ W to w(t n+1 ) = w j ∈ W is denoted by p ij , expressed in (34). In a discounted variant of SDCOC, the objective is to determine a control function u(x, w) such that, with u(t n ) = u(x(t n ), w(t n )), a cost functional of the form, is maximized. Here, τ x 0 ,w 0 ,u (G) ∈ Z + represents the first time instant when the trajectories of x(t n ) and w(t n ), which are denoted by {x u , w u } and result from applying the control u(t n ) = u(x(t n ), w(t n )) with values in the set U, exit the prescribed compact set G. δ is a discount factor [40]. For δ = 1, (37) maximizes the exit time, i.e., the time till the prescribed constraints become violated. The use of the discount factor 0 < δ < 1 facilitates faster convergence of the value iterations. Note that {x u , w u } is a random process, τ x 0 ,w 0 ,u (G) is a random variable, and E x 0 ,w 0 [·] denotes the conditional expectation given the initial values of x and w.
To solve (37), the value iterations approach is used, which produces a sequence of value function approximations, V n , at specified grid-points x ∈ {x k : k ∈ K}, where u ∈ {u m : m ∈ M} is a specified grid for u. Here, K and M are the size of the grid for x and u, respectively. In each iteration, once the values of V n−1 at the grid-points have been determined, linear or cubic interpolation is employed to approx- A termination criterion of the form |V n (x, w i ) − V n−1 (x, w i )| ≤ for all x ∈ {x k : k ∈ K} and i ∈ P, where > 0 is sufficiently small, is used.
Once an approximation of the value function, V * , is available, an optimal control law is determined as:

sUAV Configuration and Model Parameters
The model was parameterized for a 1.5 kg sUAV [23] that can be used for aerial photography and environmental monitoring applications. The minimum and maximum SOC values were set to SOC min = 0.2 and SOC max = 0.8. The minimum and maximum values of M FR were set as M FR min = 2 g and M FR max = 9 g. For the value iterations, the SOC grid was chosen with a step size of 0.05, and the M FR grid was chosen with a step size of 0.5 g. The grid for the control variable u was set as {−1, 0, 1}.
The transition probabilities for the operating variables (motor power and climb angle) were obtained from the time histories of the sUAV motor power and climb angle using (35) and assuming a time step ∆t = 1 s. These time histories were based on a scenario in which an sUAV follows a moving ground vehicle that sUAV operators are interested in monitoring. In this scenario, the ground vehicle, and consequently the sUAV, is assumed to be traveling with the velocity profile defined by concatenating the EPA Highway Cycle [41] nine times. For the sUAV, the speed profile is modified to remain above the stall speed while avoiding extreme acceleration values.
The climb angle time history, shown in Figure 3, was obtained from the Google Earth elevation profile for a path from Monroe, West Virginia, to Princeton, West Virginia, with the help of GPS visualizing software [42]. See [43] for the assessment of the accuracy of such extracted profiles.

Control Law Computation
Cython was used for control law computations as it is more efficient than MATLAB in handling nested for loops and two-dimensional interpolation. In our numerical experiments with dynamic programming, Cython was about 10 times faster than MATLAB.
To further speed up the value iterations, a discount factor was introduced. When testing the effect of the discount factor on the optimal policy, a zero climb angle (γ = 0) was assumed, which means that the only operating variable was the motor power. Table 2 shows the average exit time based on 100 random simulations for discount factors from 0.91 to 0.99. The stopping criterion was chosen with = 10 −10 for all δ. Computations were performed on a Hasee K780G-i7 laptop with a CORE i7-4710MQ (2.5-3.5 GHz) processor and 24 GB of RAM.Note that the number of iterations and the computing time decrease as the discount factor decreases, but so does the exit time. The discount factor δ = 0.95 was ultimately chosen as a compromise between value iteration convergence speed and solution accuracy. Figure 5 shows that the value iterations with a discount factor of δ = 0.95 converge much faster than those with δ = 1.

Endurance Maximization Results
We used = 10 −10 in the stopping criterion for the value iterations. Figure 6 illustrates the resulting control policy. Note that when SOC is low, the control policy calls for an increase in P FC to charge the battery. This is reasonable given that the fuel cell cannot alone respond rapidly to fast changes in motor power request, and hence, the battery has to be charged to do so.

Endurance Maximization Results
We used = 10 −10 in the stopping criterion for the value iterations. Figure 6 illustrates the resulting control policy. Note that when SOC is low, the control policy calls for an increase in P FC to charge the battery. This is reasonable given that the fuel cell cannot alone respond rapidly to fast changes in motor power request, and hence, the battery has to be charged to do so. The simulation results are given for three cases in Figures 7-18. The first case (Scenario I) corresponds to a higher initial SOC, and the second case (Scenario II) considers a lower initial SOC. The third scenario is for a mid-range initial SOC and is used to confirm the The simulation results are given for three cases in Figures 7-18. The first case (Scenario I) corresponds to a higher initial SOC, and the second case (Scenario II) considers a lower initial SOC. The third scenario is for a mid-range initial SOC and is used to confirm the SOC behavior observed in the first two scenarios. In all cases, the initial fuel mass and initial fuel cell power are the same: M FR,0 = 6 g and P FC,0 = 0 W. The dashed lines in              The initial SOC is 0.8, and it decreases rapidly until it reaches a value of about 0.5. Then, it stays near this value between 2000 and 5000 s. Finally, when the mass of hydrogen reaches a relatively low value, the SOC starts to decrease and continues to decrease until the constraints are violated. The fuel cell load demand power keeps a relatively low value during the whole flight, and the mean value of the split fraction is negative during the 2000 to 5000 s time interval, which is the period when the SOC is kept at about 0.5. Figures 11-13 illustrate the closed-loop response for the second simulation scenario. The initial SOC is 0.2. The battery is charged until it reaches a value of about 0.5 to enable the battery to sustain rapid propeller power fluctuations. Then, the SOC stays near that value of 0.5 between 500 and 1500 s. Finally, when the mass of hydrogen reaches a relatively low value, the SOC starts to decrease and continues to decrease until the constraints are violated. The fuel cell load demand power increases rapidly at first to charge the battery, then it keeps a relatively low value during the rest of the flight. The mean value of the split fraction is negative from the beginning to about 1500 s, which is the period when the battery is charged from SOC = 0.2 to about 0.5.
According to the results from Scenarios I and II, a turnpike behavior of the battery SOC is observed, with the SOC converging to about 0.5 and staying at that value for a while before decaying. To confirm this turnpike behavior, we additionally considered the responses with the developed policy to the initial SOC of 0.6. These are shown in Figures 15-18. The exit times for Scenarios I, II, and III were, respectively, 9890 s, 6019 s, and 8478 s.

Conclusions
This paper considers an endurance maximization problem for a small unmanned aerial vehicle (sUAV) with a hybrid propulsion system consisting of a polymer electrolyte fuel cell and a battery, both driving an electric motor connected to a propeller. A stochastic drift counteraction optimal control (SDCOC) approach is employed to develop control policies for optimally coordinating the fuel cell and the battery while enforcing the constraints on the fuel cell power output rate of change. Cython is used to implement value iterations and demonstrated an order of magnitude speedup versus MATLAB without increasing the code complexity, due to its efficiency in handling nested for-loops. Additionally, the use of a discount factor is shown to significantly speed up the value iterations at the price of decreased performance. The results illustrate the effectiveness of the SDCOC strategy in regulating the charging behavior of the battery by the fuel cell to provide the capability to respond to rapidly varying motor power demand.
The proposed approach based on SDCOC is particularly suitable for handling stochastic disturbances and can be applied to sUAVs exposed to headwinds with the headwind modeled as a stochastic disturbance. Accounting for such wind disturbances, extensions to include thermal dynamics, systematic and comprehensive comparison with other energy management approaches and propulsion system choices, the systematic study of the robustness to model uncertainties, as well as actual flight experiments represent directions for continuing research. In particular, our study of the discount factor impact on the computation time and exit time suggests that the flight time is sensitive to the choice of the energy management strategy; our approach based on SDCOC is optimal in the sense of maximizing expected flight time within a stochastically modeled environment.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: The parameters of the sUAV model described in the paper are listed in Table A1.