# Dynamic Power Management for Portable Hybrid Power-Supply Systems Utilizing Approximate Dynamic Programming


## Abstract


## 1. Introduction

## 2. Problem Formulation

#### 2.1. System Configuration and Related Background

#### 2.2. State Equation and Performance Index

In the state description of the HPSS, $r_{cap,i}(t)$ is the charge state of the ith supercapacitor at time t, $r_{bat}(t)$ is the charge state of the battery at time t, and $\Delta r_{bat}(t) = r_{bat}(t-1) - r_{bat}(t)$ is the difference in the charge state of the battery. Note that in this paper, we assume that the state-of-charge (SOC) values of the supercapacitors and the battery are all available. In practical implementations, these SOC values can be estimated by, e.g., extended Kalman filters [19]. The control input vector we consider for the DPM problem contains information regarding the proportion of the charge that each component (i.e., the ith supercapacitor or the battery) supplies to meet the workload and the level of the battery charge transferred to each supercapacitor. As in [18], an upper bound $R_{th}$ is placed on the amount of charge that can be transferred from the battery to each supercapacitor within a time interval so that the battery charge is not exhausted while charging the supercapacitors. More specifically, the control input vector consists of the following five entries, all in the range [0, 1]: $a_{1x}(t)$, $a_{2x}(t)$, $a_{1y}(t)$, $a_{2y}(t)$, and $a_{bat}(t)$.

Here, $a_{ix}(t)$, i = 1, 2, is the normalized level of the ith supercapacitor being charged by the battery; thus, the amount of charge transferred from the battery to the ith supercapacitor during the time interval (t, t + 1) is $R_{th} \cdot a_{ix}(t)$. The other entries, $a_{1y}(t)$, $a_{2y}(t)$, and $a_{bat}(t)$, are the proportions of the charges that the two supercapacitors and the battery supply to meet the workload w(t) in the time interval (t, t + 1). Because these entries are all proportions, they satisfy:

$$a_{1y}(t) + a_{2y}(t) + a_{bat}(t) = 1.$$

- In [18], it is assumed that the charge values of the batteries and supercapacitors take only discrete values. In this study, we omit this assumption; thus, $r_{cap,i}(t)$ and $r_{bat}(t)$ are all real-valued.
- In the model in [18], supercapacitors are constrained to not be simultaneously charged by the battery and discharged by the load. In this paper, we omit this constraint.
- In the model in [18], the source is assigned to the workload such that only one energy supply source can transfer the required charge to the load. Hence, the control inputs in [18] are all binary numbers, and only one member of $a_{iy}(t)$ and $a_{bat}(t)$ is one. In this study, we omit this constraint. As a result, $a_{iy}(t)$ and $a_{bat}(t)$ are nonnegative real numbers satisfying $\sum_{i} a_{iy}(t) + a_{bat}(t) = 1$.
- In the model in [18], it is assumed that at most one supercapacitor can be charged by the battery. This assumption is omitted in this paper.
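
The charge bookkeeping implied by these control inputs can be sketched as follows. This is a minimal illustration, assuming lossless transfers and ignoring the capacity limits of the components; the function name `step_charges` and the numerical values are hypothetical, and the state equation actually used in the paper may contain terms not shown here.

```python
from typing import List, Sequence, Tuple


def step_charges(
    r_cap: Sequence[float],
    r_bat: float,
    a_x: Sequence[float],
    a_y: Sequence[float],
    a_bat: float,
    w: float,
    r_th: float,
) -> Tuple[List[float], float, float]:
    """Advance the charges by one interval (assumed lossless bookkeeping).

    a_x[i]  : normalized charging level of supercapacitor i (battery -> cap)
    a_y[i]  : proportion of the workload w supplied by supercapacitor i
    a_bat   : proportion of the workload w supplied by the battery
    r_th    : upper bound on the charge moved to each supercapacitor
    """
    inputs = list(a_x) + list(a_y) + [a_bat]
    if any(a < 0.0 or a > 1.0 for a in inputs):
        raise ValueError("all control inputs must lie in [0, 1]")
    if abs(sum(a_y) + a_bat - 1.0) > 1e-9:
        raise ValueError("supply proportions must sum to one")

    # Each supercapacitor gains R_th * a_ix and gives a_iy * w to the load.
    new_cap = [r + r_th * ax - ay * w for r, ax, ay in zip(r_cap, a_x, a_y)]
    # The battery charges the supercapacitors and supplies its share of w.
    new_bat = r_bat - r_th * sum(a_x) - a_bat * w
    # Battery charge consumed over the interval (previous minus current).
    delta_r_bat = r_bat - new_bat
    return new_cap, new_bat, delta_r_bat


new_cap, new_bat, delta = step_charges(
    r_cap=[10.0, 20.0], r_bat=100.0,
    a_x=[0.5, 0.0], a_y=[0.25, 0.25], a_bat=0.5,
    w=4.0, r_th=2.0,
)
print(new_cap, new_bat, delta)
```

Under these assumptions the total stored charge drops by exactly w per interval, because the supply proportions sum to one and the battery-to-supercapacitor transfers only move charge between components.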

In the performance index (PI), $\ell(\cdot,\cdot)$ is the stage cost function, and T is the final time, whose value is ∞ in infinite-horizon problems. The optimal control problem is solved by minimizing this PI over all admissible state-feedback control policies $\phi_t: X \to U$. For the performance index in the optimization of power flows for HPSSs, one may resort to ad hoc stage cost functions (e.g., [18]). In this paper, however, we find that a traditional objective function, which is clearer and widely used in the field of linear quadratic optimal control, works well. More precisely, we derive an ADP-based solution to the dynamic power management of HPSSs with the purpose of minimizing the battery charge consumption rate and maintaining the charge level of the supercapacitors while keeping the control efforts reasonably low. For this purpose, the stage cost function $\ell(\cdot,\cdot)$ of the DPM task is chosen as follows:

Here, the weights ($w_1$, $w_2$, $w_3$, $w_4$, and $\beta_{bat}$) are obtained empirically via a tuning process. In the DPM of [18], the Q-learning method [20] was used to derive an approximation of the optimal power management strategy, and the stage function was defined as

The stage cost of Equation (19) combines $\Delta r_{bat}(t)$, $(r_{cap,i}(t) - 0.5 R_{\mathrm{max}}^{cap})^2$, i = 1, 2, and the control-input-related quadratic terms, $a_{ix}^2(t)$, $a_{iy}^2(t)$, i = 1, 2, and $a_{bat}^2(t)$. Finally, note that our stage cost in Equation (19) penalizes the control effort directly through a weighted sum of the quadratic input terms, whereas in the objective in Equation (20) of the reference paper [18], the control effort is accounted for only indirectly via the expectation operation.
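
To make the structure of such a stage cost concrete, the sketch below evaluates a weighted sum of the battery consumption term, the supercapacitor level terms, and the quadratic input terms. The pairing of each weight with a term, the unit default weights, and the function name `stage_cost` are assumptions made for illustration only; in the paper the weights are obtained by tuning, and Equation (19) remains the authoritative definition.

```python
from typing import Sequence


def stage_cost(
    delta_r_bat: float,
    r_cap: Sequence[float],
    a_x: Sequence[float],
    a_y: Sequence[float],
    a_bat: float,
    r_cap_max: float,
    weights: Sequence[float] = (1.0, 1.0, 1.0, 1.0),
    beta_bat: float = 1.0,
) -> float:
    """Weighted sum of the cost terms (weight-to-term pairing is assumed)."""
    w1, w2, w3, w4 = weights
    # Penalize supercapacitor charges that drift away from half capacity.
    level_term = sum((r - 0.5 * r_cap_max) ** 2 for r in r_cap)
    # Quadratic penalties on the control efforts.
    charge_effort = sum(a * a for a in a_x)
    supply_effort = sum(a * a for a in a_y)
    return (
        w1 * delta_r_bat        # battery charge consumed in the interval
        + w2 * level_term
        + w3 * charge_effort
        + w4 * supply_effort
        + beta_bat * a_bat ** 2
    )


cost = stage_cost(
    delta_r_bat=3.0, r_cap=[10.0, 19.0],
    a_x=[0.5, 0.0], a_y=[0.25, 0.25], a_bat=0.5,
    r_cap_max=20.0,
)
print(cost)
```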

## 3. Approximate Dynamic Programming Approach to Dynamic Power Management

#### 3.1. Preliminaries

By minimizing the PI over all admissible state-feedback control policies $\phi_t: X \to U$, the optimal value of J can be found. This minimal PI value is denoted by $J^*$, and the optimal state-feedback function achieving it is denoted by $\phi_t^*$. The state value function is defined as the minimum total expected cost achieved by an optimal control policy from the given initial state x(0) = z, i.e.,

For the initial state x(0) = $x_0$, the optimal PI value can be expressed as $J^* = V^*(x_0)$. According to stochastic optimal control theory [6,7], the state value function in Equation (23) is the fixed point of the Bellman equation:
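
For orientation, a discounted Bellman equation written in the notation of the ADP controller expressions in Section 3.2 (stage cost $\ell$, discount factor $\gamma$, dynamics $f$, and random load $w$) has the standard form below. This is the textbook form under those assumptions, with the minimization taken over admissible inputs u and the expectation taken over w; it is not a transcription of the paper's own equation.

$$V^{*}(z) = \min_{u} \left( \ell(z,u) + \gamma \, E \, V^{*}\big(f(z,u,w)\big) \right)$$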

#### 3.2. ADP-Based Solution Procedure

Here, $\ell(\cdot,\cdot)$ is the quadratic stage cost function specified in Equation (19). To solve the above DPM problem for the portable HPSS, we propose an ADP-based solution procedure utilizing the approximate value function (AVF) policy approach of O’Donoghue, Wang, and Boyd [9]. In this AVF policy approach [9], the convex quadratic function:

$$\widehat{V}_i(x) = x^T P_i x + 2 p_i^T x + q_i, \quad i = 0, \ldots, M$$

is used as the AVF, where the parameters ($P_i$, $p_i$, and $q_i$) satisfy the iterated Bellman inequalities:

**t** can be expressed using the state x(t) and input u(t) as follows:

_{i}, i = 1, …, M to satisfy the following:

_{i} [8] for notational convenience. Here, the expectation on the right-hand side of Equation (37) can be further expanded as follows:

Here, $S_{i-1}$ is the matrix variable defined by

- Choose the parameters of the problem: γ, λ, M, $R_{th}$, $R_{\mathrm{min}}^{cap}$, $R_{\mathrm{max}}^{cap}$, and $R_{\mathrm{min}}^{bat}$.
- Estimate the 1st and 2nd moments of the external load demands, $\overline{w}$ and $\overline{w^2}$, from the training data.

- Initialize the decision-making time t = 0, and choose x(0) = $x_0$.
- Compute the stage cost matrix $L$ in Equation (30) and the constant matrix $\Xi^{(k)}$ in Equation (64).
- Define the LMI variables:
  - (1) Define the basic LMI variables: $P_i$, $p_i$, and $q_i$ in Equation (27).
  - (2) Define the derived LMI variables: $G_i$ in Equation (39) and $S_{i-1}$ in Equation (47).
  - (3) Define the S-procedure multipliers: $\xi_i^{(k)}$ in Equation (63).

- Find the approximate state value functions, $\widehat{V}_i$, by solving the following convex optimization problem:
  $$\begin{array}{l} \min \ \widehat{V}_0(x_0) = x_0^T P_0 x_0 + 2 p_0^T x_0 + q_0 \\ \text{subject to} \\ \quad L + \gamma G_i - S_{i-1} + \displaystyle\sum_{k=1}^{3} \xi_i^{(k)} \Xi^{(k)} \ge 0, \quad i = 1, \ldots, M, \\ \quad S_{M-1} = S_M, \\ \quad P_i \ge 0, \quad i = 0, \ldots, M, \\ \quad \mathrm{diag}\left(\xi_i^{(k)}\right) \ge 0, \quad i = 1, \ldots, M, \quad k = 2, 3. \end{array}$$
- Obtain the ADP controllers on the basis of
  $$\Phi_t(x) = \underset{u}{\text{argmin}} \left( \ell(x,u) + \gamma E V_{t+1}(f(x,u,w)) \right) \quad \text{for } t = 0, \ldots, M$$
  $$\Phi_t(x) = \underset{u}{\text{argmin}} \left( \ell(x,u) + \gamma E V_{T+1}(f(x,u,w)) \right) \quad \text{for } t > M$$
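
The procedure asks only for the first and second moments of the load because the value function approximation is quadratic: the expectation of a quadratic function of an affinely perturbed state depends on the disturbance only through those two moments. The sketch below illustrates this for a scalar load, assuming the next state has the form z + c·w with a symmetric matrix P. The names `z`, `c`, `load_moments`, and `expected_quad_value`, as well as the numbers, are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def load_moments(samples: Sequence[float]) -> Tuple[float, float]:
    """Return the sample mean and the sample mean of squares of the load."""
    n = len(samples)
    return sum(samples) / n, sum(s * s for s in samples) / n


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def bilinear(a: Vector, P: Matrix, b: Vector) -> float:
    """Compute a^T P b."""
    return sum(a[i] * P[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


def quad_value(P: Matrix, p: Vector, q: float, x: Vector) -> float:
    """Evaluate the quadratic function x^T P x + 2 p^T x + q."""
    return bilinear(x, P, x) + 2.0 * dot(p, x) + q


def expected_quad_value(
    P: Matrix, p: Vector, q: float,
    z: Vector, c: Vector, w_mean: float, w_sq: float,
) -> float:
    """E[V(z + c*w)] for symmetric P, using only E[w] and E[w^2]."""
    return (
        quad_value(P, p, q, z)
        + 2.0 * w_mean * (bilinear(z, P, c) + dot(p, c))
        + w_sq * bilinear(c, P, c)
    )


# Toy data: two load samples, a 2-dimensional state, symmetric P.
samples = [1.0, 3.0]
w_mean, w_sq = load_moments(samples)
P = [[1.0, 0.0], [0.0, 2.0]]
p = [0.5, 0.0]
q = 1.0
z = [1.0, 1.0]
c = [1.0, 0.0]

closed_form = expected_quad_value(P, p, q, z, c, w_mean, w_sq)
# Brute-force average over the same samples, for comparison.
brute_force = sum(
    quad_value(P, p, q, [zi + ci * w for zi, ci in zip(z, c)]) for w in samples
) / len(samples)
print(closed_form, brute_force)
```

The closed-form value and the sample average agree, which is why no further statistics of the load are needed to evaluate the expectation of a quadratic function in this setting.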

## 4. Simulation Results and Trajectories

#### 4.1. An Illustrative Example

$r_{bat}(0)$ = 1000 [C]. The initial state conditions for the HPSS were assumed as follows:

$w_{load}(t)$ was transformed into $w(t) = w_{load}(t)/10$. We performed simulations based on the ten scenarios of Figure 2 and evaluated the resultant controllers. In the evaluation stage, we used a 10-fold cross-validation method for the iPhone workload data of Figure 2. For the simulation results, we considered all 10 possible partitions of the data into training and test subsets. For the parameters of the problem, we chose the following:

The weights $w_1$, $w_2$, $w_3$, $w_4$, and $\beta_{bat}$ were obtained empirically via a tuning process based on a subset of the training data. Simulation results show that the above set of weight values yielded reasonably good results. Developing a more disciplined design guideline for the weight values is an important topic for follow-up work.
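
The evaluation protocol described in this subsection can be sketched as follows. The sketch assumes that each of the ten folds holds out exactly one workload scenario for testing and trains on the remaining nine, which is one natural reading of using all 10 partitions of ten scenarios; the function names `scale_load` and `leave_one_out_partitions` are hypothetical.

```python
from typing import List, Sequence, Tuple


def scale_load(w_load: Sequence[float], factor: float = 10.0) -> List[float]:
    """Rescale the measured load, w(t) = w_load(t) / factor."""
    return [v / factor for v in w_load]


def leave_one_out_partitions(n_scenarios: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs, one per held-out scenario."""
    partitions = []
    for held_out in range(n_scenarios):
        train = [i for i in range(n_scenarios) if i != held_out]
        partitions.append((train, [held_out]))
    return partitions


partitions = leave_one_out_partitions()
for train_idx, test_idx in partitions:
    # A controller would be designed on train_idx and evaluated on test_idx.
    print(test_idx, train_idx)
```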

**Figure 2.** Considered workload scenarios, which are essentially based on the iPhone battery current measurements of [18].

#### 4.2. Discussions and Performance Comparison

In each plot, one line is for the battery charge, $r_{bat}(t)$, and the dashed line is for the sum of the supercapacitor charges, $r_{cap,1}(t) + r_{cap,2}(t)$. Note that in the sixth plot, the scenario consists of heavy loads (such as playing games, browsing the Internet, and watching YouTube videos), and the charges in the battery and the supercapacitors were completely depleted before the end of the considered time horizon. Figure 3 shows that the proposed ADP-based DPM procedure resulted in reasonable dynamic charge management for the HPSS. The simulation results show that when the demand charge changed abruptly, the control actions for the supercapacitors reacted promptly. It is well known that, in general, a battery alone cannot handle rapidly changing loads effectively because of its low power density. By comparison, Figure 3 shows that the HPSS utilizing the ADP-based DPM was capable of handling such situations reasonably well. We believe that this capability results from the stage cost function of Equation (19), which is defined with the purpose of minimizing the battery charge consumption rate and maintaining the charge level of the supercapacitors appropriately for speedy reaction to fluctuating loads. Note that our ADP-based DPM procedure can be applied to a significantly larger class of stage cost functions as long as they are quadratic-program (QP) representable [23] (i.e., $\ell(\cdot,\cdot)$ is a convex quadratic function plus a convex piecewise linear function, possibly including linear equality and inequality constraints).

**Figure 3.** Charge-depleting patterns of the battery and supercapacitors resulting from the ADP-based approach of this paper.

**Figure 4.** Charge-depleting patterns of the battery and supercapacitors resulting from the Q-learning-based DPM.

**Figure 5.** Learning curve (the average total cost vs. the iteration over a set of 10 simulation runs): Q-learning (dashed) and ADP (solid).

## 5. Conclusions

## Acknowledgments

## Author Contributions

## Conflict of Interest

## References

1. Burke, A. Ultracapacitors: Why, How, and Where is the Technology. J. Power Sources **2000**, 91, 37–50.
2. Yap, H.T.; Schofield, N.; Bingham, C.M. Hybrid energy/power sources for electric vehicle traction systems. In Proceedings of the IEEE Power Electronics, Machines and Drives (PEMD) Conference, Edinburgh, UK, 31 March–2 April 2004; Volume 1, pp. 61–66.
3. Miller, J.M.; Deshpande, U.; Dougherty, T.J.; Bohn, T. Power electronic enabled active hybrid energy storage system and its economic viability. In Proceedings of the IEEE Applied Power Electronics Conference and Exposition, Washington, DC, USA, 15–19 February 2009; pp. 190–198.
4. Wang, L.; Liu, X.; Li, H.; Im, W.S.; Kim, J.M. Power electronics enabled energy management for energy storage with extended cycle life and improved fuel economy in a PHEV. In Proceedings of the IEEE Energy Conversion Congress and Exposition (ECCE), Atlanta, GA, USA, 12–16 September 2010; pp. 3917–3922.
5. Wang, Y.; Boyd, S. Approximate dynamic programming via iterated Bellman inequalities. Int. J. Robust Nonlinear Control **2015**, 25, 1472–1496.
6. Bertsekas, D. Dynamic Programming and Optimal Control: Volume 1; Athena Scientific: Belmont, MA, USA, 2005.
7. Bertsekas, D. Dynamic Programming and Optimal Control: Volume 2; Athena Scientific: Belmont, MA, USA, 2007.
8. O’Donoghue, B.; Wang, Y.; Boyd, S. Min-Max approximate dynamic programming. In Proceedings of the 2011 IEEE International Symposium on Computer-Aided Control System Design (CACSD), Denver, CO, USA, 20–23 September 2011; pp. 424–431.
9. O’Donoghue, B.; Wang, Y.; Boyd, S. Iterated approximate value functions. In Proceedings of the European Control Conference (ECC), Zurich, Switzerland, 17–19 July 2013; pp. 3882–3888.
10. Romaus, C.; Bocker, J.; Witting, K.; Seifried, A. Optimal energy management for a hybrid energy storage system combining batteries and double layer capacitors. In Proceedings of the Energy Conversion Congress and Exposition (ECCE), San Jose, CA, USA, 20–24 September 2009.
11. Chen, G.; Bao, Z.J.; Yang, Q.; Yan, W.J. Scheduling strategy of hybrid energy storage system for smoothing the output power of wind farm. In Proceedings of the IEEE Control and Automation (ICCA), Hangzhou, China, 12–14 June 2013; pp. 1874–1878.
12. Choi, M.E.; Kim, S.W.; Seo, S.W. Energy Management Optimization in a Battery/Supercapacitor Hybrid Energy Storage System. IEEE Trans. Smart Grid **2012**, 3, 463–472.
13. Gee, A.M.; Robinson, F.V.P.; Dunn, R.W. Analysis of battery lifetime extension in a small-scale wind-energy system using supercapacitors. IEEE Trans. Energy Convers. **2013**, 28, 24–33.
14. Blanes, J.M.; Gutierrez, R.; Garrigos, A.; Lizan, J.L.; Cuadrado, J.M. Electric vehicle battery life extension using ultracapacitors and an FPGA controlled interleaved buck-boost converter. IEEE Trans. Power Electron. **2013**, 28, 5940–5948.
15. Min, S.W.; Kim, S.J. Optimized installation and operations of battery energy storage system and electric double layer capacitor modules for renewable energy based intermittent generation. J. Electr. Eng. Technol. **2013**, 8, 238–243.
16. Gao, L.; Dougal, R.A.; Shengyi, L. Power enhancement of an actively controlled battery-ultracapacitor hybrid. IEEE Trans. Power Electron. **2005**, 20, 236–243.
17. Mirzaei, A.; Farzanehfard, H.; Adib, E.; Jusoh, A.; Salam, Z. A fully soft switched two quadrant bidirectional soft switching converter for ultra capacitor interface circuits. J. Power Electron. **2011**, 11, 1–9.
18. Mirhoseini, A.; Koushanfar, F. Learning to manage combined energy supply systems. In Proceedings of the IEEE/ACM International Symposium on Low-Power Electronics and Design, Fukuoka, Japan, 1–3 August 2011; pp. 229–234.
19. Plett, G.L. Extended Kalman Filtering for Battery Management Systems of LiPB-based HEV Battery pack: Part 1. Background. J. Power Sources **2004**, 134, 277–292.
20. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 1998.
21. Boyd, S.; Ghaoui, L.E.; Feron, E.; Balakrishnan, V. Linear Matrix Inequalities in System and Control Theory; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 1994.
22. Mirhoseini, A.; Koushanfar, F. HypoEnergy: Hybrid supercapacitor-battery power-supply optimization for energy efficiency. In Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, Grenoble, France, 14–18 March 2012; pp. 1–4.
23. Boyd, S.; Mueller, M.T.; O’Donoghue, B.; Wang, Y. Performance bounds and suboptimal policies for multi-period investment. Found. Trends Optim. **2014**, 1, 1–69.
24. Watkins, C. Learning from Delayed Rewards. Ph.D. Thesis, Cambridge University, Cambridge, UK, 1989.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

Park, J.; Chung, G.-B.; Lim, J.; Yang, D. Dynamic Power Management for Portable Hybrid Power-Supply Systems Utilizing Approximate Dynamic Programming. *Energies* **2015**, *8*, 5053-5073.
https://doi.org/10.3390/en8065053