An Integrative DR Study for Optimal Home Energy Management Based on Approximate Dynamic Programming

This paper presents an integrative demand response (DR) mechanism for energy management of appliances, an energy storage system and an electric vehicle (EV) within a home. The paper considers vehicle-to-home (V2H) and vehicle-to-grid (V2G) functions for energy management of EVs and the degradation cost of the EV battery caused by the V2H/V2G operation in developing the proposed DR mechanism. An efficient optimization algorithm is developed based on approximate dynamic programming, which overcomes the challenges of solving high dimensional optimization problems for the integrative home energy system. To investigate how the participation of different home appliances affects the DR efficiency, several DR scenarios are designed. Then, a detailed simulation study is conducted to investigate and compare home energy management efficiency under different scenarios.


Introduction
As a key feature of the smart grid, demand response (DR) brings reliability and efficiency to the electric system by reducing or shifting peak energy demand [1]. Smart homes that can monitor and control their electricity usage in real time are considered to have the greatest potential for DR [2]. Moreover, as more batteries and electric vehicles are used in household environments, this potential grows further. To realize the full potential, efficient DR mechanisms and advanced home energy management systems (HEMS) are crucial components and need to be studied and designed carefully [3,4].
Currently, a large body of DR research on HEMS exists, and the optimal operation of home appliances considering users' electricity cost or comfort in response to a dynamic electricity price is a major concern. For example, in [5], a learning-based optimal DR policy for a heating, ventilation and air conditioning (HVAC) system is developed to minimize the electricity cost. In [6], an end-users' comfort-oriented DR strategy for residential HVAC aggregation scheduling is investigated. In [7], the DR capability of electric water heaters (EWH) is evaluated for load-shifting and balancing reserve. In [8], coordination control of multiple batteries for optimal HEMS is studied. However, these DR strategies focus on scheduling only a single type of appliance.
Normally, different types of DR appliances and an ESS could be properly scheduled to coordinate with one another and improve the user's benefits. Moreover, a V2H/V2G-enabled EV with bi-directional power flow could also create synergies with smart appliances and the ESS. Therefore, it is necessary to study the optimal DR strategy from an integrative perspective, considering the different operating features of all kinds of home appliances, the ESS and the V2H/V2G-enabled EV.
Although some pioneering research works on integrative DR have been performed, few provide a fully-integrated solution for an optimal DR covering all kinds of home appliances, ESS and EV. For example, in [9], coordinated control of several types of home appliances is implemented in an optimization-based DR controller, but ESS and EV are not considered. In [10], the joint operation management of different home appliances, micro-CHP and an energy storage device is optimized, but EV is not considered. In [11], various home appliances are classified into five different categories according to their operation characteristics, and an integration of the five types of home appliances for DR scheduling is investigated; however, the model of ESS and EV is highly simplified by neglecting their energy level constraints. In [12], a dynamic energy management framework considering all types of home appliances and EV is proposed, but ESS and the V2H/V2G functions of the EV are not considered. In [13], an optimal DR strategy considering the collaborative operation of an ESS and an EV with V2H/V2G functions is presented, but the DR potential of smart appliances is not exploited. In [14], different types of home appliances and an EV with V2H capability are optimally scheduled, but the coordination control between the ESS and the EV battery is not presented. In the latest research [15], interestingly, the collaboration of all types of appliances, ESS and V2H/V2G-enabled EV is evaluated using a mixed-integer linear programming (MILP) DR model, which provides a meaningful reference for study in this aspect.
Nevertheless, the aforementioned studies on the integrative DR strategy only consider the consumer's electricity cost or thermal comfort and neglect the degradation cost of the EV battery. More frequent charging and discharging cycles caused by the V2H/V2G operation accelerate battery degradation and increase the wear cost [16], which has remained a main barrier to the integration of the V2H/V2G-enabled EV with household DR [17]. Therefore, whether the V2H/V2G services will help end-users save enough electricity cost to offset the additional degradation of the EV battery has to be carefully evaluated in an integrative DR scheme.
On the other hand, a solution problem arises in developing an optimal integrative DR strategy. As more smart appliances participate in household DR and more detailed scheduling in a short time-slot framework is required, an exponentially larger optimization model is needed. The significant growth in dimension, combined with the complex objectives and various constraints of the appliances, makes the integrative DR optimization problem very difficult to solve in limited time. However, most existing DR research works focus on the modeling process in the framework of MILP [15,18], MINLP [10], a game [19], etc., and ignore the solution problem by tailoring the model to suit a certain commercial solver. Other research on DR algorithms [4] has also not been found to discuss how to handle the rapid growth in dimensions as the number of controllable appliances increases.
Approximate dynamic programming (ADP) [20] is a powerful tool for high-dimensional function optimization problems, which has been examined by some previous works [8,21-24]. For example, in [8,23,24], ADP is used to solve the optimal battery management and multi-battery coordination control problems in smart home environments. However, the integrated optimization of all kinds of smart home appliances, the energy storage system and EV, which includes high-dimensional continuous and integer variables in the framework of ADP, has not been found in the literature.
In this paper, an integrative DR study on the optimal scheduling of different types of appliances, an ESS and a V2H/V2G-enabled EV considering the battery degradation cost of the EV is presented. A solution method based on ADP is developed for the integrative DR optimization problem. In the developed method, a polynomial function for optimal value function approximation is designed to suit the problem. The contributions of the paper include: (1) the formulation of a joint optimization of all types of residential appliances, ESS and EV considering the electricity consumption cost, the user's thermal comfort and the battery degradation of the EV; (2) a solution method based on the design of ADP for the integrative DR scheduling to overcome the difficulty of solving the high-dimensional optimization problem due to the increasing number of DR-capable appliances; (3) a detailed comparative analysis of how the integration of different home appliances affects the DR efficiency.
The rest of the paper is organized as follows. Section 2 gives the framework of the HEMS and the models of appliances, the energy storage system and the EV for DR optimization in different scenarios. In Section 3, the proposed optimization algorithm is presented. The case studies are given in Section 4. Section 5 draws the conclusions.

Modeling of the Integrative DR Strategy
In this section, the integrative DR problem is formulated, aiming for optimal day-ahead scheduling of all DR appliances within a home. A suitable cyber infrastructure is assumed to be in place to collect appliance parameters and end-user preferences and to receive day-ahead hourly electricity price signals from the utility. Weather information about ambient temperature and solar radiation is accessed through the Internet or forecasted through a local forecasting system in the HEMS. Other parameters, such as the arrival/departure time of the EV, hot water usage and critical appliances' consumption, are considered to be known from the occupants' habit statistics.

Models and Constraints of Home Appliances
In an existing home, primary home appliances include the heating/cooling, ventilation and air conditioning system (HVAC), electric water heater (EWH), clothes washing (CW) machine, clothes dryer (CD), dishwasher (DW), cooking, lighting, entertainment, etc. Typically, these appliances are classified into three categories: critical appliances, shiftable appliances and adjustable appliances (Figure 1). Critical appliances are DR-incapable. Shiftable appliances can be shifted in time, but cannot be regulated. Adjustable appliances can be fully controlled and adjusted. In general, the HVAC and the EWH are adjustable appliances. The CW, CD and DW are shiftable appliances, and the rest are considered critical. For a future home, additional main home appliances would include an energy storage system (ESS), photovoltaic (PV) panels and electric vehicles. Next, the operating constraints associated with these home appliances are formulated for the integrative DR problem.

Adjustable Appliances
Adjustable appliances include HVAC and EWH, whose power consumption is flexibly controlled to provide thermal comfort to the occupants.
For the HVAC, the indoor temperature is maintained at a comfortable level according to the occupants' preference setting. Temperature deviation is allowed if the occupants are willing to sacrifice some comfort for a lower electricity bill. A thermal model from [25] is adopted to describe the dynamics of the indoor temperature, where in (1), T^HVAC_{i,in} is the indoor temperature in time slot i, T^HVAC_{i,out} is the outdoor temperature in time slot i, δ is the inertia factor, η is the efficiency of the HVAC and A is the thermal conductivity (kW/°F) of the house. In (2), T^HVAC_set denotes the temperature setting, and ΔT^HVAC is the allowable deviation. The average power (kW) drawn by the HVAC in time slot i is denoted as P^HVAC_i, which is constrained by its upper limit P^HVAC_max (kW). Similarly, the EWH regulates its power consumption to keep the water temperature within a comfortable range. Considering the inlet cold water replenished into the EWH and the heat exchange with the indoor air, a thermal model from [26] is used to describe the dynamics of the hot water temperature, as shown in (4). The operational constraints of the EWH are given below.
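The HVAC dynamics described above can be sketched as a one-step temperature update. The following is a minimal illustration only, assuming a cooling-mode form of the first-order thermal model; the parameter values are illustrative assumptions, not the paper's settings.

```python
def hvac_indoor_temp(t_in, t_out, p_hvac, delta=0.96, eta=2.5, a_cond=0.14):
    """One-step indoor-temperature update (cooling mode, sketch).

    delta: inertia factor; eta: HVAC efficiency; a_cond: thermal
    conductivity of the house (kW/F). The indoor temperature relaxes
    toward the outdoor temperature, offset by the cooling delivered.
    All parameter values here are illustrative assumptions.
    """
    return delta * t_in + (1.0 - delta) * (t_out - eta * p_hvac / a_cond)
```

With the unit off, the indoor temperature drifts toward the outdoor temperature; drawing power pulls it back down.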
where in (4)-(6), T^EWH_{i,in} denotes the hot water temperature (°F) in time slot i, T^EWH_cold is the inlet cold water temperature (°F), SA is the tank surface area (ft²), R is the tank insulation thermal resistance (hour·ft²·°F/BTU), d_water is the density of water (8.34 lbs/gallon), F_i is the hot water flow rate (gallons/hour) in time slot i, C_p is the specific heat of water (1.00 BTU/(lbs·°F)), volume is the capacity of the tank (gallons) and G, B_i, R', K and Q^EWH_i are intermediate variables. In (7), T^EWH_set is the temperature setting, and ΔT^EWH is the allowable tolerance. In (8), the average power P^EWH_i in time slot i is constrained by its upper limit P^EWH_max.
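The EWH dynamics can likewise be sketched as a one-step update of a single-node tank model using the quantities defined above. This is an illustration only; the tank parameters (surface area, insulation, volume) are assumed values, not the paper's settings.

```python
import math

def ewh_water_temp(t_w, t_amb, t_cold, flow, p_kw, dt=0.25,
                   sa=24.0, r_ins=15.0, volume=50.0):
    """One-step single-node tank update (sketch; tank parameters assumed).

    t_w: current water temperature (F); t_amb: ambient temperature (F);
    t_cold: inlet cold water temperature (F); flow: draw (gallons/hour);
    p_kw: heater power (kW); dt: slot length (hours).
    """
    d_water, c_p = 8.34, 1.00           # lbs/gallon, BTU/(lbs*F)
    g = sa / r_ins                      # shell loss coefficient (BTU/(h*F))
    b = d_water * flow * c_p            # water-draw coefficient (BTU/(h*F))
    r_eq = 1.0 / (g + b)                # equivalent thermal resistance
    cap = volume * d_water * c_p        # tank thermal capacity (BTU/F)
    k = math.exp(-dt / (r_eq * cap))    # decay factor over the slot
    q = p_kw * 3412.0                   # heater input (BTU/h)
    return t_w * k + (g * t_amb + b * t_cold + q) * r_eq * (1.0 - k)
```

With no heating and no draw, the water temperature decays slowly toward the ambient temperature; heating raises it.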

Shiftable Appliances
The shiftable appliances considered include the CW, CD and DW. Taking the CW as an example, the constraints of shiftable appliances are modeled below.
Assume that a CW requires continuous operation for J^CW time slots to fulfill its task and that the task must be finished within a given time interval. Let s^CW_i indicate the status of the CW in time slot i, which equals one if the CW is ON and zero if the CW is OFF. Then we have constraints (9)-(11), where (9) constrains the operation time of the CW, (10) keeps the CW running continuously and (11) ensures that the task of the CW is fulfilled. The power consumption P^CW_i of the CW in time slot i is determined by its status s^CW_i and its power pattern. Let P^CW(j) be the power pattern of the CW at a specific operating sequence j ∈ {1, 2, . . ., J^CW}; then, P^CW_i can be obtained as in (12).
The models of CD and DW can be obtained in a similar way.
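As an illustration, the three shiftable-appliance constraints (task length, contiguity, allowed window) and the status-to-power mapping of (12) can be checked mechanically. The helper names below are hypothetical, not from the paper.

```python
def valid_shiftable_schedule(status, j_req, i_start, i_end):
    """Check an ON/OFF vector against constraints (9)-(11) for one
    shiftable appliance: exactly j_req ON slots, contiguous, and lying
    inside the allowed window [i_start, i_end] (a sketch)."""
    on_slots = [i for i, s in enumerate(status) if s == 1]
    if len(on_slots) != j_req:                   # (9): operation time
        return False
    if on_slots[-1] - on_slots[0] != j_req - 1:  # (10): contiguity
        return False
    return i_start <= on_slots[0] and on_slots[-1] <= i_end  # (11): window

def shiftable_power(status, pattern):
    """Map the status vector to per-slot power via the power pattern,
    in the spirit of (12): the j-th ON slot draws pattern[j]."""
    power, j = [0.0] * len(status), 0
    for i, s in enumerate(status):
        if s == 1:
            power[i] = pattern[j]
            j += 1
    return power
```

A schedule with a gap in the run, or one outside its window, fails the check.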

Energy Storage
Let the power of the ESS be positive if the ESS is charging and negative if it is discharging. The model of the ESS can then be described using the state of charge (SOC), where SOC^ESS_i is the SOC of the ESS at the end of time slot i and P^ESS_i is the power of the ESS (kW) in time slot i, which is limited by the maximum discharging power P^ESS_dis,max and the maximum charging power P^ESS_ch,max of the ESS; E^ESS_max is the maximum capacity of the ESS (kWh), and η^ESS_ch and η^ESS_dis are the charging and discharging efficiencies of the ESS, respectively. The SOC of the ESS is constrained between the allowable minimum SOC^ESS_min and the allowable maximum SOC^ESS_max.
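A minimal sketch of the SOC update with charging/discharging efficiencies follows; the capacity and efficiency values are assumed for illustration, not taken from the paper.

```python
def ess_soc_next(soc, p_ess, dt=0.25, e_max=6.0,
                 eta_ch=0.95, eta_dis=0.95):
    """One-step SOC update for the ESS (sketch; parameters assumed).

    Positive p_ess charges, negative discharges. Losses are paid in
    both directions: charging stores eta_ch * p * dt, discharging
    drains p * dt / eta_dis from the pack.
    """
    if p_ess >= 0:
        delta_e = eta_ch * p_ess * dt       # energy stored after losses
    else:
        delta_e = p_ess * dt / eta_dis      # energy drawn incl. losses
    return soc + delta_e / e_max
```

Note that a charge/discharge round trip at the same power loses energy, which is why the optimizer avoids needless cycling.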

Electric Vehicle
In modeling the EV, the enhanced V2G and V2H functions of the EV are considered. These functions are active only when the EV is at home. Similar to the ESS, let the EV power be positive if it is charging and negative if it is discharging. The model of the EV is described in (16)-(18).
The meanings of the terms in (16)-(18) are similar to those of the ESS model (Equations (13)-(15)). Moreover, the EV has to be fully charged before it leaves home. Assume the EV arrives home at the beginning of time slot i^EV_α and is available at home during the time interval [i^EV_α, i^EV_β]; the SOC of the EV should then meet the constraints, where E^EV_driven = d_d/η^EV_driven is the total energy consumption for driving the car, d_d is the driving distance, which is assumed to be predictable, and η^EV_driven is the driving efficiency, i.e., the distance the EV can travel per unit of energy.
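The departure-energy requirement can be sketched as a simple feasibility check. Here the driving efficiency is taken in miles per kWh, and all numerical values are assumptions for illustration only.

```python
def ev_departure_soc_ok(soc_depart, d_miles, eta_driven=3.0,
                        e_max=24.0, soc_min=0.1):
    """Departure-energy check (sketch; all numbers are assumptions).

    eta_driven is taken as miles per kWh, so the planned trip needs
    E_driven = d_miles / eta_driven kWh, which must fit above the
    minimum SOC reserve when the EV leaves home.
    """
    e_driven = d_miles / eta_driven        # trip energy (kWh)
    return soc_depart * e_max >= e_driven + soc_min * e_max
```

A fully charged 24-kWh pack covers a 30-mile trip with reserve to spare; a nearly empty one does not cover 60 miles.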

DR Optimization Scenarios
To investigate how the integration of different home appliances affects the DR efficiency, four DR optimization scenarios are designed.

Optimal DR for Adjustable Appliances
The first scenario assumes that only the adjustable appliances, i.e., the HVAC and EWH, participate in the DR scheme and that the other appliances act as critical appliances. The ESS and PV system are not available. The DR model for this scenario is then formulated as (P1), where price_i is the electricity price in time slot i and P^total_i is the total power consumption, which equals the sum of the power consumption of adjustable appliances P^adj_i, shiftable appliances P^shi_i, critical appliances P^cri_i and the EV P^EV_i. Under the assumptions of this scenario, the power consumptions of the shiftable appliances, critical appliances and EV are all constant, so removing them from the objective function makes no difference to the optimal solution. Therefore, the problem (P1) can be simplified as (22).
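The electricity-cost objective common to all scenarios is a price-weighted sum of per-slot energy use. A minimal sketch:

```python
def daily_cost(prices, p_total, dt=0.25):
    """Objective in the style of (P1): sum_i price_i * P^total_i * dt.

    prices in $/kWh, powers in kW, dt in hours (15-min slots here).
    A sketch of the cost term only; constraints are handled elsewhere.
    """
    return sum(price * p * dt for price, p in zip(prices, p_total))
```

For two 15-minute slots at 4 kW and 2 kW, priced at $0.1/kWh and $0.2/kWh, the cost is $0.10 + $0.10 = $0.20.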

Optimal DR Policy Considering Shiftable Appliances
Second, we consider the scenario in which shiftable appliances also participate in DR along with the adjustable appliances. In this scenario, additional discrete variables need to be solved to determine the statuses of the CD, CW and DW; therefore, a mixed integer programming problem (P2) is formed, which can be expressed in an equivalent format as shown in (23).
where P^PV_i is the power production of the PV in time slot i.

Integrative Optimal DR Policy
In the last scenario, the V2H/V2G applications of the EV are enabled. Since the V2H/V2G operation may accelerate battery aging and shorten the cycle life, the degradation cost of the EV battery is considered in the formulation. To develop an integrative optimal DR policy for this scenario, a mixed integer nonlinear programming model (P4) is needed, where P^EV_i,dis denotes the power discharged from the EV battery to the home appliances or to the grid and C^EV_i,deg represents the degradation cost of the EV in time slot i due to the V2G/V2H operation, which can be modeled as a function of the actual battery cycle life. In this model, C^EV_capital is the capital cost of the EV battery ($/kWh) and L^EV_i,E is the battery life of the EV in throughput energy (kWh). The battery life (kWh) can be expressed in terms of L^EV_i,N, the battery life in number of cycles, which is a function of the depth of discharge (DoD) and depends on the type of the battery. In our study, a linear relationship between cycle life and DoD is used, L^EV_i,N = a·DoD_i + b, where a = −4775 and b = 4995 [27]. The DoD of the EV battery in time slot i can be estimated from its SOC [27].
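The degradation model chains the linear cycle-life fit into a per-slot wear cost. The sketch below uses the stated constants (a = −4775, b = 4995) but assumes a pack size and interprets the capital cost as $/kWh, as stated in the text; these choices are assumptions for illustration.

```python
def ev_degradation_cost(p_dis_kw, dod, dt=0.25,
                        c_capital_per_kwh=400.0, e_max=24.0,
                        a=-4775.0, b=4995.0):
    """Per-slot V2H/V2G wear cost (sketch; pack size and $/kWh assumed).

    Cycle life L_N = a*DoD + b (linear fit from [27]); lifetime energy
    throughput L_E = L_N * DoD * E_max (kWh); the slot's wear cost
    spreads the pack's capital cost over that throughput.
    """
    l_n = a * dod + b                       # cycle life at this DoD
    l_e = l_n * dod * e_max                 # lifetime throughput (kWh)
    pack_cost = c_capital_per_kwh * e_max   # total battery capital ($)
    return pack_cost / l_e * p_dis_kw * dt  # wear cost for this slot ($)
```

Deeper discharging yields fewer cycles and less lifetime throughput, so the per-kWh wear cost rises with DoD, which is exactly why the optimizer limits V2H/V2G depth when the capital cost is high.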

Approximate Dynamic Programming
Solving the integrative DR optimization problem faces several challenges. First, it is a complex MINLP problem. Second, the dimension of the decision variables grows rapidly as more DR-capable appliances are integrated, as seen from (P1) to (P4). Third, the growth in dimension is multiplied as the time granularity is reduced. In this section, the ADP technique is introduced to the integrative DR optimization problem to overcome these challenges. In particular, a polynomial function architecture is designed to approximate the optimal value function. Then, using approximate optimistic policy iteration (AOPI), an optimal DR policy is derived.

Problem Reformulation
We reformulate the optimization problems in DP terms and show the reformulation process with respect to problem (P4) for explanation. Let S_i = (T^HVAC_{i,in}, T^EWH_{i,in}, SOC^ESS_i, SOC^EV_i, Σ_{j≤i} s^CW_j, Σ_{j≤i} s^CD_j, Σ_{j≤i} s^DW_j)^T be the system state at the end of time slot i, a vector including the indoor temperature, the hot water temperature, the SOCs of the ESS and EV, and the numbers of time slots for which the CW, CD and DW have been powered; the first four components are continuous, and the appliance counters are discrete. S_i and the decision x_i take values in their feasible sets S and X_i, respectively, which are defined by the constraints in (P4). Let a_i : S_i → x_{i+1}, a_i ∈ A_i, be a mapping from the current state to a decision, where A_i is the set of all feasible mappings while in S_i. Let C_{i+1}(S_i, a_i) be the cost of applying a_i while in S_i and V_{i+1}(S_{i+1}) be the total minimum cost for the residual time slots from S_{i+1} (Figure 2), with V_I(S_I) ≡ 0. For convenience, we redefine i ∈ {0, 1, . . ., I − 1}, and the initial system state is denoted by S_0. The system evolves via the transition function S_{i+1} = S(S_i, a_i(S_i)), which can be obtained from the models in Section 2. Based on Bellman's principle of optimality [28], the optimal decisions of (P4) can be obtained by solving the following Bellman equations recursively.
In this way, we avoid solving one large optimization problem. However, the classical DP algorithm requires finding V_i(S_i) at every S_i and thus suffers from the curse of dimensionality when a system includes large or continuous state and action spaces, so it cannot be used for our problems.
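The backward recursion itself is easy to state for a small discretized problem. The sketch below is generic, not the paper's continuous-state formulation; `cost` and `transition` stand in for the Section 2 models, and the toy numbers at the end are invented for illustration.

```python
def bellman_recursion(states, actions, cost, transition, horizon):
    """Finite-horizon backward DP: V_I = 0, then
    V_i(S) = min_a [ C(S, a) + V_{i+1}(S(S, a)) ] for i = I-1, ..., 0.
    Returns V_0 over all states and the per-slot greedy policy."""
    v = {s: 0.0 for s in states}                 # terminal value V_I = 0
    policy = []
    for i in reversed(range(horizon)):
        v_new, pi_i = {}, {}
        for s in states:
            best = min(actions,
                       key=lambda a: cost(i, s, a) + v[transition(s, a)])
            pi_i[s] = best
            v_new[s] = cost(i, s, best) + v[transition(s, best)]
        v, policy = v_new, [pi_i] + policy
    return v, policy

# Toy check: two states, two actions, horizon 2; action a moves to state a.
slot_costs = {0: [3.0, 1.0], 1: [0.0, 5.0]}      # slot_costs[i][a]
v0, pi = bellman_recursion(
    states=[0, 1], actions=[0, 1],
    cost=lambda i, s, a: slot_costs[i][a],
    transition=lambda s, a: a, horizon=2)
```

In the toy problem, the recursion picks action 1 in slot 0 (cost 1) and action 0 in slot 1 (cost 0) rather than greedily minimizing each slot in isolation.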
Figure 2. Relationship among notations and time slots.

ADP for the Integrative DR
To overcome this challenge, ADP is introduced to the integrative DR optimization problem. ADP breaks the curse of dimensionality by using an approximation Ṽ_i of the value function (or cost-to-go function) V_i, which only requires values of V_i(S_i) at some states for regression. It proceeds in an iterative way, forward through time. For the formulated problems, an ADP algorithm based on approximate optimistic policy iteration (AOPI) is designed.
Assume that we start with a policy π^(n) = {a^(n)_0, . . ., a^(n)_{I−1}}, and we evaluate the corresponding value function V^(n)_i using a parametric approximator Ṽ_i(S_i|θ_i) for all i, where Ṽ is the approximation architecture and θ is a parameter vector. This process is called policy evaluation. Once θ is determined, we obtain an approximate value function Ṽ_i(S_i|θ^(n)_i) for all i with respect to policy π^(n). Then, we use Ṽ_i(S_i|θ^(n)_i) to obtain an improved policy π^(n+1) by computing the Bellman equations; this is the policy improvement step. After that, the corresponding value function Ṽ_i(S_i|θ^(n+1)_i) is evaluated again, and we repeat the procedure until a converged parameter sequence {θ^(n)_i} is found for all i. Next, we show the process of policy evaluation. First, in each iteration n, we initialize the parameter vector with θ^(n,0)_i = θ^(n−1)_i. Then, we choose a start state S_0 and calculate the one-period cost C_{i+1}(S_i, a_i) for all i forward in time based on the transition function. After that, we calculate the cumulative cost to obtain an observation v_i of the value function at S_i for all i. With the pairs (S_i, v_i), we compute the new parameter vector θ^(n,m)_i for all i via a learning algorithm. We repeat this procedure a certain number of times, say M, and then assign θ^(n)_i = θ^(n,M)_i.
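The AOPI loop described above can be outlined as follows. This is a structural sketch only: the greedy one-step minimization `act`, the models behind `step_cost` and `transition`, and the feature map are placeholders, and ordinary least squares stands in for the paper's RLS update.

```python
import numpy as np

def aopi(s0, horizon, features, act, step_cost, transition,
         n_iters=10, m_evals=5):
    """Skeleton of approximate optimistic policy iteration (a sketch).

    features(s) maps a state to the basis vector phi(S_i); act solves
    the one-step Bellman minimization given the current theta;
    step_cost and transition come from the appliance/ESS/EV models.
    """
    n_feat = len(features(s0))
    theta = [np.zeros(n_feat) for _ in range(horizon)]
    for _ in range(n_iters):
        samples = [[] for _ in range(horizon)]
        for _ in range(m_evals):                 # policy evaluation runs
            s, costs, states = s0, [], []
            for i in range(horizon):             # simulate forward in time
                a = act(i, s, theta)             # greedy w.r.t. current V~
                states.append(s)
                costs.append(step_cost(i, s, a))
                s = transition(i, s, a)
            cum = 0.0
            for i in reversed(range(horizon)):   # cost-to-go observations v_i
                cum += costs[i]
                samples[i].append((features(states[i]), cum))
        for i in range(horizon):                 # fit theta_i (LS stands in for RLS)
            phi = np.array([p for p, _ in samples[i]])
            v = np.array([c for _, c in samples[i]])
            theta[i], *_ = np.linalg.lstsq(phi, v, rcond=None)
    return theta
```

On a trivial stationary problem (constant per-slot cost), the fitted approximator reproduces the true cost-to-go at the visited state.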

Approximating Functions' Design
Obviously, the approximation architecture and the learning algorithm are significant for the convergence and performance of the algorithm. In this study, a linear-parameter architecture based on a multivariate polynomial is designed for the value function approximation as shown in (32), where θ_i = (θ_{i,1}, θ_{i,2}, . . ., θ_{i,F})^T is the parameter vector to be estimated, S_i(k) is the k-th element of the state vector S_i and r, q ∈ N are nonnegative integer powers; here, Q is the degree of the polynomial. The polynomial function is adopted for the value function approximation because of its many merits. First, it is a linear-parameter architecture that is easy to train and has a fast convergence speed. Second, it can closely approximate any continuous function on a finite interval according to the Weierstrass approximation theorem [29]. Third, it is differentiable and simple, which reduces the complexity of policy evaluation and of the proposed method in general. In this study, via trial and error, a cubic polynomial containing the first-, second- and third-power terms of each state variable, degree-two cross-terms and a constant term is chosen to approximate the value function as shown in (33).
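The cubic basis described above can be built mechanically from the state vector. A sketch follows; the exact term set in the paper's Equation (33) may differ slightly from this construction.

```python
from itertools import combinations

def poly3_features(s):
    """Cubic basis in the spirit of Eq. (33): a constant term, the
    first- through third-power terms of each state variable, and
    degree-two cross-terms of each pair of state variables."""
    phi = [1.0]                          # constant term
    for x in s:
        phi.extend([x, x * x, x ** 3])   # powers 1-3 of each variable
    for xa, xb in combinations(s, 2):    # degree-two cross-terms
        phi.append(xa * xb)
    return phi
```

For a k-dimensional state the basis has 1 + 3k + k(k−1)/2 terms, so the parameter count grows only quadratically with the state dimension.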
Given the approximation architecture, the recursive least squares (RLS) algorithm [30], a quickly-converging learning algorithm for linear regression problems, is applied to update the parameter vector in each policy evaluation step. It should be noted that the policy improvement step in (31) also plays an important role in the performance of the algorithm. In fact, it is hard to obtain the improved policy exactly because it involves solving a mixed integer nonlinear optimization problem. The pseudo-code of AOPI is shown in Algorithms 1 and 2.
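A standard RLS update for the linear-in-parameters approximator looks like the following sketch; λ is the forgetting factor, and λ = 1 recovers ordinary least squares. This is the textbook recursion, not the paper's exact implementation.

```python
import numpy as np

def rls_update(theta, p_mat, phi, target, lam=1.0):
    """One recursive-least-squares step for V~(S|theta) = theta . phi(S).

    theta: current parameter vector; p_mat: inverse-covariance-style
    matrix (initialize as a large multiple of the identity);
    phi: feature vector for the visited state; target: observed
    cost-to-go v_i.
    """
    phi = np.asarray(phi, dtype=float)
    gain = p_mat @ phi / (lam + phi @ p_mat @ phi)   # Kalman-style gain
    err = target - theta @ phi                       # prediction error
    theta = theta + gain * err                       # parameter correction
    p_mat = (p_mat - np.outer(gain, phi @ p_mat)) / lam
    return theta, p_mat
```

Fed a handful of consistent samples of a linear target, the recursion converges to the least-squares coefficients without ever re-solving the full regression.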

Parameters
In this section, numerical simulation is implemented to test the optimal DR policies and the proposed ADP algorithm. Table 1 summarizes all appliance parameters and characteristics. The real outdoor temperature on a hot day [31] in June 2014 in the U.S. state of Illinois and the statistical hot water flow rate [32] and aggregated critical loads [33] on the same date are adopted. The PV production data from [33] and the Ameren Illinois DAP tariffs [34] on a high-DAP day in June 2014 are used in the simulation. Figure 3 depicts the data used. We assume the DR policy starts from 8:00 a.m. and runs for 24 h with 15-min intervals. The convergence tolerance is chosen as ε = 10^−3. The simulation platform is MATLAB 2014a on an Intel(R) Core(TM) i5-3475S 2.90-GHz personal computer with 8 GB of RAM.
The developed integrative DR optimization model is a non-convex MINLP and belongs to the class of NP-hard problems, for which no algorithm guarantees a globally optimal solution in polynomial time. In order to examine the effectiveness of the designed ADP algorithm, we compare it with two existing non-convex MINLP techniques: one is a standard integer branch-and-bound algorithm provided by the YALMIP Optimization Toolbox BNB solver [35], in which the associated relaxed problem is solved by the MATLAB built-in nonlinear constrained optimization solver 'fmincon'; the other is the genetic algorithm (GA) provided by the Global Optimization Toolbox in MATLAB. For the GA, we choose the population size to be 50, the elite population to be 0.05 times the population size and the crossover fraction to be 0.8; the default selection and mutation operators in MATLAB are used; the algorithm stops when the average relative change in the best fitness value over generations is less than or equal to 10^−20; the maximum number of generations is set to 100 times the population size.
Figure 4 shows the comparison of the daily cost and running time using the three algorithms under different DR scenarios. The detailed comparison results are listed in Table 2. It can be seen from Figure 4 that for the S1 and S2 DR scenarios, the BNB algorithm performs the best in terms of both running time and total electricity cost, and the GA performs the worst; the ADP algorithm obtains a result comparable to the BNB algorithm in terms of total cost, but consumes much more computational time. This is because the optimization models associated with the S1 and S2 DR scenarios are simple (a linear model for S1 and a mixed integer linear model for S2) and have a relatively small number of variables. Therefore, the BNB algorithm is able to find the global optimum within a very short time. However, the ADP algorithm needs to solve the Bellman equation at each iteration and perform regression to approximate the value function, so it requires more time for calculation and obtains a sub-optimal solution due to the approximation error.
Nevertheless, for the more integrative DR scenarios, i.e., S3 and S4, the advantage of the ADP algorithm appears. From Figure 4, it can be seen that the ADP algorithm obtains a comparable or lower total electricity cost in a shorter time for the S3 and S4 DR scenarios compared to the BNB and GA algorithms. Notice that scenarios S3 and S4 complicate the DR optimization model not only by enlarging its size with more continuous and integer variables, but also by introducing a large number of nonlinear equality and inequality constraints, including the SOC dynamics equations of the ESS and the EV charging demand constraints. This significantly burdens the BNB algorithm in the branching process, due to the large number of integer variables, and in the bounding process, due to the serious nonlinearity of the relaxed problem. As a result, it fails to give feasible results after running for 24 h. These complex constraints also make it hard for the GA to find feasible and better solutions while evolving from generation to generation. Although the ADP algorithm also encounters these difficulties, it breaks the large nonlinear mixed integer programming model into many small optimization problems by approximating the value function, and these subproblems are easy and fast to solve. Therefore, the computational time of the ADP does not grow as fast as that of the BNB and GA algorithms. This comparison suggests that the proposed ADP algorithm is effective and has particular strength for complicated high-dimensional optimization problems.
Figure 5 shows the optimal DR solutions of problem S4 obtained by the ADP algorithm. As can be seen, the HVAC and EWH exhibit clear pre-cooling and pre-heating operation, respectively, during low price hours to reduce power consumption in high price hours; the EWH still shows oscillating power consumption during the peak hours from 3 p.m. to 10 p.m., matching the fluctuation in hot water usage shown in Figure 3c, in order to keep the hot water temperature at a comfortable level. The energy usage of the CW, CD and DW is shifted to times when the price is low; the ESS is controlled to charge from the PV or the grid in low price hours and discharge to power the loads in high price hours; and the EV battery also supplies power to home appliances during the peak load hours of 9 p.m. to 11 p.m. via the V2G/V2H operation.

Different Choices of Approximating Functions
This subsection evaluates how the choice of the approximating function affects the performance of the ADP algorithm. In addition to the cubic polynomial in Equation (33) in Section 3.3, two polynomials of degree two and degree five and two RBF (radial basis function)-based approximating functions are evaluated and compared.
The two additional polynomials are designed as follows. The quadratic polynomial consists of the first- and second-power terms of each state variable, degree-two cross-terms and a constant term. The quintic polynomial consists of the first- through fifth-power terms of each state variable, degree-two and degree-three cross-terms of any two state variables and a constant term. For simplicity, denote the quadratic, cubic and quintic polynomial functions as Poly-2, Poly-3 and Poly-5, respectively.
The RBF-based approximating functions are designed based on normalized kernel functions, in which the basis functions φ_{i,f}(S_i) are defined in terms of a kernel K(S_i, s^f_i), where s^f_i is the center of the f-th kernel. The kernel function is normally a local weighting function whose value declines as the query point moves away from its center. This enables the approximating function not only to characterize the local features of the value function in the neighborhood of every kernel center, but also to offer a proper fit for 'the middle area' through the linear combination of multiple normalized kernel functions. Clearly, the more kernel functions are used, the better the approximation that can be obtained. To show this relationship, two kernel-based approximating functions with different numbers of kernels are implemented.
Figure 6 shows the comparison using different approximating functions. For the polynomial-based approximating functions, the comparison suggests that as the degree of the polynomial goes up, the obtained total electricity cost decreases, but the CPU time increases quickly. Similar properties can be seen for the RBF-based approximating functions. This is because good performance relies on a good approximation, which generally contains a large number of parameters to learn, and learning them is time consuming. In practice, a balance between performance and running time is needed no matter which approximating function is adopted. However, the RBF-based approximating functions require much more running time than the polynomial approximators and hence are not suitable for the integrative DR problem. In addition, compared to Poly-2 and Poly-5, Poly-3 offers a good balance between the total energy cost and the CPU time according to the comparison shown in Figure 6b. Therefore, Poly-3 is an appropriate approximating function for the integrative DR problem.
As can be seen in Figure 7, under all DR scenarios, the peak load in the high price hours of 3 p.m.-9 p.m. and the total electricity cost are reduced compared to the scenario without DR. Specifically, the reduction grows as more appliances participate in DR from S1 to S4. Moreover, a sharp reduction in cost occurs between S2 and S3; evidently, the ESS and PV make a great contribution to reducing the electricity cost. In order to investigate how the degradation cost of the EV battery affects the V2H/V2G operation, we make a comparison for different EV battery capital cost settings of $0, $400 and $800. The simulation results are depicted in Figure 8. As can be seen, when the capital cost is $0, the SOC of the EV battery decreases to its lowest level after it arrives home, which means it delivers the most power to the home. However, when the capital cost goes up, the SOC of the EV battery does not decrease as much. From the results, it is concluded that a higher capital cost leads to less energy discharged to supply the home appliances (V2H) or to sell to the grid (V2G).

Discussion and Future Work
In developing the integrative DR strategy, we consider only the users' perspective and assume that the bidding process has already cleared. However, it is worth mentioning that when a significant fraction of households adopts the developed DR strategy and selfishly minimizes their own bills, the local demand may shift to the low price hours and form a new load peak. This would require the electric utilities to adjust their dynamic price structures by considering all kinds of loads, including residential, industrial and commercial loads, until a new balance between the utilities and energy consumers is achieved. This is a significant issue worth discussing from a more systemic perspective. Pioneering studies on DR pricing problems can be found in [36-40], and we refer interested readers to these publications. For now, we leave this problem for our future research.

Conclusions
This paper presents an integrative DR study for the optimal operation of home appliances, an ESS and a V2G/V2H-enabled EV based on the proposed ADP algorithm. Based on the simulation results, we draw the following conclusions: (1) the proposed ADP algorithm is effective and has particular strength for complicated high-dimensional optimization problems; (2) greater participation of smart home appliances in the DR program brings more benefits to customers via energy cost reduction and peak load shifting; (3) the ESS makes the greatest contribution to reducing the energy cost by charging from the PV or the grid and discharging to power the loads in high price hours; (4) the V2G/V2H applications of the EV can offer more economically-efficient usage of electricity, but the degradation of the EV battery must be evaluated carefully.

SA Tank surface area of EWH (ft²)
R Tank insulation thermal resistance of EWH (hour·ft²·°F/BTU)


Figure 5. Optimal DR solutions of appliances under S4 based on the ADP algorithm.

Figure 6. Comparison among different approximating functions. Poly, polynomial. (a) Total consumption power at each time step; (b) total electricity cost and running time.

Figure 7. (a) Power exchange between the household and grid under different scenarios. (b) Cumulative cost under different scenarios.

Figure 8. Comparison of the EV battery SOC and degradation cost for different capital cost settings.
Optimal DR Policy Combining ESS and PV

Third, we consider the scenario in which the energy storage system and PV panels are available. The DR optimization problem for this scenario can be modeled as (P3).

Table 1. Parameters of the residential appliances.

Table 2. Comparison of the total cost and CPU time with different solvers for the residential DR problems.
Columns: cost ($) and time (s) for each solver. * Mean value of 10 runs.