Optimal Scheduling of Energy Storage System Considering Life-Cycle Degradation Cost Using Reinforcement Learning

Abstract: Recently, due to the ever-increasing global warming effect, the proportion of renewable energy sources in the electric power industry has increased significantly. With the increase in distributed power sources with adjustable outputs, such as energy storage systems (ESSs), it is necessary to define ESS usage standards for an adaptive power transaction plan. However, the life-cycle cost is generally defined by a quadratic formula without considering various factors. In this study, the life-cycle cost for an ESS is defined in detail based on a life assessment model and used for scheduling. The life-cycle cost is affected by four factors: temperature, average state-of-charge (SOC), depth-of-discharge (DOD), and time. In the case of the DOD stress model, the life-cycle cost is expressed as a function of the cycle depth, whose exact value can be determined based on fatigue analysis techniques such as the Rainflow counting algorithm. The optimal scheduling of the ESS is constructed considering the life-cycle cost using a tool based on reinforcement learning. Because the life assessment cannot be handled analytically owing to the temperature and time-dependent characteristics of the ESS SOC, reinforcement learning is used to derive the optimal schedule. The results show that the SOC curve changes with respect to the weight. As the weight of the life-cycle cost increases, the ESS output and charge/discharge frequency decrease.


Introduction
Recently, consumers' perception of energy has changed due to the development and demonstration of operating systems for regional power grids, typified by virtual power plants (VPPs) and microgrids (MGs). Under the influence of economic factors, such as decreasing installation costs of renewable energy and technological advances, consumers have become energy prosumers who can trade their own electricity through distributed power systems [1,2]. Because the power surplus can be sold to neighbors, the energy flow in the energy market has changed from one-way to two-way. In addition, the existing hierarchical market structure has transformed into a network structure.
With the adoption of distributed energy and the increasing use of ESSs, the need to establish usage standards is growing. When conducting trading through ESSs, certain usage standards, analogous to the fuel cost function of a generator, must be considered. The life-cycle cost of the ESS can be considered as one of these standards. Regarding research on the ESS life-cycle, Ref. [3] proposed the total capital cost and life-cycle cost models for ESSs. The cost function was introduced and modeled for the system, and a learning model that can accurately estimate the life-cycle cost based on various battery types was built. Ref. [4] proposed an analytical optimization for the capacity and sizing of solar power and ESSs connected to the grid. Ref. [5] studied the efficiency difference between HESS and LESS in an independent microgrid. The constraint variable was set by combining the SOC with the cost function, and the stability of the system was considered in preparation for the surge currents of LIB and LAB. Ref. [6] proposed an ESS life-cycle definition using SOC and SOH models. They established an equation via correlation analysis and introduced a cycle depth variable. Ref. [7] introduced an overall cost evaluation model for ESSs and used fuzzy comprehensive evaluation theory to analyze the model, considering basic facility and operating costs. Ref. [8] proposed a life assessment method for ESSs in distributed energy systems and established an evaluation method classified into four scenarios. Ref. [9] introduced a P2P energy sharing scheme using ESSs. In their study, scheduling was set to maximize system profits depending on the existence of an ESS. Ref. [10] conducted a comparative study on single/hybrid ESSs to examine the stable energy transfer capabilities of these systems. This model considered the charge/discharge rate function of the battery, and data analyses were performed for situations with varying supply and demand. Ref. [11] defined a cycle life model of a battery considering the SOC, DOD, average C-rate, and aging of the lithium-ion battery. A comparative analysis was performed on the battery temperature, output, and resistance values with varying parameters. Ref. [12] introduced an ESS life-cycle cost optimization method through an energy consumer scheduling scheme. The battery life was calculated using the Rainflow counting algorithm in order to maximize battery life. Scheduling was configured based on unstable PV and WT output data and the computed ESS life results. Ref. [13] introduced an improvement in the prediction accuracy for lithium-ion batteries. A BP neural network was used to predict the life-cycle cost of the battery, and the weights were set using the DE algorithm. Ref. [14] analyzed the ESS life-cycle cost using various forecasting techniques, such as the RVM and CNN models. Ref. [15] conducted a study on the battery output considering the life-cycle cost of an ESS used in the grid. Their study, conducted on the stability of the system, included measuring the frequency fluctuations over time. Moreover, a scheduling scheme for an ESS used in the auxiliary service market was established.

Ref. [16] presented a methodology for the optimal location, selection, and operation of battery energy storage systems (BESSs) and renewable distributed generators in medium- to low-voltage distribution systems. Ref. [17] proposed a new formulation of the battery degradation cost for the optimal scheduling of BESSs. This paper defined a one-cycle battery cost function based on the cycle life curve and an auxiliary SOC that tracks the actual SOC only upon discharge. Ref. [18] proposed a mixed-integer nonlinear programming (MINLP) model for PV-battery systems that aims to minimize the life-cycle cost (LCC) and solved the LCC problem using a novel two-layer optimization. Ref. [19] studied the multi-objective operation of BESSs in AC distribution systems using a convex reformulation. Ref. [20] proposed a two-stage multi-objective optimal operation scheduling method to improve the operation efficiency and reduce the emissions of a solar-power-integrated hybrid ferry with shore-to-ship (S2S) power supply. Ref. [21] addressed the problem associated with the economic dispatch of BESSs in alternating current (AC) distribution networks. Ref. [22] addressed the problem of the optimal operation of BESSs in AC grids from the point of view of multi-objective optimization. Ref. [23] proposed a distributed multi-agent consensus-based control algorithm for multiple BESSs operating in a microgrid to fulfill several objectives, including SOC trajectory tracking control, economic load dispatch, active and reactive power sharing control, and voltage and frequency regulation. Ref. [24] proposed an optimal BESS scheduling method for MGs to solve the stochastic unit commitment problem, considering the uncertainties in renewables and load.
In summary, most previous studies derive their results by defining the life-cycle cost in a quadratic manner or simplifying it. This study aims to define it in detail based on a life-cycle cost assessment method and utilize it for scheduling. Because the defined life-cycle cost cannot be derived analytically and explicitly, a solution is derived using reinforcement learning techniques.
The contributions of this study are as follows: (1) By defining the life-cycle cost of an ESS, and deriving and utilizing it for optimal scheduling, prosumers with ESSs can make the best choice between incurring life-cycle costs due to ESS use and profiting from transactions. In addition, because of the active adjustment of prosumers with ESSs, it is possible to reduce the line loss inside a system. (2) Through analysis of the trading tendency of flexible prosumers with respect to changes in ESS life-cycle cost weights, prosumers who own an ESS have the choice of participating in P2P energy trading to make profits.

Life Degradation Model for ESSs Based on a Life-Cycle Assessment Method
This chapter presents the design of an ESS life-cycle cost metric for prosumer participation in P2P energy trading with ESSs. ESSs can be classified according to the type of battery they use. In this study, lithium-ion batteries, which are commonly used in ESSs, are chosen, and their life-cycle cost is designed based on existing studies related to battery life assessment. The life assessment model consists of four stress models: temperature, average state-of-charge (SOC), depth-of-discharge (DOD), and time [25].
The degradation ratio of the battery life-cycle is determined by the corresponding stress models, and the consumed life-cycle can be evaluated using this degradation ratio. The degradation ratio, the four stress models, and the consumed life-cycle ratio are formulated as follows, where the time stress model is linear in time:

$$S_t(t) = k_t t \qquad (5)$$

where $f_d$ is the degradation ratio, and $S_T$, $S_\sigma$, $S_\delta$ are the stresses for temperature, average SOC, and DOD, respectively. $T_c$ is the battery cell temperature, $T_{ref}$ is the reference temperature, and $k_T$ is the temperature stress coefficient. $\sigma$ is the average SOC, $\sigma_{ref}$ is the reference average SOC, and $k_\sigma$ is the average SOC stress coefficient. $S_t$ is the stress for time, $\delta$ is the cycle depth, and $k_{\delta,q1}$, $k_{\delta,q2}$ are the DOD coefficients. $t$ is time, $k_t$ is the time stress coefficient, $L$ is the consumed life-cycle, and $\alpha_{sei}$, $\beta_{sei}$ are the solid electrolyte interphase (SEI) film formation coefficients.
In the case of a DOD stress model, various models such as linear, exponential, polynomial, and power are applicable, but the power function is used according to references. Stress models for average SOC and time can be used immediately for life-cycle cost design because they are explicit. The same does not hold for DOD and temperature stress models.
First, in the case of the DOD stress model (4), which is expressed as a function of the cycle depth, the exact value of the cycle depth can be determined through a post evaluation based on fatigue analysis techniques, such as the Rainflow counting algorithm. In the case of the temperature stress model (2), additional analysis is required because a model for the internal battery temperature is required with respect to the output ESS. Therefore, additional design stages for these two models are required. The model for temperature is designed by analyzing the relationship between battery output and temperature using the thermoelectric model of the battery. Furthermore, for the DOD model in this case, an approximation that considers one charge or discharge of the battery as a half cycle is assumed.
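To make the structure of the life assessment model concrete, the following minimal Python sketch evaluates the four stress models and the consumed life-cycle. It is only an illustration under stated assumptions: the exponential forms for temperature and average SOC, the two-coefficient power law for DOD, the SEI-based expression for $L$, and the way the stresses are combined into $f_d$ are forms commonly used in cell life assessment and are assumed here, and every numeric coefficient is a hypothetical placeholder rather than a value from this study.

```python
import math

# Hypothetical placeholder coefficients (assumptions, not values from the paper).
K_T, T_REF = 0.0693, 298.15          # temperature stress coefficient, reference temperature [K]
K_SIGMA, SIGMA_REF = 1.04, 0.5       # average-SOC stress coefficient, reference average SOC
K_TIME = 4.14e-10                    # time stress coefficient [1/s]
K_D1, K_D2 = 5.0e-5, 1.5             # assumed power-law DOD coefficients
ALPHA_SEI, BETA_SEI = 0.0575, 121.0  # SEI film formation coefficients


def stress_temperature(t_cell):
    """Assumed exponential temperature stress, equal to 1 at the reference temperature."""
    return math.exp(K_T * (t_cell - T_REF) * T_REF / t_cell)


def stress_soc(sigma):
    """Assumed exponential average-SOC stress, equal to 1 at the reference SOC."""
    return math.exp(K_SIGMA * (sigma - SIGMA_REF))


def stress_time(t):
    """Linear time stress S_t(t) = k_t * t."""
    return K_TIME * t


def stress_dod(delta):
    """Assumed power-law DOD stress as a function of the cycle depth."""
    return K_D1 * delta ** K_D2


def degradation_ratio(delta, sigma, t_cell, t, n_cycles=0.5):
    """Assumed combination: (cycle ageing + calendar ageing) scaled by SOC and temperature."""
    cycle_and_calendar = n_cycles * stress_dod(delta) + stress_time(t)
    return cycle_and_calendar * stress_soc(sigma) * stress_temperature(t_cell)


def consumed_life(f_d):
    """Assumed SEI-based mapping from the degradation ratio to the consumed life-cycle L."""
    return 1.0 - ALPHA_SEI * math.exp(-BETA_SEI * f_d) - (1.0 - ALPHA_SEI) * math.exp(-f_d)
```

In this sketch the default `n_cycles=0.5` mirrors the half-cycle approximation described above, in which one charge or one discharge counts as half a cycle.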

Temperature Stress Model Formulation
A thermoelectric model is used as the temperature stress model. It consists of two parts: an electric circuit model and a thermal model [26-29].

Electric Circuit Model
The electric circuit model of the battery used in the temperature stress model is shown in Figure 1 [28]. The open circuit voltage (OCV) can be expressed as a function of the SOC and the internal temperature of the battery. Characteristically, the OCV rises during charging and falls during discharging, and this tendency varies with the SOC. The internal resistance $R_{in}$ can also be expressed as a function of SOC and temperature. The RC network located to the right of the internal resistance is a second-order model and represents the diffusion resistance and capacitance. $R_1$ and $C_1$ are related to the charge transfer processes occurring in the middle frequency range, whereas $R_2$ and $C_2$ are responsible for reproducing the diffusion processes. The hysteresis voltage is an additional voltage component caused by the hysteresis characteristics of the RC network, which refers to the fluctuations in the open circuit voltage during charge/discharge. This component is ignored, assuming its effect is relatively small. Moreover, the corresponding model for life-cycle cost analysis does not require detailed dynamic characteristics of the battery.

The function for the SOC of the OCV (7) is based on the parameters listed in Table 1. Regarding the effect of temperature on the OCV, Equation (8) shows the correlation between the OCV bias component and temperature, which can be expressed as a polynomial with the related parameters in Table 2. Finally, the open circuit voltage model is constructed as a linear sum of the OCV model for SOC and the temperature bias component, as in Equation (9).
The SOC is updated according to the output current, and its unit can be set as % (or p.u.). The sign of the discharge current is taken as positive. In the discrete SOC update equation, $T_s$ is the sampling time (unit: second [s]), $C_n$ is the battery capacity (unit: ampere hour [Ah]), $i$ is the output current, and $k$ is the time index. The internal resistance is also configured as a function of the internal temperature and SOC, similar to the OCV. However, the internal resistance remains nearly constant over the general battery SOC usage range and is dominantly affected by the internal temperature [19]. Therefore, the internal resistance is expressed as a function of the internal temperature with the battery internal resistance curve parameters in Table 3.

The relationship between voltage, resistance, and current in the RC network is expressed in discrete time, where $T_s$ is the sampling time and $k$ is the discrete time index. The second network time constant does not change [28]; thus, what remains to be estimated are the resistance and time constant of the first RC network and the resistance of the second RC network. If $R_1$ and $\tau_1$ are known, $C_1$ can be calculated using the time constant relation $\tau = RC$, and $C_2$ can be calculated in a similar manner if $R_2$ and $\tau_2$ are known. The network resistances are based on a polynomial model, whereas the first time constant is based on an exponential equation; the parameters are shown in Tables 4-6.

Thermal Model
As shown in Figure 2, the battery thermal model is affected by the temperature values at three points: the cell inside the battery shell, the shell surrounding it, and the environment [19]. Therefore, the battery thermal model can be described in two parts: the heat generation that occurs inside the battery, and the heat transfer from the inside to the battery shell and from the shell to the environment. In general, the heat generated by the cell is considered only as the heat generated by the internal resistance. However, the heat generated due to the overpotential of the RC network and the entropy change also needs to be considered when computing the total heat generated by the cell.

In general, the heat transfer in and out of a battery includes three mechanisms: conduction, convection, and radiation. Before modeling the heat transfer, both the battery shell temperature and the internal temperature must be assumed uniform, and the thermal characteristics must also be assumed to be uniformly distributed inside the battery. Only the heat conduction between the inside and the shell of the battery and between the shell and the environment is considered. In the heat transfer model, $T_{in}$ is the battery internal temperature, $T_{sh}$ is the battery shell temperature, and $T_{amb}$ is the ambient temperature. $C_{q1}$ and $C_{q2}$ are the internal and shell thermal capacities of the battery, respectively, and $k_1$ and $k_2$ are the heat conduction coefficients between the battery interior and the shell, and between the battery shell and the ambient environment, respectively. Because Equations (18) and (19) are continuous, they are discretized to obtain the formulae for the internal temperature and the shell temperature. The thermal capacities are constant, whereas the coefficient $k_2$ used in this model has time-varying characteristics [28]. $k_2$ depends on the heat dissipation conditions, such as the cooling wind speed and temperature, and it also increases with the temperature gradient $T_{sh} - T_{amb}$. To take this effect into consideration, two cases are compared here: a constant $k_2$ (denoted $k_{21}$) and a time-varying $k_2$ (denoted $k_{22}$).
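The discrete-time updates described above can be illustrated with a short Python sketch. It assumes a Coulomb-counting SOC update and a two-node lumped thermal model discretized by the forward Euler method; these specific forms, the function names, and all parameter values in the usage are assumptions made for illustration and are not taken from this study. For brevity, only the Joule heat of the internal resistance is used as the heat source, whereas the text also accounts for the RC overpotential and the entropy change.

```python
def soc_step(soc, current_a, ts_s, capacity_ah):
    """Coulomb counting in p.u.; discharge current is positive, so SOC decreases."""
    return soc - current_a * ts_s / (3600.0 * capacity_ah)


def joule_heat(current_a, r_in):
    """Heat generated by the internal resistance only (simplifying assumption)."""
    return current_a ** 2 * r_in


def thermal_step(t_in, t_sh, t_amb, heat_w, ts_s, c_q1, c_q2, k1, k2):
    """One forward-Euler step of an assumed two-node (interior/shell) thermal model."""
    # Interior node: heated by the cell, cooled by conduction to the shell.
    d_t_in = (heat_w - k1 * (t_in - t_sh)) / c_q1
    # Shell node: heated by the interior, cooled by conduction to the environment.
    d_t_sh = (k1 * (t_in - t_sh) - k2 * (t_sh - t_amb)) / c_q2
    return t_in + ts_s * d_t_in, t_sh + ts_s * d_t_sh


def simulate(currents_a, soc0, ts_s, capacity_ah, r_in, t_amb,
             c_q1=100.0, c_q2=50.0, k1=1.0, k2=0.5):
    """Roll the SOC and both temperatures forward over a current profile."""
    soc, t_in, t_sh = soc0, t_amb, t_amb
    history = []
    for current in currents_a:
        heat = joule_heat(current, r_in)
        t_in, t_sh = thermal_step(t_in, t_sh, t_amb, heat, ts_s, c_q1, c_q2, k1, k2)
        soc = soc_step(soc, current, ts_s, capacity_ah)
        history.append((soc, t_in, t_sh))
    return history
```

A time-varying $k_2$ could be represented in this sketch by passing a different `k2` value to `thermal_step` at each step instead of a constant.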

Coupled Thermoelectric Model
The coupled thermoelectric model is obtained by combining the two previously defined thermal and electrical models into one.

Cycle Depth Stress Model Formulation
The cycle depth is derived after fatigue analysis using the Rainflow counting algorithm, as mentioned earlier in relation to the life-cycle cost evaluation of the ESS. Therefore, it is impossible to determine the cycle depth before the schedule is configured. A first approach to this problem is to derive it through dynamic programming when composing the ESS schedule. In dynamic programming, a cost or reward must be calculated at each transition between states, so a state must be defined; here, the state can be the SOC of the ESS. Because the SOC is a continuous variable, it does not naturally take discrete values; however, the state can be defined by dividing the SOC range into fixed units, which simplifies the problem and reduces the computational burden. For example, if the state is defined in units of 0.1, with a minimum SOC of 0.1 and a maximum SOC of 0.9, nine states are defined in each time period (stage). When the total schedule interval is $T$, the number of cases composed by the states is $9^{T-1}$. This is the number of paths from the first to the last stage that must be searched in the dynamic program; that is, the computational power required to search for an optimal point is quite large. To solve this problem, a reinforcement learning-based approach is introduced, and the cycle depth used in this approach is approximated: the same half cycle is applied to every charging or discharging action. To check whether this approximation is appropriate, we created a random SOC candidate group and compared the complete and approximate life-cycle cost analysis results. Figure 3 shows a flowchart depicting this process. Figure 4 shows the SOC graphs for which the difference between the two results is largest and smallest when the life-cycle costs for the approximated and total cycle depths are calculated.

As a result, when charging and discharging are repeatedly performed, the difference between the two life-cycle costs is small, as shown in the blue graph, whereas when charging and discharging are performed sequentially in one large cycle depth, the difference is the largest. Figure 5 shows the life-cycle costs for the approximated cycle depth and for the total cycle depth for 20 candidates, as well as the ratio between the two life-cycle costs. Although the ratio differs for each candidate, we confirmed that the life-cycle cost calculated using the approximated cycle depth shows a tendency similar to that calculated using the total cycle depth. When the life-cycle cost is included in the actual objective function, it may be lower than the actual expected life-cycle cost owing to the approximated cycle depth. However, this is acceptable because the life-cycle cost enters the objective through a weight and is not directly converted into an actual financial cost.
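The effect of the half-cycle approximation can be illustrated with a small Python sketch. It compares two ways of extracting cycle depths from an SOC profile: treating every step-to-step change as its own half cycle (the approximation), and merging consecutive steps in the same direction into one half cycle. The second function is a simplified turning-point split used here as a stand-in for full Rainflow counting, and the power-law stress with `k1` and `k2` is a hypothetical placeholder; none of these names or values come from this study.

```python
def per_step_half_cycles(soc):
    """Approximation: every step-to-step SOC change is one half cycle."""
    return [abs(cur - prev) for prev, cur in zip(soc, soc[1:]) if cur != prev]


def segment_half_cycles(soc):
    """Merge consecutive steps in the same direction into one half cycle.

    Simplified turning-point split, NOT a full Rainflow implementation.
    """
    depths, start, direction = [], soc[0], 0
    for prev, cur in zip(soc, soc[1:]):
        step = (cur > prev) - (cur < prev)      # +1 charge, -1 discharge, 0 idle
        if step != 0 and direction != 0 and step != direction:
            depths.append(abs(prev - start))    # direction reversed: close segment
            start = prev
        if step != 0:
            direction = step
    if direction != 0:
        depths.append(abs(soc[-1] - start))     # close the final segment
    return depths


def dod_stress_total(depths, k1=1.0, k2=1.5):
    """Sum of half-cycle DOD stresses under an assumed power law."""
    return sum(0.5 * k1 * depth ** k2 for depth in depths)


def number_of_paths(n_states, n_stages):
    """Number of SOC paths a brute-force search would enumerate."""
    return n_states ** (n_stages - 1)


# One long discharge followed by a partial recharge.
profile = [0.5, 0.4, 0.3, 0.2, 0.3, 0.4]
approx_cost = dod_stress_total(per_step_half_cycles(profile))
merged_cost = dod_stress_total(segment_half_cycles(profile))
```

With an exponent larger than one, the per-step approximation yields a lower stress than the merged segments for a profile dominated by one large cycle, which is consistent with the observation that the approximated life-cycle cost may be lower than the expected one.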

ESS Life-Cycle Cost Formulation
Considering the temperature and DOD stress models, the ESS life-cycle cost can be expressed as follows:

$$S_t(t) = k_t t \qquad (32)$$

where $P_{bat}$ and $C_{ESS}$ are the output and capacity of the ESS, respectively, $\eta$ is the ESS charging/discharging efficiency, and $\sigma_t$ is the SOC at time $t$. Because the consumed life-cycle $L$ presented in Equation (33) is a cumulative expression of the battery aging, the difference in $L$ values reflects the actual shortened lifespan. For example, if the life-cycle $L_1$ consumed by the end of the first day and the life-cycle $L_2$ consumed by the end of the second day are determined through life assessment, the actual life-cycle consumed on the second day becomes $L_2 - L_1$. If this is applied in the dynamic programming method mentioned above, complex calculations, such as enumerating the paths through the SOC states, are required. Therefore, to maintain the tendency of the life-cycle cost and lower the computational complexity, the initial consumed life-cycle value is set to 0. The calculation complexity can then be reduced using only the degradation ratio and the life-cycle cost. Finally, the life-cycle cost of the ESS is treated as a depreciation cost by multiplying by the investment cost for the battery, as shown in Equation (34).

In Equation (34), $f_{bat}$ is the life-cycle cost, and $In_{bat}$ is the investment cost for the battery. The DOD stress model can be solved analytically through the half-cycle approximation. However, the temperature stress model cannot be directly used in optimization problems because it is derived through dynamic characteristic analysis. Therefore, this problem is solved through a reinforcement learning approach.

ESS Scheduling Formulation Considering the Life-Cycle Cost
The basic optimization problem for a prosumer who owns an ESS is to minimize the sum of the cost of electricity purchased from the grid and the life-cycle cost of the ESS, as expressed in (35). In (35), weights are applied to both the ESS life-cycle cost and the system purchase cost to reflect the subjective preference of the ESS operator regarding the life-cycle cost. The larger the life-cycle weight is, the more strongly the life-cycle consumed by operating the ESS is penalized, whereas the system purchase cost is weighted relatively less. For this problem, the state and stage for the ESS SOC are defined as shown in Figure 6, and a reward table for state transitions is constructed for reinforcement learning. Figures 8 and 9 show examples of internal reward tables for reinforcement learning. Because this is a cost minimization problem, the cost value is treated with a negative sign.
All problems subject to reinforcement learning can be expressed as a Markov decision process (MDP) model, and this MDP is based on the Markov process (MP). The purpose of reinforcement learning is to solve the Bellman equations below.

$$V_\pi(s) = \sum_{a \in \hat{A}} \pi(a|s)\, Q_\pi(s,a)$$

$$Q_\pi(s,a) = R_s^a + \gamma \sum_{s' \in \hat{S}} P_{ss'}^a \sum_{a' \in \hat{A}} \pi(a'|s')\, Q_\pi(s',a') \qquad (54)$$

$$Q(s,a) = Q(s,a) + \alpha_{lr} \Delta Q \qquad (56)$$

Equations (52)-(54) are the Bellman expectation equations. If the optimal value of $Q$ is found as in (55), the optimal action $a^*$ can be obtained, and the optimal policy $\pi^*$ can be obtained accordingly. In (54), $R_s^a$, the reward of action $a$ in state $s$, is the sum of the negative values of the cost for energy consumption and the life-cycle cost, as shown in Figure 9. $P_{ss'}^a$, the probability of transition from $s$ to $s'$, is set to 1 in this problem. For example, when SOC 0.5 is state $s$ and SOC 0.4 is state $s'$, the action is a discharge corresponding to the amount of energy for the SOC difference of 0.1. As a result, when the action of discharging from SOC 0.5 to 0.4 is selected, the probability of the transition becomes 1 because no other state can be reached by this action. The discount factor $\gamma$ is used to evaluate future rewards at the current point in time. When determining the optimal scheduling, $\gamma$ is set to 1 in this problem because the reward is not discounted. The function approximator solves the problem by finding the value function in reverse order from the final state through (54) and updating the $Q(s,a)$ value as shown in (56).
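The following Python sketch illustrates how a schedule can be obtained on such a deterministic SOC grid by working backward from the final stage with $\gamma = 1$ and transition probability 1. It is a simplified stand-in, not the MATLAB toolbox agent used in this study: the function name `schedule_soc`, the weights `w_price` and `w_life`, the capacity, and the life-cycle term (a placeholder proportional to the half-cycle depth instead of the full stress-model cost) are all assumptions introduced for illustration.

```python
import numpy as np


def schedule_soc(prices, w_price=1.0, w_life=0.0, soc_init=0.5,
                 capacity_kwh=10.0, life_cost_per_soc=1.0):
    """Backward induction over discretized SOC states (gamma = 1, P = 1)."""
    soc_levels = np.round(np.linspace(0.1, 0.9, 9), 1)    # nine states per stage
    n_states, n_stages = len(soc_levels), len(prices)
    value = np.zeros((n_stages + 1, n_states))            # V(s) for every stage
    best_next = np.zeros((n_stages, n_states), dtype=int)
    for t in range(n_stages - 1, -1, -1):                 # reverse order from the end
        for i, soc in enumerate(soc_levels):
            d_soc = soc_levels - soc                      # > 0 charge, < 0 discharge
            # Charging buys energy; discharging offsets purchases (negative cost).
            energy_cost = prices[t] * capacity_kwh * d_soc
            # Placeholder life-cycle term: each transition is one half cycle.
            life_cost = life_cost_per_soc * np.abs(d_soc)
            reward = -(w_price * energy_cost + w_life * life_cost)
            q = reward + value[t + 1]                     # Q(s, a) with gamma = 1
            best_next[t, i] = int(np.argmax(q))
            value[t, i] = q[best_next[t, i]]
    # Roll the greedy policy forward from the initial SOC.
    i = int(np.argmin(np.abs(soc_levels - soc_init)))
    path = [float(soc_levels[i])]
    for t in range(n_stages):
        i = best_next[t, i]
        path.append(float(soc_levels[i]))
    return path


toy_prices = [1.0, 3.0, 1.0, 3.0]
path_without_life_cost = schedule_soc(toy_prices, w_life=0.0)
path_with_heavy_life_cost = schedule_soc(toy_prices, w_life=1000.0)
```

In this toy setting, the schedule without a life-cycle weight charges in the cheap periods and discharges in the expensive ones, while a sufficiently large life-cycle weight keeps the SOC unchanged.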

Simulation Results
The MDP object is defined through the configured table, and the problem is solved using the reinforcement learning toolbox of MATLAB 2019a. Figure 10 shows the battery open circuit voltage fitting curve and Figure 11 shows the bias model for temperature and SOC. Figure 12 shows the battery internal resistance/temperature curve and Figures 13 and 14 show RC network R curves and surface with respect to SOC/Internal Temperature. Table 7 shows the settings for the agent and training options.
Energies 2022, 15, x FOR PEER REVIEW

For the optimal scheduling of the ESS, the architecture of the power system is shown in Figure 15a; the ESS is installed behind the meter. Figure 15b shows the load power curve and the PV output curve. For the ESS used in this paper, the load and PV were modeled using the behind-the-meter (BTM) method. Figure 16 demonstrates the process of finding the SOC path through reinforcement learning.
By changing the weight for the life-cycle cost through reinforcement learning, we checked whether the effect of the life-cycle cost is reflected in the ESS SOC results in Figure 17.
The results in Figure 17 compare the optimal ESS SOC results when the life-cycle cost is not reflected and when it is reflected. In the figures, the green graphs represent the price curve. In Figure 17a, when the life-cycle cost is not considered, the ESS repeats a charging/discharging pattern driven by the price difference and discharges during the most expensive time period to maximize profits.
However, in Figure 17b-e, when the life-cycle cost is considered, frequent charging/discharging is reduced. As the life-cycle cost weight increases, discharge is not performed even in a time period when the price is low. It was also confirmed that no charging/discharging was performed when the weight of the life-cycle cost increased by more than a certain amount, as shown in Figure 17f. This is because the investment cost of the ESS itself then dominates the system purchase cost in absolute magnitude.

Conclusions
In this study, the life-cycle cost for an ESS is defined in detail based on a life assessment model and is used for scheduling. Prosumers with ESSs can make an assessment on the price of P2P energy transactions based on the defined ESS life-cycle cost. The life-cycle cost is affected by four factors: temperature, average SOC, DOD, and time. Among the four stress models, the temperature and DOD cannot be approached analytically; therefore, they are solved by approximation and reinforcement learning. The life-cycle cost of an ESS is verified through the reinforcement learning toolbox of MATLAB. Regarding the life-cycle cost, it is confirmed that the SOC result curve changes according to the weight, and as the weight of life-cycle cost increases, the ESS output and charge/discharge frequency decrease. When the initial life-cycle cost is not considered, the ESS repeats a charging/discharging pattern due to the price difference and the ESS discharges during the most expensive time period to maximize profits. However, when the life-cycle cost is considered, frequent charging/discharging is reduced. As the life-cycle cost weight increases, discharge is not performed even in a time period when the price is low. It was also confirmed that no charging/discharging was performed when the weight of the life-cycle cost increased by more than a certain amount. In the future, we shall investigate the connection between the community grid, general distribution system and a real-time P2P energy trading strategy that considers real-time uncertainty.

Conflicts of Interest:
The authors declare no conflict of interest.

Nomenclature
The following nomenclature is used in this manuscript:
$f_d$: The degradation ratio
$S_T$, $S_\sigma$, $S_\delta$: The stresses for temperature, average SOC, and DOD
$T_c$: The battery cell temperature
$T_{ref}$: The reference temperature
$k_T$: The temperature stress coefficient
$T_{in}$: The battery internal temperature
$T_{sh}$: The battery shell temperature
$T_{amb}$: The ambient temperature
$C_{q1}$, $C_{q2}$: The internal and shell thermal capacities of the battery
$k_1$, $k_2$: The heat conduction coefficients
$P_{bat}$: The output of the ESS
$C_{ESS}$: The capacity of the ESS
$\eta$: The ESS charging/discharging efficiency
$\sigma_t$: The SOC at time $t$
$f_{bat}$: The life-cycle cost
$In_{bat}$: The investment cost for the battery
$V_\pi(s)$: The value of state $s$
$Q_\pi(s,a)$: The value of action $a$ in state $s$
$\pi(a|s)$: The policy of action $a$ in state $s$
$R_s^a$: The reward of action $a$ in state $s$
$P_{ss'}^a$: The probability of transition from state $s$ to state $s'$ by action $a$
$\gamma$: The discount factor
$\alpha_{lr}$: The learning rate