Two-Stage Physical Economic Adjustable Capacity Evaluation Model of Electric Vehicles for Peak Shaving and Valley Filling Auxiliary Services

Abstract: A large number of renewable energy sources and EVs (electric vehicles) are connected to the grid, which puts heavy peak shaving pressure on the power system. If the flexibility of EVs can be exploited and their adjustable resources effectively aggregated to participate in power ancillary services, this pressure can be alleviated to a certain extent. In this paper, a two-stage physical and economic adjustable capacity evaluation model of EVs for peak shaving and valley filling ancillary services is constructed. In the first stage, with the help of the deep learning ability of the AC (Actor-Critic) algorithm, the optimal physical charging scheme of the EV fleet is determined to minimize grid fluctuation under the travel constraints of private EVs, and the optimized charging power is passed to the second stage. In the second stage, load aggregators encourage users to participate in ancillary services by setting subsidy prices. A user decision model based on a logistic function describes the probability that users accept dispatching instructions. With the goal of maximizing the revenue of load aggregators, the wolf colony algorithm is used to find the optimal time-sharing subsidy level, and finally the economic adjustable capacity of the EV fleet considering the subjective decisions of users is obtained.


Introduction
With strong policy support, EVs have developed rapidly, which is of positive significance for greenhouse gas emission reduction and air pollution prevention and control in the transportation industry. However, the continuous increase in EV ownership also has a significant impact on power load forecasting and on power system planning and operation [1][2][3]. EVs have strong adjustability, fast response speed and flexible adjustment modes [4][5][6][7], and their charging and discharging state can be directly controlled through the charging pile. After effective aggregation, they can provide multiple auxiliary services [8][9][10][11][12][13] and demand-side response [14] for the power system; at the same time, EV users can also obtain benefits by participating in grid interaction [15,16], which is conducive to the healthy development of the EV industry.
How to make use of the flexible characteristics of EVs and coordinate their charging and discharging time to weaken the impact of EV load on power grid operation and dispatching and achieve the effect of peak load reduction and valley filling has become one of the important problems in the current engineering field.
Some scholars have found that providing subsidies to EVs and guiding them to participate in ancillary service response can effectively shift the charging time of EVs and raise the night-time valley of the power load [17], but a single aggregation goal can easily generate new load peaks during the opening hours of ancillary services. Furthermore, some studies have begun to build two-stage optimization models that realize multi-party optimization of the system by considering grid operation objectives while aggregating the EV load, or by achieving collaborative optimization with renewable energy generation [18][19][20].
Sustainability 2021, 13, 8153
In the aspect of EV adjustable capacity evaluation, some studies have pointed out that the battery capacity, charging time, charging and discharging power [16,21] and the upper/lower limit of SOC [22] (state of charge) accepted by users are the important constraints of EV adjustable capacity evaluation.
In the construction of EV scheduling models, most research focuses on the travel characteristics of EV users [23]. These models usually assume that users only need to set the times when the EV enters and leaves the network [24,25], and that as long as the controllable range of SOC meets the conditions, the EV can be scheduled from the time it enters the network to the time it leaves. However, unlike traditional energy storage equipment, EVs, as high-quality controllable resources [26], are accompanied by the uncertainty of user decision-making. From this point of view, current research lacks an analysis of EV users' subjective acceptance of scheduling. Because users' travel characteristics differ, their acceptance of the same dispatching instructions will differ [27], which directly affects the final adjustable capacity.
Based on the research status, we constructed a two-stage physical and economic adjustable capacity evaluation model of EVs for participating in the ancillary services market.
The model evaluates the fleet's ability to respond to ancillary services in two stages. (1) In the first stage, the travel data and charging load data of EVs are obtained. Subject to meeting the travel energy demand of EVs, the best physical charging scheme of the EV fleet, with smoothing grid fluctuation as the objective function, is determined by means of the deep learning ability of the AC algorithm, and the optimized charging power is passed to the second stage. (2) In the second stage, the load aggregator encourages users to participate in auxiliary services by setting a subsidy price. A user decision-making model based on a logistic function is established to describe the probability of users accepting dispatching instructions, and the wolf colony algorithm is used to find the optimal time-sharing subsidy level with the goal of maximizing the revenue of the load aggregator. Finally, the economic adjustable capacity of the EV fleet considering the subjective decisions of users is obtained, which brings the evaluation results closer to the actual situation. Figure 1 shows the logical framework of this paper.

First Stage: Physical Adjustable Capacity Assessment
According to their travel characteristics, EVs can be divided into four types: taxis, buses, private cars and official cars. Because buses and taxis are operated for profit, most of their drivers choose fast charging immediately when the SOC is low during operating hours. In addition, the ability of buses and taxis to participate in auxiliary services is strongly affected by their actual scheduling. However, buses and taxis park at the station at night with very stable start and end times, so charging instructions can be issued directly to the station to shift charging to the low load period, and the scheduling difficulty is relatively low. Most owners of private EVs charge immediately after they return home in the evening, which coincides with the evening peak of electricity consumption [27]. According to statistics, EVs other than buses and taxis are parked for more than 90% of the time. In addition, whether a private EV participates in dispatching requires each owner to make an independent decision, and the degree of user participation is strongly correlated with economic incentives. Therefore, in theory, there is considerable room for adjusting the charging time of private EVs. Official cars account for a small proportion; their number of trips and travel times are stochastic and they are parked most of the time, but the process by which they participate in response is similar to that of private cars. It is therefore of great practical significance to study how to guide private car users to participate in auxiliary services with appropriate economic incentives, and private cars are taken as the research object of this paper.

Travel Status Description of Private EVs
The travel activities of private cars are relatively clear and fixed, and the travel chain on working days mostly has three links (Home-Workplace-Home) [28]. To simplify the model, private EVs are assumed to be used only for commuting on working days, and a travel cycle is divided into four periods according to this travel pattern. g(t) represents the state of the EV's connection to the power grid at time t:

$$g(t) = \begin{cases} 0, & \text{EV is off grid} \\ 1, & \text{EV is on grid} \end{cases} \qquad (1)$$

The operation status of EVs in each period is described as follows:
1. In the period $\Delta T_1$: at the beginning of the travel cycle, the EV leaves the grid at full charge, and the battery is in the state of discharge, g(t) = 0;
2. In the period $\Delta T_2$: from arriving at the workplace in the morning to leaving the company after work, the vehicle is parked and connected to the power grid. During this period, the vehicle is either charging (discharge to the power grid is not considered) or idle. The stored energy must cover the travel demand of the next period, g(t) = 1;
3. In the period $\Delta T_3$: while driving home from work in the evening, the EV is off the grid and the battery is in the state of discharge, g(t) = 0;
4. In the period $\Delta T_4$: from returning home in the evening to going to work the next day, the vehicle is either charging (again, discharge to the power grid is not considered) or idle. Given the sufficient charging time at night, the battery must be fully charged in this period, g(t) = 1.
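The four-period description above can be illustrated with a minimal sketch of g(t). The function name, its arguments and the commuting times in the example are hypothetical and chosen only for illustration; they are not taken from the paper's data.

```python
def grid_state(t, leave_home, arrive_work, leave_work, arrive_home):
    """Return g(t): 0 while the EV is driving (off grid), 1 while it is parked (on grid)."""
    hour = t % 24
    morning_commute = leave_home <= hour < arrive_work   # period dT1
    evening_commute = leave_work <= hour < arrive_home   # period dT3
    return 0 if (morning_commute or evening_commute) else 1


# Hypothetical commuter: leaves home 07:30, is at work 08:30-17:30, is home again at 18:30.
schedule = dict(leave_home=7.5, arrive_work=8.5, leave_work=17.5, arrive_home=18.5)
profile = [grid_state(h, **schedule) for h in range(24)]
print(profile)
```

Sampling g(t) on the hour is a simplification; a finer time step would capture the commuting windows more precisely.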
Generally speaking, EV users will not sacrifice their established travel routine in exchange for the benefits of participating in demand-side response. Therefore, this paper only studies the potential and capacity of EVs in the on-grid state, namely $\Delta T_2$ and $\Delta T_4$, without affecting users' travel routine.
In addition, in research on EV charging demand, most scholars at home and abroad analyze and describe the travel characteristics of EVs based on the travel data of fuel vehicles. Some studies point out that the commuting attributes of private EVs are obvious and strongly regular. Analysis of the travel characteristics of private EVs indicates that the daily mileage follows an approximately logarithmic distribution [29], the first departure time follows a normal distribution and the end-of-travel time follows a Poisson distribution [22]. This paper builds its private EV travel model on the above research.

The Best Physical Charging Scheme for EVs Participating in Ancillary Services
Firstly, the model basis and assumptions are described:
1. Taking EV i (i = 1, 2, ..., N) of the EV fleet in the target periods $\Delta T^i_2$ and $\Delta T^i_4$ as an example, let $[t^i_{in}, t^i_{out}]$ denote the on-grid period, where $t^i_{in}$ and $t^i_{out}$ are respectively the times when the vehicle enters and leaves the grid (they do not represent the times when vehicle i starts and finishes charging).
2. The initial SOC of EV i when it connects to the power grid is recorded as $SOC^i_{in}$. To avoid excessive battery discharge, users subjectively set a value $SOC^i_{min}$ and charge the EV when the SOC is lower than $SOC^i_{min}$; the user also hopes that the SOC of the battery is, as far as possible, not lower than $SOC^i_{min}$ after the next journey.
3. With sufficient parking time, the user expects the EV to be charged to 100% by default [30]. However, considering the user's psychology, if the user knows that giving up part of the charging power earns an economic subsidy, then under this incentive the user is willing to give up the requirement of a full charge at off-grid time and set a relatively satisfactory expected state of charge $SOC^i_{exc}$, with $SOC^i_{exc} \in [SOC^i_{in}, SOC_{max}]$. That is to say, as long as the state of charge reaches $SOC^i_{exc}$ when the user leaves the grid, the user's enthusiasm for participating in the auxiliary service is not affected.
Based on literature research [31,32], the charging load transfer interval of EV i participating in ancillary services under charging mode is given, as shown in Figure 2. In this paper, the charging process is an unsteady power mode. When EV i is fully charged to $SOC_{max}$ in the parking state, it turns to the idle state. The BC segment represents the constraint of compulsory charging to meet the travel demand: if the EV has not started charging by point B, then even charging at the maximum rated power until it leaves the grid cannot reach the lower bound of the off-grid SOC, so point B is the latest charging time for the EV. The latest charging time $t^i_{sj}$ and the line BC are expressed in terms of $p^i_{rated}$, the rated charging power of EV i (considered a constant), $C_i$, the battery capacity of EV i, and $\eta$, the charging efficiency.
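The description of point B suggests how the latest charging time could be computed. The sketch below assumes the relation t_sj = t_out - (SOC_lower - SOC_in) * C / (eta * p_rated) and a linear SOC rise along BC at rated power; both are inferred from the verbal description and are not the paper's exact formulas. All function and parameter names are hypothetical.

```python
def latest_start_time(t_out, soc_in, soc_lower, capacity_kwh, p_rated_kw, eta):
    """Latest hour at which charging at rated power still reaches soc_lower by t_out."""
    energy_needed = max(soc_lower - soc_in, 0.0) * capacity_kwh  # kWh to be stored
    hours_needed = energy_needed / (eta * p_rated_kw)
    return t_out - hours_needed


def soc_on_bc(t, t_sj, soc_in, capacity_kwh, p_rated_kw, eta):
    """SOC along the assumed line BC: charging at rated power from t_sj onwards."""
    return soc_in + eta * p_rated_kw * (t - t_sj) / capacity_kwh


# Illustrative numbers only: 60 kWh battery, 6 kW charger, lossless charging.
t_sj = latest_start_time(t_out=18.0, soc_in=0.5, soc_lower=0.8,
                         capacity_kwh=60.0, p_rated_kw=6.0, eta=1.0)
print(t_sj, soc_on_bc(18.0, t_sj, 0.5, 60.0, 6.0, 1.0))
```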
When EVs participate in the peak shaving auxiliary service, it is necessary to sacrifice off-grid SOC to expand the response capacity and further reduce the charging load during this period.
EV i is connected to the power grid at time $t^i_{in}$, and the charging pile obtains the charging information of the EV and the travel information of the user. Taking minimization of the load fluctuation of the power grid as the objective function, the optimal charging scheme of the EV fleet is obtained according to the in-grid time, the off-grid time, the SOC lower bound $\underline{SOC}^i$ and the latest charging time $t^i_{sj}$. The EVs participating in the regulation complete their charging in this period according to the charging instructions issued by the charging pile.
(1) Objective function
The objective function is expressed in terms of $Q_{grid,t}$, the grid operation load at time t; $p^i_t$, the charging power of EV i at time t under the best charging scheme; and $Q_{avg}$, the average load of the grid over one day when the EV fleet is charged according to the best charging scheme.
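A load-fluctuation objective of this kind can be sketched as the variance of the total load around its daily average. The exact form used in the paper is not recoverable from the text, so the mean-squared deviation below, the function name and the toy two-slot data are assumptions.

```python
import numpy as np


def load_fluctuation(base_load, charging_power):
    """Mean squared deviation of total load (base load plus EV charging) from its average.

    base_load: shape (T,), grid load without the EV fleet.
    charging_power: shape (N, T), charging power of each of N EVs in each time slot.
    """
    total_load = base_load + charging_power.sum(axis=0)
    return float(np.mean((total_load - total_load.mean()) ** 2))


# Toy comparison with two time slots: slot 0 is a valley, slot 1 is a peak.
base_load = np.array([1.0, 3.0])
charge_in_valley = np.array([[2.0, 0.0]])
charge_on_peak = np.array([[0.0, 2.0]])
print(load_fluctuation(base_load, charge_in_valley))
print(load_fluctuation(base_load, charge_on_peak))
```

Charging in the valley flattens the total load, so it yields the smaller objective value of the two schedules.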
(2) Constraints
1. Charging power constraint.
2. Charging time constraint.
3. SOC lower bound at off-grid time, which depends on the battery capacity $C_i$, the estimated mileage $d_i$ of the next journey and the EV power consumption per kilometer $w_i$.
4. If the charging scheme issued to an EV after it enters the grid is not to charge this time, it is necessary to ensure that the SOC at the end of the next journey is not less than $SOC^i_{min}$.
Considering the abundant charging time of EVs at night and that the main purpose of charging at night is to meet the demand of grid valley filling, EVs should be charged as much as possible at night. In this paper, the off-grid SOC of an EV is set to reach 100% after charging at night.
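The constraints listed above can be checked with a small feasibility sketch. The concrete forms used here (power between 0 and the rated power, charging only inside the on-grid window, and a lower bound equal to SOC_min plus the energy share of the next trip, d * w / C) are assumptions inferred from the variable definitions rather than the paper's formulas; all names are hypothetical.

```python
def soc_lower_bound(soc_min, distance_km, kwh_per_km, capacity_kwh):
    """Off-grid SOC needed so that the SOC is still soc_min after the next journey."""
    return soc_min + distance_km * kwh_per_km / capacity_kwh


def is_feasible(power, t_in, t_out, p_rated, soc_in, soc_lower, capacity_kwh, eta=0.9):
    """Check an hourly charging schedule (kW per hour slot) against the assumed constraints."""
    soc = soc_in
    for hour, p in enumerate(power):
        if not 0.0 <= p <= p_rated:               # charging power constraint
            return False
        if p > 0 and not t_in <= hour < t_out:    # charging time constraint
            return False
        soc += eta * p / capacity_kwh             # one-hour slot, so energy in kWh equals p
    return soc >= soc_lower - 1e-9                # SOC lower bound at off-grid time


lower = soc_lower_bound(soc_min=0.2, distance_km=30, kwh_per_km=0.15, capacity_kwh=60)
plan = [0.0] * 24
plan[10] = plan[11] = 6.0  # charge for two hours while parked at work
print(lower, is_feasible(plan, t_in=9, t_out=17, p_rated=7.0,
                         soc_in=0.2, soc_lower=lower, capacity_kwh=60))
```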

Evaluation of Economic Adjustable Capacity of EV Considering User Decision
Load aggregators aggregate EV fleets, issue instructions through EVA (distributed database and application platform of EV) interface, and users can obtain charging price information and supply and demand information through wireless terminals (smart phones, iPad, etc.).

EV User Decision Model Based on Logistic Function
Some studies assume that EVs can be scheduled whenever they are connected to the power grid [33], but in reality EV users have their own subjective will, and whether they accept EVA scheduling is affected by many factors. Users make a one-time decision: the final charging mode over the whole process from grid access to leaving the grid depends on the user's initial decision. This paper considers that a user faced with dispatching instructions evaluates them mainly from two aspects: the subsidy level obtained by participating in the response, and the difference in stored energy perceived by the user between participating and not participating in the response. On this basis the user makes the response decision. Therefore, this paper mainly studies the impact of these two factors on user decision-making. Based on the logistic function, the problem is transformed into the vehicle owner's choice between accepting and not accepting scheduling instructions.
Here, $\Delta SOC_i$ refers to the deviation of the SOC from the expected SOC at the off-grid time, and $u_i$ represents the subsidy received by the EV user for participating in the response, which is calculated separately for (1) the valley filling period and (2) the peak shaving period. In these expressions, $p^i_t$ is the charging power allocated to EV i at time t and $\bar{p}^i_t$ is the reference charging power of the EV at time t. $x_i$ can be transformed into a probability by a logistic function.
The unit of the subsidy level $s_t$ is CNY/kWh; α is the benchmark probability coefficient; $\beta_1$, $\beta_2$ are variable coefficients ($\beta_1 > 0$, $\beta_2 > 0$); E is the random error term. The coefficients of each variable need to be obtained through investigation and fitting based on statistical data. $p^i_{rated}$ is the rated charging power of EV i.
In the background of this paper, the load aggregator sets the subsidy level $s_t$ at each time according to the clearing price level of the ancillary service market in the target period, and then calculates the revenue and cost of each EV according to the SOC when EV i enters the grid, the user's off-grid time and the lower bound of SOC at off-grid time. After these are distributed to users, users combine $u_i$ and $\Delta SOC_i$ to make decisions. Therefore, from the perspective of user decision-making, $\Delta SOC_i$ in the physical scheduling scheme is a fixed quantity, and Formula (2) is transformed by merging the fixed part. EVA can affect the user's response revenue u by changing the subsidy $s_t$, thus affecting the probability of the user participating in the response, as shown in Figure 3.
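A logistic acceptance model of this kind can be sketched as follows. The linear index x = alpha + beta1 * u - beta2 * dSOC, its sign convention and the default coefficient values are assumptions made for illustration; the paper states only that the coefficients are positive and must be fitted from survey data.

```python
import math


def acceptance_probability(subsidy_income, soc_deviation,
                           alpha=-1.0, beta1=0.8, beta2=3.0):
    """Probability that a user accepts the dispatching instruction.

    subsidy_income: subsidy u_i earned by responding (CNY).
    soc_deviation: shortfall dSOC_i from the expected off-grid SOC (0..1).
    alpha, beta1, beta2: hypothetical coefficients, not fitted values.
    """
    x = alpha + beta1 * subsidy_income - beta2 * soc_deviation
    return 1.0 / (1.0 + math.exp(-x))


# A larger subsidy raises acceptance; a larger SOC shortfall lowers it.
for u in (0.0, 2.0, 5.0):
    print(u, round(acceptance_probability(u, soc_deviation=0.1), 3))
```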

Analysis of Adjustable Capacity Considering User Decision
Under the load aggregator mode, the load aggregator participates in the auxiliary service market of the superior market on behalf of EV users with the goal of maximizing the expected profit, and the decision variable $s = [s_1, s_2, s_3, \ldots, s_{24}]$ is the subsidy level for EVs at each time. The objective function is defined separately for (1) the valley filling period and (2) the peak shaving period, where N is the number of EVs; $p^i_t$ represents the actual charging power of EV i at time t; $\bar{p}^i_t$ is the reference charging power of the EV at time t; $\rho^i_{t,m}$ represents the revenue of the load aggregator at time t from stimulating EVs to participate in the response; and $price_{market,t}$ is the forecast clearing price level of the ancillary services market at time t (unit: CNY/kWh).
The economic adjustable capacity of the EV cluster considering user decision at time t is defined in terms of $p^i_t$ and $\bar{p}^i_t$, the actual charging power of EV i participating in the response and the reference charging power of EV i not participating in the auxiliary service response at time t; $T_{valley}$ and $T_{peak}$ are the valley filling and peak shaving periods, respectively.
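One way to read the economic adjustable capacity is as the physical power shift of each EV weighted by that user's acceptance probability. The sketch below follows this reading: load increase counts in valley hours and load reduction counts in peak hours. It is an assumption about the missing formula, and the function and argument names are hypothetical.

```python
import numpy as np


def expected_adjustable_capacity(p_actual, p_ref, accept_prob, valley_hours, peak_hours):
    """Expected fleet-level adjustable capacity per time slot.

    p_actual, p_ref: arrays of shape (N, T) with scheduled and reference charging power.
    accept_prob: array of shape (N,), probability that each user accepts the schedule.
    valley_hours, peak_hours: lists of slot indices for valley filling and peak shaving.
    """
    expected_shift = accept_prob[:, None] * (p_actual - p_ref)
    capacity = np.zeros(p_actual.shape[1])
    capacity[valley_hours] = expected_shift[:, valley_hours].sum(axis=0)   # added load
    capacity[peak_hours] = -expected_shift[:, peak_hours].sum(axis=0)      # removed load
    return capacity


p_ref = np.array([[2.0, 2.0, 0.0, 0.0], [2.0, 2.0, 0.0, 0.0]])
p_actual = np.array([[0.0, 0.0, 2.0, 2.0], [1.0, 1.0, 1.0, 1.0]])
prob = np.array([0.5, 1.0])
print(expected_adjustable_capacity(p_actual, p_ref, prob,
                                   valley_hours=[2, 3], peak_hours=[0, 1]))
```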

AC Reinforcement Learning Algorithm for Optimal Physical Charging Scheme
The idea of a reinforcement learning (RL) algorithm is to define a reward function and maximize the sum of future rewards by repeatedly trying and learning strategies in a simulated environment or the real world. The learned strategy is then applied to the optimization problem at each step to obtain real-time decisions. Reinforcement learning can effectively address the low computational efficiency of current centralized coordination methods [34] and, in its actor-critic form, can overcome the discrete action space, difficult training convergence and poor stability of earlier reinforcement learning methods [35]. It is therefore suitable for solving the large-scale EV charging scheme in this paper.
In this paper, an AC network is used to build the charging strategy model of EVs participating in ancillary services.
(1) Overall structure
The AC framework is divided into two parts: actor and critic. Both are multilayer BP neural networks. The actor selects behavior based on a probability distribution, the critic scores the behavior generated by the actor, and the actor modifies the probability of selecting behavior according to the critic's scores. The logic diagram of the AC algorithm is shown in Figure 4.
According to the scenario of solving the physical charging scheme of EVs, the AC model environment is built as follows: in the reference state, the EVs start charging after entering the grid and stop charging after reaching the expected SOC. To minimize the load fluctuation after the EVs connect to the grid, it is necessary to optimize the charging power at each time point, so as to obtain the best physical charging scheme. The state characteristics include SOC and charging time. The action distribution is the charging power of EVs at 24 time points, and the reward is the change in total load variance before and after the EVs enter the grid.

State
$s_t$ is the description of the situation at the current time t. In this paper, the state $s_t$ is the state of the t-th EV after it connects to the grid.
For time t, the optimal charging scheme is determined according to the in-grid time $t_{in}$, off-grid time $t_{out}$, minimum state of charge threshold $SOC_{t,min}$, expected state of charge $SOC_{t,exc}$, off-grid SOC lower bound $\underline{SOC}_t$ and the latest charging time $t_{sj}$. The state variables are represented as:

$$s_t = [t_{in}, t_{out}, SOC_{t,min}, SOC_{t,exc}, \underline{SOC}_t, t_{sj}]$$

Action
Action $a_t$ is the response of the agent to the environment after it observes state $s_t$ at the current time t. The action in this paper is the charging power P of the EV at all times, expressed as:

$$a_t = [p_1, p_2, p_3, \ldots, p_{24}]$$

Reward
The objective of the agent is to maximize the cumulative reward. According to Formula (4), the optimization objective of the model is to minimize the load fluctuation. Therefore, for a single EV charging scheme, the reward function is defined in terms of the following quantities: $Q^i_{grid,t-1}$ and $Q^i_{grid,t}$ are the loads at each time point of the power grid in the last observation state and the current observation state; $\bar{Q}_{grid,t-1}$ and $\bar{Q}_{grid,t}$ are the average loads over all time points in the last observation state and the current observation state, respectively. In this paper, the load fluctuation caused by the charging scheme under a certain action is taken as the reward, and the variance is used to describe the load fluctuation. The grid load includes the total charging load of EVs and the basic load of the power grid $Q^i_{grid,0}$.
(2) Actor
The Actor is an action output module, whose function is to output the probability of each action by constructing and training a policy gradient, where $v_t$ is the value function generated by the Critic. The gradient of the actor network's loss function L(ζ) is called the policy gradient of the actor network; in it, ∇ is the gradient operator and β represents the learning rate of the policy gradient, β ∈ (0,1). The gradient descent method is used to train the policy gradient, and finally the actor network outputs the probability $P(A_t)$ of different actions.
(3) Critic
The Critic is a value evaluation module, whose function is to evaluate the value of each action from the observation and the reward using the temporal difference (TD) algorithm. Its output is the estimate of the value function of the TD algorithm, and this value function is passed to the Actor as a reference for the Actor's action selection.
The actual value of the state action value function of the EV cluster at time t is as follows:

$$Q_r(s_t, a_t) = R_{t+1} + \gamma \max Q(s_{t+1}, a_{t+1})$$

where $Q_r(s_t, a_t)$ is the actual value of the state action value function; $s_t$ is the state of the EV fleet at time t; $a_t$ is the action (charging scheme) selected by the EV fleet at time t; $R_{t+1}$ is the reward obtained when the EV fleet in state $s_t$ selects action $a_t$ and moves to state $s_{t+1}$; γ is the discount factor; and $\max Q(s_{t+1}, a_{t+1})$ is the maximum value of the state action value function of the EV cluster in state $s_{t+1}$. The discount factor γ indicates the rate at which rewards decay over time steps: the further a state is from time t, the less its reward matters. When γ = 0, only the reward of the current state is considered; when γ = 1, the rewards of the current state and of future states are equally important.
The Q value is updated as follows:

$$Q_k(s_t, a_t) = Q_{k-1}(s_t, a_t) + \alpha \left[ Q_r(s_t, a_t) - Q_{k-1}(s_t, a_t) \right]$$

where $Q_{k-1}(s_t, a_t)$ is the estimated value of the state action value function of the EV at time t before the update in the k-th iteration; α is the learning rate, with α < 1. The TD error, $TD_{error} = Q_k(s_t, a_t) - Q_{k-1}(s_t, a_t)$, is taken as the loss function of the Critic, and the gradient descent method is used to train the Critic.
At the same time, the TD error is used as the value function $v_t$ in the actor network. The gradient descent method is used to train the actor network on its loss function L(Actor), and the output is the probability distribution of each action.
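The variance-based reward and the TD update described above can be sketched in scalar form. The sign convention of the reward (positive when the new schedule reduces load variance) and the parameter values are assumptions; the paper trains neural networks, whereas this sketch only shows the arithmetic of a single update.

```python
import numpy as np


def variance_reward(load_prev, load_curr):
    """Reward as the reduction in load variance from the previous to the current state."""
    return float(np.var(load_prev) - np.var(load_curr))


def td_update(q_old, reward, q_next_max, alpha=0.1, gamma=0.9):
    """One temporal-difference step for a single state-action value.

    Returns the updated estimate and the TD error Q_k - Q_{k-1}, which the text
    uses both as the critic loss and as the value signal v_t for the actor.
    """
    target = reward + gamma * q_next_max        # actual value Q_r(s_t, a_t)
    q_new = q_old + alpha * (target - q_old)
    return q_new, q_new - q_old


r = variance_reward(np.array([1.0, 5.0]), np.array([3.0, 3.0]))
q_new, td_error = td_update(q_old=1.0, reward=r, q_next_max=3.0, alpha=0.5, gamma=1.0)
print(r, q_new, td_error)
```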
To sum up, the logic of the algorithm is given in Algorithm 1:
for episode = 1 to K do
    set Q^i_{grid,0}    // set the basic load of the power grid at all times
    for iteration = 1 to N do    // iteration
        S = [t_in, t_out, SOC_{t,min}, SOC_{t,exc}, \underline{SOC}_t, t_sj]    // collect the information of EVs entering and leaving the grid
        A = [p_1, p_2, p_3, ..., p_24]    // action setting
        (S_t, A_t, R_t, S_{t+1}) → D    // store data in the experience pool
        mini-batch (S_t, A_t, R_t, S_{t+1}) ← D    // randomly extract data from the experience pool for batch gradient descent
        Q_{k−1}(s_t, a_t) ← Critic-NN    // the output of the critic network is the estimated value of the action value function
        // calculate the target value of the action value function
        TD_error = Q_k(s_t, a_t) − Q_{k−1}(s_t, a_t)    // critic network training error
        gradient descent ← TD_error^2    // train the critic network with the gradient descent method
        lg(P(δ)) ← Actor-NN    // the output of the actor network is the probability P

Subsidy Price Optimization Algorithm Based on Wolf Colony Algorithm
The wolf colony algorithm is a random probability search algorithm, which can quickly find the optimal solution with a large probability. Moreover, the wolf colony algorithm also has parallelism, which can search from multiple points at the same time, and the points do not affect each other, so as to improve the efficiency of the algorithm [36].
When the charging order has been issued, the EV users will respond according to the subsidy they can get by participating in the response, and choose to accept or not to accept the scheduling. Considering the large number of time-sharing price variables and the number of EVs, and that the price subsidy level at each time point is different, this section uses the optimization ability of the Gray Wolf algorithm to solve the time-sharing subsidy price optimization problem.
(1) Basic process
The Gray Wolf algorithm is an optimization algorithm based on three links of hunting behavior: tracking, hunting and capturing. According to fitness value, the group is divided from top to bottom into the leader wolf (head wolf) α, the vice chief wolf β, the common wolf δ and the bottom wolves ω. The leader wolf has the highest fitness value in the group and specifies the moving direction of the pack; the ω wolves have low fitness values, obey the α, β and δ wolves, and provide stability for the pack. The basic idea of the Gray Wolf algorithm is that the α, β and δ wolves locate the prey (optimal solution) and guide the ω wolves to encircle and hunt it.
The process of wolf hunting is described by the following quantities: D is the distance between the individual and the prey, and X(t + 1) is the updated position; $S_p(t)$ represents the position vector of the prey (optimal solution) and S(t) represents the position vector of an individual gray wolf; A and C are coefficient vectors; $r_1$, $r_2$ are random vectors with values between 0 and 1.
The capture activity is mainly realized by decreasing a. The value of a decreases linearly from 2 to 0 with the number of iterations, and the corresponding value of A varies within [−a, a]. If |A| ≤ 1, the next generation of wolves moves closer to the prey; if 1 < |A| ≤ 2, the wolves disperse away from the prey, which can lose the position of the optimal solution and lead to a local optimum. The value of a is updated as follows:

$$a = 2 - \frac{2t}{T}$$

where t is the current number of iterations and T is the preset maximum number of iterations. When the wolves capture, the positions of wolf α, wolf β and wolf δ, which have the highest fitness values, are calculated to determine the location of the optimal solution. Finally, the positions of the other wolves are determined by wolf α, wolf β and wolf δ. The fitness function in this paper is the expected profit F of load aggregators participating in the ancillary service market described in Formula (6). The decision variable is the subsidy price at 24 time points, expressed as $s = [s_1, s_2, \ldots, s_{24}]$, so the position space of the corresponding wolf pack is expressed as an m × 24 matrix, where m is the population size.
(2) Algorithm solving steps
The steps of the Gray Wolf algorithm are as follows:
Step 1: determine the parameters, such as the total number of iterations K and the population size m;
Step 2: generate the initial candidate subsidy prices S_{m×24} according to the coding rule of Formula (19);
Step 3: calculate the fitness value of each individual in the current population according to the objective function, Formula (18), and the constraint condition, Formula (21); update the positions of wolf α, wolf β and wolf δ, and generate the position vector of the detection wolves according to the historical data;
Step 4: update the position vectors of wolf α, wolf β, wolf δ and the detection wolves according to Formulas (38)-(39); guide the wolves to update their positions according to Equation (40), and update the parameters A and C;
Step 5: if the maximum number of iterations is reached, stop and go to step 6; otherwise, return to step 3;
Step 6: the algorithm ends and outputs the individual with the highest fitness value.
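The steps above can be sketched in a few lines of Python. This is a minimal illustration of a generic grey wolf search over a 24-dimensional subsidy vector, not the authors' implementation: the function name `grey_wolf_maximize`, the bounds, the population size and the toy fitness used in the usage note are all assumptions, and the constraint handling of Formula (21) is reduced to simple clipping.

```python
import numpy as np

def grey_wolf_maximize(fitness, dim=24, pop_size=20, max_iter=100,
                       lower=0.0, upper=1.0, seed=0):
    """Maximize `fitness` over [lower, upper]^dim with a basic grey wolf search.

    Assumes pop_size >= 3 so that alpha, beta and delta wolves always exist.
    """
    rng = np.random.default_rng(seed)
    # Step 2: random initial population (one row per wolf, one column per hour).
    wolves = rng.uniform(lower, upper, size=(pop_size, dim))
    leader_pos = np.empty((0, dim))   # best-so-far alpha, beta, delta positions
    leader_fit = np.empty(0)

    for t in range(max_iter + 1):
        # Step 3: evaluate fitness and keep the three best wolves seen so far.
        scores = np.array([fitness(w) for w in wolves])
        pool_pos = np.vstack([leader_pos, wolves])
        pool_fit = np.concatenate([leader_fit, scores])
        top = np.argsort(pool_fit)[::-1][:3]
        leader_pos, leader_fit = pool_pos[top].copy(), pool_fit[top].copy()
        if t == max_iter:             # Step 5: stopping rule
            break

        # Step 4: move every wolf towards the average of three guided positions.
        a = 2.0 * (1.0 - t / max_iter)        # decreases linearly from 2 towards 0
        guided = np.zeros_like(wolves)
        for leader in leader_pos:
            r1 = rng.random(wolves.shape)
            r2 = rng.random(wolves.shape)
            A = 2.0 * a * r1 - a
            C = 2.0 * r2
            D = np.abs(C * leader - wolves)
            guided += leader - A * D
        wolves = np.clip(guided / 3.0, lower, upper)

    # Step 6: return the alpha wolf (highest fitness found).
    return leader_pos[0], float(leader_fit[0])
```

As a usage example, calling `grey_wolf_maximize(lambda s: -((s - 0.3) ** 2).sum())` searches for a subsidy vector close to 0.3 at every hour; in the paper's setting the lambda would be replaced by the aggregator's expected profit F.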
In conclusion, the flow chart of the two-stage EV physical and economic adjustable capacity evaluation based on AC reinforcement learning and the wolf colony algorithm is shown in Figure 5.
Figure 5. Flow chart of two-stage EV physical and economic adjustable capacity evaluation based on AC reinforcement learning and wolf colony algorithm.

Case Design
(1) Electric vehicle parameters setting
There are 150 private EVs in the case, and their parameter settings are listed in Table 1.
(2) Description of EV travel characteristics
Based on the literature, the parameters set for the commuting mileage and for the in-grid and off-grid times of periods ∆T2 and ∆T4 are listed in Table 2.
(3) State of charge of EV
In the ∆T1 or ∆T3 period, the EV leaves the grid and enters the discharge state. In the ∆T2 or ∆T4 period, the EV is connected to the power grid for charging. The SOC change of an EV before and after driving is determined by three quantities, where C is the battery capacity, D is the commuting distance and W is the power consumption per kilometer.
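The SOC update described in item (3) can be illustrated with a short sketch. It assumes the usual energy balance, in which the SOC drops by the driving energy D·W divided by the battery capacity C; the function name and the numbers in the usage note are hypothetical and are not taken from Table 1 or Table 2.

```python
def soc_after_trip(soc_before, distance_km, kwh_per_km, capacity_kwh):
    """Return the SOC (0..1) after driving, assuming SOC falls by D * W / C."""
    if capacity_kwh <= 0:
        raise ValueError("battery capacity must be positive")
    soc = soc_before - distance_km * kwh_per_km / capacity_kwh
    if soc < 0:
        raise ValueError("trip needs more energy than the battery holds")
    return soc
```

For example, an EV that leaves fully charged with a 30 kWh battery, drives 20 km and consumes 0.15 kWh/km would arrive with `soc_after_trip(1.0, 20, 0.15, 30)`, that is, an SOC of about 0.9.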

(4) Parameters of the algorithms
The parameters of AC reinforcement learning and the wolf colony algorithm mentioned in this paper are listed in Table 3.

Case Analysis
(1) Model solving
The example is set as follows: at the beginning of each travel cycle (the beginning of ∆T2), all EVs are fully charged; in the daytime, the SOC_exc required for ∆T2 is assumed to be 90% when off-grid, while for ∆T4 at night, the EV is assumed to be fully charged when off-grid. According to the operation data of the power grid, the power load is divided into three periods: peak period (8:00-11:00, 17:00-23:00), valley period (23:00-5:00 on the next day) and normal period (5:00-8:00, 11:00-17:00). The peak shaving auxiliary service market opens during the peak period, and the valley filling auxiliary service market opens during the valley period. The basic load of the power grid at 96 time points is given in Table A1, Appendix A.
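The period division above can be expressed as a small lookup. The sketch below assumes that the 96 time points are consecutive 15-minute slots starting at 0:00 and that each period includes its start time and excludes its end time; both are assumptions, since the text does not state them.

```python
def period_of(slot, slots_per_day=96):
    """Classify a time slot as 'peak', 'valley' or 'normal'."""
    hour = (slot % slots_per_day) * 24 / slots_per_day
    if 8 <= hour < 11 or 17 <= hour < 23:
        return "peak"      # peak shaving market open
    if hour >= 23 or hour < 5:
        return "valley"    # valley filling market open
    return "normal"

# Label every slot of the day once, e.g. to mask the load curve by period.
labels = [period_of(k) for k in range(96)]
```

Under these assumptions the day contains 36 peak slots, 24 valley slots and 36 normal slots.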
According to the first stage of the model, the reference charging power of the EV fleet and the charging power of the EV fleet under full response are calculated, which gives the capacity potential of EVs participating in regulation according to the instructions. As shown in Figure 6, charging EVs according to the best charging scheme effectively raises the nighttime load trough, reduces the peak charging load and narrows the peak-valley difference.
First, three scenarios are described as follows:
Scene 1: grid reference load without considering EV access;
Scene 2: grid load with EVs connected but charging not regulated;
Scene 3: EVs are charged according to the best physical charging scheme.
For scene 1, scene 2 and scene 3, the peak to valley ratio of the daily load, the daily average load, the daily load variance, and the loads of peak period I (8:00-11:00), peak period II (17:00-23:00) and the valley period (23:00-5:00 on the next day) are calculated and compared.
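The comparison indices for the three scenes can be computed from a daily load curve as sketched below. The definition of the peak to valley ratio is an assumption: it is taken here as minimum load divided by maximum load, which matches the reported direction in which a larger value means a flatter curve. The function name and dictionary keys are illustrative.

```python
import numpy as np

def load_metrics(load):
    """Summarize a daily load curve (e.g. 96 points) for scene comparison."""
    load = np.asarray(load, dtype=float)
    return {
        # Assumed definition: valley load over peak load, between 0 and 1.
        "peak_to_valley_ratio": float(load.min() / load.max()),
        "daily_average": float(load.mean()),
        "daily_variance": float(load.var()),  # population variance
    }
```

Applying `load_metrics` to the load curve of each scene gives one row of indices per scene; the period loads can be obtained by summing or averaging the curve over the slots of peak period I, peak period II and the valley period.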
It can be seen from Table 4 that, because of the disordered access of EV charging load, the peak to valley ratio of the power grid is reduced from 0.3956 in the reference state to 0.3550. In particular, because the natural charging time of EVs coincides with the peak period of the power grid, the loads of peak period I and peak period II increase significantly, resulting in "peak on peak". However, the optimal physical charging scheme can effectively narrow the gap between peak and valley by regulating EVs with the aim of minimizing grid fluctuation. It can be seen from the table that the best charging scheme not only makes up for the increase in peak-valley difference caused by disorderly charging of EVs after entering the grid but also effectively reduces the peak load and increases the valley load by optimizing the charging time and power of EVs, further reducing the daily peak-valley gap and increasing the peak to valley ratio to 0.4814.
EVk, an individual of the EV fleet, is randomly selected for analysis. The information of EVk is shown in Table 5. From EVk's charging data and charging constraints, it can be seen that the SOC of the EV when it enters the grid is sufficient for the next journey, and the parking period coincides with the opening period of the auxiliary service market, so it meets the regulation conditions. The reference charging power and the charging power at each time point under the optimal charging scheme, from entering to leaving the grid, are shown in Figure 8.
The SOC value of the EV is 0.76 in the evening, which is 0.19 less than the expected SOC. Provided that the SOC can meet the demand of the EV in the next period of travel, part of the load is transferred to night charging by reducing the charging power in the daytime.
In the second stage, the subsidy is calculated according to the up/down regulated charging power at each time point under the best physical charging scheme of the EV, and the subsidy is combined with ∆SOC and substituted into the logistic function to calculate the probability of the user accepting the scheduling scheme. Taking the maximum income of the load aggregator as the objective function, the subsidy price at each time point is determined. Finally, the economic adjustable capacity of the EV cluster under the subsidy incentive is obtained. Figure 9 shows the best subsidy at each time point.
The ∆SOC and the subsidies received by users are substituted into the logistic function to calculate the probability of each user accepting the regulation, and the charging curve of the regulated part is then calculated, as shown in Figure 10.
As shown in Figure 10, the economic adjustable capacity is less than the physical adjustable capacity because user decision-making is taken into account. Each EV user makes a decision based on ∆SOC and the subsidy. If the regulation instruction is accepted, the EV is charged according to the best physical charging scheme; EVs that do not accept the regulation instruction are charged according to the basic charging curve.
In summary, the regulated EVs are charged according to the issued instructions, and the non-regulated EVs are charged according to the basic charging curve. The total charging curve of the two parts is the final charging curve of the EV fleet after the economic incentive, as shown in Figure 11.
Because some EVs do not accept the charging scheme, the charging curve under economic incentive is not as good as the best physical charging curve. However, compared with the disorderly charging of EVs, it still effectively increases the load in the valley at night and reduces the load at noon and in the evening, which has a significant smoothing effect on power system operation.
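The combination of regulated and non-regulated EVs described above can be sketched as an expectation over each user's acceptance probability. The array shapes, the function name and the use of an expected value (rather than sampled accept/reject decisions) are assumptions made for illustration.

```python
import numpy as np

def expected_fleet_curve(baseline, optimal, accept_prob):
    """Expected fleet charging curve after the economic incentive.

    baseline, optimal: arrays of shape (n_ev, n_t) with each EV's charging
        power under the basic curve and under the best physical scheme.
    accept_prob: array of shape (n_ev,) with each user's acceptance probability.
    """
    baseline = np.asarray(baseline, dtype=float)
    optimal = np.asarray(optimal, dtype=float)
    p = np.asarray(accept_prob, dtype=float)[:, None]
    # Accepting users follow the optimal scheme, the others the basic curve.
    per_ev = p * optimal + (1.0 - p) * baseline
    return per_ev.sum(axis=0)
```

With probabilities all equal to 1 the result is the best physical charging curve, and with probabilities all equal to 0 it is the basic (disorderly) curve, so the economically adjusted curve always lies between the two.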

(2) Sensitivity analysis of logistic function
The travel data and charging data of EVk are used to analyze the variable parameters β1 and β2. A sensitivity analysis is carried out to study the influence of these parameters on the probability P(X = 1) when the subsidy price changes, as shown in Figures 12 and 13.
(1) Let the coefficients α and β2 be fixed, as shown in Figure 12. According to Equation (5), β1 reflects the user's attention to SOC at the off-grid time. A larger β1 indicates that the user values SOC more and may have obvious journey anxiety. If load aggregators want to improve the response probability of such users, they should adjust the charging power as far as possible while still meeting the users' SOC_exc. A change in β1 does not affect the gradient of the P-U curve but only shifts the curve horizontally; that is, under the same subsidy price level within a certain range, the smaller β1 is, the higher the probability that users accept the scheduling.
(2) Let the coefficients α and β1 be fixed, as shown in Figure 13. The coefficient β2 affects the steepness of the P-U curve. The larger β2 is, the more sensitive users are to changes in revenue; such EV users are usually active in demand-side response. The smaller β2 is, the less sensitive users are to changes in the subsidy price, and the load aggregator often needs to pay more to motivate such users to participate in the response; in general, such EVs respond less to economic incentives. Within a certain range, under the same level of income, the larger β2 is, the higher the response probability of users.
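The qualitative behavior described in this sensitivity analysis can be reproduced with a simple logistic model. The functional form below is an assumption chosen to be consistent with the text (β1 weights the SOC shortfall and lowers the probability, β2 weights the subsidy and raises it); it is not necessarily identical to Equation (5), and all parameter values in the usage note are hypothetical.

```python
import math

def accept_probability(subsidy, delta_soc, alpha, beta1, beta2):
    """P(X = 1) under an assumed logistic form.

    z = alpha - beta1 * delta_soc + beta2 * subsidy
    beta1 shifts the P-U curve horizontally; beta2 sets its steepness.
    """
    z = alpha - beta1 * delta_soc + beta2 * subsidy
    return 1.0 / (1.0 + math.exp(-z))

# Sweep beta1 and beta2 at a fixed subsidy and SOC shortfall.
by_beta1 = [accept_probability(1.0, 0.19, 0.0, b1, 2.0) for b1 in (1.0, 5.0, 10.0)]
by_beta2 = [accept_probability(1.0, 0.19, 0.0, 5.0, b2) for b2 in (1.0, 2.0, 4.0)]
```

In this sketch `by_beta1` decreases as β1 grows and `by_beta2` increases as β2 grows, matching the two observations above.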

Conclusions
The promotion of EVs is of positive significance for reducing greenhouse gas emissions and for preventing and controlling air pollutants in the transportation industry. However, the growth of power consumption and load brought by the large-scale promotion of EVs in the future will have a profound impact on the generation side, transmission side, distribution side and power supply side. To address this problem, some pilot projects around the world use the flexible adjustable potential of EVs to exchange energy with the power grid, expanding the role of EVs from transportation alone to the two dimensions of transportation and energy. The coordinated development of vehicles and the grid not only gives EVs friendlier access to the grid and reduces obstacles to their rapid development, but also brings high value to the operation of the power system.
In this paper, we selected private EVs as the research object and constructed a two-stage physical and economic adjustable capacity evaluation model of EVs for peak load shaving and valley load filling in the ancillary service market. The model is oriented to the scenario of a private EV fleet participating in peak shaving and valley filling in the ancillary service market. In the first stage, the maximum physical adjustable capacity is calculated with minimum load fluctuation as the goal, and in the second stage, the expected adjustable capacity under the subsidy incentive is calculated with maximum revenue of load aggregators as the goal. Through the model and case analysis, the following conclusions are drawn:
(1) According to the description of the SOC state of EVs under the travel characteristics of users, the maximum capacity potential of EVs participating in the ancillary services market is calculated under the condition of meeting user travel requirements.
(2) After the EV is connected to the power grid, the user decides whether or not to accept the charging instruction issued by EVA. Because the user's decision is directly related to the subsidy value and ∆SOC, a user decision model based on a logistic function of these two variables is built. In the first stage of the adjustable capacity evaluation model, the charging scheme is formulated to minimize the load fluctuation of the power grid and is distributed to the EV users. For users, on the one hand, the larger the gap between the SOC after participation and the expected SOC, the weaker the user's willingness to participate. On the other hand, the time-sharing subsidy price set by load aggregators directly determines the economic benefits that users can obtain after accepting the adjustment instructions: the higher the subsidy price, the higher the user participation.
(3) The expectation of EV transferable load depends on its maximum capacity potential and user participation. Load aggregators can increase the probability of users participating in ancillary services by raising the subsidy level, but this also increases their own costs. Therefore, the EV load aggregator needs to optimize the time-sharing subsidy price of the EV group so as to maximize its own income.
It should be added that, although the model can provide a reference for pilot projects to a certain extent, it can still be improved. As pilot projects are built, there will be more opportunities to obtain actual operation data, so that users' actual travel data and response data, rather than statistical data, can be used to simulate the actual problem.