Factor Analysis of the Aggregated Electric Vehicle Load Based on Data Mining

Electric vehicles (EVs) and the related infrastructure are being developed rapidly. To evaluate the impact of various factors on the aggregated EV load and to coordinate charging, a data-mining-based model is established to capture the relationship between the charging load and the important factors. The factors are categorized as internal and external. The internal factors include the EV battery size, the charging rate at different places, the penetration of the charging infrastructure, and charging habits. The external factor is the time-of-use (TOU) pricing policy. Because data mining requires a massive input dataset, an algorithm is implemented to generate a large sample of input data that reflects real-world travel patterns drawn from a historical travel dataset. With the input data, linear regression is used to build a linear model whose inputs are the internal factors. The impact of the internal factors on the EV load is quantified by analyzing the sign, value, and temporal distribution of the model coefficients. The results show that when no TOU policy is implemented, the charging rate at home and range anxiety exert the greatest influence on the EV load. For the external factor, support vector regression is used to build a relationship between the TOU policy and the EV load. An optimization model based on this relationship is then proposed to devise a TOU policy that levels the load. The results suggest that implementing a TOU policy reduces the difference between the peak and valley loads remarkably.


Introduction
Electric vehicles (EVs) have been promoted in recent years because of their potential to address energy security and environmental problems. It is predicted that 5.2 million EVs will have been sold worldwide by the end of 2017 [1]. This will create a great challenge: how can the power system accommodate the active, stochastic load of large-scale EV use, which might have a negative impact? Thanks to the energy storage capability of EVs, the grid might benefit from coordinating the charging behavior of EV owners via the potential flexibility of charging time. To reduce the negative impact and take full advantage of the positive effects, it is essential to understand the nature of the EV load from a system perspective.
Much effort has been directed toward developing an aggregated EV load model [2][3][4][5][6][7][8][9][10][11][12][13]. Since EV technology and infrastructure are still developing rapidly, it is at present difficult to construct an accurate load model. Moreover, many related issues have either not been settled or may develop in the future. Consequently, it is interesting to determine what might happen if some important features of the aggregated EV load were to change. In other words, we are concerned with how factors might affect the charging load of a swarm of EVs, such as "what is the relationship between battery size and the EV load?", "how does the charging load curve change if the charging rate changes?", "how does range anxiety influence the EV load if we always charge under a higher state of charge (SOC)?" and "what type of time-of-use (TOU) pricing policy will shift the EV load?" All these questions can be evaluated based on a factor analysis of the aggregated EV charging load.
Several studies have investigated the impact of factors on EV load [2][3][4][14]. Some assessed the impact of the charging rate, battery size, and penetration of the charging infrastructure on the EV load by constructing load curves [3,4]. Others assessed the effects of a given TOU policy on the load profiles with EV penetration [2,14]. Briefly, previous studies evaluated mainly the qualitative impact of some factors. However, it is also important to assess the quantitative impact of these factors on the aggregated EV load at a given instant. For instance, it is crucial to know how much the EV load changes with the factors. The impacts of different factors on the EV load should also be compared. In addition, a precise TOU policy needs to be devised to coordinate the aggregated EV load. These quantitative impacts have not been studied in detail and are discussed here. In this study, the temporal impact of each factor is determined by means of a quantitative assessment. Based on the results, the aggregated EV load might also be regulated by adjusting the values of controllable factors, such as by devising a TOU policy.
In order to quantify their impact on the EV load, we classified the factors as internal and external. The former include the battery size, the charging rate selected by owners in different locations, the penetration of the charging infrastructure, and charging habits. The latter mainly refers to the TOU policy. Data mining techniques were used to build the model by analyzing numerous EV load profiles for different values of these factors. The quantitative relationship between the EV load and these factors can be captured by linear regression (LR) and support vector regression (SVR) with massive input data. However, such input data do not exist in the real world, due to the relative scarcity of EVs and because the values of some factors are fixed. Consequently, a simulation is necessary to generate the input data.
Many studies have concentrated on predicting the EV load profile by simulation [2][3][4][5][6][7][8][9][10][11][12][13]. A growing number of investigators have adopted real-world travel datasets to model the EV charging load, such as the National Household Travel Survey (NHTS) [15]. This real-world travel dataset includes the historical travel information of 150,147 households, including the number of daily trips, and the start and end times, distance, and purpose of each trip. Exploiting this travel dataset, two approaches can be used to predict the EV load: an indirect or stochastic approach and a direct or deterministic approach [13]. Both approaches first simulate the daily EV travel schedules, and the charging schedules are then derived from the results. The difference between the two approaches lies in the simulation of travel schedules. Using the indirect approach, travel information (such as the daily mileage) is calculated using probability distributions, rather than being extracted from the travel dataset directly [11][12][13]. The probability distribution of the travel information is fitted to the statistics in the dataset. Using the direct approach, the daily trip schedule of an EV is considered to be identical to the vehicle information in the travel dataset, and the travel schedules of the total EV swarm are simulated exactly in accordance with the data records in the travel dataset [2][3][4][8]. We adopted the direct approach to generate different EV loads as data mining input.
The remainder of the paper is organized as follows: Section 2 proposes the framework of the data-mining-based modeling. The methodology used to generate the data mining input data is shown in Section 3. Section 4 uses linear regression to model the relationship between the EV load and the internal factors, and evaluates the impact of these factors on the charging load. In Section 5, SVR modeling is used to build a relationship between the EV load and external excitation, which is the instant when the price of electricity changes. Section 6 presents our conclusions.

Framework of Data-Mining Based Modeling
Data mining (DM) techniques are adopted to examine the relationship between the EV load and various factors. The framework of the DM technique is shown in Figure 1. Data mining is always based on a massive input dataset. Since the input data in this case are unavailable in the real world, the first step of modeling is to generate large amounts of simulation data as input; the data describe the different EV load profiles for different values of the important factors to be investigated. The simulation adopts a direct-approach-based algorithm, in which a real-world travel survey serves as the background data to ensure that the simulation is consistent with the travel patterns of EV owners. Seventy percent of the input data is used as the training set, while the rest serves as the validation set.
The second step is to build the relationships between the EV load and the factors analyzed. LR is used to build the relationship between the EV load and the internal factors, because the LR coefficients represent the sensitivity of the EV load to these factors. SVR is appropriate for modeling external excitation, i.e., the instant when the price of electricity changes. With this model, a TOU policy can be devised to achieve the expected EV load, which is essential for load leveling. The third step is to validate the models. Two methods for validating the models are proposed: one calculates the model error using an index and the other analyzes the stability of the model coefficients.

Methodology
The charging load of every EV at a given instant is assumed to be an independent random variable that follows the same probability distribution. According to the central limit theorem, when $n$ is very large, the aggregated EV load approximately follows a normal distribution [3]:

$$\sum_{i=1}^{n} P_i(t) \sim N\left(n\mu(t),\; n\sigma^2(t)\right) \qquad (1)$$

where $n$ is the total number of EVs; $P_i(t)$ denotes the load of the $i$-th EV ($i = 1, 2, \ldots, n$) in time step $t$; and $\mu(t)$ and $\sigma^2(t)$ are the expectation and variance of $P_i(t)$, respectively. As long as the parameters $\mu(t)$ and $\sigma^2(t)$ are identified, the EV load can be predicted from Equation (1). These parameters can be calculated using the following method. When $m$ is a large number, the law of large numbers (LLN) suggests that:

$$\mu(t) \approx \frac{1}{m}\sum_{i=1}^{m} P_i(t) \qquad (2)$$

In addition, according to the aforementioned assumptions and the LLN:

$$\sigma^2(t) \approx \frac{1}{m}\sum_{i=1}^{m}\left(P_i(t) - \mu(t)\right)^2 \qquad (3)$$

From Equations (2) and (3), the EV load profile can be generated from the charging schedules $(P_1(t), P_2(t), \ldots, P_m(t))$, where $m$ is a large number.
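As an illustration of how Equations (1) to (3) might be applied, the following minimal sketch estimates the per-EV mean and variance from simulated schedules and scales them to a fleet of $n$ EVs. The toy schedules, the 3.3 kW charging power, and the function names are assumptions made only for this example; they are not taken from the paper.

```python
import math
import random


def estimate_moments(schedules):
    """Estimate mu(t) and sigma^2(t) from m per-EV load profiles (kW)."""
    m = len(schedules)
    steps = len(schedules[0])
    mu = [sum(s[t] for s in schedules) / m for t in range(steps)]
    var = [sum((s[t] - mu[t]) ** 2 for s in schedules) / m for t in range(steps)]
    return mu, var


def aggregate_load_distribution(mu, var, n):
    """Mean and standard deviation of the total load of n EVs per time step."""
    mean_total = [n * mu_t for mu_t in mu]
    std_total = [math.sqrt(n * var_t) for var_t in var]
    return mean_total, std_total


random.seed(0)
# Hypothetical toy data: 4 time steps; each EV draws 3.3 kW with a
# step-dependent probability, otherwise 0 kW.
charge_prob = [0.1, 0.3, 0.5, 0.2]
schedules = [
    [3.3 if random.random() < p else 0.0 for p in charge_prob]
    for _ in range(5000)
]
mu, var = estimate_moments(schedules)
mean_total, std_total = aggregate_load_distribution(mu, var, n=1_000_000)
```

Because the standard deviation of the total grows with $\sqrt{n}$ while the mean grows with $n$, the relative spread of the aggregated load shrinks as the fleet becomes larger.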

Simulation of the Charging Schedules
In order to predict the EV load, one must know the charging schedules. The travel behavior of EVs is assumed to be identical to that of conventional vehicles (CVs). This assumption is acceptable if CVs are replaced by EVs and the performance of EVs becomes satisfactory in the future. Consequently, an algorithm using the direct approach is developed to simulate the charging schedules based on the historical travel information for CVs. The historical travel information for each CV can be extracted from the NHTS. Since current commercial EVs are mainly cars, only vehicles labeled "CAR" are selected from the NHTS for the charging schedule simulation. If necessary, vehicles of other types can also be used in the simulation using the same methodology. In addition, the algorithm considers the factors associated with charging EVs mentioned in Section 1, as shown in Table 1. The variables in Table 1 include the total number of trips by EV $i$ and $l_{i,j}$, the mileage of trip $j$ for EV $i$.
The following three assumptions are made. First, the study considers a simple TOU policy, in which only two price signals are sent at different times in one day to divide it into two periods: one with the regular electricity price and one with a reduced price. $t_{up}$ and $t_{down}$ are the instants when the price of electricity increases and decreases, respectively. A more sophisticated TOU policy can be studied in future work. Second, the charging power is constant. The charging load of an EV is $r$ kW, while the charging power to the battery is $\eta r$, where $r$ is the charging rate of the charger and $\eta$ is the efficiency of the charger. Here, $\eta = 0.9$. Finally, the mileage per unit of electrical energy is 3.042 mi/kWh, which is the value for the Nissan Leaf [16].
The algorithm used for simulating the charging schedules is shown in Figure 2. In order to generate reasonable charging schedules and reduce the influence of an inappropriate starting SOC assumption, the EV follows the same daily travel schedule for three consecutive days. The starting SOC of the first trip on day one is 100%. The charging schedules of days two and three are stored as the results.
First, the daily travel schedule of vehicle $i$ is read and the travel day is tagged as a workday or a weekend day. Considering the significant differences between the travel patterns on workdays and weekends, the charging schedules generated for workdays and weekends are used separately in modeling.
Second, the algorithm reads the information for trip $j$ of EV $i$. If the current battery energy $E$ is sufficient to cover the distance of trip $j$, the energy used for the trip is consumed. Then, the algorithm judges whether a charging service is available at the location where trip $j$ ends. Charging infrastructure is assumed to be always available at home, while the infrastructure in public places, such as workplaces, shops, restaurants, and places for medical or dental service, may be limited. The probability that a public place offers a charging service is denoted as $\rho$ ($\rho \in [0,1]$). In other places, there is no charging infrastructure.
Third, when a charging service is available, the algorithm judges whether the owner of the EV is willing to charge the EV. This willingness depends on the SOC. When the SOC is in the interval $[SOC_m, 1]$, the owner will not charge, in order to maintain the battery. When the SOC is in the interval $[0, SOC_a]$, the owner must charge because of range anxiety. When the SOC is in the interval $[SOC_a, SOC_m]$, the probability that the owner charges the EV is set as 0.5. The combination of $SOC_m$ and $SOC_a$ characterizes the owner's charging habit.
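The availability and willingness rules above can be sketched as two small decision functions. This is only an illustrative reading of the text: the function names, the place labels, and the handling of the interval endpoints (which the text leaves overlapping) are assumptions.

```python
import random


def charger_available(place, rho, rng=random):
    """Return True if a charging service is offered at the trip destination.

    place: "home", "public", or anything else (hypothetical labels).
    rho:   probability that a public place offers charging, in [0, 1].
    """
    if place == "home":
        return True            # charging is always available at home
    if place == "public":
        return rng.random() < rho
    return False               # no infrastructure in other places


def wants_to_charge(soc, soc_a, soc_m, rng=random):
    """Decide whether the owner charges, given the state of charge (0..1)."""
    if soc >= soc_m:
        return False           # battery maintenance: SOC already high enough
    if soc <= soc_a:
        return True            # range anxiety: must charge
    return rng.random() < 0.5  # in between: charge with probability 0.5
```

For example, `wants_to_charge(0.30, soc_a=0.5, soc_m=0.9)` always returns `True`, whereas an SOC of 0.95 never triggers charging under the same habit parameters.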
Finally, the target of charging is to reach the highest possible SOC before the next trip. When this target can be achieved, the lowest electricity cost is pursued. However, if the owner has multiple potential charging schedules that can all charge the battery pack fully while paying the same lowest electricity price, the charging schedule is selected randomly.

Purpose and Method
A linear regression model was developed to quantify the impact of internal factors on the EV load. From Section 3, the expectation of the EV load can be calculated using $n$ and $\mu(t)$. As a result, the estimate of $\mu(t)$ is chosen as the output of the LR model, which is denoted as $\hat{\mu}(t)$. The internal factors are the inputs of the model. Therefore, $\hat{\mu}(t)$ is expanded as $\hat{\mu}(F, t)$, where $F$ represents the vector of the internal factors. The coefficients of the factors can be used to quantify the impacts. With plenty of input data $(F, t)$ and $\mu(F, t)$, the LR model can be built:

$$\hat{\mu}(F, t) = \hat{b}_0(t) + \sum_{k=1}^{6} \hat{b}_k(t)\, F_k \qquad (4)$$

where $F$ represents the vector of factors, i.e., $(E_0, R_p, R_h, SOC_m, SOC_a, \rho)$; $t$ is the time step; $k$ indexes the factors in $F$; and $\hat{b}_k$ is the coefficient estimate obtained by the LR ($k = 0, 1, \ldots, 6$, with $k = 0$ denoting the constant term). Note that no TOU policy is implemented here when generating the massive simulation data as input. The values of the variables can be generated randomly and the corresponding $\mu(F, t)$ can be computed by simulation.
Using the same approach, models are built for workdays and weekends with separate training data.
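To make the regression step concrete, the sketch below fits an ordinary least-squares model for a single time step using only the Python standard library. It is a simplified stand-in: plain least squares is used instead of the stepwise procedure described later, only three hypothetical factors are included, and the synthetic data and coefficient values are invented for the example.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_linear_model(factors, loads):
    """Least-squares fit; returns [b0, b1, ..., bk] for one time step."""
    rows = [[1.0] + list(f) for f in factors]   # prepend the constant term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, loads)) for i in range(k)]
    return solve_linear_system(xtx, xty)


random.seed(1)
true_b = [0.05, 0.01, 0.40, -0.10]   # hypothetical coefficients
factors = [[random.random() for _ in range(3)] for _ in range(200)]
loads = [true_b[0] + sum(b * f for b, f in zip(true_b[1:], fac)) for fac in factors]
b_hat = fit_linear_model(factors, loads)
```

Repeating such a fit for each of the time steps of a day would yield one coefficient vector per step, which is the kind of temporal coefficient profile analyzed in the following sections.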

Implementation
A 96-point curve was adopted to describe the EV load for one day, i.e., each time step is 15 minutes. For each time step, the factors are sampled randomly according to the ranges in Table 2, and 100,000 simulation samples are generated with different factor values. First, 70,000 samples are selected as the training set, while 30,000 samples serve as the validation set (data for the 3rd travel day).
Notes to Table 2: the entry marked * cites [17,18]; ** the range of charging rates is from [19].
Although all of the factors affect the EV load, some factors may prove insignificant for a specified time step. Therefore, the stepwise regression technique was used to select the significant factors and corresponding coefficients [20]. Furthermore, the values of the factors were normalized according to their upper and lower bounds, in order to compare the significance of different factors using the coefficients.
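The mapping between clock time and the 96 time steps can be written down explicitly. The helper names below are hypothetical; the convention that step 1 covers 0:00 to 0:15 follows the text.

```python
def time_step_index(hour, minute, step_minutes=15):
    """1-based index of the time step that contains the given clock time."""
    return (hour * 60 + minute) // step_minutes + 1


def step_interval(t, step_minutes=15):
    """Start and end clock times (as strings) of the 1-based time step t."""
    start = (t - 1) * step_minutes
    end = start + step_minutes
    return (f"{start // 60}:{start % 60:02d}", f"{end // 60}:{end % 60:02d}")


steps_per_day = 24 * 60 // 15   # 96 points per daily load curve
```

With this convention, 18:00 falls in step 73, and step 96 covers 23:45 to 24:00.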
After the regression, the coefficients $\hat{b}_k(t)$ are calculated ($k = 0, 1, \ldots, 6$; $t = 1, 2, \ldots, 96$). Here, $t = 1$ represents the time step from 0:00 to 0:15, and so on. For a given $t_p$, the coefficient $\hat{b}_k(t_p)$ represents the sensitivity of the EV load to that factor. The coefficient of each factor can be used to evaluate its impact on the EV load. If the coefficient is positive, the EV load increases with the value of the factor, and vice versa. In addition, the value of the coefficient quantifies the contribution of the factor to the EV load. For instance, from 18:00 to 18:15 (the 73rd time step), the load model $\hat{\mu}(F, 73)$ has coefficients with the values 0.0583, 0.0110, 0.419, 0.0996, 0.315, 0.0500, and 0.259, of which 0.419 belongs to the normalized $R_h$. Considering factor $R_h$, if the charging rate increases by 1 kW, the average EV load increases by 0.0665 kW [0.419 × 1 kW/(7.7 − 1.4)] from 18:00 to 18:15.
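The conversion from a coefficient on a normalized factor to a sensitivity in physical units can be checked with a few lines. The sketch assumes min-max normalization between the lower and upper bounds; the bounds 1.4 kW and 7.7 kW and the coefficient 0.419 are the values quoted above for the home charging rate.

```python
def normalize(value, lower, upper):
    """Min-max normalization of a factor value to the range [0, 1]."""
    return (value - lower) / (upper - lower)


def sensitivity_per_unit(normalized_coeff, lower, upper):
    """Load change per one physical unit of the factor."""
    return normalized_coeff / (upper - lower)


# Home charging rate at time step 73 (18:00 to 18:15)
delta_kw = sensitivity_per_unit(0.419, lower=1.4, upper=7.7)
```

Here `delta_kw` is about 0.0665 kW of average load per additional kW of home charging rate, in agreement with the figure given in the text.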

Model Validation
The Mean Energy Error (MEE) was adopted to validate the model using the data in the validation set. In its definition, Equation (5), $i$ indexes the EV load profiles in the validation set, and the error is averaged over the total number of such profiles; $\Delta t$ is the length of the time step (15 minutes); and $T$ is the set of all time steps in a day. The MEE describes the mean error of the estimated charging energy demand during a day compared to the real values in the validation set. As the charging energy is calculated using the EV load and the step length, the MEE can represent the error of the estimated EV load. The MEEs for workdays and weekends are 10.25% and 9.41%, respectively, which are acceptable for analyzing the relationship between the EV load and the internal factors.
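One plausible reading of the MEE, given the description above, is the relative error of the daily charging energy, averaged over all validation profiles. The sketch below implements that reading; the exact form of the index is an assumption, since only its verbal description is available here.

```python
def mean_energy_error(estimated, actual, dt_hours=0.25):
    """Mean relative daily-energy error in percent (assumed definition).

    estimated, actual: lists of load profiles (kW per time step), one pair
    per validation sample; dt_hours is the step length (15 min = 0.25 h).
    """
    errors = []
    for est, act in zip(estimated, actual):
        energy_est = sum(p * dt_hours for p in est)   # kWh, estimated
        energy_act = sum(p * dt_hours for p in act)   # kWh, simulated
        errors.append(abs(energy_est - energy_act) / energy_act)
    return 100.0 * sum(errors) / len(errors)


# Toy check: one profile over-estimated by 10%, one under-estimated by 10%
mee = mean_energy_error([[1.1, 1.1], [0.9, 0.9]], [[1.0, 1.0], [1.0, 1.0]])
```

In this toy case the index evaluates to about 10%, because over- and under-estimates do not cancel once the absolute value is taken per profile.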

Observations and Discussion
The charging profiles for the second and third days are used, and stacked graphs of the regression coefficients are shown in Figure 4. In the stacked charts of the regression coefficients, different factors are shown in different colors. There are seven columns at each time step, and the length of a column in one color denotes the absolute value of the factor's coefficient at that time step. If the column is above the time axis, the factor is positively correlated with the EV load at that time step, and vice versa. Therefore, the impacts of all of the factors on the EV load in one time step can be investigated by analyzing the lengths and positions of the columns in different colors at that time step. In addition, the impact of one factor throughout a day can be evaluated by observing the columns in the corresponding color at all time steps. Note that if the absolute value of the coefficient is greater than 0.05, the impact of the factor is considered major. Comparison of Figure 4(a) and 4(b), as well as comparison of Figure 4(c) and 4(d), shows that the contribution of each factor's coefficient stays approximately the same on different travel days, which validates the stability of the model. The internal factors are analyzed in detail below.

Battery Size
An EV with a larger battery travels farther. However, traveling a longer distance requires more electrical energy. Therefore, a larger battery leads to a greater energy demand. The stacked chart can determine when the increased energy demand will be met.
On workdays, the coefficient is greater than 0.05 from 18:00 to 6:00 the next day, which indicates that a larger battery leads to a higher EV load for that interval. It can be inferred that the increment in energy consumed caused by the larger battery size is supplied at night (18:00 to 6:00 the next day). Similarly, the energy increment occurs from 16:30 to 6:00 the next day on weekends.

Charging Rate in Public Places
Figure 5 shows that many EVs are parked in public places from 7:00 to 17:00. According to the analysis of the coefficient in Figure 4, $R_p$ significantly influences the EV load only from 7:00 to 10:30, and not over the whole period from 7:00 to 17:00. This phenomenon can be explained as follows. After 7:00 on workdays, many EVs arrive at workplaces from home. As travel to the workplaces consumes energy, some owners are willing to charge while their EVs are parked. However, the energy demand of these EVs is not large, because most EVs start from home with a high SOC. Therefore, most EVs parked in public places can finish charging before 10:30, and so the charging rate does not significantly affect the EV load from 10:30 to 17:00. On weekends, the coefficient of this factor is greater than 0.05 from 11:30 to 13:00, which indicates that the EV load increases with $R_p$ during this period. An analysis similar to that for workdays may be performed.

Charging Rate at Home
Regarding the time step $t$ and a given $R_h$, two groups of EVs contribute to the EV load at $t$: the EVs that parked before $t$ and have not finished charging, and the EVs arriving at home at $t$ to start charging (Figure 6). On the one hand, as the energy demand of the first group of EVs is limited, when $R_h$ increases, an increasing number of EVs in the first group can finish charging before $t$. Therefore, the power demand of the first group at $t$ is reduced as $R_h$ increases. On the other hand, for the second group of EVs, the power demand increases with $R_h$. It is interesting to determine how the total power demand of EVs changes with $R_h$. Figure 4 shows that as $R_h$ increases, the EV load increases from 10:00 to 20:30 and decreases from 21:30 to 7:00. Let $\Delta n_1(t)$ denote the total number of EVs that finish charging before $t$ due to the increase in $R_h$, and $\Delta n_2(t)$ denote the total number of EVs in the second group. It can be concluded that from 10:00 to 20:30 $\Delta n_1$ is always smaller than $\Delta n_2$, while from 21:30 to 7:00, $\Delta n_1$ is always greater than $\Delta n_2$. For weekends, a similar analysis can be performed.

Battery Maintenance
If the owner is concerned with the lifetime of the EV battery, the owner will charge the EV less frequently, which results in a lower power demand during this period. Nevertheless, the EV has to be charged during another period when the SOC is too low. The power demand is higher in this period due to the lack of charging in the previous period. From the analysis of the coefficient of $SOC_m$ shown in Figure 4, the two periods can be identified.
On workdays, the EV load increases with $SOC_m$ from 6:00 to 19:30, while the trend is opposite in the other period. A lower $SOC_m$ means caring about the battery more. Therefore, from 6:00 to 19:30 the owners charge their EVs less frequently due to battery maintenance, and the approximate period that compensates for this amount of energy is from 19:30 to 6:00 the next day. According to a similar analysis, on weekends fewer EVs are charged from 6:30 to 21:00, and the "compensating" period is from 21:00 to 6:30 of the next day.

Range Anxiety
Throughout the day, the coefficient of this factor is positive, which indicates that the power demand increases with $SOC_a$. The higher $SOC_a$ is, the more the owner suffers from range anxiety. The phenomenon shows that throughout the day, the more the owner suffers from range anxiety, the greater the power demand of the EV is, since the owners tend to keep the SOC higher. These owners can finish longer trips in their EVs, but their EVs require more energy, increasing the power demand. There is no clear difference in this phenomenon between workdays and weekends.

Penetration of the Public Charging Infrastructure
As $\rho$ increases, more EVs parked in public places can be charged if necessary, typically in the daytime. Therefore, more power is required in the daytime when $\rho$ increases. However, if more energy is supplied in the daytime, then less energy is supplied during other periods, which means that less power is demanded during those periods with increasing $\rho$. The analysis of the coefficient of $\rho$ on workdays showed that if $\rho$ increases, the EV load increases from 5:30 to 17:00 and decreases during other periods. A similar analysis can be carried out for weekends.

Critical Factors
The relative coefficients of the aforementioned factors were analyzed to compare their significance and identify the critical factors. The relative coefficient of the $k$-th factor is defined with respect to $G$, the set of all the internal factors, and represents the relative contribution of that factor at a given time step. When the absolute value of the relative coefficient at a time step is greater than 30%, the corresponding factor is considered a critical factor in that time step. The critical factors at each time step are shown in Figure 7. If the EV load and the critical factor are positively correlated, then the factor is called a positive critical factor; otherwise, it is called a negative critical factor.
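The selection of critical factors can be illustrated as follows. The sketch assumes that the relative coefficient of a factor is its coefficient divided by the sum of the absolute coefficients of all factors in $G$; this normalization is an assumption consistent with the 30% threshold, and the coefficient values are hypothetical.

```python
def relative_coefficients(coeffs):
    """Signed share of each factor in the total absolute coefficient mass."""
    total = sum(abs(b) for b in coeffs.values())
    return {name: b / total for name, b in coeffs.items()}


def critical_factors(coeffs, threshold=0.30):
    """Factors whose relative coefficient exceeds the threshold in magnitude."""
    rel = relative_coefficients(coeffs)
    return {
        name: ("positive" if r > 0 else "negative")
        for name, r in rel.items()
        if abs(r) > threshold
    }


# Hypothetical coefficients of three factors at one time step
example = {"E0": 0.1, "Rh": 0.5, "SOCm": -0.4}
critical = critical_factors(example)
```

In this toy case `Rh` is a positive critical factor and `SOCm` a negative one, while `E0` contributes only 10% and is not critical.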

Purpose and Method
To devise a suitable TOU policy, it is also important to know how the EV load profile responds to the price of electricity. This paper considers the instants when the price of electricity changes, from high to low or the reverse. It does not consider what prices to set.
Naturally, the relationship between the EV load and the instants when the price of electricity changes is highly nonlinear. Experiments showed that the error of the model describing this relationship is unacceptable when it is built using linear regression. Therefore, SVR was adopted for modeling the relationship between $\mu$ and $(t_{up}, t_{down})$. Given a training set $(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)$, an SVR model can be built that describes the relationship between the target $y$ and the feature vector $x$ [shown in Equation (8)], where $\hat{y}$ is the estimate of $y$. The SVR ensures that most training errors are within the given tolerance and that the regression function fits the unknown data best. However, some parameters have to be selected for SVR modeling: $\varepsilon$, the tolerance of the training error for most training data; $C$, the tradeoff between the ability to predict unknown data and the tolerance for training errors larger than $\varepsilon$; and the kernel function $k(x_i, x_j)$:

$$\hat{y}(x) = \sum_{i=1}^{l} \left(\alpha_i - \alpha_i^{*}\right) k(x_i, x) + b \qquad (8)$$

The approach used to select the parameters is as follows. First, the kernel function is selected from classical functions, such as the radial basis function (RBF), $k(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right)$. In our study, $y$ is $\mu$ and $x$ is $(t_{up}, t_{down}, t)$. RBF is chosen as the kernel function, and then the charging load model can be expressed as:

$$\hat{\mu}(t_{up}, t_{down}, t) = \sum_{i=1}^{l} \left(\alpha_i - \alpha_i^{*}\right) \exp\left(-\gamma \lVert x_i - x \rVert^2\right) + b, \quad x = (t_{up}, t_{down}, t) \qquad (10)$$

… $t$) can also be made using Equation (10). The case studies in the next section present an application for making a TOU policy.
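Once an SVR model with an RBF kernel has been trained, evaluating it reduces to a weighted sum of kernel values. The sketch below shows only this prediction step, in the standard dual form; training (for example with LIBSVM) is not reproduced, and the support vectors, dual coefficients, bias, and $\gamma$ used here are hypothetical placeholders.

```python
import math


def rbf_kernel(x_i, x_j, gamma):
    """Radial basis function kernel exp(-gamma * ||x_i - x_j||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


def svr_predict(x, support_vectors, dual_coefs, bias, gamma):
    """Evaluate the SVR regression function at x.

    dual_coefs[i] stands for (alpha_i - alpha_i*) of support vector i.
    """
    return bias + sum(
        c * rbf_kernel(sv, x, gamma)
        for sv, c in zip(support_vectors, dual_coefs)
    )


# Hypothetical single-support-vector model on inputs (t_up, t_down, t)
sv = [(0.0, 0.0, 0.0)]
at_sv = svr_predict((0.0, 0.0, 0.0), sv, [2.0], bias=0.5, gamma=1.0)
far_away = svr_predict((10.0, 10.0, 10.0), sv, [2.0], bias=0.5, gamma=1.0)
```

The prediction equals the dual coefficient plus the bias at the support vector itself and decays toward the bias far away from it, which is what gives the RBF model its local, nonlinear character.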

Case Studies
An SVR model was built to develop a TOU policy for load leveling. The purpose of the TOU policy is to shift the EV load to the original load valley. The TOU policy was made for a province in China on a summer workday. The load profiles without EV integration are collected and called the base load. The peak load of the province is about 6,800 MW. Assume that the total number of EVs is one million and that all the EVs respond to the TOU policy, which is a best-case scenario. In addition, the grid infrastructure is assumed to be capable of handling many EVs. The problem can be formulated in a way similar to that in [23]. In [23], however, the variable is the charging schedule, while here it is the timing of the price signals. In the resulting problem, Equation (11), $n$ is the total number of EVs; $L_0(t)$ is the base load of the province; and $\bar{L}$ indicates the average power for a day. The meanings of the other symbols are given above.
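A load-leveling objective of this kind can be sketched as the squared deviation of the total load from its daily average. This particular form is an assumption made for illustration, as are the four-step toy base load and the unit conventions (base load in MW, per-EV mean load in kW).

```python
def total_load_mw(base_load_mw, ev_mean_load_kw, n_evs):
    """Base load plus the expected aggregated EV load, in MW per time step."""
    return [
        l0 + n_evs * mu / 1000.0
        for l0, mu in zip(base_load_mw, ev_mean_load_kw)
    ]


def leveling_objective(base_load_mw, ev_mean_load_kw, n_evs):
    """Sum of squared deviations of the total load from its daily mean."""
    total = total_load_mw(base_load_mw, ev_mean_load_kw, n_evs)
    mean = sum(total) / len(total)
    return sum((x - mean) ** 2 for x in total)


# Toy data: EV charging placed on the base-load peak versus in the valley
base = [6800.0, 5000.0, 4000.0, 6000.0]
on_peak = leveling_objective(base, [1.0, 0.0, 0.0, 0.0], n_evs=1_000_000)
in_valley = leveling_objective(base, [0.0, 0.0, 1.0, 0.0], n_evs=1_000_000)
```

Under this objective, shifting the same EV energy into the valley gives a smaller value than adding it to the peak, which is the behavior a leveling TOU policy is meant to encourage.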

Implementation of SVR Modeling and Model Validation
Before solving Equation (11), the model shown in Equation (10) has to be built. The training data for modeling can be obtained as follows: the value of $(t_{up}, t_{down})$ is generated randomly after the values of the vector $(E_0, R_p, R_h, SOC_m, SOC_a, \rho)$ are assigned. In the case studies, the assigned value is (24, 6, 2, 90, 50, 50). Then, the corresponding $\mu(t_{up}, t_{down}, t)$ can be computed using the methodology in Section 3. First, 10,080 pairs of $\mu(t_{up}, t_{down}, t)$ and $(t_{up}, t_{down}, t)$ are selected as the training set, while 4,320 pairs serve as the validation set (data for the 3rd travel day). The length of each time step is 15 minutes.
After acquiring the training data, the parameters can be selected and the coefficients in Equation (10) can be calculated using the methods mentioned above with LIBSVM [21]. In this way, the SVR model in Equation (10) is established.
The MEE shown in Equation (5) was also selected as an index for validating the SVR model using the data in the validation set. The MEE for workdays was 16.89% (each time step was 15 minutes), demonstrating that a relationship between the EV load and the price signal can be built with the SVR approach. If LR is used to build this model, the MEE is 37.11%, which is unacceptable.

Results
Since the problem in Equation (11) is difficult to solve with conventional algorithms, an intelligence-based technique called particle swarm optimization (PSO) is used to solve it. Kennedy and Eberhart were inspired by flocking birds and introduced PSO in 1995 [24,25]. The particle swarm optimization toolbox, PSOt, is used to solve Equation (11) [26]. However, some parameters remain to be identified. An acceptable set of parameters is attained after experiments.
The resulting inexpensive period is from 23:45 to 4:45. The EV load is shifted significantly to the off-peak period of the base load (Figure 8). The results of the total load and peak-valley difference are shown in Figure 9, comparing three scenarios: (S1) the base load; (S2) the total load without a TOU policy; and (S3) the total load with a TOU policy.
When EVs are introduced without a TOU policy, the load profile is worse because of the increased load variance and peak-valley load difference. With a TOU policy, the difference between the peak and valley is reduced by 39.11%. These results demonstrate that the load profile improves remarkably with a TOU policy using our SVR model.
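The peak-valley comparison between scenarios amounts to a simple calculation, sketched below. The four-step load curves are hypothetical toy values and are not intended to reproduce the 39.11% figure reported above.

```python
def peak_valley_difference(load):
    """Difference between the maximum and minimum of a load curve."""
    return max(load) - min(load)


def reduction_percent(diff_before, diff_after):
    """Relative reduction of the peak-valley difference, in percent."""
    return 100.0 * (diff_before - diff_after) / diff_before


# Hypothetical total loads (MW) without and with a TOU policy
without_tou = [7800.0, 5000.0, 4000.0, 6000.0]
with_tou = [6800.0, 5000.0, 5000.0, 6000.0]
reduction = reduction_percent(
    peak_valley_difference(without_tou),
    peak_valley_difference(with_tou),
)
```

In this toy case the peak-valley difference falls from 3,800 MW to 1,800 MW, a reduction of roughly 52.6%.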

Conclusions
Considering real-world travel patterns by exploiting the NHTS, a framework based on data mining is proposed for modeling the aggregated EV load. A linear load model is developed for the internal factors, with which the sign and temporal distribution of the coefficients can be used to evaluate the impacts of the internal factors on the EV load quantitatively. Furthermore, a relationship between the EV load and external excitation is built; based on this, an optimization model is used to devise a TOU policy for load leveling. In the future, further research on TOU is planned, which will examine not only when to change the price of electricity, but also how to set the price.

Figure 2. Algorithm used for generation of charging schedules.

Figure 3. The real and estimated average EV loads.

Figure 4. Stacked graphs with the regression coefficients of the internal factors. Workdays with the input data for the (a) 2nd and (b) 3rd travel day; weekends with the input data for the (c) 2nd and (d) 3rd travel day.

Figure 5. Total number of EVs parking in public places at each instant: (a) workdays; (b) weekends.

Figure 6. Total number of EVs arriving at home at each instant: (a) workdays; (b) weekends.
$\varepsilon$ can be identified after trials, as long as the performance of the SVR is sufficient. Lastly, only the parameters of the selected kernel function (such as $\gamma$ of the RBF) and $C$ should be identified. N-fold cross validation and grid search can be used to search for the proper parameters [21]. After selecting the parameters, the model can be established by calculating the coefficients $\alpha_i$, $\alpha_i^{*}$, and $b$ for Equation (8) ($i = 1, 2, \ldots, l$). These coefficients of the SVR model are computed by solving an optimization problem [22].

Figure 8. EV load profiles with and without implementation of a TOU policy.

Table 1. Variables used in the algorithm to simulate the charging schedules.

Table 2. The bounds used for factor sampling.

The estimate of $\mu(t)$ is chosen as the output of the SVR model, which is denoted as $\hat{\mu}(t)$. As the influence of price signals is to be investigated, $\hat{\mu}(t)$ …