Modeling the Charging Behaviors for Electric Vehicles Based on Ternary Symmetric Kernel Density Estimation

: The accurate modeling of the charging behaviors for electric vehicles (EVs) is the basis for the charging load modeling, the charging impact on the power grid, orderly charging strategy, and planning of charging facilities. Therefore, an accurate joint modeling approach of the arrival time, the staying time, and the charging capacity for the EVs charging behaviors in the work area based on ternary symmetric kernel density estimation (KDE) is proposed in accordance with the actual data. First and foremost, a data transformation model is established by considering the boundary bias of the symmetric KDE in order to carry out normal transformation on distribution to be estimated from all kinds of dimensions to the utmost extent. Then, a ternary symmetric KDE model and an optimum bandwidth model are established to estimate the transformed data. Moreover, an estimation evaluation model is also built to transform simulated data that are generated on a certain scale with the Monte Carlo method by means of inverse transformation, so that the ﬁtting level of the ternary symmetric KDE model can be estimated. According to simulation results, a higher ﬁtting level can be achieved by the ternary symmetric KDE method proposed in this paper, in comparison to the joint estimation method based on the edge KDE and the ternary t-Copula function. Moreover, data transformation can e ﬀ ectively eliminate the boundary e ﬀ ect of symmetric KDE.


Introduction
Nowadays, a considerable part of the total emissions is due to contribution of road traffic [1,2]. For this reason, the development of the new technologies (as electric vehicles (EVs) and bicycles for sustainable mobility) is necessary to reduce the emissions. Nevertheless, a huge amount of disorderly EVs charging behaviors could pose a wide range of challenges to the safe, economic, and efficient operation of the charging network (CN) with the large-scale development of EVs. As for a power grid (PG), these challenges originate from the uncertain distribution of the EVs charging load [3][4][5]. In terms of EVs, these challenges stem from the randomness of EVs driving and charging behaviors [6]. As a result, problems such as uneven EVs distribution in the CN, congestion in some charging stations (CSs), and long waiting time for charging, and waste of charging resources are caused [7]. The above problems raised in the operation of CN will become a "bottleneck" to the EVs development, which should be solved without delay.
Fortunately, the distribution of EVs charging load is also characterized by flexible load with a potential for regulation although it is uncertain. The potential can be found in two aspects. On the one hand, the CN (as shown in Figure 1) integrates the attributes of a PG, a transportation network (TN), and an information network (IN), which allows EV users to select a CS base on bidirectional information interaction with the help of the navigation system and the CS administration center, so as to achieve the regulation of EVs charging load between CSs [8,9]. On the other hand, the advancement of battery technology and charging facility technology, especially the emergence of multi-gun charging piles can provide the necessary conditions for regulating EVs charging load in CSs [10]. Therefore, the study of regulating EVs charging load distribution appears to be particularly important and will create opportunities for the safe, economic, and efficient operation of the CN. Modeling of EVs charging behaviors, charging load modeling, the charging impact on the PG, orderly charging strategy, and planning of charging facilities have been focuses of domestic and foreign studies concerning the regulation of EVs charging load distribution, and EVs charging behaviors should be modelled accurately for the latter four research contents. It can be observed that the modeling study of EVs charging behaviors is the key to the regulation of EVs charging load distribution. However, with low popularization of EVs at the beginning, it was difficult to obtain the actual driving data. Considering that drivers' driving times and distances are constant, it is common to study the modeling of the EVs charging behaviors with fuel vehicle data. From the perspective of modeling data, these research data are derived from the national household travel survey (NHTS) data of America [11], daily traffic flow data of Jiangsu Province of China [10], and daily traffic flow data of South Korea [12]. Characteristic parameters of fuel vehicles should be converted into characteristic parameters of EVs charging via certain assumptions when the above data are applied in the actual study. For instance, it must be assumed that the daily driving mileage is positively correlated with the charging time [13], that the end time of the last trip is equivalent to the initial charging time [14], that the initial charging time is independent of the charging time [15], and that the battery of each EV is fully charged [16]. These assumptions might lead to large deviations between Monte Carlo simulation results and actual conditions. As EVs have witnessed rapid growth in recent years, a higher accuracy can be gained from the modeling of charging behaviors with the actual EVs data. In that case, studies on the modeling of charging behaviors with the actual EVs operating data have been rising gradually. In terms of modeling data, these research data are derived from charging data [3,4] of electric taxis in Shenzhen, EVs charging data [17] in Kanagawa, EV parking and charging data [18,19] of CSs in Netherlands, EV charging data [20] of CSs in UK, and EVs charging data [21] of CSs in Nanjing, etc. In terms of modeling objects, these objects primarily consist of the arrival time (or initial charging time), the staying time (that contains waiting time and charging time), and the charging capacity. Additionally, the charging capacity modeling can be divided into three parts: the starting state of charge (SOC) modeling, the ending SOC modeling, and the battery pack capacity modeling. In terms of modeling approaches, these approaches are divided into two main categories: parametric estimation (PE) and non-parametric estimation (NPE). When EVs charging behaviors are modelled with PE, the variable to be estimated conforming to a certain known probability distribution should be presupposed before estimating parameters of the assumed probability distribution with EVs charging behaviors data for testing. Common PE methods are composed of Gaussian distribution [6], mixed Gaussian distribution [4], mixed Beta distribution [19], and Weibull distribution [22] etc. The premise of using the PE approach is to presuppose that the probability distribution is approaching the actual situation. Otherwise, the established EVs charging behaviors model might be significantly deviated from the actual situation. When EVs charging behaviors are modelled with NPE, presupposition is not required. Instead, the probability distribution of the variable to be estimated is gained directly under the drive of EVs operating data. In this way, deviations caused by wrong selection of PE models can be effectively addressed.
At present, there are two approaches to EVs charging behaviors modeling by NPE. One uses the unitary kernel density estimation (KDE) or the unitary self-adaptive KDE [23,24] for modeling with taking no account of the correlation between variables, the other uses the estimation associating the edge distribution function (common empirical distribution function or unitary KDE [25]) with the Copula function [21] for modeling with considering the correlation between variables. In terms of modeling accuracy, the second approach has higher modeling accuracy than the first one. Nevertheless, in the second approach, errors can be found in the estimation of each edge distribution and the overall relationship of describing ternary variables with the correlation. Consequently, the deviation of the joint density estimation generated by the above errors is large, causing a large deviation between the simulation results and the actual results. With this as the cutting point, this paper introduces an accurate joint modeling approach of the arrival time, the staying time, and the charging capacity for the EVs charging behaviors in the work area based on ternary symmetric KDE in accordance with the actual data. The rest of the paper is organized as follows: a ternary symmetric KDE model of EVs charging behaviors is built in Section 2. In Section 3, we solve the model with using KDE software package and toolbox of MATLAB. In Section 4, the ternary symmetric KDE method, and the joint estimation method based on the edge KDE and the ternary Copula function are investigated, and the results are analyzed from the simulations. Finally, Section 5 concludes this paper.

Data Transformation Model
There is a boundary bias in the symmetric KDE, namely, a large deviation will arise in the boundary estimation [26]. In order to eliminate the boundary bias, Box-Cox transformation was selected in this paper to transform EVs data in the work area from all dimensions. Its normal transformation and its corresponding inverse transformation are expressed as Equations (1) and (2), respectively [27].
where log(·) is the logarithmic function; exp(·) is the exponential function; n is the number of EVs samples; λ is a parameter to be estimated. In this paper, the estimated parameter λ can be solved with the maximum likelihood method [27] and the corresponding logarithmic likelihood function is represented by Equation (3).
where var(·) is the variance function; mean(·) is the mean value function; Ω is the transformed data of EVs.
The parameter λ value is estimated aŝ where max(·) is the maximum function.

Ternary Symmetric KDE Model
The modeling object in this paper is ternary variables that involve the overall relationship among the EVs arrival time, the staying time, and the charging capacity. In order to effectively describe its overall characteristics, a ternary symmetric KDE model is established as follows.
Equation (5) is called the ternary KDE of f (x, y, z) [23,24]; h n1 , h n2 , h n3 are bandwidths; (x, y, z) are valued by dividing the cube space that is composed of the maximum values (X max1 , X max2 , X max3 ) and the minimum values (X min1 , X min2 , X min3 ) of three dimensions in the given sample into m 3 segments. Sequentially, (x, y, z) are taken as is a kernel function, which can be expressed by the product kernel function [28].
where K d (·) is unitary kernel function, which exerts no influence on the overall relationship of variables. In order to ensure rational estimation, a symmetric unimodal probability density function centered on 0 is usually selected. Additionally, the same pattern is generally adopted [28]. Uniform kernel, Gaussian kernel, Epanechnikov kernel, and quartic kernel are common kernel functions [23]. In this paper, the Gaussian kernel is selected as the kernel function of the ternary symmetric KDE model, as expressed in Equation (7).
The ternary symmetric KDE model for EVs arrival time, the staying time, and the charging capacity is established as Equation (8).

Optimum Bandwidth Model
Selecting an appropriate bandwidth is extremely important for the actual use of ternary symmetric KDE. When a small bandwidth is selected, the randomness effect will be prominent, resulting in irregular shapes of estimation results. When large bandwidth is selected, the averaging effect will be prominent, possibly causing the structural characteristics of data [29] to be masked and the over-slick estimation result. Therefore, the least square cross-validation was adopted in this paper, so that bandwidths can be directly and automatically generated by data [23] to ensure the appropriateness of the bandwidth. The integral squared error (ISE) between the ternary symmetric KDE and the true density is defined as where the ISE reaching the minimum bandwidth is called the optimal bandwidth. To make it simple, solving the optimal bandwidth of Equation (9) can be equivalent to solving the optimal bandwidth of Equation (10).
In Equation (10), By calculation Taking Equations (6), (11) and (12) into account, the second term in Equation (10) is Additionally, the first term in Equation (10) can be calculated as When the Gaussian kernel is selected as the kernel function, the K * (·) can be expressed as When n is large, n − 1 in Equation (13) can be replaced with n. In this way, a final scaling function can be obtained by substituting Equations (13) and (14) into Equation (10).
The optimum bandwidth model obtained through verification using the least squares method can be established as

Estimation Evaluation Model
A ternary symmetric KDE model is constructed on the basis of the actual EVs operating data in the work area. Moreover, the Monte Carlo method [30] is utilized to generate the simulation data of the required scale. Then, simulation data and original data are presented in a 3D right-angle plane using a 3D scatter density map. Three binary frequency histograms are selected to describe the overall characteristics of the simulated data and the original data, so as to quantitatively evaluate the fitting level of the estimated model. Meanwhile, corresponding frequency matrices can be also obtained. In order to evaluate the difference of the frequency matrix, the similarity between the matrices A and B is defined as where A = mean2(A) and B = mean2(B); mean2(·) is the mean function of all elements in a matrix.
After obtaining the similarity of frequency matrix, the estimation model is defined to fit the evaluation indexes concerning the advantages and disadvantages of original data. It is composed of Equations (19) and (20).
where µ r and σ r are the mean and the standard deviation of the matrix similarities corresponding to the generated data and the original data after n times of simulations. The greater µ r indicates the higher the fitting level of original data of the model; while the smaller σ r indicates the higher the stability of the model fitting the original data.

Algorithm Flowchart
The data transformation model, the ternary symmetric KDE model, the optimal bandwidth model, and the estimation evaluation model are established in this paper, of which, the data transformation model and the optimal bandwidth model are unconstrained optimization models. Functions in Toolbox are used in MATLAB to solve the unconstrained optimization model and the estimation evaluation model. In addition, the KDE software package [31] is also used to model the ternary symmetric KDE. The specific process is presented as follows: Step 1: Extract effective data records of the EVs arrival time, stay time, and charging capacity in the work area to conduct normal transformations on it with Equation (1) sequentially and estimate the transformed parameters with the Equations (3) and (4) sequentially before solving the parameter estimation model with the genetic algorithm (GA). Moreover, the data transformation model is solved by using the GA function in Toolbox in MATLAB to obtain the estimated value of the transformation parameter; Step 2: Substitute the original data and the transformed data into Equations (16) and (17), respectively, and solve the optimal bandwidth model through using the Fmincon function in Toolbox in MATLAB to obtain corresponding optimal bandwidths; Step 3: After reading the original and transformed data and the corresponding optimal bandwidth, model the ternary symmetric KDE on the original data and the transformed data distributions through calling the KDE software package in MATLAB. At the same time, estimation modeling is conducted on the original data and transformed data, respectively, by means of using the Ksdensity function and the Copulafit function in Toolbox in MATLAB; Step 4: Implement the Monte Carlo random sampling through calling the Sample function in the KDE software package in MATLAB by setting the number of simulations and the sampling size, and reading the results of the ternary symmetric KDE. Meanwhile, the Monte Carlo random sampling can be implemented through calling the Copularnd function in Toolbox in MATLAB.
It should be noted that the random sampling of the transformed data estimation should be inversely transformed in Equation (2); Step 5 Read the frequency matrix of the binary frequency histogram of the original data and the simulated data, and solve the frequency matrix similarity between the original data and the simulated data through calling the Corr2 function in Toolbox in MATLAB. Besides, results of the ternary symmetric KDE are compared with that of the joint estimation based on the edge KDE and the Copula function to verify the effectiveness of the ternary symmetric KDE model.

EVs Data
More than 17,000 data records from September 1, 2016 to December 31, 2017 were extracted and used in this paper from a EVs charging service company platform in Nanjing, each of which contains the tab and the name of the CS, the arrival time, the departure time, the charging capacity, the charging amount, the order type, the payment method, the order number and the order status, etc. This data is a crucial support for the analysis of EVs charging behaviors. Four sets of data records, including the name of the CS (it is used for dividing the charging area), the arrival time, the staying time (the difference between the departure time and the arrival time), and the charging capacity were extracted for the study. Apart from the abnormal data record that should be eliminated as required, there was more than 14,000 effective data records.
The randomness is large since the data volume per day per station is small among the effective data. In order to investigate the charging behaviors law of EVs, effective data records are aggregated from space and time. First of all, every CS is classified into the work area, the residential area, the commercial area, and other areas through searching the name of the CS from the map at the spatial dimension, so as to spatially aggregate the effective data records. Next, the EVs arrival times of multiple days are aggregated into one day at the time dimension, so as to carry out time aggregation on the effective data records.
After time-space aggregation, data records in the work area, the residential area, the commercial area, and other areas are 11,740, 681, 1042, and 792, respectively. As can be seen from data, the work area has the largest number of data records (up to 82%), while few records can be found in the residential area, the commercial area, and other areas (accounting for less than 7.3%). In that case, in order to guarantee the rational modeling, the combined distribution of the EVs arrival time (AT), the staying time (ST), and the charging capacity (CC) were analyzed with the EVs operating data in the work area as the original data (as shown in Figure 2) for concluding the charging behaviors law of EV users, which can provide the basis for studying EVs charging load modeling, the impact of charging on the PG, orderly charging strategies, and planning of charging facilities.

Results and Analysis
In order to verify the effectiveness of the method proposed in this paper, the joint estimation method based on the edge KDE and the Copula function [21,25,28] was compared, defined, and simulated in the following four cases: Case 1: Ternary symmetric KDE was conducted on the original data, and simulation data were generated with the Monte Carlo method; Case 2: After conducting normal transformation on the original data, ternary symmetric KDE was performed on the transformed data. Then, the simulated data that were generated by the Monte Carlo method were inversely transformed as the final simulation data; Case 3: Joint estimation based on the edge KDE and the Copula function was conducted on the original data, and simulation data were generated with the Monte Carlo method; Case 4: After conducting normal transformation on the original data, joint estimation based on the edge KDE and the Copula function was conducted on the original data. Then, the simulated data that were generated by the Monte Carlo method were inversely transformed as the final simulation data.
Furthermore, the simulation samples number can be set to 11,740, and the simulation times with Monte Carlo method can be set to 10,000. The experimental platform was built in MATLAB (R2014a 64-bit), based on a Hewlett Packard personal computer (CPU: Intel Core i7-8700 3.2 GHz, RAM: 8 GB, OS: Windows 10 64-bit).
(1) Data Transformation Analysis In this paper, transformed parameters of the AT, the ST, and the CC were estimated with the maximum likelihood estimation method. Estimated results of various parameters are shown in Figure 3. When the parameter estimation results are taken as −0.5575, 0.2170, and 0.3992, corresponding maximum likelihood functions are maximized. Based on this, the data transformation model was utilized to perform normal transformations on the AT, the ST, and the CC in the original data, respectively. The scatter density map (yellow means high density) of the transformed data (without units) is shown in Figure 4. A total of 74.3% of the scattered points (with yellow) gather around the center area in Figure 4, and 87.3% of the scattered points (with non-yellow) in Figure 2 are relatively dispersed. Therefore, most of the scattered points in the 3D space gather around the center area after the normal transformation of the original data.  In general, the skewness can be used to detect whether the data conforms to a normal distribution. If so, it is equal to 0 [25]. In order to better illustrate the effect of data transformation, the frequency histograms of the original data and the transformed data (without units) in all dimensions are given in Figure 5. Additionally, the corresponding skewness results were calculated as (0.7994, 2.1144, 1.8073), and (0.0830, −0.0062, 0.0192). Therefore, the distributions of the transformed data in all dimensions are closer to the normal distributions than those of the original data, and a better normal transformation effect can be observed in the ST and the CC data. (2) Estimation Model Analysis The optimal bandwidths of the original data and the transformed data were calculated with the least square cross-validation method, as shown in Table 1. When the optimal bandwidths h opt1 , h opt2 , h opt3 are taken as (0.5865, 0.0733, 0.2027) and (0.0069, 0.0456, 0.0908), respectively, the corresponding scaling function can be minimized. Based on this, estimation results can be obtained by conducting the ternary symmetric KDE on the original data and the transformed data. Estimation results that are four-dimensional data are projected on three coordinate planes for output. The ternary symmetric KDE results of the original data and transformed data in each coordinate plane are shown in Figure 6. Moreover, the edge KDE should be given to the to-be-compared joint estimation model based on the edge KDE and the Copula function. The edge KDEs of the original data and the transformed data are shown in Figure 7. On this basis, an appropriate ternary Copula function was selected. The ternary normal Copula function (TNCF) and the ternary t-Copula function (TTCF) are commonly ternary Copula functions [25]. The linear correlation parameter matrices of the Copula functions of the raw data and the transformed data are shown in Table 2.  As shown in Table 2, the original data are characterized by a good linear correlation in comparison to transformed data. After that, the squared Euclidean distance between the Copula function and the empirical Copula function [25] were calculated to evaluate their strengths and weaknesses. Regarding original data, corresponding Euclidean distance calculation results are 3.4119 and 3.2634, respectively; concerning transformed data, corresponding Euclidean distance calculation results are 3.2667 and 3.1489, respectively, which show that original data and transformed data can be better fitted in the TTCF model. In this paper, the joint estimation model based on the edge KDE and the TTCF was chosen as the comparison object.

(3) Evaluation Analysis for Estimation Model
A total of 10,000 simulation times were conducted to generate simulated data samples for evaluating the fitting level of the ternary symmetric KDE model and the joint estimation model based on the edge KDE and the TTCF. To make it simple, a simulation sample was selected. To select the simulated data (whose structure is more similar to that of the original data), the axis ranges of the scatter density maps were set to AT ∈ [6,30] Figure 8. The overall 3D structures of the original data (as shown in Figure 2) and simulated data in case 1 and case 2 are approaching. In comparison, the overall 3D structures of the simulated data in case 3 and case 4 are significantly different. In conclusion, the joint estimation model based on the edge KDE and the TTCF might lead to a large deviation between the simulation results and the actual results, while the ternary symmetric KDE is more suitable for describing the overall relationship among the AT, the ST, and the CC, presenting a high fitting level. As shown in Figure 9, the frequency histograms of the original data and the frequency histograms (for describing four-dimensional data) of the simulated data in two cases in different coordinate planes are presented to compare the accuracy levels of the ternary symmetric KDE in case 1 and case 2. The frequency matrix similarities between the simulated data and the original data in each coordinate plane in two cases can be shown in Table 3. The similarity between the frequency matrix of the simulated data and the original data in each coordinate plane in the case 2 is higher than that in the case 1. Meanwhile, the corresponding frequency histogram has a high goodness of fit. Hence, the transformed data can eliminate the boundary bias of the symmetric KDE in a more effective way in comparison to the original data, improving the fitting level of the ternary symmetric KDE model.  The mean and the standard deviation of the frequency matrix similarities between the simulated data and the original data in two cases under 10,000 Monte Carlo simulations are shown in Table 4. Compared with case 1, the simulated data and the original data in each coordinate plane have a higher mean value of frequency matrix similarities and a lower standard deviation in case 2, that is, the ternary symmetric KDE model has a high fitting level and stability under transformed data.

Conclusions
An accurate joint modeling approach of the arrival time, the staying time, and the charging capacity for the EVs charging behaviors in the work area based on ternary symmetric KDE was proposed in accordance with the actual data. It can provide the basis for studying EVs charging load modeling, the impact of charging on the PG, orderly charging strategies, and planning of charging facilities. The main contributions of this paper can be summarized as follows: • Effective data records are aggregated from space and time. On the one hand, every CS is classified into the work area, the residential area, the commercial area, and other areas through searching the name of the CS from the map at the spatial dimension. On the other hand, EVs arrival times of multiple days are aggregated into one day at the time dimension.

•
A ternary symmetric KDE model of EVs charging behaviors is established. A higher fitting level can be achieved by the proposed ternary symmetric KDE method, in comparison to the joint estimation method based on the edge KDE and the TTCF.
The main conclusions of this paper can be summarized as follows: (1) Through the Box-Cox transformation, the original data can be transformed to make its distribution characteristic more normal.
(2) The deviation between the simulation results and the actual results can be effectively reduced by the ternary symmetric KDE, in comparison to the joint estimation model based on the edge KDE and the Copula function. (3) With the Box-Cox transformation, the ternary symmetric KDE model has a high fitting level and stability, and its boundary effect can be effectively eliminated.
Simulated data samples of the arrival time, staying time, and charging capacity with the overall relationship can be generated on a large-scale with the Monte Carlo method, and the charging behaviors modeling results. The simulated data samples can be directly used for modeling the charging load of a certain scale EVs in work areas, with the actual power data of the charging pile that needs to be given separately since it is not stored in the platform. In the future, the charging behaviors modeling results can be modified with the continual accumulation of the actual running data of EVs in other areas, and the modified results can also be used for modeling the charging load of a certain scale EVs in other areas.
It is worth mentioning that the original data drives the charging behaviors modeling to obtain the modeling results that would otherwise be unavailable. Moreover, the waiting time has not been added to the modeling in this paper due to the charging characteristics of EVs in work area. However, the subsequent research will focus on the EVs charging behaviors modeling in highway fast-charging stations where the waiting time is an important (and a new compared to refueling) factor in case of EVs.