Influence of Different Data Interpolation Methods for Sparse Data on the Construction Accuracy of Electric Bus Driving Cycle

Wang, Xingxing; Ye, Peilin; Deng, Yelin; Yuan, Yinnan; Zhu, Yu; Ni, Hongjun

doi:10.3390/electronics12061377


by Xingxing Wang ^1,2, Peilin Ye ^2, Yelin Deng ^1,*, Yinnan Yuan ^1, Yu Zhu ^2,* and Hongjun Ni ^2,3,*

^1 School of Rail Transportation, Soochow University, Suzhou 215131, China

^2 School of Mechanical Engineering, Nantong University, Nantong 226019, China

^3 School of Zhang Jian, Nantong University, Nantong 226019, China

^* Authors to whom correspondence should be addressed.

Electronics 2023, 12(6), 1377; https://doi.org/10.3390/electronics12061377

Submission received: 21 February 2023 / Revised: 6 March 2023 / Accepted: 12 March 2023 / Published: 13 March 2023

(This article belongs to the Special Issue Knowledge Engineering and Data Mining Volume II)


Abstract: Battery electric vehicles (BEVs) are among the most promising new energy vehicle types for industrialization and commercialization at this stage, and they are an important means of addressing urban haze and air pollution, high fuel costs and the sustainable development of the automobile industry. This paper takes pure electric buses as the research object and relies on the operation information management platform for new energy buses in Nantong city to propose an electric bus driving cycle construction method that uses mixed interpolation to process sparse data. Three interpolation methods (linear interpolation, step interpolation and mixed interpolation) were used to preprocess the collected data. Principal component analysis and the K-means clustering algorithm were used to reduce the dimension of the characteristic parameter matrix and classify it. According to the clustering results, moving segment and idle segment libraries were established for the different categories. According to segment duration and the correlation among the categories, several moving segments and idle segments were selected to form a representative driving cycle for Nantong city buses. The results show that the mixed interpolation method, based on linear interpolation and cubic spline interpolation, processes the data well. The average relative error between the synthesized cycle and the measured data is 15.71%, and the relative error of seven characteristic parameters is less than 10%, which meets the development requirements. In addition, comparison with the characteristic parameters of typical international driving cycles (NEDC, WLTC) shows that the constructed cycle reasonably and reliably represents the driving conditions of pure electric buses in Nantong city and can provide a reference for optimizing bus energy control strategies.

Keywords: electric bus; driving cycle; data interpolation method; sparse data; data mining; PCA (principal components analysis); K-means


1. Introduction

Due to the shortage of oil resources and increasingly serious environmental problems, new energy vehicles have developed rapidly [1]. Testing the energy consumption and driving range of electric vehicles [2], estimating the state of charge [3] and optimizing the energy management system [4] all rely on vehicle driving cycles. Widely used cycles include the US Urban Dynamometer Driving Schedule (UDDS), the US Federal Test Procedure (FTP-75), Japan's JC08, the New European Driving Cycle (NEDC, whose low-speed segment is the ECE urban driving cycle), the Worldwide Harmonized Light Vehicles Test Procedure (WLTP) and the China Light-Duty Vehicle Test Cycle (CLTC). The data behind these official international driving cycles come from internal combustion engine vehicles. However, electric vehicles differ greatly from traditional internal combustion engine vehicles in torque and power characteristics, transmission efficiency, powertrain, regenerative braking system and so on [5]. In EV energy management, battery state of charge (SOC) estimation and range calculation, simply using the driving cycle of a traditional internal combustion engine vehicle results in large errors [6].

A vehicle driving cycle is built from measured driving data, combined with statistical methods, to describe quantitatively the speed–time curve of typical road driving. Road and traffic conditions differ among cities, so estimating energy consumption per kilometer, driving distance and equivalent emissions under official driving cycles leads to large relative errors. It is therefore of great significance to construct driving cycles that conform to local driving characteristics in specific regions. At present, most construction methods are based on micro-trips and use the V-A matrix method, cluster analysis, the Markov chain method and other methods to construct the cycle. Zhao et al. [7] combined the Markov chain with Monte Carlo simulation to construct the driving cycle of EVs in Xi’an, which can fully reflect the actual driving conditions in Xi’an. Hongwen et al. [8] proposed a global travel cycle construction method based on real-time traffic information. A two-step completion method was adopted to obtain complete and accurate traffic information, and the global travel cycle was constructed by using a speed segment database, road segment speeds and the Monte Carlo Markov transition matrix. Although the combination of Markov chains and Monte Carlo simulation is highly precise, its theory is complicated, which makes data processing and programming difficult. Amirjamshidi et al. [9] proposed a method for developing representative driving cycles from the simulation data of a calibrated micro-traffic simulation model of the Toronto waterfront. The simulation model was calibrated using a multi-objective genetic algorithm to reflect road counts, link speeds and accelerations, and the simulation method was applied to develop morning rush hour driving cycles for light, medium and heavy trucks. Zhao et al. [10] adopted a hybrid K-means and support vector machine (SVM) clustering algorithm to classify vehicle driving segments and select the most representative driving cycle from multiple candidate driving cycles. The results show that the Xi’an EV urban driving cycle has more aggressive driving characteristics than other cycles. Shen et al. [11] proposed a subsection construction method based on bus stops and a whole-process construction method based on bus driving conditions, and adopted the K-means clustering method to construct the driving cycle of fixed-line hybrid electric vehicles in Shanghai. Liu et al. [12] took urban roads in Hefei city as the research object, adopted PCA (principal components analysis) to reduce the dimension of the motion parameters and adopted the K-means clustering method to classify the motion segments. According to segment duration and the correlation among the categories, several moving segments were selected to form the driving cycle of passenger cars in Hefei city. The combination of principal component analysis and clustering can effectively extract the main characteristic values of micro-trips and process the data quickly and efficiently. It is currently a widely used and mature method.

The above driving cycle construction methods are all based on data sampled at 1 Hz. Collecting such data is time-consuming and covers only a single vehicle, so the results cannot be compared against data from other vehicles. Data collection in this paper relies on the operation management platform of Nantong new energy vehicles developed by the author [13], which not only monitors vehicle status in real time, but also stores vehicle information in the cloud for users to browse. The longitude and latitude, driving distance, driving speed, battery status and motor status of Nantong new energy buses were obtained through the platform. To keep the platform running smoothly and reduce the burden on the server, the vehicle data collection frequency was 0.1 Hz, so the collected data are relatively sparse. Observation of bus driving speeds in Nantong city shows that the bus idle time is long, the average driving speed is higher than that in other cities and the working conditions are distinctive. Common methods for processing sparse data include linear interpolation, spline interpolation, Lagrange interpolation and Newton interpolation. Therefore, this paper processes the sampled data with different interpolation methods and analyzes the influence of the interpolation method on the process and results of driving cycle construction.

In this paper, three different interpolation methods are used to preprocess the collected data. Principal component analysis [14] and the K-means clustering algorithm [15] are used to reduce the dimension of the characteristic parameter matrix and classify it, and the silhouette index [16] is the standard for measuring the clustering results. According to the clustering results, moving segment libraries and idle segment libraries of different categories are established. From these libraries, the most frequent segment durations are selected, and for each such duration the segment closest to the clustering center is chosen to construct the representative bus driving cycle of Nantong city. The influences of the different interpolation methods on principal component analysis, K-means clustering and the synthesized cycle are compared. The average running speed, maximum speed, idle time ratio and other main parameters of the best synthesized cycle are then compared with the WLTC [17] and NEDC [18] cycles, and significant differences in the parameters are found. The deviation of each characteristic parameter from the measured data is approximately 15%, which meets the development requirements and provides a reference for developing vehicle energy control strategies for specific bus lines. The technical route of this paper is shown in Figure 1 below.

2. Data and Methods

2.1. Original Data Collection

The sampling time and vehicle route have a great influence on the constructed driving cycle [19]. Urban bus lines generally cover two kinds of conditions. Downtown areas have dense traffic flow, more traffic lights and bus stops, a high idle time ratio and frequent starts and stops. Suburban areas have sparse traffic flow, fewer stops and traffic lights and a high acceleration ratio. Vehicle driving conditions differ greatly between regions, so the collected data should cover both traffic conditions as far as possible. In this paper, according to the actual characteristics of roads in Nantong city, a typical route, Nantong bus No. 77, is selected, which goes from Xiaohai Haishang Jiayuan to Yongyi Jiayuan West. Running from the southeast of the main city of Nantong through its bustling central area to its northwest, the route passes schools, hospitals, shopping malls, companies, residential areas and other densely populated areas. A pure electric bus, with the line number SuF03808D, was selected from the new energy vehicle management platform of Nantong city. This bus is a ZTO LCK6108EVG3D2 pure electric city bus, with a length of 10.4 m, a width of 2.5 m and a maintenance mass of 10,500 kg. The vehicle driving data was collected from 16 to 19 August 2022 for a week, and 35,085 pieces of effective data were obtained. The route and the collected samples are shown in Figure 2 below.

2.2. Data Preprocessing Methods

2.2.1. Micro-Trip Segmentation

A micro-trip (kinematic segment) refers to the speed profile between the beginning of one idle state and the beginning of the next idle state. A micro-trip generally includes four motion states: acceleration, deceleration, uniform speed and idling, which are defined as follows: (1) Idling: the continuous process in which the vehicle stops moving but the engine keeps running at its lowest speed; (2) Acceleration: the continuous process in which the vehicle acceleration satisfies a ≥ 0.15 m/s²; (3) Deceleration: the continuous process in which the vehicle acceleration satisfies a ≤ −0.15 m/s²; (4) Uniform speed: the continuous process in which |a| < 0.15 m/s² and the speed v is not zero. In this method, the entire trip is divided into multiple basic units (micro-trips), and the required driving cycle is assembled from these basic units, which represent different traffic characteristics. The entire driving process of the vehicle can be regarded as a combination of multiple micro-trips, as shown in Figure 3 [20].
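The four state definitions can be illustrated with a minimal Python sketch. It assumes a speed trace in m/s at a fixed 1 s interval and computes acceleration as a forward difference; the function name `classify_states` and the sample trace are hypothetical and are not taken from the paper.

```python
ACC_THRESHOLD = 0.15  # m/s^2, threshold used in the state definitions


def classify_states(speeds_mps, dt=1.0):
    """Label each interval between consecutive samples with a motion state.

    speeds_mps: speeds in m/s sampled every dt seconds (assumed layout).
    """
    labels = []
    for v0, v1 in zip(speeds_mps, speeds_mps[1:]):
        a = (v1 - v0) / dt  # forward-difference acceleration
        if v0 == 0 and v1 == 0:
            labels.append("idle")
        elif a >= ACC_THRESHOLD:
            labels.append("acceleration")
        elif a <= -ACC_THRESHOLD:
            labels.append("deceleration")
        else:
            labels.append("uniform")
    return labels


trace = [0.0, 0.0, 1.0, 2.5, 2.6, 2.5, 1.0, 0.0, 0.0]
print(classify_states(trace))
```

Labelling intervals rather than samples keeps each acceleration value tied to the two speeds it was computed from.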

Referring to the relevant literature [21], this paper further divides each micro-trip into two parts: a moving segment and an idle segment. The moving segment is the part of the micro-trip without idling, and its start and end speeds are 0. The definitions of micro-trips and moving segments are shown in Figure 4. Different road types, at different times, may produce the same micro-trip or moving segment.
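As a rough illustration of this division, the sketch below splits a speed trace into alternating idle and moving runs. It is a simplification under stated assumptions: a sample counts as idle when its speed is exactly zero, so the zero-speed boundary samples stay with the idle runs rather than being attached to the moving segment. The name `split_segments` is hypothetical.

```python
def split_segments(speeds):
    """Split a speed trace into alternating idle and moving runs.

    Returns (kind, start_index, end_index) tuples with inclusive end indices.
    """
    segments = []
    start = 0
    for i in range(1, len(speeds) + 1):
        at_end = i == len(speeds)
        # Close the current run at the end of the trace or when the
        # zero/non-zero status of the speed changes.
        if at_end or (speeds[i] == 0) != (speeds[start] == 0):
            kind = "idle" if speeds[start] == 0 else "moving"
            segments.append((kind, start, i - 1))
            start = i
    return segments


print(split_segments([0, 0, 3, 5, 2, 0, 0, 0, 4, 0]))
```

Pairing each idle run with the moving run that follows it gives one micro-trip in the sense used above.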

2.2.2. Sparse Data Interpolation

Since the sampling interval of the Nantong new energy vehicle data platform is 10 s, it is necessary to fill in the missing data. In this paper, three interpolation methods are used to fill the collected data: linear interpolation, step interpolation and a mixed interpolation that combines linear interpolation and cubic spline interpolation. Linear interpolation [22] is the interpolation method whose interpolation function is a first-order polynomial, and its interpolation error at the interpolation nodes is zero. Compared with other interpolation methods, linear interpolation is simple and convenient. Step interpolation can be divided into nearest-neighbor interpolation, forward interpolation and backward interpolation, and the interpolation result is a stepped curve. In this paper, nearest-neighbor interpolation is selected. Cubic spline interpolation [23] is a mathematical method that draws a smooth curve through a series of points. The curve is composed of different cubic polynomials, each defined between two adjacent data points, such that any two adjacent polynomials and their derivatives are continuous at the point where they join, which makes it possible to estimate the value of the function at other points. However, conventional cubic spline interpolation can produce negative interpolated values and requires more than four interpolation points, so on its own it is not suitable for interpolating vehicle speed. Therefore, this paper develops a mixed interpolation method based on linear interpolation and cubic spline interpolation: linear interpolation is applied to the front and back sections of a micro-trip, and cubic spline interpolation is applied to the middle section. Linear interpolation alone is used for micro-trips with a motion segment of less than 4 s. This interpolation method not only matches the rapid rise and fall of vehicle speed when starting and braking, but also ensures that the interpolation result does not become negative. The effect is shown in Figure 5 (partial data interception).
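The two simpler schemes can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: it fills a trace sampled every 10 s to 1 s resolution with either linear or nearest-neighbor (step) interpolation, and it omits the spline part of the mixed method. The function name `upsample` and the rule that a time exactly halfway between two samples takes the later sample are assumptions.

```python
def upsample(speeds_10s, step=10, method="linear"):
    """Fill a sparse speed trace (one sample every `step` seconds) to 1 s resolution."""
    if not speeds_10s:
        return []
    dense = []
    for v0, v1 in zip(speeds_10s, speeds_10s[1:]):
        for t in range(step):
            if method == "linear":
                # Straight line between the two known samples.
                dense.append(v0 + (v1 - v0) * t / step)
            elif method == "nearest":
                # Step interpolation: copy whichever known sample is closer in time.
                dense.append(v0 if t < step / 2 else v1)
            else:
                raise ValueError(f"unknown method: {method}")
    dense.append(speeds_10s[-1])  # keep the final known sample
    return dense


print(upsample([0, 10], method="linear"))
print(upsample([0, 10], method="nearest"))
```

Both variants reproduce the known samples exactly and, unlike an unconstrained spline, never leave the range spanned by neighbouring samples, so non-negative input cannot yield negative speeds.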

2.2.3. Characteristic Parameters Extraction

The characteristic parameters are used to describe the kinematic state and quantitatively analyze the characteristics of micro-trips. They generally consist of descriptive characteristic parameters and statistical characteristic parameters. In principle, the more characteristic parameters are selected, the better they describe the characteristic information of the driving conditions, but the amount of calculation also increases. Conversely, if too few parameters are selected, the description of the driving condition information will be incomplete. Therefore, based on previous experience, 12 characteristic parameters are initially selected: average driving speed, average speed, maximum speed, maximum acceleration, average acceleration, maximum deceleration, average deceleration, speed standard deviation, acceleration time ratio, deceleration time ratio, uniform speed time ratio and idle time ratio. They are calculated as follows [24]:

V_{mr} = \frac{S}{T - T_{i}}

(1)

V_{m} = \frac{S}{T}

(2)

V_{max} = \max \{V_{i}, i = 1, 2, \dots, k\}

(3)

a_{max} = \max \{a_{i}, i = 1, 2, \dots, k\}

(4)

a_{a} = \frac{\sum \{a_{i} \mid a_{i} \geq 0.15, i = 1, 2, \dots, k - 1\}}{T_{a}}

(5)

a_{min} = \min \{a_{i}, i = 1, 2, \dots, k\}

(6)

a_{d} = \frac{\sum \{a_{i} \mid a_{i} \leq -0.15, i = 1, 2, \dots, k - 1\}}{T_{d}}

(7)

V_{sd} = \sqrt{\frac{\sum_{i = 1}^{k} (V_{i} - V_{m})^{2}}{k - 1}}, i = 1, 2, \dots, k

(8)

p_{a} = \frac{\text{sum of time with } a \geq 0.15 \text{ m/s}^{2}}{T}

(9)

p_{d} = \frac{\text{sum of time with } a \leq -0.15 \text{ m/s}^{2}}{T}

(10)

p_{i} = \frac{\text{sum of time with } v = 0}{T}

(11)

p_{c} = 1 - p_{a} - p_{d} - p_{i}

(12)

In the above formulas, a_{i} is the acceleration at the ith second, T is the total duration of the micro-trip and S is the driving distance.
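Equations (1)–(12) can be evaluated for a single micro-trip as in the following sketch. Several details are assumptions made for illustration rather than the paper's stated procedure: speeds are in m/s at a fixed interval `dt`, acceleration is a forward difference, distance is integrated with the rectangle rule, T_i is read as the idle time, and T_a and T_d are read as the total acceleration and deceleration times. The function name `micro_trip_features` is hypothetical.

```python
import math

ACC_THRESHOLD = 0.15  # m/s^2, same threshold as in Equations (5)-(10)


def micro_trip_features(speeds, dt=1.0):
    """Evaluate Equations (1)-(12) for one micro-trip (needs at least 2 samples)."""
    k = len(speeds)
    accels = [(v1 - v0) / dt for v0, v1 in zip(speeds, speeds[1:])]
    total_time = k * dt                                # T
    distance = sum(speeds) * dt                        # S, rectangle rule
    idle_time = dt * sum(1 for v in speeds if v == 0)  # T_i (assumed: idle time)
    acc = [a for a in accels if a >= ACC_THRESHOLD]
    dec = [a for a in accels if a <= -ACC_THRESHOLD]
    v_m = distance / total_time
    p_a = len(acc) * dt / total_time
    p_d = len(dec) * dt / total_time
    p_i = idle_time / total_time
    return {
        "V_mr": distance / (total_time - idle_time) if total_time > idle_time else 0.0,
        "V_m": v_m,
        "V_max": max(speeds),
        "a_max": max(accels),
        "a_a": sum(acc) / (len(acc) * dt) if acc else 0.0,
        "a_min": min(accels),
        "a_d": sum(dec) / (len(dec) * dt) if dec else 0.0,
        "V_sd": math.sqrt(sum((v - v_m) ** 2 for v in speeds) / (k - 1)),
        "p_a": p_a,
        "p_d": p_d,
        "p_i": p_i,
        "p_c": 1 - p_a - p_d - p_i,
    }


features = micro_trip_features([0, 0, 1, 2, 2, 1, 0, 0])
print(features["V_mr"], features["V_m"], features["p_i"])
```

Stacking one such dictionary per micro-trip, in a fixed key order, would give the rows of the observation matrix used in the next section.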

The 12 characteristic parameters of the 1870 micro-trips divided above were calculated and are listed in Table 1 (mixed interpolation is taken as an example here).

2.3. Principal Component Analysis Method

2.3.1. Theoretical Basis

The 12 characteristic parameters preliminarily selected above describe the characteristics of the driving conditions, but their dimension is relatively high. If the 12 characteristic parameters were used directly as the basis for classifying driving conditions, two problems would arise. First, the large number of characteristic parameters would make classification more difficult without a proportional improvement in the classification result. Second, the calculation formulas show that the characteristic parameters are not independent but correlated, which means that the driving condition information they express overlaps, and the classification result will suffer if this redundancy is not removed. Therefore, it is necessary to reduce the dimension of the characteristic parameters by principal component analysis.

In this paper, there are 1870 divided micro-trips, and there are 12 characteristic parameters defining each micro-trip, so an observation matrix of 1870 × 12 can be obtained [14].

Y = \begin{bmatrix} y_{1,1} & y_{1,2} & \dots & y_{1,j} \\ y_{2,1} & y_{2,2} & \dots & y_{2,j} \\ \vdots & \vdots & \ddots & \vdots \\ y_{i,1} & y_{i,2} & \dots & y_{i,j} \end{bmatrix}

(13)

where y_{i,j} is the value of the jth characteristic parameter of the ith micro-trip, i = 1, 2, \dots, 1870, j = 1, 2, \dots, 12.

Standardization: The original 1870 × 12 observation matrix is standardized so that the mean value of each characteristic parameter is zero and the standard deviation is one. The standardized matrix is as follows:

X = \begin{bmatrix} x_{1,1} & x_{1,2} & \dots & x_{1,j} \\ x_{2,1} & x_{2,2} & \dots & x_{2,j} \\ \vdots & \vdots & \ddots & \vdots \\ x_{i,1} & x_{i,2} & \dots & x_{i,j} \end{bmatrix}

(14)

x_{i,j} = \frac{y_{i,j} - u_{j}}{\sqrt{σ_{j}}}

(15)

where u_{j} is the mean of the jth characteristic parameter in the original observation matrix, and σ_{j} is its variance.

Solving the correlation coefficient matrix R:

R = \begin{bmatrix} r_{1,1} & r_{1,2} & \dots & r_{1,12} \\ r_{2,1} & r_{2,2} & \dots & r_{2,12} \\ \vdots & \vdots & \ddots & \vdots \\ r_{12,1} & r_{12,2} & \dots & r_{12,12} \end{bmatrix}

(16)

r_{i,j} = \frac{\sum_{k = 1}^{1870} (x_{k i} - \bar{x_{i}}) (x_{k j} - \bar{x_{j}})}{\sqrt{\sum_{k = 1}^{1870} (x_{k i} - \bar{x_{i}})^{2} \sum_{k = 1}^{1870} (x_{k j} - \bar{x_{j}})^{2}}}

(17)
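The standardization and correlation steps can be written out in plain Python as follows. This is a sketch under assumptions: the observation matrix is a list of rows, the sample variance (divisor n − 1) is used in Equation (15), and no column is constant. The paper does not state which variance estimator was used, and the names `standardize` and `correlation_matrix` are hypothetical.

```python
import math


def standardize(matrix):
    """Column-wise z-score as in Equation (15), using the sample variance (assumed)."""
    n = len(matrix)
    cols = list(zip(*matrix))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in matrix]


def correlation_matrix(matrix):
    """Pearson correlation between every pair of columns, as in Equation (17)."""
    n = len(matrix)
    cols = list(zip(*matrix))
    means = [sum(c) / n for c in cols]
    centered = [[v - m for v in c] for c, m in zip(cols, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]


# Tiny made-up matrix: 3 observations of 3 parameters.
data = [[1.0, 2.0, 5.0], [2.0, 4.0, 3.0], [3.0, 6.0, 1.0]]
print(standardize(data))
print(correlation_matrix(data))
```

Because the correlation coefficient is invariant to shifting and scaling each column, `correlation_matrix` gives the same result on the raw and on the standardized matrix.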

Solving the characteristic equation |λ E - R| = 0: the 12 eigenvalues of the correlation coefficient matrix R are obtained by solving the characteristic equation, and then the eigenvector corresponding to each eigenvalue λ_{k} is obtained by solving the equation R b = λ_{k} b. Finally, the unit eigenvector b_{k} is obtained by normalizing these eigenvectors.

The contribution rate and cumulative contribution rate of the principal components are then calculated. The contribution rate of the kth principal component is the proportion of its variance (the eigenvalue λ_{k}) in the total variance of all principal components. This proportion indicates how much of the information expressed by the original 12 characteristic parameters the principal component reflects. The cumulative contribution rate of the first k principal components is calculated as follows:

ψ_{k} = \frac{\sum_{i = 1}^{k} λ_{i}}{\sum_{i = 1}^{12} λ_{i}}

(18)

This is the analysis process of the principal component analysis. Through the above series of dimensionality reduction processing, the original 12 characteristic parameters can be replaced by a few principal components and the original information can still be fully expressed by these few principal components. According to experience, if the cumulative contribution rate of the principal components reaches more than 70%, the requirements can be met.

2.3.2. Analytical Process

In this section, the follow-up processing of the data filled by the mixed interpolation method is discussed. The results of the subsequent processing for the other interpolation methods are shown in Appendix A. The eigenvalue of each principal component in Table 2 indicates how much of the original characteristic parameter information that principal component contains. The larger the eigenvalue, the more of the information in the original 12 characteristic parameters the component carries. When the eigenvalue is less than 1, the corresponding principal component is generally not adopted, because it expresses less information than a single original characteristic parameter and therefore does not serve the purpose of dimension reduction. It can be seen from the table that the eigenvalues of the first three principal components are all greater than 1 and their cumulative contribution rate is 74.785%, which is greater than 70%, indicating that these three principal components can express the characteristic information of the driving conditions in place of the original 12 characteristic parameters. Therefore, the first three principal components are selected for further study.
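The two selection rules used here (eigenvalue greater than 1, cumulative contribution rate above 70%) can be checked with a short sketch. The eigenvalues below are made up for illustration and are not the values in Table 2; the function names are hypothetical.

```python
def contribution_rates(eigenvalues):
    """Return (individual, cumulative) contribution rates, largest eigenvalue first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    individual = [lam / total for lam in ordered]
    cumulative = []
    running = 0.0
    for share in individual:
        running += share
        cumulative.append(running)  # Equation (18) for k = 1, 2, ...
    return individual, cumulative


def count_retained(eigenvalues, min_eigenvalue=1.0):
    """Number of principal components whose eigenvalue exceeds min_eigenvalue."""
    return sum(1 for lam in eigenvalues if lam > min_eigenvalue)


# Hypothetical eigenvalues of a 12 x 12 correlation matrix (they sum to 12).
example = [5.0, 2.5, 1.5, 0.9, 0.7, 0.5, 0.3, 0.2, 0.15, 0.1, 0.1, 0.05]
kept = count_retained(example)
_, cumulative = contribution_rates(example)
print(kept, round(cumulative[kept - 1], 3))
```

With these illustrative values the eigenvalue rule retains three components, and their cumulative contribution rate can then be compared against the 70% threshold.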

Three columns of the factor score coefficient matrix were obtained, denoted successively as F1, F2 and F3, and then multiplied by the corresponding eigenvalues, respectively. In this way, the unstandardized principal component score matrix of each micro-trip was obtained, denoted successively as Y1, Y2 and Y3, as shown in Table 3, preparing for further clustering analysis below.

2.4. K-means Clustering Method

2.4.1. Theoretical Basis

According to the results of the principal component analysis, the K-means clustering method will be adopted in this section to classify the 1870 short trips, and the classification results will be used as the basis for building representative working conditions of cities.

At present, the main data clustering algorithms include K-means, GMM (Gaussian mixture model) [25], mean-shift clustering, K-medians and density-based spatial clustering of applications with noise (DBSCAN) [26]. The data sample studied in this paper comes from a large volume of previously unprocessed driving data stored in the new energy bus platform of Nantong city. For this first development, the algorithm must be able to handle large samples, with low complexity and relatively high processing efficiency. Therefore, this paper chooses the K-means clustering algorithm to classify the micro-trips. K-means is a non-hierarchical clustering method whose advantages are a simple algorithm, fast convergence and high data processing efficiency [27]. Its basic principle is to judge category membership by distance: the distance between each sample and the preliminarily selected cluster centers is calculated as an indicator of similarity. In this paper, the Euclidean distance is chosen as the distance measure. K-means assigns samples that are close to each other to the same cluster, and its ultimate goal is to obtain compact and well-separated clusters as the classification result. The general process is to randomly select k points from the sample data as the initial clustering centers and calculate the distance between each remaining sample and the k points. According to the principle of minimum distance, each sample is allocated to one of the k categories to form k clusters. The center of each cluster is then updated. If the cluster centers remain unchanged between two consecutive iterations, the algorithm is considered to have converged and the sample data have been classified.

The mathematical model of the K-means clustering method is as follows [27]:

(1) The number of categories k and the sample data set \{x_{1}, x_{2}, \dots, x_{n}\} are determined, k samples are selected as the initial clustering centers and the set of these k samples is \{c_{1}, c_{2}, \dots, c_{k}\};

(2) The Euclidean distance D(x_{i}, c_{j}), i = 1, 2, \dots, n; j = 1, 2, \dots, k, from every sample to the k clustering centers is calculated, and the calculation formula is as follows [9]:

D(x_{i}, c_{j}) = \sqrt{(x_{i1} - c_{j1})^{2} + (x_{i2} - c_{j2})^{2} + \dots + (x_{im} - c_{jm})^{2}}

(19)

(3) If D(x_{i}, c_{j}) satisfies the condition

D(x_{i}, c_{j}) = \min \{D(x_{i}, c_{l}), l = 1, 2, \dots, k\}

(20)

then sample x_{i} is classified into the class of c_{j}.

(4) After the classification is completed, each clustering center is updated, and its calculation formula is:

c_{j}^{*} = \frac{1}{n_{j}} \sum_{i = 1}^{n_{j}} x_{i}^{j}

(21)

where n_{j} is the number of samples in class c_{j}.

(5) Then, the error sum of squares criterion function is calculated:

J^{*} = \sum_{j = 1}^{k} \sum_{i = 1}^{n_{j}} ‖x_{i}^{j} - c_{j}^{*}‖^{2}

(22)

If |J^{*} - J| < ξ is satisfied, where J is the criterion value from the previous iteration, the criterion function has converged and the clustering analysis is complete. If the convergence condition is not met, it is necessary to go back to step (2) and continue iterating until convergence.
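The iteration described by Equations (19)–(22) can be sketched in plain Python. This is an illustrative sketch, not the authors' code: the initial centers are passed in explicitly rather than drawn at random so that the run is reproducible, an empty cluster keeps its previous center, and the function name `kmeans` and the sample points are hypothetical.

```python
import math


def kmeans(points, centers, tol=1e-9, max_iter=100):
    """Plain K-means; `centers` are the initial cluster centers."""
    centers = [list(c) for c in centers]
    prev_j = float("inf")
    j_value = prev_j
    labels = []
    for _ in range(max_iter):
        # Assign each sample to its nearest center (Equations (19) and (20)).
        labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update each center as the mean of its members (Equation (21)).
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(dim) / len(members) for dim in zip(*members)]
        # Sum of squared distances to the assigned centers (Equation (22)).
        j_value = sum(math.dist(p, centers[lab]) ** 2
                      for p, lab in zip(points, labels))
        if abs(prev_j - j_value) < tol:  # convergence test |J* - J| < xi
            break
        prev_j = j_value
    return labels, centers, j_value


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers, j_value = kmeans(pts, centers=[(0, 0), (10, 10)])
print(labels, centers, j_value)
```

Since K-means only finds a local minimum of J, a common practice is to repeat the run from several initial center sets and keep the result with the smallest J.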

In the formula, the function J represents the sum of the squared distances between the sample data and their clustering centers. The K-means clustering algorithm aims to make this value as small as possible. In theory, there are two ways to achieve this goal. The first is to keep the clustering centers fixed and adjust the category assignment of the sample data to minimize J. The other is to keep the category assignment of the sample data fixed and reduce J by changing the clustering centers. However, the number of clusters k must be specified in advance, and it is usually very difficult to supply an optimal value directly, even though the number of clusters is a key factor in the clustering result. Commonly used evaluation methods include the silhouette coefficient, the elbow method [28] and the Davies–Bouldin index [29]. The silhouette coefficient is suitable when the true category information is unknown. Therefore, this paper chooses the silhouette coefficient as the evaluation criterion and uses Python to compute silhouette values for different values of k.
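For reference, the mean silhouette value can be computed directly from its standard definition, as in the sketch below. The paper does not say which Python routine was used, so this hand-written version is only an assumption-laden illustration: it uses the Euclidean distance, scores samples in singleton clusters as 0, and requires at least two clusters. The name `silhouette_score` here refers to this local function.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette value over all samples (needs at least two clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(silhouette_score(pts, [0, 0, 0, 1, 1, 1]))
print(silhouette_score(pts, [0, 1, 0, 1, 0, 1]))
```

Running a clustering for each candidate k and comparing the resulting scores is one way to choose the number of clusters; values closer to 1 indicate more compact and better separated clusters.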

2.4.2. Analytical Process

The 1870 sets of unstandardized principal component score data from the previous section are taken as the objects of the K-means clustering analysis. The 1870 micro-trips are divided into two categories. The first category contains 1489 micro-trips, belonging to the high-speed segment; the second category contains 281 micro-trips, belonging to the low-speed segment. There is no fragment loss in the whole classification, indicating that the classification results are valid. Table 4 shows the specific information of some clustering members, including the specific classification information of each micro-trip. The first column gives the index of each micro-trip (1870 in total), the second column gives the category of the micro-trip and the third column gives the Euclidean distance from the micro-trip to its clustering center. The smaller the distance, the closer the micro-trip is to the corresponding clustering center, and the micro-trip with the minimum distance is the most representative one in its category. According to this information, the representative micro-trips of the two categories can be selected, which are numbered 1607 and 291, respectively.

2.5. Condition Synthesis Method

2.5.1. Determination of Duration and Number of Segments

During the synthesis of the driving cycle, following domestic and international practice, the time occupied by each class of micro-trips in the final synthesized cycle is determined by the proportion of that class's running time in the total time. The calculation formula is as follows [21]:

T_{i} = \frac{T_{0}}{T_{A}} \sum_{j = 1}^{N_{i}} T_{i, j}

(23)

where T_{i} is the time of class i in the final synthesized cycle; T_{0} is the duration of the representative cycle; T_{A} is the total duration of all operating conditions; N_{i} is the number of micro-trips in class i; and T_{i, j} is the running time of the jth micro-trip in class i. According to the formula, the duration of the driving cycle and the proportion of micro-trip segments for Nantong city were obtained: the ratio of the number of micro-trip segments in the high-speed segment to that in the low-speed segment was 4:1, the duration of the high-speed segment was 1050 s, the duration of the low-speed segment was 750 s and the duration of the whole cycle was 1800 s.
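Equation (23) amounts to a proportional split of the target cycle duration, which the following sketch makes explicit. The class names and run times are invented for illustration and do not reproduce the paper's data; `class_durations` is a hypothetical name.

```python
def class_durations(class_run_times, cycle_duration):
    """Equation (23): T_i = T_0 / T_A * sum_j T_{i,j} for each class i.

    class_run_times: mapping from class name to the run times (s) of its micro-trips.
    cycle_duration: T_0, the duration of the representative cycle (s).
    """
    total = sum(sum(times) for times in class_run_times.values())  # T_A
    return {name: cycle_duration * sum(times) / total
            for name, times in class_run_times.items()}


# Made-up run times in seconds for two classes.
print(class_durations({"high": [300, 400], "low": [200, 100]}, cycle_duration=1800))
```

By construction the allotted class times add up to the cycle duration T_0.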

According to Formulas (24) and (25), the number of moving segments and the number of idle segments in high speed and low speed segments are calculated, respectively. If the calculation result is greater than 1, it should be processed according to the rounding principle; if the calculation result is less than 1, it should be 1.

N_{M S, i} = \frac{\text{phase duration} - \text{average idling duration}}{\text{average moving segment duration} + \text{average idling duration}}

(24)

N_{I, i} = N_{M S, i} + 1

(25)

where $N_{M S, i}$ is the number of moving segments and the subscript i denotes the corresponding speed segment; phase duration is the duration of the corresponding speed segment in the working condition; average idling duration is the average duration of the idle segments in the corresponding speed segment database; average moving segment duration is the average duration of the moving segments in the corresponding speed segment database; and $N_{I, i}$ is the number of idle segments, with the subscript i having the same meaning as in $N_{M S, i}$.

Using the above formulas, the numbers of moving segments and idle segments in the constructed condition are calculated, and the results are shown in Table 5.
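Formulas (24) and (25), together with the rounding rule, can be sketched as below. The function name `segment_counts` and the average durations in the example are hypothetical (the actual averages come from the segment databases); rounding half up is an assumption, since the text only says "rounding".

```python
def segment_counts(phase_duration, avg_moving, avg_idle):
    """Formulas (24) and (25): number of moving and idle segments."""
    raw = (phase_duration - avg_idle) / (avg_moving + avg_idle)
    # Rounding rule from the text: round when the result exceeds 1,
    # otherwise use 1. int(raw + 0.5) rounds half up; Python's round()
    # rounds half to even, which is why it is not used here.
    n_moving = int(raw + 0.5) if raw > 1 else 1
    n_idle = n_moving + 1  # Formula (25)
    return n_moving, n_idle

# Hypothetical averages in seconds.
high = segment_counts(phase_duration=1050, avg_moving=120, avg_idle=20)
low = segment_counts(phase_duration=750, avg_moving=900, avg_idle=60)
print(high, low)
```

The second call shows the lower bound: the raw value is below 1, so one moving segment and two idle segments are used.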

2.5.2. Fragment Selection

In the movement segment database of the high-speed segment, the duration of each movement segment is counted, and the number of movement segments under the corresponding duration is calculated. The duration of movement segments is sorted in order from the shortest to the longest, and then the proportion of the number of movement segments under the corresponding duration and the cumulative frequency distribution of the duration of movement segments are calculated. The statistical and calculation results are shown in Table 6.

The calculation results in Table 6 show that seven moving segments are included in the working condition of the high-speed segment. The cumulative frequency distribution of the moving segment durations is therefore divided into seven equal parts, giving seven intervals: [0, 0.143), [0.143, 0.286), [0.286, 0.429), [0.429, 0.572), [0.572, 0.715), [0.715, 0.858) and [0.858, 1). For each of the first six intervals, the average duration of the moving segments is calculated as the ratio of the sum of “total duration” to the sum of “number of motion segments” between the two adjacent cumulative frequency nodes; the results are 37 s, 51 s, 66 s, 90 s, 120 s and 163 s, respectively. Because the mixed interpolation method is used, segments with exactly these durations cannot be found in the moving segment library, so approximate values are taken and the durations are finally set to 41 s, 51 s, 71 s, 91 s, 121 s and 161 s. The duration of the remaining moving segment is calculated differently. The total duration of the high-speed segment is 1205 s; the sum of the six moving segments above and the sum of the seven idling segments are subtracted from it. The remaining duration is then determined to be 411 s, according to the ratio of the total moving duration to the total idle duration in the high-speed segment database and by considering the actual driving situation. The other segments are selected in the same way as for the high-speed segment.
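The per-interval average described above (sum of total duration divided by the number of segments between two adjacent cumulative frequency nodes) can be sketched as follows. The function name `bin_average_durations`, the rule for assigning a duration row to an interval and the example data are assumptions made for illustration; they are not taken from Table 6.

```python
def bin_average_durations(duration_counts, n_bins):
    """duration_counts: list of (duration_s, count) sorted by duration.

    Each row is assigned to the equal-width cumulative frequency interval
    that contains its cumulative frequency (an assumed convention); the
    function returns the count-weighted average duration per interval.
    """
    total = sum(count for _, count in duration_counts)
    sums = [0] * n_bins    # total duration per interval
    counts = [0] * n_bins  # number of segments per interval
    cumulative = 0
    for duration, count in duration_counts:
        cumulative += count
        # Integer arithmetic avoids floating-point issues at the nodes;
        # the last row (cumulative frequency 1) goes to the final interval.
        index = min(cumulative * n_bins // total, n_bins - 1)
        sums[index] += duration * count
        counts[index] += count
    return [s / c if c else None for s, c in zip(sums, counts)]

# Hypothetical data: (duration in s, number of moving segments).
table = [(10, 2), (20, 2), (30, 4), (40, 2)]
averages = bin_average_durations(table, n_bins=2)
print(averages)
```

An interval that receives no rows is reported as `None` rather than as a fabricated value.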

2.5.3. Short Travel Segment Splicing

The first step is to sort the micro-trip segments: the moving segments are arranged in order of maximum speed from lowest to highest, and the idle segments are arranged in order of duration from longest to shortest. In the second step, the segments are combined in the order idle segment 1, low-speed moving segment 1, idle segment 2, low-speed moving segment 2, idle segment 3, ..., low-speed moving segment n, idle segment n+1 to form the low-speed operating segment. In the third step, the segments are combined in the order idle segment 1, high-speed moving segment 1, idle segment 2, high-speed moving segment 2, idle segment 3, ..., high-speed moving segment n, idle segment n+1 to form the high-speed operating segment. In the fourth step, the two operating segments are spliced in sequence to form the typical bus driving cycle.
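The four splicing steps can be sketched as below, assuming each segment is stored as a list of speed samples at a fixed sampling interval. The function names `splice_phase` and `splice_cycle` and the sample values are hypothetical.

```python
def splice_phase(moving_segments, idle_segments):
    """Interleave idle_1, moving_1, idle_2, ..., moving_n, idle_{n+1}."""
    if len(idle_segments) != len(moving_segments) + 1:
        raise ValueError("need exactly one more idle segment than moving segments")
    moving = sorted(moving_segments, key=max)            # lowest max speed first
    idle = sorted(idle_segments, key=len, reverse=True)  # longest idle first
    phase = []
    for idle_seg, moving_seg in zip(idle, moving):
        phase.extend(idle_seg)
        phase.extend(moving_seg)
    phase.extend(idle[-1])  # closing idle segment n+1
    return phase

def splice_cycle(low_moving, low_idle, high_moving, high_idle):
    """Low-speed operating segment followed by the high-speed one."""
    return splice_phase(low_moving, low_idle) + splice_phase(high_moving, high_idle)

# Hypothetical speed samples (km/h).
cycle = splice_cycle(
    low_moving=[[3, 6, 3]],
    low_idle=[[0], [0, 0, 0]],
    high_moving=[[10, 30, 10], [5, 20, 5]],
    high_idle=[[0], [0, 0], [0, 0, 0]],
)
print(cycle)
```

Sorting inside `splice_phase` means the caller can pass the selected segments in any order.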

3. Results

3.1. Influence of Different Interpolation Methods on Principal Components

The scree plot of the principal component analysis helps determine the number of factors to extract. The point at which the broken line changes abruptly from steep to flat gives the reference number of factors to extract [30]. Different interpolation methods affect the results of the principal component analysis. Figure 6 shows the principal component scree plots generated by the different interpolation methods. Four characteristic values (eigenvalues) are greater than 1 in the PCA results after linear interpolation, and three in the PCA results after step interpolation and mixed interpolation. Mixed interpolation thus reduces the number of dominant components, and three principal components contain most of the information in the original condition.

3.2. Influence of Different Interpolation Methods on K-means Clustering

3.2.1. Silhouette Coefficient

The silhouette coefficient is one of the important indicators for evaluating K-means clustering results [31]; it measures the quality of the clustering. It can be used to evaluate the influence of different algorithms, or of different operating modes of an algorithm, on the clustering results for the same original data. As can be seen from the silhouette coefficient plots in Figure 7, the data processed by all three interpolation methods cluster best when the number of clusters is 2. However, the sparse data processed by traditional linear interpolation cluster less well under K-means than the data processed by step interpolation and mixed interpolation. The clustering quality of the data processed by step interpolation and mixed interpolation fluctuates greatly when the number of clusters is small, and flattens and declines steadily when the number of clusters is large.
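For reference, the silhouette coefficient of a point is s = (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. A minimal standard-library sketch of the mean silhouette coefficient follows; the function name `mean_silhouette` and the sample points are hypothetical, and the sketch assumes at least two clusters and Euclidean distance.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette coefficient s = (b - a) / max(a, b) over all points."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance of the point to itself is 0 and adds nothing).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of another cluster.
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two well-separated groups, labeled correctly and incorrectly.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = mean_silhouette(points, [0, 0, 1, 1])
bad = mean_silhouette(points, [0, 1, 0, 1])
print(good, bad)
```

Values close to 1 indicate compact, well-separated clusters; values near or below 0 indicate overlapping or wrongly assigned points, which is how the curves for different cluster numbers are compared.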

3.2.2. Results of Clustering Quantity

Table 7 summarizes the clustering results generated by the different interpolation methods. It can be seen from Table 7 that the three interpolation methods have no great influence on the number of clustered fragments: in every case the number of valid fragments is 1870 and the number of invalid fragments is 0. No micro-trip was lost in the clustering, indicating that the micro-trips are highly correlated and that no fragment lies far from the cluster centers. With both linear interpolation and mixed interpolation, the high-speed segment is classified into the first class and the low-speed segment into the second class. With step interpolation the assignment is reversed: the high-speed segment falls into the second class and the low-speed segment into the first class. This is because step interpolation makes the speed rise and fall abruptly, so the proportion of uniform speed increases and the proportion of acceleration and deceleration decreases; the resulting characteristics resemble the driving characteristics of vehicles in the low-speed segment, so the order of the two clusters is swapped and the cluster sequence numbers change accordingly. The number of segments differs only slightly among the three clustering results: the high-speed segment accounts for approximately 80% and the low-speed segment for approximately 20%, indicating that the average speed of electric buses in Nantong is relatively high.

3.2.3. Scatter Plot Results

To further visualize the clustering results, the scatter plots of cluster 1 and cluster 2 generated by the different interpolation methods are shown in Figure 8. Because the cluster sequence numbers are swapped, the marker colors in the scatter plots for linear interpolation and mixed interpolation are opposite to those for step interpolation. The dots in the upper half of each plot, which represent the high-speed segment, are relatively dense and highly correlated, indicating that the electric buses in Nantong are driven in essentially the same way in the high-speed segment. The dots in the lower half, which represent the low-speed segment, are relatively sparse and weakly correlated, indicating that the vehicles start and stop frequently in the low-speed segment, the speed fluctuates greatly and the running times differ. As noted for Table 7, the high-speed segment accounts for approximately 80% of the micro-trips and the low-speed segment for approximately 20% under all three interpolation methods.

3.2.4. Representative Micro-Trip Results

According to the analysis method in Section 2.4.2, the three clustering results produce three different sets of representative micro-trips. As can be seen from Figure 9, compared with mixed interpolation, the representative low-speed micro-trips obtained with linear interpolation and step interpolation have higher speeds and longer durations. The speed of the representative micro-trip selected after mixed interpolation fluctuates greatly, and the micro-trip lasts for a long time.

3.3. The Influence of Different Interpolation Methods on the Synthesis of Working Conditions

3.3.1. Number of Short Travel Segments

As described in Section 2.5.1, the duration of the entire working condition is determined as 1800 s. Through analysis and calculation, the number of moving segments and idle segments in the construction condition are obtained. Different interpolation methods have little influence on the number of segments used in the construction condition, as shown in Table 8.

The duration of the high-speed segment obtained by linear interpolation is the longest, reaching 1205 s and accounting for 66.94% of the total time. The duration of the high-speed segment obtained by step interpolation is the shortest, at 1051 s, accounting for 58.39% of the total time. All three processing methods give 7 high-speed moving segments and, correspondingly, 8 idle segments. The number of low-speed moving segments differs: linear interpolation gives 2 low-speed moving segments and 3 corresponding idle segments, whereas the other two interpolation methods give 1 low-speed moving segment and 2 corresponding idle segments.

3.3.2. Synthetic Condition

The selected micro-trip segments were connected in turn according to the splicing rules to construct the electric bus driving cycles corresponding to the three interpolation methods, as shown in Figure 10. As can be seen from the figure, with linear interpolation the idling time of the high-speed segment is longer and the selected high-speed moving segments are shorter, resulting in a long time interval between the low-speed segment and the high-speed segment. The cycles obtained with step interpolation and mixed interpolation are similar, and the time interval between the low-speed and high-speed segments is shorter.

4. Discussion

4.1. Comparative Analysis of Three Synthesis Conditions

To test the rationality and feasibility of the constructed driving conditions of Nantong electric buses, the characteristic parameters of the synthetic driving conditions are compared with those of the original sample data. The results are shown in Table 9. The step interpolation method has a poor processing effect: the relative error between the synthetic conditions and the measured data is the largest, with an average value of 231.89%, which does not meet the development requirements. The linear interpolation method has a moderate processing effect: the relative error between the synthetic conditions and the measured data is large, with an average value of 25.09%, which also does not meet the development requirements. The mixed interpolation method, based on linear interpolation and cubic spline interpolation, has a good processing effect: the average relative error between the synthetic conditions and the measured data is 15.71%, and the relative error of 10 of the 12 characteristic parameters is less than 15%, which meets the development requirements. Compared with the sparse data processed by the linear interpolation method and the step interpolation method, the cycle conditions of Nantong city based on the mixed interpolation method better represent the bus operating conditions of Nantong city. Therefore, from the statistical results, the working conditions constructed by the mixed interpolation method proposed in this paper basically meet the development requirements and can accurately reflect the characteristics of the sample population.
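The comparison above rests on per-parameter relative errors and their average. A minimal sketch, assuming the relative error is defined as |synthetic - measured| / |measured| (the definition is not spelled out in the text); the function name `relative_errors` and the parameter values are hypothetical and are not taken from Table 9.

```python
def relative_errors(synthetic, measured):
    """Relative error in percent for each characteristic parameter."""
    return {
        name: abs(synthetic[name] - measured[name]) / abs(measured[name]) * 100
        for name in measured
    }

# Hypothetical characteristic parameters.
measured = {"V_m": 20.0, "V_max": 50.0, "p_i": 0.25}
synthetic = {"V_m": 22.0, "V_max": 45.0, "p_i": 0.30}

errors = relative_errors(synthetic, measured)
mean_error = sum(errors.values()) / len(errors)
below_15 = sum(1 for e in errors.values() if e < 15)
print(errors, mean_error, below_15)
```

Counting the parameters below a threshold, as in the last line, mirrors the "10 of the 12 characteristic parameters" criterion.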

To further visualize the influence of different data interpolation methods on the construction results of electric bus driving conditions, the speed index histogram (Figure 11a), acceleration index histogram (Figure 11b) and time proportion index histogram (Figure 11c) were drawn, respectively, according to the classification attributes of the 12 characteristic values.

4.2. Comparison between Optimal Synthesis Condition and Standard Condition

To verify the differences between the working conditions constructed in this paper and standard working conditions, the bus driving conditions in Nantong city are compared with the NEDC and WLTC, as shown in Table 10 and Figure 12. The comparative analysis shows that the average running speed, maximum speed and idle time ratio of buses in Nantong city differ significantly from those of the NEDC and WLTC; the mean relative errors are 71.53% and 85.37%, respectively. However, domestic electric vehicle certification and production inspection use a test cycle equivalent to the WLTC. There are significant differences between each characteristic of the driving conditions constructed in this paper and those of the WLTC. If vehicle energy consumption is measured directly under the WLTC working condition, the actual driving characteristics of Nantong buses cannot be accurately reflected. Therefore, the bus driving condition constructed by the method in this paper can provide a reference for the construction of bus driving conditions in Nantong city and for the vehicle energy control strategy of a specific bus line.

5. Conclusions

In this paper, using a Nantong city bus as the research object, the construction method of bus driving condition was studied. An electric bus cycle construction method based on mixed interpolation is proposed to deal with the sparse data. The main conclusions are as follows:

(1): Three different interpolation methods, linear interpolation, step interpolation and mixed interpolation, were used to preprocess the collected data, and 1870 micro-trip fragments were extracted from the data. The characteristic parameter matrix was reduced in dimension and classified using the principal component analysis method and the K-means clustering algorithm, and the silhouette coefficient was used as the standard for measuring the clustering results. According to the clustering results, moving segment libraries and idle segment libraries of different categories were established, from which the most frequent duration segments were selected, and the segment closest to the clustering center was selected from the corresponding duration segments to construct the representative bus driving conditions of Nantong city.
(2): The step interpolation method has a poor processing effect, and the relative error between the synthetic conditions and the measured data is the largest, whose average value is 231.89%, which cannot meet the development requirements. The linear interpolation method has a general processing effect, and the relative error between the synthetic conditions and the measured data is large, with an average value of 25.09%, which does not meet the development requirements. The mixed interpolation method based on linear interpolation and cubic spline interpolation has a good processing effect. The average relative error between the synthetic conditions and the measured data is 15.71%, and the relative error of 10 of the 12 characteristic parameters is less than 15%. Therefore, the working conditions constructed by the mixed interpolation method proposed in this paper basically meet the development requirements. It can accurately reflect the characteristics of the sample population.
(3): The average running speed of buses in Nantong city is low, the idle time ratio is high and the maximum speed is low. Compared with the NEDC and WLTC, the average running speed, the maximum speed and the idle time ratio of buses in Nantong city are significantly different, with the average relative error reaching 71.53% and 85.37%, respectively. The comparison results show that the WLTC driving conditions cannot accurately reflect the actual traffic conditions of buses in Nantong city. To develop better energy management strategies and vehicle emission standards, it is necessary to consider the influence of regional differences and to construct representative driving conditions in line with the actual local driving conditions.

This paper mainly studies the influence of different data interpolation methods on the construction accuracy of an electric bus’s driving condition. Other steps in the whole construction process are also very important, such as the feature parameter dimension reduction method, micro-trip classification method and micro-trip segment splicing method. In the future, different methods will be used in these steps to carry out horizontal comparison, analyze the influence of different methods on the construction of vehicle driving conditions, develop and utilize the driving data in the vehicle management platform and provide reference for the optimization of bus vehicle energy control strategy and battery management system [32].

6. Patents

The authors of this paper have carried out research on new energy vehicle operation data collection, monitoring, evaluation and vehicle energy management control strategy for many years. Five Chinese invention patent applications related to this paper have been published; the patent application publication numbers are CN103413413A, CN103413414A, CN103991387A, CN103419671A and CN103419672A. Notably, three of these applications have been granted; the patent grant announcement numbers are CN103413413B, CN103991387B and CN103419671B.

Author Contributions

Conceptualization, methodology, formal analysis and investigation, X.W.; software, validation and data curation, P.Y.; resources, H.N. and Y.Z.; writing—original draft preparation, X.W. and P.Y.; writing—review and editing, Y.Z. and Y.D.; visualization, X.W. and P.Y.; supervision, Y.Y.; project administration, Y.D. and Y.Y.; funding acquisition, Y.Y. and H.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant numbers 51905361 and 51876133; the Jiangsu Provincial Key Research and Development Program of China, grant number BE2021065; and a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

Thanks to the Nantong Public Traffic General Company and Jiangsu Honghu Electronic Technology Co., LTD for their help in collecting the original operation data. The authors would like to thank the anonymous reviewers for their reviews and comments.

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

Parameter	Meaning Represented
$V_{m a x}$	maximum speed
$V_{m}$	average speed
$V_{m r}$	average driving speed
$V_{s d}$	speed standard deviation
$a_{m a x}$	maximum acceleration
$a_{m i n}$	minimum deceleration speed
$a_{a}$	average acceleration
$a_{d}$	average deceleration speed
$p_{a}$	acceleration time ratio
$p_{d}$	deceleration time ratio
$p_{i}$	idle speed time ratio
$p_{c}$	uniform speed time ratio
$T$	total length of the short journey
$S$	driving distance
$Y$	observation matrix
$y_{i, j}$	value of the jth characteristic parameter of the ith micro-trip
$X$	normalized matrix
$u_{j}$	mean of the jth characteristic parameter of the original observation matrix
$σ_{j}$	variance of the original observation matrix
$R$	correlation coefficient matrix
$ψ_{k}$	contribution rate
$D (x_{i}, c_{j})$	Euclidean distance
$J^{*}$	error sum of squares criterion function
$T_{i}$	time of class i condition in the final synthesis condition
$T_{0}$	duration of representative working conditions
$T_{A}$	total duration of all operating conditions
$N_{i}$	number of micro-trips in class i
$T_{i, j}$	running time of the jth micro-trip segment in class i
BEVs	battery electric vehicles
NEDC	new European driving cycle
WLTC	worldwide harmonized light vehicles test cycle
SOC	state of charge
PCA	principal components analysis
SVM	support vector machines
GMM	Gaussian mixture model
DBSCAN	density-based spatial clustering of applications with noise

Appendix A

As a supplementary material to Section 2.3.2, Appendix A outlines the principal component contribution rate of the linear interpolation results and step interpolation results.

It can be seen from Table A1 that the characteristic values of the first four principal components are all greater than 1, and their cumulative contribution rate is 87.246%, greater than 70%, indicating that these four principal components can represent the characteristic information of the working condition, instead of the original 12 characteristic parameters. Therefore, the first four principal components were selected for further study.

Table A1. Characteristic parameter contribution rate of linear interpolation.

Serial Number	Total (Initial Characteristic Parameter)	Percentage of Variance (%)	Cumulative (%)
1	5.674	47.284	47.284
2	1.960	16.334	63.618
3	1.662	13.849	77.467
4	1.173	9.779	87.246
5	0.792	6.602	93.848
6	0.245	2.043	95.890
7	0.195	1.623	97.513
8	0.149	1.243	98.757
9	0.066	0.547	99.303
10	0.058	0.480	99.784
11	0.026	0.216	99.99999…
12	6.992 × 10⁻¹⁵	5.827 × 10⁻¹⁴	100.000
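The selection rule applied to Table A1 (keep the components whose characteristic value exceeds 1 and check their cumulative contribution rate) can be reproduced from the tabulated values. A minimal sketch; the function name `contribution_rates` is illustrative, and small differences from the table in the last decimal place come from the three-decimal rounding of the tabulated values.

```python
# Characteristic values (eigenvalues) as listed in Table A1.
eigenvalues = [5.674, 1.960, 1.662, 1.173, 0.792, 0.245,
               0.195, 0.149, 0.066, 0.058, 0.026, 6.992e-15]

def contribution_rates(values):
    """Return (percentage of variance, cumulative percentage) lists."""
    total = sum(values)
    rates = [100 * v / total for v in values]
    cumulative, running = [], 0.0
    for rate in rates:
        running += rate
        cumulative.append(running)
    return rates, cumulative

rates, cumulative = contribution_rates(eigenvalues)
n_kept = sum(1 for v in eigenvalues if v > 1)  # components with value > 1
print(n_kept, round(cumulative[n_kept - 1], 2))
```

Four components pass the criterion, and their cumulative contribution rate is about 87.2%, in line with the 87.246% reported for Table A1.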

Four columns of the factor score coefficient matrix were obtained, denoted successively as F1, F2, F3 and F4, and each was then multiplied by the square root of the corresponding eigenvalue (for example, −1.65 × √5.674 ≈ −3.93 for the first micro-trip). In this way, the unstandardized principal component score matrix of each micro-trip was obtained, denoted successively as Y1, Y2, Y3 and Y4, as shown in Table A2, in preparation for the subsequent clustering analysis.

Table A2. Principal component results table.

Serial Number	F1	F2	F3	F4	Y1	Y2	Y3	Y4
1	−1.65	−0.05	0.72	−0.44	−3.93	−0.01	0.92	−0.48
2	0.59	1.01	1.46	−0.17	1.4	1.42	1.88	−0.19
3	−1.23	−0.01	−0.54	−0.61	−2.93	−0.02	−0.7	−0.67
…	…	…	…	…	…	…	…	…
1869	−1.38	0.41	−0.74	−0.51	−3.29	0.58	−0.95	−0.55
1870	−3.10	2.22	2.43	0.37	−7.39	3.11	3.13	0.4
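The Y1 values in Table A2 appear to equal the F1 values scaled by the square root of the first characteristic value in Table A1. A quick numerical check on the listed rows; the variable names are illustrative, and the tolerance allows for the two-decimal rounding of the tabulated scores.

```python
import math

first_eigenvalue = 5.674                  # Table A1, component 1
f1 = [-1.65, 0.59, -1.23, -1.38, -3.10]   # F1 column of Table A2
y1 = [-3.93, 1.40, -2.93, -3.29, -7.39]   # Y1 column of Table A2

# Scale each factor score by the square root of the eigenvalue.
scaled = [f * math.sqrt(first_eigenvalue) for f in f1]
max_gap = max(abs(s - y) for s, y in zip(scaled, y1))
print([round(s, 2) for s in scaled], max_gap)
```

The largest gap between the scaled F1 values and the tabulated Y1 values is below 0.01, which is within rounding error; scaling by the eigenvalue itself would not reproduce the table.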

It can be seen from Table A3 that the characteristic values of the first three principal components are all greater than 1, and their cumulative contribution rate is 83.035%, greater than 70%, indicating that these three principal components can represent the characteristic information of the working condition, instead of the original 12 characteristic parameters. Therefore, the first three principal components were selected for further study.

Table A3. Characteristic parameter contribution rate of step interpolation.

Serial Number	Total (Initial Characteristic Parameter)	Percentage of Variance (%)	Cumulative (%)
1	6.014	50.120	50.120
2	2.608	21.735	71.855
3	1.342	11.180	83.035
4	0.950	7.915	90.951
5	0.334	2.786	93.737
6	0.236	1.965	95.702
7	0.194	1.613	97.315
8	0.148	1.230	98.545
9	0.077	0.639	99.184
10	0.073	0.612	99.796
11	0.024	0.204	99.99999…
12	2.013 × 10⁻¹⁵	1.678 × 10⁻¹⁴	100.000

Three columns of the factor score coefficient matrix were obtained, denoted successively as F1, F2 and F3, and each was then multiplied by the square root of the corresponding eigenvalue. In this way, the unstandardized principal component score matrix of each micro-trip was obtained, denoted successively as Y1, Y2 and Y3, as shown in Table A4, in preparation for the subsequent clustering analysis.

Table A4. Principal component results table.

Serial Number	F1	F2	F3	Y1	Y2	Y3
1	−1.49	0.79	−1.09	−3.66	1.27	−1.17
2	0.92	1.01	−0.14	2.25	1.63	−0.16
3	−1.25	0.27	−0.97	−3.07	0.43	−1.12
…	…	…	…	…	…	…
1869	−1.20	1.06	−0.09	−2.95	1.7	−0.11
1870	−2.22	2.48	0.78	−5.46	4.01	0.91

References

1. Jiang, N.; Wang, X.; Kang, L. A Novel Power Distribution Strategy and Its Online Implementation for Hybrid Energy Storage Systems of Electric Vehicles. Electronics 2023, 12, 301.
2. Zhang, C.; Guo, Q.; Li, L.; Wang, M.; Wang, T. System Efficiency Improvement for Electric Vehicles Adopting a Permanent Magnet Synchronous Motor Direct Drive System. Energies 2017, 10, 2030.
3. Hong, S.; Kang, M.; Park, H.; Kim, J.; Baek, J. Real-Time State-of-Charge Estimation Using an Embedded Board for Li-Ion Batteries. Electronics 2022, 11, 2010.
4. Zhang, Q.; Shaopeng, T.; Xinyan, L. Recent Advances and Applications of AI-Based Mathematical Modeling in Predictive Control of Hybrid Electric Vehicle Energy Management in China. Electronics 2023, 12, 445.
5. Wang, X.; Ye, P.; Zhang, Y.; Ni, H.; Deng, Y.; Lv, S.; Yuan, Y.; Zhu, Y. Parameter Optimization Method for Power System of Medium-Sized Bus Based on Orthogonal Test. Energies 2022, 15, 7243.
6. Cheng, Y.; Xu, G.; Chen, Q. Research on Energy Management Strategy of Electric Vehicle Hybrid System Based on Reinforcement Learning. Electronics 2022, 11, 1933.
7. Zhao, X.; Zhao, X.; Yu, Q.; Ye, Y.; Yu, M. Development of a representative urban driving cycle construction methodology for electric vehicles: A case study in Xi’an. Transp. Res. Part D Transp. Environ. 2020, 81, 102279.
8. Hongwen, H.; Jinquan, G.; Jiankun, P.; Huachun, T.; Chao, S. Real-time global driving cycle construction and the application to economy driving pro system in plug-in hybrid electric vehicles. Energy 2018, 152, 95–107.
9. Amirjamshidi, G.; Roorda, M.J. Development of simulated driving cycles for light, medium, and heavy duty trucks: Case of the Toronto Waterfront Area. Transp. Res. Part D Transp. Environ. 2015, 34, 255–266.
10. Zhao, X.; Yu, Q.; Ma, J.; Wu, Y.; Yu, M.; Ye, Y. Development of a Representative EV Urban Driving Cycle Based on a k-Means and SVM Hybrid Clustering Algorithm. J. Adv. Transp. 2018, 2018, 1–18.
11. Shen, P.; Zhao, Z.; Li, J.; Zhan, X. Development of a typical driving cycle for an intra-city hybrid electric bus with a fixed route. Transp. Res. Part D Transp. Environ. 2018, 59, 346–360.
12. Liu, B.; Shi, Q.; He, L.; Qiu, D. A study on the construction of Hefei urban driving cycle for passenger vehicle. IFAC-PapersOnLine 2018, 51, 854–858.
13. Gao, X.; Zong, X.; Yuan, Y.; Wang, X.; Ni, H.; Chen, J. New Energy Vehicle Integrated Operation Management Platform for Multiple Vehicle. Chinese Patent CN103413413A, 27 November 2013. ZL201310319390.4, 1 February 2017.
14. Zheng, Z.; Yan, Y.; Liu, Y.; Li, L.; Chang, Y. An Efficiency–Accuracy Balanced Power Leakage Evaluation Framework Utilizing Principal Component Analysis and Test Vector Leakage Assessment. Electronics 2022, 11, 4191.
15. Nazari, M.; Hussain, A.; Musilek, P. Applications of Clustering Methods for Different Aspects of Electric Vehicles. Electronics 2023, 12, 790.
16. Wang, E.; Lee, H.; Do, K.; Lee, M.; Chung, S. Recommendation of Music Based on DASS-21 (Depression, Anxiety, Stress Scales) Using Fuzzy Clustering. Electronics 2023, 12, 168.
17. Hinov, N.; Punov, P.; Gilev, B.; Vacheva, G. Model-Based Estimation of Transmission Gear Ratio for Driving Energy Consumption of an EV. Electronics 2021, 10, 1530.
18. Wu, D.; Feng, L. On-Off Control of Range Extender in Extended-Range Electric Vehicle using Bird Swarm Intelligence. Electronics 2019, 8, 1223.
19. Zhao, X.; Ye, Y.; Ma, J.; Shi, P.; Chen, H. Construction of electric vehicle driving cycle for studying electric vehicle energy consumption and equivalent emissions. Environ. Sci. Pollut. Res. 2020, 27, 37395–37409.
20. Hung, W.T.; Tong, H.Y.; Lee, C.P.; Ha, K.; Pao, L.Y. Development of a practical driving cycle construction methodology: A case study in Hong Kong. Transp. Res. Part D Transp. Environ. 2007, 12, 115–128.
21. Liu, Y.; Wu, Z.X.; Zhou, H.; Zheng, H.; Yu, N.; An, X.P.; Li, J.Y.; Li, M.L. Development of China Light-Duty Vehicle Test Cycle. Int. J. Automot. Technol. 2020, 21, 1233–1246.
22. Ashtari, A.; Bibeau, E.; Shahidinejad, S. Using large driving record samples and a stochastic approach for real-world driving cycle construction: Winnipeg driving cycle. Transp. Sci. 2014, 48, 170–183.
23. Zhao, L.; Li, K.; Zhao, W.; Ke, H.C.; Wang, Z. A Sticky Sampling and Markov State Transition Matrix Based Driving Cycle Construction Method for EV. Energies 2022, 15, 1057.
24. Chen, Z.; Fang, Z.; Zhang, Q.; Zhou, N.; Yu, Q. Constructing the real-world driving cycle for electric vehicle applications: A comparative study. Trans. Inst. Meas. Control. 2022, 01423312221094384.
25. Lao, Y.; Zhang, G.; Corey, J. Gaussian Mixture Model-Based Speed Estimation and Vehicle Classification Using Single-Loop Measurements. J. Intell. Transp. Syst. 2012, 16, 184–196.
26. Yu, X.; Long, W.; Li, Y. Trajectory dimensionality reduction and hyperparameter settings of DBSCAN for trajectory clustering. IET Intell. Transp. Syst. 2022, 16, 691–710.
27. Racolte, G.; Marques, A.; Scalco, L. Spherical K-Means and Elbow Method Optimizations with Fisher Statistics for 3D Stochastic DFN from Virtual Outcrop Models. IEEE Access 2022, 10, 63723–63735.
28. Chen, H.; Yang, C.; Xu, X. Clustering Vehicle Temporal and Spatial Travel Behavior Using License Plate Recognition Data. J. Adv. Transp. 2017, 2017, 1738085.
29. Zhang, J.; Chu, L.; Wang, X.; Guo, C.; Fu, Z.; Zhao, D. Optimal energy management strategy for plug-in hybrid electric vehicles based on a combined clustering analysis. Appl. Math. Model. 2021, 94, 49–67.
30. Qiu, H.; Cui, S.; Wang, S.; Wang, Y.; Feng, M. A Clustering-Based Optimization Method for the Driving Cycle Construction: A Case Study in Fuzhou and Putian, China. IEEE Trans. Intell. Transp. Syst. 2022, 23, 18681–18694.
31. Peng, J.; Jiang, J.; Ding, F.; Tan, H. Development of driving cycle construction for hybrid electric bus: A case study in Zhengzhou, China. Sustainability 2020, 12, 7188.
32. Wang, X.; Liu, S.; Zhang, Y.; Lv, S.; Ni, H.; Deng, Y.; Yuan, Y. A Review of the Power Battery Thermal Management System with Different Cooling, Heating and Coupling System. Energies 2022, 15, 1963.

Figure 1. Technical route flow chart.

Figure 2. Route and sample data: (a) route map, (b) sample data.

Figure 3. Micro-trip schematic representation.

Figure 4. Schematic representation of idle and motion segments.

Figure 5. Interpolation results: (a) linear interpolation, (b) step interpolation, (c) cubic spline interpolation, (d) mixed interpolation.

Figure 6. Principal component scree plots generated by different interpolation methods.

Figure 7. Silhouette coefficient graphs generated by different interpolation methods.

Figure 8. Scatter plots generated by different interpolation methods: (a) linear interpolation, (b) step interpolation, (c) mixed interpolation.

Figure 9. Comparison of the representative micro-trips obtained with different interpolation methods: (a) low-speed section, (b) high-speed section.

Figure 10. Comparison of the driving conditions synthesized with different interpolation methods.

Figure 11. Influence of different data interpolation methods on the construction indexes of electric bus driving conditions: (a) speed indexes, (b) acceleration indexes, (c) time ratio indexes.

Figure 12. Comparison of NEDC, WLTC and Nantong bus working conditions.

Table 1. Characteristic parameter list.

Serial Number	$V_{max}$	$V_{m}$	$V_{mr}$	$a_{max}$	…	$p_{i}$	$p_{c}$
1	19.31	6.43	10.33660145	0.61	…	0.38	0.31
2	45.9	33.5	33.60909371	1.03	…	0	0.44
3	25	7.27	12.02040816	0.34	…	0.4	0.1
4	41.37	3.15	21.68461538	1.49	…	0.85	0.05
5	39.5	12.8	26.58974359	0.86	…	0.5	0
…	…	…	…	…	…	…	…
1870	3.4	1.68	1.769230769	0.09	…	0.05	0.95

Table 2. Characteristic parameter contribution rate.

Serial Number	Initial Eigenvalue (Total)	Percentage of Variance (%)	Cumulative (%)
1	5.400	44.996	44.996
2	1.942	16.181	61.177
3	1.633	13.608	74.785
4	0.976	8.130	82.916
5	0.738	6.151	89.067
6	0.564	4.701	93.768
7	0.402	3.351	97.120
8	0.200	1.666	98.786
9	0.068	0.563	99.348
10	0.051	0.428	99.776
11	0.027	0.224	99.99999…
12	7.602 × 10⁻¹⁶	6.335 × 10⁻¹⁵	100.000
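
The percentage and cumulative columns of Table 2 can be reproduced from the first numeric column alone. The following is a minimal Python sketch, assuming that the twelve tabulated totals are eigenvalues of a correlation matrix (they sum to about 12, the number of characteristic parameters). The function names are hypothetical, and the eigenvalue-greater-than-1 rule used to count retained components is a common rule of thumb assumed here, not a criterion stated in this table.

```python
# Eigenvalues as printed in Table 2 (rounded to three decimals there).
TABLE2_EIGENVALUES = [5.400, 1.942, 1.633, 0.976, 0.738, 0.564,
                      0.402, 0.200, 0.068, 0.051, 0.027, 7.602e-16]


def explained_variance(eigenvalues):
    """Return (percentages, cumulative percentages) of total variance."""
    total = sum(eigenvalues)
    percentages = [100.0 * ev / total for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for share in percentages:
        running += share
        cumulative.append(running)
    return percentages, cumulative


def count_above_one(eigenvalues):
    """Count components with eigenvalue > 1 (assumed rule of thumb)."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


percentages, cumulative = explained_variance(TABLE2_EIGENVALUES)
for index, (share, cum) in enumerate(zip(percentages, cumulative), start=1):
    print(f"{index:2d}  {share:8.3f}  {cum:8.3f}")
print("components with eigenvalue > 1:", count_above_one(TABLE2_EIGENVALUES))
```

Under this assumption three components exceed an eigenvalue of 1, which agrees with the three components (F1 to F3) listed in Table 3. Small differences from the printed percentages come from the rounding of the tabulated eigenvalues.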

Table 3. Principal component results table.

Serial Number	F1	F2	F3	Y1	Y2	Y3
1	−1.54	0.10	0.32	−3.57	0.15	0.41
2	0.50	1.10	1.42	1.15	1.53	1.81
3	−1.28	0.11	−0.59	−2.97	0.15	−0.75
…	…	…	…	…	…	…
1869	−0.95	0.10	−0.65	−2.2	0.13	−0.83
1870	−3.03	2.40	2.66	−7.04	3.34	3.4
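
The F and Y columns of Table 3 appear to be related by a scaling with the square root of the corresponding eigenvalue in Table 2, that is, $Y_i = F_i \sqrt{\lambda_i}$. This relation is inferred from the printed numbers and is not stated in the table itself, so the Python sketch below should be read as a consistency check under that assumption; the function names are hypothetical.

```python
import math

# First three eigenvalues as printed in Table 2.
EIGENVALUES = [5.400, 1.942, 1.633]

# Rows printed in Table 3: serial number -> ((F1, F2, F3), (Y1, Y2, Y3)).
TABLE3_ROWS = {
    1: ((-1.54, 0.10, 0.32), (-3.57, 0.15, 0.41)),
    2: ((0.50, 1.10, 1.42), (1.15, 1.53, 1.81)),
    3: ((-1.28, 0.11, -0.59), (-2.97, 0.15, -0.75)),
    1869: ((-0.95, 0.10, -0.65), (-2.2, 0.13, -0.83)),
    1870: ((-3.03, 2.40, 2.66), (-7.04, 3.34, 3.4)),
}


def component_scores(factor_scores, eigenvalues):
    """Scale each factor score by sqrt(eigenvalue) (assumed relation)."""
    return [f * math.sqrt(ev) for f, ev in zip(factor_scores, eigenvalues)]


def max_deviation(rows, eigenvalues):
    """Largest absolute gap between computed and printed Y values."""
    worst = 0.0
    for factors, printed in rows.values():
        computed = component_scores(factors, eigenvalues)
        for c, p in zip(computed, printed):
            worst = max(worst, abs(c - p))
    return worst


print(f"largest deviation: {max_deviation(TABLE3_ROWS, EIGENVALUES):.4f}")
```

For the five printed rows the computed and tabulated Y values differ by less than 0.02, which is within what the two-decimal rounding of the F columns would explain.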

Table 4. Clustering distance table.

Serial Number	Clustering	Distance
1	2	0.61752
2	1	2.3289
3	2	1.07724
…	…	…
1869	2	1.70867
1870	2	6.02206

Table 5. Fragment number selection table.

Speed Section	Duration of Velocity Segment (s)	Mean Duration of Movement Period (s)	Mean Duration of the Idle Segment (s)	Number of Motion Segments	Number of Idle Segments
High-Speed Section	1100	121.0268637	24.96306246	7.36(7)	8
Low-Speed Section	700	56.01312336	297.4015748	1.14(1)	2

Table 6. Data analysis table.

Duration	Quantity	Frequency	Cumulative Frequency	Total Duration	Mean Duration of Movement Period
21	4	0.002686367	0.002686367	84	…
31	66	0.044325050	0.047011417	2046	…
41	106	0.071188717	0.118200134	4346	37
51	113	0.075889859	0.194089993	5763	51
…	…	…	…	…	…
71	123	0.082605776	0.370047011	8733	66
…	…	…	…	…	…
101	86	0.057756884	0.559435863	8686	90
…	…	…	…	…	…
131	53	0.035594359	0.691739422	6943	120
…	…	…	…	…	…
191	34	0.022834117	0.841504365	6494	163
…	…	…	…	…	…
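
The derived columns of Table 6 can be checked with a few lines of Python. This is a sketch under two assumptions inferred from the printed values: the frequency is the quantity divided by 1489 segments (the size of Cluster 1 for mixed interpolation in Table 7), and the total duration is the duration multiplied by the quantity. Only the first four rows are used, because later rows are elided and the cumulative frequency cannot be rebuilt across the gaps; the function name `summarize` is hypothetical.

```python
from itertools import accumulate

# Inferred segment count: 4 / 0.002686367 is approximately 1489.
TOTAL_SEGMENTS = 1489

# (duration in s, quantity) for the first four contiguous rows of Table 6.
ROWS = [(21, 4), (31, 66), (41, 106), (51, 113)]


def summarize(rows, total_segments):
    """Return frequency, cumulative frequency and total duration per row."""
    frequencies = [quantity / total_segments for _, quantity in rows]
    cumulative = list(accumulate(frequencies))
    total_durations = [duration * quantity for duration, quantity in rows]
    return frequencies, cumulative, total_durations


frequencies, cumulative, total_durations = summarize(ROWS, TOTAL_SEGMENTS)
for (duration, quantity), f, c, t in zip(ROWS, frequencies, cumulative,
                                         total_durations):
    print(f"{duration:4d} {quantity:4d} {f:.9f} {c:.9f} {t:6d}")
```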

Table 7. Number of clustering results generated by different interpolation methods.

Item	Linear Interpolation	Step Interpolation	Mixed Interpolation
Cluster 1	1545	413	1489
Cluster 2	325	1457	381
Valid	1870	1870	1870
Missing	0	0	0

Table 8. Number of fragments obtained by different interpolation methods.

Interpolation Methods	Speed Section	Duration of Velocity Segment	Mean Duration of Movement Period	Mean Duration of the Idle Segment	Number of Motion Segments	Number of Idle Segments
Linear	High-speed section	1205	33.54045307	118.9288026	7.09(7)	8
Linear	Low-speed section	595	54.78461538	303.5692308	1.50(2)	3
Step	High-speed section	1050	112.8716541	28.75291695	7.21(7)	8
Step	Low-speed section	750	49.07021792	303.6731235	1.26(1)	2
Mixed	High-speed section	1100	121.0268637	24.96306246	7.36(7)	8
Mixed	Low-speed section	700	56.01312336	297.4015748	1.14(1)	2

Table 9. Comparison of characteristic parameters between synthetic conditions and original data.

Parameter	Original	Linear	Linear $δ$ (%)	Step	Step $δ$ (%)	Mixed	Mixed $δ$ (%)
$V_{max}$	56	49.7	11.3	51.1	8.75	48.38	13.61
$V_{m}$	14.28	8.22	42.4	16.09	12.68	13.26	7.14
$V_{mr}$	25.2	20.8	17.5	30.8	22.228	22.91	9.09
$V_{sd}$	16.53	12.95	21.7	18.27	10.53	14.9	9.86
$a_{max}$	1.44	0.94	34.7	8.56	494.44	2.72	88.89
$a_{min}$	−1.43	−0.87	39.2	−9.72	579.72	−1.25	12.59
$a_{a}$	0.47	0.45	4.3	3.43	629.79	0.51	8.519
$a_{d}$	−0.47	−0.44	6.4	−3.72	691.49	−0.46	2.13
$p_{a}$	0.19	0.16	15.8	0.03	84.219	0.2	5.26
$p_{d}$	0.19	0.16	15.8	0.03	84.219	0.22	15.79
$p_{i}$	0.44	0.6	36.4	0.48	9.099	0.42	4.55
$p_{c}$	0.18	0.08	55.6	0.46	155.56	0.16	11.11
Average	-	-	25.09	-	231.89	-	15.71
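
The $δ$ columns and the Average row of Table 9 are consistent with a relative error of the form $δ = |x_{syn} - x_{orig}| / |x_{orig}| \times 100$, averaged over the twelve parameters. The Python sketch below recomputes the mixed-interpolation column under that assumption; the dictionary keys and the function name `relative_error` are hypothetical labels, and the input values are copied from the table.

```python
# Original and mixed-interpolation values copied from Table 9.
ORIGINAL = {"V_max": 56, "V_m": 14.28, "V_mr": 25.2, "V_sd": 16.53,
            "a_max": 1.44, "a_min": -1.43, "a_a": 0.47, "a_d": -0.47,
            "p_a": 0.19, "p_d": 0.19, "p_i": 0.44, "p_c": 0.18}
MIXED = {"V_max": 48.38, "V_m": 13.26, "V_mr": 22.91, "V_sd": 14.9,
         "a_max": 2.72, "a_min": -1.25, "a_a": 0.51, "a_d": -0.46,
         "p_a": 0.2, "p_d": 0.22, "p_i": 0.42, "p_c": 0.16}


def relative_error(synthetic, original):
    """Relative error in percent (assumed definition of delta)."""
    return abs(synthetic - original) / abs(original) * 100.0


errors = {name: relative_error(MIXED[name], ORIGINAL[name])
          for name in ORIGINAL}
average_error = sum(errors.values()) / len(errors)
below_ten = [name for name, value in errors.items() if value < 10.0]

for name, value in errors.items():
    print(f"{name:6s} {value:7.2f}")
print(f"average: {average_error:.2f}")
print(f"parameters below 10%: {len(below_ten)}")
```

With these inputs the average comes out at about 15.71% and seven parameters fall below 10%, in line with the figures quoted in the abstract. The same function applies to the linear and step columns, and to the NEDC and WLTC comparison if the mixed-interpolation values are taken as the reference.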

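Table 10. Comparison of characteristic parameters between the mixed-interpolation synthetic condition and the NEDC and WLTC standard cycles.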

Characteristic Parameter	Mixed Interpolation	NEDC	NEDC $δ$ (%)	WLTC	WLTC $δ$ (%)
$V_{max}$	48.38	120	148.04	131.3	171.39
$V_{m}$	13.26	33.21	150.45	46.51	250.75
$V_{mr}$	22.91	44.37	93.67	53.49	133.48
$V_{sd}$	14.9	31.08	108.59	36.12	142.42
$a_{max}$	2.72	1.06	61.03	1.67	38.60
$a_{min}$	−1.25	−1.39	11.20	−1.5	20.00
$a_{a}$	0.51	0.54	5.88	0.56	9.80
$a_{d}$	−0.46	−0.79	71.74	−0.6	30.43
$p_{a}$	0.2	0.23	15.00	0.3	50.00
$p_{d}$	0.22	0.16	27.27	0.28	27.27
$p_{i}$	0.42	0.25	40.48	0.13	69.05
$p_{c}$	0.16	0.36	125.00	0.29	81.25
Average	-	-	71.53	-	85.37

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

