Applications of Clustering Methods for Different Aspects of Electric Vehicles

The growing penetration of electric vehicles can pose several challenges for power systems, especially distribution systems, due to the introduction of significant uncertain load. Analysis of these challenges becomes computationally expensive with higher penetration of electric vehicles due to various preferences, travel behavior, and the battery size of electric vehicles. This problem can be addressed using clustering methods, which have been successfully used in many other sectors. Recently, there have been several studies published on applying clustering methods for various aspects of electric vehicles. To summarize the existing efforts and provide future research directions, this contribution presents a three-step analysis. First, the existing clustering methods, including hard and soft clustering, are discussed. Then, the recent literature on the application of clustering methods for different aspects of electric vehicles is reviewed. The review concentrates on four major aspects of electric vehicles: the behavior of the user, driving cycle, used batteries, and charging stations. Then, several representative studies are selected from each category and their merits and demerits are summarized. Finally, gaps in the existing literature are identified and directions for future research are presented. They indicate the need for further research on the impact on distribution circuits, charging infrastructure during emergencies, equity and disparity in rebate allocations, and the use of big data with cluster analysis to assist transportation network management.


Introduction
In the last decade, governments around the globe have implemented significant policy reforms to establish countermeasures and corrective steps to address the problem of climate change caused by humans [1]. The European Union proposed the European Green Deal in December 2019, in which the majority of member states pledged to reach net-zero greenhouse gas emissions by 2050. The decarbonization of society is one of the foundations of the Green Deal [2]. There is global concern about climate change, which is typically associated with human influence on the environment caused by greenhouse gas emissions [3]. Global carbon dioxide (CO2) emissions from fossil fuel usage surged from 6 billion tons in 1950 to 36.4 billion tons in 2021 [4,5]. As the second largest producer of CO2 emissions, the transportation industry is responsible for 22.67% of total emissions. Figure 1 shows the distribution of CO2 emissions in different sectors [4]. The global population is rising. It is expected that there will be 1.5 billion automobiles on the planet by 2025, and 2 billion by 2040 [6]. This would further increase CO2 emissions.
To limit global warming to 1.5 °C or at least below 2 °C [7], it is crucial to stop using fossil fuels. China (31%), the United States (14%), the EU27 (7%) and India (7%) contributed the most to global fossil CO2 emissions in absolute terms in 2020. These four areas are responsible for 59% of global CO2 emissions. The rest of the world contributes 41%, which also includes marine bunker fuels and international aviation [5]. The United States, the European Union, and the United Kingdom aim to achieve net-zero emissions by 2050, China and Russia by 2060, and India by 2070 [8]. In addition to climate change, energy security and the future of oil supply pose a significant challenge, and manufacturers are becoming more aware of their involvement in achieving the goals of decarbonizing the economy and reducing oil dependence [3]. Around 97% of European Union (EU) oil consumption is met by imports, with a quarter of these imports coming directly from Russia. The European Commission plans to eliminate Russian oil, gas, and coal imports by 2027. The road transportation sector uses approximately 60% of the total oil consumption in the EU [9].
Electric vehicles (EVs) are one of the practical means of significantly and immediately decarbonizing transportation [10]. In 2021, sales of EVs hit a record high of 6.6 million, doubling from the previous year. Only 120,000 electric cars were sold worldwide in 2012. Each week in 2021, sales exceeded that amount. In 2021, around 10% of global automotive sales were electric, four times the percentage in 2019. This increased the total number of electric cars on the world's roadways to nearly 16.5 million, or three times the 2018 level. Two million electric cars were sold worldwide in the first quarter of 2022, a 75% increase over the same period in 2021. China and Europe accounted for more than 85% of the worldwide sales of EVs in 2021, followed by the United States (10%) [11].
Higher penetration of EVs can bring several benefits in terms of renewable consumption and reducing CO2 emissions, as discussed in the previous paragraphs. In addition, EVs are beneficial for distribution systems, microgrids, and nano grids in several ways. For example, the authors in [12] analyzed different architectures and concluded that the AC-DC hybrid architecture is the most suitable for EV integration in micro and nano grids. Similarly, different challenges and enablers for using EVs as a service are discussed in [13]. Different aspects of integrating EVs with distribution systems, such as technical, economic, behavioral, and regulatory aspects, are discussed in [14]. Finally, the usability of EVs for providing reliability as a service for different building types is analyzed in [15] and fault estimation methods in [16]. However, with the increased penetration of EVs, several challenges arise, for example, the planning and management of power systems considering highly uncertain loads due to EVs [17]. In addition, driving preferences and patterns differ from user to user. This further complicates the management of power system loads and significantly increases the computational burden. One practical solution to this problem is to group EVs and other related aspects of EVs by using different clustering methods. Clustering methods are widely used in different disciplines to arrange and group datasets, after which representative samples from each cluster can be analyzed. These methods can be used to cluster EVs and eliminate the need to analyze all EVs individually. There are several studies in the literature on clustering different aspects of EVs to reduce the computational complexity of the network during analysis. Some of the main areas studied in the existing literature include modeling the behavior of the EV user [18], EV driving cycle [19], used EV batteries [15] and clustering [20], and EV charging stations [21]. However, other aspects of EVs also need to be further analyzed using clustering techniques. Examples include the impact of EVs on different distribution circuits [22], charging infrastructure during emergencies [23], equity and disparities in rebate allocations [24], and the use of big data with cluster analysis to assist transportation network management [25]. Cluster analysis is a potential solution to reduce the complexity of the network under higher penetration of EVs while preserving the diversity of user behavior and EV traits.
The main objective of this study is to analyze the current literature on the application of clustering methods for various aspects of EVs. In addition, the shortcomings of the existing literature will be identified along with future research directions needed to facilitate rapid analysis of systems under higher penetration of EVs. Therefore, the analysis in this study is divided into three parts. In the first part, different clustering methods are analyzed, which include both hard and soft clustering methods. In addition, different categories of hierarchical and partitional clustering methods are also discussed. Each section is followed by the merits and demerits of different clustering methods. In the second part, the existing literature is analyzed on cluster analysis of different aspects of EVs, specifically the application of clustering methods for modeling the behavior of EV users and EV driving cycle, used EV battery clustering, and EV charging station clustering. Each section is followed by a summary of the methods used in these studies and the major consideration in each study. Finally, in the third part, the shortcomings of existing studies are summarized, and future research directions are presented. Specifically, the need for further research is discussed on the application of cluster analysis to different related fields. These fields include the EV impact on different distribution circuits, charging infrastructure during emergencies, equity and disparities in rebate allocations, and the use of big data with cluster analysis to assist in transportation network management.

Clustering Methods
Clustering, or cluster analysis, is an unsupervised learning technique for assigning data into separate groups based on a predetermined set of criteria. It helps users understand the grouping in a data set. Data from the same class are often similar, while data from other classes are typically dissimilar [26]. There are two major types of clustering techniques: crisp (hard) clustering and soft (flexible) clustering. In the case of hard clustering, a data point belongs to only a single cluster, while in the case of soft (fuzzy) clustering, each point may belong to two or more groups [27]. An overview of different clustering methods is presented in Figure 2.
Hard clustering algorithms can be divided into hierarchical algorithms and partitional algorithms. In the case of partitional algorithms, the dataset is split into a single partition. In contrast, hierarchical algorithms divide the dataset into a series of partitions nested inside one another [27]. A generalized dendrogram for hierarchical clustering algorithms is shown in Figure 3.

Hierarchical Clustering Algorithms
Hierarchical algorithms can be categorized as agglomerative and divisive algorithms. A divisive hierarchical algorithm divides data into smaller clusters, while an agglomerative algorithm merges data points into larger clusters from the bottom to the top. In contrast, partitioning algorithms establish a one-level division of the dataset [28]. Hierarchical clustering is often shown using a dendrogram, a specific tree structure, as shown in Figure 3 [27].

Agglomerative Hierarchical Clustering
In the case of agglomerative hierarchical clustering, each data point starts in its own cluster. Comparable clusters are then merged to form a hierarchy [29]. Agglomerative hierarchical methods can be categorized into graph and geometric methods. Graph methods can be further divided into complete, single, average, and weighted average linkage methods. Similarly, geometric methods include Ward, median, and centroid methods [27]. An overview of hierarchical clustering methods is shown in Figure 4. There are several subcategories of the agglomerative hierarchical clustering algorithms:

Single-Link Method: Single-link hierarchical clustering, also referred to as the nearest-neighbor method, is one of the most straightforward methods [27]. The single-linkage criterion is the smallest dissimilarity between two objects. The proximity of two clusters is determined by the minimum distance between any two objects, one from each cluster [29]. Single linkage can efficiently cluster non-elliptical, elongated groupings of data objects. A significant disadvantage of this approach is that it is susceptible to noise and outliers in the data set [28,29].
Complete-Link Method: This method is also known as the farthest-neighbor method, and it is based on the largest dissimilarity between two objects. The maximum distance between any two objects that belong to separate clusters defines the proximity of the two clusters [29]. This linkage method considers the structure of the cluster, exhibits non-local behavior, and typically produces clusters with compact shapes [28]. These clusters are more compact than clusters based on the single linkage method [30]. However, this linkage method is also vulnerable to outliers [29].
Group Average Method: The group average method, or the unweighted pair group method, uses arithmetic averages to determine the mean or median distances among all the objects between clusters [27,30]. Compared to single and complete links, an average linkage method offers the best balance between reducing the variance within the clusters and increasing the variance between clusters [29]. However, one of the main disadvantages of this method is that elongated clusters are likely to be divided, and parts of neighboring elongated clusters are likely to be combined, as a result of average-link clustering [31].
Weighted Group Average Method: This method is also known as the 'weighted pair group method' and it uses the arithmetic average. It first constructs a dendrogram that contains information on a similarity matrix. The nearest two clusters are combined into a higher-level cluster at each step. Then, the distance from the new cluster to another cluster is calculated as the arithmetic mean of the average distances between members of the clusters.
Centroid Method: The centroid method computes the distance between the centroids of two clusters. Compared to the previous linkage methods, it is more tolerant of outliers and performs better when dealing with clusters of various sizes [29]. Centroid linkage clustering employs only the centroid of the cluster to determine the similarity between two clusters. In contrast, the group average method considers all pairs of data points to calculate the average pairwise similarity [28].
Median Method: This method is also known as the weighted pair group method using centroids, or weighted centroids. It was first introduced by Gower in 1967 [28]. Although the median and centroid methods are relatively similar, there is a difference: in the median method, the centroid of the new group does not depend on the size of the groups that make up that group [32]. The major drawback of this method is that it is not suitable for metrics, as it cannot be interpreted geometrically [28].
Ward's Method: This method, also known as the Ward minimum variance method, was proposed by Ward in 1963. It computes the increase in the within-cluster sum of squares that results from merging two clusters and merges the pair for which this increase is smallest. The objective of the Ward technique is to combine two clusters into a group with minimal variation [33].
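The linkage criteria described above differ only in how the distance between two clusters is measured. The following is a minimal sketch in plain Python, not taken from any of the reviewed studies; the function names and the two small example clusters are hypothetical, and Euclidean distance is assumed throughout. The Ward quantity uses the standard expression for the increase in the within-cluster sum of squares, n_a n_b / (n_a + n_b) times the squared distance between centroids.

```python
from itertools import product
from math import dist


def single_link(a, b):
    """Minimum distance between any two objects, one from each cluster."""
    return min(dist(p, q) for p, q in product(a, b))


def complete_link(a, b):
    """Maximum distance between any two objects, one from each cluster."""
    return max(dist(p, q) for p, q in product(a, b))


def group_average(a, b):
    """Arithmetic mean of all pairwise distances between the two clusters."""
    return sum(dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(axis) / n for axis in zip(*cluster))


def centroid_link(a, b):
    """Distance between the centroids of the two clusters."""
    return dist(centroid(a), centroid(b))


def ward_increase(a, b):
    """Increase in the within-cluster sum of squares if a and b are merged."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * dist(centroid(a), centroid(b)) ** 2


# Hypothetical example clusters (two points each, on a line).
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(4.0, 0.0), (6.0, 0.0)]

for name, criterion in [
    ("single", single_link),
    ("complete", complete_link),
    ("group average", group_average),
    ("centroid", centroid_link),
    ("Ward increase", ward_increase),
]:
    print(name, criterion(cluster_a, cluster_b))
```

An agglomerative algorithm would evaluate the chosen criterion for every pair of current clusters and merge the pair with the smallest value at each step.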
The advantages and disadvantages of different agglomerative hierarchical methods are listed in Table 1.

Table 1. Advantages and disadvantages of different agglomerative hierarchical methods.

| Method | Advantages | Disadvantages | Ref. |
|---|---|---|---|
| Single-link method | Can differentiate between non-elliptical shapes as long as the gap between the two clusters is not small. | Susceptible to noise and outliers in the dataset. | [27] |
| Complete-link method | Provides well-separated clusters even if there is some noise present between clusters. | Biased towards globular clusters and tends to break large clusters. | [29] |
| Group average method | Offers the best balance of reducing within-cluster variance and increasing between-cluster variance. | Elongated clusters are likely to be divided, and parts of neighboring elongated clusters are likely to be combined. | [30] |
| Weighted group average method | Unbiased towards the middle value and unaffected by outliers or extreme values. | Difficult to understand when the number of observations increases. | [29] |
| Centroid method | More tolerant of outliers and performs better when dealing with clusters of various sizes. | Updates may cause large changes throughout the cluster hierarchy. | [29] |
| Median method | The new group's centroid is independent of the size of the groups that make up that group. | Not suitable for metrics since it cannot be interpreted geometrically. | [32] |
| Ward's method | Good at recovering cluster structure and yields a unique and exact hierarchy. | Sensitive to outliers and poor at recovering elongated clusters. | [33] |

Divisive Hierarchical Clustering
Another variant of hierarchical clustering is a top-down approach known as divisive hierarchical clustering [34]. At the beginning, all items belong to the same, single cluster. The cluster is then split into sub-clusters, which are subdivided into still smaller sub-clusters. This procedure is repeated until the appropriate cluster structure is achieved [31]. There are two types of divisive clustering: monothetic and polythetic methods. Unlike the monothetic technique, which is focused on a single feature, polythetic approaches consider the values of all characteristics within a data set [27]. To highlight the similarities between two instances, polythetic divisive clustering considers all elements concurrently. When many variables are present, scalability concerns may arise. Monothetic clustering achieves the best results when the focus is on a single characteristic at a time [30].

Partitional Clustering Algorithms
Partitional clustering techniques partition the data set into a defined number of clusters without any hierarchical structure [35]. The benefits of hierarchical algorithms are the drawbacks of partitional algorithms and vice versa. Partitional clustering approaches are more prevalent in pattern recognition than hierarchical algorithms [36]. They are advantageous when constructing a dendrogram would be computationally prohibitive for an application requiring an extensive data set. Figure 5 shows the clustering pattern of the partitional clustering method for 145 data points into four clusters [30]. However, in general, selecting the number of desired output clusters is challenging using a partitional method [35]. An overview of different partitional clustering methods is shown in Figure 6. The following sections describe several partitioning approaches.

K-Means Clustering
The K-means clustering technique is the most widely used partitional clustering algorithm [32,33]. The K-means clustering technique was first proposed by Steinhaus in 1956 and has since been used in many domains, including psychology, marketing research, medicine, and biology [29]. The fundamental objective of this approach is to split an n-dimensional dataset into k clusters such that the sum of squares inside each partition is as low as possible. K-means generates a flatter grouping structure than hierarchical methods. The Euclidean distance is the most common distance metric used to determine the similarity between two objects. There must be at least one item in each of the k groups produced by the partitioning algorithm [36].
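The objective described above can be illustrated with a minimal sketch of the usual two-step K-means iteration (assign each point to its nearest center, then move each center to the mean of its members). This is an illustrative plain-Python version under stated assumptions rather than the implementation used in any reviewed study: the function names, the initial centers, and the four example points (which could stand for features of charging sessions) are hypothetical, and Euclidean distance is assumed.

```python
from math import dist


def assign(points, centers):
    """Assignment step: put each point in the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda j: dist(p, centers[j]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, initial_centers, max_iter=100):
    """Alternate assignment and update steps until the centers stop moving."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        clusters = assign(points, centers)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else center
            for members, center in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    clusters = assign(points, centers)
    # Within-cluster sum of squared distances, the quantity K-means minimizes.
    sse = sum(dist(p, c) ** 2 for members, c in zip(clusters, centers) for p in members)
    return centers, clusters, sse


# Hypothetical two-feature observations forming two obvious groups.
sessions = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centers, clusters, sse = kmeans(sessions, [(0.0, 0.0), (10.0, 10.0)])
print(centers, sse)
```

The result depends on the initial centers and on the chosen k, which is why both must be supplied by the caller in this sketch.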
Despite its popularity, there are some limitations to K-means clustering [37]. For example, there is no efficient and universal approach to determine the initial partitions and the number of clusters. In addition, the K-means algorithm is susceptible to noise and outliers. Even if an item is far from the cluster's center, it is nevertheless compelled to join the cluster, distorting its structure [38,39].

Fuzzy C-Means Clustering
Fuzzy C-means (FCM) [40,41] is an unsupervised clustering algorithm [42] in which a single data point may belong to two or more clusters [43,44]. FCM can be used to solve various feature analysis, clustering, and classifier construction issues. It has been widely used in diverse fields [42]. Compared with K-means, FCM allocates each pattern to a cluster with some degree of membership, i.e., it yields a fuzzy clustering. When there are some overlaps between clusters in the data set, FCM is more appropriate for real-world applications than K-means.
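The notion of a degree of membership can be made concrete with a short sketch. It uses the standard FCM membership formula, u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)), where d_i is the distance from the point to center i. The fuzzifier m (set to 2 here), the function name, and the example values are assumptions introduced for illustration and do not come from the reviewed studies; the center update step of the full algorithm is omitted.

```python
from math import dist


def fcm_memberships(point, centers, m=2.0):
    """Degree of membership of one point in each cluster (values sum to 1)."""
    distances = [dist(point, c) for c in centers]
    # A point that coincides with a center belongs fully to that center.
    if any(d == 0.0 for d in distances):
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


# Hypothetical point lying between two cluster centers.
centers = [(0.0, 0.0), (6.0, 0.0)]
memberships = fcm_memberships((2.0, 0.0), centers)
print(memberships)
```

In contrast to the hard assignment in K-means, the point receives a nonzero membership in both clusters, with the larger value for the nearer center.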

K-Medoids Clustering
K-medoids also seeks to minimize the sum of squared error (SSE) [31]. In K-medoids approaches, each cluster is represented by one of its own points. The objective function is the averaged distance or another measure of dissimilarity between a point and its medoid when medoids are chosen. Clusters are subsets of points near their corresponding medoids [38].

This method is quite similar to the K-means algorithm. The K-medoids approach, like K-means, aims to discover a clustering solution that minimizes a given objective function. Like the K-means clustering technique, the K-medoids algorithm iterates until each representative data point becomes the cluster medoid [29]. Since the placement of most of the points within a cluster determines the choice of medoids, it is less vulnerable to outliers. Therefore, the K-medoids approach is more robust to noise and outliers than the K-means algorithm. However, compared to the K-means approach, it is computationally more expensive [31,38].
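The difference in robustness can be seen by comparing the two cluster representatives on a single cluster that contains an outlier. The sketch below is illustrative only: the function names and the five example points are hypothetical, Euclidean distance is assumed, and only the choice of the representative is shown, not the full K-medoids iteration.

```python
from math import dist


def medoid(points):
    """The member of the cluster with the smallest total distance to all members."""
    return min(points, key=lambda c: sum(dist(c, p) for p in points))


def centroid(points):
    """The coordinate-wise mean used as the representative in K-means."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))


# Hypothetical cluster with one extreme value in the first feature.
cluster = [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0), (4.0, 1.0), (100.0, 1.0)]
print("medoid:", medoid(cluster))
print("centroid:", centroid(cluster))
```

The medoid stays at an actual data point inside the dense part of the cluster, while the centroid is pulled towards the outlier. Evaluating the total distance for every candidate also shows why K-medoids is the more expensive of the two.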

K-Modes Clustering
Huang (1997) proposed the K-modes clustering algorithm for categorical data by presenting a new dissimilarity metric. The K-modes algorithm is an improved version of the K-means algorithm. Due to the improvements to the K-means method, the K-modes algorithm can cluster very large categorical data sets from real-world databases effectively [45,46]. Another benefit of the K-modes technique is that the modes provide distinctive cluster descriptions. These descriptions are crucial to the user's ability to comprehend clustering findings. The K-modes method is faster than the K-means algorithm because it requires fewer iterations to achieve convergence [46].
The K-modes method employs the same clustering procedure as the K-means algorithm, differing only in the clustering cost function, and it therefore has the same limitations. The K-modes algorithm has several additional shortcomings, for example, its inability to detect the number of clusters, its inability to converge to the global optimum, and its sensitivity to outliers [4,47].
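The two ingredients that distinguish K-modes from K-means can be sketched briefly. The example assumes the simple matching dissimilarity (the number of attributes on which two records disagree) and uses the attribute-wise mode as the cluster representative; the function names and the categorical records, which imitate charging-session attributes, are hypothetical.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Number of categorical attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(records):
    """Most frequent category of each attribute across the cluster."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


# Hypothetical categorical records: (location, day type, charger type).
records = [
    ("home", "weekday", "slow"),
    ("home", "weekend", "slow"),
    ("work", "weekday", "slow"),
]
mode = cluster_mode(records)
print(mode)
print(matching_dissimilarity(("work", "weekday", "fast"), mode))
```

Because the mode is itself a readable combination of categories, it can serve directly as a description of the cluster.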

DBSCAN Algorithm
Ester et al. proposed the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, a density-based clustering algorithm to discover arbitrarily shaped clusters, in 1996 [27]. The DBSCAN technique determines clusters by examining the point density. Regions with a high density of points indicate the presence of clusters, whereas regions with a low density of points represent noise or outliers. This technique is well-suited for dealing with large datasets that include noise. In addition, it can distinguish clusters of different sizes and forms.
The essential concept of the DBSCAN algorithm is that, for each point in a cluster, the neighborhood of a specific radius must have a minimum number of points, i.e., the density in the neighborhood must surpass a set threshold [34].
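The density condition above can be illustrated with a compact sketch of the algorithm. This is a simplified plain-Python version under stated assumptions, not a reference implementation: the names `eps` (neighborhood radius) and `min_pts` (minimum number of points, counting the point itself) are the conventional parameter names, the brute-force neighbor search is chosen for brevity, and the example points are hypothetical.

```python
from math import dist

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of point i (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster index, or NOISE for low-density points."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
    return labels


# Hypothetical data: two dense groups and one isolated point.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
    (50.0, 50.0),
]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

Unlike the partitional methods above, the number of clusters is not specified in advance; it follows from the two density parameters.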

Gaussian Mixture Model Algorithm
Gaussian Mixture Model (GMM) is a probabilistic model that indicates the existence of subclusters in every observation.Mixture models are used to identify subcluster characteristics.For developing mixed models, approaches such as unsupervised learning and clustering are used.These, however, do not apply to all feature extraction processes.Combinational models may be assumed for mixture models.In combinational models, the members of the cluster are specified arbitrarily, while the total size of the clusters in mixture models is fixed at 1 [48].GMM is an estimation method for probability density distributions [49].GMM may be seen as an extension of the Vector Quantization (VQ) model.The clusters in this model overlap.A feature vector is not allocated to the cluster that is closest to it.Nonetheless, the probability value determined from cluster observations is not zero [50,51].
The expectation-maximization (EM) technique for Gaussian mixtures and the K-means algorithm are comparable in many ways [52]. Instead of assigning each data point to a single cluster in a rigid way, as the K-means algorithm does, the EM method assigns data points based on posterior probabilities. The K-means method can be derived from the EM for Gaussian mixtures as a specific limit [53].
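The difference between the soft assignment of EM and the rigid assignment of K-means can be sketched for a one-dimensional mixture. This is a minimal illustration under stated assumptions: the function names, the toy data, and the initial parameters are hypothetical, and a small variance floor is added only to keep the toy example numerically stable.

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return exp(-(x - mean) ** 2 / (2.0 * var)) / sqrt(2.0 * pi * var)

def em_gmm_1d(data, means, variances, weights, n_iter=50):
    """Fit a 1-D Gaussian mixture with EM.

    Returns the fitted means, variances, mixture weights, and the posterior
    probabilities (responsibilities) computed in the last E-step.
    """
    k = len(means)
    means, variances, weights = list(means), list(variances), list(weights)
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each data point.
        resp = []
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate the parameters from the soft assignments.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # variance floor (assumption for stability)
    return means, variances, weights, resp

# Toy data (hypothetical): two groups near 1 and 5, plus one point in between.
data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.9, 5.1, 3.0]
means, variances, weights, resp = em_gmm_1d(data, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])

# A K-means-style hard label keeps only the most probable component of each point.
hard_labels = [max(range(2), key=lambda j: r[j]) for r in resp]
```

Each row of `resp` sums to one, so every point keeps a probability for each component, whereas `hard_labels` discards this information in the same way a rigid assignment does.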
The advantages and disadvantages of different partitional clustering methods are listed in Table 2.

| Method | Advantages | Disadvantages | Ref. |
|---|---|---|---|
| K-means clustering | Most widely used method; it generates a flatter grouping structure than hierarchical methods. | No universal approach for determining the initial partitions and the number of clusters; susceptibility to noise and outliers. | [39] |
| Fuzzy C-means clustering | More appropriate for datasets with some overlaps between clusters. | Poor performance for clusters with unequal sizes/densities and sensitive to noise and outliers. | [44] |
| K-medoids clustering | Less vulnerable to outliers; therefore, it is more resilient than the K-means algorithm in the face of noise and outliers. | Compared to the K-means approach, it is more computationally expensive. | |

Application of Clustering in Electric Vehicles
Cluster analysis can be applied to different aspects of EVs. Common examples include EV user behavior, EV driving cycle, and EV battery charging. In addition, clustering can be applied to group EV charging stations and to analyze the impacts of EVs on power distribution systems. It should be noted that the application of clustering analysis to EV battery charging and to impacts on distribution systems is specific to EVs. However, user behavior and driving cycle analysis are common to both EVs and internal combustion engine vehicles (ICEVs). Although the same clustering methods can be applied to both EVs and ICEVs, the driving behavior of EV owners differs considerably from that of ICEV owners, mainly due to the difference in charging/fueling mechanisms. Similarly, the driving cycles of EVs and non-EVs are different, as discussed in [54]. Therefore, the outcome of the clustering method could be different. The following sections cover these aspects of EV clustering in detail.

EV User Behavior Clustering
To ensure the reliability of the power supply, it is necessary to anticipate EV behavior in advance. However, the activity of individual EVs is very unpredictable, and their daily behavior patterns can vary considerably. This makes it challenging to create a model that simultaneously predicts the actions of all EVs operating in a system or area. To solve this problem, the results of the cluster analysis can be used to model and forecast the behavior of a group of similar EVs. Grouping similar EV activities is expected to reduce unpredictability and improve the accuracy of behavior prediction [55]. The following subsections summarize different studies conducted on EV user behavior clustering.

K-Means Algorithm
The K-means clustering algorithm has been commonly used to cluster the behavior of EV users due to its simplicity and many other advantages [56][57][58]. For example, Hu et al. [18] used the K-means and DBSCAN clustering algorithms to classify EV consumers. They categorized 7426 EV users into six classes: lost users, possible users, new users, key users to develop, key users to sustain, and high-value users. An overview of the proposed method [18] is shown in Figure 7. The suggested technique was compared with the standard clustering algorithm and the fuzzy c-means method, showing that the new method is more robust than the other approaches. Similarly, Xiong et al. [59] proposed a new method that integrates K-means clustering with a multilayer perceptron. First, historical charging data are processed using K-means clustering to establish assumptions about EV user behavior for EV charging schedules. Then, a multilayer perceptron is used to analyze the EV user charging record data and generate classifications based on clustering labels from the K-means algorithm and manual labeling through data visualization. The suggested technique automates the labeling of the data sets. In addition, it is not necessary to repeat the clustering when a new user connects to the charging network. After training, the method may be used concurrently with real-time control.

DBSCAN Algorithm
The DBSCAN algorithm has also been used in a number of studies to cluster EVs based on user behavior. For example, Fan et al. [60] clustered EV users by combining the coefficient matrix with density clustering. The grouping comprises clustering based on user preferences and clustering based on item similarity. User preference clustering refers to grouping based on trust between users. It first computes the correlation coefficient matrix between users and then performs density clustering based on the similarity matrix. The premise of density clustering is that samples of the same category are strongly connected; that is, each sample of a category must be close to other samples of that category. A cluster category is generated by grouping closely related samples into one category. When all samples are divided into distinct groups, the findings of all clustering groups can be collected. Item similarity clustering refers to the setup of similarity clustering for new EVs. Based on the findings of the user clustering, the score of the new EV configurations in each user group is determined by item similarity. This way, the degree of preference of each type of user for various configurations can be easily understood. An overview of the approach used in [60] for EV clustering is shown in Figure 8.
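The two steps of user preference clustering (correlation coefficient matrix, then density clustering on the resulting similarity) can be sketched as follows. This is only an illustration and not the method of [60]: the rating data are hypothetical, the conversion of a correlation into a distance as one minus the correlation is an assumption, and the density step is a generic DBSCAN-style procedure on a precomputed distance matrix.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length score vectors (neither may be constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def density_cluster(distance, eps, min_pts):
    """DBSCAN-style clustering on a precomputed distance matrix; -1 marks noise."""
    n = len(distance)
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if distance[i][j] <= eps]
        if len(neighbors) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = [k for k in range(n) if distance[j][k] <= eps]
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)
    return labels

# Hypothetical preference scores of six users for four EV configurations.
ratings = [
    [5, 4, 1, 1], [4, 5, 2, 1], [5, 5, 1, 2],
    [1, 2, 5, 4], [1, 1, 4, 5], [2, 1, 5, 5],
]
corr = [[pearson(a, b) for b in ratings] for a in ratings]
distance = [[1.0 - c for c in row] for row in corr]  # assumption: distance = 1 - correlation
labels = density_cluster(distance, eps=0.3, min_pts=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Users whose scores are strongly correlated end up in the same group, which mirrors the premise that samples of the same category are strongly connected.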

Hybrid Methods
Hybrid methods can provide better results due to their ability to overcome the demerits of the individual algorithms they combine. Therefore, several researchers have combined different methods for EV clustering. For example, Helmus et al. [61] classified the charging behavior of EV users using a two-stage clustering technique. First, a Gaussian mixture model is used to cluster charging sessions, revealing 13 unique charging session categories (seven types of daytime charging sessions and six types of night-time charging sessions). Then, the Partition Around Medoids method yields nine user classes based on their separate portfolios of charging session types: three types of daytime charging users, three types of night-time charging users, and three types of irregular users. An overview of the hybrid method (Gaussian mixture model and Partition Around Medoids) proposed in [61] is shown in Figure 9.


Other Methods
Other clustering methods have also been used to group EVs according to user behavior. For example, Powell et al. [62] used agglomerative clustering to classify drivers in a bottom-up manner. Each driver is assigned to its own cluster at initialization, and the method joins two clusters at each step. The selection of clusters to merge is based on minimizing the increase in the within-cluster sum of squares. Similarly, Campbell et al. [3] applied Ward's cluster analysis approach to census data (based on age, income, automobile ownership, property ownership, socioeconomic status, and education) to discover prospective drivers of alternative fuel vehicles in Birmingham, United Kingdom. In Ward's approach, the sum of squares (distance) between an item in the first cluster and an object in the second cluster is calculated and then totaled over all variables. This strategy tends to form clusters of roughly equal proportions. An overview of the clustering approach used in [3] is shown in Figure 10.
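The merging rule described above (start with one cluster per item and repeatedly merge the pair that causes the smallest increase in the within-cluster sum of squares) can be sketched as follows. The function names and toy data are hypothetical, and the naive pairwise search is chosen for readability rather than efficiency.

```python
def ward_cost(a, b):
    """Increase in the within-cluster sum of squares if clusters a and b are merged."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
    return a["n"] * b["n"] / (a["n"] + b["n"]) * sq_dist

def ward_clustering(points, n_clusters):
    """Agglomerative clustering with Ward's criterion; returns lists of point indices."""
    # Initialization: every point is its own cluster.
    clusters = [{"members": [i], "n": 1, "centroid": list(p)} for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        # Find the pair of clusters whose merge is cheapest.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        a, b = clusters[i], clusters[j]
        n = a["n"] + b["n"]
        merged = {
            "members": a["members"] + b["members"],
            "n": n,
            "centroid": [(a["n"] * x + b["n"] * y) / n
                         for x, y in zip(a["centroid"], b["centroid"])],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c["members"]) for c in clusters]

# Toy data (hypothetical): two well-separated groups of drivers in a 2-D feature space.
drivers = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (8.0, 8.0), (8.1, 7.9), (7.9, 8.2)]
groups = ward_clustering(drivers, n_clusters=2)
print(sorted(groups))  # [[0, 1, 2], [3, 4, 5]]
```

Because the merge cost grows with the sizes of the two clusters, large clusters are merged reluctantly, which is consistent with the tendency toward clusters of similar size noted above.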

Finally, a summary of the different clustering methods used for EV clustering based on user behavior is shown in Table 3.

| Method Used | Clustering Objective | Ref. |
|---|---|---|
| K-means and DBSCAN | Classify EV consumers to improve profitability and user loyalty | [18] |
| K-means clustering with multilayer perceptron | EV user behavior for charging schedules | [59] |
| Coefficient matrix with density clustering | Grouping based on the trust among users | [60] |
| Gaussian Mixture Model | User classes based on their separate charging session portfolios | [61] |
| Agglomerative clustering | To estimate EV charging load for long-term planning | [62] |
| Ward's cluster analysis | Find prospective drivers of alternative fuel vehicles | [3] |

EV Driving Cycle Clustering
Driving cycles are speed-time profiles representing real-world driving conditions in a particular city or region [63]. They can be used during laboratory chassis dynamometer simulation tests and in automotive simulation research to evaluate fuel consumption and exhaust emissions. In addition, driving cycles can be used to monitor energy consumption and estimate the driving range of EVs. Moreover, driving cycles are essential for realistic life cycle studies and evaluating the impacts of EVs on the power system [63,64]. Therefore, several researchers have clustered EV driving cycles using different methods to reduce the computational burden of the analysis. An overview of these studies is presented in the following sections.

Hard Clustering Approaches
In hard clustering, each data point belongs to exactly one cluster. Several researchers have used hard clustering methods to analyze the EV driving cycle. For example, Berzi et al. [19] developed EV driving cycles by performing driving sequence analysis to group different EV clusters. Similarly, Brady and O'Mahony [63] estimated different microsegment parameters and carried out driving cycle synthesis using data segmentation and classification techniques. An overview of the clustering process proposed in [63] is shown in Figure 11. Fotouhi and Montazeri-Gh [64] used K-means clustering to group vehicles into four clusters considering two driving features: the average speed and the idle time percentage. K-means clustering was used by Yuhui et al. [65] to design a target driving cycle using six characteristic parameters. Zhou et al. [66] and Chen and Xiong [66] used driving time and instantaneous velocity to develop the driving cycle of EVs using K-means clustering. Zhang et al. [67] used principal component analysis and K-means clustering for driving cycle estimation of special-purpose EVs. The K-means algorithm has also been combined with other techniques for cluster analysis of EV driving cycles. Zhao et al. [68,69] classified driving segments using a hybrid classification technique combining K-means and a support vector machine (SVM). In [69], the SVM model training sets comprised the top 10% of optimal driving segments from the K-means clustering results. The K-means clustering method employs the Euclidean distance between the driving cycles and the cluster center as a classification metric and offers the benefits of efficiency and simplicity. However, it is a hard clustering approach; when driving segment clustering includes numerous classes or the distance between cluster centers is short, the clustering effect is weak, and the algorithm may slip into a local optimum and cannot cluster incrementally. Therefore, other researchers have used soft clustering methods in the studies summarized in the following subsection.
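The Euclidean-distance assignment used by K-means can be sketched with two driving features, average speed and idle time percentage. The driving segments, the initial centers, and the function name below are hypothetical and serve only to illustrate the assignment and update steps; in practice the features would usually be scaled before clustering.

```python
from math import dist

def kmeans(points, centers, n_iter=100):
    """Lloyd's algorithm with Euclidean distance; `centers` are the initial centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each segment goes to exactly one cluster (hard clustering).
        labels = [min(range(len(centers)), key=lambda k: dist(p, centers[k]))
                  for p in points]
        # Update step: each center moves to the mean of its assigned segments.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[k])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Hypothetical driving segments: (average speed in km/h, idle time in %).
segments = [(12, 45), (15, 40), (10, 50),   # congested urban
            (30, 25), (28, 30), (32, 22),   # urban
            (45, 10), (50, 8), (48, 12),    # suburban
            (90, 1), (95, 0), (88, 2)]      # highway
initial = [segments[0], segments[3], segments[6], segments[9]]
labels, centers = kmeans(segments, initial)
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```

The result depends on the initial centers, which is one way the sensitivity to local optima mentioned above shows up in practice.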

Soft Clustering Approaches
In soft clustering, data points can belong to more than one cluster. In contrast to the K-means method, the fuzzy C-means (FCM) clustering algorithm calculates the degree to which each sample point is similar to each class; this value is referred to as the class membership degree. Zhao et al. [70] used this method to cluster the driving segments. In this scheme, a membership degree matrix reflects the likelihood that samples correspond to specific classes. Therefore, the hard clustering of the K-means algorithm is changed into fuzzy clustering based on soft membership, which has a greater chance of achieving global optimality. The fundamental concept of the FCM method is to iteratively search for the membership degree matrix and clustering centers that minimize the objective function. An overview of the representative cycle selection based on soft clustering is presented in Figure 12.
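The iterative search for the membership degree matrix and the clustering centers can be sketched as follows. This is a minimal illustration of the standard FCM updates rather than the implementation of [70]; the fuzzifier `m = 2`, the toy points, and the initial centers are assumptions.

```python
from math import dist

def fuzzy_c_means(points, centers, m=2.0, n_iter=100, tol=1e-6):
    """Alternate membership and center updates until the centers stop moving.

    Returns the final centers and the membership matrix U, where U[i][k] is the
    membership degree of point i in cluster k.
    """
    centers = [tuple(c) for c in centers]
    U = []
    for _ in range(n_iter):
        # Membership update: u_ik is proportional to d_ik^(-2 / (m - 1)).
        U = []
        for p in points:
            d = [max(dist(p, v), 1e-12) for v in centers]  # guard against zero distance
            inv = [dk ** (-2.0 / (m - 1.0)) for dk in d]
            total = sum(inv)
            U.append([w / total for w in inv])
        # Center update: mean of the points weighted by u_ik^m.
        new_centers = []
        for k in range(len(centers)):
            wts = [u[k] ** m for u in U]
            total = sum(wts)
            new_centers.append(tuple(sum(w * x for w, x in zip(wts, col)) / total
                                     for col in zip(*points)))
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, U

# Toy data (hypothetical): two groups and one point lying between them.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5.5, 5.5)]
centers, U = fuzzy_c_means(points, [(0, 0), (10, 10)])
```

Each row of `U` sums to one. Points inside a group have a membership degree close to one for that group, while the point between the groups receives intermediate degrees for both clusters, which a hard assignment cannot express.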


Other Approaches
Apart from hard and soft clustering approaches, other methods have also been used for estimating the driving cycles of EVs. For example, Chen et al. [71] classified various driving cycles into six distinct types using the K-Shape clustering technique, a new algorithm that preserves the shapes of the driving cycle data. This algorithm is suitable for clustering time series; therefore, it is not necessary to extract features describing driving cycle characteristics. The authors compared this new approach to the common K-means algorithm for grouping driving cycles and concluded that the K-Shape approach performs better.
Due to the absence of a detailed methodology for driving cycle analysis in [71], Wang et al. [72] proposed a "dimension reduction, clustering"-based driving cycle construction method that uses an advanced machine learning method for the offline solution of the SDP problem. It is based on the identification of driving conditions, and its objective is to minimize the number of driving cycle characteristics while preserving the travel information included in the driving cycle data. An overview of the offline training and online testing method is shown in Figure 13.

Finally, a summary of the different clustering methods used for the clustering of EVs for driving cycle analysis is shown in Table 4.

| Method Used | Clustering Objective | Ref. |
|---|---|---|
| K-means | Group different EV clusters for driving sequence analysis | [19] |
| K-means | To devise a target driving cycle | [65] |
| K-means | To develop the driving cycle of EVs | [66] |
| K-means and support vector machine | To perform clustering even with numerous classes or with a short distance between cluster centers | [69] |
| Fuzzy C-means clustering | To cluster the driving segments | [70] |
| K-Shape clustering technique | To choose driving cycle characteristics | [71] |
| Dimension reduction clustering | To reduce the driving cycle features required for clustering while preserving the travel information | [72] |

EV Battery Clustering
Lithium-ion batteries (LIBs) are commonly used in EVs because of their benefits, such as extended service life, high safety, and substantial specific energy. However, their capacity degrades with usage. Once the capacity drops to 80% of its original rated capacity, an LIB meets the criteria for being retired from EVs. Due to the fast growth in the number of EVs, there will be an immediate need to address the optimal usage of decommissioned power LIBs. It is expected that more than 12 million tons of LIBs will be retired by 2030. There has been a surge of attention to the retirement of power LIBs from EVs in order to reduce resource waste and pollution. Clustering and regrouping large-scale decommissioned LIBs are now the most important ways to achieve optimal echelon use [73].

Fuzzy Clustering Methods
Several studies [15,44,68,69] used fuzzy clustering methods to classify LIBs. Hu and Sun [20] proposed a new model to evaluate the state of charge (SOC) of lithium-ion batteries used in EVs. A combination of fuzzy c-means and subtractive clustering is used to perform fuzzy partitioning of data vectors, including the temperature, load voltage, and current of the lithium-ion battery pack under the urban dynamometer driving schedule. Then, the multi-model support vector regression (SVR) approach was used to estimate the SOC of a lithium-ion battery pack. The synthesized model was evaluated using 2000 training samples and 3500 validation samples. Simulation results indicate that the mean validation error of the fuzzy clustering-based multi-model SVR technique is less than that of the conventional SVR model.
In addition, using machine learning, Hu et al. [74] created a state-of-charge indicator for LIB modules used in EVs. To identify the model's topology and antecedent parameters, they used a novel fuzzy C-means clustering technique based on a genetic algorithm. This reduced the risk of being trapped in local minima, and the number of fuzzy clusters was then estimated using a fast one-pass algorithm called the subtractive clustering algorithm. The second stage uses the backpropagation learning technique to improve the model's antecedent and consequent parameters.
Similarly, Tian et al. [49] clustered batteries using an enhanced fuzzy clustering approach based on a genetic algorithm. In addition, they used the Kernel Function (KF) to optimize the clustering center. The KF maps the samples of the original space into the feature space. Samples in the feature space were separated to obtain the best partition of the original space, enhancing the efficiency of clustering. Finally, nine months of data from EVs were compiled to verify the suggested algorithms. The simulation results demonstrated that the proposed technique clusters batteries more effectively. An overview of the battery clustering method proposed in [49] is shown in Figure 14. The fuzzy c-means clustering algorithm was also used by Wang et al. [75] to estimate the state of function (SOF) of the power LIBs.

Support Vector Machine-Based Methods
SVM-based methods have also been used for grouping used EV batteries. For example, Li et al. [73] developed an SVM-based approach for clustering and regrouping retired LIBs. Preliminary screening (based on battery capacity, internal resistance, and remaining useful life) was used to eliminate batteries with no echelon usage value. On the basis of support vector clustering (SVC), they developed an equal-number clustering technique. Using a publicly available validation data set, the proposed method correctly split 60 batteries into four even clusters. The proposed algorithm was compared with the K-means and Gaussian mixture model clustering methods, and the results indicated that the equal-number SVC technique is very promising. An overview of the EV battery clustering method proposed in [73] is shown in Figure 15. Li et al. [26] used K-means and SVC methods to group battery cells with similar performance to construct battery modules with improved electrochemical performance. The results of the cluster analysis were experimentally validated by monitoring the cell temperature increase during a specified period in an air-conditioned environment.

Other Methods
Apart from fuzzy clustering and SVM-based methods, other clustering methods have also been used for grouping used EV batteries. For example, Liu [76] used a modified K-means clustering algorithm to classify EVs with different battery states of charge and different average vehicle daily travel (AVDT). The principle of this battery clustering method is shown in Figure 16. Similarly, Xu et al. [77] introduced a new clustering approach for retired batteries based on traversal optimization to reduce computation time and increase clustering accuracy. This approach does not need predefined cluster numbers and centers, and the clustering outcome is not affected by outliers. In addition to avoiding repeated computation, this approach completes clustering by visiting all target locations. This way, the optimization process is not iterative and scales well to large sample sets. Compared to existing clustering methods, the new algorithm generates partitions with high disparity between clusters and the lowest differences between points within clusters.
A summary of different clustering methods used for EV battery clustering is shown in Table 5.
Table 5. Summary of clustering methods along with their objectives for EV battery grouping.

Method Used
Clustering Objective Ref. Fuzzy c-means and subtractive clustering SOC estimation of different EV clusters [20] Fuzzy C-means based on a genetic SOC indicator for lithium-ion battery Li et al. [26] used K-means and SVC methods to group battery cells with similar performance to construct battery modules with improved electrochemical performance.The results of the cluster analysis were experimentally validated by monitoring the cell temperature increase during a specified period in an air-conditioned environment.

Other Methods
Apart from fuzzy clustering and SVM-based methods, other clustering methods have also been used for grouping used EV batteries.For example, Liu [76] used a modified K-means clustering algorithm to classify EVs with different battery states of charge and different average daily vehicle travel (AVDT).The principle of this battery clustering method is shown in Figure 16.Li et al. [26] used K-means and SVC methods to group battery cells with similar performance to construct battery modules with improved electrochemical performance.The results of the cluster analysis were experimentally validated by monitoring the cell temperature increase during a specified period in an air-conditioned environment.

Other Methods
Apart from fuzzy clustering and SVM-based methods, other clustering methods have also been used for grouping used EV batteries.For example, Liu [76] used a modified K-means clustering algorithm to classify EVs with different battery states of charge and different average daily vehicle travel (AVDT).The principle of this battery clustering method is shown in Figure 16.Similarly, Xu et al. [77] introduced a new clustering approach for retired batteries based on traversal optimization to reduce computation time and increase clustering accuracy.This approach does not need predefined cluster numbers and centers, and the clustering outcome is independent of outliers.In addition to avoiding repeated computation, this approach completes clustering by visiting all target locations.This way, the optimization process is not iterative and scales well to large sample sets.Compared to existing clustering methods, the new algorithm generates partitions with high disparity between clusters and the lowest differences between points within clusters.
A summary of different clustering methods used for EV battery clustering is shown in Table 5.
Table 5. Summary of clustering methods along with their objectives for EV battery grouping.

| Method Used | Clustering Objective | Ref. |
|---|---|---|
| Fuzzy C-means and subtractive clustering | SOC estimation of different EV clusters | [20] |
| Fuzzy C-means based on a genetic algorithm | SOC indicator for lithium-ion battery modules used in EVs | [74] |
| Innovative equal-number support vector clustering | Clustering and regrouping retired LIBs | [73] |
| K-means and support vector clustering | Cluster battery cells with similar performance to construct battery modules with improved performance | [26] |
| Modified K-means clustering | Classify EVs based on battery SOC and different average vehicle daily travel | [76] |
| A novel clustering approach | Grouping retired batteries based on traversal optimization | [77] |

EV Charging Station Clustering
As EV ownership expands, the number of charging stations also increases. The construction of a charging station requires a significant investment. Only with an optimum placement can charging stations save a substantial amount of money, offer users convenience, and increase their operational efficiency. Therefore, it is crucial to also include relevant studies on this topic [44]. Clustering can eliminate the need for analysis of individual charging stations by grouping stations with similar profiles together. Several studies on clustering EV charging stations are analyzed in the following sections.

K-Means Clustering
K-means is the most widely used clustering technique in general and has also been used for charging station clustering. For example, Sánchez et al. [21] developed a clustering technique based on the K-means algorithm to partition consumers into small zones and identify potential locations of EV charging stations. Each centroid of the partition indicates a possible location for a charging station, while each cluster represents a customer region. An overview of the clustering method used in [21] for charging station clustering is shown in Figure 17. Similarly, Chen et al. [78] employed the K-means clustering technique to compute the number of charging stations for EVs and their locations.
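The idea that each centroid marks a candidate station and each cluster a customer region can be illustrated with a minimal sketch. It is not the implementation of [21] or [78]: the function name `kmeans`, the consumer coordinates, and the choice of two sites are assumptions made only for illustration, and plain Lloyd iterations are used.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct consumers
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each consumer joins the nearest candidate site.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each site to the mean position of its consumers.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's site
        if new_centroids == centroids:
            break  # sites stopped moving
        centroids = new_centroids
    return centroids, labels


# Hypothetical consumer coordinates (km) forming two well-separated zones.
consumers = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
             (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
sites, zones = kmeans(consumers, k=2)
```

Here `sites` holds the candidate station coordinates and `zones` the zone index of each consumer. In practice the number of sites would have to be chosen separately, for example from budget or coverage constraints.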

Hierarchical Clustering
Hierarchical clustering has also been used for grouping EV charging stations. For example, Zhang et al. [48] used a hierarchical clustering method and a quadratic division based on K-means to group the charging demand locations of EVs. Similarly, Catalbas et al. [50] estimated the optimal charging station locations of EVs for Ankara using various clustering approaches such as spectral clustering and the Gaussian Mixture Model. Ip et al. [79] implemented a two-step framework. First, road traffic data, such as traffic flows, were converted into data points. Then, an agglomerative hierarchical approach was applied to the data points to produce different levels of clusters. The stations were then assigned to demand clusters using linear programming for optimization purposes. An overview of the proposed method for demand estimation based on charging station clustering is shown in Figure 18.
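How an agglomerative approach produces "different levels of clusters" can be sketched as follows. This is an illustrative example only, not the method of [79]: the average-linkage criterion, the function names `agglomerate` and `cut`, and the demand points are assumptions, and the linear-programming assignment step is not shown.

```python
import math


def agglomerate(points):
    """Average-linkage agglomerative clustering.

    Returns a list of levels; levels[i] is the partition (lists of point
    indices) obtained after i merges, so levels[0] has one cluster per point.
    """
    clusters = [[i] for i in range(len(points))]
    levels = [[c[:] for c in clusters]]

    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge it.
        _, x, y = min((linkage(clusters[x], clusters[y]), x, y)
                      for x in range(len(clusters))
                      for y in range(x + 1, len(clusters)))
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (x, y)]
        clusters.append(merged)
        levels.append([c[:] for c in clusters])
    return levels


def cut(levels, k):
    """Return the level of the hierarchy that contains exactly k clusters."""
    return levels[len(levels) - k]


# Hypothetical demand points derived from traffic data (coordinates in km).
demand = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
levels = agglomerate(demand)
three_regions = cut(levels, 3)
two_regions = cut(levels, 2)
```

Cutting the hierarchy at different values of `k` gives coarser or finer demand regions without rerunning the clustering, which is the practical appeal of a hierarchical method for planning at several spatial scales.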

Other Clustering Methods
Apart from K-means and hierarchical algorithms, several other methods have also been used for charging station clustering. For example, Momtazpour et al. [80] used coordinated clustering algorithms to find a collection of places that are optimal candidates for charging stations. Shi and Zheng [44] used the fuzzy C-means clustering method to investigate the optimal location of charging stations. The first stage is to collect charging information from an urban region and then represent the charging demand areas as separate data points over a control grid. Finally, the fuzzy C-means clustering algorithm is used to group the spatial data points into clusters of similar points. An overview of the coordinated charging scheme proposed in [44] based on charging station clustering is shown in Figure 19. Finally, a summary of different clustering methods used for EV charging station clustering is shown in Table 6.

Table 6. Summary of clustering methods along with their objectives for EV charging station clustering.

| Method Used | Clustering Objective | Ref. |
|---|---|---|
| K-means algorithm | Finding prospective recharging station locations | [21] |
| K-means algorithm | Number of charging stations for EVs and the location of charging stations | [78] |
| Spectral clustering and the Gaussian Mixture Model | Optimal charging station locations for EVs | [50] |
| Agglomerative hierarchical approach | Different levels of clusters for charging stations | [79] |
| Fuzzy C-means clustering method | Optimal location of charging stations | [44] |
| Coordinated clustering algorithms | Optimal candidates for charging stations | [80] |
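The fuzzy C-means step used for spatial demand points can be sketched in a few lines. This is a generic textbook formulation and not the code of [44]: the fuzzifier `m = 2`, the initialization from randomly chosen data points, the function names, and the grid points are all assumptions for illustration.

```python
import math
import random


def memberships(points, centers, m):
    """Membership of every point in every cluster; each row sums to 1."""
    power = 2.0 / (m - 1.0)
    table = []
    for p in points:
        # Clip distances so a point lying exactly on a center does not divide by zero.
        d = [max(math.dist(p, c), 1e-12) for c in centers]
        table.append([1.0 / sum((dj / dk) ** power for dk in d) for dj in d])
    return table


def fuzzy_c_means(points, n_clusters, m=2.0, iters=200, tol=1e-9, seed=0):
    """Alternate membership and center updates until the centers stop moving."""
    rng = random.Random(seed)
    centers = rng.sample(points, n_clusters)
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for j in range(n_clusters):
            w = [row[j] ** m for row in u]  # fuzzified weights of cluster j
            total = sum(w)
            new_centers.append(tuple(
                sum(wi * p[d] for wi, p in zip(w, points)) / total
                for d in range(len(points[0]))))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)


# Hypothetical charging-demand points on a control grid (km).
grid_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
               (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
centers, u = fuzzy_c_means(grid_points, n_clusters=2)
hard = [max(range(2), key=lambda j: row[j]) for row in u]
```

Unlike K-means, every point keeps a graded membership in each cluster, so demand points lying between two candidate stations can be shared rather than forced into one region. The list `hard` is only a convenience that picks the strongest membership per point.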

Summary of Selected Studies in Each Category
Four major aspects of EV clustering were discussed in the previous sections: EV user behavior, driving cycle, EV battery, and charging stations. A few representative papers are selected from each category, and their advantages and drawbacks are summarized in Table 7. The drawbacks of each study represent open research questions in its category and point to future research directions for researchers in these areas.

Shortcomings in Existing Studies and Future Research Directions
As described in the previous sections, a number of studies have been conducted on clustering EVs considering their different aspects. However, more research is required to reduce the computational burden for detailed analysis with a higher penetration of EVs. The following are some of the areas that deserve further attention, as there is limited or no research available in the current literature.

Impact of EVs on the Distribution System
With a higher percentage of EVs, the load profiles of the distribution circuits are expected to change significantly. In addition, the loading profiles of different circuits also change due to the different penetration levels of EVs in different localities. Therefore, the grouping of distribution circuits is required to reduce the computational burden during analysis. Distribution circuits can be grouped into several clusters depending on their characteristics; circuits within the same cluster will have similar load profiles, while circuits from separate clusters will have different profiles. This approach decreases the variety of circuit attributes in each cluster, so that a single typical circuit can describe the features of the group more accurately. Thus, distinct circuit groups define different system features, and a typical circuit represents each group of related circuits.
Only a few studies have been conducted on this topic. For example, Xu et al. [52] proposed a plug-in EV impact assessment framework that uses a K-medoids clustering algorithm to select a small number of representative circuits from thousands of distribution circuits and conducts the impact study using Monte Carlo simulation on the representative circuits. The impact at the feeder level is then extrapolated to the system level. An overview of the proposed method is shown in Figure 20. Similarly, to assess the impact of electric cars on the electric power distribution system, Dow et al. [81] clustered the entire set of utility feeders using the K-medoids technique. With K clusters, each feeder in the data set is grouped into one of the clusters. However, more research is required in this area to facilitate the analysis of the distribution circuits with higher penetration of EVs.
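Selecting representative circuits with K-medoids can be sketched as below. The example is a simplified alternating (assign, then update) variant and not the procedure of [52] or [81]: the function name `k_medoids`, the two feeder features, and their values are hypothetical, and the feeders are assumed to have distinct feature vectors.

```python
import math
import random


def k_medoids(items, k, iters=100, seed=0):
    """Alternating K-medoids; returns (sorted medoid indices, clusters)."""
    n = len(items)
    dist = [[math.dist(a, b) for b in items] for a in items]

    def assign(medoids):
        # Each feeder joins the cluster of its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            clusters[min(medoids, key=lambda m: dist[i][m])].append(i)
        return clusters

    medoids = random.Random(seed).sample(range(n), k)
    for _ in range(iters):
        clusters = assign(medoids)
        # The new medoid is the member with the smallest total distance
        # to the other members of its cluster.
        new_medoids = [min(members, key=lambda c: sum(dist[c][j] for j in members))
                       for members in clusters.values()]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    medoids = sorted(medoids)
    return medoids, assign(medoids)


# Hypothetical feeder features: (peak load in MW, circuit length in km).
feeders = [(1.0, 2.0), (1.2, 2.1), (0.8, 1.9),
           (5.0, 8.0), (5.2, 8.1), (4.8, 7.9)]
reps, groups = k_medoids(feeders, k=2)
```

Because a medoid is always an actual member of the data set, `reps` indexes real circuits that can be simulated in detail, which is why K-medoids suits representative-circuit selection better than a centroid-based method whose centers need not correspond to any existing feeder. In practice the features would usually be normalized before computing distances.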

Charging Stations for Emergency Response
The intensity and frequency of natural disasters and man-made events are increasing due to climate change and the increased penetration of information and communication technology (ICT) in the power sector [23]. In the electrified transportation era, EVs will be used for emergency response and evacuation as well. Therefore, appropriate charging infrastructure is required in different locations to cope with emergencies. Cluster analysis can potentially be applied in this area as well. Specifically, different localities need to be grouped according to their ability to respond to a large-scale outage. This will help policymakers identify areas with inadequate charging infrastructure so that they can be prioritized for future development. More research is required in this area, especially considering the application of the different cluster analysis techniques discussed in this paper.

Disparities and Equity in Rebate Allocations
To increase the adoption of EVs, governments around the world have introduced different rebates and incentive programs. Proper allocation of rebate programs is required to maximize their benefit, and to ensure equity and reduce disparity in different localities. For example, a study conducted in California [24] has revealed that the initial EV rebate programs were more focused on high-income groups. This study also noted that the share of rebate programs for low-income/disadvantaged communities increased later when an income cap policy was put into effect. Cluster analysis can also be applied to identify different groups and target them to make EVs affordable to all, especially to low-income groups and disadvantaged communities. A very limited number of studies have been conducted on this topic, especially with the consideration of cluster analysis.

Model-Free Analysis
The increased penetration of EVs has necessitated detailed analysis of power systems, especially distribution networks at different levels. However, modeling power systems in detail is a time-consuming task and the analysis of each region is difficult. Cluster analysis can be combined with neural networks to generate synthetic data for different regions and train the model using historical and synthetic data. Such an approach is proposed in [82] by dividing the distribution circuits into four categories (clusters). The authors note that the proposed method can produce accurate time series scenarios, under different EV penetration levels, to ensure stable power system operation. More research is required in this area to facilitate the analysis of power systems considering different levels of EV penetration in different regions.

Clustering with Big Data for EVs
With the rapid development in electronics and ICT, all vehicles, and especially EVs, are equipped with more and more sensors and intelligence. This generates more data which can be used to manage different aspects of transportation in the electrified transportation era of the near future. For example, mitigation of transportation network congestion is proposed in [25] using big data and cluster analysis techniques to group/cluster different localities based on the traffic flow. Then, rerouting of EVs is considered to facilitate a smooth flow of traffic under different network congestion levels. More research is required on this topic to facilitate the increased penetration of EVs and to mitigate the existing congestion problems in the transportation sector.

Conclusions
This article presents a three-step analysis to review the application of clustering methods for different problems related to EVs. First, an overview of different existing clustering methods is provided. Then, the application of different clustering methods for diverse areas in EVs is reviewed. Finally, the research gaps in the existing literature are identified and future research directions are outlined.
The analysis has shown that the application of cluster analysis has gained popularity in the area of electromobility, and a number of studies have used clustering methods to address different related problems. The most widely applied areas identified in this study are the application of clustering methods to model EV user behavior, the EV driving cycle, the classification of used EV batteries, and clustering of EV charging stations. In addition, several potential areas have been identified in which the application of cluster analysis can bring new benefits. The prospective areas identified in this study are mitigation of the EV impact on distribution systems, development and coordination of charging infrastructure during emergencies, issues of equity and disparities in rebate allocations, and the use of big data with cluster analysis to assist transportation network management.

Figure 4 .
There are several subcategories of the agglomerative hierarchical clustering algorithms: Electronics 2023, 12, x FOR PEER REVIEW 4 of 24
J.C. Dunn developed fuzzy c-means (FCM) clustering in 1973, and J. C. Bezdek improved it in 1981

Figure 7. Overview of K-means based EV clustering method proposed in [18].

DBSCAN Algorithm
The DBSCAN algorithm has also been used by a number of studies to cluster EVs based on user behavior. For example, Fan et al. [60] clustered EV users by combining the coefficient matrix with density clustering. The grouping comprises clustering based on user preferences and clustering based on item similarity. User preference clustering refers to grouping based on trust between users. It first computes the correlation coefficient matrix between users and then performs density clustering based on the similarity matrix. The premise of density clustering is that samples of the same category are strongly connected; that is, samples of the same category must be close to each sample in this category. A cluster category is generated by grouping closely related samples into one category. When all samples are divided into distinct groups, the findings of all clustering groups can be collected. Item similarity clustering also refers to the setup of similarity clustering for new EVs. Based on the findings of the user clustering, the score of the new EV configurations in each user group is determined by item similarity. This way, the degree of preference of each type of user for various configurations can be easily understood. An overview of the approach used in [60] for EV clustering is shown in Figure 8.
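The two-stage idea of computing a user correlation matrix and then running density clustering on it can be sketched as follows. This is a generic DBSCAN on a correlation-based distance, not the algorithm of [60]: the distance `1 - correlation`, the parameters `eps` and `min_pts`, the function names, and the preference scores are assumptions, and every score vector is assumed to be non-constant so that its correlation is defined.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long score vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def dbscan(dist, eps, min_pts):
    """DBSCAN on a precomputed distance matrix; label -1 marks noise."""
    n = len(dist)
    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = [q for q in range(n) if dist[j][q] <= eps]
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
    return labels


# Hypothetical preference scores of seven users for four EV configurations.
scores = [[5, 4, 1, 1], [4, 5, 2, 1], [5, 5, 1, 2],
          [1, 2, 5, 4], [2, 1, 4, 5], [1, 1, 5, 5],
          [3, 1, 3, 1]]
# Strongly correlated users are close; anti-correlated users are far apart.
distance = [[1.0 - pearson(a, b) for b in scores] for a in scores]
labels = dbscan(distance, eps=0.3, min_pts=3)
```

The number of user groups is not fixed in advance, and a user whose preferences correlate with nobody is left as noise instead of being forced into a group.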

Figure 9. Overview of hybrid method for EV clustering proposed in [61].

Table 3. Summary of clustering methods along with their objectives for user behavior-based clustering of EVs.

the driving cycle of EVs using K-means clustering. Principal component analysis and K-means clustering are used by Zhang et al. [67] for driving cycle estimation of special-purpose EVs. The K-means algorithm can also be used for cluster analysis of EV driving cycles. Zhao et al. [68,69] classified driving segments using a hybrid classification technique combining K-means and support vector machine (SVM). In [69], the SVM model training sets comprised the top 10% of optimal driving segments from the K-means clustering results.

Figure 12. Overview of EV clustering based on soft clustering proposed in [70].

Figure 13. Overview of offline and online method for EV clustering proposed in [72].

Table 4. Summary of clustering methods along with their objectives for driving cycle analysis of EVs.
clustering technique To choose driving cycle characteristics [71]

Figure 14. Overview of used battery clustering method proposed in [49].
Support Vector Machine-Based Methods
SVM-based methods have also been used for grouping of used EV batteries. For example, Li et al. [73] developed an SVM-based approach for clustering and regrouping retired LIBs. Preliminary screening (based on battery capacity, internal resistance, and re-

Figure 15. Overview of EV battery clustering based on capacity and internal resistance proposed in [67].

Figure 16. Overview of EV battery clustering based on SOC and daily mileage [76].

Figure 18. Overview of demand estimation based on charging station clustering [79].

Figure 19. EV coordinated charging scheme based on charging station clustering [44].

Table 1. Advantages and disadvantages of agglomerative hierarchical clustering methods.

Table 2. Advantages and disadvantages of partitional clustering methods.

Table 5. Summary of clustering methods along with their objectives for EV battery grouping.

Table 6. Summary of clustering methods along with their objectives for EV charging station clustering.

Table 7. Summary of representative papers from each EV clustering category.