Machine Learning for Solving Charging Infrastructure Planning Problems: A Comprehensive Review

As a result of environmental pollution and the ever-growing demand for energy, there has been a shift from conventional vehicles towards electric vehicles (EVs). Public acceptance of EVs and their large-scale deployment require a fully operational charging infrastructure. Charging infrastructure planning is an intricate process involving various activities, such as charging station placement, charging demand prediction, and charging scheduling. This planning process involves interactions between the power distribution network and the road network. The advent of machine learning has made data-driven approaches a viable means of solving charging infrastructure planning problems, and researchers have consequently started using machine learning techniques to address them. This work provides a comprehensive review of the machine learning applications used to solve charging infrastructure planning problems. Furthermore, three case studies on charging station placement and charging demand prediction are presented. This paper is an extension of: Deb, S. (2021, June). Machine Learning for Solving Charging Infrastructure Planning: A Comprehensive Review. In the 2021 5th International Conference on Smart Grid and Smart Cities (ICSGSC) (pp. 16–22). IEEE. The conference version has been extended by more than 50%.


Introduction
Global energy consumption is increasing at an alarming rate, and the transportation sector is one of the largest consumers [1]. In 2019, approximately 28% of the net energy consumption in the US was used to move people and goods [2]. Furthermore, the transport sector has been reported to be one of the major contributors to air pollution [3][4][5]. The paradigm shift from internal combustion engine (ICE)-driven vehicles to EVs is a viable way to mitigate serious concerns regarding the energy crisis and air pollution. The large-scale adoption of EVs requires a fully operational charging infrastructure. Charging infrastructure planning involves interactions between the road network and the power distribution network. Placing chargers at weak points in the power distribution network and uncoordinated charging can result in voltage instability, increased power losses, harmonic distortions, and degraded reliability indices [6][7][8][9][10][11][12]. Charging infrastructure planning must also take into account the convenience of EV drivers, for example, the accessibility of the charging stations and the waiting time at them [13]. Moreover, smart coordinated charging is preferred over uncoordinated charging to mitigate the detrimental impact of EV charging on the grid [14]. Charging infrastructure planning is thus a multifaceted problem involving a number of decision variables, objective functions, and constraints. Researchers have used heuristics [15,16], metaheuristics [17], machine learning [18], and game theory [19,20] to solve these problems.
In recent years, the advent of machine learning has made data-driven approaches popular for solving charging infrastructure planning problems. Consequently, researchers have started using machine learning techniques to solve the problems associated with charging infrastructure planning, such as charging station placement, charging demand prediction, charger utilization computation, and charging scheduling. The main contributions of this work are as follows:

• A comprehensive review of the applications of machine learning algorithms for charging infrastructure planning;
• Qualitative and quantitative analyses of the reported literature;
• Recommendations regarding the suitability of machine learning algorithms for solving charging infrastructure planning problems;
• Case studies on charging hotspot identification and charging demand prediction.

Overview of Charging Infrastructure Planning
Charging infrastructure planning is a prerequisite for the large-scale adoption of EVs. The different activities associated with charging infrastructure planning are shown in Figure 1. Charging demand prediction involves predicting the demand for charging services at different times of the day and in different locations. Charging station placement is a typical planning problem centered on the optimal allocation and sizing of charging stations, which takes into consideration economic factors, the operating parameters of the distribution network, and EV drivers' convenience. Charger utilization computation involves computing how much a charger is utilized, or how many charging events it has served. Charging scheduling involves managing charging activities based on the charging demand and load profile while ensuring that the power grid is not overloaded.

Machine Learning Techniques
In machine learning, the computer learns from previous experience without being explicitly programmed [51]. In this context, experience refers to the dataset that the algorithm uses to train itself [52]. As they accumulate experience, the models can predict trends accurately, thereby providing predictive analysis [51]. Typically, machine learning algorithms are categorized into supervised and unsupervised learning algorithms [51,53,54]. Furthermore, depending on the type of response variable, the problems that machine learning algorithms address can be divided into regression problems and classification problems [51]. If the response variable is continuous, the task is a regression problem; if the response variable is categorical, it is a classification problem [51]. In the context of charging infrastructure planning, charging demand prediction is a regression problem, as the response variable is continuous. In contrast, the identification of charging hotspots is a classification problem, because the response variable is categorical.
Data partitioning in machine learning is the division of all available data into two or three nonoverlapping sets: the training set, the validation set, and the test set. The model parameters are fitted to the training set, the validation set is used to tune the model, and the test set is used to assess prediction accuracy on data not seen during training. Partitioning can be performed by different techniques, such as hash partitioning, list partitioning, and composite partitioning [18].
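As a minimal sketch of how hash partitioning can yield nonoverlapping training, validation, and test sets, the function below hashes a stable record key (here a hypothetical charging-session ID) and assigns the record according to where the hash falls in [0, 1). The name `hash_split`, the 70/15/15 proportions, and the choice of MD5 are illustrative assumptions, not the procedure used in [18].

```python
import hashlib


def hash_split(keys, train_frac=0.70, val_frac=0.15):
    """Assign each record key to 'train', 'val', or 'test' by hashing it.

    The same key always lands in the same set, so the three sets never
    overlap and the split is reproducible across runs.
    """
    splits = {"train": [], "val": [], "test": []}
    for key in keys:
        # Map the key to a pseudo-uniform number u in [0, 1) using MD5.
        digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
        u = int(digest[:8], 16) / 16**8
        if u < train_frac:
            splits["train"].append(key)
        elif u < train_frac + val_frac:
            splits["val"].append(key)
        else:
            splits["test"].append(key)
    return splits


# Hypothetical example: split 1000 charging-session IDs.
sets = hash_split([f"session-{i}" for i in range(1000)])
print({name: len(members) for name, members in sets.items()})
```

Because the assignment depends only on the key, re-running the split, or adding new sessions later, never moves an existing record from one set to another.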
The classification of machine learning algorithms is shown in Figure 2. Detailed descriptions of these groups are provided in the subsequent subsections.

Supervised Learning
As the name indicates, supervised machine learning models are trained on labeled datasets [51,55,56], in which each example contains the input variables and the target variable. Model learning is iterative and works by learning a mapping between the inputs and the target output with the help of optimization [51]. As shown in Figure 2, supervised learning can be divided into five types. The linear regression model assumes a linear relationship between the input variables and the target variable [51]. Linear regression can be used for regression problems and for linearly separable datasets [54]. Decision trees can be used for both regression and classification problems [54]; they break a complex decision into a sequence of simpler decisions using split points [54,57,58]. In the random forest technique, the predictions of several decision trees are aggregated [59,60]. A support vector machine (SVM) is mainly used for classification problems, but can also be utilized for regression problems [61,62]. An SVM separates the classes with the hyperplane that maximizes the margin between them [18,62]. Because SVM training can be slow, SVMs are not well suited to large datasets [61,62]. K-nearest neighbors (KNN) can be used for both regression and classification problems [18,63,64], although it is mostly used for classification [18]. KNN is a lazy learner: it has no dedicated training phase and defers all computation to prediction time [18,63,64].
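To make the lazy-learning property of KNN concrete, the following minimal sketch stores the labeled examples in `fit` and does all of its work at prediction time, voting among the k closest points. The two features (daily charging sessions and mean energy per session) and the `hotspot`/`normal` labels are invented for illustration.

```python
import math
from collections import Counter


class KNNClassifier:
    """Minimal k-nearest-neighbors classifier (a lazy learner)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" only memorizes the labeled examples; no model is built.
        self.X = [list(x) for x in X]
        self.y = list(y)
        return self

    def predict_one(self, x):
        # All computation happens here: rank stored examples by Euclidean distance.
        ranked = sorted((math.dist(xi, x), yi) for xi, yi in zip(self.X, self.y))
        # Majority vote among the labels of the k nearest examples.
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]

    def predict(self, X):
        return [self.predict_one(x) for x in X]


# Hypothetical features: [charging sessions per day, mean energy per session (kWh)].
X_train = [[2, 10], [3, 12], [4, 9], [20, 30], [22, 28], [25, 35]]
y_train = ["normal", "normal", "normal", "hotspot", "hotspot", "hotspot"]
clf = KNNClassifier(k=3).fit(X_train, y_train)
print(clf.predict([[21, 29], [3, 10]]))
```

The cost profile is the mirror image of eager learners such as SVMs: fitting is trivial, but every prediction scans the whole training set.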

There is also another class known as semi-supervised learning, which combines a small amount of labeled data with a large amount of unlabeled data during training. Semi-supervised learning falls between unsupervised learning (with no labeled training data) and supervised learning (with only labeled training data) [52].
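A very simple form of semi-supervised learning is self-training with a nearest-neighbor rule: the unlabeled point closest to any labeled point receives that point's label and joins the labeled set, and the process repeats until no unlabeled points remain. The sketch below is a hypothetical illustration of this idea with made-up coordinates; practical self-training methods usually add a confidence threshold before accepting a pseudo-label.

```python
import math


def self_train_1nn(labeled, unlabeled):
    """Self-training with a 1-nearest-neighbor rule.

    labeled: list of (features, label) pairs; unlabeled: list of feature lists.
    Repeatedly pseudo-labels the unlabeled point nearest to any labeled point,
    then treats it as labeled for the next round.
    """
    labeled = list(labeled)
    pool = [list(x) for x in unlabeled]
    while pool:
        # Closest (unlabeled point, labeled point) pair; ties broken by pool index.
        _, idx, label = min(
            (math.dist(u, x), i, lab)
            for i, u in enumerate(pool)
            for x, lab in labeled
        )
        labeled.append((pool.pop(idx), label))
    return labeled


seed_labels = [([0, 0], "low"), ([10, 10], "high")]
result = self_train_1nn(seed_labels, [[1, 1], [2, 2], [9, 9], [8, 8]])
print(result)
```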

Unsupervised Learning
In the case of unsupervised learning, the training dataset comprises the input variables only [18,65,66]. The key goal is to find patterns within the dataset using techniques such as clustering [18,65,66]. The subdivisions of unsupervised learning are illustrated in Figure 2. In k-means clustering, k center points are first initialized at random [18,67]; each datapoint is then assigned to its nearest center, the centers are recomputed as the means of their assigned points, and these two steps are repeated until the assignments stop changing. The Gaussian mixture model (GMM) is a probabilistic model that represents subpopulations within the data by modeling the dataset as a mixture of multiple normal distributions [18]. The kernel density estimator (KDE) is a nonparametric method for estimating a probability density function [18].
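The k-means procedure can be sketched as Lloyd's algorithm in plain Python. The points below are hypothetical charging-request coordinates forming two well-separated groups; the seed, the iteration cap, and the rule of keeping an empty cluster's previous center are assumptions of this sketch.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means: random initial centers, then alternate the
    assignment and update steps until the centers stop moving."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centres[c]))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centres = []
        for j, members in enumerate(clusters):
            if members:
                new_centres.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centres.append(centres[j])  # keep an empty cluster's center
        if new_centres == centres:
            break  # converged: assignments can no longer change
        centres = new_centres
    return centres, clusters


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centres, clusters = kmeans(pts, 2)
print(centres)
```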

Performances of Machine Learning Algorithms
The performances of different machine learning algorithms can be compared on the basis of several metrics. For regression models, root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) are some of the metrics used for performance evaluation [18].
Ideally, the difference between the predicted value ŷ_i and the target value y_i should be small. For classification problems, the evaluation metrics are accuracy, precision, and F1 score [18].
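For reference, the standard definitions of these metrics are implemented below, with y_i the target value and ŷ_i the prediction. Recall is included only because the F1 score is the harmonic mean of precision and recall; the sample values are made up.

```python
import math


def rmse(y_true, y_pred):
    # sqrt( (1/n) * sum_i (y_i - yhat_i)^2 )
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    # (1/n) * sum_i |y_i - yhat_i|
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    # (100/n) * sum_i |(y_i - yhat_i) / y_i|; undefined if any y_i is zero.
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def classification_scores(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical hourly charging demand (kWh): targets vs. predictions.
print(rmse([10, 20, 30], [12, 18, 30]), mae([10, 20, 30], [12, 18, 30]))
# Hypothetical hotspot labels (1 = hotspot).
print(classification_scores([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```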

Machine Learning for Charging Infrastructure Planning
Applications of machine learning techniques for solving different charging infrastructure planning problems are shown in Figure 1.

Machine Learning for Charging Station Placement
The charging station placement problem involves determining the locations and sizes of chargers. In [68], the authors provided an optimal wireless charging station placement scheme for electric trams by applying an algorithm that hybridizes the genetic algorithm (GA) and reinforcement learning (RL). Integrating RL with GA improved the performance of GA by preventing it from becoming stuck in local optima. The superior performance of the hybrid GA-RL algorithm compared with the standalone algorithms is illustrated in Table 2. In [69], the authors provided a novel scheme for placing new charging stations that uses maximization of the charger utilization rate as the objective function. The problem was solved by hierarchical clustering [70]. In [71], the authors categorized charging stations as top ranked and bottom ranked using the linear regression model and decision trees; the simulation results established the superiority of the linear regression model over decision trees. In [72], a cellular automaton agent-based model was proposed to study different EV deployment scenarios.
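As a hypothetical illustration of how hierarchical clustering can support placement (not the formulation of [69], whose objective is the charger utilization rate), the sketch below agglomerates charging-request locations bottom-up with centroid linkage and returns one candidate site per remaining cluster. The coordinates and the number of sites are invented.

```python
import math


def agglomerative_sites(points, n_sites):
    """Bottom-up (agglomerative) clustering with centroid linkage.

    Starts with one cluster per demand point and repeatedly merges the two
    clusters whose centroids are closest, until n_sites clusters remain.
    Returns the centroid of each remaining cluster as a candidate site.
    """
    clusters = [[tuple(p)] for p in points]

    def centroid(cluster):
        return tuple(sum(c) / len(cluster) for c in zip(*cluster))

    while len(clusters) > n_sites:
        # Closest pair of clusters by centroid distance.
        _, i, j = min(
            (math.dist(centroid(a), centroid(b)), i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters)
            if i < j
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [centroid(c) for c in clusters], clusters


demand_points = [(0, 0), (0, 2), (10, 0), (10, 2), (5, 20)]
sites, groups = agglomerative_sites(demand_points, 3)
print(sites)
```

Stopping the merges at a different number of clusters yields a different number of candidate sites, which is one reason hierarchical clustering is convenient when the station budget is not fixed in advance.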

Machine Learning for Charging Demand Prediction
Accurately predicting the charging load is crucial for charging infrastructure planning and the large-scale adoption of EVs. In [73], the authors presented a novel scheme for predicting the aggregated load demand of buildings in the presence of EVs, based on feature selection and an enhanced SVM. In [74], the authors predicted the charging load of the UCLA campus by applying a modified pattern sequence-based technique. In [75], the authors used a deep learning approach to estimate multiscale EV charging demand. Moreover, in [76,77], an enhanced deep learning-based approach was used for charging load prediction. In [78], the authors used a hybrid of the ant lion algorithm and deep learning for charging demand prediction. In [79], the authors proposed a hybrid KDE combining Gaussian and diffusion-based KDE (GKDE and DKDE) to predict the stay duration and charging demand of EVs. In [80], the authors employed a generalized regression neural network (GRNN) model to predict the charging load. In [81], the authors predicted the charging demand of electric bus charging stations using an SVM and the wolf pack algorithm. In [82], the authors compared the performances of different deep learning approaches on the charging demand prediction problem and concluded that the long short-term memory (LSTM) method performed best, reducing the forecasting error by over 30%. In [83], the authors used a regression model to predict the charging load. In [84], the authors compared a time series approach with machine learning techniques, such as the random forest technique and the regression model, for charging demand prediction; the simulation results established the superiority of the machine learning techniques over the time series approach. In [85], the authors used ensemble learning to predict household EV charging demand.
Ensemble learning is a machine learning technique that makes predictions by combining the results of several different machine learning models. In [85], the ensemble learning model was based on the results of the random forest, gradient boosting, adaptive boosting, and regression techniques. In [86], the authors used the k-nearest neighbors method for charging demand prediction. In [87], the authors applied a neural network to predict the charger occupancy of an EV charging station in an urban area.
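A minimal sketch of an averaging ensemble for hourly charging demand is shown below. The three base forecasters are deliberately simple stand-ins (not the random forest, boosting, and regression models used in [85]); the ensemble output is simply the mean of their predictions.

```python
from statistics import mean

# Hypothetical base forecasters. Each receives the history as a list of past
# days, where each day is a list of 24 hourly demand values (kWh).


def seasonal_naive(history, hour):
    # Demand at the same hour on the previous day.
    return history[-1][hour]


def same_hour_mean(history, hour):
    # Average demand at this hour over all past days.
    return mean(day[hour] for day in history)


def recent_trend(history, hour):
    # Previous-day value plus the latest day-over-day change at this hour.
    if len(history) < 2:
        return history[-1][hour]
    return history[-1][hour] + (history[-1][hour] - history[-2][hour])


def ensemble_forecast(history, hour, models=(seasonal_naive, same_hour_mean, recent_trend)):
    # Averaging ensemble: final prediction = mean of the base predictions.
    return mean(m(history, hour) for m in models)


history = [[10] * 24, [12] * 24, [14] * 24]  # three hypothetical days
print(ensemble_forecast(history, 18))
```

Averaging tends to cancel errors that the base models make in different directions (here the trend model overshoots while the mean model lags), which is the basic motivation for ensembles.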

Machine Learning for Charging Scheduling
The management of charging activities at charging stations is important to avoid sudden increases in peak load demand. In [88], the authors considered the operational benefit of EVs by focusing on vehicle-to-grid (V2G) technology and scheduled EV charging at charging stations using reinforcement learning. In [89], the authors proposed a demand response method for long-term charging cost reduction and derived a charging schedule for EVs using reinforcement learning. In [90], the authors proposed a constrained EV charge scheduling strategy and solved it using reinforcement learning. In [91], the authors formulated charging scheduling as an NP-hard problem and obtained solutions using reinforcement learning. In [90], the authors proposed an artificial neural network (ANN) for charging scheduling and suggested adopting a smart pricing strategy at charging stations. In [92], the authors identified the best charging time for EVs at a fast-charging station integrated with a smart grid using the Q-learning method. In [93,94], the authors solved the charging scheduling problem using reinforcement learning. In [95,96], the authors used multiagent reinforcement learning for charging scheduling and proposed a dynamic pricing strategy. In [97], the authors proposed a reinforcement learning-based approach for optimizing the charging scheduling and pricing strategies of a public EV charging station. In [98], the authors used reinforcement learning to regulate the charging scheduling of electric buses at a charging station in a smart grid environment.
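To illustrate how Q-learning can learn a charging schedule, the sketch below trains a tabular agent on a toy problem: one EV must receive 2 units of energy over 4 time slots with different prices, and any energy still missing at departure is penalized. The prices, penalty, and hyperparameters are hypothetical and are not taken from [92] or the other cited works.

```python
import random

PRICES = [0.30, 0.10, 0.12, 0.40]  # hypothetical price per energy unit in each slot
ENERGY_NEEDED = 2                  # hypothetical units the EV must receive
PENALTY = 10.0                     # hypothetical cost per unit missing at departure


def allowed_actions(remaining):
    # 0 = idle, 1 = charge one unit; charging is not allowed once the battery is full.
    return [0] if remaining == 0 else [0, 1]


def step(t, remaining, action):
    """Deterministic environment: returns (reward, next_remaining, done)."""
    reward = -PRICES[t] * action
    remaining -= action
    done = t == len(PRICES) - 1
    if done:
        reward -= PENALTY * remaining  # unmet demand at departure
    return reward, remaining, done


def train_q_learning(episodes=5000, alpha=0.1, gamma=1.0, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {}  # (t, remaining, action) -> estimated return; missing entries count as 0
    for _ in range(episodes):
        t, remaining, done = 0, ENERGY_NEEDED, False
        while not done:
            acts = allowed_actions(remaining)
            if rng.random() < epsilon:
                a = rng.choice(acts)  # explore
            else:
                a = max(acts, key=lambda x: q.get((t, remaining, x), 0.0))  # exploit
            r, nxt, done = step(t, remaining, a)
            future = 0.0 if done else max(
                q.get((t + 1, nxt, b), 0.0) for b in allowed_actions(nxt))
            old = q.get((t, remaining, a), 0.0)
            q[(t, remaining, a)] = old + alpha * (r + gamma * future - old)
            t, remaining = t + 1, nxt
    return q


def greedy_schedule(q):
    # Roll out the learned greedy policy from the initial state.
    schedule, remaining = [], ENERGY_NEEDED
    for t in range(len(PRICES)):
        a = max(allowed_actions(remaining), key=lambda x: q.get((t, remaining, x), 0.0))
        schedule.append(a)
        remaining -= a
    return schedule


# The cheapest feasible schedule charges in the two cheapest slots: [0, 1, 1, 0].
print(greedy_schedule(train_q_learning()))
```

Because this toy environment is small and deterministic, a table and a constant learning rate suffice; formulations with many EVs and stochastic arrivals generally require function approximation or, as in [95,96], multiagent methods.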

Machine Learning for Charger Utilization Prediction
Estimating the charger utilization rate is essential for the expansion of the charging infrastructure. In [99], the authors predicted EV charging station usage using an ANN. In [100], the authors used the linear regression model to compute the charger idle time for a dataset in the Netherlands. In [101], the authors used the linear regression model to predict the charger utilization rate, assuming a nonlinear charging profile.
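Charger utilization and idle time can be computed directly from session logs. The sketch below assumes a hypothetical log of plug-in and plug-out times (in hours) for a single charger, clips sessions to the observation period, merges overlapping records so that occupied time is not double-counted, and returns the occupied fraction u; the idle time is then (1 − u) times the period length.

```python
def utilization_rate(sessions, period_start, period_end):
    """Fraction of [period_start, period_end) during which the charger was occupied.

    sessions: list of (plug_in, plug_out) times in hours. Overlapping or
    touching intervals are merged before summing.
    """
    clipped = sorted(
        (max(s, period_start), min(e, period_end))
        for s, e in sessions
        if e > period_start and s < period_end
    )
    occupied, cur_start, cur_end = 0.0, None, None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Gap before this session: close the previous merged interval.
            if cur_end is not None:
                occupied += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlap: extend the current merged interval.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        occupied += cur_end - cur_start
    return occupied / (period_end - period_start)


log = [(8, 10), (9, 11), (14, 15), (23, 26)]  # hypothetical sessions; the last runs past midnight
u = utilization_rate(log, 0, 24)
print(u, "idle hours:", 24 * (1 - u))
```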

Literature Review Summary
A summary of the research reported in the previous section is presented in Table 3. Furthermore, a quantitative analysis of the reported literature is presented in Figure 3. From Figure 3, it is clear that machine learning techniques can be successfully applied to charging demand prediction problems.

Home Charging Hotspot Prediction for Helsinki, Finland
Charging hotspots are points with relatively high charging demand throughout the day. During the initial stages of EV deployment, the majority of charging activity is expected to take place at home; hence, identifying home charging hotspots is necessary. In this work, we identified home charging hotspots for the city of Helsinki. The charging behavior and schedules of EV drivers in Helsinki, specifically concerning home charging, were modeled using the Activity-Based Transport Model (ABTM) [103,104]. A data-driven approach was adopted to identify the charging hotspots, with the output of the ABTM used as its input. In this scenario, EV drivers were assumed to charge their vehicles at home at the end of their journeys. The data-driven approach used for the identification of home charging hotspots is shown in Figure 4, and the home charging hotspots computed with this methodology are presented in Table 4.

Figure 4. Flowchart for the computation of home charging hotspots [105].

In addition to home charging, commercial public charging stations will be required for the large-scale adoption of EVs. Therefore, the identification of commercial public charging hotspots is also essential. A data-driven methodology, shown in Figure 5, was used to identify public charging hotspots for Dundee City Council. The identified charging hotspots are presented in Table 5.

Figure 5. Flowchart for the computation of public commercial charging hotspots [105].
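The steps of the home-charging flowchart (Figure 4) are not listed in the text, so the following is only a hypothetical illustration of the general idea: aggregate home-charging energy per zone from activity-model trip records and flag the zones with the highest totals. The record format, zone names, and the `top_fraction` cutoff are assumptions.

```python
from collections import defaultdict


def home_charging_hotspots(trips, top_fraction=0.2):
    """Rank zones by aggregated home-charging energy and flag the top ones.

    trips: iterable of (home_zone, arrival_hour, energy_kwh) records, one per
    EV that plugs in at home at the end of its last journey of the day.
    Returns (hotspot_zones, demand_by_zone, demand_by_zone_hour).
    """
    demand_by_zone = defaultdict(float)
    demand_by_zone_hour = defaultdict(float)  # hourly profile per zone
    for zone, hour, kwh in trips:
        demand_by_zone[zone] += kwh
        demand_by_zone_hour[(zone, hour)] += kwh
    ranked = sorted(demand_by_zone, key=demand_by_zone.get, reverse=True)
    n_hot = max(1, round(top_fraction * len(ranked)))
    return ranked[:n_hot], dict(demand_by_zone), dict(demand_by_zone_hour)


# Hypothetical ABTM-style output: (zone, arrival hour, energy to recharge in kWh).
trips = [("A", 18, 10), ("A", 19, 12), ("B", 17, 8), ("C", 18, 30),
         ("C", 20, 5), ("D", 22, 4), ("E", 18, 6)]
hot, by_zone, by_zone_hour = home_charging_hotspots(trips, top_fraction=0.4)
print(hot, by_zone)
```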

Charging Demand Prediction for Helsinki, Finland
Predicting the charging demand in advance assists in the smart and effective management of the charging load. In this work, a case study on charging demand prediction using the random forest (RF) technique for e-buses and private EVs in Helsinki is presented. The RF model was validated for Leppävaara, a commercial shopping hub in Espoo, Finland. The e-bus charging dataset was generated using the bus timetables available on the HSL website [106][107][108]. The charging dataset for private EVs was generated using the Bayesian network (BN)-based approach [109][110][111] proposed in [112]. The congestion levels in the city, as recorded using the Tom application [113] and shown in Figure 6, and the typical traffic conditions in Leppävaara, shown in Figure 7, were also taken into account when generating the charging dataset for private EVs. The charging demand was then predicted using the RF model; the target and predicted charging demands are shown in Figure 8.
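The case study's features and hyperparameters are not given, so the sketch below is only a generic, self-contained random forest regressor: bootstrap samples, CART-style regression trees that examine a random subset of features at each split, and averaging of the tree outputs. The synthetic data (hour of day and a weekend flag as features, demand in kWh as target) and all settings are hypothetical.

```python
import random
from statistics import mean


def build_tree(X, y, depth, min_leaf, n_feats, rng):
    """Grow a CART-style regression tree; each split examines a random feature subset."""
    if depth == 0 or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return mean(y)  # leaf: mean target value
    features = list(range(len(X[0])))
    rng.shuffle(features)
    best, tried = None, 0
    for f in features:
        if tried == n_feats:
            break
        thresholds = sorted(set(row[f] for row in X))[:-1]
        if not thresholds:
            continue  # feature is constant in this node; try another one
        tried += 1
        for thr in thresholds:
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            ml, mr = mean(left), mean(right)
            # Split quality: sum of squared errors around each side's mean.
            sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr)
    if best is None:
        return mean(y)
    _, f, thr = best
    li = [i for i, row in enumerate(X) if row[f] <= thr]
    ri = [i for i, row in enumerate(X) if row[f] > thr]
    return (f, thr,
            build_tree([X[i] for i in li], [y[i] for i in li], depth - 1, min_leaf, n_feats, rng),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth - 1, min_leaf, n_feats, rng))


def predict_tree(node, x):
    # Internal nodes are (feature, threshold, left, right) tuples; leaves are numbers.
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


def random_forest(X, y, n_trees=25, depth=4, min_leaf=2, seed=0):
    rng = random.Random(seed)
    n_feats = max(1, len(X[0]) // 3)  # about p/3 features per split, a common choice for regression
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                depth, min_leaf, n_feats, rng))
    return lambda x: mean(predict_tree(t, x) for t in trees)  # average of tree outputs


# Hypothetical synthetic data: evening peak (16:00-20:00) plus a small weekend uplift.
X = [[h, w] for w in (0, 1) for h in range(24)]
y = [(50 if 16 <= h <= 20 else 10) + 5 * w for h, w in X]
model = random_forest(X, y)
print(round(model([18, 0]), 1), round(model([3, 0]), 1))
```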

Discussions
This work comprehensively reviews the applications of machine learning algorithms for solving charging infrastructure planning problems and provides an overview of charging infrastructure planning. Charging station placement, charging demand prediction, charger utilization computation, and charging scheduling and pricing are some of the activities involved in charging infrastructure planning. Dedicated chargers serve scheduled EV operations, whose charging needs can be derived from General Transit Feed Specification (GTFS) timetables and fleet management data to obtain the combined flow of EVs. Different machine learning approaches, such as supervised learning, reinforcement learning, and ANNs, are used extensively to solve these problems. Qualitative and quantitative analyses of the research in this area are provided. The analysis shows that machine learning algorithms can be successfully applied to charging demand prediction and charging scheduling: SVM, deep learning, and random forest techniques are used extensively for charging demand prediction, whereas reinforcement learning is widely used for charging scheduling. Three case studies on charging infrastructure planning are also provided to give real-world examples. The first case study identified home charging hotspots in the city of Helsinki, Finland, using a data-driven methodology. One of the main contributions of the first case study is the realization that initial EV adopters will mostly rely on home charging. The second case study identified public charging hotspots for Dundee City Council. Identifying charging hotspots in advance helps power grid operators check whether grid reinforcement is required to support increasing EV adoption. The planning model adopted in this case study performed better than the models reported in [114][115][116][117].
The third case study predicted the charging demand in Helsinki using a hybrid Bayesian network and RF-based methodology, and the prediction model was observed to be more efficient than the models proposed in [101,102].
This has been an extensive review of the machine learning algorithms utilized for solving different charging infrastructure planning problems. We hope to provide researchers with an analysis of the suitability of machine learning algorithms for charging infrastructure planning problems. However, this work was limited to the charging infrastructure without vehicle grid integration (VGI).

Conclusions
The large-scale deployment of EVs requires sustainable charging infrastructure. This work systematically analyzed the machine learning applications for solving charging infrastructure planning problems. Qualitative and quantitative analyses of the research in this arena are provided herein. It can be seen that machine learning algorithms can be successfully applied in charging demand prediction and charging scheduling. Furthermore, three case studies that focus on charging infrastructure planning are presented. These explored charging station placement and charging demand prediction. We presented an extensive review of the machine learning algorithms utilized in solving different charging infrastructure planning problems. We hope to provide researchers with an analysis of the suitability of machine learning algorithms for charging infrastructure planning problems. However, this work was limited to charging infrastructures without vehicle grid integration (VGI).
We expect this work to attract the attention of researchers working in the areas of e-mobility, optimization, machine learning, power, and energy. Our future research will address some of the following key issues:

• The use of machine learning in localizing charging hotspots;
• A performance comparison of machine learning techniques combined with heuristics and metaheuristics applied to charging infrastructure planning problems;
• Planning V2G-enabled charging facilities.