Prediction of Charging Demand of Electric City Buses of Helsinki, Finland by Random Forest

Climate change, global warming, pollution, and the energy crisis are major and growing concerns of this era, and they have initiated the electrification of transport. The electrification of road transport has the potential to drastically reduce pollution and the growing demand for energy, but it also increases the load demand of the power grid, giving rise to technological and commercial challenges. Charging load prediction is therefore a crucial and demanding issue for maintaining the security and stability of power systems. In recent years, random forest has gained considerable popularity as a powerful machine learning technique for classification as well as regression analysis. This work develops a random forest (RF)-based approach for predicting charging demand. The proposed method is validated for the prediction of public e-bus charging


Introduction
Climate change, global warming, pollution, and the energy crisis are major and growing concerns of this era that have initiated the electrification of transport. Vehicles are major emitters of air pollutants, ranging from nitrogen oxides (NOx) to particulate matter (PM), which affect human health as well as the environment [1][2][3][4]. The European Environment Agency estimates that road transport contributes about 70% of NO₂ and 30% of PM [1]. The electrification of transport has received much attention in recent years as a means of addressing global concerns about pollution, growing energy demand, and climate change. Electrification will, however, drastically increase the load demand. Unplanned and uncoordinated electrification of transport may seriously affect the power grid, for example through voltage profile deterioration, harmonics, degraded reliability indices, and transformer overloading [5][6][7][8][9][10][11][12]. These negative impacts of electric vehicle (EV) charging can be drastically reduced by a suitable energy management strategy. EV charging load prediction is therefore important for the smooth and reliable operation of the power distribution network.
In recent years, accurate prediction and forecasting of the EV charging load has received considerable research attention. Table 1 presents a systematic review of the existing research in this area. In [13], a wavelet-decomposition-based approach was used for charging demand prediction of central road, an urban area of Sri Lanka. In [14], an approach based on Markov chains and graph theory was applied to predict the charging demand of private EVs operating in Seoul, South Korea. In [15], the authors used different machine learning techniques, e.g., gradient boosting and support vector machine (SVM), for charging demand prediction in Nebraska, USA, and compared the performance of gradient boosting and SVM based on the mean square error (MSE) and root mean square error (RMSE). In [16], a probabilistic approach based on the normal distribution was employed for the charging demand prediction of …
Figure 1 presents a quantitative analysis of the reported research work. It shows that relatively few studies investigate charging demand prediction for buses compared with private EVs. However, in recent years, many countries, such as China, USA, Norway, Finland, Germany, India, Russia, and Sweden [31][32][33][34][35][36], have placed emphasis on electrifying bus routes. The electrification of bus routes will drastically increase the load demand of the power grid, giving rise to technological and commercial challenges. Motivated by these factors, the present work focuses on the prediction of e-bus charging demand and proposes a novel random-forest-based approach for it. Random forest is a flexible supervised machine learning algorithm that can be used for both classification and regression problems [37,38].
Moreover, random forest reduces overfitting compared with a single decision tree, can be used with both categorical and continuous values, and, depending on the implementation, can handle missing data. The random forest algorithm has been applied in diverse areas, e.g., feature selection [39], remote sensing [40], forest carbon mapping [41], hail forecasting [42], Android malware detection [43], image classification [44], and air quality classification [45]. Inspired by its strong performance on such a wide range of real-life problems, this work uses a random-forest-based approach for charging demand prediction. The model was validated for Helsinki, Finland. Helsinki Regional Transport (HSL) plans to make one in every three buses operating in Helsinki electric by 2025 [46]. Moreover, the EV developer Linkker has loaned several e-buses to HSL for pilot projects [46]. Helsinki is therefore expected to be a good test case for validating the proposed approach. Compared with existing work, the key contributions of this research are:

• A novel random-forest-based approach for the prediction of e-bus charging demand;
• Comparison of the random forest algorithm with other state-of-the-art methods for the charging demand prediction problem;
• A guide for bus electrification in Helsinki, Finland.
Table 1 (excerpt). Existing research on EV charging demand prediction.
Ref. | Year | Vehicle type | Method | Location
[25] | 2018 | E-bus | Markov model | Shenzhen, China
[26] | 2017 | E-taxi | Monte Carlo | Ideal city with e-taxis
[27] | 2020 | E-bus | GA and DP | Ideal city with e-buses
[28] | 2018 | Private EV | Elastic coefficient method | Chengdu, China
[29] | 2019 | E-bus | Reinforcement learning | St. Albert Transit, AB, Canada
[30] | 2018 | E-bus | MHO | Edmonton Transit Service (ETS)

Random Forest Algorithm
As previously mentioned, random forest is a supervised machine learning technique that can be used for both classification and regression problems [38]. It is based on ensemble learning, which combines multiple learners to solve a complex problem. A random forest builds a number of decision trees, each trained on a different random subset of the dataset, and aggregates their outputs to improve the overall prediction accuracy [38,39]: for regression it takes the mean of the tree predictions, and for classification it takes the majority vote [38]. Accuracy generally improves as more trees are added, with diminishing returns beyond a certain forest size. The working principle of random forest is shown in Figure 2.
Some of the advantages of random forest are as follows:
1. It reduces overfitting compared with a single decision tree and helps to improve accuracy;
2. It is flexible and can be used for classification as well as regression problems;
3. It works well with both categorical and continuous values;
4. Depending on the implementation, it can handle missing values present in the data;
5. Normalizing the data is not required, because tree splits are threshold-based (rule-based) and therefore insensitive to feature scaling.

Figure 2. Working principle of random forest.
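The bootstrap-and-average structure described above can be illustrated with a minimal from-scratch sketch in Python. It assumes scikit-learn's `DecisionTreeRegressor` as the base learner; the class name `SimpleRandomForestRegressor` and its defaults are hypothetical and are not the authors' implementation. Each tree is trained on a bootstrap sample and considers a random subset of features at each split, and the forest output is the mean of the tree outputs, as used for regression.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


class SimpleRandomForestRegressor:
    """Hypothetical minimal random forest for regression (illustration only)."""

    def __init__(self, n_estimators=100, max_features="sqrt", random_state=0):
        self.n_estimators = n_estimators
        self.max_features = max_features
        self.random_state = random_state
        self.trees = []

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
        rng = np.random.default_rng(self.random_state)
        n_samples = X.shape[0]
        self.trees = []
        for _ in range(self.n_estimators):
            # Bootstrap sample: n_samples rows drawn with replacement.
            idx = rng.integers(0, n_samples, size=n_samples)
            # Each split considers only a random subset of the features.
            tree = DecisionTreeRegressor(
                max_features=self.max_features,
                random_state=int(rng.integers(0, 2**31 - 1)),
            )
            self.trees.append(tree.fit(X[idx], y[idx]))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Regression: average the tree outputs (a classifier would vote instead).
        return np.mean([tree.predict(X) for tree in self.trees], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    hour = rng.integers(0, 24, size=500)
    weekend = rng.integers(0, 2, size=500)
    X = np.column_stack([hour, weekend])
    # Toy target (not real data): night-time depot charging peak, lower on weekends.
    y = np.where(hour >= 22, 300.0, 50.0) * np.where(weekend == 1, 0.6, 1.0)
    model = SimpleRandomForestRegressor(n_estimators=50).fit(X, y)
    print(model.predict([[23, 0], [23, 1], [12, 0]]))
```

In practice, `sklearn.ensemble.RandomForestRegressor` implements the same idea with more options; the sketch only makes the resampling and averaging steps explicit.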

Dataset Generation
A dataset is needed to train and test the random forest model. In this work, a historical dataset was built for a one-year period. Because the transition from conventional vehicles to EVs is still at a nascent stage, real-world charging demand data are not always available. The dataset is therefore synthetic, but it was generated from real-world bus timetable data.
The algorithm used for generating the historical dataset is given in Figure 3. In our approach, separate charging datasets were created for each charging center (only depot charging was considered). The energy consumed and the charging time of the e-buses depended on the arrival time and the state of charge (SOC) of the buses, and the SOC depended on the distance travelled by the buses [47].
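The dependence chain described above (distance travelled, then arrival SOC, then charging energy and time) can be sketched as follows. This is a minimal illustration, not the algorithm of Figure 3: the `BusSpec` values, the constant-power charging model (no tapering near full charge), and the assumption that each bus leaves the depot fully charged and returns once per day at a fixed hour are all hypothetical. The actual parameters would come from the bus timetables and the e-bus specifications.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta


@dataclass
class BusSpec:
    # Hypothetical values for illustration; Table 3 holds the real specification.
    battery_kwh: float = 300.0
    consumption_kwh_per_km: float = 1.2
    charger_kw: float = 150.0        # depot charger power, assumed constant
    charge_efficiency: float = 0.95  # grid-to-battery efficiency


def charging_session(arrival: datetime, distance_km: float, spec: BusSpec,
                     soc_start: float = 1.0) -> dict:
    """One depot charging record derived from the distance driven that day."""
    used_kwh = distance_km * spec.consumption_kwh_per_km
    soc_arrival = max(soc_start - used_kwh / spec.battery_kwh, 0.0)
    # Energy drawn from the grid to bring the battery back to 100% SOC.
    energy_kwh = (1.0 - soc_arrival) * spec.battery_kwh / spec.charge_efficiency
    duration_h = energy_kwh / spec.charger_kw
    return {
        "arrival": arrival,
        "soc_arrival": soc_arrival,
        "energy_kwh": energy_kwh,
        "duration_h": duration_h,
        "departure": arrival + timedelta(hours=duration_h),
    }


def generate_year(year: int, weekday_km: float, weekend_km: float,
                  arrival_hour: int, spec: BusSpec) -> list:
    """One session per day for a single bus; Saturdays and Sundays use weekend_km."""
    records = []
    day = date(year, 1, 1)
    while day.year == year:
        km = weekend_km if day.weekday() >= 5 else weekday_km
        arrival = datetime(day.year, day.month, day.day, arrival_hour)
        records.append(charging_session(arrival, km, spec))
        day += timedelta(days=1)
    return records
```

For example, `generate_year(2023, 200.0, 120.0, 22, BusSpec())` yields 365 hypothetical records whose energy and duration differ between weekdays and weekends only through the assumed daily mileage.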

Charging Demand Prediction
The random forest model used for charging demand prediction is described in Section 2. It is an ensemble of decision trees, each trained on a different subset of the dataset, whose outputs are aggregated to improve the overall prediction accuracy [38,39]. For a continuous target such as the charging load, the final output is the mean of the tree predictions; majority voting applies to classification [38]. Adding trees generally improves accuracy, with diminishing returns beyond a certain number of trees. Our step-by-step approach for the charging demand prediction is shown in Figure 4.
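The prediction steps can be sketched with scikit-learn's `RandomForestRegressor`. The features (hour of day, weekday, weekend flag), the chronological 80/20 train/test split, and the helper names `calendar_features` and `train_and_evaluate` are illustrative assumptions rather than the exact procedure of Figure 4, and the target in the usage example is a toy load profile, not the Helsinki data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def calendar_features(start: str, n_hours: int) -> np.ndarray:
    """Hour of day, weekday (Monday = 0) and weekend flag for n_hours
    consecutive hourly timestamps starting at `start`, e.g. "2023-01-01"."""
    ts = np.datetime64(start, "h") + np.arange(n_hours)
    hours_since_epoch = ts.astype("int64")
    hour = hours_since_epoch % 24
    weekday = (hours_since_epoch // 24 + 3) % 7  # 1970-01-01 was a Thursday
    weekend = (weekday >= 5).astype(int)
    return np.column_stack([hour, weekday, weekend])


def train_and_evaluate(X, y, test_size=0.2, n_estimators=100, random_state=0):
    # Chronological split: the last `test_size` share of the hours is held out.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, shuffle=False)
    model = RandomForestRegressor(n_estimators=n_estimators,
                                  random_state=random_state)
    pred = model.fit(X_tr, y_tr).predict(X_te)
    mse = mean_squared_error(y_te, pred)
    return model, {"r2": r2_score(y_te, pred), "mse": mse,
                   "mae": mean_absolute_error(y_te, pred),
                   "rmse": float(np.sqrt(mse))}


if __name__ == "__main__":
    X = calendar_features("2023-01-01", 24 * 365)
    rng = np.random.default_rng(0)
    night = np.isin(X[:, 0], [22, 23, 0, 1, 2, 3, 4])
    # Toy load in kW (not real data): night-time depot peak, lower on weekends.
    y = np.where(night, 250.0, 20.0) * np.where(X[:, 2] == 1, 0.6, 1.0)
    y = y + rng.normal(0.0, 10.0, size=y.shape)
    _, scores = train_and_evaluate(X, y)
    print(scores)
```

A chronological split is used here because the data form a time series; a shuffled split would let the model see hours adjacent to the test hours during training.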


Bus Network and Charging Dataset
The proposed prediction model was validated for a selected bus line of Helsinki, Finland: line 11, operating from Tapiola to Friisilä in Espoo [48]. The details of the buses operating on that route are given in Table 2, and the specifications of the e-buses considered in the analysis are given in Table 3. The trip details computed from the bus timetable and the e-bus specifications are presented in Table 4. This work assumes that depot charging alone is sufficient for the e-buses. Thus, the charging prediction is performed for the following locations:

The charging dataset was prepared based on the methodology shown in Figure 3.
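The kind of calculation behind the trip details, together with a check of the depot-only charging assumption, can be sketched as below. The function name `trip_details`, the 20% minimum SOC, and the example numbers (10 trips of 12 km, a 300 kWh battery, 1.2 kWh/km) are hypothetical, and a constant consumption per kilometre ignores effects such as cabin heating in winter.

```python
def trip_details(n_trips: int, route_km: float, battery_kwh: float,
                 kwh_per_km: float, soc_min: float = 0.2) -> dict:
    """Daily mileage and energy of one bus from its timetable (number of
    one-way trips) and the e-bus specification."""
    daily_km = n_trips * route_km
    daily_kwh = daily_km * kwh_per_km
    # The bus is assumed to leave the depot fully charged (SOC = 1).
    soc_end = 1.0 - daily_kwh / battery_kwh
    return {
        "daily_km": daily_km,
        "daily_kwh": daily_kwh,
        "soc_end": soc_end,
        # Depot-only charging is feasible if the SOC never drops below soc_min.
        "depot_only_feasible": soc_end >= soc_min,
    }


if __name__ == "__main__":
    print(trip_details(10, 12.0, 300.0, 1.2))
```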

Charging Demand Prediction
The charging demand prediction was performed for the seven charging locations of the e-buses using the methodology in Section 3. The actual and predicted charging demands at the seven charging locations for weekdays and weekends are shown in Figures 5-18. At every charging location, the weekday charging load was higher than the weekend load, because buses run less frequently on weekends. The figures also show that the random forest predicts the charging load effectively.
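A weekday/weekend comparison of this kind can be computed from session-level records by averaging the charging power per hour of day for each day type. The following is a minimal sketch, assuming every session charges at a constant power and is given as a (start, duration) pair; the function name `hourly_profiles` and the example sessions are hypothetical.

```python
from datetime import date, datetime, timedelta

import numpy as np


def hourly_profiles(sessions, power_kw, first_day: date, last_day: date):
    """Mean charging load (kW) per hour of day for weekdays and weekends.
    `sessions` holds (start: datetime, duration_h: float) pairs."""
    energy = {False: np.zeros(24), True: np.zeros(24)}  # key: is_weekend
    for start, duration_h in sessions:
        end = start + timedelta(hours=duration_h)
        t = start
        while t < end:
            hour_start = t.replace(minute=0, second=0, microsecond=0)
            seg_end = min(end, hour_start + timedelta(hours=1))
            # kWh delivered inside this clock hour
            kwh = power_kw * (seg_end - t).total_seconds() / 3600.0
            energy[hour_start.weekday() >= 5][hour_start.hour] += kwh
            t = seg_end

    # Count the calendar days of each type in the studied period.
    n_days = {False: 0, True: 0}
    d = first_day
    while d <= last_day:
        n_days[d.weekday() >= 5] += 1
        d += timedelta(days=1)

    # kWh per one-hour slot, averaged over days, equals the mean kW in that hour.
    weekday = energy[False] / max(n_days[False], 1)
    weekend = energy[True] / max(n_days[True], 1)
    return weekday, weekend


if __name__ == "__main__":
    sessions = [(datetime(2023, 1, 2, 23, 30), 1.0),  # Monday night
                (datetime(2023, 1, 7, 10, 0), 2.0)]   # Saturday
    wd, we = hourly_profiles(sessions, 100.0, date(2023, 1, 2), date(2023, 1, 8))
    print(wd[23], wd[0], we[10])  # prints: 10.0 10.0 50.0
```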

Sensitivity Analysis
The impact of the random forest hyperparameters on model performance is explored in this section. Performance was compared using the accuracy, R² score, MSE, MAE, and RMSE. The impacts of n_estimators (the number of trees) and random_state (the seed that controls the random bootstrap sampling and feature selection) on the performance of the random forest were investigated. Table 5 reports the impact of n_estimators on the performance of random forest with random_state = 0 for the seven charging datasets. For datasets 1, 4, and 7, the model performed best with n_estimators = 100, whereas for datasets 2, 3, 5, and 6 it performed best with n_estimators = 150. Table 6 gives the impact of random_state on the performance of random forest with n_estimators = 100 for the seven charging datasets. The model performed best with random_state = 0 for all datasets except dataset 6, for which it performed best with random_state = 5.
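For reference, the standard definitions of the error indices are given below, with y_i the actual and ŷ_i the predicted charging demand for the n test samples, and ȳ the mean actual demand. This assumes that the r score reported here is the coefficient of determination R²; the accuracy index is not defined in the text and is therefore not included.

```latex
\begin{aligned}
\mathrm{MSE} &= \frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \\
\mathrm{MAE} &= \frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|, \qquad
R^2 = 1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}.
\end{aligned}
```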

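A sweep of this kind can be sketched with scikit-learn as follows. The grid values and the function names `sweep` and `mean_rmse_by_n_estimators` are illustrative assumptions. Because random_state only seeds the bootstrap sampling and feature subsampling, differences across its values reflect run-to-run randomness rather than a property of the model, so the sketch also averages the RMSE over seeds for each n_estimators value.

```python
import itertools

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def sweep(X_train, y_train, X_test, y_test,
          n_estimators_grid=(50, 100, 150), random_state_grid=(0, 5, 10)):
    """Fit one forest per (n_estimators, random_state) pair and score it
    on the held-out data."""
    rows = []
    for n_est, seed in itertools.product(n_estimators_grid, random_state_grid):
        model = RandomForestRegressor(n_estimators=n_est, random_state=seed)
        pred = model.fit(X_train, y_train).predict(X_test)
        mse = mean_squared_error(y_test, pred)
        rows.append({
            "n_estimators": n_est, "random_state": seed,
            "r2": r2_score(y_test, pred), "mse": mse,
            "mae": mean_absolute_error(y_test, pred),
            "rmse": float(np.sqrt(mse)),
        })
    return rows


def mean_rmse_by_n_estimators(rows):
    """Average RMSE over seeds, so that the choice of n_estimators does not
    hinge on a single random_state."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["n_estimators"], []).append(row["rmse"])
    return {n: float(np.mean(v)) for n, v in grouped.items()}
```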

Performance Comparison
The performance of the random forest model on the charging demand prediction problem is compared with that of SVM. Table 7 reports the MSE, MAE, and RMSE values obtained by SVM and RF for the seven charging datasets. For all datasets, the MSE, MAE, and RMSE values of RF are considerably lower than those of SVM, indicating the superior performance of RF.
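The comparison can be sketched as follows. The SVM settings are not stated in the text; this sketch assumes scikit-learn's `SVR` with its default RBF kernel and C = 1.0, preceded by `StandardScaler` because SVR, unlike tree ensembles, is sensitive to feature scaling. The helper names `scores` and `compare_rf_svr` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def scores(y_true, y_pred):
    mse = mean_squared_error(y_true, y_pred)
    return {"mse": mse, "mae": mean_absolute_error(y_true, y_pred),
            "rmse": float(np.sqrt(mse))}


def compare_rf_svr(X_train, y_train, X_test, y_test, random_state=0):
    models = {
        "rf": RandomForestRegressor(n_estimators=100, random_state=random_state),
        # SVR is scale-sensitive, so features are standardized first;
        # kernel and C are sklearn defaults (RBF, C=1.0), not tuned values.
        "svr": make_pipeline(StandardScaler(), SVR()),
    }
    return {name: scores(y_test, m.fit(X_train, y_train).predict(X_test))
            for name, m in models.items()}
```

Tuning C, epsilon, and gamma (e.g., with cross-validation) would give the SVR a fairer chance; the defaults are used here only to keep the sketch short.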

Conclusions
Global warming, the energy crisis, and degraded air quality have compelled the electrification of the transport sector. Public buses are prime candidates for electrification, since the majority of public transport depends on them. Electrifying public buses will increase the load demand of the power grid, creating technological and commercial challenges. Prediction of the e-bus charging load is therefore crucial for the smooth and reliable operation of the power system. The present work proposes a random-forest-based model for predicting e-bus charging demand in Helsinki, Finland. The model was validated on seven charging datasets for route 11 of Helsinki. The simulation results show that the proposed approach predicts the charging demand accurately for all datasets. Furthermore, the effect of different random forest parameters on performance was tested for all datasets, and the best-performing values of n_estimators and random_state were identified. The performance of the random forest was also compared with that of SVM for all seven charging datasets.
The computed values of MSE, RMSE, and MAE indicate the superiority of random forest over SVM. Our approach can be used by charging station operators to predict the charging demand and manage the charging load effectively. The EV load predicted in this work belongs to a single vehicle category, the public e-buses of Helsinki. Load demand prediction for non-categorical EV populations (mixing buses and private EVs) will be considered in future work, which will mainly focus on the following issues:

• The use of more sophisticated machine learning techniques for charging demand prediction;
• Charging demand prediction for private electric vehicles;
• Use of the gene expression programming method for charging demand prediction.