Short-Term Load Forecasting Model of Electric Vehicle Charging Load Based on MCCNN-TCN

Abstract: The large fluctuations in the charging loads of electric vehicles (EVs) make short-term forecasting challenging. To improve the short-term forecasting performance for EV charging load, a model based on a multi-channel convolutional neural network and a temporal convolutional network (MCCNN-TCN) is proposed. The multi-channel convolutional neural network (MCCNN) extracts the fluctuation characteristics of EV charging load at various time scales, while the temporal convolutional network (TCN) builds a time-series dependence between the fluctuation characteristics and the forecasted load. In addition, a BP network maps the selected meteorological and date features into a high-dimensional feature vector, which is concatenated with the output of the TCN. According to experimental results on urban charging station load data from a city in northern China, the proposed model is more accurate than the artificial neural network (ANN), long short-term memory (LSTM), convolutional neural network and long short-term memory (CNN-LSTM), and TCN models. The MCCNN-TCN model outperforms the ANN, LSTM, CNN-LSTM, and TCN by 14.09%, 25.13%, 27.32%, and 4.48%, respectively, in terms of the mean absolute percentage error.


Introduction
The growth of the electric vehicle industry has captivated governments, automakers, and energy companies. EVs are seen as a viable solution to the depletion of fossil resources and rising pollution [1]. It is widely believed that the popularity of EVs can reduce greenhouse gas emissions (mainly carbon dioxide) [2]. Meanwhile, falling battery prices and government incentives will also promote rapid growth in the scale of EVs [3]. However, the increased charging demand resulting from the rapid development of EVs also poses various challenges to the grid. The EV charging load has a great impact on the stable operation of the distribution network [4], including the decline of power quality and the difficulty of optimizing and controlling the operation of the power grid [5,6]. The research on EV charging load forecasting is carried out not only to ensure the economical and stable operation of the power system [7] but also to support the development of EVs [8].
EV charging load forecasting approaches are commonly divided into probabilistic models, time series models, and machine learning models. The probabilistic modeling method establishes probabilistic models of residents' charging and travel behavior using statistical and queuing theory, followed by load forecasts using Monte Carlo simulation. Taylor J et al. [9] utilized the Monte Carlo method to establish a large-scale charging demand model, considering EV type, penetration rate, charging scenario, etc. In [10], it is assumed that the arrival time of EVs at the charging station follows a Poisson distribution, and the charging [...] proposed a "decomposition-predict-reconstruction" prediction model based on empirical mode decomposition (EMD) and LSTM, which effectively improved the accuracy of load prediction.
One-dimensional convolutional neural networks (1DCNN) can extract one-dimensional sequence features, commonly used to extract time series feature information. Wang et al. [32] utilized 1DCNN to extract the fusion features of bearing vibration signal and sound signal to realize bearing fault diagnosis. In [33], the influent load is first decomposed by EMD, and then 1DCNN extracts the latent features of each intrinsic mode function's periodic signal. However, although the 1DCNN model can achieve feature extraction at various time scales by adjusting the scope of the receptive field, it cannot extract the time series dependencies between time series data. With the advent of advanced TCN models that combine the advantages of CNN feature processing and RNN time-domain modeling, it is possible to extract time series dependencies between long intervals of historical data [34]. Yin et al. [35] proposed a feature fusion TCN structure that fuses model output features at multiple time delay scales. The TCN built on the convolutional network can process data in parallel on a large scale and has a faster computing speed than the RNN such as LSTM [36]. Although the signal decomposition method can obtain the components of EV charging load at various time scales, it still necessitates the selection and construction of low-dimensional features with a high degree of differentiation, which not only adds subjectivity and complexity to this identification method but also risks losing important information.
On the basis of the foregoing research, this paper proposes an EV charging load forecasting model based on the MCCNN-TCN. The MCCNN model mines the fluctuation features of EV charging load at multiple time scales. The TCN model establishes the global time-series dependencies between the local time-series features at different time scales extracted by the MCCNN model. In addition, accurate load forecasting frequently relies on a thorough understanding of the factors that increase or decrease consumer demand [37]. The EV charging load is affected by numerous factors, including weather, temperature, date type, traffic conditions, and user travel behavior [8]. Therefore, this paper introduces the maximum information coefficient (MIC) and the Spearman rank correlation coefficient and proposes a similar day method based on weighted grey correlation analysis to screen historical loads. The main contributions of this paper are as follows:
(1) The MIC was applied to eliminate input data redundancy and reduce the complexity of the model. The MIC was used to choose the meteorological variables that are strongly related to EV charging load. The selected meteorological variables were used as inputs to both the prediction model and the similar day selection model.
(2) A similar day selection model based on weighted grey correlation analysis was proposed. The Spearman rank correlation coefficient of the average daily load of each week type was used to calculate week type similarity. Then, with the meteorological variables selected by MIC and the week type similarity as inputs, the similar day selection model based on weighted grey correlation analysis was used to choose the similar day loads that serve as the forecasting model's input.
(3) An MCCNN-TCN model framework was built. Combining the multi-channel 1DCNN model with the TCN model establishes global temporal dependencies between time series features at multiple time scales, which effectively improves the prediction performance.
The remainder of this paper is organized as follows. In Section 2, a short-term EV charging load forecasting framework based on the MCCNN-TCN model is introduced. In Section 3, experiments are conducted with a real dataset from grid companies and compared with other models. In Section 4, the proposed model is analyzed and compared with other state-of-the-art methods based on the experimental results. In Section 5, the paper's conclusions and future research are given.
As a new type of electric load, EV charging load is not only related to residents' travel behavior but also affected by meteorological factors such as weather and temperature [38]. To reduce the input size of the similar day model and the forecast model, the meteorological features that correlate strongly with EV charging load must be selected [36]. Since meteorological features and EV charging load are both nonlinear time series, this paper uses the MIC to examine the nonlinear relationship between each meteorological variable and EV charging load. Unlike traditional correlation analysis methods, the MIC does not require any assumptions about the data distribution and is applicable to both linear and nonlinear data [39]. The MIC is calculated as follows [40].
For a two-variable dataset D ⊂ R², divide D into a grid G of x rows and y columns. The grids obtained with different division methods form the set A. The maximum mutual information over the grids in A is recorded as:
$$I^{*}(D, x, y) = \max_{G \in A} I(D|G) \tag{1}$$
where D|G is the distribution of the dataset D on the grid G.
The maximum normalized mutual information of the dataset D at different scales forms the feature matrix M(D), whose elements are defined as:
$$M(D)_{x,y} = \frac{I^{*}(D, x, y)}{\log \min\{x, y\}} \tag{2}$$
The MIC is calculated by:
$$\mathrm{MIC}(D) = \max_{xy < B(n)} M(D)_{x,y} \tag{3}$$
where n is the sample size, B(n) is a function of the sample size, and the constraint xy < B(n) requires the total number of cells of the grid G to be less than B(n); generally B(n) = n^{0.6} [41]. A greater MIC value between two variables indicates a stronger correlation.
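As a rough illustration of how such a score can be computed, the following sketch approximates the MIC by searching only rank-based (equal-frequency) grids. This is a simplifying assumption: the full MIC optimizes the grid boundaries as well, so the value returned here is only a lower-bound style estimate. The function names (`rank_bins`, `mutual_information`, `approx_mic`) are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def rank_bins(values, bins):
    """Assign each value to one of `bins` equal-frequency cells by rank.

    Ties are broken by input order, which is a simplification.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * bins // len(values)
    return labels


def mutual_information(a, b):
    """Mutual information (in nats) of two discrete label sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (pa[i] * pb[j]))
               for (i, j), c in pab.items())


def approx_mic(x, y):
    """Maximum normalized mutual information over grids with rows * cols < n ** 0.6."""
    n = len(x)
    limit = n ** 0.6  # B(n) = n^0.6
    best = 0.0
    for rows in range(2, int(limit) + 1):
        for cols in range(2, int(limit) + 1):
            if rows * cols >= limit:
                break
            mi = mutual_information(rank_bins(x, rows), rank_bins(y, cols))
            best = max(best, mi / math.log(min(rows, cols)))
    return best
```

A perfectly monotone relationship yields a score of 1 under this approximation, while unrelated variables yield a score near 0.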

Quantifying Week Type Similarity Based on Spearman Correlation Analysis
The characteristics of EV charging load in different months, seasons, and week types are investigated in this article to study the relationship between EV charging load and date types. The EV charging load has the highest consumption level in December and the lowest in April, as shown in Figure A1 in Appendix A. The consumption level of EV charging load in winter and fall is significantly higher than in spring and summer, and the load in winter tends to rise first and then fall. In contrast, the load in summer fluctuates with a rising trend, as shown in Figure A2 in Appendix A. The EV charging load consumption level is highest on Saturday and lowest on Monday, as shown in Figure A3 in Appendix A. In summary, it is critical to pay attention to the effect of date type on the charging load of EVs. In this paper, the date types were divided into season types and week types, and the similarity between week types under each season was used as an input of the similar day model. To avoid subjective human choices in setting the week type mapping values, the similarity between week types was calculated from the average daily EV charging load of each week type.
The data on electric vehicle charging load do not follow a normal distribution, and the Spearman coefficient does not require normally distributed data [42]. As a result, this paper uses the Spearman coefficient to quantify the similarity of week types. The week types under each season were divided into seven (Monday to Sunday), and then the Spearman coefficient was calculated for the average daily load between the week types. The correlation value is denoted by F^h_{kg}, as in (4):
$$F^{h}_{kg} = 1 - \frac{6\sum_{t=1}^{n} A_{t}^{2}}{n\left(n^{2} - 1\right)} \tag{4}$$
where k and g represent the week types; h represents the season, h = 1, 2, 3, 4; n is the number of load samples; and A_t indicates the difference in rank position between the t-th daily load samples of week type k and week type g.
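The week type similarity in (4) can be sketched as follows. This is a minimal illustration that assumes the two average daily load profiles have no tied values (the rank-difference form of the Spearman coefficient is exact only in that case); the function names are hypothetical.

```python
def ranks(values):
    """Return 1-based rank positions of the values (no tie handling)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, idx in enumerate(order, start=1):
        result[idx] = position
    return result


def spearman(load_k, load_g):
    """Spearman coefficient between the average daily loads of two week types."""
    n = len(load_k)
    rank_k, rank_g = ranks(load_k), ranks(load_g)
    squared_diff = sum((a - b) ** 2 for a, b in zip(rank_k, rank_g))
    return 1.0 - 6.0 * squared_diff / (n * (n * n - 1))
```

For example, two profiles whose samples are ordered identically give a similarity of 1, and profiles with exactly reversed ordering give -1.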

Similar Days Selection Model Based on Weighted Grey Correlation Analysis
When calculating the gray correlation, the traditional gray correlation analysis assigns the same weight to each feature, ignoring each influencing factor's difference [43]. Therefore, each influencing factor's weight is first analyzed based on the improved entropy weight method in this paper. Then the correlation degree between the forecasting day and history day is calculated based on the weighted grey correlation degree analysis.
According to the historical data, the entropy E_j of the j-th meteorological feature is calculated [44]:
$$E_j = -\frac{1}{\ln n}\sum_{i=1}^{n} p_{ij}\ln p_{ij}, \qquad p_{ij} = \frac{a_{ij}}{\sum_{i=1}^{n} a_{ij}}, \qquad j = 1, 2, \ldots, m$$
where n is the number of historical days, m indicates the dimension of the day feature, and a_{ij} represents the value of the j-th feature of the i-th historical day.
According to the entropy of each meteorological feature, the weight w_j of the j-th day feature is calculated with the improved entropy weight method [45]. The correlation coefficient of each day's feature is calculated using grey correlation analysis [18]. The following are the feature sequences of the forecasting and history days:
$$X_d = \left[x_d(1), x_d(2), \ldots, x_d(m)\right], \qquad X_{d-i} = \left[x_{d-i}(1), x_{d-i}(2), \ldots, x_{d-i}(m)\right]$$
where X_d represents the feature sequence of the forecasting day d, and X_{d−i} represents the feature sequence of the history day d − i. The correlation coefficient of the j-th feature of X_d to X_{d−i} is:
$$\xi_i(j) = \frac{\min_i \min_j \left|x_d(j) - x_{d-i}(j)\right| + \rho \max_i \max_j \left|x_d(j) - x_{d-i}(j)\right|}{\left|x_d(j) - x_{d-i}(j)\right| + \rho \max_i \max_j \left|x_d(j) - x_{d-i}(j)\right|}$$
where x_d(j) and x_{d−i}(j) are the j-th feature of the forecasting day d and the history day d − i, respectively, and ρ is the distinguishing coefficient, ρ = 0.5. From the grey correlation coefficients ξ of the factors and their weights w, the weighted grey correlation between forecast day d and historical day d − i can be expressed as follows:
$$r_i = \sum_{j=1}^{m} w_j\, \xi_i(j)$$
Energies 2022, 15, 2633
The 14 days preceding the forecasting day are defined as the similar day rough set in this paper. Because the capacity of the similar day rough set is limited, the similarity between the forecasting day and a historical day is not assumed to decrease as the date distance increases. Furthermore, according to the past EV charging load data, the average number of days in the similar day rough set whose Spearman correlation coefficient with the forecasting day exceeds 0.4 is 3. In addition, the load of the adjacent day is added to the similar day set to ensure time consistency between the forecasting day load and the historical day load. According to the above analysis, the size of the similar day set in this paper is 4.
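The similar day selection described above can be sketched as follows. Note the assumptions: the weights here come from the standard entropy weight method, w_j = (1 − E_j) / Σ(1 − E_j), as a stand-in for the improved method of [45], which is not reproduced; features are assumed to be normalized and non-negative; and all function names are hypothetical.

```python
import math


def entropy_weights(history):
    """Standard entropy weights for an n-days x m-features table (assumption)."""
    n, m = len(history), len(history[0])
    divergence = []
    for j in range(m):
        column = [row[j] for row in history]
        total = sum(column)
        entropy = 0.0
        for a in column:
            if a > 0:
                p = a / total
                entropy -= p * math.log(p)
        divergence.append(1.0 - entropy / math.log(n))
    s = sum(divergence)
    if s == 0:  # every feature is uniform: fall back to equal weights
        return [1.0 / m] * m
    return [d / s for d in divergence]


def grey_grades(target, history, weights, rho=0.5):
    """Weighted grey correlation between the target day and each history day."""
    diffs = [[abs(t - h) for t, h in zip(target, row)] for row in history]
    d_min = min(min(r) for r in diffs)
    d_max = max(max(r) for r in diffs)
    if d_max == 0:  # all history days equal the target day
        return [1.0] * len(history)
    grades = []
    for r in diffs:
        xi = [(d_min + rho * d_max) / (d + rho * d_max) for d in r]
        grades.append(sum(w * x for w, x in zip(weights, xi)))
    return grades


def select_similar_days(target, history, k=3):
    """Indices of the k history days with the highest weighted grey correlation."""
    grades = grey_grades(target, history, entropy_weights(history))
    return sorted(range(len(history)), key=lambda i: grades[i], reverse=True)[:k]
```

With k = 3 and the adjacent day appended afterwards, this would yield a similar day set of size 4, matching the set size used in this paper.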

Multi-Channel Convolutional Neural Network and Temporal Convolutional Network Model
Because the charging load of EVs is influenced by various factors, including weather conditions, residents' travel habits, and the traffic network, it exhibits high short-term volatility, which makes short-term load forecasting more complex. Extracting the characteristics of EV charging load at various time scales has been shown to be an effective strategy for improving prediction accuracy [31]. Different influencing factors affect the features of EV charging load at different time scales. In this regard, the paper proposes the MCCNN-TCN model framework. As illustrated in Figure 1, the model framework is divided into three layers: a multi-channel 1DCNN feature extraction layer, a multi-channel TCN layer, and an output layer. The model framework can extract EV charging load characteristics at various time scales and construct a global time-series dependency between the historical and predicted day loads. The multi-channel 1DCNN serves as the entry stage of the MCCNN-TCN model and extracts the local features of the input time series at different time scales. Deepening the TCN network expands its receptive field, which establishes the temporal dependencies between global features. The output layer creates a nonlinear relationship between the forecasting load, the meteorological and calendar features, and the historical load. Sections 2.1.1 and 2.1.2 show that meteorological and date factors affect the EV charging load, in addition to the influence of the historical load on the forecasting load. As a result, this paper combines the historical load feature vector output by the TCN model with a high-dimensional feature vector derived from meteorological and date features. The combined vector is then fed into a fully connected neural network, whose output is the forecasting day load.
The length of the 1DCNN layer's input feature map is s·n, where s is the number of similar days and n is the number of daily load samples. The role of the multi-channel 1DCNN is to extract, at different time scales, the features of the one-dimensional time series formed by the EV charging load sequences in the similar day set. The TCN layer takes the output of the multi-channel 1DCNN model as input and captures the global temporal dependencies at different time scales. The BP layer maps the feature vector composed of the meteorological factors at the same time steps as the forecasting day load and the date type of the forecasting day to the high-dimensional feature space. The high-dimensional feature vector obtained by concatenating the BP model's output and the TCN model's output is used as the input of the fully connected layer in the output layer of the MCCNN-TCN.
Figure 1. Multi-channel convolutional neural network and temporal convolutional network (the number before @ is the number of channels, and the number after @ is the output of the convolution layer).

Multi-Channel 1D Convolutional Network Model
CNN is a neural network model that uses convolution kernels to extract essential information automatically [46]. Figure 2 shows the basic architecture of the 1DCNN, which can extract latent features in time series using multiple convolution kernels with shared weights. The same convolution kernel obtains a class of related features during the convolution process. Its mathematical model is described as [47]:
$$H_i = f\left(H_{i-1} \otimes W_i + b_i\right)$$
where H_i indicates the output of layer i; H_{i−1} indicates the output of layer i − 1, which is the input of layer i; W_i and b_i indicate the weight matrix and the corresponding bias vector of the convolution kernel of layer i, respectively; ⊗ indicates the convolution operation; and f indicates the activation function. Following the convolution operations, the pooling layer downsamples a large matrix into a small one, reducing the amount of computation and mitigating overfitting. The pooling layer mathematical model is as follows:
$$H_i = \mathrm{down}\left(H_{i-1}\right)$$
where H_{i−1} and H_i indicate the features before and after pooling, respectively, and "down()" indicates the pooling function.
As shown in Figure 3, the multi-channel 1DCNN is made up of several parallel 1D convolution blocks. The first convolutional layer of each channel has a different convolution kernel size. Long-time-scale characteristics of EV charging load can be extracted using large convolution kernels, and short-time-scale characteristics can be extracted using small convolution kernels. Rough features of EV charging load at different time scales are obtained after the first convolutional layer. To fully mine the detailed information at the various time scales, this paper then stacks several convolutional layers with a kernel size of three on top of the first convolutional layer. The first convolutional layer kernel size K of each channel is set as a function of the channel index n, where n ∈ {1, 2, 3, ..., N} and N is the number of channels. The value of N depends on the length of the input layer time series.
Furthermore, earlier research has revealed that as the depth of the neural network increases, residual connections can effectively handle the problems of gradient disappearance and network overfitting [48]. As a result, each channel of the multi-channel 1DCNN is assigned a residual connection in this paper. The residual connection mathematical model is:
$$x_{l+1} = x_l + F\left(x_l, w_l\right)$$
where x_{l+1} is the output of layer l + 1, x_l is the input of layer l, and F(x_l, w_l) is the residual of layer l.
Figure 3. Multi-channel one-dimensional convolutional network.
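The idea of parallel channels with different first-layer kernel sizes, each followed by a residual connection, can be illustrated with a small pure-Python sketch. The kernel sizes (3, 7, 15, 31), the moving-average weights, and the smoothing kernel are placeholder assumptions standing in for learned parameters; they are not the values used in this paper, and the function names are hypothetical.

```python
import math


def conv1d_same(signal, kernel):
    """1D cross-correlation with zero padding so output length equals input length."""
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel size must be odd for symmetric padding")
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(kernel[j] * padded[t + j] for j in range(k))
            for t in range(len(signal))]


def residual_block(x, kernel):
    """x_{l+1} = x_l + F(x_l, w_l), with F = tanh(conv(x_l))."""
    residual = [math.tanh(v) for v in conv1d_same(x, kernel)]
    return [a + b for a, b in zip(x, residual)]


def multi_channel_features(signal, kernel_sizes=(3, 7, 15, 31)):
    """One feature sequence per channel; larger kernels see longer time scales."""
    channels = []
    for k in kernel_sizes:
        coarse = conv1d_same(signal, [1.0 / k] * k)  # placeholder first-layer weights
        channels.append(residual_block(coarse, [0.25, 0.5, 0.25]))  # kernel size 3
    return channels
```

Each channel keeps the length of the input sequence, so the channel outputs can be stacked along the channel direction before being passed on.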

Temporal Convolutional Network Model
The TCN developed by Bai et al. in 2018 is an algorithm for processing time series [49]. The TCN combines causal convolution, dilated convolution, and residual block to address the problem of extracting long-term time-series information.
The core of TCN is the residual dilated causal convolution unit (RDCCU), which consists of two rounds of dilated causal convolution with the same dilation factor, WeightNorm layer, activation function, Dropout layer, and residual connections formed by direct mapping of the input [35]. Multiple residual dilated causal convolutional units are connected to form a multi-layer TCN network structure, as shown in Figure 4.

Figure 4. Connection of multiple residual dilated causal convolution units.
The core structure of the RDCCU is the dilated causal convolution [50], which combines causal convolution and dilated convolution [51]. The structure of the dilated causal convolution is shown in Figure 5. Causal convolution obtains the output at time t by convolving only the elements at time t and earlier in the previous layer. It ensures that no future information is leaked, meeting the requirements of power load forecasting. Dilated convolution expands the receptive field by increasing the dilation factor [52] and captures sufficiently long historical information without increasing the depth of the model [53], which improves the efficiency of model training. Dilated convolution samples the input of the previous layer at intervals, and the dilation factor d doubles from one layer to the next, which can be described as:
$$d_l = 2^{\,l-1}, \qquad l = 1, 2, \ldots, L$$
where l is the layer index and L is the number of dilated causal convolutional layers. As illustrated in Figure 5, the kernel size of each dilated causal convolutional layer is 3. The dilation factor d grows from 1 to 4, which raises the effective history of neurons in the output layer from 3 to 15. In addition, to maintain the whole sequence information, zero padding is applied in each layer so that its output has the same length as the input sequence. The mathematical model of dilated causal convolution is as follows [49]:
$$y_t = \sum_{i=0}^{k-1} f(i)\, x_{t - d \cdot i}$$
where x is the input, y is the output, f is the convolution kernel, k is the kernel size, and d is the dilation factor. Residual connections are a key structure of the RDCCU. The RDCCU is defined as follows [49]:
$$o = \mathrm{Activation}\left(x + F(x)\right)$$
where x is the input of the unit and F(x) is the output of its dilated causal convolution branch.
The output of the multi-channel 1DCNN is arranged in a T × n two-dimensional data structure according to the channel direction and fed into the first RDCCU of the TCN model. The internal procedure of the RDCCU is shown in Figure 6. The width of the convolution kernel of the RDCCU corresponds to the number of input data channels. The number of output channels of this RDCCU is equal to the number of convolution kernels in the RDCCU. The output of the RDCCU is concatenated along the channel direction and used as the input to the next RDCCU.
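The effect of stacking dilated causal convolutions can be checked with a short sketch. It is a single-channel, pure-Python illustration with hypothetical function names and unit weights, not the PyTorch implementation used in the experiments; positions before the start of the sequence are treated as zeros.

```python
def dilated_causal_conv(x, weights, dilation):
    """y[t] = sum_i weights[i] * x[t - dilation * i], using only past and present inputs."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - dilation * i
            if idx >= 0:  # indices before the sequence start act as zero padding
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Number of input steps seen by one output neuron after the stacked layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


# An impulse at t = 0 spreads over exactly the receptive field of the stack.
signal = [1.0] + [0.0] * 19
for d in (1, 2, 4):
    signal = dilated_causal_conv(signal, [1.0, 1.0, 1.0], d)
span = sum(1 for v in signal if v != 0.0)
```

With a kernel size of 3 and dilation factors 1, 2, and 4, `receptive_field` gives 3 after the first layer and 15 after the third, in agreement with the effective history stated above, and `span` equals 15.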

Results
The subject of the study in this paper is short-term EV charging load forecasting in the urban area of a city in northern China. The dataset was collected from 38 public DC charging stations in the city's urban area from 1 January 2019 to 31 March 2020. The numbers of charging stations in residential, commercial, work, and leisure areas are 8, 12, 11, and 7, respectively. These charging stations have 298 charging poles, each with a maximum charging power of 60 kW. The dataset included the active power of the charging poles, the transaction power, the charging start time, the charging end time, etc. The active power of the charging poles was sampled at 15 min intervals.
Meteorological data, which can be obtained from the China Meteorological Data Network, include the temperature, humidity, precipitation, visibility, wind speed, and weather type. Among them, the temperature, humidity, and precipitation are interpolated by spline so that their sampling times coincide with those of the load. Other data include date type, season, etc.
All of the experimental models were run in the Python 3.6 programming environment and implemented under the PyTorch framework. The hardware used for the experiments was a PC with an Intel Core i7-10300H CPU, NVIDIA RTX 2060 GPU, and 32 GB of RAM.

Input Variables Selection and Processing
According to the investigation of influencing factors on EV charging load, these factors were divided into meteorological factors, date features, and similar daily load in this paper. Next, three types of features are selected and processed.
The MIC between each meteorological factor, except for weather conditions, and the EV charging load was calculated. Table 1 shows the MIC and Pearson correlation coefficient between EV charging load and temperature, humidity, precipitation, visibility, and wind speed. As shown in Table 1, the EV charging load correlates strongly with temperature, humidity, and rainfall but weakly with visibility and wind speed. At the same time, the influence of weather conditions on the charging load of EVs cannot be ignored [25]. Therefore, this paper selected weather type, temperature, humidity, and rainfall as the meteorological features used as inputs of the similar day selection and prediction models. Min-max normalization was used to linearly transform the raw temperature, humidity, and rainfall data to [0, 1]. The index mapping values of the weather types follow Ref. [18]. In this paper, the mapping values were set to 0.1, 0.2, and 0.3 for the weather types sunny, cloudy and overcast, respectively, and 0.7, 0.1, and 1.5 for the weather types light rain or snow, rain or snow, and heavy rain or snow, respectively.
Since the month, season, and week type affect the fluctuation characteristics of the EV charging load, the season, month, day, week type, weekday, and holiday were selected as date features and used as inputs of the prediction model. Table 2 depicts the date features.
Similar daily loads were obtained from the similar days model. Min-max normalization was adopted to constrain the EV charging load to [0, 1]. After that, the forecasted load values were exponentiated to establish a nonlinear relationship between the exponentially mapped forecasted load values and the historical loads. This eliminates the lagging problem in which the model takes the last value of the input sequence as the forecasted load value.
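The min-max normalization used for the meteorological and load inputs can be sketched as follows. The helper names are hypothetical; the sketch assumes the scaling bounds are taken from the training data and reused to map forecasts back to physical units.

```python
def min_max_fit(values):
    """Return the (minimum, maximum) bounds used for scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant series cannot be min-max normalized")
    return lo, hi


def min_max_transform(values, lo, hi):
    """Linearly map values to [0, 1]."""
    return [(v - lo) / (hi - lo) for v in values]


def min_max_inverse(scaled, lo, hi):
    """Map normalized values back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]
```

For example, temperatures of 10, 20, and 30 are mapped to 0, 0.5, and 1, and the inverse transform recovers the original values.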

Performance Evaluation
The paper considered the root mean square error (RMSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE) when assessing the performance of the forecasting model. These statistical metrics are defined as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - y_i^{f}\right)^{2}}$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - y_i^{f}\right|$$
$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{y_i - y_i^{f}}{y_i}\right|$$
where N indicates the number of validation or testing instances, and y_i and y_i^f represent the actual load and forecasted load of the i-th instance, respectively. Each statistical metric has different advantages and disadvantages. The RMSE evaluates the performance of a predictive model based on the square root of the mean squared deviation between predicted and actual loads. However, it is sensitive to outliers. In comparison, the MAE reflects the mean absolute error between forecasted and actual loads. It is more resilient to outliers than the RMSE but does not show the real degree of prediction bias. The MAPE is a forecast accuracy measure that considers the relative difference between forecasted and actual loads. However, the MAPE does not apply when the actual load is zero. Therefore, it is vital to employ multiple statistical metrics to assess the prediction performance.
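The three metrics can be computed as in the following minimal sketch (function names are illustrative). The MAPE raises an error when an actual load is zero, reflecting the limitation noted above.

```python
import math


def rmse(actual, forecast):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(actual, forecast)) / n)


def mae(actual, forecast):
    """Mean absolute error."""
    n = len(actual)
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / n


def mape(actual, forecast):
    """Mean absolute percentage error in percent; undefined for zero actual loads."""
    if any(y == 0 for y in actual):
        raise ValueError("MAPE is undefined when an actual load is zero")
    n = len(actual)
    return 100.0 * sum(abs((y - f) / y) for y, f in zip(actual, forecast)) / n
```

For actual loads of 100 and 200 with forecasts of 110 and 180, the MAE is 15, the MAPE is 10%, and the RMSE is the square root of 250 (about 15.8), which shows how the RMSE weights the larger error more heavily.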

Similar Daily Load Selection Based on Weighted Grey Correlation Analysis
The weather condition, temperature, humidity, rainfall, and week type were selected as the daily features for similar day selection in this paper. Since weather conditions and week type similarity are coarse-grained features, while temperature, humidity, and rainfall are fine-grained features, coarse-grained daily values of temperature, humidity, and rainfall are needed. This paper selected the daily maximum temperature, daily mean temperature, daily minimum temperature, daily mean humidity, and daily average rainfall as coarse-grained characteristics. Therefore, weather conditions, daily maximum temperature, daily mean temperature, daily minimum temperature, humidity, rainfall, and week type similarity were selected as daily features. According to the selected daily features and the weighted grey correlation degree, a similar day set of the forecasting day was obtained.
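A weighted grey correlation degree of this kind can be sketched as follows. This is a generic form of grey relational analysis under stated assumptions rather than the exact procedure of the paper: the features are assumed to be already normalized to a common scale, the resolution coefficient `rho = 0.5` is the commonly used default, and the feature weights and sample values are hypothetical.

```python
def weighted_grey_degrees(reference, candidates, weights, rho=0.5):
    """Weighted grey correlation degree of each candidate day to the reference day.

    reference  : feature vector of the forecasting day
    candidates : feature vectors of the historical days
    weights    : one weight per feature (assumed to sum to 1)
    rho        : resolution coefficient (0.5 is an assumed default)
    """
    # Absolute feature-wise differences between each candidate and the reference.
    deltas = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0.0:
        # Every candidate is identical to the reference day.
        return [sum(weights)] * len(candidates)
    degrees = []
    for row in deltas:
        # Grey correlation coefficient of each feature, then the weighted sum.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        degrees.append(sum(w * c for w, c in zip(weights, coeffs)))
    return degrees


def select_similar_days(degrees, k):
    """Indices of the k historical days with the highest correlation degree."""
    order = sorted(range(len(degrees)), key=lambda i: degrees[i], reverse=True)
    return order[:k]


# Hypothetical two-feature example with equal weights.
forecast_day = [0.5, 0.5]
history = [[0.5, 0.5], [0.0, 1.0], [0.4, 0.6]]
degrees = weighted_grey_degrees(forecast_day, history, weights=[0.5, 0.5])
similar = select_similar_days(degrees, k=2)
```

A historical day whose features match the forecasting day exactly receives a degree of 1, and the degree decreases as the feature differences grow.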
Taking the EV charging load forecast on 15 December 2019 as an example, the weather forecast parameters on that day are shown in Table 3. Because December belongs to winter, the week type similarity obtained by Spearman correlation analysis for this season is shown in Table 4. According to the historical meteorological data and week type before the forecast day (1 December 2019 to 14 December 2019), the weighted grey correlation degrees between the forecasting day and the historical days were calculated to obtain a similar day set. The results of the similar day set are shown in Table 5. From the similar day model results, it can be seen that the length of the similar day historical load sequence of the forecasting day is 384.
In this paper, the number of channels of the multi-channel 1DCNN model was set to 4 to fully exploit the characteristics of EV charging load at different time scales. In the multi-channel 1DCNN model, the convolution stride in each channel was set to 1, and the Tanh activation function was selected to perform nonlinear mapping on the results after each convolution. The hyperparameters of the multi-channel 1DCNN model are shown in Table 6. The TCN model hyperparameters are shown in Table 7. The hyperparameters of the BP model and output layer are shown in Table 8. In this paper, meteorological features, date features, and similar daily loads were selected as input variables for the MCCNN-TCN model, as shown in Table 9.
From Table 10, it can be seen that the prediction performance of the single-channel models (Model 1 to Model 4) decreases as the extracted time scale increases. This is because the single-channel 1DCNN-TCN at the long-term scale loses the local short-term variation features of the EV charging load. The prediction performance of Model 1 is lower than that of the MCCNN-TCN model because Model 1 does not capture the trend features of the EV charging load at a long time scale.
The advantage of the MCCNN-TCN model is that it can extract the local short-term change features and long-term change trend features of the EV charging load. Therefore, the RMSE, MAPE, and MAE values of the MCCNN-TCN model are lower than those of the single-channel 1DCNN-TCN models. It can be shown that extracting the multi-scale features of EV charging load can significantly improve the prediction accuracy.
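The idea of extracting features at several time scales in parallel can be illustrated with a small sketch. It follows the settings stated in the text (four channels, stride 1, Tanh activation), but the kernel sizes `(2, 4, 8, 16)` and the averaging kernels are assumptions made only for illustration; in the actual model the kernel sizes are those of Table 6 and the kernel weights are learned during training.

```python
import math


def conv1d_tanh(series, kernel, stride=1):
    """Valid 1D convolution followed by a Tanh activation."""
    k = len(kernel)
    out = []
    for start in range(0, len(series) - k + 1, stride):
        window = series[start:start + k]
        out.append(math.tanh(sum(w * x for w, x in zip(kernel, window))))
    return out


def multi_channel_features(series, kernel_sizes=(2, 4, 8, 16)):
    """Apply one convolution channel per kernel size to the same load sequence."""
    features = []
    for k in kernel_sizes:
        # Placeholder averaging weights; a trained model would learn these.
        kernel = [1.0 / k] * k
        features.append(conv1d_tanh(series, kernel))
    return features


# Made-up normalized load with a step change, for illustration only.
load = [0.1] * 16 + [0.9] * 16
channels = multi_channel_features(load)
lengths = [len(c) for c in channels]
```

With these placeholder kernels, the shortest kernel reacts to the step change within two samples, while the longest kernel spreads it over sixteen samples, which mirrors the contrast between local short-term features and long-term trend features. Because the channels produce sequences of different lengths here, a real model would need padding or a similar alignment step before passing them to the TCN.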

Comparative Analysis of Different Forecasting Models
To evaluate the forecasting accuracy and superiority of the model proposed in this paper, the ANN, LSTM, CNN-LSTM, and TCN prediction models, whose structures are shown in Appendix B Figures A4-A7, were chosen for comparison. The forecasting load curves of the above models on the test set from 1 March to 7 March 2020 are shown in Figure 7. It can be seen from Figure 7 that the original load is approximately constant from 0:00 to 6:00 every day. During this period, the forecasts of all models except the BP model fluctuate and deviate from the actual value. Although the forecasting value of the ANN model remains constant, it deviates significantly from the actual value. The MCCNN-TCN model fluctuates less than the other models and is close to the actual value. At the peak of the load curve, the predicted values of the LSTM, ANN, and CNN-LSTM models all deviate to a certain extent and lag significantly behind the actual values. The TCN model also deviates significantly from the actual values. In comparison to the other models, the trend of the MCCNN-TCN forecast is consistent with the actual load, and the predicted value is closer to the actual value. In the rising stage of the load curve, the forecasting value of the MCCNN-TCN model also follows a trend similar to the actual value. By analyzing the forecast of each prediction model in these three stages, it can be seen that the MCCNN-TCN model can improve the accuracy of the short-term forecasting of EV charging load. This is because the MCCNN-TCN model can not only learn the variation law of EV load on a long time scale but also capture the short-term fluctuation characteristics of EV charging load.
The RMSE, MAPE, and MAE of each model on the test set are shown in Table 12.
It can be seen from Table 12 that the MAPE of the MCCNN-TCN model is 13.24%, which is 14.09%, 25.13%, 27.32%, and 4.48% lower than that of the ANN, LSTM, CNN-LSTM, and TCN models, respectively. The RMSE of the MCCNN-TCN model is 4.92 kW, which is also significantly less than that of the other models. The absolute prediction error boxplots of the five models on the test dataset are shown in Figure 8. The wider the boxplot, the more spread out the prediction errors are. It can be seen from Figure 8 that the prediction error range of the MCCNN-TCN model is the narrowest while that of the LSTM is the widest, and the median absolute error of the MCCNN-TCN model is smaller than that of ANN, LSTM, CNN-LSTM, and TCN. From the prediction results, the MCCNN-TCN model is more effective than the ANN, LSTM, and CNN-LSTM models in predicting time series with complex fluctuations.
In addition, it can be seen from Appendix A Figure A2 that the charging load of EVs shows different characteristics in different seasons. Therefore, the performance of the model proposed in this paper needs to be evaluated further for each season. According to the four seasons defined by meteorology, spring is from March 2019 to May 2019, summer is from June 2019 to August 2019, autumn is from September 2019 to November 2019, and winter is from December 2019 to February 2020. In this paper, the historical load and meteorological data of each season are selected separately, and the training set, the validation set, and the test set are divided according to the ratio of 8:1:1. The prediction errors of different models on the test set of each season are presented in Table 13. As shown in Table 13, comparing the prediction results of the five models in each season confirms the advantage of the model proposed in this paper.
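The 8:1:1 division of each season's data can be sketched as follows. The function name is hypothetical, and the split is assumed to be chronological (earliest samples for training, latest for testing), which is a common choice for time series but is not stated in the text.

```python
def split_8_1_1(samples):
    """Split an ordered sequence into training, validation, and test parts (8:1:1)."""
    n = len(samples)
    n_train = n * 8 // 10   # 80% for training
    n_val = n // 10         # 10% for validation
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]  # remainder for testing
    return train, val, test


# Hypothetical season with 100 ordered samples.
season = list(range(100))
train, val, test = split_8_1_1(season)
```

Integer arithmetic is used so that the three parts always cover the whole sequence without overlap, with any remainder assigned to the test part.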

Discussion
The comparison with the single-channel 1DCNN-TCN model demonstrates that extracting EV charging load feature information at different time scales with multiple parallel 1DCNN channels can significantly improve the short-term load prediction performance.
The results in Table 12 show that the MCCNN-TCN model can effectively improve short-term load prediction by extracting EV charging load features at multiple scales and relying on the TCN to establish long-term dependencies between features. The ANN model has the disadvantage of only establishing superficial nonlinear mapping relationships, which leads to a weaker ability to extract temporal correlations of EV charging loads. Recurrent neural network models such as LSTM have memory properties. They can learn long-term temporal correlations, but their feature extraction is weak due to the lack of convolution. This makes them less effective in predicting EV charging loads, which are characterized by substantial fluctuations over short periods. The TCN model has superior predictive capabilities over the LSTM and CNN-LSTM because its convolutional units extract shallow temporal features and establish temporal dependencies. However, the TCN model can only extract features at a single scale, and therefore its prediction performance is poorer than that of the MCCNN-TCN. Further, the results in Table 13 show that the predictive performance of the MCCNN-TCN model proposed in this paper is stable and better than that of the comparison models in different seasons.
Combined with the above analysis, it can be seen that the EV charging load prediction model proposed in this paper has a high prediction accuracy. However, the model relies on the accuracy of meteorological data and EV charging load data to achieve this accuracy. Therefore, some problems need to be noted in the engineering application of this method. On the one hand, deviations in the meteorological data of the forecasting day will affect the selection of similar daily loads. This paper uses several meteorological and date factors as daily features when selecting similar day loads. Additionally, the loads of the days adjacent to the forecasting day are also added to the similar day set, making the similar day selection model somewhat fault-tolerant. On the other hand, in the power system, the load data from the measurement system contain disturbances caused by errors in the electric power system, outliers due to data encoding errors, and EV charging start and end times falling between load sampling points. If the deviation of the measured data from the actual value is slight, the resulting deviation of the prediction will also be slight. Conversely, if there are significant deviations from the actual values, the actual values need to be estimated using data pre-processing techniques such as mean-fill, interpolation, and mean filtering.
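One of the pre-processing techniques mentioned above, interpolation, can be sketched as follows. This is an illustrative linear interpolation only; it assumes that unreliable samples have already been flagged and replaced by `None`, and the function name and sample values are hypothetical.

```python
def interpolate_missing(series):
    """Fill None entries by linear interpolation between the nearest valid samples.

    Gaps at the start or end of the series are filled with the nearest valid value.
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no valid samples")
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]   # leading gap
        elif right is None:
            filled[i] = series[left]    # trailing gap
        else:
            t = (i - left) / (right - left)
            filled[i] = series[left] + t * (series[right] - series[left])
    return filled


# Made-up load samples in kW with flagged gaps, for illustration only.
raw_load = [10.0, None, None, 16.0, None]
clean_load = interpolate_missing(raw_load)
```

Linear interpolation suits short gaps between sampling points; for longer gaps, mean-fill from similar days may be more appropriate, but the choice depends on the data.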

Conclusions
Due to the randomness of EV charging behavior, the EV charging load shows obvious short-term fluctuations within a day. In order to improve the load prediction accuracy, this paper proposes the MCCNN-TCN load forecasting model, which considers the multi-time-scale characteristics of EV charging loads. The multi-channel 1DCNN model was used to extract the features of EV charging load at multiple time scales. The TCN model was used to establish global temporal dependencies between the features.
Considering the influence of various factors on the load, the MIC and the Spearman coefficient were used to reduce the meteorological feature dimension and establish the similarity of date types, respectively. Then, taking the selected meteorological features and the similarity of date types as the daily features, a similar day selection model based on the weighted grey correlation degree was established to select similar daily loads. The selected meteorological features, date features, and similar daily loads were used as the input of the MCCNN-TCN model.
The comparative experiments between the single-channel 1DCNN-TCN and the MCCNN-TCN show that the MCCNN-TCN improves the prediction accuracy of EV charging load. This shows that the prediction performance can be improved by extracting the multi-scale features of the EV charging load. Compared with the ANN, LSTM, CNN-LSTM, and TCN models, the MCCNN-TCN network, owing to its structure, can learn the multi-scale features of the EV charging load time series and capture how the EV charging load changes.
The MCCNN-TCN network constructed in this paper does not yet consider real-time electricity price factors. In the future, richer feature data can be considered, and big data can be used to further improve the accuracy of load forecasting.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Based on the EV charging load dataset used in Section 3 of the paper, the characteristics of EV charging load in different months, seasons, and week types are investigated. The box plot of EV charging load in each month is shown in Figure A1, and the average daily EV charging load curves for different seasons and different week types are shown in Figures A2 and A3, respectively.

Figure A6. CNN-LSTM model architecture.