Learning Traffic as Images: A Deep Convolutional Neural Network for Large-Scale Transportation Network Speed Prediction

This paper proposes a convolutional neural network (CNN)-based method that learns traffic as images and predicts large-scale, network-wide traffic speed with high accuracy. Spatiotemporal traffic dynamics are converted into images that describe the time and space relations of traffic flow via a two-dimensional time-space matrix. A CNN is applied to the image in two consecutive steps: abstract traffic feature extraction and network-wide traffic speed prediction. The effectiveness of the proposed method is evaluated on two real-world transportation networks in Beijing, the second ring road and the north-east transportation network, by comparing it with four prevailing algorithms, namely ordinary least squares, k-nearest neighbors, artificial neural network, and random forest, and with three deep learning architectures, namely stacked autoencoder, recurrent neural network, and long short-term memory network. The results show that the proposed method outperforms the other algorithms by an average accuracy improvement of 42.91% within an acceptable execution time. The CNN can train the model in a reasonable time and is thus suitable for large-scale transportation networks.


Introduction
Predicting the future is one of the most attractive topics for human beings, and the same is true for transportation management. Understanding traffic evolution over an entire road network, rather than on a single road, is of great interest and importance: it provides people with complete traffic information for making better route choices and supports traffic managers in managing a road network and allocating resources systematically [1,2].
However, large-scale network traffic prediction places greater demands on prediction models: they must cope with the higher computational complexity introduced by the network topology; capture, intelligently and efficiently, the spatial correlations of traffic on roads that extend across a two-dimensional plane; and forecast further into the future to reflect congestion propagation. Unfortunately, traditional traffic prediction models, which usually treat traffic speeds as sequential data, lack these abilities because of limitations such as overly simple implementations, restrictive hypotheses and assumptions, an inability to deal with outliers and noisy or missing data, and an inability to determine dimensions [3]. Thus, existing models may fail to predict large-scale network traffic evolution.
In the existing literature, two families of methods have dominated research on traffic forecasting: statistical methods and neural networks [3].
Statistical techniques are widely used in traffic prediction. For example, exploiting the periodicity of traffic evolution, nonparametric models such as k-nearest neighbors (KNN) have been applied to predict traffic speeds and volumes [4][5][6]. More advanced models, including support vector machines (SVM) [7], online SVM [8], and seasonal SVM [9], were employed to improve prediction accuracy, and SVM performance in large-scale traffic speed prediction was further improved [9,10]. Multivariate nonparametric regression has also been used in traffic prediction [11,12]. Because successive values of traffic variables are correlated in time, time-series prediction models have been widely employed in traffic variable prediction. A typical example is the autoregressive integrated moving average (ARIMA) model, which captures essential characteristics of traffic variables, such as inherent correlations (via the moving-average terms) and their effect on the near future (via the autoregressive terms). To date, ARIMA and its extensions, such as the seasonal ARIMA model [13,15], the KARIMA model [14], and the ARIMAX model [16], have been widely studied and applied. In summary, statistical methods have been widely used in traffic prediction with promising results. However, these models ignore the important spatiotemporal features of transportation networks, achieve lower accuracy than neural-network-based models, and cannot be applied to predict overall traffic in a large-scale network, where the spatial effects of adjacent road sections cannot be neglected. Moreover, SVM usually takes a long time and consumes considerable memory during training; hence, it may be impractical in big-data applications.
Artificial neural networks (ANNs) are also commonly applied to traffic prediction problems because of their advantages, such as their capability to work with multidimensional data, implementation flexibility, generalization ability, and strong forecasting power [3]. For example, Huang and Ran [17] used an ANN to predict traffic speed under adverse weather conditions. Park et al. [2] presented a real-time vehicle speed prediction algorithm based on ANN. Zheng et al. [18] combined ANN with Bayes' theorem to predict short-term freeway traffic flow. Moretti et al. [19] developed a hybrid bagging ensemble of statistical and ANN models to forecast urban traffic flow.
However, the data-driven mechanism of ANN cannot adequately explain the spatial correlations of a road network. In addition, because of its simple structure, ANN achieves lower prediction accuracy than deep learning approaches. Recently, more advanced and powerful deep learning models have been applied to traffic prediction. For example, Duan et al. [20] used denoising stacked autoencoders (DSAE) for traffic data imputation. Polson and Sokolov [21] used a deep learning architecture to predict traffic flow. Huang et al. [22] first introduced deep belief networks (DBN) into transportation research. Ma et al. [23] combined deep restricted Boltzmann machines (RBM) with recurrent neural networks (RNN) to form an RBM-RNN model that inherits the advantages of both. Lv et al. [24] proposed a novel deep-learning-based traffic prediction model that considers spatiotemporal relations and employs a stacked autoencoder (SAE) to extract traffic features. Ma et al. [25] introduced long short-term memory (LSTM) networks into traffic prediction and, using loop detector data collected in the Beijing road network, demonstrated that LSTM outperformed other neural networks in both stability and accuracy of traffic speed prediction.
Deep learning methods exploit much deeper and more complex architectures than ANN and can achieve better results than traditional methods. However, these attempts still focused mainly on predicting traffic on a single road section or a small network region. Few studies have considered a transportation network as a whole and directly estimated traffic evolution at a large scale. More importantly, the majority of these models considered only the temporal correlations of traffic evolution at a single location and did not consider its spatial correlations from a network perspective.
To fill this gap, this paper introduces an image-based method that represents network traffic as images and employs the deep learning architecture of the convolutional neural network (CNN) to extract the spatiotemporal traffic features contained in the images. CNN is an efficient and effective image processing algorithm that has been widely applied in computer vision and image recognition with remarkable results [26,27]. Compared with prevailing artificial neural networks, CNN has the following properties in extracting features. First, the convolution layers of CNN are locally rather than fully connected, meaning that each output neuron is connected only to nearby input neurons. Second, CNN introduces pooling layers, which select only the salient features from their receptive regions and greatly reduce the number of model parameters. Third, conventional fully connected layers are used only in the final stage, when the dimension of their input has become manageable. The locally connected convolution layers enable CNN to deal efficiently with spatially correlated problems [26,28,29], and the pooling layers make CNN generalizable to large-scale problems [30]. The contributions of this paper can be summarized as follows:


The temporal evolutions and spatial dependencies of network traffic are considered and applied simultaneously in traffic prediction problems by exploiting the proposed image-based method and deep learning architecture of CNN.


Spatiotemporal features of network traffic can be extracted using CNN in an automatic manner with a high prediction accuracy.


The proposed method can be generalized to large-scale traffic speed prediction problems while retaining trainability because of the implementation of convolution and pooling layers.
The rest of the paper is organized as follows. Section 2 introduces a two-step procedure consisting of converting network traffic to images and applying CNN for network traffic prediction. Section 3 conducts four prediction tests on two transportation networks using the proposed method and compares the results with those of other prevailing prediction methods. Finally, conclusions are drawn and future research directions are discussed.

Methods
Traffic information in both the time and space dimensions should be considered to predict traffic congestion in a transportation network precisely. Let the x- and y-axes of a matrix represent time and space, respectively, so that the elements of the matrix are the values of a traffic variable at the corresponding time and location. The matrix can then be viewed as one channel of an image in which every pixel takes the value of the corresponding matrix element. The resulting image therefore has as many pixels in width as the matrix has time intervals and as many pixels in height as it has road sections. A two-step methodology, consisting of converting network traffic to images and applying CNN for network traffic prediction, is designed to learn from the matrix and make predictions.

Converting Network Traffic to Images
A vehicle trajectory recorded by a floating car with a dedicated GPS device provides specific information on vehicle speed and position at a given time. From such trajectories, the spatiotemporal traffic information on each road segment can be estimated and further integrated into a time-space matrix that serves as a time-space image.
In the time dimension, time usually ranges from the beginning to the end of a day, and the time intervals, typically between 10 s and 5 min, depend on the sampling resolution of the GPS devices. Very narrow intervals, such as 10 s, are generally of little use for traffic prediction because speeds fluctuate strongly at that resolution. Thus, if the sampling resolution is high, the data may be aggregated into wider intervals, such as several minutes.
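The aggregation step can be illustrated with a minimal Python sketch. It assumes each floating-car sample has already been matched to one road section and reduced to a (timestamp in seconds, speed) pair, and it uses a plain arithmetic mean per interval as the aggregation rule; the function name `aggregate_speeds` and the sample values are hypothetical.

```python
from collections import defaultdict


def aggregate_speeds(samples, interval_s=120):
    """Average (timestamp_s, speed) samples into fixed-width time bins.

    Returns {bin_index: mean_speed}. Bins that received no samples are
    absent; how such gaps are filled is a separate step.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, speed in samples:
        b = int(t // interval_s)  # index of the interval this sample falls in
        sums[b] += speed
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sums}


# Hypothetical 10-second samples aggregated into 2-minute intervals
samples = [(0, 30.0), (10, 34.0), (115, 32.0), (120, 50.0), (200, 46.0)]
print(aggregate_speeds(samples))  # {0: 32.0, 1: 48.0}
```

The interval width is a parameter, so the same helper covers any of the 10 s to 5 min resolutions mentioned above.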
In the space dimension, the selected trajectory is viewed as a sequence of dots whose states include vehicle position, average speed, and so on. These dots could simply be ordered linearly and mapped directly onto the y-axis, but this would produce a high-dimensional and uninformative y-axis, because the sequence is long and many regions of it are stable and lack variation. Therefore, to make the y-axis both compact and informative, the dots are grouped into sections, each representing a similar traffic state. The sections are then ordered spatially with reference to a predefined start point of a road and mapped onto the y-axis.
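Once sections have been formed, laying them out along the y-axis amounts to sorting them by their distance from the start point and filling one row per section, with one column per time interval. The sketch below assumes the mean speed of each (section, interval) cell is already available in a dictionary; `build_time_space_matrix`, the section identifiers, and the offsets are hypothetical illustrations, not part of the original method description.

```python
def build_time_space_matrix(cell_speeds, section_offsets, n_intervals, fill=None):
    """Lay out averaged speeds as a (sections x intervals) matrix.

    cell_speeds: {(section_id, interval): mean speed}
    section_offsets: {section_id: distance from the road's start point};
        rows follow increasing distance.
    Cells without data are set to `fill`.
    """
    order = sorted(section_offsets, key=section_offsets.get)  # spatial order
    return [[cell_speeds.get((sec, t), fill) for t in range(n_intervals)]
            for sec in order]


offsets = {"b": 1.2, "a": 0.0, "c": 2.5}  # km from the start point (hypothetical)
speeds = {("a", 0): 40.0, ("a", 1): 38.0, ("b", 0): 22.0,
          ("b", 1): 18.0, ("c", 0): 55.0, ("c", 1): 52.0}
M = build_time_space_matrix(speeds, offsets, n_intervals=2)
print(M)  # [[40.0, 38.0], [22.0, 18.0], [55.0, 52.0]]
```

Row i of the result holds section i and column j holds interval j, which matches the convention that pixel m_ij is the speed on section i at time j.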
Finally, a time-space matrix can be constructed from the time and space dimension information. Mathematically, denote the time-space matrix by

$$M = \begin{bmatrix} m_{11} & m_{12} & \cdots & m_{1N} \\ m_{21} & m_{22} & \cdots & m_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ m_{Q1} & m_{Q2} & \cdots & m_{QN} \end{bmatrix},$$

where N is the number of time intervals and Q is the number of road sections; the jth column vector of M, $m_j$, contains the traffic speeds of the transportation network at time j; and pixel $m_{ij}$ is the average traffic speed on section i at time j. Matrix M forms one channel of the image. Figure 1 illustrates the relations among the raw averaged floating-car speeds, the time-space matrix, and the final image.

CNN for Network Traffic Prediction
CNN has exhibited significant learning ability in image understanding because of its unique method of extracting critical features from images. Compared with other deep learning architectures, two salient characteristics make CNN unique: (a) locally connected layers, in which output neurons are connected only to their local nearby input neurons rather than to all input neurons as in fully connected layers. These layers can extract features from an image effectively, because every layer attempts to retrieve a different feature relevant to the prediction problem [26]; (b) a pooling mechanism, which greatly reduces the number of parameters required to train CNN while preserving the most important features.
While retaining these two characteristics, CNN is modified in the following aspects to suit the transportation context. First, the model inputs differ: the input images have only one channel, valued by the traffic speeds of all roads in a transportation network, and the pixel values range from zero to the maximum traffic speed or the speed limit of the network. In contrast, input images in image classification commonly have three channels (RGB), with pixel values ranging from 0 to 255. Despite these differences, the model inputs are normalized to prevent large raw values from inflating the model weights and making training more difficult. Second, the model outputs differ. In the transportation context, the outputs are the predicted traffic speeds on all road sections of a transportation network, whereas in image classification they are image class labels. Third, the abstract features have different meanings. In the transportation context, the abstract features extracted by the convolution and pooling layers are relations among road sections with respect to traffic speeds. In image classification, the abstract features range from low-level image edges in shallow layers to object shapes in deep layers, depending on the training objective. All these abstract features are significant for the respective prediction problem [31]. Fourth, the training objectives differ because of the distinct model outputs. In the transportation context, the outputs are continuous traffic speeds, so continuous cost functions should be adopted accordingly; in image classification, cross-entropy cost functions are usually used.
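As a minimal sketch of the input normalization, the snippet below divides speeds by an assumed maximum speed so that pixel values fall in [0, 1], and maps model outputs back to speed units. Both the scaling rule and the 80 km/h value are assumptions for illustration; the text states only that inputs are normalized, and the helper names are hypothetical.

```python
def normalize(matrix, max_speed):
    """Scale speeds from [0, max_speed] to [0, 1] (min-max scaling with min = 0)."""
    return [[v / max_speed for v in row] for row in matrix]


def denormalize(matrix, max_speed):
    """Map normalized model outputs back to speed units."""
    return [[v * max_speed for v in row] for row in matrix]


M = [[40.0, 38.0], [20.0, 80.0]]
X = normalize(M, max_speed=80.0)  # assumed network speed limit of 80 km/h
print(X)  # [[0.5, 0.475], [0.25, 1.0]]
```

Keeping the same `max_speed` for both directions matters: predictions made on the normalized scale are only meaningful once mapped back with the factor used for the inputs.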

CNN characteristics
Figure 2 shows the structure of CNN in the context of transportation with four main parts, that is, model input, traffic feature extraction, prediction, and model output.Each of the parts is explained as follows.
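First, the model input is the image generated from a transportation network with spatiotemporal characteristics. Let the lengths of the input and output time spans, measured in time intervals, be F and P, respectively. The model input can be written as

$$x^i = \left[m_i, m_{i+1}, \ldots, m_{i+F-1}\right], \quad i \in \left[1, N-P-F+1\right],$$

where i is the sample index, N is the number of time intervals, and $m_i$ is a column vector representing the traffic speeds of all roads in a transportation network within one time unit. The corresponding target consists of the next P columns, $y^i = \left[m_{i+F}, \ldots, m_{i+F+P-1}\right]$, which is why i cannot exceed N − P − F + 1.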
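The sliding-window construction of the samples and their targets can be sketched in Python as follows, using 0-based indices so that sample i runs from 0 to N − F − P. The function `make_samples` and the toy matrix are hypothetical; the text does not say whether windows may span day boundaries, and this sketch simply slides over whatever columns it is given.

```python
def make_samples(M, F, P):
    """Slice a (Q x N) time-space matrix into (input, target) pairs.

    Sample i (0-based) uses columns i .. i+F-1 as input and columns
    i+F .. i+F+P-1 as target, giving N - F - P + 1 samples in total.
    """
    N = len(M[0])
    samples = []
    for i in range(N - F - P + 1):
        x = [row[i:i + F] for row in M]          # Q x F input image
        y = [row[i + F:i + F + P] for row in M]  # Q x P target speeds
        samples.append((x, y))
    return samples


M = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]  # Q = 2 sections, N = 6 intervals
pairs = make_samples(M, F=3, P=2)
print(len(pairs))  # 2
print(pairs[0])    # ([[1, 2, 3], [7, 8, 9]], [[4, 5], [10, 11]])
```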
Second, traffic feature extraction, the combination of convolution and pooling layers, is the core part of the CNN model. Let pool denote the pooling procedure and L the depth of CNN. Denote the output and parameters of the lth layer by $o_l^i$ and $(W_l, b_l)$, respectively; the input of the first layer is $x^i$, and the input of the lth layer ($l \geq 2$) is $o_{l-1}^i$. The output of the first convolution and pooling layer can be written as

$$o_1^i = \mathrm{pool}\left(\sigma\left(W_1 * x^i + b_1\right)\right),$$

where $*$ denotes two-dimensional convolution and σ is the activation function. The output of the lth ($l = 2, \ldots, L$) convolution and pooling layer can be written as

$$o_l^i = \mathrm{pool}\left(\sigma\left(W_l * o_{l-1}^i + b_l\right)\right).$$

The extraction of traffic features has the following characteristics. (a) Convolution and pooling are performed in two dimensions, so this part can learn the spatiotemporal relations among road sections with respect to the prediction task during model training. (b) Unlike the layers in Figure 2, which show only four convolution or pooling filters each, practical applications typically use hundreds of filters per layer, which means that hundreds of features can be learned. (c) CNN transforms the model input into deep features through these layers.
In model prediction, the features learned and output by traffic feature extraction are concatenated into a dense vector that contains the final, highest-level features of the input transportation network. The dense vector can be written as

$$o_{\mathrm{flatten}}^i = \mathrm{flatten}\left(\left[o_L^{i,1}, o_L^{i,2}, \ldots, o_L^{i,K}\right]\right),$$

where L is the depth of CNN, $o_L^{i,k}$ is the kth of the K feature maps output by the last layer, and flatten is the concatenating procedure discussed above. Finally, the vector is transformed into the model outputs through a fully connected layer. The model output can thus be written as

$$\hat{y}^i = W_f\, o_{\mathrm{flatten}}^i + b_f,$$

where $W_f$ and $b_f$ are the parameters of the fully connected layer and $\hat{y}^i$ contains the predicted network-wide traffic speeds.
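The flatten-and-project step can be sketched with plain lists. `flatten` and `fully_connected` are hypothetical helper names; the sketch assumes a linear output layer with weights W_f and bias b_f and no activation, and the tiny weights are chosen only to make the arithmetic easy to follow.

```python
def flatten(feature_maps):
    """Concatenate K feature maps (each a 2-D list) into one dense vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def fully_connected(vec, W, b):
    """Linear output layer: y_hat = W_f . o_flatten + b_f."""
    return [sum(w * v for w, v in zip(w_row, vec)) + b_k
            for w_row, b_k in zip(W, b)]


maps = [[[1.0, 2.0]], [[3.0, 4.0]]]  # K = 2 feature maps of size 1 x 2
o = flatten(maps)                    # [1.0, 2.0, 3.0, 4.0]
W = [[0.5, 0.0, 0.0, 0.5],           # 2 outputs x 4 inputs
     [0.0, 1.0, 1.0, 0.0]]
b = [0.0, 1.0]
print(fully_connected(o, W, b))      # [2.5, 6.0]
```

In the depth-4 model described later, the same step would map 5760 flattened features to 1180 outputs, one per road section and predicted interval.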
Convolution layers differ from traditional feedforward neural networks, in which each input neuron is connected to each output neuron and the network is fully connected (fully connected layer). CNN applies convolution filters over its input layer to obtain local connections, in which only local input neurons are connected to each output neuron (convolution layer). Hundreds of filters are sometimes applied to the input, and their results are merged in each layer. One filter can extract one traffic feature from the input layer; thus, hundreds of filters can extract hundreds of traffic features. These extracted features are further combined to extract higher-level and more abstract traffic features. This process forms the compositionality of CNN, meaning that each filter composes a local path from lower-level to higher-level features. When one convolution filter $W_l^r$ is applied to the input, the output can be formulated as

$$y_{\mathrm{conv}} = \sum_{e=1}^{m}\sum_{f=1}^{n} W_{l,ef}^{r}\, d_{ef},$$

where m and n are the two dimensions of the filter, $d_{ef}$ is the value of the input matrix at position (e, f), $W_{l,ef}^{r}$ is the coefficient of the convolution filter at position (e, f), and $y_{\mathrm{conv}}$ is the output.
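A minimal sketch of a single-filter convolution (stride 1, no padding) follows; each output value is the sum above evaluated at one filter position. Like most CNN libraries, it slides the filter without flipping it (strictly a cross-correlation), and padding, which would keep the output the same size as the input, is omitted for brevity. The 1 × 2 filter [-1, 1] is a hypothetical example rather than the (3, 3) filters used in the paper: since columns are time intervals, it responds to speed changes between consecutive intervals, so negative outputs mark slowdowns.

```python
def conv2d_single(d, w):
    """Slide one m x n filter over input d (stride 1, no padding).

    Output[i][j] = sum_{e,f} w[e][f] * d[i+e][j+f], i.e. y_conv at that
    position.
    """
    m, n = len(w), len(w[0])
    rows, cols = len(d), len(d[0])
    out = []
    for i in range(rows - m + 1):
        out_row = []
        for j in range(cols - n + 1):
            s = 0.0
            for e in range(m):
                for f in range(n):
                    s += w[e][f] * d[i + e][j + f]
            out_row.append(s)
        out.append(out_row)
    return out


speeds = [[40, 38, 20],   # section 1 over three intervals
          [50, 50, 45]]   # section 2
print(conv2d_single(speeds, [[-1, 1]]))  # [[-2.0, -18.0], [0.0, -5.0]]
```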
Pooling layers are designed to downsample and aggregate data by extracting only the salient values from each region. Pooling layers make CNN locally invariant, which means that CNN tends to extract the same feature from the input despite small feature shifts, rotations, or changes in scale [31]. Accordingly, the pooling layers not only reduce the network scale of CNN but also identify the most prominent features of the input layers. Taking the maximum operation as an example, the pooling layer can be formulated as

$$y_{\mathrm{pool}} = \max_{e \in \{1,\ldots,p\},\; f \in \{1,\ldots,q\}} d_{ef},$$

where p and q are the two dimensions of the pooling window, $d_{ef}$ is the value of the input matrix at position (e, f) within the window, and $y_{\mathrm{pool}}$ is the pooling output.
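The max-pooling operation can be sketched in the same style. This version uses non-overlapping p × q windows and lets the last window along each axis be partial; that rounding-up behaviour is an assumption, chosen because it appears consistent with the (59, 5) to (30, 3) downsampling described for Table 2. The helper name `max_pool2d` is hypothetical.

```python
def max_pool2d(d, p=2, q=2):
    """Non-overlapping p x q max pooling; edge windows may be partial,
    so an axis of length 59 pools to 30 (ceiling division)."""
    rows, cols = len(d), len(d[0])
    return [[max(d[e][f]
                 for e in range(i, min(i + p, rows))
                 for f in range(j, min(j + q, cols)))
             for j in range(0, cols, q)]
            for i in range(0, rows, p)]


d = [[1, 3, 2],
     [4, 0, 6],
     [5, 1, 7]]
print(max_pool2d(d))  # [[4, 6], [5, 7]]
```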

CNN optimization
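The predictions of CNN are traffic speeds on different road sections, and the mean squared error (MSE) is employed to measure the distance between predictions and ground-truth traffic speeds. Thus, minimizing the MSE is taken as the training goal of CNN. The MSE can be written as

$$\mathrm{MSE} = \frac{1}{n}\sum_{k=1}^{n}\left(\hat{y}_k - y_k\right)^2,$$

where n is the number of predicted speeds and $\hat{y}_k$ and $y_k$ are the kth predicted and ground-truth speeds. Let the set of model parameters be $\Theta = \{W_1, b_1, \ldots, W_L, b_L, W_f, b_f\}$; the optimal values of Θ can be determined with the standard backpropagation algorithm, as in other studies on CNN [26,31]:

$$\Theta^{*} = \arg\min_{\Theta} \frac{1}{n}\sum_{k=1}^{n}\left(\hat{y}_k(\Theta) - y_k\right)^2.$$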
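To make the training objective concrete, the sketch below computes the MSE and its gradient with respect to the predictions, which is the error signal that backpropagation propagates back through the network. The one-parameter model ŷ = θx, the data, and the learning rate are hypothetical stand-ins for the full CNN, used only to show gradient descent driving the MSE towards its minimum.

```python
def mse(y_hat, y):
    """Mean squared error over all predicted speeds."""
    return sum((a - b) ** 2 for a, b in zip(y_hat, y)) / len(y)


def mse_grad(y_hat, y):
    """dMSE/dy_hat_k = 2 (y_hat_k - y_k) / n."""
    n = len(y)
    return [2.0 * (a - b) / n for a, b in zip(y_hat, y)]


# Toy gradient descent on one parameter theta of y_hat = theta * x
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]          # generated with theta = 2
theta, lr = 0.0, 0.05
for _ in range(200):
    y_hat = [theta * xi for xi in x]
    g = mse_grad(y_hat, y)
    # chain rule: dMSE/dtheta = sum_k dMSE/dy_hat_k * x_k
    theta -= lr * sum(gk * xk for gk, xk in zip(g, x))
print(round(theta, 4))       # 2.0
```

In a real CNN the same gradient is passed backwards through the fully connected, pooling, and convolution layers instead of a single multiplication.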

Data description
Beijing is the capital of China and one of the largest cities in the world. At present, Beijing is encircled by four two-way ring roads (the second to fifth ring roads) and has about ten thousand taxis serving its population of more than 21 million. These taxis are equipped with GPS devices that upload data approximately once per minute. The uploaded data contain information such as car positions, recording times, moving directions, and car travel speeds. The data were collected from May 1, 2015 to June 6, 2015 (37 days). These probe data are of good quality: missing data account for less than 2.9% and were remedied using spatiotemporally adjacent records. In this paper, the data are aggregated into 2-min intervals, because data usually fluctuate over shorter intervals and aggregation makes them more stable and representative.
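The text does not specify how missing records were remedied from spatiotemporally adjacent records. One simple possibility, shown below purely as an assumption, fills each missing cell of the time-space matrix with the mean of its available neighbours in the adjacent sections (rows) and adjacent time intervals (columns); `impute_neighbors` is a hypothetical helper.

```python
def impute_neighbors(M):
    """Fill None cells with the mean of their non-missing up/down/left/right
    neighbours, i.e. adjacent sections and adjacent time intervals.
    Neighbours are read from the original matrix, so fills do not cascade."""
    Q, N = len(M), len(M[0])
    out = [row[:] for row in M]
    for i in range(Q):
        for j in range(N):
            if M[i][j] is None:
                nbrs = [M[a][b]
                        for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                        if 0 <= a < Q and 0 <= b < N and M[a][b] is not None]
                out[i][j] = sum(nbrs) / len(nbrs) if nbrs else None
    return out


M = [[40.0, None, 36.0],
     [30.0, 32.0, 26.0]]
print(impute_neighbors(M))  # [[40.0, 36.0, 36.0], [30.0, 32.0, 26.0]]
```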
In this paper, two subnetworks of Beijing, the second ring road (labeled Network 1) and the north-east transportation network (labeled Network 2), are selected to demonstrate the proposed method. As shown in Figure 3, the two networks differ in size and topological complexity. Network 1 consists of 236 road sections for aggregating GPS data, all of which are one-way roads. Network 2 consists of 352 road sections, including two-way roads and crossroads. Because the selected networks represent different road topologies and structures, they allow a better evaluation of the effectiveness of the proposed CNN traffic prediction algorithm. Four prediction tasks are performed to test the CNN algorithm in predicting network-wide traffic speeds. The tasks differ in prediction horizon (short-term and long-term predictions) and in input information (prediction using abundant information and prediction using limited information). The four tasks are as follows: Task 1: 10-min traffic prediction using the last 30 min of traffic speeds; Task 2: 10-min traffic prediction using the last 40 min of traffic speeds; Task 3: 20-min traffic prediction using the last 30 min of traffic speeds; Task 4: 20-min traffic prediction using the last 40 min of traffic speeds.
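Because speeds are aggregated into 2-min intervals, each task translates directly into a number of input columns F and output columns P of the time-space matrix. The small sketch below, in which `TASKS` and `task_shape` are hypothetical names, performs that conversion; for Task 2 on Network 1 it gives the (236, 20) input and 1180-value output reported for the depth-4 model.

```python
INTERVAL_MIN = 2                     # aggregation interval (min)
PER_DAY = 24 * 60 // INTERVAL_MIN    # 720 intervals per day

# task -> (prediction horizon, input span), both in minutes
TASKS = {1: (10, 30), 2: (10, 40), 3: (20, 30), 4: (20, 40)}


def task_shape(task, n_sections):
    """Return (input image shape, flattened output size) for a task."""
    horizon, span = TASKS[task]
    F = span // INTERVAL_MIN         # input columns
    P = horizon // INTERVAL_MIN      # predicted columns
    return (n_sections, F), n_sections * P


print(PER_DAY)                # 720
print(task_shape(2, 236))     # ((236, 20), 1180)
```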
In each of the four tasks, the capability and effectiveness of CNN in predicting large-scale transportation network speeds are validated by calculating and comparing the MSEs of CNN.

Time-space image generation
For the time-space matrix representation, the goal is to transform the spatial relations of traffic in a transportation network into a linear representation. This is straightforward for Network 1, because the connected road sections of the ring road can easily be straightened into a line. For Network 2, however, the road sections cannot be straightened into a single line while preserving all of their spatial relations. As a compromise, the network is segmented into straight lines, and road sections are laid out in order along these lines. Consequently, in Network 2, only the linear spatial relations along each straight line are captured directly. Nevertheless, the complex, network-wide relations of traffic speeds in Network 2 can still be learned, because CNN learns features from local connections and composes them into high-level representations [27,31]. For Network 2, CNN learns the relations among road sections within each segment and composes them into complex network-wide relations.
After using the time-space matrix as the channel of an image and representing each day's traffic speeds of the network as one image, 37 images, each corresponding to a day, are generated for each of Networks 1 and 2. Sample images of Networks 1 and 2 on May 26, 2015 are shown in Figure 4. The y-axis labels of Figure 4, i.e., s1, s2, s3, s4, and so on, are the road sections shown in Figure 3. The images contain rich traffic information, such as the most congested areas (red regions) and typical congestion propagation patterns, i.e., oscillating congested traffic (OCT) and pinned localized clusters (PLC). A more detailed explanation of these traffic patterns can be found in the study by Schönhof and Helbing [32]. Such rich information cannot be learned well by a simple ANN; thus, a more effective algorithm is necessary.

Tuning up CNN parameters
Two critical factors should be considered when implementing the structure of CNN: (a) the hyperparameters of the convolution and pooling layers, such as the convolution filter size, pooling size, and pooling method; and (b) the depth of CNN.
First, the selection of hyperparameters relies on expert experience, and no general rules can be applied directly. Two well-known examples can serve as references: LeNet, which marked the beginning of CNN development [33], and AlexNet, which won the ImageNet image classification competition in 2012 [26]. Based on the parameter settings of LeNet and AlexNet, we select convolution filters of size (3, 3) and max pooling of size (2, 2) for the example networks.
Second, the depth of CNN should be neither too large nor too small [34], so that CNN can learn sufficiently complex relations while the model still converges. To determine a proper depth, values from small to large were tested until the incremental benefit diminished and convergence became difficult. The structures of CNN at different depths are listed in Table 1, where each convolution layer is followed by a pooling layer and the numbers represent the number of convolution filters in each layer. The depth-1 network is simply a fully connected layer that transforms inputs into predictions, whereas the other three networks first extract spatiotemporal traffic features from the input image using convolution and pooling layers and then make predictions based on them. In these experiments, the previous 40 min of traffic speeds are used to predict the following 10 min. For model training, 21600 samples from the first 30 days are used, and for model validation, 5040 samples from the following seven days are used. The results, shown in Figure 5, indicate that increasing the depth of the CNN model significantly reduces MSEs on the testing data. The depth-4 CNN model achieved the lowest MSEs on the training and testing data, 21.3 and 35.5, respectively; therefore, the depth-4 model is adopted for the experiments in this paper. The details of the depth-4 CNN are listed in Table 2. The model input has three dimensions (1, 236, 20): the first number indicates that the input image has one channel, the second is the total number of road sections in Network 1, and the third is the input time span of 20 time units. The convolution layers successively transform the channel dimension into 256, 128, and 64 with the corresponding numbers of convolution filters. At the same time, the pooling layers successively downsample the input window to (118, 10), (59, 5), and (30, 3). The output dimensions of layer 6 are (64, 30, 3), which are then flattened into a vector of dimension 5760. The vector is finally transformed through a fully connected layer into the model output of dimension 1180 (236 sections × 5 two-minute intervals). Table 2 shows that the parameter scale of the convolution and pooling layers is small: only 4032 parameters are required. A large number of parameters, 6,796,800, is required in the fully connected layer. However, this number is comparable to that of simple OLS, which is 236 × 20 × 1180 = 5,569,600. Thus, a large parameter count in the fully connected layer cannot be avoided.
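The dimension and parameter figures quoted above can be checked with a few lines of arithmetic. The sketch assumes size-preserving convolutions and three (2, 2) poolings that round up, and it counts only the fully connected weights (excluding the 1180 biases) and the OLS coefficients (excluding intercepts), which is what reproduces the reported numbers; `pooled_shape` is a hypothetical helper.

```python
import math


def pooled_shape(h, w, n_pools, p=2):
    """Spatial size after n_pools (p x p) poolings, rounding up each time."""
    for _ in range(n_pools):
        h, w = math.ceil(h / p), math.ceil(w / p)
    return h, w


sections, steps, out_steps, last_filters = 236, 20, 5, 64
h, w = pooled_shape(sections, steps, n_pools=3)
flat = last_filters * h * w                            # flattened feature vector
fc_weights = flat * sections * out_steps               # 5760 x 1180 weights
ols_coefficients = sections * steps * sections * out_steps  # 4720 inputs x 1180 outputs
print((h, w), flat, fc_weights, ols_coefficients)
# (30, 3) 5760 6796800 5569600
```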

Results and comparison
Four prevailing algorithms are chosen for comparison with CNN. OLS, the basic regression algorithm, is taken as the benchmark. KNN performs regression using the nearest points. Random forest (RF) makes predictions by combining the branches of multiple decision trees. ANN represents the traditional neural network and attempts to learn features through hidden layers. These algorithms differ in their ability to handle multiple outputs. OLS, KNN, and RF cannot make multiple predictions at one time; hence, to predict network-wide traffic speeds, multiple models have to be developed. In contrast, ANN can predict multiple outputs in one model. Regarding spatial relations, all four algorithms treat the traffic speeds of different sections as independent sequences and cannot learn spatial relations among sections. KNN is configured to use the 10 nearest points, RF is set up to generate 10 decision trees, and ANN is optimized to contain three hidden layers, each consisting of 1000 hidden units. All algorithms are trained on a desktop computer with an i7-3770 3.40 GHz CPU and an NVIDIA GeForce GTX 650 GPU.
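For reference, a KNN regressor of the kind described can be sketched in a few lines: the prediction is the mean target of the k training inputs closest to the query in Euclidean distance. The function `knn_predict` and the toy data are hypothetical; in the setting above, such a model would be built separately for each road section, with k = 10.

```python
import math


def knn_predict(train_x, train_y, query, k=10):
    """Mean target of the k training inputs nearest to `query`."""
    # Pair each training target with its Euclidean distance to the query
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Inputs: speeds (km/h) in the last two intervals; target: next-interval speed
train_x = [[40, 38], [20, 18], [42, 40], [22, 20]]
train_y = [37.0, 17.0, 39.0, 19.0]
print(knn_predict(train_x, train_y, [41, 39], k=2))  # 38.0
```

Because each input here holds only one section's own history, the sketch also shows why such a baseline cannot exploit relations between sections.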
Tables 3 and 4 and Figure 6 show the results of the four algorithms and CNN on Networks 1 and 2 in the four prediction tasks. The results show that, in all circumstances, CNN outperformed the other algorithms on the testing data, implying that CNN generalizes best to new data samples. One possible reason is that OLS, KNN, and RF treat the traffic speeds of each section as independent sequences and model traffic under the assumption that the speed on a section is affected only by its own history. This assumption ignores the spatial relations among road sections in the network and neglects the important mutual effects of adjacent sections and deeper traffic features. ANN is also inferior to CNN, possibly because its structure is too simple to capture rich features among road sections and, more importantly, because it cannot handle and utilize spatial information among sections.
Long-term predictions using CNN can also be evaluated by comparing the results of Tasks 1 and 3 and of Tasks 2 and 4. Usually, when the input time span is fixed, long-term predictions have higher MSEs than short-term predictions, and long-term prediction can be improved by using more input data. For example, with the input time span fixed at 30 min (Tasks 1 and 3), CNN achieves an MSE of 47.8328 for the 20-min prediction, higher than the 39.9405 obtained for the 10-min prediction. However, the MSE of 47.8328 drops to 45.8109 when the input time span is increased to 40 min, implying that more input data improve the results when CNN is trained for long-term predictions.

Figure 7 shows the training time of the different algorithms on Networks 1 and 2. OLS, KNN, and ANN train more efficiently than CNN because these algorithms have simple structures and are easy to train. However, they make a significant trade-off between training efficiency and prediction accuracy, as their MSEs are at least twice as large as that of CNN. RF takes about nine hours to train and obtains much better results than these three algorithms, but its results are still inferior to those of CNN, and RF may fail when applied to a larger-scale transportation network. Therefore, when both training efficiency and accuracy are considered, the applied CNN outperforms the other algorithms. CNN achieves less accurate results in long-term than in short-term predictions, and its prediction performance is expected to improve as more historical data are fed into it.
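The relative differences between the MSEs quoted for the 30-min and 40-min input spans can be computed directly. The helper `relative_change` is a hypothetical illustration; the paper's "accuracy improvement" percentages are not defined in this section, so these numbers should not be read as the same metric.

```python
def relative_change(old, new):
    """Relative change (new - old) / old; negative means a reduction."""
    return (new - old) / old


mse_10min_30in = 39.9405   # 10-min prediction, 30-min input
mse_20min_30in = 47.8328   # 20-min prediction, 30-min input
mse_20min_40in = 45.8109   # 20-min prediction, 40-min input

print(f"{relative_change(mse_10min_30in, mse_20min_30in):+.1%}")  # +19.8%
print(f"{relative_change(mse_20min_30in, mse_20min_40in):+.1%}")  # -4.2%
```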

Conclusion
Deep learning methods are widely used in image processing with satisfactory results, because deep learning architectures usually have deeper construction and can represent more complex nonlinear functions than other neural networks [22,23,25,34]. However, few studies have addressed the spatiotemporal relations among road sections in transportation networks. Spatiotemporal relations are important traffic characteristics, and a better understanding of them is likely to improve the accuracy of traffic prediction.
This paper proposes an image-based traffic speed prediction method that automatically extracts abstract spatiotemporal traffic features to learn spatiotemporal relations. The method contains two main procedures. The first converts network traffic into images that represent the time and space dimensions of a transportation network as the two dimensions of an image; spatiotemporal information is preserved because neighboring road sections are adjacent in the image. The second applies the deep learning architecture of CNN to the image for network traffic prediction. CNN has attained significant success in computer vision and performs well in image learning tasks [26]. In this transportation prediction problem, CNN offers the following important properties: (a) Spatiotemporal features of the transportation network are extracted automatically by the convolution and pooling layers of CNN, so manual feature selection is avoided. (b) Through its layers, CNN represents network-wide traffic information as high-level features that are then used to predict network-wide traffic speeds. (c) CNN can be generalized to large transportation networks because it shares weights in its convolution layers and employs the pooling mechanism. Two real-world transportation networks and four prediction tasks are considered to test the applicability of the proposed method. The results show that the proposed method outperforms OLS, KNN, ANN, and RF, with an average accuracy improvement of 27.96%. The training time of the proposed method is acceptable: it achieves the best MSEs on the testing data and requires much less training time than RF, which achieves the best MSEs on the training data and the second-best prediction accuracy on the testing data.
The proposed method has several interesting possible extensions. For example, in the second procedure, other models, such as a combination of CNN and LSTM, would be worth exploring: CNN could first extract abstract traffic features from a transportation network, and the resulting feature vectors could then be fed into an LSTM model for prediction.

Figure 1.An illustration of the traffic-to-image conversion on a network

Figure 2. Deep learning architecture of CNN in the context of transportation

Figure 3. Two sub transportation networks for testing: (a) Network 1, the second ring of Beijing; (b) Network 2, a network in north-east Beijing

Figure 4. Sample images with spatiotemporal traffic speeds for (a) Network 1 and (b) Network 2

Figure 7. Training time of different algorithms: (a) Training time on Network 1; (b) Training time on Network 2

Based on the above discussion, the following conclusion can be drawn: CNN outperforms the other algorithms on testing data in all circumstances, with an average accuracy improvement of 27.96%.

Table 2.
… parameters of CNN, although it improves the prediction accuracy of CNN on testing data. The model should stop training when it begins to overfit. Early stopping is the most common and effective procedure for preventing the model from overfitting [35]. It works during model training: early stopping records the model's losses on the validation dataset and, after each training epoch, checks whether the loss has increased or remained unchanged. If so, and no improvement is observed within a specified number of epochs, early stopping halts further training.
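The early stopping rule described above can be sketched as a small Python class. The class name, the `patience` and `min_delta` parameters, and the loss values are hypothetical; deep learning frameworks offer similar callbacks, but their exact options vary and are not assumed here.

```python
class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss   # improvement: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1   # loss increased or stayed the same
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2)
losses = [50.0, 42.0, 38.0, 38.5, 39.1, 37.0]  # hypothetical validation MSEs
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        print("stop after epoch", epoch)  # stop after epoch 4
        break
```

In practice the weights from the best epoch (here the one with loss 38.0) would usually be kept rather than those from the last epoch.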

Table 3. Prediction performance of CNN for Network 1

Table 4. Prediction performance of CNN for Network 2