6.1. Traffic States Classification Results
The RF algorithm is used to rank the importance of the parameters that characterize the traffic operation status in freeway weaving sections. Based on the ranking results, the most important variables are selected and used in the training and analysis of the subsequent models. The number of trees in the forest is set to 100, and the minimum number of samples required to split an internal node is 2. The relative importance rankings of the variables representing the traffic flow stability of each lane in the freeway weaving sections are shown in Figure 4. A higher score indicates a greater influence of the corresponding feature variable on traffic flow stability.
According to the RF results, 5 min flow, average speed, density, and weather conditions are four influencing factors shared among the top six factors for all lanes within the freeway weaving sections. For Lane 1, which is the farthest lane from the weaving lanes, the main additional influencing factors are the speed differences in spatially associated areas. For Lane 2, the merging ratio also exerts a notable influence. Lane 3, the outermost lane of the basic freeway segment, primarily serves heavy vehicles; thus, one critical influencing factor shifts from speed differences to the proportion of heavy vehicles. Lane 4 is an auxiliary lane in the weaving area, whose traffic conditions are more affected by the proportion of heavy vehicles and the diverging ratio.
The improved k-prototypes algorithm is applied to reclassify the traffic states of freeway weaving sections under adverse weather conditions using the feature parameters selected by the RF. A multi-round clustering approach is adopted to search for the optimal number of clusters, which is varied from 2 to 10. The CUM results for each number of clusters are summarized in Table 4 and Figure 5. The comparison of CUM values indicates that when the number of clusters is 7, the proposed improved k-prototypes algorithm yields the most effective clustering results across all lanes. Therefore, the traffic states of each lane in the freeway weaving sections should be divided into seven categories.
The clustering results of each lane using the improved k-prototypes algorithm are shown in Table 5. An analysis of the distribution of cluster centers across the categories for each lane shows that Category 3 consistently represents free-flow conditions, with favorable weather and other indicators that also reflect high operational stability and minimal disturbance. This category can thus be considered the optimal traffic state. The dissimilarity distances between cluster centers are then used to quantify the proximity of each category to Category 3, which converts the seven clusters into a seven-level classification of traffic states for freeway weaving sections.
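As a minimal sketch of this ordering step, the clusters can be ranked by their distance from the reference (optimal) cluster center. The sketch assumes the standard k-prototypes dissimilarity (squared Euclidean distance on numeric attributes plus a weight gamma times the number of categorical mismatches), not the improved measure used in this study. The function names, the value of gamma, and the three cluster centers are hypothetical and serve only as an illustration.

```python
def mixed_distance(a_num, a_cat, b_num, b_cat, gamma=1.0):
    """Standard k-prototypes dissimilarity (assumed here): squared Euclidean
    distance on numeric attributes plus gamma times categorical mismatches."""
    numeric = sum((x - y) ** 2 for x, y in zip(a_num, b_num))
    categorical = sum(1 for x, y in zip(a_cat, b_cat) if x != y)
    return numeric + gamma * categorical


def rank_states(centers, reference, gamma=1.0):
    """Map each cluster label to a level 1..K, ordered by distance
    from the reference cluster center (level 1 = reference itself)."""
    ref_num, ref_cat = centers[reference]
    distances = {
        label: mixed_distance(num, cat, ref_num, ref_cat, gamma)
        for label, (num, cat) in centers.items()
    }
    ordered = sorted(distances, key=distances.get)
    return {label: level for level, label in enumerate(ordered, start=1)}


# Hypothetical normalised centers: ([flow, speed, density], [weather])
centers = {
    1: ([0.6, 0.5, 0.5], ["rain"]),
    2: ([0.9, 0.2, 0.9], ["snow"]),
    3: ([0.3, 0.9, 0.1], ["clear"]),
}
levels = rank_states(centers, reference=3)
```

With these illustrative centers, the reference cluster receives level 1 and the remaining clusters are numbered in order of increasing dissimilarity from it.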
6.2. Traffic State Prediction Results
A total of 518,400 5 min data records were selected to train the proposed WSTGCN model, and the data from 27 December 2019, which were not included in the training set, were used to test it. It should be noted that the output of the deep learning model is continuous. According to our comparative analysis, rounding is preferred primarily because it requires fewer computational resources and, unlike a classifier, no parameter tuning. Thus, the predicted results are rounded to the nearest integer to obtain the final predicted traffic state level. To verify the effectiveness of this correction, the continuous predictions and the rounded results for Lane 3, the lane most affected by the weaving flow, are presented in Figure 6. The results indicate that the rounded values closely follow the variation trend of the continuous predictions, so rounding has limited impact on the analysis of the results.
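The rounding step can be sketched as follows. Two details are assumptions rather than statements from this study: ties are rounded half up (Python's built-in `round` rounds halves to the nearest even integer), and the result is clamped to the seven levels obtained in Section 6.1. The function name `to_state_level` is hypothetical.

```python
import math


def to_state_level(value, lowest=1, highest=7):
    """Round a continuous model output to the nearest integer state level.

    Ties are rounded half up (assumption), and the result is clamped to
    [lowest, highest] so that it is always a valid level (assumption).
    """
    rounded = math.floor(value + 0.5)
    return max(lowest, min(highest, rounded))


continuous = [1.2, 2.5, 6.8, 7.4, 0.3]
levels = [to_state_level(v) for v in continuous]
print(levels)  # [1, 3, 7, 7, 1]
```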
To assess the effectiveness of the proposed method, we compared the proposed WSTGCN model with several widely used benchmark models for traffic state prediction under adverse weather. The models are as follows:
- (1) RNN (Recurrent Neural Network): A commonly used architecture for predicting the temporal patterns of traffic flow data.
- (2) LSTM (Long Short-Term Memory): Compared with RNN, it is more suitable for handling data with long temporal dependencies and can effectively avoid issues such as gradient vanishing and explosion.
- (3) GRU (Gated Recurrent Unit): A variant of LSTM that requires fewer parameters, less data, and shorter training time.
- (4) TSE-GC-GRU: An architecture that adds a temporal attention mechanism to the combination of GCN and GRU, which enables the model to effectively identify how data at different time steps influence the prediction results.
- (5) DT-GC-GRU: A dual-stream model consisting of two TSE-GC-GRU modules that, respectively, extract features from periodic sequences and recent time windows, thereby enhancing the model’s ability to capture the periodicity of traffic states.
- (6) WSTGCN: An optimized version of the DT-GC-GRU model that incorporates a weather feature extraction module to further improve prediction performance.
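To illustrate the dual-stream idea in items (5) and (6), the sketch below builds the two inputs for a single prediction step: a recent time window and a periodic sequence. It assumes daily periodicity (288 five-minute steps per day). The window length, the number of previous days, and the function name `build_inputs` are hypothetical choices for illustration, not the settings used in this study.

```python
def build_inputs(series, t, recent_len=12, n_days=3, steps_per_day=288):
    """Return (recent, periodic) inputs for predicting series[t].

    recent   : the recent_len observations immediately before t
    periodic : the observations at the same time of day on each of the
               previous n_days days (oldest first); daily periodicity
               is an assumption of this sketch
    """
    if t - recent_len < 0 or t - n_days * steps_per_day < 0:
        raise ValueError("not enough history before index t")
    recent = series[t - recent_len:t]
    periodic = [series[t - d * steps_per_day] for d in range(n_days, 0, -1)]
    return recent, periodic


# Toy series whose value equals its own index, so slices are easy to read.
series = list(range(1000))
recent, periodic = build_inputs(series, t=900)
```

Each stream would then be passed to its own feature extraction module before the two representations are fused.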
The Root Mean Square Error (RMSE), Equalization Coefficient (EC), and Accuracy Rate (AR) are selected to evaluate the average deviation between predicted and actual values, the degree of spatiotemporal alignment between predicted results and actual traffic states, and the classification accuracy of the model for traffic states of the freeway weaving section under adverse weather conditions, respectively. Because the prediction output is ordinal data (the discrete traffic states), the prediction accuracy rate (AR) can be obtained by directly comparing the rounded predicted values with the actual traffic states.
where $n$ is the total number of samples, $i$ is the sample index, $y_i$ is the true value of the $i$-th sample, and $\hat{y}_i$ is the predicted value of the $i$-th sample.
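The three metrics can be sketched as follows. RMSE follows its standard definition. For EC, the sketch assumes the equalization coefficient form common in traffic prediction work, one minus the ratio of the root sum of squared errors to the sum of the root sums of squares of the true and predicted values; this may differ from the exact definition used in this study. AR is computed by comparing half-up rounded predictions with the true states. All function names are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between true and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)


def equalization_coefficient(y_true, y_pred):
    """EC in an assumed, commonly used form; 1.0 means a perfect match."""
    num = math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)))
    den = (math.sqrt(sum(y ** 2 for y in y_true))
           + math.sqrt(sum(p ** 2 for p in y_pred)))
    return 1.0 - num / den


def accuracy_rate(y_true, y_pred):
    """Share of samples whose rounded prediction equals the true state."""
    hits = sum(1 for y, p in zip(y_true, y_pred)
               if y == math.floor(p + 0.5))
    return hits / len(y_true)


# Toy example: three of four states are predicted correctly.
y_true = [1, 2, 3, 4]
y_pred = [1.0, 2.0, 3.0, 5.0]
error = rmse(y_true, y_pred)                  # 0.5
ec = equalization_coefficient(y_true, y_pred)
ar = accuracy_rate(y_true, y_pred)            # 0.75
```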
The performance of the selected models in predicting traffic states for all lanes of the freeway weaving sections is listed and compared in Table 6 and Figure 7. Based on the performance comparison results, the WSTGCN traffic state prediction model developed in this study achieves the best predictive performance across all lanes. The goodness-of-fit (coefficient of determination) exceeds 0.9 for all lanes, and the prediction accuracy is above 90% for all lanes except Lane 4, where it is very close to 90%. This demonstrates the high practical value of the proposed model.
Among the models, the RNN prediction model exhibits the poorest performance. Compared with LSTM, the GRU model, with fewer parameters, achieves better prediction results for all lanes. The TSE-GC-GRU model integrates graph convolutional networks to enhance spatial feature extraction, leading to significant improvements over GRU and LSTM. Specifically, relative to the GRU model, the TSE-GC-GRU reduces RMSE by 5.0–12.7%, increases EC by 1.7–3.8%, and improves accuracy by 2.0–3.1% across lanes. The DT-GC-GRU model, which employs two TSE-GC-GRU modules to incorporate periodic features of traffic state changes, further improves prediction performance. Relative to the TSE-GC-GRU, its RMSE decreases by 4.5–13.0%, EC increases by 1.6–4.8%, and accuracy improves by 1.8–3.3% across lanes.
After incorporating weather features, the WSTGCN model achieves the best prediction performance. Compared with the DT-GC-GRU model, WSTGCN reduces RMSE by 3.8–8.0%, increases EC by 1.0–3.2%, and improves accuracy by 1.4–3.1%, which indicates that accounting for weather factors effectively enhances model performance.
Figure 7 presents a detailed comparison of the prediction results from the selected models for the traffic operation state time series of the four lanes in the freeway weaving sections. In this figure, the traffic operation states progressively worsen from State 1 to State 7. State 1 represents the optimal operating condition, characterized by free-flow traffic and favorable weather conditions, and State 7 corresponds to the traffic state with the poorest operational stability, often occurring under congested flow and adverse weather conditions. The results in Figure 7 further confirm that the WSTGCN model’s predictions are the closest to the actual states and achieve the highest prediction accuracy.
6.3. Influence of the Weaving Section Type
In addition to Type A weaving sections, freeways also have Type B weaving sections, where one weaving traffic stream can complete its maneuver without lane changes and the other stream requires at most one lane change, and Type C weaving sections, where at least one weaving stream must make two or more lane changes to complete the maneuver. Typical examples of Type B and Type C weaving sections are shown in Figure 8. Compared with Type A weaving sections, Type B and Type C sections are more suitable when one weaving stream is significantly heavier than the other, which results in certain differences in traffic operation characteristics.
The primary influencing factors in Type B and Type C weaving sections can also be identified using the RF model. The results indicate that traffic volume, density, speed, and weather conditions remain the major influencing factors across all lanes. For lanes that are less affected by heavy vehicles and weaving flows, such as Lane 1, the critical additional influencing factors continue to be the speed differences. For weaving lanes and auxiliary lanes, the heavy vehicle proportion and the diverging or merging ratios remain critical additional factors. Overall, the influencing factor patterns are similar to those in Type A weaving sections. However, in practical applications, the method proposed in this study should be applied in conjunction with actual detection data for validation and analysis.
When the proposed WSTGCN model was applied to other freeway weaving sections, it was found that the model can also effectively predict the traffic states of Type B and Type C weaving sections when it is trained with sufficient data. The Type B and Type C weaving segments shown in Figure 8 were selected for validating the proposed WSTGCN model. The prediction results for Lane 1 (with limited influence from the weaving traffic stream) and Lane 3 (the weaving lane) are presented in Table 7. Compared with the prediction results for the Type A weaving section (see Table 6), the WSTGCN model achieves comparable performance in predicting the traffic states of Type B and Type C weaving sections. For Lane 1, the prediction results remain unchanged. However, for Lane 3, due to more complex lane-changing behaviors, the prediction metrics (RMSE, EC, and AR) degrade by approximately 3–5% relative to those for Type A. These results demonstrate that although the configuration of the weaving sections changes, the proposed WSTGCN model is still capable of effectively extracting weather features and capturing the spatiotemporal dynamics of traffic flow, thus achieving accurate traffic state prediction.