4.2. Evaluation of Prediction Performance
Figure 9 and Table 6 present the model’s prediction performance, evaluated using the confusion matrix for the test set of each database. In Figure 9, the vertical axis represents the actual data classes, while the horizontal axis shows the classes predicted by the model. In all cases, the model classified the test data with consistently high accuracy. In particular, predictions for the normal class were completely accurate in all cases except for a single misclassification in the first database.
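As a minimal sketch of how a confusion matrix with this orientation (rows for actual classes, columns for predicted classes) can be tabulated, the following uses only the Python standard library. The function names and the toy labels are illustrative assumptions and are not taken from the study's code or data.

```python
from collections import Counter

# Class order used for both rows (actual) and columns (predicted).
CLASSES = ["normal", "wet", "ice", "snow"]

def confusion_matrix(actual, predicted, classes=CLASSES):
    """Return a matrix whose rows are actual classes and columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]

def accuracy(matrix):
    """Overall accuracy: diagonal (correct) counts divided by all counts."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Toy labels for illustration only (not data from the study).
actual = ["normal", "wet", "ice", "ice", "snow", "ice"]
predicted = ["normal", "wet", "ice", "wet", "snow", "snow"]

cm = confusion_matrix(actual, predicted)
for name, row in zip(CLASSES, cm):
    print(f"{name:>6}: {row}")
print(f"accuracy = {accuracy(cm):.3f}")
```

Off-diagonal entries in a given row show which classes the model confused that actual class with, which is the reading applied to Figure 9 in the discussion that follows.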
In the first and second databases, most prediction errors occurred in the ice class. Specifically, in the first database, most errors involved misclassifying ice as wet, whereas in the second database, the most frequent errors involved predicting ice as snow. According to Table 6, the proportions of misclassification were similar across the two databases, suggesting that the impact of database structure on classification performance was minimal in quantitative terms.
However, the primary objective of this study is to assist road managers in identifying hazardous road conditions, thereby enabling appropriate decision-making. Even if the ice class is misclassified as snow, road managers would still be expected to conduct snow removal operations and address the hazardous condition. Consequently, the likelihood of such misclassification leading to incorrect decision-making remains relatively low. In contrast, if an ice surface is misclassified as wet, there is a risk that no action will be taken despite the presence of a hazardous condition. Therefore, when developing a CNN model for predicting winter road surface conditions, it is more desirable from a safety perspective to classify slushy surfaces as either ice or snow, as in the second database. This approach is preferable to the method adopted in the first database.
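The distinction drawn here between harmless and harmful errors can be sketched as a simple rule that sorts each misclassification by its likely operational consequence. The grouping below (ice, snow, and hazard trigger snow removal; normal and wet do not) is an assumption made for illustration, and the category names are hypothetical rather than terms defined in this study.

```python
# Assumed grouping: classes for which road managers would dispatch snow removal.
HAZARDOUS = {"ice", "snow", "hazard"}

def error_consequence(actual, predicted):
    """Label a prediction by its likely effect on the road manager's decision."""
    if actual == predicted:
        return "correct"
    actual_hazardous = actual in HAZARDOUS
    predicted_hazardous = predicted in HAZARDOUS
    if actual_hazardous and not predicted_hazardous:
        # Hazard present but no action triggered (e.g. ice predicted as wet).
        return "underreaction"
    if predicted_hazardous and not actual_hazardous:
        # Action triggered on a non-hazardous surface (e.g. wet predicted as ice).
        return "overreaction"
    # Wrong class, but the resulting decision is the same (e.g. ice predicted as snow).
    return "same_action"

for pair in [("ice", "snow"), ("ice", "wet"), ("wet", "ice")]:
    print(pair, "->", error_consequence(*pair))
```

Counting errors per category instead of per class makes the safety argument explicit: two models with similar accuracy can differ considerably in how many of their errors are underreactions.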
In the third database, most prediction errors occurred in the wet class. When road managers use a model trained on this database, it may lead to overly cautious decisions, such as conducting snow removal on non-hazardous road surfaces. However, from a safety perspective, such overreactions are less critical than underreactions.
4.3. Detection of Prediction Error
Prediction errors were reviewed for each model. For the model trained on the first database, only 1 error occurred out of 80 cases in the normal class, and only 1 error out of 568 cases in the wet class, indicating excellent accuracy. In contrast, prediction errors occurred more frequently under slushy conditions. For the ice class, 19 errors were observed out of 530 cases, and for the snow class, 6 errors out of 503 cases, indicating comparatively lower accuracy.
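The per-class figures above can be converted into accuracies with simple arithmetic. The sketch below uses only the error and case counts stated in this paragraph for the first model; the variable and function names are illustrative.

```python
# (errors, cases) per class for the first model, as reported in the text.
first_model = {
    "normal": (1, 80),
    "wet": (1, 568),
    "ice": (19, 530),
    "snow": (6, 503),
}

def class_accuracy(errors, cases):
    """Fraction of cases in a class that were predicted correctly."""
    return 1 - errors / cases

per_class = {name: class_accuracy(e, n) for name, (e, n) in first_model.items()}
total_errors = sum(e for e, _ in first_model.values())
total_cases = sum(n for _, n in first_model.values())
overall = class_accuracy(total_errors, total_cases)

for name, acc in per_class.items():
    print(f"{name:>6}: {acc:.2%}")
print(f"overall: {overall:.2%} ({total_errors} errors in {total_cases} cases)")
```

With these counts, the ice class comes out at roughly 96.4% and the overall accuracy at roughly 98.4% (27 errors in 1,681 cases), so the ice class is the main contributor to the first model's errors.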
Figure 10 illustrates the prediction errors of the first model for the ice and snow classes. These road surfaces exhibited a mixed distribution of wet, ice, and snow conditions, making it difficult to determine the dominant RSC based on visual inspection. Accordingly, during database construction, such cases were defined as belonging to the most hazardous condition, ice. However, the prediction model classified these images as either wet or snow. As shown in Figure 10a, misclassification between the ice and snow classes occurs when visual inspection of the surface indicates that the area covered by ice or snow is greater than that covered by water. In such cases, snow removal operations would still be carried out in practice regardless of the misclassification, so the risk of skidding accidents is relatively low. In contrast, as illustrated in Figure 10b, misclassifying ice as wet may result in incorrect decision-making and delayed snow removal operations.
For the model trained on the second database, no prediction errors occurred in the normal class. In the wet class, two new errors were observed out of 562 cases, both misclassified as ice. In the ice class, 21 errors occurred out of 539 cases, yielding an accuracy similar to that of the first model; however, misclassifications as wet were substantially reduced, while misclassifications as snow markedly increased. For the snow class, 3 errors were observed out of 520 cases, representing a slight improvement in accuracy.
Figure 11 presents the major prediction errors of the second model. The number of cases in which the ice class was misclassified as wet decreased substantially. However, as shown in Figure 11a, some errors persisted when moisture on the road surface reflected light from street lamps or vehicle headlights. Such phenomena were mainly observed in data captured under artificial lighting conditions. Figure 11b shows prediction errors for slushy road surfaces, where water, ice, and snow coexist, indicating that the model tended to classify them as either ice or snow. As with Figure 10a, this type of misclassification is regarded as unlikely to lead to incorrect decision-making in practical contexts.
For the model trained on the third database, prediction errors were mainly observed between the wet and hazard classes. No prediction errors occurred in the normal class. In the wet class, 7 out of 644 cases were misclassified as hazard, while in the hazard class, 6 out of 656 cases were misclassified as wet.
Figure 12 presents examples of the misclassified data. Figure 12a shows cases in which the hazard class was misclassified as wet, with some errors attributed to artificial lighting, as in Figure 11a. Meanwhile, Figure 12b illustrates cases where the wet class was misclassified as hazard. Such errors were found to occur when ice and snow on the road had nearly melted, but localized snow was still observed along gutters or adjacent sidewalks. In this study, road surfaces were classified as hazard only when snow was present on the driving lane. However, during the model training process, snow present near gutters was also learned by the model, which likely contributed to these misclassifications. To reduce such errors and enhance model performance, it would be beneficial in the data preprocessing stage to define the target area with a focus on the driving lane or wheel paths, while excluding regions outside the snow removal zone from the training data.
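The preprocessing step suggested above, restricting each image to the driving lane before training, could be sketched as a fixed crop. The fractional bounds and the function name below are hypothetical placeholders; in practice the region would need to be set per camera from its mounting position and field of view, which this study does not specify.

```python
def crop_to_lane(image, top=0.4, bottom=1.0, left=0.2, right=0.8):
    """Crop an image (a list of pixel rows) to an assumed driving-lane region.

    The bounds are fractions of the image height and width. The defaults are
    placeholder values for illustration, not calibrated settings.
    """
    height = len(image)
    width = len(image[0])
    row_start, row_end = int(height * top), int(height * bottom)
    col_start, col_end = int(width * left), int(width * right)
    return [row[col_start:col_end] for row in image[row_start:row_end]]

# Toy 10 x 10 "image" whose pixels record their own (row, column) position.
image = [[(r, c) for c in range(10)] for r in range(10)]
lane = crop_to_lane(image)
print(len(lane), "rows x", len(lane[0]), "columns; top-left pixel:", lane[0][0])
```

A fixed rectangle is the simplest option; a per-site polygon mask following the wheel paths would exclude gutters and sidewalks more precisely, at the cost of additional annotation effort.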
Table 7 summarizes the types of prediction errors for each model by pavement type and by whether artificial lighting interference was present. In this study, more prediction errors were observed on concrete pavements; however, this result alone is insufficient to conclude that RSC prediction accuracy is lower on concrete surfaces, because factors such as snow removal promptness, traffic volume, and snowfall frequency vary across actual field sites. As shown in Figure 10 through Figure 12, slushy road surfaces were frequently observed at concrete pavement sites, which appears to have contributed to the greater frequency of prediction errors on concrete surfaces. Similarly, the frequency of interference from artificial lighting can vary depending on site-specific factors such as traffic volume (e.g., the frequency of vehicle headlights) and streetlight density. Therefore, based on the summary in Table 7, it is not appropriate to conclude that prediction errors were fewer in cases involving artificial lighting interference. However, during the model development stage, stricter outlier removal should be applied in the preprocessing phase to ensure that artificial lighting interference does not adversely affect model training. Furthermore, when applying the model in practice, it is advisable to avoid predicting RSCs from images captured while vehicles are passing.
Through the analysis of prediction errors, practical issues such as data imbalance and outliers were identified at certain sites. Nevertheless, all models achieved an overall accuracy exceeding 98%, demonstrating strong performance. In particular, the third model achieved the highest accuracy and posed the lowest risk of decision-making errors. From the perspective of road managers, the model trained on the third database is considered the most applicable.