Container Truck High-Risk Events Prediction and Its Influencing Factors Analyses Based on Trajectory Data

Zhu, Zhihao; Meng, Yuan; Cheng, Rongjun

doi:10.3390/systems13050326

by Zhihao Zhu, Yuan Meng and Rongjun Cheng *

Faculty of Maritime and Transportation, Ningbo University, Ningbo 315211, China

* Author to whom correspondence should be addressed.

Systems 2025, 13(5), 326; https://doi.org/10.3390/systems13050326

Submission received: 25 February 2025 / Revised: 23 April 2025 / Accepted: 25 April 2025 / Published: 27 April 2025

(This article belongs to the Section Systems Practice in Social Science)

Abstract

With the prosperity of the economy and the continuous expansion of the port area, container trucks have become the main means of transportation on port roads. Traditional traffic flow research mainly focuses on passenger cars. In view of the unique characteristics of container truck traffic flow and the lack of research on conflict-influencing factors for this traffic flow, this paper is committed to filling this research gap. This paper uses drones and YOLOv8 technology to construct a vehicle trajectory dataset in the container truck traffic flow scenario and extracts relevant features of container truck traffic flow from vehicle trajectory data from a macro perspective. For the trajectory data after denoising, the time to collision (TTC) indicator is used to identify conflict events, and then the synthetic minority oversampling technique (SMOTE) is used to obtain four datasets. Machine learning and related classification models are selected for conflict prediction. It is worth noting that the XGBoost model performs better than other models on the four datasets, with an accuracy of 0.86 and an AUC value of 0.933. The Shapley additive explanation (SHAP) theory is used to explain and analyze the model results and compare them with existing studies. The results show that in container truck traffic flow, traffic density is the most important factor affecting conflicts, and conflicts occur more frequently when the traffic density is between 50 and 70 vehicles/km, followed by lane change rate. In contrast, for general traffic flows, studies have shown that speed is the main factor affecting conflicts.

Keywords: real-time safety analysis; container truck traffic flow; high-resolution trajectory dataset; explainable machine learning; rear-end conflict

1. Introduction

Container truck transportation has an important impact on economic growth, and accidents involving container trucks often result in greater casualties and economic losses [1,2]. Therefore, it is necessary to conduct real-time safety analysis of container truck traffic flow. Traditional traffic safety analysis relies on collision data [3] but collision accidents are rare and it takes a long time to collect enough samples to conduct safety analysis and evaluation of a location. Surrogate safety measures (SSM) are a proactive method that has been advocated for by many researchers [4,5]. They identify potential dangers by evaluating the spatial and temporal proximity of traffic participants. Conflicts occur more frequently than collisions. These surrogate safety measures can help researchers conduct traffic safety analysis in a timely manner.

Emerging networked vehicles integrate many functional units (on-board units, inertial measurement units, GPS, wireless communication modules, sensors, etc.) [6] and combine with some detectors to obtain data with high accuracy and strong real-time performance [7,8]. However, it is very difficult to conduct real-time safety analysis for some special traffic flow scenarios (container truck traffic flow near ports) and some sections that are not equipped with multiple detectors. Drones provide a new way to record video data sources. Drones have advantages such as a bird’s-eye view and high resolution, which makes the collected video data less distorted and can be used to study some difficult, special scenes. Using computer vision technology (CV) to extract high-resolution vehicle trajectories from high-definition video sets and combining them with surrogate safety measures can obtain a large amount of traffic conflict data, helping researchers to conduct traffic safety analysis on these sections in a timely manner.

Conflict prediction usually relies on statistical models or machine learning models. Traditional statistical models provide some predictive capability, but they struggle with complex nonlinear relationships and high-dimensional data [9]. Machine learning models such as decision trees, random forests, and neural networks can extract deeper features from massive traffic data and provide more accurate prediction results [10]. However, machine learning models are often regarded as black boxes, and it is difficult to explain the direct relationship between input and output. Several methods have recently been proposed to explain the results of machine learning models, of which local interpretable model-agnostic explanations (LIME) and SHAP are the most widely used [11].

This paper conducts a real-time traffic safety analysis of container truck traffic flow scenarios from a macro perspective and compares the results with previous studies. The video data are captured by drones and converted into trajectory data through a series of processing steps. The conflict data are obtained by identifying key vehicle interactions and calculating the TTC. Multiple machine learning models are used for conflict prediction, and the impacts of the original dataset and the oversampled dataset on model performance are compared. Finally, SHAP is used to interpret the results of the best-performing model to explore how the selected features affect the occurrence of traffic conflicts. The main contributions of this paper are as follows:

(1): Referring to the construction process of the HighD dataset, this paper uses drones and YOLOv8 technology to create a natural trajectory dataset of container truck traffic flow, which contains a total of 24,038 trajectories extracted from about 40 h of video.
(2): Aiming at the low level of vehicle intelligence and imperfect road infrastructure in container truck traffic flow, a method framework from data collection to real-time safety analysis is proposed.
(3): Multiple classification models were used for conflict prediction. The best performing model was XGBoost, with an accuracy of 0.86 and an AUC value of 0.933. SHAP was used to carefully interpret the results of the XGBoost model and analyze how the selected feature variables specifically affect the conflict.
(4): The research on conflict prediction in recent years was compared to this paper, and the differences between container truck traffic flow and traditional traffic flow were concluded in terms of conflict influencing factors.

The rest of this paper is organized as follows: Section 2 is a literature review on SSM and traffic conflict prediction methods. Section 3 introduces the applied methods. Section 4 details the process of constructing trajectory data and data processing. Section 5 presents the modeling results and interprets the results using SHAP analysis, comparing the existing research with this study. Section 6 concludes the paper.

2. Literature Review

In the research on the traffic safety of container trucks, most studies focus on the factors affecting the severity of container truck accidents [1]. This is not very meaningful for proactively preventing accidents. Therefore, using traffic conflict technology to conduct safety analysis of container truck traffic flow or develop real-time conflict prediction models are some more proactive methods. The development of real-time conflict prediction models needs to refer to research in the fields of intelligent connected vehicles and fleet control, because an intelligent connected environment is needed to deploy real-time conflict prediction models. When a conflict is predicted, the fleet or vehicle needs to be controlled to avoid the conflict. Some scholars have already conducted research in this area [12]. Traffic safety research based on conflict data mostly relies on high-resolution vehicle trajectory datasets. Existing datasets are those such as NGSIM [13], HighD [14], pNEUMA [15], CitySim [16], etc. The vehicle types in these datasets are mainly cars, and there is a lack of a vehicle trajectory dataset whose main vehicle type is container trucks. In recent years, there have been many studies on the traffic safety of general traffic flow [9,17,18], which also confirms the necessity of conducting safety research on container truck traffic flow.

In recent traffic safety research, SSM has become an important indicator for evaluating road traffic safety. Although surrogate safety indicators can reflect potential collision risks to a certain extent, it is still questionable whether they can replace collision data for safety assessment. Researchers have made some progress in verifying the effectiveness of traffic conflicts in predicting collisions. They conceptualized the relationship between traffic conflicts and collisions through causality to illustrate that traffic conflicts can lead to collisions [19,20]. Some scholars have also developed statistical models to verify the relationship between traffic conflicts and collisions and found it to be significant [21,22]. These studies lay a theoretical foundation for the effectiveness of SSM. Since traffic conflict events occur much more frequently than collision events, a large amount of data can be obtained in a short time, which meets the needs of researchers for timely traffic safety analysis and the development of real-time conflict prediction models. SSM has therefore been widely used by researchers [9,18,23]. SSM is mainly divided into two categories: time indicators such as TTC and post encroachment time (PET), and nontime indicators such as deceleration rate to avoid a crash (DRAC). Among these, TTC is a commonly used indicator for safety assessment [9,24,25]. For different scenarios, there are different opinions on the selection of the TTC threshold. Even for similar scenarios, different thresholds have been proposed. The commonly used thresholds are between 1 and 4 s [26]. Some studies pointed out that a threshold of 4 s would lead to too many false alarms, while a threshold of 3 s produced the fewest false alarms in most cases. Later studies also recommended a threshold of 3 s [27,28,29].

At present, data-driven methods are mainly used in traffic conflict prediction and analysis. They can be divided into statistical analysis, machine learning, and deep learning. Machine learning and deep learning methods are widely used because of their easy implementation and high efficiency [9]. Conflict prediction models are mainly binary classifiers; that is, the model output is whether a conflict occurs or does not occur. For such models, many studies have shown that machine learning and deep learning models deliver excellent overall performance. For example, support vector machines can effectively process high-dimensional data and extract more complex traffic features while requiring only a few parameters for tuning [30,31,32]. Random forest (RF) integrates multiple decision trees with high accuracy and robustness and can effectively process high-dimensional data and perform feature importance evaluation [18,33,34]. XGBoost has been frequently used in recent years due to its high prediction accuracy, its ability to handle unbalanced data, and its ability to reduce overfitting when combined with regularization techniques [32,35,36]. Long short-term memory networks (LSTM) can capture temporal dependencies, automatically extract features, and are robust when predicting sequence data. They are widely used in real-time traffic conflict prediction [37,38]. Although machine learning and deep learning models perform well, they cannot explain how variables affect traffic conflicts. This has promoted the development of local explanatory models, of which LIME and SHAP are the most used methods [38,39,40]. LIME generates weighted datasets around target data points and trains simple models to approximate the local behavior of complex models. It focuses on the interpretation of single data points and has low computational complexity [41]. SHAP is based on the Shapley value in game theory, calculates the contribution of each feature to model prediction, provides more consistent and reliable explanations, and is suitable for local and global analysis, but has high computational complexity [42]. Therefore, this study selected SHAP as a local explanatory model to conduct traffic safety analysis on container truck traffic flow.

3. Methodology

First, YOLOv8 is used to extract trajectory data from the video taken by the drone. Then, TTC is calculated by identifying the key interactions between vehicles and the TTC threshold is set to identify potential conflict events. Then, conflict samples are extracted from a macro or micro perspective, including the extraction of some variables. After the data processing is completed, it is input into the machine learning model for training and the optimal model is obtained through test set evaluation. Finally, the SHAP method is used to interpret the results of the machine learning model. Figure 1 is the overall framework diagram of the method. Figure 1a shows the process from data collection to data processing and feature extraction. The data collection and data extraction parts will be explained in detail in Section 3.1, and the feature extraction part will be explained in detail in Section 4.2. Figure 1b shows the process of analyzing the conflict samples, in which the introduction of machine learning methods is in Section 3.2 and the introduction of SHAP theory is in Section 3.4.

3.1. Data Collection Methods

3.1.1. Video Data Capture

The use of drones to shoot video data has the advantages of high resolution and being less prone to obstruction, but it is best to shoot in windless and sunny weather so that the quality of the obtained video data will be higher. The videos after shooting are manually checked to see if there are any abnormal videos, such as large image offset and severe image shaking. After eliminating abnormal videos, antishake and stabilization processing is performed.

3.1.2. Object Detection

For the object detection task, this paper adopts transfer learning: yolov8x.pt is loaded as the pretrained model, and vehicles are annotated to collect enough training data to train a model with better vehicle detection performance.

3.1.3. Trajectory Extraction

Trajectory extraction with YOLOv8 proceeds as follows. First, some basic information is loaded, including the path of the input video, the output path of the trajectory data, the weights of the object detection model, the constants for converting pixel coordinates into actual coordinates, and the category mapping dictionary. Since the category output by a YOLOv8 detection box is a number, this number must be mapped to the category name. Then, all video files are traversed, and frame-by-frame object detection and tracking is performed on each video. The properties of a YOLOv8 detection box include the coordinates of its upper left and lower right corners, its length and width, and the category of the target. From the coordinates of the detection box, the center coordinates of the target can be calculated, and these data are recorded at the chosen sampling frequency. Finally, once all video files have been traversed, the trajectory data are output.
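The per-detection step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the scale constants `M_PER_PX_X` and `M_PER_PX_Y`, the numeric class IDs in `CLASS_NAMES`, and the helper names are all hypothetical, and the detector and tracker are assumed to have already produced a box and a track ID for each frame.

```python
from dataclasses import dataclass

# Hypothetical scale constants (metres per pixel); real values come from calibration.
M_PER_PX_X = 0.05
M_PER_PX_Y = 0.05

# Hypothetical mapping from the detector's numeric class IDs to category names.
CLASS_NAMES = {0: "car", 1: "bigtruck"}


@dataclass
class TrackPoint:
    frame: int
    track_id: int
    category: str
    x_m: float       # centre x in metres
    y_m: float       # centre y in metres
    length_m: float  # box extent along x in metres
    width_m: float   # box extent along y in metres


def box_to_point(frame, track_id, class_id, x1, y1, x2, y2):
    """Convert one detection box (pixel corners) into a trajectory record in metres."""
    cx_px = (x1 + x2) / 2.0
    cy_px = (y1 + y2) / 2.0
    return TrackPoint(
        frame=frame,
        track_id=track_id,
        category=CLASS_NAMES.get(class_id, "unknown"),
        x_m=cx_px * M_PER_PX_X,
        y_m=cy_px * M_PER_PX_Y,
        length_m=(x2 - x1) * M_PER_PX_X,
        width_m=(y2 - y1) * M_PER_PX_Y,
    )


def sample_every(records, step):
    """Keep only the records whose frame index is a multiple of `step`."""
    return [r for r in records if r.frame % step == 0]
```

Keeping the conversion in one small function means the calibration constants are applied in a single place, which makes it easier to re-extract trajectories if the calibration changes.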

3.2. Machine Learning Methods

This paper uses machine learning methods to conduct traffic safety analysis on conflict data, predict whether a conflict will occur, and analyze which factors affect the occurrence of conflicts. The scikit-learn package in Python 3.11.8, which provides different machine learning models, is used for analysis. The models we use cover single algorithms and ensemble algorithms. Specifically, logistic regression (LR) [43] is a binary classification algorithm that converts the output of linear regression into a probability through the sigmoid function. Support vector machines (SVM) [44] can be used for classification and regression tasks. SVM maps data to a high-dimensional space and finds an optimal hyperplane for classification. RF [45] and XGBoost [46] are both ensemble learning algorithms based on decision trees. They can efficiently process large-scale data, perform feature selection, and effectively reduce overfitting through different mechanisms. RF is a bagging algorithm, and XGBoost is a boosting algorithm. These machine learning methods have been widely used in traffic safety research.

3.3. Model Evaluation

Five evaluation criteria are used to evaluate the performance of the machine learning models on the classification task. The first three are accuracy (ACC), false negative rate (FNR), and false positive rate (FPR). These three indicators are derived from the confusion matrix and are given by Equations (1)–(3). A higher ACC indicates better model performance, while lower FNR and FPR indicate better model performance. The other two indicators are the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC), which are also used to evaluate model performance [47]. The ROC curve shows the performance of the classification model at different decision thresholds by plotting the true positive rate (TPR) against the FPR. The TPR is given by Equation (4). AUC values range between 0 and 1; the closer the value is to 1, the better the performance of the model.

ACC = \frac{TP + TN}{TP + FP + FN + TN}    (1)

FNR = \frac{FN}{TP + FN}    (2)

FPR = \frac{FP}{TN + FP}    (3)

TPR = \frac{TP}{TP + FN}    (4)

where TP is the number of positive samples correctly predicted as positive, TN is the number of negative samples correctly predicted as negative, FN is the number of positive samples incorrectly predicted as negative, and FP is the number of negative samples incorrectly predicted as positive.
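As a small worked illustration of Equations (1)–(4), the sketch below counts the confusion-matrix entries from label lists and derives the four rates. It also computes AUC through its standard rank interpretation (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one), which is equivalent to the area under the ROC curve. The function names are illustrative, and the conflict class is assumed to be encoded as 1.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def rates(tp, tn, fp, fn):
    """ACC, FNR, FPR and TPR as defined in Equations (1)-(4)."""
    return {
        "ACC": (tp + tn) / (tp + fp + fn + tn),
        "FNR": fn / (tp + fn),
        "FPR": fp / (tn + fp),
        "TPR": tp / (tp + fn),
    }


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Note that FNR and TPR share the same denominator, so FNR = 1 − TPR; a model that misses most conflicts therefore shows a high FNR even when its ACC looks acceptable.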

3.4. Shapley Additive Explanation

Since it is difficult for machine learning models to explain the importance of each feature to the model output, we introduce SHAP to explain the results of machine learning models. SHAP has been used to explain machine learning models [38]. It provides a consistent and fair way to assign the contribution of features to predictions based on the Shapley value in game theory. The SHAP value measures the importance of each feature by calculating its marginal contribution in different combinations, thereby explaining the output of the model. The positive or negative value means that the feature has a positive or negative effect on the prediction. The principle [42] is expressed by the following equation:

g(x') = \phi_0 + \sum_{j=1}^{M} \phi_j z'_j    (5)

where g(x') represents the predicted value of the sample in the model; \phi_0 is a constant; and z' \in \{0, 1\}^M indicates which of the M features are included in the decision path where the sample is located, with z'_j = 1 if feature j is included and z'_j = 0 otherwise. For a given sample, if feature j is not in its decision path, then \phi_j = 0, indicating that the feature does not contribute to the final predicted value.
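To make the additive form of Equation (5) concrete, the sketch below computes exact Shapley values by brute force for a toy three-feature function. It is only an illustration under stated assumptions: "absent" features are replaced by a fixed baseline value, the toy function is invented, and enumerating all subsets is feasible only for a handful of features. It is not the tree-specific algorithm a SHAP library would apply to an XGBoost model.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x); absent features take their baseline value."""
    m = len(x)

    def value(subset):
        z = [x[j] if j in subset else baseline[j] for j in range(m)]
        return predict(z)

    phi = []
    for j in range(m):
        others = [k for k in range(m) if k != j]
        total = 0.0
        for size in range(m):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for coalition in combinations(others, size):
                s = set(coalition)
                total += weight * (value(s | {j}) - value(s))
        phi.append(total)
    phi0 = value(set())  # prediction with every feature at its baseline
    return phi0, phi


def toy_model(z):
    """Invented function of three inputs; the third input is deliberately ignored."""
    return 2 * z[0] + 3 * z[1] + z[0] * z[1]


phi0, phi = shapley_values(toy_model, x=[1, 2, 5], baseline=[0, 0, 0])
# Additivity: phi0 + sum(phi) reproduces the prediction for x.
reconstructed = phi0 + sum(phi)
```

In this toy case the ignored third feature receives a Shapley value of zero, mirroring the statement that a feature which plays no role in the prediction contributes nothing to it.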

4. Container Truck Dataset

4.1. Dataset Introduction

This dataset is a collection of high-definition images of vehicles on the road section of Ningbo Meishan Island taken by drones. The total video length is about 40 h, the resolution is 1920 × 1080, and the frame rate is 30 frames per second. The dataset contains vehicle images at different time periods and has the characteristics of high resolution and real scenes. Since Ningbo Meishan Island is an important port and logistics center, almost 80–90% of the vehicles on the road are container trucks, so this dataset also has the characteristics of special traffic flow. Figure 2 is a top view of a road section in a container truck traffic flow scenario.

YOLOv8 is used to extract vehicle trajectories from high-resolution videos. YOLOv8 shows significant advantages in extracting vehicle trajectories. First, it uses advanced network architecture and training strategies to achieve high-precision detection; second, it has strong real-time processing capabilities. In addition, it shows extremely high robustness under different lighting conditions and complex traffic scenarios, and can identify vehicles and other traffic participants. These features make subsequent trajectory extraction and analysis easier and more efficient. Before using YOLOv8 to extract vehicle trajectories, we applied the optical flow method to each video segment for antishake processing.

After the video data are processed, the vehicles in the video are annotated, giving a total of 1205 annotated images. Then, the pretrained model yolov8x.pt is loaded for model training. The model training results are shown in Figure 3. The mean average precision (mAP@0.5) over all categories is 0.946, which is a high value, indicating that the overall performance of the model is good. Figure 4 shows the vehicle detection results on the test set. The detection performance and classification accuracy are relatively good. Before the trained model is used to extract trajectory data, the pixel coordinates need to be converted to actual coordinates. According to actual measurements, the actual length of the "bigtruck" model is 16.5 m and the width is 2.5 m. The output of a YOLOv8 detection box includes the coordinates of its upper left and lower right corners as well as its length and width. Since the drone shoots from a bird's-eye view, the pixel coordinates output by the detection box are combined with these measurements to calculate the actual distance represented by one pixel in the x and y directions, and these constants are entered into the trajectory generation code to complete the extraction of trajectory data.
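The calibration idea in this paragraph can be sketched as below: the known "bigtruck" dimensions from the text are divided by the pixel extent of one detection box to obtain metres per pixel in each direction. The function name and the example box are hypothetical, and the sketch assumes the truck is aligned with the image x-axis and that the box fits the vehicle tightly.

```python
BIGTRUCK_LENGTH_M = 16.5  # measured length stated in the text
BIGTRUCK_WIDTH_M = 2.5    # measured width stated in the text


def metres_per_pixel(x1, y1, x2, y2):
    """Scale factors (x, y) from the pixel box of one 'bigtruck' driving along x."""
    length_px = abs(x2 - x1)
    width_px = abs(y2 - y1)
    if length_px == 0 or width_px == 0:
        raise ValueError("degenerate detection box")
    return BIGTRUCK_LENGTH_M / length_px, BIGTRUCK_WIDTH_M / width_px


# Hypothetical box: 330 px long and 50 px wide.
scale_x, scale_y = metres_per_pixel(100, 200, 430, 250)
```

In practice one would likely average the scale over many reference boxes to dampen detection noise; that refinement is left out of this sketch.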

As shown in Figure 5, the vehicle position is recorded using the global coordinate system. The origin of the coordinate is set at the upper left corner of the road section. The x coordinate increases as it moves to the right, and the y coordinate increases as it moves down the road. YOLOv8 is used to extract vehicle trajectory information. The original trajectory data includes the frame time, ID, type of target vehicle, center coordinates of the target vehicle, and their respective lengths and widths. Based on this, the speed and acceleration of the target vehicle in the X and Y directions are calculated, as shown in Table 1.
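One simple way to derive speed and acceleration from the recorded center coordinates is a finite difference over consecutive samples, sketched below. The forward-difference scheme, the padding of the last value, and the example numbers are assumptions made for illustration; the text does not state which differencing scheme was used.

```python
def finite_difference(values, dt):
    """Forward differences of a sampled signal; the last value is repeated as padding."""
    n = len(values)
    if n < 2:
        return [0.0] * n
    diffs = [(values[k + 1] - values[k]) / dt for k in range(n - 1)]
    diffs.append(diffs[-1])
    return diffs


# Hypothetical x positions (m) sampled every 0.5 s.
x_positions = [0.0, 1.0, 3.0, 6.0]
x_speed = finite_difference(x_positions, dt=0.5)  # m/s
x_accel = finite_difference(x_speed, dt=0.5)      # m/s^2
```

The same function applies unchanged to the y coordinates, giving the velocity and acceleration components in both directions.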

This paper compares the dataset with some public datasets collected by drones, as shown in Table 2, comparing the frame rate, shooting time, and shooting scenes.

Table 3 shows the number of vehicle trajectories in the dataset, a total of 24,038, of which 4323 are cars, accounting for about 18%, and the other types are trucks. "bigtruck" represents container trucks, "littletruck" and "truck" represent container trucks with smaller vehicle sizes, "no_container" represents a truck without a container, and "half" represents a truck with only half a container. The total proportion of trucks is about 82%, which confirms the particularity of this dataset.

Figure 6 shows part of the vehicle trajectory data after data extraction and exponential smoothing processing. There are lane changes and straight driving situations. Figure 6c shows the situation of changing two lanes and Figure 6d shows the situation of changing one lane.
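For reference, simple exponential smoothing of a coordinate series can be written in a few lines, as sketched below. The smoothing factor `alpha` and the sample values are hypothetical; the text does not report the factor used.

```python
def exponential_smoothing(values, alpha):
    """Simple exponential smoothing: s[0] = x[0], s[k] = alpha*x[k] + (1-alpha)*s[k-1]."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for v in values:
        if not smoothed:
            smoothed.append(float(v))
        else:
            smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical noisy lateral positions (m).
smoothed_y = exponential_smoothing([0.0, 10.0, 10.0], alpha=0.5)
```

A smaller `alpha` suppresses more jitter but makes the smoothed trajectory lag behind real maneuvers such as lane changes, so the value is a trade-off.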

4.2. Data Processing

4.2.1. Identifying Key Vehicle Interactions

Assume that any vehicle located in front of the target vehicle and overlapping with its virtual strip poses a potential collision risk. When multiple vehicles simultaneously occupy the forward strip, the vehicle closest to the target vehicle should be selected. Figure 7 illustrates the concepts of lateral overlap, longitudinal gap, and virtual strip [48]. The longitudinal gap (g_x) between two vehicles refers to the net gap measured along the direction of movement, which is the distance between the front end of the target vehicle and the rear end of the interacting vehicle. The lateral overlap (g_y) represents the degree of width overlap between the interacting vehicle and the target vehicle, defined as the minimum distance between the relative edges of the two vehicles. The longitudinal gap (g_x) and lateral overlap (g_y) can be calculated using the following equations:

g_x = x_i - x_s - \frac{l_i}{2} - \frac{l_s}{2}    (6)

g_y = |y_i - y_s| - \frac{w_i}{2} - \frac{w_s}{2}    (7)

where (x_s, y_s) are the center coordinates of the subject vehicle, (x_i, y_i) are the center coordinates of the interacting vehicle, w_s is the width of the subject vehicle, w_i is the width of the interacting vehicle, and l_i and l_s are the lengths of the interacting vehicle and the subject vehicle, respectively.

If the path of the subject vehicle overlaps with another vehicle, there is a potential risk of collision. On this basis, the vehicle closest to the subject vehicle is the key interaction vehicle. In other words, when a vehicle satisfies the lateral overlap width less than zero and the longitudinal gap is the smallest and greater than zero, this vehicle has the most critical interaction with the subject vehicle. This method can effectively identify the key interactions between vehicles in the vehicle trajectory data.
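The selection rule above (lateral overlap below zero, smallest positive longitudinal gap) can be sketched as follows, using Equations (6) and (7) directly. The `Vehicle` class and function names are illustrative, and the sketch assumes all vehicles travel in the positive x direction within a single frame.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    vid: int
    x: float       # centre x (m), along the direction of travel
    y: float       # centre y (m)
    length: float  # m
    width: float   # m


def longitudinal_gap(subject, other):
    """Equation (6): net gap between the subject's front and the other's rear."""
    return other.x - subject.x - other.length / 2 - subject.length / 2


def lateral_overlap(subject, other):
    """Equation (7): negative when the two vehicles overlap in width."""
    return abs(other.y - subject.y) - other.width / 2 - subject.width / 2


def key_interaction(subject, others):
    """Return the nearest vehicle ahead that overlaps the subject's strip, or None."""
    candidates = []
    for other in others:
        gap = longitudinal_gap(subject, other)
        if gap > 0 and lateral_overlap(subject, other) < 0:
            candidates.append((gap, other))
    if not candidates:
        return None
    return min(candidates, key=lambda c: c[0])[1]
```

Returning `None` when no vehicle qualifies keeps the downstream TTC step explicit about frames in which the subject vehicle has no leader in its strip.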

4.2.2. Conflict Event Identification

This study uses TTC as the identification indicator of conflict events. Existing studies have shown that the TTC threshold is usually between 1.0 and 3.0 s [49]. Referring to the relevant literature [28,29], 3 s was finally selected as the threshold for conflict event identification. Figure 7 is a schematic diagram of the identification of key interactions between front and rear vehicles. According to this scenario, TTC can be calculated by Equation (8) [9].

TTC = \begin{cases} \frac{g_x}{v_s - v_i}, & v_s \neq v_i \\ \infty, & v_s = v_i \end{cases}    (8)

where g_x is the longitudinal gap, and v_s and v_i are the speeds of the target vehicle and the interacting vehicle, respectively. This equation is only used to evaluate rear-end collisions.
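A direct transcription of Equation (8) with the 3 s threshold is sketched below. One point is an assumption of this sketch rather than something stated in the text: when the following vehicle is slower than the leader, the formula returns a negative value, and the sketch treats that as no conflict because the gap is opening.

```python
import math

TTC_THRESHOLD_S = 3.0  # threshold selected in the text


def time_to_collision(gap, v_subject, v_leader):
    """Equation (8): gap divided by closing speed, infinite when speeds are equal."""
    if v_subject == v_leader:
        return math.inf
    return gap / (v_subject - v_leader)


def is_conflict(gap, v_subject, v_leader, threshold=TTC_THRESHOLD_S):
    """True when TTC is positive and below the threshold (assumed sign handling)."""
    ttc = time_to_collision(gap, v_subject, v_leader)
    return 0 < ttc < threshold
```

For example, a 10 m gap closed at 5 m/s gives a TTC of 2 s, which falls below the threshold and is flagged as a conflict.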

4.2.3. Feature Extraction

To analyze which factors play a key role in the formation of conflict events, it is necessary to extract traffic-flow-related features. As shown in Figure 8, traffic flow features and traffic conflict indicators are extracted from trajectory data at intervals of 1 min. If the TTC value between two vehicles falls below 3 s at any point within a 1-min interval, a conflict event is considered to have occurred in that interval.

In addition to traffic flow characteristics, since this dataset is a special traffic flow, the proportion of cars in a 1-min time interval is also extracted. In addition, since this section is close to the entrance of the intersection, the proportion of different lane types in the time interval is also extracted, namely the proportion of left-turning, straight-going, and right-turning vehicles, as well as the proportion of vehicles changing lanes. Finally, with a time interval of 1 min and a TTC threshold of 3 s, a total of 2071 samples were extracted, of which 1146 were conflict-free samples and 925 were conflict samples.
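The windowing step can be sketched as below: per-frame records are grouped into 1-min intervals, and a few interval-level features plus a binary conflict label are computed. The record layout, the feature definitions (volume as the number of distinct vehicles, mean speed over all records), and the function name are assumptions for illustration; the exact definitions used in the study are those of its Table 4.

```python
from collections import defaultdict

WINDOW_S = 60.0        # 1-min interval from the text
TTC_THRESHOLD_S = 3.0  # TTC threshold from the text


def window_features(records):
    """Aggregate per-frame records into per-window features and a conflict label.

    Each record is a dict with keys: time_s, vid, speed, category, ttc.
    """
    windows = defaultdict(list)
    for r in records:
        windows[int(r["time_s"] // WINDOW_S)].append(r)

    features = {}
    for w, rows in sorted(windows.items()):
        vids = {r["vid"] for r in rows}
        cars = {r["vid"] for r in rows if r["category"] == "car"}
        features[w] = {
            "volume": len(vids),
            "mean_speed": sum(r["speed"] for r in rows) / len(rows),
            "car_ratio": len(cars) / len(vids),
            "conflict": int(any(0 < r["ttc"] < TTC_THRESHOLD_S for r in rows)),
        }
    return features
```

Each window then becomes one sample for the classifiers, with `conflict` as the binary target.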

Table 4 provides descriptive statistics for the selected variables and conflict data. All features are selected from a macro perspective. The first three variables are the basic characteristics of traffic flow: traffic volume, traffic density, and average vehicle speed. Each shows a large difference between its minimum and maximum values. This is because the road section is close to the intersection and is affected by signal control, so the traffic flow shows a regular cycle of vehicles gathering and dispersing, which is a reasonable phenomenon. The mean value of the lane change ratio is 0.07, indicating that few vehicles change lanes on this road section. The mean value of the car ratio is 0.18, which shows the particularity of this traffic flow: roughly 80–90% of the vehicles are trucks. The remaining three features are calculated based on the features of the adjacent intersection. They are correlated with one another, and the proportional feature values range from 0 to 1. Nonconflict samples slightly outnumber conflict samples, but the number of conflict samples is not small. In addition, the conflict frequency shows that up to 8 conflicts occur within 1 min, indicating that conflicts are relatively common. This may be the result of the special scene of the adjacent intersection and the special truck traffic flow.

5. Results

5.1. Results of the Model

Table 5 shows the composition of the four datasets, which are numbered A, B, C, and D. The A dataset represents the original dataset. Studies have shown that using SMOTE to balance the dataset can improve the model’s effect. The B dataset is obtained by SMOTE processing the original dataset. Some studies have suggested that the model fitting effect will be better when the ratio of conflict samples to nonconflict samples is set to 1:4 [17]. Therefore, we randomly extracted conflict samples from the original dataset to obtain the C dataset. The D dataset is obtained by SMOTE processing the C dataset. Each dataset uses 80% as a training set and 20% as a test set. The process of dividing the dataset is random.
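The core idea of SMOTE, interpolating between a minority sample and one of its nearest minority neighbours, together with a random 80/20 split, can be sketched in plain Python as below. This is a simplified illustration, not the implementation used in the study: the neighbour count `k`, the seeds, and the function names are assumptions, and a library implementation would normally be preferred in practice.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        mate = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append([a + gap * (b - a) for a, b in zip(base, mate)])
    return synthetic


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Random split into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Because every synthetic point lies on a segment between two real minority samples, oversampling adds no information outside the region the minority class already occupies; it only rebalances the class counts seen by the classifier.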

Table 6 lists the experimental parameter settings of the XGBoost model for the four datasets. These are the parameter values that gave the best model performance in a grid search with cross-validation.

Table 7 shows the prediction accuracy, FPR, and FNR of the four models for datasets A and B. For the original samples, the prediction accuracy of the four models ranges from 0.7 to 0.77, the FPR is roughly between 0.2 and 0.3, and the FNR is between 0.25 and 0.42. Table 8 shows the prediction accuracy, FPR, and FNR of the four models for datasets C and D. After the ratio of conflict samples to nonconflict samples is set to 1:4, the prediction accuracy of the models improves to about 0.8 and the false alarm rate (FPR) falls to a very low level, but an extremely high FNR occurs, indicating that the models predict most conflict situations as nonconflicts. This may be because conflict samples account for only a small part of dataset C. When facing unbalanced datasets, the LR model seems to perform better than the other models, which may be due to its adaptability to unbalanced data, relatively simple decision boundary, training efficiency, and good threshold adjustment capabilities. Compared with dataset C, dataset D yields better model results. Although the FPR increases, the FNR decreases significantly. When the SMOTE method is used to balance the dataset, the FPR generally increases and the FNR generally decreases. This is because SMOTE balances the data by generating minority class samples, thereby improving the model's prediction performance for conflict situations, although this may result in more false positives (i.e., an increase in FPR). Overall, the performance of the models improves after the dataset is balanced.

Figure 9 shows the ROC curve and AUC of each model on the different datasets. The overall performance of the models improves after SMOTE processing, whether it is applied to dataset A or to dataset C. Among the models, XGBoost has the best overall performance across the four datasets. On the original dataset, its prediction accuracy reaches 0.77, its FPR and FNR are also relatively good, and its AUC of 0.833 is higher than that of the other models. XGBoost performs best on dataset D, with a prediction accuracy of 0.86, an FPR of 0.14, an FNR of 0.13, and an AUC of 0.933. Because the XGBoost model performs well on the different datasets, its classification results on dataset A are explained and analyzed in the discussion.
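As a reminder of what the AUC measures, the sketch below computes it directly as the probability that a randomly chosen conflict sample receives a higher predicted score than a randomly chosen nonconflict sample, with ties counted as one half. This is equivalent to the area under the ROC curve. The labels and scores are toy values, not results from this study.

```python
def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: three of the four (conflict, nonconflict) pairs are ordered
# correctly, so the AUC is 3 / 4.
y_true = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
print(auc(y_true, scores))  # prints 0.75
```

Unlike accuracy, FPR, and FNR, this value does not depend on a classification threshold, which makes it convenient for comparing models across datasets with different class ratios.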

5.2. Discussion

5.2.1. Interpretation of Model Results

The XGBoost model can rank the importance of features in the classification task and thereby evaluate the contribution of each feature to the model's predictions. Figure 10 shows the feature importance ranking produced by XGBoost for the classification task on dataset A. The lane change ratio has the highest overall contribution to the model prediction, followed by the space mean speed and traffic density, whose contributions differ only slightly. The proportion of cars and the traffic flow also contribute to some extent. The proportions of vehicles turning left, going straight, and turning right contribute relatively little compared with the other features. This feature importance explains the model only from a global perspective and cannot reveal the specific impact of features on individual samples.

SHAP is used to explain the model's results. SHAP not only provides a local explanation for each prediction but also yields a global feature importance ranking by aggregating over many samples, which helps address the poor interpretability of machine learning models. Figure 11 shows the SHAP summary plot of the XGBoost model for dataset A. Features are sorted by importance, with more important features placed higher. The vertical line in the middle separates positive and negative effects on the prediction: the right side of the line is positive and the left side is negative. Red and blue points represent high and low feature values, respectively.
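The idea behind these values can be illustrated by computing exact Shapley values for a small toy model. The sketch below is not the fitted XGBoost model and not the SHAP library: `toy_risk`, its coefficients, and the sample `x` are hypothetical, and "missing" features are replaced by a single baseline vector (here the feature means from Table 4), which simplifies what SHAP estimates in practice.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, relative to a baseline input."""
    n = len(x)

    def value(present):
        # Features outside `present` are set to their baseline value.
        z = [x[i] if i in present else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi

def toy_risk(z):
    """Hypothetical conflict score with a density x lane-change interaction."""
    density, lane_change_ratio, speed = z
    return (0.02 * density + 2.0 * lane_change_ratio - 0.1 * speed
            + 0.01 * density * lane_change_ratio)

x = [60.0, 0.20, 2.0]          # hypothetical dense, slow one-minute sample
baseline = [32.17, 0.07, 3.80]  # feature means from Table 4
phi = shapley_values(toy_risk, x, baseline)

for name, v in zip(["density", "lane_change_ratio", "speed"], phi):
    print(f"{name}: {v:+.3f}")
print(sum(phi), toy_risk(x) - toy_risk(baseline))
```

The last line checks the additivity property: the per-feature contributions sum to the difference between the prediction for the sample and the prediction for the baseline. In the summary plot, each point corresponds to one such per-sample contribution.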

According to the SHAP values, traffic density is the most important factor, and higher traffic density increases the likelihood of conflicts. Since the study section is close to an intersection, high traffic density usually coincides with a red light, when vehicles need to brake to avoid a collision, so conflicts are more likely. Similarly, conflicts are more likely when the average speed is lower. A higher lane change rate is also more likely to cause conflicts. Lane-changing behavior involves information asymmetry: other drivers may not learn of the lane change intention in time and need a certain amount of time to react and brake. The lower the proportion of cars, the more likely conflicts are. A low proportion of cars means a high proportion of trucks, and trucks are prone to traffic conflicts because of their wide bodies, limited vision, and long braking distances.

Figure 12 shows interaction diagrams for some important features. Figure 12a shows that when the average speed is high, fewer vehicles turn left. This is because left-turning vehicles often need to turn and change lanes, so their speed is generally low. Figure 12b shows that conflicts are more likely when the traffic density is higher and the proportion of cars is lower. Figure 12c shows that when the average speed is high, the probability of conflict decreases even if the proportion of cars is low. This may be because the traffic flow is relatively smooth and most vehicles travel at a stable speed without needing to decelerate or brake.

5.2.2. Comparison with Existing Studies

Furthermore, Table 9 compares this study with existing conflict risk research. The vehicle trajectory datasets used in these studies are diverse. Even for the same dataset, the selected variables differ, and the analyses cover both macro and micro perspectives. What these datasets have in common is that cars make up the majority of vehicles. In contrast, the dataset used in this paper describes container truck traffic flow, in which trucks account for 80% to 90% of vehicles. From a macro perspective, relevant traffic operation characteristics are extracted from the vehicle trajectory data. For the conflict analysis, rather than relying only on machine learning or only on traditional statistical methods, machine learning methods are combined with SHAP theory to analyze the relationship between these characteristics and conflicts.

The comparison shows that in traffic flows with a large proportion of cars, the factors with the greatest impact on conflicts are basically related to speed, for both micro and macro characteristics. In container truck traffic flows, however, this study finds that the factor with the greatest impact on conflicts is traffic density, followed by the lane change rate. High traffic density means that the distance between vehicles is small, and a small distance makes conflicts more likely. The lane change rate in container truck traffic flows is also relatively high, which may be caused by frequent lane changes of vehicles other than trucks. Because trucks have wide bodies and travel slowly, they tend to block the line of sight and put psychological pressure on the drivers of small cars, which prompts those drivers to change lanes frequently in order to go straight and overtake. The average speed is less influential than these two features in container truck traffic flows, but it is still a relatively important factor. When the average speed is relatively high, conflicts are less likely to occur, which is consistent with the patterns found in macro-level analyses of other trajectory datasets. This paper also selects the proportion of cars as a variable for container truck traffic flows. The analysis finds that the higher the proportion of cars, the less likely conflicts are to occur, which is basically consistent with the conclusions of other studies.
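The link between density and vehicle spacing can be made concrete with a short calculation. The sketch below converts the density statistics of Table 4 into an average front-to-front spacing, assuming (this is not stated in the text) that the density is counted along a single lane. If the density is aggregated over several lanes, the spacing within each lane is correspondingly larger. The function name `mean_spacing_m` is illustrative.

```python
def mean_spacing_m(density_veh_per_km):
    """Average front-to-front spacing in metres for a given density."""
    return 1000.0 / density_veh_per_km

# Minimum, mean, and maximum traffic density from Table 4 (veh/km).
for label, k in [("min", 7.14), ("mean", 32.17), ("max", 75.38)]:
    print(f"{label}: {k} veh/km -> {mean_spacing_m(k):.1f} m")
# prints:
# min: 7.14 veh/km -> 140.1 m
# mean: 32.17 veh/km -> 31.1 m
# max: 75.38 veh/km -> 13.3 m
```

Because this spacing includes the length of the leading vehicle, the free gap between long container trucks at high density is considerably smaller than these values.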

6. Conclusions

This paper proposes an overall framework from data collection to traffic safety analysis in special scenarios. Drones are used to record naturalistic driving videos in a container truck traffic flow environment, and YOLOv8 is used to detect vehicles and extract trajectory data. Conflict data are then screened out with surrogate safety measures, traffic features are extracted from a macro perspective, and machine learning and other methods are used to predict conflicts. Finally, SHAP theory is used to explain the model results, to explore which features have an important impact on conflicts, and to compare the conclusions drawn for container truck traffic flow scenarios with those for general traffic flows. The main conclusions of this paper are as follows:

(1): The XGBoost model had the best overall performance across the four datasets, and its best results were obtained after SMOTE processing (dataset D), with a prediction accuracy of 0.86 and an AUC value of 0.933.
(2): In the container truck traffic flow scenario, traffic density and lane change ratio are the two main factors causing conflicts, and both have a positive impact on the occurrence of conflicts; that is, the greater the traffic density and the more lane-changing vehicles, the more likely conflicts are to occur. The average speed and car ratio have a negative impact on the occurrence of conflicts; that is, the higher the speed and the higher the car ratio, the less likely conflicts are to occur. In addition, the lane-based features (the proportions of vehicles turning left, going straight, and turning right) have only a small impact on the occurrence of conflicts.
(3): In container truck traffic flow, traffic density has the greatest impact on conflicts. By contrast, relevant studies in recent years have shown that in general traffic flow, regardless of whether the features are extracted from a micro or macro perspective, the features with the greatest impact on conflicts are related to speed. Compared with general traffic flow, truck traffic flow also has more lane changes. Therefore, the impact of lane changes on conflicts in container truck traffic flow is second only to that of traffic density.

It should be acknowledged that this study also has certain limitations. First, the video dataset was not collected continuously; it was assembled into a total of about 40 h of video by recording two hours of video every day. Second, the road section in the study area is only about 150 m long. In addition, feature extraction is carried out from a macroscopic perspective, without a safety analysis from a microscopic perspective, which places higher requirements on data quality. In the future, better data processing methods, such as trajectory reconstruction, could be used instead of simple smoothing. For conflict data, only longitudinal conflicts are considered, not lateral conflicts, so conflicts in two-dimensional space deserve further study. The application of this method also has certain limitations: the drone video must be recorded in windless and clear weather to ensure data quality.

Author Contributions

Z.Z.: Conceptualization, Methodology, Writing—original draft. Y.M.: Validation, Data curation, Formal analysis. R.C.: Investigation, Writing—review and editing, Supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Program of Humanities and Social Science of Education Ministry of China (Grant No. 24YJA630013) and the APC was funded by the Ningbo Natural Science Foundation of China (Grant No. 2024J125).

Data Availability Statement

The data will be made available on reasonable request.

Acknowledgments

This work is supported by the Program of Humanities and Social Science of Education Ministry of China (Grant No. 24YJA630013) and the Ningbo Natural Science Foundation of China (Grant No. 2024J125).

Conflicts of Interest

The authors report no conflicts of interest.

References

1. Xu, C.; Ozbay, K.; Liu, H.; Xie, K.; Yang, D. Exploring the impact of truck traffic on road segment-based severe crash proportion using extensive weigh-in-motion data. Saf. Sci. 2023, 166, 106261.
2. Li, L.; Lyu, H.; Wang, T.; Cheng, R. STdi4DMPC: Distributed Model Predictive Control for Connected and Automated Truck Platoon with Mixed Traffic Flow Based on Spatiotemporal Trajectory Prediction. IEEE Trans. Veh. Technol. 2024, 73, 14563–14579.
3. Nabavi Niaki, M.S.; Fu, T.; Saunier, N.; Miranda-Moreno, L.F.; Amador, L.; Bruneau, J.-F. Road Lighting Effects on Bicycle and Pedestrian Accident Frequency: Case Study in Montreal, Quebec, Canada. Transp. Res. Rec. 2016, 2555, 86–94.
4. Fu, C.; Sayed, T. Identification of adequate sample size for conflict-based crash risk evaluation: An investigation using Bayesian hierarchical extreme value theory models. Anal. Methods Accid. Res. 2023, 39, 100281.
5. Zheng, L.; Sayed, T.; Mannering, F. Modeling traffic conflicts for use in road safety analysis: A review of analytic methods and future directions. Anal. Methods Accid. Res. 2021, 29, 100142.
6. Ji, Q.; Lyu, H.; Yang, H.; Wei, Q.; Cheng, R. Bifurcation control of solid angle car-following model through a time-delay feedback method. J. Zhejiang Univ.-Sci. A 2023, 24, 828–840.
7. Abdel-Aty, M.; Wang, Z.; Zheng, O.; Abdelraouf, A. Advances and applications of computer vision techniques in vehicle trajectory generation and surrogate traffic safety indicators. Accid. Anal. Prev. 2023, 191, 107191.
8. Li, J.; Cheng, R. A real-time adaptive signal control method for multi-intersections in mixed connected vehicle environments. J. Zhejiang Univ.-Sci. A Appl. Phys. Eng. 2025, 1, 189.
9. Li, D.; Fu, C.; Sayed, T.; Wang, W. An integrated approach of machine learning and Bayesian spatial Poisson model for large-scale real-time traffic conflict prediction. Accid. Anal. Prev. 2023, 192, 107286.
10. Sohail, A.; Cheema, M.A.; Ali, M.E.; Toosi, A.N.; Rakha, H.A. Data-driven approaches for road safety: A comprehensive systematic literature review. Saf. Sci. 2023, 158, 105949.
11. Xie, Y.; Pongsakornsathien, N.; Gardi, A.; Sabatini, R. Explanation of Machine-Learning Solutions in Air-Traffic Management. Aerospace 2021, 8, 224.
12. Peng, Y.; Liu, D.; Wu, S.; Yang, X.; Wang, Y.; Zou, Y. Enhancing Mixed Traffic Flow with Platoon Control and Lane Management for Connected and Autonomous Vehicles. Sensors 2025, 25, 644.
13. Kovvali, V.G.; Alexiadis, V.; Zhang, L. Video-Based Vehicle Trajectory Data Collection. Available online: https://trid.trb.org/View/801154 (accessed on 25 January 2017).
14. Krajewski, R.; Bock, J.; Kloeker, L.; Eckstein, L. The highD Dataset: A Drone Dataset of Naturalistic Vehicle Trajectories on German Highways for Validation of Highly Automated Driving Systems. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 2118–2125.
15. Barmpounakis, E.; Geroliminis, N. On the new era of urban traffic monitoring with massive drone data: The pNEUMA large-scale field experiment. Transp. Res. Part C Emerg. Technol. 2020, 111, 50–71.
16. Zheng, O.; Abdel-Aty, M.; Yue, L.; Abdelraouf, A.; Wang, Z.; Mahmoud, N. CitySim: A Drone-Based Vehicle Trajectory Dataset for Safety-Oriented Research and Digital Twins. Transp. Res. Rec. J. Transp. Res. Board 2023, 2678, 606–621.
17. Yu, R.; Han, L.; Zhang, H. Trajectory data based freeway high-risk events prediction and its influencing factors analyses. Accid. Anal. Prev. 2021, 154, 106085.
18. Hu, Y.; Li, Y.; Huang, H.; Lee, J.; Yuan, C.; Zou, G. A high-resolution trajectory data driven method for real-time evaluation of traffic safety. Accid. Anal. Prev. 2022, 165, 106503.
19. Davis, G.A.; Hourdos, J.; Xiong, H.; Chatterjee, I. Outline for a causal model of traffic conflicts and crashes. Accid. Anal. Prev. 2011, 43, 1907–1919.
20. Tarko, A.P. Estimating the expected number of crashes with traffic conflicts and the Lomax Distribution—A theoretical and numerical exploration. Accid. Anal. Prev. 2018, 113, 63–73.
21. Zheng, L.; Ismail, K.; Meng, X. Traffic conflict techniques for road safety analysis: Open questions and some insights. Can. J. Civ. Eng. 2014, 41, 633–641.
22. Zheng, L.; Sayed, T. A bivariate Bayesian hierarchical extreme value model for traffic conflict-based crash estimation. Anal. Methods Accid. Res. 2020, 25, 100111.
23. Orsini, F.; Gecchele, G.; Rossi, R.; Gastaldi, M. A conflict-based approach for real-time road safety analysis: Comparative evaluation with crash-based models. Accid. Anal. Prev. 2021, 161, 106382.
24. Kamel, A.; Sayed, T.; Fu, C. Real-time safety analysis using autonomous vehicle data: A Bayesian hierarchical extreme value model. Transp. B Transp. Dyn. 2022, 11, 826–846.
25. Kilicarslan, M.; Zheng, J.Y. Predict Vehicle Collision by TTC From Motion Using a Single Video Camera. IEEE Trans. Intell. Transp. Syst. 2019, 20, 522–533.
26. Li, Y.; Wu, D.; Lee, J.; Yang, M.; Shi, Y. Analysis of the transition condition of rear-end collisions using time-to-collision index and vehicle trajectory data. Accid. Anal. Prev. 2020, 144, 105676.
27. Bella, F.; Russo, R. A Collision Warning System for rear-end collision: A driving simulator study. Procedia-Soc. Behav. Sci. 2011, 20, 676–686.
28. Meng, Q.; Qu, X. Estimation of rear-end vehicle crash frequencies in urban road tunnels. Accid. Anal. Prev. 2012, 48, 254–263.
29. Qu, X.; Kuang, Y.; Oh, E.; Jin, S. Safety Evaluation for Expressways: A Comparative Study for Macroscopic and Microscopic Indicators. Traffic Inj. Prev. 2014, 15, 89–93.
30. Mohammadian, S.; Haque, M.M.; Zheng, Z.; Bhaskar, A. Integrating safety into the fundamental relations of freeway traffic flows: A conflict-based safety assessment framework. Anal. Methods Accid. Res. 2021, 32, 100187.
31. Sun, J.; Sun, J. Real-time crash prediction on urban expressways: Identification of key variables and a hybrid support vector machine model. IET Intell. Transp. Syst. 2016, 10, 331–337.
32. Yuan, C.; Li, Y.; Huang, H.; Wang, S.; Sun, Z.; Li, Y. Using traffic flow characteristics to predict real-time conflict risk: A novel method for trajectory data analysis. Anal. Methods Accid. Res. 2022, 35, 100217.
33. Katrakazas, C.; Quddus, M.; Chen, W.H. A Simulation Study of Predicting Real-Time Conflict-Prone Traffic Conditions. IEEE Trans. Intell. Transp. Syst. 2018, 19, 3196–3207.
34. Orsini, F.; Gecchele, G.; Gastaldi, M.; Rossi, R. Real-time conflict prediction: A comparative study of machine learning classifiers. Transp. Res. Procedia 2021, 52, 292–299.
35. Li, P.; Abdel-Aty, M.; Cai, Q.; Yuan, C. The application of novel connected vehicles emulated data on real-time crash potential prediction for arterials. Accid. Anal. Prev. 2020, 144, 105658.
36. Li, P.; Abdel-Aty, M.; Yuan, J. Real-time crash risk prediction on arterials based on LSTM-CNN. Accid. Anal. Prev. 2020, 135, 105371.
37. Yao, R.; Zeng, W.; Chen, Y.; He, Z. A deep learning framework for modelling left-turning vehicle behaviour considering diagonal-crossing motorcycle conflicts at mixed-flow intersections. Transp. Res. Part C Emerg. Technol. 2021, 132, 103415.
38. Islam, Z.; Abdel-Aty, M. Traffic conflict prediction using connected vehicle data. Anal. Methods Accid. Res. 2023, 39, 100275.
39. Gregurić, M.; Vrbanić, F.; Ivanjko, E. Towards the spatial analysis of motorway safety in the connected environment by using explainable deep learning. Knowl.-Based Syst. 2023, 269, 110523.
40. Madushani, J.P.S.S.; Sandamal, R.M.K.; Meddage, D.P.P.; Pasindu, H.R.; Gomes, P.I.A. Evaluating expressway traffic crash severity by using logistic regression and explainable & supervised machine learning classifiers. Transp. Eng. 2023, 13, 100190.
41. Ribeiro, M.T.; Singh, S.; Guestrin, C. Why Should I Trust You? In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144.
42. Lundberg, S. A unified approach to interpreting model predictions. arXiv 2017, arXiv:1705.07874.
43. Safavian, S.R.; Landgrebe, D. A survey of decision tree classifier methodology. IEEE Trans. Syst. Man Cybern. 1991, 21, 660–674.
44. Vapnik, V.N. The Nature of Statistical Learning Theory; Springer: New York, NY, USA, 1995.
45. Laval, J.A. Hysteresis in traffic flow revisited: An improved measurement method. Transp. Res. Part B Methodol. 2011, 45, 385–391.
46. Chen, T.; Guestrin, C. XGBoost. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
47. Bradley, A.P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 1997, 30, 1145–1159.
48. Charly, A.; Mathew, T.V. Estimation of traffic conflicts using precise lateral position and width of vehicles for safety assessment. Accid. Anal. Prev. 2019, 132, 105264.
49. Jiang, C.; Yin, S.; Yao, Z.; He, J.; Jiang, R.; Jiang, Y. Safety evaluation of mixed traffic flow with truck platoons equipped with (cooperative) adaptive cruise control, stochastic human-driven cars and trucks on port freeways. Phys. A Stat. Mech. Appl. 2024, 643, 129802.

Figure 1. Methodological framework for traffic safety analysis. (a) Framework diagram of data collection, data extraction and data processing. (b) Framework diagram of model training, conflict prediction, and influencing factor analysis.

Figure 2. Container truck traffic flow scene.

Figure 3. Precision–recall curve of the object detection model.

Figure 4. Test set detection results.

Figure 5. Road section recorded by drone.

Figure 6. Display of some vehicle trajectory data.

Figure 7. Illustration of identifying key interaction vehicles.

Figure 8. Feature extraction diagram. The green area in the lower left corner of the image indicates the area where data was collected.

Figure 9. Comparison of ROC and AUC.

Figure 10. XGBoost sorts the feature importance of dataset A.

Figure 11. Summary plot of XGBoost model with SHAP values.

Figure 12. Interaction influence diagrams of some important features.

Table 1. Introduction to raw data.

Raw Data	Description	Unit
Frame	Frame time.	_
ID	Vehicle ID.	_
cls_Name	Vehicle type: defines five different types of trucks as well as a car type.	_
(X_left_top,Y_left_top)	The relative coordinates of the upper left corner of the vehicle.	_
(X_right_bottom,Y_right_bottom)	The relative coordinates of the lower right corner of the vehicle.	_
(X_center,Y_center)	Relative coordinates of the vehicle center.	_
Length	Vehicle length.	m
Width	Vehicle width.	m
X_speed	The vehicle’s speed in the X direction.	m/s
Y_speed	The vehicle’s speed in the Y direction.	m/s
X_acceleration	The acceleration of the vehicle in the X direction.	m/s²
Y_acceleration	The acceleration of the vehicle in the Y direction.	m/s²

Table 2. Dataset comparison.

Dataset	FPS	Duration (Minutes)	Road Type
NGSIM	10	75	Highway
HighD	25	990	Highway
Interaction	10–30	998	Intersection, expressway
CITR and DUT	29.97	18.7	Intersection
SkyEye	50	180	Intersection
InD	25	600	Intersection
RounD	25	360	Roundabout
pNEUMA	25	3540	Freeway
High SIM	30	120	Highway
MAGIC	25	180	Highway
CitySim	30	1200+	Highway, intersections, on/off ramps, weaving sections
This study	30	2270	Road sections near intersections

Table 3. Trajectory data statistical description.

Vehicle Type	Number of Tracks	Proportion
car	4323	0.18
bigtruck	13,218	0.55
littletruck	1215	0.051
truck	346	0.014
no_container	2270	0.094
half	2666	0.111
Total	24,038

Note: Except for car, the other five categories are different types of trucks.

Table 4. Descriptive statistics of variable and conflict data.

Name	Unit	Description	Mean	Min	Max	Std
Variable
Traffic_flow	veh/min	The number of vehicles passing a certain point on the road within 1 min.	14.69	1	46	5.80
Traffic_density	veh/km	Average traffic flow density within 1 min.	32.17	7.14	75.38	12.85
Space_mean_speed	m/s	The average of space mean speed within 1 min.	3.80	0.05	14.93	2.27
Lane_change_ratio	_	The proportion of vehicles changing lanes within 1 min to traffic flow.	0.07	0	0.63	0.09
Car_ratio	_	The proportion of cars in traffic flow within 1 min.	0.18	0	1	0.17
Turn_left_ratio	_	The proportion of vehicles turning left in 1 min to traffic flow.	0.16	0	1	0.23
Straight_ratio	_	The proportion of vehicles going straight in 1 min to the traffic flow.	0.65	0	1	0.30
Turn_right_ratio	_	The proportion of vehicles turning right in 1 min to traffic flow.	0.19	0	1	0.25
Conflict data
Conflict	_	This is a binary variable; 1 represents conflict and 0 represents no conflict.	0.45	0	1	0.50
Conflict frequency	_	Counted the number of different conflicts that occurred within 1 min.	0.74	0	8	1.06

Table 5. Dataset composition.

Dataset	A	B	C	D
Source	Original	SMOTE	Original (4:1)	SMOTE
y = 0	1146	1146	1146	1146
y = 1	925	1146	286	1146

Note: y = 0 means no conflict, y = 1 means conflict.

Table 6. Experimental parameter settings.

Dataset	Model	n_estimators	max_depth	learning_rate	subsample	colsample_bytree
A	XGBoost	400	4	0.1	1.0	0.7
B	XGBoost	300	3	0.3	1.0	0.9
C	XGBoost	500	3	0.01	1.0	0.8
D	XGBoost	400	6	0.1	1.0	0.7

Note: the “random_state” for all models was uniformly set to 42.

Table 7. Comparison of modeling results on datasets A and B.

Dataset	Model	ACC	FPR	FNR
A	LR	0.71	0.19	0.42
A	SVM	0.72	0.20	0.40
A	RF	0.73	0.23	0.31
A	XGBoost	0.77	0.21	0.25
B	LR	0.72	0.25	0.31
B	SVM	0.73	0.33	0.22
B	RF	0.73	0.27	0.27
B	XGBoost	0.76	0.24	0.24

Table 8. Comparison of modeling results on datasets C and D.

Dataset	Model	ACC	FPR	FNR
C	LR	0.81	0.04	0.77
C	SVM	0.79	0.02	0.92
C	RF	0.80	0.06	0.75
C	XGBoost	0.80	0.04	0.80
D	LR	0.72	0.26	0.30
D	SVM	0.79	0.24	0.17
D	RF	0.84	0.17	0.14
D	XGBoost	0.86	0.14	0.13

Note: The evaluation indicators of the model results in Table 7 and Table 8 correspond to the test part of the dataset.

Table 9. Comparing this study with existing conflict risk research.

Authors	Trajectory Data	Variable Selection Perspective
[23]	Data collected by detectors and radar on Italian highways	Macro perspective
[32]	HighD trajectory dataset	Macro perspective
[18]	HighD trajectory dataset	Extract variables based on lanes from a macro perspective
[38]	Connected vehicle dataset provided by Wejo	Microscopic perspective
This study	Container truck traffic flow dataset	Extract variables based on lanes from a macro perspective

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

