Machine Learning-Based Models for Accident Prediction at a Korean Container Port

Abstract: The occurrence of accidents at container ports results in damage and economic losses in terminal operation. Therefore, it is necessary to accurately predict accidents at container ports. Several machine learning models were applied to predict accidents at a container port under various time intervals, and the optimal model was selected by comparing the results of the different models in terms of their accuracy, precision, recall, and F1 score. The results show that the deep neural network model and the gradient boosting model with an interval of 6 h exhibit the highest performance in terms of all the performance metrics. The applied methods can be used to predict accidents at container ports in the future.


Introduction
Most of the existing studies on the risk assessment of maritime ports have focused on port security [1][2][3] and port safety [4][5][6]. Notably, most studies focusing on security considered only unusual events such as hostile attacks [7] and the smuggling of weapons [8], whereas most studies pertaining to port safety focused on accidents that occurred during routine port activities such as loading, discharging, importing, and exporting. In this regard, research on port safety must be emphasized over port security.
In the maritime field, several researchers have examined safety by predicting vessel accidents on the waterway [9], forecasting coastal waves [10], and examining ship collisions [11,12], among other aspects. Moreover, although several researchers have implemented risk assessment methods to identify the risk factors associated with container ports [4][5][6], research regarding the prediction of container port accidents remains limited.
In a container port, several activities, including the loading, discharging, importing, and exporting of containers, are performed by port workers using equipment such as yard tractors and container cranes. Owing to these extensive activities, container ports are prone to accidents, such as equipment-equipment collisions, equipment-container collisions, injuries, and container damage during discharging, loading, and moving. These accidents may result in damage to workers, equipment, and containers, as well as in economic loss. In particular, according to the statistics of occupational accidents in Korea [13], as shown in Table 1, the accidents occurring at all container ports in Korea led to 4 fatalities and 91 injuries in 2015 and 3 fatalities and 96 injuries in 2019; the damages have gradually increased every year, except in 2016. The estimated economic losses resulting from the accidents were approximately KRW 18.8 and 20.5 billion in 2015 and 2018, respectively. Moreover, the number of accidents involving minor injuries is considerably larger. Therefore, predicting accidents at a container port is essential for minimizing the economic loss in port operation and enhancing port safety.
To predict uncommon events such as accidents, machine learning methods including neural network models, random forests, and gradient boosting have been used. In the transportation field, these methods have been widely applied to predict traffic accidents on roads. Specifically, neural networks have been used to predict vehicle crash accidents on roads [14] and the duration of traffic accidents [15]. Random forest models have been applied to detect traffic accidents [16] and identify taxi drivers with a high risk of accidents [17]. Gradient boosting models have been used to predict traffic accidents on roads [18,19] and railways [20].
Moreover, several comparative studies have been performed regarding machine learning methods, including the use of neural networks to predict the severity associated with traffic accidents [21,22].
In the field of maritime and port logistics, machine learning methods have been applied in certain studies to predict the risk associated with accidents. Several researchers have applied neural networks to predict vessel accidents on the waterway [9] and risk in maritime safety [23] and forecast coastal waves [10]. A neural network model was applied to develop a ship collision risk model [11]. Moreover, a random forest model was applied to predict the severity of ship collision accidents [12]. However, relatively few studies have applied machine learning methods to predict accidents in a container port.
In Korea, most studies associated with analyzing and predicting accidents have been conducted in road transportation fields, considering the characteristics of vehicles and pedestrians [24][25][26][27]. Most studies on container ports in Korea were focused on analyzing the safety factors related to loading and unloading activities [28], developing risk assessment methods based on the type of accidents occurring in the port [29,30], and examining the influence of education on the risk factors of accident occurrences in the ports [31]. Research on the prediction of accidents that occur in activities associated with the port is relatively limited.
Therefore, in this study, the accidents that can occur in a container port are predicted by applying machine learning methods, including a neural network model, random forest model, and gradient boosting model to a historical dataset of accidents that occurred at a container port in Busan, Korea.

Neural Network Model
A neural network is a machine learning model that has been widely applied in prediction applications. The basic element of a neural network model is the processing node [32]. In the model, each processing node performs two functions: (1) To sum the input values of the node; (2) To pass this information through an activation function to generate an output. All the processing nodes in a neural network are arranged in layers, and each layer is interconnected to the following layer. There is no interconnection among the nodes in the same layer. In general, a neural network model has an input layer that functions as a distribution structure for the data being used as the input, and this layer is not involved in any type of processing. The input layer is followed by the hidden layer, which consists of one or more processing layers. The final processing layer is the output layer.
The interconnections between the nodes have associated weights. When a value is passed from the input layer, it is multiplied by the corresponding weight and summed to derive the total input n_j to the unit, as shown in Equation (1):

n_j = Σ_i w_ji · o_i, (1)

where w_ji is the weight of the interconnection from the input unit i to another unit j, and o_i is the output of i. The total input calculated using Equation (1) is transformed by the activation function to produce an output o_j of j. For a neural network, the parameters must be set by the users. These parameters include the number and type of hidden layers, the number of nodes in each hidden layer, the activation function for the output, the weight initialization method, the optimization algorithm, the learning rate of the optimization algorithm, the batch size (i.e., the number of training samples used in one iteration), and the number of epochs (one epoch is defined as the period in which the entire training dataset passes once through the neural network).
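The computation performed by a single processing node can be sketched in a few lines; the sigmoid used here is only one common choice of activation function, not necessarily the one used in the study:

```python
import numpy as np

def node_output(weights, inputs):
    # Equation (1): total input n_j = sum_i w_ji * o_i
    n_j = np.dot(weights, inputs)
    # Pass n_j through a sigmoid activation to produce the node output o_j
    return 1.0 / (1.0 + np.exp(-n_j))

# Two incoming connections with weights 0.5 and -0.25 and inputs 1.0 and 2.0:
# n_j = 0.5*1.0 + (-0.25)*2.0 = 0.0, and sigmoid(0.0) = 0.5
o_j = node_output(np.array([0.5, -0.25]), np.array([1.0, 2.0]))
```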
Neural networks are trained by searching for the optimal set of weights for the mapping function from the inputs to the outputs with the given dataset by initializing and updating the weights. In this study, a neural network with the adaptive moment estimation (Adam) optimizer is applied through the Keras Python package [33].
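As an illustrative sketch only: the study trained its network with Keras and the Adam optimizer, but the same ingredients (hidden layers, activation, Adam, learning rate, batch size) can be shown compactly with scikit-learn's MLPClassifier; the data here are synthetic stand-ins, not the study's dataset:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data: a handful of numeric features and a binary label
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = MLPClassifier(hidden_layer_sizes=(16, 8),  # two hidden layers
                      activation="relu",
                      solver="adam",               # adaptive moment estimation
                      learning_rate_init=0.001,    # learning rate of the optimizer
                      batch_size=32,               # samples per iteration
                      max_iter=500,                # upper bound on epochs
                      random_state=0)
model.fit(X, y)
acc = model.score(X, y)  # training accuracy of the fitted network
```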

Random Forest Model
Random forests represent an ensemble machine learning technique. A random forest model employs an advanced decision tree analysis method to overcome overfitting, which is a drawback of decision tree analyses [34]. In the learning process, a random forest model generates classification trees by selecting subsets of the given dataset and randomly selecting subsets of variables for prediction. The number of trees is set in advance, and the final outputs are derived by averaging the results generated by each tree. The learning process of random forests using bootstrap sampling consists of the following steps: (i) generate trees and datasets from the training dataset by bootstrap sampling, (ii) train a basic classifier for each tree, (iii) combine the basic classifiers (i.e., trees) into one classifier (i.e., the random forest), and (iv) derive the final prediction results by the majority voting rule. The observed values that are not included in the learning process of a tree are considered out-of-bag (OOB) values, and they are used in the model validation. OOB values are used to estimate the predicted values and classify variables that cause anomalies. The number of times OOB values are selected varies across trees, and the expected values are different for each tree.
The probability of correctly predicting the OOB values for each observation x_i in its original category, i.e., category k, can be calculated using Equation (2):

Prob_k(x_i) = (1/|OOB_i|) Σ_{T_j ∈ OOB_i} I(y(x_i, T_j) = k), (2)

where I(·) is an indicator that is set as 1 and 0 when the condition in the parentheses is true and false, respectively, y(x_i, T_j) is the predicted category, T_j is the j-th decision tree among the generated trees T in the forest, and OOB_i represents the group of decision trees whose bootstrap samples do not include the observation x_i. If a set of decision trees does not include x_i, the ratio of the number of decision trees predicting x_i in category k is Prob_k(x_i). For a random forest, the Gini importance is computed and used to indicate the importance of the independent variables. At each node τ within the binary trees t of the random forest, the optimal split is found using the Gini impurity i(τ), which indicates how well a potential split separates the samples of the two categories in a particular node. Let p_k = n_k/n represent the fraction of n_k samples from category k = {0, 1} among the total n samples at node τ. The Gini impurity i(τ) can be calculated using Equation (3):

i(τ) = 1 − p_0^2 − p_1^2. (3)
The change in i(τ), Δi, which can be attributed to the splitting and transmission of the samples to two subnodes τ_l and τ_r (with sample fractions p_l = n_l/n and p_r = n_r/n, respectively) based on a threshold t_θ for variable θ, can be calculated using Equation (4):

Δi(τ) = i(τ) − p_l · i(τ_l) − p_r · i(τ_r). (4)
In the search over all variables θ available at the node and all possible thresholds t_θ, the pair {θ, t_θ} leading to the maximum Δi is determined. The change in the Gini impurity resulting from the optimal split, Δi_θ(τ, T), is recorded and accumulated for all nodes τ in all trees T in the forest for each θ, as shown in Equation (5):

I_G(θ) = Σ_T Σ_τ Δi_θ(τ, T). (5)
The Gini importance I_G indicates how often a particular variable θ is selected for a split and the contribution of this variable to the classification problem.
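A minimal sketch of the Gini impurity and its decrease at a candidate split, using illustrative class counts:

```python
def gini_impurity(n_k, n):
    """Gini impurity i(tau) = 1 - sum_k p_k^2 for class counts n_k at a node."""
    return 1.0 - sum((c / n) ** 2 for c in n_k)

def gini_decrease(parent, left, right):
    """Delta i for a split: i(tau) - p_l * i(tau_l) - p_r * i(tau_r)."""
    n = sum(parent)
    n_l, n_r = sum(left), sum(right)
    return (gini_impurity(parent, n)
            - (n_l / n) * gini_impurity(left, n_l)
            - (n_r / n) * gini_impurity(right, n_r))

# A node with 5 samples of each class (impurity 0.5) split into two pure
# subnodes: the decrease is the maximum possible, 0.5
delta = gini_decrease([5, 5], [5, 0], [0, 5])
```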
This study adopts the scikit-learn package in Python, an open-source programming language software that provides a user-customizable random forest model [35].
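As a hedged illustration, scikit-learn's RandomForestClassifier exposes both mechanisms discussed above: oob_score=True evaluates the forest on its out-of-bag observations, and feature_importances_ accumulates the Gini importance per variable. The data below are synthetic stand-ins for the study's inputs:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature matrix standing in for the port's weather/operation
# variables; the label depends on two of the six features
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100,  # number of trees, set in advance
                                oob_score=True,    # validate on out-of-bag samples
                                random_state=0)
forest.fit(X, y)
oob = forest.oob_score_                  # OOB estimate of accuracy
importances = forest.feature_importances_  # normalized Gini importance per variable
```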

Gradient Boosting Decision Trees
Gradient boosting decision trees are decision tree models that can prevent overfitting and demonstrate an enhanced prediction accuracy [36]. In gradient boosting decision trees, F(x) is assumed to be an approximation function of the output y based on a set of input variables x. The squared error function is applied as the loss function L to estimate the approximation function, as indicated in Equation (6):

L(y, F(x)) = (y − F(x))^2/2. (6)
Assuming that the number of splits is J for each regression tree, each tree partitions the input space into J disjoint regions R_1m, …, R_Jm and predicts a constant value b_jm for region R_jm. In this case, each decision tree exhibits the additive form indicated in Equation (7):

h_m(x) = Σ_{j=1}^{J} b_jm · 1(x ∈ R_jm), (7)

where 1(·) is the indicator function.
Using the training data, the gradient boosting model iteratively constructs M decision trees h_1(x), …, h_M(x). The updated approximation function F_m(x) and gradient descent step size ρ_m can be defined using Equations (8) and (9):

F_m(x) = F_{m−1}(x) + ρ_m · h_m(x), (8)

ρ_m = argmin_ρ Σ_i L(y_i, F_{m−1}(x_i) + ρ · h_m(x_i)). (9)
With a separate optimal γ_jm for each region R_jm, b_jm can be discarded, and Equation (8) can be expressed as Equation (10):

F_m(x) = F_{m−1}(x) + Σ_{j=1}^{J} γ_jm · 1(x ∈ R_jm), (10)

and the optimal γ_jm can be calculated using Equation (11):

γ_jm = argmin_γ Σ_{x_i ∈ R_jm} L(y_i, F_{m−1}(x_i) + γ). (11)
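Under the squared error loss of Equation (6), the negative gradient at each step is simply the residual y − F_{m−1}(x), so the iteration of Equations (8)-(11) reduces to repeatedly fitting a small regression tree to the current residuals. A minimal sketch on synthetic one-dimensional data (illustrative only, not the study's dataset):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression data
rng = np.random.default_rng(2)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0])

F = np.full_like(y, y.mean())  # F_0: the constant initial approximation
for m in range(20):            # M = 20 boosting iterations
    residual = y - F           # negative gradient of the squared error loss
    # A small tree (few splits J) fitted to the residuals; for squared loss the
    # mean residual in each leaf is exactly the optimal gamma_jm of Equation (11)
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residual)
    F = F + tree.predict(X)    # update as in Equation (10)

mse = np.mean((y - F) ** 2)    # training error shrinks as trees accumulate
```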
Gradient boosting decision trees build the model sequentially and update it by minimizing the expected value of the loss function. To avoid overfitting and increase the prediction accuracy, a learning rate strategy is applied. The learning rate is used to scale the contribution of each tree model by introducing a factor ξ (0 < ξ ≤ 1), as indicated in Equation (12):

F_m(x) = F_{m−1}(x) + ξ · Σ_{j=1}^{J} γ_jm · 1(x ∈ R_jm). (12)

In Equation (12), ξ is the learning rate, and a smaller ξ shrinks the contribution of each additional tree more strongly. Through the learning rate strategy, the overfitting issue can be avoided by reducing the impact of an additional tree. A smaller learning rate leads to a higher reduction in the loss function value; however, a larger number of trees may need to be added to the model. In this case, another parameter C, which refers to the number of splits, can be used for fitting each decision tree. This parameter represents the depth of variable interaction in a tree. Increasing C can help capture more complex interactions among variables and exploit the strength of gradient boosting decision trees. Depending on the values of the learning rate and C, the optimal number of trees can be identified by examining how well the model fits the test dataset. The performance of gradient boosting decision trees depends on the combination of the learning rate and tree complexity. In this study, the gradient boosting model is applied through the scikit-learn package in Python, which provides a user-customizable model [35].
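As an illustrative sketch, scikit-learn's GradientBoostingClassifier exposes the shrinkage factor ξ of Equation (12) as learning_rate and the tree complexity C via max_depth; the data below are synthetic stand-ins, not the study's dataset:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data with a two-way interaction that shallow trees cannot capture
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 6))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

gb = GradientBoostingClassifier(n_estimators=200,   # number of trees M
                                learning_rate=0.1,  # shrinkage factor xi
                                max_depth=3,        # tree complexity C
                                random_state=0)
gb.fit(X, y)
acc = gb.score(X, y)  # training accuracy of the boosted ensemble
```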

Synthetic Minority Oversampling Technique (SMOTE)
Despite their importance in analyzing safety risks, accident data usually form a minority class owing to their relative scarcity. Therefore, if the minority class is not oversampled, the results from machine learning models may be skewed toward the majority class (i.e., non-accident data), leading to inferior model performance. SMOTE is an oversampling method that can be used to overcome this imbalanced data issue [37]. In this method, the minority class is oversampled by creating synthetic samples. SMOTE generates synthetic samples in the following order: (1) Consider the difference between a feature vector (sample) and its nearest neighbor; (2) Multiply this difference by a random number between 0 and 1; (3) Add this value to the feature vector under consideration. Through this process, SMOTE effectively enables the enhanced generalization of the minority class. Because SMOTE can address the imbalanced data issue, several studies on predicting uncommon events such as accidents have applied this method to enhance model performance [38][39][40]. This study uses time-series datasets in which the amount of historical accident data is small. Consequently, the datasets contain accident data as a minority class, which leads to an imbalanced data issue. To address this problem, SMOTE is applied to the dataset in the model training phase.
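The three steps above can be sketched directly in NumPy (in practice the imbalanced-learn package provides a full SMOTE implementation; this fragment only illustrates how a single synthetic sample is formed):

```python
import numpy as np

def smote_sample(x_i, neighbor, rng):
    """Generate one synthetic minority sample via the three SMOTE steps."""
    diff = neighbor - x_i            # step 1: difference to a nearest neighbor
    gap = rng.uniform(0.0, 1.0)      # step 2: random scaling factor in [0, 1)
    return x_i + gap * diff          # step 3: add the scaled difference

rng = np.random.default_rng(4)
x = np.array([1.0, 2.0])             # a minority-class feature vector
nn = np.array([2.0, 3.0])            # its nearest minority-class neighbor
synthetic = smote_sample(x, nn, rng)
# The synthetic sample lies on the line segment between x and its neighbor
```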

Dataset
Three datasets were aggregated and used in the analysis: (1) An operation dataset for 2017, 2018, and 2020, formulated at container port A in Busan, Korea; (2) A historical dataset of the accidents that occurred at container port A in 2017, 2018, and 2020; (3) A weather observation dataset for Busan for 2017, 2018, and 2020, collected by the Korea Meteorological Administration [41]. Notably, because the data for 2019 are missing from the second dataset, the analysis was performed using the data for 2017, 2018, and 2020.
The first dataset contained information regarding the movements of containers (i.e., loading, discharging, importing, and exporting) and equipment (e.g., yard trucks and container cranes). The second dataset contained information on the accidents, including the time at which an accident occurred and the type of accident (e.g., injury or collision). The weather dataset included temperature, humidity, wind speed, and precipitation data.
Tables 2, A1 and A2 present the results of the basic statistical analysis of the datasets. As shown in Table 2, at container port A, 26, 39, and 78 accidents occurred in 2017, 2018, and 2020, respectively. Thus, the number of accidents increased from 2017 to 2020. Tables A1 and A2 present the results for the third and first datasets, respectively. These datasets were integrated into a single time-series dataset. Five datasets were generated according to the intervals of 1 h, 3 h, 6 h, 12 h, and 24 h.
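The aggregation into interval datasets can be sketched with pandas resampling; the column names below are hypothetical placeholders, not the study's actual schema:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly records: one operation count and a rare accident flag
idx = pd.date_range("2017-01-01", periods=48, freq="h")
hourly = pd.DataFrame({
    "containers_moved": np.arange(48),
    "accident": ([0] * 47) + [1],   # accidents form a small minority class
}, index=idx)

# Resample the hourly records into 6 h windows: operation counts are summed,
# and a window is labeled 1 if any accident occurred within it
six_hourly = hourly.resample("6h").agg(
    {"containers_moved": "sum", "accident": "max"}
)
```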

Results
The data of 2017 and 2018 were used for the model training, and the data of 2020 were used for testing the models. As described in Section 2.1.4, SMOTE was applied to the training dataset to overcome the imbalanced data issue and enhance the model performances. From the training and testing datasets, hourly weather data (i.e., temperature, precipitation, wind speed, and humidity) from the third dataset and hourly operation data for terminal A (i.e., number of ships in berth, number of containers loaded/unloaded from the ships, number of containers imported/exported from the port, number of trucks entering/exiting the port, and number of container cranes/yard equipment/yard trucks in operation) were used as the input variables. The occurrence of accidents (or lack thereof) in the time intervals was used as the output variable. Moreover, as described in Section 2, the model hyperparameters were the learning rate, max depth, max features, min samples leaf, min samples split, n-estimator, and subsample. Tables 3-5 list the hyperparameter values for each model that enhance the model performance. To determine the model accuracy in the training process, a 10-fold cross-validation method was applied to the training dataset. Specifically, every model in this study was trained and validated 10 times with 10 folds for each time interval. The results of the cross-validations for the models are presented in Tables 6-8. As shown in Tables 6-8, the average accuracies of the models in the training phase were higher than 90%, except for the accuracy of the deep neural network with a 24 h interval. This finding shows that the model performance is acceptable in terms of its accuracy. In the testing phase, the model performance was evaluated using the test data. The model performance indicators, specifically, the accuracy, precision, recall, and F1 score, were calculated using Equations (13)-(16), respectively:

Accuracy = (TP + TN)/(TP + TN + FP + FN), (13)

Precision = TP/(TP + FP), (14)

Recall = TP/(TP + FN), (15)

F1 score = 2 × Precision × Recall/(Precision + Recall), (16)

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
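The four performance indicators of Equations (13)-(16) can be computed directly from the confusion-matrix counts, as the following sketch shows:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 score, as in Equations (13)-(16)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1

# Illustrative counts: 8 true positives, 80 true negatives,
# 6 false positives, and 6 false negatives
acc, prec, rec, f1 = classification_metrics(8, 80, 6, 6)
```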
Table 9 summarizes the results of implementing the models over the testing dataset. All the models exhibit the highest accuracy for the 1 h interval. As the time intervals increase, the accuracies decrease, although the precision, recall, and F1 score increase. For the deep neural network model, random forest model, and gradient boosting model, the accuracy for the 1 h interval is approximately 98.2%, 90.4%, and 98.3%, respectively, which decreases to approximately 76.5%, 76.8%, and 62.3% for the 24 h interval, respectively. The gradient boosting model exhibits the highest increase in the precision, recall, and F1 score as the time interval increases (i.e., from 1.4%, 1.3%, and 1.4% for the 1 h interval to 20.2%, 39.1%, and 26.6% for the 24 h interval, respectively). The second-largest increase pertains to the deep neural network model (i.e., from 2.3%, 2.6%, and 2.4% for the 1 h interval to 17.7%, 21.9%, and 19.6% for the 24 h interval, respectively), and the lowest increase pertains to the random forest model (i.e., from 1.1%, 11.5%, and 2.1% for the 1 h interval to 18.2%, 9.4%, and 12.4% for the 24 h interval, respectively).

Discussion
The results presented in Section 3 show that all the considered models exhibit different performances in predicting accidents in terms of their accuracy, precision, recall, and F1 score under various time intervals. As shown in Table 9, the precision, recall, and F1 score of the models increase as the time intervals increase, whereas the accuracy decreases. In addition to the accuracy, the precision, recall, and F1 score are important measures of the model performance. Higher precision, recall, and F1 score values correspond to a higher model performance. In comparison, a model with an accuracy of at least 85% is considered to have a high performance. Several studies that applied machine learning methods to predict accidents indicated that the highest accuracy of the existing models in predicting accidents was 85% [42][43][44]. Therefore, in terms of the precision, recall, and F1 score, as well as the accuracy, the models using the input data with a time interval of 6 h exhibit a reasonable performance. For the 6 h interval, the deep neural network exhibits an accuracy, precision, recall, and F1 score of 90.9%, 7.4%, 6.7%, and 7.0%, respectively. The corresponding values for the random forest model are 86.9%, 4.7%, 8.0%, and 5.9%. The corresponding values for the gradient boosting model are 85.1%, 8.7%, 20.0%, and 12.1%. The gradient boosting model exhibits the best F1 score of 12.1%, followed by the deep neural network. Therefore, the gradient boosting model and deep neural network model are preferable for predicting the accident occurrence at a container port.
In this study, accident data for container port A for three years were used in the analysis. In Korea, the accident data for each container port are collected and managed by the container terminal operator of that port, and there is no legal organization that manages the overall accident data for all container ports nationwide. The accident data for each container port are classified as confidential and are therefore not fully available to the public, except for occupational accidents that are compensated by insurance, whereas the terminal operating data for the port are partially available. Moreover, it is difficult to obtain such data because accidents are considered a sensitive issue in port operations, and so terminal operators rarely provide them. This makes it difficult to collect a sufficient amount of accident data at container ports nationwide. As a result, this study used an accident dataset of limited size and analyzed whether accidents occurred or not, rather than focusing on the types of accident that occurred.

Conclusions
This study adopts machine learning methods to predict the accidents that can occur in a container port. Time-series datasets with various time intervals are applied, and the model performance is evaluated based on these intervals. According to the results, as the time interval increases, the accuracy in predicting accidents decreases, and the precision, recall, and F1 score increase. In terms of all the indicators, the models using the dataset with a 6 h interval exhibit the highest performance. Under the same time interval, the gradient boosting model and deep neural network model perform best in predicting accidents at the container port. These results demonstrate that machine learning methods can be applied to predict accidents at container ports.
Nevertheless, this study involves certain limitations that must be addressed in future work. The operation dataset of the container port and the weather dataset are considered as independent variables affecting the occurrence of accidents. However, other variables, such as the accident type, cause, and time of incidence, can also directly affect the occurrence of accidents. Accident data for container ports, especially in Korea, are sensitive and can affect port operation. Consequently, although the annually aggregated statistics of accidents are available [13], it is difficult to collect an adequately large dataset including the raw data from the port. The accident dataset from container port A contains accident types, including vehicle collision, container damage, injury, and death. However, considering the total number of accidents (i.e., 26 accidents in 2017, 39 accidents in 2018, and 78 accidents in 2020), it is difficult to specify the accidents by type. Therefore, in this study, the occurrence of accidents is considered as the output variable. In future work, by categorizing adequate accident data according to the accident type (i.e., injury, collision, and container damage, among other events), it may be possible to predict the accident type, analyze the factors affecting the accidents, and assess the risks in a container port. Moreover, accidents at a container port happen during work in varying environments. The operational data for a container terminal are stored as an hourly dataset and reflect the changes in working status at the port well. However, even combined with the accident data, they cannot provide enough information about the situation in which an accident occurred: the accident data record only brief reasons for the occurrence (e.g., carelessness during work or rough driving), and the hourly aggregated operations dataset cannot describe the dynamic situation in which the accident happened.
Therefore, to analyze accidents together with their causes, the dynamic operating situation at the container port when an accident happened should be considered, rather than the hourly aggregated data. In a future study, with disaggregated operational data and accident data for a container port, it may be possible to not only predict accident occurrence but also analyze accident risk.

Data Availability Statement: Restrictions apply to the availability of these data. Data were obtained from container port A in Korea and are available with the permission of container port A.

Conflicts of Interest:
The authors declare no conflict of interest.