Article

Analyzing the Factors Influencing Time Delays in Korean Railroad Accidents

1
Department of Architectural Engineering, Mokpo National University, Mokpo 58554, Republic of Korea
2
Department of Railroad Management, Songwon University, Gwangju 61756, Republic of Korea
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(5), 1697; https://doi.org/10.3390/app14051697
Submission received: 11 January 2024 / Revised: 17 February 2024 / Accepted: 18 February 2024 / Published: 20 February 2024
(This article belongs to the Topic AI Enhanced Civil Infrastructure Safety)

Abstract

Railroads play a pivotal role in the Korean national economy, necessitating a thorough understanding of factors influencing accidents for effective mitigation strategies. Unlike prior research focused on accident frequency and severity, this study delves into the often-overlooked aspect of time delays resulting from railroad accidents. Analyzing 15 years of nationwide data (2008–2022), encompassing 3244 human-related and 3350 technical events, this research identifies key factors influencing delay likelihood and duration. Factors considered include event type, season, train type, location, operator size, person type involved, facility type, and causes. Despite an overall decrease in events, variable delay times highlight the need to comprehend specific contributing factors. To address excess zeros, the study employs a two-stage model and a zero-inflated negative binomial (ZINB) model, alongside artificial neural networks (ANNs) for non-linear pattern recognition. Human-related delays are influenced by event types, seasons, and passenger categories, which exhibit nuanced impacts. Technical-related delays are influenced by incident types and facility involvement. Regarding model performance, the ANN models consistently outperform the regression-based models in all cases. This study emphasizes the importance of considering both human and technical factors in predicting and understanding railroad accident delays, offering valuable insights for formulating strategies to mitigate service disruptions associated with these incidents.

1. Introduction

Railroads have consistently been identified as crucial components of the transportation infrastructure underpinning the Korean national economy. Over the preceding decade (2010–2019, excluding the period of 2020~2022 due to the impact of COVID) in Korea, an examination of travel patterns reveals an average daily volume of approximately 21 million regional trips [1]. Notably, the share of rail transport escalated to 20% in 2019, marking a discernible increase from the 16.3% recorded in 2010. Concurrently, the share of road transport experienced a decline, decreasing to 80% in 2019 from 83.7% in 2010. This pronounced shift in modal distribution is a direct outcome of the sustained governmental commitment to the railway sector, evident through investments in both novel infrastructural undertakings and the modernization of existing systems. A pivotal development occurred in 2020 when, for the first time, the cumulative government investment in the railroad sector surpassed that allocated to the roadway sector. This milestone underscores a strategic reconfiguration in the prioritization and reinforcement of the railroad infrastructure within the transportation investment.
While society reaps considerable benefits from railroad transportation, persisting safety concerns necessitate concerted efforts to mitigate the frequency and impact of railroad accidents. In the context of rail transportation, incidents bear the potential for producing adverse outcomes, including casualties, infrastructural and rolling stock impairment, service interruptions, and environmental damage. The Railroad Safety Act in Korea defines “Railroad safety” as the ongoing process wherein railroad operators and managers of railroad facilities consistently identify and mitigate risk factors associated with the operation of railroads or the management of railroad facilities, which could lead to casualties or property damage. It signifies a condition where the level of risk remains within an acceptable range. Therefore, railroad safety involves safeguarding both lives and property by overseeing, administering, and advancing technological innovations across all modes of rail transport. Particularly significant are accidents with multiple fatalities, prompting profound inquiries into railroad safety within the realms of media and public discourse. These accidents not only bring pertinent issues to the forefront and initiate discussions, but also catalyze actionable measures, serving as pivotal junctures for the enhancement of railroad safety. Within the regulatory framework of the Railroad Safety Act, the Ministry of Land, Infrastructure and Transport (MOLIT) in Korea periodically releases a structured Railroad Safety Master Plan (RSMP) at five-year intervals. This comprehensive strategy encompasses quantifiable safety objectives, encompassing accident rates and associated fatality figures. As a result, the precise forecasting of railroad accident frequencies and their consequential implications for stakeholders emerges as an essential prerequisite for advancing railroad safety. 
Such foresight significantly contributes to the overarching goal of advancing safety standards within the domain of railroad transportation.
Understanding the key factors influencing the occurrence and consequences of railroad accidents is crucial for developing effective strategies to mitigate casualties, infrastructure and rolling stock damage, service disruptions, and environmental harm. Numerous researchers have examined the frequency and severity of railroad accidents, with a particular emphasis on fatal train incidents and railroad grade crossing accidents, and most of this research has analyzed the frequency and impact of fatal train accidents primarily in terms of casualties. By contrast, to the best of the authors' knowledge, there is a dearth of studies focused on accurately predicting the duration of time delays resulting from railroad accidents, particularly studies utilizing comprehensive historical data on railroad incidents in Korea.
This study aims to identify the key factors influencing the likelihood and duration of time delays in railroad events, encompassing both incidents and accidents, by utilizing a comprehensive set of factors such as event type, season, train type, location, operator size, person type involved, facility type, and causes. A second purpose is to compare the performance of conventional statistical models and artificial neural network (ANN) models in predicting time delays.
The paper commences with a literature review encompassing prior research endeavors related to modeling accident severities (including time delay related), subsequently introducing the dataset employed. The ensuing sections expound upon the methodological framework employed for model estimation and furnish insights into the outcomes of the estimation process. Lastly, a summative discussion is presented, along with recommendations for further research.

2. Literature Review

Railroad safety studies have seen the emergence of numerous predictive models designed to evaluate both the frequency and severity of accidents. A predominant approach in these models involves the analysis of mean count response data, establishing relationships between independent and dependent variables under specific statistical distribution assumptions. Notably, Evans [2,3,4] conducted an in-depth examination of fatal train accidents on Britain’s mainline railways, leveraging 31 years of historical data. His analysis encompassed the study of trends in accident frequencies and fatalities arising from 75 fatal collisions, derailments, and buffer stop overruns, totaling 273 fatalities among passengers, employees, and the public. In addressing annual accident frequencies, Evans employed a Poisson distribution model to account for and estimate the mean number of accidents per billion train kilometers. Similarly, Miwa et al. [5] investigated train accidents within the Japanese railroad system. They adopted an exponential regression model to scrutinize 71 serious train accidents occurring between 1987 and 2003, employing the time interval between consecutive accidents as a predictor variable. Their findings indicated that the exponential distribution aptly characterized the number of days between successive accidents, the log-normal distribution fitted the train suspension time (using the factors of accident types, causes and years), and a Poisson distribution aligned with the number of casualties. When estimating the total annual accident count, the exponential function emerged as the preferred model for representing the relationship between annual accident frequency and the time period. In terms of predicting train delay time, Park et al. [6] revealed that the duration of delays due to railway accidents depends on factors such as train type (express train vs. metro), cause (human-related vs. non-human-related), and magnitude of occurrence (i.e., number of casualties, number of derailed trains).
In the realm of railway accident severity assessment, Liu et al. [7] investigated cargo-train derailments, scrutinizing the relationship between the number of derailed cabins and severity. They employed a zero-truncated negative binomial (ZTNB) regression model along with a quantile regression model. Their analysis was grounded in a dataset comprising 458 cargo-train derailments attributed to brake-rail causes in the United States spanning the years 2001 to 2010. Lim [8], in a distinct context, applied modeling techniques, including zero-truncated negative binomial (ZTNB) regression and Artificial Neural Networks (ANNs), to predict railway accidents on South Korea’s National Railroad. This inquiry leveraged historical accident data to assess the effectiveness of the models in accurately forecasting accidents, thereby offering valuable insights for accident prevention.
In the domain of train accident rate calculation, Evans [9] determined the fatal train accident rate per billion train kilometers for Europe’s mainline railways spanning the years 1990 to 2019. This estimation relied on the assumption of accidents following a Poisson distribution within specified time periods and train kilometers per year. Employing a similar methodology, Zhang et al. [10] analyzed the frequency of derailments and collisions in freight train accidents in the United States, attributed to human factors. Their model incorporated yearly train miles and the occurrence of train accidents as explanatory variables, deploying a Negative Binomial (NB) model.
Numerous prior studies in the field of highway-rail crossing analysis have employed the Generalized Linear Model (GLM) approach and its extensions, which make assumptions about specific statistical distributions, as mentioned earlier. For instance, Austin and Carson [11] delved into the analysis of accident frequency data related to highway-rail crossings, highlighting the superior performance of the negative binomial distribution model over the Poisson model and conventional multiple linear regression techniques. Concurrently, several scholars have explored the realm of collision severity analysis, spanning from fatal to non-injury outcomes. Notable researchers such as Lu and Tolliver [12], Hu et al. [13], Oh et al. [14], Raub [15], Ma et al. [16], Kang and Khattak [17], Ghomi et al. [18], Hao and Daniel [19,20,21], Haleem and Gan [22], Fan et al. [23], Eluru et al. [24], and Savolainen et al. [25] have employed various GLM models to investigate these aspects.
In response to the prevalence of zero values in railroad accident data, where instances of zero-time delays are observed, several studies have advocated for the application of models such as the zero-inflated negative binomial and zero-inflated Poisson models [26,27,28]. For instance, Lambert [29] employed the zero-inflated Poisson (ZIP) model to predict manufacturing defects. Ridout et al. [30] conducted an extensive review of contemporary statistical models tailored for count data characterized by an overabundance of zero values. Joe and Zhu [31] conducted a comparative analysis between generalized Poisson models and zero-inflated negative binomial (ZINB) models. In the domain of caries research, Mwalili et al. [32] made substantial contributions by utilizing the zero-inflated negative binomial model to rectify misclassifications. Neelon et al. [33] proposed a Bayesian model specifically designed for zero-inflated count data, focusing on the analysis of psychiatric outpatient service utilization.
In addition to the conventional approaches, Artificial Neural Network (ANN) models have garnered popularity in both practical applications and research, offering enhanced predictive accuracy and surmounting limitations associated with statistical prediction models in transportation studies, particularly within the domain of vehicle accidents. Notably, Zeng and Huang [34] employed an ANN model for predicting the severity of highway collisions, surpassing the performance of a statistical ordered logit model. Abdelwahab and Abdel-Aty [35] determined that multilayer neural networks provided more accurate classifications of severity outcomes compared to ordered logit models. Codur and Tortum [36] demonstrated the applicability of ANN models in the analysis of traffic accident frequency. While there is a considerable focus on the implementation of ANN models in roadway accident prediction studies, a limited number of researchers have leveraged ANNs to forecast accident frequency and severity within the railway sector. Zheng et al. [37] explored the likelihood of train–vehicle crashes at highway-rail grade crossings (HRGC) and illustrated the superior performance of ANN models compared to the decision tree approach in predictive and descriptive capabilities using public HRGC case studies in North Dakota spanning the years 1996 to 2014. Gao et al. [38] investigated the utility of a convolutional neural network (CNN), a deep learning-based approach, on the same dataset employed by Zheng et al. [37], conducting a comparative analysis of various machine learning and deep learning methods. For the Canadian HRGC database spanning from 2004 to 2013, Yang et al. [39] proposed a machine learning-based methodology, utilizing the RandomForest algorithm to effectively examine the relationships between accident rates and contributing factors.

3. Data Descriptions

3.1. Overview of Railroad Accidents in Korea

This section commences with a concise overview of railroad accident statistics. The investigation employed a dataset encompassing 22 years of railway accident data, culled from the Railroad Safety Information System (RSIS) spanning the timeframe of 2001 to 2022. The RSIS database features comprehensive historical records of accidents, including temporal specifics, causative factors, duration of train suspensions, fatalities, injuries, geographic coordinates, meteorological conditions, and other pertinent details. The dataset is classified into two distinct categories. The first category pertains to accidents primarily characterized by their detrimental impact on either human life or property and encompasses occurrences such as fatal train accidents (involving train-to-train collisions, derailments, or fires on trains), ground-level crossing accidents (involving train-to-human or train-to-vehicle collisions at ground-level crossings), and human-involved accidents (involving deaths or injuries resulting from train operations or other factors). The second category encompasses incidents that gave rise to perilous situations, exemplified by near misses, as well as events leading to time delays, such as those caused by train malfunctions or facility failures.
Figure 1 presents the RSIS dataset, offering a comparative analysis of “Incidents” and “Accidents” between 2001 and 2022. Herein, “Fatality” encompasses the total number of deaths and/or injuries resulting from any railway accident. In 2001, the dataset records 498 “Incidents”, a number that fell to 227 in 2022, corresponding to an annual growth rate (AGR) of −3.7%. Similarly, the data indicate a decline in “Accidents” from 710 instances in 2001 to a mere 80 occurrences in 2022, corresponding to an AGR of approximately −9.9%. The negative AGR values underscore a substantial decline in both “Incidents” and “Accidents”, with “Accidents” displaying a more pronounced reduction over the specified period. It is worth noting that, aside from an anomaly in 2014, when a train collision resulted in 477 injuries (according to the accident overview field of the RSIS dataset, a rear-end collision between a train departing from Sangwangsimni station and an incoming train on 2 May 2014), the number of fatalities per accident has remained relatively stable. The rationale for this development lies in the widespread adoption of state-of-the-art technological innovations, including advanced safety systems such as automatic train control, train radio communications, various grade level crossing solutions, and passenger safety doors. Notably, this period has witnessed the extensive implementation of these technologies by numerous railroad operators, coupled with the strategic repositioning of level crossings to subterranean or elevated positions [8].
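The AGR values quoted above are consistent with a compound annual rate over the 21 yearly intervals between 2001 and 2022. The following Python sketch assumes that definition (the text does not state the formula explicitly); the function name `annual_growth_rate` is illustrative only.

```python
def annual_growth_rate(start_count, end_count, n_intervals):
    """Compound annual growth rate between two yearly counts (assumed definition)."""
    return (end_count / start_count) ** (1.0 / n_intervals) - 1.0

# Counts quoted in the text for 2001 and 2022 (21 yearly intervals).
n_intervals = 2022 - 2001
incidents_agr = annual_growth_rate(498, 227, n_intervals)
accidents_agr = annual_growth_rate(710, 80, n_intervals)

print(f"Incidents AGR: {incidents_agr:.1%}")
print(f"Accidents AGR: {accidents_agr:.1%}")
```

Under this assumption the computed rates round to about −3.7% for incidents and −9.9% for accidents, matching the figures in the text.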
This study seeks to measure time delays caused by various accidents or incidents by analyzing the relationship between time delays and factors such as season, size of train operators, type of railroad, accident cause, accident location, etc. Figure 2 represents the ratio of the number of accidents and incidents with time delay records. A total of 14,771 accidents and incidents occurred over the 22 years, and on average only 38.3% of them had a time delay record. In detail, the proportion was only 0.9% between 2001 and 2007, but increased noticeably to an average of 78.8% in the remaining years. In general, reporting of train accidents and incidents follows the guidelines on railway accident and investigation reporting. These guidelines were established under the Railway Safety Act in November 2007 and define the specific procedures and methods for reporting railway accidents, with the aim of ensuring timely and accurate reporting. Therefore, this study used the data from 2008 onward, which were recorded in accordance with the methods and procedures of these guidelines.
Table 1 is a brief summary of railroad accident statistics divided into the periods before and after 2008, when the guidelines were implemented. Although the period since 2008 is more than twice as long as the previous period, the total number of accidents has decreased significantly. Specifically, train-related passenger accidents decreased by 54%, and ground crossing (GC) accidents decreased by about 38%. This is because many of the existing GCs were removed by relocating them above or below the road; as a result, their number decreased by more than half, from 1744 in 2001 to 808 in 2021 [1]. The delayed operation means that high-speed trains and metro trains are delayed by 10 min, conventional passenger trains are delayed by 20 min, and freight trains and other trains are delayed by more than 40 min. Railroad events have tended to decrease on a yearly basis since 2008. However, no decreasing trend was found in the number of fatal train accidents (train fires, collisions, derailments).
Table 2 shows the number of railroad accidents categorized by cause and the average train delay time attributed to each cause since 2008. Human factors involve people such as field workers, staff, route controllers, train drivers, crew, and passengers. Technical factors are mainly related to malfunctions, failures, and incorrect installation of equipment, devices, and interfaces between systems. Lastly, external factors are driven entirely by environmental conditions such as bad weather. The numbers of occurrences attributed to human and technical factors both tended to decrease over the past 15 years, and these two factors are the main causes. For a comparison of accident occurrences with other countries, the International Union of Railways (UIC) Safety Report was consulted. The UIC publishes a safety report every year by collecting and analyzing significant accident data from 32 different countries (see the list of countries on pg. 5 of the UIC Safety Report 2023 [40]). According to the UIC report, the annual count of significant accidents decreased by approximately 25% in 2014 compared to 2006. The period of 2017–2020 saw a slow decrease in significant accidents, reaching its lowest level in 2020. The year 2020 was exceptional due to COVID-19 restrictions, and significant accidents slowly increased in 2021 to 1712 records, which is comparable to the pre-COVID-19 year 2019 (−1%). An increase in significant accidents for the established members in 2022 can be explained by a significant rise in traffic (see Figure 3).
Further, the report presented the distribution by accident causes as shown in Figure 4. The number of accidents with external causes decreased by 12% between 2018 and 2019, and again by 12% between 2019 and 2020, but increased 8% in 2021 and 6% in 2022. Overall, a consistent decreasing trend is observed in many countries, regardless of the steepness of the trend.
The second part of Table 2 shows the average delay time for the three factors. The delay time increases in the following order: human factors, technical factors, and external factors. In the case of a fatal accident, the likelihood of having to recover from a breakdown or damage to facilities or equipment is relatively low compared to an accident caused by technical factors, so the time it takes for the train to resume normal operation is relatively short. However, accidents caused by environmental factors cause breakdown and damage to railway facilities such as tracks, structures, and vehicles. Therefore, it is estimated that accidents caused by environmental factors require more time to recover from than accidents caused by technical factors, resulting in relatively longer train operation delays. Interestingly, the average delay time has not shown any significant decrease over the past 15 years. This means that, regardless of the development of railway technology, a roughly constant amount of delay time inevitably occurs.

3.2. Details of Human- and Technical-Involved Events

As mentioned above, because a sufficient amount of data on human and technical factors has been available since 2008, this study limited the time delay estimation to these two factors. Figure 5 presents the distribution of accident delay times by human and technical factors.
The percentage of accident delays recorded as zeros was 41.4% for human factors and 9.1% for technical factors, showing a very large difference. This is the reason why the average delay time due to human factors is lower than that due to technical factors, as shown in Table 2 above. In other words, this means that in the case of an accident caused by human factors, it is much more likely that a delay time of zero will be recorded than an accident caused by technical factors. Then, records with excessively large delay times were defined as outliers and excluded from the analysis. The cutoff for outliers was defined as twice the interval between the 1st and 3rd quartiles, added to the 2nd quartile. As a result, the baseline for outliers for human factors was calculated to be 93 min, and for technical factors to be 96 min. However, in this study, the baseline for all outliers was set at 100 min. Outliers exceeding 100 min were 83 cases for the human factors and 104 cases for the technical factors, respectively. After excluding these outliers, the final data to be used for analysis was 3161 cases for the human factors and 3246 cases for the technical factors. The mean and variance of cases where time delay occurred (excluding cases without time delay) were 30.6 and 347.4 for human factors and 31.9 and 347.9 for technical factors, respectively. Table 3 identifies the explanatory variables used in modeling and presents the descriptive statistics for the human and technical factors.
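The outlier rule described above (the 2nd quartile plus twice the interquartile interval) can be sketched in Python as follows. The delay values are made up for illustration, and the quartile convention (the default of `statistics.quantiles`) is an assumption, since the text does not say how the quartiles were computed.

```python
import statistics

def outlier_cutoff(delays):
    """Cutoff = Q2 + 2 * (Q3 - Q1), following the rule described in the text."""
    q1, q2, q3 = statistics.quantiles(delays, n=4)  # default 'exclusive' method (assumed)
    return q2 + 2.0 * (q3 - q1)

# Hypothetical delay times in minutes (not data from the study).
delays_min = [0, 0, 5, 10, 12, 15, 20, 25, 30, 45, 60, 240]

cutoff = outlier_cutoff(delays_min)
kept = [d for d in delays_min if d <= cutoff]      # records retained for analysis
removed = [d for d in delays_min if d > cutoff]    # records treated as outliers

print(f"cutoff = {cutoff} min, removed = {removed}")
```

In the study itself the computed baselines (93 and 96 min) were replaced by a common threshold of 100 min; the sketch only illustrates how such a baseline can be derived.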
The statistics reveal that, within the category of human factors, approximately 75% of events are accidents associated with human involvement, while within the category of technical factors, roughly 96.5% of events are incidents linked to delays in operations. This observation shows that events arising from human factors are more likely to evolve into accidents, whereas those originating from technical factors are more inclined to manifest as incidents. In terms of seasonal distribution, the number of occurrences related to human factors does not exhibit clear patterns. For the technical factors, however, there is a noticeable decrease in occurrences during the spring and fall seasons compared to other periods. This phenomenon can be attributed to the climatic conditions in Korea, with hot and humid summers featuring numerous rainy days, and cold, dry winters. Consequently, technical factors are more likely to result in accidents and incidents during summer and winter.
Concerning the type of railways, conventional trains play a predominant role in the occurrence of both accidents and incidents. Express trains, characterized by their relatively modern design and the presence of safety fences along their tracks, maintain a higher level of safety. Furthermore, most metro routes are situated underground, offering protection against adverse weather conditions and public intrusion. In contrast, conventional trains, employing relatively older vehicles and operating on ground sections without safety fences, are inherently more prone to experiencing a higher accident rate than the other train types. In the classification of train operators based on the number of routes they manage, operators overseeing two or three routes are categorized as ‘Middle,’ while those managing four or more routes fall into the ‘Large’ category. This categorization aids in understanding the scale and scope of train operations. Many accidents and incidents occur within station locations, particularly on concourses or platforms, with approximately 40% attributed to human factors and 42% linked to technical factors. In the domain of human factors, 42% of accidents can be attributed to carelessness or errors, such as careless behavior, unauthorized track crossings, close proximity to tracks, incorrect driving procedures, disregard for safety stops, and errors in track switching, among others. Conversely, in the domain of technical factors, approximately 69% of accidents are associated with aging or defects in specific parts themselves, or inadequate maintenance practices.

4. Methodology

4.1. Regression Methods

The magnitude of time delay, a non-negative continuous measurement characterized by a prevalence of zeros as depicted in Figure 5, can be modeled through regression techniques. Poisson and Negative Binomial (NB) regression models have emerged as widely employed tools in accident analysis. The Poisson model is well-suited for datasets exhibiting variance approximately equal to the mean, whereas the NB model is suited for datasets with pronounced over-dispersion, where the variance significantly exceeds the mean. Consequently, the NB model is deemed appropriate for predicting time delays due to the substantial disparity between the variance and mean within both datasets, as shown in the data description. However, the time delay data combine two heterogeneous distributions (zero and positive values), which are not well described by these classical models. To deal with the excess zeros, two approaches were adopted. The first is the two-stage model: in the first stage, a binary logit model is applied to whether a delay occurred; in the second stage, for events where a delay occurred, the delay time is modeled using the NB model. The second is the ZINB model, which handles excess zeros and over-dispersion jointly, providing a more accurate representation of the underlying processes. The dependent variable, delay time, was converted into a categorical variable with 5 min intervals, expressed as 0 to 21 (e.g., ‘1’ if the delay time is greater than 0 and less than or equal to 5), and then applied to the models. The NB model postulates that the Poisson means follow a gamma distribution, and its probability mass function is expressed as follows (see Cameron et al. [41] (p. 100) for details):
$$\Pr(y_i) = \frac{\Gamma(\alpha^{-1} + y_i)}{\Gamma(\alpha^{-1})\,\Gamma(y_i + 1)} \left(\frac{1}{\mu_i \alpha + 1}\right)^{\alpha^{-1}} \left(\frac{\mu_i \alpha}{\mu_i \alpha + 1}\right)^{y_i}$$
where
$$E(y_i) = \mu_i = \exp(\beta_0 + \beta_1 x_{1i} + \cdots + \beta_m x_{mi} + \varepsilon_i)$$
$$\mathrm{Var}(y_i) = \mu_i + \alpha \mu_i^2$$
$$L = \prod_{i=1}^{N} \Pr(y_i)$$
Let $i = 1, 2, 3, \ldots, N$ represent the observation index; $m$ denotes the index for explanatory variables; $y_i$ signifies the observed delayed time amount for observation $i$; $x_{mi}$ represents the value of explanatory variable $m$ for observation $i$; and $\beta_m$ denotes the coefficient to be estimated corresponding to $x_{mi}$. The estimated outcome for delayed time for the $i$th observation is denoted as $\mu_i$. The overdispersion parameter is represented as $\alpha$. When $\alpha \to 0$, the conditional mean and conditional variance become equal (eliminating unobserved heterogeneity), effectively collapsing the NB model into the Poisson model. The likelihood function is denoted as $L$.
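As an illustration of the 5 min delay categories and the NB probability mass function described above, the following Python sketch bins a delay into its category and evaluates the PMF for fixed values of μ and α. The parameter values are hypothetical (not estimates from this study), and the function names are illustrative. The sketch checks numerically that the PMF sums to one and that its mean and variance match μ and μ + αμ².

```python
import math

def delay_category(delay_min, width=5):
    """Map a delay in minutes to its interval index: 0 for no delay, 1 for (0, 5], ..."""
    if delay_min < 0:
        raise ValueError("delay must be non-negative")
    return math.ceil(delay_min / width)

def nb_pmf(y, mu, alpha):
    """Negative binomial PMF with mean mu and overdispersion parameter alpha."""
    inv_a = 1.0 / alpha
    log_p = (math.lgamma(inv_a + y) - math.lgamma(inv_a) - math.lgamma(y + 1)
             + inv_a * math.log(1.0 / (mu * alpha + 1.0))
             + y * math.log(mu * alpha / (mu * alpha + 1.0)))
    return math.exp(log_p)

# Hypothetical parameter values for illustration only.
mu, alpha = 6.0, 0.3

# Truncate the infinite support where the remaining tail mass is negligible.
probs = [nb_pmf(y, mu, alpha) for y in range(400)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))

print(delay_category(0), delay_category(5), delay_category(6))
print(f"sum={total:.6f}, mean={mean:.4f}, var={var:.4f}")
```

Working on the log scale with `math.lgamma` avoids overflow in the gamma functions for larger counts.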
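To model overdispersion in the presence of excessive zeros, the ZINB distribution is employed. For $y_i$, the value 0 manifests in two distinct states. The first state is the zero state, occurring with a probability of $p_i$, resulting exclusively in observations with a value of 0. The second state is the negative binomial state, occurring with a probability of $(1 - p_i)$ and characterized by a negative binomial distribution [42].
$$\Pr(y_i) = \begin{cases} p_i + (1 - p_i) \left(\dfrac{1}{\mu_i \alpha + 1}\right)^{\alpha^{-1}}, & \text{for } y_i = 0 \\[2ex] (1 - p_i) \dfrac{\Gamma(\alpha^{-1} + y_i)}{\Gamma(\alpha^{-1})\,\Gamma(y_i + 1)} \left(\dfrac{1}{\mu_i \alpha + 1}\right)^{\alpha^{-1}} \left(\dfrac{\mu_i \alpha}{\mu_i \alpha + 1}\right)^{y_i}, & \text{for } y_i > 0 \end{cases}$$
The mean and variance are defined as $E(y_i) = (1 - p_i)\mu_i$ and $\mathrm{Var}(y_i) = (1 - p_i)\mu_i(1 + \mu_i \alpha + p_i \mu_i)$, and the log-likelihood function is
$$\ln L = \sum_{i=1}^{N} I(y_i = 0) \ln\left[ p_i + (1 - p_i) \left(\frac{1}{\mu_i \alpha + 1}\right)^{\alpha^{-1}} \right] + \sum_{i=1}^{N} I(y_i > 0) \ln\left[ (1 - p_i) \frac{\Gamma(\alpha^{-1} + y_i)}{\Gamma(\alpha^{-1})\,\Gamma(y_i + 1)} \left(\frac{1}{\mu_i \alpha + 1}\right)^{\alpha^{-1}} \left(\frac{\mu_i \alpha}{\mu_i \alpha + 1}\right)^{y_i} \right]$$
where $I(\cdot)$ is the indicator function.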
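The ZINB distribution above can be sketched in Python as a mixture of a point mass at zero and an NB distribution. This is a minimal illustration, assuming for simplicity that μ, α, and the zero-state probability are constant across observations (in the fitted model, μ and p vary with the explanatory variables). All parameter values and function names are hypothetical.

```python
import math

def nb_pmf(y, mu, alpha):
    """Negative binomial PMF with mean mu and overdispersion parameter alpha."""
    inv_a = 1.0 / alpha
    log_p = (math.lgamma(inv_a + y) - math.lgamma(inv_a) - math.lgamma(y + 1)
             + inv_a * math.log(1.0 / (mu * alpha + 1.0))
             + y * math.log(mu * alpha / (mu * alpha + 1.0)))
    return math.exp(log_p)

def zinb_pmf(y, mu, alpha, p_zero):
    """ZINB PMF: zero state with probability p_zero, NB state otherwise."""
    if y == 0:
        return p_zero + (1.0 - p_zero) * nb_pmf(0, mu, alpha)
    return (1.0 - p_zero) * nb_pmf(y, mu, alpha)

def zinb_loglik(ys, mu, alpha, p_zero):
    """Log-likelihood of observed counts under a ZINB with shared parameters."""
    return sum(math.log(zinb_pmf(y, mu, alpha, p_zero)) for y in ys)

# Hypothetical parameter values for illustration only.
mu, alpha, p_zero = 6.0, 0.3, 0.4

probs = [zinb_pmf(y, mu, alpha, p_zero) for y in range(400)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))

expected_mean = (1.0 - p_zero) * mu
expected_var = (1.0 - p_zero) * mu * (1.0 + mu * alpha + p_zero * mu)
print(f"mean={mean:.4f} (formula {expected_mean:.4f})")
print(f"var={var:.4f} (formula {expected_var:.4f})")
print(f"loglik={zinb_loglik([0, 0, 3, 7], mu, alpha, p_zero):.4f}")
```

The numerical mean and variance can be compared against the closed-form expressions given in the text as a sanity check on the reconstruction.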

4.2. Artificial Neural Network Method

Artificial neural networks emulate the functionality of the human brain and facilitate progressive learning through a series of algorithmic operations. Within ANNs, information is processed via interconnected nodes, known as neurons, situated within the ANN framework. The standard ANN architecture, depicted in Figure 6, comprises three fundamental components: the input layer, hidden layer, and output layer. Each neuron in the input layer serves as a data aggregator, collecting information from external sources and performing a role akin to that of a predictor or contributor in a regression model. In the context of artificial neural networks, the hidden layer resides between the input and output layers. To discern non-linear patterns, it is commonly recommended to employ a single hidden layer in most applications [37,43,44,45]. Augmenting the number of hidden layers introduces greater complexity and prolonged processing times, often without a proportional enhancement in model performance, as noted by Nielsen [46]. Concerning the number of neurons within a hidden layer, Nielsen [46] advises that it should be fewer than the neurons in the input layer.
Figure 7 illustrates a structure comprising a solitary node. This node receives inputs from either the data within the input layer or the outputs of other nodes, referred to as activations. Each input to the node is associated with a weight (notated as ‘w’) and a bias (notated as ‘b’) or a threshold. The weight characterizes the influence of the input on the node’s response. The node computes the weighted sum of inputs and subsequently applies an activation function, which may include threshold functions, Sigmoid functions, Rectified Linear Unit functions (ReLU), Hyperbolic Tangent functions, and Softmax functions, among others, to determine the node’s output. Training the neural network can be accomplished using various training algorithms, with the Backpropagation (BP) algorithm being one of the most widely adopted choices. The BP algorithm is employed to adjust the weights of neural network nodes, with the objective of minimizing the sum of squared errors during the training phase.
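The computation performed by a single node, and one backpropagation-style weight update, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: a sigmoid activation, a squared-error loss on one training example, and hypothetical input values, weights, and learning rate. The names `node_output` and `bp_step` are introduced here for illustration only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def node_output(inputs, weights, bias):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def bp_step(inputs, target, weights, bias, lr=0.5):
    """One gradient-descent update of a single sigmoid node on squared error."""
    out = node_output(inputs, weights, bias)
    # Gradient of 0.5 * (out - target)^2 with respect to the weighted sum z.
    delta = (out - target) * out * (1.0 - out)
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias

# Hypothetical example: three inputs and one target value.
x, t = [1.0, 0.0, 1.0], 1.0
w, b = [0.1, -0.2, 0.05], 0.0
error_before = (node_output(x, w, b) - t) ** 2
w, b = bp_step(x, t, w, b)
error_after = (node_output(x, w, b) - t) ** 2
print(error_before, error_after)
```

In a full network the same update is applied layer by layer, with each node's `delta` propagated backwards from the output layer.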

5. Analysis Results

To evaluate the time delay associated with both human-related and technical factors, the dataset underwent partitioning, allocating 80% of the data for model estimation and reserving the remaining 20% for testing purposes. The application of a Chi-squared test reveals that no statistically significant difference exists in the frequency distribution of time delay between these two subsets.
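The partitioning and the homogeneity check can be sketched as follows. This is an illustration only: the delay values are synthetic, the four delay bins are hypothetical, and the helper names (`split_80_20`, `delay_bin`, `chi_square_homogeneity`) are not taken from the study. The sketch computes the Pearson chi-squared statistic for a 2 × k table of bin counts, which can then be compared with the critical value for the resulting degrees of freedom (7.815 at the 5% level for 3 degrees of freedom).

```python
import random

def split_80_20(records, seed=0):
    """Shuffle the records and return (80% estimation set, 20% test set)."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def delay_bin(delay):
    """Hypothetical bins: 0 min, 1-10 min, 11-30 min, more than 30 min."""
    if delay == 0:
        return 0
    if delay <= 10:
        return 1
    if delay <= 30:
        return 2
    return 3

def chi_square_homogeneity(train, test, n_bins=4):
    """Pearson chi-squared statistic and degrees of freedom for two samples."""
    rows = [[0] * n_bins, [0] * n_bins]
    for r, sample in enumerate((train, test)):
        for d in sample:
            rows[r][delay_bin(d)] += 1
    col_totals = [rows[0][k] + rows[1][k] for k in range(n_bins)]
    grand = sum(col_totals)
    stat, used = 0.0, 0
    for k in range(n_bins):
        if col_totals[k] == 0:
            continue  # skip empty bins
        used += 1
        for r in range(2):
            expected = sum(rows[r]) * col_totals[k] / grand
            stat += (rows[r][k] - expected) ** 2 / expected
    return stat, used - 1

# Synthetic delays with many zeros, for illustration only.
rng = random.Random(42)
delays = [0 if rng.random() < 0.6 else rng.randint(1, 90) for _ in range(1000)]
train, test = split_80_20(delays)
stat, df = chi_square_homogeneity(train, test)
print(len(train), len(test), round(stat, 3), df)
```

A statistic below the critical value indicates no significant difference between the delay distributions of the two subsets.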

5.1. Human-Related Time Delays

Table 4 presents the modeling outputs for the delayed time caused by human-related factors, using some of the explanatory variables shown in Table 3. In general, the sign of each estimate indicates the direction of the effect. In the negative binomial part, a positive estimate suggests a longer delayed time, and the effect is considered statistically significant when the p-value is less than 0.05. Similarly, in the logit part, a positive estimate signifies a greater probability that a time delay occurs. These trends are consistent across the two models, except for the type-of-event estimates in the negative binomial part. The likelihood of a time delay is notably higher for high-potential events (2.708 in Two-stage; 2.980 in ZINB) and ground-level crossing accidents (0.931 in Two-stage; 1.043 in ZINB) than for human-involved events (−2.867 in Two-stage; −2.865 in ZINB).
However, the duration of delayed time for human-involved events (−0.357 in ZINB) and ground-level crossing accidents (−0.234 in ZINB) tends to be shorter than that for other types of events such as crashes, derailments, and facility-involved accidents. Time delays are more likely in the fall season (significant in the logit part of both models) and tend to last longer, although the effect on duration is statistically significant only in the negative binomial part of the Two-stage model. When railroad accidents involve the public (2.215 in Two-stage; 2.382 in ZINB) or passengers (1.513 in Two-stage; 1.645 in ZINB), the likelihood of a time delay is higher than for other person types. However, passenger-related events tend to have shorter delay durations (−0.507 in Two-stage; −0.229 in ZINB). The categorization of train operators into three groups based on the number of operating routes did not yield statistically significant explanations for time delays.
Railroad accidents involving conventional trains are less likely to cause a time delay (−0.279 in Two-stage; −0.359 in ZINB), but when a delay occurs its duration is significantly longer (0.189 in Two-stage; 0.243 in ZINB). Conventional trains, operated exclusively by KORAIL across the country, generally run with headways of at least 20 to 30 min, in contrast to express trains and metros, which operate with headways of 5 min or less. Conventional trains account for only 1.8% of total passenger volume, compared to 95.5% for metros and 2.6% for express trains (as of August 2023, according to KOSIS), and are allocated a comparatively smaller share of resources by KORAIL. Consequently, accidents involving conventional trains tend to result in longer delay times, given the slower response attributed to their lower strategic importance. When an incident occurs on the main line between railroad stations, both models consistently show an increase in the probability of a delay (positive signs in the logit part) as well as a longer delay time (positive signs in the negative binomial part). Similarly, suicide-related accidents are associated with a higher likelihood of delay and a longer delay duration.
To configure the ANN model, the input layer is designed to incorporate the same independent variables as detailed in Table 4. The input layer comprises two variables, each accommodating 12 neurons, representing predictive factors related to the duration of time delays. During the development of the ANN model, the sigmoid activation function was selected in conjunction with the BP algorithm, because training with the ReLU activation function failed to converge to optimal values. The literature generally agrees that additional hidden layers yield limited performance improvement, and a single hidden layer is deemed sufficient for many problems. This study nevertheless tested between one and three hidden layers and found that optimal convergence was not achieved with three hidden layers. Because the efficacy of the BP training algorithm depends on the number of neurons within the hidden layer, neuron counts from 1 to 12 were examined systematically. The optimal number of neurons in the hidden layer was found to be eight, which gave the lowest Mean Square Error (MSE) and Root Mean Square Error (RMSE) for the training data. The analysis was conducted using the R 4.3.1 software package. The criteria used to compare the performance of the Two-stage model, ZINB model, and ANN model are the coefficient of determination ($R^2$), MSE, and RMSE. These criteria are defined as follows:
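$$R^2 = r^2, \quad r = \frac{n\sum y\hat{y} - \left(\sum y\right)\left(\sum \hat{y}\right)}{\sqrt{\left[n\sum y^2 - \left(\sum y\right)^2\right]\left[n\sum \hat{y}^2 - \left(\sum \hat{y}\right)^2\right]}}$$
$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
$$RMSE = \sqrt{MSE}$$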
Within the framework of this analysis, we define $y_i$ as the i-th observed value, $\hat{y}_i$ as the i-th predicted value generated by the model, and $n$ as the total available dataset size.
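For readers who wish to reproduce these criteria, the following Python sketch computes them from paired observed and predicted values. It assumes that $R^2$ is the squared Pearson correlation between observations and predictions and that RMSE is the square root of MSE, which is consistent with the values reported in Table 5 (for example, √12.114 ≈ 3.481). The function names and sample values are illustrative assumptions.

```python
import math

def r_squared(y, y_hat):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(y)
    sum_y, sum_p = sum(y), sum(y_hat)
    numerator = n * sum(a * b for a, b in zip(y, y_hat)) - sum_y * sum_p
    denominator = math.sqrt(
        (n * sum(a * a for a in y) - sum_y ** 2)
        * (n * sum(b * b for b in y_hat) - sum_p ** 2)
    )
    return (numerator / denominator) ** 2

def mse(y, y_hat):
    """Mean of the squared prediction errors."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Square root of the mean squared error."""
    return math.sqrt(mse(y, y_hat))

# Hypothetical observed and predicted values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
print(r_squared(observed, predicted), mse(observed, predicted), rmse(observed, predicted))
```

Note that a constant offset between predictions and observations leaves this $R^2$ at 1 while MSE and RMSE are non-zero, which is why the three criteria are reported together.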
Table 5 summarizes the predictive performance of the three estimated models for the human-related factors. $R^2$ measures how closely the model fits the data and ranges from 0 to 1, with higher values indicating that the independent variables explain more of the variation in the dependent variable. In the training dataset, the $R^2$ values are 0.355 for the Two-stage model, 0.389 for the ZINB model, and 0.446 for the ANN model. In relative terms, the ANN model's $R^2$ is 25.6% higher than that of the Two-stage model and 14.7% higher than that of the ZINB model, while its RMSE is 7.2% and 4.7% lower, respectively. Similar trends are observed in the testing data, where the ANN model's RMSE is 8.8% and 3.1% lower. The MSE results likewise show lower prediction error for the ANN model than for the two regression models.
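The percentage rows of Table 5 (ANN/TWO and ANN/ZINB) follow from simple relative-change arithmetic on the reported training values. The short sketch below reproduces them; the helper name `relative_change` and the dictionary layout are assumptions for illustration, while the numbers are those given in Table 5.

```python
def relative_change(new, base):
    """Percentage change of `new` relative to `base`."""
    return 100.0 * (new - base) / base

# Training-data values from Table 5: (R^2, MSE, RMSE).
training = {
    "Two-stage": (0.355, 12.114, 3.481),
    "ZINB": (0.389, 11.478, 3.388),
    "ANN": (0.446, 10.426, 3.229),
}

for baseline in ("Two-stage", "ZINB"):
    changes = [
        round(relative_change(ann_value, base_value), 1)
        for ann_value, base_value in zip(training["ANN"], training[baseline])
    ]
    print("ANN vs", baseline, changes)
```

A positive change in $R^2$ and negative changes in MSE and RMSE both indicate better performance of the ANN model.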

5.2. Technical-Related Time Delays

Table 6 presents the results of the regression analysis concerning both the occurrence and the duration of delayed time attributed to technical factors. As with the human-related factors, the coefficient estimates of the two models are broadly similar. The probability of a time delay is elevated for high-potential incidents (3.848 in the Two-stage model; 3.969 in the ZINB model), although their effect on the duration of the delay is not statistically significant in the negative binomial part. Derailment accidents are associated with longer delays (0.427 in ZINB, p = 0.021), and the duration of delayed time also tends to increase when the accident involves a train fire (1.037 in the Two-stage model; 1.076 in the ZINB model). Accidents associated with signal, telecommunication, rail, or structural facilities are less likely to cause time delays, as evidenced by the negative signs of all coefficient estimates in the logit part. Furthermore, delays tend to be shorter in cases involving signal or telecommunication facilities (−0.290 in the Two-stage model and −0.236 in the ZINB model) and rail or structural facilities (−0.166 in the Two-stage model). Conversely, accidents linked to electric power facilities tend to have longer delays (0.151 in the Two-stage model and 0.185 in the ZINB model). Accidents involving personnel other than crew and control staff tend to decrease the likelihood of prolonged time delays. In addition, for accidents involving conventional trains, the delay duration is significantly longer (0.422 in the Two-stage model and 0.441 in the ZINB model), which aligns with the interpretation of the model output for human-related factors.
The analysis also indicates that incidents occurring on the main line, between railroad stations, are consistently associated with longer delays, as evidenced by positive signs in the negative binomial part of both models.
Table 7 summarizes the predictive performance of the three models for the technical-related factors. The model-fit $R^2$ values are 0.197 for the Two-stage model, 0.209 for the ZINB model, and 0.300 for the ANN model. The ANN model outperforms both regression models, with a reported improvement of 6.6% over the Two-stage model and 5.9% over the ZINB model. These trends persist in the testing data, with reported improvements of 10.6% and 6.0%, respectively. The MSE and RMSE results likewise show lower prediction error for the ANN model than for the two regression models.

6. Discussion

This study focuses on identifying key factors influencing the likelihood of time delays in railroad events, utilizing comprehensive historical data from 2001 to 2022 in Korea. The dataset is divided into two categories: accidents with a significant impact on human life or property and incidents leading to time delays. Over the years (2001–2022), there is a substantial decline in both accidents and incidents, with an AGR of approximately −9.9% for accidents and −3.7% for incidents. Adoption of advanced safety systems and technological innovations, such as automatic train control, contributes to the reduction in accidents. The study aims to measure time delays caused by accidents or incidents, considering factors like season, train operator size, railroad type, accident cause, and location. A total of 14,771 accidents and incidents occurred, with only 38.3% having time delay records on average over 22 years.
Reporting of accidents and incidents aligns with guidelines established under the Railway Safety Act from November 2007; therefore, the study used the data for the analysis starting from 2008. The total number of events used for human factors is 3244, and is 3350 for technical factors. The average delay time due to human factors ranges from 14.8 to 48.0 min, with an overall average of 27.2 min, while the average delay time due to technical factors ranges from 28.9 to 51.0 min, with an overall average of 34.3 min. There are variations in the annual average delay times, emphasizing the need for a dynamic and context-specific approach to address the causes of delays in the railway system. The overall decrease in the total number of events indicates positive trends in railway safety, but the varying delay times highlight the importance of understanding the specific factors contributing to delays for effective mitigation strategies.
To address excess zeros in the data, two approaches are employed: a two-stage model involving binary logit and NB models, and a ZINB model, offering a more accurate representation. The article further introduces the mathematical formulations of these models and their parameters. Additionally, ANNs are explored as an alternative method, mimicking human brain processes for non-linear pattern recognition. The structure and components of ANNs, along with the backpropagation algorithm for training, are explained. The study aims to provide insights into predicting delayed time in railway accidents, considering both traditional regression models and advanced ANN methods.
The study investigates time delays resulting from human-related and technical factors by dividing the dataset for model estimation and testing. Statistical tests indicate no significant difference in the frequency distribution of time delays between the two subsets. Modeling outputs, particularly from the Two-stage and ZINB models, consistently demonstrate trends in the likelihood and duration of delays associated with various factors. Human-related delays, influenced by event types, seasons, and passenger categories, show nuanced impacts. An ANN model is configured with optimal parameters, and its performance is compared to regression models, indicating that the ANN model provides superior predictions for human-related factors. Technical-related delays, influenced by incident types and facility involvement, are also analyzed, with the ANN model again showing enhanced predictive performance compared to regression models.
While the current dataset offers detailed information on delayed times based on factors like railway type, operator, accident cause, and accident type, the authors point out the need for more comprehensive data to assess the full impact of accidents. Suggestions include incorporating information such as the speed at the time of an accident and a detailed breakdown of total delay time, including the time during which the main line was affected, the time required for recovery, the time of occurrence, and the location on the route where the accident occurred. Despite these limitations, the study is considered instructive for transit agencies in assessing the repercussions of railway accidents and elevating service quality. The article also acknowledges the predictive superiority of the ANN model, but emphasizes that there are still shortcomings to address, such as the consideration of additional factors that have not statistically affected the dependent variable. Ultimately, the study provides valuable insights, but for policymakers seeking specific areas on which to focus resources for reducing the impact of rail accidents, further enhancements in data and analysis are necessary.

7. Conclusions

The study emphasizes the importance of considering both human and technical factors in predicting and understanding delays in railroad accidents. The primary conclusions drawn from analyzing factors impacting time delays in railroad accidents are as follows:
For the human-related factors:
  • Delayed time for human-involved or ground-level crossing accidents is typically shorter compared to other events.
  • Time delays are more likely in the fall season and tend to be longer.
  • Accidents involving the public or passengers on railroads have a higher likelihood of time delays, with shorter durations for passenger-related incidents.
  • Conventional train accidents have lower likelihoods of time delays, but longer delays when they occur.
  • Incidents on the main line, between stations, increase the probability and duration of time delays.
  • Suicide-related accidents lead to increased likelihood and duration of delays.
For the technical-related factors:
  • High-potential incidents increase delay likelihood.
  • Derailments or fires result in prolonged delays.
  • Signal, telecommunication, rail, or structural facility accidents are less likely to cause delays and tend to have shorter delay durations.
  • Electric power accidents lengthen delays, while those involving non-crew personnel decrease delay likelihood.
  • Conventional train incidents extend delay durations.
  • Incidents on the main line, particularly between stations, lead to extended delays.

Author Contributions

Conceptualization, K.-K.L.; Methodology, K.-K.L.; Software, J.-M.K. and K.-K.L.; Validation, J.-M.K.; Investigation, J.-M.K. and K.-K.L.; Data curation, J.-M.K.; Writing—original draft, K.-K.L.; Writing—review & editing, J.-M.K. and K.-K.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2022R1F1A106314112) and supported by research fund by Songwon University (2023).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used to support the findings of this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

  1. KOSIS (Korean Statistical Information Service). Available online: https://kosis.kr/index/index.do (accessed on 5 June 2023).
  2. Evans, A.W. A statistical analysis of fatal collisions and derailments of passenger trains on British railways: 1967–1996. Proc. Inst. Mech. Eng. 1997, 211, 73–86.
  3. Evans, A.W. Fatal train accidents on Britain’s mainline railways. J. R. Stat. Soc. Ser. A (Stat. Soc.) 2000, 163, 99–119.
  4. Evans, A.W. Speed and rolling stock of trains in fatal accidents on Britain’s mainline railways: 1967–2000. Proc. Inst. Mech. Eng. 2002, 216, 81–95.
  5. Miwa, M.; Gozun, B.; Oyama, T. Statistical data analyses to elucidate the causes and improve the countermeasures for preventing train accidents in Japan. Int. Trans. Oper. Res. 2006, 13, 229–251.
  6. Park, M.S.; Eom, J.K.; Choi, J.; Heo, T.-Y. Analysis of the Railway Accident-Related Damages in South Korea. Appl. Sci. 2020, 10, 8769.
  7. Liu, X.; Saat, M.R.; Qin, X.; Barkan, C. Analysis of U.S. freight-train derailment severity using zero-truncated negative binomial regression and quantile regression. Accid. Anal. Prev. 2013, 59, 87–93.
  8. Lim, K.K. Analysis of Railroad Accident Prediction using Zero-Truncated Negative Binomial Regression and Artificial Neural Network Model: A Case Study of National Railroad in South Korea. KSCE J. Civ. Eng. 2023, 27, 333–344.
  9. Evans, A.W. Fatal train accidents on Europe’s railways: An update to 2019. Accid. Anal. Prev. 2021, 158.
  10. Zhang, Z.; Turla, T.; Liu, X. Analysis of human-factor-caused freight train accidents in the United States. J. Transp. Saf. Secur. 2021, 13, 1157–1186.
  11. Austin, R.; Carson, J. An alternative accident prediction model for highway-rail interfaces. Accid. Anal. Prev. 2002, 34, 31–42.
  12. Lu, P.; Tolliver, D. Accident prediction model for public highway-rail grade crossings. Accid. Anal. Prev. 2016, 90, 73–81.
  13. Hu, S.; Li, C.; Lee, C. Model crash frequency at highway-rail grade crossings using negative binomial regression. J. Chin. Inst. Eng. 2010, 35, 841–852.
  14. Oh, J.; Washington, S.P.; Nam, D. Accident prediction model for railway-highway interfaces. Accid. Anal. Prev. 2006, 38, 346–356.
  15. Raub, R.A. Examination of highway–rail grade crossing collisions nationally from 1998 to 2007. Transp. Res. Rec. 2009, 2122, 63–71.
  16. Ma, C.; Hao, W.; Xiang, W.; Yan, W. The impact of aggressive driving behavior on driver-injury severity at highway-rail grade crossings accidents. J. Adv. Transp. 2018.
  17. Kang, Y.; Khattak, A. Cluster-based approach to analyzing crash injury severity at highway–rail grade crossings. Transp. Res. Rec. 2017, 2608, 58–69.
  18. Ghomi, H.; Bagheri, M.; Fu, L.; Miranda-Moreno, L.F. Analyzing injury severity factors at highway railway grade crossing accidents involving vulnerable road users: A comparative study. Traffic Inj. Prev. 2016, 17, 833–841.
  19. Hao, W.; Daniel, J.R. Severity of injuries to motor vehicle drivers at highway–rail grade crossings in the United States. Transp. Res. Rec. 2013, 2384, 102–108.
  20. Hao, W.; Daniel, J. Motor vehicle driver injury severity study under various traffic control at highway-rail grade crossings in the United States. J. Saf. Res. 2014, 51, 41–48.
  21. Hao, W.; Daniel, J. Driver injury severity related to inclement weather at highway-rail grade crossings in the United States. Traffic Inj. Prev. 2016, 17, 31–38.
  22. Haleem, K.; Gan, A. Contributing factors of crash injury severity at public highway railroad grade crossings in the US. J. Saf. Res. 2015, 53, 23–29.
  23. Fan, W.; Kane, M.R.; Haile, E. Analyzing severity of vehicle crashes at highway-rail grade crossings: Multinomial logit modeling. J. Transp. Res. Forum 2015, 39–56.
  24. Eluru, N.; Bagheri, M.; Miranda-Moreno, L.F.; Fu, L. A latent class modeling approach for identifying vehicle driver injury severity factors at highway-railway crossings. Accid. Anal. Prev. 2012, 47, 119–127.
  25. Savolainen, P.T.; Mannering, F.L.; Lord, D.; Quddus, M.A. The statistical analysis of highway crash-injury severities: A review and assessment of methodological alternatives. Accid. Anal. Prev. 2011, 43, 1666–1676.
  26. Yan, X.; Han, L.D.; Richards, S.; Millegan, H. Train-vehicle crash risk comparison between before and after stop signs installed at highway-rail grade crossings. Traffic Inj. Prev. 2010, 11, 535–542.
  27. Mathew, J.; Benekohal, R.F. Highway-rail grade crossings accident prediction using Zero Inflated Negative Binomial and Empirical Bayes method. J. Saf. Res. 2021, 79, 211–236.
  28. Miranda-Moreno, L.F.; Fu, L. A comparative study of alternative model structures and criteria for ranking locations for safety improvements. Netw. Spat. Econ. 2006, 6, 97–110.
  29. Lambert, D. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 1992, 34, 1–14.
  30. Ridout, M.; Demétrio, C.G.B.; Hinde, J. Models for count data with many zeros. In Proceedings of the 19th International Biometric Conference, Cape Town, South Africa, 14–18 December 1998.
  31. Joe, H.; Zhu, R. Generalized Poisson distribution: The property of mixture of Poisson and comparison with negative binomial distribution. Biom. J. 2005, 47, 219–229.
  32. Mwalili, S.M.; Lesaffre, E.; Declerck, D. The zero-inflated negative binomial regression model with correction for misclassification: An example in caries research. Stat. Methods Med. Res. 2007, 17, 123–139.
  33. Neelon, B.H.; O’Malley, A.J.; Normand, S.-L.T. A Bayesian model for repeated measures zero-inflated count data with application to outpatient psychiatric service use. Stat. Model. Int. J. 2010, 10, 421–439.
  34. Zeng, Q.; Huang, H. A stable and optimized neural network model for crash injury severity prediction. Accid. Anal. Prev. 2014, 73, 351–358.
  35. Abdelwahab, H.T.; Abdel-Aty, M.A. Development of artificial neural network models to predict driver injury severity in traffic accidents at signalized intersections. Transp. Res. Rec. 2001, 1746, 6–13.
  36. Codur, M.Y.; Tortum, A. An artificial neural network model for highway accident prediction: A case study of Erzurum, Turkey. Traffic Transp. 2015, 27, 217–225.
  37. Zheng, Z.; Lu, P.; Pan, D. Predicting highway–rail grade crossing collision risk by neural network systems. J. Transp. Eng. Part A Syst. 2019, 145, 4019033.
  38. Gao, L.; Lu, P.; Ren, Y. A deep learning approach for imbalanced crash data in predicting highway-rail grade crossings accidents. Reliab. Eng. Syst. Saf. 2021, 216.
  39. Yang, C.; Trudel, E.; Liu, Y. Machine learning-based methods for analyzing grade crossing safety. Cluster Comput. 2017, 20, 1625–1635.
  40. UIC Safety Unit. UIC Safety Report 2023—Significant Accidents 2022. Public Report, October 2023.
  41. Cameron, A.C.; Trivedi, P.K. Regression Analysis of Count Data; Cambridge University Press: Cambridge, UK, 1998.
  42. Calvin, J.A. Regression models for categorical and limited dependent variables. Technometrics 1998, 40, 80–81.
  43. Saputro, M.I.A.; Qudratullah, M.F. Estimation of Zero-Inflated Negative Binomial Regression Parameters Using the Maximum Likelihood Method (Case Study: Factors Affecting Infant Mortality in Wonogiri in 2015). Proc. Int. Conf. Sci. Eng. 2021, 4, 240–254.
  44. Chang, L. Analysis of freeway accident frequencies: Negative binomial regression versus artificial neural network. Saf. Sci. 2005, 43, 541–557.
  45. Haghani, S.; Sedehi, M.; Kheiri, S. Artificial neural network to modeling zero-inflated count data: Application to predicting number of return to blood donation. J. Res. Health Sci. 2017, 17, 392.
  46. Nielsen, M.A. Neural Networks and Deep Learning; Determination Press: San Francisco, CA, USA, 2015; pp. 24–38.
Figure 1. Trends in the number of accidents, incidents, and fatalities per accident.
Figure 2. Ratio of railroad accidents and incidents with time delay records.
Figure 3. Number of significant accidents of UIC members in 32 different countries (see page 27 of the UIC Safety Report 2023).
Figure 4. Causes of accidents of UIC member in 32 different countries (see page 28 of the UIC Safety Report 2023).
Figure 5. Distribution of delay time due to human and technical factors.
Figure 6. The architecture of artificial neural networks.
Figure 7. Flow of computational processes within an artificial neural network model.
Table 1. Comparison of the number of railroad events by type in two different periods.

| Group | Types of Railroad Events | Numbers in Year 2001–2007 | Numbers in Year 2008–2022 | Total |
| --- | --- | --- | --- | --- |
| Accidents | Train fire | 2 | 3 | 5 |
| | Crash | 6 | 17 | 23 |
| | Derailment | 24 | 85 | 109 |
| | Ground-level crossing accidents | 304 | 188 | 492 |
| | Facility breaks | 28 | 20 | 48 |
| | Facility fire | 16 | 22 | 38 |
| | Other facility-related accidents | 0 | 2 | 2 |
| | An accident involving passengers related to the train | 1724 | 792 | 2516 |
| | An accident involving staff related to the train | 130 | 131 | 261 |
| | An accident involving public related to the train | 1126 | 763 | 1889 |
| | An accident involving passengers not related to the train | 310 | 196 | 506 |
| | An accident involving staff not related to the train | 781 | 537 | 1318 |
| | An accident involving public not related to the train | 51 | 58 | 109 |
| Incidents | A high potential to develop into a railroad accident | 60 | 22 | 82 |
| | Delayed operation | 3120 | 4253 | 7373 |
| Total | | 7682 | 7089 | 14,771 |
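Table 2. Number of events (incident, accident) and time delayed.

| Year | Events: Human | Events: Technical | Events: External | Events: Total | Average delay (min): Human | Average delay (min): Technical | Average delay (min): External |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2008 | 447 | 257 | 0 | 704 | 48.0 | 36.4 | 0.0 |
| 2009 | 419 | 286 | 1 | 706 | 26.3 | 30.1 | 83.0 |
| 2010 | 353 | 248 | 31 | 632 | 24.8 | 37.0 | 47.5 |
| 2011 | 292 | 306 | 29 | 627 | 20.6 | 34.1 | 24.5 |
| 2012 | 277 | 289 | 23 | 589 | 21.0 | 31.7 | 30.7 |
| 2013 | 258 | 267 | 29 | 554 | 17.0 | 34.7 | 29.0 |
| 2014 | 233 | 206 | 49 | 488 | 26.1 | 34.0 | 31.5 |
| 2015 | 166 | 187 | 40 | 393 | 34.0 | 28.9 | 32.5 |
| 2016 | 168 | 160 | 42 | 370 | 23.8 | 33.9 | 43.0 |
| 2017 | 127 | 195 | 40 | 362 | 28.3 | 31.8 | 31.6 |
| 2018 | 112 | 173 | 46 | 331 | 26.7 | 32.5 | 49.7 |
| 2019 | 100 | 265 | 56 | 421 | 24.1 | 32.6 | 28.8 |
| 2020 | 71 | 194 | 40 | 305 | 25.5 | 32.8 | 33.6 |
| 2021 | 96 | 163 | 41 | 300 | 14.8 | 38.3 | 41.0 |
| 2022 | 125 | 154 | 28 | 307 | 23.0 | 51.0 | 87.4 |
| Total | 3244 | 3350 | 495 | 7089 | Avg. 27.2 | Avg. 34.3 | Avg. 38.6 |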
Table 3. Summary statistics for explanatory variables.

| Category | Variables | Proportion (Human Factor) | Proportion (Technical Factor) |
| --- | --- | --- | --- |
| Types of event (incident or accident) | Train fire | 0.000 | 0.001 |
| | Crash | 0.004 | 0.001 |
| | Derailment | 0.010 | 0.013 |
| | Ground-level crossing (GL) | 0.056 | 0.001 |
| | Facility involved | 0.005 | 0.006 |
| | Human involved (related to trains) | 0.511 | 0.005 |
| | Human involved (not related to trains) | 0.242 | 0.006 |
| | High potential to develop into an event | 0.012 | 0.002 |
| | Delayed operation | 0.159 | 0.965 |
| Year occurred | Year (2008~2012) | - | - |
| Season occurred | Spring | 0.260 | 0.227 |
| | Summer | 0.265 | 0.284 |
| | Fall | 0.239 | 0.224 |
| | Winter | 0.235 | 0.264 |
| Types of train | Express | 0.079 | 0.282 |
| | Conventional | 0.518 | 0.469 |
| | Metro | 0.403 | 0.250 |
| Size of train operator | Large | 0.946 | 0.910 |
| | Middle | 0.014 | 0.013 |
| | Small | 0.040 | 0.077 |
| Location occurred | Crossing | 0.026 | 0.001 |
| | In station building | 0.405 | 0.419 |
| | Railroad within station | 0.043 | 0.044 |
| | Railroad outside station | 0.291 | 0.392 |
| | In cabin | 0.002 | 0.000 |
| | Train depot | 0.008 | 0.004 |
| | Others | 0.225 | 0.140 |
| Person type directly related to accidents | Public | 0.326 | - |
| | Passenger | 0.297 | - |
| | Crew member | 0.110 | - |
| | Control member | 0.016 | - |
| | Other staffs | 0.251 | - |
| Cause directly related to human involved | Disobeying rules | 0.060 | - |
| | Carelessness or errors | 0.420 | - |
| | Illegal act | 0.255 | - |
| | Suicide | 0.239 | - |
| | Other causes | 0.026 | - |
| Facility type directly related to accident | Rail or structures | - | 0.039 |
| | Signal or telecommunication | - | 0.135 |
| | Electric power | - | 0.047 |
| | Train | - | 0.728 |
| | Interface between facilities | - | 0.017 |
| | Others | - | 0.034 |
| Cause directly related to facility involved | Old parts and defects | - | 0.356 |
| | Installation related | - | 0.024 |
| | Design or production related | - | 0.093 |
| | Operation related | - | 0.009 |
| | Maintenance issues | - | 0.330 |
| | Other causes | - | 0.188 |
Table 4. Modeling outputs (delayed time) for the human-related factors.
Table 4. Modeling outputs (delayed time) for the human-related factors.
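| Parameter | Two-Stage Estimate | Two-Stage S.E. | Two-Stage p-Value | ZINB Estimate | ZINB S.E. | ZINB p-Value |
|---|---|---|---|---|---|---|
| Negative Binomial Part | | | | | | |
| Intercept | 1.589 | 0.099 | 0.000 | 1.767 | 0.068 | 0.000 |
| Type of event | | | | | | |
| High potential event | 0.011 | 0.088 | 0.897 | −0.044 | 0.061 | 0.478 |
| Human involved | 0.053 | 0.177 | 0.763 | −0.357 | 0.133 | 0.007 |
| GL | −0.098 | 0.097 | 0.312 | −0.234 | 0.072 | 0.001 |
| Season occurred | | | | | | |
| Fall | 0.092 | 0.046 | 0.044 | 0.043 | 0.033 | 0.188 |
| Person type | | | | | | |
| Public | −0.174 | 0.095 | 0.066 | −0.104 | 0.066 | 0.114 |
| Passenger | −0.507 | 0.101 | 0.000 | −0.229 | 0.072 | 0.001 |
| Crew | −0.136 | 0.088 | 0.123 | −0.123 | 0.063 | 0.051 |
| Size of train operator | | | | | | |
| Middle | −0.165 | 0.177 | 0.350 | −0.357 | 0.132 | 0.007 |
| Type of train | | | | | | |
| Conventional | 0.189 | 0.047 | 0.000 | 0.243 | 0.034 | 0.000 |
| Cause directly related to | | | | | | |
| Suicide | 0.405 | 0.049 | 0.000 | 0.187 | 0.037 | 0.000 |
| Location occurred | | | | | | |
| Level crossing | 0.032 | 0.135 | 0.812 | 0.169 | 0.101 | 0.094 |
| Between stations | 0.219 | 0.045 | 0.000 | 0.110 | 0.036 | 0.003 |
| α (overdispersion) | 1.949 | 0.049 | 0.000 | 1.942 | 0.083 | 0.000 |
| Logit Part | | | | | | |
| Intercept | 1.170 | 0.199 | 0.000 | 1.202 | 0.210 | 0.000 |
| Type of event | | | | | | |
| High potential event | 2.708 | 0.225 | 0.000 | 2.980 | 0.258 | 0.000 |
| Human involved | −2.867 | 0.224 | 0.000 | −2.865 | 0.228 | 0.000 |
| GL | 0.931 | 0.357 | 0.009 | 1.043 | 0.411 | 0.011 |
| Season occurred | | | | | | |
| Fall | 0.281 | 0.138 | 0.042 | 0.294 | 0.148 | 0.047 |
| Person type | | | | | | |
| Public | 2.215 | 0.203 | 0.000 | 2.382 | 0.216 | 0.000 |
| Passenger | 1.513 | 0.208 | 0.000 | 1.645 | 0.220 | 0.000 |
| Crew | −0.076 | 0.235 | 0.745 | −0.124 | 0.256 | 0.627 |
| Size of train operator | | | | | | |
| Middle | 0.331 | 0.534 | 0.535 | 0.634 | 0.633 | 0.317 |
| Type of train | | | | | | |
| Conventional | −0.279 | 0.135 | 0.039 | −0.359 | 0.144 | 0.013 |
| Cause directly related to | | | | | | |
| Suicide | 1.143 | 0.157 | 0.000 | 1.137 | 0.168 | 0.000 |
| Location occurred | | | | | | |
| Level crossing | −0.718 | 0.436 | 0.099 | −0.845 | 0.483 | 0.081 |
| Between stations | 0.817 | 0.156 | 0.000 | 0.842 | 0.169 | 0.000 |
| LL at constant only | −5606 | | | −5606 | | |
| LL at convergence | −4864 | | | −4757 | | |
| ρ² | 0.132 | | | 0.151 | | |
| AIC | 9755 | | | 9568 | | |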
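
As a quick check on the goodness-of-fit rows of Table 4, the sketch below recomputes ρ² from the two reported log-likelihoods. It assumes that ρ² is McFadden's likelihood ratio index, 1 − LL(convergence)/LL(constant only); the function and variable names are illustrative and are not taken from the study.

```python
def mcfadden_rho2(ll_null: float, ll_model: float) -> float:
    """McFadden's likelihood ratio index (assumed definition of rho^2)."""
    return 1.0 - ll_model / ll_null


# Log-likelihoods as reported in Table 4 (human-related factors).
LL_CONSTANT_ONLY = -5606.0
LL_CONVERGENCE = {"two_stage": -4864.0, "zinb": -4757.0}

rho2 = {
    name: round(mcfadden_rho2(LL_CONSTANT_ONLY, ll), 3)
    for name, ll in LL_CONVERGENCE.items()
}

for name, value in rho2.items():
    print(f"{name}: rho^2 = {value:.3f}")
```

Under this assumed definition the recomputed values match the ρ² row of Table 4 to three decimals, and the ZINB model shows the higher index.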
Table 5. Model performance comparison for the human-related factors.
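| Model | Training R² | Training MSE | Training RMSE | Testing R² | Testing MSE | Testing RMSE |
|---|---|---|---|---|---|---|
| Two-stage | 0.355 | 12.114 | 3.481 | 0.311 | 13.012 | 3.607 |
| ZINB | 0.389 | 11.478 | 3.388 | 0.342 | 11.503 | 3.392 |
| ANN | 0.446 | 10.426 | 3.229 | 0.373 | 10.808 | 3.288 |
| ANN/TWO | 25.6% | −13.9% | −7.2% | 19.9% | −16.9% | −8.8% |
| ANN/ZINB | 14.7% | −9.2% | −4.7% | 9.1% | −6.0% | −3.1% |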
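
The ANN/TWO and ANN/ZINB rows of Table 5 appear to give the relative change of each ANN metric with respect to the corresponding regression baseline. The following sketch reproduces the testing-data percentages under that assumption; the dictionary layout and the helper name `relative_change` are hypothetical.

```python
# Testing-data metrics as reported in Table 5 (human-related factors).
TESTING = {
    "two_stage": {"r2": 0.311, "mse": 13.012, "rmse": 3.607},
    "zinb": {"r2": 0.342, "mse": 11.503, "rmse": 3.392},
    "ann": {"r2": 0.373, "mse": 10.808, "rmse": 3.288},
}


def relative_change(new: float, base: float) -> float:
    """Percentage change of `new` relative to `base`."""
    return (new - base) / base * 100.0


def ann_versus(baseline: str) -> dict:
    """Relative change of every ANN metric against one baseline model."""
    return {
        metric: round(relative_change(TESTING["ann"][metric], value), 1)
        for metric, value in TESTING[baseline].items()
    }


ann_vs_two_stage = ann_versus("two_stage")
ann_vs_zinb = ann_versus("zinb")
print("ANN/TWO :", ann_vs_two_stage)
print("ANN/ZINB:", ann_vs_zinb)
```

A positive change in R² together with negative changes in MSE and RMSE indicates that the ANN fits the held-out data better than the baseline.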
Table 6. Modeling outputs (delayed time) for the technical-related factors.
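| Parameter | Two-Stage Estimate | Two-Stage S.E. | Two-Stage p-Value | ZINB Estimate | ZINB S.E. | ZINB p-Value |
|---|---|---|---|---|---|---|
| Negative Binomial Part | | | | | | |
| Intercept | 1.548 | 0.170 | 0.000 | 1.485 | 0.148 | 0.000 |
| Type of event | | | | | | |
| High potential event | 0.090 | 0.160 | 0.592 | 0.189 | 0.147 | 0.198 |
| Train fire | 1.037 | 0.400 | 0.009 | 1.076 | 0.321 | 0.001 |
| Derailment | 0.067 | 0.210 | 0.748 | 0.427 | 0.185 | 0.021 |
| Facility type related to | | | | | | |
| Signal (or telecom.) | −0.290 | 0.036 | 0.005 | −0.236 | 0.033 | 0.000 |
| Electric power | 0.151 | 0.057 | 0.008 | 0.185 | 0.049 | 0.000 |
| Rail (or structures) | −0.166 | 0.065 | 0.010 | −0.082 | 0.058 | 0.158 |
| Cause directly related to | | | | | | |
| Others | −0.114 | 0.030 | 0.000 | −0.089 | 0.026 | 0.001 |
| Type of train | | | | | | |
| Conventional | 0.422 | 0.023 | 0.000 | 0.441 | 0.021 | 0.000 |
| Location occurred | | | | | | |
| Between stations | 0.091 | 0.023 | 0.000 | 0.092 | 0.020 | 0.000 |
| α (overdispersion) | 5.521 | 0.039 | 0.000 | 2.491 | 0.081 | 0.000 |
| Logit Part | | | | | | |
| Intercept | 1.005 | 0.342 | 0.000 | 0.907 | 0.348 | 0.000 |
| Type of event | | | | | | |
| High potential event | 3.848 | 0.342 | 0.000 | 3.969 | 0.350 | 0.000 |
| Train fire | 14.794 | 378.292 | 0.968 | 14.794 | 656.015 | 0.982 |
| Derailment | 1.310 | 0.505 | 0.009 | 1.349 | 0.509 | 0.008 |
| Facility type related to | | | | | | |
| Signal (or telecom.) | −0.731 | 0.196 | 0.000 | −0.751 | 0.223 | 0.001 |
| Electric power | −0.449 | 0.357 | 0.208 | −0.513 | 0.385 | 0.183 |
| Rail (or structures) | −1.089 | 0.289 | 0.000 | −1.183 | 0.305 | 0.000 |
| Cause directly related to | | | | | | |
| Others | −0.321 | 0.179 | 0.072 | −0.307 | 0.198 | 0.121 |
| Type of train | | | | | | |
| Conventional | −0.021 | 0.156 | 0.892 | −0.180 | 0.175 | 0.301 |
| Location occurred | | | | | | |
| Between stations | 0.204 | 0.161 | 0.205 | 0.1179 | 0.176 | 0.311 |
| LL at constant only | −6965 | | | −6965 | | |
| LL at convergence | −6740 | | | −6558 | | |
| ρ² | 0.032 | | | 0.058 | | |
| AIC | 13522 | | | 13158 | | |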
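
To illustrate how an estimate and its standard error relate to a reported p-value, the sketch below applies a two-sided Wald z-test to two logit-part rows of Table 6. That the study's p-values come from this test is an assumption, and `wald_p_value` is a hypothetical helper rather than part of the authors' code.

```python
import math


def wald_p_value(estimate: float, std_err: float) -> float:
    """Two-sided p-value of z = estimate / std_err under a standard normal."""
    z = abs(estimate / std_err)
    # 2 * (1 - Phi(z)) written with the complementary error function.
    return math.erfc(z / math.sqrt(2.0))


# (estimate, standard error) pairs from the logit part of Table 6.
ROWS = {
    "Train fire, two-stage": (14.794, 378.292),
    "Train fire, ZINB": (14.794, 656.015),
    "Derailment, two-stage": (1.310, 0.505),
    "Derailment, ZINB": (1.349, 0.509),
}

p_values = {label: wald_p_value(est, se) for label, (est, se) in ROWS.items()}

for label, p in p_values.items():
    print(f"{label}: p = {p:.3f}")
```

For the Train fire rows the standard error is far larger than the estimate, so z is close to zero and the p-value is close to one, whereas the Derailment coefficients fall below the 0.01 level.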
Table 7. Model performance comparison for the technical-related factors.
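| Model | Training R² | Training MSE | Training RMSE | Testing R² | Testing MSE | Testing RMSE |
|---|---|---|---|---|---|---|
| Two-stage | 0.197 | 13.338 | 3.652 | 0.101 | 15.522 | 3.940 |
| ZINB | 0.209 | 13.140 | 3.625 | 0.124 | 14.048 | 3.748 |
| ANN | 0.300 | 11.626 | 3.410 | 0.183 | 12.416 | 3.524 |
| ANN/TWO | 52.3% | −12.8% | −6.6% | 81.2% | −20.0% | −10.6% |
| ANN/ZINB | 43.5% | −11.5% | −5.9% | 47.6% | −11.6% | −6.0% |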
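
The MSE and RMSE columns of Table 7 are linked by RMSE = √MSE, so one can be recovered from the other. The sketch below recomputes RMSE from the reported MSE values and also prints the gap between testing and training MSE for each model; the data layout and variable names are illustrative only.

```python
import math

# (training MSE, testing MSE) as reported in Table 7 (technical-related factors).
MSE = {
    "two_stage": (13.338, 15.522),
    "zinb": (13.140, 14.048),
    "ann": (11.626, 12.416),
}

# RMSE is the square root of MSE, rounded to the precision used in the table.
rmse = {
    name: (round(math.sqrt(train), 3), round(math.sqrt(test), 3))
    for name, (train, test) in MSE.items()
}

# Gap between testing and training error, a rough indicator of overfitting.
mse_gap = {name: round(test - train, 3) for name, (train, test) in MSE.items()}

for name in MSE:
    print(f"{name}: RMSE train/test = {rmse[name]}, MSE gap = {mse_gap[name]}")
```

The recomputed RMSE values agree with the table to three decimals, and all three models show a higher error on the testing data than on the training data.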

Kim, J.-M.; Lim, K.-K. Analyzing the Factors Influencing Time Delays in Korean Railroad Accidents. Appl. Sci. 2024, 14, 1697. https://doi.org/10.3390/app14051697
