A Normal Distribution-Based Methodology for Analysis of Fatal Accidents in Land Hazardous Material Transportation

The number of deaths in accidents occurring during land transport of hazardous material (rail and road) is a standard measure for judging accident severity in safety programmes. The f-N curve is commonly used to express the results of past, scattered accident data through curve fitting, which estimates only the overall trend. For this reason, this paper proposes a simple methodology that combines the normal distribution with the f-N curve. To verify the method, three sets of statistical data were selected and analysed: 1932 accidents in over 95 countries (1931–2004) and 322 accidents in China (2000–2008), both available in the literature, and 2046 accidents investigated in China (2013–2017). It was found that the mean value curve is almost identical to, or even better than, the best-fitted curve, while the predicted upper and lower limits with 96% reliability (±2σ), which cover nearly all the statistical data, are beyond the scope of common curve fitting. The results explain the inherent relation between accumulated frequency and deaths in different transport modes, in different countries and in different periods. This study also provides insights into the evolution of accident severity with the development of the social economy and safety requirements.


Introduction
Hazardous material accidents occurring in land transport often have severe consequences for the population and the environment because of the characteristics of dangerous goods. Safety improvement programmes have been conducted in many countries, such as the Hazardous Materials Cooperative Research Program (HMCRP) in the United States, the 5-year plan for safety production in China, and the zero-accident goal by 2050 in the EU. It is therefore important to evaluate accident severity within safety programmes and to make suitable plans as the social economy develops and safety requirements rise.
Several studies using statistical data have discussed accident severity with different methods [1][2][3]. For example, Ellis et al. [4] analysed the ratio of hazmat-ship accidents to overall ship accidents, and Zhang et al. [5] used the proportion of deaths to injuries for hazardous chemical accidents. Abdolhamidzadeh et al. [6] presented the average number of fatalities per accident globally. Brüde [7] described a model for successively monitoring the development of fatalities. These studies mostly give a simple overview of deaths and injuries, without in-depth analysis of accident severity. Because accident information is often incomplete or inaccurate (e.g., economic loss and degree of injury) [8,9], it is statistically impossible to conduct a comprehensive study that considers all the consequences of hazmat accidents. In this study, only fatal accidents, whose reporting rate is nearly 100%, were considered [10].
The f-N curve (social risk curve) is commonly used to describe accident severity by presenting the relationship between the accumulated probability and the number of deaths [11][12][13][14]. Except for marine accidents in hazmat transportation, straight lines are obtained when an f-N curve is plotted for road, rail and pipeline accidents [15]. The slope of the straight line, obtained by conventional curve fitting, is used as a metric of accident severity. For example, by comparing the slopes of f-N curves, Hemmatian et al. [16] and Darbra et al. [17] concluded that hazmat accidents involving domino effects have slightly more severe consequences for the affected population than hazmat accidents without domino effects.
A new method that uses a mean value curve based on the normal distribution has been reported to match or even outperform the best-fitted curve [18,19]. Furthermore, predictions of the upper and lower bounds with 96% reliability (±2σ), which cover nearly all the statistical data, are beyond the scope of conventional curve fitting [18], and they may also be used to predict the severity of fatal accidents. On this basis, the aim of this paper is to propose a methodology based on the normal distribution and the f-N curve and, thereby, to explore ways of reducing fatal accidents through the related safety policies and programmes.

Data Resource
Hazardous material accidents mostly happen during transportation, especially in land transport; in China, at least 80% of hazmat transportation accidents occurred on the road [2]. However, accident statistics are limited by underreporting; for example, some states only report accidents that result in property damage above a specific threshold dollar amount, while others require the degree of vehicle damage to be above a certain level [20]. In addition, it is well known that individuals involved in accidents with no injury or only minor injury are far less likely to have their crashes reported to the police. A technical report by the National Traffic Safety Administration [21] estimated that 25% of minor-injury accidents and half of no-injury crashes are unreported, in sharp contrast to fatal crashes, for which the reporting rate is nearly 100% [10].
Recognizing that accident information is limited is an important consideration in selecting appropriate sample data. Hence, this study selected accidents with at least one death that occurred in land transportation of hazardous material, and three data sets were analysed. The first data set is digitized from Oggero et al. [11], who surveyed 1932 accidents (including 242 fatal accidents) that occurred during the land transport of hazardous material in over 95 countries. Oggero et al. [11] analysed fatalities with an f-N curve, which provides a comparison for testing the normal distribution-based method. The second data set is digitized from Yang et al. [12] and covers 322 accidents that occurred during road hazmat transportation in China from 2000 to 2008. The reliability of the graphic digitization was checked by comparison with the original results in the literature; the digitization error is no more than 1%, which is acceptable. Yang et al. [12] also analysed deaths with an f-N curve. The third data set contains 2046 accidents (including 217 accidents that caused deaths) in road transportation of hazardous material in China from 2013 to 2017; these data were collected from the Journal of Safety and Environment and the Chemical Accidents Information Network. Detailed information was gathered from newspapers and the Internet.

Method
The f-N curve is a method for expressing the results of risk evaluation. It gives the accumulated frequency of accidents affecting N or more people. For evaluating the severity of fatal accidents in hazmat transportation, the most commonly used approach is to plot an f-N curve showing the relation between accumulated frequency and number of deaths. Accidents with at least one fatality were selected and grouped according to the number of deaths. The cumulative probability or frequency is calculated by the following expression:
$$P(x \geq N) = F_j = \frac{\sum_{i=j}^{n} N_i}{\sum_{i=1}^{n} N_i} \qquad (1)$$
where N is the number of deaths (x-axis), P(x ≥ N) = F_j is the probability that the number of deaths in an accident is equal to or greater than N (y-axis), n is the total number of categories or rankings, and N_i is the number of accidents in a given category i.
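The grouping and accumulation described above can be sketched in a few lines of Python. This is only an illustration: the function name `cumulative_frequency` and the sample death tolls are hypothetical, not taken from the paper's data sets.

```python
from collections import Counter


def cumulative_frequency(deaths_per_accident):
    """Map each death toll N to F, the share of fatal accidents with N or more deaths."""
    counts = Counter(d for d in deaths_per_accident if d >= 1)  # keep fatal accidents only
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no fatal accidents in the sample")
    frequencies = {}
    running = 0
    # Accumulate accident counts from the largest death toll downwards.
    for n in sorted(counts, reverse=True):
        running += counts[n]
        frequencies[n] = running / total
    return dict(sorted(frequencies.items()))


# Hypothetical death tolls, one entry per fatal accident (not data from the paper).
sample = [1, 1, 1, 1, 1, 2, 2, 3, 5, 10]
f_n = cumulative_frequency(sample)
for n, f in f_n.items():
    print(n, f)
```

By construction the frequency at N = 1 is always 1, because every accident in the sample has at least one death.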
In a log-log system, F_j is roughly linear in N_i for land transport accidents ($\lg F_j = b \cdot \lg N_i$), unlike marine transport, whose f-N curve has a hump [15]. To present this distribution quantitatively and visually, previous studies have obtained the best-fitted curve with a slope of b [11][12][13][14]. The slope b expresses a relative probability: for example, an accident involving 10 or more deaths is $10^{-b}$ times as probable as an accident involving 100 or more deaths, since $F(10)/F(100) = 10^{b}/100^{b} = 10^{-b}$ (that is, $N^{-b}$ with N = 10). However, the value of b differs between conditions; for example, the slope is −0.81 in Oggero et al. [11], while −0.84 was obtained by Vílchez [14]. Therefore, it is essential to obtain the value of b precisely.
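As a quick numerical check of this interpretation, the ratio can be evaluated for the two slopes quoted above. The helper name `relative_frequency` is introduced here purely for illustration.

```python
def relative_frequency(b, n_low=10, n_high=100):
    """Return F(n_low) / F(n_high) for the straight line F = N**b."""
    return (n_low ** b) / (n_high ** b)


# Slopes quoted in the text: -0.81 (Oggero et al.) and -0.84 (Vilchez).
for b in (-0.81, -0.84):
    print(b, round(relative_frequency(b), 2))
```

The two slopes give ratios of roughly 6.5 and 6.9, so a difference of 0.03 in b already shifts the relative probability noticeably.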
The slope is determined for each data point by the two quantities F and N, and its distribution is analysed using Equation (2):
$$b_i = \frac{\lg F_j}{\lg N_i} \qquad (2)$$
The distribution of b_i can therefore be characterized by its mean value µ and standard deviation σ.
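A minimal sketch of this step follows, assuming that the per-point slope is b_i = lg F / lg N (the form implied by the log-log relation, and undefined at N = 1 because lg 1 = 0). The data points are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator is used.

```python
import math
import statistics


def slopes(points):
    """Per-point slope b_i = lg F / lg N; None (NA) when N = 1 because lg 1 = 0."""
    result = []
    for n, f in points:
        result.append(None if n == 1 else math.log10(f) / math.log10(n))
    return result


# Hypothetical (N, F) pairs lying near F = N**-0.8; not the digitized data.
points = [(1, 1.0), (2, 0.58), (5, 0.27), (10, 0.16), (50, 0.045)]
b = [value for value in slopes(points) if value is not None]
mu = statistics.mean(b)
sigma = statistics.stdev(b)  # sample standard deviation (assumed estimator)
print(round(mu, 3), round(sigma, 3))
```

The first point is skipped as NA, which mirrors the NA entries reported in the tables.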
The distribution of b_i that is closest to a normal distribution and has the smallest standard deviation is used to determine the relative probability and to predict an f-N curve from the mean (µ) and standard deviation (σ). With the accumulated probability (F), the number of deaths (N_i), and the corresponding standard deviation (σ), the equation of a straight line can be written to include the mean value curve and the upper and lower limits with 96% reliability (±2σ), i.e.,
$$\lg F = (\mu \pm 2\sigma) \cdot \lg N_i \qquad (3)$$
The above equation means that the relative probability can be fully predicted from the mean µ (µ = b) and the standard deviation σ. The severity of accidents can then be analysed with an accurate mean value curve and with upper and lower limits that capture the fluctuation of accidents in hazardous material transportation.
To show the usefulness of the methodology, we examine the relation between accumulated frequency and number of deaths in different transport modes, in different countries and in different periods.
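The mean value line and the ±2σ limits can be sketched as follows, assuming the straight-line form lg F = (µ ± 2σ) · lg N. The values µ = −0.770 and σ = 0.0768 are the road-accident values reported later in the paper; the three test points and the helper names `band` and `inside_band` are hypothetical.

```python
def band(mu, sigma, k=2.0):
    """Slopes of the lower-limit, mean and upper-limit lines."""
    return mu - k * sigma, mu, mu + k * sigma


def inside_band(n, f, mu, sigma, k=2.0):
    """True if the point (N, F) lies between the lower and upper limit lines."""
    low, _, high = band(mu, sigma, k)
    return n ** low <= f <= n ** high


low, mean, high = band(-0.770, 0.0768)  # road values quoted in the paper
print(round(low, 3), mean, round(high, 3))

# Hypothetical (N, F) points used only to illustrate a coverage check.
points = [(2, 0.60), (10, 0.17), (50, 0.02)]
coverage = sum(inside_band(n, f, -0.770, 0.0768) for n, f in points) / len(points)
print(coverage)
```

The computed limits, about −0.924 and −0.616, agree with the gradients of the dashed lines reported for road accidents.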

Severity Analysis of Fatalities in Different Transport Modes
The severity of fatal accidents differs between transport modes. In this paper, by plotting an f-N curve based on the normal distribution, we conducted a comparative study of the severity of fatalities in road and rail transport accidents. There are 31 results (F, N_i) for road and 18 results (F, N_i) for rail, both digitized from Oggero et al. [11], listed in Tables 1 and 2, respectively. In the tables, NA means that b_i is not available in a log-log system.
Table 1. Accumulated frequency of road accidents with N deaths digitized from Oggero et al. [11].
Based on the corresponding data in Oggero et al. [11], the slope b_i obtained from Equation (2) is normally distributed with mean value µ = −0.770 and standard deviation σ = 0.0768 (Figure 1a). The f-N curves of the mean value and of the predicted upper and lower limits from Equation (3) are plotted in Figure 1b, together with the previous result, a fitted curve (in blue). The fitted curve is plotted from the digitized data and is identical to the original result in the literature.
[Figure 1. (a) Normal distribution analysis of b_i; (b) f-N curves from Equations (1) and (3) based on the normal distribution analysis of b_i in (a), together with the statistical data for road accidents from 1931 to 2004 digitized from Oggero et al. [11].]
The mean value curve (the black straight line) in Figure 1b, based on the normal distribution, differs slightly from the fitted curve (the blue dotted line). The mean value curve (b = −0.77) describes the data even better than the best-fitted curve (b = −0.81) [11]. The same finding, that the normal distribution matches or outperforms curve fitting, was published by Zhang et al. [18]. The gradient of the upper dashed line is −0.616, and the gradient of the lower dashed line is −0.924. This indicates that the relative frequency of an accident involving 10 or more deaths versus an accident involving 100 or more deaths ranges from 4.1 to 8.4 (F = 10^(−b)), with a mean value of 5.9.
Table 2 presents the accumulated frequency (F), the number of deaths (N_i) and the slope b_i for rail transport accidents (NA means b_i is not available). As for road accidents, the slope b_i for rail accidents obtained from Equation (2) is normally distributed, with mean value µ = −0.752 (approximately −0.75 in the literature) and standard deviation σ = 0.062. Figure 2 shows the distribution of fatalities and the f-N curves of the mean value and of the predicted upper and lower limits from Equation (3), together with a fitted curve from Oggero et al. [11], in a log-log system.
Table 2. Accumulated frequency of rail accidents with N deaths digitized from Oggero et al. [11].
In Figure 2, the slope of the fitted curve (−0.748) [11] is almost identical to the slope of the mean value curve (−0.752). Using Equation (3), b_i ranges from −0.873 to −0.630. This means, for example, that an accident involving 10 or more deaths is 4.3 to 7.5 (F = 10^(−b)) times as frequent as an accident involving 100 or more deaths, with a mean value of 5.6.
The results show that the slope of the f-N mean value curve for rail accidents (−0.752) is smaller in magnitude than that for road accidents (−0.770), indicating that fatal accidents in rail transport are, in general, more severe than those in road transport.
However, the range of predicted slopes between the upper and lower bounds for road accidents (−0.924 to −0.616) contains the range for rail accidents (−0.873 to −0.630). This indicates that fatal accidents in road transport are more severe at the upper bound and less severe at the lower bound than those in rail transport.
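The containment of the rail range within the road range can be checked directly from the reported means and standard deviations. This sketch assumes the limits are µ ± 2σ; because σ is rounded in the text, the rail limits computed here (about −0.876 and −0.628) differ slightly from the quoted −0.873 and −0.630. The helper names are hypothetical.

```python
def slope_interval(mu, sigma, k=2.0):
    """Return the (lower, upper) slope limits mu - k*sigma and mu + k*sigma."""
    return (mu - k * sigma, mu + k * sigma)


def contains(outer, inner):
    """True if the interval `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


road = slope_interval(-0.770, 0.0768)  # mean and sigma reported for road
rail = slope_interval(-0.752, 0.062)   # mean and sigma reported for rail
print(road, rail, contains(road, rail))
```

The road interval is wider on both sides, which is what allows road accidents to be both more and less severe than rail accidents at the respective limits.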

Severity Analysis of Fatalities in Different Countries
The severity of fatal accidents is closely related to a country's level of development. Oggero et al. [11] analysed accident fatalities by dividing countries into three groups: (1) the United States, Canada, Australia, Japan, New Zealand and Norway; (2) the European Union; and (3) the rest of the world. The countries in Group 1 and Group 2 are all well developed, while those in Group 3 are mostly developing countries. Oggero et al. [11] concluded that the severity of accidents in the first two groups is similar, and that in Group 3 it is clearly higher than in the first two.
To compare how severity varies with the level of development in different countries, we use the same sample and groups as Oggero et al. [11]. In total, 13 data points (F, N_i) for Group 1, 11 data points (F, N_i) for Group 2, and 33 data points (F, N_i) for Group 3 were digitized and are listed in Table 3. All these data cover fatal accidents in both road and rail hazmat transportation over the same period. In Table 3, NA means that b_i is not available in a log-log system.
Table 3. Accumulated frequency of land transport accidents with N deaths for the three groups digitized from Oggero et al. [11].
According to Equation (2), the slope b_i is normally distributed in all three groups, and the results agree with the original literature. Based on the normal distribution, we obtain the mean value curve and the predicted upper and lower lines, which are shown together with the commonly used fitted curve in Figures 3-5.
In Figures 3 and 4, the mean value of Group 1 (b = −1.114) is very close to that of Group 2 (b = −1.079), and these two groups include most developed countries. This indicates that the severity of fatalities in developed countries is similar. The predicted slopes of the upper and lower bounds from Equation (3), however, describe the gap between Group 1 and Group 2. In Group 1, the relative probability for deaths ranges from 5.8 to 29.1, and in Group 2 it ranges from 5.4 to 26.7. This indicates that in the countries of Group 1 (i.e., the United States, Canada, Australia, etc.), fatalities caused by hazardous material transportation accidents may show higher volatility than in the EU (Group 2).
In Group 3, the slope of the mean value curve is −0.428, much higher than in the first two groups. The predicted slopes of the upper and lower bounds range from −0.152 to −0.704 (Figure 5b), which means that an accident involving 10 or more deaths is 1.4 to 5.1 (F = 10^(−b)) times as frequent as an accident involving 100 or more deaths. Compared with the first two groups, the severity of fatalities in Group 3 is the highest, and this result is in good agreement with the study published by Oggero et al. [11].
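The conversion between slopes and relative frequencies used for the three groups can be sketched as below. The Group 3 slopes are those quoted above; the Group 1 slopes are back-calculated here from the quoted ratios (5.8 and 29.1) and are not stated in the paper, so they should be read as an illustration only. Both helper names are hypothetical.

```python
import math


def ratio_from_slope(b):
    """Relative frequency of >=10 deaths versus >=100 deaths for slope b."""
    return 10 ** (-b)


def slope_from_ratio(ratio):
    """Inverse of ratio_from_slope: the slope implied by a relative frequency."""
    return -math.log10(ratio)


# Group 3: slope limits quoted in the text.
group3_ratios = (ratio_from_slope(-0.152), ratio_from_slope(-0.704))
print([round(r, 1) for r in group3_ratios])

# Group 1: slopes implied by the quoted ratios (back-calculated, an assumption).
group1_slopes = (slope_from_ratio(5.8), slope_from_ratio(29.1))
midpoint = sum(group1_slopes) / 2
print([round(s, 3) for s in group1_slopes], round(midpoint, 3))
```

The midpoint of the implied Group 1 slopes comes out close to the reported mean of −1.114, which is consistent with limits that are symmetric about the mean in slope.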

Severity Analysis of Fatalities in Different Periods
To provide an updated survey and a comparative study of the severity of fatal accidents in the road transportation of hazardous material, this study selected 322 accidents (2000-2008) with 8 data points (F, N_i) digitized from Yang et al. [12] and investigated 2046 accidents, including 217 fatal accidents (2013-2017), with 11 data points (F, N_i) in China. All the data points are listed in Table 4, where NA means not available.
Table 4. Accumulated frequency of hazardous material accidents in road transportation with N deaths, from Yang et al. [12] and from the investigation, for different periods in China.
[Table 4 column groups: 2000-2008 (Yang et al. [12]); 2013-2017 (investigated data).]
As shown in Table 4, it is straightforward to establish a normal distribution for the slope b_i with a specified reliability, here 96%. The normal distribution analysis of the slope b_i based on Equation (2), and the predictions of the mean value and of the upper and lower bounds based on Equation (3), are shown in Figures 6 and 7, together with the corresponding best-fitted curves.
Figure 6a shows that b_i is normally distributed with mean value µ = −1.239 and standard deviation σ = 0.118. The fitted curve (b = −1.186) plotted from the digitized data differs slightly from the original result in the literature. This small error may come from the digitization of the data; however, the digitization error is acceptable. In Figure 6b, unlike the fitted curve, the mean value curve is close to the first few data points, those with 6 or fewer deaths, and is therefore more representative. Figure 7a shows that b_i is normally distributed with mean value µ = −1.687 and standard deviation σ = 0.278.
In Figure 7b, the mean value curve is lower than the best-fitted curve because the sample fluctuates strongly (e.g., accidents with 40 and 58 deaths). This suggests that the mean value curve represents the data better than the fitted curve.
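For the two Chinese periods, the slope limits and the corresponding relative frequencies follow from the reported means and standard deviations. The sketch below assumes limits of µ ± 2σ; the resulting limits and ratios are derived here for illustration and are not values quoted in the paper. The function name `summarize` is hypothetical.

```python
def summarize(mu, sigma, k=2.0):
    """Slope limits and the matching >=10 vs >=100 deaths frequency ratios."""
    lower, upper = mu - k * sigma, mu + k * sigma
    return {
        "slopes": (lower, mu, upper),
        # A steeper (more negative) slope gives a larger ratio, so order by upper first.
        "ratios": tuple(10 ** (-s) for s in (upper, mu, lower)),
    }


# Means and standard deviations reported for Figures 6a and 7a.
periods = {"2000-2008": (-1.239, 0.118), "2013-2017": (-1.687, 0.278)}
summary = {label: summarize(mu, sigma) for label, (mu, sigma) in periods.items()}
for label, values in summary.items():
    print(label, [round(s, 3) for s in values["slopes"]],
          [round(r, 1) for r in values["ratios"]])
```

The later period has both a steeper mean slope and a wider band, which reflects the larger standard deviation of that sample.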

Fatal Transportation Accidents by Road and Rail
According to the statistical data on hazardous material transportation, accidents on railways are much less frequent than those on roads, while research shows that the consequences of an accident are likely to be more severe if it occurs on a railway rather than on a road [11]. The same conclusion was obtained from the analysis of the mean value curve: the slope is −0.77 for road accidents (Figure 1) and −0.75 for rail accidents (Figure 2). However, predictions of the upper and lower bounds with 96% reliability (±2σ) show that the relative probability of fatalities in road accidents (4.1 to 8.4) spans a wider range than that in rail accidents (4.3 to 7.5). This indicates that, in some cases, road accidents may be more severe than rail accidents, because the consequences of an accident are determined by its possible causes (for example, impact failure, mechanical failure, human factors and external conditions) and by the emergency response. In addition, railway hazmat transportation has a fixed route system, and each train includes multiple cars carrying hazardous material. Once a hazmat train derails, multiple tank cars may release their contents [22]. This usually leads to severe consequences within a particular area. Research on the severity of train derailments, in terms of the number of derailed cars, has been carried out in recent years [23][24][25][26].

The Impact of the Development Levels
In Group 1 and Group 2, the slopes of the mean value curves are approximately −1.1 (Figures 3-5), which agrees with the results of Oggero et al. [11] and Carol et al. [27]. Subtle differences can be described by using the upper and lower bounds with 96% reliability. In Group 1, the relative probability ranges from 5.8 to 29.1, while in Group 2 it ranges from 5.4 to 26.7. This indicates that the consequences of an accident are likely to be identical or more severe if the accident occurs in the European Union than in the countries of Group 1. In Group 3, the relative probability between the upper and lower limits ranges from 1.4 to 5.1, which is far lower than in the first two groups. This result is consistent with the hypothesis of Law et al. (2011) [28]. Considering the levels of economic development, the developing countries in Group 3 are at an early stage of industrialization and pay little attention to safety programmes. In developed countries, the severity of fatal accidents declines as per capita income increases and safety management improves.

The Evolution of Accident Severity
Combining the five stages of safety production with the development of the social economy [1,29], the severity of fatal accidents in hazmat transportation passes through five corresponding stages. Stage 1: the severity is relatively low, with few transportation accidents, in an agricultural economy. Stage 2: the severity increases as the number of accidents increases during early industrialization. Stage 3: the severity reaches a general trend with fluctuations during middle industrialization. Stage 4: accident severity shows a general decline during advanced industrialization. Stage 5: accident severity stabilizes in the information society.
Comparing the slopes of the mean value curves in Figure 6b and Figure 7b shows that the consequences of an accident are likely to be slightly less severe if it occurred in 2013-2017 rather than in 2000-2008. This indicates that in China the severity of fatal accidents in hazardous material transportation has decreased as a whole. Considering this result together with the reduction in safety accidents [30][31][32], China is at Stage 4 in the evolution of accident severity in hazardous material transportation. This achievement may be attributed to the development of the social economy and the implementation of national safety programmes, such as the 5-year plans for safety production.

Conclusions
For the analysis of accident fatalities, the common practice is to fit the average results (i.e., to fit an f-N curve) regardless of how large the scatter in the statistical results is. In this study, a simple and reliable methodology for accident data analysis is proposed by combining the normal distribution with the f-N curve. On this basis, we analysed fatal accidents grouped by transport mode (i.e., road and rail), by country, and by period. The mean value curves are almost identical to, or even better than, the linear-fitted curves, while the predicted upper and lower limits with 96% reliability, which cover nearly all the statistical data, are beyond the scope of common curve fitting. With the development of the social economy and of safety requirements, accident severity evolves through five stages. The severity of fatalities is usually lower in developed countries than in developing countries, because infrastructure and safety measures for the transportation of hazardous material are less effective in developing countries.
In principle, the methodology presented in this study for the analysis of fatalities can be used to evaluate safety policies and to propose countermeasures for the transportation of hazardous material. This should make such studies more objective than the routine practice of curve fitting.