Potentialities of Vehicle Trajectory Big Data for Monitoring Potentially Fatigued Drivers and Explaining Vehicle Crashes on Motorway Sections

Task-related fatigue, caused by prolonged driving, is a major cause of vehicle crashes. Despite noticeable academic achievements, monitoring drivers’ fatigue on road sections is still an ongoing challenge which must be met to prevent and reduce traffic accidents. Fortunately, individual instances of vehicle trajectory big data collected through advanced vehicle-GPS systems offer a strong opportunity to trace driving durations. We propose a new approach by which to monitor task-related fatigued drivers by directly using the ratio of potentially fatigued drivers (RFD) to all drivers for each road section. The method used to compute the RFD index was developed based on two inputs: the distribution of the driving duration (extracted from vehicle trajectory data), and the boundary condition of the driving duration between fatigued and non-fatigued states. We demonstrate the potentialities of the method using vehicle trajectory big data and real-life traffic accident data. Results showed that the measured RFD has a strong explanatory power with regard to the traffic accident rate, with a statistical correlation of 0.86 at least, for regional motorway sections. Therefore, it is expected that the proposed approach is a feasible means of successfully monitoring fatigued drivers in the present and near future era of smart-mobility big data.


Introduction
Given that automobile driving is a critical safety task and also a daily socio-economic activity, automobile crashes are a major cause of enormous socio-economic losses (e.g., severe injuries, deaths, and economic losses) [1][2][3][4]. For instance, road traffic crashes caused 1.25 million deaths and cost governments around 3% of GDP globally in 2013 [5]. A review of the literature suggests that the majority of road-related crashes are generally attributed to drivers' faults, which could have been prevented [6][7][8]. Hence, in order to analyze and prevent road traffic crashes more efficaciously, noticeable academic efforts to explore and determine significant factors that may affect vehicle crashes have been made in the field of road transport safety.
One significant factor recognized both in academia and in drivers' experience is "driving fatigue." The term "driving fatigue" has been defined as a state of deteriorated mental alertness [9], a transient period between awake and asleep [10], and physiological and psychological processes [11], all of which, if left undisturbed, impair the ability of drivers to perform driving tasks safely. There is a consensus that driving fatigue is a direct or contributing cause of road traffic accidents, especially on motorways [10,12-18]. Prolonged and monotonous driving on motorways increases driving fatigue [1,9,10,19], which in turn significantly affects fatigue-related vehicle crashes [11,12,14,16-24]. The answers, if obtainable and useful, can then provide a new opportunity to more efficaciously analyze and prevent automobile accidents from the standpoint of driver fatigue.
To address this ongoing issue, a promising and practical method that allows the direct measurement of potentially fatigued drivers along a motorway network using vehicle trajectory big data is proposed for the first time in this field. This model is devised based on a direct and disaggregated big data-driven approach, avoiding artificial and complex mathematical formulae. The big data-driven approach is simplified for real-life applications, and it is designed to satisfy the computational time required in time-critical big data systems. This approach is also a feasible solution to the privacy policies which have been a chronic hindrance to the utilization of vehicle trajectory data in many cases. In this manner, a new measurement of the number of potentially fatigued drivers on any road section is computed, based on any type of actual vehicle trajectory big data. The potentialities of the proposed method for monitoring fatigued drivers on any motorway section and for understanding relationships between the monitored fatigued drivers and vehicle crashes are demonstrated through a case study with both vehicle-GPS trajectory big data and observed motorway traffic accident data. Finally, based on the case study results and certain findings, the limitations of the proposed method and further research directions are given from academic and practical viewpoints.

Approach Concept
It is self-evident that the driving duration (i.e., time-on-task) is a major cause of driver fatigue under typical driving conditions and in turn, that driver fatigue strongly affects the risk of an automobile crash. These facts present the opportunity to use the actual path travel times of individual vehicles on a road network, if available, to survey real-life driving durations and then to monitor the driver fatigue state along a road section. This opportunity can be realized through the GPS information (i.e., geographic coordinates) pertaining to vehicle trajectories, due to the fact that the driving durations of individual drivers along a chain of road sections can be directly and precisely tracked using individual spatiotemporal vehicle trajectories. Currently, vehicle trajectory big data are also collected through widely utilized vehicle-GPS devices and smartphone navigators.
To mine this opportunity, a method capable of measuring potentially fatigued drivers for any road section using vehicle-GPS trajectory data is proposed in this study. The proposed method is developed based on the following two concepts. The first concept is that the driving-duration distribution of drivers using a road section while utilizing vehicle-GPS devices has a high statistical correlation with that of all drivers using the same road section. This concept is also supported by the clear fact that the vehicle-GPS probe volume is a direct part of the vehicular traffic volume for a road location [41,42]. The second concept is that the physical and physiological conditions of drivers develop into a fatigue state when they experience driving durations greater than a time limit for continued driving. If the two concepts are reasonable, then a ratio of potentially fatigued drivers (RFD) to all drivers for a road section can be inferred using both the driving-duration distribution of drivers using vehicle-GPS and the time limit. As such, the RFD value can be efficaciously used for a risk analysis of fatigue-related vehicle crashes in itself due to the fact that the risk of vehicle crashes increases substantially during prolonged driving [17]. Despite this potential, no research on the monitoring of driver fatigue of a road section using the feature of the driving-duration distribution could be found in our literature review.

Measurement of Potentially Fatigued Drivers
The aim of this study is to directly monitor potentially fatigued drivers on a road segment using the RFD index. In order to compute the RFD of a target road section, two components (the driving-duration distribution of drivers using vehicle-GPS, and a time limit for continued driving) are defined, after which an estimation function to generate the RFD is proposed using the components. It is noted that using an average of driving duration as an independent variable cannot generate the RFD, even though a relationship can be identified between average values of driving durations and traffic accidents for road sections.
Sustainability 2020, 12, 5877
In this study, the driving duration (t) for a driver who uses a GPS system in the vehicle is regarded as the actual driving time from a departure location to the target road section. The actual driving time can also be obtained by removing the non-driving activity time at rest areas from the total travel time between the departure location and target road section (Figure 1a). Each driving duration of drivers from their departure locations to the target road section can be extracted through map-matching techniques (Figure 1b) and can then be compiled into a frequency distribution of the driving duration with a fixed interval of the driving duration (∆t). Thus, let us define the distribution of t for anonymous drivers as a frequency function f(t), as shown in Figure 1c, to formulate the RFD, where 0.0 ≤ t ≤ t_max, with t_max as the maximum driving duration. It should be noted that this disaggregated approach is efficient and convenient for the collection and management of big individual data with a marginal error of ∆t/2 in practice. This approach is also a feasible solution to real-life obstacles (e.g., computational time, data management and transmission, and privacy policies), especially when vehicle trajectory big data are used.

Related to the time limit, safe limits for continued driving have been widely debated in driving simulation-based studies. For instance, a wide range of safe limits, including durations of 40 [1,43], 60 [11,33,44], and 80 min [17,45], were proposed using various fatigue-symptom measures which are significantly related to the temporal development of driving fatigue. These safe limits were proposed based on the driving duration, a major fatigue factor, under typical and monotonous driving conditions. Driver fatigue is also commonly related to multidimensional causal factors such as the driving environment (e.g., road geometry, visibility, travel speed, traffic volume) [29,46], physiological conditions (e.g., sleep deprivation, chronic sleepiness, extended durations of wakefulness, mental workload, circadian rhythms, ethanol and drug use) [25-27,47-50], and social features (e.g., age, gender) [48,51]. Hence, it is natural that boundary conditions between non-fatigue and fatigue states for individual drivers reveal wide variations in real-life driving.
Here, we assume that the boundary conditions for anonymous drivers have a normal probability distribution p(t | µ, δ) over the driving duration (t), where µ and δ are the mean and standard deviation of the probability distribution, respectively (Figure 1c). In addition, p(t | µ, δ) plays a role in explaining the temporal evolution of driving fatigue. Thus, the probability of fatigued drivers at t can be estimated by the cumulative probability cp(t | µ, δ), as shown in Figure 1c. This parameter is expressed in Equation (1). Based on these considerations (i.e., f(t) and cp(t)), the number of potentially fatigued drivers (PFD), as defined in Equation (2), out of all drivers for a target road section, is estimated using f(t) and p(t), as shown in Figure 1d. Finally, the ratio of potentially fatigued drivers (RFD, 0.0~1.0) to all drivers who use the target road section is computed with Equation (3). During the actual computation process using the frequency distribution of the driving duration, which ranges from t = 1 to t = t_max with an interval of the driving duration (i.e., ∆t), the RFD is computed with Equation (4).
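The computation described above can be sketched in a few lines. The equations themselves are not reproduced in this text, so the forms used here are assumptions inferred from the surrounding definitions: cp(t | µ, δ) is taken to be the normal cumulative distribution function, and the discrete RFD is taken to be the sum of f(t) · cp(t) over all duration bins divided by the sum of f(t). The function names `fatigue_cdf` and `rfd`, and the convention that bin k represents t = (k + 1) · ∆t, are hypothetical.

```python
import math


def fatigue_cdf(t, mu, delta):
    """Assumed form of cp(t | mu, delta): the normal CDF with mean mu and
    standard deviation delta, evaluated at driving duration t (minutes)."""
    return 0.5 * (1.0 + math.erf((t - mu) / (delta * math.sqrt(2.0))))


def rfd(freq, mu, delta, dt=1.0):
    """Assumed discrete RFD: share of drivers expected to be fatigued.

    freq[k] is the number of drivers in duration bin k; the bin is
    represented by t = (k + 1) * dt (an illustrative convention).
    """
    total = sum(freq)
    if total == 0:
        raise ValueError("the frequency distribution is empty")
    pfd = sum(n * fatigue_cdf((k + 1) * dt, mu, delta)
              for k, n in enumerate(freq))
    return pfd / total


# Hypothetical distribution: every driver has driven exactly 160 min.
freq = [0] * 480
freq[159] = 10
print(rfd(freq, mu=160, delta=50))
```

With all drivers placed exactly at the mean of the boundary distribution, half of them are counted as potentially fatigued, which is a quick sanity check on the weighting.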

Data and Characteristics
To demonstrate the potential of RFD to explain the vehicle-crash risk on motorways, a case study was carried out using two types of real-life test data: an accident-rate index and the frequency distribution of driving durations. The time span of the test data is the full year of 2017. The test beds for the data are two main motorway lines in South Korea, as shown in Figure 2, in which lines 1 and 15 consist of 37 and 33 road sections between two interchanges, respectively. All road sections also meet motorway design standards and guidelines through continuous improvements (in an effort to reduce vehicle crashes). The test data for the test bed are shown in Table 1.
Note to Table 1: RN and SN stand for line number and road section number, respectively. N = number of lanes; L = length (km); Q = annual average daily traffic volume (vehicle/day); AN = number of vehicle crashes (accidents/year); AR = accident rate. DS stands for dataset-scenario number which was used to identify an optimal boundary condition in Section 3.2.
The accident rate (AR, accidents per 10^6 vehicle-km) for each road section (i), as defined by Equation (5), was used as an index of the risk of a vehicle crash in our case study. AR is also a widely used indicator in practical analyses of traffic accidents because it is based on revealed traffic accident data. The ARs for the 70 target road sections, obtained from the traffic accident database of the Korea Highway Cooperation, are shown in Figure 3a. The ARs vary widely, from 0.02 to 0.27, and decrease exponentially as the traffic volume increases.
Here, a_i, q_i, and l_i are the number of vehicle crashes (accidents/year), the annual average daily traffic volume (vehicle/day), and the length (km) of each target road section (i), respectively. In addition, d is the number of days of the year.
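As a small worked example of the accident-rate index, the sketch below assumes that AR is the annual crash count divided by the annual vehicle-kilometres travelled on the section, scaled to 10^6 vehicle-km, i.e., AR_i = a_i × 10^6 / (q_i × l_i × d). This form is inferred from the variable definitions and the stated unit rather than copied from Equation (5). The function name and the input values are hypothetical.

```python
def accident_rate(crashes_per_year, aadt, length_km, days=365):
    """Assumed AR: accidents per 10^6 vehicle-km for one road section.

    crashes_per_year: a_i (accidents/year)
    aadt:             q_i, annual average daily traffic (vehicle/day)
    length_km:        l_i (km)
    days:             d, number of days of the year
    """
    vehicle_km = aadt * length_km * days
    if vehicle_km <= 0:
        raise ValueError("traffic volume and length must be positive")
    return crashes_per_year * 1e6 / vehicle_km


# Hypothetical section: 10 crashes/year, 50,000 vehicle/day, 5 km long.
ar = accident_rate(crashes_per_year=10, aadt=50_000, length_km=5.0)
print(round(ar, 4))
```

The hypothetical section above falls inside the 0.02 to 0.27 range reported for the 70 target sections.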
Point-to-point vehicle trajectory big data (about 25 terabytes/year), collected by a live vehicle-GPS system and transferred to a data center through 4G LTE wireless communication, were used in our case study. The vehicle-GPS devices are typically mounted on cars and trucks, although the type of each vehicle is not clear. In addition, the coordinates (i.e., latitude and longitude) of each vehicle trajectory were adjusted through a map-matching process. The penetration rate of the vehicle-GPS service is shown in Figure 3b, where penetration rate (PR, %) = (annual average daily vehicle-GPS probe)/(annual average daily traffic volume) × 100. The trend of the PR values is stable within an average ± 10%, despite the fact that the variation of PR increases as the traffic volume decreases. PRs for all target road sections also statistically satisfy the required sample rate (%) during the full year, which is the time dimension of the test data. For instance, the recommended sample size for a population of 2,007,500 (= 5500 vehicle/day × 365 day/year) is 16,504 (i.e., sample rate = 0.822%) at a 99.0% confidence level with a 1.0% margin of error.
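The sample-size figure quoted above can be checked with a short calculation. The text does not state which formula was used; the sketch below assumes Cochran's formula with a finite-population correction, a worst-case proportion of 0.5, and a rounded z-value of 2.58 for the 99% confidence level. These are assumptions, chosen because they reproduce the quoted numbers. The function name is hypothetical.

```python
def recommended_sample_size(population, z=2.58, margin=0.01, p=0.5):
    """Cochran sample size with finite-population correction (assumed)."""
    n0 = (z ** 2) * p * (1.0 - p) / (margin ** 2)  # infinite-population size
    n = n0 / (1.0 + (n0 - 1.0) / population)       # finite-population correction
    return round(n)


population = 5500 * 365                 # vehicles per year on the section
n = recommended_sample_size(population)
rate = 100.0 * n / population           # required sample rate in percent
print(population, n, round(rate, 3))
```

Under these assumptions the calculation gives a sample of 16,504 vehicles and a sample rate of 0.822%, matching the example in the text.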
The frequency distribution of the driving duration for each target road segment was built using the collected vehicle trajectory data, as follows: (1) each path travel time from a departure location to the middle location of the target road segment was extracted from the vehicle trajectory database through a map-matching process. (2) Each driving duration value for the target segment was calculated by removing the total non-driving time at rest areas from the path travel time. (3) All driving duration values for anonymous drivers were compiled into the frequency distribution of driving durations for the target segment with the length of the interval (∆t) equal to one minute and the maximum driving duration (i.e., t_max) equal to 480 min.
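Steps (2) and (3) above can be sketched as follows, assuming that map-matching (step 1) has already produced, for each trip, a path travel time and a total rest-area time in minutes. The input layout, the function name, and the choices to drop non-positive durations and durations above t_max are illustrative assumptions, not details taken from the study.

```python
def driving_duration_histogram(trips, dt=1.0, t_max=480.0):
    """Compile driving durations into a frequency distribution.

    trips: iterable of (path_travel_time_min, rest_time_min) pairs,
           one per anonymous trip (hypothetical input layout).
    Bin k counts durations in [k * dt, (k + 1) * dt); a duration exactly
    equal to t_max is placed in the last bin.
    """
    n_bins = int(t_max / dt)
    freq = [0] * n_bins
    for travel_min, rest_min in trips:
        duration = travel_min - rest_min       # step (2): remove rest time
        if duration <= 0 or duration > t_max:  # assumed filtering rule
            continue
        k = min(int(duration // dt), n_bins - 1)
        freq[k] += 1                           # step (3): compile counts
    return freq


# Hypothetical trips: (path travel time, time spent at rest areas).
trips = [(90.0, 15.0), (30.0, 0.0), (75.5, 0.2), (500.0, 0.0), (10.0, 10.0)]
freq = driving_duration_histogram(trips)
print(freq[30], freq[75], sum(freq))
```

Only the counts per bin are retained, so no individual trajectory needs to be stored once the histogram is built, which is in line with the privacy argument made for the disaggregated approach.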
The frequency distributions built in this manner are highly complex, ranging from left-biased one-peak to right-biased multi-peak examples, as shown in Figure 4. Fatigued drivers, from the standpoint of the recommended safe limit (60 min) for continued driving [11,33,44], range from 15.2% to 84.7% with average and deviation values of 54.1% and 18.6%, respectively. That is, more than half of vehicle drivers for more than 50% of the road sections are at least in a state of task-related fatigue, as the target road sections mainly serve as an inter-regional motorway to cover middle- and long-distance vehicle trips. This fact presents a new opportunity to monitor the degree of task-related fatigued drivers directly, at least for a motorway section for cases in which a suitable boundary between non-fatigue and fatigue driving is available. Additionally, no research on a risk analysis of real-life traffic accidents using the driving-duration characteristics of road sections could be found during our literature review.

Identifying the Optimal Boundary Condition and Findings
The explanatory power of RFD for the risk of a vehicle crash relies closely on the boundary condition p(t | µ, δ) between non-fatigue and fatigue states. That is, the µ and δ parameters (min) play key roles in measuring reliable RFD values, given the distribution of driving durations. At present, no universally accepted boundary condition for such a real-life accident rate exists, though several safe limits (i.e., 40, 60, and 80 min) for continued driving were proposed in previous investigations. Therefore, it is essential to analyze and identify suitable values of the two parameters in order to prevent estimation failures (i.e., overestimations or underestimations of the RFD). For this reason, a numeric simulation to identify the optimal values of µ and δ was carried out for the possible combinations of 60 ≤ µ ≤ 240 and 10 ≤ δ ≤ 120 in increments of ten minutes. The entire test data were split evenly into two datasets (i.e., 50% training and 50% testing), as shown in Table 1. For the two-fold cross-validation, datasets 1 and 2 were employed as training and testing datasets for scenario 1, and datasets 2 and 1 were used as training and testing datasets for scenario 2.
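The numeric simulation described above amounts to a grid search over (µ, δ) that maximizes the correlation between RFD and AR. The sketch below illustrates the idea under stated assumptions: cp is taken to be the normal cumulative distribution function, the RFD is the cp-weighted share of drivers, and r is the Pearson correlation coefficient. All function names and the toy data are hypothetical; the toy accident rates are generated from a known boundary so that the search has something to recover.

```python
import math


def fatigue_cdf(t, mu, delta):
    """Assumed cp(t | mu, delta): normal CDF."""
    return 0.5 * (1.0 + math.erf((t - mu) / (delta * math.sqrt(2.0))))


def rfd(freq, mu, delta):
    """Assumed RFD; bin k of the 1-min histogram represents t = k + 1."""
    weighted = sum(n * fatigue_cdf(k + 1, mu, delta)
                   for k, n in enumerate(freq))
    return weighted / sum(freq)


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = (math.sqrt(sum((a - mx) ** 2 for a in x))
             * math.sqrt(sum((b - my) ** 2 for b in y)))
    return sxy / denom if denom > 0.0 else 0.0


def grid_search(histograms, accident_rates):
    """Scan 60 <= mu <= 240 and 10 <= delta <= 120 in 10-min steps."""
    best_r, best_mu, best_delta = -2.0, None, None
    for mu in range(60, 241, 10):
        for delta in range(10, 121, 10):
            rfds = [rfd(h, mu, delta) for h in histograms]
            r = pearson(rfds, accident_rates)
            if r > best_r:
                best_r, best_mu, best_delta = r, mu, delta
    return best_r, best_mu, best_delta


def toy_histogram(peak):
    """Hypothetical two-spike driving-duration distribution (480 bins)."""
    h = [0] * 480
    h[peak - 1] = 100
    h[peak + 29] = 50
    return h


toy = [toy_histogram(p) for p in (30, 60, 100, 150, 200, 260)]
toy_ar = [0.02 + 0.25 * rfd(h, 160, 50) for h in toy]  # synthetic AR values
best_r, best_mu, best_delta = grid_search(toy, toy_ar)
print(round(best_r, 3), best_mu, best_delta)
```

For the two-fold scheme, one would run `grid_search` on one half of the road sections and then evaluate `pearson` on the other half with the selected (µ, δ), swapping the roles for the second scenario.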
The effects of the combination of the µ and δ values on the correlation coefficient (r) between RFD and AR values for the two datasets are shown in Figure 5, where the behaviors of the correlation coefficient fall into two groups separated at µ = 130 min. With regard to the first group (µ ≤ 130), each correlation coefficient between RFD and AR reaches its maximum value only when the δ value is 10 or 120 for the two datasets. Although the maximum r values (more than 0.80) are statistically acceptable, the corresponding boundary conditions (i.e., p(t | µ, δ)) may not be acceptable for real-world use, for the following reasons. In the case of (µ, δ) = (60, 120), unrealistic fatigue-related behaviors for normal drivers arise due to the over-dispersion of the boundary condition. For instance, 30.9% of drivers (0.309 = cp(0) with Equation (1)) are already in a fatigued state when their driving starts. In addition, 2.28% of drivers (0.023 = 1.0 − cp(µ + 2δ)) are still in a non-fatigued state at 300 min of continued driving (= 60 + 2 × 120), even when they have sufficient rest time at a rest area. Moreover, fatigued drivers account for up to 43.4%, 50.0%, and 56.6% of drivers at the safe limits of 40, 60, and 80 min proposed in previous research [1,11,17,33,43-45], respectively. For the case of (µ, δ) = (130, 10), the boundary condition is too homogeneous to explain early-fatigued drivers, despite the fact that the r values (≥ 0.85) for the two datasets appear acceptable. In this case, fatigued drivers account for only 0.1% of the total (0.001 = cp(µ − 3δ)) before 100 min of continued driving (= 130 − 3 × 10). The percentage of fatigued drivers is also zero for all safe limits.
Regarding the second group (µ > 130), the r value for each µ value increases steeply to its maximum and then gradually decreases as the δ value increases (Figure 5a,b). This convex-shaped outcome is evidence that a useful boundary condition exists, regardless of whether or not the boundary space is obvious. The maximum r values for all µ values also exceed 0.85. In this manner, the maximum r values for datasets 1 and 2 are as high as 0.860 and 0.867 with µ = 160 when the δ values are 60 and 40, respectively. The results of model calibration and validation are summarized in Table 2, where the differences between the two r values (i.e., correlation coefficients) for calibration and validation are within 0.007 for all scenarios. More importantly, the performances of calibration and validation for the same dataset, in terms of the r value, are very comparable, despite the fact that the two groups of µ and δ values (i.e., the best µ and δ values calibrated using the dataset and the µ and δ values calibrated using the counterpart dataset) are not equal. This reveals that an optimal boundary condition between non-fatigued and fatigued driving states can be determined in advance from given observed data. Additionally, this implies that the boundary condition can be effectively employed to monitor RFD for any road section within an acceptable level of monitoring error from the standpoint of actual AR, and the monitored RFD can be used as a significant explanatory variable to forecast the risk of motorway traffic accidents.
Therefore, critical µ and δ values (µ_c, δ_c) of 160 and 50 were identified for the critical boundary condition (CBC, p_c(t | µ_c, δ_c)) that maximizes the statistical explanatory power of the RFD with regard to AR. In addition, the CBC can be used to demonstrate the potentiality of the RFD in research involving risk analyses of motorway vehicle crashes as well as more detailed analyses.

Potentialities and Findings
The potentialities of the RFD for both monitoring fatigued drivers and the predictability of AR for motorway sections are demonstrated, and some of the findings are presented in this subsection. In addition, further research directions which are closely associated with and which effectively address the shortcomings of this research are suggested.
Detailed relationships between the driving-duration distribution, the RFD, and AR are shown with seven cases in Figure 6. The shape of the driving-duration distributions varies from a left-biased shape (e.g., C1) to a right-biased shape (e.g., C7), as the numbers of mid-range and long driving trips increase. These variations of the distribution originate from the fact that the features of the driving times from the departure locations to a road section wholly rely on the complex spatiotemporal behaviors of the origin-destination vehicle trips in the road network. Therefore, each distribution has its own characteristics distinguishable from those of the others. It can be seen that these characteristics of the driving-duration distribution have intuitive discriminant power for the RFD in themselves, provided that a suitable boundary between non-fatigued and fatigued driving conditions is given. For instance, the RFD values, considering the CBC, increase from 0.036 to 0.55 as the portion of mid-range and long driving durations increases. The maximum RFD value is also more than 15 times the minimum value. This fact indicates that the RFD can be effectively utilized to monitor the percentage of fatigued drivers, at least for regional motorway sections, using the features of the driving-duration distribution.
Figure 6. Relationships between the driving-duration distribution, RFD, and AR.
Concerning the relationship between the RFD and AR, the AR values increase from 0.02 to 0.27 as the RFD values increase (Figure 6). The statistical correlation (0.86) between the RFD and AR with the CBC is also significant (Figure 5). This indicates that the RFD has significant explanatory power with regard to AR and can thus be used for predicting AR values with considerable reliability. It also indicates that the shape of the driving-duration distribution is closely associated with AR in the case of motorway sections, where the percentage of middle- and long-distance trips is higher than that of short-distance trips.
Regarding the reliability of the CBC, the meaningful points of the CBC ($\mu_c - 2\delta_c$, $\mu_c - \delta_c$, $\mu_c$, $\mu_c + \delta_c$) in Figure 6 are in line with the findings of previous fatigue-related investigations. Regarding $\mu_c - 2\delta_c$ (i.e., 60 min), the CBC covers the different safe limits for continued driving with tolerable differences. Fatigued drivers for safe limits of 40, 60, and 80 min are 0.8, 1.4, and 5.5% of drivers (= cp(a safe limit) × 100), respectively. The r values of the two datasets for the safe limit of 80 min with $\delta = 10$ (i.e., $(\mu, \delta) = (80, 10)$) in Figure 5 are 0.69 and 0.66, which are useful in a risk analysis of vehicle crashes. These facts indicate that safe limits, when the distribution of the driving duration for a road section is available, could at least provide a useful boundary with which to monitor the percentage of fatigued drivers directly from the practical perspective of public safety, despite the academic debate about safe limits for continued driving. With respect to $\mu_c - \delta_c$ (i.e., 110 min), the CBC matches the duration of continued driving recommended for safe driving in a notable study [45], in which the researchers stressed that "drivers should stop and rest every 1-2 h". Moreover, the CBC covers the turning point (90 min) of the mean fatigue score [45], after which the fatigue score varied little. Concerning $\mu_c$ and $\mu_c + \delta_c$ (i.e., 160 and 210 min), the CBC covers two significant periods of prolonged driving [28] in terms of the temporal development of fatigue states. The $\mu_c$ value of 160 lies in the middle of the first period (120-190 min), during which subject drivers started to feel some fatigue, attempted to resist it, and struggled to remain alert. The $\mu_c + \delta_c$ value of 210 lies in the middle of the second period (190-240 min), during which subject drivers became fatigued quickly, began to feel sleepier, and lost interest in remaining awake as the driving task continued. Furthermore, any additional continued driving should be halted after the second period [28].
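The percentages quoted for the safe limits can be checked with a short calculation. The sketch below assumes that cp is the cumulative normal probability with mean 160 min and deviation 50 min, which is consistent with the CBC points listed above (60, 110, 160, and 210 min) but is not stated explicitly in this section; the function name `cp` and the constants are illustrative. Under this assumption the 40 and 80 min limits reproduce the quoted 0.8% and 5.5%, whereas the 60 min limit evaluates to about 2.3% rather than the quoted 1.4%, so the assumed form may differ from the authors' exact definition.

```python
import math

# CBC parameters in minutes, inferred from the points quoted in the text
# (mu_c = 160, mu_c + delta_c = 210). Treat them as assumptions.
MU_C = 160.0
DELTA_C = 50.0


def cp(t, mu=MU_C, delta=DELTA_C):
    """Assumed cumulative normal probability of fatigue at t minutes."""
    return 0.5 * (1.0 + math.erf((t - mu) / (delta * math.sqrt(2.0))))


for safe_limit in (40, 60, 80):
    print(f"safe limit {safe_limit} min: {cp(safe_limit) * 100:.1f}% of drivers")
```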
Interestingly, the values of $\mu_c$, $\mu_c + \delta_c$, and $\mu_c + 2\delta_c$ of the CBC (i.e., 160, 210, and 260 min) are greater than the three threshold values of the prolonged driving duration (i.e., 120, 190, and 240 min) [28], with differences of 20-40 min. This implies that the CBC does not exclude the following significant factors, which have not been included in driving simulator-based studies but are deeply related to drivers' fatigue, especially in the case of mid-range and long-distance real-life driving trips. To reduce driving-related fatigue and sleepiness, drivers use self-initiated tactics (e.g., blasting the radio, listening to driving music, rolling down the windows, turning on the air conditioner) [52], ingest caffeine-containing products [25,27,50,52-54], and take naps at rest areas [17,50,52]. Various advanced in-vehicle systems (e.g., lane-departure warning systems, lane-keeping assist systems, adaptive cruise control systems, forward collision avoidance systems) have also been utilized to assist with safe driving. In addition, drivers tend to underestimate the impact of fatigue, ignore feelings of drowsiness, and continue driving as they become sleepy [20,52].
The explanatory power of the RFD with regard to AR is illustrated in Figure 7. The RFD shows a positive trend with regard to AR, with an R² value of 0.736 (Figure 7a). That is, the RFD statistically explains more than 73% of the variation in the motorway traffic accident rate, at least in our study. This indicates that task-related fatigue significantly impacts actual traffic accidents and that the RFD can be used as a powerful variable in monitoring fatigued drivers and in assessing the risk of fatigue-related traffic accidents for motorway sections. The R² values (0.797, 0.741, and 0.761) for the three different numbers of lanes (N) (Figure 7b-d) are greater than that for all lanes (Figure 7a). This result indicates that the impact of driving fatigue on traffic accidents differs according to N, and thus the RFD can be used for forecasting the risk of traffic accidents for different values of N.
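The link between the R² of 0.736 and the correlation of 0.86 reported earlier can be made explicit. For a simple least-squares line, R² equals the square of the Pearson correlation coefficient, so an R² of 0.736 corresponds to r of about 0.86. The sketch below checks this identity numerically; it assumes that the fit in Figure 7a is a simple linear regression, and the (RFD, AR) pairs are hypothetical placeholders rather than the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared_linear(xs, ys):
    """R^2 of the ordinary least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical (RFD, AR) pairs for illustration only.
rfd_values = [0.04, 0.10, 0.20, 0.35, 0.55]
ar_values = [0.03, 0.05, 0.12, 0.15, 0.27]

r = pearson_r(rfd_values, ar_values)
r2 = r_squared_linear(rfd_values, ar_values)
print(f"r^2 = {r * r:.6f}, R^2 = {r2:.6f}")

# Correlation implied by the reported R^2 of 0.736 under a simple linear fit.
print(f"implied r = {math.sqrt(0.736):.2f}")
```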
Specifically, the effects of driving fatigue on the risk of a traffic accident when N ≤ 3 differ from those when N ≥ 4. When N ≤ 3, the AR increases exponentially as the RFD values increase. This implies that weighted fatigue (or drowsy) driving states caused by prolonged monotonous driving affect the risk of a traffic accident more negatively than the average level of fatigue caused by monotonous driving. This also indicates that drivers, even when they take a break periodically, should take naps or at least get enough rest to reduce fatigue [17,50] when the accumulated driving duration reaches the maximum level and thus seriously impairs their ability to drive safely. When N ≥ 4, the ARs show lower values than those in the other cases, despite the fact that the explanatory power of the RFD is comparable to the power of other variables. This is likely because increased interactions between vehicles (e.g., lane-changing, overtaking, acceleration, and deceleration) frequently interrupt continuously monotonous driving, which is a major cause of task-related fatigued driving. In contrast, the AR values for the two cases of N ≥ 3 show stationary trends with little variation when RFD ≤ 0.1 (Figure 7c,d), and the estimated AR values when RFD = 0.0 range from 0.03 to 0.08 (Figure 7b-d). This implies that the impact of fatigued driving on the risk of a vehicle crash, when the percentage of fatigued drivers is low, could decrease when combined with other accident-related causal factors. Despite this, it is clear that the RFD, even when low, offers a promising opportunity to be used, at least, as a significant causal factor, without ignoring the impact of small percentages of fatigued drivers on traffic accidents.
Despite the demonstrated feasibility of the proposed approach, further investigations related to the additional potentialities of big vehicle trajectory data should be conducted from the standpoints of (1) driving-duration boundaries for monitoring fatigued drivers and understanding fatigue-related traffic accidents, and (2) the risk management of traffic accidents based on driver fatigue.
Regarding the driving-duration boundary, subsequent investigations will be necessary for more reliable monitoring of fatigued and drowsy drivers, though several remarkable studies of the time course of task-related fatigue from mental and physical aspects have been conducted for continued driving. Multiple driving-duration boundaries that can effectively explain the actual behavior of discontinued driving (i.e., driving, resting, driving, resting, and so on), especially in cases of middle- and long-distance driving, should be investigated and determined; that is, suitable multi-step boundaries are fundamental for explaining the iterative accumulation of and recovery from fatigue. Conservative and progressive multi-step boundaries, at least for different age groups, can also be explored to monitor fatigued drivers and estimate fatigue-related traffic accidents, respectively. Moreover, multiple boundaries related to the time of day, at least day and night, should be investigated in order to consider the effects of the circadian rhythm on the temporal progression of driver fatigue.
More importantly, the present and future availability of vehicle trajectory data provides promising opportunities and future research directions for the offline and online fatigue-based risk management of traffic accidents, as the proposed method was developed based on the intrinsic driving-duration characteristics contained in vehicle trajectory data. Research on offline fatigue-based risk management related to the time of day, at least for day and night, should be conducted using the driving-duration features of specific vehicle types (e.g., cars, buses, trucks). For instance, the driving-duration distributions of heavy vehicles (i.e., buses and trucks), obtained by digital tachograph devices, can be used effectively to monitor the fatigue of heavy-vehicle drivers and then to reflect the effects of their fatigue on traffic accidents. Note that point-to-point vehicle trajectory data, even when collected by different vehicle-GPS devices, can easily be integrated into a structured system database, as each point of information consists of the geographic coordinates and the time. This convenient integration indicates that vehicle trajectory big data with a high vehicle-GPS market rate can be collected and used in real time. The distribution of driving durations, as proposed in this study, is also a practicable way to address privacy policies effectively. In this vein, further research on online fatigue-based risk management, based on the real-time availability of big vehicle-GPS data, is necessary to proactively prevent fatigue-related traffic accidents now and in the near future.

Conclusions
Driver fatigue, inevitably caused by prolonged continued driving, is a major factor in vehicle crashes. Hence, noticeable achievements and findings have been reported through painstaking academic efforts. Despite these continued efforts, an ongoing challenge remains: to explain driver fatigue and then to understand the relationship between driver fatigue and actual traffic accidents for real-life road segments. Fortunately, advanced vehicle-GPS systems, widely installed in vehicles and smartphones, provide a feasible opportunity to use vehicle trajectory big data, which includes crucial information about driving durations (i.e., time-on-task). However, despite this opportunity, no research using individual vehicle trajectory data to monitor driver fatigue and to explain its impact on actual traffic accidents for road sections has been reported.
To mine this opportunity, a concept for the direct monitoring of a degree of drivers' task-related fatigue (i.e., RFD) for road sections was introduced in this study. A data-oriented method to monitor driver fatigue was developed based on the distribution of the driving duration, which, in this case, was extracted from vehicle trajectory data. The robustness of the method was demonstrated using vehicle trajectory big data obtained from a vehicle-GPS system and real-life vehicle crash data collected from regional motorways. The analysis results were notable from two standpoints: monitoring the RFD and understanding traffic accidents (i.e., AR) using the monitored RFD. It was found that drivers' task-related fatigue for a road section can be successfully measured in terms of the RFD using a suitable boundary of the driving duration for non-fatigue and fatigue states. It was also found that the measured RFD has significant explanatory power with regard to traffic accidents in terms of AR, at least for regional motorway sections. Therefore, it is expected that the direct monitoring of drivers' task-related fatigue for road sections using vehicle trajectory big data is a promising approach for successfully addressing the current limitations of fatigue-based risk management of vehicle crashes. Moreover, the proposed approach is instantly practicable when vehicle trajectory data is available, at least with a market rate of 1.0%.
The implications of the findings of this research for road safety policies can be summarized as follows. First, a suitable boundary condition (i.e., p(t u, δ)) between non-fatigue and fatigue driving states can be determined in advance based on observed data (i.e., the distribution of driving duration and the risk of vehicle crashes). The identified boundary condition can be employed to continuously monitor the temporal evolution of driving fatigue for a driver using only the driving duration collected through a vehicle-GPS system, and the monitored fatigue state can then be dynamically provided through the vehicle-GPS in order to prevent fatigue-related vehicle crashes. Second, the RFD for a motorway section can be easily monitored using a suitable boundary condition and a (real-time or past) driving-duration distribution. The monitored RFD can be provided to drivers through information terminals (e.g., variable message signs and online vehicle-GPS) to encourage defensive and careful driving on the motorway section, and can also be employed in analyzing the location adequacy of fatigue-related motorway infrastructure (e.g., rest areas). Lastly, the RFD has significant explanatory power with regard to the traffic accident rate. The RFD can thus be used as a fatigue-related explanatory variable for forecasting the risk of vehicle crashes, especially for motorway and inter-regional highway networks.
This research presents a first step toward proposing a feasible solution with which to undertake direct measurements of drivers' task-related fatigue using vehicle trajectory big data. In spite of the acceptable results of this study, there are still other opportunities to enhance the method and to improve its capabilities using the aforementioned multiple driving-duration boundaries. In addition, further research should be conducted to accomplish the aforementioned offline and online fatigue-based risk management of traffic accidents, given the availability of vehicle trajectory data and the high penetration rate of vehicle-GPS.