1. Introduction
Driving under the influence (DUI), especially under the influence of alcohol, is a global problem that costs many lives, seriously injures people, causes immense costs, and is one of the leading causes of car accidents [
1,
2]. In Europe, it is responsible for 25% of accidents [
3]. Not every accident is fatal, but alcohol was responsible for 6.4% of all accident victims in Germany in 2021 [
4,
5]. According to the National Highway Traffic Safety Administration (NHTSA), about 10,000 people die each year in the U.S. in alcohol-related crashes [
6]. This makes DUI the number one cause of fatal accidents in the U.S., and 31% of fatal car accidents are correlated with alcohol [
6]. The relative probability of causing an accident increases exponentially with blood alcohol concentration (BAC) [
7,
8]. This is visualized in
Figure 1.
Figure 1 shows the relative crash risk in relation to the BAC, where the relative risk is set in relation to the accident risk of driving sober (BAC = 0).
This is why countries around the world have set limits above which it is illegal to drive a car. In Germany, the legal limit is 0.05% BAC. The relative crash risk at the legal limit is only 38% higher than driving sober [
7]. For this reason, this study examined a blood alcohol concentration of 0.06%, just above the legal limit. Various approaches exist in the literature to distinguish between sober and impaired drivers. These include methods based on thermal (Far-Infrared (FIR)) cameras [
9], near-infrared (NIR) cameras [
10,
11], gas sensors [
1,
2,
12], electrocardiography (ECG) [
13], driving behavior analysis [
14,
15], and multi-sensor systems [
2,
15,
16,
17]. Most of today’s production vehicles do not integrate these sensors, but the available driving data can still be utilized. In addition, camera-based driver monitoring systems (DMSs) are being developed (e.g., DMSs from the Fraunhofer Institute [
18]). According to a study by ABI Research, DMSs with driver monitoring cameras are considered a key technology for the driver condition recognition systems required by EU regulations (General Safety Regulations (GSRs) [
19]) from 2024 and for semi-autonomous driving [
18,
20]. Therefore, the combination of driving behavior analysis and DMSs should increase the accuracy of detecting DUI.
2. Related Work
Research has extensively documented the effects of alcohol on driving behavior and visual performance, with studies broadly categorized into those examining driving behaviors and those analyzing visual features through camera-based detection. The impact of alcohol on driving behavior generally intensifies with an increasing blood alcohol concentration (BAC), as shown in numerous studies [
7,
8]. Alcohol consumption lowers inhibitions, which is often associated with riskier driving behavior [
21].
Fillmore et al. explored this behavioral influence by conducting a study on 14 participants (7 men, 7 women, ages 21–30), where each participant completed two drives under placebo conditions and two under a BAC of 0.065% on separate days. Their analysis revealed that intoxicated participants showed significantly higher lane position deviation and more abrupt steering maneuvers. Intoxicated drivers also tended to drive faster, accelerate more aggressively, and display delayed reaction times, particularly in responding to red lights [
21]. Similar results were found by Zhang et al., who conducted a study on 25 male drivers (ages 20–25) with at least three years of driving experience. Participants completed drives under BAC levels of 0.00%, 0.03%, 0.06%, and 0.09% on different days, showing that higher BAC levels were associated with faster speeds through curves and less stable speed control. A higher BAC also worsened lateral movement stability, with deviations in average speed, lane position, and their respective standard deviations [
22].
In another study, Li et al. investigated the impact of varying alcohol dosages on steering behavior during curves. In their study (25 men, ages 20–35), the test subjects drove a test track in a simulator under four different breath alcohol concentrations (BrACs). The steering angle, steering speed (SS), steering reversal rate (SRR), and peak-to-peak (PP) value of the steering angle were examined. The steering speed on radius curves was significantly faster for drivers under the influence of alcohol compared with those given a placebo. SS, SRR, and PP tended to increase as the BrAC increased [
23]. Helland et al. demonstrated an additional effect, showing that increased alcohol intake appeared to reduce simulator sickness due to the opposing effects of alcohol (inducing risky driving) and simulator-induced caution. In three test drives under BAC levels of 0.00%, 0.05%, and 0.09%, the subjects under the influence of alcohol tended to drive faster and more aggressively, reducing simulator sickness symptoms [
24,
25].
Ying et al. applied a classification and regression tree model to classify driving behaviors under various BAC levels. In this study (22 men, ages 20–35, including 12 professional drivers), subjects completed five drives: one sober, three under BAC levels of 0.02%, 0.05%, and 0.08%, and one fatigued. The model utilized variables such as the variance in average speed (SPAVG), standard deviation of speed (SPSD), lane position average (LPAVG), and standard deviation of lane position (LPSD). The results demonstrated that the model was highly accurate in distinguishing “alert” from “abnormal” driving (90.9%) and distinguishing between drunk and tired states (94.4%) but less effective in classifying specific BAC levels [
26]. In another study, Ramaekers et al. examined the combined effects of THC and alcohol, finding that both substances significantly impaired driving performance, with notable effects on lane deviation, time driven out of lane, reaction time, and headway stability. Together, they significantly amplify the impact and severely degrade driving performance [
27].
Other studies focused specifically on reaction time under the influence of alcohol. Yadav et al. and Culik et al. [
28,
29] found that reaction time worsens with a higher BAC, while driving experience mitigates this effect. In Yadav’s study, reaction times at BAC levels of 0.03%, 0.05%, and 0.08% increased by 36%, 53%, and 94% for pedestrian crossings and by 64%, 78%, and 116% for road crossings, respectively. Experienced drivers responded about 15% faster than younger drivers, and frequent drinkers responded approximately 11% faster than occasional drinkers [
28]. Similarly, Culik et al. found that driving experience and regular alcohol consumption could lessen reaction time delays under alcohol. Above all, concentration affects reaction time; it should therefore be ensured that all subjects are concentrating [
29]. Li et al. also noted a correlation between reaction time and lane deviation, observing that frequent drinkers reacted 10.2% faster than occasional drinkers and 30.5% faster than non-drinkers [
30].
Research on alcohol’s impact on visual performance complements these findings. Makowski et al. examined eye movements such as gaze direction, fixation, and saccades under the influence of alcohol, discovering that BAC affects the velocity and acceleration of these movements, achieving a general classification accuracy of 69% (74% for known individuals) [
31]. Silva et al. found similar results in a study using a Visual Maze test under BAC levels of 0.00% and 0.08%. Intoxicated participants showed an increased fixation duration, first fixation latency, and total number of fixations compared to the placebo [
32]. Watten et al. observed that both the number of fixations and the duration of eye fixations increased significantly with BrAC levels of 0.00%, 0.05%, and 0.1%, further indicating the impairing effect of alcohol on visual tracking [
33].
Zhang et al. investigated the influence of alcohol on contrast sensitivity (CS) in their study of nine participants over the age of 23, showing a significant decline in CS with higher BAC levels [
34]. The study by Roche et al. (78 heavy drinkers (HDs), 60 light social drinkers (LDs)) explored uniform tracking and pro-saccadic and anti-saccadic eye movements. The measurements were carried out under three different BACs (0.00%, 0.04%, and 0.08%). The study showed a significant impact of alcohol on uniform tracking amplification and pro- and anti-saccadic latency, speed, and accuracy. The influence is highly dependent on the drinking habits of the participants. The HDs showed much less influence than the LDs. Likewise, the dose of alcohol had major influence on the indicators [
35].
Marple-Horvat et al. conducted a study (five men, five women, ages 20–36) examining the correlation between eye movement and steering maneuvers under various BAC levels in their driving study. The time difference between eye movement and steering maneuver was considered. The sober driver drove with foresight, first directing their gaze toward the intended direction before turning the steering wheel. The average time difference was around 0.61 s. Conversely, the intoxicated driver struggled to coordinate between eye movements and steering maneuvers effectively. The intoxicated driver synchronized their eye movements almost simultaneously with steering actions, directing their gaze toward their intended direction while providing steering input. The time advantage of the eye movement was minimal and was about one-tenth of a second (0.098 s). The time advantage of the eyes diminishes linearly with the amount of alcohol consumed. Additionally, speed influences the time advantage of the eyes, decreasing as the driver’s speed increases [
15].
In summary, previous studies have illustrated the detrimental effects of alcohol on both driving behavior and visual capabilities. Driving impairments under the influence of alcohol range from erratic steering and lane deviation to delayed reaction times and increased speed instability. Visual performance is similarly affected, with prolonged fixation, reduced contrast sensitivity, and compromised eye–hand coordination. However, the interaction between these impaired visual and driving behaviors remains unexamined. This study addresses this gap by investigating whether a combined assessment of driving and gaze behaviors can enhance the DUI detection accuracy. Additionally, given the impact of driving experience and drinking habits on observable indicators, our recruitment criteria aimed to ensure a uniform test group. Details on these criteria are provided in the Recruiting Criteria Section. While a homogeneous subject group offers certain advantages, it also limits the validity of our findings. These limitations are discussed in detail in
Section 6.3.
3. Study Design
3.1. Participants
A total of 23 participants (18 men and 5 women) took part in the study. Their ages ranged from 24 to 47, with a mean age of 32.14 years. Two participants completed only the test drive, leaving 21 test subjects (17 men and 4 women) who completed all drives. Each of these participants completed one test drive followed by two sober and two drunk drives, all conducted on separate days. The target BAC for the drunk drives was 0.06%. More on this is covered in
Section 3.2. The weight of the test persons was determined to calculate the correct amount of alcohol. The amount of alcohol to be consumed, in grams, was calculated using the modified Widmark formula [
36]:
$$a = p \cdot r \cdot (c_t + \beta t)$$
where a is the amount of alcohol in grams, p is the body weight in kg, r is the reduction factor (man 0.7, woman 0.6), c_t is the target value of the BAC at the time of collection, β is the rate of alcohol degradation, and t is the time elapsed between the alcohol test and the blood sample. In this case, t = 0 as only the breath alcohol and not the blood alcohol is measured, and this is recorded immediately during the measurement. As the weight of the subjects was between 53 and 125 kg, the amount of alcohol administered varied greatly (24–60 g).
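As a worked illustration of the dosing calculation, the following minimal sketch evaluates the textbook Widmark relation a = p · r · (c_t + β · t) implied by the symbols defined above. It rests on several assumptions that are not stated in the text: the BAC is expressed in per mille (g/kg), so that the 0.06% target corresponds to 0.6; the default elimination rate of 0.15 per mille per hour is a placeholder that has no effect when t = 0; and the conversion to a drink volume uses the density of pure ethanol (0.789 g/mL). The function names are hypothetical, and the modified formula used in the study may include further corrections, so the values need not match the amounts actually administered.

```python
ETHANOL_DENSITY_G_PER_ML = 0.789  # density of pure ethanol (physical constant)


def widmark_alcohol_grams(weight_kg, r, target_bac_permille,
                          beta_permille_per_h=0.15, hours=0.0):
    """Grams of pure alcohol for a target BAC: a = p * r * (c_t + beta * t).

    beta_permille_per_h is an assumed placeholder; with hours = 0 it drops out.
    """
    return weight_kg * r * (target_bac_permille + beta_permille_per_h * hours)


def drink_volume_ml(alcohol_g, abv=0.40):
    """Volume of a spirit with the given alcohol-by-volume containing alcohol_g."""
    return alcohol_g / (abv * ETHANOL_DENSITY_G_PER_ML)


# Hypothetical 80 kg male participant, target 0.06% BAC (0.6 per mille), t = 0
grams = widmark_alcohol_grams(weight_kg=80, r=0.7, target_bac_permille=0.6)
print(f"{grams:.1f} g pure alcohol, about {drink_volume_ml(grams):.0f} mL of 40% vodka")
```

Because the amount scales linearly with body weight and the reduction factor, the wide weight range of the subjects translates directly into a wide range of doses.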
Recruiting Criteria
In
Section 2, the correlation between driving experience and driving behavior is described. To filter out novice drivers, participants were required to have held a driving license for more than 3 years and to drive a car at least once a week. This reduces the variance due to driving experience. All test subjects had to be able to tolerate alcohol; therefore, only subjects who drank alcohol at least once a month participated. Subjects with Alcohol Use Disorder (AUD), hereafter called alcoholics, were also excluded. Filtering out non-drinkers and alcoholics reduced the variance within the test group, which increases the significance of the results: the smaller the test group, the more homogeneous it must be. The detailed recruitment criteria are given in
Appendix B.
3.2. Experimental Procedure
The experimental procedure for this study consisted of three main phases: an intake screening, a practice session, and test sessions for each participant. Initially, potential participants who met the recruiting criteria outlined in
Section 3.1 were contacted via email or phone, where the purpose of the study was explained. Those interested underwent a telephone screening, which assessed their demographics, health status, drinking habits, and driving experience. Participants who did not meet the eligibility requirements were excluded from the study, while qualified individuals provided informed consent in line with the GDPR. On the day of testing, participants were again screened to ensure they met the acute recruiting criteria.
Following screening, participants engaged in a practice session to minimize learning effects and assess simulator sickness. Participants had one hour to familiarize themselves with the procedure and complete a full driving scenario. Simulator sickness was evaluated through pre- and post-session questionnaires, leading to the exclusion of two participants.
The remaining participants proceeded to the test sessions, completing two sober and two drunk drives on different days within 14 days. A target BAC of 0.06% was aimed for during alcohol sessions. The order of sober and alcohol sessions was alternated to mitigate sequence effects: half of the participants began with a sober drive, while the other half started with a drive under the influence of alcohol. Participants completed questionnaires before and after each session to assess health status and potential simulator sickness.
Prior to sober driving, a breathalyzer (AL5500) verified a BAC of 0.0%. Based on the available measurement methods, this study approximated the BAC using the BrAC. On alcohol test days, participants consumed a vodka (40%) and orange juice mixture calibrated to achieve a BAC of 0.06%. Alcohol was administered incrementally every 10 min. Participants self-assessed their intoxication level. The BrAC was measured approximately every 15 min during the session. After about 60 min, subjects completed their next simulator ride.
Post drive, participants self-assessed their experience and completed a simulator sickness questionnaire. Supervisors ensured that participants had an alcohol level below 0.01% before release. A trained first aider was present to manage any medical emergencies, though none occurred during the study.
3.3. Simulator
A dynamic simulator was used for the test. It comprised three 65-inch screens with full HD resolution (1920 × 1080) that displayed the driving scene. The driver’s seat was mounted on a movable platform that moves to match the driving action, which aims to make the ride feel more realistic and immerse the driver more deeply in the driving experience. In full, the simulator consisted of the following hardware and software:
Three 65-inch screens;
Next-level motion platform for the driver seat;
Fanatec steering wheels and pedals (16-bit encoder for steering);
Four IR cameras;
Three webcams for driver and scene supervision;
iMotions software for data collection;
Coherent light source for proper lighting conditions.
The setup is shown in
Figure 2. The individual sensors were placed in the same positions as in a car. A CAD model of a car was used to position the sensors.
3.4. Scenarios
Scenarios were created using Unity (version 2022.3.14) software. They were divided into city, country road, and highway scenarios. Each driving scenario lasted around 10 min and had its own driving events in which the driver’s behavior could be explicitly observed. The events made the drunk driving and the sober driving scenarios more comparable.
The rides always started in the city. A route is shown in
Figure 3. The orange, green, and red events are traffic light events with the respective colors of a traffic light. The blue events are stop events, where the driver has to stop due to an event. The yellow arrows symbolize the course of the route. The occurring events are listed in
Table A1 in
Appendix C. Each event is associated with an expected driver behavior, serving as a clear indicator of whether the intoxicated driver can still meet expectations.
After the city scenario, participants drive onto a country road. The country road consists mainly of straight stretches and has three tight bends. The main idea behind this setup was to focus on crossings of the center line and on differences in steering behavior in the curved stretches combined with emergency events. The rural road map is shown in
Figure 4a.
In the last section of the test track, the driver completes a section on the highway. The route shown in
Figure 4b was designed so that the driver can be observed without interruption for as long as possible.
4. Method
The primary objective of the data analysis is to identify distinct features that allow for reliable classification. Additionally, various classification models will be developed and compared to determine the most effective ones for distinguishing between sober and intoxicated drivers. In integrating data from multiple sensors, the aim is to achieve the highest possible classification accuracy, demonstrating that sensor fusion enhances the precision of the results.
4.1. Method of Significance Analysis
Data preparation followed the principles of exploratory data analysis (EDA). Initially, the data were validated and checked for completeness and plausibility. Subsequently, the data were correctly labeled and cleansed of impurities. This process included a graphical representation of the signals, computational checks, and comparison with video recordings of the experiment. Larger outliers were filtered, and the overall data distribution was analyzed. In cases of a non-normal data distribution, the Wilcoxon signed-rank test was applied as an alternative to the paired t-test.
The literature describes behavioral changes under the influence of alcohol, particularly affecting driving and gaze behavior, which serve as indicators of alcohol influence. Specific changes in driving behavior, such as acceleration, speed, braking, distance, lane keeping, steering behavior, and reaction times, were examined. Additionally, camera-based tracking showed changes in gaze patterns (fixations and saccades), facial expressions, and eye-steering coordination. In the next step, the recorded signals were compared with the indicators described in the literature. Paired t-tests were employed to identify significant differences between the data collected under the influence of alcohol and in a sober state.
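To make the paired comparison concrete, the following minimal sketch computes the paired t statistic for one indicator measured on the same subjects in both conditions. It uses only the Python standard library; the function name and the example values are hypothetical and are not data from this study. Converting the statistic into a p-value requires the t distribution with n − 1 degrees of freedom, which a statistics package would supply (for example, `scipy.stats.ttest_rel`, with `scipy.stats.wilcoxon` as the non-parametric counterpart).

```python
import math
import statistics


def paired_t_statistic(sober, intoxicated):
    """Return (t, degrees_of_freedom, mean_difference) for paired samples.

    Each position in the two lists must belong to the same subject.
    A positive mean difference means the indicator is higher when intoxicated.
    """
    if len(sober) != len(intoxicated) or len(sober) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [d - s for s, d in zip(sober, intoxicated)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1, mean_diff


# Hypothetical indicator values for four subjects (illustration only)
t, df, mean_diff = paired_t_statistic([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 8.0])
print(f"t = {t:.3f}, df = {df}, mean difference = {mean_diff:.2f}")
```

Returning the mean difference alongside t is a deliberate choice: the sign of the difference gives the direction of the change, which the significance decision alone does not convey.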
In the study design, each driver completed five drives: one test drive, two sober drives, and two drives under the influence of alcohol. Before the sober drives were compared with the alcohol-affected drives, it was first examined whether significant differences existed within each group (sober and alcoholized). If no significant differences were found within a group, its drives were pooled and then compared with those of the other group.
Each drive included three scenarios: city driving, rural roads, and highways. Therefore, the analysis was conducted both across all scenarios and individually for each scenario. This approach was necessary because calculating averages, variances, and other statistical values for the entire route might obscure effects that become evident when analyzing individual scenarios. Furthermore, it was crucial to determine in which scenarios the most significant differences occurred to identify the situations where a drunk driver could be detected most reliably.
After identifying significant differences between the individual indicators, each was analyzed in detail to assess the nature of the observed changes. For instance, speed behavior was examined to determine how it evolved—whether it tended to increase or decrease, how the variance shifted, and in which scenarios the changes were most pronounced. While a paired t-test can confirm significant changes between two paired groups, it does not reveal the direction or characteristics of those changes. Therefore, a closer speed analysis was conducted, focusing on trends and variance.
Additionally, reaction behavior was specifically evaluated using events from the driving scenarios. These events allowed for the straightforward measurement of reaction times and the corresponding changes. Reaction time was assessed in two ways: first, during events requiring an emergency brake or stop, and second, in response to a traffic light turning from red to green, where acceleration was measured.
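The brake-based reaction time described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation code of the study: the brake pedal signal is assumed to be normalized to the range 0 to 1, the threshold of 0.05 for "initial braking action" is a hypothetical choice, and events in which the driver is already braking at hazard onset are treated as unusable.

```python
def reaction_time(timestamps, brake_position, hazard_time, threshold=0.05):
    """Seconds from hazard onset to the first brake sample above threshold.

    Returns None if the driver never brakes after the hazard appears, or if
    the brake is already pressed at the first sample after hazard onset
    (the reaction cannot be separated from earlier braking in that case).
    """
    onset_seen = False
    for t, brake in zip(timestamps, brake_position):
        if t < hazard_time:
            continue
        if not onset_seen:
            onset_seen = True
            if brake > threshold:
                return None  # already braking when the hazard appears
        if brake > threshold:
            return t - hazard_time
    return None


# Hypothetical samples: hazard appears at 0.5 s, braking starts at 1.5 s
times = [0.0, 0.5, 1.0, 1.5, 2.0]
brake = [0.0, 0.0, 0.0, 0.2, 0.8]
print(reaction_time(times, brake, hazard_time=0.5))
```

The same pattern applies to the traffic light events by replacing the brake signal with the throttle signal and the hazard onset with the time of the light change.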
4.2. Method of Classification
The first step involved determining whether significant differences existed depending on the scenario and the specific indicators under consideration. Building on this, a classification algorithm was developed to distinguish between sober and intoxicated drivers. Given the complexity of alcohol’s influence—which varies depending on individual factors and blood alcohol concentration—a machine learning approach is well suited for detection.
This approach employs classification models based on calculated indicators and key values, which are compared against a more complex machine learning model that leverages the entire time series dataset. Among the most promising models based on the indicators are logistic regression, random forest, and gradient boosting.
Logistic regression is advantageous due to its simplicity and ease of interpretation. It models the probability of a binary state by applying a logistic function to a linear combination of the indicators. Random forest, by contrast, is a more complex model that constructs multiple decision trees from randomly selected subsets of indicators, and then classifies based on a majority vote. This method is particularly effective for capturing nonlinear relationships and is more resistant to outliers and overfitting. Gradient boosting also uses decision trees, but it builds them sequentially, with each tree correcting the errors of its predecessor. Although this method can achieve higher accuracy, it is more prone to overfitting and can be sensitive to outliers.
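The two decision rules described above can be illustrated with a short sketch. The first function applies the logistic function to a linear combination of indicators, and the second implements the majority vote used by a random forest. The weights, bias, and votes are hypothetical placeholders; in practice they would result from fitting the models to the training data.

```python
import math
from collections import Counter


def logistic_probability(indicators, weights, bias):
    """Probability of the positive class from a linear combination of indicators."""
    z = bias + sum(w * x for w, x in zip(weights, indicators))
    # Numerically stable evaluation of 1 / (1 + exp(-z))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def majority_vote(tree_votes):
    """Class label predicted by most trees (first most common label on ties)."""
    return Counter(tree_votes).most_common(1)[0][0]


# Hypothetical standardized indicators and coefficients (illustration only)
p = logistic_probability([0.8, -0.2], weights=[1.5, 0.5], bias=-0.3)
print(f"P(intoxicated) = {p:.3f}")
print(majority_vote(["intoxicated", "sober", "intoxicated"]))
```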
For the more complex machine learning model, a Long Short-Term Memory (LSTM) network was utilized. The LSTM model’s essential advantage over the others is its ability to process entire time series without the need for feature extraction. In contrast, simpler models rely on summary statistics such as the mean, median, standard deviation, and variance, which can result in a loss of important information. The LSTM’s ability to retain more of the original data is expected to yield better classification results.
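In comparing the models, each driving scenario is analyzed individually, as well as the entire driving sequence. Additionally, the models are trained and evaluated based on four configurations: driving data only, camera data only, data-level fusion of both sources, and interpretation-level fusion of both sources.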
The aim was to demonstrate that fusing data from multiple sensors enhances the classification accuracy. Data-level fusion first combines the two data sources—driving data and camera data—before using them to train the models. In contrast, interpretation-level fusion trains and tests models separately on the driving and camera data. During testing, each model provides a prediction about the driver’s state, along with a confidence value. Ultimately, the prediction with the higher confidence value is selected as the final result.
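The difference between the two fusion strategies can be sketched in a few lines. This is a minimal illustration with hypothetical function names: data-level fusion concatenates the feature vectors before training, whereas interpretation-level fusion keeps the prediction of whichever model reports the higher confidence. How the confidence values are obtained from each model is not specified here and is assumed to be given.

```python
def data_level_fusion(driving_features, camera_features):
    """Combine both sources into one feature vector before model training."""
    return list(driving_features) + list(camera_features)


def interpretation_level_fusion(driving_prediction, camera_prediction):
    """Pick the (label, confidence) pair with the higher confidence.

    On equal confidence, the driving-data prediction is kept (an arbitrary
    tie-breaking assumption).
    """
    return max(driving_prediction, camera_prediction, key=lambda p: p[1])


# Hypothetical model outputs for one drive
print(data_level_fusion([51.2, 6.7], [0.31]))
print(interpretation_level_fusion(("sober", 0.60), ("intoxicated", 0.80)))
```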
For training our classification methods, we applied supervised machine learning. The sample set, consisting of only 21 participants (each with two drives: sober and intoxicated), limited the possibilities for machine learning approaches. Using unsupervised machine learning with all observed indicators (see
Appendix D) led to overfitting and poor results. Therefore, we opted to select only a subset of indicators for model training. Similarly, for the LSTM model, we selected specific time series rather than key values for training. Data augmentation was not applied, as the number of sober drives matched the number of intoxicated drives, offering no clear advantage from augmentation.
Given the relatively small dataset, a leave-one-out cross-validation (LOOCV) approach was applied. In this method, one subject serves as the test set while the remaining subjects are used for training. This process is repeated until each subject has served as the test set once. This technique makes full use of the limited sample, since every subject contributes to both training and testing.
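The subject-wise splitting can be sketched as follows. To keep the example self-contained, a simple nearest-centroid classifier stands in for the logistic regression, random forest, and gradient boosting models; this substitution, the function names, and the example data are assumptions made purely for illustration. The essential point is that all drives of the held-out subject are removed from the training set, so no subject appears in both sets.

```python
def leave_one_subject_out(samples):
    """Yield (held_out_subject, train, test) for samples of (subject, features, label)."""
    for held_out in sorted({subject for subject, _, _ in samples}):
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


def nearest_centroid_predict(train, features):
    """Stand-in classifier: label of the closest class centroid (squared distance)."""
    centroids = {}
    for label in {lab for _, _, lab in train}:
        rows = [f for _, f, lab in train if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(centroids, key=lambda lab: squared_distance(centroids[lab]))


def loocv_accuracy(samples):
    """Fraction of held-out drives classified correctly over all subject folds."""
    correct = total = 0
    for _, train, test in leave_one_subject_out(samples):
        for _, features, label in test:
            correct += nearest_centroid_predict(train, features) == label
            total += 1
    return correct / total


# Hypothetical single-indicator data for three subjects (illustration only)
data = [
    ("s1", [1.0], "sober"), ("s1", [3.0], "intoxicated"),
    ("s2", [1.2], "sober"), ("s2", [3.1], "intoxicated"),
    ("s3", [0.9], "sober"), ("s3", [2.8], "intoxicated"),
]
print(loocv_accuracy(data))
```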
Due to lengthy and costly computational demands, the LSTM model was not tested using the LOOCV method. LOOCV requires substantial computing power, as the LSTM model would need to be trained and tested for each subject across every dataset and test track. Instead, a traditional approach was adopted, using approximately 90% of the subjects for training and 10% for testing. This high training proportion was selected due to the relatively small sample size. The differing approach, compared to other models, may have influenced the results. Therefore, an initial comparison was made among the three simpler models, followed by a separate comparison with the LSTM model.
A detailed analysis and comparison of the various models and data configurations provided valuable insights into the most effective methods for detecting alcohol-impaired driving behavior and assessing their real-world applicability. In examining the various data configurations, the benefits of sensor data fusion are highlighted, showcasing its added value in improving detection accuracy.
5. Results
This study involved simulated drives conducted both under the influence of alcohol and in a sober state. The main objective was to identify the most effective classification algorithm for distinguishing between these two conditions and show that combining multiple data sources improves the classification accuracy. To achieve this, the characteristic differences between the two groups were analyzed, following the approach outlined in
Section 4.
5.1. Significance Testing
To compare sober driving with driving under the influence of alcohol, it was first necessary to determine whether significant differences existed within each group (sober and intoxicated). Identifying these internal variations was crucial for isolating the key features that are essential for a reliable classification.
Using paired t-tests, no significant differences were found within the groups, neither in the recorded driving data nor in the camera-based data. This was consistent across all individual scenarios (city, country road, and highway) as well as the entire route. Therefore, the two drives within each group (sober and intoxicated) could be pooled for further analysis.
In the next step, the sober and intoxicated groups were compared to identify significant differences, again using paired t-tests.
In terms of gaze behavior, it became clear that the presence of significant differences depended heavily on the driving scenario. In the city scenario, 6 out of 10 indicators showed significant differences, while only 3 out of 10 did so on the country road, and no significant differences were observed on the highway (0 out of 10). When analyzing the entire route, only two of the ten indicators revealed significant differences.
Conversely, the significant differences in driving behavior were less dependent on the scenario. Across the entire drive, 16 out of 19 indicators showed significant differences; in the city scenario, 17 out of 20; on the country road, 21 out of 26; and on the highway, 17 out of 22 (see
Table 1). However, not every indicator was recorded in each scenario, complicating direct comparisons. This variability also influenced the number of available indicators for each scenario. A list of all observed indicators is in
Appendix D.
5.2. Analysis of Gaze and Driving Behavior
5.2.1. Acceleration Behavior
Changes in acceleration behavior were most pronounced on the rural road. Not only did the majority of indicators show significant differences in this scenario, but the magnitude of these differences was also the greatest. For instance, positive acceleration increased by 7.47% over the entire route, 5.95% in urban areas, 9.56% in rural areas, and 9.53% on the highway. However, significant effects for average acceleration were only found in rural areas. The most reliable indicators of intoxication include average positive acceleration, average throttle position, average acceleration speed, overall deceleration, and acceleration variance. Intoxicated drivers tend to accelerate more abruptly and erratically, which correlates with difficulties in maintaining a steady speed and a general tendency to drive faster [
21].
5.2.2. Braking Behavior
Braking behavior, in contrast, proves to be a highly reliable indicator across all scenarios. Indicators such as the average brake pedal position, the standard deviation of pedal position, the average brake pedal speed, and the standard deviation of pedal speed all show significant differences. These indicators suggest that intoxicated drivers brake more frequently or forcefully or do so less consistently, resulting in greater variability in brake pedal position.
5.2.3. Speed Behavior
Speed behavior is closely linked to acceleration and braking patterns. These factors can partially explain the observed differences in speed between sober and intoxicated drives. Intoxicated drivers tend to drive faster on country roads, highways, and across the entire route, especially in rural curves and at maximum speeds. This confirms the expected increase in speed due to intoxication. Additionally, speed variability increases (higher variance and standard deviation), likely due to erratic acceleration and braking. Reliable indicators include average speed, speed standard deviation, speed variance, and the relative time spent above the speed limit.
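The speed indicators named above can be computed from a recorded speed signal as in the following minimal sketch. It assumes uniformly sampled speed values, so that the share of samples above the limit equals the relative time spent above it, and a single speed limit for the analyzed segment. The function name and the example values are hypothetical.

```python
import statistics


def speed_indicators(speeds_kmh, speed_limit_kmh):
    """Summary indicators for one drive segment with uniformly sampled speeds."""
    above = sum(1 for v in speeds_kmh if v > speed_limit_kmh)
    return {
        "mean": statistics.mean(speeds_kmh),
        "std": statistics.stdev(speeds_kmh),          # sample standard deviation
        "variance": statistics.variance(speeds_kmh),  # sample variance
        "share_above_limit": above / len(speeds_kmh),
    }


# Hypothetical speed samples in a 50 km/h zone (illustration only)
print(speed_indicators([48.0, 50.0, 52.0, 54.0], speed_limit_kmh=50.0))
```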
5.2.4. Steering Behavior
Changes in steering behavior, as described in the literature, were confirmed. Intoxicated drivers exhibited faster steering movements and an increased number of steering corrections, which affects lane-keeping ability. Drunk drivers struggle to maintain a stable lane position, resulting in more frequent steering adjustments. Significant differences in steering behavior were observed across all road types, with additional effects noted on rural roads during curves. Reliable indicators across all scenarios include the number of steering reversals greater than 5° and 10° per minute and the average steering speed in both clockwise and counterclockwise directions.
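One way to count steering reversals above a given angle is sketched below. The exact definition used in this study is not stated, so the rule here is an assumption: a reversal is counted whenever the steering angle moves by at least the gap size (for example, 5° or 10°) in the direction opposite to the current movement, measured from the most recent extreme value. The function names and the example signal are hypothetical.

```python
def count_steering_reversals(angles_deg, gap_deg):
    """Count direction changes whose swing from the last extreme is >= gap_deg."""
    if not angles_deg:
        return 0
    reversals = 0
    direction = 0  # 0 = not yet established, +1 = rising, -1 = falling
    low = high = extreme = angles_deg[0]
    for angle in angles_deg[1:]:
        if direction == 0:
            # The first sufficiently large swing only establishes a direction
            low, high = min(low, angle), max(high, angle)
            if angle - low >= gap_deg:
                direction, extreme = 1, angle
            elif high - angle >= gap_deg:
                direction, extreme = -1, angle
        elif direction == 1:
            if angle > extreme:
                extreme = angle
            elif extreme - angle >= gap_deg:
                reversals += 1
                direction, extreme = -1, angle
        else:
            if angle < extreme:
                extreme = angle
            elif angle - extreme >= gap_deg:
                reversals += 1
                direction, extreme = 1, angle
    return reversals


def reversals_per_minute(angles_deg, gap_deg, duration_s):
    """Steering reversal rate normalized to one minute of driving."""
    return count_steering_reversals(angles_deg, gap_deg) / (duration_s / 60.0)


# Hypothetical steering signal oscillating between 0 and 6 degrees over 30 s
signal = [0.0, 6.0, 0.0, 6.0, 0.0]
print(count_steering_reversals(signal, gap_deg=5.0))
print(reversals_per_minute(signal, gap_deg=5.0, duration_s=30.0))
```

Measuring each swing from the running extreme, rather than between consecutive samples, keeps small sample-to-sample jitter from being counted as a reversal.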
5.2.5. Reaction Behavior
Reaction times were analyzed through events simulating hazardous situations that required a braking response. Reaction time was measured as the time between the appearance of the hazard and the driver’s initial braking action. In urban areas, hazardous events included pedestrians suddenly crossing the road, while in rural areas, they involved an animal crossing or a rock slide. Only three events remained for analysis (two pedestrian crossings and one animal crossing) because traffic-related braking obscured the reactions in the other events. However, learning effects overshadowed the results, preventing significant conclusions about reaction times.
To supplement the analysis, reaction times were also measured when traffic lights changed from red to yellow. The difference in reaction time was significant at only one of the four traffic lights. At all four, however, drunk drivers tended to react more slowly.
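Reaction time as defined in this subsection (hazard onset to first braking action) can be extracted from the logged brake signal. The following sketch assumes a normalized brake pedal signal and a small activation threshold; the threshold value and the function name are hypothetical. Events in which the driver is already braking at hazard onset are discarded, mirroring the exclusion of events obscured by traffic-related braking.

```python
def reaction_time(timestamps_s, brake_pedal, hazard_time_s, threshold=0.05):
    """Seconds from hazard onset to the first brake input, or None.

    brake_pedal is assumed to be normalized to the range 0..1.
    """
    for t, b in zip(timestamps_s, brake_pedal):
        if t < hazard_time_s:
            continue
        if b > threshold:
            rt = t - hazard_time_s
            # rt == 0 means the pedal was already pressed at onset: unusable event.
            return rt if rt > 0 else None
    return None  # the driver never braked after the hazard appeared

rt = reaction_time([0.0, 0.5, 1.0, 1.5, 2.0], [0.0, 0.0, 0.0, 0.3, 0.8], hazard_time_s=1.0)
```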
5.2.6. Fixations and Saccades
Fixations and saccades were identified as the most meaningful gaze indicators. The analysis of the speed, amplitude, duration, and frequency of fixations and saccades revealed that the significance of these indicators varied depending on the driving scenario. In urban areas, the number of fixations and saccades decreased, fixation duration increased, and saccade amplitude decreased significantly. Saccade speed, however, remained essentially unchanged. On country roads, only fixation frequency and saccade amplitude showed significant differences, while on highways, no significant differences were observed.
The results align with findings in the literature, where alcohol consumption is associated with tunnel vision. This manifests as a reduction in the number of fixations and saccades, longer fixation durations, and a narrower field of vision, as indicated by a decreased saccade amplitude. Scenario-specific differences can be explained by varying demands on the driver. For example, urban driving requires frequent changes in the line of sight due to turning and lane changes, whereas highway driving involves a more constant, forward-focused vision. As a result, significant differences in gaze behavior are harder to detect on highways.
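The text does not state how fixations and saccades were segmented. One common approach is velocity-threshold identification, sketched below for a one-dimensional gaze angle. The threshold of 30 deg/s, the function name, and the restriction to one dimension are assumptions made for illustration only.

```python
from itertools import groupby

def gaze_events(angles_deg, sample_hz, velocity_threshold=30.0):
    """Split a 1-D gaze-angle trace into fixations and saccades.

    An interval between two samples counts as part of a saccade when the
    angular velocity exceeds velocity_threshold (deg/s).
    """
    steps = [b - a for a, b in zip(angles_deg, angles_deg[1:])]
    is_saccade = [abs(s) * sample_hz > velocity_threshold for s in steps]
    events, start = [], 0
    for saccade, run in groupby(is_saccade):
        n = len(list(run))
        events.append({
            "type": "saccade" if saccade else "fixation",
            "duration_s": n / sample_hz,
            # Net gaze displacement over the event.
            "amplitude_deg": abs(sum(steps[start:start + n])),
        })
        start += n
    return events

events = gaze_events([0.0, 0.1, 0.2, 5.0, 5.1, 5.0], sample_hz=100)
```

From such an event list, the indicators discussed above follow by counting events per type and averaging their durations and amplitudes.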
5.2.7. Eye-Steering Coordination
Eye-steering coordination was analyzed by examining two aspects: the temporal delay between eye and steering movements and the frequency of turns or lane changes made without prior sideways glances. Normally, drivers look into a curve or turn before initiating a steering maneuver. Under the influence of alcohol, this foresight is reduced, and a shorter time lead between eye and steering movements is expected. However, no significant changes in this behavior were observed in any scenario.
On the other hand, significant differences were noted in lane-changing and turning behavior in urban and rural road scenarios, where intoxicated drivers showed fewer sideways glances before making these maneuvers. This suggests a tendency for drunk drivers to drive more recklessly and carelessly.
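The time lead of the eyes over the steering wheel can be estimated by shifting one signal against the other and looking for the best match. The sketch below uses a plain Pearson correlation over a range of lags; this is one possible estimator and is not claimed to be the one used in the study. Both signals are assumed to be sampled at the same rate and to have equal length.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def eye_lead_samples(gaze, steering, max_lag):
    """Lag in samples at which steering best matches earlier gaze."""
    if len(gaze) != len(steering) or max_lag >= len(gaze) - 1:
        raise ValueError("signals must have equal length and exceed max_lag")
    return max(range(max_lag + 1),
               key=lambda lag: pearson(gaze[:len(gaze) - lag], steering[lag:]))

# Synthetic example: steering repeats the gaze pattern three samples later.
gaze = [0, 1, 3, 2, 0, -1, -3, -2] * 3
steering = [0, 0, 0] + gaze[:-3]
lead = eye_lead_samples(gaze, steering, max_lag=6)
```

Dividing the returned lag by the sampling rate gives the lead time in seconds.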
5.3. Classification Algorithms for DUI
To ensure a reliable classification of driving under the influence of alcohol, various classification algorithms were tested and compared. The performance of each algorithm was evaluated across different driving scenarios.
5.3.1. Logistic Regression
Logistic regression proved to be an effective method for classifying DUI across all scenarios. As demonstrated in Table 2, accuracy generally improved when data from multiple sensor sources were combined. The fusion of data at both the data and interpretation levels yielded comparable results.
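The difference between the two fusion levels can be made concrete with a small sketch. At the data level, one model receives the concatenated driving and gaze features; at the interpretation level, each source has its own model and only the outputs are combined. The weights below are placeholders that a fitted model would supply, and averaging the two probabilities is an assumed combination rule, since the text does not specify one.

```python
from math import exp

def logistic(features, weights, bias):
    """Probability of the 'intoxicated' class from a linear logistic model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + exp(-z))

def data_level_fusion(driving, gaze, weights, bias):
    # One model sees the concatenated feature vector.
    return logistic(list(driving) + list(gaze), weights, bias)

def interpretation_level_fusion(driving, gaze, w_driving, b_driving, w_gaze, b_gaze):
    # Each source gets its own model; the two probabilities are averaged.
    p_driving = logistic(driving, w_driving, b_driving)
    p_gaze = logistic(gaze, w_gaze, b_gaze)
    return 0.5 * (p_driving + p_gaze)

# Placeholder feature values and weights, for illustration only.
p_data = data_level_fusion([1.2, 0.4], [0.7], weights=[0.5, -0.3, 0.8], bias=-0.2)
p_interp = interpretation_level_fusion([1.2, 0.4], [0.7], [0.5, -0.3], -0.1, [0.8], -0.1)
```

A drive would be labeled intoxicated when the fused probability exceeds a chosen cut-off such as 0.5.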
In the training data, the fusion of both datasets outperforms each individual data source, both at the data level and interpretation level. This trend appears to continue when predicting the test data; however, the highest classification accuracy was observed when using only the camera data for the complete scenario. This may result from the small dataset size, potentially leading to overfitting. The situation was further examined using a confusion matrix.
Figure 5 illustrates that the logistic regression model significantly outperforms random chance in predicting DUI. The figure highlights where the model made correct predictions and where it erred. The true positive rate (drunk drivers correctly classified) and true negative rate are roughly equal, indicating that the model is not biased toward classifying only sober or only drunk drivers. Instead, both groups were accurately identified to a large extent. The resulting precision and recall are 66.76% and 70%, respectively.
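The metrics read from a confusion matrix follow from its four cells. The sketch below shows the standard definitions, with "intoxicated" as the positive class; the counts in the example are invented for illustration and are not the values behind Figure 5.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,        # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else 0.0,   # true negative rate
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
    }

# Invented counts: 8 intoxicated drives found, 4 missed, 2 false alarms, 6 sober drives kept.
metrics = classification_metrics(tp=8, fp=2, fn=4, tn=6)
```

Comparing recall with specificity is what reveals whether a model leans toward one class, which accuracy alone cannot show.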
5.3.2. Random Forest
Table 3 presents the results of the random forest model. Similar to logistic regression, accuracy improves when multiple sensor sources are fused. However, as with the previous model, the training accuracy exceeds the test accuracy. In most scenarios, the random forest model achieves 100% accuracy on the training data, indicating significant overfitting and noticeably poorer performance on the test data. Despite this, the test accuracy remains comparable to that of logistic regression. Due to the inherent randomness of the random forest algorithm, the results are not fully reproducible, with slight variations occurring in each run. Nevertheless, these variations are minimal, and the overall accuracy remains consistent with the values shown in Table 3.
In the best-case scenario on the country road, the random forest model achieves an accuracy of over 70% (see Figure 6). It also attains precision and recall values of 75% and 70%, respectively. These results indicate both a high true positive rate and a strong true negative rate, reflecting the model’s ability to accurately classify both drunk and sober drivers.
5.3.3. XGBoost
The results of the XGBoost model are presented in Table 4. The notably high training accuracies suggest significant overfitting, leading to relatively poor performance on the test data. In urban scenarios, the model performs reasonably well, with an accuracy ranging from 56% to 63%. On rural roads, the model shows moderate performance, achieving between 51% and 63% accuracy. However, on the highway, prediction accuracy is notably low, even falling below the threshold of random guessing (50%). Overall, XGBoost performs better in urban scenarios, particularly when data are fused at the interpretation level (see Figure 7).
5.3.4. Comparison of the Models
When comparing the models, their performance across different scenarios was evaluated. In terms of training accuracy, all models show very high performance, reaching up to 100%. While XGBoost and random forest consistently achieve close to 100%, logistic regression reaches between 67% and 97%, which suggests a lower risk of overfitting compared to the other models. The near-perfect training accuracy of XGBoost and random forest indicates a strong tendency for overfitting, as they almost perfectly memorize the training data.
In the case of logistic regression, there is a clear trend toward higher training accuracy as the datasets are fused. While this increases the risk of overfitting, it also improves the overall accuracy.
However, the focus should be on test accuracy, which is a more reliable indicator of model performance. Logistic regression maintains consistent test accuracy across all scenarios, with a slight reduction in performance on rural roads. In contrast, XGBoost and random forest exhibit greater variability in test accuracy across scenarios. Test accuracy is notably higher in city and rural road scenarios compared to the highway. When the entire route is considered, the models achieve test accuracy values that fall between the scenario-specific results.
Additionally, the results demonstrate that data fusion tends to improve test accuracy. Table 5 highlights which model performs best in each scenario and indicates the data sources used to achieve the corresponding test accuracy.
In general, logistic regression and random forest tend to outperform XGBoost. Random forest achieves classification accuracies of up to 73%. However, across all three models, overfitting remains a significant issue, with near-perfect training accuracies contrasting sharply with the fluctuating test accuracies and F1 scores.
5.4. LSTM Network
Another classification model, based on Long Short-Term Memory (LSTM) networks, was developed to predict the driver’s state more accurately. Unlike the previous models, the LSTM model uses the full time series of measured values rather than relying on calculated metrics such as average speed. It was anticipated that using time series data would lead to better classification accuracy by preserving more of the dynamic information inherent in the signals.
The model was trained with approximately 90% of the participants’ data, while the remaining 10% were used for testing. This high training proportion was chosen due to the relatively small sample size. Each scenario was analyzed individually, and the model was trained and tested with the respective datasets. The model was trained over 100 epochs, with training improvements generally stabilizing around 60–70 epochs. In line with previous results, the fusion of multiple data sources tended to yield better results than using a single dataset, though the accuracy varied across scenarios:
The average testing accuracy in urban scenarios ranged from approximately 50% to 70%.
The best testing accuracy was observed on rural roads, with maximum values exceeding 80%.
Testing accuracy was poor on the highway, with values around 50%.
During training and testing, it became evident that classification accuracy was more influenced by driving behavior than by eye-tracking data. This explains why the test accuracy was higher on rural roads, where driving behavior showed the most pronounced differences between sober and intoxicated states.
However, the small sample size posed a significant challenge for the LSTM model. Overfitting was a persistent issue during training, and attempts to mitigate this by augmenting the data with synthetic samples did not yield any meaningful improvement in classification performance.
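With so few participants, how the 90/10 split is drawn matters. The sketch below assumes the split is made by participant, so that no person contributes drives to both the training and the test set; whether the study split by participant or by drive is not stated, and the function name and fixed seed are hypothetical.

```python
import random

def split_by_participant(participant_ids, test_share=0.1, seed=0):
    """Return (train_indices, test_indices) with disjoint participants.

    participant_ids holds one entry per drive, naming its participant.
    """
    ids = sorted(set(participant_ids))
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split repeatable
    n_test = max(1, round(len(ids) * test_share))
    test_ids = set(ids[:n_test])
    train = [i for i, p in enumerate(participant_ids) if p not in test_ids]
    test = [i for i, p in enumerate(participant_ids) if p in test_ids]
    return train, test

# 21 participants with four evaluated drives each.
drive_owner = [p for p in range(21) for _ in range(4)]
train_idx, test_idx = split_by_participant(drive_owner)
```

With 21 participants, a 10% share leaves only two people in the test set, so a single misclassified participant shifts the test accuracy considerably. This is one reason to repeat the split with several seeds before interpreting a single figure.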
6. Discussion
By focusing our study on a highly homogeneous group of participants, we minimized confounding factors such as variations in drinking habits and driving experience [28, 30, 35]. This approach allowed us to draw significant conclusions despite the relatively small sample size of 21 subjects.
6.1. Features
In the study, each subject completed five drives. Excluding the test drive, each driver performed two sober and two intoxicated drives. The results indicate that the features considered for drives in the same state (sober or intoxicated) did not differ significantly. This suggests that learning effects did not notably influence the driving and gaze behavior. Consequently, it was possible to merge the drives with the same alcohol state and compare them with those from the opposite state.
When comparing sober and intoxicated states, the features showed significant differences, aligning with the expected results from the literature. This study confirms the findings regarding risky and more aggressive driving behaviors, such as increased speed, acceleration, and variance [21, 22, 26]. We also observed a tendency toward slower reaction times. However, the changes were not significant enough to be detected with the paired t-test, nor did we find an increase in errors when stopping at red traffic lights [21]. This absence of notable changes may be attributed to learning effects from the repeated driving scenarios. Fortunately, the learning effects were limited to the events that occurred and did not influence general driving and viewing behavior.
Changes in lateral stability and steering behavior were also noted [22, 23, 26]. Significant differences were found in all scenarios (see Table 1), but the changes were most pronounced on the rural road. It is hypothesized that the overlay of numerous events in urban settings makes changes in driving behavior less noticeable, while the more monotonous driving on highways leads to smaller changes.
Gaze behavior showed different patterns of change, with a stronger dependence on the driving scenario. Gaze behavior exhibited the most changes in the city, which is expected due to increased driver activity. In urban environments, drivers must frequently shift their gaze when turning or changing lanes. On rural roads, drivers still need to look around, whereas on highways, they can focus ahead for extended periods. The reduced gaze activity in certain scenarios makes it more challenging to detect differences between sober and intoxicated states. As noted by Makowski et al., Watten et al., Roche et al., and Silva et al., fixations and saccades are effective indicators of changing gaze patterns [31, 32, 33, 35].
Figure 1 demonstrates that the relative risk of causing an accident increases exponentially with the BAC. This indicates that driving behavior deteriorates more severely with higher BAC levels. In our study, participants consumed alcohol only up to a BAC of 0.6 per mille (0.06%). As the graph illustrates, the risk at this BAC is still relatively low (only 68% higher risk than driving sober), which might explain why we did not reproduce some of the changes reported in the literature. For instance, we did not observe significant changes in eye-steering coordination, contrary to the findings of Marple-Horvat et al. [15]. Additionally, smaller changes are harder to classify compared to more pronounced changes at higher BACs. Nonetheless, detecting impairment just above the legal limit is particularly valuable. We hypothesize that our models would perform better at higher BACs, although the converse may not hold.
6.2. Models
While it is anticipated that the selected models will achieve better classification accuracy at higher BACs, Section 5.3 shows that the classification accuracy is influenced by other factors as well. Various scenarios and datasets were examined during the training and testing of different models. While the literature typically focuses on identifying changes that occur, we went further by evaluating how reliably these changes can be used to detect a driver’s state. Moreover, we advanced the analysis by integrating multiple sensor sources for state classification and comparing the outcomes. Various classification algorithms were employed to assess detection accuracy (see Section 5.3.4). The focus was on three simpler models (logistic regression, random forest, and XGBoost), which were then compared to a more complex LSTM network.
As expected, combining multiple datasets improved model performance. Fusion at both the interpretation and data levels produced similar results. However, the driving scenario had a more significant impact on test accuracy than the choice of dataset. On average, the models performed best in urban and rural road scenarios, with significantly worse performance on highways. This observation aligns with the findings from the feature analysis, where the most significant differences were noted in city and rural conditions. The analysis across all scenarios resulted in test accuracies between those observed in city and highway settings.
Similar trends were observed with the LSTM model, which achieved particularly high test accuracies on rural roads. A significant challenge for all models, especially the LSTM, was the relatively small sample size. The limited number of participants was insufficient for training the models effectively, leading to overfitting issues.
6.3. Limitations
Our models and conclusions about altered driving behavior are based on a narrowly defined, homogeneous group of participants. It is essential to verify whether these findings can be generalized to a broader population, as the classification accuracy may shift with a more diverse subject pool. Factors like varying drinking habits and driving experience could impact the reliability of detection [28, 30, 35].
Additionally, this study was conducted under controlled, optimal conditions. Participants were required to appear sober, having consumed only a light snack, and to abstain from alcohol, caffeine, or medication prior to the study. Given these conditions and the fact that the study took place in a driving simulator rather than a real vehicle—where the driving routes were identical each time—the applicability of the results to real-world scenarios must be considered. Furthermore, individual factors such as daily physical and mental conditions, as well as inherent differences between participants, could strongly influence the results. Factors like adrenaline rushes, fatigue after partying, or other variables that impact real driving behavior were not accounted for.
However, this study’s primary objective was to provide a direct comparison between sober and intoxicated driving. The controlled simulator environment, with its consistent conditions, was the most suitable approach for achieving this comparison. Nonetheless, the validity of these findings could be strengthened by increasing the number of participants and conducting tests in a more realistic driving environment.
7. Conclusions
Road safety continues to be a paramount concern worldwide, with driving under the influence (DUI) remaining one of the leading causes of traffic accidents. Investigating DUI is a logical step to enhance safety measures. Our study extends previous research by not only confirming the known effects of alcohol on driving and gaze behavior but also using these indicators to classify a driver’s impairment state from multi-sensor data. By combining eye-tracking and driving behavior data, we were able to fuse these sources of information in various ways, leading to valuable insights and high-performing classification models.
This study confirms the well-documented impact of alcohol on driving, particularly highlighting risky behaviors such as increased speed, more aggressive maneuvers, and changes in steering and lateral stability. These changes were most pronounced in rural road scenarios, where monotonous driving heightened the visibility of alcohol-induced impairments. Urban environments, while complex and event-heavy, also demonstrated noticeable shifts in driving behavior, though these were less distinct due to the multitude of visual and driving tasks required. Interestingly, our findings reveal that gaze behavior (specifically fixations and saccades) was strongly dependent on the driving scenario, with city settings displaying the most dynamic shifts due to the frequent need for attention shifts when turning or changing lanes.
One of the novel contributions of our study was the application of classification models to detect intoxication based on sensor data. The fusion of eye-tracking and driving behavior features allowed for a strong classification performance, particularly in urban and rural settings. Our models achieved test accuracies exceeding 70%, with the more complex LSTM model reaching up to 80% in accuracy on rural roads. These results illustrate that combining data sources enhances the ability to detect a driver’s state, offering a more robust method for identifying impairment compared to relying on a single data stream.
However, while our models showed strong performance, they were hindered by overfitting, primarily due to the relatively small sample size of participants. This overfitting was evident in the gap between training and testing accuracies, especially on highways, where the more consistent driving environment made it harder for the models to distinguish between sober and intoxicated states. Additionally, the moderate BAC level of 0.06% may have limited the detection of more severe impairments that might occur at higher levels of intoxication. Nonetheless, detecting impairment just above the legal limit is particularly important for real-world applications, where even mild impairment can pose significant risks.
Looking ahead, future research should focus on addressing the limitations encountered in this study. A larger sample size would help mitigate the overfitting issue and improve the generalizability of the models. Furthermore, investigating higher BAC levels could reveal more pronounced behavioral changes and offer deeper insights into the relationship between alcohol consumption and driving performance. Additionally, exploring correlations between gaze behavior and driving performance could provide valuable insights into how attentional focus is impacted by intoxication. By refining these approaches, future studies could further enhance the accuracy and reliability of classification algorithms, offering powerful tools for improving road safety and potentially informing the development of real-time DUI detection systems in vehicles.
In conclusion, this study not only validated previous research on the effects of alcohol on driving but also advanced the field by demonstrating how multiple sensor sources can be leveraged to classify a driver’s intoxication level accurately. With improvements in sample size and expanded research into higher BAC levels, these models hold great potential for contributing to future road safety technologies, making our roads safer for everyone.