Introduction
The size of the human pupil regulates the amount of light that enters the eye. An increase in luminance therefore results in a fast constriction of the pupil, and a luminance decrease results in a gradual ‘unconstriction’. This regulation of the light influx is known as the pupillary light reflex (PLR) (Mathôt, Van der Linden, Grainger, & Vitu, 2013). Beyond this reflex, there is a well-studied correlation between pupil dilation and cognitive factors such as workload (Schwalm, 2009), surprise (Kloosterman et al., 2015), attention (Hoeks & Levelt, 1993), and emotional arousal (Granholm & Steinhauer, 2004).
Pupil dilation therefore provides an indirect measure of these cognitive factors, which would otherwise only be accessible through costly and intrusive measurements such as EEG. Through behavioral observation, however, one can only measure the superposition of the PLR and cognitive influences: both ‘unconstriction’ due to a luminance decrease and dilation due to an increased arousal level result in a larger pupil. This effectively limits the usefulness of pupil dilation in most practical applications, as an equiluminant environment is only realistic in laboratory experiments.
Not only are the causes of PLR unconstriction and cognitive pupil dilation different; they are also triggered by different brain regions and manifest through different components of the eye musculature. The PLR is driven mainly by the constriction and relaxation of the iris sphincter muscle, whereas cognitive changes are mediated by the iris dilator muscle (Mathôt et al., 2013; Tanaka, Kuchiiwa, & Izumi, 2005).
In work on the ‘index of cognitive activity’ (ICA), Marshall demonstrated that these processes differ so much in their manifestation (especially in the speed and acceleration of dilation and constriction) that sophisticated signal processing can separate cognitive from PLR-induced pupil size changes (Marshall, 2007). Schwalm et al. later used this method to distinguish between mental workload levels of drivers in a simulator (Schwalm, 2009). In the context of driving, the study of pupil dilation has mostly focused on mental workload (M. Recarte & Nunes, 2003; Zhang, Owechko, & Zhang, 2004).
Driving is generally considered a foveated task (i.e., an object generally needs to be fixated by the driver in order to be perceived). However, drivers can perceive certain potential hazards without, or shortly before, an explicit fixation. Conversely, several studies have empirically shown that the mere fixation of a specific object implies neither its perception nor its interpretation as hazardous by the driver. For example, in (Kübler et al., 2014), hazard fixation was found to be unreliable for predicting hazard perception, as an object can either be ‘cognitively overlooked’ or incorrectly judged as non-hazardous by the driver. Thus, if we want to infer hazard perception in a driving scene, fixation-based information is not sufficient. Other physiological signals, such as electrocardiography (ECG) or galvanic skin response (GSR), can help to disambiguate. However, they usually show a variable and relatively long delay (up to several seconds), so that they are suitable neither for real-time use, e.g., to trigger assistance systems, nor for determining the exact moment at which a hazard is perceived. In contrast to these physiological parameters, the pupil response occurs almost instantly and spans only about 2 seconds (Privitera, Renninger, Carney, Klein, & Aguilar, 2010). This short latency allows for timely interaction with in-vehicle systems.
In this study, we investigate pupil dilation in immediate response to a hazard during driving. Our aim is to assess how well the pupillary signal predicts hazard perception. Reliably detecting a driver's hazard perception via a change in pupil diameter is interesting for multiple reasons. In (Underwood, Ngai, & Underwood, 2013), the authors use a hazard perception task in which subjects press the space bar once they perceive a hazard. Similar experimental setups are common in studies of hazard detection; e.g., in (Bowers, Mandel, Goldstein, & Peli, 2009), subjects were to honk upon detecting a pedestrian. Such artificial manual feedback could be replaced by a non-invasive measurement of pupil dilation. Furthermore, the pupil diameter could help to disambiguate other signals, such as hazard fixations, heart rate changes, or the galvanic skin response, with respect to the question: was an object perceived and judged as hazardous?
For this purpose, we can build on insights gained from the analysis of mental workload during driving, as identifying a stress response shares the problem of isolating cognitive pupil dilation from the PLR. For example, in (M. A. Recarte & Nunes, 2000), the authors find an increase in pupil dilation with mental workload that is reliable even under the daylight variations of a natural environment. However, this effect can only be detected by averaging over a large amount of data and applying statistical methods. Finding a statistically significant difference in a large collection of data does not imply that individual trials can usefully be classified into a specific mental workload state.
In this context, the index of cognitive activity is of particular interest, as its authors claim that it is almost immune to illumination changes. To achieve this, a wavelet transformation isolates those pupil changes that did not originate from ambient illumination changes: by analyzing only certain components of the wavelet-transformed signal, one filters for a specific dilation speed and amplitude (Marshall, 2007).
To determine a stress level, an increase in the mean pupil diameter over time is commonly used (Zhai & Barreto, 2006). Such averaging has the advantage of being relatively robust to momentary pupil diameter changes, such as those caused by rapid illumination changes. Pedrotti et al. used a wavelet-transformed pupil diameter in a simulated driving task to classify different stress levels of the driver (Pedrotti et al., 2014). Such a procedure is useful when a gradual change in stress level is expected. For our application, however, we are interested in spontaneous, fast stress events, and an averaging filter would delay the detection of the expected steep and short peaks.
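To illustrate why averaging is ill-suited for short events, the following minimal sketch applies a causal (trailing) moving average to a synthetic dilation peak roughly 1 s wide. The 5-s window, the Gaussian peak shape, and the helper name `trailing_mean` are illustrative assumptions, not parameters from the cited studies.

```python
import numpy as np

FS = 25.0  # Hz, eye-tracker sampling rate used in this study

def trailing_mean(x, window_s, fs=FS):
    """Causal moving average: each output averages the preceding window_s seconds.

    Hypothetical helper for illustration only.
    """
    n = max(1, int(round(window_s * fs)))
    kernel = np.ones(n) / n
    # "full" convolution truncated to len(x) uses only past samples (no look-ahead)
    return np.convolve(x, kernel, mode="full")[: len(x)]

# Hypothetical test signal: a Gaussian dilation peak (sigma 0.3 s) at t = 10 s
t = np.arange(int(20 * FS)) / FS
peak = np.exp(-0.5 * ((t - 10.0) / 0.3) ** 2)
smoothed = trailing_mean(peak, window_s=5.0)   # hypothetical 5-s averaging window

raw_peak_time = t[np.argmax(peak)]
smoothed_peak_time = t[np.argmax(smoothed)]
print(f"raw: peak at {raw_peak_time:.2f} s, height {peak.max():.2f}")
print(f"5-s mean: peak at {smoothed_peak_time:.2f} s, height {smoothed.max():.2f}")
```

With a window several times longer than the peak, the smoothed maximum is both attenuated (to roughly the ratio of peak area to window length) and shifted later in time, which is the detection delay referred to above.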
In the following, we introduce a filtering and classification cascade for the pupil diameter signal that can be used to classify the perception of hazards during driving in a simulator.
Methods
Driving Simulator Experiment
Thirty-one subjects drove in the moving-base driving simulator (Zeeb, 2010) at the Mercedes-Benz Technology Center in Sindelfingen, Germany. The cabin contained a real car body surrounded by a 360° virtual environment, which made the driving experience highly realistic. Each subject completed a 40-min drive of 37.5 km. Nine hazardous situations occurred at predefined positions along the course. A Dikablis essential eye tracker (Ergoneers GmbH, Manching, Germany) recorded eye movements and pupil size at 25 Hz. Simultaneously, we recorded the physiological parameters galvanic skin conductance (GSC; Biotrace+ with finger electrodes) and heart rate (ECG; custom-built mobile 3-channel device).
Figure 1 shows the experimental setup. The processing steps required to derive an indicator of hazard perception from these sensors are published in (Kübler et al., 2014).
All subjects were recruited from the Department of Neuro-Ophthalmology at the University of Tübingen (Germany). The research study was approved by the Institutional Review Board of the University of Tübingen (Germany) and was performed according to the Declaration of Helsinki. The aim of the original study was to analyze the driving performance of patients with binocular visual field loss (16 patients, 15 control subjects). For the analysis presented here, we do not expect group membership to influence the pupil diameter and therefore provide no further interpretation with regard to the visual field defects.
Pupillary Data Processing
As we are operating on data recorded in a close-to-realistic environment, we first have to ensure sufficient data quality. In a preprocessing step, we eliminated blinks, partial blinks, and implausible pupil sizes from the data:
The first 30 seconds of the pupil signal were very noisy due to the subject's acclimation phase in the car. We discarded this relatively short interval for all subjects. We identified blinks, tracking failures of the eye tracker, and pupil size samples that differed by more than 10% from their preceding value (a threshold chosen empirically that mainly depends on pupil detection quality), and eliminated these usually short tracking losses from the data. Such an artifact spans up to five samples (200 ms) at the 25 Hz sampling rate of the eye tracker. Additionally, two samples (80 ms) before and after each blink were removed, since partial occlusion of the pupil by a half-closed eyelid may cause the pupil detection to report a smaller size than the actual pupil size. To eliminate physiologically implausible pupil sizes, we used a statistical approach and considered all pupil sizes exceeding the average by more than three standard deviations as outliers. Such samples result from a failure of the pupil detection algorithm (e.g., detecting the iris instead of the pupil). We filled the gaps left by missing or eliminated data by linear interpolation between the neighboring valid samples. This step was necessary because the subsequent frequency-based processing steps require a continuous signal without discontinuities.
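The preprocessing steps above can be sketched as follows, assuming the raw diameter arrives as a NumPy array sampled at 25 Hz in which blinks and tracking failures are already marked as NaN (or 0). The function name `clean_pupil` and this input convention are hypothetical; the thresholds follow the text (30 s, 10%, two samples, three standard deviations). As a simplification, the two-sample padding is applied around every flagged gap, not only around blinks.

```python
import numpy as np

FS = 25            # Hz, eye-tracker sampling rate
SKIP_S = 30        # acclimation phase discarded at the start (s)
JUMP_FRAC = 0.10   # max relative change from the preceding sample
PAD = 2            # samples removed on each side of a gap (2 x 40 ms)
N_SD = 3.0         # outlier threshold in standard deviations

def clean_pupil(diam, fs=FS):
    """Hypothetical sketch: return (interpolated signal, fraction of valid samples).

    Assumes blinks / tracking failures are already encoded as NaN or 0.
    """
    x = np.asarray(diam, dtype=float)[int(SKIP_S * fs):].copy()
    x[x <= 0] = np.nan                       # treat zeros as tracking loss
    invalid = np.isnan(x)

    # Sudden jumps of more than 10 % relative to the preceding sample
    rel_change = np.abs(np.diff(x)) / x[:-1]
    invalid[1:] |= rel_change > JUMP_FRAC    # comparisons with NaN evaluate to False

    # Widen every gap by PAD samples (a half-closed lid shrinks the apparent pupil)
    widened = invalid.copy()
    for shift in range(1, PAD + 1):
        widened[:-shift] |= invalid[shift:]  # samples before a gap
        widened[shift:] |= invalid[:-shift]  # samples after a gap
    invalid = widened

    # One-sided outlier rule: sizes exceeding the mean by > 3 SD (e.g. iris detected)
    mu, sd = x[~invalid].mean(), x[~invalid].std()
    invalid |= x > mu + N_SD * sd

    # Interpolated samples count as invalid for the later 75 % criterion
    valid_fraction = 1.0 - invalid.mean()
    idx = np.arange(len(x))
    # Linear gap filling; samples before the first / after the last valid one are held constant
    x_clean = np.interp(idx, idx[~invalid], x[~invalid])
    return x_clean, valid_fraction
```

The returned valid fraction is what a subsequent trial-level quality check would threshold.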
Trials with less than 75% valid data (with points interpolated in the previous step counting as invalid) were excluded from further analysis. In the next step, we compensate for a non-stationary trend (i.e., a gradual, slow change in pupil diameter over several minutes). We identify such a local trend by reconstructing the original signal from the wavelet coefficients that correspond to a low-frequency band (see Figure 2). This trend must be removed before applying spectral analysis, as it distorts the spectrum of the signal at low frequencies (Andreas & Trevino, 1997).
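A dependency-free sketch of the trial check and trend removal follows. It uses the Haar wavelet (db1, the simplest member of the Daubechies family), for which reconstructing a signal from its level-L approximation coefficients alone reduces to taking means over consecutive blocks of 2^L samples. The decomposition level of 11 (blocks of about 82 s at 25 Hz) is a hypothetical choice for drift on the scale of minutes, not a value from the study, and the function names are hypothetical. With PyWavelets, one would instead use a smoother Daubechies wavelet via `pywt.wavedec` and `pywt.waverec`, zeroing the detail coefficients.

```python
import numpy as np

MIN_VALID = 0.75   # trials below this valid fraction are discarded
LEVEL = 11         # hypothetical: 2**11 samples = 81.92 s blocks at 25 Hz

def haar_trend(x, level=LEVEL):
    """Reconstruct x from its level-`level` Haar approximation coefficients only.

    For the Haar wavelet this equals the mean over consecutive blocks of
    2**level samples; the signal is edge-padded to a multiple of the block size
    (a simplification of the boundary handling in library implementations).
    """
    block = 2 ** level
    n = len(x)
    padded = np.pad(x, (0, (-n) % block), mode="edge")
    means = padded.reshape(-1, block).mean(axis=1)
    return np.repeat(means, block)[:n]

def detrend_trial(x, valid_fraction, level=LEVEL):
    """Return the detrended signal, or None if the trial has too little valid data."""
    if valid_fraction < MIN_VALID:
        return None
    return x - haar_trend(x, level)
```

The piecewise-constant Haar trend is coarse; a higher-order Daubechies wavelet yields a smoother trend estimate at the same level.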
A manual analysis of the pupil diameter signal after filtering and smoothing indicated that peaks do indeed occur at the hazardous situations, but also that a simple threshold approach cannot detect them reliably given the high noise level. Spurious pupil diameter peaks need to be distinguished from the peaks corresponding to hazardous situations. We employ the method introduced in (Haver, 2008) for this purpose:
First, we detect zero-crossings of the smoothed first derivative of the pupil diameter signal, which correspond to extrema in the original signal. We consider them candidate peaks if their amplitude exceeds 1.5 standard deviations. Then, a parabola is fit to the points within a 2.5-second window around each peak by a least-squares quadratic fit (Figure 3), using the full width at half maximum method (O'Haver, 2017). The pupil response to visual detection is expected to last 2-2.5 seconds (Privitera et al., 2010), which motivates this choice of window width.
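The candidate detection and parabola fit might look like the sketch below. Several details are assumptions not specified in the text: the derivative is smoothed with a 5-sample moving average, only positive-to-negative zero-crossings (maxima, i.e. dilation peaks) are kept, the 1.5-SD criterion is applied to the deviation from the signal mean, and the FWHM is computed from the fitted parabola relative to a zero baseline, as appropriate for a detrended signal. The function name `find_pupil_peaks` is hypothetical.

```python
import numpy as np

FS = 25            # Hz
SMOOTH = 5         # hypothetical smoothing width (samples) for the derivative
MIN_SD = 1.5       # candidate amplitude threshold in standard deviations
WINDOW_S = 2.5     # fit window around each candidate (s)

def find_pupil_peaks(x, fs=FS):
    """Hypothetical sketch: return a list of (time_s, height, fwhm_s) per fitted peak."""
    x = np.asarray(x, dtype=float)
    kernel = np.ones(SMOOTH) / SMOOTH
    d = np.convolve(np.gradient(x), kernel, mode="same")   # smoothed first derivative

    # + to - zero-crossings of the derivative mark local maxima (dilation peaks)
    crossings = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    mu, sd = x.mean(), x.std()
    half = int(round(WINDOW_S * fs / 2))

    peaks = []
    for i in crossings:
        if x[i] - mu <= MIN_SD * sd:
            continue                                   # too small to be a candidate
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        t = np.arange(lo, hi) / fs
        a, b, c = np.polyfit(t, x[lo:hi], 2)           # least-squares parabola
        if a >= 0:
            continue                                   # fit opens upward: not a maximum
        t0 = -b / (2 * a)                              # vertex time
        height = c - b * b / (4 * a)                   # vertex value
        # Width where the parabola falls to half its height (baseline assumed 0)
        fwhm = 2 * np.sqrt(-height / (2 * a)) if height > 0 else np.nan
        peaks.append((t0, height, fwhm))
    return peaks
```

Fitting over a fixed window rather than the raw sample maximum makes the peak time and height less sensitive to single noisy samples.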
Wavelet Analysis
For each drive, we identified and labelled all hazardous events and the corresponding pupil signal. This process was automated, since the driving simulation provided the position of the vehicle on the track and the positions of the pre-programmed hazardous events were known.
Several events during the driving session elicited stress or emotional responses of varying intensity. As the illumination within the simulator environment does not change as rapidly and intensely as during actual on-road driving, we can expect these events to have a major impact on pupil dilation. A stress response results in rapid pupil dilation, followed by a gradual return to normal size. This return is often oscillatory and contains several waves of decreasing amplitude; the more significant the event, the longer the return phase (Andreassi, 2000). To discriminate between possible causes of a pupil dilation, we perform a scale analysis of the time series: wavelet analysis.
The wavelet transform decomposes a signal into wavelets (i.e., small waves with their energy concentrated in time). These wavelets are scaled and shifted copies of a basic pattern, called the mother wavelet. In a multiresolution representation, the signal is decomposed into increasingly finer details based on wavelet and scaling functions, which correspond to a high-pass and a low-pass filter, respectively. This yields precise time information at high frequencies and precise frequency information at low frequencies. The filters are applied successively to the signal, each followed by downsampling by a factor of 2 (Figure 4). The maximum level of decomposition depends on the relevant time scale of the events under consideration (Kaiser, 2010; Mallat, 1998).
We can separate events at different levels of the arousal scale by partially reconstructing the signal from only the frequency sub-band that corresponds to the respective arousal level.
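The filter bank described above (low-pass and high-pass filtering, each followed by downsampling by 2) and the partial reconstruction from a single sub-band can be sketched with the Haar wavelet, whose filters reduce to scaled pairwise sums and differences. Haar is a stand-in for the higher-order Daubechies wavelets discussed in this section, chosen only to keep the example dependency-free. The sketch also assumes the signal length is a multiple of 2^level, whereas library implementations handle boundaries by padding; all function names are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt(x, level):
    """Multilevel Haar DWT; returns [A_level, D_level, ..., D1].

    Each level applies a low-pass (pairwise sum) and a high-pass (pairwise
    difference) filter, both followed by downsampling by 2.
    Assumes len(x) is divisible by 2**level.
    """
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(level):
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / SQRT2)   # high-pass + downsample
        a = (even + odd) / SQRT2               # low-pass + downsample
    return [a] + details[::-1]

def haar_idwt(coeffs):
    """Inverse of haar_dwt (upsample and recombine, coarsest level first)."""
    a = coeffs[0]
    for d in coeffs[1:]:
        out = np.empty(2 * len(a))
        out[0::2] = (a + d) / SQRT2
        out[1::2] = (a - d) / SQRT2
        a = out
    return a

def reconstruct_band(x, level, keep):
    """Partial reconstruction from one sub-band only.

    keep = "A" keeps the approximation A_level; keep = j keeps detail D_j.
    """
    coeffs = haar_dwt(x, level)
    kept = []
    for k, c in enumerate(coeffs):
        name = "A" if k == 0 else level - k + 1   # coeffs[1] is D_level, coeffs[-1] is D1
        kept.append(c if name == keep else np.zeros_like(c))
    return haar_idwt(kept)
```

Because the transform is linear and orthogonal, the partial reconstructions of A4 and D1-D4 add up exactly to the original signal, so each sub-band can be analyzed in isolation without losing information.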
For our purpose, we chose to decompose and reconstruct the signal accurately at a time scale of 1-2 s. The pupil can react to stimuli within 200-350 ms and reaches its peak response after 500-1000 ms (Privitera et al., 2010). For the 25 Hz sampling rate of our eye tracker, this corresponds to the fourth decomposition level.
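The link between sampling rate and decomposition level can be checked with the nominal dyadic pass bands: detail level j covers roughly fs/2^(j+1) to fs/2^j, and the level-L approximation covers frequencies below fs/2^(L+1). These are idealized band edges (real wavelet filters overlap), and the helper name is hypothetical.

```python
def dyadic_bands(fs, levels):
    """Nominal pass bands (Hz) of detail levels D1..D_levels and of A_levels."""
    bands = {f"D{j}": (fs / 2 ** (j + 1), fs / 2 ** j) for j in range(1, levels + 1)}
    bands[f"A{levels}"] = (0.0, fs / 2 ** (levels + 1))
    return bands

for name, (lo, hi) in dyadic_bands(25.0, 4).items():
    print(f"{name}: {lo:.3f}-{hi:.3f} Hz")
```

At 25 Hz, D4 nominally spans about 0.78-1.56 Hz (periods of roughly 0.64-1.28 s) and A4 everything slower, so a level-4 decomposition places the boundary between detail and approximation within the 1-2 s time scale of interest.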
It is important to select a wavelet that matches the shape and frequency characteristics of the signal component we want to separate. Daubechies wavelets have the highest number of vanishing moments for a given support width, so that most wavelet coefficients of smooth, locally polynomial signal segments are small or zero; this makes them well suited for matching smooth polynomial features in a given signal (Daubechies, 1992).
Classification of Pupil Size Peaks
To discriminate peaks caused by noise and ambient illumination changes during the drive from pupil responses to hazardous events, we used a support vector machine (SVM) with a radial basis function (RBF) kernel. The feature vectors used to train the SVM comprised the amplitude, the mean diameter, the area of the approximation coefficient A4, and the relative wavelet energy of the detail coefficients D1-D4 (Figure 4). The SVM learns which of these features and their interactions help to distinguish between the different kinds of peaks. Such a machine learning approach is sensitive to unbalanced data. In our case, the relatively large number of peaks occurring during normal driving (which we want to classify as noise) would result in a relatively high classification accuracy even if the SVM simply classified every peak as noise, thereby neglecting the few hazardous events. Therefore, we balanced the number of feature vectors for each class by oversampling the minority class (i.e., the hazardous events). We trained and tested the SVM using leave-one-out cross-validation at the subject level: the classification accuracy was evaluated separately for each subject, using only training data from the other subjects. This evaluation procedure is almost unbiased and gives a good indication of the cross-subject generalization performance (Elisseeff & Pontil, 2003), while making good use of our limited training and test data. It should be noted, however, that the selection of candidate peaks and the construction of the feature vector involve subject-specific adaptation, such as the subject's average pupil diameter and its distribution.
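The balancing and subject-wise evaluation can be sketched independently of the classifier. The sketch assumes random duplication of minority-class rows (the text does not specify the oversampling scheme) and applies it only to the training folds, so that duplicated rows never appear in the held-out subject. Function names are hypothetical; `make_clf` is any factory returning an object with `fit` and `predict` methods.

```python
import numpy as np

def oversample_minority(X, y, rng):
    """Randomly duplicate minority-class rows until both classes are equally frequent."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    idx_min = np.flatnonzero(y == minority)
    extra = rng.choice(idx_min, size=counts.max() - counts.min(), replace=True)
    keep = np.concatenate([np.arange(len(y)), extra])
    return X[keep], y[keep]

def leave_one_subject_out(X, y, subjects, make_clf, seed=0):
    """Per-subject accuracy when training only on the other subjects.

    make_clf: zero-argument factory returning an object with fit(X, y) and
    predict(X), e.g. an RBF-kernel SVM. Oversampling is applied to the
    training folds only (hypothetical design choice to avoid leakage).
    """
    rng = np.random.default_rng(seed)
    accuracy = {}
    for s in np.unique(subjects):
        test = subjects == s
        X_tr, y_tr = oversample_minority(X[~test], y[~test], rng)
        clf = make_clf()
        clf.fit(X_tr, y_tr)
        accuracy[s] = float(np.mean(clf.predict(X[test]) == y[test]))
    return accuracy
```

With scikit-learn installed, an RBF-kernel SVM could be plugged in as `make_clf=lambda: SVC(kernel="rbf")` (from `sklearn.svm`), and `sklearn.model_selection.LeaveOneGroupOut` provides the same subject-wise splitting. Since RBF kernels depend on distances, features such as the mean diameter and the relative energies would typically be standardized first.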
Results
Figure 5 shows the detailed classification results for each subject. A white circle indicates a change in pupil diameter that the classifier judged relevant; a black circle indicates that no such change was detected. The surrounding square indicates whether the driving instructor judged the driving response as adequate or not. Both markers have to be considered in conjunction. For example, a black square with a black circle indicates a situation that the driver did not perceive and, consequently, did not react to. A white square with a white circle corresponds to a hazard that the subject responded to adequately and that caused a pupil dilation.
Figure 6 shows the ROC curves for the classification, separately for the patient and the control group. As there were very few inadequate driving responses in the control group, we can expect the curves to differ even if the visual field defect has no effect on the pupil diameter.
Such a prediction assumes that a hazard to which the driver reacted was perceived and, vice versa, that a hazard that was not reacted to adequately was overlooked by the driver. From previous analyses of vital parameter data, we know that this was not always the case; e.g., some drivers responded inadequately to a hazard they had perceived. Therefore, we cannot expect a perfectly reliable classification result. For our analysis, we decided to predict as many of the hazardous situations from the pupil data as possible and allowed for a moderate number of false positives (i.e., in case of doubt, we judge in favor of hazard perception). The numerical classification results are provided in Table 1.
When we analyze the situations that led to a failure of the driving test, we can now distinguish between a perceptual failure and a behavioral failure: Subject PH11 fails the first situation without perceiving the hazard. The same subject also fails the sixth situation, but this time perceived the hazard, as indicated by a pupil diameter change. This might be due to a general awareness of a dangerous situation without knowledge of the hazard's exact location. Equally interesting, we can infer that PH07 showed adequate driving behavior in the first and seventh situations, even though the hazardous object was likely not perceived. Including such events in the evaluation of driving performance will allow us to better judge driving safety, also for subjects with a more defensive driving style, whose perceptual deficit would otherwise only become obvious as a driving test failure after extensive testing.
Table 2 gives some insight into the false positives that influence the ROC curves and classification results. We can observe that the classification step performs well in retaining only a few events out of many candidates (e.g., from 86 to 7 for subject CG66). It returns an average of 8±8 false positives, i.e., pupil size peaks classified as a stress response to a hazard that were not associated with one of the predefined hazardous situations. Without the classifier, an average of 50 false peaks per drive would be reported. For the task at hand, we aimed at predicting hazard perception at the predefined hazardous situations. It is possible that some subjects were very careful at several other situations along the route that looked like potential hazards and therefore showed valid additional stress responses that we (wrongly) count as false positives. In order to decide in favor of hazard perception, we accepted a relatively high number of false positives along the complete drive.
Discussion
Hazard perception involves the input of sensory information and its subsequent cognitive processing. These processes result in the identification of potentially dangerous traffic situations. Only the combined process of seeing and identifying a hazard will lead to a stress response. We found that pupil dilation can be used to disambiguate hazard perception and adequate driving reaction in a simulated driving scenario.
We employed a filtering and classification cascade that is able to identify sudden stress responses from the pupil data. We aimed at correctly detecting as many of the hazardous situations as possible from the pupil diameter alone while trying to minimize false positives. This allows us to determine whether the subject likely perceived a hazard. Because of the number of false positives during the drive, the cascade could not be used as a stand-alone detection system for hazardous situations, e.g., to trigger assistance systems. It only indicates whether the driver perceived such an event once it has occurred. We designed the hazard situations to be easily overlooked by the driver and to resemble a looming emergency; they are therefore very attention-arousing and stress-inducing. For less challenging scenarios, in which the driver can detect hazardous objects earlier and sufficient reaction time is available, no stress signals would be expected, and eye-tracking measures would then be sufficient.
Pupil dilation events were more often absent in situations that were relatively difficult (e.g., 1, 2, and 6, where subjects actually failed the driving test or had little reaction time available). This indicates that careful, prospective driving behavior may have resulted in a less intense experience of the hazardous situation for some drivers, or that they were simply lucky to pass the situation. We further found a large individual variation in the number of predicted stress peaks per subject, likely associated with the driver's level of engagement and emotional arousal during the test scenario.
We showed that pupil dilation events coincide with the detection, recognition, and reaction to potentially dangerous events while driving. They indicate the moment at which a potentially dangerous event becomes relevant to awareness. Furthermore, pupil dynamics can reflect the ambiguity of perception and the unexpected uncertainty that plays an important role in detecting and recognizing unexpected dangerous events (Einhäuser, Stout, Koch, & Carter, 2008; Nassar et al., 2012).
As brightness within a simulated world (road surface, sky, vegetation, etc.) varies only by about ±5% around the average brightness (Palinko, Kun, Shyrokov, & Heeman, 2010), we cannot currently conclude whether, and to what extent, these findings hold for on-road driving. Nevertheless, the indicator may be useful for studies that require a precise distinction between hazard perception and the behavioral driving response without requiring unnatural behavior such as pressing a button upon detection. The approach may also be used to assess whether the design of a simulator track allows timely detection of the planned hazard scenarios.