Article

Wearable Driver Distraction Identification On-The-Road via Continuous Decomposition of Galvanic Skin Responses

by
Omid Dehzangi
*,†,‡,
Vikas Rajendra
and
Mojtaba Taherisadr
Computer and Information Science Department, University of Michigan-Dearborn, Dearborn, MI 48128, USA
*
Author to whom correspondence should be addressed.
Current address: University of Michigan-Dearborn, 4901 Evergreen Road, Dearborn, MI 48128, USA.
These authors contributed equally to this work.
Sensors 2018, 18(2), 503; https://doi.org/10.3390/s18020503
Submission received: 4 December 2017 / Revised: 21 January 2018 / Accepted: 30 January 2018 / Published: 7 February 2018
(This article belongs to the Special Issue Sensing, Data Analysis and Platforms for Ubiquitous Intelligence)

Abstract:
One of the main reasons for fatal accidents on the road is distracted driving. The continuous attention of the individual driver is a necessity for the task of driving. While driving, certain levels of distraction can cause drivers to lose their attention, which might lead to an accident. Thus, the number of accidents can be reduced by early detection of distraction. Many studies have been conducted to automatically detect driver distraction. Although camera-based techniques have been successfully employed to characterize driver distraction, the risk of privacy violation is high. On the other hand, physiological signals have been shown to be a privacy-preserving and reliable indicator of driver state, although the acquisition technology can be intrusive to drivers in practical implementations. In this study, we investigate a continuous measure of phasic Galvanic Skin Responses (GSR), acquired with a wristband wearable, to identify driver distraction during an on-the-road driving experiment. We first decompose the raw GSR signal into its phasic and tonic components using Continuous Decomposition Analysis (CDA), and the continuous phasic component, which contains the relevant characteristics of the skin conductance signal, is retained for further analysis. We generated a high-resolution spectro-temporal transformation of the GSR signals for non-distracted and distracted (calling and texting) scenarios to visualize the behavior of the decomposed phasic GSR signal in correlation with the distracted scenarios. Guided by the spectrogram observations, we extracted relevant spectral and temporal features to capture the patterns associated with the distracted scenarios at the physiological level. We then performed feature selection using support vector machine recursive feature elimination (SVM-RFE) in order to: (1) generate a ranking of the distinguishing features across the subject population, and (2) create a reduced feature subset toward more efficient distraction identification on the edge in the generalization phase. We employed support vector machines (SVM) to generate 10-fold cross-validation (10-CV) identification performance measures. Our experimental results demonstrated a cross-validation accuracy of 94.81% using all the features and an accuracy of 93.01% using the reduced feature space. The SVM-RFE selected feature set incurred a marginal decrease in accuracy while reducing the redundancy in the input feature space toward the shorter response time necessary for early notification of the driver's distracted state.

1. Introduction

According to the National Highway Traffic Safety Administration (NHTSA), road fatalities increased by 5.6% from calendar year 2015 to 2016, with 37,461 lives lost on U.S. roads in 2016. A major contributing factor in fatal accidents on the roadway is distracted driving. Continuous focus of the driver is a necessity for driving. Various research studies have shown that driver attention degrades during multitasking, manifesting as slower reaction times, decreased situational awareness, impaired judgment, and narrowed visual scanning [1]. Distraction occurs when drivers divert their attention from the task of driving to a secondary activity such as having a phone conversation, texting, or using the infotainment system [2]. The most common distracting secondary task during driving is the use of a personal cell phone for either calling or texting. It is crucial to detect and notify driver distraction at its early stages in order to minimize the risk of road accidents. Many research investigations have been conducted to develop reliable feedback systems that alert drivers to distraction scenarios. Many of the previous works employed techniques based on eyelid closure and eye movement tracking [3], lane tracking [4], or image processing of periodically captured video of the driver [5] to identify the driver's inattentive state. Even though the above methods achieved successful performance, they suffer from issues such as privacy violation risks and delayed detection, since they respond only once the effect of distraction is visually noticeable. Those limitations can be overcome via continuous monitoring of physiological signals such as the Electroencephalogram (EEG) rather than cameras. EEG-based systems generate state-of-the-art results and are comprehensive and reliable [6]. However, the complexity of the setup for collecting and analyzing the data is one of the major limitations of EEG, which makes such systems expensive and intrusive to implement [7,8]. Galvanic Skin Response (GSR), on the other hand, is a minimally intrusive modality that can be sensed on the wrist and fingers and can be recorded easily [9,10]. GSR, also known as skin conductance (SC), is one of the most sensitive markers of emotional arousal [11]. GSR measures the body's unconscious response to different stimuli through changes in skin conductance. Emotional stimulation triggers changes in skin conductance in the hand and foot regions [11,12], and higher skin conductance corresponds to more intense arousal. Skin conductance is controlled autonomously by sympathetic activity, which drives human behavior and cognitive and emotional states at a subconscious level.
Several investigations on synchronously recorded GSR signals have been conducted to inspect the impact of cognitive state changes. In [12], the authors used GSR as an index of cognitive load to evaluate users' stress due to workload while performing reading and arithmetic tasks. Temporal and spectral features were explored, and the authors concluded that spectral features were more promising for measuring cognitive workload than temporal features. In [13], a method for analyzing skin conductance (SC) using the Short-Time Fourier Transform (STFT) was employed to estimate mental workload with a temporal bandwidth high enough to be useful for augmented-cognition applications. Graphical analysis of the STFT showed a notable increase in the power spectrum across a range of frequencies directly following fault events. GSR was used in [14] for emotion recognition by extracting time-domain and wavelet-based features over various window lengths; a random forest classifier characterized valence and arousal satisfactorily. In [15], a system for human emotion recognition that automatically selects GSR features was proposed. Thirty features were extracted, and a covariance-based feature selection was implemented to obtain an optimized feature set that better characterizes human emotions. A support vector machine (SVM) was used for emotion recognition with an accuracy above 66.67%. The above-mentioned works provide sufficient support to consider GSR a reliable measure for identifying and characterizing mental workload. However, very few investigations have used GSR to detect cognitive workload or distraction during naturalistic driving. The authors in Ref. [16] used physiological signals such as electrocardiogram, galvanic skin response, and respiration to develop a novel system for stress detection during naturalistic driving. Features were extracted from the time, spectral, and wavelet domains over 10-s intervals of data, and stress was detected using kernel-based classifiers. That study provided satisfactory results supporting the use of physiological measures in in-vehicle intelligent systems that assist drivers on the road through early detection of stress. However, the raw GSR signal was used in combination with other physiological measures, and only time-domain features of the raw GSR were explored. In our previous work [10], we considered raw GSR signals for a preliminary analysis of driver distraction during a naturalistic driving experiment. We focused solely on two scenarios: (i) normal driving (non-distracted state) and (ii) driving while having an engaging phone conversation (distracted state). We then extracted some standard statistical measures and used a binary SVM for distraction detection. Our aim was to analyze the discriminative power of the raw GSR space between normal and distracted driver states. We evaluated the detection model on six subjects and achieved an average detection accuracy of 91%.
Our aim in this paper is to design a system that identifies the impact of the secondary tasks of calling and texting on drivers using a continuous measure of the phasic GSR signal during a naturalistic driving experiment. In our experiments, we collected GSR with a wristband wearable from a population of 10 driver subjects during real driving experiments. Three scenarios were investigated: (1) normal driving, focusing attention on the primary task of driving; (2) phone-distracted driving, while having an engaging phone conversation; and (3) text-distracted driving, while writing and sending texts. We hypothesize that calling acts as a cognitive distraction element, in comparison to texting, which represents cognitive and visual distraction at the same time. We aim to evaluate GSR toward identification of distraction on the edge using short-term segments of GSR. The collected GSR data were decomposed into phasic and tonic components using continuous decomposition analysis (CDA) [17]. We then conducted a high-resolution spectro-temporal analysis of the decomposed signals, and the continuous phasic component of GSR, which contains the most discriminative information, was considered for subsequent analysis. We then extracted several spectral and temporal measures that characterize the phasic GSR signal in correlation with the distracted scenarios. We employed linear and kernel-based Support Vector Machines (SVM) and 10-fold cross validation (10-CV) to generate identification results. Upon evaluating the results, phasic GSR showed promise as a reliable indicator of driver distraction, achieving an overall average accuracy of 94.81% in identifying distraction elements under naturalistic driving conditions. Since the input feature space is constructed manually, its redundancy and computational complexity might degrade the accuracy and response time of distraction identification in the generalization phase. Therefore, we employed support vector machine recursive feature elimination (SVM-RFE) [18] to remove the redundancies for more efficient processing on the edge. We employed SVM-RFE to: (1) generate a ranking of the discriminative features for the subject population, and (2) create a reduced feature subset with the highest distraction identification accuracy. Our experimental results using SVM-RFE demonstrated a marginal decrease in accuracy while reducing the computational complexity and the redundancy in the input space toward early notification of the distracted state to the driver.

2. Materials and Methods

Figure 1 depicts a flowchart of the proposed driver monitoring and intervention system on the edge. After preprocessing, the recorded raw GSR signal was decomposed into its phasic and tonic components using continuous decomposition analysis (CDA). Based on spectro-temporal analysis and characterization of the phasic GSR, we designed and developed the segmentation and feature extraction modules. The extracted features were then used for the identification task. Furthermore, feature selection was implemented using SVM-RFE to reduce the dimension of the feature space, alleviate the curse of dimensionality, and improve the response time. In this section, we discuss the implementation of each step in detail.

2.1. Data Acquisition and Preprocessing

We developed a custom-designed wearable data acquisition platform comprising a synchronized multi-modal solution that acquires physiological signals through a comprehensive wearable sensor network [19]. Our platform is capable of collecting large amounts of heterogeneous physiological signals from drivers, including the Galvanic Skin Response (GSR), during naturalistic driving.
Experiments conducted in this study were approved by the Institutional Review Board (IRB) of the University of Michigan with the Submission ID: HUM00102869. The driver participants were given a consent form to sign before the experiments, in which the nature of the data collection was described. The experiments were conducted under the constant supervision of two research investigators to ensure a reliable data acquisition process.
A total of 10 subjects between 20 and 40 years of age, all legally permitted to drive, participated in our experiment. Only healthy male subjects were considered in order to rule out inconsistency. The subjects were instructed to abstain from any alcoholic beverages or pharmaceuticals that could impair their alertness during the investigation. Figure 2a depicts a driver subject while conducting our naturalistic driving experiment. Three driving scenarios were considered in our experiments, each of which was performed by the subjects for ≈2 min while their corresponding GSR signals were collected. The three driving scenarios were (i) driving under normal conditions, (ii) driving while engaging in a phone conversation, and (iii) driving while using the phone for texting. Normal (non-distracted) driving is represented by Scenario (i), while distracted driving is represented by Scenarios (ii) and (iii). Figure 3 illustrates the data collection order during our driving experiment. We applied a 10th-order Butterworth low-pass filter with a 20 Hz cutoff to the recorded raw GSR to remove artifacts such as high-frequency noise, motion artifacts, and electromyography (EMG) artifacts that might arise from hand and finger movement during the phone and texting sessions.
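As a concrete illustration of this preprocessing step, the filtering can be sketched as follows, assuming the 50 Hz sampling rate mentioned in Section 2.3; the use of SciPy and zero-phase filtering are our own choices rather than details reported for the original implementation.

```python
# Sketch of the preprocessing step above: a 10th-order Butterworth low-pass
# filter with a 20 Hz cutoff. The 50 Hz sampling rate matches Section 2.3;
# SciPy and zero-phase filtering are our own (assumed) implementation choices.
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 50.0      # GSR sampling rate in Hz (as stated in Section 2.3)
CUTOFF = 20.0  # low-pass cutoff in Hz
ORDER = 10     # Butterworth filter order

def preprocess_gsr(raw_gsr: np.ndarray) -> np.ndarray:
    """Low-pass filter raw GSR to suppress high-frequency, motion, and EMG artifacts."""
    sos = butter(ORDER, CUTOFF, btype="low", fs=FS, output="sos")
    return sosfiltfilt(sos, raw_gsr)  # zero-phase filtering avoids phase distortion

# Example with synthetic data:
if __name__ == "__main__":
    t = np.arange(0, 120, 1 / FS)  # a ~2 min recording
    raw = 2 + 0.3 * np.sin(2 * np.pi * 0.05 * t) + 0.05 * np.random.randn(t.size)
    print(preprocess_gsr(raw).shape)
```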

2.2. Continuous Decomposition Analysis (CDA)

Skin conductance data can be described as a superposition of subsequent skin conductance responses (SCRs). Due to this characteristic of the SCRs, calculating the actual response of sympathetic activity to an external stimulus becomes tedious. This limitation is overcome by a deconvolution technique that separates the skin conductance (SC) data into continuous phasic and tonic activities. The tonic activity might include noise and shows subject dependencies; it can be observed as a trend in the original SC signal in the top subplot of Figure 4. The phasic activity of the SC signal, on the other hand, is retained for further investigation, as this component contains the actual response to any event-related sympathetic activity, predominantly in the form of distinct bursts of peaks over a zero baseline. The phasic component of SC also demonstrated trends in the spectro-temporal space in correlation with the distracted scenarios.
Extracting the phasic component is done in three steps: deconvolution of the galvanic skin response (GSR) data, computation of the tonic activity, and computation of the phasic activity. A particular change in skin conductivity is triggered by the secretion of sweat due to activity of the sudomotor nerve. In mathematical terms, the sudomotor nerve activity can be treated as a driver containing a sequence of distinct impulses/bursts, each of which triggers a particular impulse response (i.e., an SCR). The outcome of this procedure can be described by the convolution of the driver with an impulse response function (IRF), which characterizes the shape of the impulse response over time [17]:
$$SC_{\mathrm{phasic}} = Driver_{\mathrm{phasic}} \ast IRF. \qquad (1)$$
The phasic activity is superimposed on a gradually changing tonic activity. Hence, the SC activity can be considered to be composed as follows:
$$SC = SC_{\mathrm{tonic}} + SC_{\mathrm{phasic}} = SC_{\mathrm{tonic}} + Driver_{\mathrm{phasic}} \ast IRF. \qquad (2)$$
The tonic activity can also be considered the convolution of a driver function with the same IRF. The SC data can then be written as:
$$SC = (Driver_{\mathrm{phasic}} + Driver_{\mathrm{tonic}}) \ast IRF. \qquad (3)$$
Deconvolution is the reverse process of convolution. Deconvolving the skin conductance data yields a driver that incorporates a phasic and a tonic fraction; by estimating one of them, the other can be determined easily:
$$SC \oslash IRF = Driver_{SC} = Driver_{\mathrm{phasic}} + Driver_{\mathrm{tonic}}, \qquad (4)$$
where $\oslash$ denotes deconvolution.
Tonic electrodermal activity can be observed in the absence of any phasic activity. However, SCRs (representing phasic SC activity) have a slowly recovering tail that may obscure the tonic SC activity. For the driver, the time constant of the phasic responses is markedly reduced, and so is their overlap; the time intervals between distinct phasic impulses can then be used to estimate the tonic activity. Convolution can be conceived as a smoothing operation; deconvolution has the reverse effect and amplifies noise. Therefore, the resulting driver is smoothed by convolution with a Gaussian window. According to Equation (4), the phasic driver can now be computed by subtracting the tonic driver from the total driver signal. This subtraction results in a signal with a virtually zero baseline and positive deflections, reflecting the time-constrained nature of the phasic activity underlying the original SC data. The above methodology is derived from previous work [17]. Figure 4 and Figure 5 show the deconvolution/decomposition of the skin conductance signal for the normal and distracted scenarios, respectively. The topmost subplot in both figures shows the original SC signal, the middle subplot shows the deconvolved tonic driver, and the bottom subplot shows the phasic driver of the SC signal, which contains the most discriminating component for characterizing distraction. It can be observed from Figure 4 that, for the normal scenario, the phasic driver does not show impulse bursts (bursts of consecutive peaks), as the subject is not under heavy workload. In Figure 5, for the distracted scenario, more impulse bursts are observed, as the subject is in a distracted state undergoing cognitive and visual distraction.
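The decomposition in this study follows the CDA method of [17]; purely as an illustration of the deconvolution idea described above, a minimal sketch is given below. The Bateman-shaped IRF, its time constants, the regularized FFT deconvolution, and the Gaussian smoothing widths are all assumed values, not the parameters of [17].

```python
# Minimal sketch of deconvolution-based phasic/tonic separation (illustration
# only; not the CDA implementation of [17]). The Bateman IRF constants, the
# regularized FFT deconvolution, and the smoothing widths are assumed.
import numpy as np
from scipy.ndimage import gaussian_filter1d

FS = 50.0  # sampling rate in Hz (see Section 2.3)

def bateman_irf(tau1=0.75, tau2=2.0, duration=10.0, fs=FS):
    """Bateman-shaped impulse response function modelling a single SCR."""
    t = np.arange(0, duration, 1 / fs)
    irf = np.exp(-t / tau2) - np.exp(-t / tau1)
    return irf / irf.sum()

def decompose_sc(sc: np.ndarray, fs=FS):
    """Estimate the driver via deconvolution (Eq. (4)), then split it into phasic/tonic parts."""
    irf = bateman_irf(fs=fs)
    n = len(sc) + len(irf) - 1
    spec = np.fft.rfft(sc, n) / (np.fft.rfft(irf, n) + 1e-8)  # regularized deconvolution
    driver = np.fft.irfft(spec, n)[: len(sc)]
    driver = gaussian_filter1d(driver, sigma=0.2 * fs)         # smooth amplified noise
    tonic_driver = gaussian_filter1d(driver, sigma=4.0 * fs)   # slow trend between bursts
    phasic_driver = np.clip(driver - tonic_driver, 0.0, None)  # near-zero baseline, positive deflections
    return phasic_driver, tonic_driver
```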

2.3. Spectral Analysis of Phasic Skin Conductance

Phasic SC is a non-stationary, multi-component signal whose frequency content varies over time; it therefore needs to be analyzed in both the temporal and spectral domains to obtain a comprehensive evaluation of its characteristics. In order to investigate the spectro-temporal characteristics of the phasic SC signals related to the different driver states, we conducted a Time-Frequency (TF) analysis and designed a high-resolution TF representation. A TF representation is suitable for non-stationary and multi-component signals, as it describes the energy distribution of a given signal over the time and frequency spaces simultaneously. The TF representation reveals the beginning and end times of the different components of the signal as well as their frequency extent. To obtain the TF representation of the phasic SC signal, we use the Wigner–Ville Distribution (WVD) [20] approach, attempting to attenuate the unwanted cross-terms in the TF space relative to the signal components. Figure 6 depicts the TF representation for the three scenarios: normal, phone, and text. Figure 6a–c represent the normal, phone, and text scenarios, respectively, where the top panel in each sub-figure shows the raw phasic SC signal and the left panel depicts the energy spectral density of a subject. The spectrogram represents time on the x-axis and normalized frequency on the y-axis, and color indicates the power of each TF sample. Based on the Nyquist criterion, with a sampling frequency of 50 Hz, the frequency range is 0–25 Hz, which was normalized to 0–1. Observing the spectrograms in Figure 6, we could identify considerably higher-frequency components and peaks during the distracted scenarios than during the normal scenario.
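For illustration, the kind of spectro-temporal view described above can be approximated with a plain STFT spectrogram as sketched below; the WVD-based representation used in the paper is not reproduced here, and the window length, overlap, and dB scaling are assumptions.

```python
# Illustrative spectro-temporal view of the phasic driver. The paper uses the
# Wigner-Ville Distribution; this sketch substitutes a plain STFT spectrogram,
# and the window length, overlap, and dB scaling are assumed values.
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import spectrogram

FS = 50.0  # sampling rate in Hz

def plot_phasic_spectrogram(phasic: np.ndarray, fs=FS):
    f, t, sxx = spectrogram(phasic, fs=fs, nperseg=128, noverlap=96)
    plt.pcolormesh(t, f / (fs / 2), 10 * np.log10(sxx + 1e-12), shading="auto")
    plt.xlabel("Time (s)")
    plt.ylabel("Normalized frequency")
    plt.colorbar(label="Power (dB)")
    plt.show()
```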
Based on our observations in the TF space, we designed a signal processing recipe for driver distraction identification on the edge for proactive monitoring and intervention. In the following sections, we describe the implementation of segmentation, feature extraction in the spectral and temporal domains, feature selection using SVM-RFE, and identification using linear and kernel-based SVM.

2.4. Segmentation and Window Analysis

To meet the short response time requirement of the proposed driver monitoring and intervention system on the edge, we implemented a segmentation method that extracts 5 s windows with 4 s overlap, as sketched below.
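A minimal sketch of this segmentation follows; the 50 Hz sampling rate (matching Section 2.3) and the NumPy-based implementation are assumptions for illustration.

```python
# Sketch of the 5 s / 4 s-overlap segmentation described above
# (50 Hz sampling rate assumed, matching Section 2.3).
import numpy as np

FS = 50.0

def segment(signal: np.ndarray, win_s=5.0, overlap_s=4.0, fs=FS) -> np.ndarray:
    """Return overlapping windows as rows of a 2D array (one row per 5 s segment)."""
    win = int(win_s * fs)
    step = int((win_s - overlap_s) * fs)  # 1 s hop => 4 s overlap
    starts = range(0, len(signal) - win + 1, step)
    return np.stack([signal[s:s + win] for s in starts])
```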

2.5. Feature Extraction

Several spectral and temporal features were extracted from every window based on the analysis of the generated spectrograms. The calculated features were labeled accordingly, and our feature space was generated using these sample data points. The extracted features, listed in Table 1, are explained below; a minimal extraction sketch for several of them follows the list:
*
Mean (1): the sum of all data points divided by the number of data points in the window.
*
Variance (2): the average squared distance from the mean.
*
Accumulated GSR (3): Summation of GSR values in a window over total task time [11,12].
*
Maximum GSR value (4): Maximum phasic GSR value in a window.
*
Power (5): the sum of the absolute squares of a signal's time-domain samples divided by the signal length or, equivalently, the square of its root mean square (RMS) level.
*
Number of peaks based on the second derivative (6): the first and second derivatives, $Q_0(i)$ and $Q_1(i)$, of the phasic GSR signal $P(i)$ are calculated as in [21]:
$$Q_0(i) = |P(i+1) - P(i-1)|,$$
$$Q_1(i) = |P(i+2) - 2P(i) + P(i-2)|.$$
These two arrays are scaled and then summed:
$$Q_2(i) = 1.3\,Q_0(i) + 1.1\,Q_1(i).$$
This array is scanned until the threshold is met or exceeded:
$$Q_2(j) \ge 1.0.$$
*
Summation of peak amplitudes (7): the amplitudes of the peaks detected in the window (as described above) are summed.
*
Short-Time Fourier Transform (STFT) (8–11): the STFT is a technique often used to analyze physiological signals, allowing a signal to be examined in both the frequency and time domains. The original signal is divided into equal segments, and the Fourier Transform is applied independently to each segment. In [13], STFT analysis of SC data was effective in detecting workload. Based on our spectro-temporal analysis of the phasic skin response, we found that most of the activity changes occur in the range of 0 to 50 Hz; therefore, we extracted four STFT coefficients, each representing a sub-band of 12.5 Hz bandwidth between 0 and 50 Hz.
*
Fractal Dimensions (FD) (12–13): a physiological signal's chaotic or fractal nature is estimated using the fractal dimension [22]. It is a powerful mathematical tool for modeling various complex physiological signals and an index for quantifying the complexity of a fractal pattern, and it is commonly used to characterize time-series data. The fractal dimension was extracted using the Higuchi and Katz methods.
*
Auto-Regressive (AR) (14–18): an AR model of order p computes the present output as a linear combination of the past p output values plus a noise term. The weights on the previous p outputs are computed such that the mean squared prediction error is minimized. The model with current output value $y(n)$ and zero-mean white noise input $x(n)$ is:
$$y(n) = \sum_{k=1}^{p} a(k)\, y(n-k) + x(n).$$
In this study, the p AR coefficients were used as features for each window. The order p was set to 5, which generated five AR features.
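As referenced above, a minimal per-window extraction sketch for several of these features is given below. Python/NumPy is assumed; the peak count follows the $Q_0$–$Q_2$ equations, while the equal STFT sub-bands, the least-squares AR fit, and all helper names are our own simplifications rather than the original implementation, and the fractal-dimension features (12–13) are omitted for brevity.

```python
# Sketch of per-window extraction of several of the features above (Python/NumPy
# assumed). The peak count follows the Q0-Q2 equations; the equal STFT sub-bands,
# the least-squares AR fit, and all helper names are our own simplifications.
import numpy as np

def peak_count_second_derivative(p: np.ndarray, thr: float = 1.0) -> int:
    """Count peaks via the scaled first/second-derivative criterion (feature 6)."""
    i = np.arange(2, len(p) - 2)
    q0 = np.abs(p[i + 1] - p[i - 1])
    q1 = np.abs(p[i + 2] - 2 * p[i] + p[i - 2])
    q2 = 1.3 * q0 + 1.1 * q1
    above = q2 >= thr
    return int(np.sum(np.diff(above.astype(int)) == 1))  # rising edges of threshold crossings

def stft_band_energies(p: np.ndarray, n_bands: int = 4) -> list:
    """Average spectral magnitude in n_bands equal sub-bands up to Nyquist (features 8-11)."""
    mag = np.abs(np.fft.rfft(p * np.hanning(len(p))))
    return [band.mean() for band in np.array_split(mag, n_bands)]

def ar_coefficients(y: np.ndarray, order: int = 5) -> np.ndarray:
    """Least-squares AR(order) coefficients a(1)..a(order) (features 14-18)."""
    X = np.column_stack([y[order - k: len(y) - k] for k in range(1, order + 1)])
    coeffs, *_ = np.linalg.lstsq(X, y[order:], rcond=None)
    return coeffs

def window_features(p: np.ndarray) -> np.ndarray:
    """Assemble a feature vector for one phasic GSR window."""
    feats = [p.mean(), p.var(), p.sum(), p.max(),  # mean, variance, accumulated GSR, max
             np.mean(p ** 2),                      # power (mean squared amplitude)
             peak_count_second_derivative(p)]
    feats += stft_band_energies(p)
    feats += list(ar_coefficients(p))
    return np.asarray(feats)
```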

2.6. Feature Selection Using Support Vector Machine—Recursive Feature Elimination (SVM-RFE)

The SVM-RFE feature selection methodology, based on the support vector machine (SVM), was introduced in [23], where the authors used it to select an important subset of features. It reduces the computational time of classification and can improve classification accuracy. The basis of SVM-RFE is the iterative backward removal of features. The main steps of the feature selection are as follows [18]:
  • Input the dataset to be classified,
  • Compute the weight of each feature,
  • Remove the feature with the smallest weight to obtain a ranking of the features.
Initially, all features in the dataset are considered, and a ranking weight is computed for each of them using a linear SVM classifier. Then, the feature with the lowest ranking weight is discarded iteratively until only one feature remains. Finally, the features are listed in descending order of their ranking weights. Algorithm 1 illustrates one iteration of SVM-RFE for ranking features, in which the weight vector $\omega$, whose $i$-th component $\omega_i$ is the weight of feature $i$, is given by
$$\omega = \sum_{k} \alpha_k y_k x_k ,$$
where $\alpha$ contains the coefficients obtained by training a linear SVM on the given feature set $F$,
$$\alpha = \mathrm{SVMtrain}(X, y) ,$$
and $X$ is the input sample dataset and $y$ the corresponding class labels.
Algorithm 1: SVM-RFE for ranking features
begin
  Given the set of features F of the sample training dataset X
  Initialize the ranked list of features R = ∅
  repeat
    Train a linear SVM with the feature set F
    Calculate the weight ω_i of each feature
    for each feature f ∈ F do
      compute the ranking criterion c_i = (ω_i)^2
    end for
    Find the feature with the minimum criterion, f_min = argmin{c}
    Update R = R ∪ {f_min}; F = F \ {f_min}
  until all features are ranked
end: output R
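For reference, the same backward-elimination ranking can be reproduced with off-the-shelf tooling; the sketch below assumes scikit-learn, and X and y are hypothetical arrays of per-window features and binary scenario labels rather than the study's data.

```python
# Sketch of SVM-RFE feature ranking with scikit-learn (assumed). X and y are
# hypothetical arrays of per-window features and binary scenario labels.
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFE

def rank_features(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Rank features by recursive elimination with a linear SVM (1 = last eliminated, i.e., best)."""
    rfe = RFE(estimator=SVC(kernel="linear"), n_features_to_select=1, step=1)
    rfe.fit(X, y)
    return rfe.ranking_

# Example: indices of the seven best-ranked features
# selected = np.argsort(rank_features(X_train, y_train))[:7]
```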

2.7. Identification Task

To classify the original and the reduced (transformed) feature spaces, we employed linear and kernel-based support vector machines (SVM) with 10-fold cross validation. The SVM is a state-of-the-art discriminative learning model that aims to maximize the generalization capability of the predictor. The SVM is inherently a linear learner, but it can be extended to nonlinear problems using the kernel trick [24]. The SVM maps the input vector $x$ to a scalar output score $g(x)$,
$$g(x) = \sum_{j=1}^{N} \alpha_j y_j K(x_j, x) + c,$$
where the vectors $\{x_j \mid j = 1, \ldots, N\}$ are the support vectors, $N$ is the number of support vectors, $\alpha_j > 0$ are adjustable weights, $y_j \in \{-1, +1\}$ are the corresponding class labels, $c$ is the bias term, and $K(x_j, x)$ is the kernel function. For the 2-class classification, the class decision is made based on the sign of $g(x)$. As can be seen, the classifier is constructed from sums of the kernel function, expressed as
$$K(x_j, x) = \phi(x_j)^{T} \phi(x),$$
where $\phi(x)$ is a mapping from the input space to a possibly infinite-dimensional space. To model the nonlinear characteristics of the input data, kernel SVM functions such as the Radial Basis Function (RBF) and the polynomial kernel of degree two (Poly d = 2), also known as the quadratic kernel, were employed. In order to extend the SVM to the multi-class task at hand, we employed the one-vs.-one framework [24].
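For illustration, the classification and cross-validation step can be sketched as follows; scikit-learn is assumed, and X and y stand for a hypothetical per-subject feature matrix and scenario labels, not the study's data.

```python
# Sketch of the 10-fold cross-validated identification step with linear and
# kernel SVMs (scikit-learn assumed; X and y are a hypothetical per-subject
# feature matrix and scenario labels).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def evaluate_classifiers(X: np.ndarray, y: np.ndarray) -> dict:
    """Return mean 10-CV accuracy for linear, quadratic, and RBF kernel SVMs."""
    classifiers = {
        "linear": SVC(kernel="linear"),
        "poly d=2": SVC(kernel="poly", degree=2),
        "rbf": SVC(kernel="rbf"),
    }
    # SVC handles multi-class problems with a one-vs.-one scheme by default.
    return {name: cross_val_score(clf, X, y, cv=10).mean()
            for name, clf in classifiers.items()}
```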

3. Results and Discussion

In this section, we provide the experimental analysis and results of the proposed methodology. In order to meet the short response time required to identify distraction, the decomposed phasic GSR signal was segmented into 5 s windows with 4 s overlap, and 18 different spectral and temporal features were then extracted from each window based on our spectrogram observations. We employed linear and kernel-based SVMs on the data of each subject separately for training, and then evaluated the predictive model using 10-fold cross validation (10-CV). In 10-CV, the original dataset is partitioned into 10 equal-size subsets. Of the 10 subsets, a single subset is retained as the validation data for testing the model, and the remaining nine subsets are used as training data. The cross-validation process is then repeated 10 times (the folds), with each of the 10 subsets used exactly once as the validation data. The 10 results from the folds can then be averaged to produce a single estimate. The advantage of this method is that every observation is used for validation in exactly one of the 10 iterations without participating in the training process for that iteration.

3.1. Identification Results Using All the Features

Initially, the original 18D feature space was evaluated to estimate the identification generalization accuracy. Table 2 reports the predictive model performance using all 18 features. It demonstrates that, using all 18 features, the nonlinear polynomial d = 2 kernel SVM classifier achieved the highest average prediction accuracy of 94.81%, which shows that some critical discriminative information lies in nonlinear feature subspaces. It generated the results with an average prediction speed of 6620 observations per second and an average training time of 0.86 s. As shown in Table 3, the polynomial kernel classifier also demonstrated an average precision of 92.44%, implying a low false positive rate, a recall of 96.38%, implying an even lower false negative rate, and an f-score of 94.35% (the harmonic mean of precision and recall), all consistent with the accuracy results. It can be observed that the linear SVM, with an average accuracy of 91.94%, was the fastest classifier, achieving an average prediction speed of 7240 observations per second and an average training time of 0.75 s, at the cost of a marginal decrease of 2.87% in average accuracy. This provides promising evidence that the generated feature space was effective in identifying the inattentive state of the driver subjects even with a linear predictor.

3.2. Identification Results Using SVM-RFE Selected Feature Subset

To overcome the redundancy in the feature space and the computational complexity of the predictive learner due to the high dimensionality of the feature space, we employed the SVM-RFE feature selection method. SVM-RFE also ranks the original feature space based on each feature's correlation with the class label: it iterates through all the features, calculating the correlation between each feature and the class label, and assigns a weight based on the level of that correlation, as described in Section 2.6.
Since the feature selection performed by SVM-RFE is a binary-class technique, we considered two scenarios, namely (1) normal vs. phone and (2) normal vs. text. Scenario (1) was considered cognitive distraction, and scenario (2) was considered cognitive and visual distraction. Table 4 shows the SVM-RFE ranking for the normal vs. phone scenario, and Table 5 shows the ranking for the normal vs. text scenario. We observed slight variations in the feature ranks between Table 4 and Table 5. However, considering feature subsets of various sizes, consistently similar sets of features were selected for both distracted scenarios. In addition, by inspecting the subject-by-subject responses in Table 4 and Table 5, a high level of similarity in the feature ranks across all subjects is observed. These observations helped us pick a unified set of features that are highly relevant for the task of distraction identification and consistent across the different scenarios and subjects. The best subset of features was selected by calculating the most frequently occurring features across all subjects. We then selected the first seven most frequently occurring features, as they demonstrated higher 10-CV identification accuracies compared to other feature subset sizes. We observed that, although the order of the first seven features differs between the two scenarios, the set of the best seven features was the same for both distraction scenarios. The selected seven best features are listed below with their corresponding feature numbers from Section 2.5: (i) number of peaks in a window (6); (ii) the 2nd STFT feature (9); (iii) Katz fractal dimension (13); and (iv–vii) auto-regressive features (14, 15, 17, and 18).
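As an illustration, the frequency-based aggregation of per-subject rankings described above can be sketched as follows; counting how often each feature appears in a subject's top seven is our reading of the procedure, not the authors' exact rule.

```python
# Sketch of deriving a unified feature subset from per-subject SVM-RFE
# rankings (the aggregation rule here is an assumption for illustration).
from collections import Counter

def most_frequent_features(per_subject_rankings, subset_size=7):
    """per_subject_rankings: list of feature-number lists, best rank first."""
    counts = Counter()
    for ranking in per_subject_rankings:
        counts.update(ranking[:subset_size])
    return [feat for feat, _ in counts.most_common(subset_size)]
```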
Table 6 and Table 7 provide the identification results for the selected 7D feature subset in comparison with the results for the 18D space. From Table 6, using the subset of features selected by SVM-RFE, we achieved an average identification accuracy of 93.01% with a prediction speed of 7490 observations per second using the polynomial kernel SVM classifier. Compared to the original 18D feature space, the results show a marginal decrease of 1.8% in identification accuracy while providing a considerable improvement in prediction speed (an increase of 870 observations per second). From Table 7, we can see that the predictive model generated after reducing the feature space to 7D also demonstrated performance similar to the 18D predictive model, with precision, recall, and f-score measures of 89.61%, 95.83%, and 92.50%, respectively. This minimal decrease in identification accuracy from the original 18D feature space to 7D is a trade-off that results in improved response time for the online identification task at hand. Reducing the dimension of the feature space can also potentially alleviate the curse of dimensionality and increase the robustness of the predictive model in the generalization phase.
Table 6 shows that the nonlinear polynomial and RBF kernel SVM classifiers generated the highest average accuracies of 93.01% and 91.61%, respectively, compared to 88.35% for the linear SVM classifier. From Table 7, it can be observed that the polynomial kernel SVM demonstrates the highest average precision, recall, and f-score of 89.61%, 95.83%, and 92.50%, respectively, which is consistent with the results based on the original feature space. These observations indicate that the selected subset of features did not disrupt the discriminative capability of the original feature space: it follows similar performance trends in separating the distracted from the non-distracted state, with a minimal decrease in prediction accuracy and a considerable gain in response time.

4. Conclusions

In this study, we investigated whether a continuous measure of phasic GSR, recorded with a wristband wearable, can be used to identify the distracted driving state under naturalistic driving conditions. In contrast to other state-of-the-art driver monitoring and alerting systems that rely on intrusive physiological measures such as EEG and ECG, GSR is minimally intrusive. We evaluated GSR toward real-time identification of distraction using short-term segmented windows. We decomposed the raw GSR to obtain a continuous phasic GSR signal that contains the most discriminative characteristics for identifying driver distraction. We then generated high-resolution spectrograms of the phasic GSR signal to visualize and better understand its behavior. We extracted 18 spectral and temporal features that capture the pattern changes from normal to distracted scenarios at the physiological level. Because the feature extraction is manual, the feature set might include redundancies that increase the computational complexity and decrease the robustness of the predictor; we therefore employed the SVM-RFE technique to alleviate this limitation. SVM-RFE selected seven features, based on the assigned ranks, that best distinguish the distracted state from the non-distracted state. We employed linear and kernel-based SVMs with 10-fold cross validation (10-CV) to generate identification results on both the original 18D and the reduced 7D feature spaces. Further investigation of the results demonstrated a marginal reduction in prediction accuracy and a considerable increase in prediction speed, with a high level of consistency across the population of subjects. Our proposed driver monitoring and identification system on the edge provided evidence that GSR is a reliable indicator of driver distraction while meeting the requirement of early notification of the distracted state to the driver.

Author Contributions

Vikas Rajendra contributed to the concept of the study, data analytics, and writing and preparing the manuscript. Omid Dehzangi contributed to the concept of the study, data analytics, and writing the manuscript. Mojtaba Taherisadr contributed to data analytics and preparing the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Dawson, D.; Searle, A.K.; Paterson, J.L. Look before you (s)leep: Evaluating the use of fatigue detection technologies within a fatigue risk management system for the road transport industry. Sleep Med. Rev. 2014, 18, 141–152. [Google Scholar] [CrossRef] [PubMed]
  2. Goshvarpour, A.; Abbasi, A.; Goshvarpour, A.; Daneshvar, S. Discrimination between different emotional states based on the chaotic behavior of galvanic skin responses. Signal Image Video Process. 2017, 11, 1347–1355. [Google Scholar] [CrossRef]
  3. Metz, B.; Schömig, N.; Krüger, H.P. Attention during visual secondary tasks in driving: Adaptation to the demands of the driving task. Transp. Res. Part F Traffic Psychol. Behav. 2011, 14, 369–380. [Google Scholar] [CrossRef]
  4. Young, K.L.; Lenné, M.G.; Williamson, A.R. Sensitivity of the lane change test as a measure of in-vehicle system demand. Appl. Ergon. 2011, 42, 611–618. [Google Scholar] [CrossRef] [PubMed]
  5. Wege, C.; Will, S.; Victor, T. Eye movement and brake reactions to real world brake-capacity forward collision warnings—A naturalistic driving study. Accid. Anal. Prev. 2013, 58, 259–270. [Google Scholar] [CrossRef] [PubMed]
  6. Alizadeh, V.; Dehzangi, O. The impact of secondary tasks on drivers during naturalistic driving: Analysis of EEG dynamics. In Proceedings of the 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), Rio de Janeiro, Brazil, 1–4 November 2016; pp. 2493–2499. [Google Scholar]
  7. Wang, S.; Zhang, Y.; Wu, C.; Darvas, F.; Chaovalitwongse, W.A. Online prediction of driver distraction based on brain activity patterns. IEEE Trans. Intell. Transp. Syst. 2015, 16, 136–150. [Google Scholar] [CrossRef]
  8. Almahasneh, H.; Chooi, W.T.; Kamel, N.; Malik, A.S. Deep in thought while driving: An EEG study on drivers’ cognitive distraction. Transp. Res. Part F Traffic Psychol. Behav. 2014, 26, 218–226. [Google Scholar] [CrossRef]
  9. Ciabattoni, L.; Ferracuti, F.; Longhi, S.; Pepa, L.; Romeo, L.; Verdini, F. Real-time mental stress detection based on smartwatch. In Proceedings of the 2017 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 8–10 January 2017. [Google Scholar]
  10. Rajendra, V.; Dehzangi, O. Detection of distraction under naturalistic driving using Galvanic Skin Responses. In Proceedings of the 2017 IEEE 14th International Conference on Wearable and Implantable Body Sensor Networks (BSN), Eindhoven, The Netherlands, 9–12 May 2017; pp. 157–160. [Google Scholar]
  11. Nourbakhsh, N.; Wang, Y.; Chen, F. GSR and blink features for cognitive load classification. In Lecture Notes in Computer Science (LNCS); Springer: Berlin/Heidelberg, Germany, 2013; Volume 8117, pp. 159–166. [Google Scholar]
  12. Nourbakhsh, N.; Wang, Y.; Chen, F.; Calvo, R.A. Using galvanic skin response for cognitive load measurement in arithmetic and reading tasks. In Proceedings of the 24th Conference on Australian Computer-Human Interaction OzCHI ’12, Melbourne, Australia, 26–30 November 2012; pp. 420–423. [Google Scholar]
  13. Lew, R.; Dyre, B.P.; Werner, S.; Wotring, B. Exploring the Potential of Short-Time Fourier Transforms for Analyzing Skin Conductance and Pupillometry in Real-Time Applications. In Proceedings of the Human Factors and Ergonomics Society 52nd Annual Meeting, New York, NY, USA, 22–26 September 2008. [Google Scholar]
  14. Ayata, D.; Yaslan, Y.; Kamasak, M. Emotion recognition via random forest and galvanic skin response: Comparison of time based feature sets, window sizes and wavelet approaches. In Proceedings of the 2016 Medical Technologies National Conference, TIPTEKNO 2016, Antalya, Turkey, 27–29 October 2017. [Google Scholar]
  15. Liu, M.; Fan, D.; Zhang, X.; Gong, X. Human Emotion Recognition Based on Galvanic Skin Response Signal Feature Selection and SVM. In Proceedings of the 2016 International Conference on Smart City and Systems Engineering (ICSCSE), Hunan, China, 25–26 November 2016; pp. 157–160. [Google Scholar]
  16. Chen, L.; Zhao, Y.; Ye, P.; Zhang, J.; Zou, J. Detecting driving stress in physiological signals based on multimodal feature analysis and kernel classifiers. Expert Syst. Appl. 2017, 85, 279–291. [Google Scholar] [CrossRef]
  17. Benedek, M.; Kaernbach, C. A continuous measure of phasic electrodermal activity. J. Neurosci. Methods 2010, 190, 80–91. [Google Scholar] [CrossRef] [PubMed]
  18. Huang, M.L.; Hung, Y.H.; Lee, W.M.; Li, R.K.; Jiang, B.R. SVM-RFE based feature selection and taguchi parameters optimization for multiclass SVM Classifier. Sci. World J. 2014, 2014. [Google Scholar] [CrossRef] [PubMed]
  19. Dehzangi, O.; Williams, C. Towards multi-modal wearable driver monitoring: Impact of road condition on driver distraction. In Proceedings of the 2015 IEEE 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN), Cambridge, MA, USA, 9–12 June 2015; pp. 1–6. [Google Scholar]
  20. Pedersen, F. Joint Time Frequency Analysis in Digital Signal Processing. Ph.D. Thesis, Aalborg Universitetsforlag, Aalborg, Denmark, 1997. [Google Scholar]
  21. Kher, R.; Vala, D.; Pawar, T.; Thakar, V.K. Implementation of derivative based QRS complex detection methods. In Proceedings of the 2010 3rd International Conference on Biomedical Engineering and Informatics, BMEI 2010, Yantai, China, 16–18 October 2010; Volume 3, pp. 927–931. [Google Scholar]
  22. Esteller, R.; Vachtsevanos, G.; Echauz, J.; Litt, B. A Comparison of waveform fractal dimension algorithms. IEEE Trans. Circuits Syst. I Fundam. Theory Appl. 2001, 48, 177–183. [Google Scholar] [CrossRef]
  23. Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V. Gene selection for cancer classification using support vector machines. Mach. Learn. 2002, 46, 389–422. [Google Scholar] [CrossRef]
  24. Vapnik, V. Support vector machine. Mach. Learn. 1995, 20, 273–297. [Google Scholar]
Figure 1. Flowchart of the proposed driver monitoring and intervention system on the edge. Galvanic Skin Responses (GSR); Continuous Decomposition Analysis (CDA); 10-fold Cross Validation (10-CV); Support Vector Machine Recursive Feature Elimination (SVM-RFE).
Figure 2. Experimental setup.
Figure 3. Data collection order during our naturalistic driving experiment.
Figure 4. Continuous decomposition analysis (CDA) for normal scenario. (top): original skin conductance (SC) signal; (middle): decomposed Tonic and Phasic driver and (bottom): Phasic driver.
Figure 5. Continuous decomposition analysis (CDA) for distracted scenario, (top): original skin conductance (SC) signal; (middle): decomposed Tonic and Phasic driver and (bottom): Phasic driver.
Figure 6. High-resolution time-frequency (TF) representation of the phasic component of GSR signal (for each sub-plot): (top): Phasic GSR Signal; (left): Energy Spectral Density and (Spectrograph): Time on the x-axis and frequency on the y-axis, and the color is used to indicate the power of the time-frequency sample.
Table 1. List of all the extracted spectral and spatial/temporal features.

Feature Domain | Feature Names
Spatial/temporal | Mean, Variance, Accumulated galvanic skin response (GSR), Average GSR, Maximum Value, Number of peaks, Sum of Amplitude of peaks, Fractal Dimensions, Auto-Regressive
Spectral | Short-Time Fourier Transform (4 features)
Table 2. Identification results using linear and kernel-based (polynomial d = 2 and radial basis function) support vector machine (SVM) including accuracy, prediction speed, and training time with the 18D feature space.

Subjects | Linear SVM: Accuracy (%) | Prediction Speed (obs/s) | Training Time (s) | Poly d = 2: Accuracy (%) | Prediction Speed (obs/s) | Training Time (s) | RBF: Accuracy (%) | Prediction Speed (obs/s) | Training Time (s)
Subject 1 | 82.90 | 6600.00 | 0.70 | 87.00 | 5700.00 | 0.74 | 85.40 | 5100.00 | 0.78
Subject 2 | 97.30 | 7700.00 | 0.75 | 97.30 | 7100.00 | 0.79 | 96.70 | 6800.00 | 0.81
Subject 3 | 92.30 | 6100.00 | 0.68 | 94.40 | 4900.00 | 0.77 | 90.60 | 4700.00 | 0.79
Subject 4 | 99.50 | 6100.00 | 0.74 | 99.20 | 5400.00 | 0.73 | 96.80 | 5300.00 | 0.79
Subject 5 | 87.10 | 6200.00 | 0.79 | 96.40 | 6400.00 | 1.37 | 87.40 | 6200.00 | 0.76
Subject 6 | 88.60 | 8000.00 | 0.74 | 93.60 | 7100.00 | 0.77 | 92.80 | 7300.00 | 0.89
Subject 7 | 98.00 | 6300.00 | 0.75 | 97.00 | 6300.00 | 0.73 | 97.50 | 6200.00 | 0.78
Subject 8 | 96.20 | 10000.00 | 0.75 | 97.40 | 8900.00 | 0.80 | 95.90 | 8500.00 | 0.90
Subject 9 | 84.00 | 8500.00 | 0.86 | 90.60 | 8000.00 | 1.08 | 87.40 | 7600.00 | 0.86
Subject 10 | 93.50 | 6900.00 | 0.75 | 95.20 | 6400.00 | 0.81 | 89.20 | 6300.00 | 0.81
Average | 91.94 | 7240.00 | 0.75 | 94.81 | 6620.00 | 0.86 | 91.97 | 6400.00 | 0.82
Table 3. 10-fold cross validation (10-CV) identification results using linear and kernel-based (polynomial d = 2 and radial basis function) support vector machine (SVM) including accuracy, precision, recall and F-Score generated with the 18D feature space. All values in %.

Subjects | Linear SVM: Accuracy | Precision | Recall | F-Score | Poly d = 2: Accuracy | Precision | Recall | F-Score | RBF: Accuracy | Precision | Recall | F-Score
Subject 1 | 82.90 | 75.57 | 80.85 | 83.44 | 87.00 | 84.42 | 84.49 | 84.69 | 85.40 | 76.47 | 93.46 | 84.12
Subject 2 | 97.30 | 95.70 | 99.63 | 97.62 | 97.30 | 96.36 | 98.88 | 97.61 | 96.70 | 95.99 | 98.13 | 97.05
Subject 3 | 92.30 | 84.71 | 100.00 | 91.72 | 94.40 | 88.82 | 99.31 | 93.77 | 90.60 | 81.82 | 100.00 | 90.00
Subject 4 | 99.50 | 99.33 | 99.33 | 99.33 | 99.20 | 98.04 | 100.00 | 99.01 | 96.80 | 97.26 | 94.67 | 95.95
Subject 5 | 87.10 | 84.42 | 88.42 | 86.38 | 96.40 | 95.34 | 96.84 | 96.08 | 87.40 | 84.16 | 89.47 | 86.73
Subject 6 | 88.60 | 75.15 | 90.71 | 82.20 | 93.60 | 85.16 | 94.29 | 89.49 | 92.80 | 85.23 | 90.71 | 87.89
Subject 7 | 98.00 | 98.31 | 97.22 | 97.77 | 97.00 | 96.15 | 97.22 | 96.69 | 97.50 | 98.85 | 95.56 | 97.18
Subject 8 | 96.20 | 96.67 | 98.51 | 97.58 | 97.40 | 96.91 | 99.79 | 98.33 | 95.90 | 95.33 | 99.58 | 97.40
Subject 9 | 84.00 | 79.78 | 96.00 | 87.14 | 90.60 | 86.98 | 98.00 | 92.16 | 87.40 | 84.78 | 94.67 | 89.45
Subject 10 | 93.50 | 93.47 | 95.02 | 94.24 | 95.20 | 96.22 | 95.02 | 95.62 | 89.20 | 90.50 | 90.87 | 90.68
Average | 91.94 | 88.31 | 94.57 | 91.74 | 94.81 | 92.44 | 96.38 | 94.35 | 91.97 | 89.04 | 94.71 | 91.65
Table 4. Support vector machine-recursive feature elimination (SVM-RFE) feature ranking for normal vs. phone. Each row lists the feature numbers ordered from rank 1 to rank 18.

Subjects | Feature ranking (rank 1 → 18)
Subject 1 | 6 13 15 17 14 18 16 9 5 1 12 7 8 3 11 4 2 10
Subject 2 | 16 15 6 17 14 9 18 13 1 5 12 7 11 8 3 2 4 10
Subject 3 | 16 15 6 14 17 18 9 13 12 5 1 7 8 11 3 2 4 10
Subject 4 | 15 13 16 14 6 18 9 17 1 5 7 12 11 8 3 2 4 10
Subject 5 | 1 15 6 17 16 14 18 9 13 5 12 7 11 8 3 2 10 4
Subject 6 | 15 13 17 6 14 18 9 16 5 1 12 2 4 8 3 11 7 10
Subject 7 | 6 15 14 16 9 18 17 13 12 11 7 5 1 4 2 3 8 10
Subject 8 | 6 15 13 14 9 16 18 17 12 5 1 7 11 8 3 4 2 10
Subject 9 | 6 17 1 15 14 18 13 16 9 5 12 7 11 8 3 2 10 4
Subject 10 | 15 6 14 17 18 13 9 16 5 12 1 11 7 2 4 3 8 10
Frequent Feature | 6 15 6 17 14 18 9 13 5 5 12 7 11 8 3 2 4 10
Table 5. Support vector machine-recursive feature elimination (SVM-RFE) feature ranking for normal vs. text. Each row lists the feature numbers ordered from rank 1 to rank 18.

Subjects | Feature ranking (rank 1 → 18)
Subject 1 | 15 6 17 14 18 9 16 13 5 1 12 7 11 3 8 2 10 4
Subject 2 | 15 6 17 14 18 16 9 13 12 7 11 5 1 3 8 10 2 4
Subject 3 | 5 6 15 17 18 14 16 9 13 1 12 7 11 3 4 2 8 10
Subject 4 | 17 15 13 18 14 6 16 9 1 12 5 7 11 3 8 2 10 4
Subject 5 | 6 17 15 16 14 18 9 13 5 1 12 7 11 3 8 2 4 10
Subject 6 | 17 16 6 15 18 14 9 13 12 4 1 2 5 8 3 11 7 10
Subject 7 | 15 17 9 14 18 16 13 12 8 11 10 7 6 5 4 3 2 1
Subject 8 | 5 15 6 17 14 18 16 9 13 12 7 8 11 10 4 3 2 1
Subject 9 | 5 6 17 15 18 14 9 16 13 1 12 7 11 8 3 2 4 10
Subject 10 | 15 17 12 14 18 13 9 6 16 11 5 1 8 3 2 4 7 10
Frequent Feature | 15 6 17 14 18 14 9 13 13 1 12 7 11 3 8 2 2 10
Table 6. Identification results of linear and kernel-based (polynomial d = 2 and radial basis function) support vector machine (SVM) including accuracy, prediction speed, and training time with the reduced 7D feature space.

Subjects | Linear SVM: Accuracy (%) | Prediction Speed (obs/s) | Training Time (s) | Poly d = 2: Accuracy (%) | Prediction Speed (obs/s) | Training Time (s) | RBF: Accuracy (%) | Prediction Speed (obs/s) | Training Time (s)
Subject 1 | 79.70 | 7000.00 | 0.68 | 85.60 | 6200.00 | 1.42 | 84.60 | 6300.00 | 0.69
Subject 2 | 97.10 | 9200.00 | 0.68 | 96.30 | 8600.00 | 1.89 | 97.30 | 8000.00 | 0.76
Subject 3 | 85.80 | 6600.00 | 0.75 | 91.20 | 5700.00 | 1.39 | 88.50 | 5800.00 | 0.71
Subject 4 | 99.70 | 7300.00 | 0.66 | 99.70 | 5700.00 | 0.79 | 98.70 | 5000.00 | 0.77
Subject 5 | 75.00 | 7800.00 | 1.27 | 95.40 | 7300.00 | 3.70 | 88.80 | 6500.00 | 0.78
Subject 6 | 82.90 | 8800.00 | 0.75 | 88.60 | 8700.00 | 0.76 | 86.80 | 8000.00 | 0.70
Subject 7 | 94.00 | 7900.00 | 0.65 | 96.30 | 7100.00 | 0.69 | 95.80 | 5500.00 | 0.78
Subject 8 | 92.50 | 12000.00 | 0.78 | 93.80 | 10000.00 | 0.77 | 94.60 | 9600.00 | 0.89
Subject 9 | 81.40 | 9800.00 | 1.49 | 88.00 | 8400.00 | 1.13 | 86.50 | 8200.00 | 0.77
Subject 10 | 95.40 | 8200.00 | 0.66 | 95.20 | 7200.00 | 0.77 | 94.50 | 7000.00 | 0.70
Average | 88.35 | 8460.00 | 0.84 | 93.01 | 7490.00 | 1.33 | 91.61 | 6990.00 | 0.76
Table 7. 10-fold cross validation (10-CV) identification results using linear and kernel-based (polynomial d = 2 and radial basis function) support vector machine (SVM) including accuracy, precision, recall and F-Score with the reduced 7D feature space. All values in %.

Subjects | Linear SVM: Accuracy | Precision | Recall | F-Score | Poly d = 2: Accuracy | Precision | Recall | F-Score | RBF: Accuracy | Precision | Recall | F-Score
Subject 1 | 79.70 | 72.94 | 81.05 | 76.78 | 85.60 | 80.12 | 86.93 | 83.39 | 84.60 | 75.26 | 93.46 | 83.38
Subject 2 | 97.10 | 95.36 | 99.63 | 97.45 | 96.30 | 95.29 | 98.13 | 96.69 | 97.30 | 95.70 | 99.63 | 97.62
Subject 3 | 85.80 | 75.26 | 99.31 | 85.63 | 91.20 | 83.14 | 99.31 | 90.51 | 88.50 | 78.69 | 100.00 | 88.07
Subject 4 | 99.70 | 99.34 | 100.00 | 99.67 | 99.70 | 99.34 | 100.00 | 99.67 | 98.70 | 98.01 | 98.67 | 98.34
Subject 5 | 75.00 | 69.16 | 82.63 | 75.30 | 95.40 | 93.85 | 96.32 | 95.06 | 88.80 | 83.33 | 94.74 | 88.67
Subject 6 | 82.90 | 65.41 | 86.43 | 74.46 | 88.60 | 75.76 | 89.29 | 81.97 | 86.80 | 75.00 | 81.43 | 78.08
Subject 7 | 94.00 | 96.43 | 90.00 | 93.10 | 96.30 | 96.09 | 95.56 | 95.82 | 95.80 | 95.53 | 95.00 | 95.26
Subject 8 | 92.50 | 92.93 | 97.66 | 95.24 | 93.80 | 93.39 | 98.94 | 96.08 | 94.60 | 93.98 | 99.36 | 96.59
Subject 9 | 81.40 | 76.94 | 95.67 | 85.29 | 88.00 | 83.33 | 98.33 | 90.21 | 86.50 | 81.49 | 98.33 | 89.12
Subject 10 | 95.40 | 96.62 | 95.02 | 95.82 | 95.20 | 95.83 | 95.44 | 95.63 | 94.50 | 94.29 | 95.85 | 95.06
Average | 88.35 | 84.04 | 92.74 | 87.87 | 93.01 | 89.61 | 95.83 | 92.50 | 91.61 | 87.13 | 95.65 | 91.02
