EEG Signal Multichannel Frequency-Domain Ratio Indices for Drowsiness Detection Based on Multicriteria Optimization

Drowsiness is a risk to human lives in many occupations and activities where full awareness is essential for the safe operation of systems and vehicles, such as driving a car or flying an airplane. Although it is one of the main causes of many road accidents, there is still no reliable definition of drowsiness or a system to reliably detect it. Many researchers have observed correlations between frequency-domain features of the EEG signal and drowsiness, such as an increase in the spectral power of the theta band or a decrease in the spectral power of the beta band. In addition, features calculated as ratio indices between these frequency-domain features show further improvements in detecting drowsiness compared to frequency-domain features alone. This work aims to develop novel multichannel ratio indices that take advantage of the diversity of frequency-domain features from different brain regions. In contrast to the state-of-the-art, we use an evolutionary metaheuristic algorithm to find the nearly optimal set of features and channels from which the indices are calculated. Our results show that drowsiness is best described by the powers in delta and alpha bands. Compared to seven existing single-channel ratio indices, our two novel six-channel indices show improvements in (1) statistically significant differences observed between wakefulness and drowsiness segments, (2) precision of drowsiness detection and classification accuracy of the XGBoost algorithm and (3) model performance by saving time and memory during classification. Our work suggests that a more precise definition of drowsiness is needed, and that accurate early detection of drowsiness should be based on multichannel frequency-domain features.


Introduction
Drowsiness is the intermediate state between wakefulness and sleep [1]. Terms such as sleepiness or tiredness are used synonymously with drowsiness in related studies [2][3][4]. Although it is intuitively clear what drowsiness is, it is not so easy to determine exactly whether a person is in a drowsy state or not. The reason for this is the unclear definition of drowsiness. Some researchers define drowsiness as stage 1 sleep (S1) [5][6][7][8][9], which is also known as non-rapid eye movement 1 (NREM 1) sleep. Da Silveira et al. [10] used S1 sleep stage data in their research of drowsiness. Johns [11] claims that the S1 sleep stage is equivalent to microsleep (episodes of psychomotor insensitivity due to sleeprelated wakefulness loss [12]), while drowsiness is stated to occur before S1 sleep, but it is not stated when it begins and what characterizes it. Researchers who do not use any of the aforementioned definitions of drowsiness typically use a subjective assessment of drowsiness, e.g., the Karolinska sleepiness scale [13]. In this paper, the term drowsiness is used as a synonym for the S1 sleep stage.
In a drowsy state, people are not able to function at the level required to safely perform an activity [14], due to the progressive loss of cortical processing efficiency [15]. Drowsiness is, therefore, a significant risk factor for human lives in many occupations, e.g., for air traffic controllers, pilots and regular car drivers [16]. According to the reports from NASA [17] and the National Transportation Safety Board [18], one of the main factors in road and air accidents is drowsiness. Gonçalves et al. [19] conducted a study across 19 European countries and concluded that in the last two years, 17% of drivers fell asleep while driving, while 7% of them had an accident due to drowsiness. The high frequency and prevalence of drowsiness-related accidents speak in favor of the development of early drowsiness detection systems, which is the subject of this paper.
Many researchers are trying to solve the problem of early detection of drowsiness in drivers. Balandong et al. [20], in their recent review, divided the techniques for detecting driver drowsiness into six categories: (1) subjective measures, (2) vehicle-based systems, (3) driver's behavior-based systems, (4) mathematical models of sleep-wake dynamics, (5) human physiological signal-based systems and (6) hybrids of one or more of these techniques. Currently, the most common techniques used in practice are vehicle-based systems [5], but these systems are mostly unreliable and depend largely on the driver's motivation to drive as well as possible [20].
Physiological signals are the promising alternative for reliable drowsiness detection [21]. The main problem with this approach is that these systems are often not easy to use and are intrusive to drivers [20]. Nevertheless, many researchers are working on small, automated and wearable devices [21][22][23][24], or on steering wheel devices [25,26] in order to overcome these obstacles. Techniques for detecting drowsiness based on physiological signals can be further subdivided according to the type of signal used, such as electroencephalogram (EEG) [27], electrooculogram (EOG) [28] or electrocardiogram (ECG) [29].
The most studied and applied physiological signal to detect drowsiness is the EEG. In this paper, frequency-domain features of the EEG signal are analyzed and two novel multichannel ratio indices for the detection of drowsiness are proposed. Besides the frequency-domain features, there are also other types of features: (1) nonlinear features [30], (2) spatiotemporal (functional connectivity) features [31] and (3) entropies [32]. These three groups of features have a lower frequency of use compared to the frequency-domain features, so in this paper, we focus only on frequency-domain features. Based on the recent review [33] of EEG-based drowsiness detection systems, 61% of the included papers used frequency-domain features, 38% used entropies, 10% used nonlinear features and 10% used spatiotemporal features (some papers used multiple groups of features, so the sum of the percentages is greater than 100%). This shows the difference in the use of drowsiness detection systems, and the difference is even greater in the general field of neurophysiological scientific papers. Although the three feature groups mentioned above are used less frequently, there are still a certain number of papers that include them, especially entropies.
All these frequency-domain features and ratio indices are calculated from a single EEG channel, i.e., from a single brain region. In recent research, Wang et al. [38] showed that the significance of a decrease in delta and an increase in (θ + α)/β indices depends on the brain region. This significant diversity of the correlation of features with drowsiness in different brain regions is the motivation for this research. Since all currently used frequency-domain features and ratio indices are based on a single channel (single brain region), this work aims to use the best distinguishing features of each brain region for the detection of drowsiness and to combine them into a single multichannel ratio index feature.
In our work, we use a computational method based on multicriteria optimization to extract the multichannel EEG-based frequency-domain ratio index features. This method allows us to discover new multichannel ratio indices that show improvements in the detection of drowsiness compared to single-channel ratio indices. Finally, with the use of machine learning models, we prove that multichannel indices detect drowsiness with higher accuracy, higher precision, reduced memory and faster computation compared to single-channel features.
In the Materials and Methods Section, we show the methodology of our work, including a description of the dataset, preprocessing and feature extraction methods used. Novel multichannel ratio indices and the multi-objective optimization method are also described there. In the Results Section, we present the results of our work, including statistical analysis, drowsiness prediction and computational properties of the proposed indices. In the Discussion Section, we discuss in more detail the topics covered in this paper. Finally, in the last section, we conclude the paper.

Dataset, Preprocessing and Feature Extraction
The data used in this paper were obtained from the PhysioNet portal [39], in particular from the 2018 PhysioNet computing in cardiology challenge [40]. The original dataset contains data records from 1985 subjects, and each recording includes a six-channel EEG, an electrooculogram, an electromyogram, a respiration signal from the abdomen and chest, airflow and oxygen saturation signals and a single-channel electrocardiogram during the all-night sleep. The records were divided into training and test sets of equal size. The sleep stages [41] of all subjects were annotated by clinical staff based on the American Academy of Sleep Medicine (AASM) manual for the scoring of sleep [42]. There are six types of annotations for different stages: wakefulness (W), stage 1 (S1), stage 2 (S2), stage 3 (S3), rapid eye movement (REM) and undefined.
In this research, we wanted to use a training set (992 subjects) to detect drowsiness. The officially provided way of acquiring the data is through torrent download, but we managed to download only 393 subjects completely, due to a lack of seeders. Of these 393 subjects, EEG signal recordings from 28 subjects were selected, based on the condition that each recording had at least 300 s of the W stage and, immediately after that, at least 300 s of the S1 stage. From each recording, a fragment of 600 s (300 s of W stage and 300 s of S1 stage) was used for analysis. In the original dataset, each EEG signal recording consists of six channels (F3, F4, C3, C4, O1 and O2, based on the International 10/20 System), with a sampling frequency of 200 Hz. Table 1 shows the identification numbers of all the selected subjects. The subjects were divided into two groups, one group used for training of the model (16 subjects) and the other one for the test of the obtained models (12 subjects). The training set was used to obtain novel ratio indices (with the method described below) and the test set was used to check these novel indices on the unseen data. Before feature extraction, the EEG signal must be filtered. For this purpose, the DC component was removed from the signal and the signal was filtered with a Butterworth filter to remove high-frequency artifacts and low-frequency drifts. We used the sixth-order Butterworth filter, the low-cut frequency of 1 Hz and the high-cut frequency of 40 Hz. In the selected fragments of the recordings, there was an insignificant number of eye-related artifacts, so we decided not to use the independent component analysis for their removal in order to prevent potential information loss due to component removal.
The signals were divided into epochs to calculate features. The epochs were five seconds long with a 50% overlap between them. Frequency-domain features are often used in EEG signal analysis. These features were extracted from the power spectral density (PSD) of the signal. To obtain the PSD of the signal, Welch's method [43] was used. Welch's method is used more often than Fast Fourier transform in the field of EEG signal analysis since it produces PSD with lower variance. The standard frequency-domain features were calculated, i.e., delta (δ, 0.5-4 Hz), theta (θ, 4-8 Hz), alpha (α, 8-12 Hz) and beta (β, 12-30 Hz) bands. We also calculated the less frequently used frequency-domain features, i.e., gamma (γ, >30 Hz), sigma (σ, 12-14 Hz), low alpha (α1, 8-10 Hz) and high alpha (α2, 10-12 Hz) bands [44].

Novel Multichannel Ratio Indices
Ratios between frequency-domain features have often been used as new features in different areas of EEG signal analysis [10,36]. All these features have a simple mathematical formulation but often lead to an improvement in detection and reduction of dimensionality for drowsiness. Moreover, they are calculated based on a single channel only. The idea behind the novel indices we present in this work is to design the feature formulation in such a way that frequency-domain features from different channels can be combined. Figure 1 illustrates the difference between these two approaches. For simplicity of visualization, only four epochs, two channels (F3 and F4) and three features per channel are shown in Figure 1. We define a new index, I, for each epoch, e, which is calculated as a ratio of the feature values, F(e), for all six channels in the epoch, e. In both the nominator and denominator, the feature value of each channel, j, is multiplied with a dedicated coefficient, C ij or K ij respectively, as indicated in the Equation (1): The purpose of the coefficients is to reduce or even eliminate the influence of certain channels of frequency-domain features, by setting the value in the range [0, 1 , or increase the influence of certain channels of the frequency-domain features by setting the corresponding coefficient to a value in the range [1, ∞ . There are 48 (6 channels and 8 features per channel) C coefficients and 48 K coefficients.
The ideal output of I(e) should look like a step function (or an inverse step function), which would indicate a clear difference between the two stages: W and S1. Figure 2 illustrates the main features of the output. The output can be divided into two parts: the left one corresponds to stage W and the right one to S1. While the output in each part should be as smooth as possible, i.e., with minimal oscillations, it is expected that there will be a transition period between the phases, which may have significant oscillations. This transition period would ideally be the step function, but in realistic settings, it is expected that the transition between phases of brain activity will probably last several epochs and would not be considered as either stage W or S1. In order to determine the appropriate value of the coefficients that would provide the output as close as possible to the ideal, at least two criteria must be taken into account: the absolute difference between the mean values left and right of the transition window and the quantification of the oscillations in each part. This can be defined as a multi-objective optimization problem that we want to solve using a metaheuristic multi-objective evolutionary optimization method, as described in the next section. To the best of our knowledge, this state transition problem has never been approached with evolutionary computation.

Multi-Objective Optimization
The optimization of a step function that is representative of the problem of flat surfaces is generally a challenge for any optimization algorithm because it does not provide information about which direction is favorable and an algorithm can get stuck on one of the flat plateaus [45]. To overcome this challenge, instead of optimizing the function according to one criterion, we define two objectives that we optimize simultaneously: (1) to maximize the absolute difference between the mean value of I(e) output for the W and S1 stages, and (2) to minimize the oscillations of the output value around the mean value in each stage. According to Figure 2, the left part of the I(e) output occurring before the transition phase corresponds to the W stage, and the right part, occurring after the transition phase, corresponds to the S1 stage. Since optimization problems are usually expressed as minimization problems, where the first objective function, O1, is defined as the inverse absolute difference between the mean value of I(e) of the left part (avg left ) and the right part (avg right ), Equation (2) is established: The second objective function, O2, expresses the oscillations in the function and is defined as the number of times the difference between the output values of I(e) for two adjacent epochs was greater than a given limit. The exact value of this limit will be discussed later in this section as it is closely related to the specifics of the optimization method used. The main goal of the objective function O2 is to minimize the influence of the biggest flaw in the way that the objective O1 is calculated, i.e., to use the averaging function. For example, if a possible solution is a completely straight line, except for a large negative spike in the left part and a large positive spike in the right part, based only on the objective function O1, this would be a good solution, while the objective function O2 would penalize this solution.
As mentioned above, the transition between two stages will probably take several epochs and show significant oscillations of the function output values. According to the annotation made by clinical personnel, the transition phase should be approximately in the middle of the I(e) output, but it cannot be determined exactly how long it will last. In our work, which is based on expert knowledge of human behavior in the case of drowsiness, we assume that it lasts about one minute, which corresponds to about 30 epochs. Within the transition window, neither one of the two objective functions is calculated, since it is assumed to belong neither to the W nor to the S1 stages. We also allow it to move around the center, shifting left and right, due to a possible error of the human observer who marked the data.
The multi-objective optimization problem can now be expressed as min{O1,O2}, where O1 and O2 are the conflicting objective functions, as defined above. The evolutionary metaheuristic algorithm NSGA-II [46] was applied to solve this multi-objective optimization problem. The genetic algorithms (GAs) are normally used to solve complex optimization and search problems [47]. NSGA-II is one of the most popular evolutionary multicriteria optimization methods due to its versatility and ability to easily adapt to different types of optimization problems. The strong points of this MO algorithm are: (1) the fast nondominated sorting ranking selection method used to emphasize Pareto-optimal solutions, (2) maintaining the population diversity by using the crowding distance and (3) the elitism approach, which ensures the preservation of best candidates through generations without the setting of any new parameters other than the normal genetic algorithm parameters, such as population size, termination parameter, crossover and mutation probabilities. Additionally, it was often used for the elimination of EEG channels with the similar purpose as in our case-dimensionality reduction [48]. This paper uses the implementation of NSGA-II provided by the MOEA framework [49] and is based on the guidelines defined in [46,50].
NSGA-II was used with the following configuration. The chromosome was divided into two parts: in the first part, genes represented the nominator coefficient values (C ij ), and in the second part, genes represented the denominator coefficient values (K ij ). In each part, the genes were grouped by frequency-domain features and channels, as illustrated in Figure 3. The genes were encoded as real values in the range [0.0, 10.0], and standard NSGA-II crossover and mutation operators were used to support operation on real values. Each solution is evaluated based on the values of objectives O1 and O2, as described in the pseudocode in Algorithm 1. First, the chromosome is decoded (line 1). Then, for each test fragment, two values are calculated: (1) the inverse absolute difference (IAD) between the mean index value, I(e), of the left part and the right part, represented by the invAbsDiff variable in the pseudocode, and (2) the oscillations in the function, represented by the oscillation variable in the pseudocode (lines 3-5). Finally, the value of each objective O1 for the given solution is defined as the average value of invAbsDiff for all test fragments, and the value of objective O2 is defined as the average value of oscillation for all test fragments (lines 7-8). The algorithm for the IAD calculation is provided in the pseudocode in Algorithm 2. The calculation of the IAD for each fragment was slightly modified compared to Equation (1) to allow a faster convergence of the search algorithm. The transition phase was not in the same position in each fragment but allowed to move more loosely away from the center because the annotation in the original dataset was performed manually and there was a possibility of human error in case the observer would register a transition from W to S1 a little too early or too late. The algorithm allows the transition phase to begin no earlier than 30 epochs from the fragment start, and end no later than 60 epochs before the fragment end (line 2). The algorithm assumes the transition phase by looking for a window of 30 epochs which has the maximum difference of index, I(e), values between the left and the right part (lines 9-13).
The gradation of the absolute difference between the mean value of the left and the right parts is also introduced (lines 19-22) to allow easier and faster convergence of the algorithm. The optimization of the objective O1 can be considered as an optimization problem with soft constraints that are related to how much O1 deviates from the optimal value. However, it is quite difficult to determine the optimal value precisely a priori. As indicated in [51,52], constraints are often treated with penalties in optimization techniques. The basic idea is to transform a constrained optimization problem into an unconstrained one by introducing a penalty into the original objective function to penalize violations of constraints. According to a comprehensive overview in [51], the penalty should be based on the degree of constraint violation of an individual. In [53], it is also recommended that instead of having just one fixed penalty coefficient, the penalty coefficient should increase when higher levels of constraint violation are reached. The greatest challenge, however, is to determine the exact penalty values. If the penalty is too high or too low, evolutionary algorithms spend either too much or too little time exploring the infeasible region, so it is necessary to find the right trade-off between the objective function and the penalty function so that the search moves towards the optimum in the feasible space. As the authors have shown in [54], the choice of penalty boundaries is problem-dependent and difficult to generalize. Since we cannot strictly determine the optimal value of O1 in our case, we have chosen several thresholds for the absolute difference value, with the penalty increasing by a factor of 10 for each new threshold. The exact thresholds were selected based on the experience gained from the first few trial runs of the algorithm. Based on the observations from the trial runs, a third modification was also introduced: the difference is calculated with a relative, instead of absolute, value of I(e). The relative value of I(e) is calculated by using the lowest I(e) value as a reference point, instead of zero, i.e., the zero is "moved", as shown in code lines 16-18 in Algorithm 2. The pseudocode for calculating the oscillations in the function as the second objective, O2, is provided in Algorithm 3. Again, the optimization of the oscillations can be considered a constrained optimization problem, so that, in the same way as in the case of the IAD calculation discussed previously, a gradation of the difference between the output values of I(e) for two adjacent epochs is used to penalize the larger differences more severely (lines 7-10 and 15-18). The exact thresholds were chosen based on the experience gained from the first few trial runs of the algorithm. In order to make the algorithm converge more easily and quickly, the concept of "moved zero" was used again (lines 2, 3, 6 and 14).  Finally, to further minimize the oscillations, and help the search algorithm converge more quickly, the maximum change in the I(e) value between two adjacent epochs is set to 10% of the first of the two epochs. The mathematical formulation of this limit is provided in Equation (3):

Results
The optimization algorithm was executed over 107 generations, using 100 randomly selected chromosomes as a starting point. Ideally, the optimization algorithm would have many C and K coefficients equal to zero and only a few non-zero coefficients in order to obtain a simple and easily understandable mathematical formulation of a novel multichannel ratio index. Unfortunately, even the best solutions of the optimization algorithm had only up to 20 C and K coefficients equal to zero. Although such a novel multichannel ratio index showed good behavior in detecting drowsiness, it is impractical to use a formula with 76 coefficients. We consider anything above 15 coefficients to be impractical.
In order to reduce the number of coefficients and to simplify the formulation of the novel multichannel ratio index, some coefficients were manually set to zero. In order to decide which coefficients have the least influence on the final solution, we counted how often a large value of the coefficient is fixed to a certain frequency-domain feature. By analyzing the coefficients of all solutions in the final population of the optimization algorithm, we concluded that the most frequently selected features were δ, α, α1 and α2. After manually fixing the coefficients of all other frequency-domain features to zero, the search range for the optimization algorithm was reduced to half.
Although 48 C and K coefficients remained in the solution at that time, the algorithm provided equally good results in terms of drowsiness detection, but with a much simpler mathematical formulation. In addition to the 48 coefficients that were manually set to zero, the algorithm often set many more coefficients to zero. A decision on the best solution in the final population was made based on the O1 and O2 values of the optimization algorithm in combination with the number of coefficients set to zero after using the floor operator on the coefficients. The floor operator was used to simplify the equation by removing the decimal numbers. Preferred solutions are those with a higher number of coefficients set to zero. Our choice was the solution with 13 non-zero coefficients, as shown in Equation (4): All C and K coefficients were rounded to a lower value (floor operator). Here, e represents the current epoch and all the features on the right side were from that same epoch.
The goal of the second condition of the optimization algorithm was to minimize the oscillations of the I(e) function. The results were much better with this condition than without it, but the resulting function still oscillated strongly. In order to additionally minimize the oscillations, a limitation was performed. The maximum change between any two adjacent samples was set to 10% of the value of the first sample. Equation (5) shows the mathematical formulation of this limitation of the maximum change: where I1(e) is defined by Equation (4) and e is the current epoch. Limiting the maximum change of adjacent samples further improves the detection model, and therefore Equation (5) presents the first novel multichannel ratio index.
We have tried to further simplify the formulation of the multichannel ratio index. This time, brute force search for the best solution was applied with the following constraints: (1) encoding of all C and K coefficients was set to integer values of zero or one for the sake of simplicity, and (2) a maximum of five addends in the equation was allowed. With these constraints, we obtained Equation (6): Again, similar to the first index, the maximum change was limited, so that the final equation for the second ratio index was obtained as: I2(e), else where I2(e) is defined by Equation (6) and e is the current epoch. After obtaining the two novel indices, they were normalized to the range [0, 1] for each subject to eliminate interindividual differences between the subjects. The two novel multichannel ratio indices defined by Equations (5) and (7) were compared with the seven existing indices θ/α and β/α [36], (θ + α)/β, θ/β and (θ + α)/(α + β) [37], and γ/δ and (γ + β)/(δ + α) [10]. The indices γ/δ and (γ + β)/(δ + α) were calculated based on the wavelet transform, i.e., in the same way as in the original paper. Figure 4 shows a comparison of our novel indices with the best and the worst channel for θ/α and (θ + α)/β single-channel indices for subject tr08-0111. These two single-channel indices were selected because they are the best predictors of drowsiness for a given subject among all single-channel indices.

Statistical Analysis
The Wilcoxon signed-rank test [55] was used to analyze the statistical differences between the awake state and the S1 state. This test was chosen because it refers to data that do not necessarily follow the normal distribution. Table 2 shows p-values for each subject in the training set and each index. The significance level α 0 = 0.01 was used together with the Bonferroni correction [55] to reduce the probability of false-positive results, as the test was repeated 144 times (16 subjects and 9 indices), giving us the final α p = 6.9 × 10 −5 . For the existing indices, the p-value was calculated for each channel, but only the p-values of the best channel (the lowest average of p-value for all subjects) are shown in Table 2.
The two novel indices show p-values lower than α p for most subjects. From this, we can conclude that, for Index1, 14 of 16 subjects show two different distributions for the W stage and the S1 stage, while 13 of 16 subjects show significantly different distributions of the W stage and the S1 stage for Index2. There are only two existing indices where the p-value is lower than αp in more than ten cases. These are θ/β and (θ + α)/(α + β), both by Jap et al. [37]. Table 3 shows p-values for each subject in the test set and each index. Again, the two novel indices, together with the (γ + β)/(δ + α) [10] index, show p-values lower than α p for most subjects.

Drowsiness Prediction Analysis
An additional comparison of ratio indices was performed by analyzing the drowsiness detection accuracy and precision, as obtained with the XGBoost algorithm [56]. Default parameters were applied: learning rate eta equal to 0.3, gamma equal to 0 and a maximum depth of a tree equal to 6. For a detailed comparison of the indices, classification accuracy and precision were calculated for each subject. Namely, each subject has 238 epochs of the measured signal, with the first half representing the W state and the second half the S1 state. The algorithm classified the subject's state for each epoch (238 classifications per subject), and the accuracy for each subject was calculated based on these classifications. The leave-one-subject-out cross-validation method was applied on the training set, i.e., the algorithm was trained on the data of 15 subjects and tested on the subject excluded from the training set, and this was repeated 16 times to evaluate drowsiness detection on each subject from the training set. Table 4 shows the classification accuracy achieved on the training set. Table 2. Statistical significance p-values were obtained by the Wilcoxon signed-rank test for distinguishing the awake state from the S1 state. The shaded green cells with bold text represent the lowest p-value for each subject in the training set. At the bottom, the index having p-values lower than α p for most subjects is marked in the same way.

Subject
Index1   Index1 has the highest average accuracy and the highest classification accuracy for 3 of 16 subjects. Index2 has the second-highest average accuracy and the highest classification accuracy for 4 of 16 subjects, which is the most of all indices. θ/α [36] and (θ + α)/(α + β) [37] are the only other indices with an average classification accuracy above 0.58, while θ/α [36] and θ/β [37] are the only other indices with the highest accuracy for 3 of 16 subjects. The β/α [36] index has the lowest average classification accuracy on the training set (0.5515). Table 5 shows the classification accuracy on the test set. Index1 has the highest average accuracy and the highest classification accuracy for 3 of 12 subjects. Index2 has the thirdhighest average accuracy and the highest classification accuracy for 4 of 12 subjects, which is the most of all indices. The only other index with comparable accuracy is θ/α [36], with the second-highest average accuracy. All other indices have at least 2.5% lower accuracy than the two novel indices. Table 6 shows the degree of precision of drowsiness detection on the training set. Index2 has the highest average precision of drowsiness detection and the highest precision of drowsiness detection for five subjects, which is the highest of all indices. Index1 has the second-best average precision of drowsiness detection. (θ + α)/(α + β) [37] and γ/δ [10] have a precision of drowsiness detection comparable to Index1 and Index2, while all other ratio indices have lower precision. Table 5. The classification accuracy was obtained with the XGBoost algorithm for each subject in the test set. The shaded green cells with bold text show the highest accuracy obtained for each subject. At the bottom, the best mean accuracy for each ratio index is marked in the same way.

Subject
Index1  Table 6. The precision of drowsiness detection was obtained with the XGBoost algorithm for each subject in the training set. The shaded green cells with bold text show the highest precision obtained for each subject. At the bottom, the best mean precision for each ratio index is marked in the same way.  Table 7 shows the degree of precision of drowsiness detection achieved on the test set. Index1 has the highest average precision. θ/α [36] and γ/δ [10] have 1% lower precision than Index1, while all other indices have at least 4% lower precision. Index2 has the second-highest average precision. Table 7. The precision of drowsiness detection was obtained with the XGBoost algorithm for each subject in the test set. The shaded green cells with bold text show the highest precision obtained for each subject. At the bottom, the best mean precision for each ratio index is marked in the same way.

Computational Analysis
With regard to the classification and the use of machine learning algorithms, an advantage of using the novel multichannel indices compared to the existing single-channel indices is also the saving of memory and time, due to the reduction of dimensionality. The accuracies of Index1 and Index2 from Table 4 were achieved with the model constructed from the single feature only, while all other indices had six features since the dataset contains six EEG channels. For this reason, storing the novel indices consumes six times less memory. The time consumption was measured as an average of 100 executions. The measured time included classifier initialization, classifier training, classifications on the test subject and calculation of classification accuracy. Table 8 shows the results of time consumption measurements. The use of the novel multichannel indices saves about 30% of time compared to all other traditionally used single-channel ratio indices.

Discussion
The main idea of our research was to combine frequency-domain features from different brain regions into a multichannel ratio index to improve frequency-domain features for the detection of drowsiness and to gain new insights into drowsiness. The results in Tables 2-8 suggest that two novel multichannel ratio indices improve the detection of drowsiness based on the frequency-domain features and reduce the time required for detection.
We must note that the main idea of this research was not to create the best possible model for drowsiness detection but only to bring improvement into frequency-domain features that are often used for drowsiness detection. Our focus was on developing the method for obtaining these novel indices, which is explained in Section 2.3 "Multi-Objective Optimization". In order to confirm that our conclusions also hold for other classifiers besides XGBoost, Table 9 shows the average accuracy on the test set obtained with Naïve Bayes, k nearest neighbors, logistic regression, decision tree, random forest and support vector machine classifiers (using the scikit-learn library at default settings). The average accuracies of two novel indices vary from 56% to 65% among the algorithms. All the algorithms show that our novel multichannel indices are better than existing single-channel indices. Table 9. The average accuracy was obtained on the test set with different classification algorithms. Each row is colored with a pallet of colors ranging from dark green for the highest number in the row to dark red for the lowest number in the row. The algorithms are: NB-Naïve Bayes, KNN-k nearest neighbors, Logistic-logistic regression, DT-decision tree, RF-random forest and SVM-support vector machine.

Algorithm
Index1 Our results were compared with the seven existing single-channel ratio indices that are currently state-of-the-art frequency-domain features. The newest one was introduced in 2016 [10], but all of these single-channel ratio indices are often used in the more recent drowsiness detection papers [57][58][59].
The authors in the aforementioned research report 92% accuracy as the best-obtained accuracy [57]. This accuracy was obtained based on the epoch-level validation. Epoch-level validation is a cross-validation procedure on the epoch level, which means that there is a very high probability that all subjects will have epochs in the training set and in the test set at the same time. On the other hand, subject-level validation is validation where it is ensured that subjects in the test set are not contained in the training set. An example of a subject-level validation is the leave-one-subject-out cross-validation that we used in this research. The only proper way for model validation is subject-level validation, as it represents the real-life setting in which the data from a new subject are used only for testing the model. Empirical tests conducted in related research showed a large difference in the accuracies between epoch-level validation and subject-level validation [60].
In a study from Mehreen et al. [57], the authors also provide subject-level validation, and the accuracy achieved was 71.5% based on 15 frequency-domain features. The highest accuracy achieved in our research is shown in Table 9, and it was 65.45%, achieved by logistic regression. This 65.45% accuracy is relatively close to 71.5%, and it must be noted that it was obtained based only on the Index2 feature, with a simple algorithm and without any parameter optimization. Due to this, we are confident that the addition of our two multichannel ratio indices would lead to an improvement in all state-of-the-art drowsiness detection systems that use frequency-domain features. Again, our aim was not to create the best possible drowsiness detection model but to prove that the novel multichannel indices are better than the existing single-channel frequency-domain features.
The Equations (4) and (6) for these multichannel ratio indices, obtained after optimizing the parameters with the optimization algorithm, suggest that alpha and delta are two of the most important frequency power bands for drowsiness detection. Equation (6) suggests that delta power in the frontal region describes drowsiness better than in the central region, while alpha power in the occipital and central regions describes drowsiness better than in the frontal region.
These results are consistent with several previous research papers on drowsiness detection that reported the importance of increasing alpha power [22,35,61,62]. Delta power is usually only present in deep sleep stages [36], so some researchers studying drowsiness do not include delta in their research [63]. However, there is still much research that includes delta power. The increase in delta power is considered to be an indicator of drowsiness [4]. Our research found that theta and beta powers are not as good drowsiness indicators as alpha and delta powers, while many other research studies disagree. A decrease in beta power was found to be an indicator of drowsiness in [4,36,64,65] and an increase in theta power was found to be an indicator of drowsiness in [27,34,61,62,65]. Wang et al. [38], in their study of microsleep events, found that alpha and delta rhythms characterize microsleep events. As mentioned earlier, there is an inconsistency in terminology, and some researchers consider sleep stage S1 as drowsiness [5][6][7][8][9], while Johns [11] considers it equivalent to microsleep events in the driving scenario. We used the data from sleep stage S1 and referred to it in this research as drowsiness. Since our results suggest that delta and alpha are the most significant for the detection of drowsiness, as in the work of Wang et al. [38] on microsleep events, our work suggests that sleep stage S1 may be more similar to microsleep events than to drowsiness, but further research is needed to support this as a fact.
Apart from the indication that drowsiness is closely related to microsleep events, it may also be closely linked to driver fatigue. Some researchers even use the term fatigue as a synonym for drowsiness [66]. Fatigue is a consequence of prolonged physical or mental activity [67] and can lead to drowsiness [68]. Normally, rest and inactivity relieve fatigue, however, they exacerbate drowsiness [69]. Lal and Craig [70] found that delta and theta band activities increase significantly during fatigue. Craig et al. [71] reported significant changes in the alpha 1, alpha 2, theta and beta bands, while they did not find any significant changes in the delta band when observing driver fatigue. Simon et al. [68] report that alpha band power and alpha spindles correlate with fatigue.
These three research papers [68,70,71] all use visual inspection to define the ground truth of fatigue. This approach to defining the ground truth is prone to subjectivity. A similar problem occurs when drowsiness is defined by using subjective drowsiness ratings, such as the Karolinska sleepiness scale [72].
Driver drowsiness, driver fatigue and microsleep events are defined as different internal states of the brain, but show similar behavior when observing the features obtained from the EEG. Possible explanations could be that fatigue, drowsiness and microsleep have a similar effect on brain functions and cause the driver's inability to function at the desired level. Most researchers of these three driver states only use frequency-domain features, while there are a number of other features (nonlinear features [30], spatiotemporal features [31] and entropies [32]) that could be used. Further studies with these features could find some features of the EEG signal that distinguish drowsiness, fatigue and microsleep. Distinguishing features of these three brain states could lead to the exact definitions of these terms. Precise and standardized definitions of fatigue, drowsiness and microsleep would help researchers to compare their work more easily. Figure 4 shows that the proposed procedure for creating the novel ratio indices has succeeded in creating step-like indices for a given subject. In addition to Index1 and Index2, which show desirable behavior, the indices (θ + α)/β and θ/β show similar, favorable behavior for a few channels. Figure 5 shows a comparison of novel ratio indices with the best and the worst channel for γ/δ and (γ + β)/(δ + α) single-channel indices for subject tr04-0726. Index1, index2, θ/α [36], (θ + α)/β, (θ + α)/(α + β) [37] and γ/δ [10] show similar behavior. These indices seem to detect drowsiness well, but with about a 50 epochs delay. Since several different single-channel indices that were previously shown to correlate with drowsiness together with two novel multichannel indices show the same delay in detecting drowsiness, this suggests that there may be shortcomings in the labeling of the initial signals. The manual for scoring sleep [42] provides guidelines for labeling, and it may be possible that the professionals who labeled the sleep signals labeled an approximate time of transition from the W state to the S1 state, as it is known that labeling any kind of several-hour-long EEG signal is a very tedious, hard and time-consuming job [73]. For this reason, the loose transition window is applied in the optimization algorithm, as described in Section 2.3.
The main shortcoming in applying our approach is the need to place six EEG electrodes on the driver's scalp while driving. Apart from being intrusive, there is also a problem with noise in real-world applications that cannot be neutralized with the current state-ofthe-art filter technology. All electrophysiological signals measured with wearable devices have a similar problem with intrusiveness and noise. ECG measurements, for example, are somewhat less susceptible to noise than EEG. Several recent works have shown that ECG can be used as a good predictor of sleep stages based on deep learning classifiers. Sun et al. [74] combined ECG with abdominal respiration and obtained a kappa value of 0.585, while Sridhar et al. [75] obtained a kappa value of 0.66. Combining EEG and ECG measurements has also been proposed in the context of driver drowsiness detection under simulator-based laboratory conditions [76]. Despite the problems of intrusiveness and noise susceptibility, research based on the electrophysiological signals brings a shift towards a precise definition of drowsiness. Once there is an exact definition of drowsiness or at least guidelines and manuals that accurately describe drowsiness (similar to the manuals for evaluating sleep stages), a big step will be taken to solve the problem of early detection of drowsiness [77]. It is doubtful that a wearable system based on electrophysiological signals will ever be widely used in real-world driving, but they still need to be developed. In our opinion, such wearable electrophysiological devices are more likely to be used for calibration/validation of non-intrusive systems (such as the driving performance-based or video-based systems) in controlled/simulated driving scenarios. In such scenarios, it is possible to control ambient noise, leading to a reduction in the effects of noise sensitivity.
An additional limitation of this work is that we were able to download data from 393 of 992 subjects completely, and only 28 of these 393 subjects were included in our study due to the inclusion condition that we described in Section 2.1 "Dataset, Preprocessing and Feature Extraction". Although it is a small subset of data, with the use of 12 subjects as a test set, we showed that the dataset is large enough to provide a good generalization (as seen in Tables 3, 5 and 7). In a recent review paper about state-of-the-art drowsiness detection [33], the authors reviewed 39 papers, and the average number of subjects in the included works is 23.5, which also indicates that our number of subjects included in the current study (28) is acceptable. Figure 5. The comparison of the two novel multichannel indices with the best and the worst channel for γ/δ and (γ + β)/(δ + α) single-channel indices for subject tr04-0726. The white part of the diagram represents the awake state, while the yellow part of the diagram represents the stage 1 of sleep, i.e., the drowsiness state.

Conclusions
This paper presented two novel multichannel ratio indices for the detection of drowsiness obtained by multi-objective optimization based on evolutionary computation. The results suggested that alpha and delta powers are good drowsiness indicators. The novel multichannel ratio indices were compared with seven existing single-channel ratio indices and showed better results in detecting drowsiness measured with precision and in the overall classification accuracy of both states using several machine learning algorithms. Our work suggests that a more precise definition of drowsiness is needed, and that accurate early detection of drowsiness should be based on multichannel frequency-domain ratio indices. The multichannel features also reduced the time needed for classification. The process of obtaining these indices by using a multi-objective optimization algorithm can also be applied to other areas of EEG signal analysis.
Research such as this, together with research on small hardware for physiologybased drowsiness detection, can eventually lead to an easy-to-use, non-intrusive device that reliably detects drowsiness. In addition, research on a reliable and standardized definition of drowsiness is needed and it would lead to improvements in the field of drowsiness detection.