2.1. Dataset, Preprocessing and Feature Extraction
The data used in this paper were obtained from the PhysioNet portal [39], in particular from the 2018 PhysioNet/Computing in Cardiology Challenge [40]. The original dataset contains records from 1985 subjects, and each recording includes a six-channel EEG, an electrooculogram, an electromyogram, respiration signals from the abdomen and chest, airflow and oxygen saturation signals and a single-channel electrocardiogram recorded during all-night sleep. The records were divided into training and test sets of equal size. The sleep stages [41] of all subjects were annotated by clinical staff based on the American Academy of Sleep Medicine (AASM) manual for the scoring of sleep [42]. There are six types of annotations for different stages: wakefulness (W), stage 1 (S1), stage 2 (S2), stage 3 (S3), rapid eye movement (REM) and undefined.
In this research, we intended to use the training set (992 subjects) to detect drowsiness. The officially provided way of acquiring the data is through torrent download, but we managed to download only 393 subjects completely, due to a lack of seeders. Of these 393 subjects, EEG signal recordings from 28 subjects were selected, based on the condition that each recording had at least 300 s of the W stage followed immediately by at least 300 s of the S1 stage. From each recording, a fragment of 600 s (300 s of W stage and 300 s of S1 stage) was used for analysis. In the original dataset, each EEG signal recording consists of six channels (F3, F4, C3, C4, O1 and O2, based on the International 10/20 System), with a sampling frequency of 200 Hz.
Table 1 shows the identification numbers of all the selected subjects. The subjects were divided into two groups: one used for training the model (16 subjects) and the other for testing the obtained models (12 subjects). The training set was used to obtain the novel ratio indices (with the method described below), and the test set was used to check these indices on unseen data.
Before feature extraction, the EEG signal must be filtered. For this purpose, the DC component was removed from the signal, and the signal was band-pass filtered with a sixth-order Butterworth filter (low cut-off frequency of 1 Hz, high cut-off frequency of 40 Hz) to remove low-frequency drifts and high-frequency artifacts. The selected fragments of the recordings contained only an insignificant number of eye-related artifacts, so we decided not to use independent component analysis for their removal, in order to prevent potential information loss due to component removal.
The signals were divided into epochs to calculate features. The epochs were five seconds long with a 50% overlap between them. Frequency-domain features are often used in EEG signal analysis. These features were extracted from the power spectral density (PSD) of the signal. To obtain the PSD of the signal, Welch’s method [43] was used. Welch’s method is used more often than a plain fast Fourier transform (FFT) in the field of EEG signal analysis because it produces a PSD estimate with lower variance. The standard frequency-domain features were calculated, i.e., the power in the delta (δ, 0.5–4 Hz), theta (θ, 4–8 Hz), alpha (α, 8–12 Hz) and beta (β, 12–30 Hz) bands. We also calculated the less frequently used frequency-domain features, i.e., the power in the gamma (γ, >30 Hz), sigma (σ, 12–14 Hz), low alpha (α1, 8–10 Hz) and high alpha (α2, 10–12 Hz) bands [
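The epoching and band definitions above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names (`epoch_bounds`, `band_power`) are hypothetical, the 40 Hz upper edge for gamma is assumed from the filter's high cut-off (the text only states ">30 Hz"), and the band power is approximated by summing PSD bins rather than by any particular Welch implementation.

```python
FS = 200        # sampling frequency in Hz (from the text)
EPOCH_S = 5     # epoch length in seconds (from the text)
OVERLAP = 0.5   # 50% overlap between consecutive epochs (from the text)

# Band edges in Hz as listed in the text. The 40 Hz upper edge for gamma is an
# assumption taken from the filter's high cut-off; the text only says ">30 Hz".
BANDS = {
    "delta": (0.5, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 12.0),
    "beta": (12.0, 30.0), "gamma": (30.0, 40.0), "sigma": (12.0, 14.0),
    "alpha1": (8.0, 10.0), "alpha2": (10.0, 12.0),
}


def epoch_bounds(n_samples, fs=FS, epoch_s=EPOCH_S, overlap=OVERLAP):
    """Return (start, stop) sample indices of every full epoch."""
    size = int(epoch_s * fs)
    step = int(size * (1.0 - overlap))
    return [(s, s + size) for s in range(0, n_samples - size + 1, step)]


def band_power(freqs, psd, lo, hi):
    """Approximate band power: sum of PSD bins in [lo, hi) times the bin width."""
    df = freqs[1] - freqs[0]
    return sum(p for f, p in zip(freqs, psd) if lo <= f < hi) * df


# One 600 s fragment sampled at 200 Hz.
bounds = epoch_bounds(600 * FS)
```

With these settings each epoch holds 1000 samples and consecutive epochs start 500 samples (2.5 s) apart, so a 600 s fragment yields 239 epochs.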
2.2. Novel Multichannel Ratio Indices
Ratios between frequency-domain features have often been used as new features in different areas of EEG signal analysis [36]. All these features have a simple mathematical formulation but often lead to improved drowsiness detection and reduced dimensionality. Moreover, they are calculated based on a single channel only. The idea behind the novel indices we present in this work is to design the feature formulation in such a way that frequency-domain features from different channels can be combined.
Figure 1 illustrates the difference between these two approaches. For simplicity of visualization, only four epochs, two channels (F3 and F4) and three features per channel are shown in Figure 1.
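We define a new index, I(e), for each epoch, e, which is calculated as a ratio of the feature values, Fij(e), for all six channels in the epoch, e, where i denotes the frequency-domain feature and j the channel. In both the numerator and denominator, the feature value of each channel, j, is multiplied by a dedicated coefficient, Cij or Kij, respectively, as indicated in Equation (1):

$$I(e) = \frac{\sum_{i=1}^{8}\sum_{j=1}^{6} C_{ij}\,F_{ij}(e)}{\sum_{i=1}^{8}\sum_{j=1}^{6} K_{ij}\,F_{ij}(e)} \qquad (1)$$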
The purpose of the coefficients is to reduce or even eliminate the influence of certain frequency-domain features of certain channels by setting the corresponding coefficient to a value between 0 and 1, or to increase their influence by setting the corresponding coefficient to a value greater than 1. There are 48 C coefficients (6 channels × 8 features per channel) and 48 K coefficients.
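As a minimal sketch of the index described above (a weighted sum of all feature values divided by a second weighted sum of the same values), the following Python function may help. The function name `ratio_index` and the nested-list layout `features[i][j]` (feature i, channel j) are assumptions made for illustration only.

```python
def ratio_index(features, C, K):
    """Multichannel ratio index for one epoch.

    features[i][j] is the value of feature i on channel j for this epoch;
    C[i][j] and K[i][j] are the numerator and denominator coefficients.
    The [feature][channel] layout is an assumption of this sketch.
    """
    num = sum(c * f for c_row, f_row in zip(C, features)
              for c, f in zip(c_row, f_row))
    den = sum(k * f for k_row, f_row in zip(K, features)
              for k, f in zip(k_row, f_row))
    # A zero denominator (e.g., all K coefficients set to 0) is not handled here.
    return num / den


# Toy example with 2 features x 2 channels instead of 8 x 6.
features = [[1.0, 2.0], [3.0, 4.0]]
C = [[1.0, 0.0], [0.0, 0.0]]   # keep only feature 0 on channel 0
K = [[0.0, 0.0], [0.0, 1.0]]   # keep only feature 1 on channel 1
value = ratio_index(features, C, K)
```

Setting a coefficient to 0 removes a feature/channel pair from the sum, which is how a single-channel ratio can be expressed as a special case of the multichannel form.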
The ideal output of I(e) should look like a step function (or an inverse step function), which would indicate a clear difference between the two stages: W and S1. Figure 2 illustrates the main features of the output. The output can be divided into two parts: the left one corresponds to stage W and the right one to S1. While the output in each part should be as smooth as possible, i.e., with minimal oscillations, it is expected that there will be a transition period between the phases, which may have significant oscillations. This transition period would ideally be the step function, but in realistic settings, it is expected that the transition between phases of brain activity will probably last several epochs and would not be considered as either stage W or S1.
In order to determine the appropriate value of the coefficients that would provide the output as close as possible to the ideal, at least two criteria must be taken into account: the absolute difference between the mean values left and right of the transition window and the quantification of the oscillations in each part. This can be defined as a multi-objective optimization problem that we want to solve using a metaheuristic multi-objective evolutionary optimization method, as described in the next section. To the best of our knowledge, this state transition problem has never been approached with evolutionary computation.
2.3. Multi-Objective Optimization
Optimizing a step function, which is representative of problems with flat surfaces, is generally a challenge for any optimization algorithm because it provides no information about which direction is favorable, and an algorithm can get stuck on one of the flat plateaus [45]. To overcome this challenge, instead of optimizing the function according to one criterion, we define two objectives that we optimize simultaneously: (1) to maximize the absolute difference between the mean values of the I(e) output for the W and S1 stages, and (2) to minimize the oscillations of the output value around the mean value in each stage. According to Figure 2, the left part of the I(e) output, occurring before the transition phase, corresponds to the W stage, and the right part, occurring after the transition phase, corresponds to the S1 stage. Since optimization problems are usually expressed as minimization problems, the first objective function, O1, is defined as the inverse of the absolute difference between the mean value of I(e) in the left part (avgleft) and in the right part (avgright), as given in Equation (2):

$$O_1 = \frac{1}{\left|\mathrm{avg}_{left} - \mathrm{avg}_{right}\right|} \qquad (2)$$
The second objective function, O2, expresses the oscillations in the function and is defined as the number of times the difference between the output values of I(e) for two adjacent epochs is greater than a given limit. The exact value of this limit will be discussed later in this section, as it is closely related to the specifics of the optimization method used. The main goal of the objective function O2 is to compensate for the biggest flaw in the way the objective O1 is calculated, namely its use of averaging. For example, if a possible solution is a completely straight line, except for a large negative spike in the left part and a large positive spike in the right part, then based only on the objective function O1 this would be a good solution, while the objective function O2 would penalize it.
As mentioned above, the transition between two stages will probably take several epochs and show significant oscillations of the function output values. According to the annotation made by clinical personnel, the transition phase should be approximately in the middle of the I(e) output, but it cannot be determined exactly how long it will last. In our work, which is based on expert knowledge of human behavior in the case of drowsiness, we assume that it lasts about one minute, which corresponds to about 30 epochs. Within the transition window, neither one of the two objective functions is calculated, since it is assumed to belong neither to the W nor to the S1 stages. We also allow it to move around the center, shifting left and right, due to a possible error of the human observer who marked the data.
The multi-objective optimization problem can now be expressed as min{O1, O2}, where O1 and O2 are the conflicting objective functions, as defined above. The evolutionary metaheuristic algorithm NSGA-II [46] was applied to solve this multi-objective optimization problem. Genetic algorithms (GAs) are commonly used to solve complex optimization and search problems [47]. NSGA-II is one of the most popular evolutionary multicriteria optimization methods due to its versatility and ability to easily adapt to different types of optimization problems. The strong points of this multi-objective algorithm are: (1) the fast non-dominated sorting ranking selection method used to emphasize Pareto-optimal solutions, (2) the maintenance of population diversity by using the crowding distance and (3) the elitism approach, which ensures the preservation of the best candidates through generations without requiring any new parameters other than the normal genetic algorithm parameters, such as population size, termination parameter, and crossover and mutation probabilities. Additionally, it has often been used for the elimination of EEG channels, with the same purpose as in our case: dimensionality reduction [48]. This paper uses the implementation of NSGA-II provided by the MOEA framework [49] and is based on the guidelines defined in [
NSGA-II was used with the following configuration. The chromosome was divided into two parts: in the first part, genes represented the numerator coefficient values (Cij), and in the second part, genes represented the denominator coefficient values (Kij). In each part, the genes were grouped by frequency-domain features and channels, as illustrated in Figure 3. The genes were encoded as real values in the range [0.0, 10.0], and standard NSGA-II crossover and mutation operators were used to support operation on real values.
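The chromosome layout described above can be illustrated with a short decoding sketch. The function name `decode` is hypothetical, and the feature-major ordering of genes within each half (all channels of feature 0, then all channels of feature 1, and so on) is an assumption, since the exact grouping is only given in Figure 3.

```python
N_FEATURES = 8   # frequency-domain features per channel (from the text)
N_CHANNELS = 6   # EEG channels (from the text)
GENE_MIN, GENE_MAX = 0.0, 10.0   # gene value range (from the text)


def decode(chromosome):
    """Split a flat chromosome into numerator (C) and denominator (K) matrices.

    Assumes feature-major ordering inside each half; the actual grouping
    used by the authors is shown only in their Figure 3.
    """
    half = N_FEATURES * N_CHANNELS
    if len(chromosome) != 2 * half:
        raise ValueError(f"expected {2 * half} genes, got {len(chromosome)}")

    def to_matrix(genes):
        return [genes[i * N_CHANNELS:(i + 1) * N_CHANNELS]
                for i in range(N_FEATURES)]

    return to_matrix(chromosome[:half]), to_matrix(chromosome[half:])


# Toy chromosome whose gene values equal their positions, to make the layout visible.
C, K = decode(list(range(96)))
```

The first 48 genes populate C and the remaining 48 populate K, so a chromosome carries 96 real-valued genes in total.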
Each solution is evaluated based on the values of objectives O1 and O2, as described in the pseudocode in Algorithm 1. First, the chromosome is decoded (line 1). Then, for each fragment, two values are calculated: (1) the inverse absolute difference (IAD) between the mean index value, I(e), of the left part and the right part, represented by the invAbsDiff variable in the pseudocode, and (2) the oscillations in the function, represented by the oscillation variable in the pseudocode (lines 3–5). Finally, the value of objective O1 for the given solution is defined as the average value of invAbsDiff over all fragments, and the value of objective O2 is defined as the average value of oscillation over all fragments (lines 7–8).
Algorithm 1. Evaluation.
1: decode chromosome to get coefficient values
2: for each fragment do
3:     indexVals[] = calculate index value for each epoch
4:     invAbsDiff += IADCalc(indexVals[], windowStart)
5:     oscillation += OscillationCalc(indexVals[], windowStart, winSize)
6: end for
7: objective1 = invAbsDiff/number_of_fragments
8: objective2 = oscillation/number_of_fragments
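A possible Python rendering of the evaluation loop in Algorithm 1 is sketched below. It is not the authors' MOEA Framework code: the function name `evaluate` is hypothetical, the three helper routines are passed in as callables so the block stays self-contained, and the IAD routine is assumed to return the located window start together with its value (the pseudocode passes windowStart as an output argument instead).

```python
def evaluate(fragments, C, K, index_fn, iad_fn, osc_fn, win_size=30):
    """Return (objective1, objective2) for one candidate solution.

    fragments: list of fragments, each a list of per-epoch feature sets.
    index_fn(features, C, K) -> index value I(e) for one epoch.
    iad_fn(index_vals) -> (inverse absolute difference, window start).
    osc_fn(index_vals, window_start, win_size) -> oscillation penalty.
    All three callables are placeholders assumed for this sketch.
    """
    inv_abs_diff = 0.0
    oscillation = 0.0
    for epochs in fragments:
        index_vals = [index_fn(features, C, K) for features in epochs]
        iad, window_start = iad_fn(index_vals)
        inv_abs_diff += iad
        oscillation += osc_fn(index_vals, window_start, win_size)
    n = len(fragments)
    # Both objectives are averages over all fragments and are to be minimized.
    return inv_abs_diff / n, oscillation / n
```

Passing the helpers as arguments keeps the averaging logic separate from the details of the index, IAD and oscillation calculations, which are described next.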
The algorithm for the IAD calculation is provided in the pseudocode in Algorithm 2. The calculation of the IAD for each fragment was slightly modified compared to Equation (2) to allow faster convergence of the search algorithm. The transition phase was not fixed at the same position in each fragment but was allowed to move away from the center, because the annotation in the original dataset was performed manually and the observer may have registered the transition from W to S1 a little too early or too late. The algorithm allows the transition phase to begin no earlier than 30 epochs after the fragment start and no later than 60 epochs before the fragment end, so that a 30-epoch window ends at least 30 epochs before the fragment end (line 2). The algorithm locates the transition phase by looking for the window of 30 epochs that yields the maximum difference between the mean index values, I(e), of the left and the right parts (lines 9–13).
A gradation of the absolute difference between the mean values of the left and the right parts is also introduced (lines 19–22) to allow easier and faster convergence of the algorithm. The optimization of the objective O1 can be considered an optimization problem with soft constraints that are related to how much O1 deviates from the optimal value. However, it is quite difficult to determine the optimal value precisely a priori. As indicated in [52], constraints are often treated with penalties in optimization techniques. The basic idea is to transform a constrained optimization problem into an unconstrained one by introducing a penalty into the original objective function to penalize violations of constraints. According to a comprehensive overview in [51], the penalty should be based on the degree of constraint violation of an individual. In [53], it is also recommended that, instead of having just one fixed penalty coefficient, the penalty coefficient should increase when higher levels of constraint violation are reached. The greatest challenge, however, is to determine the exact penalty values. If the penalty is too high or too low, evolutionary algorithms spend either too little or too much time exploring the infeasible region, so it is necessary to find the right trade-off between the objective function and the penalty function so that the search moves towards the optimum in the feasible space. As the authors have shown in [54], the choice of penalty boundaries is problem-dependent and difficult to generalize. Since we cannot strictly determine the optimal value of O1 in our case, we have chosen several thresholds for the absolute difference value, with the penalty increasing by a factor of 10 for each new threshold. The exact thresholds were selected based on the experience gained from the first few trial runs of the algorithm. Based on the observations from the trial runs, a third modification was also introduced: the difference is calculated with a relative, instead of absolute, value of I(e). The relative value of I(e) is calculated by using the lowest I(e) value as a reference point, instead of zero, i.e., the zero is “moved”, as shown in code lines 16–18 in Algorithm 2.
Algorithm 2. IAD Calculation.
1: function IADCalc(indexVals[], windowStart)
2:   for j between 30 and (indexVals.size-60) do
3:     maxAbsDiff = 0
4:     left = 0
5:     right = 0
6:     avgLeft = average value of all Index values before j
7:     avgRight = average value of all Index values after j+30
8:     diff = ABS(avgRight - avgLeft)
9:     if diff ≥ maxAbsDiff then
10:      maxAbsDiff = diff
11:      left = avgLeft
12:      right = avgRight
13:      windowStart = j
14:    end if
15:  end for
16:  lowestVal = getLowestVal(indexVals)
17:  movedZero = lowestVal - 0.01*lowestVal
18:  absDiff = ABS(right - left)/MIN(left - movedZero, right - movedZero)
19:  if absDiff ≥ 5.0 then invAbsDiff = 1/absDiff
20:  else if absDiff ≥ 1.0 then invAbsDiff = 10/absDiff
21:  else if absDiff ≥ 0.5 then invAbsDiff = 100/absDiff
22:  else invAbsDiff = 1000
23:  end if
24:  return invAbsDiff
25: end function
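For readers who prefer runnable code, Algorithm 2 might be rendered in Python roughly as follows. Several points are assumptions of this sketch rather than statements about the authors' implementation: the running maximum and the left/right means are initialized once before the loop (the pseudocode lists them inside it, where they would be reset on every iteration), "before j" and "after j+30" are read as the slices `[:j]` and `[j+30:]`, the window start is returned alongside the result instead of being written to an output argument, and all index values are assumed to be positive so that the "moved zero" denominator stays positive.

```python
def iad_calc(index_vals, win_size=30, min_start=30, end_margin=60):
    """Graded inverse absolute difference for one fragment.

    Returns (inv_abs_diff, window_start). Assumes positive index values.
    """
    n = len(index_vals)
    if n < min_start + end_margin:
        raise ValueError("fragment too short for the transition-window search")

    def mean(xs):
        return sum(xs) / len(xs)

    # Assumption: initialized once, before the loop, so the maximum persists.
    max_abs_diff, left, right, window_start = 0.0, 0.0, 0.0, min_start
    for j in range(min_start, n - end_margin + 1):
        avg_left = mean(index_vals[:j])               # epochs before the window
        avg_right = mean(index_vals[j + win_size:])   # epochs after the window
        diff = abs(avg_right - avg_left)
        if diff >= max_abs_diff:
            max_abs_diff, left, right, window_start = diff, avg_left, avg_right, j

    # "Moved zero": reference point just below the lowest index value.
    lowest = min(index_vals)
    moved_zero = lowest - 0.01 * lowest
    abs_diff = abs(right - left) / min(left - moved_zero, right - moved_zero)

    # Graded penalty: smaller relative differences are penalized 10x per level.
    if abs_diff >= 5.0:
        inv_abs_diff = 1.0 / abs_diff
    elif abs_diff >= 1.0:
        inv_abs_diff = 10.0 / abs_diff
    elif abs_diff >= 0.5:
        inv_abs_diff = 100.0 / abs_diff
    else:
        inv_abs_diff = 1000.0
    return inv_abs_diff, window_start
```

For a clean step in the index values the function returns a small value and a window that starts at the step, while a flat signal receives the maximum penalty of 1000.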
The pseudocode for calculating the oscillations in the function as the second objective, O2, is provided in Algorithm 3. Again, the optimization of the oscillations can be considered a constrained optimization problem, so that, in the same way as in the case of the IAD calculation discussed previously, a gradation of the difference between the output values of I(e) for two adjacent epochs is used to penalize the larger differences more severely (lines 7–10 and 15–18). The exact thresholds were chosen based on the experience gained from the first few trial runs of the algorithm. In order to make the algorithm converge more easily and quickly, the concept of “moved zero” was used again (lines 2, 3, 6 and 14).
Algorithm 3. Oscillation Calculation.
1: function OscillationCalc(indexVals[], windowStart, winSize)
2:   lowestVal = getLowestVal(indexVals)
3:   movedZero = lowestVal - 0.01*lowestVal
4:   oscillation = 0
5:   for i between 1 and windowStart-1 do
6:     absDiff = ABS((indexVals[i] - indexVals[i-1])/(indexVals[i-1] - movedZero))
7:     if absDiff ≥ 5.0 then oscillation += 1000
8:     else if absDiff ≥ 1.0 then oscillation += 100
9:     else if absDiff ≥ 0.5 then oscillation += 10
10:    else if absDiff ≥ 0.25 then oscillation += 1
11:    end if
12:  end for
13:  for i between windowStart+winSize and indexVals.size()-1 do
14:    absDiff = ABS((indexVals[i] - indexVals[i-1])/(indexVals[i-1] - movedZero))
15:    if absDiff ≥ 5.0 then oscillation += 1000
16:    else if absDiff ≥ 1.0 then oscillation += 100
17:    else if absDiff ≥ 0.5 then oscillation += 10
18:    else if absDiff ≥ 0.25 then oscillation += 1
19:    end if
20:  end for
21:  return oscillation
22: end function
Finally, to further minimize the oscillations and help the search algorithm converge more quickly, the maximum change in the I(e) value between two adjacent epochs is limited to 10% of the value in the first of the two epochs. The mathematical formulation of this limit is provided in Equation (3):