1. Introduction
Scientists have long been captivated by the brain, and cognitive processes are among the most intriguing topics for many people. A fundamental component of these processes is attention: acquiring knowledge begins with attending to information. Attention is a cognitive process that enables the selection, focusing, and sustained processing of information [1]. The object of attention can be either an environmental stimulus actively processed by the sensory systems or associative information and response alternatives generated by ongoing cognitive activity. Attention allows us to orient ourselves towards relevant stimuli, ignore irrelevant ones, and act accordingly; it is also the basis of learning, which cannot begin without it. Many measurement techniques have been reported, such as response times or the number of clicks recorded while using particular software, eye contact time measured from videos, and Magnetic Resonance Imaging (MRI) or functional Magnetic Resonance Imaging (fMRI) studies, among others. Autism Spectrum Disorder (ASD) is a lifelong neurodevelopmental condition characterized by difficulties with social interaction, low verbal and non-verbal communication skills, and repetitive and restricted behavior [2,3]. People with ASD often show variable attention levels because of hypersensitivity: large amounts of environmental information can overwhelm them.
There are many methods for measuring attention reported in the literature, such as eye-tracking/gaze [
4,
5], fMRI [
6,
7], using a program [
8], biofeedback [
9], and electroencephalographic (EEG) signals [
10,
11,
12], among others. The last one delivers great advantages over other neuroimaging techniques due to its high temporal resolution [
13], neurodevelopmental diagnosis accuracy [
14], cognitive-related bioelectrical data [
15], low cost [
16], and non-invasive application methods [
17]. The authors [
8] show how the attention of 49 children with ASD and a group of 51 typically developing children was measured using a mindfulness-based program (MBP), in other words, a computerized attention test. This MBP software measures accuracy and reaction times, but it does not measure attention directly. Another way to measure attention is by analyzing facial expressions or the duration of eye contact in video recordings. The study [
5] shows a measuring technique based on video recordings of 1756 toddlers with ASD, aged 12 to 72 months, whose facial expressions were recorded and analyzed while they watched selected short videos on an iPhone or an iPad. Reference [
18] presents a study on measuring the concentration of a group of children while they interacted with an NAO robot and their teacher. In this case, the eye contact time was measured by analyzing video recordings of the sessions obtained with two cameras placed at the rear of the room.
The study [
19] shows an approach to the joint analysis of EEG and eye-tracking for children’s ASD evaluation. First, the synchronization measures, information entropy, and time-frequency features of the multi-channel EEG are derived. Then, a random forest is applied to the eye-tracking recordings of the same subjects to single out the most significant features. A graph convolutional network (GCN) model naturally fuses the two groups of features to differentiate the children with ASD from the typically developing (TD) subjects. Reference [
20] uses EEG activity (raw EEG and alpha power) to provide a time-resolved index of attentional orienting towards salient stimuli that either matched or did not match target-defining properties. In all of the references presented above, feature extraction techniques help to obtain useful or descriptive information from the acquired signals while reducing redundant or unnecessary information, noise, or artifacts. Once the feature extraction stage is finished, classification can quantify the signals. This paper also reviews the most frequently used feature extraction and classification algorithms.
Nowadays, intelligent systems that incorporate artificial intelligence (AI) frequently rely on machine learning (ML) [
21,
22]. ML is a term that refers to a system’s ability to learn from problem-specific training data in order to automate the process of developing analytical models and completing associated tasks [
23,
24]. Deep learning (DL) is a paradigm in machine learning that is based on the use of artificial neural networks [
25,
26]. Commonly, the use of ML algorithms is centered on the diagnosis or detection of ASD, as presented in [
20]. The authors in [
27] used EEG and eye-tracking features to identify children with ASD. In [
28], the authors used deep convolutional architectures to detect ASD. Other studies [
29] reported statistical features for ASD classification. In reference [
30], they used an ML and a DL process for diagnosing ASD from time-frequency spectrogram images of EEG. The authors in [
31] reported that it is possible to evaluate mental stress using DL and EEG records. There are also studies such as [
32], where they used the free artifact signal of two electrodes to detect ASD. In [
33], the authors used a hybrid lightweight feature extractor that converts signals into spectrogram images.
Recent studies focus on the relationship between human and machine behavior, based on the premise that diverse social and psychological backgrounds correspond in practice to different modalities of human–computer interaction [
34]. In general, EEG feature extraction techniques have offered strong clinical consistency since the beginning of their use for assessing and diagnosing different cognitive and neurological domains in ASD [
35], learning difficulties [
36], and attention [
37]. It is widely accepted that AI techniques are helpful for automatic diagnosis and rehabilitation procedures in ASD cases. For example, in [
38], a review of DL methods focusing on neuroimaging-based approaches is presented. Furthermore, the authors report a review of studies based on DL networks for diagnosing ASD and the challenges in automatized detection and ASD rehabilitation. Nowadays, there are some DL applications for brain disease diagnoses, such as the ones presented in [
39], which presents a review of automated multiple sclerosis (MS) detection methods based on MRI. The authors note that the most used architectures for MS detection are convolutional neural networks (CNNs), autoencoders (AEs), generative adversarial networks (GANs), and CNN-RNN models. Schizophrenia (Sz) is another brain disease detected with DL methods using EEG signal processing [
40]. The authors compare their results with traditional AI methods, such as support vector machine (SVM), k-nearest neighbors, decision tree, naïve Bayes, random forest, extremely randomized trees, and bagging. The DL models used are long short-term memories (LSTMs), one-dimensional convolutional networks (1D-CNNs), and 1D-CNN-LSTMs. The CNN and LSTM models perform best under 5-fold cross-validation. Moreover, epileptic seizures are detectable by using EEG signal processing; for example, in [
41], the authors present a novel diagnostic procedure that uses fuzzy theory and DL techniques. They propose an adaptive neuro-fuzzy inference system (ANFIS) with a breeding swarm optimization (BS) method. This ANFIS-BS method achieves an accuracy of 99.74% in a two-class classification task.
Appendix A summarizes in
Table A1 and
Table A2 the state of the art and shows a comparison with the proposed method, considering the dataset, data source, preprocessing, methods/algorithm, main findings, and applications.
The research questions that motivate this paper are: (1) Which brain regions activate, on average, when attention increases, at what levels, and depending on the type of activity performed? (2) Can the attention span of a person with Autism Spectrum Disorder be quantified as a feature using time-frequency analysis methods? (3) Is there a relationship between the increase in the power of electroencephalographic signals and the attention span of a child with Autism Spectrum Disorder?
In this paper, the hypothesis is that measuring and quantifying the brain’s electrical activity (power spectrum density) makes it possible to assess the level of attention when performing various cognitive activities and interacting with different software or systems. Therefore, this article aims to detect when an ASD user has high attention levels while developing learning activities based on the EEG signals acquired by an Epoc+ Brain–Computer Interface (BCI). The novelty of this paper is the use of ML algorithms to classify the “Attention” and “No Attention” states of an ASD user. This research presents a new methodology based on EEG signals and ML algorithms for classifying the attention of a 13-year-old boy with ASD. This research formulates a method for processing electroencephalographic signals to determine attention lapses in people with ASD, tested by performing various learning activities and interacting with computer programs.
The rest of this paper is organized as follows.
Section 2 presents the materials and the proposed methodology.
Section 3 shows the findings of this paper.
Section 4 presents the discussion. Finally,
Section 5 summarizes our conclusions.
2. Materials and Methods
The approval of this research by the Ethics Committee and Research for Pre-Graduates and Post-Graduates of the Facultad de Ingeniería y Negocios Guadalupe Victoria de la Universidad Autónoma de Baja California was obtained on 8 October 2020, with the POSG/020-1-04 register. The EEG signals were acquired with an Epoc+ Brain–Computer Interface (BCI) [
42,
43] via the Emotiv Pro platform while the ASD user developed several learning activities, and data were processed with Matlab 2019a and Emotiv Pro software using the Student Version.
Figure 1 depicts the electrode location (left) and the Emotiv Epoc+ headset (right). According to the coherence analysis in attention [
44,
45], the selected electrodes were F3, F4, P7, and P8.
The proposed methodology and the simulations were performed on a personal computer with the following specifications: Intel(R) Core i5-8250U CPU @ 1.60 GHz, 1800 Mhz, 4 Cores, 8 Logical Processors, and 8 GB in RAM.
The signal was sampled at 2048 Hz, filtered with a dual-notch filter at 50 Hz and 60 Hz and a low-pass filter at 64 Hz, and then downsampled to 128 Hz for transmission. It was necessary to multiply the signal by 0.51 to convert it to a voltage.
The proposed data acquisition process is as follows:
- Step 1.
Place the headset with the electrodes hydrated on the test subject.
- Step 2.
Start the video recording and the EEG data acquisition.
- Step 3.
Give the worksheet to the test subject and the instructions.
- Step 4.
Let the test subject start the activity, and give him additional instructions if necessary, as in a regular school session.
- Step 5.
When the activity is over, stop video recording and data acquisition.
Figure 2 shows the EEG acquisition process and how the boy worked with the activity sheets using the Epoc+ headset.
2.1. Activity Sheets
Figure 3 and
Figure 4 depict examples of other activity sheets provided by the child’s teachers, according to his knowledge and abilities.
Figure 3 shows an activity sheet about reading, following instructions, and drawing.
Figure 4 is a counting animal activity sheet. The school for children with ASD Eduke (
https://www.facebook.com/EDUKE-123602824381330, accessed on 29 December 2021), located in Tijuana, Baja California, México, provided all the activity sheets used in this research.
2.2. Signal Processing Procedure
Figure 5 depicts the block diagram of the procedure used for signal processing. The first step is preprocessing the EEG signal, and then the power spectrum density of signals is calculated and separated into bands. Next, we obtain the features presented in
Table 1 and validate them. With these features, we train the machine learning algorithms. In the next section, we give more information about these steps.
2.2.1. Preprocessing of EEG Signal
The Emotiv software provides the recordings in a .csv file with integer values. It is necessary to convert the EEG signal acquired by the Epoc+ to its voltage equivalent by multiplying it by the factor of 0.51 mentioned above.
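As a minimal sketch of this conversion step (the function and variable names are ours; only the 0.51 factor comes from the acquisition settings described above, and the column layout of the actual Emotiv export may differ):

```python
import numpy as np

# Conversion factor reported for the Epoc+ export: raw integer counts -> microvolts
LSB_TO_MICROVOLTS = 0.51

def counts_to_microvolts(raw_counts):
    """Convert raw integer samples from the Emotiv .csv export to microvolts."""
    return np.asarray(raw_counts, dtype=float) * LSB_TO_MICROVOLTS

# Example with three illustrative raw samples from one electrode
volts = counts_to_microvolts([4200, 4210, 4195])
```

The same scaling is applied element-wise to every channel column before any further processing.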
2.2.2. Band Power Separation
In EEG signal processing, it is common to separate the power spectrum density into the following bands: Delta (1–4 Hz), Theta (4–8 Hz), Alpha (8–12 Hz), Beta (12–30 Hz), and Gamma (30–50 Hz), depicted in
Figure 6. These band powers [
46] are the basis for calculating relative powers and ratios in the feature extraction stage. The Emotiv software gives the power of each band, except for the Delta band, and it gives the Beta band separated into Low Beta and High Beta powers [
47]. For this research, we add both Beta band powers.
The Emotiv software uses two-second windows to calculate the power spectrum density in absolute values, with units of μV²/Hz, and then separates it into bands. Each two-second window contains 256 samples [
47,
48].
Figure 7 shows an example of band power separation. For this paper, we use the electrodes F3, P7, F4, and P8 because they show high coherence in attention tasks [
44,
45].
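The band separation described above can be approximated with SciPy's Welch estimator. This is only an illustrative sketch: the Emotiv software's exact windowing and averaging are not public, so we simply use a two-second (256-sample) segment at the 128 Hz transmission rate and integrate the PSD over each band.

```python
import numpy as np
from scipy.signal import welch

FS = 128  # Hz, Epoc+ transmission rate after downsampling
BANDS = {"theta": (4, 8), "alpha": (8, 12), "beta": (12, 30)}

def band_powers(signal, fs=FS):
    # Two-second window (256 samples at 128 Hz), as used by the Emotiv software
    freqs, psd = welch(signal, fs=fs, nperseg=2 * fs)
    df = freqs[1] - freqs[0]
    powers = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        # Integrate the PSD (uV^2/Hz) over the band to obtain the band power
        powers[name] = psd[mask].sum() * df
    return powers
```

For a pure 10 Hz sinusoid, for example, almost all the estimated power falls in the Alpha band, which is a quick sanity check for the band limits.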
2.2.3. Feature Extraction
To compute the Theta–Beta Ratio (TBR) and the Theta–Alpha Ratio (TAR), it is necessary first to calculate the band power spectrum density (PSD) of the EEG signal in two-second windows for each channel or electrode. In attention detection and neurofeedback, it is common to use the TBR together with the relative band powers and the Theta/(Alpha + Beta) ratio, known as TBAR [
48].
Table 1 presents the features calculated and their equations [
48]. The next step is to use these features to train several machine learning models and evaluate their performance.
Table 1.
Feature equations for attention detection.
Feature | Equation
---|---
Theta Relative Power | TRP = Pθ/(Pθ + Pα + Pβ)
Alpha Relative Power | ARP = Pα/(Pθ + Pα + Pβ)
Beta Relative Power | BRP = Pβ/(Pθ + Pα + Pβ)
Theta–Beta Ratio | TBR = Pθ/Pβ
Theta–Alpha Ratio | TAR = Pθ/Pα
Theta/(Alpha + Beta) Ratio | TBAR = Pθ/(Pα + Pβ)
where Pθ, Pα, and Pβ denote the Theta, Alpha, and Beta band powers of the window, respectively.
Figure 8 depicts the Theta, Alpha, and Beta relative powers (R.P.) obtained for the F4 electrode using the equations presented in
Table 1. These R.P. values change over time as a function of the activity performed.
Figure 9 shows the Theta–Beta Ratio, Theta–Alpha Ratio, and Theta/(Alpha + Beta) Ratio (TBAR) for the same F4 electrode.
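The six features of Table 1 can be computed per two-second window from the three band powers of one electrode. The helper below is an illustrative sketch (the function name is ours); the formulas follow the standard relative-power and ratio definitions, with TBAR = Theta/(Alpha + Beta) as stated above.

```python
def attention_features(theta, alpha, beta):
    """Compute the six Table 1 features from one electrode's band powers."""
    total = theta + alpha + beta
    return {
        "TRP": theta / total,   # Theta Relative Power
        "ARP": alpha / total,   # Alpha Relative Power
        "BRP": beta / total,    # Beta Relative Power
        "TBR": theta / beta,    # Theta-Beta Ratio
        "TAR": theta / alpha,   # Theta-Alpha Ratio
        "TBAR": theta / (alpha + beta),
    }
```

Applying this to each of the four electrodes (F3, F4, P7, P8) yields the 24 features per window used later in the dataset.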
2.2.4. Dataset Preparation
The dataset consists of 24 features, 6 for each of the four electrodes (F3, F4, P7, and P8), and two classes: “Attention” and “No Attention”. The dataset has 33,936 samples, 16,968 per class to preserve balance.
Figure A1 from
Appendix B shows a fragment of the created dataset with 24 features acquired through the processing of EEG signals when the user is performing didactic activities and paying attention and when he is not paying attention to his learning process.
The
Supplementary Materials dataset includes six different Attention activities (counting, forming words, completing words, looking for differences between two figures, reading a text, and answering simple questions about the reading), recorded at six different moments. There are also No Attention samples recorded during non-learning activities such as watching cartoons, echolalia, and doing nothing, just sitting awake and trying to be as relaxed as possible.
2.2.5. Machine Learning Algorithm Training
In this paper, we chose eight ML algorithms to evaluate the classification of attention through the EEG signals of an ASD user. The chosen ML algorithms were naive Bayes (N.B.), stochastic gradient descent (SGD), decision trees (D.T.), support vector machine (SVM)-RBF, k-nearest neighbors (KNN), multi-layer perceptron neural network (MLP-NN), random forest (R.F.), and extra trees (E.T.). These ML models are part of the Scikit Learn library [
49].
Figure 10 shows the flowchart for training and testing the ML algorithms. First, the required libraries or toolboxes, such as Scikit Learn, Pandas, and Seaborn, are imported. Then, the feature dataset is loaded, and the input data (features) are separated from the output data (labels). Next, the dataset is randomly divided, 80% for training and 20% for testing, and the data are scaled between 0 and 1 to obtain optimized results. Finally, each machine learning model is trained and scored, i.e., the confusion matrix and performance metrics are used to evaluate it.
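The flowchart's steps map directly onto Scikit Learn. The sketch below uses synthetic stand-in data (the real 24-feature dataset is not reproduced here, and the toy labels are our own) and one of the eight models, the MLP-NN:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the 24-feature dataset (shapes and labels are illustrative only)
rng = np.random.default_rng(0)
X = rng.random((1000, 24))
y = (X[:, 0] + X[:, 5] > 1.0).astype(int)  # toy "Attention" / "No Attention" labels

# 80/20 random split, then scale the features to [0, 1] as described in the flowchart
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = MinMaxScaler().fit(X_train)

# Train the model and score it on the held-out test set
clf = MLPClassifier(max_iter=500, random_state=42).fit(scaler.transform(X_train), y_train)
acc = accuracy_score(y_test, clf.predict(scaler.transform(X_test)))
```

Swapping `MLPClassifier` for `GaussianNB`, `SGDClassifier`, `DecisionTreeClassifier`, `SVC`, `KNeighborsClassifier`, `RandomForestClassifier`, or `ExtraTreesClassifier` reproduces the other models in the comparison.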
3. Results
To evaluate the ML models, we rely on the metrics of the Scikit Learn library [
49]. The metrics used to evaluate the scoring of the ML models are the confusion matrix (true positives, true negatives, false positives, false negatives), accuracy, F1 score, precision, sensitivity/recall, and specificity.
Table 2 shows the scoring parameters obtained for the ML models tested in this paper. The first four parameters correspond to the results of the confusion matrix. Naive Bayes obtained an accuracy of 0.7628, SGD 0.8619, decision tree 0.8697, SVM-RBF 0.8940, KNN 0.8968, MLP-NN 0.9298, random forest 0.9291, and extra trees 0.9270. Therefore, the MLP-NN model has the best accuracy score.
Regarding the F1 score parameter, it is observable that naive Bayes, SGD, decision trees, and SVM-RBF obtained a score lower than 0.90. Meanwhile, the KNN, MLP-NN, random forest, and extra trees models obtained a score greater than 0.90, with extra trees achieving the highest score. Regarding the specificity/precision, we observed that the naive Bayes model was the lowest, while the extra trees and MLP-NN models were the highest, with 0.8896 and 0.9155, respectively. Regarding the sensitivity/recall score, all the models obtained a result greater than 0.90, except decision trees with 0.8720, and the extra trees model achieved the best result with 0.9738.
Table 3 shows the performance metrics obtained for each ML model: the Area Under the Curve (AUC), Cohen’s Kappa coefficient, Hamming loss, and the Matthews correlation coefficient. Regarding the AUC metric, the naive Bayes, stochastic gradient descent, and decision trees models are the lowest, with 0.7642, 0.8624, and 0.8697, while the support vector machine (SVM)-RBF, KNN, extra trees, MLP-NN, and random forest (R.F.) models obtained the best AUC, with 0.8944, 0.8972, 0.9274, 0.9299, and 0.9294, respectively, the MLP-NN model obtaining the highest. Cohen’s Kappa coefficient quantifies the agreement between two labelings; although originally intended to compare different human annotators, it is commonly applied to a classifier’s predictions versus the ground truth. The Kappa score is a number between −1 and 1. Scores above 0.8 indicate good agreement; zero or lower means no agreement (practically random labels). The naive Bayes, stochastic gradient descent, decision trees, SVM-RBF, and KNN models obtained Kappa coefficients less than 0.80 but greater than zero. However, the extra trees, MLP-NN, and random forest (R.F.) models obtained Kappa coefficients of 0.8542, 0.8597, and 0.8583, respectively, all greater than 0.80, indicating good agreement. The MLP-NN model obtained the highest Cohen’s Kappa coefficient.
Regarding the Hamming loss, the ideal value is zero; the closer it is to zero, the closer the model is to perfect. In this case, the extra trees, MLP-NN, and random forest (R.F.) models have the lowest Hamming loss, with MLP-NN the lowest at 0.0701. In machine learning, the Matthews correlation coefficient (MCC), or phi coefficient, introduced by biochemist Brian W. Matthews [50], is used as a measure of the quality of binary (two-class) classifications. In this case, the three best models are extra trees, MLP-NN, and random forest (R.F.), with 0.8580, 0.8602, and 0.8613, respectively, with random forest (R.F.) being the best.
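All of the metrics above are available in Scikit Learn. As an illustration with small, made-up label vectors (not the study's data):

```python
from sklearn.metrics import (accuracy_score, f1_score, cohen_kappa_score,
                             hamming_loss, matthews_corrcoef, roc_auc_score)

# Illustrative labels; in the study these come from each trained model's test predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]

scores = {
    "accuracy": accuracy_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    "kappa": cohen_kappa_score(y_true, y_pred),
    # For binary labels, the Hamming loss is the fraction of misclassified samples
    "hamming": hamming_loss(y_true, y_pred),
    "mcc": matthews_corrcoef(y_true, y_pred),
    "auc": roc_auc_score(y_true, y_pred),
}
```

Note that `roc_auc_score` applied to hard 0/1 predictions reduces to the mean of sensitivity and specificity; the full ROC curves of Figure 11 require the models' continuous scores or probabilities instead.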
Figure 11 depicts the ROC curves of the top five ML models trained for attention classification using EEG data. The ROC curve shows the trade-off between sensitivity (TPR) and specificity (1 − FPR). Curves closer to the top-left corner indicate better performance, while curves approaching the 45-degree diagonal of the ROC space indicate a less accurate classifier. The SVM-RBF and KNN curves are closer to the 45-degree diagonal, indicating lower accuracy. On the other hand, the random forest, extra trees, and MLP-NN curves are closest to the upper-left corner; therefore, these models have the best performance.
Figure 12 depicts the training time of the eight ML models tested in this study. The N.B., SGD, KNN, and D.T. models have the shortest training time. However, according to the results shown in
Table 2 and
Table 3, they have the lowest performance metrics. In contrast, the SVM-RBF, R.F., and MLP-NN models have longer training times of 17.01, 21.14, and 73.10 s, respectively, with the MLP-NN model having the longest. However, this model also has the best performance metrics, as shown in
Table 2 and
Table 3. Therefore, the classifier designer must conduct a cost–benefit analysis in terms of accuracy and processing time. In most cases, programmers prefer better accuracy, sacrificing training time since this process (training) is only done once and only uses the trained model. For this reason, in this study, it would be more convenient to choose the MLP-NN model.
4. Discussion
In this research, we observed that the power spectrum density (PSD) is helpful for attention detection, as proposed in the hypothesis. Features based on band PSD, such as the Theta Relative Power (TRP), Alpha Relative Power (ARP), Beta Relative Power (BRP), Theta–Beta Ratio (TBR), Theta–Alpha Ratio (TAR), and TBAR, are good features for attention classification. With these features, the multi-layer perceptron neural network (MLP-NN) model achieved the best performance, with an AUC of 0.9299, a Cohen’s Kappa coefficient of 0.8597, a Matthews correlation coefficient of 0.8602, and a Hamming loss of 0.0701, although it requires the longest training time, up to 73.1 s. Nevertheless, the results presented in
Table 2 and
Table 3 and
Figure 11 and
Figure 12 show that the random forest and extra trees models have good performance metrics and training times of 21.14 and 2.21 s, respectively. Therefore, the classifier designer must perform a cost–benefit analysis in terms of accuracy and processing time. In most cases, designers prefer better accuracy, sacrificing training time, since training is only performed once and afterwards only the trained model is used. For this reason, in this study, it would be more convenient to choose the MLP-NN model.
Furthermore, feature extraction improves the acquisition of information relevant to diagnostic accuracy and has been widely applied in different neuropsychological and neurophysiological fields [
51]; the type of waveforms and definition of the morphology of EEG patterns increases the amount of available information for clinical decision making from brain dysfunction [
52] to cognitive impairment [
53]. Particular interest has been historically directed to the frontal areas in attention measurement, as they correspond to the brain regions responsible for activity direction and orientation. Classification of features may help to describe cortical connectivity, particularly for attentional deficits associated with frontal theta in children [
36]. Other research refers to frontal bilateral theta waves in resting EEG in children with learning difficulties and an association with bilateral synchronous frontal theta waves [
37], which closely relates to techniques for brain activity description in this study.
Limitations of the Study
One limitation of this research is that a BCI is required: the ASD user should not have much hair, the headset must be comfortable and tolerated by the user, the electrodes must be kept hydrated with saline solution, and recording time depends on the battery life of the BCI. The emotional state of the ASD user is also essential, because good measurements will not be obtained if the user is upset. Finally, the activities should be carried out in a learning scenario free of distractions, such as a classroom.
5. Conclusions
In this paper, a methodology for the classification of attention by EEG signals of an ASD user was presented. The EEG data acquisition was performed while the ASD user performed some didactic learning activities. In addition, our dataset was created for the post-processing of the information and training of the ML algorithms. To create the dataset, it was necessary to perform preprocessing, filtering, and feature extraction. The proposed features can be used to train and evaluate several ML models to classify attention using EEG signals.
On the other hand, with these findings, therapists, teachers, and psychologists can develop better learning scenarios according to the cognitive needs of ASD users. In addition, diagnostic accuracy can be improved by acquiring individual EEG features, which provide relevant information for classifying differential clinical neurodevelopmental symptomatology. Furthermore, the proposed methodology yields quantifiable information about the performance of ML models while an ASD user performs didactic/learning activities, with the purpose of reinforcing the perception of the teacher or therapist.
Future work will involve implementing the proposed method on a real-time embedded system, for example, a stand-alone version using an edge device, novel deep learning methods, and the Internet of Things (IoT). It is also possible to explore the feasibility of a mobile-based platform that links to a BCI instead of a computer. Furthermore, future replications of this methodology are needed to approach a broader spectrum of attention processes and standard estimation.