Advancing Emotion Recognition: EEG Analysis and Machine Learning for Biomedical Human–Machine Interaction
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
1. This manuscript is not a review paper. As a research article, the introduction part should be refined to highlight the existing problems in the current research.
2. The manuscript needs to clearly highlight how its approach differs significantly from existing models and why it is a necessary advancement.
3. In Line 285, 10 cross-validations were carried out, but the training and test sets were divided into 70% and 30% in Figure 5. Why?
4. Include detailed explanations of used features in the manuscript.
5. Include detailed explanations of feature selection method in Figure 5.
6. Why did you select the Fp1, AF3 and F3 electrodes in Figures 6-8? How do these results relate to the other results of the article?
7. How can Figure 12 be used to identify specific patterns associated with different mental states?
Author Response
Reviewer 1
R1: This manuscript is not a review paper. As a research article, the introduction part should be refined to highlight the existing problems in the current research.
A: Thank you for this suggestion. The “Introduction” section has been widely revised. It is the authors' belief that the possibilities offered by EEG emotion detection are now better described in the document. In the “Problem” subsection, the following clarification was also included: “When examining brain activity, the existing literature focuses on two primary aspects. First, it is crucial to identify which electrodes, corresponding to active brain regions, are associated with specific emotional states. Second, attention is directed toward the signal itself, aiming to determine which patterns correlate with particular emotions. From a computational perspective, especially within a machine learning framework, these aspects translate into identifying relevant features and selecting appropriate classification models. Together, these components create a broad and promising area of research that remains relatively underexplored, offering significant potential for advancements in emotion decoding and brain-computer interface applications.”
R1: The manuscript needs to clearly highlight how its approach differs significantly from existing models and why it is a necessary advancement.
A: Thank you for this question. The classification of emotions from EEG data remains an active area of research, with no universally accepted approach established to date. In the proposed manuscript, multiple machine learning techniques are systematically evaluated using a single dataset and a unified feature extraction method. This setup serves as a controlled testbed, enabling direct comparisons of various classification algorithms. Furthermore, the selected algorithms are designed to be interpretable, emphasizing explainability, a critical factor when working with physiological data and healthcare-related systems. The study also investigates the most significant electrodes associated with each emotion, offering insights into the active brain regions from a physiological perspective. These findings not only enhance our understanding of emotional processes but also contribute to the development of simpler and more efficient classification systems. Knowing the relevant electrodes makes it feasible to adapt and implement such systems in neurowearable devices, which are always constrained by a limited number of electrodes, computational resources, and power. The focus on simplicity and adaptability of the proposed work can lead to practical, low-cost emotion recognition tools in wearable technology, an approach that is rather unexplored in the existing literature.
R1: In Line 285, 10 cross-validations were carried out, but the training and test sets were divided into 70% and 30% in Figure 5. Why?
A: We have added more explanatory text about the cross-validation process to the ‘2.2.5 Statistical analyses’ section. The text is as follows:
“In this study, a 5-fold cross-validation approach was used. This method divides the dataset into 5 equal parts (folds), ensuring that each fold serves as the test set exactly once while the remaining 4 folds are used for training the model. This process is repeated across 5 iterations, resulting in a comprehensive evaluation of the model’s performance. Cross-validation is crucial because it prevents the model from overfitting, i.e., memorizing the training data, and ensures it generalizes well to unseen data. A ratio of 70 % for training and 30 % for testing was chosen in order to balance the trade-off between learning and evaluation. Allocating 70 % of the data for training provides the model with a sufficient amount of information to learn from, while reserving 30 % for testing ensures a reliable and representative evaluation of its performance. An inadequate split can lead to overfitting, where the model performs well on training data but fails to generalise to new data. Bearing in mind that when running the code several times, the results may change somewhat due to, for example, splitting the data into different training and test sets, the cross-validation process was repeated 10 times and then the average of the accuracy and F1 scores obtained were calculated, as well as the standard deviations of the results. This combined approach of cross-validation and repeated evaluations provides confidence in the reported results by ensuring they are not dependent on a single random split of the data. It reflects the model’s ability to generalize across diverse data configurations, which is essential for assessing its real-world applicability.”.
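To make the evaluation procedure concrete, here is a minimal sketch of the repeated cross-validation described above, assuming scikit-learn and placeholder feature/label arrays (the array shapes and the Random Forest settings are hypothetical, not the authors' exact pipeline):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Placeholder data: rows are trials, columns are EEG features; labels are High/Low
rng = np.random.default_rng(0)
X = rng.normal(size=(880, 20))
y = rng.integers(0, 2, size=880)

accuracies = []
for repeat in range(10):  # repeat the whole 5-fold CV ten times with different splits
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=repeat)
    scores = cross_val_score(RandomForestClassifier(random_state=repeat), X, y,
                             cv=cv, scoring="accuracy")
    accuracies.append(scores.mean())

print(f"mean accuracy: {np.mean(accuracies):.3f} ± {np.std(accuracies):.3f}")

Reporting the mean and standard deviation over the ten repetitions, as above, matches the averaging of accuracy and F1 scores described in the quoted paragraph.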
R1: Include detailed explanations of used features in the manuscript.
A: We have added the chapter “2.2.3. Feature extraction”, where we explain in more detail the features used in the manuscript.
R1: Include detailed explanations of feature selection method in Figure 5.
A: Thank you for your suggestions. We’ve added more information in section ‘2.2.5 Statistical analysis’ about this topic. The explanatory paragraph is:
“It should be noted that selecting the right characteristics of the EEG signals is crucial to identifying the most relevant information for a given predictive model. This can be done by eliminating redundant variables and selecting those that best distinguish between classes. In addition, to obtain better results for the various parameters evaluated, it is necessary to optimise the classification process by exploring the parameter space. In this way, by reducing the dimensionality of the data and the complexity of the model, it is possible to promote greater computational efficiency and accuracy of the results, leading to a more portable system that is easy to implement, for example, in a neurowearable device.”
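As one hedged illustration of the kind of feature selection and parameter-space exploration described in this paragraph (a sketch only; the authors' actual method may differ), scikit-learn allows both steps to be chained and searched jointly:

from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Keep the k most class-discriminative features, then fit a linear SVM
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", SVC(kernel="linear")),
])

# Explore the parameter space: number of retained features and SVM regularisation strength
param_grid = {"select__k": [5, 10, 20], "clf__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1")
# search.fit(X, y)  # X: EEG feature matrix, y: binary emotion labels (placeholders)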
R1: Why did you select the Fp1, AF3 and F3 electrodes in Figures 6-8? How do these results relate to the other results of the article?
A: We’ve selected those electrodes due to their relevance to our study. This information was already present in the text but has now been moved to subsection ‘2.2.1. Data preparation and analysis’. The explanation text is:
“In this study, the Fp1, AF3 and F3 electrodes were considered due to their relevance for capturing electrical brain activity in areas of the brain associated with emotional processing, offering specific and highly informative data. These areas of the brain have shown greater efficiency in emotional recognition compared to other channels, making them ideal for this type of analysis”.
The results obtained made it possible to identify the most active frequencies during emotional states, enabling the subsequent construction of topographical maps, as described in chapters ‘2.2.3. Feature extraction’ and ‘3. Results’.
R1: How can Figure 12 be used to identify specific patterns associated with different mental states?
A: We have deleted this figure because we believe it is not necessary for the study in question.
Author Response File: Author Response.docx
Reviewer 2 Report
Comments and Suggestions for Authors
Reis and collaborators used spectral EEG power as a subjective measure of emotion valence and arousal. After reading the introduction, I was expecting the study to look at EEG correlates of emotions, but the project did not look at emotions per se. The project needs to be put in the proper context in the introduction. Statistical methods are missing, as well as an understandable interpretation of the results. The results have also not been presented against the literature. Please find more specific comments below.
Abstract: Should be re-written after my comments have been addressed.
Introduction:
- "but also as an aid in the provision of care and in the development of brain-computer interfaces" This is a bit sudden, I suggest deleting as the concepts are better introduced further down.
- Third paragraph of 1.1.1 and table 1 are redundant, I suggest keeping one or the other.
- Last paragraph of 1.1.1 repeated. I also question the need for this paragraph considering that the introduction is very long.
- Please define SEED and DEAP datasets.
- While sections 1.1 and 1.2 are very high level and detailed, 1.3 is quite hard to read with concepts such as "AsMap two-dimensional feature vector", "the influence of window size", "EEG sine recordings" and "eight octants within a Valence-Arousal-Dominance space" which can't be understood without explanation.
- Replace 'at least' by 'at last'?
- "the highest recognition accuracy of 62.58 %," compared to what? Accuracy with CNN was 84.69.
- Table 1 there should be table 2 and I suggest moving it to supplementary.
- "unlike non-physiological ones," This is very broad and one can assume that some non-physiological signals would, alike physiological ones, not be influenced by subjective factors. Please delete.
Methods:
- "The 32 participants included individuals were 19 years old and, 37,50% of whom were women." Everyone was 19yo? Also please rephrase.
- "a video of the frontal face was recorded for 22 of these participants" What was this video for? And why not all of them?
- We need more information about the input data. Were the music clips intended to induce emotions? Were participants subjective emotions captured? (I see that there is some information about this further down, it needs to be brought up). Also, EEG recordings settings need to be presented.
- "a selection of data was made, including only the information from the 22 individuals who had facial recordings available" Why only these individuals?
- "To detect possible anomalies in the data (unusual or unexpected patterns in the brain signals, which may be indicative of an emotion)," I'm not sure it's right to talk about 'anomalies' here?
- Please define MNE and SVM.
- Statistical analyses section is missing.
Results and discussion:
- I'm not sure that the first paragraph and Figure 6 brings anything relevant to this project's results.
- Reading the results, some elements of methods are repeated here. I suggest to lighten the methods section, presenting only the general design, and bring up the specific methods used as the results are presented. Indeed, the results are very much related to the methodological steps and it is hard to separate them. Indeed, I could not understand much of the methods section, but now I understand as I read the results.
- Need to specify why Fp1, AF3 and F3 were chosen.
- One thing that is not clear to me is: are the graphs presented extracted from a single participant or is it group average? Is the 'anomaly' detection on a single participant or is it group average? I assume it is all at the participant level, but it is not clear. If it's average, averaged around what?
- Valence and arousal are averaged across participants, right? Again, not specified.
- Figure 11 and table 4 need a legend to define the acronyms.
- Table 4 title: what statistical analyses were conducted?
- "time interval from 0.05 to 0.251 seconds." after what?
- I'm not sure that figure 12 is needed.
- "In these graphs, it is worth noting a slightly displaced position of the electrodes in relation to the head, which is a more accurate reflection of reality since, when the electrode network is removed, it is larger than the head itself." This sentence doesn't make much sense to me.
- Figure 13: "a greater incidence can be seen in the occipital region of the brain," What makes you say that? I have the same question for figure 14. Theta and Alpha seem to vary a lot with time, so it would be important to clarify your interpretation.
- What about Delta?
- Between figure 13 through to 20 (not 209), there's a lot of 'decrease', 'increase' of activity, but with no support from any statistical analyses. It is unclear how the authors proceed with the interpretation of the graphs.
- Tables 5 and 6: what are these numbers? What do they mean?
- Figures 21 and 22: once again, what am I looking at? What do these matrices tell me?
- I saw a few elements of discussion throughout the results section, but a proper discussion section is missing. The results need to be summarised and described/compared against the literature.
Conclusions:
- "This study has provided a deeper understanding of the EEG, highlighting it as a valuable tool in the detection of emotions, offering significant insights into the brain activity 456 associated with different emotional states." I can't immediately say this after reading your results. You need to offer me a summary of the results first.
- I'm not sure how these results can be useful in the 'real world'.
Author Response
Abstract:
R2: Should be re-written after my comments have been addressed.
A: Thank you for the suggestion. The abstract has been changed to the text shown below:
“Human emotions are subjective psychophysiological processes that play an important role in the daily interactions of human life. Emotions often do not manifest themselves in isolation; people can experience a mixture of them and may not express them in a visible or perceptible way. In this sense, this study seeks to uncover EEG patterns linked to emotions, as well as to examine brain activity across emotional states and optimize machine learning techniques for accurate emotion classification. For these purposes, the DEAP dataset was used to comprehensively analyse electroencephalogram (EEG) data and understand how emotional patterns can be observed. In addition, machine learning algorithms, such as SVM, MLP and RF, were implemented to predict valence and arousal classifications for different combinations of frequency bands and brain regions. The analysis reaffirms the value of EEG as a tool for objective emotion detection, demonstrating its potential in both clinical and technological contexts. By highlighting the benefits of using fewer electrodes, this study emphasizes the feasibility of creating more accessible and user-friendly emotion recognition systems. However, further improvements in feature extraction and model generalisation are necessary for clinical applications. This study highlights the potential of emotion classification not only to develop biomedical applications but also to enhance human-machine interaction systems.”
Introduction:
R2: "but also as an aid in the provision of care and in the development of brain-computer interfaces" This is a bit sudden, I suggest deleting as the concepts are better introduced further down.
A: Thank you for the suggestion. We removed the phrase and added "The categorisation of emotions represents a significant step forward in deepening the understanding of emotional states".
R2: Third paragraph of 1.1.1 and table 1 are redundant, I suggest keeping one or the other.
A: Thank you for the suggestion. We removed the table and left some of the text that we found to be more explicit and informative.
R2: Last paragraph of 1.1.1 repeated. I also question the need for this paragraph considering that the introduction is very long.
A: Thank you for noting this. We removed the paragraph and Figure 4.
R2: Please define SEED and DEAP datasets.
A: Thank you for the suggestion. We have defined the SEED dataset as “SJTU Emotion EEG Dataset” and DEAP dataset as “Dataset for Emotion Analysis using Physiological Signals” on lines 179 and 180.
R2: While sections 1.1 and 1.2 are very high level and detailed, 1.3 is quite hard to read with concepts such as "AsMap two-dimensional feature vector", "the influence of window size", "EEG sine recordings" and "eight octants within a Valence-Arousal-Dominance space" which can't be understood without explanation.
A: Thank you for your suggestion. We've made some changes to section “1.5. Supporting Studies”, including defining the concepts mentioned.
R2: Replace 'at least' by 'at last'?
A: Thank you for the suggestion. We’ve made this change.
R2: "the highest recognition accuracy of 62.58 %," compared to what? Accuracy with CNN was 84.69.
A: We don't understand this comment. In the study carried out by Kusumaningrum et al. (2020), the methods used were Random Forest, SVM and k-NN, and the highest accuracy was obtained with the first method. In this study CNNs were not used.
R2: Table 1 there should be table 2 and I suggest moving it to supplementary.
A: Thank you for the suggestion. We’ve made those changes.
R2: "unlike non-physiological ones," This is very broad and one can assume that some non-physiological signals would, alike physiological ones, not be influenced by subjective factors. Please delete.
A: Thank you for the suggestion. We've deleted that segment of the sentence.
Methods:
R2: "The 32 participants included individuals were 19 years old and, 37,50% of whom were women." Everyone was 19yo? Also please rephrase.
A: Thank you for the suggestion. We've made changes to the text of the “2.1 Input data”. That sentence has been changed to the text shown below:
“The data analysed consisted of EEG and peripheral physiological signals from 32 healthy participants, aged between 19 and 37 years (average 26.9), 50 % of whom were women.”
R2: "a video of the frontal face was recorded for 22 of these participants" What was this video for? And why not all of them?
A: Thank you. We have revised the sentence to make it clearer and more descriptive, stating: “The aim of capturing these videos is to allow the participants' facial expressions to be analysed while watching the music videos. This helps to correlate the emotional responses expressed in the participants' evaluations with physiological reactions and facial expressions, providing a more comprehensive understanding of the emotions induced by the stimuli. The dataset documentation does not provide an explanation as to why the frontal face video recording was conducted for only 22 out of the 32 participants, leaving this aspect unclear.”
R2: We need more information about the input data. Were the music clips intended to induce emotions? Were participants subjective emotions captured? (I see that there is some information about this further down, it needs to be brought up). Also, EEG recordings settings need to be presented.
A: Thank you for the suggestion. We’ve made changes to the section “2.1 Input data”, where we explain this dataset in more detail.
R2: "a selection of data was made, including only the information from the 22 individuals who had facial recordings available" Why only these individuals?
A: We have added information to section ‘2.2.1 Data preparation and analysis’ in order to better explain this choice. The explanation sentence is: “This choice allows for a more in-depth and detailed study of the data.”
R2: "To detect possible anomalies in the data (unusual or unexpected patterns in the brain signals, which may be indicative of an emotion)," I'm not sure it's right to talk about 'anomalies' here?
A: We changed that concept to “irregularities”.
R2: Please define MNE and SVM.
A: Thank you for the suggestion. We’ve defined MNE as “the Magnetoencephalography and Electroencephalography (MNE) framework” and SVM as “Support Vector Machines”.
R2: Statistical analyses section is missing.
A: Thank you. We decided to add a statistical analysis section in chapter ‘2.2. Methodology’.
Results and discussion:
R2: I'm not sure that the first paragraph and Figure 6 brings anything relevant to this project's results.
A: Thank you for the suggestion. We’ve decided to delete that information.
R2: Reading the results, some elements of methods are repeated here. I suggest to lighten the methods section, presenting only the general design, and bring up the specific methods used as the results are presented. Indeed, the results are very much related to the methodological steps and it is hard to separate them. Indeed, I could not understand much of the methods section, but now I understand as I read the results.
A: Thank you for the suggestion. In order to make the methods ‘lighter’, the specific methods used at each stage have been presented in the results. In the methods section, only a general explanation of what was carried out in the study was provided.
R2: Need to specify why Fp1, AF3 and F3 were chosen.
A: We’ve selected those electrodes due to their relevance to our study. We explained this better in the ‘2.2.1. Data preparation and analysis’ section, with the following text:
“In this study, the characteristics extracted from the data correspond to the signals coming from the Fp1, AF3 and F3 electrodes. These electrodes were considered due to their relevance for capturing electrical brain activity in areas of the brain associated with emotional processing, offering specific and highly informative data. These areas of the brain have shown greater efficiency in emotional recognition compared to other channels, making them ideal for this type of analysis.”
R2: One thing that is not clear to me is: are the graphs presented extracted from a single participant or is it group average? Is the 'anomaly' detection on a single participant or is it group average? I assume it is all at the participant level, but it is not clear. If it's average, averaged around what?
A: The graphs presented and the anomaly detection were performed at the level of a single participant, not as an average across participants. We have added information to section ‘2.2.1 Data preparation and analysis’ in order to better explain this choice. The explanation text is:
“Furthermore, as the aim is to identify irregularities or specific patterns of emotional response, analysing one individual can make it easier to identify unique characteristics that could be missed in a more comprehensive analysis.”
R2: Valence and arousal are averaged across participants, right? Again, not specified.
A: No, as mentioned above, these results were performed at the level of a single participant, not as an average across participants.
R2: Figure 11 and table 4 need a legend to define the acronyms.
A: We've added a legend to Figure 11, which also serves for Table 4.
R2: Table 4 title: what statistical analyses were conducted?
A: We have changed the title of the table so that it is clearer which statistical analyses were carried out. The title is now: “Table 4. Statistical Analysis of Valence and Arousal Combinations: Comparison of Mean and Standard Deviation Across Categories (HAHV, LAHV, HALV, LALV).”.
R2: "time interval from 0.05 to 0.251 seconds." after what?
A: We've added a text explaining the choice of this interval:
“Afterwards, topographic maps of each brain wave were developed for the time interval from 0.05 to 0.251 seconds. During this interval, a fixation cross was presented to the participant, who was asked to relax. In this way, the brain's initial response to visual or auditory stimuli is captured, which is crucial to understanding how the brain processes and reacts to new stimuli and emotional states. Additionally, focusing on a specific and relatively short interval can help reduce variability in the data. Figures 10 to 13 show the topographic maps for theta, alpha, beta and gamma waves, respectively.”
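For reference, topographic maps over such a window can be produced with the MNE package mentioned in the manuscript. The snippet below is only a sketch on simulated data, assuming a 32-channel Biosemi layout sampled at 128 Hz as in DEAP; it is not the authors' plotting code:

import numpy as np
import mne

montage = mne.channels.make_standard_montage("biosemi32")
info = mne.create_info(ch_names=montage.ch_names, sfreq=128.0, ch_types="eeg")

# Placeholder for band-filtered EEG: 32 channels x 128 samples, in volts
data = np.random.randn(32, 128) * 1e-6
evoked = mne.EvokedArray(data, info, tmin=0.0)
evoked.set_montage(montage)

# Topographic maps at a few instants inside the 0.05-0.251 s window
evoked.plot_topomap(times=[0.05, 0.15, 0.25], ch_type="eeg")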
R2: I'm not sure that figure 12 is needed.
A: Thank you for the suggestion. We’ve decided to delete that figure.
R2: "In these graphs, it is worth noting a slightly displaced position of the electrodes in relation to the head, which is a more accurate reflection of reality since, when the electrode network is removed, it is larger than the head itself." This sentence doesn't make much sense to me.
A: We deleted this sentence.
R2: Figure 13: "a greater incidence can be seen in the occipital region of the brain," What makes you say that? I have the same question for figure 14. Theta and Alpha seem to vary a lot with time, so it would be important to clarify your interpretation.
A: Thank you for this question. We have added to section ‘4. Discussion’ a text that makes the interpretation of the topographic maps clearer and addresses the variation of theta and alpha waves. The texts are presented below:
“Topographic maps of brain activity use colours to represent the intensity of electrical activity in different areas of the brain, measured in microvolts. Regions highlighted with warmer colours, such as red or orange, indicate more intense electrical activity compared to other areas. This visual representation makes it possible to intuitively identify the areas of the brain with the greatest activation of specific waves.”
“The activity of the theta wave is notably variable over time. This variation can be attributed to the different intensities and durations of stress that the individual experiences or to their level of fatigue. The reduction in cognitive capacity and performance associated with fatigue is often related to an increase in the overall power of the theta waves.
The alpha wave map shown in Figure 11 reveals the variation in its activity over time, which can be attributed to differences in stress and fatigue levels. In situations of greater stress, characterised by a lower degree of relaxation, alpha activity tends to decrease. On the other hand, fatigue can result in an increase in alpha activity, reflecting changes in the individual's mental state.”
R2: What about Delta?
A: We don't analyse the delta wave because the frequencies associated with these waves do not show significant activity in the expression of emotional states during signal analysis. We've added a paragraph to the ‘3. Results’ section that explains this better. The text is presented below:
“Based on the analysis of Figure 7, it can be seen that the delta wave, whose frequency varies between 0.5 and 3 Hz, has a low or even non-existent amplitude in the PSD. This suggests that the frequencies associated with these waves do not show significant activity in the expression of emotional states during signal analysis. Furthermore, delta waves are typically related to states of drowsiness, as discussed above. Given that the experiment was conducted with fully awake participants, analysis of the delta waves was not carried out, since their relevance in emotional contexts is limited under these conditions.”
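As an aside, the kind of band-power check described here (low delta-band amplitude in the PSD) can be reproduced with a simple Welch estimate; the snippet below is a sketch on simulated data, not the authors' code:

import numpy as np
from scipy.signal import welch

fs = 128.0                              # DEAP EEG is downsampled to 128 Hz
signal = np.random.randn(int(60 * fs))  # placeholder for one 60 s EEG channel (e.g. Fp1)

freqs, psd = welch(signal, fs=fs, nperseg=int(4 * fs))
delta_power = psd[(freqs >= 0.5) & (freqs <= 3.0)].mean()
theta_power = psd[(freqs >= 4.0) & (freqs <= 8.0)].mean()
print(f"mean delta power: {delta_power:.3e}, mean theta power: {theta_power:.3e}")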
R2: Between figure 13 through to 20 (not 209), there's a lot of 'decrease', 'increase' of activity, but with no support from any statistical analyses. It is unclear how the authors proceed with the interpretation of the graphs.
A: Thank you for your suggestion. We applied a statistical analysis to the topographic maps to corroborate the conclusions drawn. This topic is mentioned in section ‘2.2.5 Statistical analysis’, with the following text:
“In order to corroborate the conclusions drawn from the topographical maps, a statistical analysis was carried out using the Mann-Whitney test. This non-parametric test is suitable for assessing whether there are significant differences between two groups, without assuming that the data follows a specific distribution. The analysis was conducted in two stages, with the first centred on assessing the variation in frequency bands over time, and the second on comparing brain activity between the different frequency bands and brain regions. This approach provided a solid statistical basis for the interpretations of the variations in brain activity associated with the different emotional states analysed.”
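A minimal sketch of such a Mann-Whitney comparison, assuming SciPy and two placeholder groups of band-power values (the actual grouping used by the authors may differ):

import numpy as np
from scipy.stats import mannwhitneyu

# Placeholder samples: e.g. alpha-band power in one region for two conditions or time windows
group_a = np.random.rand(40)
group_b = np.random.rand(40)

stat, p_value = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.4f}")  # p < 0.05 would indicate a significant difference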
R2: Tables 5 and 6: what are these numbers? What do they mean?
A: We have made changes to the text that precedes the table to include an explanation of what the values are. The text is shown below:
“The performance of the three machine learning algorithms (SVM, MLP and RF) in classifying valence and arousal using different combinations of frequency bands and brain regions was evaluated based on the average accuracy and the average F1 score, both accompanied by their respective standard deviations. The results obtained for the performance of the three algorithms for arousal and valence are shown in Tables 5 and 6, respectively.”
Additionally, we have changed the legends of those tables in order to better understand the meaning of these numbers.
R2: Figures 21 and 22: once again, what am I looking at? What do these matrices tell me?
A: We have improved the introductory text of these Figures to make it clearer what they mean. The text is shown below:
“Lastly, based on the results obtained, confusion matrices were constructed for the algorithm with the best performance in classifying valence and arousal, considering each combination of frequency band and brain region. Figure 18 illustrates confusion matrices that assess the RF algorithm's performance in predicting valence labels for the theta wave in the central region of the brain, the beta wave in the frontal region, the gamma wave in the parietal region and the alpha wave in the occipital region. These matrices make it possible to visualise the performance of the classification model, categorising the predictions into true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). Each of these categories is represented by a specific cell within the confusion matrix, making it easier to analyse the results of the model in detail.”
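For illustration, such a confusion matrix can be computed and displayed with scikit-learn as sketched below (labels and predictions are placeholders, not the RF results reported in the paper):

import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

# Placeholder true and predicted Low/High valence labels
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred, labels=[0, 1])  # rows: true class, columns: predicted class
ConfusionMatrixDisplay(cm, display_labels=["Low", "High"]).plot()
plt.show()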
R2: I saw a few elements of discussion throughout the results section, but a proper discussion section is missing. The results need to be summarised and described/compared against the literature.
A: Thank you for the suggestion. We've separated the results and the discussion into separate chapters. In addition, we have added a subsection ‘5.2. Comparative study’ where we compare the results obtained in our study with the results of the studies described in chapter ‘1.5. Supporting Studies’.
Conclusions:
R2: "This study has provided a deeper understanding of the EEG, highlighting it as a valuable tool in the detection of emotions, offering significant insights into the brain activity 456 associated with different emotional states." I can't immediately say this after reading your results. You need to offer me a summary of the results first.
A: We have made some changes to chapter ‘5. Conclusions’, dividing it into two sub-chapters, including ‘5.1 Main contributions’ where we summarise the results obtained.
R2: I'm not sure how these results can be useful in the 'real world'.
A: Thank you for the insightful thought. An approach to classify emotions based on EEG has been proposed in the manuscript, along with an analysis of the most relevant electrodes or brain regions. This can help to address some of the problems mentioned in the introductory section. In a real-world context, the proposed approach can be implemented in a neurowearable device, such as the Muse or Emotiv EPOC. These systems are commercially available to the general public and are designed to be used/worn daily for long periods of time. Emotion recognition systems can provide real-time information or periodic reports, helping to manage emotions or pathologies and allowing for immediate intervention.
Author Response File: Author Response.docx
Reviewer 3 Report
Comments and Suggestions for Authors
The authors proposed an approach for emotion recognition from EEG signals. The approach is based on the analysis of EEG signals and classification by using machine learning. After preprocessing, EEG signals are classified using SVM, MLP and RF, and predictions are made for different combinations of frequency bands and brain regions. The proposed approach has been assessed on an ECG dataset coming from Kaggle. A comparison with other methods from the state of the art shows that the proposed approach exhibits good accuracy and precision for gender classification of ECG signals.
The approach taken by the authors is sound. The paper is well structured and clear.
However, the authors must address the following points:
1. Perform the corrections indicated in the PDF file.
2. Add a clearer explanation about the extracted features. For instance, the feature extraction method, the number of features, the types of the features, etc.
3. Explain in more detail the DEAP dataset.
4. Explain in more detail the preprocessing of the EEG signals.
5. Explain in more detail the cross-validation performed. For instance, the name of the cross-validation method, the total number of samples in the training, testing and/or validation sets, the number of classes, the total number of samples for each class, etc.
6. Add more details about the classifiers employed. For instance, for each classifier, hyperparameters, etc.
7. Add a table to compare the proposed approach with other approaches from the state of the art.
8. Add to the introduction a paragraph to state the contributions of the study.
9. Improve the quality of figures 4, 13-20.
10. Reference [3] is incomplete.
Comments for author File: Comments.pdf
Author Response
R3: Perform the corrections indicated in the PDF file.
A: Thank you for your comments. We have made all the corrections indicated.
R3: Add a clearer explanation about the extracted features. For instance, the feature extraction method, the number of features, the types of the features, etc.
A: We have added one section to the Methods, namely “2.2.3. Feature extraction”, where we explain this process in more detail.
R3: Explain in more detail the DEAP dataset.
A: Thank you for the suggestion. We’ve made changes to the section “2.1 Input data”, where we explain this dataset in more detail.
R3: Explain in more detail the preprocessing of the EEG signals.
A: We have added two sections to the Methods, namely “2.2.1 Data preparation and analysis” and “2.2.2 Irregularity Detection” to better explain this part.
R3: Explain in more detail the cross-validation performed. For instance, the name of the cross-validation method, the total number of samples in the training, testing and/validation sets, the number of classes, the total number of samples for each class, etc.
A: Thank you for your suggestion. We have added some information to the existing text to make it clearer, as shown below:
“In this study, a 5-fold cross-validation approach was used. This method divides the dataset into 5 equal parts (folds), ensuring that each fold serves as the test set exactly once while the remaining 4 folds are used for training the model. This process is repeated across 5 iterations, resulting in a comprehensive evaluation of the model’s performance. Cross-validation is crucial because it prevents the model from overfitting, i.e., memorizing the training data, and ensures it generalizes well to unseen data. A ratio of 70 % for training and 30 % for testing was chosen in order to balance the trade-off between learning and evaluation. Allocating 70 % of the data for training provides the model with a sufficient amount of information to learn from, while reserving 30 % for testing ensures a reliable and representative evaluation of its performance. An inadequate split can lead to overfitting, where the model performs well on training data but fails to generalise to new data. Bearing in mind that when running the code several times, the results may change somewhat due to, for example, splitting the data into different training and test sets, the cross-validation process was repeated 10 times and then the average of the accuracy and F1 scores obtained were calculated, as well as the standard deviations of the results. This combined approach of cross-validation and repeated evaluations provides confidence in the reported results by ensuring they are not dependent on a single random split of the data. It reflects the model’s ability to generalize across diverse data configurations, which is essential for assessing its real-world applicability.”
R3: Add more details about the classifiers employed. For instance, for each classifier, hyperparameters, etc.
A: We have added the chapter ‘2.2.4. Emotion Classification Task’ where we present details about the classifiers used in the classification task.
R3: Add a table to compare the proposed approach with other approaches from the state of the art.
A: We have added a table with the comparison between the proposed method and related works to chapter “5.2. Comparative study”.
R3: Add to the introduction a paragraph to state the contributions of the study.
A: We think it makes more sense to mention the study's contributions in the conclusion. To this end, we have added a chapter to the conclusion entitled ‘5.1 Main contributions’ where we mention the various contributions of our study.
R3: Improve the quality of figures 4, 13-20.
A: We’ve done our best to improve the quality of these figures.
R3: Reference [3] is incomplete.
A: Thanks for the warning, we've already added the missing information.
Author Response File: Author Response.docx
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
Many thanks to the Authors for the revised version.
All the major issues were sufficiently addressed.
Author Response
Thank you so much for your comments.
Reviewer 3 Report
Comments and Suggestions for Authors
The authors proposed an approach for emotion recognition from EEG signals. The approach is based on the analysis of EEG signals and classification by using machine learning. After preprocessing, EEG signals are classified using SVM, MLP and RF, and predictions are made for different combinations of frequency bands and brain regions. The proposed approach has been assessed on an ECG dataset coming from Kaggle. A comparison with other methods from the state of the art shows that the proposed approach exhibits good accuracy and precision for gender classification of ECG signals.
The approach taken by the authors is sound. The paper is well structured and clear.
The authors have attended to most of the suggestions. However, the authors must still address the following points:
1. Perform the corrections indicated in the PDF file.
2. Indicate the total number of samples in the training, testing and/or validation sets, the number of classes, and the total number of samples for each class.
3. Indicate the values of the hyperparameters of the SVM (C, kernel type, parameters of the kernel, size of the input, etc.). Indicate the hyperparameters of the MLP (number of layers, activation function of each layer, size of each layer, size of the input, etc.).
4. Add also the number of parameters used in each work to table 12.
5. Indicate the differences in the training/testing conditions of the works presented in table 12.
6. Improve the quality of figures 14-17.
Comments for author File: Comments.pdf
Author Response
R3: Perform the corrections indicated in the PDF file.
A: We have corrected the phrase to “…K-fold cross-validation was carried out.” on page 11.
R3: Indicate the total number of samples in the training, testing and/validation sets, the number of classes, the total number of samples for each class.
A: Thank you for noting this. Besides the general information about the DEAP dataset (32 participants and 40 one-minute recordings for each one), already in section “2.1. Input data”, to further clarify the process that was followed, we have added details to section “2.2.3. Feature Extraction”, as transcribed here: “The EEG data was processed considering the full one-minute recording dedicated to each independent video”, and to section “2.2.4. Emotion Classification Task” as follows: “Based on the dataset annotation, where valence and arousal ratings are given on a 1-to-9 scale, binarized labels were attributed, considering 1-4 as Low and 5-9 as High, for both variables. The distribution of High and Low classes across the dataset is approximately balanced.”.
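Under the stated 1-4/5-9 split, the binarisation amounts to a simple threshold; a minimal sketch is given below (the placement of the threshold for non-integer ratings is our assumption, not stated in the response):

import numpy as np

def binarise(ratings):
    """Map 1-9 self-assessment ratings to 0 (Low, 1-4) or 1 (High, 5-9)."""
    return (np.asarray(ratings, dtype=float) >= 5).astype(int)

print(binarise([2.0, 4.5, 5.0, 8.0]))  # -> [0 0 1 1]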
R3: Indicate the values of the hyperparameters of the SVM (C, kernel type, parameters of the kernel, size of the input, etc.). Indicate the hyperparameters of the MLP (number of layers, activation function of each layer, size of each layer, size of the input, etc.).
A: Thank you for the suggestion. Providing the SVM and MLP hyperparameters is important information to ensure the reproducibility of the work. We have included this information in section 2.2.4 as follows: “The SVM classifier used was configured with a linear kernel, which separates the classes with a straight line (a hyperplane in higher dimensions). The regularisation parameter (C), which controls the balance between the separation margin and the correct classification of the training points, was kept at the default value of 1.0. In addition, a random state of 42 was set to guarantee the reproducibility of the results, and the probability estimation option was activated, providing additional information on the confidence of the predictions made. The input size is defined by the number of features in the dataset used during training.” and “In order for the model to learn efficiently, we used the Adam method, a stochastic gradient-based optimiser that dynamically adjusts the weights (strength of the connections between neurons) during training, making it more effective. In addition, the tanh (hyperbolic tangent) activation function was used in all the hidden layers, which helps to introduce non-linearities into the model, improving its ability to learn complex patterns. Another relevant adjustment was the alpha parameter, set to a value of 0.3, which helps to avoid overfitting the training data, ensuring that the model continues to perform well with new data. In addition, the model was trained for up to 400 iterations to improve the accuracy and quality of the model's predictions. As far as the architecture is concerned, the standard configuration was maintained, with a single hidden layer containing 100 neurons. The input size was automatically defined by the number of features present in the training data. Finally, the random state parameter was set to 42, ensuring reproducibility of results and consistency in training.”.
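Assuming the defaults quoted above refer to scikit-learn (which the parameter names suggest, though this is our inference), the two configurations translate roughly into the following sketch:

from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# SVM with the settings listed in the response
svm = SVC(kernel="linear", C=1.0, probability=True, random_state=42)

# MLP with the settings listed in the response
mlp = MLPClassifier(hidden_layer_sizes=(100,), activation="tanh", solver="adam",
                    alpha=0.3, max_iter=400, random_state=42)

# svm.fit(X_train, y_train); mlp.fit(X_train, y_train)  # training data are placeholders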
R3: Add also the number of parameters used in each work to table 12.
A: Thank you for the suggestion. We’ve added this information to table 12.
R3: Indicate the differences in the training/testing conditions of the works presented in table 12.
A: Thank you for the suggestion. We’ve added this information to table 12.
R3: Improve the quality of figures 14-17.
A: Figures 14-17 are automatically generated using the MNE Python package and we have little control over the included elements. In any case, we have tried to improve resolution and remove some of the superimposed text.
Author Response File: Author Response.pdf