Multimodal Explainability Using Class Activation Maps and Canonical Correlation for MI-EEG Deep Learning Classification
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors: "Multimodal Explainability using Class Activation Maps and Canonical Correlation for MI-EEG Deep Learning Classification".
One significant disadvantage of the proposed framework is its reliance on subject-dependent models. The authors primarily focus on evaluating deep learning models for subject-dependent MI-EEG discrimination, as stated in Section 3.2. While this approach is effective for individual subjects, it does not address the challenge of inter-subject variability, which is a critical issue in the field of EEG analysis. The authors acknowledge this limitation in the Introduction and Section 2, mentioning that differences in genetic, cognitive, and neurodevelopmental factors cause the same task or stimuli to evoke distinct brain patterns across individuals. However, the proposed framework does not fully address this challenge. Developing subject-independent models that can generalize across different individuals remains a significant challenge and is not fully explored by the authors. To improve the robustness and generalizability of the framework, the authors should consider incorporating techniques that account for inter-subject variability, such as transfer learning or domain adaptation methods. This would enhance the practical applicability of the framework in real-world scenarios where subject-independent models are crucial.
The authors state that simpler models, such as ShallowConvNet, perform better in MI-EEG classification than more complex architectures like DeepConvNet. However, they also note that DeepConvNet struggles to classify good MI subjects and performs poorly overall. This issue is discussed in Sections 4.1 and 4.2, where the authors present the classification results and the explainability of MI-EEG. The authors should revisit these sections and reframe them to emphasize that more complex models may suffer from overfitting, especially given the high variability and noise inherent in EEG data. They should also consider including regularization methods or reducing the number of parameters in more complex models to mitigate overfitting and improve performance.
The authors state that Class Activation Maps (CAMs) are a powerful tool for interpretability in MI-EEG classification. However, they acknowledge that CAMs can generate spatially noisy maps, as seen with the TCFusion model in Figure 15. This noise complicates the interpretation of the results and may not accurately reflect the underlying neural processes. The effectiveness of CAMs in capturing relevant EEG features can vary depending on the specific model and the quality of the EEG data. This issue is particularly evident in sections 4.2 and 4.3, where the authors discuss the explainability of MI-EEG classification and the use of CAMs. The authors should revisit these sections and provide more detailed explanations of how the noise in CAMs affects the interpretation of the results. They should also consider including additional techniques or methods to mitigate the noise and improve the clarity of the insights provided by CAMs.
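One concrete mitigation the reviewers could point the authors toward is simple spatial smoothing of the maps (or SmoothGrad-style averaging of maps computed from noise-perturbed inputs). As an illustration only (not taken from the manuscript), a box filter over a channel x time CAM could look like:

```python
def smooth_cam(cam, k=1):
    """Average each CAM cell with its (2k+1) x (2k+1) neighborhood.

    `cam` is a list of channel rows, each a list of per-time relevance
    values. Edge cells average over the valid part of the window. This
    kind of box filtering suppresses high-frequency speckle in noisy
    class activation maps at the cost of some spatial resolution.
    """
    n_ch, n_t = len(cam), len(cam[0])
    out = [[0.0] * n_t for _ in range(n_ch)]
    for i in range(n_ch):
        for j in range(n_t):
            vals = [cam[a][b]
                    for a in range(max(0, i - k), min(n_ch, i + k + 1))
                    for b in range(max(0, j - k), min(n_t, j + k + 1))]
            out[i][j] = sum(vals) / len(vals)
    return out
```

A constant map passes through unchanged, while a noisy map comes out with visibly lower variance, which is exactly the behavior one wants when CAMs are spatially noisy.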
The authors present an innovative approach by integrating questionnaire data through the QMIP-CCA framework to enhance the interpretability of MI-EEG classification. However, the effectiveness of this integration heavily relies on the availability and quality of the questionnaire data. The authors selected questions based on entropy, as described in Section 3.1, but the relevance and completeness of the questionnaire data can significantly impact the results. If the questionnaire data is incomplete or not well-aligned with the EEG data, the correlation analysis may not yield meaningful insights. This dependency on external data sources introduces an additional layer of complexity and potential variability in the analysis. The authors should revisit Section 3.4, where the QMIP-CCA framework is discussed, and provide more detailed explanations on how the questionnaire data is selected and integrated. They should also consider including additional validation steps to ensure the completeness and alignment of the questionnaire data with the EEG data to mitigate the potential variability in the analysis.
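For concreteness, entropy-based question screening of the kind described in Section 3.1 can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the question names, response values, and the 0.5-bit selection threshold are all invented.

```python
from collections import Counter
from math import log2

def shannon_entropy(responses):
    """Shannon entropy (in bits) of one question's discrete responses."""
    counts = Counter(responses)
    n = len(responses)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# Hypothetical questionnaire: one list of per-subject responses per question.
questions = {
    "Q1_sleep_hours": [7, 7, 7, 7, 7, 7],        # no variability -> 0 bits
    "Q2_fatigue":     [1, 2, 3, 1, 2, 3],        # spread responses -> high entropy
    "Q3_handedness":  ["R", "R", "L", "R", "R", "L"],
}
entropies = {q: shannon_entropy(r) for q, r in questions.items()}

# Keep only questions whose responses actually vary across subjects;
# constant answers carry no information for a correlation analysis.
selected = [q for q, h in sorted(entropies.items(), key=lambda kv: -kv[1])
            if h > 0.5]
```

Making the screening rule explicit in this way (what counts as "enough" entropy, and why) would address the reproducibility concern directly.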
The authors present a Multimodal and Explainable Deep Learning (MEDL) framework for MI-EEG classification, which involves evaluating different deep learning models and using Class Activation Maps (CAMs) to highlight relevant features. However, the proposed approach requires a large amount of labeled data to train these models, which can be a significant challenge in the field of MI-EEG classification. Collecting labeled data for MI-EEG classification is time-consuming and expensive, often requiring specialized equipment and expertise. This limitation is particularly relevant to Section 3.2, where the authors discuss the subject-dependent MI-EEG classification using deep learning. The authors should address this issue by discussing the feasibility of collecting such data and the potential impact on the applicability of their approach in different research settings or applications. Additionally, they should consider exploring methods to reduce the dependency on large labeled datasets, such as data augmentation techniques or semi-supervised learning approaches, to enhance the practicality of their framework.
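One widely used, low-cost augmentation for MI-EEG is sliding-window cropping: each labeled trial is cut into overlapping windows that inherit the trial's label. The sketch below is illustrative (window and stride values are hypothetical, not from the manuscript):

```python
def sliding_crops(trial, win, stride):
    """Cut one (channels x samples) trial into overlapping crops.

    `trial` is a list of channel rows. Each crop keeps the trial's
    label, multiplying the effective number of training samples
    without any new recordings.
    """
    n_samples = len(trial[0])
    crops = []
    for start in range(0, n_samples - win + 1, stride):
        crops.append([ch[start:start + win] for ch in trial])
    return crops
```

For example, a 4 s trial sampled at 250 Hz (1000 samples) cropped with a 2 s window (win=500) and a 0.5 s stride (stride=125) yields five training crops per trial.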
One of the key advantages of the proposed approach is its ability to evaluate different deep learning models for subject-dependent MI-EEG discrimination. The authors tested various architectures, including EEGNet, KREEGNet, KCS-FCNet, ShallowConvNet, DeepConvNet, and TCFusionNet, and found that shallow networks, such as ShallowConvNet, achieved acceptable MI discrimination results. This highlights the effectiveness of simpler models in handling noisy EEG data, which is a common challenge in this field.
Another significant advantage is the use of CAM-based methods to visualize and quantify relevant MI-EEG features. The authors successfully demonstrated that CAMs can identify spatio-frequency patterns in the data, which are crucial for decision-making in motor imagery classification. This visualization technique provides valuable insights into the neural processes involved in motor imagery and helps researchers better understand the underlying mechanisms.
Moreover, the introduction of the Questionnaire-MI Performance Canonical Correlation Analysis (QMIP-CCA) framework is a notable contribution. This framework allows for the correlation of physiological data with MI-EEG performance, offering an enhanced, interpretable solution for Brain-Computer Interfaces (BCIs). By integrating questionnaire data with EEG signals, the authors were able to identify important physiological and cognitive factors that influence the performance of MI-EEG classification models. This multimodal approach provides a more comprehensive understanding of the factors affecting motor imagery and can potentially lead to more accurate and personalized BCIs.
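For readers unfamiliar with CCA, the core computation behind such a framework is compact: whiten each view and take the singular values of the whitened cross-covariance. The NumPy sketch below is a textbook illustration under that standard formulation, not the authors' QMIP-CCA implementation; the ridge term `reg` is an assumption added for numerical stability.

```python
import numpy as np

def cca_top_correlation(X, Y, reg=1e-6):
    """First canonical correlation between views X (n x p) and Y (n x q)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = len(X)
    Sxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / n

    def inv_sqrt(S):
        # symmetric inverse square root via eigendecomposition
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    # singular values of the whitened cross-covariance are the
    # canonical correlations, sorted in descending order
    K = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(K, compute_uv=False)[0]
```

In the paper's setting, X would hold the selected questionnaire responses (subjects x questions) and Y the per-subject MI classification performance scores.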
While the paper presents a compelling approach to enhance the classification and interpretability of Motor Imagery (MI) Electroencephalography (EEG) data using deep learning techniques, there are several disadvantages related to the proposed methodology within the context of MI-EEG analysis.
1. One significant disadvantage is the reliance on subject-dependent models. The authors primarily focus on evaluating deep learning models for subject-dependent MI-EEG discrimination. This approach, while effective for individual subjects, does not address the challenge of inter-subject variability, which is a critical issue in the field of EEG analysis. Developing subject-independent models that can generalize across different individuals remains a significant challenge and is not fully addressed by the proposed framework.
2. Another limitation is the potential overfitting of deeper networks, as observed with models like DeepConvNet. The authors note that DeepConvNet struggles to classify good MI subjects and performs poorly compared to simpler models like ShallowConvNet. This suggests that more complex architectures may not always be beneficial for MI-EEG classification due to the risk of overfitting, especially when dealing with the high variability and noise inherent in EEG data.
3. Furthermore, the use of Class Activation Maps (CAMs) for interpretability, while powerful, may not always provide clear and actionable insights. The authors acknowledge that CAMs can generate spatially noisy maps, as seen with the TCFusion model. This noise can complicate the interpretation of the results and may not always accurately reflect the underlying neural processes. Additionally, the effectiveness of CAMs in capturing relevant EEG features may vary depending on the specific model and the quality of the EEG data.
4. Additionally, the integration of questionnaire data through the QMIP-CCA framework, while innovative, relies on the availability and quality of such data. The authors selected questions based on entropy, but the relevance and completeness of the questionnaire data can significantly impact the results. If the questionnaire data is incomplete or not well-aligned with the EEG data, the correlation analysis may not yield meaningful insights. This dependency on external data sources introduces an additional layer of complexity and potential variability in the analysis.
5. Lastly, the proposed approach requires a large amount of labeled data to train the deep learning models, which can be a challenge in the field of MI-EEG classification. Collecting labeled data for MI-EEG classification can be time-consuming and expensive, and may require specialized equipment and expertise. This can limit the applicability of the proposed approach to certain research settings or applications.
In conclusion, the proposed MEDL framework represents a significant advancement in the field of MI-EEG classification. By combining deep learning techniques with explainable methods like CAMs and CCA, the authors have developed a robust and interpretable approach that addresses the limitations of traditional EEG analysis methods. The findings of this study have important implications for the development of more effective and user-friendly BCIs, and the authors' work paves the way for future research in this area.
Comments for author File: Comments.pdf
Author Response
See attached pdf
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors: The authors demonstrate the performance of a classification task using several deep learning methods. The task is binary classification of multi-channel EEG data in which the cue is either 'left hand' or 'right hand', and the subjects only imagine the cued movement without actually moving their hands. The deep learning classifiers are trained and tested in a subject-dependent manner. A correlation study compares the test performance of the DL models with each subject's questionnaire responses about their covert behavior. Moreover, the authors seek to demonstrate the effectiveness of the CAM-enhanced method by comparing it with traditional, non-CAM-based classification. The content of the paper is interesting, but in general the methods are very difficult to understand. The paper needs to be rewritten or rephrased more clearly, with improved figures.
1. It seems that an image of dimensions (C × τ) was generated from the EEG data, but it is not clear what images are input to the CNN models. Did you use any preprocessing steps to derive the image data from the EEG data? If so, please describe them in detail. Please add a figure showing the images that are input to the CNN models, and another figure showing the images with their Grad-CAM heat maps overlaid.
2. Figure 3. Please clearly explain how the entropy is calculated for each question. It may be helpful to add a reference for it.
3. Figure 4 is hard to understand. It can be improved using better visualization of the network architectures.
4. Figure A1-A2. There are three subjects in each figure, but the caption says “… for two subjects …”.
5. Figures 12-15 show topomaps. It is not clear how the topomaps were generated, nor how the results should be interpreted.
6. Table 2. The authors compare ACC vs. CAM-enhanced ACC. Please explain clearly the CAM-enhanced method. The CAM-enhanced method seems to be described in section 3.3., but it is difficult to understand the concept.
7. Equations 16-19. The notation in these equations is very complicated; it should be replaced with simpler Greek letters.
8. The numbers of samples for training/validation/test should be shown in the paper.
Comments on the Quality of English Language
I think that the rephrasing of the sentences for clarity is more important than the simple grammar checking/correction. The paper can be improved with better writing and clearer explanation.
Author Response
See attached file
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors: The paper introduces class activation maps and canonical correlation analysis for MI-EEG classification. While the theme is interesting, the manuscript lacks essential technical details.
Firstly, the document does not mention any preprocessing of EEG data—a significant flaw. Using raw EEG signals without preprocessing is generally unacceptable, as the authors themselves acknowledge EEG’s vulnerability to noise in the abstract. I recommend adding a new subsection dedicated to EEG preprocessing within the Methods section. This section should outline all relevant steps chronologically and without omissions.
It appears that the paper references a dataset but does not clarify whether the authors adopted its preprocessing steps, as I had initially assumed. Regardless, it is essential to disclose all preprocessing steps. If you followed the preprocessing outlined in the dataset paper, simply state this and describe each step. In addition, the authors of the dataset paper did not specify whether the initial reference was included when computing the average for re-referencing. You should explicitly describe your re-referencing approach and confirm the full rank of the data, as re-referencing impacts data rank [1]. Without this information, the study cannot be replicated, and rank-deficient data would undermine its validity. If you included the initial reference in the average, you can simply state: 'We re-referenced to the average after including the initial reference, ensuring the data's full rank [citation].' Otherwise, provide a detailed description of your approach and your confirmation of the full data rank.
[1] https://doi.org/10.3389/frsip.2023.1064138
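The rank effect described in [1] can be checked numerically. The sketch below is illustrative (it is not the authors' pipeline): with C channels recorded against one reference, averaging over C+1 signals (reference included) applies an invertible transform and preserves full rank, whereas averaging over C signals (reference excluded) makes every column sum to zero and loses one rank.

```python
import random

def average_reference(trial, include_initial_ref):
    """Common-average re-reference a (channels x samples) trial.

    Including the (all-zero) initial reference channel in the average
    means dividing by C+1; the mapping on the C recorded channels is
    then invertible and the data stay full rank. Dividing by C
    (reference excluded) forces each column to sum to zero.
    """
    n_ch = len(trial)
    divisor = n_ch + 1 if include_initial_ref else n_ch
    col_sums = [sum(col) for col in zip(*trial)]
    return [[v - s / divisor for v, s in zip(row, col_sums)] for row in trial]

def matrix_rank(rows, tol=1e-8):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        if rank == len(m):
            break
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank:
                f = m[r][col] / m[rank][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank
```

A one-line statement of which variant was used, plus a rank check of this kind, would settle both the reproducibility and the rank concern.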
Another significant issue is the absence of a comparison with traditional methods like CSP for EEG-based classification. CSP and its variants are standard benchmarks in EEG analysis, particularly for motor imagery tasks. Including a comparison with CSP would establish a baseline to evaluate the effectiveness of the deep learning models used. For this comparison, ensuring full data rank is crucial.
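For reference, a CSP baseline of the kind suggested here is compact to implement. The sketch below follows the standard whitening-plus-diagonalization formulation and is illustrative, not taken from the manuscript; note that CSP solves a generalized eigenproblem over class covariances, which is exactly why full-rank data matters.

```python
import numpy as np

def csp_filters(trials_a, trials_b, n_pairs=1):
    """Common Spatial Patterns for two-class MI-EEG.

    trials_*: lists of (channels x samples) arrays. Returns 2*n_pairs
    spatial filters (rows) that maximize variance for one class while
    minimizing it for the other.
    """
    def mean_cov(trials):
        covs = [t @ t.T / np.trace(t @ t.T) for t in trials]
        return np.mean(covs, axis=0)

    Sa, Sb = mean_cov(trials_a), mean_cov(trials_b)
    # Whiten the composite covariance, then diagonalize class A there.
    d, V = np.linalg.eigh(Sa + Sb)
    P = V @ np.diag(d ** -0.5) @ V.T
    w, U = np.linalg.eigh(P @ Sa @ P)
    W = U.T @ P  # rows sorted by ascending class-A eigenvalue
    keep = list(range(n_pairs)) + list(range(len(w) - n_pairs, len(w)))
    return W[keep]

def log_var_features(W, trial):
    """Standard CSP features: normalized log-variance of filtered signals."""
    Z = W @ trial
    v = Z.var(axis=1)
    return np.log(v / v.sum())
```

Feeding these log-variance features to a simple linear classifier (e.g., LDA) is the conventional CSP pipeline against which the deep models could be benchmarked.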
The results were not validated using appropriate statistical methods. All reported findings in the paper require validation. General statements about performance being 'good' or 'bad' are insufficient. Statistical tests should be applied to all comparisons, particularly those shown in the figures from Figure 6 onward. Report these statistical results in both the main text and figure captions.
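A distribution-free option suited to the small per-subject samples typical of MI-EEG studies is a paired sign-flip permutation test on per-subject accuracies. The sketch below is illustrative (the function name and the accuracy values in the usage note are hypothetical):

```python
import random
from statistics import mean

def paired_permutation_test(acc_a, acc_b, n_perm=10000, seed=0):
    """Two-sided sign-flip permutation test on paired accuracies.

    Under the null hypothesis that the two models perform equally,
    each per-subject difference is equally likely to be positive or
    negative, so we randomly flip signs and count how often the
    permuted mean difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    observed = abs(mean(diffs))
    count = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            count += 1
    # add-one smoothing keeps the p-value strictly positive
    return (count + 1) / (n_perm + 1)
```

Reporting such a p-value for every model-vs-model comparison (in the text and in the figure captions from Figure 6 onward) would replace the unsupported "good"/"bad" statements with validated claims.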
I cannot find a TRUE discussion. While the authors titled Section 4 'Results and Discussion', it merely reports results without interpretation or context. I strongly suggest separating the results and discussion into distinct sections. A discussion should interpret the findings, compare them with existing work, and address limitations and future research directions. Furthermore, the current discussion lacks citations to relevant literature, leaving it unsupported by prior studies. Without references, it fails to substantiate claims, position the proposed method meaningfully, or demonstrate how it advances the field. Addressing these gaps will substantially strengthen the discussion.
Author Response
See attached pdf
Author Response File: Author Response.pdf
Round 2
Reviewer 3 Report
Comments and Suggestions for Authors: I appreciate the authors' sincere response. However, one issue has not been fully addressed.
Response. Thank you for your valuable comment. To ensure comparability of results and highlight the benefits of deep learning (DL) for motor imagery EEG (MI-EEG) analysis, we have adopted a straightforward average referencing approach. This decision is aligned with our goal of emphasizing the DL model’s ability to process data with minimal reliance on conventional preprocessing techniques, thereby preserving the integrity of the "raw" data inputs—a practice increasingly common in modern DL methodologies. Then, Section 3.1 was updated as:
➢ It seems that the authors misunderstood my intention. I encourage the authors to revisit the points raised and consult the referenced paper, which clarifies the issue. My primary concern is reproducibility. Kindly review my initial comment to address these points in more detail.
➢ "A straightforward average referencing approach" does not clarify whether the initial reference was included. This detail is crucial because it determines the data rank, a fundamental property for EEG analyses. Please specify whether the initial reference was included when computing the average, as this affects BOTH data rank and reproducibility.
➢ Again, two aspects require explicit clarification: (1) reproducibility of the study (you should reveal whether or not you included the initial reference when computing the average; if you do not disclose this, no one can properly reproduce your study) and (2) data rank (you should confirm the full rank of the data in general). Revisit the paper I suggested, which explains how different ways of re-referencing to the average affect the data rank. If you included the initial reference in the average, that process would not affect the data rank, addressing both aspects concurrently, and you can simply state: 'We re-referenced to the average after including the initial reference, ensuring the data's full rank [citation].' Otherwise, provide a detailed description and your confirmation of the full data rank.
[1] https://doi.org/10.3389/frsip.2023.1064138
Author Response
See attached pdf
Author Response File: Author Response.pdf
Round 3
Reviewer 3 Report
Comments and Suggestions for Authors: The authors successfully addressed the issue.