Explainable Multi-Frequency Long-Term Spectrum Prediction Based on GC-CNN-LSTM
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors

The core models are established, but the novelty lies in the specific method of improving Grad-CAM to generate a global time-frequency heatmap for a regression task and subsequently using it for two-dimensional (time and frequency) input data optimization. This is a clever and practical application of XAI. The paper provides clear evidence of practical benefit, showing a 6.22% reduction in RMSE. While not tested in a live industrial system, the method provides a tangible pathway for engineers to optimize complex models for cognitive radio, enhancing both performance and trust.

The proposed methodology is highly generalizable. The approach of generating a global heatmap from a time-series model to guide feature selection could be applied to any domain involving time-frequency data, such as audio signal processing (spectrograms), medical signal analysis (EEG), or seismic data analysis. The authors could briefly mention this potential in the conclusion.

Claims are well supported by simulation results and comparisons with multiple baseline models. The head-to-head comparison of the proposed optimization method against a SHAP-based one is a strength. The soundness could be improved by providing more justification for the stated limitations of SHAP in this context (as mentioned in Section 1 comments).
Overall, this is a well-structured and interesting paper that addresses the important problem of interpretability in deep learning models for spectrum prediction. The proposed method of using an improved Grad-CAM for two-dimensional feature selection is novel and the results are promising.
Alongside these positive points, I recommend adding a new "Discussion" section to the manuscript. This section would provide a valuable opportunity to contextualize the findings, frankly discuss the limitations and broader impact of the work, and outline promising directions for future research.
Please find specific comments in the attached file.
Comments for author File: Comments.pdf
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors

Dear authors, please find my comments about your manuscript in the following paragraphs.
The article tackles a crucial spectrum management problem in wireless networks with a real application in Cognitive Radio, in line with current research. The CNN-LSTM integration is well motivated for capturing spatio-temporal patterns, and the incorporation of an enhanced Grad-CAM for interpretation in the time-frequency domain is novel. The comparison against the reference methods considered (LSTM, GRU, CNN, Transformer, CNN-LSTM-Attention) puts the advantages of the proposed method into context. The modeling, architecture, and evaluation metrics sections provide sufficient detail, with equations and explanations allowing for replication. The comparison of SHAP and GC-Grad-CAM is well thought out: the restrictions of the former versus the benefits of the latter are clearly presented. Clear percentage gains in RMSE and MAPE, supported by systematic experimentation, give strong support to the proposal. The use of real Electrosense data increases the credibility and applicability of the findings.
Section 4.1: The pure CNN-LSTM outperformed LSTM, GRU, and CNN, confirming the advantage of the hybrid approach in capturing spatio-temporal patterns. The Transformer came close in terms of MAPE but had a higher RMSE, indicating less consistent predictions. CNN-LSTM-Attention provided only minor improvements over CNN-LSTM at the expense of increased model complexity, showing that attention did not bring enough benefit for this task.
Section 4.2.1: In terms of frequency selection, SHAP reduced MAPE by up to 3.55% but barely affected RMSE (only 0.86%), showing the limitation of this approach for this problem. The lack of temporal explanations is critical, as it prevents the identification of useful temporal patterns.
Section 4.2.2: Global heatmaps made it possible to visualize time and frequency relevance simultaneously, enabling the selective discarding of less relevant temporal segments and optimal frequency selection. The best configuration (45 frequencies, with the 6 least relevant temporal segments removed) reduced RMSE and MAPE by 6.22% and 4.25%, respectively, over the base CNN-LSTM.
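The two-dimensional selection summarized above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the heatmap is random placeholder data standing in for the improved Grad-CAM output, and the segment count and shapes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical global time-frequency relevance map: 96 time steps x 60 frequencies.
heatmap = rng.random((96, 60))

# Keep the 45 most relevant frequencies (columns), as in the best configuration.
freq_relevance = heatmap.mean(axis=0)
top_freqs = np.sort(np.argsort(freq_relevance)[-45:])

# Group time steps into 12 segments and drop the 6 least relevant ones.
n_segments = 12
seg_relevance = heatmap.reshape(n_segments, -1, heatmap.shape[1]).mean(axis=(1, 2))
kept_segments = np.sort(np.argsort(seg_relevance)[6:])

# Frequency-reduced input that would be fed back for retraining.
reduced_input = heatmap[:, top_freqs]
```

The key design choice the section describes is that relevance is aggregated along both axes of the same heatmap, so frequency pruning and temporal pruning come from a single explanation artifact.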
Points to Discuss for Improvement
- The authors do not test across different time intervals, bands, or noise conditions, which would be needed to argue for the robustness of the model under unfavorable conditions.
- While percentage differences are provided, no statistical analyses (e.g., t-test or ANOVA) are performed to establish that the gains are statistically significant.
- Computational costs of the approach are not discussed: training time, GPU and memory consumption, and the feasibility of running the system on low-resource configurations.
- The approach is validated only on a dataset covering one kind of spectrum; it would be interesting to test performance in other bands or contexts to verify its generalization ability.
- Although the frequency and time resolutions were chosen, no isolated (ablation) analysis was performed on the impact of each component (CNN, LSTM, Grad-CAM) on overall performance.
- The possible causes behind the differences between methods should be investigated further, such as why attention did not provide clear improvements over a straightforward CNN-LSTM.
- Some plots (e.g., the global heatmap) could be provided with numerical scales and richer captions coupled with concrete examples of interpretations, so less experienced readers can better grasp the content.
- The conclusion section could be enriched with a critical view of limitations and concrete suggestions on future work directions, including integration into real spectrum sensing systems.
- The model was simulated and evaluated with data from a single range under stable conditions. It has not been tested in other spectral environments, noisier bands, different sampling resolutions, or mobility scenarios, so it remains to be evaluated whether the approach keeps its performance under low SNR or real disturbances such as interference and sudden variations.
- The manuscript contains no analysis of statistical significance (e.g., p-values, confidence intervals) demonstrating that the observed improvements are not due merely to random variation.
- No studies were conducted to measure the impact of each component (CNN, LSTM, Grad-CAM) on the final improvement, nor to compare the improved Grad-CAM against the original one to quantify the gains.
- No measurements of training/inference time, memory consumption, or minimum hardware requirements are given; hence, there is no discussion of the feasibility of running the approach on edge systems or devices with constrained processing capacity and power.
- Cases where the model fails or exhibits higher error have not been studied; nor are heatmap patterns correlated with real events occurring in the spectrum, for instance at certain channels.
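To illustrate the significance testing suggested in the points above: a minimal sketch of a paired t-test with a confidence interval on per-run RMSE. The values below are hypothetical placeholders standing in for results of repeated training runs, not numbers from the paper.

```python
import numpy as np
from scipy import stats

# Hypothetical per-run RMSE from 5 repeated runs (e.g., different random seeds).
rmse_baseline  = np.array([1.99, 2.02, 1.98, 2.01, 2.00])
rmse_optimized = np.array([1.88, 1.87, 1.90, 1.86, 1.89])

# Paired t-test: are the per-run differences significantly non-zero?
t_stat, p_value = stats.ttest_rel(rmse_baseline, rmse_optimized)

# 95% confidence interval for the mean RMSE reduction.
diff = rmse_baseline - rmse_optimized
ci_low, ci_high = stats.t.interval(0.95, len(diff) - 1,
                                   loc=diff.mean(), scale=stats.sem(diff))

print(f"t = {t_stat:.2f}, p = {p_value:.4f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

A paired test is the appropriate choice here because both models would be evaluated on the same runs, so the per-run differences, rather than the raw scores, carry the signal.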
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors

Thanks for providing the revision, yet the following concerns remain:
1. The discussion is overly generic. It could be made more specific to the context of the results you have achieved and connected to relevant references.
2. "The Figure 6 shows the top 30 frequencies" -> Incorrect reference. This should refer to Figure 7. Please correct the figure number.
3. "The software platform utilized is PyCharm Community Edition 2024." -> PyCharm is not the "software platform" and this level of detail is unnecessary and not standard for academic papers. Recommend removing this sentence.
4. The text says the Transformer model has a performance close to CNN-LSTM in MAPE, but the RMSE of CNN-LSTM is lower (better), not higher. Please correct this. The value in Table 1 for CNN-LSTM is 1.9932 vs Transformer's 2.0753.
See some specific comments in the attached PDF.
Comments for author File: Comments.pdf
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors

The authors have addressed and incorporated the comments I indicated.
Author Response
Thank you very much for your comments, which are of great benefit to the improvement of our work.
Round 3
Reviewer 1 Report
Comments and Suggestions for Authors

1. Crucial details about the neural network architectures are still missing. A table should be provided listing the specific hyperparameters for all models tested (LSTM, GRU, CNN, Transformer, CNN-LSTM, CNN-LSTM-Attention), including: number of layers, number of units/filters, kernel sizes, activation functions, dropout rates, learning rate, optimizer, batch size, etc. This is a major omission for reproducibility; the information can be listed as a table in the Appendix.
2. The font sizes in the figures should be consistent.
3. The paper should briefly describe how the SHAP analysis was conducted. Which explainer was used (e.g., KernelExplainer, DeepExplainer)? What was the background dataset? These details are important for understanding the results in Section 4.2.1.
4. To improve transparency and reproducibility, the authors should include a statement on the availability of the source code used for the models and analyses.
5. Specific comments attached.
Comments for author File: Comments.pdf
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 4
Reviewer 1 Report
Comments and Suggestions for Authors

Thanks for revising; the manuscript is now acceptable.