Article
Peer-Review Record

Explainable Deep Kernel Learning for Interpretable Automatic Modulation Classification

Computers 2025, 14(9), 372; https://doi.org/10.3390/computers14090372
by Carlos Enrique Mosquera-Trujillo *, Juan Camilo Lugo-Rojas, Diego Fabian Collazos-Huertas, Andrés Marino Álvarez-Meza * and German Castellanos-Dominguez
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 18 July 2025 / Revised: 27 August 2025 / Accepted: 2 September 2025 / Published: 5 September 2025
(This article belongs to the Special Issue AI in Complex Engineering Systems)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper introduces CRFFDT-Net, a deep kernel learning model intended to optimize the trade-off between accuracy, model complexity, and interpretability in AMC systems. The concept of integrating Random Fourier Features (RFF) into a deep learning architecture is attractive in theory, yet several fundamental flaws remain in the novelty, experimental rigor, and depth of analysis of the approach. The following points must be addressed.

1. Novelty is overstated
The fundamental novel concept of CRFFSinCos is a convolutional implementation of sine- and cosine-based RFF mappings. The underlying theory (Bochner’s theorem and RFF approximations) is standard, and applying a sine/cosine projection within a convolutional layer is a minimal extension. The authors do not demonstrate experimentally that the new method outperforms existing approaches such as fixed RFF filters, conventional CNNs, or phase-shifted RFF embeddings.
→ The study should compare CRFFSinCos against (a sketch of the two RFF variants follows this list):
Phase-shifted RFF embeddings (Eq. 5)
A standard CNN with learnable filters
The authors’ own prior architecture (referenced in [31])
Without these comparisons, the proposed architectural contribution cannot be validated.
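For concreteness, a minimal NumPy sketch of the two RFF variants at issue, assuming a Gaussian kernel with bandwidth sigma (illustrative only, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, sigma = 128, 256, 1.0                    # input dim, feature count, kernel bandwidth
W = rng.normal(0.0, 1.0 / sigma, size=(d, D))  # frequencies sampled per Bochner's theorem
b = rng.uniform(0.0, 2.0 * np.pi, size=D)      # random phases for the shifted variant

def rff_phase_shifted(x):
    """Phase-shifted variant: z(x) = sqrt(2/D) * cos(xW + b)."""
    return np.sqrt(2.0 / D) * np.cos(x @ W + b)

def rff_sincos(x):
    """Sine-cosine variant: z(x) = sqrt(1/D) * [cos(xW), sin(xW)]."""
    proj = x @ W
    return np.sqrt(1.0 / D) * np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

# For both variants, z(x) . z(y) approximates exp(-||x - y||^2 / (2 * sigma^2)).
```

Swapping one mapping for the other with everything else held fixed would isolate the claimed benefit of the sine-cosine form.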

2. Interpretability analysis is superficial
The title and abstract emphasize interpretability, yet the actual interpretability assessment relies solely on GradCAM++ heatmaps. This technique is a generic tool applicable to any CNN-based architecture. The paper does not clarify how the kernel approximation itself improves interpretability beyond visualization. The study lacks quantitative interpretability metrics (deletion/insertion scores, localization error) as well as user studies or task-driven interpretability assessments.

→ Either:
Implement a quantitative interpretability evaluation (fidelity, localization, consistency), or
State explicitly that the method is compatible with interpretability tools rather than contributing enhanced explainability as its main feature.
A sketch of one such quantitative metric follows.
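For reference, a minimal sketch of a deletion-style fidelity score (the `predict_proba` callable, which returns class probabilities for one example, is an assumed interface, not the authors' code):

```python
import numpy as np

def deletion_auc(predict_proba, x, saliency, target, steps=20):
    """Zero out the most salient inputs first and track the target-class
    probability; a faster drop (smaller area under the curve) means a
    more faithful explanation."""
    order = np.argsort(saliency.ravel())[::-1]   # most salient first
    x_del = x.copy()
    probs = [predict_proba(x_del)[target]]
    chunk = max(1, order.size // steps)
    for i in range(0, order.size, chunk):
        x_del.ravel()[order[i:i + chunk]] = 0.0
        probs.append(predict_proba(x_del)[target])
    return float(np.trapz(probs, dx=1.0 / len(probs)))
```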

3. Dataset limitations and risk of overfitting
Model testing relies solely on the synthetic RadioML 2016.10A dataset with a random 70/30 train/test split. This evaluation approach raises two major problems:
There is no evaluation of the model's ability to generalize to modulation types outside its training set or to different channel environments.
Random splitting risks leaking overlapping signal characteristics across the train and test sets under identical channel conditions.
→ Include:
Cross-SNR validation (train on high-SNR data, test on low-SNR data); a minimal sketch follows this list
Evaluation on at least one additional dataset (e.g., RadioML 2018.01A or a real-world RF dataset)
Testing on modulation classes unseen during training via a leave-one-out setup
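A cross-SNR split is straightforward to set up; a minimal sketch, assuming NumPy arrays `X`, `y`, and a per-example `snr` array (hypothetical names):

```python
def cross_snr_split(X, y, snr, threshold_db=6):
    """Train on high-SNR examples and test on low-SNR ones, so the test
    channel conditions are never seen during training."""
    train = snr >= threshold_db
    return (X[train], y[train]), (X[~train], y[~train])
```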

4. Claims of low complexity lack runtime evidence
The authors support their low-complexity claim with a parameter count (29K) yet omit key metrics such as latency, throughput, and energy consumption. Models with fewer parameters (e.g., ULNN with 8.8K) have already been developed. The claim that CRFFDT-Net is suitable for embedded systems therefore lacks experimental validation.

→ Report:
Inference latency and throughput on embedded devices (e.g., Jetson Nano, Raspberry Pi, or equivalent); see the timing sketch after this list
Memory usage
FLOPs (floating-point operations)
Energy usage per inference, where measurable
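Latency, for instance, can be reported with a simple timing loop on the target device; a minimal PyTorch-style sketch (the `model` and the input shape are assumptions; 2 x 128 matches RadioML 2016.10A IQ frames):

```python
import time
import torch

@torch.no_grad()
def median_latency_ms(model, input_shape=(1, 2, 128), warmup=20, runs=200):
    """Median single-sample inference latency in milliseconds."""
    model.eval()
    x = torch.randn(*input_shape)
    for _ in range(warmup):          # stabilize caches and lazy allocations
        model(x)
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        model(x)
        times.append((time.perf_counter() - t0) * 1e3)
    return sorted(times)[len(times) // 2]
```

FLOPs can be obtained with an off-the-shelf profiler (e.g., fvcore's FlopCountAnalysis or ptflops) rather than hand-counted.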

5. Thresholding module is overengineered without justification
The RSBU-based denoising block adds substantial complexity to the overall architecture, yet the paper never demonstrates that it improves performance over simpler denoising baselines. The necessity of this component therefore remains unclear.
→ Add a comparison of:
CRFFDT-Net with and without thresholding
CRFFDT-Net using simpler soft/hard thresholding
RSBU vs. static thresholds
The absence of such analysis makes the design choice appear arbitrary.
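The requested baselines are simple to express; a minimal sketch of the static soft/hard thresholding operators (here `tau` is a fixed scalar, whereas the RSBU learns it per channel):

```python
import torch

def soft_threshold(x, tau):
    """Soft thresholding: shrink magnitudes toward zero by tau."""
    return torch.sign(x) * torch.clamp(torch.abs(x) - tau, min=0.0)

def hard_threshold(x, tau):
    """Hard thresholding: zero entries with magnitude below tau."""
    return x * (torch.abs(x) > tau)
```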

6. Evaluation metrics are incomplete
The paper reports only average classification accuracy. A single metric is insufficient for a multi-class problem with class imbalances (e.g., between QAM64 and WBFM). Moreover, the confusion matrices reveal that certain classes remain poorly predicted even at high SNR, which the averaged accuracy obscures.
→ Report:
Precision, recall, and F1-score per class
Macro and weighted averages
Confidence intervals for overall accuracy
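All of these are available off the shelf; a minimal sketch, assuming NumPy arrays `y_true` and `y_pred` (hypothetical names):

```python
import numpy as np
from sklearn.metrics import classification_report

def full_report(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Per-class precision/recall/F1 with macro and weighted averages,
    plus a bootstrap confidence interval for overall accuracy."""
    print(classification_report(y_true, y_pred, digits=3))
    rng = np.random.default_rng(seed)
    n = len(y_true)
    accs = [(y_true[idx] == y_pred[idx]).mean()
            for idx in (rng.integers(0, n, size=n) for _ in range(n_boot))]
    lo, hi = np.quantile(accs, [alpha / 2, 1 - alpha / 2])
    print(f"accuracy {100 * (1 - alpha):.0f}% CI: [{lo:.3f}, {hi:.3f}]")
```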

7. Reproducibility is not ensured
The paper states that training was performed on Kaggle with particular hardware, yet it provides no code, trained weights, or complete hyperparameter configurations.
→ Release:
Source code (e.g., via GitHub or supplementary materials)
Complete hyperparameter configurations (learning rate, batch size, number of epochs, weight initialization), on which the model's performance heavily depends
Trained weights or training logs

8. Writing and figure issues
The figures, especially the confusion matrices, are difficult to read because of the chosen font sizes and colors.
Mathematical notation is inconsistent across equations (e.g., the softmax function written as fsoftmax; missing equation numbering).
The abbreviation CRFFDT-Net is spelled inconsistently throughout the document (CRFFTD-Net in certain sections).
Some citations are repeated unnecessarily, while others are poorly placed in the text (e.g., [31] and [40] are not well defined).
Please improve clarity, resolve these inconsistencies, and ensure all figures remain readable when printed in grayscale.

9. Lack of Structural and Information-Theoretic Interpretability
GradCAM++ visualizations applied to convolutional layers form the sole basis of the paper’s interpretability analysis. This approach identifies spatial saliency but provides no insight into how information flows through the network or how learned representations differ across modulation types. Recent research demonstrates the power of uniting explainable AI (xAI) techniques with statistical dependency structures and causal feature interactions to identify systematic relationships between input features.

Recommended Citations

Kindermans et al. (2019) – The (Un)reliability of Saliency Methods
- Critically evaluates the limitations of gradient-based interpretability, such as GradCAM, and proposes more reliable alternatives. Highlights the need for robustness in interpretability claims.
Choi & Kim (2024) – Unlocking ETF Price Forecasting
- Demonstrates xAI applied to complex statistical structures; relevant for uncovering relationships between features beyond saliency maps.
Yeh et al. (2019) – On the (In)fidelity and Sensitivity of Explanations
- Introduces metrics (fidelity, sensitivity) to evaluate the faithfulness of interpretability methods—missing in this paper.
Schwab & Karlen (2019) – CXPlain: Causal Explanations for Model Interpretation under Uncertainty
- Proposes a causality-based attribution method to improve model transparency under noisy conditions—highly relevant for low-SNR settings.

APA-Style References
Kindermans, P.-J., Hooker, S., Adebayo, J., Alber, M., Schütt, K. T., Dähne, S., ... & Montavon, G. (2019). The (Un)reliability of saliency methods. In Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (pp. 267–280). Springer. https://doi.org/10.1007/978-3-030-28954-6_14
Choi, I., & Kim, W. C. (2024). Unlocking ETF price forecasting: Exploring the interconnections with statistical dependence-based graphs and xAI techniques. Knowledge-Based Systems, 112567. https://doi.org/10.1016/j.knosys.2024.112567
Yeh, C.-K., Hsieh, C.-Y., Suggala, A., Inouye, D. I., & Ravikumar, P. (2019). On the (in)fidelity and sensitivity of explanations. NeurIPS 2019, 10965–10976. https://papers.nips.cc/paper_files/paper/2019/file/92650b2e922f11737a0a33f985e97996-Paper.pdf
Schwab, P., & Karlen, W. (2019). CXPlain: Causal explanations for model interpretation under uncertainty. NeurIPS 2019, 10220–10230. https://papers.nips.cc/paper_files/paper/2019/file/b94e3bcef58b5e310cebc55c7d6c4d79-Paper.pdf

Conclusion
While the idea of combining RFF-based feature mapping and lightweight architectures is interesting, the current form of the paper does not justify the claims of novelty, interpretability, or efficiency. Extensive ablation studies, expanded evaluation, and clearer empirical justification are necessary.

Author Response

See the attached file.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Dear Authors,
This manuscript introduces the Convolutional Random Fourier Features with Denoising Thresholding Network (CRFFDT-Net), a novel deep learning architecture for Automatic Modulation Classification (AMC). The work attempts to address the critical challenges of high computational cost, performance in low-SNR conditions, and the lack of interpretability in modern AMC models. The proposed CRFFSinCos layer is a technically interesting approach to creating a lightweight yet effective feature extractor. The experimental results show that the model achieves a competitive performance-to-complexity ratio, which I think is a valuable contribution to the field.

The manuscript is overall well-structured and addresses a relevant problem. However, I believe some revisions are required to strengthen the scientific claims, improve clarity, and fully validate the contributions before the paper can be accepted for publication. Please see my comments below.

1. Framing of the core contribution. The primary strength of CRFFDT-Net appears to be its excellent efficiency, achieving accuracy comparable to much larger models. The results show an average accuracy of ~62%, while MCLDNN achieves ~63%, and the statistical tests confirm no significant difference between your model and other top performers like MCLDNN and PET-CGDNN. The manuscript should be revised to more clearly frame its contribution around this accuracy-complexity trade-off. The current narrative sometimes implies superior classification accuracy, which is not strongly supported by the data. Highlighting that you achieve statistically comparable results with a fraction of the parameters is a more robust and equally impactful claim.

2. Depth of Interpretability Analysis. The inclusion of GradCAM++ for interpretability is a strong point. However, I feel that the current analysis is still preliminary.
The conclusion that the CRFFSinCos layer is the most influential is interesting but requires deeper investigation. What specific features does this layer learn? Why does it dominate the decision-making process compared to the subsequent CNN and GRU layers? The analysis should move beyond showing activation levels to providing more profound insights.
Also, a comprehensive ablation study is essential to validate the architectural choices. The manuscript would be strengthened by systematically evaluating the model's performance and complexity under different conditions, such as replacing the CRFFSinCos layer with a standard convolutional layer, removing the threshold-denoising module, and removing the GRU layer. I believe this would provide empirical evidence for the specific contribution of each component; a sketch of such a harness follows.
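A minimal sketch of an ablation harness (the names `build_model`, `train_and_evaluate`, and `count_params` are hypothetical placeholders, not the authors' code):

```python
# Hypothetical ablation harness: train every variant with the same data,
# split, and budget, then compare accuracy and parameter counts.
variants = {
    "full model":               dict(rff=True,  threshold=True,  gru=True),
    "conv replaces CRFFSinCos": dict(rff=False, threshold=True,  gru=True),
    "no threshold-denoising":   dict(rff=True,  threshold=False, gru=True),
    "no GRU":                   dict(rff=True,  threshold=True,  gru=False),
}
for name, cfg in variants.items():
    model = build_model(**cfg)
    acc = train_and_evaluate(model)
    print(f"{name}: accuracy={acc:.3f}, params={count_params(model)}")
```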

3. Novelty of CRFFSinCos. The paper claims novelty in developing a convolutional RFF layer based on the sine-cosine embedding, stating that prior work used a less optimal variant. This claim must be defended more rigorously. Please expand the related work section to include a more detailed comparison with existing research on kernel approximations in convolutional networks to clearly delineate and establish the novelty of your specific implementation.

Author Response

See the attached file.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

In the dataset description (Subsection 3.1), is it 1,000 signals per modulation or 1.000 (i.e., 1)?

In Figure 6, the authors report accuracy versus number of parameters, where the maximum accuracy is below 60%. However, in the confusion matrices (where diagonal elements represent correctly classified samples), the accuracy appears close to 90%. Can you please explain this discrepancy?

Each row in the confusion matrix represents one class (it appears that each row sums to 100). Please check all nine confusion matrices to verify whether each row sums to 100; a minimal check is sketched below.
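A minimal sketch of such a check, assuming `cms` is a list of the nine row-normalized confusion matrices (a hypothetical name):

```python
import numpy as np

for i, cm in enumerate(cms):
    row_sums = cm.sum(axis=1)
    if not np.allclose(row_sums, 100.0, atol=0.5):   # percent, within rounding
        print(f"confusion matrix {i}: row sums are {row_sums}")
```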

The k-fold cross-validation setup is unclear; can you please state the value of k in the revised version?

One more point: the authors use many hyphens (they look like double hyphens); please verify the reason for this formatting.

Author Response

See the attached file.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I have carefully reviewed the revised manuscript. The authors have addressed all of my previous comments and concerns thoroughly and thoughtfully. The revisions have significantly improved the clarity, rigor, and overall quality of the manuscript.

I am satisfied with the current version and have no further concerns. Therefore, I recommend the manuscript for acceptance in its present form.

 

Reviewer 2 Report

Comments and Suggestions for Authors

Dear Authors,

thank you very much for your revisions and your replies. Overall, I am satisfied with them.

Thank you for the interesting research, and I wish you good luck.
