Equivariant Transition Matrices for Explainable Deep Learning: A Lie Group Linearization Approach
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
In this manuscript, the authors propose Equivariant Transition Matrices (ETM), a post-hoc explainability framework designed to improve the structural consistency of explanations produced for deep learning models. The method introduces equivariance constraints derived from Lie group linearization into the construction of transition matrices that map latent deep features to interpretable features. By incorporating Lie algebra generators and enforcing an approximate intertwining condition between feature representations, the proposed framework seeks to ensure that explanations remain stable under geometric transformations such as rotations. The optimization problem is formulated as a convex least-squares system solved via SVD. The approach is validated on both synthetic datasets and the MNIST benchmark, demonstrating substantial reductions in symmetry defect while maintaining comparable reconstruction quality. The topic is timely and relevant, as ensuring the stability and reliability of explanations in deep learning models is an increasingly important requirement for deployment in regulated or safety-critical domains.
However, this reviewer finds that while the study presents an interesting conceptual integration between geometric deep learning and explainable AI, several aspects of the work require clarification and strengthening before the manuscript can be considered for publication. The following points should therefore be addressed:
- The authors should more clearly position the novelty of the proposed ETM approach relative to existing work on equivariant neural networks and geometric deep learning. While the paper emphasizes that the method operates in a post-hoc setting, the conceptual relationship between the proposed transition matrices and existing equivariant architectures or Jacobian regularization techniques remains insufficiently discussed.
- The manuscript does not sufficiently discuss the transferability of the trained machine learning models across geographic regions or varying agro-ecological zones. Since soil properties and climatic variables differ substantially across landscapes, it would be useful to evaluate the model on independent spatial domains or conduct domain adaptation analysis to assess robustness beyond the training distribution.
- The theoretical assumptions underlying the method require further discussion. The approach assumes that both the formal model feature space and the interpretable feature space exhibit approximate equivariance under the same Lie group actions. In practical applications, however, this assumption may not hold, especially when interpretable features are derived from heuristic or domain-specific descriptors. The authors should discuss the robustness of the method when these assumptions are violated and provide guidance on how suitable symmetry groups should be selected in real-world settings.
- The manuscript does not specify whether all environmental predictors were temporally aligned. For example, if satellite-derived variables and climate datasets originate from different years or aggregation periods, this could introduce temporal mismatch bias in the modeling process. The authors should clarify the temporal consistency of all input layers.
- While the manuscript reports prediction accuracy metrics, it lacks quantification of predictive uncertainty. Incorporating uncertainty estimates (e.g., prediction intervals, ensemble variance, or Monte Carlo simulations) would significantly improve the interpretability and reliability of the resulting spatial predictions.
- The estimation procedure for Lie algebra generators using finite differences and dimensionality reduction requires additional justification. In the synthetic experiment, the authors rely on multidimensional scaling to embed high-dimensional features into a 2D space before applying rotations. While this approach is creative, it may introduce distortions that affect the estimation of the generators. A more detailed analysis of the potential bias introduced by this embedding step and its impact on the resulting transition matrices would improve the methodological rigor of the study.
- Several of the environmental predictors (e.g., climatic variables, vegetation indices, and soil proxies) are likely to exhibit strong multicollinearity. The authors should clarify whether variance inflation factor analysis, correlation filtering, or dimensionality reduction techniques were applied prior to model training to mitigate redundancy and prevent inflated feature importance estimates.
- Although the experiments demonstrate large reductions in symmetry defect, the evaluation is relatively limited in scope. The MNIST experiment focuses primarily on rotation transformations and image reconstruction from latent features. It would be beneficial to include additional experiments involving other types of transformations or more complex datasets to demonstrate the generality of the approach.
- The practical implications of the proposed framework for explainability remain somewhat unclear. While the reduction in symmetry defect is quantitatively impressive, the manuscript does not provide qualitative examples illustrating how ETM improves the interpretability or usability of explanations for human users. Including visualization examples or case studies showing how the explanations behave under transformations would significantly strengthen the practical relevance of the work.
- The description of hyperparameter tuning for the machine learning models is limited. It is unclear whether the authors employed grid search, randomized search, Bayesian optimization, or cross-validated tuning frameworks. Additionally, the search space, evaluation metric, and stopping criteria should be explicitly reported to ensure reproducibility.
- The computational complexity and scalability of the approach deserve further discussion. The formulation involving Kronecker products and SVD may become computationally demanding for large feature spaces. Although the authors mention the use of iterative solvers such as LSQR, a more explicit analysis of runtime, memory requirements, and scalability to modern deep learning architectures would be valuable.
- Several aspects of the manuscript could benefit from improved clarity and presentation. Some sections of the methodology contain dense mathematical derivations that may be difficult for readers outside the geometric deep learning community. Providing clearer intuitive explanations alongside the formal derivations would improve accessibility. Additionally, minor language and formatting issues should be addressed during revision.
- For transparency and reproducibility, the manuscript should specify whether the modeling workflow, scripts, and preprocessing procedures will be made publicly available (e.g., via GitHub or a data repository). Providing the computational pipeline would significantly enhance the scientific value of the study.
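To make the scalability point about Kronecker products and SVD concrete, the following is a minimal sketch of one way such a regularized least-squares system could be assembled. It is not the authors' implementation. The function name `fit_etm`, the convention b ≈ T a, the form of the penalty λ‖T G_A − G_B T‖², and the toy SO(2) data are all assumptions made for illustration.

```python
import numpy as np

def fit_etm(A, B, G_A, G_B, lam):
    """Hypothetical sketch: fit T (q x p) so that b ~ T a for each sample,
    with a penalty lam * ||T G_A - G_B T||_F^2 on the intertwining defect.

    A: (n, p) deep features, B: (n, q) interpretable features (rows = samples).
    G_A: (p, p) and G_B: (q, q) are assumed Lie algebra generators.
    """
    n, p = A.shape
    q = B.shape[1]
    I_p, I_q = np.eye(p), np.eye(q)
    # Column-major vec identity: vec(X Y Z) = (Z^T kron X) vec(Y).
    fidelity = np.kron(A, I_q)                        # vec(T A^T), shape (n*q, p*q)
    defect = np.kron(G_A.T, I_q) - np.kron(I_p, G_B)  # vec(T G_A - G_B T), shape (p*q, p*q)
    M = np.vstack([fidelity, np.sqrt(lam) * defect])
    rhs = np.concatenate([B.T.flatten(order="F"), np.zeros(p * q)])
    # numpy.linalg.lstsq solves the stacked system with an SVD-based routine.
    t, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return t.reshape((q, p), order="F"), M.shape

# Toy check with an SO(2) generator on both spaces (assumed setup).
rng = np.random.default_rng(0)
G = np.array([[0.0, -1.0], [1.0, 0.0]])               # generator of planar rotations
theta = 0.3
T_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])  # commutes with G
A = rng.standard_normal((20, 2))
B = A @ T_true.T                                      # exact, noise-free targets
T_hat, system_shape = fit_etm(A, B, G, G, lam=0.5)
defect_norm = np.linalg.norm(T_hat @ G - G @ T_hat)
```

In this dense form the stacked matrix has (n·q + p·q) rows and p·q columns, so its memory footprint grows with the product of the two feature dimensions. That growth is the concern raised above and the reason a matrix-free iterative solver such as LSQR may be preferable for large feature spaces.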
Author Response
Please see the attachment.
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The traditional transition matrix maps deep features to interpretable features solely by minimizing reconstruction error, which fails to guarantee consistency of explanations under geometric transformations. To address this, the authors linearize the Lie group action within feature spaces, establish an intertwining condition via Lie algebra generators, and jointly optimize a fidelity objective with an equivariance regularization term, solved through singular value decomposition. Although the authors validate the proposed method across multiple datasets, several issues remain, as detailed below:
- The experiments are conducted only on synthetic data (15 samples) and MNIST, resulting in limited data scale and insufficient diversity of application scenarios.
- The current work validates only the one-dimensional continuous group SO(2), yet the authors provide no empirical or theoretical evidence to support the scalability of the proposed method to more general group structures.
- The paper primarily compares against the authors' own earlier work, lacking horizontal quantitative comparisons with other post-hoc explanation methods that incorporate structural constraints.
- It remains unclear whether the proposed method is applicable to equivariance research on 3D point clouds. The authors are encouraged to cite the following works [1][2] and provide a dedicated discussion on the scalability of ETM to 3D point cloud scenarios. [1] Point clouds meets physics: Dynamic acoustic field fitting network for point cloud understanding. [2] Gpsformer: A global perception and local structure fitting-based transformer for point cloud understanding.
- The selection of λ=0.5 is based on validation set performance; however, no principled or automated strategy for determining λ is provided for practical deployment scenarios where labeled data may not be available.
Author Response
Please see the attachment.
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The work discusses explainable AI (XAI) and geometric deep learning. The authors introduce Equivariant Transition Matrices (ETM) as a post-hoc mechanism that enforces Lie-group-based symmetry constraints on explanations derived from deep neural networks.
I have only one point that could be improved: the paper is very dense with mathematical details. An overview figure showing the flow of the method in the introduction would improve the readability of the paper.
Author Response
Please see the attachment.
Author Response File:
Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
I would like to thank the authors for their responses to my comments and suggestions. From my point of view, the paper is well-prepared and suitable for publication.
Reviewer 2 Report
Comments and Suggestions for Authors
The authors have addressed all my concerns, and I recommend accepting the manuscript.

