Peer-Review Record

Geometric Reasoning in the Embedding Space

Mach. Learn. Knowl. Extr. 2025, 7(3), 93; https://doi.org/10.3390/make7030093
by David Mojžíšek 1,*, Jan Hůla 2, Jiří Janeček 1, David Herel 2 and Mikoláš Janota 2
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3:
Reviewer 4: Anonymous
Submission received: 24 June 2025 / Revised: 20 August 2025 / Accepted: 27 August 2025 / Published: 2 September 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The aim of this paper is to study how neural networks represent spatial relationships. The authors investigate two types of architectures: graph neural networks (GNNs) and autoregressive transformers. They conduct a series of experiments on constraint satisfaction problems (CSPs) motivated by elementary geometry to evaluate the performance of both models from several points of view. They find that GNNs are better suited to the given problem and scale more effectively to larger problems.

The main question addressed by the research is how purpose-built geometric CSPs are solved by GNNs and autoregressive transformers. The authors describe a simple CSP language with five types of constraints to define:
- midpoints of line segments,
- axes of symmetry between two points,
- squares formed by four points,
- translation between two vectors, and
- fixing certain points in the 2D integer grid.

In this way, simple geometric problems can be defined, where some points are fixed and the rest can be iteratively resolved by the constraints. The aim of the authors is to reveal how this deductive process works in the case of the above-mentioned learning tools.
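For concreteness, here is a minimal sketch of this resolution process in a hypothetical encoding (the paper's actual constraint language and syntax may differ): two points fixed on the integer grid determine a third via a midpoint constraint.

```python
# Toy illustration only (hypothetical encoding, not the paper's syntax):
# F-constraints fix A and B on the 2D integer grid; the midpoint constraint
# M = mid(A, B) then resolves M in a single deduction step.
points = {"A": (0, 0), "B": (4, 2), "M": None}

def resolve_midpoint(a, b):
    """Midpoint of the segment between grid points a and b."""
    return ((a[0] + b[0]) // 2, (a[1] + b[1]) // 2)

if points["M"] is None:
    points["M"] = resolve_midpoint(points["A"], points["B"])

print(points)  # {'A': (0, 0), 'B': (4, 2), 'M': (2, 1)}
```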

The topic of the research is new. Former works, briefly described by the authors, used large language models or visual diagram processing for geometric reasoning. The presented approach -- analyzing geometric reasoning by solving CSPs -- provides insights into how neural networks develop spatial understanding.

The authors performed extensive experiments to investigate different aspects of the problem, and these are nicely presented in the figures and tables. Since many results are placed in the Appendix, I found the experimental part a bit hard to read. The authors should consider reorganizing the structure of the paper for better readability (where the layout of the journal allows).

The references are appropriate. 

I find the paper very interesting, as it provides valuable insights into how neural networks perform in reasoning tasks. The technical quality of the paper is good, the objective is clear, the methods are well-presented, the results are convincing. I recommend accepting the paper. 

The text needs some minor corrections:

Lines 180, 401, 558 & 603: The word "Section" is missing.
Line 216: "to the option"
Line 310: "This model ... to its performance ..."
Line 512: Missing full stop

 

Author Response

Thank you for reviewing our manuscript. We have addressed the listed points in the revised version. We appreciate your time and positive feedback.

Reviewer 2 Report

Comments and Suggestions for Authors

My major concern with this paper is its organization. If a reader just wants to know the work at a general level, then this organization is fine. If a reader wants to understand exactly what was done, then they need to comb back and forth multiple times between the paper and the appendices to find the different pieces and combine them.

I feel that the paper needs to be reorganized, with one clear example presented in detail in the paper itself. It should be a small example, say, with only one square, one translation, and a few points. It could be the example presented in Figure 1, worked out in detail. The example should demonstrate exactly how the embeddings are built with 128 dimensions. The exact structure of the initial embedding needs to be presented in this example (both for random initialization and when rotation is used). The graph of the dependencies between the constraints used needs to be presented exactly. The way dependencies are combined with the updating of embeddings should also be shown in the example.

The formal mathematical part with formulas (1) and (2) should be fully specified, with all of their components described.

I also feel that the value of the PCA and UMAP visualizations needs to be clarified. The black-box GNN only approximates, in some way, the actual geometric properties of squares and the other constraints. A useful visualization would help explain how this black-box approximation process captures these geometric properties. Expanding what is shown in Figure 6 could be one way to do this. The value of PCA and UMAP for this purpose needs to be clarified.

Below are some details about the content.

Lines 182-183. Please specify how the embedding layer that shares weights with the final classification layer is defined.

Lines 187-189. Please define n, m, and d here, just after (1) and (2). The value d (the embedding dimension) is only defined later, on line 213. The values n and m (the numbers of variables and constraints) are defined on line 176. Please clearly state that Xt is the matrix of variable embeddings and Zt is the matrix of constraint embeddings. Explain the meaning of the matrices AX and AC, and define Fx and Fc, explaining their meaning.
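For orientation, one common form that such bipartite variable-constraint updates take (an illustrative assumption based on the symbols above, not necessarily the paper's exact Equations (1) and (2)) is

$$Z_{t+1} = F_C\left(Z_t,\; A_C X_t\right), \qquad X_{t+1} = F_X\left(X_t,\; A_X Z_{t+1}\right),$$

where $X_t \in \mathbb{R}^{n \times d}$ holds the $n$ variable embeddings, $Z_t \in \mathbb{R}^{m \times d}$ holds the $m$ constraint embeddings, $A_C \in \{0,1\}^{m \times n}$ and $A_X \in \{0,1\}^{n \times m}$ are incidence matrices that gather messages from neighbouring nodes, and $F_C$, $F_X$ are learned update functions (e.g., LSTM cells, in line with the LSTM update function discussed elsewhere in the reviews).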

Lines 198-200. Please define how "each embedding of a variable (row in the matrix Xt)" is initialized. This is currently presented later, in different places.

Lines 201-202. Please define the "relevant constraint embeddings", together with their structure.

Figure 1. Please define "initial embeddings".

Lines 352-360. Please describe exactly how the d dimensions are filled, e.g., for d = 128, without the reader having to guess.

Lines 391-392. Please describe exactly how these dependencies were defined and how they were incorporated into the embeddings and the embedding-update process.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors
  1. The constraint design does not consider permutation invariance. Constraints (such as S-constraints) rely on a fixed variable order (such as the vertex order), but in real-world geometric problems, constraints should exhibit permutation invariance. The impact of variable reordering on the model has not been tested. Additional experiments should be conducted to verify robustness under variable-order perturbations, or the constraint definition should be modified so that it satisfies permutation invariance (such as by using symmetric aggregation functions; see the sketch after this list).
  2. The GNN update function lacks ablation experiments: the LSTM update function is claimed to be superior to an RNN, but no comparative data (such as differences in accuracy or convergence speed) are provided. Please supplement an RNN vs. LSTM ablation experiment to verify the necessity of the LSTM.
  3. The mechanism by which grid initialization helps is unclear: grid initialization accelerates convergence, but how it affects the structure of the embedding space (such as whether it suppresses the learning of nonlinear manifolds) has not been analyzed. Please compare the curvature of the PCA projections under random initialization and grid initialization, and explain how the geometric bias arises.
  4. The geometric interpretation of the embeddings is weak: PCA shows the embeddings as cup-shaped surfaces, but the relationship between curvature and grid distortion (such as greater distortion at the boundary) has not been analyzed. Please calculate local curvature and associate it with grid position (such as boundary vs. center points).
  5. The input information of the Transformer is redundant: the input sequence contains all constraints, but predicting a single variable may require only local information, and the effect of attention sparsity has not been tested. Please add comparative experiments with local attention or subgraph inputs.
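To illustrate point 1, here is a minimal sketch of a permutation-invariant (symmetric) aggregation over a constraint's variable embeddings, contrasted with an order-dependent concatenation (illustrative code; the names and shapes are assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 128                                  # embedding dimension (assumed)
vertex_emb = rng.normal(size=(4, d))     # embeddings of a square constraint's 4 vertices

# Order-dependent encoding: concatenation changes when the vertices are reordered.
concat = vertex_emb.reshape(-1)

# Symmetric aggregation: a sum (or mean/max) is invariant to vertex reordering.
sym = vertex_emb.sum(axis=0)

perm = [1, 2, 3, 0]                      # an explicit reordering of the vertices
assert not np.allclose(concat, vertex_emb[perm].reshape(-1))
assert np.allclose(sym, vertex_emb[perm].sum(axis=0))
```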

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

The authors wrote an article titled: "Geometric Reasoning in the Embedding Space", which deals with the ability of neural networks, specifically graph neural networks (GNNs) and Transformers, to solve geometric problems on a 2D grid.
Ideas for improvement:
- What is the computational complexity? How does it compare to other methods? Please discuss this.
- How does it compare to other types of neural networks or hybrid approaches?
- What about noise in the data? Please write about it.
These are just small things. Otherwise, I have no comments.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

Lines 3-15 and Lines 537-547. Title, abstract, and conclusions. A comparison of the statements in the abstract and in the conclusions shows that the conclusions are more specific and modest relative to the promise in the abstract. I would recommend modifying the title and abstract to be consistent with the conclusions, to avoid raising expectations too high.

The title and the abstract hint that the paper will recover the geometric reasoning in the embedding space. This would be a fundamental explainability result. In fact, the paper answers the question: can a GNN and a Transformer mimic the result of geometric reasoning? It also partially shows how the GNN and the Transformer conduct their processes in the embedding space, as stated in the conclusion ("provided several insights into the process by which they find the solution"). It is a useful first step and I support its publication, but revealing the extent of real geometric reasoning by GNNs and Transformers is future work, as I comment later. I would suggest adding a future work section to the main part of the paper; currently, only some suggestions can be found in the Appendices.

Lines 263-267. While this new section is called "Embedding Initialization", it still needs to define the initial embedding values assigned to each 128-D point such as W45. Some clarification is provided only later, in lines 437-449, and it is still not clear whether it applies to the example on lines 263-267. As I suggested in the first review, please put all such information in a single place to avoid guessing.

Lines 650-652. Together, these metrics provide  complementary views on how spatial structure and geometric complexity evolve, and how these differ across initialization strategies. 

Lines 657-659. These visualizations confirm that the model successfully learns to embed spatial relationships in its high-dimensional representation, with the embeddings organized on a curved surface that preserves local neighborhood structure.

Review. Please explain the value of these conclusions for explaining/understanding how geometric reasoning is produced by the GNN. It seems that the GNN captured that the static points in the training data are split relatively uniformly among all grid points. What is the value beyond this?

Line 663: 200 epochs; Line 355: message-passing iterations; Line 388: inference iterations; Table 1: training iterations/layers; Lines 587-590: fixed and variable iteration training (different for each batch) … 15 steps …

Review: Please unify the terminology and define the terms explicitly, making the difference between iterations, epochs, steps, batches, and layers clear. 15 iterations and 200 epochs are quite different.

Lines 620-624: When using the same hyperparameters as the LSTM model (for a direct comparison), the RNN plateaued at 38.4% and exhibited higher validation loss. … These results confirm the advantage of using more expressive update mechanisms (such as LSTM) for modeling our geometric constraints.

Lines 805-808:  For early iterations, the classifier achieved over 80 % accuracy. However, performance degrades for higher iterations, as shown in Figure A16. The model increasingly predicts  “satisfied” for most constraints as iteration count increases, regardless of actual satisfaction  status.                                              

Review. I feel that the expressiveness of the mechanism needs to be explored beyond these experiments, as a core part of the future work to be described in a new future work section. Is the expressiveness of the LSTM sufficient to recover the actual formulas of the constraints, not just imitate them? Is the expressiveness of the MLP sufficient to recover the actual formula of the square constraint, to test whether a given candidate is the actual point D? What would be a sufficient mechanism if the LSTM and MLP are not sufficient?

The advantage of the task is that all constraints have exact mathematical formulas. Can these formulas be recovered explicitly by the proposed deep learning methods? The square constraint has an exact formula that computes the coordinates of the fourth point D of the square when points A=(a1,a2), B=(b1,b2), and C=(c1,c2) are given. It amounts to computing the norms of the two vectors B-A and C-A, producing D=(d1,d2)=(|B-A|, |C-A|) in the orthogonal coordinate system whose basis is formed by the vectors B-A and C-A and whose origin is A. Converting D back to the original coordinate system (U1,U2) of the grid then only requires a linear transformation (a matrix multiplication), which an LSTM can theoretically carry out. Will the LSTM actually discover the vectors B-A and C-A and their norms, and will it transform them to (U1,U2) coordinates as the position of the point D? If future work confirms these hypotheses, it will demonstrate that these deep learning methods really discover geometric reasoning, rather than only imitating it to some extent at the "parrot level".
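For reference, a minimal numeric sketch of this closed-form computation, under the assumption that B and C are the two vertices adjacent to A (the function name and encoding are illustrative only):

```python
import numpy as np

def fourth_vertex(a, b, c):
    """Fourth vertex of a square, assuming B and C are the vertices adjacent to A."""
    a, b, c = map(np.asarray, (a, b, c))
    ab, ac = b - a, c - a
    # square assumption: AB and AC are orthogonal sides of equal length
    assert np.isclose(ab @ ac, 0.0)
    assert np.isclose(np.linalg.norm(ab), np.linalg.norm(ac))
    # D has coordinates (|B-A|, |C-A|) in the orthonormal basis along AB and AC,
    # which in grid coordinates reduces to D = A + (B-A) + (C-A) = B + C - A.
    return a + ab + ac

print(fourth_vertex((0, 0), (3, 0), (0, 3)))  # [3 3]
```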

Lines 782-798.  Figure A15 shows the clear separation between constraint types in the projected embedding space using UMAP.  Each constraint type exhibits subclustering patterns that reflect geometric properties, network processing biases, and some generator design choices. Square constraints form subclusters based on side orientation relative to the grid (parallel versus diagonal orientations).  These patterns indicate that constraint embeddings encode both geometric properties and structural biases from the network’s processing order.

Review. UMAP coordinates are not interpretable, so the side orientation of the square clusters is of little informative value for interpretation. Moreover, 2-D visualizations (dimensionality reductions) such as UMAP and PCA are lossy (not reversible) and represent the full 128-D embedding data only very partially. As a result, they are deeply limited in revealing what is going on in the 128-D embedding space. Fundamentally, this analysis needs interpretable and lossless (or almost lossless) visualizations of n-D data. Such visualizations have been emerging over the last 10 years under the names Visual Knowledge Discovery and General Line Coordinates, where n-D points are visualized as graphs in 2-D rather than as 2-D/3-D points as in UMAP (search in Google Scholar). This seems to be a natural direction for future work to reveal how geometric reasoning is actually conducted in the embedding space.

English

Page 5: For constraints which have only  3 variables we append a zero vector to use preserve the shape.  => For constraints that have only three variables, we append a zero vector to preserve the shape.

Line 543 “occurence”  => occurrence. 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Accept.

Author Response

Thank you very much for your response. We have updated our manuscript.
