Text-Guided Geometric Relation Parsing with Logic Regularization
Abstract
1. Introduction
1.1. Background and Challenges
1.2. Current Research Status and Limitations
1.3. Motivation and Approach
1.4. Proposed Method and Contributions
2. Related Work
2.1. Geometric Diagram Parsing and Automated Problem Solving
2.2. Multimodal Fusion in the Geometric Domain
2.3. Neuro-Symbolic Learning and Logic Regularization
3. Materials and Methods
3.1. Overall Architecture
| Algorithm 1. Training and inference pipeline of the proposed parser |
|---|
| Input: training set, active relation set, atomic rule set, model parameters θ |
| Output: trained parser and predicted relation graph |
| Training phase: |
3.2. Multimodal Atomic Perception
- Visual Stream: We adopt ResNet-50 as the visual backbone [29] and use an FPN detector to support multi-scale primitive perception [30]. In the present study, relation prediction is evaluated on the derived candidate primitives produced by the PGDP5K-based preprocessing pipeline, as recorded in the Ext-PGDP5K split files.
- Candidate edges are constructed from these preprocessed primitives, and all valid candidate primitive pairs recorded in the Ext-PGDP5K split files are retained for relation prediction. We do not perform additional negative sampling; candidate edge–relation entries without derived positive labels are treated as negative instances in the masked binary relation loss. This setting leads to severe class imbalance, so FRA is reported only as an auxiliary metric, while Edge-F1 and Macro-F1 are emphasized for relation-level evaluation. Because the reported metrics are computed on the resulting candidate primitive pairs, the evaluation measures relation parsing rather than standalone primitive detection, which isolates relation-level prediction errors from primitive-detection errors.
- Text Stream: Explicit Atomic Semantic Probe: Instead of encoding the full problem statement into a single undifferentiated sentence vector, we explicitly model a compact set of geometric atoms. The seed vocabulary contains six cue families centered on parallel, perpendicular, tangent, bisector, angle-bisector, and intersection semantics. The atomic weak labels are generated by matching normalized cue expressions to these predefined cue families, a procedure that is closer to lexical cue extraction than to full natural-language semantic parsing. The vocabulary includes textual forms, symbolic forms, and common variants; for example, “parallel”, “//”, and “||” are all normalized to the Parallel atom. The cue vocabulary and normalization rules are available in the project repository. This design makes the text stream efficient and interpretable, but it may be brittle to paraphrases, implicit relation descriptions, and syntactic forms not covered by the seed vocabulary or normalization rules.
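
The two mechanisms described in this list, lexical cue matching and a masked binary loss without negative sampling, can be illustrated with a minimal Python sketch. It is not the authors' implementation. Only the Parallel variants ("parallel", "//", "||") come from the text; the other surface forms, the names `CUE_FAMILIES`, `atomic_weak_labels`, and `masked_bce`, and the reading of the mask as "entries that take part in the loss" are all assumptions made for illustration.

```python
import math
import re

# Hypothetical cue vocabulary. Only the Parallel variants are taken from the
# text; the surface forms listed for the other families are assumptions.
CUE_FAMILIES = {
    "Parallel": ["parallel", "//", "||"],
    "Perpendicular": ["perpendicular"],
    "Tangent": ["tangent"],
    "Bisector": ["bisect"],
    "AngleBisector": ["angle bisector"],
    "Intersection": ["intersect"],
}


def normalize(text):
    """Lower-case the statement and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def atomic_weak_labels(text):
    """Return a 0/1 weak label per cue family via plain substring matching."""
    norm = normalize(text)
    return {
        atom: int(any(cue in norm for cue in cues))
        for atom, cues in CUE_FAMILIES.items()
    }


def masked_bce(probs, labels, mask, eps=1e-7):
    """Mean binary cross-entropy over the entries whose mask is 1.

    Assumption: unlabeled candidate entries carry label 0 (negatives) and the
    mask only removes entries that should not contribute to the loss.
    """
    total, count = 0.0, 0
    for p, y, m in zip(probs, labels, mask):
        if not m:
            continue  # masked-out entries contribute nothing
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
        count += 1
    return total / count if count else 0.0


labels = atomic_weak_labels("AB // CD, and EF is perpendicular to GH")
missed = atomic_weak_labels("Lines AB and CD never meet")  # paraphrase of parallel
loss = masked_bce([0.9, 0.2, 0.7], [1, 0, 0], [1, 1, 0])
```

In this sketch the first statement activates the Parallel and Perpendicular atoms, while the paraphrase activates none, which is the brittleness to uncovered phrasings noted above. Because every unlabeled entry is kept as a negative, the loss is dominated by negatives unless it is reweighted, which is consistent with the emphasis on F1-style metrics.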
3.3. Iterative Visual–Semantic Feedback Fusion
3.4. Logic Consistency Regularization
3.5. Design Rationale and Computational Discussion
4. Results and Discussion
4.1. Experimental Design and Evaluation Protocol
4.2. Protocol Statistics
4.3. Main Comparison and Modality-Control Analysis
4.4. Relation-Wise Analysis
4.5. Ablation, Efficiency, and Sensitivity Analysis
4.6. Limitations and Threats to Validity
5. Conclusions
5.1. Main Findings
5.2. Limitations and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Ma, J.; Wang, W.; Jin, Q. A Survey of Deep Learning for Geometry Problem Solving. arXiv 2025, arXiv:2507.11936.
2. Seo, M.; Hajishirzi, H.; Farhadi, A.; Etzioni, O.; Malcolm, C. Solving Geometry Problems: Combining Text and Diagram Interpretation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 1466–1476.
3. Zhang, M.-L.; Yin, F.; Hao, Y.-H.; Liu, C.-L. Plane Geometry Diagram Parsing. arXiv 2022, arXiv:2205.09363.
4. Lu, P.; Qiu, L.; Yu, W.; Welleck, S.; Chang, K.-W. A Survey of Deep Learning for Mathematical Reasoning. arXiv 2022, arXiv:2212.10535.
5. Zhu, N.; Zhang, X.; Huang, Q.; Zhu, F.; Zeng, Z.; Leng, T. FGeo-Parser: Autoformalization and Solution of Plane Geometric Problems. Symmetry 2025, 17, 8.
6. Trinh, T.H.; Wu, Y.; Le, Q.V.; He, H.; Luong, T. Solving Olympiad Geometry without Human Demonstrations. Nature 2024, 625, 476–482.
7. Lu, P.; Gong, R.; Jiang, S.; Qiu, L.; Huang, S.; Liang, X.; Zhu, S.-C. Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, 1–6 August 2021; pp. 6774–6786.
8. Li, Z.-Z.; Zhang, M.-L.; Yin, F.; Liu, C.-L. LANS: A Layout-Aware Neural Solver for Plane Geometry Problem. In Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand, 11–16 August 2024; Association for Computational Linguistics: Stroudsburg, PA, USA, 2024; pp. 2596–2608.
9. Zhang, M.-L.; Li, Z.-Z.; Yin, F.; Lin, L.; Liu, C.-L. Fuse, Reason and Verify: Geometry Problem Solving with Parsed Clauses from Diagram. arXiv 2024, arXiv:2407.07327.
10. Ping, B.; Luo, M.; Dang, Z.; Wang, C.; Jia, C. AutoGPS: Automated Geometry Problem Solving via Multimodal Formalization and Deductive Reasoning. In Proceedings of the Fourteenth International Conference on Learning Representations, Rio de Janeiro, Brazil, 23–27 April 2026; Available online: https://openreview.net/forum?id=PVtZnUh04m (accessed on 15 May 2026).
11. Zhang, Z.; Cheng, J.-K.; Deng, J.; Tian, L.; Ma, J.; Qin, Z.; Zhang, X.; Zhu, N.; Leng, T. Diagram Formalization Enhanced Multi-Modal Geometry Problem Solver. arXiv 2024, arXiv:2409.04214.
12. Murphy, L.; Yang, K.; Sun, J.; Li, Z.; Anandkumar, A.; Si, X. Autoformalizing Euclidean Geometry. In Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria, 21–27 July 2024; Proceedings of Machine Learning Research. Volume 235, pp. 36847–36893.
13. Lu, P.; Qiu, L.; Chen, J.; Xia, T.; Zhao, Y.; Zhang, W.; Yu, Z.; Liang, X.; Zhu, S.-C. IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning. arXiv 2021, arXiv:2110.13214.
14. Chen, J.; Tang, J.; Qin, J.; Liang, X.; Liu, L.; Xing, E.P.; Lin, L. GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Online, 1–6 August 2021; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 513–523.
15. Tan, H.; Bansal, M. LXMERT: Learning Cross-Modality Encoder Representations from Transformers. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 3–7 November 2019; pp. 5100–5111.
16. Li, C.; Xu, H.; Tian, J.; Wang, W.; Yan, M.; Bi, B.; Ye, J.; Chen, H.; Xu, G.; Cao, Z.; et al. mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 7241–7259.
17. Agrawal, A.; Batra, D.; Parikh, D.; Kembhavi, A. Don’t Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4971–4980.
18. Geirhos, R.; Jacobsen, J.-H.; Michaelis, C.; Zemel, R.; Brendel, W.; Bethge, M.; Wichmann, F.A. Shortcut Learning in Deep Neural Networks. Nat. Mach. Intell. 2020, 2, 665–673.
19. Besold, T.R.; d’Avila Garcez, A.; Bader, S.; Bowman, H.; Domingos, P.; Hitzler, P.; Kühnberger, K.-U.; Lamb, L.C.; Lowd, D.; Lima, P.M.V.; et al. Neural-Symbolic Learning and Reasoning: A Survey and Interpretation. arXiv 2017, arXiv:1711.03902.
20. Xu, J.; Zhang, Z.; Friedman, T.; Liang, Y.; Van den Broeck, G. A Semantic Loss Function for Deep Learning with Symbolic Knowledge. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; Proceedings of Machine Learning Research. Volume 80, pp. 5502–5511.
21. Hu, Z.; Ma, X.; Liu, Z.; Hovy, E.; Xing, E. Harnessing Deep Neural Networks with Logic Rules. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016; pp. 2410–2420.
22. Diligenti, M.; Gori, M.; Saccà, C. Semantic-Based Regularization for Learning and Inference. Artif. Intell. 2017, 244, 143–165.
23. Fischer, M.; Balunovic, M.; Drachsler-Cohen, D.; Gehr, T.; Zhang, C.; Vechev, M. DL2: Training and Querying Neural Networks with Logic. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; Proceedings of Machine Learning Research. Volume 97, pp. 1931–1941.
24. Bach, S.H.; Broecheler, M.; Huang, B.; Getoor, L. Hinge-Loss Markov Random Fields and Probabilistic Soft Logic. J. Mach. Learn. Res. 2017, 18, 1–67.
25. Dong, H.; Mao, J.; Lin, T.; Wang, C.; Li, L.; Zhou, D. Neural Logic Machines. arXiv 2019, arXiv:1904.11694.
26. Manhaeve, R.; Dumančić, S.; Kimmig, A.; Demeester, T.; De Raedt, L. DeepProbLog: Neural Probabilistic Logic Programming. arXiv 2018, arXiv:1805.10872.
27. Serafini, L.; d’Avila Garcez, A. Logic Tensor Networks: Deep Learning and Logical Reasoning from Data and Knowledge. arXiv 2016, arXiv:1606.04422.
28. Ratner, A.; Bach, S.H.; Ehrenberg, H.; Fries, J.; Wu, S.; Ré, C. Snorkel: Rapid Training Data Creation with Weak Supervision. Proc. VLDB Endow. 2017, 11, 269–282.
29. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
30. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
31. Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter. arXiv 2019, arXiv:1910.01108.





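| Set | Samples (N) | Candidate pairs | Intersection (I) | Tangent (T) | Parallel (P) | Perpendicular (⊥) | Bisector (B) | Avg. positives/sample |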
|---|---|---|---|---|---|---|---|---|
| Train | 3500 | 363,872 | 571 | 0 | 176 | 685 | 105 | 0.439 |
| Val. | 500 | 55,912 | 77 | 0 | 29 | 100 | 23 | 0.458 |
| Test | 1000 | 110,086 | 175 | 0 | 47 | 204 | 53 | 0.479 |
| Total | 5000 | 529,870 | 823 | 0 | 252 | 989 | 181 | 0.449 |
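
The last column (Avg) is not defined in the table itself. Reading it as the number of positive relation labels per sample is an assumption, but the short check below shows that this reading reproduces every reported value. The counts are copied from the table; the names `splits` and `avg_positives` are illustrative.

```python
# Per-split sample count and positive-label counts, copied from the table
# in column order (I, T, P, ⊥, B).
splits = {
    "Train": (3500, [571, 0, 176, 685, 105]),
    "Val.": (500, [77, 0, 29, 100, 23]),
    "Test": (1000, [175, 0, 47, 204, 53]),
}


def avg_positives(n_samples, counts):
    """Positive relation labels per sample, rounded as in the table."""
    return round(sum(counts) / n_samples, 3)


avg = {name: avg_positives(n, counts) for name, (n, counts) in splits.items()}

# The Total row follows from summing samples and labels over the three splits.
total_n = sum(n for n, _ in splits.values())
total_pos = sum(sum(counts) for _, counts in splits.values())
avg["Total"] = round(total_pos / total_n, 3)
```

Under this reading a diagram carries fewer than half a positive relation on average, against roughly 110 candidate pairs per test sample (110,086 pairs over 1000 samples), which quantifies the class imbalance.
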

| Method | Text input | Edge-F1 (%) | Macro-F1 (%) | FRA (%) | LVR (%) |
|---|---|---|---|---|---|
| Text-only | Orig. | 0.00 | 0.00 | 69.0 | 0.000 |
| Image-only | None | 27.05 | 16.43 | 71.6 | 0.027 |
| Global fusion | Orig. | 30.78 | 16.16 | 72.8 | 0.035 |
| Img. + shuf. text | Shuf. | 3.67 | 2.42 | 70.2 | 0.001 |
| Ours | Paired | 53.63 | 42.56 | 77.8 | 0.244 |

| Relation | Image-only F1 (%) | Global fusion F1 (%) | Ours F1 (%) | Δ vs. best baseline |
|---|---|---|---|---|
| Int. | 42.32 | 59.97 | 53.60 | −6.37 |
| Par. | 0.00 | 0.00 | 55.70 | +55.70 |
| Perp. | 23.39 | 4.66 | 60.95 | +37.56 |
| Bis. | 0.00 | 0.00 | 0.00 | 0.00 |
| Macro avg. | 16.43 | 16.16 | 42.56 | +26.13 |
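
The macro-average row and the Δ column can be reproduced from the per-relation values. The sketch below assumes that the macro average is the unweighted mean over the four listed relations and that Δ is the proposed method's score minus the stronger of the image-only and global-fusion baselines. Neither definition is stated in the table, but both match every reported number. The names `rows`, `macro`, and `delta` are illustrative.

```python
# Per-relation scores copied from the table: (image-only, global fusion, ours).
rows = {
    "Int.": (42.32, 59.97, 53.60),
    "Par.": (0.00, 0.00, 55.70),
    "Perp.": (23.39, 4.66, 60.95),
    "Bis.": (0.00, 0.00, 0.00),
}


def macro(column):
    """Unweighted mean of one method's scores over all listed relations."""
    return round(sum(scores[column] for scores in rows.values()) / len(rows), 2)


macro_img, macro_fusion, macro_ours = macro(0), macro(1), macro(2)

# Delta: ours minus the stronger of the two baselines, per relation.
delta = {
    rel: round(ours - max(img, fusion), 2)
    for rel, (img, fusion, ours) in rows.items()
}
macro_delta = round(macro_ours - max(macro_img, macro_fusion), 2)
```

Because the mean is unweighted, the zero score on the bisector relation lowers the macro average of every method by a quarter of what the other three relations contribute.
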

| Variant | Edge-F1 (%) | Macro-F1 (%) | FRA (%) | LVR (%) | Pos./S |
|---|---|---|---|---|---|
| Base | 27.05 | 16.43 | 71.6 | 0.027 | 0.1125 |
| +AP | 46.45 | 36.82 | 75.6 | 0.116 | 0.2530 |
| +AP+Logic | 52.14 | 29.85 | 77.0 | 0.086 | 0.2690 |
| +AP+Fb | 39.38 | 29.29 | 75.1 | 0.091 | 0.2320 |
| Full | 53.63 | 42.56 | 77.8 | 0.244 | 0.2985 |
| Gold | — | — | — | — | 0.4790 |

| Method | Rounds | Params (M) | Time (ms) | FRA (%) | LVR (%) |
|---|---|---|---|---|---|
| Image-only | N.A. | 3.89 | 7.82 | 71.6 | 0.027 |
| Global fusion | N.A. | 3.89 | 7.84 | 72.8 | 0.035 |
| Ours-1R | 1 | 3.89 | 7.55 | 76.3 | 0.115 |
| Ours-2R | 2 | 3.89 | 8.02 | 77.8 | 0.244 |
| Ours-3R | 3 | 3.89 | 8.45 | 74.3 | 0.028 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Jian, P.; Zhang, X.; Wu, L.; Sun, Q. Text-Guided Geometric Relation Parsing with Logic Regularization. Electronics 2026, 15, 2460. https://doi.org/10.3390/electronics15112460

