Syntax–Semantics–Numeracy Fusion for Improving Math Word Problem Representation and Solving
Abstract
1. Introduction
- An enhanced MWP solver is proposed that uses BERT- and RoBERTa-based encoders as its backbone. The model strengthens text encoding through multi-dimensional feature enhancement (covering syntax, semantics, and numeracy), mitigating informational asymmetry and advancing the capabilities of human–computer interaction applications.
- We introduce a syntax-regularized contrastive objective that builds positive/negative pairs using TF–IDF over dependency-tree meta-structures, encouraging linguistically similar MWPs to occupy nearby regions in representation space while preserving semantics.
- We design a two-level numeracy objective to explicitly encode numerical attributes/relations and number-type cues, serving as a stabilizing regularizer that prevents representation degradation when only linguistic signals are used.
- A visual analytic scheme is proposed to elucidate how the fusion strategy enhances the mathematical representations of the encoder.
2. Related Work
2.1. Encoder–Decoder Architectures for MWPs
2.2. Decoders for MWPs
2.3. Encoders for MWPs
3. Building SSN4Solver by Syntax-Semantics-Numeracy Fusion
3.1. The Overall Architecture of SSN4Solver
3.2. Syntax–Semantics–Numeracy Fusion
3.2.1. Loss Function for Syntax
1. Dependency parsing: Each problem statement is parsed into a dependency tree.
2. Meta-structure extraction: The meta-structure set is extracted from each dependency tree, representing the core syntactic patterns.
3. Similarity computation: The syntactic similarity of two trees is calculated by applying the term frequency–inverse document frequency (TF–IDF) algorithm [25] to their two meta-structure sets. TF–IDF is conventionally a token-based tool for measuring document similarity. In our approach, the meta-paths extracted from the dependency trees of problem statements serve as the "tokens" for the TF–IDF calculation.
- Positive samples: problems whose similarity scores exceed a predefined threshold.
- Negative samples: randomly selected problems with dissimilar dependency trees.
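
The pair-construction steps above can be illustrated with a minimal Python sketch. It is written under stated assumptions and is not the authors' implementation. The tree encoding (a head index and a relation label per token) is hypothetical. So is the definition of a meta-path, taken here as a chain of at most `max_len` relation labels walking from a token toward the root. The smoothed IDF, the cosine similarity, the threshold value and all function names are likewise illustrative choices. Dependency-parser output would first need to be converted into this form.

```python
import math
import random
from collections import Counter


def meta_paths(heads, rels, max_len=3):
    """Hypothetical meta-path extractor.

    heads[i] is the index of token i's head (-1 for the root) and rels[i] is its
    dependency relation. For each token we walk toward the root and record every
    chain of at most `max_len` relation labels, written root-side first.
    """
    paths = []
    for i in range(len(heads)):
        chain, node = [], i
        while node != -1 and len(chain) < max_len:
            chain.append(rels[node])
            paths.append("/".join(reversed(chain)))
            node = heads[node]
    return paths


def tfidf_vectors(docs):
    """TF-IDF weights over meta-path 'tokens' (smoothed IDF is an assumption)."""
    n = len(docs)
    df = Counter(path for doc in docs for path in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            path: (count / len(doc)) * (math.log((1 + n) / (1 + df[path])) + 1.0)
            for path, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(path, 0.0) for path, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_triplets(trees, threshold=0.5, seed=0):
    """Return hypothetical (anchor, positive, negative) index triplets.

    Positives have similarity strictly above `threshold`. The negative is a random
    problem at or below it. The threshold value here is arbitrary.
    """
    rng = random.Random(seed)
    vectors = tfidf_vectors([meta_paths(h, r) for h, r in trees])
    triplets = []
    for i, vi in enumerate(vectors):
        sims = [(j, cosine(vi, vj)) for j, vj in enumerate(vectors) if j != i]
        positives = [j for j, s in sims if s > threshold]
        negatives = [j for j, s in sims if s <= threshold]
        if positives and negatives:
            triplets.append((i, rng.choice(positives), rng.choice(negatives)))
    return triplets


trees = [
    ([1, -1, 3, 1], ["nsubj", "root", "nummod", "obj"]),  # e.g. "Tom buys 3 pens"
    ([1, -1, 3, 1], ["nsubj", "root", "nummod", "obj"]),  # same syntactic structure
    ([-1, 0, 0], ["root", "advmod", "punct"]),            # e.g. "Think carefully ."
]
print(build_triplets(trees))  # [(0, 1, 2), (1, 0, 2)]
```

The third problem has no sufficiently similar partner, so it yields no triplet. A real pipeline would also need a policy for such problems, for example skipping them or lowering the threshold.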
3.2.2. Loss Function for Semantic POS
3.2.3. Loss Function for Numeracy
- Token-level objectives: Focus on individual numerical tokens and their semantic properties.
- Sentence-level objectives: Address numerical relations across complete problem contexts.
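
To make the two levels concrete, the sketch below derives example supervision targets from a problem text. It is a hypothetical illustration, not the paper's objectives. The regular expression, the four number types (integer, decimal, fraction, percent) and the pairwise magnitude relations (greater, less, equal) are assumptions chosen for clarity. In a full model, targets like these could be predicted from the encoder's hidden states at number tokens (token level) and from pooled problem representations (sentence level).

```python
import re
from fractions import Fraction
from itertools import combinations

# Fractions are tried first so that "3/4" is not split into "3" and "4".
NUMBER_RE = re.compile(r"\d+/\d+|\d+(?:\.\d+)?%?")


def number_type(token):
    """Hypothetical token-level label: the surface type of a quantity."""
    if "/" in token:
        return "fraction"
    if token.endswith("%"):
        return "percent"
    if "." in token:
        return "decimal"
    return "integer"


def number_value(token):
    """Exact value of a quantity token; percentages are scaled by 1/100."""
    if token.endswith("%"):
        return Fraction(token[:-1]) / 100
    return Fraction(token)


def numeracy_labels(problem):
    """Build hypothetical token-level and sentence-level numeracy targets."""
    tokens = NUMBER_RE.findall(problem)
    token_labels = [(tok, number_type(tok)) for tok in tokens]
    values = [number_value(tok) for tok in tokens]
    relation_labels = []
    for i, j in combinations(range(len(tokens)), 2):
        if values[i] > values[j]:
            relation = "greater"
        elif values[i] < values[j]:
            relation = "less"
        else:
            relation = "equal"
        relation_labels.append((i, j, relation))
    return token_labels, relation_labels


tokens, relations = numeracy_labels(
    "Amy has 2.5 kg of flour, uses 40% of it, and then buys 3 more bags."
)
print(tokens)     # [('2.5', 'decimal'), ('40%', 'percent'), ('3', 'integer')]
print(relations)  # [(0, 1, 'greater'), (0, 2, 'less'), (1, 2, 'less')]
```

Exact `Fraction` arithmetic avoids floating-point ties when two quantities are equal, for example "0.5" versus "1/2".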
3.3. Full Loss Function and Training
3.4. Remarks
4. Experimental Evaluation
4.1. Experimental Setup
- Math23K: A Chinese-language dataset comprising 23,164 elementary-school-level math word problems, primarily focused on single-unknown linear equations; it has been used in [1,5,11,13,20,27,28,29,30,31]. Each problem is annotated with its corresponding equation template, tokenized text, and numerical solution. Following the standard partitioning scheme adopted in prior work [13], the dataset is divided into 21,161 training samples, 1000 validation samples, and 998 test samples.
- MathQA: An English-language mathematical reasoning dataset containing 37,200 problems; it has been used in [5,11,13,29,30,31]. Each problem is accompanied by a natural-language rationale, an annotated formula specifying the computational steps, and a fully specified operational program that delineates the step-by-step solution procedure. The dataset is split into 29,837 training samples, 4475 validation samples, and 2985 test samples.
- Seq2Tree methods include S-Aligned [29], AST-Dec [33], HMS [34], NS-Solver [35], Seq2DAG [30], and GTS [5]. These models use structures such as stacks or trees to build math expressions hierarchically (e.g., top-down goal decomposition in GTS [5] or bottom-up DAG construction in Seq2DAG [30]), which helps them capture the logical structure of the expressions. Some works, such as Graph2Tree [31] and RHMS [20], use graph-based residual connections to capture long-range relations in the problem text but still rely on sequence-based encoders such as GRUs or LSTMs.
- PLM-based methods, such as MWP-BERT [13] and BERT-CL [11], leverage pre-trained language models (e.g., BERT and RoBERTa), which show clear advantages in semantic understanding. PLMs trained on vast amounts of text through tasks such as masked language modeling learn deep linguistic patterns, including syntax and context-aware meaning. The entries BERT-GTS and RoBERTa-GTS in Table 2 refer to GTS variants in which the GRU encoder is replaced with BERT or RoBERTa, respectively, to enhance semantic understanding. The PLM encoder helps the model build more meaningful representations of math problems.

| Category | Model | Math23K accuracy (%) | MathQA accuracy (%) |
|---|---|---|---|
| Seq2seq | DeepNS [1] | 58.1 | – |
| | Math-EN [28] | 66.7 | – |
| | T-RNN [18] | 66.9 | – |
| | GroupAttn [32] | 69.5 | – |
| Seq2tree | S-Aligned [29] | 67.1 | 71.3 |
| | AST-Dec [33] | 69.5 | – |
| | NS-Solver [35] | 75.6 | – |
| | Seq2DAG [30] | 72.5 | 75.5 |
| | HMS [34] | 76.1 | – |
| | RHMS [20] | 78.6 | – |
| | GTS [5] | 75.6 | 71.3 |
| | Graph2Tree [31] | 77.4 | 72.0 |
| PLM | BERT-GTS | 83.8 | 75.1 |
| | RoBERTa-GTS | 83.5 | 75.3 |
| | BERT-CL [11] | 83.2 | 76.3 |
| | MWP-BERT [13] | 84.7 | 76.2 |
| | MWP-RoBERTa [13] | 84.5 | 76.6 |
| LLM | GPT-3.5 (Zero-shot) [37] | 36.99 | – |
| | GPT-3.5 (Zero-shot CoT) [37] | 57.91 | – |
| | GPT-4 (Zero-shot) [37] | 43.05 | – |
| | GPT-4 (Zero-shot CoT) [37] | 78.16 | – |
| Ours | SSN4Solver (BERT) | 85.6 | 76.6 |
| | SSN4Solver (RoBERTa) | 86.0 | 76.8 |

4.2. Performance Comparison
4.2.1. Accuracy Comparison
- Our proposed method matches or outperforms all baseline models. SSN4Solver (RoBERTa) achieves the highest accuracy on both Math23K (86.0%) and MathQA (76.8%). SSN4Solver (BERT) surpasses all baselines on Math23K (85.6%) and matches the strongest baseline, MWP-RoBERTa, on MathQA (76.6%). This suggests that the learned encoder effectively captures linguistic knowledge from math word problems (MWPs), thereby improving accuracy and providing better hidden representations for the decoder. To validate this, we also evaluate the model's performance with frozen encoder parameters.
- The performance improvement is more pronounced on Math23K than on MathQA. This discrepancy may stem from the fact that MathQA is derived from the AQuA dataset [38] by replacing numbers or entity names. Consequently, within groups of similar problems, keywords such as verbs and prepositions remain largely unchanged, limiting the model's ability to learn syntactic dependencies from diverse contexts. This lack of diversity weakens our contrastive learning objective. For example, consider the logical opposition between "more than" and "less than": these phrases share similar part-of-speech tags and syntactic structures but dictate opposite mathematical operations. Our model is designed to learn such subtle distinctions by contrasting them in the embedding space. However, if a dataset is dominated by a single syntactic pattern, the model cannot learn the relative logical relationship or a robust decision boundary. The experimental results suggest that the efficacy of our method depends strongly on the linguistic diversity and semantic richness of the training data. The marginal improvement on MathQA also reveals a potential limitation: our model relies on identifying nuanced syntactic dependencies, which MathQA's template-based nature and repetitive sentence structures often obscure.
- Although we include large language models (LLMs) in our comparative analysis, several important distinctions must be noted. First, the parameter scale of LLMs vastly exceeds that of models based on the BERT-GTS architecture. Second, Math23K is not a standard benchmark for evaluating LLMs, and mathematical reasoning has historically been a significant weakness, particularly for earlier iterations of these models. Specifically, the results cited from [37] were obtained with the gpt-3.5-turbo-0301 and gpt-4-0314 versions. The primary advantage of LLMs lies in their general-purpose capabilities, and we acknowledge that the latest state-of-the-art models likely achieve better problem-solving performance. Nevertheless, the scope and contribution of this study remain focused on injecting external knowledge and constraints and on investigating internal annotations and solver performance.
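
The gap noted in the second point can be quantified directly from the accuracy table. The short Python check below finds the strongest baseline on each dataset and the margin of each SSN4Solver variant over it. The dictionary layout and helper names are illustrative only. The margins come out to 0.9 to 1.3 points on Math23K, compared with 0.0 to 0.2 points on MathQA.

```python
# Accuracy (%) copied from the comparison table as (Math23K, MathQA); None marks "–".
BASELINES = {
    "DeepNS": (58.1, None), "Math-EN": (66.7, None), "T-RNN": (66.9, None),
    "GroupAttn": (69.5, None), "S-Aligned": (67.1, 71.3), "AST-Dec": (69.5, None),
    "NS-Solver": (75.6, None), "Seq2DAG": (72.5, 75.5), "HMS": (76.1, None),
    "RHMS": (78.6, None), "GTS": (75.6, 71.3), "Graph2Tree": (77.4, 72.0),
    "BERT-GTS": (83.8, 75.1), "RoBERTa-GTS": (83.5, 75.3), "BERT-CL": (83.2, 76.3),
    "MWP-BERT": (84.7, 76.2), "MWP-RoBERTa": (84.5, 76.6),
    "GPT-3.5 (Zero-shot)": (36.99, None), "GPT-3.5 (Zero-shot CoT)": (57.91, None),
    "GPT-4 (Zero-shot)": (43.05, None), "GPT-4 (Zero-shot CoT)": (78.16, None),
}
OURS = {"SSN4Solver (BERT)": (85.6, 76.6), "SSN4Solver (RoBERTa)": (86.0, 76.8)}
DATASETS = ("Math23K", "MathQA")


def best_baseline(column):
    """Strongest baseline (name, accuracy) for one dataset column."""
    scored = [(acc[column], name) for name, acc in BASELINES.items()
              if acc[column] is not None]
    acc, name = max(scored)
    return name, acc


def margins():
    """Accuracy gain (points) of each SSN4Solver variant over the strongest baseline."""
    result = {}
    for model, accs in OURS.items():
        for column, dataset in enumerate(DATASETS):
            _, base = best_baseline(column)
            result[(model, dataset)] = round(accs[column] - base, 1)
    return result


print(best_baseline(0), best_baseline(1))  # ('MWP-BERT', 84.7) ('MWP-RoBERTa', 76.6)
print(margins())
```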
4.2.2. Ablation Experiments
- Full model: The complete framework integrating all three proposed components (CL, SRO, and MKO).
- w/o CL: A variant excluding the multi-view contrastive learning mechanism, used to isolate the contribution of contrastive learning.
- w/o SRO: A configuration without the semantic-aware representation optimization module, used to assess the contribution of semantic optimization.
- w/o MKO: A variant in which the math knowledge optimization component is removed, used to evaluate the role of math knowledge enhancement.
- Baseline: A standard encoder–decoder architecture without any of the proposed enhancements, serving as the performance lower bound.
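
The ablation table labels these variants as removing the Contrastive, POS and Numeracy terms. Treating those labels as corresponding to CL, SRO and MKO is our assumption. The sketch below computes the accuracy lost when each term is removed; its names are illustrative. The largest drops come from the numeracy term: 1.7 points on Math23K with BERT, and 0.5 (Math23K) and 0.7 (MathQA) points with RoBERTa. Every other drop is between 0.1 and 0.4 points.

```python
# Accuracy (%) from the ablation table as (Math23K, MathQA).
ABLATION = {
    "BERT": {
        "full": (85.5, 76.6),
        "w/o Contrastive term": (85.1, 76.5),
        "w/o POS term": (85.1, 76.5),
        "w/o Numeracy term": (83.8, 76.5),
    },
    "RoBERTa": {
        "full": (86.0, 76.8),
        "w/o Contrastive term": (85.8, 76.5),
        "w/o POS term": (85.7, 76.7),
        "w/o Numeracy term": (85.5, 76.1),
    },
}


def accuracy_drops(backbone):
    """Points of accuracy lost on each dataset when one term is removed."""
    full = ABLATION[backbone]["full"]
    return {
        variant: tuple(round(f - a, 1) for f, a in zip(full, accs))
        for variant, accs in ABLATION[backbone].items()
        if variant != "full"
    }


for backbone in ABLATION:
    print(backbone, accuracy_drops(backbone))
```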
4.3. Comparison on Embedding Effect of SSN-Encoder Against Vanilla Encoder
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Wang, Y.; Liu, X.; Shi, S. Deep Neural Solver for Math Word Problems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 9–11 September 2017; pp. 845–854.
- Wang, L.; Zhang, D.; Gao, L.; Song, J.; Guo, L.; Shen, H.T. MathDQN: Solving Arithmetic Word Problems via Deep Reinforcement Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018.
- Gan, W.; Yu, X.; Zhang, T.; Wang, M. Automatically Proving Plane Geometry Theorems Stated by Text and Diagram. Int. J. Pattern Recognit. Artif. Intell. 2019, 33, 1940003:1–1940003:26.
- Jian, P.; Sun, C.; Yu, X.; He, B.; Xia, M. An End-to-End Algorithm for Solving Circuit Problems. Int. J. Pattern Recognit. Artif. Intell. 2019, 33, 1940004:1–1940004:21.
- Xie, Z.; Sun, S. A Goal-Driven Tree-Structured Neural Model for Math Word Problems. In Proceedings of the International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019.
- Zhang, Y.; Zhou, G.; Xie, Z.; Huang, J. HGEN: Learning Hierarchical Heterogeneous Graph Encoding for Math Word Problem Solving. IEEE/ACM Trans. Audio Speech Lang. Process. 2022, 30, 816–828.
- Tao, X.; Zhang, Y.; Xie, Z.; Zhao, Z.; Zhou, G.; Lu, Y. Unifying the syntax and semantics for math word problem solving. Neurocomputing 2025, 636, 130042.
- Yu, X.; Sun, H.; Sun, C. A relation-centric algorithm for solving text-diagram function problems. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 8972–8984.
- Yu, X.; Cheng, W.; Yang, C.; Zhang, T. A theoretical review on solving algebra problems. Expert Syst. Appl. 2026, 296, 128789.
- Jian, P.; Sun, T.; Ma, B.; Xi, H.; Yang, Y. Dual Decoder Mathematical Word Problem Solving Model Based on Lie Group Intrinsic Mean Feature Matrix. Neural Process. Lett. 2025, 57, 85.
- Li, Z.; Zhang, W.; Yan, C.; Zhou, Q.; Li, C.; Liu, H.; Cao, Y. Seeking Patterns, Not just Memorizing Procedures: Contrastive Learning for Solving Math Word Problems. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, Dublin, Ireland, 22–27 May 2022.
- Shen, J.T.; Yamashita, M.; Prihar, E.; Heffernan, N.; Wu, X.; Graff, B.; Lee, D. MathBERT: A Pre-trained Language Model for General NLP Tasks in Mathematics Education. arXiv 2021, arXiv:2106.07340.
- Liang, Z.; Zhang, J.; Wang, L.; Qin, W.; Lan, Y.; Shao, J.; Zhang, X. MWP-BERT: Numeracy-Augmented Pre-training for Math Word Problem Solving. In Proceedings of the NAACL-HLT, Seattle, WA, USA, 10–15 July 2022; pp. 997–1009.
- Yu, X.; Lyu, X.; Peng, R.; Shen, J. Solving arithmetic word problems by synergizing syntax-semantics extractor for explicit relations and neural network miner for implicit relations. Complex Intell. Syst. 2023, 9, 697–717.
- Peng, R.; Yu, X.; Yang, C.; Lyu, X. A Scene-Attention Relation-Centric Algorithm for Solving Arithmetic Word Problems. Expert Syst. Appl. 2025, 277, 127197.
- Patel, A.; Bhattamishra, S.; Goyal, N. Are NLP Models really able to Solve Simple Math Word Problems? In Proceedings of the North American Chapter of the Association for Computational Linguistics, Online, 6–11 June 2021.
- He, B.; Yu, X.; Huang, L.; Meng, H.; Liang, G.; Chen, S. Comparative study of typical neural solvers in solving math word problems. Complex Intell. Syst. 2024, 10, 5805–5830.
- Wang, L.; Zhang, D.; Zhang, J.; Xu, X.; Gao, L.; Dai, B.T.; Shen, H.T. Template-Based Math Word Problem Solvers with Recursive Neural Networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 7144–7151.
- He, B.; Yu, X.; Jian, P.; Zhang, T. A relation based algorithm for solving direct current circuit problems. Appl. Intell. 2020, 50, 2293–2309.
- Lin, X.; Huang, Z.; Zhao, H.; Chen, E.; Liu, Q.; Lian, D.; Li, X.; Wang, H. Learning Relation-Enhanced Hierarchical Solver for Math Word Problems. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 13830–13844.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the North American Chapter of the Association for Computational Linguistics, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692.
- Manning, C.; Surdeanu, M.; Bauer, J.; Finkel, J.; Bethard, S.; McClosky, D. The Stanford CoreNLP Natural Language Processing Toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Baltimore, MD, USA, 23–24 June 2014; pp. 55–60.
- Qi, P.; Zhang, Y.; Zhang, Y.; Bolton, J.; Manning, C.D. Stanza: A Python Natural Language Processing Toolkit for Many Human Languages. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Online, 5–10 July 2020; pp. 101–108.
- Salton, G.; Wong, A.; Yang, C.S. A Vector Space Model for Automatic Indexing. Commun. ACM 1975, 18, 613–620.
- van den Oord, A.; Li, Y.; Vinyals, O. Representation Learning with Contrastive Predictive Coding. arXiv 2018, arXiv:1807.03748.
- Li, Y.; Wang, L.; Kim, J.J.; Tan, C.S.; Luo, Y. On the Selection of Positive and Negative Samples for Contrastive Math Word Problem Neural Solver. In Proceedings of the 17th International Conference on Educational Data Mining, Atlanta, GA, USA, 4–17 July 2024; pp. 96–106.
- Wang, L.; Wang, Y.; Cai, D.; Zhang, D.; Liu, X. Translating a Math Word Problem to a Expression Tree. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 1064–1069.
- Chiang, T.R.; Chen, Y.N.V. Semantically-Aligned Equation Generation for Solving and Reasoning Math Word Problems. arXiv 2018, arXiv:1811.00720.
- Cao, Y.; Hong, F.; Li, H.; Luo, P. A Bottom-Up DAG Structure Extraction Model for Math Word Problems. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 2–9 February 2021; pp. 39–46.
- Zhang, J.; Wang, L.; Lee, R.K.-W.; Bin, Y.; Wang, Y.; Shao, J.; Lim, E.-P. Graph-to-tree learning for solving math word problems. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 3928–3937.
- Li, J.; Wang, L.; Zhang, J.; Wang, Y.; Dai, B.T.; Zhang, D. Modeling Intra-Relation in Math Word Problems with Different Functional Multi-Head Attentions. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 6162–6167.
- Liu, Q.; Guan, W.; Li, S.; Kawahara, D. Tree-structured Decoding for Solving Math Word Problems. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 2370–2379.
- Lin, X.; Huang, Z.; Zhao, H.; Chen, E.; Liu, Q.; Wang, H.; Wang, S. HMS: A hierarchical solver with dependency-enhanced understanding for math word problem. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 2–9 February 2021.
- Qin, J.; Liang, X.; Hong, Y.; Tang, J.; Lin, L. Neural-Symbolic Solver for Math Word Problems with Auxiliary Tasks. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 5870–5881.
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015.
- Kim, J.; Kim, Y.; Baek, I.; Bak, J.; Lee, J. It Ain’t Over: A Multi-aspect Diverse Math Word Problem Dataset. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–10 December 2023; pp. 14984–15011.
- Ling, W.; Yogatama, D.; Dyer, C.; Blunsom, P. Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada; Association for Computational Linguistics: Stroudsburg, PA, USA, 2017; pp. 158–167.
- Maaten, L.v.d.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.




| Dataset | Math23K | MathQA |
|---|---|---|
| Number of problems | 23,162 | 37,259 |
| Avg. problem length | 29 | 37.9 |
| Avg. number of ops | 2.28 | 5.3 |
| Vocabulary size | 2574 | 6912 |
| Number of syntax meta-paths | 8048 | 25,081 |
| Proportion of related problems | 0.24 | 0.86 |
| Avg. number of relations | 2.87 | 3.67 |

| Model | Math23K | MathQA |
|---|---|---|
| SSN4Solver (BERT) | 85.5 | 76.6 |
| w/o Contrastive term | 85.1 | 76.5 |
| w/o POS term | 85.1 | 76.5 |
| w/o Numeracy term | 83.8 | 76.5 |
| SSN4Solver (RoBERTa) | 86.0 | 76.8 |
| w/o Contrastive term | 85.8 | 76.5 |
| w/o POS term | 85.7 | 76.7 |
| w/o Numeracy term | 85.5 | 76.1 |

| Metric | SSN4Solver (RoBERTa) | RoBERTa-GTS |
|---|---|---|
| Overall accuracy | 0.85972 | 0.83467 |
| In-cluster instances | 220/231 (0.9524) | 221/231 (0.9567) |
| Out-cluster instances | 638/767 (0.8318) | 612/767 (0.7979) |
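
The overall accuracies in this table follow from the per-group counts. Overall accuracy is the instance-weighted combination of the in-cluster and out-cluster results over 231 + 767 = 998 test problems, which matches the size of the Math23K test split. The Python check below reproduces both values; its names are illustrative. It also makes explicit where the two models differ: SSN4Solver answers 26 more out-cluster problems correctly, while RoBERTa-GTS is one problem ahead in-cluster.

```python
# (correct, total) test instances per group, copied from the table above.
COUNTS = {
    "SSN4Solver (RoBERTa)": {"in": (220, 231), "out": (638, 767)},
    "RoBERTa-GTS": {"in": (221, 231), "out": (612, 767)},
}


def overall_accuracy(groups):
    """Instance-weighted accuracy: total correct divided by total instances."""
    correct = sum(c for c, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return correct, total, correct / total


for model, groups in COUNTS.items():
    correct, total, acc = overall_accuracy(groups)
    print(f"{model}: {correct}/{total} = {acc:.5f}")
# SSN4Solver (RoBERTa): 858/998 = 0.85972
# RoBERTa-GTS: 833/998 = 0.83467
```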

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Feng, Z.; Ming, H.; Yu, X. Syntax–Semantics–Numeracy Fusion for Improving Math Word Problem Representation and Solving. Symmetry 2026, 18, 434. https://doi.org/10.3390/sym18030434

