An End-to-End Formula Recognition Method Integrated Attention Mechanism
Abstract
1. Introduction
2. Related Work
2.1. Traditional Methods
2.2. Neural Methods for Formula Recognition
3. Methods
3.1. Encoder
3.2. Decoder
3.3. Loss Function
4. Experiments
4.1. Preprocessed Data
Algorithm 1. LaTeX sequence normalization
4.2. Settings
4.3. Measurements
Algorithm 2. Calculate the maximum edit distance
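Only the caption of Algorithm 2 is available here, so the following is a minimal sketch rather than the authors' procedure. It assumes the edit-distance measure is the token-level Levenshtein distance normalized by the longer of the two sequences, a formulation commonly used in image-to-markup evaluation. The function names `levenshtein` and `edit_distance_score` are hypothetical.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Token-level Levenshtein distance between two sequences."""
    # prev[j] is the distance between the already processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty prefix of b
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_distance_score(pred: Sequence[str], ref: Sequence[str]) -> float:
    """Assumed normalization: 1 - distance / max(len(pred), len(ref))."""
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 1.0  # two empty sequences are treated as identical
    return 1.0 - levenshtein(pred, ref) / longest


if __name__ == "__main__":
    # LaTeX sequences are assumed to be whitespace-tokenized after normalization.
    pred = r"\frac { a } { b }".split()
    ref = r"\frac { a } { c }".split()
    print(levenshtein(pred, ref))
    print(edit_distance_score(pred, ref))
```

Under this assumed normalization, a score of 1 means the predicted and reference token sequences are identical, and a score of 0 means every position must be edited.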
5. Discussion and Implications
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Formula Type | Formula Picture | LaTeX Expression |
---|---|---|
Matrix (square brackets) | | |
Matrix (parentheses) | | |
Angle bracket formula | | |
Curly bracket formula | | |
Piecewise Function | | |
Multi-line Expression | | |
Line Type | Model | BLEU | MED | Exact Match |
---|---|---|---|---|
Single Line | INFTY | 56.9 | 56.70 | 56.8 |
Single Line | WYGIWYS | 58.71 | 63.6 | 61.0 |
Single Line | DoubleAttention | 59.5 | 67.2 | 63.1 |
Single Line | DenseNet | 71.34 | 59.6 | - |
Single Line | MI2LS | 73.53 | 78.33 | 63.8 |
Single Line | MathBERT | 86.0 | 81.61 | 73.77 |
Single Line | Our Model | 90.02 | 90.34 | 70.24 |
Multi-Line | INFTY | 45.45 | 50.32 | 15.70 |
Multi-Line | WYGIWYS | 53.77 | 57.51 | 45.31 |
Multi-Line | DoubleAttention | 58.17 | 54.32 | 32.12 |
Multi-Line | DenseNet | 61.41 | 63.34 | - |
Multi-Line | MI2LS | 67.15 | 65.60 | 70.57 |
Multi-Line | MathBERT | 69.32 | 71.37 | 74.93 |
Multi-Line | Our Model | 71.45 | 73.55 | 65.27 |
Line Type | Model | BLEU | MED | Exact Match |
---|---|---|---|---|
Single Line | INFTY | 36.41 | 37.50 | 27.25 |
Single Line | WYGIWYS | 35.19 | 40.80 | 32.84 |
Single Line | DoubleAttention | 40.40 | 43.94 | 37.51 |
Single Line | DenseNet | 39.66 | 42.51 | - |
Single Line | MI2LS | 43.00 | 46.78 | 32.09 |
Single Line | MathBERT | 50.41 | 47.94 | 53.6 |
Single Line | G2G | 54.46 | 52.05 | 55.28 |
Single Line | Our Model | 54.29 | 57.80 | 60.20 |
Multi-Line | INFTY | 46.15 | 32.0 | 15.27 |
Multi-Line | WYGIWYS | 47.46 | 42.45 | 45.46 |
Multi-Line | DoubleAttention | 49.49 | 51.3 | 47.68 |
Multi-Line | DenseNet | 52.13 | 55.72 | - |
Multi-Line | MI2LS | 53.65 | 52.21 | 48.3 |
Multi-Line | MathBERT | 54.65 | 56.71 | 57.22 |
Multi-Line | G2G | 54.90 | 57.81 | 55.28 |
Multi-Line | Our Model | 55.39 | 58.20 | 60.22 |
Model | BLEU | MED | Exact Match |
---|---|---|---|
INFTY | 66.65 | 53.82 | 15.60 |
WYGIWYS | 87.73 | 87.60 | 77.46 |
DoubleAttention | 88.42 | 88.57 | 79.81 |
DenseNet | 88.25 | 91.57 | - |
MI2LS | 90.28 | 91.90 | 82.33 |
MathBERT | 90.45 | 90.11 | 87.52 |
Our Model | 92.11 | 90.0 | 60.2 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhou, M.; Cai, M.; Li, G.; Li, M. An End-to-End Formula Recognition Method Integrated Attention Mechanism. Mathematics 2023, 11, 177. https://doi.org/10.3390/math11010177