LocRecNet: A Synergistic Framework for Table Localization and Rectification
Abstract
1. Introduction
1. A novel network architecture, LocRecNet, is proposed to effectively detect and correct deformations in table images. It precisely localizes key points and corrects deformations, providing a more reliable input for subsequent structure recognition and thereby improving both accuracy and robustness (a minimal pipeline sketch follows this list).
2. A new keypoint detection algorithm, tailored to table image analysis and serving as a preprocessing step for deformation correction, efficiently detects and localizes tables of various types and structures. This addresses the limitations of existing methods in handling severely deformed or noisy tables and substantially enhances processing and recognition capability.
3. Multiple deformed table datasets are generated using the algorithm, covering various table types, such as financial reports and forms, and incorporating different levels of geometric distortion, noise, and other real-world challenges. This fills a gap in current research, where comprehensive deformed-table datasets have been notably lacking.
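As a rough illustration of contribution 1, the sketch below shows one way the localize–rectify–recognize pipeline could be wired together. It is a minimal, hypothetical example: `process_table`, `locate_edge_points`, and `recognize_structure` are placeholder names standing in for the keypoint network and a downstream recognizer such as LORE or LGPMA, and the correction step is reduced to a simple four-corner perspective warp rather than the paper's full correction algorithm.

```python
# Minimal pipeline sketch (hypothetical interfaces, not the released code).
from typing import Callable

import cv2
import numpy as np


def process_table(image: np.ndarray,
                  locate_edge_points: Callable[[np.ndarray], np.ndarray],
                  recognize_structure: Callable[[np.ndarray], dict],
                  out_size: tuple = (1024, 768)) -> dict:
    # 1) Localize table key points in the (possibly deformed) input image.
    #    For simplicity, assume only the four outer corners are returned,
    #    ordered top-left, top-right, bottom-right, bottom-left.
    corners = locate_edge_points(image).astype(np.float32)

    # 2) Rectify: warp the detected corners onto an axis-aligned rectangle.
    #    (The paper's correction handles denser edge-point sets and curved
    #    deformations; a plain perspective warp is used here for brevity.)
    w, h = out_size
    target = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
    warp = cv2.getPerspectiveTransform(corners, target)
    rectified = cv2.warpPerspective(image, warp, (w, h))

    # 3) Run table structure recognition on the rectified image.
    return recognize_structure(rectified)
```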
2. Related Work
3. Methodology
3.1. Overall Architecture
3.2. Table Edge Point Localization
3.3. Image Correction
Algorithm 1: Table Image Correction Algorithm
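The listing of Algorithm 1 is not reproduced in this excerpt. Purely as an illustrative sketch of one common approach to table image correction, and not necessarily the authors' exact procedure, a thin-plate-spline warp from detected edge points to their rectified target positions (in line with the TPS interpolation cited in the references) could look like the following; `rectify_table`, its arguments, and the use of SciPy and OpenCV here are assumptions for this sketch.

```python
# Illustrative TPS-based correction sketch (assumed helper, not the paper's
# released implementation of Algorithm 1).
import cv2
import numpy as np
from scipy.interpolate import RBFInterpolator


def rectify_table(image: np.ndarray,
                  src_pts: np.ndarray,
                  dst_pts: np.ndarray,
                  out_size: tuple) -> np.ndarray:
    """Warp `image` so that detected edge points `src_pts` land on their
    target positions `dst_pts` in a flat, axis-aligned table.

    src_pts, dst_pts: (N, 2) arrays of (x, y) pixel coordinates
    (N >= 3, non-collinear); out_size: (width, height) of the output.
    """
    w, h = out_size
    src = np.asarray(src_pts, dtype=np.float64)
    dst = np.asarray(dst_pts, dtype=np.float64)

    # Backward mapping: for every output pixel we need the coordinate it
    # comes from in the distorted input, so fit a thin-plate spline from
    # rectified (dst) coordinates to distorted (src) coordinates.
    tps = RBFInterpolator(dst, src, kernel="thin_plate_spline")

    # Evaluate the spline on a dense grid of output coordinates.
    gx, gy = np.meshgrid(np.arange(w), np.arange(h))
    grid = np.stack([gx.ravel(), gy.ravel()], axis=1).astype(np.float64)
    sampled = tps(grid).astype(np.float32)   # (w*h, 2) source coordinates

    map_x = sampled[:, 0].reshape(h, w)
    map_y = sampled[:, 1].reshape(h, w)
    return cv2.remap(image, map_x, map_y, cv2.INTER_LINEAR)
```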
3.4. Table Structure Recognition
4. Experiments
4.1. Experimental Setting
4.2. Dataset
4.3. Evaluation Metrics
4.4. Experimental Results and Analysis
4.4.1. Performance Evaluation of LocRecNet
4.4.2. Computational Cost Analysis of LocRecNet
4.4.3. Visualization of Results
4.4.4. Overall Performance
4.5. Ablation Study
4.5.1. LocRecNet Table Edge Point Localization
4.5.2. Impact of LocRecNet on Standard Table Data
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Model | Parameter Magnitude |
---|---|
HRNet | ≈48.6 M |
HRNet-s | ≈11.2 M |
Res50 | ≈25.6 M |
Res101 | ≈44.5 M |
Res152 | ≈60.3 M |
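For reference, the ResNet figures above can be approximately reproduced with the stock torchvision models; the short check below assumes those stock definitions (the HRNet variants are not available in torchvision, and the paper's exact backbones may differ slightly, e.g., in their heads). `count_params_m` is a helper name introduced here.

```python
# Reproduce the approximate ResNet parameter counts from the table above
# using stock torchvision models (a rough sanity check only).
import torchvision.models as models


def count_params_m(model) -> float:
    """Total number of parameters, in millions."""
    return sum(p.numel() for p in model.parameters()) / 1e6


for name, ctor in [("Res50", models.resnet50),
                   ("Res101", models.resnet101),
                   ("Res152", models.resnet152)]:
    print(f"{name}: ~{count_params_m(ctor(weights=None)):.1f} M")
# Prints roughly 25.6 M, 44.5 M, and 60.2 M for the stock models.
```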
References
- Hu, J.; Kashi, R.S.; Lopresti, D.P.; Wilfong, G. Table structure recognition and its evaluation. In Document Recognition and Retrieval VIII, Proceedings of the 8th International Conference on Document Recognition and Retrieval, San Jose, CA, USA, 21 December 2000; SPIE: Bellingham, WA, USA, 2000; pp. 44–55.
- Deng, Y.; Rosenberg, D.; Mann, G. Challenges in end-to-end neural scientific table recognition. In Proceedings of the 16th International Conference on Document Analysis and Recognition (ICDAR), Sydney, NSW, Australia, 20–25 September 2019; IEEE: New York, NY, USA, 2019; pp. 894–901.
- Alexiou, M.S.; Bourbakis, N.G. Pinakas: A methodology for deep analysis of tables in technical documents. Int. J. Artif. Intell. Tools 2023, 32, 2350042.
- Göbel, M.; Hassan, T.; Oro, E.; Orsi, G. ICDAR 2013 table competition. In Proceedings of the 12th International Conference on Document Analysis and Recognition (ICDAR), Washington, DC, USA, 25–28 August 2013; IEEE: New York, NY, USA, 2013; pp. 1449–1453.
- Desai, H.; Kayal, P.; Singh, M. TabLeX: A benchmark dataset for structure and content information extraction from scientific tables. In Proceedings of the 16th International Conference on Document Analysis and Recognition (ICDAR), Lausanne, Switzerland, 5–10 September 2021; Part II. Springer: Cham, Switzerland, 2021; pp. 554–569.
- Zhong, X.; ShafieiBavani, E.; Jimeno Yepes, A. Image-based table recognition: Data, model, and evaluation. In Computer Vision—ECCV 2020, Proceedings of the 16th European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Springer: Cham, Switzerland, 2020; pp. 569–585.
- Long, R.; Wang, W.; Xue, N.; Gao, F.; Yang, Z.; Wang, Y.; Xia, G.S. Parsing table structures in the wild. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; IEEE: New York, NY, USA, 2021; pp. 944–952.
- Qiao, L.; Li, Z.; Cheng, Z.; Zhang, P.; Pu, S.; Niu, Y.; Ren, W.; Tan, W.; Wu, F. LGPMA: Complicated table structure recognition with local and global pyramid mask alignment. In Document Analysis and Recognition—ICDAR 2021, Proceedings of the 16th International Conference on Document Analysis and Recognition (ICDAR), Lausanne, Switzerland, 5–10 September 2021; Lladós, J., Lopresti, D., Uchida, S., Eds.; Springer: Cham, Switzerland, 2021; pp. 67–73.
- Liu, H.; Li, X.; Liu, B.; Jiang, D.; Liu, Y.; Ren, B.; Ji, R. Show, read and reason: Table structure recognition with flexible context aggregator. In Proceedings of the 29th ACM International Conference on Multimedia (MM ’21), New York, NY, USA, 20–24 October 2021; ACM: New York, NY, USA, 2021; pp. 1084–1092.
- Liu, H.; Li, X.; Liu, B.; Jiang, D.; Liu, Y.; Ren, B. Neural collaborative graph machines for table structure recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; IEEE: New York, NY, USA, 2022; pp. 4533–4542.
- Xing, H.; Gao, F.; Long, R.; Bu, J.; Zheng, Q.; Li, L.; Yu, Z. LORE: Logical location regression network for table structure recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; AAAI Press: New York, NY, USA, 2023; Volume 37, pp. 2992–3000.
- Zhang, Z.; Hu, P.; Ma, J.; Du, J.; Zhang, J.; Yin, B.; Liu, C. SEMv2: Table separation line detection based on instance segmentation. Pattern Recognit. 2024, 149, 110279.
- Huang, Y.; Yan, Q.; Li, Y.; Chen, Y.; Wang, X.; Gao, L.; Tang, Z. A YOLO-based table detection method. In Proceedings of the 16th International Conference on Document Analysis and Recognition (ICDAR), Sydney, NSW, Australia, 20–25 September 2019; IEEE: New York, NY, USA, 2019; pp. 813–818.
- Li, Y.; Yang, S.; Liu, P.; Zhang, S.; Wang, Y.; Wang, Z.; Xia, S.-T. SimCC: A simple coordinate classification perspective for human pose estimation. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; Springer Nature Switzerland: Cham, Switzerland, 2022; pp. 89–106.
- Yu, C.; Xiao, B.; Gao, C.; Yuan, L.; Zhang, L.; Sang, N.; Wang, J. Lite-hrnet: A lightweight high-resolution network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; IEEE: New York, NY, USA, 2021; pp. 10440–10450.
- Yang, S.; Quan, Z.; Nie, M.; Yang, W. Transpose: Keypoint localization via transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; IEEE: New York, NY, USA, 2021; pp. 11802–11812.
- Keller, W.; Borkowski, A. Thin plate spline interpolation. J. Geod. 2019, 93, 1251–1269.
- Wood, S.N. Thin plate regression splines. J. R. Stat. Soc. Ser. B Stat. Methodol. 2003, 65, 95–114.
- Prautzsch, H.; Boehm, W.; Paluszny, M. Bézier and B-Spline Techniques; Springer: Berlin/Heidelberg, Germany, 2002; Volume 6, pp. 25–41.
- Jin, B.; Liu, Y.; Liu, D.; Qi, W.; Chen, Y.; Wang, S. Research on automatic correction of the document images based on perspective transformation. In Proceedings of the 2021 IEEE International Conference on Computer Science, Electronic Information Engineering and Intelligent Control Technology (CEI), Fuzhou, China, 24–26 September 2021; IEEE: New York, NY, USA, 2021; pp. 291–297.
- Müller, R.; Kornblith, S.; Hinton, G.E. When does label smoothing help? Adv. Neural Inf. Process. Syst. 2019.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: New York, NY, USA, 2016; pp. 770–778.
Method | Data | LocRecNet | P | R | F1
---|---|---|---|---|---
LORE | SCITSR-curved | Without | 93.8% | 74.3% | 82.9%
LORE | SCITSR-curved | With | 92.7% | 88.4% | 90.5%
LORE | PubTabNet-curved | Without | 96.5% | 83.3% | 89.4%
LORE | PubTabNet-curved | With | 97.2% | 86.7% | 91.6%
LORE | WTW-curved | Without | 94.5% | 95.9% | 95.1%
LORE | WTW-curved | With | 98.5% | 97.2% | 97.9%
LGPMA | SCITSR-curved | Without | 92.4% | 67.7% | 78.1%
LGPMA | SCITSR-curved | With | 93.6% | 85.1% | 89.1%
LGPMA | PubTabNet-curved | Without | 96.2% | 76.2% | 85.1%
LGPMA | PubTabNet-curved | With | 96.8% | 86.7% | 91.5%
LGPMA | WTW-curved | Without | 52.0% | 83.3% | 64.1%
LGPMA | WTW-curved | With | 88.5% | 88.6% | 88.5%
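For reference, F1 is the harmonic mean of precision and recall, F1 = 2PR/(P + R), consistent with the values above; for example, the first row gives 2 × 0.938 × 0.743 / (0.938 + 0.743) ≈ 0.829, i.e., the reported 82.9%.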
Stage | SCITSR | PubTabNet | WTW
---|---|---|---
Table edge point localization | 0.008 s | 0.006 s | 0.069 s
Image correction | 0.016 s | 0.029 s | 0.109 s
Method | Representation | Input Size | AP | AR
---|---|---|---|---
HRNet | Heatmap | 256 × 192 | 77.3% | 79.7%
HRNet | Heatmap | 384 × 288 | 82.5% | 84.2%
HRNet | SimCC | 256 × 192 | 84.0% | 86.9%
HRNet | SimCC* | 256 × 192 | 85.3% | 87.1%
HRNet | SimCC* | 384 × 288 | 87.1% | 88.6%
HRNet-s | SimCC* | 256 × 192 | 83.4% | 85.1%
HRNet-s | SimCC* | 384 × 288 | 86.1% | 87.4%
Res50 | Heatmap | 256 × 192 | 75.1% | 77.8%
Res50 | Heatmap | 384 × 288 | 80.3% | 82.3%
Res50 | SimCC | 256 × 192 | 75.3% | 82.1%
Res50 | SimCC | 384 × 288 | 79.7% | 84.4%
Res50 | SimCC* | 384 × 288 | 85.0% | 87.1%
Res101 | Heatmap | 256 × 192 | 68.9% | 72.7%
Res101 | Heatmap | 384 × 288 | 76.5% | 79.0%
Res101 | SimCC | 256 × 192 | 75.7% | 82.5%
Res101 | SimCC | 384 × 288 | 81.8% | 85.5%
Res152 | Heatmap | 256 × 192 | 75.8% | 78.8%
Res152 | Heatmap | 384 × 288 | 81.4% | 83.3%
Res152 | SimCC | 384 × 288 | 81.2% | 85.4%
Method | Data | LocRecNet | P | R | F1
---|---|---|---|---|---
LORE | SCITSR | Without | 94.3% | 90.9% | 92.6%
LORE | SCITSR | With | 94.1% | 91.6% | 92.8%
LORE | PubTabNet | Without | 97.9% | 88.2% | 92.8%
LORE | PubTabNet | With | 97.9% | 88.2% | 92.8%
LGPMA | SCITSR | Without | 93.8% | 84.5% | 88.9%
LGPMA | SCITSR | With | 93.8% | 85.2% | 89.3%
LGPMA | PubTabNet | Without | 97.6% | 87.5% | 92.3%
LGPMA | PubTabNet | With | 97.6% | 87.5% | 92.3%
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).