Article
Peer-Review Record

LocRecNet: A Synergistic Framework for Table Localization and Rectification

Electronics 2025, 14(10), 1920; https://doi.org/10.3390/electronics14101920
by Zefeng Cai 1,†, Jie Feng 2,*,†, Zhaokun Hou 2,†, Haixiang Zhang 2,† and Hanjie Ma 2,†
Submission received: 8 April 2025 / Revised: 29 April 2025 / Accepted: 4 May 2025 / Published: 9 May 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

In Equation (2) what is q?
What is M, in equation (4)? The authors define M=(m1,m2). I suppose it is a point, isn't it?
And m0? and r?
Is x in equation (5) the same one of that in p?

 Important parts of this article need citations, some equations, the Heatmap method, etc.
-  Please ensure to cite the articles (benchmark publications), e.g., YOLO (J. Redmon), Bezier curve (P. Bézier,1962), equations (13) to (15) (matrix confusion), etc. This detail is essential due to the recognition of its creators/inventors. More recent references can be used as supporting bibliography for application problems.
-  What are the advantages and disadvantages that should be highlighted?
- Thus, the Authors should highlight (objectively) the evident innovation of this proposal in the literature.

Author Response

Dear Reviewer,

We appreciate your careful review of our manuscript and the constructive feedback you provided. Your suggestions have not only helped improve the quality of the paper but also guided us in refining our research direction. We have made the necessary revisions based on your recommendations, and below are our detailed responses to each of your comments. The main modifications are highlighted in yellow in the file LocRecNet.pdf.

Comments 1:“In Equation (2) what is q?”

Response 1:

Thank you for your comment. In Equation (2), q denotes the target control point in the destination image that corresponds to a given source control point. The term measures the Euclidean distance between the mapped source point and its actual position q in the target image, serving as an indicator of the registration error. In the revised version, we have explicitly clarified the definition of q in the context of the equation to avoid ambiguity.
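
The registration error described in this response can be illustrated with a minimal sketch. Equation (2) itself is not reproduced in this record, so the averaging over control-point pairs and the function name `registration_error` are assumptions made only for illustration; the response states just that the term is a Euclidean distance between each mapped point and its target q.

```python
import math

def registration_error(mapped_points, target_points):
    """Mean Euclidean distance between mapped source points and their targets q.

    Averaging is an assumption; the original Equation (2) may instead sum
    the distances or square them.
    """
    if len(mapped_points) != len(target_points):
        raise ValueError("point lists must have the same length")
    if not mapped_points:
        raise ValueError("at least one control point pair is required")
    total = sum(math.dist(f, q) for f, q in zip(mapped_points, target_points))
    return total / len(mapped_points)

# One pair matches exactly, the other is off by a 3-4-5 triangle.
print(registration_error([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)]))  # 2.5
```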

 

Comments 2:“What is M, in equation (4)? The authors define M=(m1,m2). I suppose it is a point, isn't it? And m0? and r?”

Response 2:

Thank you for your careful review. In Equation (4), M is in fact a 2×2 affine transformation matrix that represents the global linear component of the Thin Plate Spline (TPS) transformation. Here, m1 and m2 are the two column vectors of the affine matrix, each containing parameters of the linear transformation.

The term m0 is a two-dimensional translation vector that accounts for the global translation component of the transformation, controlling overall displacement.

Additionally, r denotes the Euclidean distance between an arbitrary point and a control point, and is used in the computation of the radial basis function.

We have added explicit definitions and explanations of these terms in the revised manuscript to improve the clarity and understanding of the equations.
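
To make the roles of M, m0 and r concrete, the following is a minimal sketch of how a TPS mapping is commonly evaluated. It is not the authors' implementation: the kernel U(r) = r² log r is the standard TPS radial basis and is assumed here because Equation (5) is not reproduced in this record, and the names `tps_kernel`, `tps_map`, `controls` and `weights` are hypothetical.

```python
import math

def tps_kernel(r):
    """Radial basis U(r) = r^2 * log(r), with U(0) = 0 (assumed standard form)."""
    return 0.0 if r == 0.0 else r * r * math.log(r)

def tps_map(p, controls, weights, M, m0):
    """Evaluate f(p) = m0 + M p + sum_i w_i * U(||p - c_i||) for one 2-D point."""
    x, y = p
    # Global affine part: M is the 2x2 matrix, m0 the translation vector.
    fx = m0[0] + M[0][0] * x + M[0][1] * y
    fy = m0[1] + M[1][0] * x + M[1][1] * y
    # Local non-linear part: one 2-D weight vector per control point.
    for (cx, cy), (wx, wy) in zip(controls, weights):
        r = math.hypot(x - cx, y - cy)  # Euclidean distance to the control point
        u = tps_kernel(r)
        fx += wx * u
        fy += wy * u
    return (fx, fy)

# With all weights set to zero the mapping reduces to the affine part alone.
controls = [(0.0, 0.0), (2.0, 0.0)]
weights = [(0.0, 0.0), (0.0, 0.0)]
M = [[2.0, 0.0], [0.0, 1.0]]
m0 = (1.0, -1.0)
print(tps_map((1.0, 1.0), controls, weights, M, m0))  # (3.0, 0.0)
```

Non-zero weights add a smooth local displacement around each control point, which is what lets TPS model distortions that a purely affine transform cannot.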

 

Comments 3:“Is x in equation (5) the same one of that in p?”

Response 3:

Thank you for pointing out this issue. We confirm that there was a notation inconsistency in Equation (5). This equation defines the radial basis function, whose variable should be r, the Euclidean distance between an arbitrary point and a control point. The use of x in the original formula was a typographical error and should be replaced by r.

We have corrected the formula accordingly and have added a clarification in the revised manuscript that explicitly explains how r is defined, in order to avoid any ambiguity.

 

Comments 4:“Important parts of this article need citations, some equations, the Heatmap method, etc.”

Response 4:

Thank you for your comment. Regarding the citation of certain equations, we have already referenced articles [17] and [18] in the section describing the correction scheme. As for the keypoint detection methods mentioned in the ablation study, they are primarily based on SimCC, so no additional citations were included.

 

Comments 5:“Please ensure to cite the articles (benchmark publications), e.g., YOLO (J. Redmon), Bezier curve (P. Bézier,1962), equations (13) to (15) (matrix confusion), etc. This detail is essential due to the recognition of its creators/inventors. More recent references can be used as supporting bibliography for application problems.”

Response 5:

Thank you for your valuable suggestion. We have reviewed the citations in the manuscript as per your comments. For the YOLO and Bézier curve references, we selected the citations [13] and [19] based on their publication time and relevance to the methods actually used in our work. Regarding equations (13) to (15), which describe the calculation of precision, recall, and F1-score, these are standard evaluation metrics widely used in the field and are generally not attributed to a specific original publication.
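
For readers unfamiliar with the metrics referred to as equations (13) to (15), the standard definitions can be sketched as follows. The function name and the zero-division convention (returning 0.0) are assumptions for illustration; the manuscript's own equations are not reproduced in this record.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Example: 8 correct detections, 2 spurious, 2 missed.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.8 0.8 0.8
```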

 

Comments 6:“What are the advantages and disadvantages that should be highlighted?”

Response 6:

Thank you for your suggestion. In the revised manuscript, we have added a discussion of the advantages and limitations of our method in Section 4.4.4. The main advantages include its effectiveness in handling geometric distortions in table images and its strong generalization ability across multiple datasets. However, there are also certain limitations, such as reduced performance under extreme distortion or severe occlusion. We plan to further optimize the model architecture in future work to enhance its robustness.

Comments 7:“Thus, the Authors should highlight (objectively) the evident innovation of this proposal in the literature.”

Response 7:

Thank you for your suggestion. In the revised manuscript, we have further emphasized the innovations of this study in the Conclusion section. Additionally, summaries of the main contributions have been included in both the Introduction and Methodology sections. All relevant modifications have been highlighted in yellow for ease of reference.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Summary:

This is a nice manuscript about the lack of robustness of current Table Structure Recognition (TSR) systems when they meet geometrically deformed or distorted table images (typical in real-world scenarios: packaging, mobile-captured pictures or scanned documents). Existing TSR approaches usually experience issues under these conditions, and public datasets rarely contain realistic deformations. The authors’ approach consists of a preprocessing framework and a novel data generation mechanism:

  1. Framework: 2 steps:
    1. Table edge point localization, using a keypoint detection model to estimate the structure of the distorted table.
    2. Geometric rectification, using thin plate spline interpolation to correct the table deformation using control points (boundary and internal structure heuristics). The corrected image is passed to well-known TSR models, improving later recognition accuracy.
  2. New synthetic data generation: combining Bézier curve deformation and perspective transforms, to create deformed versions of standard datasets for training and evaluation.
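
The Bézier curve deformation mentioned in the summary above can be illustrated with a short sketch that bends a straight table edge along a cubic Bézier curve. This is only an assumed, simplified illustration: the record does not describe the authors' actual generation procedure, and `bend_edge` and its `bulge` parameter are hypothetical.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Point on a cubic Bezier curve at parameter t in [0, 1] (Bernstein form)."""
    s = 1.0 - t
    b0, b1, b2, b3 = s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3
    return (
        b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0],
        b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1],
    )

def bend_edge(start, end, bulge, samples=5):
    """Sample a curved version of the straight edge from start to end.

    The two inner control points sit at 1/3 and 2/3 of the edge and are
    shifted vertically by `bulge` (a hypothetical deformation strength).
    """
    if samples < 2:
        raise ValueError("samples must be at least 2")
    dx, dy = end[0] - start[0], end[1] - start[1]
    p1 = (start[0] + dx / 3, start[1] + dy / 3 + bulge)
    p2 = (start[0] + 2 * dx / 3, start[1] + 2 * dy / 3 + bulge)
    return [cubic_bezier(start, p1, p2, end, i / (samples - 1)) for i in range(samples)]

# A 300-pixel-wide top edge, bent downward by up to 30 pixels at its centre.
for point in bend_edge((0.0, 0.0), (300.0, 0.0), bulge=40.0, samples=3):
    print(point)
```

The endpoints stay fixed while the interior of the edge is displaced, which mimics the page curl seen in photographed documents; a perspective transform could then be applied on top of the curved result.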

The authors test their approach on synthetically deformed datasets and real-world deformed samples; results show that their method improves performance metrics, with better TSR accuracy in deformed and standard table images under reasonable computational costs.

 

Broad comments:

Strengths:

  1. The research is focused on a very active topic and the approach is simple (keypoint-based localization + TPS correction), but apparently effective.
  2. The proposal is tested not only on synthetic datasets, as in most similar papers, but also on real datasets.
  3. Performance even under difficult datasets is quite acceptable.
  4. The text is easy to follow for any reader.

 

Weaknesses:

  1. I am missing a comparison with other rectification mechanisms (STN, document image rectification?), to provide a benchmark with respect to other alternatives.
  2. Whole pipeline studies, from beginning to end, a joint TSR plus the author’s proposal, trained together, could complete the analysis.
  3. I am also missing cases where the algorithm does not provide the expected results (especially under poor light conditions).
  4. The Conclusion section needs to be elaborated.

 

Specific comments:

  1. Major issues:
    1. Please, provide a baseline using classic geometric correction methods (STN, for instance), comparing it with your approach’s results.
    2. If possible, please, include cases where your algorithm is failing, to understand the (logical) limits of the proposal.
    3. Please, consider studying an end to end TSR plus your approach, trained together.
    4. Please, create a “Conclusion” section, summarizing the ideas behind the manuscript, and the results. At this moment, it is just a summary of the Abstract.

 

  2. Minor issues:
    1. Please, provide a description of the environment used for the tests (hardware and software).
    2. Citations:
      1. Please, provide a cite for the experimental results that are mentioned in line 186.
      2. Missing space between the previous word and the citation number ([1] in line 22 is just an example of the 22 cases).
      3. Why are FLAGNet and NCGM (line 218), LORE and LGPMA (line 223) and SCITSR and PubtabNet (line 225) cited again?
    3. Figures:
      1. Please, improve the caption of Figure 5.
      2. Please, add a space between “transformation” and “(f)…” in Figure 5.
    4. Please, provide a rationale for the 25 points in line 196.
    5. Please, add a table with the magnitudes used in the text, at the end of the manuscript.
    6. Missing space in line 342 (“…processes.Specifically…”).
    7. Trailing asterisk in line 390 (“…SimCC*”)?
 

Comments for author File: Comments.pdf

Author Response

Dear Reviewer,

 

We sincerely thank you for your valuable time, careful reading, and constructive comments on our manuscript. Your insights are greatly appreciated and have helped us to improve the clarity, rigor, and overall quality of our work. We have carefully considered each of your suggestions and have revised the manuscript accordingly. Below, we provide detailed responses to your comments point by point. The main modifications are highlighted in blue in the file LocRecNet.pdf.

Major issues:

Comments 1:“Please, provide a baseline using classic geometric correction methods (STN, for instance), comparing it with your approach’s results.”

Response 1:

Thank you for your valuable suggestion. In response, we conducted experiments and explored classic geometric correction methods, such as the Spatial Transformer Network (STN). However, the results showed that STN’s performance is limited when dealing with table images that exhibit nonlinear distortions. While STN allows for flexible spatial transformations, it struggles to adapt effectively to table deformations, particularly when handling large-scale nonlinear distortions, where its geometric correction accuracy does not meet expectations. Although classic methods offer advantages in certain scenarios, their poor performance in the specific task of geometric correction for table images led us to exclude them as baseline models for comparison. The following results from training and testing on part of the dataset with STN demonstrate its unsatisfactory performance, which is why we chose not to pursue further comparisons. The first row shows the input images, the third row shows the target images, and the second row shows the output images from the STN model.

Comments 2:“If possible, please, include cases where your algorithm is failing, to understand the (logical) limits of the proposal.”

Response 2:

Thank you for your valuable suggestions. In response to your request, we have included examples of algorithm failures to better highlight the limitations of the proposed approach. These failure cases reveal challenges the model faces in certain scenarios, such as ambiguous table layouts or missing clear boundary information, which can result in inaccuracies in localization and correction. By analyzing these cases, we gain a clearer understanding of the model’s shortcomings in specific situations, providing valuable insights for future improvements. We believe this additional information will help further refine the method and offer useful guidance for its future optimization. The specific revisions can be found in Section 4.4.4 of the paper.

 

Comments 3:“Please, consider studying an end to end TSR plus your approach, trained together.”

Response 3:

Thank you for your valuable suggestion. We fully agree on the importance of broadening the comparison scope, especially by incorporating end-to-end Table Structure Recognition (TSR) models, which would offer a more comprehensive evaluation of the effectiveness of our proposed method.

At present, we have been closely following several representative end-to-end TSR approaches and plan to explore them further in our future work. However, there are still some practical limitations at this stage. On the one hand, many recent end-to-end models are not open-sourced or lack complete implementation details, making it difficult to reproduce them for fair comparison. On the other hand, even for methods that have been released, most are integrated as user-facing tools designed primarily for single-image processing. These models often lack support for batch processing of table images, which limits their practical feasibility for large-scale comparison with our system.

Therefore, these methods were not included in the current comparison experiments. Nonetheless, we will continue to monitor the latest developments in this area and incorporate more representative approaches in future evaluations when feasible, to further validate the robustness and adaptability of our method.

 

Comments 4:“Please, create a “Conclusion” section, summarizing the ideas behind the manuscript, and the results. At this moment, it is just a summary of the Abstract.”

Response 4:

Thank you for your valuable suggestion. In response, we have rewritten the “Conclusion” section to ensure it is no longer a mere repetition of the abstract. Instead, it provides a comprehensive summary of the research framework, the proposed methods, and the key experimental findings and their significance. The revised conclusion highlights the innovations and effectiveness of our approach in table image localization, correction, and structure recognition. It also discusses the applicability of the method and outlines potential directions for future research. Please refer to the “Conclusion” section of the revised manuscript for details.

 

Minor issues:

Comments 1:“Please, provide a description of the environment used for the tests (hardware and software).”

Response 1:

Thank you for your suggestion. In response, we have added a detailed description of the hardware and software environment used for testing in Section 4.1 of the manuscript, to ensure better clarity and reproducibility for readers.

 

Citations:

Comments 2:“Please, provide a cite for the experimental results that are mentioned in line 186.”

Response 2:

Thank you for your suggestion. We have added experimental results from 10 additional test points (see Figure 3) in the revised manuscript and provided further analysis. When these points are used to construct the initial control points for correction, the method achieves a preliminary structural recovery of the table’s boundary regions. However, for the internal regions of the table and for images with significant deformations, this initial approach is still limited in correction accuracy. We have addressed this issue in the revision and outlined directions for future improvements.

 

Comments 3:“Missing space between the previous word and the citation number ([1] in line 22 is just an example of the 22 cases).”

Response 3:

Thank you for pointing this out. The issue regarding the missing space between the preceding word and the reference number has been corrected in the latest version of the manuscript.

 

Comments 4:“Why are FLAGNet and NCGM (line 218), LORE and LGPMA (line 223) and SCITSR and PubtabNet (line 225) cited again?”

Response 4:

Thank you for your comment. The repeated citations of FLAGNet and NCGM (line 218), LORE and LGPMA (line 223), as well as SCITSR and PubTabNet (line 225), were originally included to enhance readability, as the original references appeared much earlier in the text. However, we understand that this may result in redundancy. In response to your suggestion, we have revised the manuscript by removing the repeated citations to maintain clarity and conciseness.

 

Figures:

Comments 5:“Please, improve the caption of Figure 5. Please, add a space between “transformation” and “(f)…” in Figure 5.”

Response 5:

Thank you for your careful review. The display issue in Figure 5 occurred because the title of subfigure (f) was too long. To resolve this, we have reformatted the title by splitting it into two lines. This adjustment ensures proper spacing and improves the overall clarity and aesthetics of the figure caption.

 

Comments 6:“Please, provide a rationale for the 25 points in line 196.”

Response 6:

Thank you for your question. Regarding the rationale for the 25 control points mentioned in line 196, the revised manuscript now includes a detailed analysis of the poor correction results obtained with 10 control points. Ten control points provide insufficient information for large deformations or for the internal regions of the table, so the correction accuracy is limited. To address this, we explain the principle of increasing the number of control points in the paragraph at line 198, and we select 25 control points to improve the correction. More control points impose more precise geometric constraints, which significantly improves the correction accuracy of deformed table regions, especially when the internal structure of the table is complex or when large deformations are present.

 

Comments 7:“Please, add a table with the magnitudes used in the text, at the end  of the manuscript.”

Response 7:

Thank you for your suggestion. In the revised manuscript, we have added a table at the end listing the parameter counts of the keypoint detection network backbones used in the paper. Comparing the parameter counts of the different models, in particular HRNet-s against larger models such as HRNet, ResNet-50, ResNet-101, and ResNet-152, shows that HRNet-s is more lightweight while maintaining high performance, making it suitable for applications with limited computational resources. This comparison further supports the design goal of HRNet-s: a model that is both lightweight and efficient.

 

Comments 8:“Missing space in line 342 (“…processes.Specifically…”)”

Response 8:

Thank you for your careful review. The missing space in line 342 (“processes.Specifically”) has been corrected in the revised manuscript. It now reads as “processes. Specifically”.

 

Comments 9:“Trailing asterisk in line 390 (“…SimCC*”)?”

Response 9:

Thank you for pointing this out. Regarding the term “SimCC*” at the end of line 390, we would like to clarify that SimCC* refers to an enhanced version of the original SimCC method, in which Gaussian smoothing is applied. This is clearly noted in the footnote of Table 3, and also described in the corresponding paragraph of the main text to help readers understand the distinction and improvements. If any clarification is still needed, we are happy to refine the explanation further.
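
The distinction between SimCC and SimCC* can be illustrated with a rough sketch. SimCC treats each keypoint coordinate as a 1-D classification over bins; the sketch below assumes that the Gaussian smoothing is applied to those 1-D classification targets, which this record does not spell out. The function name `simcc_target` and the `sigma` parameter are hypothetical.

```python
import math

def simcc_target(coord, num_bins, sigma=None):
    """1-D classification target for a keypoint coordinate bin.

    sigma=None gives a one-hot target (plain SimCC style); otherwise the
    target is a normalized Gaussian centred on the true bin (assumed to
    correspond to the SimCC* variant).
    """
    if not 0 <= coord < num_bins:
        raise ValueError("coord must index a valid bin")
    if sigma is None:
        return [1.0 if i == coord else 0.0 for i in range(num_bins)]
    raw = [math.exp(-((i - coord) ** 2) / (2.0 * sigma ** 2)) for i in range(num_bins)]
    total = sum(raw)
    return [v / total for v in raw]

hard = simcc_target(3, num_bins=8)
soft = simcc_target(3, num_bins=8, sigma=1.0)
print(hard)
print([round(v, 3) for v in soft])
```

With the smoothed target, bins adjacent to the true coordinate receive non-zero probability, so near-miss predictions are penalized less than distant ones.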

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Dear Authors,
The present study introduces LocRecNet, a modern framework for table localization and restoration in pictures that aims to improve table structure recognition under practical deformations. The purpose of this approach is clear, especially in terms of overcoming the limits of current TSR approaches in tackling practical deformations in table graphics. However, I suggest you make some modifications as follows:

  1. The study presents two baselines (LORE and LGPMA). Consider expanding your comparison by incorporating more contemporary approaches, or explain a reason for limiting it to these two.
  2. Maintain consistent citation style along with adding or revising some recent references (2023 onwards) to improve the literature review.

Author Response

Dear Reviewer,

We sincerely thank you for taking the time to thoroughly review our manuscript and provide insightful comments. Your suggestions have been invaluable in enhancing the clarity, rigor, and overall quality of our work. We have carefully considered each of your points and made the necessary revisions to the manuscript. Below, we provide detailed responses to your comments. The main modifications are highlighted in green in the file LocRecNet.pdf.

Comments 1:“The study presents two baselines (LORE and LGPMA). Consider expanding your comparison by incorporating more contemporary approaches, or explain a reason for limiting it to these two.”

Response 1:

Thank you for your valuable suggestion. In this study, LORE and LGPMA were selected as baseline methods based on two main considerations. First, these methods are representative in the task of table structure recognition, reflecting two mainstream strategies: node-relation modeling and graph-sequence parsing, respectively. Second, both methods demonstrate stable performance on public datasets and offer reproducibility, which facilitates fair comparison.

We fully agree with the importance of broadening the comparison scope and are willing to introduce more contemporary methods for further validation. However, some recent methods in this field have not released their code or lack sufficient implementation details, which limits the feasibility of direct reproduction and fair comparison. In future work, we will continue to monitor related developments and incorporate more approaches when feasible, in order to further verify the generalizability and robustness of the proposed method.

 

Comments 2:“Maintain consistent citation style along with adding or revising some recent references (2023 onwards) to improve the literature review.”

Response 2:

We sincerely thank the reviewer for the valuable comments. We attach great importance to the consistency of reference formatting and the timeliness of the literature review. A thorough examination of all references has been conducted to ensure formatting consistency throughout the manuscript. Issues such as the one in reference [5] have been corrected to fully comply with the journal’s requirements.

Regarding the suggestion on the timeliness of the references, we have carefully re-examined the cited literature. Most of the methods referenced in the manuscript are based on recent studies, particularly representative works published in 2023 and beyond, which reflect the current mainstream approaches and research trends. We therefore believe the existing references already provide a comprehensive overview of the latest developments and technological progress in this field, and no additional references have been added in this revision.

Once again, we sincerely appreciate the reviewer’s thoughtful review and constructive suggestions, which have been very helpful in improving the quality of our manuscript.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

This is a manuscript I have already reviewed. In my previous review, I had raised some major and minor comments to make it more suitable for publication. Now, in this updated version, the authors have met all requirements, which I quickly summarize:

 

Major comments:

  1. Baseline comparison with classical geometric correction methods: the authors tested the STN as a baseline; their results showed that STN underperformed in cases of nonlinear distortion. This was the idea of providing a baseline, so it is fine.
  2. Failure case analysis: this updated version of the manuscript now includes a section showing specific failure scenarios of LocRecNet (ambiguous table layouts, poorly defined boundaries). With this information, we can now identify the method’s current limitations.
  3. End to end training with TSR models: the authors acknowledge the relevance of integrating LocRecNet into a fully end to end pipeline with TSR. However, because of practical limitations (unavailable open-source implementations and inability to process batches), they have not included this experiment in this version. I don’t consider this an issue, now that they have tried.
  4. Conclusion section: it has been rewritten to provide a summary of the framework, main findings, practical importance and future research directions.

 

Minor comments:

  1. Hard/software environment: a description of the computational environment used for training and testing has been added.
  2. Citations:
    • Experimental details: the manuscript now includes a clarification and citation for the rationale behind the selection of control points used in geometric correction. They have included test results and an explanation to support their choice.
    • Spacing before numbers: the missing spaces between text and citation brackets have been corrected.
    • Redundant citations: the repetitive citations of models and datasets were removed.
  3. Figure caption: figure 5 caption has been fixed.
  4. Reason for 25 control points: the authors have now provided a rationale for using the 25 control points.
  5. Table of magnitudes: a table comparing the number of parameters in HRNet*, ResNet* has been added.
  6. Typographical problems: spacing issues (missing, trailing) have been corrected.
  7. Clarification of Asterisk in “SimCC”: it is now clearly explained in the text and footnotes (a SimCC variant).

 

To me, the authors have solved every issue I could find in the previous version, and the manuscript is ready for publication.

Comments for author File: Comments.pdf
