LocRecNet: A Synergistic Framework for Table Localization and Rectification
Abstract
1. Introduction
1. A novel network architecture, LocRecNet, is proposed to effectively detect and correct deformations in table images. It precisely localizes key points and corrects deformations, providing a more reliable input for subsequent structure recognition and thereby improving both accuracy and robustness (a minimal pipeline sketch follows this list).
2. A new keypoint detection algorithm, tailored to table image analysis and serving as a preprocessing step for deformation correction, efficiently detects and localizes tables of various types and structures. This addresses the limitations of existing methods in handling severely deformed or noisy tables and substantially enhances processing and recognition capability.
3. Multiple deformed table datasets are generated using the algorithm, covering various table types, such as financial reports and forms, and incorporating different levels of geometric distortion, noise, and other real-world challenges. This fills a gap in current research, where comprehensive deformed-table datasets have been notably lacking.
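As a rough illustration of contribution 1, the sketch below shows one way the localize–rectify–recognize pipeline could be wired together. It is a minimal, hypothetical example: `process_table`, `locate_edge_points`, and `recognize_structure` are placeholder names standing in for the keypoint network and a downstream recognizer such as LORE or LGPMA, and the correction step is reduced to a simple four-corner perspective warp rather than the paper's full correction algorithm.

```python
# Minimal pipeline sketch (hypothetical interfaces, not the released code).
from typing import Callable

import cv2
import numpy as np


def process_table(image: np.ndarray,
                  locate_edge_points: Callable[[np.ndarray], np.ndarray],
                  recognize_structure: Callable[[np.ndarray], dict],
                  out_size: tuple = (1024, 768)) -> dict:
    # 1) Localize table key points in the (possibly deformed) input image.
    #    For simplicity, assume only the four outer corners are returned,
    #    ordered top-left, top-right, bottom-right, bottom-left.
    corners = locate_edge_points(image).astype(np.float32)

    # 2) Rectify: warp the detected corners onto an axis-aligned rectangle.
    #    (The paper's correction handles denser edge-point sets and curved
    #    deformations; a plain perspective warp is used here for brevity.)
    w, h = out_size
    target = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
    warp = cv2.getPerspectiveTransform(corners, target)
    rectified = cv2.warpPerspective(image, warp, (w, h))

    # 3) Run table structure recognition on the rectified image.
    return recognize_structure(rectified)
```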
2. Related Work
3. Methodology
3.1. Overall Architecture
3.2. Table Edge Point Localization
3.3. Image Correction
Algorithm 1: Table Image Correction Algorithm
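The listing of Algorithm 1 is not reproduced in this excerpt. Purely as an illustrative sketch of one common approach to table image correction, and not necessarily the authors' exact procedure, a thin-plate-spline warp from detected edge points to their rectified target positions (in line with the TPS interpolation cited in the references) could look like the following; `rectify_table`, its arguments, and the use of SciPy and OpenCV here are assumptions for this sketch.

```python
# Illustrative TPS-based correction sketch (assumed helper, not the paper's
# released implementation of Algorithm 1).
import cv2
import numpy as np
from scipy.interpolate import RBFInterpolator


def rectify_table(image: np.ndarray,
                  src_pts: np.ndarray,
                  dst_pts: np.ndarray,
                  out_size: tuple) -> np.ndarray:
    """Warp `image` so that detected edge points `src_pts` land on their
    target positions `dst_pts` in a flat, axis-aligned table.

    src_pts, dst_pts: (N, 2) arrays of (x, y) pixel coordinates
    (N >= 3, non-collinear); out_size: (width, height) of the output.
    """
    w, h = out_size
    src = np.asarray(src_pts, dtype=np.float64)
    dst = np.asarray(dst_pts, dtype=np.float64)

    # Backward mapping: for every output pixel we need the coordinate it
    # comes from in the distorted input, so fit a thin-plate spline from
    # rectified (dst) coordinates to distorted (src) coordinates.
    tps = RBFInterpolator(dst, src, kernel="thin_plate_spline")

    # Evaluate the spline on a dense grid of output coordinates.
    gx, gy = np.meshgrid(np.arange(w), np.arange(h))
    grid = np.stack([gx.ravel(), gy.ravel()], axis=1).astype(np.float64)
    sampled = tps(grid).astype(np.float32)   # (w*h, 2) source coordinates

    map_x = sampled[:, 0].reshape(h, w)
    map_y = sampled[:, 1].reshape(h, w)
    return cv2.remap(image, map_x, map_y, cv2.INTER_LINEAR)
```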
3.4. Table Structure Recognition
4. Experiments
4.1. Experimental Setting
4.2. Dataset
4.3. Evaluation Metrics
4.4. Experimental Results and Analysis
4.4.1. Performance Evaluation of LocRecNet
4.4.2. Computational Cost Analysis of LocRecNet
4.4.3. Visualization of Results
4.4.4. Overall Performance
4.5. Ablation Study
4.5.1. LocRecNet Table Edge Point Localization
4.5.2. Impact of LocRecNet on Standard Table Data
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Model | Parameter Magnitude |
---|---|
HRNet | ≈48.6 M |
HRNet-s | ≈11.2 M |
Res50 | ≈25.6 M |
Res101 | ≈44.5 M |
Res152 | ≈60.3 M |
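For reference, the ResNet figures above can be approximately reproduced with the stock torchvision models; the short check below assumes those stock definitions (the HRNet variants are not available in torchvision, and the paper's exact backbones may differ slightly, e.g., in their heads). `count_params_m` is a helper name introduced here.

```python
# Reproduce the approximate ResNet parameter counts from the table above
# using stock torchvision models (a rough sanity check only).
import torchvision.models as models


def count_params_m(model) -> float:
    """Total number of parameters, in millions."""
    return sum(p.numel() for p in model.parameters()) / 1e6


for name, ctor in [("Res50", models.resnet50),
                   ("Res101", models.resnet101),
                   ("Res152", models.resnet152)]:
    print(f"{name}: ~{count_params_m(ctor(weights=None)):.1f} M")
# Prints roughly 25.6 M, 44.5 M, and 60.2 M for the stock models.
```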
References
- Hu, J.; Kashi, R.S.; Lopresti, D.P.; Wilfong, G. Table structure recognition and its evaluation. In Document Recognition and Retrieval VIII, Proceedings of the 8th International Conference on Document Recognition and Retrieval, San Jose, CA, USA, 21 December 2000; SPIE: Bellingham, WA, USA, 2000; pp. 44–55.
- Deng, Y.; Rosenberg, D.; Mann, G. Challenges in end-to-end neural scientific table recognition. In Proceedings of the 16th International Conference on Document Analysis and Recognition (ICDAR), Sydney, NSW, Australia, 20–25 September 2019; IEEE: New York, NY, USA, 2019; pp. 894–901.
- Alexiou, M.S.; Bourbakis, N.G. Pinakas: A methodology for deep analysis of tables in technical documents. Int. J. Artif. Intell. Tools 2023, 32, 2350042.
- Göbel, M.; Hassan, T.; Oro, E.; Orsi, G. ICDAR 2013 table competition. In Proceedings of the 12th International Conference on Document Analysis and Recognition (ICDAR), Washington, DC, USA, 25–28 August 2013; IEEE: New York, NY, USA, 2013; pp. 1449–1453.
- Desai, H.; Kayal, P.; Singh, M. TabLeX: A benchmark dataset for structure and content information extraction from scientific tables. In Proceedings of the 16th International Conference on Document Analysis and Recognition (ICDAR), Lausanne, Switzerland, 5–10 September 2021; Part II. Springer: Cham, Switzerland, 2021; pp. 554–569.
- Zhong, X.; ShafieiBavani, E.; Jimeno Yepes, A. Image-based table recognition: Data, model, and evaluation. In Computer Vision—ECCV 2020, Proceedings of the 16th European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Springer: Cham, Switzerland, 2020; pp. 569–585.
- Long, R.; Wang, W.; Xue, N.; Gao, F.; Yang, Z.; Wang, Y.; Xia, G.S. Parsing table structures in the wild. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; IEEE: New York, NY, USA, 2021; pp. 944–952.
- Qiao, L.; Li, Z.; Cheng, Z.; Zhang, P.; Pu, S.; Niu, Y.; Ren, W.; Tan, W.; Wu, F. LGPMA: Complicated table structure recognition with local and global pyramid mask alignment. In Document Analysis and Recognition—ICDAR 2021, Proceedings of the 16th International Conference on Document Analysis and Recognition (ICDAR), Lausanne, Switzerland, 5–10 September 2021; Lladós, J., Lopresti, D., Uchida, S., Eds.; Springer: Cham, Switzerland, 2021; pp. 67–73.
- Liu, H.; Li, X.; Liu, B.; Jiang, D.; Liu, Y.; Ren, B.; Ji, R. Show, read and reason: Table structure recognition with flexible context aggregator. In Proceedings of the 29th ACM International Conference on Multimedia (MM ’21), New York, NY, USA, 20–24 October 2021; ACM: New York, NY, USA, 2021; pp. 1084–1092.
- Liu, H.; Li, X.; Liu, B.; Jiang, D.; Liu, Y.; Ren, B. Neural collaborative graph machines for table structure recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; IEEE: New York, NY, USA, 2022; pp. 4533–4542.
- Xing, H.; Gao, F.; Long, R.; Bu, J.; Zheng, Q.; Li, L.; Yu, Z. LORE: Logical location regression network for table structure recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; AAAI Press: New York, NY, USA, 2023; Volume 37, pp. 2992–3000.
- Zhang, Z.; Hu, P.; Ma, J.; Du, J.; Zhang, J.; Yin, B.; Liu, C. SEMv2: Table separation line detection based on instance segmentation. Pattern Recognit. 2024, 149, 110279.
- Huang, Y.; Yan, Q.; Li, Y.; Chen, Y.; Wang, X.; Gao, L.; Tang, Z. A YOLO-based table detection method. In Proceedings of the 16th International Conference on Document Analysis and Recognition (ICDAR), Sydney, NSW, Australia, 20–25 September 2019; IEEE: New York, NY, USA, 2019; pp. 813–818.
- Li, Y.; Yang, S.; Liu, P.; Zhang, S.; Wang, Y.; Wang, Z.; Xia, S.-T. SimCC: A simple coordinate classification perspective for human pose estimation. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; Springer Nature Switzerland: Cham, Switzerland, 2022; pp. 89–106.
- Yu, C.; Xiao, B.; Gao, C.; Yuan, L.; Zhang, L.; Sang, N.; Wang, J. Lite-hrnet: A lightweight high-resolution network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; IEEE: New York, NY, USA, 2021; pp. 10440–10450.
- Yang, S.; Quan, Z.; Nie, M.; Yang, W. Transpose: Keypoint localization via transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; IEEE: New York, NY, USA, 2021; pp. 11802–11812.
- Keller, W.; Borkowski, A. Thin plate spline interpolation. J. Geod. 2019, 93, 1251–1269.
- Wood, S.N. Thin plate regression splines. J. R. Stat. Soc. Ser. B Stat. Methodol. 2003, 65, 95–114.
- Prautzsch, H.; Boehm, W.; Paluszny, M. Bézier and B-Spline Techniques; Springer: Berlin/Heidelberg, Germany, 2002; Volume 6, pp. 25–41.
- Jin, B.; Liu, Y.; Liu, D.; Qi, W.; Chen, Y.; Wang, S. Research on automatic correction of the document images based on perspective transformation. In Proceedings of the 2021 IEEE International Conference on Computer Science, Electronic Information Engineering and Intelligent Control Technology (CEI), Fuzhou, China, 24–26 September 2021; IEEE: New York, NY, USA, 2021; pp. 291–297.
- Müller, R.; Kornblith, S.; Hinton, G.E. When does label smoothing help? Adv. Neural Inf. Process. Syst. 2019.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: New York, NY, USA, 2016; pp. 770–778.
Method | Data | LocRecNet | P | R | F1
---|---|---|---|---|---
LORE | SCITSR-curved | Without | 93.8% | 74.3% | 82.9%
LORE | SCITSR-curved | With | 92.7% | 88.4% | 90.5%
LORE | PubTabNet-curved | Without | 96.5% | 83.3% | 89.4%
LORE | PubTabNet-curved | With | 97.2% | 86.7% | 91.6%
LORE | WTW-curved | Without | 94.5% | 95.9% | 95.1%
LORE | WTW-curved | With | 98.5% | 97.2% | 97.9%
LGPMA | SCITSR-curved | Without | 92.4% | 67.7% | 78.1%
LGPMA | SCITSR-curved | With | 93.6% | 85.1% | 89.1%
LGPMA | PubTabNet-curved | Without | 96.2% | 76.2% | 85.1%
LGPMA | PubTabNet-curved | With | 96.8% | 86.7% | 91.5%
LGPMA | WTW-curved | Without | 52.0% | 83.3% | 64.1%
LGPMA | WTW-curved | With | 88.5% | 88.6% | 88.5%
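For reference, F1 is the harmonic mean of precision and recall, F1 = 2PR/(P + R), consistent with the values above; for example, the first row gives 2 × 0.938 × 0.743 / (0.938 + 0.743) ≈ 0.829, i.e., the reported 82.9%.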
Stage | SCITSR | PubTabNet | WTW
---|---|---|---
Table edge point localization | 0.008 s | 0.006 s | 0.069 s
Image correction | 0.016 s | 0.029 s | 0.109 s
Method | Representation | Input Size | AP | AR
---|---|---|---|---
HRNet | Heatmap | 256 × 192 | 77.3% | 79.7%
HRNet | Heatmap | 384 × 288 | 82.5% | 84.2%
HRNet | SimCC | 256 × 192 | 84.0% | 86.9%
HRNet | SimCC* | 256 × 192 | 85.3% | 87.1%
HRNet | SimCC* | 384 × 288 | 87.1% | 88.6%
HRNet-s | SimCC* | 256 × 192 | 83.4% | 85.1%
HRNet-s | SimCC* | 384 × 288 | 86.1% | 87.4%
Res50 | Heatmap | 256 × 192 | 75.1% | 77.8%
Res50 | Heatmap | 384 × 288 | 80.3% | 82.3%
Res50 | SimCC | 256 × 192 | 75.3% | 82.1%
Res50 | SimCC | 384 × 288 | 79.7% | 84.4%
Res50 | SimCC* | 384 × 288 | 85.0% | 87.1%
Res101 | Heatmap | 256 × 192 | 68.9% | 72.7%
Res101 | Heatmap | 384 × 288 | 76.5% | 79.0%
Res101 | SimCC | 256 × 192 | 75.7% | 82.5%
Res101 | SimCC | 384 × 288 | 81.8% | 85.5%
Res152 | Heatmap | 256 × 192 | 75.8% | 78.8%
Res152 | Heatmap | 384 × 288 | 81.4% | 83.3%
Res152 | SimCC | 384 × 288 | 81.2% | 85.4%
Method | Data | LocRecNet | P | R | F1
---|---|---|---|---|---
LORE | SCITSR | Without | 94.3% | 90.9% | 92.6%
LORE | SCITSR | With | 94.1% | 91.6% | 92.8%
LORE | PubTabNet | Without | 97.9% | 88.2% | 92.8%
LORE | PubTabNet | With | 97.9% | 88.2% | 92.8%
LGPMA | SCITSR | Without | 93.8% | 84.5% | 88.9%
LGPMA | SCITSR | With | 93.8% | 85.2% | 89.3%
LGPMA | PubTabNet | Without | 97.6% | 87.5% | 92.3%
LGPMA | PubTabNet | With | 97.6% | 87.5% | 92.3%
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).