Optimizing Text Recognition in Borehole Log Images Using a Multi-Layout Adjustment Voting Mechanism

Guo, Zhiyong; Guo, Yiwei; Deng, Jiqiu; Fattah, Hassan Ali

doi:10.3390/app15169171

by Zhiyong Guo ^1,2, Yiwei Guo ^1,2,3,4,5, Jiqiu Deng ^1,2,* and Hassan Ali Fattah ^1,2

¹ School of Geosciences and Info-Physics, Central South University, Changsha 410083, China

² Key Laboratory of Metallogenic Prediction of Nonferrous Metals and Geological Environment Monitoring of Ministry of Education, Central South University, Changsha 410083, China

³ Guangzhou Urban Planning & Surveying & Designing Research Institute Co., Ltd., Guangzhou 510060, China

⁴ Guangzhou Collaborative Innovation Center of Natural Resources Planning and Marine Technology, Guangzhou 510060, China

⁵ Guangdong Enterprise Key Laboratory for Urban Sensing, Monitoring and Early Warning, Guangzhou 510030, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(16), 9171; https://doi.org/10.3390/app15169171

Submission received: 20 July 2025 / Revised: 13 August 2025 / Accepted: 16 August 2025 / Published: 20 August 2025

(This article belongs to the Special Issue GeoBigData, GeoAI, and GeoModeling Applications in Geo-Information Systems)

Abstract

The borehole log image contains valuable text information, encompassing key geological data such as structural composition, orebody distribution, and lithological characteristics. These data are important for mineral prediction, GeoBigData, and GeoModeling. However, text recognition in borehole log images is challenging due to complex structures, image noise, and diverse fonts, leading to low accuracy with traditional OCR methods. As a result, substantial manual intervention is often required for verification and correction, hindering efficient application. This study proposes an optimization method based on the multi-layout adjustment voting mechanism to improve text recognition accuracy in borehole log images. During the recognition process, multiple OCR results are generated by adjusting text layouts, and a voting mechanism integrates these results to produce the most accurate output. Experimental results on the Dayingezhuang and Dingjiashan datasets demonstrate the effectiveness of the proposed method, achieving F1 scores of 97.96% and 94.36%, respectively. This optimization method improves text recognition accuracy and recall without modifying the OCR algorithm or applying post-processing, providing a new technical approach to enhancing text recognition precision in borehole log images. This improvement in text extraction accuracy from geological borehole data not only facilitates large-scale integration and analysis of subsurface geological information but also provides essential foundational data for GeoBigData and GeoModeling applications.

Keywords: multi-layout adjustment voting mechanism; text recognition; OCR optimization; information extraction; borehole log images

1. Introduction

Geological borehole data serve as an important record and a primary outcome of geological exploration, providing essential subsurface information that supports a wide range of geoscientific and engineering applications. They play a vital role in mineral resource prediction by supplying key geological parameters for identifying and evaluating resource potential [1,2]; in groundwater monitoring by offering reliable data for assessing aquifer conditions and dynamics [3,4]; and in geological hazard prevention and mitigation through informing assessments of potential risks such as landslides, subsidence, and seismic activity [5,6,7]. Borehole data are also utilized in environmental monitoring to track subsurface environmental changes and contamination [8], and in engineering construction to provide site-specific geological information essential for safe and efficient design and implementation of infrastructure projects [9,10]. With the widespread adoption of computer technology, some borehole data have transitioned from traditional paper-based records to digital formats such as Excel, Word, PDF, and MapGIS [11]. However, due to historical constraints and data generation limitations, some borehole records still exist as paper-based images, exhibiting characteristics of multi-source heterogeneity and unstructured data [12]. To efficiently process and analyze such data, they must first be converted into borehole log images, followed by vectorization, line extraction, and text recognition to extract valuable information. Among these steps, text is the most abundant and essential component, and it is typically extracted with high precision using Optical Character Recognition (OCR) technology.

OCR technology, a fundamental method for automatically recognizing text information, has progressively evolved since its inception and patenting by German scientist Tauschek in 1929. In recent years, open-source OCR techniques have gained widespread adoption, with Tesseract-OCR [13] and Paddle-OCR [14,15] demonstrating notable effectiveness in text recognition tasks. Traditional OCR techniques rely on digital image processing and statistical machine learning models. For instance, Zhang et al. [16] enhanced printed text recognition by integrating connected component analysis with an improved SVM algorithm. Similarly, Narang et al. [17] employed a combination of cosine similarity analysis and the Adaboost model to recognize specific handwritten scripts, achieving an accuracy of 91.7%. However, such methods depend on precise text segmentation and are highly susceptible to noise, font variations, and other distortions. On the other hand, deep learning-based approaches encompass both single-text segmentation and sequential line recognition, with the latter being the most widely adopted. For example, Shi et al. [18] implemented an end-to-end sequence recognition approach using a CRNN model, effectively modeling text as a sequence without needing character-level segmentation while incorporating context information. This method has demonstrated impressive accuracy, reaching 75.1%, 81.7%, and 84.3% across different datasets. These advancements have significantly enhanced the performance of OCR technology across various text recognition tasks.

Researchers have enhanced OCR accuracy by optimizing image preprocessing, refining models, and improving post-processing techniques. For image preprocessing studies, Shin et al. [19] applied morphological filtering to images, achieving an accuracy of 92.82%. Similarly, Michalak and Okarma [20] introduced a local image entropy filtering method to address uneven lighting, enhancing OCR accuracy by refining image features. Regarding model optimization, Sporici et al. [21] improved OCR performance by applying convolutional processing before feeding images into a general recognizer, significantly boosting the F1 score to 0.729. Additionally, some researchers have enhanced recognition accuracy by integrating multiple models, optimizing OCR results from a model aggregation perspective [22,23]. Recent advances in this direction include the LMV-RPA framework, which integrates outputs from multiple OCR engines and large language models via a voting strategy to achieve 99% accuracy [24], and the Consensus Entropy approach proposed by Zhang et al., which selects the most reliable output from multiple vision–language model-based OCR systems based on their agreement characteristics [25]. In the context of post-OCR correction, Ramirez-Orta et al. demonstrated that combining multiple character-level sequence-to-sequence models through a voting scheme can substantially improve OCR text correction performance [26]. For post-processing approaches, Kumar and Ramakrishnan [27] proposed a dictionary-based method for error correction, demonstrating its effectiveness in improving OCR accuracy. However, although post-processing can improve OCR accuracy to some extent, its effectiveness is heavily contingent on the quality of preprocessing and model optimization. If preprocessing is inadequate, post-processing can hardly compensate for the resulting shortcomings. Therefore, these studies show that although image preprocessing, model optimization, and post-processing techniques can enhance OCR accuracy to a certain extent, significant room for improvement remains, particularly in refining preprocessing methods.

Although OCR techniques have achieved relatively accurate recognition, their application in specific domains, such as geological borehole data, remains limited. Most existing methods focus on general text recognition, making them less effective for complex and highly structured image data like borehole image data. For instance, Zhang et al. [12] automated the processing of borehole logs using the Hough transform and corner tagging method combined with the Tesseract-OCR engine. While the method achieved high accuracy in simple text recognition, its performance declined for complex text, resulting in an overall recognition rate of 90%. The process involved image preprocessing, layout analysis, form structure extraction, and text recognition, enabling initial automation of information extraction. However, its precision and recall still require significant improvement when handling complex forms and information-dense content.

The accuracy of text recognition is significantly impeded by the presence of complex backgrounds [28], encompassing factors such as image distortion, blurriness, and luminance fluctuations. To address these challenges, prior studies have explored advanced methodologies to optimize OCR performance under intricate background conditions [29,30]. However, most existing studies focus on how background interference affects text recognition, while the impact of canvas padding on recognition accuracy after text segmentation has received little attention. In practical applications, the canvas padding may influence the spatial configuration of textual elements and compromise the efficiency of feature extraction, thereby exerting a subtle yet consequential impact on recognition precision.

This paper investigates the impact of varying canvas padding on text recognition accuracy while keeping text size constant and without altering OCR algorithms or applying post-processing techniques, offering novel insights and methodologies for optimizing OCR performance. Specifically, this paper improves the accuracy of borehole log text extraction through optimized image preprocessing, adjusts the canvas padding around text, and introduces the multi-layout adjustment voting mechanism. In the text recognition phase, we incorporate the multi-layout adjustment voting strategy, which adjusts the canvas padding around text and votes on the recognition results of multiple layouts to enhance accuracy further.

The main contributions of this paper are as follows:

(1) We propose a novel multi-layout adjustment voting strategy that enhances text recognition without modifying the OCR engine or adding post-processing.

(2) We introduce a layout variation method based on canvas padding to reduce recognition uncertainty under a single layout.

(3) The effectiveness of this method is verified through experimental comparison of two drilling datasets, providing new ideas for improving text recognition accuracy in other fields.

2. Materials and Methods

2.1. Overall Approach

Text recognition in borehole log images faces many challenges, such as complex backgrounds, noise interference, and irregularities in the table structure, leading to lower recognition accuracy. This is especially true for complex tables with dense information. To address this problem, this paper proposes a multi-layout adjustment voting strategy. It enhances image preprocessing, adjusts the canvas padding around text in borehole log images, and applies comprehensive voting across multiple canvas padding sizes. The method consists of two main components: (1) Based on the height of the Chinese text line in the borehole log images, it dynamically adjusts the blank area around the text to build a variety of text layouts with different blank area sizes. (2) By integrating a voting mechanism, OCR recognition results from multiple layouts are analyzed and voted on to improve text recognition accuracy. The overall workflow is shown in Figure 1.

Figure 1 shows the multi-layout recognition and voting process for borehole log image text. The process starts with data preprocessing and region cropping to extract regions of interest (ROIs) from the raw borehole images. The red boxes in the figure indicate the segmented text-line regions, which serve as the inputs for OCR recognition. Then, multiple layouts are applied to the acquired text lines, generating text images with different canvas padding sizes by dynamically adjusting the blank area around the text. Subsequently, text recognition is performed on these differently laid-out text images using OCR tools. Finally, the recognition results are output through a comprehensive voting mechanism, which determines the final output based on the frequency of recognition results across the different layouts. As illustrated in the figure, the character “石 (Shi)” appears four times, which is the highest among all candidates, and is thus chosen as the final result.

2.2. Data Preprocessing

The extraction of lines and text in the borehole log images depends heavily on image quality. This is particularly true for scanned paper images, where issues such as printing defects, wear, uneven lighting, and noise can hinder information recognition. Therefore, image enhancement is necessary to improve clarity and optimize the extraction process. The enhancement process primarily involves grayscale conversion and binarization, skew correction, and morphological erosion and dilation.

(a): Grayscale conversion and binarization

Converting the image from RGB format to grayscale reduces the amount of computation and improves processing efficiency. A local adaptive thresholding method based on a Gaussian-weighted neighborhood mean was then applied to enhance binarization under uneven illumination (Figure 2).
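The grayscale conversion and Gaussian-weighted adaptive thresholding described above can be sketched as follows. This is a minimal pure-Python illustration on nested lists, not the authors' implementation: the block size, sigma, and offset values are assumptions, since the paper does not report them, and all function names are hypothetical. In OpenCV, the same idea corresponds to `cv2.cvtColor` followed by `cv2.adaptiveThreshold` with `cv2.ADAPTIVE_THRESH_GAUSSIAN_C`.

```python
import math


def to_gray(rgb):
    """Convert rows of (R, G, B) tuples to grayscale using the BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row] for row in rgb]


def gaussian_kernel(size, sigma):
    """Return a normalized 1-D Gaussian kernel of odd length `size`."""
    half = size // 2
    weights = [math.exp(-((i - half) ** 2) / (2 * sigma ** 2)) for i in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def _blur_rows(img, kernel):
    """Convolve every row with `kernel`, replicating the edge pixels."""
    half = len(kernel) // 2
    width = len(img[0])
    return [
        [
            sum(k * row[min(max(x + j - half, 0), width - 1)] for j, k in enumerate(kernel))
            for x in range(width)
        ]
        for row in img
    ]


def _transpose(img):
    return [list(col) for col in zip(*img)]


def adaptive_binarize(gray, block_size=5, sigma=1.5, offset=2):
    """Set a pixel to 255 if it is brighter than (local Gaussian mean - offset), else 0.

    block_size, sigma, and offset are assumed values chosen for illustration.
    """
    kernel = gaussian_kernel(block_size, sigma)
    # Separable blur: along x first, then along y via a transpose.
    local_mean = _transpose(_blur_rows(_transpose(_blur_rows(gray, kernel)), kernel))
    return [
        [255 if gray[y][x] > local_mean[y][x] - offset else 0 for x in range(len(gray[0]))]
        for y in range(len(gray))
    ]
```

Because the threshold is computed per neighborhood, a dark stroke is separated from its surroundings even when the overall brightness varies across the page, which is the situation a single global threshold handles poorly.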

(b): Skew Correction

The borehole log images exhibit a distinct tabular frame structure. Thus, a skew correction algorithm was employed based on Canny edge detection [31] and the Hough transform. First, Canny edge detection was used to denoise, smooth the image, and extract edge information. Then, the Hough transform detected straight lines and calculated their tilt angles (ranging from −45° to 45°). Based on the detected skew angle, the image was rotated for correction (Figure 3), ensuring the accurate orientation of the table frame and text. This method effectively corrects skew resulting from improper document placement during scanning, thereby improving the accuracy of subsequent information extraction.
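The angle estimation step can be illustrated with a short sketch. It assumes that line segments have already been detected (for example with `cv2.HoughLinesP`) and are given as `(x1, y1, x2, y2)` tuples. Folding the angles into the ±45° range lets horizontal and vertical table lines vote for the same skew. Taking the median and the function names used here are illustrative choices, not details reported in the paper.

```python
import math
from statistics import median


def normalize_angle(deg):
    """Fold a line angle in degrees into the range [-45, 45)."""
    return (deg + 45.0) % 90.0 - 45.0


def estimate_skew(segments):
    """Return the median folded angle of segments given as (x1, y1, x2, y2).

    The sign convention depends on the coordinate system (image y axes point down).
    Returns 0.0 when no segments are available.
    """
    angles = [
        normalize_angle(math.degrees(math.atan2(y2 - y1, x2 - x1)))
        for x1, y1, x2, y2 in segments
    ]
    return median(angles) if angles else 0.0
```

The estimated angle would then be passed to a rotation routine, such as `cv2.getRotationMatrix2D` followed by `cv2.warpAffine`, to deskew the image.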

(c): Morphological Erosion and Dilation

Morphological operations, through erosion and dilation, remove noise and repair gaps and missing parts in the image. The erosion operation reduces the size of the target objects in the image, while dilation increases their size. By combining these two operations, missing sections caused by issues such as unclear printing or uneven lighting can be effectively restored, thereby enhancing the target features. The “Removing noise” step (Figure 4) uses a morphological operation sequence with a 2 × 2 rectangular structuring element, which effectively reduces scanning noise and repairs incomplete lines or characters caused by printing defects or uneven illumination.
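To make the roles of erosion and dilation concrete, the sketch below implements both for a binary image stored as nested lists (1 = ink, 0 = background) with a 2 × 2 square structuring element anchored at its top-left corner. Pixels outside the image are treated as background. The paper does not specify the exact operation sequence, so combining them as an opening (noise removal) and a closing (gap repair) is an assumption. In OpenCV the equivalent calls are `cv2.erode`, `cv2.dilate`, and `cv2.morphologyEx`.

```python
def erode(img, k=2):
    """Keep a pixel only if the whole k x k window starting at it is ink."""
    h, w = len(img), len(img[0])

    def fits(y, x):
        return all(
            y + i < h and x + j < w and img[y + i][x + j]
            for i in range(k)
            for j in range(k)
        )

    return [[1 if fits(y, x) else 0 for x in range(w)] for y in range(h)]


def dilate(img, k=2):
    """Set a pixel if any pixel in the reflected k x k window ending at it is ink."""
    h, w = len(img), len(img[0])

    def hits(y, x):
        return any(
            y - i >= 0 and x - j >= 0 and img[y - i][x - j]
            for i in range(k)
            for j in range(k)
        )

    return [[1 if hits(y, x) else 0 for x in range(w)] for y in range(h)]


def opening(img, k=2):
    """Erosion followed by dilation: removes specks smaller than the element."""
    return dilate(erode(img, k), k)


def closing(img, k=2):
    """Dilation followed by erosion: fills gaps narrower than the element."""
    return erode(dilate(img, k), k)
```

With this border handling, ink that touches the bottom or right image edge is eroded by `closing`, so a real pipeline would usually pad the image first or rely on the library's border options.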

Based on data preprocessing, the region of interest (ROI) in borehole images was extracted using a combination of layout analysis [32], frame line extraction [33], and the corner tagging method. The method effectively separates table content from text content, thus ensuring accurate extraction of subsequent textual information.

2.3. OCR Optimization Based on Multi-Layout Voting

The development of OCR engines for general domains has been rapid, achieving high efficiency and accuracy in tasks involving plain text images. For example, Paddle-OCR claims to reach an accuracy of around 95%. However, in practical recognition tasks, particularly in specialized fields such as geological borehole log images (the target domain of this study), the recognition accuracy falls significantly short of the advertised rates.

This study proposes an OCR optimization method based on input image layout optimization. After image preprocessing, the input text image was processed using horizontal and vertical projections to focus the text. Subsequently, the layout of the text in the image was optimized by adjusting the canvas padding. The processed image was then fed into the OCR system (Paddle-OCR and Tesseract-OCR were used in this study). Finally, multiple recognition results were obtained by inputting text images with different layouts, and a voting mechanism was used to optimize the final results. The specific process is as follows:

STEP 1: Text Line Focusing

First, potential segmentation locations were identified by projecting the image vertically and horizontally, and neighboring regions were merged to extract character regions (as shown in Figure 5a). In addition, the accuracy of recognition may be affected for text lines with large character spacing. Hence, they must be adjusted appropriately to optimize the recognition effect (as shown in Figure 5b).
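Projection-based focusing can be sketched as follows. The example assumes a binary image stored as nested lists in which ink pixels are 1 and background pixels are 0. It computes horizontal and vertical projection profiles, finds the non-empty runs, and merges runs separated by small gaps. The function names and the `max_gap` parameter are hypothetical and are not taken from the paper.

```python
def horizontal_projection(img):
    """Ink count per row; non-zero runs mark text lines."""
    return [sum(row) for row in img]


def vertical_projection(img):
    """Ink count per column; non-zero runs mark characters or character groups."""
    return [sum(col) for col in zip(*img)]


def runs(profile):
    """Return half-open (start, end) index ranges where the profile is non-zero."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def merge_close(spans, max_gap):
    """Merge neighboring spans whose gap is at most `max_gap` pixels."""
    merged = []
    for start, end in spans:
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged
```

Applying `runs` to the horizontal projection yields candidate text lines, and applying `runs` plus `merge_close` to the vertical projection of each line yields character regions whose gaps can then be inspected for overly wide spacing.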

STEP 2: Text Layout Adjustment

After the text image was segmented into lines in STEP 1, further layout adjustments were performed on the resulting text line images. The layout adjustment process includes both line text focusing and canvas padding. Line text focusing was mainly used to optimize text lines with overly wide word spacing. The method sets a threshold (H/4) to segment text with normal spacing and then reassembles the text blocks to restore reasonable word spacing, generating a more standardized text line image (Figure 6a). The purpose of canvas padding is to optimize the layout of the text in the image by changing the blank area around it. Consider a text line image of size H × W, where H denotes the height and W denotes the original width of the text line image. Based on OpenCV, the canvas padding of the text line image was adjusted using H as a benchmark to optimize the overall layout. Specifically, 2H(W + H) denotes resizing the canvas height to 2H and its width to W + H, where the expanded region is filled with a white background (as shown in Figure 6b). Other canvas padding adjustments follow this pattern. For each distinct text line image, a set of differently laid-out images is generated and subsequently fed into the OCR engine.
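The canvas padding step can be illustrated with a short sketch. It assumes that a layout labelled kH places the H × W text line on a white canvas of height kH and width W + (k − 1)H, so that 2H gives the 2H(W + H) case described above. Centering the text and extending the pattern to 3H and 4H are assumptions, as are the function names. The image is a nested list of grayscale values here. With OpenCV arrays, `cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=255)` would produce the same padding.

```python
WHITE = 255


def pad_canvas(line_img, scale):
    """Pad an H x W text-line image to height scale*H and width W + (scale-1)*H.

    The added area is white and split as evenly as possible between opposite sides.
    """
    h, w = len(line_img), len(line_img[0])
    extra = (scale - 1) * h
    top = left = extra // 2
    bottom = right = extra - extra // 2
    blank = [WHITE] * (w + extra)
    padded = [blank[:] for _ in range(top)]
    padded += [[WHITE] * left + list(row) + [WHITE] * right for row in line_img]
    padded += [blank[:] for _ in range(bottom)]
    return padded


def make_layouts(line_img, scales=(1, 2, 3, 4)):
    """Return one padded copy of the text line per canvas scale (1 = original H)."""
    return {scale: pad_canvas(line_img, scale) for scale in scales}
```

Each image in the returned dictionary would then be passed to the OCR engine separately, yielding one candidate string per layout for the voting step.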

STEP 3: Multi-layout recognition voting

After adjusting the layout of the text line image, it was fed into the OCR engine to obtain recognition results. Since different layouts may yield different recognition results, a voting mechanism was introduced to count and vote on the results obtained under multiple layouts to determine the final output. The voting mechanism works by counting the frequency of each recognition result across different layouts and selecting the one with the highest frequency as the final output. When calculating the frequency, if the recognition result of a certain layout is empty, it is not counted in the statistics. For example, if Layout 4 recognizes a text line as “black mica schist” while other layouts produce no result, the frequency of “black mica schist” is 1/1. The recognized text is then determined from the voting results and output. In the case of a tie during the voting process, the first among the tied results in the original recognition order is selected as the final output. If all recognition results are empty, the output is set as an empty string.
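The voting rules described in STEP 3 can be expressed compactly. The sketch below follows the stated rules (empty results are ignored, the most frequent result wins, ties go to the earliest result in recognition order, and an all-empty input yields an empty string). The function name `vote` and the romanized sample strings are illustrative only.

```python
from collections import Counter


def vote(results):
    """Pick the most frequent non-empty OCR result across layouts.

    Ties are resolved in favor of the result that appears first in `results`.
    Returns "" when every layout produced an empty result.
    """
    candidates = [r for r in results if r]
    if not candidates:
        return ""
    counts = Counter(candidates)
    best = max(counts.values())
    for result in candidates:
        if counts[result] == best:
            return result
    return ""


# Four of five layouts agree, so the majority reading is returned.
print(vote(["shi", "you", "shi", "shi", "shi"]))  # -> shi
```

Because empty results are dropped before counting, a single layout that succeeds where all others fail still determines the output, as in the “black mica schist” example.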

3. Experimental Data and Results

3.1. Experimental Data and Evaluation Metrics

(1): Experimental Data and Operating Environment

To verify the usability and generalizability of the proposed method, this study selected data from the Dayingezhuang gold deposit and the Dingjiashan lead-zinc deposit. A total of 47 borehole log images were used, including 34 images exported from MapGIS vectors of the Dayingezhuang gold deposit in “.png” format and 13 scanned paper borehole log images from the Dingjiashan lead-zinc deposit in “.jpg” format. Since the data come from different deposits, there are some differences in their organization and performance characteristics. The scanned paper images are of lower quality, with noise issues such as blurring, indentations, and smudging in some areas. In contrast, the MapGIS vector-exported images are of higher quality, featuring clear text and lines with no noticeable noise. Detailed data information is provided in Table 1.

The experiments in this study were conducted in a Windows 11 operating environment. The system has an Intel(R) Core(TM) i7-12700H CPU @ 2.30 GHz and an NVIDIA GeForce RTX 3060 Laptop GPU. The implementation was carried out using Python 3.7. Both PaddleOCR and Tesseract-OCR were locally deployed and accessed through Python interfaces during the experiments. The average processing time for a single borehole log image (including preprocessing, multi-layout generation, OCR recognition, and result integration) was approximately 3 min. This processing time may vary slightly depending on the complexity and resolution of the input image.

(2): Evaluation Metrics

To assess the OCR recognition results, this study employs several commonly used evaluation metrics. Since Paddle-OCR operates at the text-line level, all metrics are calculated based on the number of text lines. The primary evaluation metrics include Precision, Recall, and F1-Score.

Precision: The ratio of correctly identified text lines to the total number of recognized lines by the OCR system.

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{1}$$

Recall: The ratio of the number of lines accurately located and recognized by the OCR system to the number of all originally existing lines.

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{2}$$

Here, TP is the number of correctly located and correctly recognized text lines, FP is the number of correctly located but incorrectly recognized text lines, and FN is the number of missed or incorrectly recognized text lines.

F1-Score: A comprehensive metric that evaluates both the accuracy and completeness of the OCR system’s text positioning and recognition.

$$\mathrm{F1\text{-}Score} = \frac{2 \times (\mathrm{Precision} \times \mathrm{Recall})}{\mathrm{Precision} + \mathrm{Recall}} \tag{3}$$

3.2. Experimental Results

After segmenting the table cells and independent text blocks in the borehole log images, OCR recognition was performed on the extracted text images. In this study, Paddle-OCR and Tesseract-OCR were employed to recognize the text, and evaluation metrics were computed separately for each engine. Figure 7 presents recognition results (example) based on Multi-layout adjustment and multi-layout voting, where red boxes indicate missed or misrecognized text areas. By comparing results across different canvas padding sizes, the effectiveness of layout adjustment and multi-layout voting on improving recognition completeness and accuracy was visually demonstrated. Detailed statistics and accuracy evaluations are provided in Table 2.

The values in Table 2 are computed according to Equations (1)–(3) based on the total counts of TP, FP, and FN for each dataset and method. For example, when TP = 560, FP = 17, and FN = 19, the precision is calculated as 560/(560 + 17) ≈ 97.05%, recall as 560/(560 + 19) ≈ 96.72%, and the F1-score as 2 × (97.05% × 96.72%)/(97.05% + 96.72%) ≈ 96.89%. It is worth noting that for the Dayingezhuang dataset, the precision, recall, and F1-score of the Multi-layout voting (Paddle) method are all 97.96%. This is because, in this case, all output results from the comprehensive voting mechanism were correctly located characters, with no mislocated or incorrectly positioned text lines. The recognition results with a canvas padding size of H (default text height) represent the preprocessed outcomes without layout adjustment, and the “Multi-layout voting” results were generated by integrating recognition results from different layouts through a voting process.
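The worked example above can be reproduced with a few lines of code. This is a minimal sketch of Equations (1)–(3); the function name `prf` and the zero-division guards are additions for illustration.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, f1) as fractions computed from line-level counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


precision, recall, f1 = prf(tp=560, fp=17, fn=19)
print(f"{precision:.2%} {recall:.2%} {f1:.2%}")  # -> 97.05% 96.72% 96.89%
```

Note that precision and recall coincide whenever FP equals FN, which is consistent with the three identical values reported for the Dayingezhuang dataset.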

3.3. General Data Experiment

To further verify the effectiveness and generalizability of our proposed multi-layout voting method, we conducted comparative experiments on two additional datasets: 1000 street scene images and 1000 synthetic Chinese text images. Examples of both datasets are shown in Figure 8.

The experimental results are summarized in Table 3.

Note that the default layout in Paddle/Tesseract OCR uses a single predefined canvas padding size (H), and the “Multi-layout voting” results were generated by integrating recognition results from different layouts through a voting process.

As shown in Table 3, the multi-layout voting approach consistently improves the OCR performance across both datasets. For PaddleOCR, our method achieves 1.6–2.8 percentage points improvement in F1-score. Even for the baseline Tesseract engine, which performs poorly in challenging scenarios, our method boosts F1-score by approximately 2–4 percentage points, confirming its general applicability.

4. Experimental Analysis

From Table 2, Paddle-OCR demonstrates significantly higher overall accuracy than Tesseract-OCR. This difference is primarily due to the test data being a Chinese geological borehole log image, which contains numerous complex geological terms. Tesseract-OCR struggles with recognizing such domain-specific Chinese content. In addition to the borehole-specific datasets, we also evaluated the multi-layout voting strategy on general-purpose data, including street scene images and synthetic Chinese text. As shown in Table 3, the proposed method improves the F1-score by 1.6–2.8 percentage points for PaddleOCR and by 2–4 percentage points for Tesseract. Notably, the baseline recognition performance (without multi-layout) of both PaddleOCR and Tesseract on these datasets aligns with the results reported in recent OCR benchmarking studies, such as the 2024 open-source evaluation of 12 OCR engines [34]. This consistency confirms the reliability of our experimental results. Moreover, our multi-layout voting strategy demonstrates additional performance gains on top of these already-representative baselines, thereby validating its effectiveness in improving OCR accuracy even on general datasets.

As shown in Table 2, recognition accuracy changes as the original layout H was gradually adjusted to 2H, 3H, and 4H. The overall trend shows that increasing canvas padding sizes gradually improves recognition accuracy, indicating that layout adjustment contributes to better text recognition in borehole log images. However, the accuracy gains stabilize beyond a certain range with minor variations. After applying the multi-layout voting strategy, recognition accuracy, recall, and F1 score show significant improvements over the results of layout adjustment alone. For instance, in the Dayingezhuang gold deposit dataset, the “Multi-layout voting (Paddle)” increases the F1 score from 95.93% to 97.96%, while the “Multi-layout voting (Tesseract)” improves the F1 score from 59.88% to 61.82%. These results confirm the effectiveness of this optimization strategy in enhancing text recognition accuracy for borehole log images.

Table 2 illustrates that “Multi-layout voting (Paddle)” and “Multi-layout voting (Tesseract)” achieved the highest recognition performance on the Dayingezhuang dataset, with F1 scores of 97.96% and 61.82%, respectively. This outcome is attributed to the poor quality of the Dingjiashan dataset, which has some noise issues. In contrast, the Dayingezhuang dataset exhibits more uniform text distribution, with standardized line and word spacing, which contributes to improved recognition performance. These findings indicate that the quality of the original data significantly impacts text recognition accuracy.

As shown in Table 2, for results with high accuracy after preprocessing, optimizing the canvas padding yields only marginal improvements in recognition accuracy. For instance, on the Dayingezhuang dataset, the F1 score increases by only about 2 percentage points (from 95.93% to 97.96%). In contrast, when preprocessing accuracy is lower, comprehensive adjustments to the canvas padding sizes result in more significant improvements in recognition performance. For example, on the Dingjiashan dataset, the F1 score increases by approximately 10 percentage points. This suggests that the integrated multi-layout adjustment voting strategy has a more pronounced enhancement effect on cases with lower initial recognition accuracy.

When comparing the results in Table 2 and Table 3, it is evident that Tesseract consistently underperforms relative to PaddleOCR across all datasets. This performance gap is particularly pronounced in the street scene image dataset in Table 3, where Tesseract’s F1-score remains below 16%, compared to over 70% for PaddleOCR. This substantial difference can be attributed to several factors. First, Tesseract was primarily designed for printed documents and lacks robust pretraining for complex real-world scenes with distorted or cluttered text layouts. Second, its support for Chinese character sets—especially in scene text where font, orientation, and background vary greatly—is limited compared to deep learning-based engines like PaddleOCR. As a result, Tesseract struggles to generalize to unconstrained environments such as natural scene text, which often involves noisy backgrounds, irregular lighting, and artistic or non-standard fonts. This explains the large performance gap observed in Table 3 and further underscores the advantage of using more modern, learning-based OCR engines in real-world applications.

In summary, the multi-layout adjustment voting optimization strategy significantly improves the precision, recall, and F1-score of OCR engines on borehole log images, with recall showing the most notable enhancement. This approach not only ensures high recognition accuracy but also significantly enhances the completeness of the recognition results. Despite overall improvements, the proposed method still has some limitations. Recognition errors may occur in areas with poor image quality, overlapping symbols, or irregular text backgrounds. For instance, in the Dingjiashan dataset, geological symbols placed close to Chinese characters sometimes lead to misrecognition, particularly with Tesseract-OCR. In addition, slanted or occluded text lines can cause segmentation failures and reduce recognition accuracy. These cases suggest that the method’s effectiveness may be limited under severe noise or non-standard layouts. Future work will consider integrating post-processing techniques, such as language models or domain-specific correction tools, to enhance robustness.

5. Conclusions

This study proposed a multi-layout adjustment voting optimization strategy that integrates image processing with optical text recognition techniques, improving text recognition accuracy and recall for borehole log images without modifying the OCR algorithm or applying post-processing. During the text recognition process, multiple OCR results were generated by adjusting the text layout, and a voting mechanism was employed to merge these results, thereby significantly enhancing accuracy. While this approach is effective, the increased computational cost introduced by repeated OCR passes may become a concern when applied to large-scale datasets. Despite these results, certain limitations remain. For instance, image quality and slicing errors may impact recognition performance, as inaccurate cell segmentation due to image degradation can compromise results. To mitigate this issue, post-processing can be applied following ROI slicing, incorporating a text-word model and corpus-based corrections to refine recognition outcomes and enhance information completeness. Additionally, introducing a post-processing step after text recognition is expected to further improve accuracy beyond the initial recognition results. Future research will focus on refining the proposed approach to enhance its adaptability, accuracy, and processing efficiency in complex environments, as well as exploring its potential applications in other domains and languages.

Author Contributions

Z.G.: methodology, software, formal analysis, and writing—original draft. Y.G.: investigation, software, validation, and writing. J.D.: conceptualization, methodology, funding acquisition, writing, and supervision. H.A.F.: software, formal analysis, and writing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant Number: 42172330).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are confidential and therefore cannot be shared.

Conflicts of Interest

Author Yiwei Guo was employed by the company Guangzhou Urban Planning & Surveying & Designing Research Institute Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figure 1. Workflow of Multi-Layout OCR and Voting for Borehole Log Text Recognition.

Figure 2. Image Binarization. Title text: “丁家山铅锌矿床ZK1501钻孔柱状图”/“Pinyin: Ding Jia Shan Qian Xin Kuang Chuang ZK1501 Zuan Kong Zhu Zhuang Tu”/“English: ZK1501 borehole histogram of Dingjiashan lead-zinc deposit”.

Figure 3. Image Skew Correction Process.

Figure 4. Schematic of Erosion and Dilation. Text: “绿泥”/“Pinyin: Lv Ni”/“English: Green mud”.

Figure 5. Results of text line focusing. Text1 (a): “2, 变辉长岩灰绿色, 细粒柱状变晶结构, 块状构造。主要组成矿物为斜长石, 角闪石, 辉石, 岩石坚硬完整, 与轴心夹角0°, 23–40°, 70–80°三组裂缝内见毫米级碳酸盐充填。”/“Pinyin: 2, Bian Hui Chang Yan Hui Lv Se, Xi Li Zhu Zhuang Bian Jing Jie Gou, Kuai Zhuang Gou Zao. Zhu Yao Zu Cheng Kuang Wu Wei Xie Chang Shi, Jiao Shan Shi, Hui Shi, Yan Shi Jian Ying Wan Zheng, Yu Zhou Xin Jia Jiao 0°, 23–40°, 70–80° San Zu Lie Feng Nei Jian Hao Mi Ji Tan Suan Yan Chong Tian.”/“English: 2. Metamorphic gabbro: gray-green, with fine-grained columnar crystal structure and blocky structure. The main mineral components are plagioclase, hornblende, and pyroxene. The rock is hard and intact, with millimeter-sized carbonate filling in three sets of fractures at angles of 0°, 23–40°, and 70–80° to the axis.”. Text2 (b): “岩心描述”/“Pinyin: Yan Xin Miao Shu”/“English: Core Description”.

Figure 6. Text layout adjustment examples. Text1 (a): “岩心描述”/“Pinyin: Yan Xin Miao Shu”/“English: Core Description”. Text2 (b): “石英绿帘石片岩”/“Pinyin: Shi Ying Lv Lian Shi Pian Yan”/“English: Quartz epidote schist”.

Figure 7. Example of multi-layout recognition results. Text: “3, 英云闪长岩: 浅灰白色, 粒状结构, 块状结构，矿物成份有: 长石, 石英, 云母, 绿泥石, 碳酸盐。30.20–30.40米为构造裂隙，充填破裂岩。33.49–50.50米, 节理发育, 较直立, 面上分布褐色铁质物及碳酸盐。”/“Pinyin: 3, Ying Yun Shan Chang Yan Qian Hui Bai Se, Li Zhuang Jie Gou, Kuai Zhuang Jie Gou, Kuang Wu Cheng Fen You: Chang Shi, Shi Ying, Yun Mu, Lv Ni Shi, Tan Suan Yan 30.20–30.40 Mi Wei Gou Zao Lie Xi, Chong Tian Po Lie Yan 33.49–50.50 Mi, Jie Li Fa Yu, Jiao Zhi Li, Mian Shang Fen Bu He Se Tie Zhi Wu Ji Tan Suan Yan”/“English: 3. Tonalite: light gray-white, granular structure, blocky structure; mineral composition includes feldspar, quartz, mica, chlorite, and carbonate. 30.20–30.40 m are structural fractures filled with fractured rocks. 33.49–50.50 m, with developed joints, relatively upright, and brown iron and carbonate distributed on the surface”.

Figure 8. Examples of street scenes and synthetic Chinese text images. Text: “2楼佳妍咖啡厅”/“Pinyin: 2 lou Jia yan ka fei ting”/“English: 2nd Floor Jiayan Cafe”; “24h自助”/“Pinyin: 24h zi zhu”/“English: 24h Self-service”; “鼻部冲洗”/“Pinyin: bi bu chong xi”/“English: Nasal Rinse”; “佳鑫广告”/“Pinyin: Jia xin guang gao”/“English: Jiaxin Advertising”; “天缘日化”/“Pinyin: Tian yuan ri hua”/“English: Tianyuan Daily Chemicals”; “宏盛玻璃”/“Pinyin: Hong sheng bo li”/“English: Hongsheng Glass”; “全新公寓招租”/“Pinyin: Quan xin gong yu zhao zu”/“English: Brand-new Apartment for Rent”; “螺蛳粉”/“Pinyin: luo si fen”/“English: Luosifen (River Snail Rice Noodles)”;“博悦钢琴”/“Pinyin: Bo yue gang qin”/“English: Boyue Piano”; “两桌，亲朋好”/“Pinyin: Liang zhuo, qin peng hao”/“English: Two tables, good for friends and family”; “岩性为细砂与粉砂互层”/“Pinyin: Yan xing wei xi sha yu fen sha hu ceng”/“English: Lithology consists of interbedded fine sand and silt”; “我的左手的小手指和中”/“Pinyin: Wo de zuo shou de xiao shou zhi he zhong”/“English: My left hand’s little finger and middle… (incomplete phrase)”; “可以买一个MP3或去”/“Pinyin: Ke yi mai yi ge MP3 huo qu”/“English: You can buy an MP3 or go… (incomplete phrase)”; “老套”/“Pinyin: Lao tao”/“English: Old-fashioned/Cliche”; “6. 说话就更节省”/“Pinyin: Shuo hua jiu geng jie sheng”/“English: Talking saves more (words/money)”; “整的空地。鲁”/“Pinyin: Zheng de kong di. Lu”/“English: Vacant land. Lu (incomplete/unclear phrase)”; “机遇往往有这样的特点”/“Pinyin: Ji yu wang wang you zhe yang de te dian”/“English: Opportunities often have such characteristics”; “就像他拥有这绝招一”/“Pinyin: Jiu xiang ta yong you zhe jue zhao yi”/“English: Just like he has this unique trick… (incomplete phrase)”.

Table 1. Detailed Information on experimental data.

Data Source	Data Type	Storage Format	File Count	Resolution	Noise Presence
Dayingezhuang	Vector Export	.png	34	300 DPI	No
Dingjiashan	Scanned Output	.jpg	13	300 DPI	Yes

Table 2. Accuracy evaluation of multi-layout voting results.

Data Source	OCR Engine	Canvas Padding	Precision	Recall	F1-Score
Dayingezhuang	Multi-layout (Paddle)	H	96.14%	95.82%	95.93%
		2H	97.05%	96.72%	96.89%
		3H	96.82%	96.50%	96.66%
		4H	97.61%	97.45%	97.53%
	Multi-layout voting (Paddle)	-	97.96%	97.96%	97.96%
	Multi-layout (Tesseract)	H	60.12%	59.65%	59.88%
		2H	60.85%	60.36%	60.60%
		3H	60.35%	60.11%	60.23%
		4H	61.21%	60.13%	60.67%
	Multi-layout voting (Tesseract)	-	62.19%	61.46%	61.82%
Dingjiashan	Multi-layout (Paddle)	H	85.60%	80.80%	83.13%
		2H	84.14%	77.48%	80.67%
		3H	85.60%	79.98%	82.70%
		4H	92.25%	86.60%	89.33%
	Multi-layout voting (Paddle)	-	94.57%	94.16%	94.36%
	Multi-layout (Tesseract)	H	53.21%	52.64%	52.92%
		2H	52.34%	52.17%	52.25%
		3H	53.15%	52.84%	52.99%
		4H	54.26%	53.25%	53.75%
	Multi-layout voting (Tesseract)	-	56.34%	56.12%	56.23%
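
The F1-scores of the voting rows in Table 2 can be checked against the reported precision and recall. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall, $F_1 = 2PR/(P+R)$; the function name f1_score is introduced only for this example, and the numbers are copied from the voting rows of Table 2.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Voting rows from Table 2: (dataset, engine, precision, recall, reported F1).
voting_rows = [
    ("Dayingezhuang", "Paddle", 97.96, 97.96, 97.96),
    ("Dayingezhuang", "Tesseract", 62.19, 61.46, 61.82),
    ("Dingjiashan", "Paddle", 94.57, 94.16, 94.36),
    ("Dingjiashan", "Tesseract", 56.34, 56.12, 56.23),
]

for dataset, engine, p, r, reported in voting_rows:
    # Print the recomputed value next to the reported one for comparison.
    print(dataset, engine, round(f1_score(p, r), 2), reported)
```

Under this assumption the four voting rows reproduce the reported F1-scores to two decimal places. Some single-layout rows differ from the harmonic mean by a few hundredths of a percentage point, which this sketch does not attempt to explain.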

Table 3. Accuracy Evaluation on Street Scene and Synthetic Chinese Datasets.

Data Source	OCR Engine	Precision	Recall	F1-Score
Street scene images	Paddle	70.23%	69.88%	70.05%
	Multi-layout voting (Paddle)	72.15%	71.13%	71.64%
	Tesseract	17.32%	13.94%	15.45%
	Multi-layout voting (Tesseract)	20.12%	15.36%	17.42%
Synthetic Chinese text images	Paddle	83.37%	81.45%	82.40%
	Multi-layout voting (Paddle)	86.13%	84.35%	85.23%
	Tesseract	52.36%	48.98%	50.61%
	Multi-layout voting (Tesseract)	56.02%	53.32%	54.64%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
