Search Results (89)

Search Parameters:
Keywords = OCR accuracy

22 pages, 9762 KiB  
Article
A Map Information Collection Tool for a Pedestrian Navigation System Using Smartphone
by Kadek Suarjuna Batubulan, Nobuo Funabiki, Komang Candra Brata, I Nyoman Darma Kotama, Htoo Htoo Sandi Kyaw and Shintami Chusnul Hidayati
Information 2025, 16(7), 588; https://doi.org/10.3390/info16070588 - 8 Jul 2025
Viewed by 284
Abstract
Pedestrian navigation systems on smartphones have become popular tools for reaching unfamiliar destinations. When the destination is a person's office, detailed map information about the target area is necessary, such as the room number and its location inside the building. This information can be collected from various sources, including Google Maps, building websites, and images of signs. In this paper, we propose a map information collection tool for a pedestrian navigation system. To improve the accuracy and completeness of the information, the tool works in four steps: (1) a user manually captures building and room images; (2) OCR software using Google ML Kit v2 processes the images to extract sign information; (3) web scraping with Scrapy (v2.11.0) and crawling with Apache Nutch (v1.19) collect additional details, such as room numbers, facilities, and occupants, from relevant websites; and (4) the collected data are stored in a database to be integrated with a pedestrian navigation system. To evaluate the proposed tool, map information was collected for 10 buildings at Okayama University, Japan, a representative environment that combines complex indoor layouts (e.g., interconnected corridors and multi-floor facilities) with high pedestrian traffic, both critical for testing real-world navigation challenges. The collected data were assessed for completeness and effectiveness. A university campus was selected because it presents a complex indoor and outdoor environment that is ideal for testing pedestrian navigation in real-world scenarios. Using the obtained map information, 10 users successfully reached their destinations with the navigation system, and System Usability Scale (SUS) questionnaire results confirmed high usability. Full article
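Step (4) of the pipeline, storing the collected sign and web data for later use by the navigation system, can be sketched with a small relational schema. The table layout and field names below are illustrative assumptions, not the paper's actual schema:

```python
import sqlite3

def store_rooms(conn, records):
    """Insert collected map records; duplicates (same building and room
    number, e.g. one record from OCR and one from web scraping) are merged
    by keeping the most recently inserted version."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS rooms (
               building TEXT, room_no TEXT, occupant TEXT, source TEXT,
               PRIMARY KEY (building, room_no))"""
    )
    conn.executemany(
        "INSERT OR REPLACE INTO rooms VALUES (?, ?, ?, ?)",
        [(r["building"], r["room_no"], r.get("occupant", ""), r["source"])
         for r in records],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
store_rooms(conn, [
    {"building": "Eng. Bldg. 2", "room_no": "201", "occupant": "Funabiki Lab", "source": "ocr"},
    {"building": "Eng. Bldg. 2", "room_no": "201", "occupant": "Funabiki Lab", "source": "web"},
])
count = conn.execute("SELECT COUNT(*) FROM rooms").fetchone()[0]
```

A real deployment would add fields for floor, coordinates, and collection timestamps, but the dedup-by-key idea carries over.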
(This article belongs to the Special Issue Feature Papers in Information in 2024–2025)

27 pages, 1417 KiB  
Article
A BERT-Based Multimodal Framework for Enhanced Fake News Detection Using Text and Image Data Fusion
by Mohammed Al-alshaqi, Danda B. Rawat and Chunmei Liu
Computers 2025, 14(6), 237; https://doi.org/10.3390/computers14060237 - 16 Jun 2025
Viewed by 1125
Abstract
The spread of fake news on social media is complicated by the fact that false information propagates extremely fast in both textual and visual formats. Traditional detection approaches focus mainly on text or image features in isolation, thereby missing valuable information that spans both modalities. In response, we propose a BERT-based multimodal fake news detection method that augments the article text with text extracted from images through Optical Character Recognition (OCR). BERT_base_uncased processes the fused input and produces a confidence score indicating the probability that the news is authentic. We report extensive experimental results on the ISOT, WELFAKE, TRUTHSEEKER, and ISOT_WELFAKE_TRUTHSEEKER datasets. Our proposed model demonstrates better generalization on the TRUTHSEEKER dataset with an accuracy of 99.97%, achieving substantial improvements over existing methods with an F1-score of 0.98. Experimental results indicate a potential accuracy increment of +3.35% over the latest baselines. These results highlight the potential of our approach as a strong resource for automatic fake news detection that effectively integrates textual and visual data streams. The findings also suggest that training on diverse datasets enhances the resilience of detection systems against misinformation strategies. Full article
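The fusion step, appending OCR-extracted image text to the article text before feeding BERT, amounts to input construction. The `[CLS]`/`[SEP]` layout follows standard BERT conventions; the whitespace tokenization and truncate-the-article-first policy are simplifying assumptions for illustration:

```python
def fuse_inputs(article_text, ocr_text, max_tokens=512):
    """Build a single BERT-style input from article text and OCR text.
    The article is truncated first so the OCR evidence always survives.
    Whitespace splitting stands in for WordPiece; a real pipeline would
    use the BERT tokenizer."""
    ocr_tokens = ocr_text.split()
    # Reserve room for [CLS], two [SEP] markers, and the OCR tokens.
    budget = max_tokens - 3 - len(ocr_tokens)
    art_tokens = article_text.split()[:max(budget, 0)]
    return ["[CLS]"] + art_tokens + ["[SEP]"] + ocr_tokens + ["[SEP]"]

seq = fuse_inputs("Breaking news about the election result", "VOTE RIGGED banner")
```

The two-segment layout lets the model attend across modalities while keeping the total sequence inside BERT's 512-token limit.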
(This article belongs to the Special Issue Recent Advances in Social Networks and Social Media)

28 pages, 5387 KiB  
Article
A Deep Learning Framework of Super Resolution for License Plate Recognition in Surveillance System
by Pei-Fen Tsai, Jia-Yin Shiu and Shyan-Ming Yuan
Mathematics 2025, 13(10), 1673; https://doi.org/10.3390/math13101673 - 20 May 2025
Viewed by 1145
Abstract
Recognizing low-resolution license plates from real-world scenes remains a challenging task. While deep learning-based super-resolution methods have been widely applied, most existing datasets rely on artificially degraded images, and common quality metrics poorly correlate with OCR accuracy. We construct a new paired low- and high-resolution license plate dataset from dashcam videos and propose a specialized super-resolution framework for license plate recognition. Only low-resolution images with OCR accuracy ≥5 are used to ensure sufficient feature information for effective perceptual learning. We analyze existing loss functions and introduce two novel perceptual losses—one CNN-based and one Transformer-based. Our approach improves recognition performance, achieving an average OCR accuracy of 85.14%. Full article
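The abstract reports OCR accuracy per plate (and a ≥5 threshold for selecting training images), which plausibly counts correctly recognized characters. A minimal sketch under that assumption; the counting rule is a guess, not taken from the paper:

```python
def ocr_char_matches(predicted, truth):
    """Count positionally matching characters between an OCR result and
    the ground-truth plate string."""
    return sum(p == t for p, t in zip(predicted, truth))

def mean_plate_accuracy(pairs):
    """Average fraction of correct characters over (predicted, truth) pairs."""
    fracs = [ocr_char_matches(p, t) / len(t) for p, t in pairs]
    return sum(fracs) / len(fracs)

# One perfect read and one with a single confused character ('Z' for '2').
acc = mean_plate_accuracy([("ABC1234", "ABC1234"), ("ABC1Z34", "ABC1234")])
```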
(This article belongs to the Section E1: Mathematics and Computer Science)

31 pages, 4226 KiB  
Article
Raster Image-Based House-Type Recognition and Three-Dimensional Reconstruction Technology
by Jianbo Chang, Yunlei Lv, Jian Wang, Hao Pang and Yaqiu Liu
Buildings 2025, 15(7), 1178; https://doi.org/10.3390/buildings15071178 - 3 Apr 2025
Viewed by 701
Abstract
The automatic identification and three-dimensional reconstruction of house plans have emerged as a significant research direction in intelligent building and smart city applications. Three-dimensional models reconstructed from two-dimensional floor plans provide more intuitive visualization for building safety assessments and spatial suitability evaluations. To address the limitations of existing public datasets—including low quality, inaccurate annotations, and poor alignment with residential architecture characteristics—this study constructs a high-quality vector dataset of raster house plans. We collected and meticulously annotated over 5000 high-quality floor plans representative of urban housing typologies, covering the majority of common residential layouts in the region. For architectural element recognition, we propose a key point-based detection approach for walls, doors, windows, and scale indicators. To improve wall localization accuracy, we introduce CPN-Floor, a method that achieves precise key point detection of house plan primitives. By generating and filtering candidate primitives through axial alignment rules and geometric constraints, followed by post-processing to refine the positions of walls, doors, and windows, our approach achieves over 87% precision and 88% recall, with positional errors within 1% of the floor plan's dimensions. Scale recognition combines YOLOv8 with Shi–Tomasi corner detection to identify measurement endpoints, while leveraging the pre-trained multimodal OFA-OCR model for digital character recognition; this integrated solution achieves scale calculation accuracy exceeding 95%. Finally, we design and implement a house-type recognition and 3D reconstruction system based on the WebGL framework, using the front-end MVC design pattern to manage the house model's data and views. The system supports rendering of the reconstructed walls, doors, and windows; user interaction with the reconstructed house model; and a history of model operations, such as forward and backward (undo/redo) functions. Full article
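The scale-recognition step — detected measurement endpoints plus an OCR-read dimension label — reduces to simple arithmetic once both pieces are available. A sketch, assuming the label is in millimetres:

```python
import math

def scale_mm_per_px(endpoint_a, endpoint_b, labelled_mm):
    """Given two measurement endpoints in the raster plan (pixel
    coordinates, e.g. from corner detection) and the dimension value
    read by OCR (assumed millimetres), return the plan scale in mm/px."""
    dx = endpoint_b[0] - endpoint_a[0]
    dy = endpoint_b[1] - endpoint_a[1]
    return labelled_mm / math.hypot(dx, dy)

# A 400 px horizontal span labelled "3600" gives 9 mm per pixel,
# so any wall measured in pixels converts directly to millimetres.
scale = scale_mm_per_px((100, 40), (500, 40), 3600.0)
wall_mm = 250 * scale
```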
(This article belongs to the Special Issue Information Technology in Building Construction Management)

14 pages, 423 KiB  
Article
A Small-Scale Evaluation of Large Language Models Used for Grammatical Error Correction in a German Children’s Literature Corpus: A Comparative Study
by Phuong Thao Nguyen, Bernd Nuss, Roswita Dressler and Katie Ovens
Appl. Sci. 2025, 15(5), 2476; https://doi.org/10.3390/app15052476 - 25 Feb 2025
Viewed by 1153
Abstract
Grammatical error correction (GEC) has become increasingly important for enhancing the quality of OCR-scanned texts. This small-scale study explores the application of Large Language Models (LLMs) for GEC in German children’s literature, a genre with unique linguistic challenges due to modified language, colloquial expressions, and complex layouts that often lead to OCR-induced errors. While conventional rule-based and statistical approaches have been used in the past, advancements in machine learning and artificial intelligence have introduced models capable of more contextually nuanced corrections. Despite these developments, limited research has been conducted on evaluating the effectiveness of state-of-the-art LLMs, specifically in the context of German children’s literature. To address this gap, we fine-tuned encoder-based models GBERT and GELECTRA on German children’s literature, and compared their performance to decoder-based models GPT-4o and Llama series (versions 3.2 and 3.1) in a zero-shot setting. Our results demonstrate that all pretrained models, both encoder-based (GBERT, GELECTRA) and decoder-based (GPT-4o, Llama series), failed to effectively remove OCR-generated noise in children’s literature, highlighting the necessity of a preprocessing step to handle structural inconsistencies and artifacts introduced during scanning. This study also addresses the lack of comparative evaluations between encoder-based and decoder-based models for German GEC, with most prior work focusing on English. Quantitative analysis reveals that decoder-based models significantly outperform fine-tuned encoder-based models, with GPT-4o and Llama-3.1-70B achieving the highest accuracy in both error detection and correction. Qualitative assessment further highlights distinct model behaviors: GPT-4o demonstrates the most consistent correction performance, handling grammatical nuances effectively while minimizing overcorrection. 
Llama-3.1-70B excels in error detection but occasionally relies on frequency-based substitutions over meaning-driven corrections. Unlike earlier decoder-based models, which often exhibited overcorrection tendencies, our findings indicate that state-of-the-art decoder-based models strike a better balance between correction accuracy and semantic preservation. By identifying the strengths and limitations of different model architectures, this study enhances the accessibility and readability of OCR-scanned German children’s literature. It also provides new insights into the role of preprocessing in digitized text correction, the comparative performance of encoder- and decoder-based models, and the evolving correction tendencies of modern LLMs. These findings contribute to language preservation, corpus linguistics, and digital archiving, offering an AI-driven solution for improving the quality of digitized children’s literature while ensuring linguistic and cultural integrity. Future research should explore multimodal approaches that integrate visual context to further enhance correction accuracy for children’s books with image-embedded text. Full article
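Error detection versus correction, the two accuracies compared above, can be sketched at the token level. The one-to-one alignment of noisy, corrected, and gold tokens is a simplifying assumption (real GEC evaluation typically uses alignment-based scorers such as ERRANT):

```python
def gec_scores(noisy, corrected, gold):
    """Of the positions the OCR noise corrupted, return the fraction the
    model changed at all (detection) and the fraction it fixed exactly
    (correction). All three token lists are assumed aligned."""
    errors = [i for i, (n, g) in enumerate(zip(noisy, gold)) if n != g]
    detected = [i for i in errors if corrected[i] != noisy[i]]
    fixed = [i for i in errors if corrected[i] == gold[i]]
    return len(detected) / len(errors), len(fixed) / len(errors)

# Toy example with OCR-style digit/letter confusions in German text.
noisy = "Der kleine B2r schlief im Wa1d".split()
gold = "Der kleine Bär schlief im Wald".split()
out = "Der kleine Bär schlief im Welt".split()   # second fix is wrong
detection, correction = gec_scores(noisy, out, gold)
```

The gap between the two numbers captures exactly the overcorrection-versus-miss trade-off the study discusses.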
(This article belongs to the Special Issue Applications of Natural Language Processing to Data Science)

32 pages, 9587 KiB  
Article
A Layered Framework for Universal Extraction and Recognition of Electrical Diagrams
by Weiguo Cao, Zhong Chen, Congying Wu and Tiecheng Li
Electronics 2025, 14(5), 833; https://doi.org/10.3390/electronics14050833 - 20 Feb 2025
Cited by 1 | Viewed by 1003
Abstract
Secondary systems in electrical engineering often rely on traditional CAD software (AutoCAD v2024.1.6) or non-structured, paper-based diagrams for fieldwork, posing challenges for digital transformation. Electrical diagram recognition technology bridges this gap by converting traditional diagram operations into a “digital” model, playing a critical role in power system scheduling, operation, and maintenance. However, conventional recognition methods, which primarily rely on partition detection, face significant limitations such as poor adaptability to diverse diagram styles, interference among recognition objects, and reduced accuracy in handling complex and varied electrical diagrams. This paper introduces a novel layered framework for electrical diagram recognition that sequentially extracts the element layer, text layer, and connection relationship layer to address these challenges. First, an improved YOLOv7 model, combined with a multi-scale sliding window strategy, is employed to accurately segment large and small diagram objects. Next, PaddleOCR, trained with electrical-specific terminology, and PaddleClas, using multi-angle classification, are utilized for robust text recognition, effectively mitigating interference from diagram elements. Finally, clustering and adaptive FcF-inpainting algorithms are applied to repair the connection relationship layer, resolving local occlusion issues and enhancing the overall coupling of the diagram. Experimental results demonstrate that the proposed method outperforms existing approaches in robustness and universality, particularly for complex diagrams, providing technical support for intelligent power grid construction and operation. Full article
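The multi-scale sliding-window strategy paired with the improved YOLOv7 can be sketched as crop generation over the diagram image; the window size and stride here are illustrative, not the paper's values:

```python
def sliding_windows(width, height, win, stride):
    """Generate (x, y, w, h) crops covering an image with overlap.
    The final window in each row/column is shifted back so every crop
    stays fully inside the image (no padding needed)."""
    last_x = max(width - win, 0)
    last_y = max(height - win, 0)
    xs = list(range(0, last_x + 1, stride))
    ys = list(range(0, last_y + 1, stride))
    if xs[-1] != last_x:
        xs.append(last_x)
    if ys[-1] != last_y:
        ys.append(last_y)
    return [(x, y, win, win) for y in ys for x in xs]

wins = sliding_windows(1000, 600, 512, 256)
```

Running the detector per crop and merging detections (e.g. with NMS) is what lets small symbols survive downscaling in large diagrams.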

25 pages, 5090 KiB  
Article
Research on Intelligent Verification of Equipment Information in Engineering Drawings Based on Deep Learning
by Zicheng Zhang and Yurou He
Electronics 2025, 14(4), 814; https://doi.org/10.3390/electronics14040814 - 19 Feb 2025
Viewed by 724
Abstract
This paper focuses on the crucial task of automatic recognition and understanding of table structures in engineering drawings and document processing. Given the importance of tables in information display and the urgent need for automated table processing during digitalization, an intelligent verification method is proposed. The method integrates multiple key techniques. YOLOv10 is used for table object recognition, achieving a precision of 0.891, a recall of 0.899, an mAP50 of 0.922, and an mAP50-95 of 0.677, demonstrating strong target detection capabilities. An improved LORE algorithm extracts table structures, overcoming the original algorithm's limitations by segmenting large images; its table extraction accuracy reaches 91.61%, significantly improving the handling of complex tables. RapidOCR performs text recognition and cell correspondence, solving the text-cell matching problem. For equipment name semantic matching, a BERT-based method is introduced, with matches ranked by a comprehensive scoring method; an improved cuckoo search algorithm optimizes the adjustment factors, avoiding local optima through sine optimization and the catfish effect. Experiments show that the accuracy of equipment name matching in the semantic similarity calculation approaches 100%. Finally, the paper presents a concrete system implementation that demonstrates the algorithm's effectiveness. In conclusion, through experimental comparisons, this method exhibits excellent performance in table area location, structure recognition, and semantic matching, and it is of practical value in advancing table data processing for engineering drawings. Full article
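The text-cell correspondence step — assigning OCR text boxes to recognized cells — is commonly done by centre-point containment. A sketch under that assumption (the paper does not specify its matching rule here):

```python
def assign_text_to_cells(cells, text_boxes):
    """Match OCR text boxes to table cells: a box belongs to the cell
    containing its centre point. Cells are (x1, y1, x2, y2) rectangles
    from a structure-recognition step; boxes are (x1, y1, x2, y2, text)."""
    result = {i: [] for i in range(len(cells))}
    for x1, y1, x2, y2, text in text_boxes:
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        for i, (a, b, c, d) in enumerate(cells):
            if a <= cx <= c and b <= cy <= d:
                result[i].append(text)
                break  # each box lands in at most one cell
    return result

cells = [(0, 0, 100, 50), (100, 0, 200, 50)]
boxes = [(10, 10, 60, 30, "Pump"), (110, 12, 190, 28, "P-101"),
         (250, 5, 300, 20, "stray")]   # outside the table, dropped
matched = assign_text_to_cells(cells, boxes)
```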
(This article belongs to the Section Artificial Intelligence)

31 pages, 6157 KiB  
Article
A Self-Adaptive Traffic Signal System Integrating Real-Time Vehicle Detection and License Plate Recognition for Enhanced Traffic Management
by Manar Ashkanani, Alanoud AlAjmi, Aeshah Alhayyan, Zahraa Esmael, Mariam AlBedaiwi and Muhammad Nadeem
Inventions 2025, 10(1), 14; https://doi.org/10.3390/inventions10010014 - 5 Feb 2025
Cited by 4 | Viewed by 4801
Abstract
Traffic management systems play a crucial role in smart cities, especially as growing urban populations lead to higher traffic volumes, increased congestion at intersections, delays, and traffic violations. This paper proposes an adaptive traffic control and optimization system that dynamically adjusts signal timings in response to real-time traffic conditions and volumes by applying machine learning algorithms to images captured by video surveillance cameras. The system can also capture the details of vehicles violating signals, which helps enforce traffic rules. Benefiting from advances in computer vision, we deployed a recent real-time object detection model, YOLOv11, to detect vehicles and adjust the duration of green signals, and used Tesseract OCR to extract license plate information, ensuring robust traffic monitoring and enforcement. A web-based real-time digital twin complements the system by visualizing traffic volume and signal timings for monitoring and optimizing traffic flow. Experimental results demonstrated that YOLOv11 achieved better overall accuracy (95.1%) and efficiency than previous models. The proposed solution reduces congestion and improves traffic flow across intersections while offering a scalable, cost-effective approach to smart traffic management that also lowers greenhouse gas emissions. Full article
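Mapping a detected vehicle count to a green-phase duration can be sketched as a clamped linear rule; the constants below are illustrative assumptions, as the abstract does not give the timing formula:

```python
def green_duration(vehicle_count, base=10, per_vehicle=2, max_green=60):
    """Return a green-phase duration in seconds: a fixed base plus a
    per-vehicle increment, clamped so one saturated approach cannot
    starve the other directions of the intersection."""
    return min(base + per_vehicle * vehicle_count, max_green)

light = green_duration(8)    # moderate queue
heavy = green_duration(40)   # saturated approach, hits the clamp
```

In the full system the count would come from the detector's per-frame output for the lane area, re-evaluated each cycle.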

18 pages, 7527 KiB  
Article
Simulation and Experimental Study on the Oil Circulation Rate (OCR) of R290 Electrical Vehicle Compressors
by Jianhong Chen, Leren Tao, Lihao Huang, Xiaofei Wang, Xingjiang Li and Haonan Chen
Appl. Sci. 2025, 15(3), 1391; https://doi.org/10.3390/app15031391 - 29 Jan 2025
Viewed by 992
Abstract
This paper establishes a simulation model for the performance of an R290 variable-frequency compressor in automotive air conditioning and sets up a compressor performance testing system. It investigates the effects of different evaporation temperatures, condensation temperatures, compressor speeds, and pressure ratios on the oil circulation rate (OCR), as well as the impact of various oil circulation rates on the performance of the R290 compressor. Comparison of simulation and experimental data shows that the model's compressor performance predictions align with experimental results when the OCR is not taken into consideration. Experimental results indicate that the OCR increases with rising evaporation temperature, decreases with falling condensation temperature, and increases with higher compressor speeds. The simulation model shows only a minor deviation when predicting volumetric efficiency, while errors are larger when predicting isentropic efficiency and the discharge temperature, both of which are notably affected by the OCR. Additionally, for predictions of system cooling capacity, power, and COP, the accuracy of the simulation model proves satisfactory when the OCR is within the range of 2~10%, with deviations within 5%. Full article

26 pages, 8033 KiB  
Article
Time-Series Image-Based Automated Monitoring Framework for Visible Facilities: Focusing on Installation and Retention Period
by Seonjun Yoon and Hyunsoo Kim
Sensors 2025, 25(2), 574; https://doi.org/10.3390/s25020574 - 20 Jan 2025
Cited by 4 | Viewed by 1080
Abstract
In the construction industry, ensuring the proper installation, retention, and dismantling of temporary structures, such as jack supports, is critical to maintaining safety and project timelines. However, inconsistencies between on-site data and construction documentation remain a significant challenge. To address this, this study proposes an integrated monitoring framework that combines computer vision-based object detection and document recognition techniques. The system utilizes YOLOv5 for detecting jack supports in both construction drawings and on-site images captured through wearable cameras, while optical character recognition (OCR) and natural language processing (NLP) extract installation and dismantling timelines from work orders. The proposed framework enables continuous monitoring and ensures compliance with retention periods by aligning on-site data with documented requirements. The analysis includes 23 jack supports monitored daily over 28 days under varying environmental conditions, including lighting changes and structural configurations. The results demonstrate that the system achieves an average detection accuracy of 94.1%, effectively identifying discrepancies and reducing misclassifications caused by structural similarities and environmental variations. To further enhance detection reliability, methods such as color differentiation, construction plan overlays, and vertical segmentation were implemented, significantly improving performance. This study validates the effectiveness of integrating visual and textual data sources in dynamic construction environments. The study supports the development of automated monitoring systems by improving accuracy and safety measures while reducing manual intervention, offering practical insights for future construction site management. Full article
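Aligning on-site detections with the documented retention period reduces to a date comparison once both sides are extracted. A sketch, assuming daily detection results per jack support and install/dismantle dates already parsed from the work order:

```python
from datetime import date

def retention_violations(detections, install, dismantle):
    """Flag days where the on-site record disagrees with the work order:
    a jack support seen outside [install, dismantle], or missing inside
    it. `detections` maps date -> bool (support detected that day)."""
    issues = []
    for day, seen in sorted(detections.items()):
        required = install <= day <= dismantle
        if seen != required:
            issues.append((day, "unexpected" if seen else "missing"))
    return issues

# Support documented through May 25 but removed on site after May 20.
dets = {date(2024, 5, d): d <= 20 for d in range(1, 29)}
issues = retention_violations(dets, date(2024, 5, 1), date(2024, 5, 25))
```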
(This article belongs to the Special Issue AI-Based Computer Vision Sensors & Systems)

16 pages, 4123 KiB  
Article
Improving Text Recognition Accuracy for Serbian Legal Documents Using BERT
by Miloš Bogdanović, Milena Frtunić Gligorijević, Jelena Kocić and Leonid Stoimenov
Appl. Sci. 2025, 15(2), 615; https://doi.org/10.3390/app15020615 - 10 Jan 2025
Cited by 3 | Viewed by 1240
Abstract
Producing a new high-quality text corpus is a big challenge due to the required complexity and labor expenses. High-quality datasets, considered a prerequisite for many supervised machine learning algorithms, are often only available in very limited quantities. This in turn limits the capabilities of many advanced technologies when used in a specific field of research and development. This is also the case for the Serbian language, which is considered low-resourced in digitized language resources. In this paper, we address this issue for the Serbian language through a novel approach for generating high-quality text corpora by improving text recognition accuracy for scanned documents belonging to Serbian legal heritage. Our approach integrates three different components to provide high-quality results: a BERT-based large language model built specifically for Serbian legal texts, a high-quality open-source optical character recognition (OCR) model, and a word-level similarity measure for Serbian Cyrillic developed for this research and used for generating necessary correction suggestions. This approach was evaluated manually using scanned legal documents sampled from three different epochs between the years 1970 and 2002 with more than 14,500 test cases. We demonstrate that our approach can correct up to 88% of terms inaccurately extracted by the OCR model in the case of Serbian legal texts. Full article
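A word-level similarity measure for ranking correction suggestions can be sketched with normalized edit distance; this is a generic stand-in, not the Cyrillic-specific measure developed in the paper:

```python
def levenshtein(a, b):
    """Classic edit distance via dynamic programming (two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def suggest(word, lexicon):
    """Return the lexicon entry most similar to an OCR'd word, using
    length-normalized edit similarity in [0, 1]."""
    score = lambda w: 1 - levenshtein(word, w) / max(len(word), len(w))
    return max(lexicon, key=score)

# OCR confused Cyrillic 'а' with Latin 'a' in "закон" (law).
best = suggest("з\u0061кон", ["закон", "задак", "звон"])
```

Mixed-script confusions like this are exactly why a script-aware measure, as built in the paper, outperforms a plain lexicon lookup.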
(This article belongs to the Section Computing and Artificial Intelligence)

19 pages, 1456 KiB  
Article
Ventinel: Automated Detection of Android Vishing Apps Using Optical Character Recognition
by Daegyeom Kim, Sehwan O, Younghoon Ban, Jungsoo Park, Kyungho Joo and Haehyun Cho
Future Internet 2025, 17(1), 24; https://doi.org/10.3390/fi17010024 - 7 Jan 2025
Viewed by 1446
Abstract
Vishing, a blend of “voice” and “phishing”, has evolved to include techniques like Call Redirection and Display Overlay Attacks, causing significant financial losses. Existing research has largely focused on user behavior and awareness, leaving gaps in addressing attacks originating from vishing applications. In this work, we present Ventinel, an Android-based defense system designed to detect these attacks without requiring OS modifications. Ventinel employs Optical Character Recognition (OCR) to compare phone numbers during calls, effectively preventing Call Redirection and Display Overlay Attacks. Additionally, it safeguards against Duplicated Contacts Attacks by cross-referencing call logs and SMS records. Ventinel achieves 100% detection accuracy, surpassing commercial applications, and operates with minimal data collection to ensure user privacy. We also describe malicious API behavior and demonstrate that the same behavior is possible for API levels 29 and higher. Furthermore, we analyze the limitations of existing solutions and propose new attack and defense strategies. Full article
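The core redirection check — comparing the number displayed on screen (read via OCR) with the number actually dialled — can be sketched as normalization plus equality. The default country code is an illustrative assumption:

```python
import re

def normalize(number, default_cc="+82"):
    """Reduce a phone number to a canonical +CC... form so the number
    shown on screen can be compared with the outgoing call. The default
    country code (South Korea) is assumed for illustration."""
    digits = re.sub(r"[^\d+]", "", number)
    if digits.startswith("+"):
        return digits
    if digits.startswith("0"):
        return default_cc + digits[1:]
    return default_cc + digits

def redirect_suspected(displayed, dialled):
    """True when the on-screen number and the outgoing call differ."""
    return normalize(displayed) != normalize(dialled)

ok = redirect_suspected("02-1234-5678", "+82 2 1234 5678")    # same line
bad = redirect_suspected("02-1234-5678", "+82 70 9999 0000")  # redirected
```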

20 pages, 32805 KiB  
Article
Application of Generative Artificial Intelligence Models for Accurate Prescription Label Identification and Information Retrieval for the Elderly in Northern East of Thailand
by Parinya Thetbanthad, Benjaporn Sathanarugsawait and Prasong Praneetpolgrang
J. Imaging 2025, 11(1), 11; https://doi.org/10.3390/jimaging11010011 - 6 Jan 2025
Viewed by 2527
Abstract
This study introduces a novel AI-driven approach to support elderly patients in Thailand with medication management, focusing on accurate drug label interpretation. Two model architectures were explored: a Two-Stage Optical Character Recognition (OCR) and Large Language Model (LLM) pipeline combining EasyOCR with Qwen2-72b-instruct and a Uni-Stage Visual Question Answering (VQA) model using Qwen2-72b-VL. Both models operated in a zero-shot capacity, utilizing Retrieval-Augmented Generation (RAG) with DrugBank references to ensure contextual relevance and accuracy. Performance was evaluated on a dataset of 100 diverse prescription labels from Thai healthcare facilities, using RAG Assessment (RAGAs) metrics to assess Context Recall, Factual Correctness, Faithfulness, and Semantic Similarity. The Two-Stage model achieved high accuracy (94%) and strong RAGAs scores, particularly in Context Recall (0.88) and Semantic Similarity (0.91), making it well-suited for complex medication instructions. In contrast, the Uni-Stage model delivered faster response times, making it practical for high-volume environments such as pharmacies. This study demonstrates the potential of zero-shot AI models in addressing medication management challenges for the elderly by providing clear, accurate, and contextually relevant label interpretations. The findings underscore the adaptability of AI in healthcare, balancing accuracy and efficiency to meet various real-world needs. Full article
(This article belongs to the Section AI in Imaging)

16 pages, 3815 KiB  
Article
An Ante Hoc Enhancement Method for Image-Based Complex Financial Table Extraction
by Weiyu Peng and Xuhui Li
Appl. Sci. 2025, 15(1), 370; https://doi.org/10.3390/app15010370 - 2 Jan 2025
Viewed by 981
Abstract
In the field of finance, tables are a common form of data organization, and extracting data from them in large quantities is a fundamental task for researchers. It can also be a challenging one, as many tables exist in unstructured forms, such as scanned images in PDFs, rather than in easily processed forms, such as Excel spreadsheets. In recent years, many table extraction methods using heuristic algorithms or deep learning models have been proposed to free people from manual processing, which is time-consuming and troublesome. Although existing methods achieve high accuracy on some kinds of tables, they often fail to achieve optimal results when extracting complex financial tables with multi-line text and missing demarcation lines. In this article, we propose an enhancement method for image-based complex table extraction. The method consists of two modules: a split module and a filter module. The split module uses an OCR (optical character recognition) model to locate text regions and a heuristic algorithm to obtain candidate demarcation lines. The filter module is based on a text semantic matching model and another heuristic algorithm. Experimental results show that the proposed method significantly improves the performance of different table extraction methods, with F1-score increases of 5.10 to 14.36 points. Full article
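The split module's candidate demarcation lines can be sketched from the OCR text regions alone: merge the boxes' vertical extents and propose a line in every empty horizontal band. This is a generic projection-style heuristic, not necessarily the paper's exact algorithm:

```python
def candidate_lines(text_boxes, table_top, table_bottom):
    """Propose horizontal demarcation lines for a borderless table.
    Merge the vertical extents of all OCR text boxes (x1, y1, x2, y2),
    then place one candidate at the centre of every empty gap."""
    spans = sorted((y1, y2) for _, y1, _, y2 in text_boxes)
    merged = []
    for y1, y2 in spans:
        if merged and y1 <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], y2)  # overlapping rows
        else:
            merged.append([y1, y2])
    lines, prev = [], table_top
    for y1, y2 in merged:
        if y1 > prev:
            lines.append((prev + y1) / 2)
        prev = y2
    if table_bottom > prev:
        lines.append((prev + table_bottom) / 2)
    return lines

# Two text rows (the first spans two overlapping boxes, i.e. multi-line text).
boxes = [(5, 10, 90, 20), (5, 14, 60, 24), (5, 40, 90, 50)]
lines = candidate_lines(boxes, 0, 60)
```

The filter module would then keep only the candidates whose neighbouring text fragments are semantically distinct rows, which is where the matching model comes in.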
(This article belongs to the Section Computing and Artificial Intelligence)

22 pages, 9016 KiB  
Article
Leveraging Transformer-Based OCR Model with Generative Data Augmentation for Engineering Document Recognition
by Wael Khallouli, Mohammad Shahab Uddin, Andres Sousa-Poza, Jiang Li and Samuel Kovacic
Electronics 2025, 14(1), 5; https://doi.org/10.3390/electronics14010005 - 24 Dec 2024
Viewed by 4461
Abstract
The long-standing practice of document-based engineering has resulted in the accumulation of a large number of engineering documents across various industries. Engineering documents, such as 2D drawings, continue to play a significant role in exchanging information and sharing knowledge across multiple engineering processes. However, these documents are often stored in non-digitized formats, such as paper and portable document format (PDF) files, making automation difficult. As digital engineering transforms processes in many industries, digitizing engineering documents presents a crucial challenge that requires advanced methods. This research addresses the problem of automatically extracting textual content from non-digitized legacy engineering documents. We introduced an optical character recognition (OCR) system for text detection and recognition that leverages transformer-based generative deep learning models and transfer learning approaches to enhance text recognition accuracy in engineering documents. The proposed system was evaluated on a dataset collected from ships’ engineering drawings provided by a U.S. agency. Experimental results demonstrated that the proposed transformer-based OCR model significantly outperformed pretrained off-the-shelf OCR models. Full article
