Search Results (115)

Search Parameters:
Keywords = optical digit recognition

20 pages, 1256 KB  
Article
Semantic Classification of Railway Bridge Drawings Based on OCR and BP Neural Networks
by Wanqi Wang, Ze Guo, Liu Bao, Xing Yang, Yalong Xie, Ruichang Shi and Shuoyang Zhao
Appl. Sci. 2026, 16(9), 4206; https://doi.org/10.3390/app16094206 - 24 Apr 2026
Viewed by 92
Abstract
Digital management of modern railway bridges, a substantial part of high-speed railway networks, is often hindered by manual interpretation of construction drawings for Building Information Modeling (BIM). While individual technologies like optical character recognition (OCR) and neural networks are well-established, their generic application often fails on complex engineering documents. To address this, a domain-adaptive automatic recognition and semantic interpretation framework is proposed for railway bridge construction drawings. The novelty of this work lies in a specialized hybrid data fusion strategy that intelligently merges vector CAD file parsing with morphology-denoised OCR, resolving spatial and semantic conflicts. Furthermore, a back-propagation (BP) neural network is explicitly adapted to classify the extracted text into specific engineering categories, overcoming the challenges of dense layouts and overlapping symbols. Finally, the framework achieves end-to-end integration by transforming these semantic entities directly into structured, IFC-compatible BIM parameters. Evaluated on 250 real-world drawings, the framework achieved an average F1-score of 91.0% in semantic classification and improved processing efficiency by 6.5 times compared to manual methods. Moreover, 93.8% of the extracted entities achieved strict BIM parameter correctness, defined as seamless mapping to Revit IFC attributes without manual intervention. Full article
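The classification stage described above uses a back-propagation (BP) neural network over text extracted from drawings. As a minimal sketch of that idea only, the toy network below trains a one-hidden-layer BP classifier on bag-of-words vectors; the vocabulary, categories, and samples are hypothetical illustrations, not the paper's actual features or classes.

```python
import numpy as np

# Hypothetical annotation tokens and engineering categories (illustrative only).
VOCAB = ["span", "pier", "rebar", "C50", "bearing", "girder"]
CLASSES = ["geometry", "material", "component"]

def bag_of_words(tokens):
    v = np.zeros(len(VOCAB))
    for t in tokens:
        if t in VOCAB:
            v[VOCAB.index(t)] += 1.0
    return v

rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.5, (len(VOCAB), 8))   # input -> hidden weights
W2 = rng.normal(0, 0.5, (8, len(CLASSES))) # hidden -> output weights

def forward(x):
    h = np.tanh(x @ W1)
    z = h @ W2
    e = np.exp(z - z.max())
    return h, e / e.sum()  # softmax class probabilities

X = np.stack([bag_of_words(t) for t in
              [["span", "girder"], ["C50", "rebar"], ["pier", "bearing"]]])
Y = np.eye(3)  # one training sample per class

for _ in range(500):  # plain back-propagation with SGD
    for x, y in zip(X, Y):
        h, p = forward(x)
        dz = p - y                        # softmax + cross-entropy gradient
        W2 -= 0.1 * np.outer(h, dz)
        dh = (W2 @ dz) * (1 - h**2)       # back-propagate through tanh
        W1 -= 0.1 * np.outer(x, dh)

pred = [CLASSES[int(np.argmax(forward(x)[1]))] for x in X]
```

After training, the network fits the three toy samples; the real framework additionally fuses CAD vector parsing with OCR before this step.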
31 pages, 2783 KB  
Article
SurveyNet: A Unified Deep Learning Framework for OCR and OMR-Based Survey Digitization
by Rubi Quiñones, Sreeja Cheekireddy and Eren Gultepe
J. Imaging 2026, 12(4), 175; https://doi.org/10.3390/jimaging12040175 - 17 Apr 2026
Viewed by 342
Abstract
Manual survey data entry remains a bottleneck in large-scale research, marketing, and public policy, where survey sheets are still widely used due to accessibility and high response rates. Despite the progress in Optical Character Recognition (OCR) and Optical Mark Recognition (OMR), existing systems treat these tasks separately and are typically tailored to clean, standardized forms, making them unreliable for real-world survey sheets with diverse markings and handwritten inputs. These limitations hinder automation and introduce significant error rates in data transcription. To address this, we propose SurveyNet, a unified deep learning framework that combines OCR and OMR capabilities to automatically digitize complex survey responses within a single model. SurveyNet processes both handwritten digits and a wide variety of mark types including ticks, circles, and crosses across multiple question formats. We also introduce SurveySet, a novel dataset comprising 135 real-world survey forms annotated across four key response types. Experimental results demonstrate that SurveyNet achieves between 50% and 97% classification accuracy across tasks, with strong performance even on small and imbalanced datasets. This framework offers a scalable solution for streamlining survey digitization workflows, reducing manual errors, and enabling timely analysis in domains ranging from consumer research to public health and education. Full article
(This article belongs to the Special Issue Celebrating the 10th Anniversary of the Journal of Imaging)
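The OMR side of a system like SurveyNet must decide whether a checkbox contains a tick, circle, or cross. A common baseline, sketched here under the assumption of binarized crops (0 = paper, 1 = ink), is to threshold the ink fill ratio of each cell; the threshold value and cell size are illustrative, not taken from the paper.

```python
import numpy as np

def fill_ratio(cell):
    """Fraction of ink pixels in a binarized checkbox crop (0 = paper, 1 = ink)."""
    return float(cell.mean())

def classify_mark(cell, threshold=0.08):
    # Any tick, circle, or cross raises the ink ratio well above an empty box.
    return "marked" if fill_ratio(cell) > threshold else "empty"

empty = np.zeros((20, 20))
ticked = np.zeros((20, 20))
ticked[np.arange(20), np.arange(20)] = 1.0       # one diagonal stroke
ticked[np.arange(20), 19 - np.arange(20)] = 1.0  # crossing stroke (~10% ink)
```

A learned model such as SurveyNet replaces this fixed threshold with features robust to diverse handwriting and mark styles.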

21 pages, 1291 KB  
Article
Development of a Software Model for Classification and Automatic Cataloging of Archive Documents
by Adilbek Dauletov, Bahodir Muminov, Noila Matyakubova, Uldona Abdurahmonova, Khurshida Bakhriyeva and Makhbubakhon Fayzieva
Information 2026, 17(4), 341; https://doi.org/10.3390/info17040341 - 1 Apr 2026
Viewed by 501
Abstract
This study proposes an integrated software model for automatic document classification and metadata generation based on the Dublin Core standard to address the issue of rapid and consistent management of archival documents in a digital environment. This approach combines the stages of receiving incoming documents, converting them to text using optical character recognition (OCR), image preprocessing (binarization, deskew, noise reduction), and text cleaning and vectorization (TF–IDF) into a single pipeline. In the document classification stage, the Bidirectional Encoder Representations from Transformers (BERT) model with a context-sensitive transformer architecture is used, along with classical machine learning models (Logistic Regression, Naive Bayes, Support Vector Machine) and an ensemble approach (LightGBM), to increase the accuracy by modeling the document content at a deep semantic level. Experiments were conducted on the RVL-CDIP dataset; OCR efficiency was evaluated using the Character Error Rate (CER), and the classification results were evaluated using the accuracy, precision, recall, and F1-score metrics. The results confirmed the high stability and generalization ability of the BERT (accuracy, 95.1%; F1, 95.0%) and LightGBM (accuracy, 93.2%; F1, 93.2%) models. In the final stage, OCR, NER, and classification outputs are automatically organized into Dublin Core metadata elements (Title, Creator, Date, Description, Subject, Type, Format, Language) and exported in JSON/XML formats. This automation significantly reduces manual cataloging effort and improves indexing and retrieval efficiency in digital archival systems. Full article
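Two stages of the pipeline above are easy to illustrate compactly: TF–IDF vectorization of OCR text and export of the final Dublin Core record as JSON. The sketch below uses a plain, unsmoothed TF–IDF and a hypothetical three-document corpus; the study's actual vectorizer, corpus, and metadata values differ.

```python
import json
import math
from collections import Counter

# Hypothetical mini-corpus of OCR'd archive texts (illustrative only).
docs = {"d1": "annual report budget",
        "d2": "court decision appeal",
        "d3": "budget report"}

def tfidf(corpus):
    """Plain TF-IDF vectors as {term: weight} dicts (no smoothing)."""
    n = len(corpus)
    df = Counter(t for text in corpus.values() for t in set(text.split()))
    out = {}
    for doc_id, text in corpus.items():
        tokens = text.split()
        tf = Counter(tokens)
        out[doc_id] = {t: (c / len(tokens)) * math.log(n / df[t])
                       for t, c in tf.items()}
    return out

vecs = tfidf(docs)

def to_dublin_core(text, doc_type):
    """Map pipeline outputs onto a few Dublin Core elements, serialized as JSON."""
    record = {
        "Title": text.split()[0].title(),  # placeholder title heuristic
        "Type": doc_type,
        "Format": "text/plain",
        "Language": "en",
        "Description": text,
    }
    return json.dumps(record)
```

In the real system the classifier's predicted label supplies `Type`, and NER outputs fill elements such as `Creator` and `Date`.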

11 pages, 1144 KB  
Article
Perovskite MAPbBr2I All-Optical Synapses for Dynamic Pattern Recognition and Diffractive Neuromorphic Computing
by Yang Fang, Yitong Wu, Qing Hou and Xi Chen
Photonics 2026, 13(4), 328; https://doi.org/10.3390/photonics13040328 - 27 Mar 2026
Viewed by 411
Abstract
Conventional optoelectronic synapses rely on electrical signals for core operations, resulting in complex circuitry, limited response speed, and energy inefficiency. Herein, an all-optical synapse based on perovskite MAPbBr2I is developed that directly converts optical stimuli into transmittance responses that mimic fundamental synaptic plasticity, including paired-pulse facilitation, short- and long-term memory, and learning. By using the dynamic transmittance response as input to an artificial neural network, high-accuracy dynamic pattern recognition of sequential characters is achieved. Furthermore, the optically controlled transmittance states are successfully integrated as programmable weights into a diffractive neural network, enabling all-optical classification of MNIST handwritten digits with an accuracy of 89%. This fully optical architecture, which eliminates electronic components and complex circuits, offers a promising pathway toward high-speed, energy-efficient vision systems by fundamentally circumventing the von Neumann bottleneck. Full article

19 pages, 764 KB  
Article
FeOCR: Domain-Adaptive Chinese OCR with Visual Character Disambiguation and LLM-Based Correction for Metallurgical Documents
by Qiang Zheng, Yaxuan Sun, Lin Wang, Haoning Zhang, Fanjie Meng and Minghui Li
Electronics 2026, 15(6), 1144; https://doi.org/10.3390/electronics15061144 - 10 Mar 2026
Viewed by 537
Abstract
High-quality text corpora are essential for knowledge graph construction and domain-specific large model pre-training in technology-intensive industries, with the steel metallurgy sector serving as a representative case. However, many industrial documents remain in scanned or PDF formats, where general-purpose Optical Character Recognition (OCR) systems exhibit systematic errors when recognizing Chinese metallurgical documents. In particular, visually similar Chinese characters that differ by only minor strokes are frequently confused, leading to severe degradation of text reliability and cascading errors in downstream knowledge extraction. This paper proposes FeOCR, a general-purpose domain-adaptive framework for machine-printed Chinese characters, which is specifically evaluated within the context of the steel metallurgy industry. The framework integrates visual character disambiguation with context-aware semantic correction. We first construct a metallurgy-specific OCR dataset emphasizing high-frequency confusable Chinese word pairs and enhance data diversity through font perturbation and noise synthesis. Parameter-efficient fine-tuning (LoRA) is then applied to adapt a general OCR model to domain-specific visual patterns. Furthermore, a Large Language Model-based correction module performs semantic refinement of residual errors under domain lexical constraints. Experiments demonstrate significant reductions in character and word error rates, especially for confusable technical terms, providing a reliable foundation for industrial Chinese document digitization. Full article
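The character and word error rates reported above are standard OCR metrics based on edit distance. As a minimal sketch (not the paper's evaluation code), CER can be computed with a plain dynamic-programming Levenshtein distance normalized by the reference length:

```python
def levenshtein(a, b):
    """Edit distance between two strings via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def cer(reference, hypothesis):
    """Character Error Rate = edit distance / reference length."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)
```

For visually confusable characters of the kind FeOCR targets, a single-stroke confusion in a four-character term yields a CER of 0.25 for that term, which is exactly the class of residual error the LLM-based correction module is meant to remove.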

24 pages, 6624 KB  
Article
Application of Computer Vision to the Automated Extraction of Metadata from Natural History Specimen Labels: A Case Study on Herbarium Specimens
by Jacopo Zacchigna, Weiwei Liu, Felice Andrea Pellegrino, Adriano Peron, Francesco Roma-Marzio, Lorenzo Peruzzi and Stefano Martellos
Plants 2026, 15(4), 637; https://doi.org/10.3390/plants15040637 - 17 Feb 2026
Viewed by 1127
Abstract
Metadata extraction from natural history collection labels is a pivotal task for the online publication of digitized specimens. However, given the scale of these collections—which are estimated to host more than 2 billion specimens worldwide, including ca. 400 million herbarium specimens—manual metadata extraction is an extremely time-consuming task. Automated data extraction from digital images of specimens and their labels is therefore a promising application of state-of-the-art computer vision techniques. Extracting information from herbarium specimen labels normally involves three main steps: text segmentation, multilingual and handwriting recognition, and data parsing. The primary bottleneck in this workflow lies in the limitations of Optical Character Recognition (OCR) systems. This study explores how the general knowledge embedded in multimodal Transformer models can be transferred to the specific task of herbarium specimen label digitization. The final goal is to develop an easy-to-use, end-to-end solution to mitigate the limitations of classic OCR approaches while offering greater flexibility to adapt to different label formats. Donut-base, a pre-trained visual document understanding (VDU) transformer, was the base model selected for fine-tuning. A dataset from the University of Pisa served as a test bed. The initial attempt achieved an accuracy of 85%, measured using the Tree Edit Distance (TED), demonstrating the feasibility of fine-tuning for this task. Cases with low accuracies were also investigated to identify limitations of the approach. In particular, specimens with multiple labels, especially if combining handwritten and typewritten text, proved to be the most challenging. Strategies aimed at addressing these weaknesses are discussed. Full article

9 pages, 2031 KB  
Proceeding Paper
Leveraging LLMs and Computer Vision for Personalized Nutrition Advice
by Mateo Tokić, Časlav Livada, Tomislav Galba and Alfonzo Baumgartner
Eng. Proc. 2026, 125(1), 21; https://doi.org/10.3390/engproc2026125021 - 16 Feb 2026
Viewed by 502
Abstract
This paper investigates the application of large language models (LLMs) in the domain of dietary advice, focusing on the recognition of ingredients and nutritional values from food products and the integration of this information into a system capable of delivering personalized recommendations. The research involved the development of a mobile application utilizing React Native and Python Flask frameworks. Optical character recognition (OCR) was implemented through the docTR model to extract nutritional information and ingredients from product images. Based on the extracted data and user profiles stored in a Firestore database, the system generates tailored dietary advice employing OpenAI’s GPT-3.5-turbo model. The findings demonstrate the feasibility of using LLMs to provide personalized dietary recommendations, thereby opening new opportunities in the digital transformation of nutrition and dietary planning. Full article
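The core of the system described above is gluing OCR output to a user profile before querying the LLM. The function below is a hypothetical sketch of that prompt-assembly step; the field names, profile schema, and wording are illustrative assumptions, not the application's actual code or Firestore schema.

```python
def build_prompt(ocr_nutrition, user_profile):
    """Combine OCR-extracted nutrition facts and a user profile into one LLM prompt.

    ocr_nutrition: dict of nutrient name -> value string (from the OCR stage).
    user_profile:  dict with hypothetical keys 'goal' and 'allergies'.
    """
    facts = "; ".join(f"{k}: {v}" for k, v in ocr_nutrition.items())
    allergies = ", ".join(user_profile["allergies"]) or "none"
    return (
        f"User goal: {user_profile['goal']}. "
        f"Allergies: {allergies}. "
        f"Product nutrition per 100 g: {facts}. "
        "Give short, personalized dietary advice."
    )
```

In the described architecture, the resulting string would be sent to GPT-3.5-turbo and the response returned to the mobile client.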

17 pages, 19690 KB  
Article
Multilingual Intelligent Retrieval System via Unified End-to-End OCR and Hybrid Search
by Shuo Yang, Zhandong Liu, Ke Li, Ruixia Song, Yong Li and Xiangwei Qi
Appl. Sci. 2026, 16(4), 1771; https://doi.org/10.3390/app16041771 - 11 Feb 2026
Viewed by 803
Abstract
This study addresses the limitations of current Optical Character Recognition (OCR) systems in supporting minority languages and integrating intelligent retrieval functions. We propose an integrated system that combines an advanced end-to-end OCR model with a novel hybrid search approach. First, we developed the MultiLang-OCR-30K dataset containing 30,000 annotated samples of handwritten Chinese, Tibetan, and Uyghur texts. Second, we extended the GOT model using a freeze encoder–fine-tune decoder strategy to enhance multilingual capabilities. Finally, we designed a character-level hybrid retrieval framework integrating TF-IDF efficiency with Sentence-BERT semantic strength. Experimental results show our extended GOT model achieves sentence accuracies of 82.3%, 76.5%, and 78.1% for handwritten Chinese, Tibetan, and Uyghur, respectively. The hybrid search improves F1 score by 28.7% over TF-IDF alone while maintaining 23 ms average response time. This system provides a practical solution for multilingual document digitization and management, thereby bridging the technological gap for minority languages. Full article
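The hybrid retrieval idea above—combining TF-IDF's lexical efficiency with Sentence-BERT's semantic strength—typically reduces to a weighted fusion of two relevance scores. The sketch below assumes both scores are pre-normalized to [0, 1]; the weighting scheme and parameter `alpha` are illustrative, not the paper's actual formulation.

```python
def hybrid_score(tfidf_score, semantic_score, alpha=0.5):
    """Weighted fusion of lexical and semantic relevance (both in [0, 1])."""
    return alpha * tfidf_score + (1 - alpha) * semantic_score

def rank(query_scores, alpha=0.5):
    """Rank documents by fused score.

    query_scores: {doc_id: (tfidf_score, semantic_score)}
    """
    return sorted(query_scores,
                  key=lambda d: hybrid_score(*query_scores[d], alpha),
                  reverse=True)
```

With a semantics-leaning weight (small `alpha`), a document that matches the query's meaning but shares few exact terms can outrank a purely lexical match, which is the behavior behind the reported F1 gain over TF-IDF alone.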

48 pages, 1973 KB  
Review
A Review on Reverse Engineering for Sustainable Metal Manufacturing: From 3D Scans to Simulation-Ready Models
by Elnaeem Abdalla, Simone Panfiglio, Mariasofia Parisi and Guido Di Bella
Appl. Sci. 2026, 16(3), 1229; https://doi.org/10.3390/app16031229 - 25 Jan 2026
Viewed by 1892
Abstract
Reverse engineering (RE) has been increasingly adopted in metal manufacturing to digitize legacy parts, connect “as-is” geometry to mechanical performance, and enable agile repair and remanufacturing. This review consolidates scan-to-simulation workflows that transform 3D measurement data (optical/laser scanning and X-ray computed tomography) into simulation-ready models for structural assessment and manufacturing decisions, with an explicit focus on sustainability. Key steps are reviewed, from acquisition planning and metrological error sources to point-cloud/mesh processing, CAD/feature reconstruction, and geometry preparation for finite-element analysis (watertightness, defeaturing, meshing strategies, and boundary condition transfer). Special attention is given to uncertainty quantification and the propagation of geometric deviations into stress, stiffness, and fatigue predictions, enabling robust accept/reject and repair/replace choices. Sustainability is addressed through a lightweight reporting framework covering material losses, energy use, rework, and lead time across the scan–model–simulate–manufacture chain, clarifying when digitalization reduces scrap and over-processing. Industrial use cases are discussed for high-value metal components (e.g., molds, turbine blades, and marine/energy parts) where scan-informed simulation supports faster and more reliable decision making. Open challenges are summarized, including benchmark datasets, standardized reporting, automation of feature recognition, and integration with repair process simulation (DED/WAAM) and life-cycle metrics. A checklist is proposed to improve reproducibility and comparability across RE studies. Full article
(This article belongs to the Section Mechanical Engineering)

27 pages, 1031 KB  
Article
PMR-Q&A: Development of a Bilingual Expert-Evaluated Question–Answer Dataset for Large Language Models in Physical Medicine and Rehabilitation
by Muhammed Zahid Sahin, Fatma Betul Derdiyok, Serhan Ayberk Kilic, Kasim Serbest and Kemal Nas
Bioengineering 2026, 13(1), 125; https://doi.org/10.3390/bioengineering13010125 - 22 Jan 2026
Cited by 2 | Viewed by 710
Abstract
Objectives: This study presents the development of a bilingual, expert-evaluated question–answer (Q&A) dataset, named PMR-Q&A, designed for training large language models (LLMs) in the field of Physical Medicine and Rehabilitation (PMR). Methods: The dataset was created through a systematic and semi-automated framework that converts unstructured scientific texts into structured Q&A pairs. Source materials included eight core reference books, 2310 academic publications, and 323 theses covering 15 disease categories commonly encountered in PMR clinical practice. Texts were digitized using layout-aware optical character recognition (OCR), semantically segmented, and distilled through a two-pass LLM strategy employing GPT-4.1 and GPT-4.1-mini models. Results: The resulting dataset consists of 143,712 bilingual Q&A pairs, each annotated with metadata including disease category, reference source, and keywords. A representative subset of 3000 Q&A pairs was extracted for expert validation to evaluate the dataset’s reliability and representativeness. Statistical analyses showed that the validation sample accurately reflected the thematic and linguistic structure of the full dataset, with an average score of 1.90. Conclusions: The PMR-Q&A dataset is a structured and expert-evaluated resource for developing and fine-tuning domain-specific large language models, supporting research and educational efforts in the field of physical medicine and rehabilitation. Full article

17 pages, 289 KB  
Article
Transforming Historical Newspaper Research and Preservation Through AI: A Global Perspective
by Zhao Xun Song, Kwok Wai Cheung and Zi Yun Jia
Journal. Media 2026, 7(1), 10; https://doi.org/10.3390/journalmedia7010010 - 7 Jan 2026
Viewed by 2037
Abstract
Artificial intelligence (AI) is transforming the preservation and research of historical newspapers by providing powerful tools that overcome longstanding challenges in terms of digitization, analysis, and access. This study offers a comprehensive global analysis of AI-driven innovations—including advanced Optical Character Recognition (OCR), Large Language Models (LLMs) for post-correction, and Natural Language Processing (NLP) techniques—that significantly enhance text extraction, image restoration, metadata generation, and semantic enrichment. Through qualitative case studies and comparative examinations of projects worldwide, this research demonstrates how AI not only improves the accuracy and efficiency of preservation workflows but also enables novel forms of computational inquiry such as cross-lingual analysis, sentiment detection, and discourse tracking. This study further explores emerging ethical and practical challenges and outlines future directions like multimodal analysis and collaborative digital infrastructures. The findings underscore AI’s transformative role in unlocking historical newspaper archives for both scholarly and public use, thereby fostering a deeper understanding of cultural heritage and historical narratives on a global scale. Full article
21 pages, 3379 KB  
Article
KORIE: A Multi-Task Benchmark for Detection, OCR, and Information Extraction on Korean Retail Receipts
by Mahmoud SalahEldin Kasem, Mohamed Mahmoud, Mostafa Farouk Senussi, Mahmoud Abdalla and Hyun Soo Kang
Mathematics 2026, 14(1), 187; https://doi.org/10.3390/math14010187 - 4 Jan 2026
Viewed by 2246
Abstract
We introduce KORIE, a curated benchmark of 748 Korean retail receipts designed to evaluate scene text detection, Optical Character Recognition (OCR), and Information Extraction (IE) under challenging digitization conditions. Unlike existing large-scale repositories, KORIE consists exclusively of receipts digitized via flatbed scanning (HP LaserJet MFP), specifically selected to preserve complex thermal printing artifacts such as ink fading, banding, and mechanical creases. We establish rigorous baselines across three tasks: (1) Detection, comparing Weakly Supervised Object Localization (WSOL) against state-of-the-art fully supervised models (YOLOv9, YOLOv10, YOLOv11, and DINO-DETR); (2) OCR, benchmarking Tesseract, EasyOCR, PaddleOCR, and a custom Attention-based BiGRU; and (3) Information Extraction, evaluating the zero-shot capabilities of Large Language Models (Llama-3, Qwen-2.5) on structured field parsing. Our results identify YOLOv11 as the optimal detector for dense receipt layouts and demonstrate that while PaddleOCR achieves the lowest Character Error Rate (15.84%), standard LLMs struggle in zero-shot settings due to domain mismatch with noisy Korean receipt text, particularly for price-related fields (F1 scores ≈ 25%). We release the dataset, splits, and evaluation code to facilitate reproducible research on degraded Hangul document understanding. Full article
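The information-extraction F1 scores cited above are typically computed over extracted (field, value) pairs. The sketch below shows one common way to do this—micro F1 over exact-match pairs—under the assumption that each field appears at most once per receipt; KORIE's released evaluation code may define matching differently.

```python
def field_f1(gold, pred):
    """Micro F1 over exact-match (field, value) pairs for receipt IE."""
    gold_set = set(gold.items())
    pred_set = set(pred.items())
    tp = len(gold_set & pred_set)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)
```

Under exact matching, a single OCR-corrupted digit in a price field (e.g. 9000 read as 9500) costs both a false positive and a false negative, which is why noisy price fields drag zero-shot LLM scores down to around F1 ≈ 25% in the reported results.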

43 pages, 1898 KB  
Review
Advances in Colorectal Cancer: Epidemiology, Gender and Sex Differences in Biomarkers and Their Perspectives for Novel Biosensing Detection Methods
by Konstantina K. Georgoulia, Vasileios Tsekouras and Sofia Mavrikou
Pharmaceuticals 2026, 19(1), 13; https://doi.org/10.3390/ph19010013 - 20 Dec 2025
Viewed by 1615
Abstract
Colorectal cancer (CRC) remains a major cause of morbidity and mortality worldwide, with its incidence and biological behavior influenced by both genetic and environmental factors. Emerging evidence highlights notable sex differences in CRC, with men generally exhibiting higher incidence rates and poorer prognoses, while women often display stronger immune responses and distinct molecular profiles. Traditional screening tools, such as colonoscopy and fecal-based tests, have improved survival through early detection but are limited by invasiveness, cost, and adherence issues. In this context, biosensors have emerged as innovative diagnostic platforms capable of rapid, sensitive, and non-invasive detection of CRC-associated biomarkers, including genetic, epigenetic, and metabolic alterations. These technologies integrate biological recognition elements with nanomaterials, microfluidics, and digital systems, enabling the analysis of biomarkers such as proteins, nucleic acids, autoantibodies, epigenetic marks, and metabolic or VOC signatures from blood, stool, or breath and supporting point-of-care applications. Electrochemical, optical, piezoelectric, and FET platforms enable label-free or ultrasensitive multiplexed readouts and align with liquid biopsy workflows. Despite challenges related to standardization, robustness in complex matrices, and clinical validation, advances in nanotechnology and multi-analyte biosensing with artificial intelligence are enhancing biosensor performance. Integrating biosensor-based diagnostics with knowledge of sex-specific molecular and hormonal pathways may lead to more precise and equitable approaches to CRC detection, selection of therapeutic regimens, and management. Full article
(This article belongs to the Special Issue Application of Biosensors in Pharmaceutical Research)

23 pages, 2549 KB  
Article
Intelligent Symmetry-Based Vision System for Real-Time Industrial Process Supervision
by Gabriel Corrales, Catherine Gálvez, Edwin P. Pruna, Víctor H. Andaluz and Jessica S. Ortiz
Symmetry 2025, 17(12), 2143; https://doi.org/10.3390/sym17122143 - 12 Dec 2025
Viewed by 651
Abstract
Industrial environments still rely heavily on analog instruments for process supervision, as their robustness and low cost make them suitable for harsh conditions. However, these devices require manual readings, which limit automation and digital integration within Industry 4.0 frameworks. To address this gap, this study proposes an intelligent and cost-effective system for non-invasive acquisition of measurement data from analog industrial instruments, leveraging machine vision and Artificial Neural Networks (ANNs). The proposed framework exploits the geometric symmetry inherent in circular and linear scales to interpret pointer positions under varying lighting and perspective conditions. A dedicated image-processing pipeline is combined with lightweight ANN architectures optimized for embedded platforms, ensuring real-time inference without the need for high-end hardware. The processed data are wirelessly transmitted to a Human–Machine Interface (HMI) and web-based dashboard for real-time visualization. Experimental validation on pressure and flow instruments demonstrated an average Mean Absolute Error (MAE) of 0.589 PSI and 0.085 GPM, Root Mean Square Error (RMSE) values of 0.731 PSI and 0.097 GPM, and coefficients of determination (R2) of 0.985 and 0.978, respectively. The system achieved an average processing time of 3.74 ms per cycle on a Raspberry Pi 3 platform, outperforming Optical Character Recognition (OCR) and Convolutional Neural Network (CNN)-based methods in terms of computational efficiency and latency. The results confirm the feasibility of a symmetry-driven vision framework for real-time industrial supervision, providing a practical pathway to digitalize legacy analog instruments and promote low-cost, intelligent Industry 4.0 implementations. Full article
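Once the vision stage has located the pointer on a circular or linear scale, converting its angle to a physical reading is a linear interpolation between the scale's calibrated endpoints. The sketch below shows that final mapping step only; the angle conventions and example gauge limits are illustrative assumptions, not the paper's calibration.

```python
def angle_to_reading(angle_deg, angle_min, angle_max, value_min, value_max):
    """Linearly map a pointer angle on a gauge scale to a physical reading.

    angle_min/angle_max: pointer angles (degrees) at the scale's lowest
    and highest graduations; value_min/value_max: the readings there.
    """
    frac = (angle_deg - angle_min) / (angle_max - angle_min)
    return value_min + frac * (value_max - value_min)
```

For a hypothetical 0–100 PSI gauge whose needle sweeps from 45° to 315°, a pointer at 180° (the geometric midpoint of the sweep) maps to 50 PSI; the symmetry of circular scales is what lets the same mapping serve many instrument faces.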
(This article belongs to the Special Issue Applications Based on Symmetry in Control Systems and Robotics)

25 pages, 1910 KB  
Review
Natural Language Processing in Generating Industrial Documentation Within Industry 4.0/5.0
by Izabela Rojek, Olga Małolepsza, Mirosław Kozielski and Dariusz Mikołajewski
Appl. Sci. 2025, 15(23), 12662; https://doi.org/10.3390/app152312662 - 29 Nov 2025
Cited by 1 | Viewed by 1866
Abstract
Deep learning (DL) methods have revolutionized natural language processing (NLP), enabling industrial documentation systems to process and generate text with high accuracy and fluency. Modern deep learning models, such as transformers and recurrent neural networks (RNNs), learn contextual relationships in text, making them ideal for analyzing and creating complex industrial documentation. Transformer-based architectures, such as BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer), are ideally suited for tasks such as text summarization, content generation, and question answering, which are crucial for documentation systems. Pre-trained language models, tuned to specific industrial datasets, support domain-specific vocabulary, ensuring the generated documentation complies with industry standards. Deep learning-based systems can use sequential models, such as those used in machine translation, to generate documentation in multiple languages, promoting accessibility, and global collaboration. Using attention mechanisms, these models identify and highlight critical sections of input data, resulting in the generation of accurate and concise documentation. Integration with optical character recognition (OCR) tools enables DL-based NLP systems to digitize and interpret legacy documents, streamlining the transition to automated workflows. Reinforcement learning and human feedback loops can enhance a system’s ability to generate consistent and contextually relevant text over time. These approaches are particularly effective in creating dynamic documentation that is automatically updated based on data from sensors, registers, or other sources in real time. The scalability of DL techniques enables industrial organizations to efficiently produce massive amounts of documentation, reducing manual effort and improving overall efficiency. 
NLP has become a fundamental technology for automating the generation, maintenance, and personalization of industrial documentation within the Industry 4.0, 5.0, and emerging Industry 6.0 paradigms. Recent advances in large language models, search-assisted generation, and multimodal architectures have significantly improved the accuracy and contextualization of technical manuals, maintenance reports, and compliance documents. However, persistent challenges such as domain-specific terminology, data scarcity, and the risk of hallucinations highlight the limitations of current approaches in safety-critical manufacturing environments. This review synthesizes state-of-the-art methods, comparing rule-based, neural, and hybrid systems while assessing their effectiveness in addressing industrial requirements for reliability, traceability, and real-time adaptation. Human–AI collaboration and the integration of knowledge graphs are transforming documentation workflows as factories evolve toward cognitive and autonomous systems. The review included 32 articles published between 2018 and 2025. The implications of these bibliometric findings suggest that a high percentage of conference papers (69.6%) may indicate a field still in its conceptual phase, which contextualizes the articles' emphasis on proposed architectures rather than their industrial validation. Most research was conducted in computer science, suggesting early stages of technological maturity. The leading countries were China and India, but neither had large publication counts, nor were dominant researchers or affiliations observed, suggesting significant research dispersion. However, the most frequently observed SDGs indicate a clear health context, focusing on "industry innovation and infrastructure" and "good health and well-being". Full article
(This article belongs to the Special Issue Emerging and Exponential Technologies in Industry 4.0)
