Search Results (650)

Search Parameters:
Keywords = Character Recognition

16 pages, 5310 KB  
Article
Mitochondrial Phylogenomics of Tenthredinidae (Hymenoptera: Tenthredinoidea) Supports the Monophyly of Eriocampinae stat. nov.
by Siying Wan, Xiao Li, Beibei Tan, Meicai Wei and Gengyun Niu
Biology 2026, 15(2), 202; https://doi.org/10.3390/biology15020202 - 22 Jan 2026
Abstract
Tenthredinidae (Hymenoptera: Tenthredinoidea), the most species-rich sawfly family, features a controversial subfamily classification, particularly regarding Eriocampa Hartig, 1837, Conaspidia Konow, 1898, and their relatives. In this study, we sequenced and characterized 15 complete mitochondrial genomes from Eriocampa, Eriocampopsis Takeuchi, 1952, and Conaspidia, and reconstructed the phylogeny of Tenthredinidae using a mitogenomic dataset including 69 species from 16 subfamilies. The mitochondrial genomes of these genera exhibited genus-specific tRNA rearrangements within the IQM and ARNS1EF clusters. Phylogenetic analyses using both Maximum Likelihood and Bayesian Inference consistently recovered (Eriocampa + Eriocampopsis + Conaspidia) as a monophyletic lineage distinct from other subfamilies of Tenthredinidae. Divergence-time estimates indicate that the Eriocampa lineage diverged from other tenthredinids around the Late Cretaceous–Paleocene boundary (~70 Ma) and diversified during the Eocene. This timing coincides with the radiation of their host plants (Araliaceae, Betulaceae, and Juglandaceae). We also compared the morphology of Eriocampinae with that of other subfamilies of Tenthredinidae and summarized the diagnostic characters of Eriocampinae. Integrating morphological and mitogenomic evidence supports the recognition of Eriocampinae Rohwer, 1911 stat. nov. This study not only clarifies the phylogenetic position of these genera but also provides new insights into the coevolutionary history between sawflies and angiosperms. Full article
(This article belongs to the Special Issue Mitochondrial Genomics of Arthropods)

27 pages, 1031 KB  
Article
PMR-Q&A: Development of a Bilingual Expert-Evaluated Question–Answer Dataset for Large Language Models in Physical Medicine and Rehabilitation
by Muhammed Zahid Sahin, Fatma Betul Derdiyok, Serhan Ayberk Kilic, Kasim Serbest and Kemal Nas
Bioengineering 2026, 13(1), 125; https://doi.org/10.3390/bioengineering13010125 - 22 Jan 2026
Abstract
Objectives: This study presents the development of a bilingual, expert-evaluated question–answer (Q&A) dataset, named PMR-Q&A, designed for training large language models (LLMs) in the field of Physical Medicine and Rehabilitation (PMR). Methods: The dataset was created through a systematic and semi-automated framework that converts unstructured scientific texts into structured Q&A pairs. Source materials included eight core reference books, 2310 academic publications, and 323 theses covering 15 disease categories commonly encountered in PMR clinical practice. Texts were digitized using layout-aware optical character recognition (OCR), semantically segmented, and distilled through a two-pass LLM strategy employing GPT-4.1 and GPT-4.1-mini models. Results: The resulting dataset consists of 143,712 bilingual Q&A pairs, each annotated with metadata including disease category, reference source, and keywords. A representative subset of 3000 Q&A pairs was extracted for expert validation to evaluate the dataset’s reliability and representativeness. Statistical analyses showed that the validation sample accurately reflected the thematic and linguistic structure of the full dataset, with an average score of 1.90. Conclusions: The PMR-Q&A dataset is a structured and expert-evaluated resource for developing and fine-tuning domain-specific large language models, supporting research and educational efforts in the field of physical medicine and rehabilitation. Full article

27 pages, 3763 KB  
Article
GO-PILL: A Geometry-Aware OCR Pipeline for Reliable Recognition of Debossed and Curved Pill Imprints
by Jaehyeon Jo, Sungan Yoon and Jeongho Cho
Mathematics 2026, 14(2), 356; https://doi.org/10.3390/math14020356 - 21 Jan 2026
Abstract
Manual pill identification is often inefficient and error-prone due to the large variety of medications and frequent visual similarity among pills, leading to misuse or dispensing errors. These challenges are exacerbated when pill imprints are engraved, curved, or irregularly arranged, conditions under which conventional optical character recognition (OCR)-based methods degrade significantly. To address this problem, we propose GO-PILL, a geometry-aware OCR pipeline for robust pill imprint recognition. The framework extracts text centerlines and imprint regions using the TextSnake algorithm. During imprint refinement, background noise is suppressed and contrast is enhanced to improve the visibility of embossed and debossed imprints. The imprint localization and alignment stage then rectifies curved or obliquely oriented text into a linear representation, producing geometrically normalized inputs suitable for OCR decoding. The refined imprints are processed by a multimodal OCR module that integrates a non-autoregressive language–vision fusion architecture for accurate character-level recognition. Experiments on a pill image dataset from the U.S. National Library of Medicine show that GO-PILL achieves an F1-score of 81.83% under set-based evaluation and a Top-10 pill identification accuracy of 76.52% in a simulated clinical scenario. GO-PILL consistently outperforms existing methods under challenging imprint conditions, demonstrating strong robustness and practical feasibility. Full article
(This article belongs to the Special Issue Applications of Deep Learning and Convolutional Neural Network)

23 pages, 1740 KB  
Article
Print Exposure Interaction with Neural Tuning on Letter/Non-Letter Processing During Literacy Acquisition: An ERP Study on Dyslexic and Typically Developing Children
by Elizaveta Galperina, Olga Kruchinina, Polina Boichenkova and Alexander Kornev
Languages 2026, 11(1), 15; https://doi.org/10.3390/languages11010015 - 14 Jan 2026
Abstract
Background/Objectives: The first step in learning an alphabetic writing system is to establish letter–sound associations. This process is more difficult for children with dyslexia (DYS) than for typically developing (TD) children. Cerebral mechanisms underlying these associations are not fully understood and are expected to change during the training course. This study aimed to identify the neurophysiological correlates and developmental changes of visual letter processing in children with DYS compared to TD children, using event-related potentials (ERPs) during a letter/non-letter classification task. Methods: A total of 71 Russian-speaking children aged 7–11 years participated in the study, including 38 with dyslexia and 33 TD children. The participants were divided into younger (7–8 y.o.) and older (9–11 y.o.) subgroups. EEG recordings were taken while participants classified letters and non-letter characters. We analyzed ERP components (N/P150, N170, P260, P300, N320, and P600) in left-hemisphere regions of interest related to reading: the ventral occipito-temporal cortex (VWFA ROI) and the inferior frontal cortex (frontal ROI). Results: Behavioral differences, specifically lower accuracy in children with dyslexia, were observed only in the younger subgroup. ERP analysis indicated that both groups displayed common stimulus effects, such as a larger N170 for letters in younger children. However, their developmental trajectories diverged. The DYS group showed an age-related increase in the amplitude of early components (N/P150 in VWFA ROI), which contrasts with the typical decrease observed in TD children. In contrast, the late P600 component in the frontal ROI revealed an age-related decrease in the DYS group, along with overall reduced amplitudes compared to their TD peers. Additionally, the N320 component differentiated stimuli exclusively in the DYS group. 
Conclusions: The data obtained in this study confirmed that the mechanisms of letter recognition in children with dyslexia differ in some ways from those of their TD peers. This atypical developmental pattern involves a failure to efficiently specialize early visual processing, as evidenced by the increasing N/P150. Additionally, there is a progressive reduction in the cognitive resources available for higher-order reanalysis and control, indicated by the decreasing frontal P600. This disruption in neural specialization and automation ultimately hinders the development of fluent reading. Full article

17 pages, 289 KB  
Article
Transforming Historical Newspaper Research and Preservation Through AI: A Global Perspective
by Zhao Xun Song, Kwok Wai Cheung and Zi Yun Jia
Journal. Media 2026, 7(1), 10; https://doi.org/10.3390/journalmedia7010010 - 7 Jan 2026
Abstract
Artificial intelligence (AI) is transforming the preservation and research of historical newspapers by providing powerful tools that overcome longstanding challenges in terms of digitization, analysis, and access. This study offers a comprehensive global analysis of AI-driven innovations—including advanced Optical Character Recognition (OCR), Large Language Models (LLMs) for post-correction, and Natural Language Processing (NLP) techniques—that significantly enhance text extraction, image restoration, metadata generation, and semantic enrichment. Through qualitative case studies and comparative examinations of projects worldwide, this research demonstrates how AI not only improves the accuracy and efficiency of preservation workflows but also enables novel forms of computational inquiry such as cross-lingual analysis, sentiment detection, and discourse tracking. This study further explores emerging ethical and practical challenges and outlines future directions like multimodal analysis and collaborative digital infrastructures. The findings underscore AI’s transformative role in unlocking historical newspaper archives for both scholarly and public use, thereby fostering a deeper understanding of cultural heritage and historical narratives on a global scale. Full article
17 pages, 3062 KB  
Article
Dynamic Multi-Parameter Sensing Technology for Ecological Flows Based on the Improved DSC-YOLOv8n Model
by Jun Yu, Yongsheng Li, Ting Wang, Peipei Zhang, Wenlong Jiang and Lei Xing
Water 2026, 18(2), 146; https://doi.org/10.3390/w18020146 - 6 Jan 2026
Abstract
Ecological flow management is important for maintaining ecosystem stability and promoting sustainable development. Dynamic ecological flow regulation depends on precise real-time monitoring of water levels and flow velocities. To address challenges in ecological flow monitoring, including maintenance difficulties and insufficient accuracy, an improved DSC-YOLOv8n-seg model is proposed for dynamic multi-parameter sensing, achieving more efficient object detection and semantic segmentation. Compared with traditional affine-transformation and edge-detection methods, this approach enables joint recognition of water level lines and staff gauge characters, achieving an average recognition error of ±1.2 cm, with a model accuracy of 93.1%, a recall rate of 94.5%, and an mAP50:95 of 93.9%. A deep learning-based spectral principal direction recognition method was also employed to calculate the surface water flow velocity, which demonstrated stable and efficient performance, achieving a relative error of 0.005 m/s for the surface velocity. Experimental results confirm that the method effectively addresses issues such as environmental interference, exhibiting enhanced robustness in low-light and nighttime scenarios. The proposed method provides efficient and accurate identification for dynamic water level monitoring and for real-time detection of river surface flow velocities to improve ecological flow management. Full article

21 pages, 3379 KB  
Article
KORIE: A Multi-Task Benchmark for Detection, OCR, and Information Extraction on Korean Retail Receipts
by Mahmoud SalahEldin Kasem, Mohamed Mahmoud, Mostafa Farouk Senussi, Mahmoud Abdalla and Hyun Soo Kang
Mathematics 2026, 14(1), 187; https://doi.org/10.3390/math14010187 - 4 Jan 2026
Abstract
We introduce KORIE, a curated benchmark of 748 Korean retail receipts designed to evaluate scene text detection, Optical Character Recognition (OCR), and Information Extraction (IE) under challenging digitization conditions. Unlike existing large-scale repositories, KORIE consists exclusively of receipts digitized via flatbed scanning (HP LaserJet MFP), specifically selected to preserve complex thermal printing artifacts such as ink fading, banding, and mechanical creases. We establish rigorous baselines across three tasks: (1) Detection, comparing Weakly Supervised Object Localization (WSOL) against state-of-the-art fully supervised models (YOLOv9, YOLOv10, YOLOv11, and DINO-DETR); (2) OCR, benchmarking Tesseract, EasyOCR, PaddleOCR, and a custom Attention-based BiGRU; and (3) Information Extraction, evaluating the zero-shot capabilities of Large Language Models (Llama-3, Qwen-2.5) on structured field parsing. Our results identify YOLOv11 as the optimal detector for dense receipt layouts and demonstrate that while PaddleOCR achieves the lowest Character Error Rate (15.84%), standard LLMs struggle in zero-shot settings due to domain mismatch with noisy Korean receipt text, particularly for price-related fields (F1 scores ≈ 25%). We release the dataset, splits, and evaluation code to facilitate reproducible research on degraded Hangul document understanding. Full article
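The Character Error Rate used to rank the OCR engines above is the standard edit-distance metric: Levenshtein distance between hypothesis and reference, normalized by reference length. A minimal sketch of that computation (illustrative only, not the benchmark's own evaluation code; the sample strings are hypothetical):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: Levenshtein distance / reference length."""
    m, n = len(reference), len(hypothesis)
    dp = list(range(n + 1))  # edit-distance row for the empty reference prefix
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[j] = min(dp[j] + 1,      # deletion
                        dp[j - 1] + 1,  # insertion
                        prev + cost)    # substitution (or match)
            prev = cur
    return dp[n] / m if m else float(n > 0)

# Hypothetical receipt line: one substituted character out of nine.
print(cer("커피 4,500원", "커피 4.500원"))
```

A reported CER of 15.84% thus means roughly one character in six was inserted, deleted, or substituted relative to the ground-truth transcription.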

30 pages, 8146 KB  
Article
LICS: Locating Inter-Character Spaces for Multilingual Scene Text Detection
by Po-Chyi Su, Meng-Chieh Lee, Yi-Ting Tung, Li-Zhu Chen, Chih-Hung Han and Tien-Ying Kuo
Sensors 2026, 26(1), 197; https://doi.org/10.3390/s26010197 - 27 Dec 2025
Abstract
Scene text detection in multilingual environments poses significant challenges. Traditional detection methods often struggle with language-specific features and require extensive annotated training data for each language, making them less practical for multilingual contexts. The diversity of character shapes, sizes, and orientations in natural scenes, along with text deformation and partial occlusions, further complicates the task of detection. This paper introduces LICS (Locating Inter-Character Spaces), a method that detects inter-character gaps as language-agnostic structural cues, enabling more feasible multilingual text detection. A two-stage approach is employed: first, we train on synthetic data with precise character gap annotations, and then apply weakly supervised learning to real-world datasets with word-level labels. The weakly supervised learning framework eliminates the need for character-level annotations in target languages, substantially reducing the annotation burden while maintaining robust performance. Experimental results on the ICDAR and Total-Text benchmarks demonstrate the strong performance of LICS, particularly on Asian scripts. We also introduce CSVT (Character-Labeled Street View Text), a new scene-text dataset comprising approximately 20,000 carefully annotated streetscape images. A set of standardized labeling principles is established to ensure consistent annotation of text locations, content, and language types. CSVT is expected to facilitate more advanced research and development in multilingual scene-text analysis. Full article

31 pages, 5478 KB  
Article
An Intelligent English-Speaking Training System Using Generative AI and Speech Recognition
by Ching-Ta Lu, Yen-Ju Chen, Tai-Ying Wu and Yen-Yu Lu
Appl. Sci. 2026, 16(1), 189; https://doi.org/10.3390/app16010189 - 24 Dec 2025
Abstract
English is the first foreign language most Taiwanese encounter, yet few achieve proficient speaking skills. This paper presents a generative AI-based English-speaking training system designed to enhance oral proficiency through interactive AI agents. The system employs ChatGPT version 5.2 to generate diverse and tailored conversational scenarios, enabling learners to practice in contextually relevant situations. Spoken responses are captured via speech recognition and analyzed by a large language model, which provides intelligent scoring and personalized feedback to guide improvement. Learners can automatically generate scenario-based scripts according to their learning needs. The D-ID AI system then produces a virtual character of the AI agent, whose lip movements are synchronized with the conversation, thereby creating realistic video interactions. Because learners interact with an AI agent, the system maintains controlled emotional expression, reduces communication anxiety, and helps learners adapt to non-native interaction, fostering more natural and confident speech production. Accordingly, the proposed system supports compelling, immersive, and personalized language learning. The experimental results indicate that repeated practice with the proposed system substantially improves English-speaking proficiency. Full article
(This article belongs to the Section Applied Neuroscience and Neural Engineering)

20 pages, 3334 KB  
Article
The Development of Northern Thai Dialect Speech Recognition System
by Jakramate Bootkrajang, Papangkorn Inkeaw, Jeerayut Chaijaruwanich, Supawat Taerungruang, Adisorn Boonyawisit, Bak Jong Min Sutawong, Vataya Chunwijitra and Phimphaka Taninpong
Appl. Sci. 2026, 16(1), 160; https://doi.org/10.3390/app16010160 - 23 Dec 2025
Cited by 1
Abstract
This study investigated the necessary ingredients for the development of an automatic speech recognition (ASR) system for the Northern Thai language. Building an ASR model for such an arguably low-resource language poses challenges both in terms of the quantity and the quality of the corpus. The experimental results demonstrated that the current state-of-the-art deep neural network trained in an end-to-end manner, and pre-trained from a closely related language, such as Standard Thai, often outperformed its traditional HMM-based counterparts. The results also suggested that incorporating northern Thai-specific tonal information and augmenting the character-based end-to-end model with an n-gram language model further improves the recognition performance. Surprisingly, the quality of the transcription of the speech corpus was not found to positively correlate with the recognition performance in the case of the end-to-end system. The results show that the end-to-end ASR system was able to achieve the best word error rate (WER) of 0.94 on out-of-sample data. This is equivalent to 77.02% and 60.34% relative word error rate reduction over the 4.09 and 2.37 WERs of the traditional TDNN-HMM and the vanilla deep neural network baselines. Full article
(This article belongs to the Special Issue Speech Recognition and Natural Language Processing)
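The relative word error rate reductions quoted in the abstract above follow directly from the reported WERs (0.94 for the end-to-end system versus 4.09 and 2.37 for the baselines). A minimal check of the arithmetic (not from the paper):

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Relative WER reduction of a system over a baseline, in percent."""
    return (baseline_wer - system_wer) / baseline_wer * 100.0

# Figures quoted in the abstract: end-to-end WER 0.94 vs. TDNN-HMM 4.09 and DNN 2.37.
print(round(relative_wer_reduction(4.09, 0.94), 2))  # 77.02
print(round(relative_wer_reduction(2.37, 0.94), 2))  # 60.34
```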

15 pages, 1308 KB  
Article
Evolution of Convolutional and Recurrent Artificial Neural Networks in the Context of BIM: Deep Insight and New Tool, Bimetria
by Andrzej Szymon Borkowski, Łukasz Kochański and Konrad Rukat
Infrastructures 2026, 11(1), 6; https://doi.org/10.3390/infrastructures11010006 - 22 Dec 2025
Abstract
This paper discusses the evolution of convolutional (CNN) and recurrent (RNN) artificial neural networks in applications for Building Information Modeling (BIM). The paper outlines the milestones reached in the last two decades. The article organizes the current state of knowledge and technology in terms of three aspects: (1) computer visualization coupled with BIM models (detection, segmentation, and quality verification in images, videos, and point clouds), (2) sequence and time series modeling (prediction of costs, energy, work progress, risk), and (3) integration of deep learning results with the semantics and topology of Industry Foundation Class (IFC) models. The paper identifies the most widely used architectures, typical data pipelines (synthetic data from BIM models, transfer learning, mapping results to IFC elements), and practical limitations: a lack of standardized benchmarks, high annotation costs, a domain gap between synthetic and real data, and discontinuous interoperability. We indicate directions for development: combining CNN/RNN with graph models and transformers, wider use of synthetic data and semi-supervised learning, and explainability methods that increase trust in AECOO (Architecture, Engineering, Construction, Owners & Operators) processes. A practical case study presents a new application, Bimetria, which uses a hybrid CNN/OCR (Optical Character Recognition) solution to generate 3D models with estimates based on two-dimensional drawings. The review shows that although the importance of attention-based and graph-based architectures is growing, CNNs and RNNs remain an important part of the BIM process, especially in engineering tasks, where, in our experience and in the Bimetria case study, mature convolutional architectures offer a good balance between accuracy, stability, and low latency. The paper also raises some fundamental questions to which we are still seeking answers. Thus, the article not only presents the innovative new Bimetria tool but also aims to stimulate discussion about the dynamic development of AI (Artificial Intelligence) in BIM. Full article
(This article belongs to the Special Issue Modern Digital Technologies for the Built Environment of the Future)

14 pages, 17578 KB  
Article
A Two-Stage High-Precision Recognition and Localization Framework for Key Components on Industrial PCBs
by Li Wang, Liu Ouyang, Huiying Weng, Xiang Chen, Anna Wang and Kexin Zhang
Mathematics 2026, 14(1), 4; https://doi.org/10.3390/math14010004 - 19 Dec 2025
Abstract
Precise recognition and localization of electronic components on printed circuit boards (PCBs) are crucial for industrial automation tasks, including robotic disassembly, high-precision assembly, and quality inspection. However, strong visual interference from silkscreen characters, copper traces, solder pads, and densely packed small components often degrades the accuracy of deep learning-based detectors, particularly under complex industrial imaging conditions. This paper presents a two-stage, coarse-to-fine PCB component localization framework based on an optimized YOLOv11 architecture and a sub-pixel geometric refinement module. The proposed method enhances the backbone with a Convolutional Block Attention Module (CBAM) to suppress background noise and strengthen discriminative features. It also integrates a tiny-object detection branch and a weighted Bi-directional Feature Pyramid Network (BiFPN) for more effective multi-scale feature fusion, and it employs a customized hybrid loss with vertex-offset supervision to enable pose-aware bounding box regression. In the second stage, the coarse predictions guide contour-based sub-pixel fitting using template geometry to achieve industrial-grade precision. Experiments show significant improvements over baseline YOLOv11, particularly for small and densely arranged components, indicating that the proposed approach meets the stringent requirements of industrial robotic disassembly. Full article
(This article belongs to the Special Issue Complex Process Modeling and Control Based on AI Technology)

19 pages, 742 KB  
Article
Image-Based Recognition of Children’s Handwritten Arabic Characters Using a Confidence-Weighted Stacking Ensemble
by Helala AlShehri
Sensors 2025, 25(24), 7671; https://doi.org/10.3390/s25247671 - 18 Dec 2025
Abstract
Recognizing handwritten Arabic characters written by children via scanned or camera-captured images is a challenging task due to variations in writing style, stroke irregularity, and diacritical marks. Although deep learning has advanced this field, building reliable systems remains challenging. This study introduces a stacking ensemble framework for sensor-acquired handwriting data, enhanced with a dynamic confidence-thresholding mechanism designed to improve prediction reliability. The framework integrates three high-performing convolutional neural networks (ConvNeXtBase, DenseNet201, and VGG16) through a fully connected meta-learner. A key feature is the use of an optimized threshold that filters out uncertain predictions by maximizing the macro F1 score on validation data. The framework is evaluated on two benchmark datasets for children’s Arabic handwriting: Hijja and Dhad. The results demonstrate state-of-the-art performance, with an accuracy of 95.13% and F1 score of 94.62% on Hijja and an accuracy of 96.14% and F1 score of 95.59% on Dhad. Compared to existing methods, the proposed approach achieves more than a 3% improvement in Hijja accuracy while maintaining robust performance across diverse character classes. These findings highlight the effectiveness of confidence-based stacking ensembles in enhancing reliability for Arabic handwriting recognition and suggest strong potential for automated educational assessment tools and intelligent tutoring systems. Full article
(This article belongs to the Section Intelligent Sensors)
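The "optimized threshold that filters out uncertain predictions by maximizing the macro F1 score on validation data" described in the abstract above can be sketched as a simple grid search: reject any sample whose top class probability falls below a candidate threshold, then keep the threshold with the best validation macro F1. The code below is an illustrative toy under that reading; the function names, probability format, and data are hypothetical, not from the paper.

```python
# Hypothetical sketch of confidence thresholding tuned by macro F1.
# probs: per-sample dicts mapping class label -> softmax probability.

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 over the given labels."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

def pick_threshold(probs, y_true, labels, candidates):
    """Return (threshold, macro F1) maximizing validation macro F1."""
    best_t, best_f1 = 0.0, -1.0
    for t in candidates:
        # Below-threshold samples get a reserved "reject" label, so
        # over-aggressive filtering is penalized via missed recall.
        y_pred = [max(p, key=p.get) if max(p.values()) >= t else "reject"
                  for p in probs]
        f1 = macro_f1(y_true, y_pred, labels)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Toy validation set: the low-confidence third sample is misclassified,
# so rejecting it lifts macro F1 (best threshold here is 0.6).
probs = [{"a": 0.9, "b": 0.1}, {"a": 0.4, "b": 0.6}, {"a": 0.55, "b": 0.45}]
print(pick_threshold(probs, ["a", "b", "b"], ["a", "b"], [0.0, 0.6, 0.8]))
```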

18 pages, 5536 KB  
Article
Automated Particle Size Analysis of Supported Nanoparticle TEM Images Using a Pre-Trained SAM Model
by Xiukun Zhong, Guohong Liang, Lingbei Meng, Wei Xi, Lin Gu, Nana Tian, Yong Zhai, Yutong He, Yuqiong Huang, Fengmin Jin and Hong Gao
Nanomaterials 2025, 15(24), 1886; https://doi.org/10.3390/nano15241886 - 16 Dec 2025
Abstract
This study addresses the challenges associated with transmission electron microscopy (TEM) image analysis of supported nanoparticles, including low signal-to-noise ratio, poor contrast, and interference from complex substrate backgrounds. This study proposes an automated segmentation and particle size analysis method based on a large-scale deep learning model, namely segment anything model (SAM). Using Ru/TiO2 and related materials as representative systems, the pretrained SAM is employed for zero-shot segmentation of nanoparticles, which is further integrated with a custom image processing pipeline, including optical character recognition (OCR) module, morphological optimization, and connected component analysis to achieve high-precision particle size quantification. Experimental results demonstrate that the method retains robust performance under challenging imaging conditions, with a size estimation error between 3% and 5% and a per-image processing time under 1 min, significantly outperforming traditional manual annotation and threshold-based segmentation approaches. This framework provides an efficient and reliable analytical tool for morphological characterization and structure–performance correlation studies in supported nanocatalysts. Full article
(This article belongs to the Section Theory and Simulation of Nanostructures)

21 pages, 1406 KB  
Article
Receipt Information Extraction with Joint Multi-Modal Transformer and Rule-Based Model
by Xandru Mifsud, Leander Grech, Adriana Baldacchino, Léa Keller, Gianluca Valentino and Adrian Muscat
Mach. Learn. Knowl. Extr. 2025, 7(4), 167; https://doi.org/10.3390/make7040167 - 16 Dec 2025
Abstract
A receipt information extraction task requires both textual and spatial analyses. Early receipt analysis systems primarily relied on template matching to extract data from spatially structured documents. However, these methods lack generalizability across various document layouts and require defining the specific spatial characteristics of unseen document sources. The advent of convolutional and recurrent neural networks has led to models that generalize better over unseen document layouts, and more recently, multi-modal transformer-based models, which consider a combination of text, visual, and layout inputs, have led to an even more significant boost in document-understanding capabilities. This work focuses on the joint use of a neural multi-modal transformer and a rule-based model and studies whether this combination achieves higher performance levels than the transformer on its own. A comprehensively annotated dataset, comprising real-world and synthetic receipts, was specifically developed for this study. The open source optical character recognition model DocTR was used to textually scan receipts and, together with an image, provided input to the classifier model. The open-source pre-trained LayoutLMv3 transformer-based model was augmented with a classifier model head, which was trained for classifying textual data into 12 predefined labels, such as date, price, and shop name. The methods implemented in the rule-based model were manually designed and consisted of four types: pattern-matching rules based on regular expressions and logic, database search-based methods for named entities, spatial pattern discovery guided by statistical metrics, and error correcting mechanisms based on confidence scores and local distance metrics. Following hyperparameter tuning of the classifier head and the integration of a rule-based model, the system achieved an overall F1 score of 0.98 in classifying textual data, including line items, from receipts. Full article
