Search Results (604)

Search Parameters:
Keywords = character recognition

20 pages, 552 KB  
Article
Trust in Stories: A Reader Response Study of (Un)Reliability in Akutagawa’s “In a Grove”
by Inge van de Ven
Literature 2025, 5(4), 24; https://doi.org/10.3390/literature5040024 - 30 Sep 2025
Abstract
For this article, we reviewed and synthesized narratological theories on reliability and unreliability and used them as the basis for an exploratory study, examining how real readers respond to a literary short story that contains several unreliable or conflicting narrative accounts. The story we selected is “In a Grove” by Ryūnosuke Akutagawa (orig. 藪の中/Yabu no naka) from 1922 in the English translation by Jay Rubin from 2007. To investigate how readers evaluate trustworthiness in narrative contexts, we combined quantitative and qualitative methods. We analyzed correlations between reading habits (i.e., Author Recognition Test), cognitive traits (e.g., Need for Cognition; Epistemic Trust), and trust attributions to characters while also examining how narrative sequencing and character-specific reasons for (dis)trust shaped participants’ judgments. This mixed-methods approach allows us to situate narrative trust as a context-sensitive, interpretive process rather than a stable individual disposition. Full article
(This article belongs to the Special Issue Literary Experiments with Cognition)

14 pages, 3652 KB  
Article
Enhancing Mobility for the Blind: An AI-Powered Bus Route Recognition System
by Shehzaib Shafique, Gian Luca Bailo, Monica Gori, Giulio Sciortino and Alessio Del Bue
Algorithms 2025, 18(10), 616; https://doi.org/10.3390/a18100616 - 30 Sep 2025
Abstract
Vision is a critical component of daily life, and its loss significantly hinders an individual’s ability to navigate, particularly when using public transportation systems. To address this challenge, this paper introduces a novel approach for accurately identifying bus route numbers and destinations, designed to assist visually impaired individuals in navigating urban transit networks. Our system integrates object detection, image enhancement, and Optical Character Recognition (OCR) technologies to achieve reliable and precise recognition of bus information. We employ a custom-trained You Only Look Once version 8 (YOLOv8) model to isolate the front portion of buses as the region of interest (ROI), effectively eliminating irrelevant text and advertisements that often lead to errors. To further enhance accuracy, we utilize the Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) to improve image resolution, significantly boosting the confidence of the OCR process. Additionally, a post-processing step involving a pre-defined list of bus routes and the Levenshtein algorithm corrects potential errors in text recognition, ensuring reliable identification of bus numbers and destinations. Tested on a dataset of 120 images featuring diverse bus routes and challenging conditions such as poor lighting, reflections, and motion blur, our system achieved an accuracy rate of 95%. This performance surpasses existing methods and demonstrates the system’s potential for real-world application. By providing a robust and adaptable solution, our work aims to enhance public transit accessibility, empowering visually impaired individuals to navigate cities with greater independence and confidence. Full article
(This article belongs to the Section Combinatorial Optimization, Graph, and Network Algorithms)
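The post-processing stage this abstract describes, snapping noisy OCR output to a pre-defined list of bus routes via the Levenshtein algorithm, can be sketched roughly as follows. The route list, distance threshold, and function names are illustrative assumptions, not the authors' implementation:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance between two strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

# Hypothetical route list; a real deployment would load the local transit data.
KNOWN_ROUTES = ["42 CENTRAL STATION", "7 AIRPORT", "15 HARBOUR"]

def correct_ocr_route(ocr_text, max_dist=5):
    """Snap a noisy OCR string to the closest known route; None if nothing is close."""
    best = min(KNOWN_ROUTES, key=lambda r: levenshtein(ocr_text.upper(), r))
    return best if levenshtein(ocr_text.upper(), best) <= max_dist else None
```

With a constrained vocabulary like bus routes, even a single-character OCR error (e.g. `1` read for `I`) is recoverable because the nearest list entry is unambiguous.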

22 pages, 365 KB  
Article
Development of a Fully Autonomous Offline Assistive System for Visually Impaired Individuals: A Privacy-First Approach
by Fitsum Yebeka Mekonnen, Mohammad F. Al Bataineh, Dana Abu Abdoun, Ahmed Serag, Kena Teshale Tamiru, Winner Abula and Simon Darota
Sensors 2025, 25(19), 6006; https://doi.org/10.3390/s25196006 - 29 Sep 2025
Abstract
Visual impairment affects millions worldwide, creating significant barriers to environmental interaction and independence. Existing assistive technologies often rely on cloud-based processing, raising privacy concerns and limiting accessibility in resource-constrained environments. This paper explores the integration and potential of open-source AI models in developing a fully offline assistive system that can be locally set up and operated to support visually impaired individuals. Built on a Raspberry Pi 5, the system combines real-time object detection (YOLOv8), optical character recognition (Tesseract), face recognition with voice-guided registration, and offline voice command control (VOSK), delivering hands-free multimodal interaction without dependence on cloud infrastructure. Audio feedback is generated using Piper for real-time environmental awareness. Designed to prioritize user privacy, low latency, and affordability, the platform demonstrates that effective assistive functionality can be achieved using only open-source tools on low-power edge hardware. Evaluation results in controlled conditions show 75–90% detection and recognition accuracies, with sub-second response times, confirming the feasibility of deploying such systems in privacy-sensitive or resource-constrained environments. Full article
(This article belongs to the Section Biomedical Sensors)

19 pages, 537 KB  
Article
Tracking the Impact of Age and Dimensional Shifts on Situation Model Updating During Narrative Text Comprehension
by César Campos-Rojas and Romualdo Ibáñez-Orellana
J. Eye Mov. Res. 2025, 18(5), 48; https://doi.org/10.3390/jemr18050048 - 26 Sep 2025
Abstract
Studies on the relationship between age and situation model updating during narrative text reading have mainly used response or reading times. This study enhances previous measures (working memory, recognition probes, and comprehension) by incorporating eye-tracking techniques to compare situation model updating between young and older Chilean adults. The study included 82 participants (40 older adults and 42 young adults) who read two narrative texts under three conditions (no shift, spatial shift, and character shift) using a between-subject (age) and within-subject (dimensional change) design. The results show that, while differences in working memory capacity were observed between the groups, these differences did not impact situation model comprehension. Younger adults performed better in recognition tests regardless of updating conditions. Eye-tracking data showed increased fixation times for dimensional shifts and longer reading times in older adults, with no interaction between age and dimensional shifts. Full article

22 pages, 1250 KB  
Article
Entity Span Suffix Classification for Nested Chinese Named Entity Recognition
by Jianfeng Deng, Ruitong Zhao, Wei Ye and Suhong Zheng
Information 2025, 16(10), 822; https://doi.org/10.3390/info16100822 - 23 Sep 2025
Abstract
Named entity recognition (NER) is one of the fundamental tasks in building knowledge graphs. For some domain-specific corpora, the text descriptions exhibit limited standardization, and some entity structures have entity nesting. The existing entity recognition methods have problems such as word matching noise interference and difficulty in distinguishing different entity labels for the same character in sequence label prediction. This paper proposes a span-based feature reuse stacked bidirectional long short term memory network (BiLSTM) nested named entity recognition (SFRSN) model, which transforms the entity recognition of sequence prediction into the problem of entity span suffix category classification. Firstly, character feature embedding is generated through bidirectional encoder representation of transformers (BERT). Secondly, a feature reuse stacked BiLSTM is proposed to obtain deep context features while alleviating the problem of deep network degradation. Thirdly, the span feature is obtained through the dilated convolution neural network (DCNN), and at the same time, a single-tail selection function is introduced to obtain the classification feature of the entity span suffix, with the aim of reducing the training parameters. Fourthly, a global feature gated attention mechanism is proposed, integrating span features and span suffix classification features to achieve span suffix classification. The experimental results on four Chinese-specific domain datasets demonstrate the effectiveness of our approach: SFRSN achieves micro-F1 scores of 83.34% on ontonotes, 73.27% on weibo, 96.90% on resume, and 86.77% on the supply chain management dataset. This represents a maximum improvement of 1.55%, 4.94%, 2.48%, and 3.47% over state-of-the-art baselines, respectively. The experimental results demonstrate the effectiveness of the model in addressing nested entities and entity label ambiguity issues. Full article
(This article belongs to the Section Artificial Intelligence)

18 pages, 1694 KB  
Article
FAIR-Net: A Fuzzy Autoencoder and Interpretable Rule-Based Network for Ancient Chinese Character Recognition
by Yanling Ge, Yunmeng Zhang and Seok-Beom Roh
Sensors 2025, 25(18), 5928; https://doi.org/10.3390/s25185928 - 22 Sep 2025
Abstract
Ancient Chinese scripts—including oracle bone carvings, bronze inscriptions, stone steles, Dunhuang scrolls, and bamboo slips—are rich in historical value but often degraded due to centuries of erosion, damage, and stylistic variability. These issues severely hinder manual transcription and render conventional OCR techniques inadequate, as they are typically trained on modern printed or handwritten text and lack interpretability. To tackle these challenges, we propose FAIR-Net, a hybrid architecture that combines the unsupervised feature learning capacity of a deep autoencoder with the semantic transparency of a fuzzy rule-based classifier. In FAIR-Net, the deep autoencoder first compresses high-resolution character images into low-dimensional, noise-robust embeddings. These embeddings are then passed into a Fuzzy Neural Network (FNN), whose hidden layer leverages Fuzzy C-Means (FCM) clustering to model soft membership degrees and generate human-readable fuzzy rules. The output layer uses Iteratively Reweighted Least Squares Estimation (IRLSE) combined with a Softmax function to produce probabilistic predictions, with all weights constrained as linear mappings to maintain model transparency. We evaluate FAIR-Net on CASIA-HWDB1.0, HWDB1.1, and ICDAR 2013 CompetitionDB, where it achieves a recognition accuracy of 97.91%, significantly outperforming baseline CNNs (p < 0.01, Cohen’s d > 0.8) while maintaining the tightest confidence interval (96.88–98.94%) and lowest standard deviation (±1.03%). Additionally, FAIR-Net reduces inference time to 25 s, improving processing efficiency by 41.9% over AlexNet and up to 98.9% over CNN-Fujitsu, while preserving >97.5% accuracy across evaluations. 
To further assess generalization to historical scripts, FAIR-Net was tested on the Ancient Chinese Character Dataset (9233 classes; 979,907 images), achieving 83.25% accuracy—slightly higher than ResNet101 but 2.49% lower than SwinT-v2-small—while reducing training time by over 5.5× compared to transformer-based baselines. Fuzzy rule visualization confirms enhanced robustness to glyph ambiguities and erosion. Overall, FAIR-Net provides a practical, interpretable, and highly efficient solution for the digitization and preservation of ancient Chinese character corpora. Full article
(This article belongs to the Section Sensing and Imaging)

16 pages, 261 KB  
Article
Naming as Narrative Strategy: Semiotic Inversion and Cultural Authenticity in Yemeni Television Drama
by Elham Alzain and Faiz Algobaei
Genealogy 2025, 9(3), 99; https://doi.org/10.3390/genealogy9030099 - 17 Sep 2025
Abstract
This study investigates the semiotic and cultural functions of character naming in the Yemeni television series Duroob al-Marjalah (Branching Paths of Manhood) (2024–2025). It applies onomastic theory and Barthesian semiotics to examine how Yemeni screenwriters employ names as narrative and ideological tools. A purposive sample of ten central characters was selected from a Yemeni drama series for qualitative analysis. Each name was examined for linguistic structure, semantic meaning, intertextual associations, and socio-cultural alignment. Semiotic interpretation followed Barthes’ signifier–signified–myth model to decode narrative and cultural symbolism. The findings indicate that character names function as multifaceted semiotic tools, conveying heritage, while occasionally employing stylization for satire or fostering empathy through cultural resonance. However, many lack grounding in Yemeni naming conventions, creating a tension between narrative dramatization and socio-onomastic realism. The results suggest that while Yemeni screenwriters show partial awareness of naming as a cultural and narrative tool, the creative process often privileges thematic resonance over ethnographic accuracy. This research contributes to onomastic theory, Arabic media studies, and semiotic analysis by evidencing how localized naming practices—or their absence—shape identity construction, world-building, and cultural recognition in regional television drama. Full article
11 pages, 1005 KB  
Proceeding Paper
Multimodal Fusion for Enhanced Human–Computer Interaction
by Ajay Sharma, Isha Batra, Shamneesh Sharma and Anggy Pradiftha Junfithrana
Eng. Proc. 2025, 107(1), 81; https://doi.org/10.3390/engproc2025107081 - 10 Sep 2025
Abstract
This paper introduces a virtual mouse driven by gesture detection, eye-tracking, and voice recognition. The system uses computer vision and machine learning to let users command and control the mouse pointer with eye motions, voice commands, or hand gestures. Its main goal is to provide an easy and engaging interface both for users who want a more natural, hands-free way of interacting with their computers and for users whose impairments, such as paralysis, limit their bodily motion. By combining multiple input modalities, the system improves accessibility and usability and offers a flexible solution for a wide range of users. The speech recognition function permits hands-free operation via voice instructions; the eye-tracking component detects and responds to the user's gaze, providing precise cursor control; and gesture recognition lets users execute mouse operations with simple hand movements. Beyond improving the experience of users with impairments, this work marks a step forward in human–computer interaction, showing how computer vision and machine learning can make user interfaces more inclusive, accessible, and efficient for everyone. Full article

25 pages, 4660 KB  
Article
Dual-Stream Former: A Dual-Branch Transformer Architecture for Visual Speech Recognition
by Sanghun Jeon, Jieun Lee and Yong-Ju Lee
AI 2025, 6(9), 222; https://doi.org/10.3390/ai6090222 - 9 Sep 2025
Abstract
This study proposes Dual-Stream Former, a novel architecture that integrates a Video Swin Transformer and Conformer designed to address the challenges of visual speech recognition (VSR). The model captures spatiotemporal dependencies, achieving a state-of-the-art character error rate (CER) of 3.46%, surpassing traditional convolutional neural network (CNN)-based models, such as 3D-CNN + DenseNet-121 (CER: 5.31%), and transformer-based alternatives, such as vision transformers (CER: 4.05%). The Video Swin Transformer captures multiscale spatial representations with high computational efficiency, whereas the Conformer back-end enhances temporal modeling across diverse phoneme categories. Evaluation of a high-resolution dataset comprising 740,000 utterances across 185 classes highlighted the effectiveness of the model in addressing visually confusing phonemes, such as diphthongs (/ai/, /au/) and labio-dental sounds (/f/, /v/). Dual-Stream Former achieved phoneme recognition error rates of 10.39% for diphthongs and 9.25% for labiodental sounds, surpassing those of CNN-based architectures by more than 6%. Although the model’s large parameter count (168.6 M) poses resource challenges, its hierarchical design ensures scalability. Future work will explore lightweight adaptations and multimodal extensions to increase deployment feasibility. These findings underscore the transformative potential of Dual-Stream Former for advancing VSR applications such as silent communication and assistive technologies by achieving unparalleled precision and robustness in diverse settings. Full article

29 pages, 5213 KB  
Article
Design and Implementation of a Novel Intelligent Remote Calibration System Based on Edge Intelligence
by Quan Wang, Jiliang Fu, Xia Han, Xiaodong Yin, Jun Zhang, Xin Qi and Xuerui Zhang
Symmetry 2025, 17(9), 1434; https://doi.org/10.3390/sym17091434 - 3 Sep 2025
Abstract
Calibration of power equipment has become an essential task in modern power systems. This paper proposes a distributed remote calibration prototype based on a cloud–edge–end architecture by integrating intelligent sensing, Internet of Things (IoT) communication, and edge computing technologies. The prototype employs a high-precision frequency-to-voltage conversion module leveraging satellite signals to address traceability and value transmission challenges in remote calibration, thereby ensuring reliability and stability throughout the process. Additionally, an environmental monitoring module tracks parameters such as temperature, humidity, and electromagnetic interference. Combined with video surveillance and optical character recognition (OCR), this enables intelligent, end-to-end recording and automated data extraction during calibration. Furthermore, a cloud-edge task scheduling algorithm is implemented to offload computational tasks to edge nodes, maximizing resource utilization within the cloud–edge collaborative system and enhancing service quality. The proposed prototype extends existing cloud–edge collaboration frameworks by incorporating calibration instruments and sensing devices into the network, thereby improving the intelligence and accuracy of remote calibration across multiple layers. Furthermore, this approach facilitates synchronized communication and calibration operations across symmetrically deployed remote facilities and reference devices, providing solid technical support to ensure that measurement equipment meets the required precision and performance criteria. Full article
(This article belongs to the Section Computer)

12 pages, 304 KB  
Article
LoRA-INT8 Whisper: A Low-Cost Cantonese Speech Recognition Framework for Edge Devices
by Lusheng Zhang, Shie Wu and Zhongxun Wang
Sensors 2025, 25(17), 5404; https://doi.org/10.3390/s25175404 - 1 Sep 2025
Abstract
To address the triple bottlenecks of data scarcity, oversized models, and slow inference that hinder Cantonese automatic speech recognition (ASR) in low-resource and edge-deployment settings, this study proposes a cost-effective Cantonese ASR system based on LoRA fine-tuning and INT8 quantization. First, Whisper-tiny is parameter-efficiently fine-tuned on the Common Voice zh-HK training set using LoRA with rank = 8. Only 1.6% of the original weights are updated, reducing the character error rate (CER) from 49.5% to 11.1%, a performance close to full fine-tuning (10.3%), while cutting the training memory footprint and computational cost by approximately one order of magnitude. Next, the fine-tuned model is compressed into a 60 MB INT8 checkpoint via dynamic quantization in ONNX Runtime. On a MacBook Pro M1 Max CPU, the quantized model achieves an RTF = 0.20 (offline inference 5 × real-time) and 43% lower latency than the FP16 baseline; on an NVIDIA A10 GPU, it reaches RTF = 0.06, meeting the requirements of high-concurrency cloud services. Ablation studies confirm that the LoRA-INT8 configuration offers the best trade-off among accuracy, speed, and model size. Limitations include the absence of spontaneous-speech noise data, extreme-hardware validation, and adaptive LoRA structure optimization. Future work will incorporate large-scale self-supervised pre-training, tone-aware loss functions, AdaLoRA architecture search, and INT4/NPU quantization, and will establish an mJ/char energy–accuracy curve. The ultimate goal is to achieve CER ≤ 8%, RTF < 0.1, and mJ/char < 1 for low-power real-time Cantonese ASR in practical IoT scenarios. Full article
(This article belongs to the Section Electronic Sensors)
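The character error rate (CER) figures quoted in this abstract follow the standard definition: character-level edit distance between hypothesis and reference, divided by the reference length. A minimal sketch of that metric (not the authors' evaluation code):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance over reference length."""
    # Dynamic-programming edit distance over characters.
    prev = list(range(len(hypothesis) + 1))
    for i, rc in enumerate(reference, 1):
        curr = [i]
        for j, hc in enumerate(hypothesis, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (rc != hc)))   # substitution
        prev = curr
    return prev[-1] / max(len(reference), 1)

# A perfect hypothesis scores 0.0; one wrong character in ten scores 0.1.
```

Under this definition the reported drop from 49.5% to 11.1% CER means the fine-tuned model makes roughly one character-level error per nine reference characters.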

23 pages, 1233 KB  
Article
Decoding the Digits: How Number Notation Influences Cognitive Effort and Performance in Chinese-to-English Sight Translation
by Xueyan Zong, Lei Song and Shanshan Yang
Behav. Sci. 2025, 15(9), 1195; https://doi.org/10.3390/bs15091195 - 1 Sep 2025
Abstract
Numbers present persistent challenges in interpreting, yet cognitive mechanisms underlying notation-specific processing remain underexplored. While eye-tracking studies in visually-assisted simultaneous interpreting have advanced number research, they predominantly examine Arabic numerals in non-Chinese contexts—neglecting notation diversity increasingly prevalent in computer-assisted interpreting systems where Automatic Speech Recognition outputs vary across languages. Addressing these gaps, this study investigated how number notation (Arabic digits vs. Chinese character numbers) affects trainee interpreters’ cognitive effort and performance in Chinese-to-English sight translation. Employing a mixed-methods design, we measured global (task-level) and local (number-specific) eye movements alongside expert assessments, output analysis, and subjective assessments. Results show that Chinese character numbers demand significantly greater cognitive effort than Arabic digits, evidenced by more and longer fixations, more extensive saccadic movements, and a larger eye-voice span. Concurrently, sight translation quality decreased markedly with Chinese character numbers, with more processing attempts yet lower accuracy and fluency. Subjective workload ratings confirmed higher mental, physical, and temporal demands in Task 2. These findings reveal an effort-quality paradox where greater cognitive investment in processing complex notations leads to poorer outcomes, and highlight the urgent need for notation-specific training strategies and adaptive technologies in multilingual communication. Full article
(This article belongs to the Section Cognition)

18 pages, 2884 KB  
Article
Research on Multi-Path Feature Fusion Manchu Recognition Based on Swin Transformer
by Yu Zhou, Mingyan Li, Hang Yu, Jinchi Yu, Mingchen Sun and Dadong Wang
Symmetry 2025, 17(9), 1408; https://doi.org/10.3390/sym17091408 - 29 Aug 2025
Abstract
Recognizing Manchu words can be challenging due to their complex character variations, subtle differences between similar characters, and homographic polysemy. Most studies rely on character segmentation techniques for character recognition or use convolutional neural networks (CNNs) to encode word images for word recognition. However, these methods can lead to segmentation errors or a loss of semantic information, which reduces the accuracy of word recognition. To address the limitations of CNNs in long-range dependency modeling and to enhance semantic coherence, we propose a hybrid architecture that fuses the spatial features of the original images with spectral features. Specifically, we first leverage the Short-Time Fourier Transform (STFT) to preprocess the raw input images and thereby obtain their multi-view spectral features. Then, we use a primary CNN block and a pair of symmetric CNN blocks to construct a symmetric spectral enhancement module, which encodes the raw input features and the multi-view spectral features. Subsequently, we design a feature fusion module based on the Swin Transformer to fuse the multi-view spectral embeddings and concatenate them with the raw input embedding. Finally, a Transformer decoder produces the target output. We conducted extensive experiments on Manchu word benchmark datasets to evaluate the effectiveness of the proposed framework. The results demonstrate that our framework performs robustly in word recognition tasks and exhibits excellent generalization. It also outperformed baseline methods in multiple writing-style font-recognition tasks. Full article
(This article belongs to the Section Computer)

22 pages, 3691 KB  
Article
Graph Convolutional Network with Agent Attention for Recognizing Digital Ink Chinese Characters Written by International Students
by Huafen Xu and Xiwen Zhang
Information 2025, 16(9), 729; https://doi.org/10.3390/info16090729 - 25 Aug 2025
Abstract
Digital ink Chinese characters (DICCs) written by international students often contain various errors and irregularities, making the recognition of these characters a highly challenging pattern recognition problem. This paper designs a graph convolutional network with agent attention (GCNAA) for recognizing DICCs written by international students. Each sampling point is treated as a vertex in a graph, with connections between adjacent sampling points within the same stroke serving as edges to create a Chinese character graph structure. The GCNAA is used to process the data of the Chinese character graph structure, implemented by stacking Block modules. In each Block module, the graph agent attention module not only models the global context between graph nodes but also reduces computational complexity, shortens training time, and accelerates inference speed. The graph convolution block module models the local adjacency structure of the graph by aggregating local geometric information from neighboring nodes, while graph pooling is employed to learn multi-resolution features. Finally, the Softmax function is used to generate prediction results. Experiments conducted on public datasets such as CASIA-OLWHDB1.0-1.2, SCUT-COUCH2009 GB1&GB2, and HIT-OR3C-ONLINE demonstrate that the GCNAA performs well even on large-category datasets, showing strong generalization ability and robustness. The recognition accuracy for DICCs written by international students reaches 98.7%. Accurate and efficient handwritten Chinese character recognition technology can provide a solid technical foundation for computer-assisted Chinese character writing for international students, thereby promoting the development of international Chinese character education. Full article
(This article belongs to the Section Artificial Intelligence)
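The graph construction this abstract describes, with each sampling point a vertex and edges linking adjacent points within a stroke, reduces to a small edge-list builder. The stroke representation below (a list of (x, y) point lists) is an assumed simplification of the digital-ink format:

```python
def ink_to_graph(strokes):
    """Build (vertices, edges) from digital-ink strokes.

    `strokes` is assumed to be a list of strokes, each a list of (x, y)
    sampling points. Edges connect consecutive points within one stroke,
    matching the construction described in the abstract; strokes are not
    connected to each other.
    """
    vertices, edges = [], []
    for stroke in strokes:
        base = len(vertices)          # global index of this stroke's first point
        vertices.extend(stroke)
        edges.extend((base + k, base + k + 1) for k in range(len(stroke) - 1))
    return vertices, edges
```

The resulting adjacency structure is what a graph convolution then aggregates over, so stroke order and pen-up breaks are preserved in the topology rather than discarded by rasterization.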

27 pages, 33283 KB  
Article
A Structure-Aware and Condition-Constrained Algorithm for Text Recognition in Power Cabinets
by Yang Liu, Shilun Li and Liang Zhang
Electronics 2025, 14(16), 3315; https://doi.org/10.3390/electronics14163315 - 20 Aug 2025
Abstract
Power cabinet OCR enables real-time grid monitoring but faces challenges absent in generic text recognition: 7.5:1 scale variation between labels and readings, tabular layouts with semantic dependencies, and electrical constraints (220 V ± 10%). We propose SACC (Structure-Aware and Condition-Constrained), an end-to-end framework integrating structural perception with domain constraints. SACC comprises (1) MAF-Detector with adaptive dilated convolutions (r ∈ {1, 3, 5}) for multi-scale text; (2) SA-ViT, combining a Vision Transformer with a GCN for tabular structure modeling; and (3) DCDecoder, enforcing real-time electrical constraints during decoding. Extensive experiments demonstrate SACC's effectiveness: it achieves 86.5%, 88.3%, and 83.4% character accuracy on the PCSTD, YUVA EB, and ICDAR 2015 datasets, respectively, with consistent improvements over leading methods. Ablation studies confirm synergistic improvements, with MAF-Detector increasing recall by 12.3%. SACC provides a field-deployable solution achieving 30.3 ms inference on an RTX 3090. The co-design of structural analysis with differentiable constraints establishes a framework for domain-specific OCR in industrial and medical applications. Full article
(This article belongs to the Section Artificial Intelligence)
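The role of DCDecoder's electrical constraints can be illustrated with a simple range check on a decoded voltage reading. The 220 V ± 10% band comes from the abstract; the regex parsing and the post-hoc placement are simplifying assumptions (the paper enforces constraints during decoding, not as a filter afterwards):

```python
import re

# Plausible band for a mains-voltage reading, per the abstract (220 V +/- 10%).
V_NOMINAL, V_TOL = 220.0, 0.10

def voltage_ok(text):
    """Accept an OCR reading like '223.5V' only if it falls in the allowed band.

    Hypothetical simplification: a production system would apply such
    constraints inside the decoder's search, down-weighting implausible
    digit sequences instead of rejecting finished strings.
    """
    m = re.search(r"(\d+(?:\.\d+)?)\s*V", text)
    if not m:
        return False
    v = float(m.group(1))
    return abs(v - V_NOMINAL) <= V_NOMINAL * V_TOL
```

Even this crude check catches the common OCR failure mode where a dropped or swapped digit produces a physically impossible reading.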
