Search Results (539)

Search Parameters:
Keywords = pre-annotation

21 pages, 1406 KB  
Article
Receipt Information Extraction with Joint Multi-Modal Transformer and Rule-Based Model
by Xandru Mifsud, Leander Grech, Adriana Baldacchino, Léa Keller, Gianluca Valentino and Adrian Muscat
Mach. Learn. Knowl. Extr. 2025, 7(4), 167; https://doi.org/10.3390/make7040167 - 16 Dec 2025
Abstract
A receipt information extraction task requires both textual and spatial analyses. Early receipt analysis systems primarily relied on template matching to extract data from spatially structured documents. However, these methods lack generalizability across various document layouts and require defining the specific spatial characteristics of unseen document sources. The advent of convolutional and recurrent neural networks has led to models that generalize better over unseen document layouts, and more recently, multi-modal transformer-based models, which consider a combination of text, visual, and layout inputs, have led to an even more significant boost in document-understanding capabilities. This work focuses on the joint use of a neural multi-modal transformer and a rule-based model and studies whether this combination achieves higher performance levels than the transformer on its own. A comprehensively annotated dataset, comprising real-world and synthetic receipts, was specifically developed for this study. The open-source optical character recognition model DocTR was used to textually scan receipts and, together with an image, provided input to the classifier model. The open-source pre-trained LayoutLMv3 transformer-based model was augmented with a classifier model head, which was trained for classifying textual data into 12 predefined labels, such as date, price, and shop name. The methods implemented in the rule-based model were manually designed and consisted of four types: pattern-matching rules based on regular expressions and logic, database search-based methods for named entities, spatial pattern discovery guided by statistical metrics, and error-correcting mechanisms based on confidence scores and local distance metrics. Following hyperparameter tuning of the classifier head and the integration of a rule-based model, the system achieved an overall F1 score of 0.98 in classifying textual data, including line items, from receipts. Full article
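
As a concrete illustration of the pattern-matching rule type described above, here is a minimal Python sketch that classifies OCR tokens with regular expressions; the label names and patterns are illustrative assumptions, not the authors' actual rules.

```python
import re

# Illustrative pattern-matching rules in the spirit of the paper's rule-based
# component; the label set and regexes are assumptions for demonstration only.
RULES = {
    "date":  re.compile(r"\b\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4}\b"),
    "price": re.compile(r"\b\d+[.,]\d{2}\b"),
}

def classify_token(text: str) -> str | None:
    """Return the first rule label whose pattern matches, else None so a
    transformer prediction can be used as the fallback."""
    for label, pattern in RULES.items():
        if pattern.search(text):
            return label
    return None

print(classify_token("16/12/2025"))  # -> "date"
print(classify_token("12.50"))       # -> "price"
```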

37 pages, 8656 KB  
Article
Anomaly-Aware Graph-Based Semi-Supervised Deep Support Vector Data Description for Anomaly Detection
by Taha J. Alhindi
Mathematics 2025, 13(24), 3987; https://doi.org/10.3390/math13243987 - 14 Dec 2025
Viewed by 96
Abstract
Anomaly detection in safety-critical systems often operates under severe label constraints, where only a small subset of normal and anomalous samples can be reliably annotated, while large unlabeled data streams are contaminated and high-dimensional. Deep one-class methods, such as deep support vector data description (DeepSVDD) and deep semi-supervised anomaly detection (DeepSAD), address this setting. However, they treat samples largely in isolation and do not explicitly leverage the manifold structure of unlabeled data, which can limit robustness and interpretability. This paper proposes Anomaly-Aware Graph-based Semi-Supervised Deep Support Vector Data Description (AAG-DSVDD), a boundary-focused deep one-class approach that couples a DeepSAD-style hypersphere with a label-aware latent k-nearest neighbor (k-NN) graph. The method combines a soft-boundary enclosure for labeled normals, a margin-based push-out for labeled anomalies, an unlabeled center-pull, and a k-NN graph regularizer on the squared distances to the center. The resulting graph term propagates information from scarce labels along the latent manifold, aligns anomaly scores of neighboring samples, and supports sample-level interpretability through graph neighborhoods, while test-time scoring remains a single distance-to-center computation. On a controlled two-dimensional synthetic dataset, AAG-DSVDD achieves a mean F1-score of 0.88±0.02 across ten random splits, improving on the strongest baseline by about 0.12 absolute F1. On three public benchmark datasets (Thyroid, Arrhythmia, and Heart), AAG-DSVDD attains the highest F1 on all datasets with F1-scores of 0.719, 0.675, and 0.8, respectively, compared to all baselines. In a multi-sensor fire monitoring case study, AAG-DSVDD reduces the average absolute error in fire starting time to approximately 473 s (about 30% improvement over DeepSAD) while keeping the average pre-fire false-alarm rate below 1% and avoiding persistent pre-fire alarms. These results indicate that graph-regularized deep one-class boundaries offer an effective and interpretable framework for semi-supervised anomaly detection under realistic label budgets. Full article
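
The composite objective sketched in the abstract (soft-boundary enclosure for labeled normals, margin push-out for labeled anomalies, unlabeled center-pull, and a k-NN graph regularizer on squared center distances) can be written as a single loss; the term weights, margin, and radius handling below are assumptions, not the paper's hyperparameters.

```python
import torch

def aag_dsvdd_loss(z, y, center, radius, edges, margin=1.0, lam=0.1):
    """Hedged sketch of the AAG-DSVDD objective described in the abstract.
    z: latent embeddings (n, d); y: +1 labeled normal, -1 labeled anomaly,
    0 unlabeled; edges: (m, 2) index pairs from a latent k-NN graph."""
    d2 = ((z - center) ** 2).sum(dim=1)            # squared distance to center
    loss = torch.zeros((), device=z.device)
    if (y == 1).any():                             # soft-boundary for labeled normals
        loss = loss + torch.clamp(d2[y == 1] - radius ** 2, min=0).mean()
    if (y == -1).any():                            # margin push-out for labeled anomalies
        loss = loss + torch.clamp(margin - d2[y == -1], min=0).mean()
    if (y == 0).any():                             # center-pull for unlabeled samples
        loss = loss + d2[y == 0].mean()
    graph = ((d2[edges[:, 0]] - d2[edges[:, 1]]) ** 2).mean()
    return loss + lam * graph                      # align neighbors' anomaly scores
```

Test-time scoring then remains the single distance-to-center computation d2, as the abstract notes.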

17 pages, 2380 KB  
Article
Utilizing Geoparsing for Mapping Natural Hazards in Europe
by Tinglei Yu, Xuezhen Zhang and Jun Yin
Water 2025, 17(24), 3520; https://doi.org/10.3390/w17243520 - 12 Dec 2025
Viewed by 210
Abstract
Natural hazards exert a detrimental influence on human survival, environmental conditions and society. Historical hazard events have generated a broad corpus of literature addressing the spatiotemporal extent, dissemination or social responses. With regard to quantitative analysis based on information locked within verbose text, the release of such information from the narrative format is encouraging. Natural Language Processing (NLP), a technique demonstrated to be capable of automated data extraction, provides a useful tool in establishing a structured dataset on hazard occurrences. In our study, we utilize scattered textual records of historical natural hazard events to create a novel dataset and explore the applicability of NLP in parallel. We put forward a standard list of toponyms based on manual annotation of a compilation of disaster-related texts, all of which were referenced in an authoritative publication in the field. The final natural hazards dataset comprised location data, which referred to a specific hazard report in Europe during 1301–1500, together with its geocoding result, year of occurrence and detailed event(s). We evaluated the performance of four pre-trained geoparsing tools (Flair, Stanford CoreNLP, spaCy and Irchel Geoparser) for automated toponym extraction in comparison with the standard list. All four tested methods showed a high precision (above 0.99). Flair had the best overall performance (F1 score 0.89), followed by Stanford CoreNLP (F1 score 0.83) and Irchel Geoparser (F1 score 0.82), while spaCy had a poor recall (0.5). We then divided natural hazards into six categories: extreme heat, snow and ice, wind and hail, rainstorms and floods, droughts, and earthquakes. Finally, we compared our newly digitized natural hazard dataset to a geocoded version of the dataset provided by Harvard University, thus providing a comprehensive overview of the spatial–temporal characteristics of European hazard observations. The statistical outcomes of the present investigation demonstrate the efficacy of NLP techniques in text information extraction and hazard dataset generation, offering references for collaborative and interdisciplinary efforts. Full article
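
For reference, evaluating toponym extraction against a gold-standard list reduces to set-based precision, recall, and F1; a minimal sketch, assuming exact string matching (which glosses over whatever matching criteria the authors actually applied):

```python
def prf(gold: set[str], predicted: set[str]) -> tuple[float, float, float]:
    """Precision, recall, and F1 of extracted toponyms against a gold list."""
    tp = len(gold & predicted)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if tp else 0.0
    return p, r, f1

# With precision 0.99 and recall 0.5 (the spaCy pattern reported above),
# F1 = 2 * 0.99 * 0.5 / 1.49 ≈ 0.66.
```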

19 pages, 2659 KB  
Article
A Structure-Aware Masked Autoencoder for Sparse Character Image Recognition
by Cheng Luo, Wenhong Wang, Junhang Mai, Tianwei Mu, Shuo Guo and Mingzhe Yuan
Electronics 2025, 14(24), 4886; https://doi.org/10.3390/electronics14244886 - 12 Dec 2025
Viewed by 233
Abstract
Conventional vehicle character recognition methods often treat detection and recognition as separate processes, resulting in limited feature interaction and potential error propagation. To address this issue, this paper proposes a structure-aware self-supervised Masked Autoencoder (CharSAM-MAE) framework, combined with an independent region extraction preprocessing stage. A YOLOv8n detector is employed solely to crop the region of interest (ROI) from full-frame vehicle images using 50 single bounding-box annotated samples. After cropping, the detector is discarded, and subsequent self-supervised pre-training and recognition are fully executed using MAE without any involvement of YOLO model parameters or labeled data. CharSAM-MAE incorporates a structure-aware masking strategy and a region-weighted reconstruction loss during pre-training to improve both local structural representation and global feature modeling. During fine-tuning, a multi-head attention-enhanced CTC decoder (A-CTC) is applied to mitigate issues such as sparse characters, adhesion, and long-sequence instability. The framework is trained on 13,544 ROI images, with only 5% of labeled data used for supervised fine-tuning. Experimental results demonstrate that the proposed method achieves 99.25% character accuracy, 88.6% sequence accuracy, and 0.85% character error rate, outperforming the PaddleOCR v5 baseline (98.92%, 85.2%, and 1.15%, respectively). These results verify the effectiveness of structure-aware self-supervised learning and highlight the applicability of the proposed method for industrial character recognition with minimal annotation requirements. Full article
(This article belongs to the Section Electrical and Autonomous Vehicles)
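
The structure-aware masking idea can be sketched as biasing MAE patch selection toward character-bearing regions; the per-patch foreground score and bias factor below are assumptions, since the paper's exact strategy is not given in the abstract.

```python
import numpy as np

def structure_aware_mask(char_density: np.ndarray, mask_ratio=0.75, bias=2.0):
    """Sample an MAE patch mask biased toward character-bearing patches.
    char_density: per-patch foreground score in [0, 1] (assumed to come from
    a cheap binarization of the ROI); bias > 1 masks character patches more often."""
    weights = 1.0 + (bias - 1.0) * char_density
    probs = weights / weights.sum()
    n_mask = int(round(mask_ratio * char_density.size))
    idx = np.random.choice(char_density.size, size=n_mask, replace=False, p=probs)
    mask = np.zeros(char_density.size, dtype=bool)
    mask[idx] = True
    return mask  # True = patch is masked and must be reconstructed
```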

23 pages, 2222 KB  
Article
Fine-Tuning Generative AI with Domain Question Banks: Evaluating Multi-Type Question Generation and Grading
by Chien-Hung Lai, You-Jen Chen and Ze-Ping Chen
Appl. Sci. 2025, 15(24), 13050; https://doi.org/10.3390/app152413050 - 11 Dec 2025
Viewed by 161
Abstract
This study examines the effectiveness of a fine-tuned generative AI system—trained with a domain question bank—for question generation and automated grading in programming education, and evaluates its instructional usability. Methodologically, we constructed an annotated question bank covering nine item types and, under a controlled environment, compared pre- and post-fine-tuning performance on question-type recognition and answer grading using Accuracy, Macro Precision, Macro Recall, and Macro F1. We also collected student questionnaires and open-ended feedback to analyze subjective user experience. Results indicate that the accuracy of question-type recognition improved from 0.6477 to 0.8409, while grading accuracy increased from 0.9474 to 0.9605. Students’ subjective perceptions aligned with these quantitative trends, reporting higher ratings for grading accuracy and question generation quality; overall interactive experience was moderately high, though system speed still requires improvement. These findings provide course-aligned empirical evidence that fine-tuning with domain data can jointly enhance the effectiveness and usability of both automatic question generation and automated grading. Full article
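
The reported evaluation metrics can be reproduced with scikit-learn's macro-averaged scores; the item-type labels below are placeholders standing in for the nine types in the question bank.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Placeholder labels standing in for the nine item types in the question bank.
y_true = ["mcq", "code", "short", "mcq", "code", "short"]
y_pred = ["mcq", "code", "mcq",   "mcq", "code", "short"]

acc = accuracy_score(y_true, y_pred)
p, r, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(f"Accuracy={acc:.4f}  MacroP={p:.4f}  MacroR={r:.4f}  MacroF1={f1:.4f}")
```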

28 pages, 8872 KB  
Article
Development and Application of an Intelligent Recognition System for Polar Environmental Targets Based on the YOLO Algorithm
by Jun Jian, Zhongying Wu, Kai Sun, Jiawei Guo and Ronglin Gao
J. Mar. Sci. Eng. 2025, 13(12), 2313; https://doi.org/10.3390/jmse13122313 - 5 Dec 2025
Viewed by 266
Abstract
As global climate warming enhances the navigability of Arctic routes, their navigation value has become prominent, yet ships operating in ice-covered waters face severe threats from sea ice and icebergs. Existing manual observation and radar monitoring remain limited, highlighting an urgent need for efficient target recognition technology. This study focuses on polar environmental target detection by constructing a polar dataset with 1342 JPG images covering four classes (sea ice, icebergs, ice channels, and ships), obtained via web collection and video frame extraction. The “Grounding DINO pre-annotation + LabelImg manual fine-tuning” strategy is employed to improve annotation efficiency and accuracy, with data augmentation further enhancing dataset diversity. After comparing YOLOv5n, YOLOv8n, and YOLOv11n, YOLOv8n is selected as the baseline model and improved by introducing the CBAM/SE attention mechanism, SCConv/AKConv convolutions, and BiFPN network. Among these models, the improved YOLOv8n + SCConv achieves the best performance in polar target detection, with a mean average precision (mAP) of 0.844, 1.4% higher than the original model. It effectively reduces missed detections of sea ice and icebergs, thereby enhancing adaptability to complex polar environments. The experimental results demonstrate that the improved model exhibits good robustness in images of varying resolutions, scenes with water surface reflections, and AI-generated images. In addition, a visual GUI with image/video detection functions was developed to support real-time monitoring and result visualization. This research provides essential technical support for safe navigation in ice-covered waters, polar resource exploration, and scientific activities. Full article
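
A baseline run of the kind described (before the architectural modifications) can be set up with the ultralytics package; the dataset config name and training settings are assumptions.

```python
from ultralytics import YOLO

# Hedged sketch of the YOLOv8n baseline stage; "polar.yaml" (the four-class
# dataset config) and the hyperparameters are assumptions.
model = YOLO("yolov8n.pt")
model.train(data="polar.yaml", epochs=100, imgsz=640)
metrics = model.val()          # reports mAP50 and mAP50-95 per class
print(metrics.box.map50)
```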

26 pages, 6470 KB  
Article
Impact of Synthetic Data on Deep Learning Models for Earth Observation: Photovoltaic Panel Detection Case Study
by Enes Hisam, Jesus Gimeno, David Miraut, Manolo Pérez-Aixendri, Marcos Fernández, Rossana Gini, Raúl Rodríguez, Gabriele Meoni and Dursun Zafer Seker
ISPRS Int. J. Geo-Inf. 2025, 14(12), 481; https://doi.org/10.3390/ijgi14120481 - 4 Dec 2025
Viewed by 499
Abstract
This study explores the impact of synthetic data, both physically based and generatively created, on deep learning analytics for earth observation (EO), focusing on the detection of photovoltaic panels. A YOLOv8 object detection model was trained using a publicly available, multi-resolution very high resolution (VHR) EO dataset (0.8 m, 0.3 m, and 0.1 m), comprising 3716 images from various locations in Jiangsu Province, China. Three benchmarks were established using only real EO data. Subsequent experiments evaluated how the inclusion of synthetic data, in varying types and quantities, influenced the model’s ability to detect photovoltaic panels in VHR imagery. Physically based synthetic images were generated using the Unity engine, which allowed the generation of a wide range of realistic scenes by varying scene parameters automatically. This approach produced not only realistic RGB images but also semantic segmentation maps and pixel-accurate masks identifying photovoltaic panel locations. Generative synthetic data were created using diffusion-based models (DALL·E 3 and Stable Diffusion XL), guided by prompts to simulate satellite-like imagery containing solar panels. All synthetic images were manually reviewed, and corresponding annotations were ensured to be consistent with the real dataset. Integrating synthetic with real data generally improved model performance, with the best results achieved when both data types were combined. Performance gains were dependent on data distribution and volume, with the most significant improvements observed when synthetic data were used to meet the YOLOv8-recommended minimum of 1500 images per class. In this setting, combining real data with both physically based and generative synthetic data yielded improvements of 1.7% in precision, 3.9% in recall, 2.3% in mAP@50, and 3.3% in mAP@95 compared to training with real data alone. The study also emphasizes the importance of carefully managing the inclusion of synthetic data in training and validation phases to avoid overfitting to synthetic features, with the goal of enhancing generalization to real-world data. Additionally, a pre-training experiment using only synthetic data, followed by fine-tuning with real images, demonstrated improved early-stage training performance, particularly during the first five epochs, highlighting potential benefits in computationally constrained environments. Full article
(This article belongs to the Topic Artificial Intelligence Models, Tools and Applications)

36 pages, 22245 KB  
Article
CMSNet: A SAM-Enhanced CNN–Mamba Framework for Damaged Building Change Detection in Remote Sensing Imagery
by Jianli Zhang, Liwei Tao, Wenbo Wei, Pengfei Ma and Mengdi Shi
Remote Sens. 2025, 17(23), 3913; https://doi.org/10.3390/rs17233913 - 3 Dec 2025
Viewed by 481
Abstract
In war and explosion scenarios, buildings often suffer varying degrees of damage characterized by complex, irregular, and fragmented spatial patterns, posing significant challenges for remote sensing–based change detection. Additionally, the scarcity of high-quality datasets limits the development and generalization of deep learning approaches. To overcome these issues, we propose CMSNet, an end-to-end framework that integrates the structural priors of the Segment Anything Model (SAM) with the efficient temporal modeling and fine-grained representation capabilities of CNN–Mamba. Specifically, CMSNet adopts CNN–Mamba as the backbone to extract multi-scale semantic features from bi-temporal images, while SAM-derived visual priors guide the network to focus on building boundaries and structural variations. A Pre-trained Visual Prior-Guided Feature Fusion Module (PVPF-FM) is introduced to align and fuse these priors with change features, enhancing robustness against local damage, non-rigid deformations, and complex background interference. Furthermore, we construct a new RWSBD (Real-world War Scene Building Damage) dataset based on Gaza war scenes, comprising 42,732 annotated building damage instances across diverse scales, offering a strong benchmark for real-world scenarios. Extensive experiments on RWSBD and three public datasets (CWBD, WHU-CD, and LEVIR-CD+) demonstrate that CMSNet consistently outperforms eight state-of-the-art methods in both quantitative metrics (F1, IoU, Precision, Recall) and qualitative evaluations, especially in fine-grained boundary preservation, small-scale change detection, and complex scene adaptability. Overall, this work introduces a novel detection framework that combines foundation model priors with efficient change modeling, along with a new large-scale war damage dataset, contributing valuable advances to both research and practical applications in remote sensing change detection. Additionally, the strong generalization ability and efficient architecture of CMSNet highlight its potential for scalable deployment and practical use in large-area post-disaster assessment. Full article

14 pages, 838 KB  
Article
Leveraging LLMs for User Rating Prediction from Textual Reviews: A Hospitality Data Annotation Case Study
by Patricia Nnanna, Olasoji Amujo, Chinedu Pascal Ezenkwu and Ebuka Ibeke
Information 2025, 16(12), 1059; https://doi.org/10.3390/info16121059 - 2 Dec 2025
Viewed by 310
Abstract
The proliferation of user-generated content in today’s digital landscape has further increased dependence on online reviews as a source for decision-making in the hospitality industry. There has been an increasing interest in automating this decision-support mechanism through recommender systems. However, this process often requires a large labelled corpus to train an effective algorithm, necessitating the use of human annotators to develop training data where this is lacking. Although manual annotation can help enrich the training corpus, it can also introduce errors and annotator bias, including subjectivity and cultural bias, which can affect the quality of the data and fairness in the model. This paper examines the alignment of ratings derived from different annotation sources and the original ratings provided by customers, which are treated as the ground truth. The paper compares the predictions from Generative Pre-trained Transformer (GPT) models against ratings assigned by Amazon Mechanical Turk (MTurk) workers. The GPT-4o annotation outputs closely mirror the original ratings, given their strong positive correlation (0.703) with the latter. GPT-3.5 Turbo and MTurk showed weaker correlations (0.663 and 0.15, respectively) than GPT-4o. The potential cause of the large difference between original ratings and MTurk ratings (largely driven by human perception) lies in the inherent challenges of subjectivity, quantitative bias, and variability in context comprehension. These findings suggest that the use of advanced models such as GPT-4o can significantly reduce the potential bias and variability introduced by Amazon MTurk annotators, thus improving the prediction accuracy of ratings with actual user sentiment as expressed in textual reviews. Moreover, with the per-annotation cost of an LLM shown to be thirty times cheaper than MTurk, our proposed LLM-based textual review annotation approach will be cost-effective for the hospitality industry. Full article
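
A minimal sketch of the LLM annotation step, assuming the OpenAI chat API and a simple single-digit prompt (the authors' exact template is not given in the abstract):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def predict_rating(review: str, model: str = "gpt-4o") -> int:
    """Ask the model for a 1-5 star rating; the prompt wording is an
    illustrative assumption, not the authors' template."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": "Rate this hotel review from 1 to 5. "
                       f"Answer with a single digit.\n\n{review}",
        }],
    )
    return int(resp.choices[0].message.content.strip()[0])
```

Agreement with the original customer ratings can then be quantified with, e.g., scipy.stats.pearsonr over the predicted and true ratings.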

23 pages, 2450 KB  
Article
DAFT: Domain-Augmented Fine-Tuning for Large Language Models in Emotion Recognition of Health Misinformation
by Youlin Zhao, Xingmi Zhu, Wanqing Tang, Linxing Zhou, Li Feng and Mingwei Tang
Appl. Sci. 2025, 15(23), 12690; https://doi.org/10.3390/app152312690 - 29 Nov 2025
Viewed by 243
Abstract
This study proposes a domain-augmented fine-tuning strategy for improving emotion recognition in health misinformation using pre-trained large language models (LLMs). The proposed method aims to address key limitations of existing approaches, including insufficient precision, weak domain adaptability, and low recognition accuracy for complex emotional expressions in health-related misinformation. Specifically, the Domain-Augmented Fine-Tuning (DAFT) method extends a health emotion lexicon to annotate emotion-oriented corpora, designs task-specific prompt templates to enhance semantic understanding, and fine-tunes GPT-based LLMs through parameter-efficient prompt tuning. Empirical experiments conducted on a health misinformation dataset demonstrate that DAFT substantially improves model performance in terms of prediction error, emotional vector structural similarity, probability distribution consistency, and classification accuracy. The fine-tuned GPT-4o model achieves the best overall performance, attaining an emotion recognition accuracy of 84.77%, with its F1-score increasing by 20.78% relative to the baseline model. Nonetheless, the corpus constructed in this study is based on a six-dimensional emotion framework, which may not fully capture nuanced emotions in complex linguistic contexts. Moreover, the dataset is limited to textual information, and future research should incorporate multimodal data such as images and videos. Overall, the DAFT method effectively enhances the domain adaptability of LLMs and provides a lightweight yet efficient approach to emotion recognition in health misinformation scenarios. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
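
A sketch of what a task-specific prompt template might look like; the six emotion labels are inferred from the abstract's six-dimensional framework and the wording is an assumption.

```python
# Hypothetical emotion labels; the abstract mentions a six-dimensional
# framework but does not enumerate the dimensions.
EMOTIONS = ["anger", "fear", "sadness", "joy", "surprise", "disgust"]

def build_prompt(post: str) -> str:
    """Assemble a task-specific prompt for emotion classification."""
    return (
        "You are analyzing emotions in health-related misinformation.\n"
        f"Label the dominant emotion of the post as one of: {', '.join(EMOTIONS)}.\n"
        f"Post: {post}\n"
        "Emotion:"
    )
```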

14 pages, 3673 KB  
Article
IMAGO: An Improved Model Based on Attention Mechanism for Enhanced Protein Function Prediction
by Meiling Liu, Longchang Liang, Qiutong Wang, Yunmeng Zhang, Lin Shi, Tianjiao Zhang and Zhenxing Wang
Biomolecules 2025, 15(12), 1667; https://doi.org/10.3390/biom15121667 - 29 Nov 2025
Viewed by 286
Abstract
Protein function prediction plays an important role in the field of biology. With the wide application of deep learning in bioinformatics, more and more natural language processing (NLP) technologies are being applied to downstream tasks in the field, where they have shown excellent performance in protein function prediction. Protein–protein interaction (PPI) networks and other biological attributes contain rich information critical for annotating protein functions. However, existing deep learning networks still suffer from overfitting and noise issues, resulting in low accuracy in protein function prediction. Consequently, developing efficient models for protein function prediction remains a popular and challenging topic in the application of NLP to bioinformatics. In this study, we propose a novel protein function prediction model based on attention mechanisms, termed IMAGO. This model employs the Transformer pre-training process, integrating multi-head attention mechanisms and regularization techniques, and optimizes the loss function to effectively reduce overfitting and noise during training. It generates more robust embeddings, ultimately improving the accuracy of protein function prediction. Experimental results on human and mouse datasets indicate that our model surpasses other protein function prediction models across multiple metrics. Thus, this efficient, stable, and accurate deep learning model holds significant promise for protein function prediction. Full article
(This article belongs to the Section Bioinformatics and Systems Biology)
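
The combination of multi-head attention with regularization that the abstract describes is, at its core, a standard Transformer-style block; a minimal PyTorch sketch with assumed dimensions:

```python
import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    """Multi-head attention with dropout and LayerNorm regularization, in the
    spirit of the abstract's description; sizes are illustrative assumptions."""
    def __init__(self, dim=512, heads=8, p_drop=0.3):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, dropout=p_drop, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.drop = nn.Dropout(p_drop)

    def forward(self, x):                        # x: (batch, seq, dim)
        h, _ = self.attn(x, x, x)
        return self.norm(x + self.drop(h))       # residual connection + norm
```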

17 pages, 1552 KB  
Article
Adaptive Pseudo Text Augmentation for Noise-Robust Text-to-Image Person Re-Identification
by Lian Xiong, Wangdong Li, Huaixin Chen and Yuxi Feng
Sensors 2025, 25(23), 7157; https://doi.org/10.3390/s25237157 - 24 Nov 2025
Viewed by 323
Abstract
Text-to-image person re-identification (T2I-ReID) aims to retrieve pedestrians from images/videos based on textual descriptions. However, most methods implicitly assume that training image–text pairs are correctly aligned, while in practice, issues such as under-correlated and falsely correlated image–text pairs arise due to coarse-grained text annotations and erroneous textual descriptions. To address this problem, we propose a T2I-ReID method based on noise identification and pseudo-text generation. Our method first extracts image–text features using the Contrastive Language–Image Pre-Training (CLIP) model, then employs a token fusion model to select and fuse informative local token features, resulting in a token fusion embedding (TFE) for fine-grained representations. To identify noisy image–text pairs, we apply a two-component Gaussian mixture model (GMM) to fit the per-sample loss distributions computed from the predictions of the basic feature embedding (BFE) and TFE. Finally, when the noise identification tends to stabilize, we employ a multimodal large language model (MLLM) to generate pseudo-texts that replace the noisy text, facilitating the learning of more reliable visual–semantic associations and cross-modal alignment under noisy conditions. Extensive experiments on the CUHK-PEDES, ICFG-PEDES, and RSTPReid datasets demonstrate the effectiveness of our proposed model and its good compatibility with other baselines. Full article
(This article belongs to the Section Sensing and Imaging)
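
The noise-identification step maps directly onto scikit-learn's GaussianMixture; a sketch assuming per-sample losses are already computed:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def split_clean_noisy(losses: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Fit a two-component GMM to per-sample losses and keep samples whose
    posterior for the low-loss (clean) component exceeds `threshold`.
    The 0.5 threshold is an assumption, not the paper's setting."""
    gmm = GaussianMixture(n_components=2).fit(losses.reshape(-1, 1))
    clean = int(np.argmin(gmm.means_.ravel()))      # low-mean component = clean
    p_clean = gmm.predict_proba(losses.reshape(-1, 1))[:, clean]
    return p_clean >= threshold                     # True = treated as clean
```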

26 pages, 477 KB  
Article
MTSA-CG: Mongolian Text Sentiment Analysis Based on ConvBERT and Graph Attention Network
by Qingdaoerji Ren, Qihui Wang, Ying Lu, Yatu Ji and Nier Wu
Electronics 2025, 14(23), 4581; https://doi.org/10.3390/electronics14234581 - 23 Nov 2025
Viewed by 336
Abstract
In Mongolian Text Sentiment Analysis (MTSA), the scarcity of annotated sentiment datasets and the insufficient consideration of syntactic dependency and topological structural information pose significant challenges to accurately capturing semantics and effectively extracting emotional features. To address these issues, this paper proposes a Mongolian Text Sentiment Analysis model based on ConvBERT and Graph Attention Network (MTSA-CG). Firstly, the ConvBERT pre-trained model is employed to extract textual features under limited data conditions, aiming to mitigate the shortcomings caused by data scarcity. Concurrently, textual data are transformed into graph-structured data, integrating co-occurrence, dependency, and similarity information into a Graph Attention Network (GAT) to capture syntactic and structural cues, enabling a deeper understanding of semantic and emotional connotations for more precise sentiment classification. The proposed multi-graph fusion strategy employs a hierarchical attention mechanism that dynamically weights different graph types based on their semantic relevance, distinguishing it from conventional graph aggregation methods. Experimental results demonstrate that, in comparison with various advanced baseline models, the proposed method significantly enhances the accuracy of MTSA. Full article
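
A minimal GAT over a text graph, assuming PyTorch Geometric, with the three graph types (co-occurrence, dependency, similarity) merged into one edge index for brevity; dimensions are assumptions.

```python
import torch
from torch_geometric.nn import GATConv

class TextGAT(torch.nn.Module):
    """Two-layer GAT over a text graph: nodes carry ConvBERT features, edges
    encode co-occurrence/dependency/similarity links (merged here for brevity)."""
    def __init__(self, in_dim=768, hidden=128, classes=3, heads=4):
        super().__init__()
        self.g1 = GATConv(in_dim, hidden, heads=heads)
        self.g2 = GATConv(hidden * heads, classes, heads=1)

    def forward(self, x, edge_index):
        h = torch.relu(self.g1(x, edge_index))
        return self.g2(h, edge_index)    # per-node sentiment logits
```

The paper's multi-graph fusion with hierarchical attention would weight the three graphs separately rather than merging them as this sketch does.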

12 pages, 7963 KB  
Data Descriptor
SurfaceEMG Datasets for Hand Gesture Recognition Under Constant and Three-Level Force Conditions
by Cinthya Alejandra Zúñiga-Castillo, Víctor Alejandro Anaya-Mosqueda, Natalia Margarita Rendón-Caballero, Marcos Aviles, José M. Álvarez-Alvarado, Roberto Augusto Gómez-Loenzo and Juvenal Rodríguez-Reséndiz
Data 2025, 10(12), 194; https://doi.org/10.3390/data10120194 - 22 Nov 2025
Viewed by 588
Abstract
This work introduces two complementary surface electromyography (sEMG) datasets for hand gesture recognition. Signals were collected from 40 healthy subjects aged 18 to 40 years, divided into two independent groups of 20 participants each. In both datasets, subjects performed five hand gestures. Most of the gestures are the same, although the exact set and the order differ slightly between datasets. For example, Dataset 2 (DS2) includes the simultaneous flexion of the thumb and index finger, which is not present in Dataset 1 (DS1). Data were recorded with three bipolar sEMG sensors placed on the dominant forearm (flexor digitorum superficialis, extensor digitorum, and flexor pollicis longus). A battery-powered acquisition system was used, with sampling rates of 1000 Hz for DS1 and 1500 Hz for DS2. DS1 contains recordings performed at a constant moderate force, while DS2 includes three force levels (low, medium, and high). Both datasets provide raw signals and pre-processed versions segmented into overlapping windows, with clear file structures and annotations, enabling feature extraction for machine learning applications. Together, they constitute a large-scale standardized sEMG resource that supports the development and benchmarking of gesture and force recognition algorithms for rehabilitation, assistive technologies, and prosthetic control. Full article
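
The overlapping-window segmentation that the pre-processed versions ship with can be sketched as follows; the window and step lengths are assumptions, since the exact parameters are not stated here.

```python
import numpy as np

def segment_windows(emg: np.ndarray, fs: int, win_ms: int = 250, step_ms: int = 125):
    """Segment raw sEMG (samples, channels) into overlapping windows.
    win_ms/step_ms are illustrative; 50% overlap is a common choice."""
    win = int(fs * win_ms / 1000)
    step = int(fs * step_ms / 1000)
    starts = range(0, emg.shape[0] - win + 1, step)
    return np.stack([emg[s:s + win] for s in starts])   # (n_windows, win, channels)

# DS1-like example: 10 s of 3-channel data at 1000 Hz.
windows = segment_windows(np.random.randn(10_000, 3), fs=1000)
print(windows.shape)   # (79, 250, 3)
```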

19 pages, 9959 KB  
Article
Viola–Jones Algorithm in a Bioindicative Holographic Experiment with Daphnia magna Population
by Victor Dyomin, Mickhail Kurkov, Vladimir Kalaida, Igor Polovtsev and Alexandra Davydova
Appl. Sci. 2025, 15(22), 12193; https://doi.org/10.3390/app152212193 - 17 Nov 2025
Viewed by 215
Abstract
This study considers the applicability and effectiveness of the Viola–Jones method to automatically distinguish zooplankton particles from the background in images reconstructed from digital holograms obtained in natural conditions. For the first time, this algorithm is applied to holographic images containing coherent noise and residual defocusing. The method was trained on 880 annotated holographic images of Daphnia magna along with 120 background frames. It was then tested on independent laboratory and field datasets, including morphologically related taxa. With optimized settings, the precision of the algorithm reached ~90%, with an F1 score of ~85%, on noisy holographic images, and the algorithm also demonstrated a preliminary ability to recognize similar taxa without retraining. The algorithm is well suited to analyzing holographic data as a fast and resource-efficient pre-filter: it effectively separates particles from the background, thereby enabling subsequent classification or application in real-time aquatic environment monitoring systems. The article presents experimental results demonstrating the efficiency of this algorithm during plankton monitoring in situ. Full article
(This article belongs to the Section Marine Science and Engineering)
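
Applying a trained Viola–Jones cascade at detection time takes only a few lines with OpenCV; the cascade file name and the scale/neighbor settings below are assumptions standing in for the paper's optimized parameters.

```python
import cv2

# Hedged sketch: run a trained cascade over a reconstructed holographic plane.
# "daphnia_cascade.xml" is an assumed artifact of training on the 880 images.
cascade = cv2.CascadeClassifier("daphnia_cascade.xml")
frame = cv2.imread("hologram_plane.png", cv2.IMREAD_GRAYSCALE)
detections = cascade.detectMultiScale(frame, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in detections:
    cv2.rectangle(frame, (x, y), (x + w, y + h), 255, 2)
cv2.imwrite("detections.png", frame)
```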