Search Results (750)

Search Parameters:
Keywords = manual data labeling

32 pages, 9105 KB  
Article
Development of Semi-Automatic Dental Image Segmentation Workflows with Root Canal Recognition for Faster Ground Tooth Acquisition
by Yousef Abo El Ela and Mohamed Badran
J. Imaging 2025, 11(10), 340; https://doi.org/10.3390/jimaging11100340 - 1 Oct 2025
Abstract
This paper investigates the application of image segmentation techniques in endodontics, focusing on improving diagnostic accuracy and achieving faster segmentation by delineating specific dental regions such as teeth and root canals. Deep learning architectures, notably 3D U-Net and GANs, have advanced the image segmentation process for dental structures, supporting more precise dental procedures. However, challenges such as the demand for extensive labeled datasets and the need to ensure model generalizability remain. Two semi-automatic segmentation workflows, Grow From Seeds (GFS) and Watershed (WS), were developed to provide quicker acquisition of ground truth training data for deep learning models using 3D Slicer software version 5.8.1. These workflows were evaluated against a manual segmentation benchmark and a recent automated dental segmentation tool on three separate datasets. The evaluations considered the overall shapes of a maxillary central incisor and a maxillary second molar, as well as the root canal regions of both teeth. Results from Kruskal–Wallis and Nemenyi tests indicated that the semi-automated workflows were, in most cases, not statistically different from the manual benchmark in terms of Dice coefficient similarity, while the automated method consistently produced 3D models significantly different from their manual counterparts. The study also explores the labor reduction and time savings achieved by the semi-automated methods. Full article
(This article belongs to the Section Image and Video Processing)
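The Dice coefficient similarity used to compare the workflows against the manual benchmark can be sketched as follows (a minimal illustration with toy voxel masks, not the authors' code or data):

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity between two binary segmentation masks,
    given as collections of labeled voxel indices.

    Dice = 2|A intersect B| / (|A| + |B|); 1.0 means perfect overlap.
    """
    a = set(mask_a)
    b = set(mask_b)
    if not a and not b:
        return 1.0  # two empty masks agree trivially
    return 2 * len(a & b) / (len(a) + len(b))

# Toy voxel indices labeled as "root canal" by two workflows
manual = {(0, 0), (0, 1), (1, 0), (1, 1)}
semi_auto = {(0, 1), (1, 0), (1, 1), (2, 1)}
print(round(dice_coefficient(manual, semi_auto), 3))  # 0.75
```

A Dice score near 1.0 indicates that the semi-automatic result closely matches the manual ground truth.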

10 pages, 2446 KB  
Data Descriptor
A Multi-Class Labeled Ionospheric Dataset for Machine Learning Anomaly Detection
by Aleksandra Kolarski, Filip Arnaut, Sreten Jevremović, Zoran R. Mijić and Vladimir A. Srećković
Data 2025, 10(10), 157; https://doi.org/10.3390/data10100157 - 30 Sep 2025
Abstract
The binary anomaly detection (classification) of ionospheric data related to Very Low Frequency (VLF) signal amplitude in prior research demonstrated the potential for development and further advancement. Further data quality improvement is integral for advancing the development of machine learning (ML)-based ionospheric data (VLF signal amplitude) anomaly detection. This paper presents the transition from binary to multi-class classification of ionospheric signal amplitude datasets. The dataset comprises 19 transmitter–receiver pairs and 383,041 manually labeled amplitude instances. The target variable was reclassified from a binary classification (normal and anomalous data points) to a six-class classification that distinguishes between daytime undisturbed signals, nighttime signals, solar flare effects, instrument errors, instrumental noise, and outlier data points. Furthermore, in addition to the dataset, we developed a freely accessible web-based tool designed to facilitate the conversion of MATLAB data files to TRAINSET-compatible formats, thereby establishing a completely free and open data pipeline from the WALDO world data repository to data labeling software. This novel dataset facilitates further research in ionospheric signal amplitude anomaly detection, concentrating on effective and efficient anomaly detection in ionospheric signal amplitude data. The potential outcomes of employing anomaly detection techniques on ionospheric signal amplitude data may be extended to other space weather parameters in the future, such as ELF/LF datasets and other relevant datasets. Full article
(This article belongs to the Section Spatial Data Science and Digital Earth)

25 pages, 2110 KB  
Article
A Robust Semi-Supervised Brain Tumor MRI Classification Network for Data-Constrained Clinical Environments
by Subhash Chand Gupta, Vandana Bhattacharjee, Shripal Vijayvargiya, Partha Sarathi Bishnu, Raushan Oraon and Rajendra Majhi
Diagnostics 2025, 15(19), 2485; https://doi.org/10.3390/diagnostics15192485 - 28 Sep 2025
Abstract
Background: The accurate classification of brain tumor subtypes from MRI scans is critical for timely diagnosis, yet the manual annotation of large datasets remains prohibitively labor-intensive. Method: We present SSPLNet (Semi-Supervised Pseudo-Labeling Network), a dual-branch deep learning framework that combines confidence-guided iterative pseudo-labelling with deep feature fusion to enable robust MRI-based tumor classification in data-constrained clinical environments. SSPLNet integrates a custom convolutional neural network (CNN) and a pretrained ResNet50 model, trained in a semi-supervised manner with adaptive confidence thresholds (τ = 0.98 → 0.95 → 0.90) to iteratively refine pseudo-labels for unlabelled MRI scans. Feature representations from both branches are fused via a dense network, combining localized texture patterns with hierarchical deep features. Results: SSPLNet achieves state-of-the-art accuracy across labelled–unlabelled data splits (90:10 to 10:90), outperforming supervised baselines in the extreme low-label regime (10:90) by up to 5.34% over the custom CNN and 5.58% over ResNet50. The framework reduces annotation dependence and maintains 98.17% diagnostic accuracy with 40% unlabeled data, demonstrating its viability for scalable deployment in resource-limited healthcare settings. Conclusions: Statistical evaluation and robustness analysis confirm that SSPLNet’s lower error rate is not due to chance, and bootstrap results confirm that its reported accuracy falls well within the 95% CI of the sampling distribution. Full article
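The confidence-guided pseudo-labelling loop described above can be sketched as follows (a minimal illustration with toy softmax outputs; SSPLNet's actual selection and retraining logic is more involved):

```python
def pseudo_label(probs, threshold):
    """Accept an unlabeled sample only when its top class probability
    meets the confidence threshold; return (sample_index, label) pairs."""
    accepted = []
    for i, p in enumerate(probs):
        top = max(p)
        if top >= threshold:
            accepted.append((i, p.index(top)))
    return accepted

# Toy softmax outputs for 4 unlabeled MRI scans over 3 tumor classes
probs = [
    [0.99, 0.005, 0.005],
    [0.60, 0.30, 0.10],
    [0.02, 0.96, 0.02],
    [0.40, 0.35, 0.25],
]

# Progressively relax the threshold, mimicking tau = 0.98 -> 0.95 -> 0.90,
# so more confident samples enter the training set at each iteration
for tau in (0.98, 0.95, 0.90):
    print(tau, pseudo_label(probs, tau))
```

Low-confidence samples (here the second and fourth) are never pseudo-labeled, which keeps label noise out of the iterative retraining.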

29 pages, 3308 KB  
Article
A Comparative Study of BERT-Based Models for Teacher Classification in Physical Education
by Laura Martín-Hoz, Samuel Yanes-Luis, Jerónimo Huerta Cejudo, Daniel Gutiérrez-Reina and Evelia Franco Álvarez
Electronics 2025, 14(19), 3849; https://doi.org/10.3390/electronics14193849 - 28 Sep 2025
Abstract
Assessing teaching behavior is essential for improving instructional quality, particularly in Physical Education, where classroom interactions are fast-paced and complex. Traditional evaluation methods such as questionnaires, expert observations, and manual discourse analysis are often limited by subjectivity, high labor costs, and poor scalability. These challenges underscore the need for automated, objective tools to support pedagogical assessment. This study explores and compares the use of Transformer-based language models for the automatic classification of teaching behaviors from real classroom transcriptions. A dataset of over 1300 utterances was compiled and annotated according to the teaching styles proposed in the circumplex approach (Autonomy Support, Structure, Control, and Chaos), along with an additional category for messages in which no style could be identified (Unidentified Style). To address class imbalance and enhance linguistic variability, data augmentation techniques were applied. Eight pretrained BERT-based Transformer architectures were evaluated, covering several pretraining strategies and architectural variants. BETO achieved the highest performance, with an accuracy of 0.78, a macro-averaged F1-score of 0.72, and a weighted F1-score of 0.77, and showed particular strength on challenging utterances labeled as Chaos and Autonomy Support. Other BERT-based models trained purely on Spanish text corpora, such as DistilBERT, also performed competitively, achieving accuracy above 0.73 and an F1-score of 0.68. These results demonstrate the potential of Transformer-based models for objective, scalable teacher behavior classification and support the feasibility of AI-driven systems for classroom behavior analysis and pedagogical feedback. Full article
(This article belongs to the Section Artificial Intelligence)
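The gap between the macro-averaged and weighted F1-scores reported above arises from class imbalance: the macro average treats every style equally, while the weighted average is dominated by frequent classes. A minimal sketch of both averages (toy labels, not the study's data):

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Per-class F1, plus macro (unweighted) and support-weighted averages."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
    macro = sum(per_class.values()) / len(labels)
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return per_class, macro, weighted

# Toy labels: the majority class ("Structure") is predicted well and the
# rare class ("Chaos") is missed, so the weighted average exceeds the macro.
y_true = ["Structure"] * 4 + ["Chaos"]
y_pred = ["Structure"] * 5
per_class, macro, weighted = f1_scores(y_true, y_pred)
print(round(macro, 3), round(weighted, 3))  # 0.444 0.711
```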

18 pages, 2554 KB  
Article
A Hybrid Semi-Supervised Tri-Training Framework Integrating Traditional Classifiers and Lightweight CNN for High-Resolution Remote Sensing Image Classification
by Xiaopeng Han, Yukun Niu, Chuan He, Ding Zhou and Zhigang Cao
Appl. Sci. 2025, 15(19), 10353; https://doi.org/10.3390/app151910353 - 24 Sep 2025
Abstract
High-resolution remote sensing imagery offers detailed spatial and semantic insights into the Earth’s surface, yet its classification remains hindered by the limited availability of labeled data, primarily due to the substantial expense and time required for manual annotation. To overcome this challenge, we propose a hybrid semi-supervised tri-training framework that integrates traditional classification methods with a lightweight convolutional neural network. By combining heterogeneous learners with complementary strengths, the framework iteratively assigns pseudo-labels to unlabeled samples and collaboratively refines model performance in a co-training manner. Additionally, a landscape-metric-guided relearning module is introduced to incorporate spatial configuration and land cover composition, further enhancing the framework’s representational capacity and classification robustness. Experiments were conducted on four high-resolution multispectral datasets (QuickBird (QB), WorldView-2 (WV-2), GeoEye-1 (GE-1), and ZY-3) covering diverse land-cover types and spatial resolutions. The results demonstrate that the proposed method surpasses state-of-the-art baselines by 1.5–10% while generating more spatially coherent classification maps. Full article
(This article belongs to the Special Issue Advanced Remote Sensing Technologies and Their Applications)
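The agreement-based pseudo-labelling at the heart of tri-training can be sketched as below (illustrative only; the classifier names are hypothetical, and the paper's landscape-metric relearning module is not shown):

```python
from collections import Counter

def tri_training_pseudo_labels(preds_a, preds_b, preds_c):
    """Pseudo-label an unlabeled sample when at least two of the three
    heterogeneous classifiers agree on its class; samples without a
    majority vote are skipped and remain unlabeled."""
    pseudo = {}
    for i, votes in enumerate(zip(preds_a, preds_b, preds_c)):
        label, count = Counter(votes).most_common(1)[0]
        if count >= 2:
            pseudo[i] = label
    return pseudo

# Hypothetical land-cover votes from three complementary learners
svm = ["water", "urban", "crop", "bare"]
rf  = ["water", "forest", "crop", "forest"]
cnn = ["water", "urban", "forest", "water"]
print(tri_training_pseudo_labels(svm, rf, cnn))  # {0: 'water', 1: 'urban', 2: 'crop'}
```

The last sample gets three conflicting votes and is left out, which is what lets the co-training loop grow the labeled set without amplifying disagreement.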

20 pages, 5335 KB  
Article
LiGaussOcc: Fully Self-Supervised 3D Semantic Occupancy Prediction from LiDAR via Gaussian Splatting
by Zhiqiang Wei, Tao Huang and Fengdeng Zhang
Sensors 2025, 25(18), 5889; https://doi.org/10.3390/s25185889 - 20 Sep 2025
Abstract
Accurate 3D semantic occupancy perception is critical for autonomous driving, enabling robust navigation in unstructured environments. While vision-based methods suffer from depth inaccuracies and lighting sensitivity, LiDAR-based approaches face challenges due to sparse data and dependence on expensive manual annotations. This work proposes LiGaussOcc, a novel self-supervised framework for dense LiDAR-based 3D semantic occupancy prediction. Our method first encodes LiDAR point clouds into voxel features and addresses sparsity via an Empty Voxel Inpainting (EVI) module, refined by an Adaptive Feature Fusion (AFF) module. During training, a Gaussian Primitive from Voxels (GPV) module generates parameters for 3D Gaussian Splatting, enabling efficient rendering of 2D depth and semantic maps. Supervision is achieved through photometric consistency across adjacent camera views and pseudo-labels from vision–language models, eliminating manual 3D annotations. Evaluated on the nuScenes-OpenOccupancy benchmark, LiGaussOcc achieved competitive performance, with 30.4% Intersection over Union (IoU) and 14.1% mean Intersection over Union (mIoU). It reached 91.6% of the performance of the fully supervised LiDAR-based L-CONet, while completely eliminating the need for costly and labor-intensive manual 3D annotations. It excelled particularly in static environmental classes, such as drivable surfaces and man-made structures. This work presents a scalable, annotation-free solution for LiDAR-based 3D semantic occupancy perception. Full article
(This article belongs to the Section Radar Sensors)

31 pages, 1887 KB  
Article
ZaQQ: A New Arabic Dataset for Automatic Essay Scoring via a Novel Human–AI Collaborative Framework
by Yomna Elsayed, Emad Nabil, Marwan Torki, Safiullah Faizullah and Ayman Khalafallah
Data 2025, 10(9), 148; https://doi.org/10.3390/data10090148 - 19 Sep 2025
Abstract
Automated essay scoring (AES) has become an essential tool in educational assessment. However, applying AES to the Arabic language presents notable challenges, primarily due to the lack of labeled datasets. This data scarcity hampers the development of reliable machine learning models and slows progress in Arabic natural language processing for educational use. While manual annotation by human experts remains the most accurate method for essay evaluation, it is often too costly and time-consuming to create large-scale datasets, especially for low-resource languages like Arabic. In this work, we introduce a human–AI collaborative framework designed to overcome the shortage of scored Arabic essays. Leveraging QAES, a high-quality annotated dataset, our approach uses Large Language Models (LLMs) to generate multidimensional essay evaluations across seven key writing traits: Relevance, Organization, Vocabulary, Style, Development, Mechanics, and Structure. To ensure accuracy and consistency, we design prompting strategies and validation procedures tailored to each trait. This system is then applied to two unannotated Arabic essay datasets: ZAEBUC and QALB. As a result, we introduce ZaQQ, a newly annotated dataset that merges ZAEBUC, QAES, and QALB. Our findings demonstrate that human–AI collaboration can significantly enhance the availability of labeled resources without compromising assessment quality. The proposed framework serves as a scalable and replicable model for addressing data annotation challenges in low-resource languages and supports the broader goal of expanding access to automated educational assessment tools where expert evaluation is limited. Full article

20 pages, 13462 KB  
Article
An AI-Based System for Monitoring Laying Hen Behavior Using Computer Vision for Small-Scale Poultry Farms
by Jill Italiya, Ahmed Abdelmoamen Ahmed, Ahmed A. A. Abdel-Wareth and Jayant Lohakare
Agriculture 2025, 15(18), 1963; https://doi.org/10.3390/agriculture15181963 - 17 Sep 2025
Abstract
Small-scale poultry farms often lack access to advanced monitoring tools and rely heavily on manual observation, which is time-consuming, inconsistent, and insufficient for precise flock management. Feeding and drinking behaviors are critical, as they serve as early indicators of health and environmental issues. With global poultry production expanding, raising over 70 billion hens annually, there is an urgent need for intelligent, low-cost systems that can continuously and accurately monitor bird behavior in resource-limited farm settings. This paper presents the development of a computer vision-based chicken behavior monitoring system, specifically designed for small barn environments where at most 10–15 chickens are housed at any time. The developed system consists of an object detection model, created on top of the YOLOv8 model, trained with an imagery dataset of laying hen, feeder, and waterer objects. Although chickens are visually indistinguishable, the system processes each detection per frame using bounding boxes and movement-based approximation identification rather than continuous identity tracking. The approach simplifies the tracking process without losing valuable behavior insights. Over 700 frames were annotated manually for high-quality labeled data, with different lighting, hen positions, and interaction angles with dispensers. The images were annotated in YOLO format and used for training the detection model for 100 epochs, resulting in a model having an average mean average precision (mAP@0.5) metric value of 91.5% and a detection accuracy of over 92%. The proposed system offers an efficient, low-cost solution for monitoring chicken feeding and drinking behaviors in small-scale farms, supporting improved management and early health detection. Full article
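At the mAP@0.5 operating point cited above, a detected bounding box counts as a true positive when its Intersection-over-Union with a ground-truth box is at least 0.5. The overlap measure itself is simple to compute (toy boxes, not the paper's annotations):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)  # zero when boxes are disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A detection counts as a hit at mAP@0.5 when IoU >= 0.5
gt = (10, 10, 50, 50)   # hypothetical ground-truth hen box
det = (20, 10, 60, 50)  # detected box, shifted to the right
score = iou(gt, det)
print(round(score, 3), score >= 0.5)  # 0.6 True
```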

22 pages, 3399 KB  
Article
Integrating Cross-Modal Semantic Learning with Generative Models for Gesture Recognition
by Shuangjiao Zhai, Zixin Dai, Zanxia Jin, Pinle Qin and Jianchao Zeng
Sensors 2025, 25(18), 5783; https://doi.org/10.3390/s25185783 - 17 Sep 2025
Abstract
Radio frequency (RF)-based human activity sensing is an essential component of ubiquitous computing, with WiFi sensing providing a practical and low-cost solution for gesture and activity recognition. However, challenges such as manual data collection, multipath interference, and poor cross-domain generalization hinder real-world deployment. Existing data augmentation approaches often neglect the biomechanical structure underlying RF signals. To address these limitations, we present CM-GR, a cross-modal gesture recognition framework that integrates semantic learning with generative modeling. CM-GR leverages 3D skeletal points extracted from vision data as semantic priors to guide the synthesis of realistic WiFi signals, thereby incorporating biomechanical constraints without requiring extensive manual labeling. In addition, dynamic conditional vectors are constructed from inter-subject skeletal differences, enabling user-specific WiFi data generation without the need for dedicated data collection and annotation for each new user. Extensive experiments on the public MM-Fi dataset and our SelfSet dataset demonstrate that CM-GR substantially improves the cross-subject gesture recognition accuracy, achieving gains of up to 10.26% and 9.5%, respectively. These results confirm the effectiveness of CM-GR in synthesizing personalized WiFi data and highlight its potential for robust and scalable gesture recognition in practical settings. Full article
(This article belongs to the Section Biomedical Sensors)

14 pages, 4724 KB  
Article
Uncertainty-Guided Active Learning for Access Route Segmentation and Planning in Transcatheter Aortic Valve Implantation
by Mahdi Islam, Musarrat Tabassum, Agnes Mayr, Christian Kremser, Markus Haltmeier and Enrique Almar-Munoz
J. Imaging 2025, 11(9), 318; https://doi.org/10.3390/jimaging11090318 - 17 Sep 2025
Abstract
Transcatheter aortic valve implantation (TAVI) is a minimally invasive procedure for treating severe aortic stenosis, where optimal vascular access route selection is critical to reduce complications. It requires careful selection of the iliac artery with the most favourable anatomy, specifically, one with the largest diameters and no segments narrower than 5 mm. This process is time-consuming when carried out manually. We present an active learning-based segmentation framework for contrast-enhanced Cardiac Magnetic Resonance (CMR) data, guided by probabilistic uncertainty and pseudo-labelling, enabling efficient segmentation with minimal manual annotation. The segmentations are then fed into an automated pipeline for diameter quantification, achieving a Dice score of 0.912 and a mean absolute percentage error (MAPE) of 4.92%. An ablation study using pre- and post-contrast CMR showed superior performance with post-contrast data only. Overall, the pipeline provides accurate segmentation and detailed diameter profiles of the aorto-iliac route, helping the assessment of the access route. Full article
(This article belongs to the Special Issue Emerging Technologies for Less Invasive Diagnostic Imaging)
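The mean absolute percentage error (MAPE) of 4.92% reported above compares automated diameter measurements against a manual reference; the metric can be computed as follows (the diameter values here are hypothetical, not from the paper):

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)

# Hypothetical iliac-artery diameters in mm: manual vs. automated pipeline
manual = [8.0, 6.5, 5.2, 7.4]
automated = [8.4, 6.2, 5.0, 7.6]
print(round(mape(manual, automated), 2))  # 4.04
```

Because each error is normalized by the true diameter, MAPE stays comparable across vessels of different calibers, which matters when screening for segments narrower than 5 mm.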

15 pages, 3354 KB  
Article
CAFM-Enhanced YOLOv8: A Two-Stage Optimization for Precise Strawberry Disease Detection in Complex Field Conditions
by Hua Li, Jixing Liu, Ke Han and Xiaobo Cai
Appl. Sci. 2025, 15(18), 10025; https://doi.org/10.3390/app151810025 - 13 Sep 2025
Abstract
Strawberry is an important global economic crop, and its disease prevention and control directly affect yield and quality. Traditional detection relies on manual observation or conventional machine learning algorithms, which suffer from low efficiency, high false detection rates, and insufficient adaptability to tiny disease spots and complex environments. To solve these problems, this study proposes a strawberry disease recognition method based on an improved YOLOv8. By systematically acquiring 3146 images covering seven types of typical diseases, such as gray mold and powdery mildew, a high-quality dataset containing different disease stages and complex backgrounds was constructed. To address the difficulties in disease detection, the YOLOv8 model is optimized in two stages: first, an ultra-small-scale detection head (32 × 32) is introduced to enhance the model's ability to capture early tiny spots; second, a convolution and attention fusion module (CAFM) is incorporated to enhance feature robustness in complex field scenes through the synergy of local feature extraction and global information focusing. Experiments show that the mAP50 of the improved model reaches 0.96, outperforming mainstream algorithms such as YOLOv5 and Faster R-CNN in both recall and F1 score. In addition, the interactive system developed with the PyQt5 framework can process images, videos, and camera inputs in real time, presenting disease areas intuitively through visualized bounding boxes and category labels, which provides farmers with a lightweight, accessible field management tool. This study not only verifies the effectiveness of the improved algorithm but also provides a practical reference for the engineering application of deep learning in agricultural scenarios, and is expected to promote the further adoption of precision agriculture technology. Full article
(This article belongs to the Section Agricultural Science and Technology)

19 pages, 3745 KB  
Article
Anomaly Detection in Mineral Micro-X-Ray Fluorescence Spectroscopy Based on a Multi-Scale Feature Aggregation Network
by Yangxin Lu, Weiming Jiang, Molei Zhao, Yuanzhi Zhou, Jie Yang, Kunfeng Qiu and Qiuming Cheng
Minerals 2025, 15(9), 970; https://doi.org/10.3390/min15090970 - 13 Sep 2025
Abstract
Micro-X-ray fluorescence spectroscopy (micro-XRF) integrates spatial and spectral information and is widely employed for multi-elemental analyses of rock-forming minerals. However, its inherent limitation in spatial resolution gives rise to significant pixel mixing, thereby hindering the accurate identification of fine-scale or anomalous mineral phases. Furthermore, most existing methods heavily rely on manually labeled data or predefined spectral libraries, rendering them poorly adaptable to complex and variable mineral systems. To address these challenges, this paper presents an unsupervised deep aggregation network (MSFA-Net) for micro-XRF imagery, aiming to eliminate the reliance of traditional methods on prior knowledge and enhance the recognition capability of rare mineral anomalies. Built on an autoencoder architecture, MSFA-Net incorporates a multi-scale orthogonal attention module to strengthen spectral–spatial feature fusion and employs density-based adaptive clustering to guide semantically aware reconstruction, thus achieving high-precision responses to potential anomalous regions. Experiments on real-world micro-XRF datasets demonstrate that MSFA-Net not only outperforms mainstream anomaly detection methods but also transcends the physical resolution limits of the instrument, successfully identifying subtle mineral anomalies that traditional approaches fail to detect. This method presents a novel paradigm for high-throughput and weakly supervised interpretation of complex geological images. Full article
(This article belongs to the Special Issue Gold–Polymetallic Deposits in Convergent Margins)

4 pages, 2856 KB  
Abstract
Can Transfer Learning Overcome the Challenge of Identifying Lemming Species in Images Taken in the near Infrared Spectrum?
by Davood Kalhor, Mathilde Poirier, Xavier Maldague and Gilles Gauthier
Proceedings 2025, 129(1), 65; https://doi.org/10.3390/proceedings2025129065 - 12 Sep 2025
Abstract
Using a camera system developed earlier for monitoring the behavior of lemmings under the snow, we are now able to record a large number of short image sequences from this rodent which plays a central role in the Arctic food web. Identifying lemming species in these images manually is wearisome and time-consuming. To perform this task, we present a deep neural network which has several million parameters to configure. Training a network of such an immense size with conventional methods requires a huge amount of data but a sufficiently large labeled dataset of lemming images is currently lacking. Another challenge is that images are obtained in darkness in the near infrared spectrum, causing the loss of some image texture information. We investigate whether these challenges can be tackled by a transfer learning approach in which a network is pretrained on a dataset of visible spectrum images that does not include lemmings. We believe this work provides a basis for moving toward developing intelligent software programs that can facilitate the analysis of videos by biologists. Full article

16 pages, 1138 KB  
Article
A Multi-Working States Sensor Anomaly Detection Method Using Deep Learning Algorithms
by Di Wu, Kari Koskinen and Eric Coatanea
Sensors 2025, 25(18), 5686; https://doi.org/10.3390/s25185686 - 12 Sep 2025
Abstract
The data collected from sensors are subject to the presence of anomalous data. These anomalies may stem from sensor malfunctions or poor communication. Prior to processing, it is imperative to detect and isolate the anomalous data from the substantial volume of normal data. Data-driven approaches for sensor anomaly detection and isolation frequently confront the predicament of inadequately labeled data. On the one hand, the data obtained from sensors usually contain no or few examples of faults, and those faults are difficult to identify manually in a large amount of raw data. On the other hand, the operational states of a machine may change during its functioning, potentially resulting in different sensor measurement behaviors, yet these operational states are not clearly labeled either. To address the challenges posed by the absence or scarcity of labeled data in both respects, a sensor anomaly detection and isolation method using LSTM (long short-term memory) networks is proposed in this paper. To predict sensor measurements at a subsequent timestep, behaviors in the preceding timesteps are utilized to account for the influence of varying operational states. The inputs of the LSTM networks are selected based on prediction errors trained on a small dataset to increase prediction accuracy and reduce the influence of redundant sensors. The residual between the predicted data and the measured data is used to determine whether an anomaly has occurred. The proposed method is evaluated using a real dataset obtained from a truck operating in a mine. The results showed that the proposed network with the input-selection method accurately detected drift and stall anomalies in the experiments. Full article
(This article belongs to the Special Issue Fault Diagnosis Based on Sensing and Control Systems)
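The residual-thresholding step — comparing the model's one-step-ahead prediction against the actual sensor reading — can be sketched as below (a toy trace with a made-up threshold; the paper's LSTM predictor and threshold selection are not reproduced):

```python
def detect_anomalies(measured, predicted, threshold):
    """Flag timesteps where the residual between the one-step-ahead
    prediction and the actual sensor reading exceeds the threshold."""
    return [i for i, (m, p) in enumerate(zip(measured, predicted))
            if abs(m - p) > threshold]

# Toy sensor trace with an anomalous stuck segment at steps 5-7
measured  = [1.0, 1.2, 1.1, 1.3, 1.2, 3.0, 3.0, 3.0, 1.3, 1.2]
predicted = [1.0, 1.1, 1.2, 1.2, 1.3, 1.2, 1.3, 1.2, 1.2, 1.3]
print(detect_anomalies(measured, predicted, threshold=0.5))  # [5, 6, 7]
```

Normal readings track the prediction closely, so only the stuck segment produces residuals above the threshold.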

18 pages, 2065 KB  
Article
Phoneme-Aware Augmentation for Robust Cantonese ASR Under Low-Resource Conditions
by Lusheng Zhang, Shie Wu and Zhongxun Wang
Symmetry 2025, 17(9), 1478; https://doi.org/10.3390/sym17091478 - 8 Sep 2025
Abstract
Cantonese automatic speech recognition (ASR) faces persistent challenges due to its nine lexical tones, extensive phonological variation, and the scarcity of professionally transcribed corpora. To address these issues, we propose a lightweight and data-efficient framework that leverages weak phonetic supervision (WPS) in conjunction with two phoneme-aware augmentation strategies. (1) Dynamic Boundary-Aligned Phoneme Dropout progressively removes entire IPA segments according to a curriculum schedule, simulating real-world phenomena such as elision, lenition, and tonal drift while ensuring training stability. (2) Phoneme-Aware SpecAugment confines all time- and frequency-masking operations within phoneme boundaries and prioritizes high-attention regions, thereby preserving intra-phonemic contours and formant integrity. Built on the Whistle encoder—which integrates a Conformer backbone, Connectionist Temporal Classification–Conditional Random Field (CTC-CRF) alignment, and a multi-lingual phonetic space—the approach requires only a grapheme-to-phoneme lexicon and Montreal Forced Aligner outputs, without any additional manual labeling. Experiments on the Cantonese subset of Common Voice demonstrate consistent gains: Dynamic Dropout alone reduces phoneme error rate (PER) from 17.8% to 16.7% with 50 h of speech and from 16.4% to 15.1% with 100 h, while the combination of the two augmentations further lowers PER to 15.9%/14.4%. These results confirm that structure-aware phoneme-level perturbations provide an effective and low-cost solution for building robust Cantonese ASR systems under low-resource conditions. Full article
(This article belongs to the Section Computer)
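The curriculum-scheduled phoneme dropout can be sketched as below (a simplified per-phoneme version under assumed parameters; the paper drops boundary-aligned IPA segments, and the linear rate schedule and 15% cap here are illustrative assumptions):

```python
import random

def phoneme_dropout(phonemes, epoch, max_epochs, max_rate=0.15):
    """Dynamic phoneme dropout: remove whole phoneme segments with a
    probability that ramps up linearly over training (a curriculum),
    simulating elision and lenition while keeping early epochs stable."""
    rate = max_rate * min(1.0, epoch / max_epochs)
    kept = [p for p in phonemes if random.random() >= rate]
    return kept if kept else list(phonemes)  # never drop a whole utterance

random.seed(0)
utterance = ["ŋ", "o", "h", "ou", "s", "ai", "k", "a", "u"]  # illustrative IPA
print(phoneme_dropout(utterance, epoch=1, max_epochs=10))   # mild dropout
print(phoneme_dropout(utterance, epoch=10, max_epochs=10))  # full-rate dropout
```

At epoch 0 the rate is zero and the utterance passes through unchanged; by the final epoch roughly max_rate of the segments are elided per pass.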
