Search Results (1,374)

Search Parameters:
Keywords = shot performance

19 pages, 804 KiB  
Article
Beyond Classical AI: Detecting Fake News with Hybrid Quantum Neural Networks
by Volkan Altıntaş
Appl. Sci. 2025, 15(15), 8300; https://doi.org/10.3390/app15158300 - 25 Jul 2025
Abstract
The advent of quantum computing has introduced new opportunities for enhancing classical machine learning architectures. In this study, we propose a novel hybrid model, the HQDNN (Hybrid Quantum–Deep Neural Network), designed for the automatic detection of fake news. The model integrates classical fully connected neural layers with a parameterized quantum circuit, enabling the processing of textual data within both classical and quantum computational domains. To assess its effectiveness, we conducted experiments on the widely used LIAR dataset utilizing Term Frequency–Inverse Document Frequency (TF-IDF) features, as well as transformer-based DistilBERT embeddings. The experimental results demonstrate that the HQDNN achieves a superior recall performance—92.58% with TF-IDF and 94.40% with DistilBERT—surpassing traditional machine learning models such as Logistic Regression, Linear SVM, and Multilayer Perceptron. Additionally, we compare the HQDNN with SetFit, a recent CPU-efficient few-shot transformer model, and show that while SetFit achieves higher precision, the HQDNN significantly outperforms it in recall. Furthermore, an ablation experiment confirms the critical contribution of the quantum component, revealing a substantial drop in performance when the quantum layer is removed. These findings highlight the potential of hybrid quantum–classical models as effective and compact alternatives for high-sensitivity classification tasks, particularly in domains such as fake news detection. Full article
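
A minimal sketch of the hybrid pattern the abstract describes: classical dense layers compress text features (768 dimensions matches DistilBERT's hidden size) into rotation angles for a parameterized quantum circuit, whose expectation values feed a classical output layer. The qubit count, circuit template, and layer sizes are illustrative assumptions, not the authors' exact HQDNN configuration; PennyLane/PyTorch is used for concreteness.

```python
import pennylane as qml
import torch.nn as nn

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def quantum_circuit(inputs, weights):
    # Encode classical features as rotation angles, then apply entangling layers.
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

weight_shapes = {"weights": (2, n_qubits, 3)}  # 2 variational layers (assumption)

model = nn.Sequential(
    nn.Linear(768, 64), nn.ReLU(),       # classical compression of text features
    nn.Linear(64, n_qubits), nn.Tanh(),  # scale into rotation-angle range
    qml.qnn.TorchLayer(quantum_circuit, weight_shapes),
    nn.Linear(n_qubits, 2),              # real / fake logits
)
```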
18 pages, 8446 KiB  
Article
Evaluation of Single-Shot Object Detection Models for Identifying Fanning Behavior in Honeybees at the Hive Entrance
by Tomyslav Sledevič
Agriculture 2025, 15(15), 1609; https://doi.org/10.3390/agriculture15151609 - 25 Jul 2025
Abstract
Thermoregulatory fanning behavior in honeybees is a vital indicator of colony health and environmental response. This study presents a novel dataset of 18,000 annotated video frames containing 57,597 instances capturing fanning behavior at the hive entrance across diverse conditions. Three state-of-the-art single-shot object detection models (YOLOv8, YOLO11, YOLO12) are evaluated using standard RGB input and two motion-enhanced encodings: Temporally Stacked Grayscale (TSG) and Temporally Encoded Motion (TEM). Results show that models incorporating temporal information via TSG and TEM significantly outperform RGB-only input, achieving up to 85% mAP@50 with real-time inference capability on high-performance GPUs. Deployment tests on the Jetson AGX Orin platform demonstrate feasibility for edge computing, though with accuracy–speed trade-offs in smaller models. This work advances real-time, non-invasive monitoring of hive health, with implications for precision apiculture and automated behavioral analysis. Full article
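
A hedged sketch of what a Temporally Stacked Grayscale (TSG) encoding could look like: three consecutive grayscale frames packed into the three input channels of a standard RGB detector, so motion becomes visible to a single-shot model without architectural changes. Frame spacing and normalization are assumptions, not the paper's exact preprocessing.

```python
import cv2
import numpy as np

def tsg_encode(frames: list[np.ndarray]) -> np.ndarray:
    """Pack the last three BGR frames into one 3-channel motion image."""
    grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames[-3:]]
    return np.stack(grays, axis=-1)  # HxWx3, drop-in replacement for an RGB frame
```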

18 pages, 516 KiB  
Article
A Nested Named Entity Recognition Model Robust in Few-Shot Learning Environments Using Label Description Information
by Hyunsun Hwang, Youngjun Jung, Changki Lee and Wooyoung Go
Appl. Sci. 2025, 15(15), 8255; https://doi.org/10.3390/app15158255 - 24 Jul 2025
Abstract
Nested named entity recognition (NER) is a task that identifies hierarchically structured entities, where one entity can contain other entities within its span. This study introduces a nested NER model for few-shot learning environments, addressing the difficulty of building extensive datasets for general named entities. We enhance the Biaffine nested NER model by modifying its output layer to incorporate label semantic information through a novel label description embedding (LDE) approach, improving performance with limited training data. Our method replaces the traditional biaffine classifier with a label attention mechanism that leverages comprehensive natural language descriptions of entity types, encoded using BERT to capture rich semantic relationships between labels and input spans. We conducted comprehensive experiments on four benchmark datasets: GENIA (nested NER), ACE 2004 (nested NER), ACE 2005 (nested NER), and CoNLL 2003 English (flat NER). Performance was evaluated across multiple few-shot scenarios (1-shot, 5-shot, 10-shot, and 20-shot) using F1-measure as the primary metric, with five different random seeds to ensure robust evaluation. We compared our approach against strong baselines including BERT-LSTM-CRF with nested tags, the original Biaffine model, and recent few-shot NER methods (FewNER, FIT, LPNER, SpanNER). Results demonstrate significant improvements across all few-shot scenarios. On GENIA, our LDE model achieves 45.07% F1 in five-shot learning compared to 30.74% for the baseline Biaffine model (46.4% relative improvement). On ACE 2005, we obtain 44.24% vs. 32.38% F1 in five-shot scenarios (36.6% relative improvement). The model shows consistent gains in 10-shot (57.19% vs. 49.50% on ACE 2005) and 20-shot settings (64.50% vs. 58.21% on ACE 2005). Ablation studies confirm that semantic information from label descriptions is the key factor enabling robust few-shot performance. Transfer learning experiments demonstrate the model’s ability to leverage knowledge from related domains. Our findings suggest that incorporating label semantic information can substantially enhance NER models in low-resource settings, opening new possibilities for applying NER in specialized domains or languages with limited annotated data. Full article
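
A minimal sketch of the label-description idea: natural-language descriptions of each entity type are encoded with BERT, and span representations are scored against those label embeddings instead of a learned biaffine classifier. The description texts, [CLS] pooling, and scaled dot-product scoring below are illustrative assumptions.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-cased")
enc = AutoModel.from_pretrained("bert-base-cased")

descriptions = [
    "a protein, gene, or other biological molecule",  # hypothetical GENIA-style text
    "a person's name",
    "no entity",
]
batch = tok(descriptions, padding=True, return_tensors="pt")
with torch.no_grad():
    label_emb = enc(**batch).last_hidden_state[:, 0]  # [CLS] pooling -> (L, H)

def label_attention_scores(span_reps: torch.Tensor) -> torch.Tensor:
    """span_reps: (num_spans, H) -> logits over labels via scaled dot product."""
    return span_reps @ label_emb.T / label_emb.shape[-1] ** 0.5
```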
(This article belongs to the Special Issue Applications of Natural Language Processing to Data Science)

18 pages, 4203 KiB  
Article
SRW-YOLO: A Detection Model for Environmental Risk Factors During the Grid Construction Phase
by Yu Zhao, Fei Liu, Qiang He, Fang Liu, Xiaohu Sun and Jiyong Zhang
Remote Sens. 2025, 17(15), 2576; https://doi.org/10.3390/rs17152576 - 24 Jul 2025
Abstract
With the rapid advancement of UAV-based remote sensing and image recognition techniques, identifying environmental risk factors from aerial imagery has emerged as a focal point of intelligent inspection during the construction phase of power transmission and distribution projects. The uneven spatial distribution of risk factors on construction sites, their weak texture signatures, and the inherently multi-scale nature of UAV imagery pose significant detection challenges. To address these issues, we propose a one-stage SRW-YOLO algorithm built upon the YOLOv11 framework. First, a P2-scale shallow feature detection layer is added to capture high-resolution fine details of small targets. Second, we integrate a reparameterized convolution based on channel shuffle (RCS) one-shot aggregation (RCS-OSA) module into the shallow layers of the backbone and neck, enhancing feature extraction while significantly reducing inference latency. Finally, the WIoU v3 loss function, with its dynamic non-monotonic focusing mechanism, is employed to down-weight low-quality annotations, thereby improving small-object localization accuracy. Experimental results demonstrate that SRW-YOLO achieves an overall precision of 80.6% and an mAP of 79.1% on the State Grid dataset, and exhibits similarly superior performance on the VisDrone2019 dataset. Compared with other one-stage detectors, SRW-YOLO delivers markedly higher detection accuracy, offering critical technical support for multi-scale, heterogeneous environmental risk monitoring during construction and establishing a foundation for rapid, accurate UAV-based intelligent inspection. Full article
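
A heavily hedged sketch of a WIoU-v3-style box loss: an IoU loss scaled by a distance-attention term and a non-monotonic focusing factor computed from each box's "outlier degree". The constants alpha/delta and the externally maintained running mean of the IoU loss follow common public WIoU implementations and are assumptions here, not the authors' exact settings.

```python
import torch

def wiou_v3(pred, target, iou, mean_iou_loss, alpha=1.9, delta=3.0):
    """pred/target: (N, 4) boxes as (cx, cy, w, h); iou: (N,) IoUs."""
    l_iou = 1.0 - iou
    # Distance attention from the WIoU v1 term (enclosing-box size detached).
    cw = torch.max(pred[:, 0] + pred[:, 2] / 2, target[:, 0] + target[:, 2] / 2) \
       - torch.min(pred[:, 0] - pred[:, 2] / 2, target[:, 0] - target[:, 2] / 2)
    ch = torch.max(pred[:, 1] + pred[:, 3] / 2, target[:, 1] + target[:, 3] / 2) \
       - torch.min(pred[:, 1] - pred[:, 3] / 2, target[:, 1] - target[:, 3] / 2)
    dist = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
    r_wiou = torch.exp(dist / (cw ** 2 + ch ** 2).detach())
    # Non-monotonic focusing: down-weight both trivial boxes and very hard
    # (often low-quality) boxes via the outlier degree beta.
    beta = l_iou.detach() / mean_iou_loss
    focus = beta / (delta * alpha ** (beta - delta))
    return (focus * r_wiou * l_iou).mean()
```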

25 pages, 2129 KiB  
Article
Zero-Shot 3D Reconstruction of Industrial Assets: A Completion-to-Reconstruction Framework Trained on Synthetic Data
by Yongjie Xu, Haihua Zhu and Barmak Honarvar Shakibaei Asli
Electronics 2025, 14(15), 2949; https://doi.org/10.3390/electronics14152949 - 24 Jul 2025
Abstract
Creating high-fidelity digital twins (DTs) for Industry 4.0 applications is fundamentally reliant on the accurate 3D modeling of physical assets, a task complicated by the inherent imperfections of real-world point cloud data. This paper addresses the challenge of reconstructing accurate, watertight, and topologically sound 3D meshes from sparse, noisy, and incomplete point clouds acquired in complex industrial environments. We introduce a robust two-stage completion-to-reconstruction framework, C2R3D-Net, that systematically tackles this problem. The methodology first employs a pretrained, self-supervised point cloud completion network to infer a dense and structurally coherent geometric representation from degraded inputs. Subsequently, a novel adaptive surface reconstruction network generates the final high-fidelity mesh. This network features a hybrid encoder (FKAConv-LSA-DC), which integrates fixed-kernel and deformable convolutions with local self-attention to robustly capture both coarse geometry and fine details, and a boundary-aware multi-head interpolation decoder, which explicitly models sharp edges and thin structures to preserve geometric fidelity. Comprehensive experiments on the large-scale synthetic ShapeNet benchmark demonstrate state-of-the-art performance across all standard metrics. Crucially, we validate the framework’s strong zero-shot generalization capability by deploying the model—trained exclusively on synthetic data—to reconstruct complex assets from a custom-collected industrial dataset without any additional fine-tuning. The results confirm the method’s suitability as a robust and scalable approach for 3D asset modeling, a critical enabling step for creating high-fidelity DTs in demanding, unseen industrial settings. Full article
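
The abstract reports "standard metrics" on ShapeNet; the symmetric Chamfer distance is the usual choice for completion and reconstruction quality, so a minimal brute-force PyTorch version is sketched below (assumed metric, fine for a few thousand points per cloud).

```python
import torch

def chamfer_distance(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """a: (N, 3), b: (M, 3) point clouds -> symmetric Chamfer distance."""
    d = torch.cdist(a, b)                      # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
```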

35 pages, 7934 KiB  
Article
Analyzing Diagnostic Reasoning of Vision–Language Models via Zero-Shot Chain-of-Thought Prompting in Medical Visual Question Answering
by Fatema Tuj Johora Faria, Laith H. Baniata, Ahyoung Choi and Sangwoo Kang
Mathematics 2025, 13(14), 2322; https://doi.org/10.3390/math13142322 - 21 Jul 2025
Abstract
Medical Visual Question Answering (MedVQA) lies at the intersection of computer vision, natural language processing, and clinical decision-making, aiming to generate accurate responses from medical images paired with complex inquiries. Despite recent advances in vision–language models (VLMs), their use in healthcare remains limited by a lack of interpretability and a tendency to produce direct, unexplainable outputs. This opacity undermines their reliability in medical settings, where transparency and justification are critically important. To address this limitation, we propose a zero-shot chain-of-thought prompting framework that guides VLMs to perform multi-step reasoning before arriving at an answer. By encouraging the model to break down the problem, analyze both visual and contextual cues, and construct a stepwise explanation, the approach makes the reasoning process explicit and clinically meaningful. We evaluate the framework on the PMC-VQA benchmark, which includes authentic radiological images and expert-level prompts. In a comparative analysis of three leading VLMs, Gemini 2.5 Pro achieved the highest accuracy (72.48%), followed by Claude 3.5 Sonnet (69.00%) and GPT-4o Mini (67.33%). The results demonstrate that chain-of-thought prompting significantly improves both reasoning transparency and performance in MedVQA tasks. Full article
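
An illustrative zero-shot chain-of-thought prompt for one MedVQA item, reflecting the stepwise structure the abstract describes (findings, clinical context, then answer). The wording is a plausible stand-in, not the paper's verbatim prompt, and the model call itself is abstracted away.

```python
def build_cot_prompt(question: str, options: list[str]) -> str:
    opts = "\n".join(f"{c}. {o}" for c, o in zip("ABCD", options))
    return (
        "You are assisting with a medical visual question about the attached "
        "radiological image.\n"
        f"Question: {question}\n{opts}\n"
        "Let's think step by step: first describe the relevant visual findings, "
        "then relate them to the clinical context, and only then give the final "
        "answer as a single option letter."
    )

print(build_cot_prompt("What abnormality is shown?",
                       ["Pneumothorax", "Effusion", "Mass", "Normal"]))
```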
(This article belongs to the Special Issue Mathematical Foundations in NLP: Applications and Challenges)

35 pages, 954 KiB  
Article
Beyond Manual Media Coding: Evaluating Large Language Models and Agents for News Content Analysis
by Stavros Doropoulos, Elisavet Karapalidou, Polychronis Charitidis, Sophia Karakeva and Stavros Vologiannidis
Appl. Sci. 2025, 15(14), 8059; https://doi.org/10.3390/app15148059 - 20 Jul 2025
Abstract
The vast volume of media content, combined with the costs of manual annotation, challenges scalable codebook analysis and risks reducing decision-making accuracy. This study evaluates the effectiveness of large language models (LLMs) and multi-agent teams in structured media content analysis based on codebook-driven annotation. We construct a dataset of 200 news articles on U.S. tariff policies, manually annotated using a 26-question codebook encompassing 122 distinct codes, to establish a rigorous ground truth. Seven state-of-the-art LLMs, spanning low- to high-capacity tiers, are assessed under a unified zero-shot prompting framework incorporating role-based instructions and schema-constrained outputs. Experimental results show weighted global F1-scores between 0.636 and 0.822, with Claude-3-7-Sonnet achieving the highest direct-prompt performance. To examine the potential of agentic orchestration, we propose and develop a multi-agent system using Meta’s Llama 4 Maverick, incorporating expert role profiling, shared memory, and coordinated planning. This architecture improves the overall F1-score over the direct prompting baseline from 0.757 to 0.805 and demonstrates consistent gains across binary, categorical, and multi-label tasks, approaching commercial-level accuracy while maintaining a favorable cost–performance profile. These findings highlight the viability of LLMs, both in direct and agentic configurations, for automating structured content analysis. Full article
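
A small sketch of the evaluation side: model answers to one codebook question are scored against the manual ground truth with a weighted F1, matching the reporting in the abstract. The label set and predictions below are invented stand-ins for the 26-question codebook.

```python
from sklearn.metrics import f1_score

gold = ["pro_tariff", "neutral", "anti_tariff", "neutral", "pro_tariff"]
pred = ["pro_tariff", "neutral", "neutral", "neutral", "pro_tariff"]
print(f1_score(gold, pred, average="weighted"))  # weighted global F1
```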
(This article belongs to the Special Issue Natural Language Processing in the Era of Artificial Intelligence)

15 pages, 3893 KiB  
Article
Exploration of 3D Few-Shot Learning Techniques for Classification of Knee Joint Injuries on MR Images
by Vinh Hiep Dang, Minh Tri Nguyen, Ngoc Hoang Le, Thuan Phat Nguyen, Quoc-Viet Tran, Tan Ha Mai, Vu Pham Thao Vy, Truong Nguyen Khanh Hung, Ching-Yu Lee, Ching-Li Tseng, Nguyen Quoc Khanh Le and Phung-Anh Nguyen
Diagnostics 2025, 15(14), 1808; https://doi.org/10.3390/diagnostics15141808 - 18 Jul 2025
Abstract
Accurate diagnosis of knee joint injuries from magnetic resonance (MR) images is critical for patient care. Background/Objectives: While deep learning has advanced 3D MR image analysis, its reliance on extensive labeled datasets is a major hurdle for diverse knee pathologies. Few-shot learning (FSL) addresses this by enabling models to classify new conditions from minimal annotated examples, often leveraging knowledge from related tasks. However, creating robust 3D FSL frameworks for varied knee injuries remains challenging. Methods: We introduce MedNet-FS, a 3D FSL framework that effectively classifies knee injuries by utilizing domain-specific pre-trained weights and generalized end-to-end (GE2E) loss for discriminative embeddings. Results: MedNet-FS, with knee-MRI-specific pre-training, significantly outperformed models using generic or other medical pre-trained weights and approached supervised learning performance on internal datasets with limited samples (e.g., achieving an area under the curve (AUC) of 0.76 for ACL tear classification with k = 40 support samples on the MRNet dataset). External validation on the KneeMRI dataset revealed challenges in classifying partially torn ACL (AUC up to 0.58) but demonstrated promising performance for distinguishing intact versus fully ruptured ACLs (AUC 0.62 with k = 40). Conclusions: These findings demonstrate that tailored FSL strategies can substantially reduce data dependency in developing specialized medical imaging tools. This approach fosters rapid AI tool development for knee injuries and offers a scalable solution for data scarcity in other medical imaging domains, potentially democratizing AI-assisted diagnostics, particularly for rare conditions or in resource-limited settings. Full article
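
A minimal episode-level sketch of the generalized end-to-end (GE2E) loss the framework uses for discriminative embeddings: each embedding is pulled toward its own class centroid and pushed from the others via a softmax over scaled cosine similarities. The w, b initial values follow the original GE2E speaker-verification paper; the exclusive-centroid refinement and the paper's exact 3D adaptation are omitted as assumptions.

```python
import torch
import torch.nn.functional as F

def ge2e_loss(emb: torch.Tensor, w: float = 10.0, b: float = -5.0) -> torch.Tensor:
    """emb: (num_classes, k, dim) L2-normalized embeddings for one episode."""
    c, k, _ = emb.shape
    centroids = F.normalize(emb.mean(dim=1), dim=-1)            # (c, dim)
    sim = w * torch.einsum("ckd,jd->ckj", emb, centroids) + b   # (c, k, c)
    labels = torch.arange(c).repeat_interleave(k)               # true class per row
    return F.cross_entropy(sim.reshape(c * k, c), labels)
```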
(This article belongs to the Special Issue New Technologies and Tools Used for Risk Assessment of Diseases)

16 pages, 2355 KiB  
Article
Generalising Stock Detection in Retail Cabinets with Minimal Data Using a DenseNet and Vision Transformer Ensemble
by Babak Rahi, Deniz Sagmanli, Felix Oppong, Direnc Pekaslan and Isaac Triguero
Mach. Learn. Knowl. Extr. 2025, 7(3), 66; https://doi.org/10.3390/make7030066 - 16 Jul 2025
Abstract
Generalising deep-learning models to perform well on unseen data domains with minimal retraining remains a significant challenge in computer vision. Even when the target task—such as quantifying the number of elements in an image—stays the same, variations in data quality, shape, or form can deviate from the training conditions, often necessitating manual intervention. As a real-world industry problem, we aim to automate stock level estimation in retail cabinets. As technology advances, new cabinet models with varying shapes emerge alongside new camera types. This evolving scenario poses a substantial obstacle to deploying long-term, scalable solutions. To surmount the challenge of generalising to new cabinet models and cameras from minimal amounts of sample images, this research introduces a new solution: a novel ensemble model that combines DenseNet-201 and Vision Transformer (ViT-B/8) architectures to achieve generalisation in stock-level classification. The novelty of our solution lies in combining a transformer with a DenseNet model to capture both the local, hierarchical details and the long-range dependencies within the images, improving generalisation accuracy with less data. Key contributions include (i) a novel DenseNet-201 + ViT-B/8 feature-level fusion, (ii) an adaptation workflow that needs only two images per class, (iii) a balanced layer-unfreezing schedule, (iv) a publicly described domain-shift benchmark, and (v) a 47-percentage-point accuracy gain over four standard few-shot baselines. Our approach leverages fine-tuning techniques to adapt the two pre-trained models to new retail cabinets (i.e., standing or horizontal) and camera types using only two images per class. Experimental results demonstrate that our method achieves high accuracy rates of 91% on new cabinets with the same camera and 89% on new cabinets with different cameras, significantly outperforming standard few-shot learning methods. Full article
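
A sketch of what the feature-level fusion could look like: pooled DenseNet-201 features (local detail) concatenated with ViT-B/8 pooled features (long-range context) before a small classification head. The timm model name, pooling choices, and head size are assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import timm
from torchvision.models import densenet201

class FusionClassifier(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.cnn = densenet201(weights="DEFAULT").features       # local, hierarchical detail
        self.vit = timm.create_model("vit_base_patch8_224", pretrained=True,
                                     num_classes=0)              # long-range dependencies
        self.head = nn.Linear(1920 + 768, num_classes)           # fused feature dims

    def forward(self, x):                                        # x: (B, 3, 224, 224)
        f_cnn = torch.flatten(nn.functional.adaptive_avg_pool2d(
            nn.functional.relu(self.cnn(x)), 1), 1)              # (B, 1920)
        f_vit = self.vit(x)                                      # (B, 768)
        return self.head(torch.cat([f_cnn, f_vit], dim=1))
```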
(This article belongs to the Section Data)

20 pages, 2382 KiB  
Article
Heterogeneity-Aware Personalized Federated Neural Architecture Search
by An Yang and Ying Liu
Entropy 2025, 27(7), 759; https://doi.org/10.3390/e27070759 - 16 Jul 2025
Abstract
Federated learning (FL), which enables collaborative learning across distributed nodes, confronts a significant heterogeneity challenge, primarily comprising resource heterogeneity induced by different hardware platforms and statistical heterogeneity originating from non-IID private data distributions among clients. Neural architecture search (NAS), particularly one-shot NAS, holds great promise for automatically designing optimal personalized models tailored to such heterogeneous scenarios. However, the coexistence of resource and statistical heterogeneity destabilizes the training of the one-shot supernet, impairs the evaluation of candidate architectures, and ultimately hinders the discovery of optimal personalized models. To address this problem, we propose a heterogeneity-aware personalized federated NAS (HAPFNAS) method. First, we leverage lightweight knowledge models to distill knowledge from clients to the server-side supernet, thereby effectively mitigating the effects of heterogeneity and enhancing training stability. Then, we build random-forest-based personalized performance predictors to enable the efficient evaluation of candidate architectures across clients. Furthermore, we develop a model-heterogeneous FL algorithm called heteroFedAvg to facilitate collaborative model training for the discovered personalized models. Comprehensive experiments on the CIFAR-10/100 and Tiny-ImageNet classification datasets demonstrate the effectiveness of our HAPFNAS compared to state-of-the-art federated NAS methods. Full article
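
A minimal sketch of the random-forest performance-predictor idea: candidate architectures are encoded as discrete choice vectors and regressed against measured validation accuracy, so new candidates can be ranked without full evaluation. The encoding width and the training data below are invented placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
arch_encodings = rng.integers(0, 4, size=(200, 12))   # 12 searchable choices (assumption)
measured_acc = rng.uniform(0.5, 0.9, size=200)        # stand-in supernet evaluations

predictor = RandomForestRegressor(n_estimators=100).fit(arch_encodings, measured_acc)
candidate = rng.integers(0, 4, size=(1, 12))
print(predictor.predict(candidate))                   # estimated accuracy, no full eval
```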
(This article belongs to the Section Signal and Data Analysis)

21 pages, 3826 KiB  
Article
UAV-OVD: Open-Vocabulary Object Detection in UAV Imagery via Multi-Level Text-Guided Decoding
by Lijie Tao, Guoting Wei, Zhuo Wang, Zhaoshuai Qi, Ying Li and Haokui Zhang
Drones 2025, 9(7), 495; https://doi.org/10.3390/drones9070495 - 14 Jul 2025
Abstract
Object detection in drone-captured imagery has attracted significant attention due to its wide range of real-world applications, including surveillance, disaster response, and environmental monitoring. The majority of existing methods are developed under closed-set assumptions, and although some recent studies have begun to explore open-vocabulary or open-world detection, their application to UAV imagery remains limited and underexplored. In this paper, we address this limitation by exploring the relationship between images and textual semantics to extend object detection in UAV imagery to an open-vocabulary setting. We propose a novel and efficient detector named Unmanned Aerial Vehicle Open-Vocabulary Detector (UAV-OVD), specifically designed for drone-captured scenes. To facilitate open-vocabulary object detection, we propose improvements from three complementary perspectives. First, at the training level, we design a region–text contrastive loss to replace the conventional classification loss, allowing the model to align visual regions with textual descriptions beyond fixed category sets. Second, structurally, we introduce a multi-level text-guided fusion decoder that integrates visual features across multiple spatial scales under language guidance, thereby improving overall detection performance and enhancing the representation and perception of small objects. Finally, from the data perspective, we enrich the original dataset with synonym-augmented category labels, enabling more flexible and semantically expressive supervision. Experiments conducted on two widely used benchmark datasets demonstrate that our approach achieves significant improvements in both mAP and recall. For instance, for zero-shot detection on xView, UAV-OVD achieves 9.9 mAP and 67.3 recall, 1.1 and 25.6 points higher, respectively, than YOLO-World. In terms of speed, UAV-OVD achieves 53.8 FPS, nearly twice as fast as YOLO-World and five times faster than DetrReg, demonstrating its strong potential for real-time open-vocabulary detection in UAV imagery. Full article
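
A minimal sketch of a region-text contrastive objective of the kind the abstract describes: matched region/text embedding pairs are pulled together with a symmetric InfoNCE loss, replacing a fixed-category classification loss. The temperature and the one-to-one pairing are assumptions.

```python
import torch
import torch.nn.functional as F

def region_text_contrastive(regions, texts, tau: float = 0.07):
    """regions, texts: (N, D); row i of each tensor is a matched pair."""
    r = F.normalize(regions, dim=-1)
    t = F.normalize(texts, dim=-1)
    logits = r @ t.T / tau                       # (N, N) similarity matrix
    labels = torch.arange(r.shape[0])            # diagonal entries are positives
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.T, labels))
```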
(This article belongs to the Special Issue Applications of UVs in Digital Photogrammetry and Image Processing)

21 pages, 5889 KiB  
Article
Mobile-YOLO: A Lightweight Object Detection Algorithm for Four Categories of Aquatic Organisms
by Hanyu Jiang, Jing Zhao, Fuyu Ma, Yan Yang and Ruiwen Yi
Fishes 2025, 10(7), 348; https://doi.org/10.3390/fishes10070348 - 14 Jul 2025
Abstract
Accurate and rapid aquatic organism recognition is a core technology for fisheries automation and aquatic organism statistical research. However, due to absorption and scattering effects, images of aquatic organisms often suffer from poor contrast and color distortion. Additionally, the clustering behavior of aquatic organisms often leads to occlusion, further complicating the identification task. This study proposes a lightweight object detection model, Mobile-YOLO, for the recognition of four representative aquatic organisms, namely holothurian, echinus, scallop, and starfish. Our model first utilizes the proposed Mobile-Nano backbone network, which enhances feature perception while maintaining a lightweight design. We then propose a lightweight detection head, LDtect, which balances a lightweight structure with high accuracy. Additionally, we introduce Dysample (dynamic sampling) and HWD (Haar wavelet downsampling) modules to optimize the feature fusion structure and meet lightweight goals by improving the upsampling and downsampling processes; these modules also help compensate for the accuracy loss caused by the lightweight design of LDtect. Compared to the baseline model, our model reduces Params (parameters) by 32.2%, FLOPs (floating point operations) by 28.4%, and weights (model storage size) by 30.8%, while improving FPS (frames per second) by 95.2% and increasing mAP (mean average precision) by 1.6%. This accuracy gain can translate into better performance in practical applications such as marine species monitoring, conservation efforts, and biodiversity assessment. Compared with the YOLO (You Only Look Once) series (YOLOv5-12), SSD (Single Shot MultiBox Detector), EfficientDet (Efficient Detection), RetinaNet, and RT-DETR (Real-Time Detection Transformer), our model achieves leading comprehensive performance in terms of both accuracy and lightweight design. The results indicate that our research provides technological support for precise and rapid aquatic organism recognition. Full article
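
A sketch of the Haar-wavelet-downsampling idea: a 2x2 Haar transform halves spatial resolution while keeping high-frequency detail as extra channels, and a 1x1 convolution then mixes the four subbands. This is a generic HWD formulation under stated assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class HaarDownsample(nn.Module):
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.mix = nn.Conv2d(4 * c_in, c_out, kernel_size=1)  # mix the 4 subbands

    def forward(self, x):                       # x: (B, C, H, W), H and W even
        e, o = x[..., ::2, :], x[..., 1::2, :]  # even / odd rows
        s, d = e + o, e - o                     # vertical low / high pass
        ll, lh = s[..., ::2] + s[..., 1::2], s[..., ::2] - s[..., 1::2]
        hl, hh = d[..., ::2] + d[..., 1::2], d[..., ::2] - d[..., 1::2]
        # Orthonormal scaling, then channel mixing at half resolution.
        return self.mix(torch.cat([ll, lh, hl, hh], dim=1) / 2.0)
```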
(This article belongs to the Special Issue Technology for Fish and Fishery Monitoring)

22 pages, 3279 KiB  
Article
HA-CP-Net: A Cross-Domain Few-Shot SAR Oil Spill Detection Network Based on Hybrid Attention and Category Perception
by Dongmei Song, Shuzhen Wang, Bin Wang, Weimin Chen and Lei Chen
J. Mar. Sci. Eng. 2025, 13(7), 1340; https://doi.org/10.3390/jmse13071340 - 13 Jul 2025
Abstract
Deep learning models have obvious advantages in detecting oil spills, but their training heavily depends on a large number of high-quality samples. However, due to the accidental nature, unpredictability, and urgency of oil spill incidents, it is difficult to obtain large numbers of labeled samples in real oil spill monitoring scenarios. Fortunately, few-shot learning can achieve excellent classification performance with only a small number of labeled samples. In this context, a new cross-domain few-shot SAR oil spill detection network is proposed in this paper. Significantly, the network is embedded with a hybrid attention feature extraction block, which consists of a coordinate attention module that perceives channel information and spatial location information, a global self-attention transformer module that captures global dependencies, and a multi-scale self-attention module that depicts local detailed features, thereby achieving deep mining and accurate characterization of image features. In addition, to address the difficulty of distinguishing suspected oil films in seawater from real oil films in few-shot settings, given the small differences in their features, this paper proposes a double-loss-function category determination block consisting of two parts: a well-designed category-perception loss function and a traditional cross-entropy loss function. The category-perception loss function optimizes the spatial distribution of sample features by shortening the distance between similar samples while expanding the distance between different samples. By combining the category-perception loss function with the cross-entropy loss function, the network’s performance in discriminating between real and suspected oil films is maximized. The experimental results demonstrate that this study provides an effective solution for high-precision oil spill detection under few-shot conditions, which is conducive to the rapid identification of oil spill accidents. Full article
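
A minimal sketch of the double-loss idea: a contrastive "category-perception" term that shrinks intra-class and enlarges inter-class embedding distances, added to the usual cross-entropy. The margin and the weighting coefficient are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def category_perception_loss(feat, labels, margin: float = 1.0):
    """feat: (N, D) embeddings; labels: (N,) class ids."""
    dist = torch.cdist(feat, feat)                       # (N, N) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool)
    pull = dist[same & ~eye].pow(2).mean()               # same class: shorten distance
    push = F.relu(margin - dist[~same]).pow(2).mean()    # different class: expand
    return pull + push

def total_loss(logits, feat, labels, lam: float = 0.5):
    return F.cross_entropy(logits, labels) + lam * category_perception_loss(feat, labels)
```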
(This article belongs to the Section Marine Environmental Science)

20 pages, 3147 KiB  
Article
Crossed Wavelet Convolution Network for Few-Shot Defect Detection of Industrial Chips
by Zonghai Sun, Yiyu Lin, Yan Li and Zihan Lin
Sensors 2025, 25(14), 4377; https://doi.org/10.3390/s25144377 - 13 Jul 2025
Abstract
In resistive polymer humidity sensors, the quality of the resistor chips directly affects performance. Detecting chip defects remains challenging due to the scarcity of defective samples, which limits traditional supervised-learning methods that require abundant labeled data. While few-shot learning (FSL) shows promise for industrial defect detection, existing approaches struggle with mixed-scene conditions (e.g., daytime and night-vision scenes). In this work, we propose a crossed wavelet convolution network (CWCN), comprising a dual-pipeline crossed wavelet convolution training framework (DPCWC) and a loss value calculation module named ProSL. Our method innovatively applies wavelet transform convolution and prototype learning to industrial defect detection, effectively fusing feature information from multiple scenarios and improving detection performance. Experiments across various few-shot tasks on chip datasets demonstrate the superior detection quality of CWCN, with an improvement in mAP ranging from 2.76% to 16.43% over other FSL methods. In addition, experiments on the open-source dataset NEU-DET further validate the proposed method. Full article
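
A sketch of the prototype-learning half of the method in its standard prototypical-network form: class prototypes are averaged from support embeddings and queries are scored by distance to each prototype. The wavelet-convolution feature extractor itself is omitted, and this generic formulation is an assumption about how the prototypes are used.

```python
import torch

def prototype_logits(support, support_labels, query, num_classes):
    """support: (S, D), query: (Q, D) -> (Q, num_classes) logits."""
    protos = torch.stack([support[support_labels == c].mean(0)
                          for c in range(num_classes)])      # (C, D) class prototypes
    return -torch.cdist(query, protos)                       # nearer = higher score
```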
(This article belongs to the Section Sensing and Imaging)

23 pages, 16886 KiB  
Article
SAVL: Scene-Adaptive UAV Visual Localization Using Sparse Feature Extraction and Incremental Descriptor Mapping
by Ganchao Liu, Zhengxi Li, Qiang Gao and Yuan Yuan
Remote Sens. 2025, 17(14), 2408; https://doi.org/10.3390/rs17142408 - 12 Jul 2025
Abstract
In recent years, the use of UAVs has become widespread. Long-distance UAV flight requires precise geographic coordinates. Global Navigation Satellite Systems (GNSS) are the most common positioning solution, but their signals are susceptible to interference from obstacles and complex electromagnetic environments. In such cases, vision-based technology can serve as an alternative to ensure the self-positioning capability of UAVs. Therefore, a scene-adaptive UAV visual localization framework (SAVL) is proposed. In the proposed framework, UAV images are mapped to satellite images with geographic coordinates through pixel-level matching to locate UAVs. Firstly, to tackle the challenge of inaccurate localization resulting from sparse terrain features, this work proposes a novel feature extraction network grounded in a general visual model, leveraging the robust zero-shot generalization capability of the pre-trained model to extract sparse features from UAV and satellite imagery. Secondly, to overcome the problem of weak generalization in unknown scenarios, a descriptor incremental mapping module is designed, which reduces multi-source image differences at the semantic level through UAV-satellite image descriptor mapping and constructs a confidence-based incremental strategy to adapt dynamically to the scene. Finally, given the lack of annotated public datasets, a scene-rich UAV dataset (RealUAV) was constructed to study UAV visual localization in real-world environments. To evaluate the localization performance of the proposed framework, several related methods were compared and analyzed in detail. The results on the dataset indicate that the proposed method achieves excellent positioning accuracy, with an average error of only 8.71 m. Full article
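
A minimal sketch of the pixel-level matching stage: UAV and satellite descriptors are L2-normalized and paired by mutual nearest neighbor, after which the matched satellite pixels supply geographic coordinates. How the descriptors are produced is abstracted away, and the mutual-NN criterion is an assumed simplification of the paper's matching.

```python
import torch
import torch.nn.functional as F

def mutual_nn_matches(desc_uav, desc_sat):
    """desc_uav: (N, D), desc_sat: (M, D) -> (K, 2) matched index pairs."""
    sim = F.normalize(desc_uav, dim=-1) @ F.normalize(desc_sat, dim=-1).T
    nn12 = sim.argmax(dim=1)                 # best satellite match per UAV pixel
    nn21 = sim.argmax(dim=0)                 # best UAV match per satellite pixel
    idx = torch.arange(desc_uav.shape[0])
    keep = nn21[nn12] == idx                 # keep only mutually consistent pairs
    return torch.stack([idx[keep], nn12[keep]], dim=1)
```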
