Search Results (498)

Search Parameters:
Keywords = fine-grained tasks

24 pages, 8476 KiB  
Article
A Weakly Supervised Network for Coarse-to-Fine Change Detection in Hyperspectral Images
by Yadong Zhao and Zhao Chen
Remote Sens. 2025, 17(15), 2624; https://doi.org/10.3390/rs17152624 - 28 Jul 2025
Abstract
Hyperspectral image change detection (HSI-CD) provides substantial value in environmental monitoring, urban planning, and other fields. In recent years, deep-learning-based HSI-CD methods have made remarkable progress due to their powerful nonlinear feature learning capabilities, yet they face several challenges: the mixed-pixel phenomenon affecting pixel-level detection accuracy; heterogeneous spatial scales of change targets, where coarse-grained features fail to preserve fine-grained details; and dependence on high-quality labels. To address these challenges, this paper introduces WSCDNet, a weakly supervised HSI-CD network employing coarse-to-fine feature learning, with three key innovations: (1) a dual-branch detection framework integrating binary and multiclass change detection at the sub-pixel level, which enhances collaborative optimization through a cross-feature coupling module; (2) a multi-granularity aggregation and difference feature enhancement module for detecting easily confused regions, which effectively improves detection accuracy; and (3) a weakly supervised learning strategy that reduces model sensitivity to noisy pseudo-labels through decision-level consistency measurement and sample filtering mechanisms. Experimental results demonstrate that WSCDNet effectively enhances the accuracy and robustness of HSI-CD, exhibiting superior performance under complex scenarios and weakly supervised conditions.
(This article belongs to the Section Remote Sensing Image Processing)
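
To make the third innovation concrete, the sketch below shows one plausible form of decision-level consistency filtering: a pseudo-label is trusted only when the binary and multiclass branches agree and at least one branch is confident. This is a minimal Python/PyTorch illustration; the function name, threshold, and agreement rule are assumptions, not the authors' implementation.

```python
# Hedged sketch of decision-level consistency filtering for noisy pseudo-labels.
import torch

def filter_pseudo_labels(binary_logits, multiclass_logits, threshold=0.9):
    """Keep pixels where the binary and multiclass branches agree.

    binary_logits:     (N, 2)  change / no-change scores per pixel
    multiclass_logits: (N, C)  class scores; class 0 assumed to mean "no change"
    Returns a boolean mask selecting pixels whose pseudo-labels are trusted.
    """
    binary_prob = torch.softmax(binary_logits, dim=1)
    multi_prob = torch.softmax(multiclass_logits, dim=1)

    binary_change = binary_prob[:, 1] > 0.5          # branch 1 says "changed"
    multi_change = multi_prob.argmax(dim=1) != 0     # branch 2 says "changed"

    agree = binary_change == multi_change            # decision-level consistency
    confident = torch.maximum(binary_prob.max(dim=1).values,
                              multi_prob.max(dim=1).values) > threshold
    return agree & confident                         # sample filtering mask
```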
19 pages, 2106 KiB  
Article
Rethinking Infrared and Visible Image Fusion from a Heterogeneous Content Synergistic Perception Perspective
by Minxian Shen, Gongrui Huang, Mingye Ju and Kaikuang Ma
Sensors 2025, 25(15), 4658; https://doi.org/10.3390/s25154658 - 27 Jul 2025
Abstract
Infrared and visible image fusion (IVIF) endeavors to amalgamate the thermal radiation characteristics from infrared images with the fine-grained texture details from visible images, aiming to produce fused outputs that are more robust and information-rich. Among existing methodologies, those based on generative adversarial networks (GANs) have demonstrated considerable promise. However, such approaches are frequently constrained by their reliance on homogeneous discriminators with identical architectures, a limitation that can precipitate undesirable artifacts in the fused images. To surmount this challenge, this paper introduces HCSPNet, a novel GAN-based framework. HCSPNet distinctively incorporates heterogeneous dual discriminators engineered for the fusion of the disparate source images inherent in the IVIF task. This design ensures the steadfast preservation of critical information from the source inputs, even in scenarios of image degradation. Specifically, the two structurally distinct discriminators within HCSPNet are augmented with adaptive salient information distillation (ASID) modules, each uniquely structured to align with the intrinsic properties of infrared and visible images. This mechanism impels the discriminators to concentrate on pivotal components when assessing whether the fused image has proficiently inherited significant information from the source modalities, namely the salient thermal signatures from infrared imagery and the detailed textural content from visible imagery, thereby markedly diminishing the occurrence of unwanted artifacts. Comprehensive experiments across multiple publicly available datasets substantiate the superiority and generalization capabilities of HCSPNet, underscoring its significant potential for practical deployment. Additionally, we show that the proposed heterogeneous dual discriminators can serve as a plug-and-play structure to improve the performance of existing GAN-based methods.
(This article belongs to the Section Sensing and Imaging)
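
The core architectural idea, two structurally different discriminators matched to the two source modalities, can be sketched as below. This is a hedged PyTorch illustration of heterogeneous dual discriminators in general; the layer choices are assumptions, and the ASID modules are omitted.

```python
import torch.nn as nn

class InfraredDiscriminator(nn.Module):
    """Shallow, wide-kernel discriminator biased toward salient thermal blobs."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 7, stride=2, padding=3), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 7, stride=2, padding=3), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1))
    def forward(self, x):
        return self.net(x)

class VisibleDiscriminator(nn.Module):
    """Deeper, small-kernel discriminator biased toward fine texture detail."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=1, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1))
    def forward(self, x):
        return self.net(x)
```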
51 pages, 4494 KiB  
Review
A Survey of Loss Functions in Deep Learning
by Caiyi Li, Kaishuai Liu and Shuai Liu
Mathematics 2025, 13(15), 2417; https://doi.org/10.3390/math13152417 - 27 Jul 2025
Abstract
Deep learning (DL), as a cutting-edge technology in artificial intelligence, has significantly impacted fields such as computer vision and natural language processing. The loss function determines the convergence speed and accuracy of a DL model and has a crucial impact on algorithm quality and model performance. However, most existing studies focus on improving loss functions for specific problems and lack a systematic summary and comparison, especially for computer vision and natural language processing tasks. Therefore, this paper reclassifies and summarizes the loss functions used in DL and proposes a new category of metric loss. Furthermore, it conducts a fine-grained division of regression loss, classification loss, and metric loss, elaborating on existing problems and improvements. Finally, new trends toward compound loss and generative loss are anticipated. This survey provides a new perspective on loss function categorization and a systematic reference for researchers in the DL field.
(This article belongs to the Special Issue Advances in Applied Mathematics in Computer Vision)
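
The three categories the survey divides (regression, classification, and metric loss), plus the anticipated compound-loss trend, can be illustrated with standard PyTorch primitives; the 0.5/0.1 weights below are arbitrary placeholders, not values from the paper.

```python
import torch
import torch.nn.functional as F

pred, target = torch.randn(8), torch.randn(8)
reg_loss = F.smooth_l1_loss(pred, target)          # regression loss (Huber-like)

logits, labels = torch.randn(8, 5), torch.randint(0, 5, (8,))
cls_loss = F.cross_entropy(logits, labels)         # classification loss

anchor, pos, neg = torch.randn(8, 128), torch.randn(8, 128), torch.randn(8, 128)
metric_loss = F.triplet_margin_loss(anchor, pos, neg, margin=1.0)  # metric loss

# Compound loss: a weighted sum of heterogeneous objectives.
compound = reg_loss + 0.5 * cls_loss + 0.1 * metric_loss
```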
22 pages, 6487 KiB  
Article
An RGB-D Vision-Guided Robotic Depalletizing System for Irregular Camshafts with Transformer-Based Instance Segmentation and Flexible Magnetic Gripper
by Runxi Wu and Ping Yang
Actuators 2025, 14(8), 370; https://doi.org/10.3390/act14080370 - 24 Jul 2025
Abstract
Accurate segmentation of densely stacked and weakly textured objects remains a core challenge in robotic depalletizing for industrial applications. To address this, we propose MaskNet, an instance segmentation network tailored for RGB-D input, designed to enhance recognition performance under occlusion and low-texture conditions. Built upon a Vision Transformer backbone, MaskNet adopts a dual-branch architecture for RGB and depth modalities and integrates multi-modal features using an attention-based fusion module. Further, spatial and channel attention mechanisms are employed to refine feature representation and improve instance-level discrimination. The segmentation outputs are used in conjunction with regional depth to optimize the grasping sequence. Experimental evaluations on camshaft depalletizing tasks demonstrate that MaskNet achieves a precision of 0.980, a recall of 0.971, and an F1-score of 0.975, outperforming a YOLO11-based baseline. In an actual scenario, with a self-designed flexible magnetic gripper, the system maintains a maximum grasping error of 9.85 mm and a 98% task success rate across multiple camshaft types. These results validate the effectiveness of MaskNet in enabling fine-grained perception for robotic manipulation in cluttered, real-world scenarios.
(This article belongs to the Section Actuators for Robotics)
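
A minimal sketch of attention-based RGB-depth fusion in the spirit of MaskNet's fusion module is shown below; the per-channel gating design is an assumption, not the published architecture.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuse RGB and depth feature maps with a learned per-channel gate."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, 1),
            nn.Sigmoid())
    def forward(self, rgb_feat, depth_feat):
        w = self.gate(torch.cat([rgb_feat, depth_feat], dim=1))  # (B, C, 1, 1)
        return w * rgb_feat + (1 - w) * depth_feat  # modality-weighted blend
```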

19 pages, 5417 KiB  
Article
SE-TFF: Adaptive Tourism-Flow Forecasting Under Sparse and Heterogeneous Data via Multi-Scale SE-Net
by Jinyuan Zhang, Tao Cui and Peng He
Appl. Sci. 2025, 15(15), 8189; https://doi.org/10.3390/app15158189 - 23 Jul 2025
Abstract
Accurate and timely forecasting of cross-regional tourist flows is essential for sustainable destination management, yet existing models struggle with sparse data, complex spatiotemporal interactions, and limited interpretability. This paper presents SE-TFF, a multi-scale tourism-flow forecasting framework that couples a Squeeze-and-Excitation (SE) network with reinforcement-driven optimization to adaptively re-weight environmental, economic, and social features. A benchmark dataset of 17.8 million records from 64 countries and 743 cities (2016–2024) is compiled from the Open Travel Data (OPTD) repository on GitHub for training and validation. SE-TFF introduces (i) a multi-channel SE module for fine-grained feature selection under heterogeneous conditions, (ii) a Top-K attention filter to preserve salient context in highly sparse matrices, and (iii) a Double-DQN layer that dynamically balances prediction objectives. Experimental results show SE-TFF attains 56.5% MAE and 65.6% RMSE reductions over the best baseline (ARIMAX) at 20% sparsity, with an average MAE of 0.92 × 10³ across multi-task outputs. SHAP analysis ranks climate anomalies, tourism revenue, and employment as dominant predictors. These gains demonstrate SE-TFF's ability to deliver real-time, interpretable forecasts for data-limited destinations. Future work will incorporate real-time social media signals and larger multimodal datasets to enhance generalizability.
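
At the heart of SE-TFF is the Squeeze-and-Excitation mechanism, which adaptively re-weights feature channels. A generic 1D SE block for time-series features (a textbook sketch, not the paper's multi-channel variant) looks like this:

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze to channel statistics, excite with a bottleneck MLP,
    then re-weight the input channels."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool1d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
    def forward(self, x):             # x: (batch, channels, time)
        w = self.pool(x).squeeze(-1)  # (batch, channels)
        w = self.fc(w).unsqueeze(-1)  # per-channel weights in [0, 1]
        return x * w
```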

23 pages, 7173 KiB  
Article
LiDAR Data-Driven Deep Network for Ship Berthing Behavior Prediction in Smart Port Systems
by Jiyou Wang, Ying Li, Hua Guo, Zhaoyi Zhang and Yue Gao
J. Mar. Sci. Eng. 2025, 13(8), 1396; https://doi.org/10.3390/jmse13081396 - 23 Jul 2025
Abstract
Accurate ship berthing behavior prediction (BBP) is essential for enabling collision warnings and supporting decision-making. Existing methods based on Automatic Identification System (AIS) data perform well in ship trajectory prediction over long time series and large scales but struggle to address the fine-grained and highly dynamic changes in berthing scenarios. Therefore, the accuracy of BBP remains a crucial challenge. In this paper, a novel BBP method based on Light Detection and Ranging (LiDAR) data is proposed. To test its feasibility, a comprehensive dataset is established through on-site collection of berthing data at Dalian Port (China) using a shore-based LiDAR system. This dataset comprises equal-interval data from 77 berthing activities involving three large ships. To find a straightforward architecture that provides good performance on our dataset, a cascaded network model combining a convolutional neural network (CNN), a bi-directional gated recurrent unit (BiGRU), and bi-directional long short-term memory (BiLSTM) is developed to serve as the baseline. Experimental results demonstrate that the baseline outperforms other commonly used prediction models and their combinations in prediction accuracy. In summary, our findings help overcome the limitations of AIS data in berthing scenarios and provide a foundation for predicting the complete berthing status, offering practical insights for safer, more efficient, and automated management in smart port systems.
(This article belongs to the Section Ocean Engineering)
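
The baseline described above, a CNN feeding a BiGRU and then a BiLSTM, can be sketched as follows; the input feature count and hidden sizes are placeholder assumptions:

```python
import torch
import torch.nn as nn

class CascadeBBP(nn.Module):
    """CNN -> BiGRU -> BiLSTM cascade over a berthing time series.
    Input: (batch, time, features); output: next-step berthing state."""
    def __init__(self, n_feat=4, hidden=64):
        super().__init__()
        self.cnn = nn.Conv1d(n_feat, hidden, kernel_size=3, padding=1)
        self.gru = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        self.lstm = nn.LSTM(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_feat)
    def forward(self, x):
        h = torch.relu(self.cnn(x.transpose(1, 2))).transpose(1, 2)
        h, _ = self.gru(h)
        h, _ = self.lstm(h)
        return self.head(h[:, -1])    # predict the state at the next step
```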

17 pages, 1927 KiB  
Article
ConvTransNet-S: A CNN-Transformer Hybrid Disease Recognition Model for Complex Field Environments
by Shangyun Jia, Guanping Wang, Hongling Li, Yan Liu, Linrong Shi and Sen Yang
Plants 2025, 14(15), 2252; https://doi.org/10.3390/plants14152252 - 22 Jul 2025
Abstract
To address the challenges of low recognition accuracy and substantial model complexity in crop disease identification models operating in complex field environments, this study proposed a novel hybrid model named ConvTransNet-S, which integrates Convolutional Neural Networks (CNNs) and transformers for crop disease identification tasks. Unlike existing hybrid approaches, ConvTransNet-S introduces three key innovations: First, a Local Perception Unit (LPU) and Lightweight Multi-Head Self-Attention (LMHSA) modules were introduced to synergistically enhance the extraction of fine-grained plant disease details and model global dependency relationships, respectively. Second, an Inverted Residual Feed-Forward Network (IRFFN) was employed to optimize the feature propagation path, thereby enhancing the model's robustness against interferences such as lighting variations and leaf occlusions. This combination of the LPU, LMHSA, and IRFFN achieves a dynamic equilibrium between local texture perception and global context modeling, effectively resolving the trade-offs inherent in standalone CNNs or transformers. Finally, through a phased architecture design, efficient fusion of multi-scale disease features is achieved, which enhances feature discriminability while reducing model complexity. The experimental results indicated that ConvTransNet-S achieved a recognition accuracy of 98.85% on the PlantVillage public dataset. This model operates with only 25.14 million parameters, a computational load of 3.762 GFLOPs, and an inference time of 7.56 ms. Testing on a self-built in-field complex-scene dataset comprising 10,441 images revealed that ConvTransNet-S achieved an accuracy of 88.53%, representing improvements of 14.22%, 2.75%, and 0.34% over EfficientNetV2, Vision Transformer, and Swin Transformer, respectively. Furthermore, ConvTransNet-S achieved up to 14.22% higher disease recognition accuracy under complex background conditions while reducing the parameter count by 46.8%. This confirms that its multi-scale feature mechanism can effectively distinguish disease features from background features, providing a novel technical approach for disease diagnosis in complex agricultural scenarios and demonstrating significant application value for intelligent agricultural management.
(This article belongs to the Section Plant Modeling)
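
Of the three components, the IRFFN is the most self-contained. A generic inverted residual feed-forward block, a sketch under the usual MobileNetV2-style design rather than the authors' exact layer, is:

```python
import torch.nn as nn

class IRFFN(nn.Module):
    """Inverted residual FFN: expand channels, apply a depthwise 3x3 conv
    for local mixing, project back down, and add a skip connection."""
    def __init__(self, channels, expansion=4):
        super().__init__()
        mid = channels * expansion
        self.block = nn.Sequential(
            nn.Conv2d(channels, mid, 1), nn.GELU(),
            nn.Conv2d(mid, mid, 3, padding=1, groups=mid), nn.GELU(),  # depthwise
            nn.Conv2d(mid, channels, 1))
    def forward(self, x):
        return x + self.block(x)  # residual connection
```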

17 pages, 1913 KiB  
Article
CropSTS: A Remote Sensing Foundation Model for Cropland Classification with Decoupled Spatiotemporal Attention
by Jian Yan, Xingfa Gu and Yuxing Chen
Remote Sens. 2025, 17(14), 2481; https://doi.org/10.3390/rs17142481 - 17 Jul 2025
Abstract
Recent progress in geospatial foundation models (GFMs) has demonstrated strong generalization capabilities for remote sensing downstream tasks. However, existing GFMs still struggle with fine-grained cropland classification due to ambiguous field boundaries, insufficient and inefficient temporal modeling, and limited cross-regional adaptability. In this paper, we propose CropSTS, a remote sensing foundation model designed with a decoupled temporal–spatial attention architecture, specifically tailored to the temporal dynamics of cropland remote sensing data. To efficiently pre-train the model under limited labeled data, we employ a hybrid framework combining a joint-embedding predictive architecture with knowledge distillation from web-scale foundation models. Despite being trained on a small dataset with a compact model, CropSTS achieves state-of-the-art performance on the PASTIS-R benchmark in terms of mIoU and F1-score. Our results validate that structural optimization of temporal encoding and cross-modal knowledge transfer are effective strategies for advancing GFM design in agricultural remote sensing.
(This article belongs to the Special Issue Advanced AI Technology for Remote Sensing Analysis)
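
Decoupled temporal–spatial attention replaces one joint attention over all time-space tokens (quadratic in T×H×W) with two cheaper passes. A hedged PyTorch sketch of that factorization, with dimensions and head count as assumptions:

```python
import torch.nn as nn

class DecoupledSTAttention(nn.Module):
    """Attend over time per spatial token, then over space per time step."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)
    def forward(self, x):             # x: (batch, time, tokens, dim)
        b, t, n, d = x.shape
        h = x.permute(0, 2, 1, 3).reshape(b * n, t, d)  # time axis per token
        h, _ = self.temporal(h, h, h)
        h = h.reshape(b, n, t, d).permute(0, 2, 1, 3).reshape(b * t, n, d)
        h, _ = self.spatial(h, h, h)                    # space axis per step
        return h.reshape(b, t, n, d)
```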

21 pages, 9749 KiB  
Article
Enhanced Pose Estimation for Badminton Players via Improved YOLOv8-Pose with Efficient Local Attention
by Yijian Wu, Zewen Chen, Hongxing Zhang, Yulin Yang and Weichao Yi
Sensors 2025, 25(14), 4446; https://doi.org/10.3390/s25144446 - 17 Jul 2025
Abstract
With the rapid development of sports analytics and artificial intelligence, accurate human pose estimation in badminton is becoming increasingly important. However, challenges such as the lack of domain-specific datasets and the complexity of athletes’ movements continue to hinder progress in this area. To address these issues, we propose an enhanced pose estimation framework tailored to badminton players, built upon an improved YOLOv8-Pose architecture. In particular, we introduce an efficient local attention (ELA) mechanism that effectively captures fine-grained spatial dependencies and contextual information, thereby significantly improving the keypoint localization accuracy and overall pose estimation performance. To support this study, we construct a dedicated badminton pose dataset comprising 4000 manually annotated samples, captured using a Microsoft Kinect v2 camera. The raw data undergo careful processing and refinement through a combination of depth-assisted annotation and visual inspection to ensure high-quality ground truth keypoints. Furthermore, we conduct an in-depth comparative analysis of multiple attention modules and their integration strategies within the network, offering generalizable insights to enhance pose estimation models in other sports domains. The experimental results show that the proposed ELA-enhanced YOLOv8-Pose model consistently achieves superior accuracy across multiple evaluation metrics, including the mean squared error (MSE), object keypoint similarity (OKS), and percentage of correct keypoints (PCK), highlighting its effectiveness and potential for broader applications in sports vision tasks.
(This article belongs to the Special Issue Computer Vision-Based Human Activity Recognition)
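
Efficient local attention variants typically strip-pool a feature map along each spatial axis and gate it with 1D-convolved positional weights. The following is a generic sketch of that pattern, not the exact ELA module used in the paper; it assumes the channel count is divisible by 16 for GroupNorm.

```python
import torch
import torch.nn as nn

class EfficientLocalAttention(nn.Module):
    """Strip-pool along height and width, run depthwise 1D convs over each
    strip, and gate the feature map with the resulting positional weights."""
    def __init__(self, channels, kernel_size=7):
        super().__init__()
        pad = kernel_size // 2
        self.conv_h = nn.Conv1d(channels, channels, kernel_size,
                                padding=pad, groups=channels)
        self.conv_w = nn.Conv1d(channels, channels, kernel_size,
                                padding=pad, groups=channels)
        self.norm = nn.GroupNorm(16, channels)  # assumes channels % 16 == 0
    def forward(self, x):                       # x: (B, C, H, W)
        h = x.mean(dim=3)                       # pool over width  -> (B, C, H)
        w = x.mean(dim=2)                       # pool over height -> (B, C, W)
        h = torch.sigmoid(self.norm(self.conv_h(h))).unsqueeze(3)  # (B, C, H, 1)
        w = torch.sigmoid(self.norm(self.conv_w(w))).unsqueeze(2)  # (B, C, 1, W)
        return x * h * w
```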

29 pages, 5825 KiB  
Article
BBSNet: An Intelligent Grading Method for Pork Freshness Based on Few-Shot Learning
by Chao Liu, Jiayu Zhang, Kunjie Chen and Jichao Huang
Foods 2025, 14(14), 2480; https://doi.org/10.3390/foods14142480 - 15 Jul 2025
Abstract
Deep learning approaches for pork freshness grading typically require large datasets, which limits their practical application due to the high costs associated with data collection. To address this challenge, we propose BBSNet, a lightweight few-shot learning model designed for accurate freshness classification with a limited number of images. BBSNet incorporates a batch channel normalization (BCN) layer to enhance feature distinguishability and employs BiFormer for optimized fine-grained feature extraction. Trained on a dataset of 600 pork images graded by microbial cell concentration, BBSNet achieved an average accuracy of 96.36% in a challenging 5-way 80-shot task. This approach significantly reduces data dependency while maintaining high accuracy, presenting a viable solution for cost-effective real-time pork quality monitoring. This work introduces a novel framework that connects laboratory freshness indicators to industrial applications in data-scarce conditions. Future research will investigate its extension to various food types and optimization for deployment on portable devices.
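
For context, an "N-way K-shot" task is usually evaluated episodically: embeddings of a labeled support set define class prototypes, and queries are assigned to the nearest prototype. Below is a minimal sketch of that protocol; prototype classification is an assumption here, as the paper's classifier may differ.

```python
import torch

def prototypical_episode(support, support_y, query, n_way=5):
    """Classify query embeddings by nearest class prototype.

    support:   (n_way * k_shot, dim) embeddings from the trained backbone
    support_y: (n_way * k_shot,)     integer class labels in [0, n_way)
    query:     (num_query, dim)
    """
    protos = torch.stack([support[support_y == c].mean(dim=0)
                          for c in range(n_way)])      # (n_way, dim)
    dists = torch.cdist(query, protos)                 # Euclidean distances
    return dists.argmin(dim=1)                         # predicted class ids
```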

25 pages, 4882 KiB  
Article
HSF-YOLO: A Multi-Scale and Gradient-Aware Network for Small Object Detection in Remote Sensing Images
by Fujun Wang and Xing Wang
Sensors 2025, 25(14), 4369; https://doi.org/10.3390/s25144369 - 12 Jul 2025
Abstract
Small object detection (SOD) in remote sensing images (RSIs) is a challenging task due to scale variation, severe occlusion, and complex backgrounds, often leading to high miss and false detection rates. To address these issues, this paper proposes a novel detection framework named HSF-YOLO, which is designed to jointly enhance feature encoding, attention interaction, and localization precision within the YOLOv8 backbone. Specifically, we introduce three tailored modules: Hybrid Atrous Enhanced Convolution (HAEC), a Spatial–Interactive–Shuffle attention module (C2f_SIS), and a Focal Gradient Refinement Loss (FGR-Loss). The HAEC module captures multi-scale semantic and fine-grained local information through parallel atrous and standard convolutions, thereby enhancing small object representation across scales. The C2f_SIS module fuses spatial and improved channel attention with a channel shuffle strategy to enhance feature interaction and suppress background noise. The FGR-Loss incorporates gradient-aware localization, focal weighting, and separation-aware constraints to improve regression accuracy and training robustness. Extensive experiments were conducted on three public remote sensing datasets. Compared with the baseline YOLOv8, HSF-YOLO improved mAP@0.5 and mAP@0.5:0.95 by 5.7% and 4.0% on the VisDrone2019 dataset, by 2.3% and 2.5% on the DIOR dataset, and by 2.3% and 2.1% on the NWPU VHR-10 dataset, respectively. These results confirm that HSF-YOLO is a unified and effective solution for small object detection in complex RSI scenarios, offering a good balance between accuracy and efficiency.
(This article belongs to the Special Issue Application of Satellite Remote Sensing in Geospatial Monitoring)
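
The HAEC idea, parallel standard and atrous convolutions fused into one multi-scale feature, can be sketched generically as below; the branch count and dilation rates are assumptions.

```python
import torch
import torch.nn as nn

class HybridAtrousConv(nn.Module):
    """Parallel standard and atrous branches: the plain 3x3 branch keeps fine
    local detail while dilated branches widen the receptive field; outputs
    are concatenated and fused by a 1x1 conv."""
    def __init__(self, c_in, c_out, dilations=(2, 4)):
        super().__init__()
        self.local = nn.Conv2d(c_in, c_out, 3, padding=1)
        self.atrous = nn.ModuleList(
            nn.Conv2d(c_in, c_out, 3, padding=d, dilation=d) for d in dilations)
        self.fuse = nn.Conv2d(c_out * (1 + len(dilations)), c_out, 1)
    def forward(self, x):
        feats = [self.local(x)] + [conv(x) for conv in self.atrous]
        return self.fuse(torch.cat(feats, dim=1))
```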

23 pages, 17223 KiB  
Article
Improving Moving Insect Detection with Difference of Features Maps in YOLO Architecture
by Angel Gomez-Canales, Javier Gomez-Avila, Jesus Hernandez-Barragan, Carlos Lopez-Franco, Carlos Villaseñor and Nancy Arana-Daniel
Appl. Sci. 2025, 15(14), 7697; https://doi.org/10.3390/app15147697 - 9 Jul 2025
Abstract
Insect detection under real-field conditions remains a challenging task due to factors such as lighting variations and the small size of insects, which often lack sufficient visual features for reliable identification by deep learning models. These limitations become especially pronounced in lightweight architectures, which, although efficient, struggle to capture fine-grained details under suboptimal conditions such as variable lighting, shadows, small object size, and occlusion. To address this, we introduce the motion module, a lightweight component designed to enhance object detection by integrating motion information directly at the feature map level within the YOLOv8 backbone. Unlike methods that rely on frame differencing and require additional preprocessing steps, our approach operates on raw input and uses only two consecutive frames. Experimental evaluations demonstrate that incorporating the motion module leads to consistent performance improvements across key metrics. For instance, on the YOLOv8n model, the motion module yields gains of up to 5.11% in mAP50 and 7.83% in Recall, with only a small computational overhead. Moreover, under simulated illumination shifts using HSV transformations, our method exhibits robustness to these variations. These results highlight the potential of the motion module as a practical and effective tool for improving insect detection in dynamic and unpredictable field scenarios.
(This article belongs to the Special Issue Deep Learning for Object Detection)
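
The key design, differencing backbone feature maps of two consecutive frames rather than raw pixels, might look like the following minimal sketch; the learned projection is an assumption.

```python
import torch.nn as nn

class MotionModule(nn.Module):
    """Inject motion by differencing backbone feature maps of two consecutive
    frames, then adding a learned projection of that difference back onto the
    current frame's features."""
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, 1)
    def forward(self, feat_t, feat_prev):
        motion = feat_t - feat_prev          # difference of feature maps
        return feat_t + self.proj(motion)    # motion-enhanced features
```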

30 pages, 17752 KiB  
Article
DMA-Net: Dynamic Morphology-Aware Segmentation Network for Remote Sensing Images
by Chao Deng, Haojian Liang, Xiao Qin and Shaohua Wang
Remote Sens. 2025, 17(14), 2354; https://doi.org/10.3390/rs17142354 - 9 Jul 2025
Abstract
Semantic segmentation of remote sensing (RS) imagery is a pivotal task for intelligent interpretation, with critical applications in urban monitoring, resource management, and disaster assessment. Recent advancements in deep learning have significantly improved RS image segmentation, particularly through the use of convolutional neural networks, which demonstrate remarkable proficiency in local feature extraction. However, due to the inherent locality of convolutional operations, prevailing methodologies frequently encounter challenges in capturing long-range dependencies, thereby constraining their comprehensive semantic comprehension. Moreover, the preprocessing of high-resolution remote sensing images by dividing them into sub-images disrupts spatial continuity, further complicating the balance between local feature extraction and global context modeling. To address these limitations, we propose DMA-Net, a Dynamic Morphology-Aware Segmentation Network built on an encoder–decoder architecture. The proposed framework incorporates three primary components: a Multi-Axis Vision Transformer (MaxViT) encoder, which balances local feature extraction and global context modeling through multi-axis self-attention mechanisms; a Hierarchy Attention Decoder (HA-Decoder), enhanced with Hierarchy Convolutional Groups (HCG) for precise recovery of fine-grained spatial details; and a Channel and Spatial Attention Bridge (CSA-Bridge), which mitigates the encoder–decoder semantic gap while amplifying discriminative feature representations. Extensive experiments demonstrate the state-of-the-art performance of DMA-Net, which achieves 87.31% mIoU on Potsdam, 83.23% on Vaihingen, and 54.23% on LoveDA, surpassing existing methods.
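
A CBAM-style channel-then-spatial attention block is one plausible reading of the CSA-Bridge; the sketch below is that generic pattern applied to an encoder skip connection, not the published module.

```python
import torch
import torch.nn as nn

class CSABridge(nn.Module):
    """Channel attention followed by spatial attention on an encoder feature."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(
            nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())
    def forward(self, x):
        x = x * self.channel(x)                                # channel gate
        s = torch.cat([x.mean(1, keepdim=True),
                       x.max(1, keepdim=True).values], dim=1)  # avg+max maps
        return x * self.spatial(s)                             # spatial gate
```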

20 pages, 2285 KiB  
Article
WormNet: A Multi-View Network for Silkworm Re-Identification
by Hongkang Shi, Minghui Zhu, Linbo Li, Yong Ma, Jianmei Wu, Jianfei Zhang and Junfeng Gao
Animals 2025, 15(14), 2011; https://doi.org/10.3390/ani15142011 - 8 Jul 2025
Abstract
Re-identification (ReID) has been widely applied in person and vehicle recognition tasks. This study extends its application to a novel domain: insect (silkworm) recognition. However, unlike person or vehicle ReID, silkworm ReID presents unique challenges, such as the high similarity between individuals, arbitrary poses, and significant background noise. To address these challenges, we propose a multi-view network for silkworm ReID, called WormNet, which is built upon an innovative strategy termed "extraction purification extraction interaction". Specifically, we introduce a multi-order feature extraction module that captures a wide range of fine-grained features by utilizing convolutional kernels of varying sizes and parallel cardinality, effectively mitigating issues of high individual similarity and diverse poses. Next, a feature mask module (FMM) is employed to purify the features in the spatial domain, thereby reducing the impact of background interference. To further enhance the data representation capabilities of the network, we propose a channel interaction module (CIM), which combines an efficient channel attention network with global response normalization (GRN) in parallel to recalibrate features, enabling the network to learn crucial information at both local and global scales. Additionally, we introduce a new silkworm ReID dataset for network training and evaluation. The experimental results demonstrate that WormNet achieves an mAP value of 54.8% and a rank-1 value of 91.4% on the dataset, surpassing both state-of-the-art and related networks. This study offers a valuable reference for ReID in insects and other organisms.
(This article belongs to the Section Animal System and Management)
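
Multi-order feature extraction with varying kernel sizes and parallel cardinality can be approximated by grouped convolutions of several kernel sizes fused by a 1x1 convolution, as in this hedged sketch; channel counts are assumed divisible by the group count and branch count.

```python
import torch
import torch.nn as nn

class MultiOrderExtraction(nn.Module):
    """Parallel grouped convolutions with different kernel sizes capture
    fine-grained cues at several receptive fields; branch outputs are
    concatenated and fused."""
    def __init__(self, c_in, c_out, kernels=(1, 3, 5, 7), groups=4):
        super().__init__()
        branch_c = c_out // len(kernels)   # assumes c_out divisible by branches
        self.branches = nn.ModuleList(
            nn.Conv2d(c_in, branch_c, k, padding=k // 2, groups=groups)
            for k in kernels)
        self.fuse = nn.Conv2d(branch_c * len(kernels), c_out, 1)
    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
```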

19 pages, 9844 KiB  
Article
DSMBAD: Dual-Stream Memory Bank Framework for Unified Industrial Anomaly Detection
by Hongmin Hu, Xiaodong Wang, Jiangtao Fan, Zhiqiang Zeng, Junwen Lu, Otis Hong and Jihuang Zhang
Electronics 2025, 14(14), 2748; https://doi.org/10.3390/electronics14142748 - 8 Jul 2025
Abstract
Industrial image anomaly detection requires the simultaneous identification of local structural and global logical anomalies. Existing methods specialize in single-type anomalies due to divergent feature requirements: structural anomalies demand fine-grained local features, while logical anomalies need semantic features. Consequently, designing a unified network architecture that effectively captures both features without task conflicts remains a key challenge. To address this problem, we propose a Dual-Stream Memory Bank Anomaly Detection (DSMBAD) framework, which enables the collaborative detection of both structural and logical anomalies from complementary perspectives. The framework consists of two memory banks: one stores patch features for detecting structural anomalies through local feature discrepancies, while the other uses segmentation maps to model component relationships for logical anomaly identification. Additionally, a feature distillation mechanism aligns features from different backbone networks to enhance global semantic information. We also introduce a shape-based anomaly scoring method that quantifies differences in component relationships using spatial–morphological features. Experimental results on the MVTec LOCO AD dataset show that our method achieves 91.0% I-AUROC (logical) and 90.8% (structural), significantly outperforming single-type models. Ablation studies confirm the dual-stream design and module effectiveness, offering a novel unified solution.
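
Patch-feature memory banks are usually scored by nearest-neighbor distance to stored normal features. The sketch below shows that generic (PatchCore-style) scoring for one of the two banks; it is an illustration of the technique, not the authors' code.

```python
import torch

def memory_bank_score(patch_feats, memory_bank):
    """Score each test patch by its distance to the nearest normal patch in
    the memory bank; the image-level anomaly score is the worst patch.

    patch_feats: (num_patches, dim) features of one test image
    memory_bank: (bank_size, dim)   features collected from normal images
    """
    dists = torch.cdist(patch_feats, memory_bank)   # (num_patches, bank_size)
    patch_scores = dists.min(dim=1).values          # nearest-neighbor distance
    return patch_scores.max().item()                # image-level anomaly score
```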