MDPI - Publisher of Open Access Journals

25 pages, 3612 KB

Open AccessArticle

Learning Modality Complementarity for RGB-D Salient Object Detection via Dynamic Neural Network

by Yuanhao Li, Jia Song, Chenglizhao Chen and Xinyu Liu

Electronics 2026, 15(7), 1361; https://doi.org/10.3390/electronics15071361 - 25 Mar 2026

RGB-D salient object detection (RGB-D SOD) aims to accurately localize and segment visually salient objects by jointly leveraging RGB images and depth maps. Some existing methods rely on static fusion strategies with fixed paths and weights, which treat all regions equally and fail [...] Read more.

RGB-D salient object detection (RGB-D SOD) aims to accurately localize and segment visually salient objects by jointly leveraging RGB images and depth maps. Some existing methods rely on static fusion strategies with fixed paths and weights, which treat all regions equally and fail to capture the varying importance of different regions and modalities. Although some attention-based methods alleviate the limitations of static fusion by assigning adaptive weights to different regions and modalities, the quality of RGB and depth data may degrade in real-world scenarios due to sensor noise, illumination changes, or environmental interference. These attention-based methods often overlook inter-modality quality differences and complementarity, making them prone to over-relying on a certain modality, which can lead to noise introduction, feature conflicts, and performance degradation. To address these limitations, this paper proposes a novel dynamic feature routing and fusion framework for RGB-D SOD, which adaptively adjusts the fusion strategy according to the quality of input modalities. To enable modality quality awareness, the proposed method characterizes the modality complementarity between RGB and depth features in a task-driven manner inspired by information-theoretic principles. We introduce a task-relevance scoring function which is integrated with a mutual information estimator to quantify such complementarity, and emphasizes task-relevant features while suppressing redundancy. A dynamic routing module is then designed to perform feature selection guided by the captured complementarity. In addition, we propose a novel cross-modal fusion module to adaptively fuse the features selected by the dynamic routing module, which effectively enhances complementary representations while suppressing redundant features and noise interference. Extensive experiments conducted on seven public RGB-D SOD benchmark datasets demonstrate that the proposed method consistently achieves competitive performance, outperforming existing methods by an average of approximately 1% across multiple evaluation metrics. Notably, in challenging scenarios with severe modality quality degradation, the proposed method outperforms existing best-performing methods by up to 1.8%, demonstrating strong robustness against cluttered backgrounds, complex object structures, and diverse object scales. Overall, the proposed dynamic fusion framework provides a novel solution to modality quality imbalance in RGB-D salient object detection. Full article

(This article belongs to the Section Artificial Intelligence)

► Show Figures

Figure 1

15 pages, 1686 KB

Open AccessArticle

A Data-Driven Approach for Comparing Gaze Allocation Across Conditions

by Jack Prosser, Anna Metzger and Matteo Toscani

J. Eye Mov. Res. 2026, 19(2), 33; https://doi.org/10.3390/jemr19020033 - 18 Mar 2026

Viewed by 178

Abstract

Gaze analysis often relies on hypothesised, subjectively defined regions of interest (ROIs) or heatmaps: ROIs enable condition comparisons but reduce objectivity and exploration; while heatmaps avoid this, they require many pixel-wise comparisons, making differences hard to detect. Here, we propose an advanced data-driven [...] Read more.

Gaze analysis often relies on hypothesised, subjectively defined regions of interest (ROIs) or heatmaps: ROIs enable condition comparisons but reduce objectivity and exploration; while heatmaps avoid this, they require many pixel-wise comparisons, making differences hard to detect. Here, we propose an advanced data-driven approach for analysing gaze behaviour. We use DNNs (adapted versions of AlexNet) to classify conditions from gaze patterns, paired with reverse correlation to show where and how gaze differs between conditions. We test our approach on data from an experiment investigating the effects of object-specific sounds (e.g., church bell ringing) on gaze allocation. ROI-based analysis shows a significant difference between conditions (congruent sound, no sound, phase-scrambled sound and pink noise), with more gaze allocation on sound-associated objects in the congruent sound condition. However, as expected, significance depends on the definition of the ROIs. Heatmaps show some unclear qualitative differences, but none are significant after correcting for pixelwise comparisons. We showed that, for some scenes, the DNNs could classify the task based on individual fixations with accuracy significantly higher than chance. Our approach shows that sound can alter gaze allocation, revealing task-specific, non-trivial strategies: fixations are not always drawn to the sound source but shift away from salient features, sometimes falling between salient features and the sound source. Crucially, such fixation strategies could not be revealed using a traditional hypothesis-driven approach. Overall, the method is objective, data-driven, and enables clear comparisons of conditions. Full article

► Show Figures

Figure 1

26 pages, 12325 KB

Open AccessArticle

Pairwise Comparison-Based Salient Object Ranking Using Multimodal Large Models

by Yifan Liu, Jia Song and Chenglizhao Chen

Sensors 2026, 26(6), 1913; https://doi.org/10.3390/s26061913 (registering DOI) - 18 Mar 2026

Viewed by 106

Abstract

Salient object ranking aims to assign a relative importance order to multiple objects in an image, aligning with human visual attention. However, existing methods struggle with ranking ambiguity in complex scenes, particularly when objects are numerous, occluded, or semantically similar, leading to decreased [...] Read more.

Salient object ranking aims to assign a relative importance order to multiple objects in an image, aligning with human visual attention. However, existing methods struggle with ranking ambiguity in complex scenes, particularly when objects are numerous, occluded, or semantically similar, leading to decreased accuracy for low-saliency objects. To address this, we propose PairwiseSOR-MLMs, a novel framework leveraging multimodal large models and pairwise comparison to achieve salient object ranking. The approach decomposes global ranking into a series of pairwise comparison tasks. It first employs object detection and instance segmentation to identify objects, uses image inpainting to reconstruct scenes by removing occlusions, and then prompts MLMs to perform pairwise comparisons based on visual saliency cues. Finally, another MLM inference aggregates these comparisons into a consistent global ranking. Experiments on ASSR and IRSR benchmarks show our method achieves state-of-the-art or competitive performance across metrics, demonstrating robustness in handling occlusion and semantic similarity. Its pairwise comparison paradigm can extend to other relative assessment tasks. Full article

(This article belongs to the Section Sensors and Robotics)

► Show Figures

Figure 1

23 pages, 13226 KB

Open AccessArticle

DDAF-Net: Decoupled and Differentiated Attention Fusion Network for Object Detection

by Bo Yu, Guanghui Zhang, Qun Wang and Lei Wang

Sensors 2026, 26(6), 1812; https://doi.org/10.3390/s26061812 - 13 Mar 2026

Viewed by 267

Abstract

The fusion of data from visible (RGB) and infrared (IR) sensors is essential for robust all-day and all-weather object detection. However, existing methods often suffer from modality redundancy and noise interference. To address these challenges, we propose the Decoupled and Differentiated Attention Fusion [...] Read more.

The fusion of data from visible (RGB) and infrared (IR) sensors is essential for robust all-day and all-weather object detection. However, existing methods often suffer from modality redundancy and noise interference. To address these challenges, we propose the Decoupled and Differentiated Attention Fusion Network (DDAF-Net). Architecturally, DDAF-Net employs a decoupled backbone with a Siamese weight-sharing strategy to extract modality-common features, while parallel branches capture modality-specific features. To effectively integrate these features, we design the Differentiated Attention Fusion Module (DAFM). First, we introduce Spatial Residual Unshuffle Embedding (SRUE) to achieve lossless downsampling while preserving global semantic information. Second, differentiated attention mechanisms are applied for feature enhancement: Dual-Norm Alignment Attention (DNAA) facilitates effective modal alignment and enhances semantic consistency in modality-common features, while Sparse Purification Attention (SPA) enables selective utilization of complementary information by suppressing noise and focusing on salient regions in modality-specific features. Finally, the Adaptive Complementary Fusion Module (ACFM) integrates these components by using modality-common features as a baseline and dynamically weighting the complementary modality-specific information. Extensive experiments on public datasets such as LLVIP and M³FD demonstrate that DDAF-Net achieves state-of-the-art performance. These results validate the effectiveness of our proposed decoupling–enhancement–fusion paradigm. Full article

(This article belongs to the Section Physical Sensors)

► Show Figures

Figure 1

22 pages, 20655 KB

Open AccessArticle

Center Prior Guided Multi-Feature Fusion for Salient Object Detection in Metallurgical Furnace Images

by Lin Pan, Haisheng Zhong, Zhikun Qi, Xiaofang Chen and Denghui Wu

Appl. Sci. 2026, 16(6), 2668; https://doi.org/10.3390/app16062668 - 11 Mar 2026

Viewed by 144

Abstract

This paper proposes a novel salient object detection method for operational hole localization in metallurgical furnaces, addressing challenging industrial conditions including extreme illumination variations and strong electromagnetic interference to enable two-level measurement in aluminum electrolysis cells and impact position recognition of the front-of-furnace [...] Read more.

This paper proposes a novel salient object detection method for operational hole localization in metallurgical furnaces, addressing challenging industrial conditions including extreme illumination variations and strong electromagnetic interference to enable two-level measurement in aluminum electrolysis cells and impact position recognition of the front-of-furnace operation robot. It employs a multi-feature fusion framework combining foreground and background saliency maps with center prior maps. Foreground saliency maps are generated through spatial compactness and local contrast computations, enhancing discriminative features while suppressing shared foreground–background characteristics. Background saliency maps are constructed via sparse reconstruction to exploit redundant features. Then method integrates edge extraction and density clustering to generate center prior maps that emphasize foreground target centroids and mitigate background noise. Comprehensive evaluations on both a specialized operational hole dataset and six public datasets demonstrate superior performance compared to other methods. On the specialized dataset, it achieves a precision of 0.8954, a maximum F-measure of 0.8994, and an S-measure of 0.8662. While maintaining operational robustness, the method offers a practical solution for furnace monitoring and robotic operation guidance in metallurgical processes. Full article

(This article belongs to the Special Issue AI Applications in Modern Industrial Systems)

► Show Figures

Figure 1

24 pages, 1110 KB

Open AccessArticle

Acceptability and Implementation Considerations for 40 Hz Auditory Stimulation Using Nature-Based Soundscapes for Cognitive Health Applications: A Qualitative Exploratory Study

by Kiechan Namkung and Kanghyun Lee

Healthcare 2026, 14(4), 512; https://doi.org/10.3390/healthcare14040512 - 17 Feb 2026

Viewed by 379

Abstract

Background/Objectives: 40 Hz sensory stimulation is being explored for cognitive health applications, but sustained use may be constrained by the listenability of simple 40 Hz auditory stimuli. We examined user-perceived acceptability and implementation considerations for 40 Hz auditory stimulation delivered by embedding a [...] Read more.

Background/Objectives: 40 Hz sensory stimulation is being explored for cognitive health applications, but sustained use may be constrained by the listenability of simple 40 Hz auditory stimuli. We examined user-perceived acceptability and implementation considerations for 40 Hz auditory stimulation delivered by embedding a pure 40 Hz sine wave within nature-based soundscapes. Methods: Eleven adults aged ≥ 40 years in Seoul, Republic of Korea were assigned to waves or forest soundscapes (between-participants) and completed a within-session exposure to two conditions within the assigned set: 40 Hz–OFF (soundscape-only) and 40 Hz–ON (soundscape plus an additively layered 40 Hz sine wave). Each condition comprised seven cycles of 50 s playback and 10 s silence (~7 min) with a 10 min washout. After completing both listening blocks, participants provided brief comparative session-end ratings to aid recall and then completed a semi-structured interview focused on detectability and comparative impressions while blinded to condition identity. Following debriefing about the 40 Hz manipulation, participants completed a session-end 7-point Likert appraisal of the intended intervention stimulus (40 Hz–ON). Interview transcripts were analyzed using thematic analysis and interpreted using the Theoretical Framework of Acceptability and Proctor et al.’s implementation outcomes as sensitizing frameworks. Results: Session-end appraisals suggested that the 40 Hz-integrated soundscape (40 Hz–ON) was generally listenable, with mid-to-high comfort and immersion (medians = 5) and low unpleasantness (median = 2), while perceived artificiality spanned the full scale (range 1–7) and overall preference was moderate (median = 4). Interviews indicated that acceptability was governed by perceptual integration: natural blending supported “backgroundable” listening, whereas salient low-frequency rumble or a mechanical/artificial timbre contributed to negative reactions. Implementation-relevant themes highlighted context fit (bedtime vs. morning routines), low-friction automation (timers/scheduling), and conservative acoustic safeguards (gentle onset and default levels). Conclusions: In a single-session evaluation among adults aged ≥ 40 years, embedding a 40 Hz sine wave within nature-based soundscapes was generally acceptable, with acceptability sensitive to perceptual integration and usage context. This qualitative study does not assess clinical or cognitive efficacy. These findings inform implementation considerations for cognitive health-oriented delivery, including space-oriented playback options, simplified automation, conservative acoustic safeguards, and coherence-supportive user guidance without overclaiming. Full article

(This article belongs to the Section Digital Health Technologies)

► Show Figures

Figure 1

27 pages, 1996 KB

Open AccessArticle

Salient Object Detection for Optical Remote Sensing Images Based on Gated Differential Unit

by Mingsi Sun, Ting Lan, Wei Wang and Pingping Liu

Remote Sens. 2026, 18(3), 389; https://doi.org/10.3390/rs18030389 - 23 Jan 2026

Viewed by 378

Abstract

Salient object detection in optical remote sensing images has attracted extensive research interest in recent years. However, CNN-based methods are generally limited by local receptive fields, while ViT-based methods suffer from common defects in noise suppression, channel selection, foreground-background distinction, and detail enhancement. [...] Read more.

Salient object detection in optical remote sensing images has attracted extensive research interest in recent years. However, CNN-based methods are generally limited by local receptive fields, while ViT-based methods suffer from common defects in noise suppression, channel selection, foreground-background distinction, and detail enhancement. To address these issues and integrate long-distance contextual dependencies, we introduce GDUFormer, an ORSI-SOD detection method based on the ViT backbone and Gated Differential Units (GDU). Specifically, the GDU consists of two key components—Full-Dimensional Gated Attention (FGA) and Hierarchical Differential Dynamic Convolution (HDDC). FGA consists of two branches aimed at filtering effective features from the information flow. The first branch focuses on aggregating spatial local information under multiple receptive fields and filters the local feature maps via a grouping mechanism. The second branch imitates the Vision Mamba to acquire high-level reasoning and abstraction capabilities, enabling weak channel filtering. HDDC primarily utilizes distance decay and hierarchical intensity difference capture mechanisms to generate dynamic kernel spatial weights, thereby facilitating the convolution kernel to fully mix long-range contextual dependencies. Among these, the intensity difference capture mechanism can adaptively divide hierarchies and allocate parameters according to kernel size, thus realizing varying levels of difference capture in the kernel space. Extensive quantitative and qualitative experiments demonstrate the effectiveness and rationality of GDUFormer and its internal components. Full article

► Show Figures

Figure 1

16 pages, 2780 KB

Open AccessArticle

Multi-Class Malocclusion Detection on Standardized Intraoral Photographs Using YOLOv11

by Ani Nebiaj, Markus Mühling, Bernd Freisleben and Babak Sayahpour

Dent. J. 2026, 14(1), 60; https://doi.org/10.3390/dj14010060 - 16 Jan 2026

Viewed by 482

Abstract

Background/Objectives: Accurate identification of dental malocclusions from routine clinical photographs can be time-consuming and subject to interobserver variability. A YOLOv11-based deep learning approach is presented and evaluated for automatic malocclusion detection on routine intraoral photographs, testing the hypothesis that training on a structured [...] Read more.

Background/Objectives: Accurate identification of dental malocclusions from routine clinical photographs can be time-consuming and subject to interobserver variability. A YOLOv11-based deep learning approach is presented and evaluated for automatic malocclusion detection on routine intraoral photographs, testing the hypothesis that training on a structured annotation protocol enables reliable detection of multiple clinically relevant malocclusions. Methods: An anonymized dataset of 5854 intraoral photographs (frontal occlusion; right/left buccal; maxillary/mandibular occlusal) was labeled according to standardized instructions derived from the Index of Orthodontic Treatment Need (IOTN) A total of 17 clinically relevant classes were annotated with bounding boxes. Due to an insufficient number of examples, two malocclusions (transposition and non-occlusion) were excluded from our quantitative analysis. A YOLOv11 model was trained with augmented data and evaluated on a held-out test set using mean average precision at IoU 0.5 (mAP50), macro precision (macro-P), and macro recall (macro-R). Results: Across 15 analyzed classes, the model achieved 87.8% mAP50, 76.9% macro-P, and 86.1% macro-R. The highest per-class AP₅₀ was observed for Deep bite (98.8%), Diastema (97.9%), Angle Class II canine (97.5%), Anterior open bite (92.8%), Midline shift (91.8%), Angle Class II molar (91.1%), Spacing (91%), and Crowding (90.1%). Moderate performance included Anterior crossbite (88.3%), Angle Class III molar (87.4%), Head bite (82.7%), and Posterior open bite (80.2%). Lower values were seen for Angle Class III canine (76%), Posterior crossbite (75.6%), and Big overjet (75.3%). Precision–recall trends indicate earlier precision drop-off for posterior/transverse classes and comparatively more missed detections in Posterior crossbite, whereas Big overjet exhibited more false positives at the chosen threshold. Conclusion: A YOLOv11-based deep learning system can accurately detect several clinically salient malocclusions on routine intraoral photographs, supporting efficient screening and standardized documentation. Performance gaps align with limited examples and visualization constraints in posterior regions. Larger, multi-center datasets, protocol standardization, quantitative metrics, and multimodal inputs may further improve robustness. Full article

(This article belongs to the Special Issue Artificial Intelligence in Oral Rehabilitation)

► Show Figures

Graphical abstract

18 pages, 1564 KB

Open AccessArticle

Salient Object Detection in Optical Remote Sensing Images Based on Hierarchical Semantic Interaction

by Jingfan Xu, Qi Zhang, Jinwen Xing, Mingquan Zhou and Guohua Geng

J. Imaging 2025, 11(12), 453; https://doi.org/10.3390/jimaging11120453 - 17 Dec 2025

Viewed by 566

Abstract

Existing salient object detection methods for optical remote sensing images still face certain limitations due to complex background variations, significant scale discrepancies among targets, severe background interference, and diverse topological structures. On the one hand, the feature transmission process often neglects the constraints [...] Read more.

Existing salient object detection methods for optical remote sensing images still face certain limitations due to complex background variations, significant scale discrepancies among targets, severe background interference, and diverse topological structures. On the one hand, the feature transmission process often neglects the constraints and complementary effects of high-level features on low-level features, leading to insufficient feature interaction and weakened model representation. On the other hand, decoder architectures generally rely on simple cascaded structures, which fail to adequately exploit and utilize contextual information. To address these challenges, this study proposes a Hierarchical Semantic Interaction Module to enhance salient object detection performance in optical remote sensing scenarios. The module introduces foreground content modeling and a hierarchical semantic interaction mechanism within a multi-scale feature space, reinforcing the synergy and complementarity among features at different levels. This effectively highlights multi-scale and multi-type salient regions in complex backgrounds. Extensive experiments on multiple optical remote sensing datasets demonstrate the effectiveness of the proposed method. Specifically, on the EORSSD dataset, our full model integrating both CA and PA modules improves the max F-measure from 0.8826 to 0.9100 (↑2.74%), increases maxE from 0.9603 to 0.9727 (↑1.24%), and enhances the S-measure from 0.9026 to 0.9295 (↑2.69%) compared with the baseline. These results clearly demonstrate the effectiveness of the proposed modules and verify the robustness and strong generalization capability of our method in complex remote sensing scenarios. Full article

(This article belongs to the Special Issue AI-Driven Remote Sensing Image Processing and Pattern Recognition)

► Show Figures

Figure 1

24 pages, 11779 KB

Open AccessArticle

Aircraft Trajectory Tracking via Geometric Prior-Guided Keypoint Detection in SMR

by Xiaoyan Wang, Jiangyan Ji, Mingmin Wu, Peng Li, Xiangli Wang, Zhaowen Tong and Zhixiang Huang

Symmetry 2025, 17(12), 2162; https://doi.org/10.3390/sym17122162 - 16 Dec 2025

Viewed by 444

Abstract

Detecting aircraft in Airport Surface Movement Radar (SMR) imagery presents a unique challenge rooted in the conflict between object symmetry and data asymmetry. While aircraft possess strong structural symmetry, their radar signatures are often sparse, incomplete, and highly asymmetric, leading to target loss [...] Read more.

Detecting aircraft in Airport Surface Movement Radar (SMR) imagery presents a unique challenge rooted in the conflict between object symmetry and data asymmetry. While aircraft possess strong structural symmetry, their radar signatures are often sparse, incomplete, and highly asymmetric, leading to target loss and position jitter in traditional detection algorithms. To overcome this, we introduce SWCR-YOLO, a keypoint detection framework designed to learn and enforce the target’s implicit structural symmetry from its imperfect radar representation. Our model reconstructs a stable aircraft pose by localizing four keypoints (nose, tail, wingtips) that define its symmetric axes. Based on YOLOv11n, SWCR-YOLO incorporates a MultiScaleStem module and wavelet transforms to effectively extract features from the sparse, asymmetric scatter points, while a Multi-Scale Convolutional Attention (MSCA) module refines salient information. Crucially, training is guided by a Geometric Regularized Keypoint Loss (GRKLoss), which introduces a symmetry-based prior by imposing angular constraints on the keypoints to ensure physically plausible pose estimations. Our symmetry-aware approach, on a real-world SMR dataset, achieves an mAP50 of 88.2% and reduces the trajectory root mean square error by 51.8% compared to MTD-CFAR pipeline methods, from 8.235 m to 3.968 m, demonstrating its effectiveness in handling asymmetric data for robust object tracking. Full article

(This article belongs to the Special Issue Symmetry/Asymmetry in Image Processing and Computer Vision)

► Show Figures

Figure 1

14 pages, 3389 KB

Open AccessArticle

A Cascaded Enhancement-Fusion Network for Visible-Infrared Imaging in Darkness

by Hanchang Huang, Hao Liu, Hailu Wang, Yunzhuo Yang, Chuan Guo, Minsun Chen and Kai Han

Photonics 2025, 12(12), 1231; https://doi.org/10.3390/photonics12121231 - 15 Dec 2025

Viewed by 400

Abstract

This paper presents a cascaded imaging method that combines low-light enhancement and visible–long-wavelength infrared (VIS-LWIR) image fusion to mitigate image degradation in dark environments. The framework incorporates a Low-Light Enhancer Network (LLENet) for improving visible image illumination and a heterogeneous information fusion subnetwork [...] Read more.

This paper presents a cascaded imaging method that combines low-light enhancement and visible–long-wavelength infrared (VIS-LWIR) image fusion to mitigate image degradation in dark environments. The framework incorporates a Low-Light Enhancer Network (LLENet) for improving visible image illumination and a heterogeneous information fusion subnetwork (IXNet) for integrating features from enhanced VIS and LWIR images. Using a joint training strategy with a customized loss function, the approach effectively preserves salient targets and texture details. Experimental results on the LLVIP, M³FD, TNO, and MSRS datasets demonstrate that the method produces high-quality fused images with superior performance evaluated by quantitative metrics. It also exhibits excellent generalization ability, maintains a compact model size with low computational complexity, and significantly enhances performance in high-level visual tasks like object detection, particularly in challenging low-light scenarios. Full article

(This article belongs to the Special Issue Technologies and Applications of Optical Imaging)

► Show Figures

Figure 1

21 pages, 18260 KB

Open AccessArticle

Salient Object Detection Guided Fish Phenotype Segmentation in High-Density Underwater Scenes via Multi-Task Learning

by Jiapeng Zhang, Cheng Qian, Jincheng Xu, Xueying Tu, Xuyang Jiang and Shijing Liu

Fishes 2025, 10(12), 627; https://doi.org/10.3390/fishes10120627 - 6 Dec 2025

Cited by 2 | Viewed by 417

Abstract

Phenotyping technologies are essential for modern aquaculture, particularly for precise analysis of individual morphological traits. This study focuses on critical phenotype segmentation tasks for fish carcass and fins, which have significant applications in phenotypic assessment and breeding. In high-density underwater environments, fish frequently [...] Read more.

Phenotyping technologies are essential for modern aquaculture, particularly for precise analysis of individual morphological traits. This study focuses on critical phenotype segmentation tasks for fish carcass and fins, which have significant applications in phenotypic assessment and breeding. In high-density underwater environments, fish frequently exhibit structural overlap and indistinct boundaries, making it difficult for conventional segmentation methods to obtain complete and accurate phenotypic regions. To address these challenges, a double-branch segmentation network is proposed for fish phenotype segmentation in high-density underwater scenes. An auxiliary saliency object detection (SOD) branch is introduced alongside the primary segmentation branch to localize structurally complete targets and suppress interference from overlapping or incomplete fish while inter-branch skip connections further enhance the model’s focus on salient targets and their boundaries. The network is trained under a multi-task learning framework, allowing the branches to specialize in edge detection and accurate region segmentation. Experiments on large yellow croaker (Larimichthys crocea) images collected under real farming conditions show that the proposed method achieves Dice scores of 97.58% for carcass segmentation and 88.88% for fin segmentation. The corresponding ASD values are 0.590 and 0.364 pixels, and the HD95 values are 3.521 and 1.222 pixels. The method outperforms nine existing algorithms across key metrics, confirming its effectiveness and reliability for practical aquaculture phenotyping. Full article

(This article belongs to the Special Issue Application of Artificial Intelligence in Aquaculture)

► Show Figures

Figure 1

25 pages, 3067 KB

Open AccessArticle

Lightweight Attention-Augmented YOLOv5s for Accurate and Real-Time Fall Detection in Elderly Care Environments

by Bibo Yang, Lan Thi Nguyen and Wirapong Chansanam

Sensors 2025, 25(23), 7365; https://doi.org/10.3390/s25237365 - 3 Dec 2025

Cited by 1 | Viewed by 850

Abstract

Falls among the elderly represent a leading cause of injury and mortality worldwide, necessitating reliable and real-time monitoring solutions. This study aims to develop a lightweight, accurate, and efficient fall detection framework based on an improved YOLOv5s model. The proposed architecture incorporates a [...] Read more.

Falls among the elderly represent a leading cause of injury and mortality worldwide, necessitating reliable and real-time monitoring solutions. This study aims to develop a lightweight, accurate, and efficient fall detection framework based on an improved YOLOv5s model. The proposed architecture incorporates a Convolutional Block Attention Module (CBAM) to enhance salient feature extraction, optimizes multi-scale feature fusion in the Neck for better small-object detection, and re-clusters anchor boxes tailored to the horizontal morphology of elderly falls. A multi-scene dataset comprising 11,314 images was constructed to evaluate performance under diverse lighting, occlusion, and spatial conditions. Experimental results demonstrate that the improved YOLOv5s achieves a mean average precision (mAP@0.5) of 94.2%, a recall of 92.5%, and a false alarm rate of 4.2%, outperforming baseline YOLOv5s and YOLOv4 models while maintaining real-time detection speed at 32 FPS. These findings confirm that integrating attention mechanisms, adaptive fusion, and anchor optimization significantly enhances robustness and generalization. Although performance slightly declines under extreme lighting or heavy occlusion, this limitation highlights future opportunities for multimodal fusion and illumination-invariant modeling. Overall, the study contributes a scalable and deployable AI framework that bridges the gap between algorithmic innovation and real-world elderly care applications, advancing intelligent and non-intrusive safety monitoring in aging societies. Full article

(This article belongs to the Section Physical Sensors)

► Show Figures

Figure 1

24 pages, 39644 KB

Open AccessArticle

Locate then Calibrate: A Synergistic Framework for Small Object Detection from Aerial Imagery to Ground-Level Views

by Kaiye Lin, Zhexiang Zhao and Na Niu

Remote Sens. 2025, 17(22), 3750; https://doi.org/10.3390/rs17223750 - 18 Nov 2025

Viewed by 699

Abstract

Detection of small objects in aerial images captured by Unmanned Aerial Vehicles (UAVs) is a critical task in remote sensing. It is vital for applications like urban monitoring and disaster assessment. This task, however, is challenged by unique viewpoints, diminutive target sizes, and [...] Read more.

Detection of small objects in aerial images captured by Unmanned Aerial Vehicles (UAVs) is a critical task in remote sensing. It is vital for applications like urban monitoring and disaster assessment. This task, however, is challenged by unique viewpoints, diminutive target sizes, and dense scenes. To surmount these challenges, this paper introduces the Locate then Calibrate (LTC) framework. It is a deep learning architecture designed to enhance visual perception systems, specifically for the accurate and robust detection of small objects. Our model builds upon the YOLOv8 architecture and incorporates three synergistic innovations. (1) An Efficient Multi-Scale Attention (EMA) mechanism is employed to ‘Locate’ salient targets by capturing critical cross-dimensional dependencies. (2) We propose a novel Adaptive Multi-Scale (AMS) convolution module to ‘Calibrate’ features, using dynamically learned weights to optimally fuse multi-scale information. (3) An additional high-resolution P2 detection head preserves the fine-grained details essential for localizing diminutive targets. Extensive experimental evaluations demonstrate that the proposed model substantially outperforms the YOLOv8n baseline. Notably, it achieves significant performance gains on the challenging VisDrone aerial dataset. On this dataset, the model achieves a remarkable 11.7% relative increase in mean Average Precision (mAP50). The framework also shows strong generalization. Considerable improvements are recorded on ground-level autonomous driving benchmarks such as KITTI and TT100K_mini. This validated effectiveness proves that LTC is a robust solution for high-accuracy detection: it achieves significant accuracy gains at the cost of a deliberate increase in computational GFLOPs, while maintaining a lightweight parameter count. This design choice positions LTC as a solution for edge applications where accuracy is prioritized over minimal computational cost. Full article

(This article belongs to the Section Remote Sensing Image Processing)

► Show Figures

Figure 1

22 pages, 6682 KB

Open AccessArticle

Multimodal Fire Salient Object Detection for Unregistered Data in Real-World Scenarios

by Ning Sun, Jianmeng Zhou, Kai Hu, Chen Wei, Zihao Wang and Lipeng Song

Fire 2025, 8(11), 415; https://doi.org/10.3390/fire8110415 - 26 Oct 2025

Viewed by 1708

Abstract

In real-world fire scenarios, complex lighting conditions and smoke interference significantly challenge the accuracy and robustness of traditional fire detection systems. Fusion of complementary modalities, such as visible light (RGB) and infrared (IR), is essential to enhance detection robustness. However, spatial shifts and [...] Read more.

In real-world fire scenarios, complex lighting conditions and smoke interference significantly challenge the accuracy and robustness of traditional fire detection systems. Fusion of complementary modalities, such as visible light (RGB) and infrared (IR), is essential to enhance detection robustness. However, spatial shifts and geometric distortions occur in multi-modal image pairs collected by multi-source sensors due to installation deviations and inconsistent intrinsic parameters. Existing multi-modal fire detection frameworks typically depend on pre-registered data, which struggles to handle modal misalignment in practical deployment. To overcome this limitation, we propose an end-to-end multi-modal Fire Salient Object Detection framework capable of dynamically fusing cross-modal features without pre-registration. Specifically, the Channel Cross-enhancement Module (CCM) facilitates semantic interaction across modalities in salient regions, suppressing noise from spatial misalignment. The Deformable Alignment Module (DAM) achieves adaptive correction of geometric deviations through cascaded deformation compensation and dynamic offset learning. For validation, we constructed an unregistered indoor fire dataset (Indoor-Fire) covering common fire scenarios. Generalizability was further evaluated on an outdoor dataset (RGB-T Wildfire). To fully validate the effectiveness of the method in complex building fire scenarios, we conducted experiments using the Fire in historic buildings (Fire in historic buildings) dataset. Experimental results demonstrate that the F1-score reaches 83% on both datasets, with the IoU maintained above 70%. Notably, while maintaining high accuracy, the number of parameters (91.91 M) is only 28.1% of the second-best SACNet (327 M). This method provides a robust solution for unaligned or weakly aligned modal fusion caused by sensor differences and is highly suitable for deployment in intelligent firefighting systems. Full article

► Show Figures

Figure 1

Search Results (218)

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

Saved Queries

Search Filter Reset All

Years

Feature Papers

Subjects

Journals

Article Types

Countries / Regions

Search Results (218)

Further Information

Guidelines

MDPI Initiatives

Follow MDPI