Search Results (134)

Search Parameters:
Keywords = hard attention module

29 pages, 12456 KB  
Article
A Lightweight Drainage Pipe Defect Detection Method Based on an Improved YOLO11 Network
by Rui Xue, Hongtao Fu, Hui Zhao and Chongquan Wang
Information 2026, 17(6), 613; https://doi.org/10.3390/info17060613 (registering DOI) - 21 Jun 2026
Viewed by 85
Abstract
Drainage pipe defect detection is essential for maintaining the normal operation of urban infrastructure. In recent years, deep learning-based object detection methods have provided an effective technical solution for drainage pipe defect recognition. Among them, YOLO-series models have demonstrated strong potential in visual detection tasks due to their end-to-end architecture and high inference efficiency. However, baseline YOLO models applied directly may still suffer from limited detection accuracy, relatively high model complexity, and insufficient adaptability to lightweight deployment scenarios. To address these issues, this paper proposes a lightweight drainage pipe defect detection method based on an improved YOLO11 network. Rather than treating detection enhancement and model compression as two separate procedures, the proposed method integrates feature enhancement, adaptive pruning, and distillation-based recovery into a unified lightweight detection framework. Specifically, an improved SimAM attention mechanism is introduced into the backbone and integrated with the C3k2 module to construct the C3K2_SWS module, which enhances the representation of critical defect features. In the neck network, a focused diffusion pyramid network with a dimension-aware selective fusion structure, termed FDPN-DASI, is designed to strengthen multi-scale feature interactions. In addition, an adaptive-threshold focal loss (ATFL) is introduced to improve learning on hard samples. For efficient deployment, the LAMP pruning algorithm is further improved into an entropy-adaptive variant (EA-LAMP) that adaptively allocates pruning ratios across network layers. Moreover, BCKD knowledge distillation is applied after pruning to mitigate the accuracy degradation caused by model compression.
Experimental results indicate that the proposed lightweight YOLO11-SFA+EA+BCKD framework achieves a precision of 92.4%, a recall of 88.5%, and an mAP50 of 93.3% with only 1.6 M parameters and 4.5 G FLOPs. Compared with the baseline model, the proposed method improves precision, recall, and mAP50 by 5.9%, 5.0%, and 4.7%, respectively, while reducing the number of parameters, FLOPs, and model size by 1.0 M, 1.8 G, and 2.1 M, respectively. These results suggest that, under the current experimental setting, the proposed framework can improve detection performance while reducing model complexity, indicating its potential for lightweight drainage pipe defect detection.
(This article belongs to the Section Artificial Intelligence)
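As background for the pruning step, the sketch below computes the standard layer-adaptive magnitude-based pruning (LAMP) score, in which each squared weight is divided by the sum of the squared weights in the same layer that are at least as large, and then prunes the globally lowest-scoring weights. It is a minimal plain-Python illustration under stated assumptions: layers are flat lists of floats, the function names are hypothetical, and the entropy-guided ratio allocation that distinguishes EA-LAMP is not described in the abstract and is not reproduced.

```python
def lamp_scores(weights):
    """Standard LAMP score for every weight in one layer (a flat list of floats).

    score(u) = w_u^2 / sum of w_v^2 over all weights v ranked at or above u
    when the layer is sorted by ascending magnitude.
    """
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    scores = [0.0] * len(weights)
    suffix = 0.0
    # Walk from the largest-magnitude weight down, accumulating the suffix sum.
    for idx in reversed(order):
        sq = weights[idx] ** 2
        suffix += sq
        scores[idx] = sq / suffix if suffix > 0 else 0.0
    return scores


def lamp_prune_masks(layers, sparsity):
    """Global pruning: zero out the fraction `sparsity` of weights with the lowest LAMP scores."""
    ranked = sorted(
        (score, li, wi)
        for li, layer in enumerate(layers)
        for wi, score in enumerate(lamp_scores(layer))
    )
    k = int(sparsity * len(ranked))
    masks = [[1] * len(layer) for layer in layers]
    for _, li, wi in ranked[:k]:
        masks[li][wi] = 0
    return masks


# Hypothetical toy network with one large-magnitude and one small-magnitude layer.
layers = [[3.0, -1.0, 2.0], [0.1, 0.2]]
print(lamp_prune_masks(layers, sparsity=0.4))  # [[1, 0, 1], [0, 1]]
```

The largest weight in every layer always scores 1, so even the small-magnitude second layer keeps its strongest weight; this is what lets LAMP produce different effective pruning ratios per layer from a single global threshold.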

23 pages, 13765 KB  
Article
GE-Detection: Efficient Attention and Dropout for Low-Light Object Detection
by Xiaochen Li and Hongtian Zhao
Sensors 2026, 26(12), 3909; https://doi.org/10.3390/s26123909 (registering DOI) - 19 Jun 2026
Viewed by 300
Abstract
Object detection in low-light scenes is difficult because weak illumination reduces local contrast, amplifies sensor noise, and makes small or occluded objects hard to localize. Existing enhancement-before-detection pipelines can improve visual brightness, but they are not always optimized for detection features, while transformer-style global reasoning is often too costly for lightweight detectors. To address this gap, we propose GE-Detection, a detector-side framework that integrates Global Sub-Sampled Attention (GSA), Efficient Multi-scale Attention (EMA), and dropout regularization into YOLO- and PicoDet-style architectures. GSA provides lower-cost global context modeling through spatially reduced key-value tokens, EMA refines multi-scale fused features without aggressive channel compression, and dropout adds training-time regularization with no inference-time parameter overhead. Experiments on COCO, ExDark, BDD100K-Night, and NightOwls show that the method is most effective in low-light detection: on ExDark with YOLO11n, mAP50-95 improves from 34.39% to 36.74%, mAP50 from 56.24% to 59.27%, and box precision (P) from 67.63% to 71.36%. The full YOLO11n variant uses 2.91 M parameters and runs at 134.7 FPS on an RTX 2080 Ti under the tested setting. Cross-dataset and corruption experiments further indicate that the proposed modules improve localization under several nighttime domain shifts, although limitations remain under severe noise and adverse weather. These results indicate that combining efficient global attention, multi-scale feature recalibration, and targeted regularization can improve low-light localization while keeping the detector practical for deployment.
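The cost saving from attending to spatially reduced key-value tokens can be illustrated with a small single-head sketch: each of the H×W query tokens attends only to the (H/r)×(W/r) tokens obtained by r×r average pooling, so the number of query-key dot products falls by a factor of r². This is a simplified illustration, not GE-Detection's GSA module: learned query/key/value projections, multiple heads, and the paper's reduction ratio are omitted, and average pooling as the sub-sampling operator is an assumption.

```python
import math


def avg_pool_tokens(grid, r):
    """Average-pool an H x W grid of d-dim tokens with non-overlapping r x r windows."""
    h, w = len(grid), len(grid[0])
    assert h % r == 0 and w % r == 0, "H and W must be divisible by r"
    d = len(grid[0][0])
    pooled = []
    for i in range(0, h, r):
        for j in range(0, w, r):
            acc = [0.0] * d
            for di in range(r):
                for dj in range(r):
                    for c, v in enumerate(grid[i + di][j + dj]):
                        acc[c] += v
            pooled.append([v / (r * r) for v in acc])
    return pooled


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def subsampled_attention(grid, r):
    """Single-head attention: all H*W tokens query the pooled key/value tokens.

    Identity projections are used for Q, K and V (a simplifying assumption).
    """
    queries = [tok for row in grid for tok in row]   # N = H*W queries
    kv = avg_pool_tokens(grid, r)                    # N / r^2 keys and values
    d = len(queries[0])
    out = []
    for q in queries:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in kv]
        weights = softmax(logits)
        out.append([sum(wt * v[c] for wt, v in zip(weights, kv)) for c in range(d)])
    return out


# 4 x 4 grid of 2-dim tokens; with r = 2 each query scores 4 keys instead of 16.
grid = [[[float(i), float(j)] for j in range(4)] for i in range(4)]
print(len(subsampled_attention(grid, r=2)), len(avg_pool_tokens(grid, r=2)))  # 16 4
```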

36 pages, 13556 KB  
Article
OAD-YOLOv8n: A Lightweight Direction-Adaptive Framework for Steel Strip Surface Defect Detection
by Yuji Liu and Piwei Chen
Metals 2026, 16(6), 666; https://doi.org/10.3390/met16060666 - 16 Jun 2026
Viewed by 260
Abstract
Steel strip surface defect detection remains challenging because defects are often elongated, weakly bounded, low-contrast, and sensitive to imaging degradation. To address these issues, this paper proposes Orthogonal Direction-Adaptive YOLOv8n (OAD-YOLOv8n), a lightweight detector based on You Only Look Once version 8 nano (YOLOv8n) and centered on Orthogonal Direction-Adaptive Efficient Multi-Scale Attention (OA-EMA), an orthogonal direction-adaptive attention module that combines debiased strip descriptors, adaptive direction selection, and local directional convolution. Dynamic upsampling by learning to sample (DySample), a lightweight neck structure (SlimNeck), and Adaptive Threshold Focal Loss (ATFL) are further integrated to improve detail-preserving upsampling, efficient multi-scale fusion, and hard-sample optimization. Across five independent runs on NEU-DET, OAD-YOLOv8n improves Precision, Recall, mAP50, and mAP50:95 by 5.0, 3.6, 4.4, and 3.7 percentage points over YOLOv8n, while reducing FLOPs and parameters by approximately 10.3% and 7.0%, respectively. Complementary experiments on GC10-DET, cross-dataset transfer/adaptation, simulated practical image perturbations, failure cases, and measured inference speed provide a broader characterization of the model’s benchmark-level generalization, robustness, and deployment-related behavior. These results indicate that OAD-YOLOv8n provides an effective accuracy–efficiency trade-off for lightweight steel strip surface defect detection.
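The idea of direction-aware strip descriptors can be illustrated with a generic coordinate-attention-style sketch for a single-channel feature map: the map is averaged along each row and each column, each 1-D strip is passed through a small local convolution, and the resulting sigmoid gates reweight the map. The debiasing, adaptive direction selection, and learned kernels of OA-EMA are not specified in the abstract, so the fixed smoothing kernel and the function names here are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv1d_same(seq, kernel):
    """1-D cross-correlation with zero padding; output length equals input length (odd kernel)."""
    pad = len(kernel) // 2
    out = []
    for i in range(len(seq)):
        acc = 0.0
        for j, wt in enumerate(kernel):
            idx = i + j - pad
            if 0 <= idx < len(seq):
                acc += wt * seq[idx]
        out.append(acc)
    return out


def strip_attention(x, kernel=(0.25, 0.5, 0.25)):
    """Reweight a single-channel H x W map with horizontal (row) and vertical (column) strip gates."""
    h, w = len(x), len(x[0])
    row_desc = [sum(row) / w for row in x]                              # H values
    col_desc = [sum(x[i][j] for i in range(h)) / h for j in range(w)]  # W values
    g_rows = [sigmoid(v) for v in conv1d_same(row_desc, kernel)]
    g_cols = [sigmoid(v) for v in conv1d_same(col_desc, kernel)]
    return [[x[i][j] * g_rows[i] * g_cols[j] for j in range(w)] for i in range(h)]


# Hypothetical 5 x 5 map containing one elongated horizontal "scratch" in row 2.
fmap = [[1.0 if i == 2 else 0.0 for _ in range(5)] for i in range(5)]
out = strip_attention(fmap)
print(round(out[2][2], 4))
```

Because each gate depends on a whole row or column, an elongated defect raises the descriptor of the strip it lies in, which is the intuition behind using strip (rather than square) pooling for long, thin targets.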

29 pages, 31575 KB  
Article
DCA-DeepLab: Dual-Coordinate Attention DeepLab with Adaptive Focal Loss for Cotton Growth Semantic Segmentation from UAV Remote Sensing Images
by Liruizhi Jia, Jiazhan Gao, Zuolong Li, Heng Shi and Jihong Zhu
Drones 2026, 10(6), 456; https://doi.org/10.3390/drones10060456 - 11 Jun 2026
Viewed by 306
Abstract
UAV remote sensing provides centimetre-level imagery for fine-grained cotton growth monitoring, yet existing segmentation models face three challenges: cotton fields exhibit a pronounced row and column structure that standard convolutions struggle to capture; conventional decoders fuse features statically, suppressing fine boundary cues; and the pixel-level class distribution is severely imbalanced. We present DCA-DeepLab, built on DeepLabv3+ with three task-specific components: a Dual-Coordinate Attention Gating (DCAG) module that decouples horizontal and vertical dependencies to encode row and column structures; a Multi-Scale Attention-Guided Modulated Feature Merging (MSAM-MFM) module that reweights semantic and detail features at each location; and an adaptive pixel-level modulated focal loss (APMFL), which focuses training on hard, minority-class pixels. We construct a cotton growth dataset of 11,745 UAV patches with four semantic classes. On this dataset and the public LoveDA benchmark, DCA-DeepLab attained the highest mIoU among the compared methods (51.74% and 51.71%), exceeding the strongest cotton baseline by 1.10 percentage points. Relative to DeepLabv3+, the Vigorous and Sparse minority-class IoUs improved by 3.51 and 1.91 percentage points, respectively, and Vigorous recall rose from 51.85% to 60.04%, with only 3.9% more parameters. These results show that encoding directional structure and adaptively balancing class contributions benefit fine-grained UAV crop segmentation.
(This article belongs to the Section Drones in Agriculture and Forestry)
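APMFL is described as a modulated focal loss; as background, the sketch below implements the standard class-weighted multi-class focal loss, FL = −α_y (1 − p_y)^γ log p_y, averaged over pixels, which down-weights pixels the model already classifies confidently. The adaptive pixel-level modulation of APMFL is not detailed in the abstract, so the fixed α and γ values and the function names are assumptions.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(pixel_logits, labels, alpha, gamma=2.0):
    """Mean class-weighted focal loss over pixels.

    pixel_logits: one list of class logits per pixel; labels: class index per pixel;
    alpha: per-class weight (larger values for minority classes).
    """
    total = 0.0
    for logits, y in zip(pixel_logits, labels):
        p_t = softmax(logits)[y]
        total += -alpha[y] * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(labels)


# Hypothetical 4-class example: a confidently correct pixel and an uncertain minority-class pixel.
logits = [[4.0, 0.0, 0.0, 0.0], [0.5, 0.0, 0.4, 0.0]]
labels = [0, 2]
alpha = [0.5, 1.0, 2.0, 1.0]  # assumed weights; class 2 treated as a minority class
print(focal_loss(logits, labels, alpha))
```

With γ = 0 and all α equal to 1 the expression reduces to ordinary cross-entropy; raising γ shrinks the contribution of easy pixels, leaving the loss dominated by hard and minority-class pixels.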

23 pages, 20700 KB  
Article
Edge-Deployable RGB–Thermal UAV Monitoring for Wildfires in Power Transmission Corridors
by Biao Wang, Daochun Huang, Yifeng Lin, Xu He, Zhengxian Guo and Bo Hong
Remote Sens. 2026, 18(12), 1869; https://doi.org/10.3390/rs18121869 - 6 Jun 2026
Viewed by 377
Abstract
Early wildfire monitoring in power transmission corridors requires reliable detection of weak fire and smoke cues under complex field conditions and strict edge-computing constraints. To meet these requirements, this paper proposes an edge-deployable RGB–thermal framework based on visible and thermal infrared (TIR) imaging for unmanned aerial vehicle (UAV)-based corridor monitoring, including a spatial detector, YOLO-MMSC, and a temporal-enhanced version, YOLO-MMSC-T. The study also establishes a self-collected corridor-oriented RGB–thermal (RGB–T) dataset to complement public wildfire data. Unlike existing RGB–thermal wildfire datasets that mainly focus on forest or wildland fire scenes, the proposed dataset is specifically organized for complex-background power transmission-corridor monitoring, including continuous UAV sequences, nighttime conditions, smoke/vegetation occlusion, long-range small targets, and hard-negative interference. To the best of our knowledge, this is the first self-collected RGB–thermal wildfire dataset designed for this specific application scenario. The framework integrates a mobile inverted bottleneck convolution (MBConv) lightweight backbone, a Shallow Detail Fusion Module (SDFM) for shallow cross-modal alignment and denoising, a Content-Guided Attention (CGA) module for adaptive fusion, and normalized Wasserstein distance (NWD)-based box regression for long-range small-target localization. Experiments on public and self-collected datasets show that YOLO-MMSC achieves 94.6% mAP@0.5, 95.0% precision, and 93.9% recall while running at 60 FPS on a Jetson Orin NX. With temporal fine-tuning, YOLO-MMSC-T reaches a continuous detection rate (CDR) of 95.6% with a jitter index of 2.8 × 10³. Field experiments using a DJI Matrice 4T further indicate a practical operating altitude of 120–180 m. These results support lightweight RGB–thermal remote sensing for real-time wildfire monitoring in complex transmission-corridor environments.
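The NWD term models each box (cx, cy, w, h) as a 2-D Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4); the squared 2-Wasserstein distance between two such Gaussians reduces to the squared Euclidean distance between the vectors (cx, cy, w/2, h/2), and NWD = exp(−W₂/C). The sketch contrasts it with IoU for two small, non-overlapping boxes: IoU is zero and carries no signal about how far apart the boxes are, whereas NWD still varies smoothly with distance. C is a dataset-dependent constant; the value used here is a hypothetical choice, and how YOLO-MMSC combines NWD with other regression terms is not given in the abstract.

```python
import math


def nwd(box_a, box_b, c):
    """Normalized Wasserstein distance between two (cx, cy, w, h) boxes; 1.0 means identical."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    w2_sq = (ax - bx) ** 2 + (ay - by) ** 2 + ((aw - bw) / 2) ** 2 + ((ah - bh) / 2) ** 2
    return math.exp(-math.sqrt(w2_sq) / c)


def iou(box_a, box_b):
    """Standard IoU for (cx, cy, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw / 2, bx + bw / 2) - max(ax - aw / 2, bx - bw / 2))
    iy = max(0.0, min(ay + ah / 2, by + bh / 2) - max(ay - ah / 2, by - bh / 2))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# Two hypothetical 4 x 4 px boxes (a distant fire spot) whose centres are 6 px apart.
pred, gt = (10.0, 10.0, 4.0, 4.0), (16.0, 10.0, 4.0, 4.0)
C = 12.8  # assumed dataset-dependent constant
print(iou(pred, gt), round(nwd(pred, gt, C), 3))  # 0.0 0.626
```

A regression loss built on this metric would typically use 1 − NWD, which stays informative for tiny, long-range targets where a one-pixel shift can drive IoU to zero.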

23 pages, 2468 KB  
Article
Research on Robot Terrain Perception Based on Attention Mechanism and Confusion Enhancement
by Xingyu Liu, Nian Wang, Meng Hong, Chao Huang, Yushuang Xiao, Sijia Liu, Zheng Xiao, Zhongren Wang, Sijia Guan and Min Guo
Electronics 2026, 15(11), 2440; https://doi.org/10.3390/electronics15112440 - 3 Jun 2026
Viewed by 221
Abstract
Robotic visual perception and terrain recognition are critical for autonomous locomotion and adaptive control in complex environments. However, existing models often extract weak features, confuse similar classes, and produce unstable recognition. Most prior studies use end-to-end convolutional networks or single-stream feature extraction, which makes it difficult to balance fine-grained visual representation with adaptive discrimination of confusing samples. To solve this problem, this paper proposes a vision model that combines attention mechanisms with a confusion augmentation strategy. Using an improved ResNet50 backbone, we add a local feature sharpening module and a channel–spatial attention module to strengthen edge texture and global context representation. We also design a confusion augmentation strategy based on the similarity of hard samples, which generates mixed samples through cross-perturbation in feature space and thereby improves the discrimination of highly similar terrains. Experiments show that our model achieves an accuracy above 98.19% on various terrains, including cement, asphalt, sand, and snow. t-SNE visualization and Grad-CAM analysis demonstrate clear class separability and good interpretability, confirming the effectiveness and robustness of the approach for robotic terrain recognition.
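The confusion augmentation step (pairing hard, highly similar samples and mixing them in feature space) can be sketched as follows, assuming cosine similarity for selecting the most confusable sample of a different class and a fixed, mixup-style mixing coefficient λ with soft targets. The abstract does not specify the similarity measure, the perturbation, or the value of λ, so all of these, and the function names, are hypothetical.

```python
import math


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def confusion_mix(features, labels, lam=0.7):
    """Mix each feature vector with the most similar feature vector of a different class.

    Returns (mixed_feature, label_a, label_b, lam); the soft target assigns weight lam
    to label_a and 1 - lam to label_b.
    """
    mixed = []
    for f, y in zip(features, labels):
        candidates = [j for j, yj in enumerate(labels) if yj != y]
        if not candidates:
            continue
        j = max(candidates, key=lambda k: cosine(f, features[k]))
        feat = [lam * a + (1.0 - lam) * b for a, b in zip(f, features[j])]
        mixed.append((feat, y, labels[j], lam))
    return mixed


# Hypothetical 2-D embeddings: "asphalt" (0) and "cement" (1) are close, "snow" (2) is far.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
terrain_labels = [0, 1, 2]
for feat, ya, yb, lam in confusion_mix(feats, terrain_labels):
    print(ya, yb, [round(v, 2) for v in feat])
```

Choosing the partner by similarity, rather than at random as in plain mixup, concentrates the synthetic samples near the decision boundary between the terrains the model is most likely to confuse.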

25 pages, 22795 KB  
Article
MSDR-Net: Multiscale Dynamic Reasoning for Multi-Label Remote Sensing Image Classification
by Qinghe Sun, Hua Wang, Shuai Wang, Teng Yang, Hui Zhao and Xuewu Fan
Remote Sens. 2026, 18(11), 1798; https://doi.org/10.3390/rs18111798 - 1 Jun 2026
Viewed by 400
Abstract
With the rapid advancement of Earth observation technologies and the growing demand for intelligent remote sensing applications, high-resolution remote sensing imagery provides critical data support for a range of downstream applications, including land monitoring and disaster assessment. In this context, multi-label remote sensing image classification has become an important research task, because a single image may contain multiple ground-object categories with complex spatial distributions and semantic co-occurrence relationships. However, challenges such as the coexistence of multiscale objects, complex semantic dependencies, and long-tail category distributions impose significant limitations on existing methods in terms of feature representation capacity and class-balanced modeling. To address these challenges, a Multiscale Dynamic Reasoning Network (MSDR-Net) is proposed. Different from methods that focus on localized optimization for a single challenge, MSDR-Net establishes a task-driven modeling framework that jointly integrates multiscale feature extraction, label-aware semantic reasoning, and long-tail category optimization within an end-to-end architecture. The proposed network consists of three core modules. The Multiscale Feature Enhancement (MSFE) module incorporates a Feature Pyramid Network-based fusion mechanism, integrating deep semantic information with shallow, detailed features to effectively enhance the representation of multiscale objects. The Dynamic Semantic Reasoning (DSR) module introduces a Transformer-based global attention mechanism that models long-range dependencies among image features, enabling the capture of complex global semantic relationships. 
In the loss optimization stage, a Difficulty-Weighted Loss (DW-Loss) is introduced, which jointly incorporates category frequency weights and prior difficulty coefficients to dynamically regulate the contributions of rare classes and hard samples during training, thereby mitigating bias induced by class imbalance. Experiments conducted on the large-scale Detection in Optical Remote Sensing Images dataset demonstrate that the proposed method achieves superior performance. Ablation studies validate the effectiveness of each component, while comparative experiments indicate that MSDR-Net achieves a mean Average Precision of 95.88%, outperforming existing state-of-the-art methods. An improvement of approximately 1.74% is observed over the strongest baseline, MSCA, with consistent advantages demonstrated across Overall F1 and Class-wise F1 metrics. By unifying multiscale feature extraction, global semantic reasoning, and balanced loss optimization within a single framework, MSDR-Net provides a robust and efficient solution for multi-label classification in complex remote sensing scenarios.
(This article belongs to the Special Issue Advanced AI Technology for Remote Sensing Analysis (Second Edition))
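The abstract states that DW-Loss combines category-frequency weights with prior difficulty coefficients but does not give its formula. One plausible reading, shown here purely as an assumption, is a multi-label binary cross-entropy in which each class term is scaled by an inverse-frequency weight (normalized to mean 1) and by a fixed per-class difficulty coefficient; the function names and coefficient values are hypothetical.

```python
import math


def frequency_weights(label_matrix):
    """Inverse-frequency weight per class from a multi-hot label matrix, normalized to mean 1."""
    n_classes = len(label_matrix[0])
    counts = [sum(row[c] for row in label_matrix) for c in range(n_classes)]
    raw = [1.0 / max(cnt, 1) for cnt in counts]
    mean = sum(raw) / n_classes
    return [r / mean for r in raw]


def weighted_bce(probs, targets, class_w, difficulty, eps=1e-7):
    """Multi-label BCE where each class term is scaled by class_w[c] * difficulty[c]."""
    total = 0.0
    for p_row, t_row in zip(probs, targets):
        for c, (p, t) in enumerate(zip(p_row, t_row)):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            bce = -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
            total += class_w[c] * difficulty[c] * bce
    return total / (len(probs) * len(probs[0]))


# Hypothetical labels for 4 images and 2 classes (class 1 is rare).
labels = [[1, 0], [1, 0], [1, 1], [1, 0]]
w = frequency_weights(labels)          # [0.4, 1.6]
difficulty = [1.0, 1.5]                # assumed prior difficulty coefficients
preds = [[0.9, 0.2], [0.8, 0.1], [0.7, 0.4], [0.95, 0.05]]
print(w, weighted_bce(preds, labels, w, difficulty))
```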

26 pages, 3619 KB  
Article
Rapid Detection of Mixed Gases from Lithium Battery Thermal Runaway Based on ISA-LSTM-TCN
by Ruqi Guo, Qian Yu, Hao Li, Zilong Pu and Mingzhi Jiao
Batteries 2026, 12(6), 188; https://doi.org/10.3390/batteries12060188 - 23 May 2026
Viewed by 315
Abstract
As new energy vehicles and energy storage systems become more common, safety accidents caused by lithium-ion battery overheating have become a growing concern. Early detection based on characteristic gases (such as H₂ and CO) can give earlier warning than conventional monitoring of temperature, voltage, or impedance. Nonetheless, high-precision identification in complex mixed-gas environments remains difficult because of the considerable cross-sensitivity of metal oxide semiconductor (MOS) gas sensors. This research presents ISA-LSTM-TCN, a multi-task learning model with an improved spatial attention mechanism, for rapid identification and concentration prediction of characteristic gases during lithium-ion battery thermal runaway. The model combines the long-term temporal modeling ability of the Long Short-Term Memory (LSTM) network with the multi-scale feature extraction ability of the Temporal Convolutional Network (TCN), and adds an Improved Spatial Attention (ISA) module with a residual multiplication structure to strengthen key feature extraction and noise robustness. Within a multi-task learning framework, gas classification and concentration regression are jointly optimized through hard parameter sharing. Tests on a self-built MOS sensor array dataset show that the model classifies gases with 99.23% accuracy and predicts H₂ and CO concentrations with R² values of 0.9510 and 0.8400, respectively. Further tests on public datasets and under different noise conditions confirm the model's generalization ability and robustness. These results show that the proposed method enables rapid, accurate detection of thermal runaway gases, providing an effective, intelligent approach for battery safety early-warning systems.
(This article belongs to the Special Issue Advances in Lithium-Ion Battery Safety and Fire: 2nd Edition)
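Hard parameter sharing means both tasks read from a single shared feature extractor and differ only in their output heads, with the two losses summed into one objective. The sketch below shows that structure with a single ReLU layer standing in for the LSTM, TCN, and ISA trunk (an assumption), a softmax cross-entropy for gas classification, and a mean-squared error for concentration regression; the weighting factor `lam`, the weights, and the function names are hypothetical.

```python
import math


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def multitask_forward(x, shared_w, cls_w, reg_w):
    """Shared trunk feeding two task-specific heads (hard parameter sharing)."""
    h = relu(matvec(shared_w, x))   # shared representation used by both tasks
    logits = matvec(cls_w, h)       # classification head: one logit per gas class
    conc = matvec(reg_w, h)         # regression head: one concentration per target gas
    return logits, conc


def joint_loss(logits, label, conc_pred, conc_true, lam=1.0):
    """Cross-entropy (classification) + lam * mean-squared error (regression)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    ce = log_sum_exp - logits[label]
    mse = sum((p - t) ** 2 for p, t in zip(conc_pred, conc_true)) / len(conc_true)
    return ce + lam * mse


# Hypothetical 2-sensor input, 2 gas classes, 2 concentration outputs (e.g. H2 and CO).
x = [1.0, -1.0]
shared = [[1.0, 0.0], [0.0, 1.0]]
logits, conc = multitask_forward(x, shared, [[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 3.0]])
print(logits, conc, joint_loss(logits, 0, conc, [2.0, 0.0]))
```

Because gradients from both loss terms flow into `shared_w`, the trunk is pushed toward features that serve classification and concentration regression at once, which is the usual motivation for hard sharing over separate models.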

21 pages, 1929 KB  
Article
Physics-Informed Modified Kolmogorov–Arnold Network for CO Concentration Prediction in Gob Areas of Coal Spontaneous Combustion
by Zhuoqing Li, Jie Hou, Longqiang Han and Xiaodong Wang
Sensors 2026, 26(11), 3292; https://doi.org/10.3390/s26113292 - 22 May 2026
Viewed by 248
Abstract
Coal spontaneous combustion in gob areas is a major hazard to safe production in underground coal mines, and accurate prediction of carbon monoxide (CO), the core signature gas of coal oxidation, is critical for early warning and targeted prevention of mine fires. However, CO concentration in gob areas is governed by complex gas–solid thermal–chemical multi-field coupling and is therefore strongly nonlinear. Traditional numerical methods suffer from prohibitive computational cost, purely data-driven models are inherently black boxes, and conventional Physics-Informed Neural Networks (PINNs) require explicit, complete governing equations, which are hard to establish for such complex systems. This paper proposes a Physics-Informed Modified Kolmogorov–Arnold Network (PIM-KAN), which integrates domain physics knowledge into the KAN architecture via a physics encoding layer, a residual-modified KAN layer, a multi-physics attention mechanism, and a multi-term physical consistency constraint framework. Experiments on 3125 real coal mine field samples show that PIM-KAN achieves R² = 0.9965 and RMSE = 0.9290 ppm, reducing RMSE by 19.5% compared with an MLP and outperforming all baseline models. Ablation studies confirm the significant contribution of each proposed module, and the learned attention weights are highly consistent with Arrhenius reaction kinetics, supporting the model's prediction accuracy, physical consistency, and intrinsic interpretability.
(This article belongs to the Special Issue Smart Sensors for Real-Time Mining Hazard Detection)
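In a Kolmogorov–Arnold Network, each edge from input i to output j carries its own learnable univariate function φ_ji, and each output is simply the sum Σ_i φ_ji(x_i). The original KAN formulation parameterizes φ with B-splines plus a SiLU base term; to stay short, the sketch below swaps in Gaussian radial basis functions on a fixed grid. It is a generic KAN-style layer under those assumptions, not PIM-KAN: the physics encoding layer, residual modification, multi-physics attention, and physical-consistency constraints are not reproduced, and the class and parameter names are hypothetical.

```python
import math


class RBFKANLayer:
    """KAN-style layer: output j = sum_i phi_ji(x_i), with
    phi_ji(x) = base[j][i] * silu(x) + sum_k coeffs[j][i][k] * exp(-((x - grid[k]) / width) ** 2).
    """

    def __init__(self, in_dim, out_dim, grid=(-1.0, 0.0, 1.0), width=1.0):
        self.grid = list(grid)
        self.width = width
        # Learnable parameters, initialized to zero here for clarity.
        self.coeffs = [[[0.0] * len(self.grid) for _ in range(in_dim)] for _ in range(out_dim)]
        self.base = [[0.0] * in_dim for _ in range(out_dim)]

    def edge(self, j, i, x):
        """Evaluate the univariate function on the edge from input i to output j."""
        silu = x / (1.0 + math.exp(-x))
        rbf = sum(c * math.exp(-((x - g) / self.width) ** 2)
                  for c, g in zip(self.coeffs[j][i], self.grid))
        return self.base[j][i] * silu + rbf

    def __call__(self, xs):
        return [sum(self.edge(j, i, x) for i, x in enumerate(xs))
                for j in range(len(self.coeffs))]


layer = RBFKANLayer(in_dim=2, out_dim=1)
layer.coeffs[0][0][1] = 1.0   # bump centred at 0 on the edge from input 0
layer.base[0][1] = 1.0        # pure SiLU response on the edge from input 1
print(layer([0.0, 2.0]))
```

Since every edge function is a one-dimensional curve, it can be plotted and inspected directly, which is the property usually cited when KANs are described as more interpretable than MLPs.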

18 pages, 5622 KB  
Article
MscaVPR: Multi-Scale Coordinate Attention Network for Robust Visual Place Recognition
by Xiaohan Gao, Zhinong Zhong, Yongjian Tan, Ning Jing, Anran Yang and Qingren Jia
Sensors 2026, 26(10), 3261; https://doi.org/10.3390/s26103261 - 21 May 2026
Viewed by 610
Abstract
Visual place recognition (VPR) aims to localize a query image by matching its visual representation against a geotagged database. A major challenge in VPR is learning place representations that remain robust under appearance changes, viewpoint variations, and perceptual aliasing. However, existing VPR methods remain limited in adaptive multi-scale feature fusion and viewpoint-aware training supervision, which may hinder robustness under severe viewpoint changes. In this paper, we propose MscaVPR, a VPR framework that combines multi-scale feature enhancement with azimuth-aware training. Specifically, a Multi-Scale Spatial Pyramid Attention (MSPA) module aggregates regional features across different spatial scales, and Coordinate Attention (CA) encodes positional cues for spatially refined feature learning. To further enhance viewpoint robustness, we design an azimuth-guided training strategy that selects hard positive samples with significant viewpoint discrepancies and optimizes them using an azimuth-aware auxiliary loss function. Experimental results on multiple benchmark datasets indicate that MscaVPR generally outperforms the strong baseline and performs better under challenging conditions. In particular, Recall@1 improves by 2.1%, 1.9%, and 1.9% on the AmsterTime, SVOX-Night, and SVOX-Sun datasets, respectively. These results indicate that explicitly incorporating azimuth cues is an effective complement to existing multi-scale and attention-based VPR methods.
(This article belongs to the Section Navigation and Positioning)
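The azimuth-guided selection step can be sketched as picking, for each query, the positives (database images of the same place) whose camera heading differs most from the query's, using the circular difference between compass headings. The minimum-difference threshold, the number of positives kept, and the function names are assumptions; the azimuth-aware auxiliary loss itself is not specified in the abstract and is not shown.

```python
def azimuth_diff(a, b):
    """Smallest absolute difference between two compass headings in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return 360.0 - d if d > 180.0 else d


def select_hard_positives(query_az, positives, k=1, min_diff=45.0):
    """Return up to k positive IDs whose heading differs most from the query heading.

    positives: list of (image_id, azimuth_degrees) pairs for the same place.
    """
    scored = [(azimuth_diff(query_az, az), pid) for pid, az in positives]
    scored = [item for item in scored if item[0] >= min_diff]
    scored.sort(reverse=True)
    return [pid for _, pid in scored[:k]]


# Hypothetical positives for a query taken facing 10 degrees.
positives = [("a", 20.0), ("b", 200.0), ("c", 100.0)]
print(select_hard_positives(10.0, positives, k=2))  # ['b', 'c']
```

The circular difference matters: headings of 350° and 10° are only 20° apart, which a plain subtraction would report as 340°.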

34 pages, 3730 KB  
Article
Bidirectional Perceptual Multimodal Interaction Network Based on Contrastive Learning for Breast Cancer pCR Prediction
by Jingjing Feng, Zongli Jiang and Jinli Zhang
Tomography 2026, 12(5), 74; https://doi.org/10.3390/tomography12050074 - 19 May 2026
Viewed by 323
Abstract
Background/Objectives: Early and accurate prediction of pathological complete response (pCR) after neoadjuvant chemotherapy (NAC) is vital for personalized breast cancer treatment. However, existing deep learning methods are hampered by tumor heterogeneity and semantic misalignment between high-dimensional dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) and low-dimensional clinical data, which limits pCR prediction performance and generalization. This study addresses these challenges with a novel multimodal network. Methods: We propose a Bidirectional Perceptual Multimodal Interaction Network (BPMINet) based on contrastive learning. BPMINet integrates pre-NAC DCE-MRI and clinical information through three core components: (1) a bidirectional cross-modal attention (BiCMA) fusion mechanism that resolves semantic misalignment and facilitates effective multimodal feature fusion; (2) a multimodal contrast-aware feature enhancement (MCFE) module, tightly integrated into the pCR-oriented contrastive learning framework, that boosts discriminative power for pCR prediction and improves generalization on hard-to-classify samples; and (3) a dual-loss strategy that jointly optimizes discriminative feature representation and pCR prediction. Results: On two publicly available multicenter datasets, BPMINet outperformed all compared methods across seven evaluation metrics; specifically, it surpassed the top-performing baseline by 5.17% in AUC and 5.24% in accuracy on the MAMA-MIA dataset. More notably, it achieved substantially larger gains of 11.72% in AUC and 7.38% in accuracy on the ISPY1 dataset. Conclusions: BPMINet achieves the best pCR prediction performance among the compared methods, supporting its effectiveness and strong generalization ability for multimodal breast cancer pCR prediction.
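The contrastive component can be illustrated with a symmetric InfoNCE loss between paired embeddings, in which the DCE-MRI and clinical embeddings of the same patient form the positive pair and all other pairings in the batch act as negatives. This is a generic sketch under assumptions: the abstract does not say whether BPMINet's pCR-oriented objective uses pCR labels to define positives (as supervised contrastive learning would), and the temperature, embeddings, and function names here are hypothetical.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _mean_row_cross_entropy(sim):
    """Cross-entropy of each row of a similarity matrix against its diagonal entry."""
    loss = 0.0
    for i, row in enumerate(sim):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in row))
        loss += log_sum_exp - row[i]
    return loss / len(sim)


def info_nce(img_emb, clin_emb, temperature=0.1):
    """Symmetric InfoNCE: row i of each input belongs to the same patient."""
    img = [normalize(v) for v in img_emb]
    clin = [normalize(v) for v in clin_emb]
    sim = [[sum(a * b for a, b in zip(u, w)) / temperature for w in clin] for u in img]
    sim_t = [list(col) for col in zip(*sim)]  # clinical-to-image direction
    return 0.5 * (_mean_row_cross_entropy(sim) + _mean_row_cross_entropy(sim_t))


# Hypothetical 2-patient batch: matched pairs versus swapped pairs.
img = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(img, [[0.9, 0.1], [0.1, 0.9]]) < info_nce(img, [[0.1, 0.9], [0.9, 0.1]]))  # True
```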

19 pages, 3193 KB  
Article
A Value-Driven Multi-Agent Reinforcement Learning Framework for Decentralized Adaptive Energy Management in Prosumer Smart Grids
by Otilia Elena Dragomir and Florin Dragomir
Buildings 2026, 16(10), 1974; https://doi.org/10.3390/buildings16101974 - 16 May 2026
Viewed by 259
Abstract
Prosumer communities, aggregations of residential and commercial entities equipped with distributed energy resources (DER), including photovoltaic systems, battery storage, and flexible loads, are emerging as critical organizational units in decarbonising smart grid architectures. Managing these communities effectively requires balancing economic efficiency with equity, autonomy, and environmental sustainability, objectives that conventional centralized control methods and existing multi-agent reinforcement learning (MARL) implementations fail to address simultaneously. This article proposes a value-aligned hierarchical multi-agent reinforcement learning (VA-HMARL) framework as a formally unified architecture that embeds equity (Jain’s Fairness Index J ≥ 0.90), individual autonomy, and carbon sustainability as hard constraints within the MARL reward structure. The framework integrates: a multi-objective Value Alignment Module (VAM) combining economic, fairness, sustainability, and comfort objectives; attention-based implicit coordination for scalable agent interaction; and differentially private federated policy aggregation (ε = 1.0, δ = 10⁻⁵) for GDPR-compliant collaborative learning. Simulation on a 20-prosumer community modelled on the IEEE 33-bus feeder over 10 Monte Carlo runs (300 episodes each) demonstrates: a 6.2% energy cost reduction versus the Rule-Based baseline (p = 0.0004); a Jain’s Fairness Index of 0.912 ± 0.031 at policy convergence (final 50 episodes), satisfying the J ≥ 0.90 community equity floor; and an 18.0% reduction in CO₂ emissions. The economic efficiency trade-off relative to performance-optimized MARL baselines is limited to 2.4%, within the 5% design target. These results establish VA-HMARL as a technically feasible and ethically grounded paradigm for autonomous decentralized energy governance.
(This article belongs to the Special Issue AI-Driven Distributed Optimization for Building Energy Management)
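Jain's Fairness Index, used here as an equity floor (J ≥ 0.90), is J = (Σ x_i)² / (n Σ x_i²) for n non-negative allocations; it equals 1 when all participants receive the same amount and falls toward 1/n as a single participant captures everything. Which per-prosumer quantity is allocated (for example, cost savings) is not stated in the abstract, so the allocation values below and the helper `meets_equity_floor` are hypothetical.

```python
def jain_index(allocations):
    """Jain's fairness index (sum x)^2 / (n * sum x^2); ranges from 1/n to 1.

    Returns 1.0 for an all-zero allocation, where the index is otherwise undefined
    (a convention chosen for this sketch).
    """
    n = len(allocations)
    s = sum(allocations)
    sq = sum(x * x for x in allocations)
    return (s * s) / (n * sq) if sq > 0 else 1.0


def meets_equity_floor(allocations, floor=0.90):
    return jain_index(allocations) >= floor


print(jain_index([1.0, 1.0, 1.0, 1.0]))   # 1.0
print(jain_index([4.0, 0.0, 0.0, 0.0]))   # 0.25
print(meets_equity_floor([1.0, 2.0]))     # True  (J = 9 / 10 = 0.9)
```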

29 pages, 17443 KB  
Article
Per-SAM-MCPA: A Lightweight Framework for Individual Tree Crown Segmentation from UAV Imagery
by Chuting Hu, Size Dai, Shifan Wu, Qiaolin Ye and He Yan
Remote Sens. 2026, 18(10), 1559; https://doi.org/10.3390/rs18101559 - 13 May 2026
Viewed by 338
Abstract
Accurate individual tree crown (ITC) segmentation from unmanned aerial vehicle (UAV) imagery is important for fine-scale forest inventory, plantation management, and ecological monitoring. However, delineating ITCs in dense plantation environments remains difficult because crowns are strongly adjacent, canopy structures are highly homogeneous, and crown boundaries are often blurred, making it hard for existing methods to preserve both regional integrity and boundary continuity. This study proposes the Perceptual Segment-Anything Model with Multi-head Cross-Parallel Attention (Per-SAM-MCPA), a lightweight and effective framework for fine-grained ITC segmentation in dense plantation scenes. Based on a compact ResNet-50 backbone, the framework integrates perceptual target-aware representation, multi-scale detail enhancement, global contextual modeling, and semantic-boundary collaborative refinement to improve crown discrimination and structural consistency. A perceptual relation module is used to strengthen pixel-level semantic dependency modeling, and a Multi-head Cross-Parallel Attention (MCPA) mechanism is designed to capture long-range contextual interactions through orthogonally decomposed spatial attention, improving global geometric consistency with limited computational overhead. A Composite Constraint Loss (CCL) that combines a weighted cross-entropy loss, a structural similarity loss, and a boundary term based on Hausdorff distance is introduced to jointly optimize region-level segmentation quality and boundary fidelity. Experiments on the Catalpa bungei UAV dataset show that the proposed method achieves an intersection over union (IoU) of 87.3% and an F1-score of 91.0%, outperforming representative baseline methods such as SAM and Mask R-CNN while maintaining an inference speed of 35.7 FPS on a single GPU. These results indicate that Per-SAM-MCPA offers an accurate, efficient, and practical solution for ITC segmentation in dense plantation environments. 
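The Hausdorff distance behind the boundary term of the Composite Constraint Loss measures the worst-case mismatch between two boundaries: the largest distance from any point on one boundary to the nearest point on the other, taken in both directions. The plain-Python sketch below extracts 4-connected boundary pixels from binary masks and computes the exact, non-differentiable distance. The abstract does not specify how the article turns this into a trainable loss term or how it is weighted against the cross-entropy and structural similarity terms; the function names and toy masks are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[int, int]

def boundary_points(mask: Sequence[Sequence[int]]) -> List[Point]:
    """Foreground pixels with at least one 4-neighbour that is background or outside the image."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    points: List[Point] = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < h and 0 <= cc < w) or not mask[rr][cc]:
                    points.append((r, c))
                    break
    return points

def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest distance from a point in `a` to its nearest neighbour in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance between two non-empty point sets."""
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Toy 4x4 masks: the predicted crown is shifted one column to the right.
target = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
pred = [[0, 0, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 0, 0]]
print(hausdorff_distance(boundary_points(pred), boundary_points(target)))  # 1.0
```

Because it takes a maximum, the Hausdorff distance penalizes a single badly placed boundary segment strongly, which complements region-level terms that average over all pixels.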

23 pages, 6367 KB  
Article
SCNAnet: Structure-Aware Contrastive with Noise-Augmented Network for Unsupervised Change Detection
by Yijie Sun, Qingxi Wu and Nan Wang
Remote Sens. 2026, 18(9), 1427; https://doi.org/10.3390/rs18091427 - 4 May 2026
Viewed by 356
Abstract
Unsupervised change detection (UCD) is a key technique in Earth observation, aiming to identify and quantify surface changes over time by analyzing multi-temporal remote sensing images without manual annotations. Unlike supervised approaches that rely on ground reference to directly guide discriminative semantic learning, UCD methods must construct their own reference. A mainstream strategy employs one temporal image as the reference and uses transformation models (e.g., style transfer networks) to align the other image in unchanged regions. Loss is then reduced by labeling hard-to-align pixels as “changes” and excluding them from the objective. However, this optimization process is dominated by style losses, which cause the model to learn to exclude regions that make only limited contributions to style-loss minimization, rather than to acquire discriminative representations of true geospatial changes. Such shortcut-driven optimization results in insufficient modeling of genuine change features and frequent misclassification of unchanged yet stylistically similar regions. To address these limitations, we propose SCNAnet, a novel framework that integrates three modules: a noise-perturbation consistency branch to suppress shortcut-driven learning, a structure-aware style transformation encoder to strengthen semantic representations of structural changes, and a frequency-attention decoder to refine the delineation of change regions. Extensive experiments on three benchmark datasets (GF-2, OSCD, and QuickBird) demonstrate the effectiveness of SCNAnet. Specifically, SCNAnet improves the F1 score by approximately 8% on the Montpellier dataset compared with the second-best method, demonstrating its effectiveness under challenging conditions.
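The shortcut described above can be made concrete with a toy example. The minimal Python sketch below is a hypothetical illustration of the mainstream strategy the abstract describes, not SCNAnet itself. Per-pixel alignment residuals stand in for the style-transfer loss, the highest-residual fraction of pixels is labeled as change, and the objective is averaged over the remaining pixels only. Because the excluded pixels are by construction those with the largest residuals, the loss never increases as more pixels are excluded, whether or not those pixels are real changes. The function name `exclude_hardest` and the residual values are assumptions.

```python
from typing import List, Sequence, Tuple

def exclude_hardest(residuals: Sequence[float], change_fraction: float) -> Tuple[List[int], float]:
    """Label the top `change_fraction` of pixels by residual as change (1) and return
    (change_map, mean residual over the pixels still in the objective)."""
    n = len(residuals)
    if n == 0:
        raise ValueError("need at least one pixel")
    if not 0.0 <= change_fraction < 1.0:
        raise ValueError("change_fraction must be in [0, 1)")
    k = int(change_fraction * n)  # floor keeps at least one pixel in the objective
    order = sorted(range(n), key=lambda i: residuals[i], reverse=True)
    excluded = set(order[:k])
    change_map = [1 if i in excluded else 0 for i in range(n)]
    kept = [residuals[i] for i in range(n) if i not in excluded]
    return change_map, sum(kept) / len(kept)

# Hypothetical residuals for five pixels; nothing here comes from the article.
residuals = [0.1, 0.2, 0.9, 0.8, 0.1]
for fraction in (0.0, 0.2, 0.4):
    change_map, loss = exclude_hardest(residuals, fraction)
    print(fraction, change_map, round(loss, 3))
```

The printed loss decreases as the excluded fraction grows, even though no ground truth is consulted, which is the behavior SCNAnet's noise-perturbation consistency branch is intended to suppress.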

29 pages, 17309 KB  
Article
A Lightweight Hybrid CNN–CBAM Model for Multistage Acute Lymphoblastic Leukemia Classification from Peripheral Blood Smear Images
by Kittipol Wisaeng
Informatics 2026, 13(5), 69; https://doi.org/10.3390/informatics13050069 - 30 Apr 2026
Viewed by 1638
Abstract
Accurate and efficient classification of hematological malignancies from peripheral blood smear (PBS) images remains challenging due to the scarcity of annotated datasets, staining variability, and subtle morphological differences among blood cancer subtypes. To address these limitations, this study proposes an Advanced Lightweight Deep Learning (ALDL) framework for the multi-class classification of Acute Lymphoblastic Leukemia (ALL) across four clinically significant stages: Benign, Pro-B, Pre-B, and Early Pre-B. The framework integrates EfficientNetV2-S with Convolutional Block Attention Modules (CBAM) to enhance spatial and channel-wise feature refinement, while Focal Loss is employed to mitigate class imbalance by prioritizing hard-to-classify samples. A robust preprocessing pipeline, including CLAHE contrast enhancement, Reinhard stain normalization, and data augmentation, improves feature visibility and model generalization. Lesion segmentation is performed using RGB-based thresholding and watershed overlay, followed by lesion-level cropping to ensure consistency across inputs. Experimental evaluations on the ALL-DB dataset demonstrate the superior performance of the proposed method, achieving an average accuracy of 96.11%, an F1-score of 95.99%, and an AUC of 0.9875. Comparative analyses against MobileNetV3, ResNet50, DenseNet121, VGG16, and InceptionV3 confirm that the proposed segmentation-guided EfficientNetV2-S + CBAM + Focal Loss framework consistently outperforms conventional CNN architectures across both 70:30 and 60:40 train–test splits. Furthermore, a detailed investigation of color spaces (RGB, HSV, LAB, and HED) indicates that RGB yields the most reliable segmentation and classification results, whereas HED enhances lesion visualization at the expense of higher computational cost.
The proposed ALDL framework demonstrates strong potential for real-world application as a computer-aided diagnostic (CAD) system for early leukemia detection, offering improved diagnostic reliability, reduced error rates, and practical scalability for clinical environments.
(This article belongs to the Section Health Informatics)
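Focal Loss, originally proposed for dense object detection, scales the cross-entropy of each sample by (1 − p_t)^γ, where p_t is the predicted probability of the true class. Well-classified samples therefore contribute little, hard samples dominate training, and an optional per-class weight α_t can further counter class imbalance. The minimal Python sketch below computes it for a single four-class prediction. The default γ = 2, the class order, and the example logits are assumptions for illustration; the abstract does not report the γ or α settings used in the ALDL framework.

```python
import math
from typing import List, Optional, Sequence

# Hypothetical class order for the four stages named in the abstract.
CLASSES = ("Benign", "Pro-B", "Pre-B", "Early Pre-B")

def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-probabilities (subtract the max logit first)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def focal_loss(logits: Sequence[float], target: int, gamma: float = 2.0,
               alpha: Optional[Sequence[float]] = None) -> float:
    """FL = -alpha_t * (1 - p_t)**gamma * log(p_t) for one sample."""
    log_p_t = log_softmax(logits)[target]
    p_t = math.exp(log_p_t)
    alpha_t = 1.0 if alpha is None else alpha[target]
    return -alpha_t * (1.0 - p_t) ** gamma * log_p_t

easy = [4.0, 0.0, 0.0, 0.0]  # confident and correct for class 0 ("Benign")
hard = [0.5, 0.0, 0.0, 0.0]  # barely above chance for class 0
for name, logits in (("easy", easy), ("hard", hard)):
    ce = focal_loss(logits, 0, gamma=0.0)  # gamma = 0 with no alpha recovers cross-entropy
    fl = focal_loss(logits, 0, gamma=2.0)
    print(name, round(ce, 4), round(fl, 4), round(fl / ce, 4))
```

The ratio of focal loss to cross-entropy is much smaller for the easy sample than for the hard one, which is how the loss shifts emphasis toward hard-to-classify cells.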
