Search Results (668)

Search Parameters:
Keywords = channel aware

25 pages, 29559 KiB  
Article
CFRANet: Cross-Modal Frequency-Responsive Attention Network for Thermal Power Plant Detection in Multispectral High-Resolution Remote Sensing Images
by Qinxue He, Bo Cheng, Xiaoping Zhang and Yaocan Gan
Remote Sens. 2025, 17(15), 2706; https://doi.org/10.3390/rs17152706 - 5 Aug 2025
Abstract
Thermal Power Plants (TPPs) are widely used industrial facilities for electricity generation, and their detection is a key task in remote sensing image interpretation. However, detecting TPPs remains challenging due to their complex and irregular composition. Many traditional approaches focus on detecting compact, small-scale objects, while existing composite object detection methods are mostly part-based, limiting their ability to capture the structural and textural characteristics of composite targets like TPPs. Moreover, most of them rely on single-modality data, failing to fully exploit the rich information available in remote sensing imagery. To address these limitations, we propose a novel Cross-Modal Frequency-Responsive Attention Network (CFRANet). Specifically, the Modality-Aware Fusion Block (MAFB) facilitates the integration of multi-modal features, enhancing inter-modal interactions. Additionally, the Frequency-Responsive Attention (FRA) module leverages both spatial and localized dual-channel information and utilizes Fourier-based frequency decomposition to separately capture high- and low-frequency components, thereby improving the recognition of TPPs by learning both detailed textures and structural layouts. Experiments conducted on our newly proposed AIR-MTPP dataset demonstrate that CFRANet achieves state-of-the-art performance, with a mAP50 of 82.41%.
(This article belongs to the Section Remote Sensing Image Processing)
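To illustrate the Fourier-based decomposition step, the sketch below splits a feature map into low- and high-frequency parts via a circular low-pass mask in the 2-D spectrum. This is a minimal reading of the idea, not the authors’ code; the cutoff radius is an assumed hyperparameter.

```python
import torch

def frequency_split(x: torch.Tensor, radius: float = 0.25):
    """Split a feature map into low- and high-frequency parts using a
    centered circular mask in the 2-D Fourier domain (illustrative only)."""
    _, _, H, W = x.shape
    freq = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    yy, xx = torch.meshgrid(
        torch.linspace(-1, 1, H, device=x.device),
        torch.linspace(-1, 1, W, device=x.device), indexing="ij")
    low_mask = ((yy ** 2 + xx ** 2).sqrt() <= radius).float()
    low = torch.fft.ifft2(torch.fft.ifftshift(freq * low_mask, dim=(-2, -1))).real
    high = x - low  # the residual carries fine textures and edges
    return low, high

low, high = frequency_split(torch.randn(2, 8, 64, 64))
```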

25 pages, 6934 KiB  
Article
Feature Constraints Map Generation Models Integrating Generative Adversarial and Diffusion Denoising
by Chenxing Sun, Xixi Fan, Xiechun Lu, Laner Zhou, Junli Zhao, Yuxuan Dong and Zhanlong Chen
Remote Sens. 2025, 17(15), 2683; https://doi.org/10.3390/rs17152683 - 3 Aug 2025
Abstract
The accelerated evolution of remote sensing technology has intensified the demand for real-time tile map generation, highlighting the limitations of conventional mapping approaches that rely on manual cartography and field surveys. To address the critical need for rapid cartographic updates, this study presents a novel multi-stage generative framework that synergistically integrates Generative Adversarial Networks (GANs) with Diffusion Denoising Models (DMs) for high-fidelity map generation from remote sensing imagery. Specifically, our proposed architecture first employs GANs for rapid preliminary map generation, followed by a cascaded diffusion process that progressively refines topological details and spatial accuracy through iterative denoising. Furthermore, we propose a hybrid attention mechanism that strategically combines channel-wise feature recalibration with coordinate-aware spatial modulation, enabling enhanced discrimination of geographic features under challenging conditions involving edge ambiguity and environmental noise. Quantitative evaluations demonstrate that our method significantly surpasses established baselines in both structural consistency and geometric fidelity. This framework establishes an operational paradigm for automated, rapid-response cartography, demonstrating particular utility in time-sensitive applications including disaster impact assessment, unmapped terrain documentation, and dynamic environmental surveillance.
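A hybrid attention of this kind, channel-wise recalibration followed by coordinate-aware spatial modulation, might be sketched as follows; the module structure and reduction ratio are assumptions, not the paper’s implementation.

```python
import torch
import torch.nn as nn

class HybridAttention(nn.Module):
    """Channel recalibration (squeeze-excite style) followed by
    coordinate-aware spatial modulation. Illustrative sketch only."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.channel_fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        # Separate per-axis descriptors keep positional information.
        self.h_conv = nn.Conv2d(channels, channels, 1)
        self.w_conv = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        x = x * self.channel_fc(x)  # channel-wise feature recalibration
        h_att = torch.sigmoid(self.h_conv(x.mean(dim=3, keepdim=True)))  # B,C,H,1
        w_att = torch.sigmoid(self.w_conv(x.mean(dim=2, keepdim=True)))  # B,C,1,W
        return x * h_att * w_att    # coordinate-aware modulation
```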

21 pages, 16422 KiB  
Article
DCE-Net: An Improved Method for Sonar Small-Target Detection Based on YOLOv8
by Lijun Cao, Zhiyuan Ma, Qiuyue Hu, Zhongya Xia and Meng Zhao
J. Mar. Sci. Eng. 2025, 13(8), 1478; https://doi.org/10.3390/jmse13081478 - 31 Jul 2025
Abstract
Sonar is the primary tool used for detecting small targets at long distances underwater. Due to the influence of the underwater environment and imaging mechanisms, sonar images face challenges such as a small number of target pixels, insufficient data samples, and uneven category distribution. Existing target detection methods are unable to effectively extract features from sonar images, leading to high false positive rates and reduced detection accuracy. To counter these challenges, this paper presents a novel sonar small-target detection framework named DCE-Net that refines the YOLOv8 architecture. The Detail Enhancement Attention Block (DEAB) utilizes multi-scale residual structures and a channel attention mechanism (AM) to achieve image defogging and small-target structure completion. The lightweight spatial variation convolution module (CoordGate) reduces false detections in complex backgrounds through dynamic position-aware convolution kernels. The improved efficient multi-scale AM (MH-EMA) performs scale-adaptive feature reweighting and combines cross-dimensional interaction strategies to enhance pixel-level feature representation. Experiments on a self-built sonar small-target detection dataset show that DCE-Net achieves an mAP@0.5 of 87.3% and an mAP@0.5:0.95 of 41.6%, representing improvements of 5.5% and 7.7%, respectively, over the baseline YOLOv8. This demonstrates that DCE-Net provides an efficient solution for underwater detection tasks.
(This article belongs to the Special Issue Artificial Intelligence Applications in Underwater Sonar Images)
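One plausible reading of a position-aware gate like CoordGate, sketched under assumed dimensions: a small MLP maps pixel coordinates to per-channel multiplicative gates, so the same convolution responds differently across the image.

```python
import torch
import torch.nn as nn

class CoordGate(nn.Module):
    """Position-aware gating: an MLP maps (x, y) coordinates to per-channel
    gates applied to a convolution's output. Loose sketch of the idea."""
    def __init__(self, channels: int, hidden: int = 32):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, channels), nn.Sigmoid())
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        _, _, H, W = x.shape
        yy, xx = torch.meshgrid(
            torch.linspace(-1, 1, H, device=x.device),
            torch.linspace(-1, 1, W, device=x.device), indexing="ij")
        coords = torch.stack([xx, yy], dim=-1)    # H, W, 2
        gate = self.mlp(coords).permute(2, 0, 1)  # C, H, W
        return self.conv(x) * gate.unsqueeze(0)   # spatially varying gating
```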

21 pages, 4400 KiB  
Article
BFLE-Net: Boundary Feature Learning and Enhancement Network for Medical Image Segmentation
by Jiale Fan, Liping Liu and Xinyang Yu
Electronics 2025, 14(15), 3054; https://doi.org/10.3390/electronics14153054 - 30 Jul 2025
Abstract
Multi-organ medical image segmentation is essential for accurate clinical diagnosis, effective treatment planning, and reliable prognosis, yet it remains challenging due to complex backgrounds, irrelevant noise, unclear organ boundaries, and wide variations in organ size. To address these challenges, the boundary feature learning and enhancement network (BFLE-Net) is proposed. This model integrates a dedicated boundary learning module combined with an auxiliary loss function to strengthen the semantic correlations between boundary pixels and regional features, thus reducing category mis-segmentation. Additionally, channel and positional compound attention mechanisms are employed to selectively filter features and minimize background interference. To further enhance multi-scale representation capabilities, the dynamic scale-aware context module dynamically selects and fuses multi-scale features, significantly improving the model’s adaptability. The model achieves average Dice similarity coefficients of 81.67% on the Synapse dataset and 90.55% on the ACDC dataset, outperforming state-of-the-art methods. This network significantly improves segmentation by emphasizing boundary accuracy, noise reduction, and multi-scale adaptability, enhancing clinical diagnostics and treatment planning.
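A common way to realize boundary supervision of this sort, offered here only as a hedged sketch: derive boundary targets from the segmentation labels with a morphological gradient and add a BCE auxiliary loss on a boundary prediction head.

```python
import torch
import torch.nn.functional as F

def boundary_targets(mask: torch.Tensor, kernel: int = 3) -> torch.Tensor:
    """Soft boundary maps from per-class binary label masks (B, C, H, W),
    via a morphological gradient: dilation minus erosion by max-pooling."""
    pad = kernel // 2
    dil = F.max_pool2d(mask, kernel, stride=1, padding=pad)
    ero = -F.max_pool2d(-mask, kernel, stride=1, padding=pad)
    return (dil - ero).clamp(0, 1)

def auxiliary_boundary_loss(pred_boundary, label_masks):
    """BCE between predicted boundary logits and label-derived boundaries."""
    return F.binary_cross_entropy_with_logits(
        pred_boundary, boundary_targets(label_masks))

pred = torch.randn(2, 5, 64, 64)                          # boundary logits
labels = torch.randint(0, 2, (2, 5, 64, 64)).float()      # binary class masks
loss = auxiliary_boundary_loss(pred, labels)
```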

22 pages, 12983 KiB  
Article
A Hybrid Model for Fluorescein Funduscopy Image Classification by Fusing Multi-Scale Context-Aware Features
by Yawen Wang, Chao Chen, Zhuo Chen and Lingling Wu
Technologies 2025, 13(8), 323; https://doi.org/10.3390/technologies13080323 - 30 Jul 2025
Abstract
With the growing use of deep learning in medical image analysis, automated classification of fundus images is crucial for the early detection of fundus diseases. However, the complexity of fluorescein fundus angiography (FFA) images poses challenges in the accurate identification of lesions. To address these issues, we propose the Enhanced Feature Fusion ConvNeXt (EFF-ConvNeXt) model, a novel architecture combining VGG16 and an enhanced ConvNeXt for FFA image classification. VGG16 is employed to extract edge features, while an improved ConvNeXt incorporates the Context-Aware Feature Fusion (CAFF) strategy to enhance global contextual understanding. CAFF integrates an Improved Global Context (IGC) module with multi-scale feature fusion to jointly capture local and global features. Furthermore, an SKNet module is used in the final stages to adaptively recalibrate channel-wise features. The model demonstrates improved classification accuracy and robustness, achieving 92.50% accuracy and a 92.30% F1 score on the APTOS2023 dataset, surpassing the baseline ConvNeXt-T by 3.12% in accuracy and 4.01% in F1 score. These results highlight the model’s ability to better recognize complex disease features, providing significant support for more accurate diagnosis of fundus diseases.
(This article belongs to the Special Issue Application of Artificial Intelligence in Medical Image Analysis)
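The SKNet-style recalibration mentioned above can be sketched as two parallel kernels whose outputs are blended by softmax weights computed from a pooled descriptor; the kernel sizes and reduction ratio here are assumptions.

```python
import torch
import torch.nn as nn

class SelectiveKernel(nn.Module):
    """Two parallel kernels; a softmax over branch descriptors decides how
    much of each receptive field every channel uses (SKNet-style sketch)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch5 = nn.Conv2d(channels, channels, 5, padding=2)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels * 2))

    def forward(self, x):
        b3, b5 = self.branch3(x), self.branch5(x)
        s = (b3 + b5).mean(dim=(2, 3))                        # B, C descriptor
        w = self.fc(s).view(x.size(0), 2, -1).softmax(dim=1)  # B, 2, C
        w3 = w[:, 0][..., None, None]                         # B, C, 1, 1
        w5 = w[:, 1][..., None, None]
        return b3 * w3 + b5 * w5
```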

24 pages, 1147 KiB  
Article
A Channel-Aware AUV-Aided Data Collection Scheme Based on Deep Reinforcement Learning
by Lizheng Wei, Minghui Sun, Zheng Peng, Jingqian Guo, Jiankuo Cui, Bo Qin and Jun-Hong Cui
J. Mar. Sci. Eng. 2025, 13(8), 1460; https://doi.org/10.3390/jmse13081460 - 30 Jul 2025
Abstract
Underwater sensor networks (UWSNs) play a crucial role in subsea operations like marine exploration and environmental monitoring. A major challenge for UWSNs is achieving effective and energy-efficient data collection, particularly in deep-sea mining, where energy limitations and long-term deployment are key concerns. This study introduces a Channel-Aware AUV-Aided Data Collection Scheme (CADC) that utilizes deep reinforcement learning (DRL) to improve data collection efficiency. It features an innovative underwater node traversal algorithm that accounts for unique underwater signal propagation characteristics, along with a DRL-based path planning approach to mitigate propagation losses and enhance data energy efficiency. CADC achieves a 71.2% increase in energy efficiency compared to existing clustering methods, a 0.08% improvement over the Deep Deterministic Policy Gradient (DDPG), and 2.3% faster convergence than the Twin Delayed DDPG (TD3), while reducing energy cost to only 22.2% of that required by the TSP-based baseline. By combining channel-aware traversal with adaptive DRL navigation, CADC effectively optimizes data collection and energy consumption in underwater environments.
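The channel-aware ingredient can be illustrated with Thorp’s empirical absorption formula, a standard model in underwater acoustics; the shaped reward below is purely hypothetical, with the weights and spreading factor as assumptions rather than CADC’s actual design.

```python
import math

def thorp_absorption_db_per_km(f_khz: float) -> float:
    """Thorp's empirical absorption coefficient for underwater acoustics."""
    f2 = f_khz ** 2
    return 0.11 * f2 / (1 + f2) + 44 * f2 / (4100 + f2) + 2.75e-4 * f2 + 0.003

def transmission_loss_db(distance_m: float, f_khz: float, k: float = 1.5) -> float:
    """Spreading loss plus absorption; k = 1.5 is practical spreading."""
    return (k * 10 * math.log10(max(distance_m, 1.0))
            + thorp_absorption_db_per_km(f_khz) * distance_m / 1000.0)

def step_reward(bits_collected, energy_joules, distance_m, f_khz=20.0,
                w_data=1.0, w_energy=0.1, w_loss=0.01):
    """Hypothetical shaped reward: favour throughput, penalise energy spent
    and poor channel conditions along the chosen leg of the AUV path."""
    return (w_data * bits_collected - w_energy * energy_joules
            - w_loss * transmission_loss_db(distance_m, f_khz))
```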

18 pages, 5309 KiB  
Article
LGM-YOLO: A Context-Aware Multi-Scale YOLO-Based Network for Automated Structural Defect Detection
by Chuanqi Liu, Yi Huang, Zaiyou Zhao, Wenjing Geng and Tianhong Luo
Processes 2025, 13(8), 2411; https://doi.org/10.3390/pr13082411 - 29 Jul 2025
Abstract
Ensuring the structural safety of steel trusses in escalators is critical for the reliable operation of vertical transportation systems. While manual inspection remains widely used, its dependence on human judgment leads to extended cycle times and variable defect-recognition rates, making it less reliable for identifying subtle surface imperfections. To address these limitations, a novel context-aware, multi-scale deep learning framework based on the YOLOv5 architecture is proposed, which is specifically designed for automated structural defect detection in escalator steel trusses. Firstly, a method called GIES is proposed to synthesize pseudo-multi-channel representations from single-channel grayscale images, which enhances the network’s channel-wise representation and mitigates issues arising from image noise and defocused blur. To further improve detection performance, a context enhancement pipeline is developed, consisting of a local feature module (LFM) for capturing fine-grained surface details and a global context module (GCM) for modeling large-scale structural deformations. In addition, a multi-scale feature fusion module (MSFM) is employed to effectively integrate spatial features across various resolutions, enabling the detection of defects with diverse sizes and complexities. Comprehensive testing on the NEU-DET and GC10-DET datasets reveals that the proposed method achieves 79.8% mAP on NEU-DET and 68.1% mAP on GC10-DET, outperforming the baseline YOLOv5s by 8.0% and 2.7%, respectively. Although challenges remain in identifying extremely fine defects such as crazing, the proposed approach offers improved accuracy while maintaining real-time inference speed. These results indicate the potential of the method for intelligent visual inspection in structural health monitoring and industrial safety applications.
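The abstract does not detail GIES, so the following is only a plausible stand-in for synthesizing pseudo-multi-channel inputs from grayscale: stack the raw image with a contrast-equalised copy and a denoised gradient map.

```python
import cv2
import numpy as np

def pseudo_multichannel(gray: np.ndarray) -> np.ndarray:
    """Assemble a 3-channel input from one grayscale image: raw intensities,
    a CLAHE contrast-equalised copy, and a denoised Laplacian gradient map.
    A plausible stand-in only, not the paper's GIES procedure."""
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    equalised = clahe.apply(gray)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)           # suppress noise first
    grad = cv2.convertScaleAbs(cv2.Laplacian(blurred, cv2.CV_16S))
    return np.dstack([gray, equalised, grad])             # H, W, 3 uint8

img = pseudo_multichannel(np.random.randint(0, 255, (256, 256), np.uint8))
```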

20 pages, 9955 KiB  
Article
Dual-Branch Occlusion-Aware Semantic Part-Features Extraction Network for Occluded Person Re-Identification
by Bo Sun, Yulong Zhang, Jianan Wang and Chunmao Jiang
Mathematics 2025, 13(15), 2432; https://doi.org/10.3390/math13152432 - 28 Jul 2025
Abstract
Occlusion remains a major challenge in person re-identification, as it often leads to incomplete or misleading visual cues. To address this issue, we propose a dual-branch occlusion-aware network (DOAN), which explicitly and implicitly enhances the model’s capability to perceive and handle occlusions. The proposed DOAN framework comprises two synergistic branches. In the first branch, we introduce an Occlusion-Aware Semantic Attention (OASA) module to extract semantic part features, incorporating a parallel channel and spatial attention (PCSA) block to precisely distinguish between pedestrian body regions and occlusion noise. We also generate occlusion-aware parsing labels by combining external human parsing annotations with occluder masks, providing structural supervision to guide the model in focusing on visible regions. In the second branch, we develop an occlusion-aware recovery (OAR) module that reconstructs occluded pedestrians to their original, unoccluded form, enabling the model to recover missing semantic information and enhance occlusion robustness. Extensive experiments on occluded, partial, and holistic benchmark datasets demonstrate that DOAN consistently outperforms existing state-of-the-art methods.
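The occlusion-aware parsing labels can be sketched directly: wherever the occluder mask fires, the human-parsing label is overwritten with a dedicated id, so visible regions alone drive the part features. The label ids below are assumed.

```python
import numpy as np

def occlusion_aware_labels(parsing: np.ndarray, occluder: np.ndarray,
                           occluded_id: int = 255) -> np.ndarray:
    """Combine external human-parsing labels (H, W int) with a binary
    occluder mask: occluded pixels get a dedicated 'occluded' id so the
    model is supervised to focus on visible body regions. Sketch only."""
    labels = parsing.copy()
    labels[occluder.astype(bool)] = occluded_id
    return labels

parsing = np.random.randint(0, 7, (256, 128))   # assumed 7 body-part classes
occluder = np.zeros((256, 128), np.uint8)
occluder[180:, :] = 1                           # e.g., lower body occluded
labels = occlusion_aware_labels(parsing, occluder)
```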

27 pages, 1128 KiB  
Article
Adaptive Multi-Hop P2P Video Communication: A Super Node-Based Architecture for Conversation-Aware Streaming
by Jiajing Chen and Satoshi Fujita
Information 2025, 16(8), 643; https://doi.org/10.3390/info16080643 - 28 Jul 2025
Abstract
This paper proposes a multi-hop peer-to-peer (P2P) video streaming architecture designed to support dynamic, conversation-aware communication. The primary contribution is a decentralized system built on WebRTC that eliminates reliance on a central media server by employing super node aggregation. In this architecture, video streams from multiple peer nodes are dynamically routed through a group of super nodes, enabling real-time reconfiguration of the network topology in response to conversational changes. To support this dynamic behavior, the system leverages WebRTC data channels for control signaling and overlay restructuring, allowing efficient dissemination of topology updates and coordination messages among peers. A key focus of this study is the rapid and efficient reallocation of network resources immediately following conversational events, ensuring that the streaming overlay remains aligned with ongoing interaction patterns. While the automatic detection of such events is beyond the scope of this work, we assume that external triggers are available to initiate topology updates. To validate the effectiveness of the proposed system, we construct a simulation environment using Docker containers and evaluate its streaming performance under dynamic network conditions. The results demonstrate the system’s applicability to adaptive, naturalistic communication scenarios. Finally, we discuss future directions, including the seamless integration of external trigger sources and enhanced support for flexible, context-sensitive interaction frameworks.
(This article belongs to the Special Issue Second Edition of Advances in Wireless Communications Systems)
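The control signaling over WebRTC data channels might carry messages like the hypothetical schema below; all field names are invented for illustration.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class TopologyUpdate:
    """Hypothetical control message a super node broadcasts on WebRTC data
    channels when a conversational event re-routes streams."""
    event_id: str
    active_speaker: str   # peer whose stream becomes primary
    routes: dict          # stream id -> ordered list of relay super nodes

def encode(msg: TopologyUpdate) -> str:
    return json.dumps({"type": "topology-update", **asdict(msg)})

def decode(raw: str) -> TopologyUpdate:
    data = json.loads(raw)
    assert data.pop("type") == "topology-update"
    return TopologyUpdate(**data)

wire = encode(TopologyUpdate("evt-42", "peer-A", {"s1": ["sn-1", "sn-2"]}))
print(decode(wire))
```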

22 pages, 963 KiB  
Article
The Impact of E-Commerce Live Streaming on Purchase Intention for Sustainable Green Agricultural Products: A Study in the Context of Agricultural Tourism Integration
by Wenkui Jin and Wenying Zhang
Sustainability 2025, 17(15), 6850; https://doi.org/10.3390/su17156850 - 28 Jul 2025
Abstract
Growing awareness of sustainable development and green consumer concerns is driving the market expansion for green agricultural products. E-commerce live streaming gives rural enterprises a new channel through scenario-building and interaction, while agro-tourism integration combines resources to generate a variety of promotion scenarios. This study examines the effects of external stimuli, including social networks, resource endowment, infrastructure, and the characteristics of e-commerce streamers, on the perception, trust, perceived value, and purchase intention of green consumption. It is based on the SOR (Stimulus–Organism–Response) theoretical model and focuses on e-commerce live streaming in the agritourism integration scenario. According to a structural equation modeling (SEM) analysis of 350 consumer questionnaires, these external stimuli primarily influence purchase intention through perceived value, trust, and green consumption cognition, with resource endowment having the most significant impact. The effects of infrastructure on perceived value and of streamer attractiveness on green consumption cognition are not statistically significant. This research not only broadens the use of the SOR model in the emerging field of agritourism integration but also offers rural businesses theoretical backing and useful guidance to maximize e-commerce live marketing and enhance agritourism integration.

25 pages, 2518 KiB  
Article
An Efficient Semantic Segmentation Framework with Attention-Driven Context Enhancement and Dynamic Fusion for Autonomous Driving
by Jia Tian, Peizeng Xin, Xinlu Bai, Zhiguo Xiao and Nianfeng Li
Appl. Sci. 2025, 15(15), 8373; https://doi.org/10.3390/app15158373 - 28 Jul 2025
Abstract
In recent years, a growing number of real-time semantic segmentation networks have been developed to improve segmentation accuracy. However, these advancements often come at the cost of increased computational complexity, which limits their inference efficiency, particularly in scenarios such as autonomous driving, where strict real-time performance is essential. Achieving an effective balance between speed and accuracy has thus become a central challenge in this field. To address this issue, we present a lightweight semantic segmentation model tailored for the perception requirements of autonomous vehicles. The architecture follows an encoder–decoder paradigm, which not only preserves the capability for deep feature extraction but also facilitates multi-scale information integration. The encoder leverages a high-efficiency backbone, while the decoder introduces a dynamic fusion mechanism designed to enhance information interaction between different feature branches. Recognizing the limitations of convolutional networks in modeling long-range dependencies and capturing global semantic context, the model incorporates an attention-based feature extraction component. This is further augmented by positional encoding, enabling better awareness of spatial structures and local details. The dynamic fusion mechanism employs an adaptive weighting strategy, adjusting the contribution of each feature channel to reduce redundancy and improve representation quality. To validate the effectiveness of the proposed network, experiments were conducted on a single RTX 3090 GPU. The Dynamic Real-time Integrated Vision Encoder–Segmenter Network (DriveSegNet) achieved a mean Intersection over Union (mIoU) of 76.9% and an inference speed of 70.5 FPS on the Cityscapes test dataset, 74.6% mIoU and 139.8 FPS on the CamVid test dataset, and 35.8% mIoU with 108.4 FPS on the ADE20K dataset. The experimental results demonstrate that the proposed method achieves an excellent balance between inference speed, segmentation accuracy, and model size.
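An adaptive channel-weighting fusion of two feature branches, in the spirit described above, can be sketched as a gated convex blend; the gating network is an assumption, not the paper’s design.

```python
import torch
import torch.nn as nn

class DynamicFusion(nn.Module):
    """Fuse two decoder branches with learned, input-dependent per-channel
    weights instead of fixed addition. Sketch of the general mechanism."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels * 2, channels, 1), nn.Sigmoid())

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([a, b], dim=1))  # B, C, 1, 1 weights in [0, 1]
        return w * a + (1 - w) * b               # convex per-channel blend

fuse = DynamicFusion(64)
out = fuse(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```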

23 pages, 19710 KiB  
Article
Hybrid EEG Feature Learning Method for Cross-Session Human Mental Attention State Classification
by Xu Chen, Xingtong Bao, Kailun Jitian, Ruihan Li, Li Zhu and Wanzeng Kong
Brain Sci. 2025, 15(8), 805; https://doi.org/10.3390/brainsci15080805 - 28 Jul 2025
Abstract
Background: Decoding mental attention states from electroencephalogram (EEG) signals is crucial for numerous applications such as cognitive monitoring, adaptive human–computer interaction, and brain–computer interfaces (BCIs). However, conventional EEG-based approaches often focus on channel-wise processing and are limited to intra-session or subject-specific scenarios, lacking robustness in cross-session or inter-subject conditions. Methods: In this study, we propose a hybrid feature learning framework for robust classification of mental attention states, including focused, unfocused, and drowsy conditions, across both sessions and individuals. Our method integrates preprocessing, feature extraction, feature selection, and classification in a unified pipeline. We extract channel-wise spectral features using short-time Fourier transform (STFT) and further incorporate both functional and structural connectivity features to capture inter-regional interactions in the brain. A two-stage feature selection strategy, combining correlation-based filtering and random forest ranking, is adopted to enhance feature relevance and reduce dimensionality. A support vector machine (SVM) is employed for final classification due to its efficiency and generalization capability. Results: Experimental results on two cross-session and inter-subject EEG datasets demonstrate that our approach achieves classification accuracy of 86.27% and 94.01%, respectively, significantly outperforming traditional methods. Conclusions: These findings suggest that integrating connectivity-aware features with spectral analysis can enhance the generalizability of attention decoding models. The proposed framework provides a promising foundation for the development of practical EEG-based systems for continuous mental state monitoring and adaptive BCIs in real-world environments.
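A compressed sketch of the pipeline’s shape, STFT band-power features, two-stage selection (correlation filter, then random-forest ranking), and an SVM, using standard SciPy/scikit-learn calls; thresholds and dimensions are placeholders, and the connectivity features are omitted.

```python
import numpy as np
from scipy.signal import stft
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

def bandpower_features(eeg, fs=128, nperseg=256):
    """Channel-wise mean log power of the STFT magnitude (simplified)."""
    _, _, Z = stft(eeg, fs=fs, nperseg=nperseg)  # channels x freqs x frames
    return np.log(np.abs(Z) ** 2 + 1e-12).mean(axis=-1).reshape(-1)

def two_stage_select(X, y, corr_thresh=0.9, top_k=50):
    """Stage 1: drop one of each highly correlated feature pair.
    Stage 2: keep the top_k features by random-forest importance."""
    corr = np.corrcoef(X, rowvar=False)
    n = corr.shape[0]
    drop = {j for i in range(n) for j in range(i + 1, n)
            if abs(corr[i, j]) > corr_thresh}
    keep = np.array([i for i in range(X.shape[1]) if i not in drop])
    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    rf.fit(X[:, keep], y)
    return keep[np.argsort(rf.feature_importances_)[::-1][:top_k]]

# Toy usage: 40 trials of 4-channel, 10 s EEG; 3 attention states.
X = np.vstack([bandpower_features(np.random.randn(4, 1280)) for _ in range(40)])
y = np.random.randint(0, 3, 40)
sel = two_stage_select(X, y)
clf = SVC(kernel="rbf").fit(X[:, sel], y)
```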

24 pages, 3480 KiB  
Article
MFPI-Net: A Multi-Scale Feature Perception and Interaction Network for Semantic Segmentation of Urban Remote Sensing Images
by Xiaofei Song, Mingju Chen, Jie Rao, Yangming Luo, Zhihao Lin, Xingyue Zhang, Senyuan Li and Xiao Hu
Sensors 2025, 25(15), 4660; https://doi.org/10.3390/s25154660 - 27 Jul 2025
Abstract
To improve semantic segmentation performance for complex urban remote sensing images with multi-scale object distribution, class similarity, and small object omission, this paper proposes MFPI-Net, an encoder–decoder-based semantic segmentation network. It includes four core modules: a Swin Transformer backbone encoder, a diverse dilation rates attention shuffle decoder (DDRASD), a multi-scale convolutional feature enhancement module (MCFEM), and a cross-path residual fusion module (CPRFM). The Swin Transformer efficiently extracts multi-level global semantic features through its hierarchical structure and window attention mechanism. The DDRASD’s diverse dilation rates attention (DDRA) block combines convolutions with diverse dilation rates and channel-coordinate attention to enhance multi-scale contextual awareness, while the Shuffle Block improves resolution via pixel rearrangement and avoids checkerboard artifacts. The MCFEM enhances local feature modeling through parallel multi-kernel convolutions, forming a complementary relationship with the Swin Transformer’s global perception capability. The CPRFM employs multi-branch convolutions and a residual multiplication–addition fusion mechanism to enhance interactions among multi-source features, thereby improving the recognition of small objects and similar categories. Experiments on the ISPRS Vaihingen and Potsdam datasets show that MFPI-Net outperforms mainstream methods, achieving 82.57% and 88.49% mIoU, validating its superior segmentation performance in urban remote sensing.
(This article belongs to the Section Sensing and Imaging)
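The Shuffle Block’s pixel-rearrangement upsampling corresponds closely to PixelShuffle; a minimal sketch, with channel counts assumed:

```python
import torch
import torch.nn as nn

class ShuffleUpsample(nn.Module):
    """Upsample by pixel rearrangement: a 1x1 conv expands channels by r^2,
    then PixelShuffle redistributes them spatially, avoiding the
    checkerboard artifacts typical of transposed convolutions."""
    def __init__(self, in_ch: int, out_ch: int, r: int = 2):
        super().__init__()
        self.expand = nn.Conv2d(in_ch, out_ch * r * r, kernel_size=1)
        self.shuffle = nn.PixelShuffle(r)

    def forward(self, x):
        return self.shuffle(self.expand(x))   # B, out_ch, H*r, W*r

up = ShuffleUpsample(64, 32)
print(up(torch.randn(1, 64, 16, 16)).shape)  # torch.Size([1, 32, 32, 32])
```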

28 pages, 2925 KiB  
Article
A Lightweight Neural Network Based on Memory and Transition Probability for Accurate Real-Time Sleep Stage Classification
by Dhanushka Wijesinghe and Ivan T. Lima
Brain Sci. 2025, 15(8), 789; https://doi.org/10.3390/brainsci15080789 - 25 Jul 2025
Abstract
Background/Objectives: This study presents a lightweight hybrid framework that combines memory and a sleep-stage transition probability matrix with a feedforward neural network operating on a single frontopolar electroencephalography channel, a practical configuration for wearable systems. Methods: Motivated by autocorrelation analysis revealing strong temporal dependencies across sleep stages, we incorporate prior epoch information as additional features. To capture temporal context without requiring long input sequences, we introduce a transition-aware feature derived from the softmax output of the previous epoch, weighted by a learned stage transition matrix. The model combines predictions from memory-based and no-memory networks using a confidence-driven fallback strategy. Results: The proposed model achieves up to 85.4% accuracy and 0.79 Cohen’s kappa, despite using only a single 30 s epoch per prediction. Compared to other models that use a single frontopolar channel, our method outperforms convolutional neural networks, recurrent neural networks, and decision tree approaches. Additionally, confidence-based rejection of low-certainty predictions enhances reliability, since most epochs with low confidence in the sleep stage classification contain transitions between sleep stages. Conclusions: These results demonstrate that the proposed method balances performance, interpretability, and computational efficiency, making it well-suited for real-time clinical and wearable sleep staging applications on battery-powered computing devices.
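The transition-aware feature and confidence fallback lend themselves to a direct sketch; the transition matrix values below are illustrative, not the learned ones.

```python
import numpy as np

def transition_feature(prev_softmax: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Prior over the current stage implied by the previous epoch's softmax
    and a row-stochastic stage-transition matrix T (5x5 for W/N1/N2/N3/REM)."""
    return prev_softmax @ T

def predict_with_fallback(p_memory, p_plain, conf_thresh=0.6):
    """Use the memory-based network unless its confidence is low; then fall
    back to the no-memory network (sketch of the fallback strategy)."""
    if p_memory.max() >= conf_thresh:
        return int(p_memory.argmax()), float(p_memory.max())
    return int(p_plain.argmax()), float(p_plain.max())

# Illustrative transition matrix: rows sum to 1, strong self-transitions.
T = np.array([[.80, .10, .05, .03, .02],
              [.20, .50, .25, .03, .02],
              [.05, .05, .75, .10, .05],
              [.02, .03, .15, .78, .02],
              [.10, .05, .10, .02, .73]])
prior = transition_feature(np.array([.05, .10, .70, .10, .05]), T)
```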

20 pages, 7280 KiB  
Article
UAV-DETR: An Enhanced RT-DETR Architecture for Efficient Small Object Detection in UAV Imagery
by Yu Zhou and Yan Wei
Sensors 2025, 25(15), 4582; https://doi.org/10.3390/s25154582 - 24 Jul 2025
Abstract
To mitigate the technical challenges associated with small-object detection, feature degradation, and spatial-contextual misalignment in UAV-acquired imagery, this paper proposes UAV-DETR, an enhanced Transformer-based object detection model designed for aerial scenarios. Specifically, UAV imagery often suffers from feature degradation due to low resolution and complex backgrounds and from semantic-spatial misalignment caused by dynamic shooting conditions. This work addresses these challenges by enhancing feature perception, semantic representation, and spatial alignment. Architecturally extending the RT-DETR framework, UAV-DETR incorporates three novel modules: the Channel-Aware Sensing Module (CAS), the Scale-Optimized Enhancement Pyramid Module (SOEP), and the newly designed Context-Spatial Alignment Module (CSAM), which integrates the functionalities of contextual and spatial calibration. These components collaboratively strengthen multi-scale feature extraction, semantic representation, and spatial-contextual alignment. The CAS module refines the backbone to improve multi-scale feature perception, while SOEP enhances semantic richness in shallow layers through lightweight channel-weighted fusion. CSAM further optimizes the hybrid encoder by simultaneously correcting contextual inconsistencies and spatial misalignments during feature fusion, enabling more precise cross-scale integration. Comprehensive comparisons with mainstream detectors, including Faster R-CNN and YOLOv5, demonstrate that UAV-DETR achieves superior small-object detection performance in complex aerial scenarios. The performance is thoroughly evaluated in terms of mAP@0.5, parameter count, and computational complexity (GFLOPs). Experiments on the VisDrone2019 dataset benchmark demonstrate that UAV-DETR achieves an mAP@0.5 of 51.6%, surpassing RT-DETR by 3.5% while reducing the number of model parameters from 19.8 million to 16.8 million.
(This article belongs to the Section Remote Sensors)
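A lightweight channel-weighted fusion, read here as BiFPN-style fast normalized fusion, could look like the sketch below; the actual SOEP design may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedScaleFusion(nn.Module):
    """Fast normalized fusion of same-shape feature maps: one learned
    non-negative scalar per input, normalized to sum to one. A plausible
    reading of 'lightweight channel-weighted fusion', not the paper's code."""
    def __init__(self, n_inputs: int):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))

    def forward(self, feats):
        w = F.relu(self.w)                 # keep fusion weights non-negative
        w = w / (w.sum() + 1e-6)           # normalize to a convex combination
        return sum(wi * f for wi, f in zip(w, feats))

fuse = WeightedScaleFusion(2)
out = fuse([torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)])
```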
