Search Results (932)

Search Parameters:
Keywords = efficient channel attention module
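The search keyword refers to the Efficient Channel Attention (ECA) module, which recurs throughout the results below. As orientation, here is a minimal NumPy sketch of the idea, assuming the standard formulation: global average pooling, a small 1-D convolution across channels with no dimensionality reduction, and a sigmoid gate. The fixed averaging kernel here is an illustrative stand-in for the small learned kernel of the real module.

```python
import numpy as np

def eca(x, k=3):
    """Efficient Channel Attention sketch. x: feature map of shape (C, H, W)."""
    c = x.shape[0]
    # 1) squeeze: global average pooling over spatial dimensions -> (C,)
    y = x.mean(axis=(1, 2))
    # 2) 1-D convolution across channels with kernel size k ('same' padding);
    #    a fixed averaging kernel stands in for the learned weights
    pad = k // 2
    yp = np.pad(y, pad, mode="edge")
    w = np.ones(k) / k
    z = np.array([np.dot(yp[i:i + k], w) for i in range(c)])
    # 3) gate: sigmoid, then rescale each channel
    g = 1.0 / (1.0 + np.exp(-z))
    return x * g[:, None, None]
```

Unlike SE blocks, ECA avoids the channel-reducing bottleneck, which is why the papers below favor it for lightweight models.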

32 pages, 2453 KB  
Article
An Improved MSEM-Deeplabv3+ Method for Intelligent Detection of Rock Mass Fractures
by Chi Zhang, Shu Gan, Xiping Yuan, Weidong Luo, Chong Ma and Yi Li
Remote Sens. 2026, 18(7), 1041; https://doi.org/10.3390/rs18071041 - 30 Mar 2026
Abstract
Fractures, as critical discontinuous structural planes in rock masses, directly govern their stability and serve as the core controlling factor in rock mechanics engineering. Existing deep learning models for fracture extraction face persistent challenges, including imbalanced integration of deep and shallow features, limited suppression of background noise, inadequate multi-scale feature representation, and large parameter sizes, making it difficult to strike a balance between detection accuracy and deployment efficiency. Focusing on the Wanshanshan quarry in Yunnan, this study first constructs a high-precision digital model using close-range photogrammetry and 3D real-scene reconstruction. A lightweight yet high-accuracy intelligent detection method, termed MSEM-Deeplabv3+, is then proposed for rock mass fracture extraction. The model adopts lightweight MobileNetV2 as the backbone network, incorporating inverted residual modules and depthwise separable convolutions, resulting in a parameter size of only 6.02 MB and FLOPs of 30.170 G, substantially reducing computational overhead. Furthermore, the proposed MAGF (Multi-Scale Attention Gated Fusion) and SCSA (Spatial-Channel Synergistic Attention) modules are integrated to enhance the representation of fracture details and semantic consistency while effectively suppressing multi-source and multi-scale background interference. Experimental results demonstrate that the proposed model achieves an mPA of 89.69%, mIoU of 83.71%, F1-Score of 90.41%, and Kappa coefficient of 80.81%, outperforming the classic Deeplabv3+ model by 5.81%, 6.18%, 4.53%, and 9.2%, respectively. It also significantly surpasses benchmark models such as U-Net and HRNet. The method accurately captures fine and continuous fracture details, preserves the spatial distribution of long-range continuous fractures, and maintains robust performance on the CFD cross-scene dataset, showcasing strong adaptability and generalization capability.
This approach effectively mitigates the risks associated with manual high-altitude inspections and provides a lightweight, high-precision, non-contact intelligent solution for fracture detection in high-steep rock slopes. Full article
19 pages, 3412 KB  
Article
Attention-Enhanced GAN for Astronomical Image Restoration Under Atmospheric Turbulence and Optical Aberrations
by Chaoyong Peng, Jinlong Li, Jiaqi Bao and Lin Luo
Sensors 2026, 26(7), 2135; https://doi.org/10.3390/s26072135 - 30 Mar 2026
Abstract
Ground-based astronomical images are often degraded by atmospheric turbulence and deterministic optical aberrations introduced by telescope design and manufacturing processes. Joint mitigation of these distortions remains challenging due to the lack of reliable ground-truth data. To address this issue, a physics-based atmospheric–optical imaging model is developed to generate a large-scale, physically consistent simulated dataset, enabling supervised learning without real paired observations. Based on this, an attention-enhanced generative adversarial network (AE-GAN) is proposed for astronomical image restoration. The network incorporates a Channel Attention Block (CAB) and a Semantic Attention Module (SAM) within a feature pyramid architecture to enhance multi-scale representation and suppress turbulence-induced distortions. Experimental results show that the proposed method achieves consistent restoration performance under varying turbulence strengths, aberration amplitudes, and noise levels. Compared with recent Transformer-based methods, it maintains competitive performance across different aberration types while achieving significantly higher computational efficiency (1.21 s per image, 3.5× faster). In addition, the model trained on simulated data generalizes effectively to real astronomical observations. Full article
(This article belongs to the Special Issue Deep Learning Technology and Image Sensing: 2nd Edition)
33 pages, 3496 KB  
Article
Modified RefineNet with Attention-Based Fusion for Multi-Class Classification of Corn and Pepper Plant Diseases
by Maramreddy Srinivasulu and Sandipan Maiti
AgriEngineering 2026, 8(4), 122; https://doi.org/10.3390/agriengineering8040122 - 30 Mar 2026
Abstract
Early and precise detection of plant diseases is essential for safeguarding crop yield and ensuring sustainable agricultural practices. In this study, we propose the Modified RefineNet with Attention-Based Fusion (MoRefNet-AF), a Modified RefineNet architecture enhanced with attention-based fusion for multi-class classification of corn (maize) and pepper leaf diseases. Unlike the original RefineNet, which was segmentation-oriented and computationally heavy, MoRefNet-AF is redesigned for lightweight and discriminative classification. The modifications include replacing standard convolutions with depthwise separable convolutions for efficiency, adopting the Mish activation function for smoother gradient flow, redesigning the multi-resolution fusion module with concatenation and shared convolution for richer cross-scale integration, and incorporating Squeeze-and-Excitation (SE) blocks for adaptive channel recalibration. Additionally, Chained Residual Pooling (CRP) with atrous convolutions enhances contextual representation, while global average pooling with dense layers improves classification readiness. When evaluated on a curated six-class dataset combining PlantVillage and Mendeley leaf disease repositories, MoRefNet-AF achieved 99.88% accuracy, 99.74% precision, 99.73% recall, 99.95% F1-score, and 99.73% specificity. These results outperform strong baselines including ResNet152V2, DenseNet201, EfficientNet-B0, and ConvNeXt-Tiny, while maintaining only 0.3 M parameters. With its compact design and TensorFlow Lite (v2.13) compatibility, MoRefNet-AF offers a robust, lightweight, and real-time deployable solution for precision agriculture and smart plant disease monitoring. Full article
29 pages, 7368 KB  
Article
Method for Emotion Recognition of EEG Signals Based on Recursive Graph and Spatiotemporal Attention Mechanism
by Dong Huang, Lin Xu and Yuwen Li
Brain Sci. 2026, 16(4), 377; https://doi.org/10.3390/brainsci16040377 - 30 Mar 2026
Abstract
Emotion recognition plays a crucial role in human–computer interaction and mental health applications. Traditional Electroencephalogram (EEG)-based emotion recognition methods are limited in classification accuracy due to their neglect of the spatiotemporal characteristics of the signals and individual differences. This study proposes a novel EEG emotion recognition framework that integrates spatiotemporal features to enhance performance through the following innovations: (1) the use of a Recurrence Plot (RP) to transform one-dimensional EEG signals into two-dimensional images, enhancing the representation of nonlinear dynamic features; (2) the design of a Spatiotemporal Channel Attention Module (TCSA), which combines temporal convolution, channel, and spatial attention mechanisms to optimize the capture of complex patterns; and (3) the integration of the lightweight and efficient network Efficientnet to construct the TCSA-Efficientnet classification model. On the Database for Emotion Analysis using Physiological Signals (DEAP) dataset, the proposed method achieves accuracy rates of 99.11% and 99.33% for valence and arousal classification tasks, respectively. On the Database for Emotion Recognition Using EEG and Physiological Signals (DREAMER) dataset, the method achieves accuracy rates of 98.08% and 97.49%, outperforming other EEG-based emotion classification models on both datasets. This demonstrates its advantages in accuracy, robustness, and generalization. Full article
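The Recurrence Plot step described above, which turns a 1-D EEG signal into a 2-D image, can be sketched in its simplest (non-embedded) form as follows; real pipelines typically add time-delay embedding and a tuned threshold, so treat this as an illustration of the principle only.

```python
import numpy as np

def recurrence_plot(signal, eps=None):
    """Turn a 1-D signal into a 2-D binary recurrence image.
    R[i, j] = 1 where |x_i - x_j| <= eps (simplest, non-embedded variant)."""
    x = np.asarray(signal, dtype=float)
    d = np.abs(x[:, None] - x[None, :])  # pairwise distance matrix
    if eps is None:
        eps = 0.1 * np.ptp(x)            # common heuristic: 10% of signal range
    return (d <= eps).astype(np.uint8)
```

The resulting symmetric binary image exposes periodicity and nonlinear dynamics as texture, which is what makes it a suitable input for 2-D CNN backbones such as EfficientNet.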
22 pages, 8847 KB  
Article
DGAGaze: Gaze Estimation with Dual-Stream Differential Attention and Geometry-Aware Temporal Alignment
by Wei Zhang and Pengcheng Li
Appl. Sci. 2026, 16(7), 3298; https://doi.org/10.3390/app16073298 - 29 Mar 2026
Abstract
Gaze estimation plays a crucial role in human-computer interaction and behavior analysis. However, in dynamic scenes, rigid head movements and rapid gaze shifts pose significant challenges to accurate gaze prediction. Most existing methods either process single-frame images independently or rely on long video sequences, making it difficult to simultaneously achieve strong performance and high computational efficiency. To address this issue, we propose DGAGaze, a gaze estimation framework based on a difference-driven spatiotemporal attention mechanism. This framework uses a geometry-aware temporal alignment module to mitigate interference from rigid head movements, compensating for them through pose estimation and affine feature warping, thereby achieving explicit decoupling between global head motion and local eye motion. Based on the aligned features, inter-frame differences are used to adjust spatial and channel attention weights, enhancing motion-sensitive representations without introducing an additional temporal modeling layer. Extensive experiments on the EyeDiap and Gaze360 datasets demonstrate the effectiveness of the proposed approach. DGAGaze achieves improved gaze estimation accuracy while maintaining a lightweight architecture based on a ResNet-18 backbone, outperforming existing state-of-the-art methods. Full article
(This article belongs to the Special Issue Advances in Computer Vision and Digital Image Processing)
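The difference-driven reweighting described in the abstract above can be sketched as follows; this is an illustrative simplification (a per-channel sigmoid gate over the mean absolute inter-frame difference), not the paper's exact formulation.

```python
import numpy as np

def difference_gated_attention(f_prev, f_curr):
    """Reweight current-frame features by inter-frame change, boosting
    motion-sensitive channels (illustrative sketch). Features: (C, H, W)."""
    diff = np.abs(f_curr - f_prev)                         # motion evidence
    gate = 1.0 / (1.0 + np.exp(-diff.mean(axis=(1, 2))))   # per-channel gate
    return f_curr * gate[:, None, None]
```

Note how static content (zero difference) is attenuated by the neutral gate value of 0.5, while channels carrying motion are amplified, without any extra temporal modeling layer.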
24 pages, 4811 KB  
Article
Lightweight Power Line Defect Detection Based on Improved YOLOv8n
by Yuhan Yin, Xiaoyi Liu, Kunxiao Wu, Ruilin Xu, Jianyong Zheng and Fei Mei
Sensors 2026, 26(7), 2112; https://doi.org/10.3390/s26072112 - 28 Mar 2026
Viewed by 51
Abstract
To address the challenges of small targets, severe background clutter, and high deployment cost in UAV-based power-line defect detection, this paper proposes a lightweight defect detection model based on an improved YOLOv8n. In the downsampling stage, we design an improved lightweight adaptive downsampling module (ADownPro) to replace part of the conventional convolutions, which uses a dual-branch parallel structure for stronger feature interaction and depthwise separable convolutions (DSConv) for complexity reduction. In the feature extraction stage, an integration of cross-stage partial connections and partial convolution (CSPPC) is proposed to replace the C2F module for efficient multi-scale feature fusion. In the detection head, mixed local channel attention (MLCA), which combines channel-spatial information and local–global contextual features, is introduced to strengthen defect-focused representations under complex backgrounds. For the loss function, a scale-annealed mixed-quality EIoU loss (SAMQ-EIoU) is proposed by combining iso-center scale transformation, scale factor annealing, and focal-style quality reweighting to improve localization accuracy at high IoU thresholds. Experiments on a constructed dataset covering six typical defect categories show that the improved YOLOv8n achieves 91.4% mAP@0.50 and 64.5% mAP@0.50:0.95, with only 1.59 M parameters and 4.9 GFLOPs. Compared with mainstream detectors, the proposed model achieves a better balance between detection accuracy and lightweight design. In particular, compared with the recently proposed YOLOv8n-DSN and IDD-YOLO, it improves mAP@0.50 by 0.6% and 0.8%, and mAP@0.50:0.95 by 1.2% and 4.8%, respectively, while further reducing the parameter count by 1.00 M and 1.26 M, and the FLOPs by 1.7 G and 0.2 G. Moreover, the cross-dataset evaluation on the public UPID and SFID datasets further demonstrates the robustness and generalization ability of the proposed method. Full article
(This article belongs to the Section Fault Diagnosis & Sensors)
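The depthwise separable convolutions (DSConv) mentioned above reduce complexity in a way that is easy to quantify. A quick parameter-count comparison (bias terms omitted) for a k×k convolution from c_in to c_out channels:

```python
def conv_params(c_in, c_out, k):
    """Parameter counts (no bias) for a k x k convolution layer:
    standard conv vs. depthwise separable (depthwise + 1x1 pointwise)."""
    standard = c_in * c_out * k * k
    separable = c_in * k * k + c_in * c_out  # depthwise + pointwise
    return standard, separable
```

For c_in = 64, c_out = 128, k = 3 this gives 73,728 vs. 8,768 parameters, roughly an 8.4× reduction, which is the kind of saving that lets models like the one above stay under 2 M parameters.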
27 pages, 6255 KB  
Article
Lightweight Safety Helmet Wearing Detection Algorithm Based on GSA-YOLO
by Haodong Wang, Qiang Zhou, Zhiyuan Hao, Wentao Xiao and Luqing Yan
Sensors 2026, 26(7), 2110; https://doi.org/10.3390/s26072110 - 28 Mar 2026
Viewed by 62
Abstract
Electric power station confined spaces are high-risk and complex environments characterized by significant illumination variations. Whether safety helmets are properly worn directly affects the operational safety of workers in confined spaces. However, helmet detection in such environments faces several challenges, including drastic lighting changes and difficulties in small-object detection. Moreover, existing object detection models typically contain a large number of parameters, making real-time helmet detection difficult to deploy on field devices with limited computational resources. To address these issues, this paper proposes a lightweight safety helmet wearing detection algorithm named GSA-YOLO. To mitigate the effects of severe illumination variation and detail loss in confined spaces, a GCA-C2f module integrating GhostConv and the CBAM attention mechanism is embedded into the backbone network. This design reduces the number of parameters and computational cost while enhancing the model’s feature extraction capability under challenging lighting conditions. To improve detection performance for occluded targets, an improved efficient channel attention (I-ECA) mechanism is introduced into the neck structure, which suppresses irrelevant channel features and enhances occluded object detection accuracy. Furthermore, to alleviate missed detections of small objects and inaccurate localization under low-light conditions, a P2 detection branch is added to the head, and the WIoU loss function is adopted to dynamically adjust the weights of hard and easy samples, thereby improving small-object detection accuracy and localization robustness. A confined space helmet detection dataset containing 5000 images was constructed through on-site data collection for model training and validation. 
Experimental results demonstrate that the proposed GSA-YOLO achieves an mAP@0.5 of 91.2% on the self-built dataset with only 2.3 M parameters, outperforming the baseline model by 2.9% while reducing the parameter count by 23.6%. The experimental results verify that the proposed algorithm is suitable for environments with significant illumination variation and small-object detection challenges. It provides a lightweight and efficient solution for on-site helmet detection in confined space scenarios, thereby contributing to the reduction in industrial safety accidents. Full article
(This article belongs to the Section Sensing and Imaging)
24 pages, 1020 KB  
Article
Research on the Diagnosis of Abnormal Sound Defects in Automobile Engines Based on Fusion of Multi-Modal Images and Audio
by Yi Xu, Wenbo Chen and Xuedong Jing
Electronics 2026, 15(7), 1406; https://doi.org/10.3390/electronics15071406 - 27 Mar 2026
Viewed by 168
Abstract
Against the backdrop of the global carbon neutrality target, predictive maintenance (PdM) of automotive engines represents a core technical strategy to advance the sustainable development of the automotive industry. Conventional single-modal diagnostic approaches for engine abnormal sound defects suffer from low accuracy and weak anti-interference capability. Existing multi-modal fusion methods fail to deeply mine the physical coupling between cross-modal features and often entail excessive model complexity, hindering deployment on resource-constrained on-board edge devices. To resolve these limitations, this study proposes a Physical Prior-Embedded Cross-Modal Attention (PPE-CMA) mechanism for lightweight multi-modal fusion diagnosis of engine abnormal sound defects. First, wavelet packet decomposition (WPD) and mel-frequency cepstral coefficients (MFCC) are integrated to extract time-frequency features from engine audio signals, while a channel-pruned ResNet18 is employed to extract spatial features from engine thermal imaging and vibration visualization images. Second, the PPE-CMA module is designed to adaptively assign attention weights to audio and image features by exploiting the physical coupling between engine fault acoustic and visual characteristics, enabling efficient cross-modal feature fusion with redundant information suppression. A rigorous theoretical derivation is provided to link cosine similarity with the physical correlation of engine fault acoustic-visual features, justifying the attention weight constraint (β = 1 − α) from the perspective of fault feature physical coupling. Third, an improved lightweight XGBoost classifier is constructed for fault classification, and a hybrid data augmentation strategy customized for engine multi-modal data is proposed to address the small-sample challenge in industrial applications.
Ablation experiments on ResNet18 pruning ratios verify the optimal trade-off between diagnostic performance and computational efficiency, while feature distribution analysis validates the authenticity and effectiveness of the hybrid augmentation strategy. Experimental results on a self-constructed multi-modal dataset show that the proposed method achieves 98.7% diagnostic accuracy and a 98.2% F1-score, retaining 96.5% accuracy under 90 dB high-level environmental noise, with an end-to-end inference speed of 0.8 ms per sample (including preprocessing, feature extraction, and classification). Cross-engine and cross-domain validation on a 2.0T diesel engine small-sample dataset and the open-source SEMFault-2024 dataset yield average accuracies of 94.8% and 95.2%, respectively, demonstrating strong generalization. This method effectively enhances the accuracy and robustness of engine abnormal sound defect diagnosis, offering a lightweight technical solution for on-board real-time fault diagnosis and in-plant online quality inspection. By reducing engine fault-induced energy loss and spare parts waste, it further promotes energy conservation and emission reduction in the automotive industry. Quantified experimental data on fuel efficiency improvement and carbon emission reduction are provided to substantiate the ecological benefits of the proposed framework. Full article
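The attention weight constraint β = 1 − α tied to cosine similarity can be illustrated with a toy mapping. The linear map from similarity to α below is a hypothetical stand-in for the paper's actual derivation, shown only to make the constraint concrete.

```python
import numpy as np

def modal_weights(audio_feat, image_feat):
    """Toy sketch of the beta = 1 - alpha constraint: the audio weight alpha
    grows with cross-modal cosine similarity (illustrative mapping only)."""
    a = np.asarray(audio_feat, dtype=float)
    v = np.asarray(image_feat, dtype=float)
    cos = a @ v / (np.linalg.norm(a) * np.linalg.norm(v))
    alpha = 0.5 * (1.0 + cos)      # map similarity in [-1, 1] to [0, 1]
    return alpha, 1.0 - alpha      # (alpha, beta) always sum to 1
```

The point of the constraint is that the two modal weights form a convex combination: when the modalities agree, more weight flows to one branch, and the total attention mass stays fixed.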
23 pages, 1270 KB  
Article
A Band-Aware Riemannian Network with Domain Adaptation for Motor Imagery EEG Signal Decoding
by Zhehan Wang, Yuliang Ma, Yicheng Du and Qingshan She
Brain Sci. 2026, 16(4), 363; https://doi.org/10.3390/brainsci16040363 - 27 Mar 2026
Viewed by 180
Abstract
Background: The decoding of motor imagery electroencephalography (MI-EEG) is constrained by core issues including low signal-to-noise ratio (SNR) and cross-session as well as cross-subject domain shift, which seriously impedes the practical deployment of brain–computer interfaces (BCIs). Methods: To address these challenges, this paper proposes a novel end-to-end MI-EEG decoding method named BARN-DA. Two innovative modules, Band-Aware Channel Attention (BACA) and Multi-Scale Kernel Perception (MSKP), are designed: one enhances discriminative channel features by modeling channel information fused with frequency band feature representation, and the other captures complex data correlations via multi-scale parallel convolutions to improve the discriminability of the network’s feature extraction. Subsequently, the features are mapped onto the Riemannian manifold. For the source and target domain features residing on this manifold, a Riemannian Maximum Mean Discrepancy (R-MMD) loss is designed based on the log-Euclidean metric. This approach enables the effective embedding of Symmetric Positive Definite (SPD) matrices into the Reproducing Kernel Hilbert Space (RKHS), thereby reducing cross-domain discrepancies. Results: Experimental results on four public datasets demonstrate that the BARN-DA method achieves average cross-session classification accuracies of 84.65% ± 8.97% (BCIC IV 2a), 89.19% ± 7.69% (BCIC IV 2b), and 61.76% ± 12.68% (SHU), as well as average cross-subject classification accuracies of 65.49% ± 11.64% (BCIC IV 2a), 78.78% ± 8.44% (BCIC IV 2b), and 78.14% ± 14.41% (BCIC III 4a). Compared with state-of-the-art methods, BARN-DA obtains higher classification accuracy and stronger cross-session and cross-subject generalization ability. Conclusions: These results confirm that BARN-DA effectively alleviates low SNR and domain shift problems in MI-EEG decoding, providing an efficient technical solution for practical BCI systems. Full article
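The log-Euclidean metric underlying the R-MMD loss above has a direct closed form for SPD matrices: d(A, B) = ||log A − log B||_F, with the matrix logarithm computed via an eigendecomposition. A minimal sketch:

```python
import numpy as np

def logm_spd(m):
    """Matrix logarithm of a symmetric positive-definite matrix
    via eigendecomposition: log(M) = V diag(log(lambda)) V^T."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.log(vals)) @ vecs.T

def log_euclidean_dist(a, b):
    """d(A, B) = || log(A) - log(B) ||_F, the metric behind the R-MMD loss."""
    return np.linalg.norm(logm_spd(a) - logm_spd(b), "fro")
```

Because the log map flattens the SPD manifold into a vector space, covariance-like EEG features can then be compared (and kernelized in an RKHS) with ordinary Euclidean machinery.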
24 pages, 15151 KB  
Article
SG-YOLO: A Multispectral Small-Object Detector for UAV Imagery Based on YOLO
by Binjie Zhang, Lin Wang, Quanwei Yao, Keyang Li and Qinyan Tan
Remote Sens. 2026, 18(7), 1003; https://doi.org/10.3390/rs18071003 - 27 Mar 2026
Viewed by 186
Abstract
Object detection in unmanned aerial vehicle (UAV) imagery remains a crucial yet challenging task due to complex backgrounds, large scale variations, and the prevalence of small objects. Visible-spectrum images lack robustness under all-weather and all-illumination conditions; by contrast, multispectral sensing provides complementary cues (e.g., thermal signatures) that improve detection robustness. However, existing multispectral solutions often incur high computational costs and are therefore difficult to deploy on resource-constrained UAV platforms. To address these issues, SG-YOLO is proposed, a lightweight and efficient multispectral object detection framework that aims to balance accuracy and efficiency. First, a Spectral Gated Downsampling Stem (SGDS) is designed, in which grouped convolutions and a gating mechanism are employed at the early stage of the network to extract band-specific features, thereby maximizing spectral complementarity while minimizing redundancy. Second, a Spectral–Spatial Iterative Attention Fusion (SSIAF) module is introduced, in which spectral-wise (channel) attention and spatial-wise attention are iteratively coupled and cascaded in a multi-scale manner to jointly model cross-band dependencies and spatial saliency, thereby aggregating high-level semantic information while suppressing redundant spectral responses. Finally, a Spatial–Channel Synergistic Fusion (SCSF) module is designed to enhance multi-scale and cross-channel feature integration in the neck. Experiments on the MODA dataset show that SG-YOLO achieves 72.4% mAP50, outperforming the baseline by 3.2%. Moreover, compared with a range of mainstream one-stage detectors and multispectral detection methods, SG-YOLO delivers the best overall performance, providing an effective solution for UAV object detection while maintaining a favorable trade-off between model size and detection accuracy. Full article
25 pages, 9555 KB  
Article
EFSL-YOLO: An Improved Model for Small Object Detection in UAV Vision
by Meng Zhou, Shuke He, Chang Wang and Jing Wang
Drones 2026, 10(4), 243; https://doi.org/10.3390/drones10040243 - 27 Mar 2026
Viewed by 109
Abstract
To address the challenges in UAV remote sensing imagery, such as small object size, dense occlusion and complex background interference, this paper proposes an enhanced small object detection algorithm based on an improved YOLOv13 model for drone applications in complex weather environments. First, an enhanced feature fusion attention network (EFFA-Net) is designed in the preprocessing stage to reduce image degradation and suppress the interference caused by smoke and haze. Then, in the backbone, a swish-gated convolution (SwiGLUConv) module is designed to adaptively expand the receptive field and enhance multi-scale feature extraction, which strengthens the representation of small targets while maintaining efficient computation. Furthermore, a locally enhanced multi-scale context fusion (LF-MSCF) module is integrated into the feature fusion neck of YOLO, combining multi-head self-attention, channel attention, and spatial attention to suppress background noise and redundant responses, thereby improving detection accuracy. Extensive experiments on the VisDrone-DET2019 dataset, UAVDT dataset, and HazyDet dataset demonstrate that the proposed algorithm outperforms other mainstream methods, showcasing excellent detection accuracy and robustness in complex UAV aerial scenarios. Full article
25 pages, 8205 KB  
Article
Forest Road Extraction via Optimized DeepLabv3+ and Multi-Temporal Remote Sensing for Wildfire Emergency Response
by Zhuoran Gao, Ziyang Li, Weiyuan Yao, Tingtao Zhang, Shi Qiu and Zhaoyan Liu
Appl. Sci. 2026, 16(7), 3228; https://doi.org/10.3390/app16073228 - 26 Mar 2026
Viewed by 251
Abstract
Forest fires occur frequently in China; however, the complex terrain and incomplete road networks severely constrain ground rescue efficiency. Accurate forest road information is essential for the optimization of emergency response and rescue force deployment. Existing road extraction algorithms are primarily designed for urban environments and exhibit limited efficacy in forest scenarios due to dense canopy, complex background interference and specific forest road features. To address this gap, this study proposes a forest road extraction method based on an enhanced DeepLabv3+ model using multi-temporal, high-resolution satellite imagery. Specifically, a Multi-Scale Channel Attention (MCSA) mechanism is embedded in skip connections to suppress background interference, while strip pooling is integrated into the Atrous Spatial Pyramid Pooling (ASPP) module to better capture slender road features. A composite Focal-Dice loss function is also constructed to mitigate sample imbalance. Finally, by applying the model in multi-temporal remote sensing images, a fusion strategy is introduced to integrate multi-seasonal road masks to enhance overall accuracy and topological integrity. Experimental results show that the proposed method achieves a precision of 54.1%, an F1-Score of 59.3%, and an IoU of 41.8%, effectively enhancing road continuity and providing robust technical support for fire-rescue decision-making. Full article
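The composite Focal-Dice loss mentioned above combines a hard-example-weighted cross-entropy term with a region-overlap term to counter the severe foreground/background imbalance of thin roads. A minimal binary-mask sketch, where the mixing weight w and focusing parameter γ are illustrative defaults rather than the paper's values:

```python
import numpy as np

def focal_dice_loss(p, t, gamma=2.0, w=0.5, eps=1e-7):
    """Composite Focal + Dice loss sketch for imbalanced binary masks.
    p: predicted probabilities, t: {0, 1} targets, both flattened arrays."""
    p = np.clip(p, eps, 1 - eps)
    pt = np.where(t == 1, p, 1 - p)                 # prob. of the true class
    focal = np.mean(-((1 - pt) ** gamma) * np.log(pt))  # down-weights easy pixels
    dice = 1 - (2 * np.sum(p * t) + eps) / (np.sum(p) + np.sum(t) + eps)
    return w * focal + (1 - w) * dice
```

The focal term keeps gradients flowing from hard pixels, while the Dice term directly optimizes overlap, which matters when road pixels are a tiny fraction of the image.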
28 pages, 1349 KB  
Article
HAAU-Net: Hybrid Adaptive Attention U-Net Integrated with Context-Aware Morphologically Stable Features for Real-Time MRI Brain Tumor Detection and Segmentation
by Muhammad Adeel Asghar, Sultan Shoaib and Muhammad Zahid
Tomography 2026, 12(4), 44; https://doi.org/10.3390/tomography12040044 - 25 Mar 2026
Viewed by 139
Abstract
Background: Magnetic Resonance Imaging (MRI)-based tumor segmentation remains a challenging problem in medical imaging due to tumor heterogeneity, unpredictable morphological features, and the high computational cost of deploying it in clinical practice, which places it beyond the scope of real-time applications. Although neural networks have significantly improved segmentation performance, they still struggle to capture morphological tumor features while maintaining computational efficiency. This work introduces the Hybrid Adaptive Attention U-Net (HAAU-Net) framework, combining context-aware morphologically stable features and spatial-channel attention to achieve high-quality tumor segmentation at lower computational cost. Methods: The proposed HAAU-Net framework integrates multi-scale Adaptive Attention Blocks (AAB), a Context-Aware Morphological Feature Module (CAMFM), and a Spatial-Channel Hybrid Attention Mechanism (SCHAM). CAMFM maintains the stability of morphological features through hierarchical aggregation and dynamic normalization of features. SCHAM enhances feature representation by modelling the channels and spatial regions whose strongest features are used for segmentation. On the BraTS 2022/2023 data, the proposed HAAU-Net is evaluated using four modalities: T1, T1GD, T2, and T2-FLAIR sequences. Results: The proposed model obtains 96.8% segmentation accuracy with a Dice coefficient of 0.89 on the whole tumor region, outperforming the alternative U-Net (0.83) and conventional CNN segmentation methods (0.81). The proposed HAAU-Net architecture cuts the computational complexity of standard deep learning models by 43% while still achieving real-time inference (28 FPS on a regular GPU). The hybrid model used to predict survival has a C-Index of 0.91, higher than that of traditional SVM-based methods (0.72).
Conclusions: Combining spatial-channel attention with morphologically stable features yields clinically meaningful interpretability in the attention maps. The proposed framework significantly improves segmentation performance while maintaining computational efficiency. The system shows strong potential for AI-enabled clinical decision support and early prognostic diagnosis in neuro-oncology, with practical deployment capability. Full article
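The Dice coefficient quoted above (0.89 for the whole tumor region) is the standard overlap score between a predicted and a ground-truth segmentation mask. A minimal pure-Python sketch over hypothetical flattened binary masks (the masks and values here are illustrative, not drawn from the paper):

```python
def dice_coefficient(pred, truth):
    """Dice = 2*|P intersect T| / (|P| + |T|) for flat binary masks."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 2.0 * intersection / total if total else 1.0

# Hypothetical 1-D binary masks standing in for flattened segmentations.
pred = [1, 1, 0, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 1, 1, 0, 1, 0]
print(dice_coefficient(pred, truth))  # → 0.75
```

Unlike plain pixel accuracy, Dice is insensitive to the large true-negative background, which is why segmentation papers report it alongside accuracy.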

20 pages, 3760 KB  
Article
Feature-Enhanced Diffusion Model for Text-Guided Sound Effect Generation
by Wei Wan, Lin Jiang, Xiangyang Miao, Yun Fang and Dongfeng Ye
Electronics 2026, 15(7), 1358; https://doi.org/10.3390/electronics15071358 - 25 Mar 2026
Abstract
This study proposes a feature-enhanced diffusion model based on wavelet transform and Mamba to address the issues of low audio realism, inadequate text relevance, and slow inference speed in text-guided sound effect generation. A wavelet transform-based downsampling module is designed to mitigate the loss of high-frequency feature information during the downsampling process of the diffusion model, thereby enhancing the realism of the generated audio. A multi-scale feature extraction and fusion method is employed to capture both local and global acoustic information, while the channel attention mechanism further strengthens the model’s focus on text-relevant key features. Additionally, an optimization method based on Mamba and adaptive weight adjustment is proposed, which takes advantage of Mamba’s efficient information processing mechanism and learnable parameters to optimize skip connections, improving model training and inference efficiency without adding substantial computational cost. Experiments show that the model achieves FAD and KL scores of 1.608 and 1.609, respectively, reflecting improvements of 33.8% and 26.1% compared to the baseline model. Full article
(This article belongs to the Special Issue Digital Intelligence Technology and Applications, 2nd Edition)
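The wavelet-based downsampling idea above — keeping high-frequency detail coefficients alongside the low-pass band so that downsampling does not discard fine texture — can be illustrated with one level of the 1-D Haar transform. This is a generic sketch of the underlying transform under that assumption, not the authors' module:

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform: split a length-2n
    signal into n approximation (low-pass) and n detail (high-pass)
    coefficients, so halving the resolution keeps the high-frequency
    information in the detail band instead of discarding it."""
    s = 1 / math.sqrt(2)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) * s for a, b in pairs]
    detail = [(a - b) * s for a, b in pairs]
    return approx, detail

def haar_inverse(approx, detail):
    """Perfect reconstruction from the two coefficient bands."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out += [(a + d) * s, (a - d) * s]
    return out

x = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 0.0]  # hypothetical signal
lo, hi = haar_step(x)
assert all(abs(u - v) < 1e-9 for u, v in zip(haar_inverse(lo, hi), x))
```

Because the transform is invertible, a network that carries both bands through its downsampling path loses no information, in contrast to strided convolution or pooling alone.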

25 pages, 9448 KB  
Article
SeaLSOD-YOLO: A Lightweight Framework for Maritime Small Object Detection Using YOLOv11
by Jinjia Ruan, Jin He and Yao Tong
Sensors 2026, 26(7), 2017; https://doi.org/10.3390/s26072017 - 24 Mar 2026
Abstract
Maritime small object detection is critical for UAV-based sea surveillance but remains challenging due to the small size of targets and interference from sea reflections and waves. This paper proposes SeaLSOD-YOLO, a lightweight detection algorithm based on YOLOv11, designed to improve small object detection accuracy while maintaining real-time performance. The method incorporates four key modules: Shallow Multi-scale Output Reconstruction, which fuses shallow and mid-level features to preserve fine-grained details; SPPF-FD, which combines spatial pyramid pooling with frequency-domain adaptive convolution to enhance sensitivity to high-frequency textures and suppress sea-surface interference; attention-based feature fusion, which emphasizes small object features through channel and spatial attention; and dynamic multi-scale sampling, which optimizes feature representation across different scales. Experiments on the SeaDroneSee dataset demonstrate that, compared with YOLOv11s, the proposed method improves precision from 75.6% to 81.9%, recall from 62.6% to 73.5%, and mAP@0.5 from 67.9% to 77.0%. The mAP@0.5:0.95 also increases from 41.1% to 44.9%. The model achieves an inference speed of 256 FPS. Although the parameter size increases from 18.2 MB to 30.8 MB, the method maintains a favorable balance between detection accuracy and computational efficiency. Comparative evaluation further shows superior performance in detecting small maritime objects such as buoys and lifeboats. These results indicate that SeaLSOD-YOLO effectively balances accuracy, efficiency, and real-time capability in complex maritime environments. Future work will focus on further optimization of attention mechanisms and upsampling strategies to enhance the detection of extremely small targets. Full article
(This article belongs to the Section Communications)
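The mAP@0.5 figures above treat a detection as correct when its predicted box overlaps a ground-truth box with an intersection-over-union (IoU) of at least 0.5. A minimal sketch of box IoU over hypothetical coordinates:

```python
def iou(box_a, box_b):
    """IoU for axis-aligned boxes given as (x1, y1, x2, y2).
    mAP@0.5 counts a detection as a true positive when IoU >= 0.5."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.1429
```

The stricter mAP@0.5:0.95 metric averages this check over IoU thresholds from 0.5 to 0.95 in steps of 0.05, which is why it is much harder to improve for small objects whose boxes tolerate little localization error.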
