Search Results (326)

Search Parameters:
Keywords = multilevel feature fusion

23 pages, 5667 KiB  
Article
MEFA-Net: Multilevel Feature Extraction and Fusion Attention Network for Infrared Small-Target Detection
by Jingcui Ma, Nian Pan, Dengyu Yin, Di Wang and Jin Zhou
Remote Sens. 2025, 17(14), 2502; https://doi.org/10.3390/rs17142502 - 18 Jul 2025
Viewed by 209
Abstract
Infrared small-target detection encounters significant challenges due to a low image signal-to-noise ratio, limited target size, and complex background noise. To address the issues of sparse feature loss for small targets during the down-sampling phase of the traditional U-Net network and the semantic gap in the feature fusion process, a multilevel feature extraction and fusion attention network (MEFA-Net) is designed. Specifically, the dilated direction-sensitive convolution block (DDCB) is devised to collaboratively extract local detail features, contextual features, and Gaussian salient features via ordinary convolution, dilated convolution, and parallel strip convolution. Furthermore, the encoder attention fusion module (EAF) is employed, where spatial and channel attention weights are generated using dual-path pooling to achieve the adaptive fusion of deep and shallow layer features. Lastly, an efficient up-sampling block (EUB) is constructed, integrating a hybrid up-sampling strategy with multi-scale dilated convolution to refine the localization of small targets. The experimental results confirm that the proposed model surpasses most recent existing methods. Compared with the baseline, the intersection over union (IoU) and probability of detection (Pd) of MEFA-Net on the IRSTD-1k dataset are increased by 2.25% and 3.05%, respectively, achieving better detection performance and a lower false alarm rate in complex scenarios. Full article
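
As a rough illustration of the DDCB idea described above (parallel ordinary, dilated, and strip convolutions whose responses are combined), here is a minimal PyTorch sketch. The kernel sizes, dilation rate, and residual connection are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class DDCBSketch(nn.Module):
    """Hypothetical dilated direction-sensitive convolution block: parallel
    ordinary, dilated, and strip convolutions whose outputs are summed.
    Layer sizes are illustrative, not taken from the paper."""
    def __init__(self, ch, dilation=2, k=5):
        super().__init__()
        self.local = nn.Conv2d(ch, ch, 3, padding=1)                              # local detail
        self.context = nn.Conv2d(ch, ch, 3, padding=dilation, dilation=dilation)  # context
        self.strip_h = nn.Conv2d(ch, ch, (1, k), padding=(0, k // 2))             # horizontal strip
        self.strip_v = nn.Conv2d(ch, ch, (k, 1), padding=(k // 2, 0))             # vertical strip
        self.bn = nn.BatchNorm2d(ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.local(x) + self.context(x) + self.strip_h(x) + self.strip_v(x)
        return self.act(self.bn(y)) + x   # residual connection (assumed)

feat = torch.randn(1, 32, 64, 64)
print(DDCBSketch(32)(feat).shape)  # torch.Size([1, 32, 64, 64])
```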

21 pages, 3826 KiB  
Article
UAV-OVD: Open-Vocabulary Object Detection in UAV Imagery via Multi-Level Text-Guided Decoding
by Lijie Tao, Guoting Wei, Zhuo Wang, Zhaoshuai Qi, Ying Li and Haokui Zhang
Drones 2025, 9(7), 495; https://doi.org/10.3390/drones9070495 - 14 Jul 2025
Viewed by 319
Abstract
Object detection in drone-captured imagery has attracted significant attention due to its wide range of real-world applications, including surveillance, disaster response, and environmental monitoring. The majority of existing methods are developed under closed-set assumptions; although some recent studies have begun to explore open-vocabulary or open-world detection, their application to UAV imagery remains limited and underexplored. In this paper, we address this limitation by exploring the relationship between images and textual semantics to extend object detection in UAV imagery to an open-vocabulary setting. We propose a novel and efficient detector named Unmanned Aerial Vehicle Open-Vocabulary Detector (UAV-OVD), specifically designed for drone-captured scenes. To facilitate open-vocabulary object detection, we propose improvements from three complementary perspectives. First, at the training level, we design a region–text contrastive loss to replace the conventional classification loss, allowing the model to align visual regions with textual descriptions beyond fixed category sets. Second, at the structural level, we introduce a multi-level text-guided fusion decoder that integrates visual features across multiple spatial scales under language guidance, thereby improving overall detection performance and enhancing the representation and perception of small objects. Finally, from the data perspective, we enrich the original dataset with synonym-augmented category labels, enabling more flexible and semantically expressive supervision. Experiments conducted on two widely used benchmark datasets demonstrate that our approach achieves significant improvements in both mAP and Recall. For instance, for zero-shot detection on xView, UAV-OVD achieves 9.9 mAP and 67.3 Recall, 1.1 and 25.6 higher than those of YOLO-World. In terms of speed, UAV-OVD achieves 53.8 FPS, nearly twice as fast as YOLO-World and five times faster than DetrReg, demonstrating its strong potential for real-time open-vocabulary detection in UAV imagery. Full article
(This article belongs to the Special Issue Applications of UVs in Digital Photogrammetry and Image Processing)
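
The region–text contrastive loss mentioned in the abstract can be illustrated with a short sketch: each region embedding is pulled toward the text embedding of its ground-truth category and contrasted against the remaining category texts. The temperature value and cross-entropy formulation are assumptions, not the authors' exact loss.

```python
import torch
import torch.nn.functional as F

def region_text_contrastive_loss(region_emb, text_emb, labels, tau=0.07):
    """Minimal sketch of a region-text contrastive loss (assumed form, not the
    authors' implementation). region_emb: (R, D) region features, text_emb:
    (C, D) category-name embeddings, labels: (R,) ground-truth category ids."""
    region_emb = F.normalize(region_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = region_emb @ text_emb.t() / tau   # (R, C) scaled cosine similarities
    return F.cross_entropy(logits, labels)

regions = torch.randn(8, 256)      # 8 detected regions (hypothetical)
texts = torch.randn(20, 256)       # 20 synonym-augmented category embeddings
labels = torch.randint(0, 20, (8,))
print(region_text_contrastive_loss(regions, texts, labels))
```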

26 pages, 2178 KiB  
Article
Cross-Modal Fake News Detection Method Based on Multi-Level Fusion Without Evidence
by Ping He, Hanxue Zhang, Shufu Cao and Yali Wu
Algorithms 2025, 18(7), 426; https://doi.org/10.3390/a18070426 - 10 Jul 2025
Viewed by 284
Abstract
Although multimodal feature fusion can integrate complementary information from different modalities for fake news detection, the semantic inconsistency of multimodal features makes fusion difficult, and a single fusion step risks information loss. In addition, while incorporating external evidence can improve detection, such evidence is obtained with a lag, its reliability and completeness are hard to guarantee, and it may introduce additional noise that interferes with the model's judgment. Therefore, a cross-modal fake news detection method based on evidence-free multilevel fusion (CM-MLF) is proposed. The method resolves semantic inconsistency through cross-modal alignment and uses attention mechanisms to perform multilevel fusion of text and image features without the assistance of external evidential features, further enhancing the expressive power of the features. Experiments show that the method achieves better detection results on multiple benchmark datasets, effectively improving the accuracy and robustness of cross-modal fake news detection. Full article
(This article belongs to the Special Issue Algorithms for Feature Selection (3rd Edition))
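
A minimal sketch of the kind of evidence-free, attention-based multilevel fusion the abstract describes: text and image features are first aligned into a shared space and then fused in two attention stages. Dimensions, projection layers, and the number of fusion levels are assumptions.

```python
import torch
import torch.nn as nn

class CrossModalFusionSketch(nn.Module):
    """Illustrative evidence-free fusion of text and image features with attention.
    Dimensions, projections, and the number of fusion levels are assumptions."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.align_text = nn.Linear(768, dim)    # project text tokens into a shared space
        self.align_image = nn.Linear(2048, dim)  # project image regions into the same space
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(dim, 2)      # real / fake

    def forward(self, text_tokens, image_regions):
        t = self.align_text(text_tokens)         # (B, Lt, dim)
        v = self.align_image(image_regions)      # (B, Lv, dim)
        fused, _ = self.cross_attn(t, v, v)      # level 1: text queries attend to image regions
        seq = torch.cat([fused, v], dim=1)       # level 2: joint self-attention over both
        joint, _ = self.self_attn(seq, seq, seq)
        return self.classifier(joint.mean(dim=1))

model = CrossModalFusionSketch()
print(model(torch.randn(2, 32, 768), torch.randn(2, 49, 2048)).shape)  # torch.Size([2, 2])
```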

22 pages, 696 KiB  
Article
Domain Knowledge-Driven Method for Threat Source Detection and Localization in the Power Internet of Things
by Zhimin Gu, Jing Guo, Jiangtao Xu, Yunxiao Sun and Wei Liang
Electronics 2025, 14(13), 2725; https://doi.org/10.3390/electronics14132725 - 7 Jul 2025
Viewed by 302
Abstract
Although the Power Internet of Things (PIoT) significantly improves operational efficiency by enabling real-time monitoring, intelligent control, and predictive maintenance across the grid, its inherently open and deeply interconnected cyber-physical architecture concurrently introduces increasingly complex and severe security threats. Existing IoT security solutions are not fully adapted to the specific requirements of power systems, such as safety-critical reliability, protocol heterogeneity, physical/electrical context awareness, and the incorporation of domain-specific operational knowledge unique to the power sector. These limitations often lead to high false positives (flagging normal operations as malicious) and false negatives (failing to detect actual intrusions), ultimately compromising system stability and security response. To address these challenges, we propose a domain knowledge-driven threat source detection and localization method for the PIoT. The proposed method combines multi-source features—including electrical-layer measurements, network-layer metrics, and behavioral-layer logs—into a unified representation through a multi-level PIoT feature engineering framework. Building on advances in multimodal data integration and feature fusion, our framework employs a hybrid neural architecture combining the TabTransformer to model structured physical and network-layer features with BiLSTM to capture temporal dependencies in behavioral log sequences. This design enables comprehensive threat detection while supporting interpretable and fine-grained source localization. Experiments on a real-world Power Internet of Things (PIoT) dataset demonstrate that the proposed method achieves high detection accuracy and enables the actionable attribution of attack stages aligned with the MITRE Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) framework. The proposed approach offers a scalable and domain-adaptable foundation for security analytics in cyber-physical power systems. Full article
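
A loose sketch of the described hybrid architecture: a transformer encoder over structured electrical/network features running in parallel with a BiLSTM over behavioral log sequences, with the two representations fused for classification. Feature sizes, the log vocabulary, and the head layout are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

class PIoTHybridSketch(nn.Module):
    """Rough sketch of a TabTransformer-style encoder for structured features
    plus a BiLSTM for behavioral logs; all dimensions are assumptions."""
    def __init__(self, n_struct=24, d_model=64, n_log_tokens=500, n_classes=5):
        super().__init__()
        self.struct_proj = nn.Linear(1, d_model)  # one token per structured feature
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.tab_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.log_embed = nn.Embedding(n_log_tokens, d_model)
        self.bilstm = nn.LSTM(d_model, d_model, batch_first=True, bidirectional=True)
        self.head = nn.Linear(d_model + 2 * d_model, n_classes)

    def forward(self, struct_feats, log_ids):
        # struct_feats: (B, n_struct) physical/network measurements
        # log_ids: (B, T) tokenized behavioral log events
        tab = self.tab_encoder(self.struct_proj(struct_feats.unsqueeze(-1))).mean(dim=1)
        seq, _ = self.bilstm(self.log_embed(log_ids))
        return self.head(torch.cat([tab, seq[:, -1]], dim=-1))

model = PIoTHybridSketch()
print(model(torch.randn(2, 24), torch.randint(0, 500, (2, 30))).shape)  # torch.Size([2, 5])
```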

26 pages, 9083 KiB  
Article
An Efficient Fine-Grained Recognition Method Enhanced by Res2Net Based on Dynamic Sparse Attention
by Qifeng Niu, Hui Wang and Feng Xu
Sensors 2025, 25(13), 4147; https://doi.org/10.3390/s25134147 - 3 Jul 2025
Viewed by 322
Abstract
Fine-grained recognition tasks face significant challenges in differentiating subtle, class-specific details against cluttered backgrounds. This paper presents an efficient architecture built upon the Res2Net backbone, significantly enhanced by a dynamic Sparse Attention mechanism. The core approach leverages the inherent multi-scale representation power of Res2Net to capture discriminative patterns across different granularities. Crucially, the integrated Sparse Attention module operates dynamically, selectively amplifying the most informative features while attenuating irrelevant background noise and redundant details. This combined strategy substantially improves the model’s ability to focus on pivotal regions critical for accurate classification. Furthermore, strategic architectural optimizations are applied throughout to minimize computational complexity, resulting in a model that demands significantly fewer parameters and exhibits faster inference times. Extensive evaluations on benchmark datasets demonstrate the effectiveness of the proposed method. It achieves a modest but consistent accuracy gain over strong baselines (approximately 2%) while simultaneously reducing model size by around 30% and inference latency by about 20%, proving highly effective for practical fine-grained recognition applications requiring both high accuracy and operational efficiency. Full article
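
One plausible reading of the dynamic sparse attention idea is a channel-attention module that keeps only the top-k highest-scoring channels and suppresses the rest; the sketch below follows that reading. The keep ratio and the placement inside Res2Net stages are assumptions.

```python
import torch
import torch.nn as nn

class TopKSparseChannelAttention(nn.Module):
    """Sketch of a dynamic sparse attention step: score channels, keep only the
    top-k responses, zero out the rest. Ratio and placement are assumptions."""
    def __init__(self, channels, keep_ratio=0.5):
        super().__init__()
        self.k = max(1, int(channels * keep_ratio))
        self.score = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.score(x)                                   # (B, C) channel scores
        topk = torch.topk(w, self.k, dim=1).indices
        mask = torch.zeros_like(w).scatter_(1, topk, 1.0)   # keep only top-k channels
        return x * (w * mask).unsqueeze(-1).unsqueeze(-1)

x = torch.randn(2, 64, 32, 32)
print(TopKSparseChannelAttention(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```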

26 pages, 5237 KiB  
Article
A Bridge Defect Detection Algorithm Based on UGMB Multi-Scale Feature Extraction and Fusion
by Haiyan Zhang, Chao Tian, Ao Zhang, Yilin Liu, Guxue Gao, Zhiwen Zhuang, Tongtong Yin and Nuo Zhang
Symmetry 2025, 17(7), 1025; https://doi.org/10.3390/sym17071025 - 30 Jun 2025
Viewed by 235
Abstract
Aiming at the missed and false detections caused by insufficient multi-scale feature extraction and an excessive number of model parameters in bridge defect detection, this paper proposes the AMSF-Pyramid-YOLOv11n model. First, a Cooperative Optimization Module (COPO) is introduced, which consists of the designed multi-level dilated shared convolution (FPSharedConv) and a dual-domain attention block. Through the joint optimization of FPSharedConv and a CGLU gating mechanism, the module significantly improves feature extraction efficiency and learning capability. Second, the Unified Global-Multiscale Bottleneck (UGMB) multi-scale feature pyramid designed in this study efficiently integrates the FCGL_MANet, WFU, and HAFB modules. By leveraging the symmetry of Haar wavelet decomposition combined with local-global attention, this module effectively addresses the challenge of multi-scale feature fusion, enhancing the model’s ability to capture both symmetrical and asymmetrical bridge defect patterns. Finally, an optimized lightweight detection head (LCB_Detect) is employed, which reduces the parameter count by 6.35% through shared convolution layers and separate batch normalization. Experimental results show that the proposed model achieves a mean average precision (mAP@0.5) of 60.3% on a self-constructed bridge defect dataset, representing an improvement of 11.3% over the baseline YOLOv11n. The model effectively reduces the false positive rate while improving the detection accuracy of bridge defects. Full article
(This article belongs to the Section Computer)
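
The multi-level dilated shared convolution (FPSharedConv) can be pictured as one kernel reused at several dilation rates, so the receptive field grows without adding parameters. The sketch below assumes that form; the dilation set and summation fusion are chosen only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPSharedConvSketch(nn.Module):
    """Guessed form of a multi-level dilated shared convolution: a single 3x3
    kernel applied at several dilation rates and summed. Settings are assumptions."""
    def __init__(self, ch, dilations=(1, 2, 3)):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(ch, ch, 3, 3))
        nn.init.kaiming_normal_(self.weight)
        self.dilations = dilations
        self.bn = nn.BatchNorm2d(ch)

    def forward(self, x):
        out = 0
        for d in self.dilations:
            # same shared weight, different receptive field per dilation rate
            out = out + F.conv2d(x, self.weight, padding=d, dilation=d)
        return F.relu(self.bn(out))

x = torch.randn(1, 16, 40, 40)
print(FPSharedConvSketch(16)(x).shape)  # torch.Size([1, 16, 40, 40])
```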

19 pages, 3827 KiB  
Article
Multi-Level Intertemporal Attention-Guided Network for Change Detection in Remote Sensing Images
by Shuo Liu, Qinyu Zhang, Yuhang Zhang, Xiaochen Niu, Wuxia Zhang and Fei Xie
Remote Sens. 2025, 17(13), 2233; https://doi.org/10.3390/rs17132233 - 29 Jun 2025
Viewed by 248
Abstract
Change detection (CD) detects and evaluates surface changes by comparing Remote Sensing Images (RSIs) acquired at different times, and is of great significance for environmental protection and urban planning. Driven by the need for higher accuracy in complex scenes, attention-based CD methods have become predominant. These methods focus on regions of interest, improving detection accuracy and efficiency. However, external factors can introduce many pseudo-changes, presenting significant challenges for CD. To address this issue, we propose a Multi-level Intertemporal Attention-guided Network (MIANet) for CD. Firstly, an Intertemporal Fusion Attention Unit (IFAU) is proposed to facilitate early feature interaction, which helps eliminate irrelevant changes. Secondly, the Change Location and Recognition Module (CLRM) is designed to explore change areas more deeply, effectively improving the representation of change features. Furthermore, we also employ a challenging landslide mapping dataset for the CD task. Through comprehensive testing on two datasets, the MIANet algorithm proves to be effective and robust, achieving detection results that are better than or at least comparable to current methods in terms of accuracy and reliability. Full article
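
A minimal sketch of an early intertemporal interaction step in the spirit of the IFAU: an attention map derived from the bitemporal feature difference re-weights both epochs so that later stages focus on genuine changes. The exact structure is an assumption.

```python
import torch
import torch.nn as nn

class IntertemporalFusionSketch(nn.Module):
    """Illustrative early-interaction unit for bitemporal change-detection
    features; the gating structure is an assumption, not the paper's IFAU."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.Sigmoid())

    def forward(self, f_t1, f_t2):
        attn = self.gate(torch.abs(f_t1 - f_t2))        # emphasize where the epochs differ
        return f_t1 * attn + f_t1, f_t2 * attn + f_t2   # residual re-weighting of both epochs

a, b = torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64)
x1, x2 = IntertemporalFusionSketch(32)(a, b)
print(x1.shape, x2.shape)  # torch.Size([1, 32, 64, 64]) twice
```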

19 pages, 5785 KiB  
Article
RPFusionNet: An Efficient Semantic Segmentation Method for Large-Scale Remote Sensing Images via Parallel Region–Patch Fusion
by Shiyan Pang, Weimin Zeng, Yepeng Shi, Zhiqi Zuo, Kejiang Xiao and Yujun Wu
Remote Sens. 2025, 17(13), 2158; https://doi.org/10.3390/rs17132158 - 24 Jun 2025
Viewed by 409
Abstract
Mainstream deep learning segmentation models are designed for small-sized images, and when applied to high-resolution remote sensing images, the limited information contained in small-sized images greatly restricts a model’s ability to capture complex contextual information at a global scale. To mitigate this challenge, we present RPFusionNet, a novel parallel semantic segmentation framework that is specifically designed to efficiently integrate both local and global features. RPFusionNet leverages two distinct feature representations: REGION (representing large areas) and PATCH (representing smaller regions). This framework comprises two parallel branches: the REGION branch initially downsamples the entire image, then extracts features via a convolutional neural network (CNN)-based encoder, and subsequently captures multi-level information using pooled kernels of varying sizes. This design enables the model to adapt effectively to objects of different scales. In contrast, the PATCH branch utilizes a pixel-level feature extractor to enrich the high-dimensional features of the local region, thereby enhancing the representation of fine-grained details. To model the semantic correlation between the two branches, we have developed the Region–Patch scale fusion module. This module ensures that the network can comprehend a wider range of image contexts while preserving local details, thus bridging the gap between regional and local information. Extensive experiments were conducted on three public datasets: WBDS, AIDS, and Vaihingen. Compared to other state-of-the-art methods, our network achieved the highest accuracy on all three datasets, with an IoU score of 92.08% on the WBDS dataset, 89.99% on the AIDS dataset, and 88.44% on the Vaihingen dataset. Full article
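
The Region–Patch fusion idea can be sketched as upsampling the coarse whole-image (REGION) features to the resolution of the local (PATCH) features and gating the two; the gating scheme below is an assumption rather than the paper's exact module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionPatchFusionSketch(nn.Module):
    """Minimal sketch of fusing a downsampled whole-image (REGION) feature map
    with a full-resolution local (PATCH) feature map; gating is an assumption."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * ch, ch, 1), nn.Sigmoid())
        self.out = nn.Conv2d(2 * ch, ch, 3, padding=1)

    def forward(self, region_feat, patch_feat):
        # region_feat: coarse global context, patch_feat: fine local detail
        region_up = F.interpolate(region_feat, size=patch_feat.shape[-2:],
                                  mode="bilinear", align_corners=False)
        g = self.gate(torch.cat([region_up, patch_feat], dim=1))
        return self.out(torch.cat([region_up * g, patch_feat * (1 - g)], dim=1))

region = torch.randn(1, 64, 16, 16)    # from the downsampled whole image
patch = torch.randn(1, 64, 128, 128)   # from a high-resolution tile
print(RegionPatchFusionSketch(64)(region, patch).shape)  # torch.Size([1, 64, 128, 128])
```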

25 pages, 9860 KiB  
Article
Indoor Dynamic Environment Mapping Based on Semantic Fusion and Hierarchical Filtering
by Yiming Li, Luying Na, Xianpu Liang and Qi An
ISPRS Int. J. Geo-Inf. 2025, 14(7), 236; https://doi.org/10.3390/ijgi14070236 - 21 Jun 2025
Viewed by 631
Abstract
To address the challenges of dynamic object interference and redundant information representation in map construction for indoor dynamic environments, this paper proposes an indoor dynamic environment mapping method based on semantic fusion and hierarchical filtering. First, prior dynamic object masks are obtained using the YOLOv8 model, and geometric constraints between prior static objects and dynamic regions are introduced to identify non-prior dynamic objects, thereby eliminating all dynamic features (both prior and non-prior). Second, an initial semantic point cloud map is constructed by integrating prior static features from a semantic segmentation network with pose estimates from an RGB-D camera. Dynamic noise is then removed using statistical outlier removal (SOR) filtering, while voxel filtering optimizes point cloud density, generating a compact yet texture-rich semantic dense point cloud map with minimal dynamic artifacts. Subsequently, a multi-resolution semantic octree map is built using a recursive spatial partitioning algorithm. Finally, point cloud poses are corrected via Transform Frame (TF) transformation, and a 2D traversability grid map is generated using passthrough filtering and grid projection. Experimental results demonstrate that the proposed method constructs multi-level semantic maps with rich information, clear structure, and high reliability in indoor dynamic scenarios. Additionally, the map file size is compressed by 50–80%, significantly enhancing the reliability of mobile robot navigation and the efficiency of path planning. Full article
(This article belongs to the Special Issue Indoor Mobile Mapping and Location-Based Knowledge Services)
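
The two point-cloud filtering steps named in the abstract, statistical outlier removal (SOR) and voxel filtering, are standard operations; a plain-NumPy sketch of both is given below (brute-force neighbour search, for illustration only).

```python
import numpy as np

def statistical_outlier_removal(points, k=20, std_ratio=2.0):
    """SOR sketch: drop points whose mean distance to their k nearest neighbours
    exceeds the global mean by std_ratio standard deviations. O(N^2) brute force."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    knn_mean = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)   # skip self-distance at column 0
    keep = knn_mean < knn_mean.mean() + std_ratio * knn_mean.std()
    return points[keep]

def voxel_downsample(points, voxel=0.05):
    """Keep one centroid per occupied voxel to control point-cloud density."""
    keys = np.floor(points / voxel).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    counts = np.bincount(inverse)
    out = np.zeros((inverse.max() + 1, 3))
    for dim in range(3):
        out[:, dim] = np.bincount(inverse, weights=points[:, dim]) / counts
    return out

cloud = np.random.rand(500, 3)   # synthetic stand-in for a dense semantic point cloud
print(voxel_downsample(statistical_outlier_removal(cloud)).shape)
```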

21 pages, 83137 KiB  
Article
RGB-FIR Multimodal Pedestrian Detection with Cross-Modality Context Attentional Model
by Han Wang, Lei Jin, Guangcheng Wang, Wenjie Liu, Quan Shi, Yingyan Hou and Jiali Liu
Sensors 2025, 25(13), 3854; https://doi.org/10.3390/s25133854 - 20 Jun 2025
Viewed by 330
Abstract
Pedestrian detection is an important research topic in the field of visual cognition and autonomous driving systems. The proposal of the YOLO model has significantly improved the speed and accuracy of detection. To achieve full-day detection performance, multimodal YOLO models based on RGB-FIR image pairs have become a research hotspot. Existing work has focused on the design of fusion modules applied after feature extraction in the RGB and FIR branch backbone networks, yielding a multimodal backbone framework based on back-end fusion. However, these methods overlook the complementarity and prior knowledge between modalities and scales in the front-end raw feature extraction of the RGB and FIR branches. As a result, the performance of the back-end fusion framework largely depends on the representation ability of the raw features of each modality in the front-end. This paper proposes a novel RGB-FIR multimodal backbone network framework based on a cross-modality context attentional model (CCAM). Unlike existing works, a multi-level fusion framework is designed. At the front-end of the RGB-FIR parallel backbone network, the CCAM model is constructed for the raw features at each scale. The fusion results of the lower-level RGB and FIR features are fully utilized to optimize the spatial weights of the upper-level RGB and FIR features, achieving cross-modality and cross-scale complementarity between adjacent-scale feature extraction modules. At the back-end of the RGB-FIR parallel network, a channel–spatial joint attention model (CBAM) and self-attention models are combined to obtain the final RGB-FIR fusion features at each scale from the RGB and FIR features optimized by CCAM. Comparative experiments against current RGB-FIR multimodal YOLO models, using multiple performance indicators on several public RGB-FIR datasets, indicate that this method significantly enhances the accuracy and robustness of pedestrian detection. Full article
(This article belongs to the Section Intelligent Sensors)
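
A rough sketch of the front-end cross-modality step described above: fused lower-level RGB-FIR features produce a spatial weight map that modulates the next-level RGB and FIR features. The pooling and convolution details are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CCAMSketch(nn.Module):
    """Illustrative cross-modality context attention: lower-level fused RGB-FIR
    features generate a spatial weight for the upper-level features of both
    modalities. Structure is an assumption, not the paper's exact module."""
    def __init__(self, low_ch):
        super().__init__()
        self.spatial = nn.Sequential(
            nn.Conv2d(2 * low_ch, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, rgb_low, fir_low, rgb_high, fir_high):
        fused_low = torch.cat([rgb_low, fir_low], dim=1)
        w = self.spatial(fused_low)                                # (B, 1, H, W) spatial weight
        w = F.interpolate(w, size=rgb_high.shape[-2:], mode="bilinear",
                          align_corners=False)
        return rgb_high * w, fir_high * w                          # re-weighted upper-level features

rgb_l, fir_l = torch.randn(1, 32, 80, 80), torch.randn(1, 32, 80, 80)
rgb_h, fir_h = torch.randn(1, 64, 40, 40), torch.randn(1, 64, 40, 40)
out_rgb, out_fir = CCAMSketch(32)(rgb_l, fir_l, rgb_h, fir_h)
print(out_rgb.shape, out_fir.shape)
```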

24 pages, 6594 KiB  
Article
GAT-Enhanced YOLOv8_L with Dilated Encoder for Multi-Scale Space Object Detection
by Haifeng Zhang, Han Ai, Donglin Xue, Zeyu He, Haoran Zhu, Delian Liu, Jianzhong Cao and Chao Mei
Remote Sens. 2025, 17(13), 2119; https://doi.org/10.3390/rs17132119 - 20 Jun 2025
Viewed by 445
Abstract
Inadequate object detection accuracy in complex remote sensing scenarios has been identified as a primary concern. Traditional YOLO-series algorithms encounter challenges such as poor robustness in small object detection and significant interference from complex backgrounds. In this paper, a multi-scale feature fusion framework based on an improved version of YOLOv8_L is proposed. The combination of a graph attention network (GAT) and a Dilated Encoder network significantly improves the algorithm’s detection and recognition performance for space remote sensing objects. The main improvements include abandoning the original Feature Pyramid Network (FPN) structure, proposing an adaptive fusion strategy based on multi-level features of the backbone network, enhancing the representation of multi-scale objects through upsampling and feature stacking, and reconstructing the FPN. The local features extracted by convolutional neural networks are mapped to graph-structured data, and the nodal attention mechanism of the GAT is used to capture the global topological associations of space objects, compensating for the limitations of the convolution operation in weight allocation and realizing GAT integration. The Dilated Encoder network is introduced to cover targets of different scales by differentiating receptive fields, and the feature weight allocation is optimized by combining it with a Convolutional Block Attention Module (CBAM). According to the characteristics of space missions, an annotated dataset containing 8000 satellite and space station images is constructed, covering a variety of lighting, attitude, and scale conditions, and providing benchmark support for model training and verification. Experimental results on the space object dataset reveal that the enhanced algorithm achieves a mean average precision (mAP) of 97.2%, representing a 2.1% improvement over the original YOLOv8_L. Comparative experiments with six other models demonstrate that the proposed algorithm outperforms its counterparts. Ablation studies further validate the synergistic effect between the graph attention network (GAT) and the Dilated Encoder. The results indicate that the model maintains a high detection accuracy under challenging conditions, including strong light interference, multi-scale variations, and low-light environments. Full article
(This article belongs to the Special Issue Remote Sensing Image Thorough Analysis by Advanced Machine Learning)
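
The Dilated Encoder component follows a well-known pattern (stacked residual bottlenecks with increasing dilation rates, as in YOLOF); a generic sketch is shown below. The channel widths and dilation set are illustrative, and the GAT branch is omitted.

```python
import torch
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    """Residual bottleneck with a configurable dilation rate; stacking blocks with
    different rates covers objects of different scales. Generic sketch, not the
    paper's exact configuration."""
    def __init__(self, ch, dilation):
        super().__init__()
        mid = ch // 4
        self.block = nn.Sequential(
            nn.Conv2d(ch, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=dilation, dilation=dilation),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, ch, 1), nn.BatchNorm2d(ch))

    def forward(self, x):
        return x + self.block(x)

# four blocks with growing receptive fields (assumed dilation schedule)
dilated_encoder = nn.Sequential(*[DilatedResidualBlock(256, d) for d in (2, 4, 6, 8)])
print(dilated_encoder(torch.randn(1, 256, 20, 20)).shape)  # torch.Size([1, 256, 20, 20])
```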

24 pages, 4557 KiB  
Article
Advanced Multi-Level Ensemble Learning Approaches for Comprehensive Sperm Morphology Assessment
by Abdulsamet Aktas, Taha Cap, Gorkem Serbes, Hamza Osman Ilhan and Hakkı Uzun
Diagnostics 2025, 15(12), 1564; https://doi.org/10.3390/diagnostics15121564 - 19 Jun 2025
Viewed by 459
Abstract
Introduction: Fertility is fundamental to human well-being, significantly impacting both individual lives and societal development. In particular, sperm morphology—referring to the shape, size, and structural integrity of sperm cells—is a key indicator in diagnosing male infertility and selecting viable sperm in assisted reproductive technologies such as in vitro fertilisation (IVF) and intracytoplasmic sperm injection (ICSI). However, traditional manual evaluation methods are highly subjective and inconsistent, creating a need for standardized, automated systems. Objectives: This study aims to develop a robust and fully automated sperm morphology classification framework capable of accurately identifying a wide range of morphological abnormalities, thereby minimizing observer variability and improving diagnostic support in reproductive healthcare. Methods: We propose a novel ensemble-based classification approach that combines convolutional neural network (CNN)-derived features using both feature-level and decision-level fusion techniques. Features extracted from multiple EfficientNetV2 variants are fused and classified using Support Vector Machines (SVM), Random Forest (RF), and Multi-Layer Perceptron with Attention (MLP-Attention). Decision-level fusion is achieved via soft voting to enhance robustness and accuracy. Results: The proposed ensemble framework was evaluated using the Hi-LabSpermMorpho dataset, which contains 18 distinct sperm morphology classes. The fusion-based model achieved an accuracy of 67.70%, significantly outperforming individual classifiers. The integration of multiple CNN architectures and ensemble techniques effectively mitigated class imbalance and enhanced the generalizability of the model. Conclusions: The presented methodology demonstrates a substantial improvement over traditional and single-model approaches in automated sperm morphology classification. By leveraging ensemble learning and multi-level fusion, the model provides a reliable and scalable solution for clinical decision-making in male fertility assessment. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
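
Decision-level fusion by soft voting, as named in the abstract, simply averages (optionally weighted) class probabilities from the individual classifiers before taking the argmax; a small NumPy sketch follows. The classifier outputs here are synthetic placeholders.

```python
import numpy as np

def soft_vote(prob_list, weights=None):
    """Decision-level fusion by soft voting: average (optionally weighted) class
    probabilities from several classifiers, then take the argmax per sample."""
    probs = np.stack(prob_list, axis=0)          # (n_models, n_samples, n_classes)
    if weights is not None:
        w = np.asarray(weights, dtype=float)[:, None, None]
        return (probs * w / w.sum()).sum(axis=0).argmax(axis=1)
    return probs.mean(axis=0).argmax(axis=1)

# three hypothetical classifiers (e.g., SVM, RF, MLP) over 4 samples, 18 morphology classes
rng = np.random.default_rng(0)
svm_p, rf_p, mlp_p = (rng.dirichlet(np.ones(18), size=4) for _ in range(3))
print(soft_vote([svm_p, rf_p, mlp_p]))
```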

18 pages, 3051 KiB  
Article
Segmentation and Fractional Coverage Estimation of Soil, Illuminated Vegetation, and Shaded Vegetation in Corn Canopy Images Using CCSNet and UAV Remote Sensing
by Shanxin Zhang, Jibo Yue, Xiaoyan Wang, Haikuan Feng, Yang Liu and Meiyan Shu
Agriculture 2025, 15(12), 1309; https://doi.org/10.3390/agriculture15121309 - 18 Jun 2025
Viewed by 534
Abstract
The accurate estimation of corn canopy structure and light conditions is essential for effective crop management and informed variety selection. This study introduces CCSNet, a deep learning-based semantic segmentation model specifically developed to extract fractional coverages of soil, illuminated vegetation, and shaded vegetation from high-resolution corn canopy images acquired by UAVs. CCSNet improves segmentation accuracy by employing multi-level feature fusion and pyramid pooling to effectively capture multi-scale contextual information. The model was evaluated using Pixel Accuracy (PA), mean Intersection over Union (mIoU), and Recall, and was benchmarked against U-Net, PSPNet and UNetFormer. On the test set, CCSNet utilizing a ResNet50 backbone achieved the highest accuracy, with an mIoU of 86.42% and a PA of 93.58%. In addition, its estimation of fractional coverage for key canopy components yielded a root mean squared error (RMSE) ranging from 3.16% to 5.02%. Compared to lightweight backbones (e.g., MobileNetV2), CCSNet exhibited superior generalization performance when integrated with deeper backbones. These results highlight CCSNet’s capability to deliver high-precision segmentation and reliable phenotypic measurements. This provides valuable insights for breeders to evaluate light-use efficiency and facilitates intelligent decision-making in precision agriculture. Full article
(This article belongs to the Special Issue Research Advances in Perception for Agricultural Robots)
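
Once a segmentation mask is predicted, the fractional coverage per class and the RMSE against reference coverages follow from simple pixel counting; the NumPy sketch below assumes class ids 0/1/2 for soil, illuminated vegetation, and shaded vegetation.

```python
import numpy as np

def fractional_coverage(mask, class_ids=(0, 1, 2)):
    """Percentage of pixels per class in a predicted segmentation mask.
    Class ids 0/1/2 for soil / illuminated / shaded vegetation are assumed."""
    total = mask.size
    return {c: 100.0 * np.count_nonzero(mask == c) / total for c in class_ids}

def rmse(pred, ref):
    pred, ref = np.asarray(pred, float), np.asarray(ref, float)
    return float(np.sqrt(np.mean((pred - ref) ** 2)))

pred_mask = np.random.randint(0, 3, size=(512, 512))      # stand-in for a CCSNet prediction
cov = fractional_coverage(pred_mask)
print(cov)
print(rmse(list(cov.values()), [34.0, 33.0, 33.0]))        # vs. hypothetical reference coverage
```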

28 pages, 4916 KiB  
Article
Research on Bearing Fault Diagnosis Method for Varying Operating Conditions Based on Spatiotemporal Feature Fusion
by Jin Wang, Yan Wang, Junhui Yu, Qingping Li, Hailin Wang and Xinzhi Zhou
Sensors 2025, 25(12), 3789; https://doi.org/10.3390/s25123789 - 17 Jun 2025
Viewed by 385
Abstract
In real-world scenarios, the rotational speed of bearings is variable. Due to changes in operating conditions, the feature distribution of bearing vibration data becomes inconsistent, which means a model trained under one operating condition (source domain) cannot be applied directly to another condition (target domain). Furthermore, the lack of sufficient labeled data in the target domain further complicates fault diagnosis under varying operating conditions. To address this issue, this paper proposes a spatiotemporal feature fusion domain-adaptive network (STFDAN) framework for bearing fault diagnosis under varying operating conditions. The framework constructs a feature extraction and domain adaptation network based on a parallel architecture, designed to capture the complex dynamic characteristics of vibration signals. First, the Fast Fourier Transform (FFT) and Variational Mode Decomposition (VMD) are used to extract the spectral and modal features of the signals, generating a joint representation with multi-level information. Then, a parallel processing mechanism combining a Squeeze-and-Excitation-based Convolutional Neural Network (SECNN) and a Bidirectional Long Short-Term Memory network (BiLSTM) is employed to dynamically adjust weights and capture high-dimensional spatiotemporal features. A cross-attention mechanism enables the interaction and fusion of spatial and temporal features, significantly enhancing the complementarity and coupling of the feature representations. Finally, a Multi-Kernel Maximum Mean Discrepancy (MKMMD) is introduced to align the feature distributions between the source and target domains, enabling efficient fault diagnosis under varying bearing conditions. The proposed STFDAN framework is evaluated using bearing datasets from Case Western Reserve University (CWRU), Jiangnan University (JNU), and Southeast University (SEU). Experimental results demonstrate that STFDAN achieves high diagnostic accuracy across different load conditions and effectively solves the bearing fault diagnosis problem under varying operating conditions. Full article
(This article belongs to the Section Fault Diagnosis & Sensors)
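
The Multi-Kernel Maximum Mean Discrepancy (MKMMD) used for domain alignment can be written compactly with a bank of RBF kernels; the sketch below uses assumed bandwidths and equal kernel weights rather than the paper's settings.

```python
import torch

def mk_mmd(source, target, gammas=(0.5, 1.0, 2.0, 4.0)):
    """Minimal multi-kernel MMD sketch with a small bank of RBF kernels.
    source: (n, d) and target: (m, d) feature batches; bandwidths are assumed."""
    def rbf(a, b):
        d2 = torch.cdist(a, b) ** 2
        return sum(torch.exp(-g * d2) for g in gammas) / len(gammas)
    return rbf(source, source).mean() + rbf(target, target).mean() \
        - 2 * rbf(source, target).mean()

src = torch.randn(32, 128)        # source-domain features (e.g., one load condition)
tgt = torch.randn(32, 128) + 0.5  # shifted target-domain features
print(mk_mmd(src, tgt))           # larger value = larger distribution gap to minimize
```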

20 pages, 4172 KiB  
Article
Multi-Level Feature Fusion Attention Generative Adversarial Network for Retinal Optical Coherence Tomography Image Denoising
by Yiming Qian and Yichao Meng
Appl. Sci. 2025, 15(12), 6697; https://doi.org/10.3390/app15126697 - 14 Jun 2025
Viewed by 428
Abstract
Background: Optical coherence tomography (OCT) is limited by inherent speckle noise, which degrades retinal microarchitecture visualization and pathological analysis. Existing denoising methods inadequately balance noise suppression and structural preservation, necessitating advanced solutions for clinical OCT reconstruction. Methods: We propose MFFA-GAN, a generative adversarial network integrating multilevel feature fusion and an efficient local attention (ELA) mechanism. It optimizes cross-feature interactions and channel-wise information flow. Evaluations on three public OCT datasets compared traditional methods and deep learning models using PSNR, SSIM, CNR, and ENL metrics. Results: MFFA-GAN achieved strong performance (PSNR: 30.107 dB, SSIM: 0.727, CNR: 3.927, ENL: 529.161) on smaller datasets, outperforming the benchmarks, and further enhanced interpretability through pixel error maps. It preserved retinal layers and textures while suppressing noise. Ablation studies confirmed the synergy of multilevel features and ELA, improving PSNR by 1.8 dB and SSIM by 0.12 versus baselines. Conclusions: MFFA-GAN offers a reliable OCT denoising solution by harmonizing noise reduction and structural fidelity. Its hybrid attention mechanism enhances clinical image quality, aiding retinal analysis and diagnosis. Full article
(This article belongs to the Special Issue Explainable Artificial Intelligence Technology and Its Applications)
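
The CNR and ENL metrics reported above have standard definitions over selected regions of interest; a small NumPy sketch is given below, with the ROI choice and test values purely illustrative.

```python
import numpy as np

def enl(region):
    """Equivalent number of looks of a homogeneous background region:
    mean^2 / variance (higher = smoother, less speckle)."""
    region = np.asarray(region, float)
    return region.mean() ** 2 / region.var()

def cnr(signal_region, background_region):
    """Contrast-to-noise ratio between a retinal-layer ROI and a background ROI,
    using a common definition; ROI selection is left to the user."""
    s, b = np.asarray(signal_region, float), np.asarray(background_region, float)
    return abs(s.mean() - b.mean()) / np.sqrt(0.5 * (s.var() + b.var()))

rng = np.random.default_rng(1)
layer = rng.normal(180, 12, size=(40, 40))       # hypothetical denoised layer ROI
background = rng.normal(60, 10, size=(40, 40))   # hypothetical background ROI
print(f"CNR={cnr(layer, background):.2f}, ENL={enl(background):.1f}")
```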
