Search Results (4,085)

Search Parameters:
Keywords = feature information fusion

22 pages, 2498 KiB  
Article
SceEmoNet: A Sentiment Analysis Model with Scene Construction Capability
by Yi Liang, Dongfang Han, Zhenzhen He, Bo Kong and Shuanglin Wen
Appl. Sci. 2025, 15(15), 8588; https://doi.org/10.3390/app15158588 (registering DOI) - 2 Aug 2025
Abstract
How do humans analyze the sentiments embedded in text? When attempting to analyze a text, humans construct a “scene” in their minds through imagination based on the text, generating a vague mental image. They then synthesize the text and the mental image to derive the final analysis result. However, current sentiment analysis models lack such imagination; they can only analyze the information already present in the text, which limits their classification accuracy. To address this issue, we propose the SceEmoNet model. This model endows text classification models with imagination through Stable Diffusion, enabling the model to generate corresponding visual scenes from the input text and thus introducing a new modality of visual information. We then use the Contrastive Language-Image Pre-training (CLIP) model, a multimodal feature extractor, to extract aligned features from the different modalities, preventing significant feature differences caused by data heterogeneity. Finally, we fuse the information from the different modalities using late fusion to obtain the final classification result. Experiments on six datasets with different classification tasks show improvements of 9.57%, 3.87%, 3.63%, 3.14%, 0.77%, and 0.28%, respectively. Additionally, we design experiments to analyze the model’s advantages and limitations in depth, providing a new technical path for follow-up research. Full article
(This article belongs to the Special Issue Advanced Technologies and Applications of Emotion Recognition)
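
The late-fusion step described in this abstract (CLIP text features and CLIP features of a Stable Diffusion-generated scene image combined at the decision level) can be illustrated with the minimal sketch below. The layer sizes, class count, and logit-averaging rule are assumptions for illustration, not the published SceEmoNet architecture.

```python
# Minimal late-fusion sketch (assumed layer sizes; not the authors' exact architecture).
# Text and image features are assumed to be pre-extracted with CLIP (e.g., 512-d each).
import torch
import torch.nn as nn

class LateFusionSentiment(nn.Module):
    def __init__(self, feat_dim: int = 512, num_classes: int = 2):
        super().__init__()
        # One classifier head per modality; fusion happens at the decision level.
        self.text_head = nn.Linear(feat_dim, num_classes)
        self.image_head = nn.Linear(feat_dim, num_classes)

    def forward(self, text_feat: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
        text_logits = self.text_head(text_feat)     # decision from the text modality
        image_logits = self.image_head(image_feat)  # decision from the generated "scene" image
        return (text_logits + image_logits) / 2     # late fusion: average the decisions

# Usage with dummy CLIP-sized embeddings.
model = LateFusionSentiment()
text_feat = torch.randn(4, 512)    # stand-in for CLIP text features
image_feat = torch.randn(4, 512)   # stand-in for CLIP features of the Stable Diffusion image
probs = model(text_feat, image_feat).softmax(dim=-1)
print(probs.shape)  # torch.Size([4, 2])
```
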
18 pages, 1811 KiB  
Article
A Multimodal Deep Learning Framework for Consistency-Aware Review Helpfulness Prediction
by Seonu Park, Xinzhe Li, Qinglong Li and Jaekyeong Kim
Electronics 2025, 14(15), 3089; https://doi.org/10.3390/electronics14153089 (registering DOI) - 1 Aug 2025
Abstract
Multimodal review helpfulness prediction (MRHP) aims to identify the most helpful reviews by leveraging both textual and visual information. However, prior studies have primarily focused on modeling interactions between these modalities, often overlooking the consistency between review content and ratings, which is a key indicator of review credibility. To address this limitation, we propose CRCNet (Content–Rating Consistency Network), a novel MRHP model that jointly captures the semantic consistency between review content and ratings while modeling the complementary characteristics of text and image modalities. CRCNet employs RoBERTa and VGG-16 to extract semantic and visual features, respectively. A co-attention mechanism is applied to capture the consistency between content and rating, and a Gated Multimodal Unit (GMU) is adopted to integrate consistency-aware representations. Experimental results on two large-scale Amazon review datasets demonstrate that CRCNet outperforms both unimodal and multimodal baselines in terms of MAE, MSE, RMSE, and MAPE. Further analysis confirms the effectiveness of content–rating consistency modeling and the superiority of the proposed fusion strategy. These findings suggest that incorporating semantic consistency into multimodal architectures can substantially improve the accuracy and trustworthiness of review helpfulness prediction. Full article
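
The Gated Multimodal Unit mentioned above is an established fusion block; the sketch below follows its standard formulation (tanh projections per modality blended by a learned sigmoid gate). The feature dimensions, chosen to suggest RoBERTa and VGG-16 outputs, are assumptions rather than CRCNet's actual configuration.

```python
# Gated Multimodal Unit sketch (dimensions are assumptions, not CRCNet's actual sizes).
import torch
import torch.nn as nn

class GatedMultimodalUnit(nn.Module):
    def __init__(self, text_dim: int = 768, image_dim: int = 4096, hidden_dim: int = 512):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden_dim)    # e.g., RoBERTa review features
        self.image_proj = nn.Linear(image_dim, hidden_dim)  # e.g., VGG-16 image features
        self.gate = nn.Linear(text_dim + image_dim, hidden_dim)

    def forward(self, h_text: torch.Tensor, h_image: torch.Tensor) -> torch.Tensor:
        t = torch.tanh(self.text_proj(h_text))
        v = torch.tanh(self.image_proj(h_image))
        z = torch.sigmoid(self.gate(torch.cat([h_text, h_image], dim=-1)))  # per-dimension gate
        return z * t + (1 - z) * v  # gated blend of the two modalities

gmu = GatedMultimodalUnit()
fused = gmu(torch.randn(8, 768), torch.randn(8, 4096))
print(fused.shape)  # torch.Size([8, 512])
```
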
22 pages, 24173 KiB  
Article
ScaleViM-PDD: Multi-Scale EfficientViM with Physical Decoupling and Dual-Domain Fusion for Remote Sensing Image Dehazing
by Hao Zhou, Yalun Wang, Wanting Peng, Xin Guan and Tao Tao
Remote Sens. 2025, 17(15), 2664; https://doi.org/10.3390/rs17152664 (registering DOI) - 1 Aug 2025
Abstract
Remote sensing images are often degraded by atmospheric haze, which not only reduces image quality but also complicates information extraction, particularly in high-level visual analysis tasks such as object detection and scene classification. State-space models (SSMs) have recently emerged as a powerful paradigm for vision tasks, showing great promise due to their computational efficiency and robust capacity to model global dependencies. However, most existing learning-based dehazing methods lack physical interpretability, leading to weak generalization. Furthermore, they typically rely on spatial features while neglecting crucial frequency domain information, resulting in incomplete feature representation. To address these challenges, we propose ScaleViM-PDD, a novel network that enhances an SSM backbone with two key innovations: a Multi-scale EfficientViM with Physical Decoupling (ScaleViM-P) module and a Dual-Domain Fusion (DD Fusion) module. The ScaleViM-P module synergistically integrates a Physical Decoupling block within a Multi-scale EfficientViM architecture. This design enables the network to mitigate haze interference in a physically grounded manner at each representational scale while simultaneously capturing global contextual information to adaptively handle complex haze distributions. To further address detail loss, the DD Fusion module replaces conventional skip connections by incorporating a novel Frequency Domain Module (FDM) alongside channel and position attention. This allows for a more effective fusion of spatial and frequency features, significantly improving the recovery of fine-grained details, including color and texture information. Extensive experiments on nine publicly available remote sensing datasets demonstrate that ScaleViM-PDD consistently surpasses state-of-the-art baselines in both qualitative and quantitative evaluations, highlighting its strong generalization ability. Full article
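
The abstract describes a Dual-Domain Fusion module that combines spatial features with frequency-domain information from a Frequency Domain Module. The exact formulation is not given here, so the sketch below is one plausible reading under stated assumptions: re-weight the real 2-D FFT of a feature map with learnable per-frequency weights, return to the spatial domain, and merge the two domains with a 1x1 convolution.

```python
# Dual-domain fusion sketch (an assumed formulation; the paper's exact FDM is not specified here).
import torch
import torch.nn as nn

class FrequencyDomainModule(nn.Module):
    """Re-weights a feature map's spectrum, then returns to the spatial domain."""
    def __init__(self, channels: int, height: int, width: int):
        super().__init__()
        # One learnable weight per channel and frequency bin of the real 2-D FFT.
        self.freq_weight = nn.Parameter(torch.ones(channels, height, width // 2 + 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        spec = torch.fft.rfft2(x, norm="ortho")          # (B, C, H, W//2+1), complex
        spec = spec * self.freq_weight                   # emphasize or attenuate frequency bands
        return torch.fft.irfft2(spec, s=x.shape[-2:], norm="ortho")

class DualDomainFusion(nn.Module):
    def __init__(self, channels: int, height: int, width: int):
        super().__init__()
        self.fdm = FrequencyDomainModule(channels, height, width)
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=1)  # fuse the two domains

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.mix(torch.cat([x, self.fdm(x)], dim=1))

fusion = DualDomainFusion(channels=64, height=32, width=32)
print(fusion(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```
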
14 pages, 483 KiB  
Review
Artificial Intelligence and Its Impact on the Management of Lumbar Degenerative Pathology: A Narrative Review
by Alessandro Trento, Salvatore Rapisarda, Nicola Bresolin, Andrea Valenti and Enrico Giordan
Medicina 2025, 61(8), 1400; https://doi.org/10.3390/medicina61081400 (registering DOI) - 1 Aug 2025
Abstract
In this narrative review, we explore the role of artificial intelligence (AI) in managing lumbar degenerative conditions, a topic that has recently garnered significant interest. The use of AI-based solutions in spine surgery is particularly appealing due to its potential applications in preoperative planning and outcome prediction. This study aims to clarify the impact of artificial intelligence models on the diagnosis and prognosis of common types of degenerative conditions: lumbar disc herniation, spinal stenosis, and eventually spinal fusion. Additionally, the study seeks to identify predictive factors for lumbar fusion surgery based on a review of the literature from the past 10 years. From the literature search, 96 articles were examined. The literature on this topic appears to be consistent, describing various models that show promising results, particularly in predicting outcomes. However, most studies adopt a retrospective approach and often lack detailed information about imaging features, intraoperative findings, and postoperative functional metrics. Additionally, the predictive performance of these models varies significantly, and few studies include external validation. The application of artificial intelligence in treating degenerative spine conditions, while valid and promising, is still in a developmental phase. However, over the last decade, there has been an exponential growth in studies related to this subject, which is beginning to pave the way for its systematic use in clinical practice. Full article

18 pages, 11340 KiB  
Article
CLSANet: Cognitive Learning-Based Self-Adaptive Feature Fusion for Multimodal Visual Object Detection
by Han Peng, Qionglin Liu, Riqing Ruan, Shuaiqi Yuan and Qin Li
Electronics 2025, 14(15), 3082; https://doi.org/10.3390/electronics14153082 (registering DOI) - 1 Aug 2025
Abstract
Multimodal object detection leverages the complementary characteristics of visible (RGB) and infrared (IR) imagery, making it well-suited for challenging scenarios such as low illumination, occlusion, and complex backgrounds. However, most existing fusion-based methods rely on static or heuristic strategies, limiting their adaptability to dynamic environments. To address this limitation, we propose CLSANet, a cognitive learning-based self-adaptive network that enhances detection performance by dynamically selecting and integrating modality-specific features. CLSANet consists of three key modules: (1) a Dominant Modality Identification Module that selects the most informative modality based on global scene analysis; (2) a Modality Enhancement Module that disentangles and strengthens shared and modality-specific representations; and (3) a Self-Adaptive Fusion Module that adjusts fusion weights spatially according to local scene complexity. Compared to existing methods, CLSANet achieves state-of-the-art detection performance with significantly fewer parameters and lower computational cost. Ablation studies further demonstrate the individual effectiveness of each module under different environmental conditions, particularly in low-light and occluded scenes. CLSANet offers a compact, interpretable, and practical solution for multimodal object detection in resource-constrained settings. Full article
(This article belongs to the Special Issue Digital Intelligence Technology and Applications)
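
The Self-Adaptive Fusion Module is described as adjusting fusion weights spatially according to local scene complexity. A minimal sketch of that idea is shown below, assuming a small convolutional weight predictor and a per-pixel blend of RGB and IR features; it is an illustration, not CLSANet's published module.

```python
# Spatially adaptive RGB/IR fusion sketch (assumed layers; not CLSANet's published implementation).
import torch
import torch.nn as nn

class SpatialAdaptiveFusion(nn.Module):
    def __init__(self, channels: int = 256):
        super().__init__()
        # Predict a per-pixel weight in [0, 1] from both modalities.
        self.weight_net = nn.Sequential(
            nn.Conv2d(2 * channels, channels // 4, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, ir: torch.Tensor) -> torch.Tensor:
        w = self.weight_net(torch.cat([rgb, ir], dim=1))  # (B, 1, H, W) weight map
        return w * rgb + (1 - w) * ir                     # lean on RGB where reliable, IR elsewhere

fuse = SpatialAdaptiveFusion()
out = fuse(torch.randn(1, 256, 40, 40), torch.randn(1, 256, 40, 40))
print(out.shape)  # torch.Size([1, 256, 40, 40])
```
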
24 pages, 10190 KiB  
Article
MSMT-RTDETR: A Multi-Scale Model for Detecting Maize Tassels in UAV Images with Complex Field Backgrounds
by Zhenbin Zhu, Zhankai Gao, Jiajun Zhuang, Dongchen Huang, Guogang Huang, Hansheng Wang, Jiawei Pei, Jingjing Zheng and Changyu Liu
Agriculture 2025, 15(15), 1653; https://doi.org/10.3390/agriculture15151653 - 31 Jul 2025
Abstract
Accurate detection of maize tassels plays a crucial role in maize yield estimation in precision agriculture. Recently, UAV and deep learning technologies have been widely introduced into various field-monitoring applications. However, complex field backgrounds pose multiple challenges to the precise detection of maize tassels, including multi-scale variations of tassels caused by varietal differences and growth stages, intra-class occlusion, and background interference. To achieve accurate maize tassel detection in UAV images under complex field backgrounds, this study proposes the MSMT-RTDETR detection model. The Faster-RPE Block is first designed to enhance multi-scale feature extraction while reducing model parameters and FLOPs. To improve detection of multi-scale targets in complex field backgrounds, a Dynamic Cross-Scale Feature Fusion Module (Dy-CCFM) is constructed by upgrading the CCFM with dynamic sampling strategies and a multi-branch architecture. Furthermore, the MPCC3 module, built via re-parameterization methods, further strengthens cross-channel information extraction and model stability to deal with intra-class occlusion. Experimental results on the MTDC-UAV dataset demonstrate that MSMT-RTDETR significantly outperforms the baseline in detecting maize tassels under complex field backgrounds, achieving a precision of 84.2%. Compared with Deformable DETR and YOLOv10m, improvements of 2.8% and 2.0% in mAP50 were achieved, respectively, for UAV images. This study offers an innovative solution for accurate maize tassel detection, establishing a reliable technical foundation for maize yield estimation. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
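
The MPCC3 module above is built via re-parameterization. A minimal, hedged sketch of the underlying trick is shown below: folding a BatchNorm layer into its preceding convolution so that a single plain convolution reproduces the pair at inference time. MPCC3's actual branch layout is not reproduced; this only illustrates the general re-parameterization step.

```python
# Re-parameterization sketch: fold a BatchNorm into the preceding convolution so that, at
# inference, one plain convolution replaces the conv+BN pair.
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, bias=True)
    std = torch.sqrt(bn.running_var + bn.eps)
    scale = bn.weight / std                                    # per-output-channel scaling from BN
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    conv_bias = conv.bias.data if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.data = bn.bias.data + scale * (conv_bias - bn.running_mean)
    return fused

conv, bn = nn.Conv2d(8, 16, 3, padding=1, bias=False), nn.BatchNorm2d(16)
bn.eval()  # use the BN running statistics, as at inference time
x = torch.randn(1, 8, 32, 32)
with torch.no_grad():
    y_ref = bn(conv(x))
    y_fused = fuse_conv_bn(conv, bn)(x)
print(torch.allclose(y_ref, y_fused, atol=1e-5))  # True: the fused conv matches conv+BN
```
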
30 pages, 59872 KiB  
Article
Advancing 3D Seismic Fault Identification with SwiftSeis-AWNet: A Lightweight Architecture Featuring Attention-Weighted Multi-Scale Semantics and Detail Infusion
by Ang Li, Rui Li, Yuhao Zhang, Shanyi Li, Yali Guo, Liyan Zhang and Yuqing Shi
Electronics 2025, 14(15), 3078; https://doi.org/10.3390/electronics14153078 (registering DOI) - 31 Jul 2025
Abstract
The accurate identification of seismic faults, which serve as crucial fluid migration pathways in hydrocarbon reservoirs, is of paramount importance for reservoir characterization. Traditional interpretation is inefficient and struggles with complex geometries, failing to meet current exploration demands. Deep learning boosts fault identification significantly but still struggles with edge accuracy and noise robustness. To overcome these limitations, this research introduces SwiftSeis-AWNet, a novel lightweight and high-precision network. The network is based on an optimized MedNeXt architecture for better fault edge detection. To address the noise introduced by simple feature fusion, a Semantics and Detail Infusion (SDI) module is integrated. Since the Hadamard product in SDI can cause information loss, we engineer an Attention-Weighted Semantics and Detail Infusion (AWSDI) module that uses dynamic multi-scale feature fusion to preserve details. Validation on field seismic datasets from the Netherlands F3 and New Zealand Kerry blocks shows that SwiftSeis-AWNet mitigates challenges such as the loss of small-scale fault features and the misidentification of fault intersection zones, enhancing the accuracy and geological reliability of automated fault identification. Full article
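
The abstract contrasts the Hadamard-product fusion in SDI with AWSDI's dynamic multi-scale fusion. As a hedged illustration of the latter idea (not the published AWSDI module), the sketch below resizes multi-scale features to a common resolution, scores each scale from its pooled descriptor, and takes a softmax-weighted sum instead of an element-wise product.

```python
# Attention-weighted multi-scale fusion sketch (an assumed design, not the published AWSDI module).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionWeightedFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Linear(channels, 1)  # scores each scale from its pooled descriptor

    def forward(self, features: list) -> torch.Tensor:
        target = features[0].shape[-2:]
        resized = [F.interpolate(f, size=target, mode="bilinear", align_corners=False)
                   for f in features]                          # bring all scales to one resolution
        stack = torch.stack(resized, dim=1)                    # (B, S, C, H, W)
        desc = stack.mean(dim=(-2, -1))                        # (B, S, C) global descriptors
        attn = torch.softmax(self.score(desc), dim=1)          # (B, S, 1) per-scale weights
        return (stack * attn[..., None, None]).sum(dim=1)      # weighted sum, not a Hadamard product

fuse = AttentionWeightedFusion(channels=32)
feats = [torch.randn(2, 32, 64, 64), torch.randn(2, 32, 32, 32), torch.randn(2, 32, 16, 16)]
print(fuse(feats).shape)  # torch.Size([2, 32, 64, 64])
```
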
25 pages, 10331 KiB  
Article
Forest Fire Detection Method Based on Dual-Branch Multi-Scale Adaptive Feature Fusion Network
by Qinggan Wu, Chen Wei, Ning Sun, Xiong Xiong, Qingfeng Xia, Jianmeng Zhou and Xingyu Feng
Forests 2025, 16(8), 1248; https://doi.org/10.3390/f16081248 - 31 Jul 2025
Abstract
There are significant scale and morphological differences between fire and smoke features in forest fire detection. This paper proposes a detection method based on a dual-branch multi-scale adaptive feature fusion network (DMAFNet). In this method, a convolutional neural network (CNN) and a Transformer form a dual-branch backbone network that extracts local texture and global context information, respectively. To overcome the differences in feature distribution and response scale between the two branches, a feature correction module (FCM) is designed, which adaptively aligns the two branches' features through spatial and channel correction mechanisms. A Fusion Feature Module (FFM) is further introduced to fully integrate the dual-branch features through a two-way cross-attention mechanism and effectively suppress redundant information. Finally, a Multi-Scale Fusion Attention Unit (MSFAU) is designed to enhance multi-scale detection of fire targets. Experimental results show that the proposed DMAFNet significantly improves mAP (mean average precision) compared with existing mainstream detection methods. Full article
(This article belongs to the Section Natural Hazards and Risk Management)
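
The FFM described above integrates the CNN and Transformer branches with a two-way cross-attention mechanism. The sketch below shows one straightforward reading of that idea using standard multi-head attention in both directions; the token dimensions and the final concatenate-and-project step are assumptions, not the paper's exact design.

```python
# Two-way cross-attention fusion sketch (assumed dimensions; simplified relative to the paper's FFM).
import torch
import torch.nn as nn

class BidirectionalCrossAttentionFusion(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.cnn_to_trans = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.trans_to_cnn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, cnn_tokens: torch.Tensor, trans_tokens: torch.Tensor) -> torch.Tensor:
        # CNN tokens query the Transformer branch, and vice versa.
        cnn_enriched, _ = self.cnn_to_trans(cnn_tokens, trans_tokens, trans_tokens)
        trans_enriched, _ = self.trans_to_cnn(trans_tokens, cnn_tokens, cnn_tokens)
        return self.proj(torch.cat([cnn_enriched, trans_enriched], dim=-1))

ffm = BidirectionalCrossAttentionFusion()
fused = ffm(torch.randn(2, 400, 256), torch.randn(2, 400, 256))
print(fused.shape)  # torch.Size([2, 400, 256])
```
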
22 pages, 4399 KiB  
Article
Deep Learning-Based Fingerprint–Vein Biometric Fusion: A Systematic Review with Empirical Evaluation
by Sarah Almuwayziri, Abeer Al-Nafjan, Hessah Aljumah and Mashael Aldayel
Appl. Sci. 2025, 15(15), 8502; https://doi.org/10.3390/app15158502 (registering DOI) - 31 Jul 2025
Abstract
User authentication is crucial for safeguarding access to digital systems and services. Biometric authentication serves as a strong and user-friendly alternative to conventional security methods such as passwords and PINs, which are often susceptible to breaches. This study proposes a deep learning-based multimodal biometric system that combines fingerprint (FP) and finger vein (FV) modalities to improve accuracy and security. The system explores three fusion strategies: feature-level fusion (combining feature vectors from each modality), score-level fusion (integrating prediction scores from each modality), and a hybrid approach that leverages both feature and score information. The implementation involved five pretrained convolutional neural network (CNN) models: two unimodal (FP-only and FV-only) and three multimodal models corresponding to each fusion strategy. The models were assessed using the NUPT-FPV dataset, which consists of 33,600 images collected from 140 subjects with a dual-mode acquisition device in varied environmental conditions. The results indicate that the hybrid-level fusion with a dominant score weight (0.7 score, 0.3 feature) achieved the highest accuracy (99.79%) and the lowest equal error rate (EER = 0.0018), demonstrating superior robustness. Overall, the results demonstrate that integrating deep learning with multimodal fusion is highly effective for advancing scalable and accurate biometric authentication solutions suitable for real-world deployments. Full article
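
The best-performing configuration above is a hybrid fusion that weights the score-level branch at 0.7 and the feature-level branch at 0.3. The 0.7/0.3 weights come from the abstract; the rest of the sketch below (per-branch class probabilities and renormalization) is illustrative.

```python
# Hybrid fusion sketch: weighted combination of score-level and feature-level outputs.
# The 0.7/0.3 weights come from the abstract; everything else is illustrative.
import numpy as np

def hybrid_fusion(score_level_probs: np.ndarray,
                  feature_level_probs: np.ndarray,
                  score_weight: float = 0.7) -> np.ndarray:
    """Blend per-class probabilities from the two fusion branches."""
    fused = score_weight * score_level_probs + (1.0 - score_weight) * feature_level_probs
    return fused / fused.sum(axis=-1, keepdims=True)  # renormalize per sample

# Dummy probabilities for a 3-class identification problem, batch of 2.
p_score = np.array([[0.8, 0.1, 0.1], [0.2, 0.5, 0.3]])
p_feat = np.array([[0.6, 0.3, 0.1], [0.1, 0.7, 0.2]])
print(hybrid_fusion(p_score, p_feat).argmax(axis=-1))  # predicted identities: [0 1]
```
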
29 pages, 15488 KiB  
Article
GOFENet: A Hybrid Transformer–CNN Network Integrating GEOBIA-Based Object Priors for Semantic Segmentation of Remote Sensing Images
by Tao He, Jianyu Chen and Delu Pan
Remote Sens. 2025, 17(15), 2652; https://doi.org/10.3390/rs17152652 (registering DOI) - 31 Jul 2025
Abstract
Geographic object-based image analysis (GEOBIA) has demonstrated substantial utility in remote sensing tasks. However, its integration with deep learning remains largely confined to image-level classification. This is primarily due to the irregular shapes and fragmented boundaries of segmented objects, which limit its applicability in semantic segmentation. While convolutional neural networks (CNNs) excel at local feature extraction, they inherently struggle to capture long-range dependencies. In contrast, Transformer-based models are well suited for global context modeling but often lack fine-grained local detail. To overcome these limitations, we propose GOFENet (Geo-Object Feature Enhanced Network)—a hybrid semantic segmentation architecture that effectively fuses object-level priors into deep feature representations. GOFENet employs a dual-encoder design combining CNN and Swin Transformer architectures, enabling multi-scale feature fusion through skip connections to preserve both local and global semantics. An auxiliary branch incorporating cascaded atrous convolutions is introduced to inject information about segmented objects into the learning process. Furthermore, we develop a cross-channel selection module (CSM) for refined channel-wise attention, a feature enhancement module (FEM) to merge global and local representations, and a shallow–deep feature fusion module (SDFM) to integrate pixel- and object-level cues across scales. Experimental results on the GID and LoveDA datasets demonstrate that GOFENet achieves superior segmentation performance, with 66.02% mIoU and 51.92% mIoU, respectively. The model exhibits strong capability in delineating large-scale land cover features, producing sharper object boundaries and reducing classification noise, while preserving the integrity and discriminability of land cover categories. Full article
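
GOFENet is described as fusing GEOBIA object-level priors into deep feature representations. The abstract does not specify the mechanism, so the sketch below is only one speculative illustration: average the deep features inside each segmented object and add that object-level context back to every pixel of the object.

```python
# Object-prior injection sketch (an assumed mechanism, not GOFENet's published modules):
# average deep features within each GEOBIA segment and add the result back per pixel.
import torch

def inject_object_prior(features: torch.Tensor, segments: torch.Tensor) -> torch.Tensor:
    """features: (C, H, W) deep features; segments: (H, W) integer segment IDs from GEOBIA."""
    c, h, w = features.shape
    flat_feat = features.reshape(c, -1)              # (C, H*W)
    flat_seg = segments.reshape(-1)                  # (H*W,)
    num_seg = int(flat_seg.max().item()) + 1
    sums = torch.zeros(num_seg, c).index_add_(0, flat_seg, flat_feat.t())  # per-segment sums
    counts = torch.zeros(num_seg).index_add_(0, flat_seg,
                                             torch.ones_like(flat_seg, dtype=torch.float))
    means = sums / counts.clamp(min=1).unsqueeze(1)  # (num_seg, C) per-segment mean features
    prior = means[flat_seg].t().reshape(c, h, w)     # broadcast each segment's mean to its pixels
    return features + prior                          # fuse pixel-level and object-level cues

feat = torch.randn(16, 8, 8)
seg = torch.randint(0, 4, (8, 8))
print(inject_object_prior(feat, seg).shape)  # torch.Size([16, 8, 8])
```
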
21 pages, 1681 KiB  
Article
Cross-Modal Complementarity Learning for Fish Feeding Intensity Recognition via Audio–Visual Fusion
by Jian Li, Yanan Wei, Wenkai Ma and Tan Wang
Animals 2025, 15(15), 2245; https://doi.org/10.3390/ani15152245 - 31 Jul 2025
Abstract
Accurate evaluation of fish feeding intensity is crucial for optimizing aquaculture efficiency and the healthy growth of fish. Previous methods mainly rely on single-modal approaches (e.g., audio or visual). However, the complex underwater environment poses significant challenges for single-modal monitoring: visual systems are severely affected by water turbidity, lighting conditions, and fish occlusion, while acoustic systems suffer from background noise. Although existing studies have attempted to combine acoustic and visual information, most adopt simple feature-level fusion strategies, which fail to fully exploit the complementary advantages of the two modalities under different environmental conditions and lack dynamic mechanisms for evaluating modal reliability. To address these problems, we propose the Adaptive Cross-modal Attention Fusion Network (ACAF-Net), a cross-modal complementarity learning framework with a two-stage attention fusion mechanism: (1) a cross-modal enhancement stage that enriches individual representations through Low-rank Bilinear Pooling and learnable fusion weights; and (2) an adaptive attention fusion stage that dynamically weights acoustic and visual features based on complementarity and environmental reliability. Our framework incorporates dimension alignment strategies and attention mechanisms to capture the temporal–spatial complementarity between acoustic feeding signals and visual behavioral patterns. Extensive experiments demonstrate superior performance compared to single-modal and conventional fusion approaches, with a 6.4% accuracy improvement. The results validate the effectiveness of exploiting cross-modal complementarity for underwater behavioral analysis and establish a foundation for intelligent aquaculture monitoring systems. Full article
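
The cross-modal enhancement stage above relies on Low-rank Bilinear Pooling. The sketch below follows the standard low-rank factorization of bilinear fusion (project both modalities into a shared low-rank space, take the element-wise product, and project to the output size); the audio and visual dimensions are assumptions rather than ACAF-Net's actual ones.

```python
# Low-rank bilinear pooling sketch (standard factorized form; sizes are assumptions).
import torch
import torch.nn as nn

class LowRankBilinearPooling(nn.Module):
    def __init__(self, audio_dim: int = 128, visual_dim: int = 512,
                 rank: int = 256, out_dim: int = 256, dropout: float = 0.1):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, rank)
        self.visual_proj = nn.Linear(visual_dim, rank)
        self.drop = nn.Dropout(dropout)
        self.out = nn.Linear(rank, out_dim)

    def forward(self, audio: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        # Element-wise product in the shared low-rank space approximates a full bilinear interaction.
        joint = torch.tanh(self.audio_proj(audio)) * torch.tanh(self.visual_proj(visual))
        return self.out(self.drop(joint))

lbp = LowRankBilinearPooling()
print(lbp(torch.randn(4, 128), torch.randn(4, 512)).shape)  # torch.Size([4, 256])
```
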
13 pages, 11739 KiB  
Article
DeepVinci: Organ and Tool Segmentation with Edge Supervision and a Densely Multi-Scale Pyramid Module for Robot-Assisted Surgery
by Li-An Tseng, Yuan-Chih Tsai, Meng-Yi Bai, Mei-Fang Li, Yi-Liang Lee, Kai-Jo Chiang, Yu-Chi Wang and Jing-Ming Guo
Diagnostics 2025, 15(15), 1917; https://doi.org/10.3390/diagnostics15151917 - 30 Jul 2025
Abstract
Background: Automated surgical navigation can be separated into three stages: (1) organ identification and localization, (2) identification of the organs requiring further surgery, and (3) automated planning of the operation path and steps. With its ideal visual and operating system, the da Vinci surgical system provides a promising platform for automated surgical navigation. This study focuses on the first stage of automated surgical navigation by identifying organs in gynecological surgery. Methods: Due to the difficulty of collecting da Vinci gynecological endoscopy data, we propose DeepVinci, a novel end-to-end high-performance encoder–decoder network based on convolutional neural networks (CNNs) for pixel-level organ semantic segmentation. Specifically, to overcome the drawback of a limited field of view, we incorporate a densely multi-scale pyramid module and a feature fusion module, which also enhance the global context information. In addition, the system integrates an edge supervision network to refine the segmented results on the decoding side. Results: Experimental results show that DeepVinci achieves state-of-the-art accuracy, obtaining Dice similarity coefficient and mean pixel accuracy values of 0.684 and 0.700, respectively. Conclusions: The proposed DeepVinci network presents a practical and competitive semantic segmentation solution for da Vinci gynecological surgery. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
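
The densely multi-scale pyramid module is credited with overcoming the limited field of view by enlarging the receptive field. The sketch below shows a generic dilated-convolution pyramid in that spirit; the dilation rates and channel sizes are assumptions, not DeepVinci's exact design.

```python
# Multi-scale pyramid sketch: parallel dilated convolutions concatenated and compressed.
# Dilation rates and channel sizes are assumptions, not DeepVinci's exact module.
import torch
import torch.nn as nn

class MultiScalePyramid(nn.Module):
    def __init__(self, in_ch: int = 256, branch_ch: int = 64, rates=(1, 2, 4, 8)):
        super().__init__()
        # One branch per dilation rate; padding=r keeps the spatial size unchanged.
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=r, dilation=r) for r in rates)
        self.project = nn.Conv2d(branch_ch * len(rates), in_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

pyramid = MultiScalePyramid()
print(pyramid(torch.randn(1, 256, 32, 32)).shape)  # torch.Size([1, 256, 32, 32])
```
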
23 pages, 7739 KiB  
Article
AGS-YOLO: An Efficient Underwater Small-Object Detection Network for Low-Resource Environments
by Weikai Sun, Xiaoqun Liu, Juan Hao, Qiyou Yao, Hailin Xi, Yuwen Wu and Zhaoye Xing
J. Mar. Sci. Eng. 2025, 13(8), 1465; https://doi.org/10.3390/jmse13081465 - 30 Jul 2025
Abstract
Detecting underwater targets is crucial for ecological evaluation and the sustainable use of marine resources. To enhance environmental protection and optimize underwater resource utilization, this study proposes AGS-YOLO, an innovative underwater small-target detection model based on YOLO11. Firstly, this study proposes AMSA, a multi-scale attention module, and optimizes the C3k2 structure to improve the detection and precise localization of small targets. Secondly, a streamlined GSConv convolutional module is incorporated to minimize the parameter count and computational load while effectively retaining inter-channel dependencies. Finally, a novel and efficient cross-scale connected neck network is designed to achieve information complementarity and feature fusion among different scales, efficiently capturing multi-scale semantics while decreasing the complexity of the model. In contrast with the baseline model, the method proposed in this paper demonstrates notable benefits for use in underwater devices constrained by limited computational capabilities. The results demonstrate that AGS-YOLO significantly outperforms previous methods in terms of accuracy on the DUO underwater dataset, with mAP@0.5 improving by 1.3% and mAP@0.5:0.95 improving by 2.6% relative to those of the baseline YOLO11n model. In addition, the proposed model also shows excellent performance on the RUOD dataset, demonstrating its competent detection accuracy and reliable generalization. This study proposes innovative approaches and methodologies for underwater small-target detection, which have significant practical relevance. Full article
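
GSConv, incorporated above to cut parameters while retaining inter-channel dependencies, is commonly described as a dense convolution producing half the output channels, a depthwise convolution producing the other half, and a channel shuffle to mix them. The sketch below follows that description; the kernel sizes and the AGS-YOLO integration details are assumptions.

```python
# GSConv sketch, following the commonly published "slim-neck" formulation
# (dense conv + depthwise conv + channel shuffle); AGS-YOLO specifics are not reproduced.
import torch
import torch.nn as nn

class GSConv(nn.Module):
    def __init__(self, c_in: int, c_out: int, kernel_size: int = 3, stride: int = 1):
        super().__init__()
        c_half = c_out // 2
        pad = kernel_size // 2
        # Dense convolution produces half the output channels...
        self.dense = nn.Sequential(
            nn.Conv2d(c_in, c_half, kernel_size, stride, pad, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        # ...and a cheap depthwise convolution produces the other half from them.
        self.depthwise = nn.Sequential(
            nn.Conv2d(c_half, c_half, kernel_size=5, stride=1, padding=2,
                      groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.dense(x)
        b = self.depthwise(a)
        y = torch.cat([a, b], dim=1)                     # (B, c_out, H, W)
        # Channel shuffle so dense and depthwise information are interleaved.
        bsz, ch, h, w = y.shape
        return y.view(bsz, 2, ch // 2, h, w).transpose(1, 2).reshape(bsz, ch, h, w)

gs = GSConv(64, 128)
print(gs(torch.randn(1, 64, 80, 80)).shape)  # torch.Size([1, 128, 80, 80])
```
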
26 pages, 62045 KiB  
Article
CML-RTDETR: A Lightweight Wheat Head Detection and Counting Algorithm Based on the Improved RT-DETR
by Yue Fang, Chenbo Yang, Chengyong Zhu, Hao Jiang, Jingmin Tu and Jie Li
Electronics 2025, 14(15), 3051; https://doi.org/10.3390/electronics14153051 - 30 Jul 2025
Abstract
Wheat is one of the most important grain crops, and spike counting is crucial for predicting spike yield. However, in complex farmland environments, wheat ears vary greatly in scale, their color is highly similar to the background, and ears often overlap with each other, all of which make wheat ear detection challenging. At the same time, the increasing demand for high accuracy and fast response in wheat spike detection calls for lightweight models that reduce hardware costs. Therefore, this study proposes a lightweight wheat ear detection model, CML-RTDETR, for efficient and accurate detection of wheat ears in real, complex farmland environments. In constructing the model, the lightweight network CSPDarknet is first introduced as the backbone of CML-RTDETR to enhance feature extraction efficiency. In addition, the FM module is introduced to modify the bottleneck layer in the C2f component, realizing hybrid feature extraction by splicing spatial- and frequency-domain features to enhance feature extraction for wheat in complex scenes. Secondly, to improve the model's detection capability for targets of different scales, a multi-scale feature enhancement pyramid (MFEP) is designed, consisting of GHSDConv for efficiently obtaining low-level detail information and CSPDWOK for constructing a multi-scale semantic fusion structure. Finally, channel pruning based on Layer-Adaptive Magnitude Pruning (LAMP) scoring is performed to reduce model parameters and runtime memory. Experimental results on the GWHD2021 dataset show that the AP50 of CML-RTDETR reaches 90.5%, an improvement of 1.2% over the baseline RTDETR-R18 model. Meanwhile, the parameters and GFLOPs are decreased to 11.03 M and 37.8 G, reductions of 42% and 34%, respectively. Finally, the real-time frame rate reaches 73 fps, achieving significant parameter simplification and speed improvement. Full article
(This article belongs to the Section Artificial Intelligence)
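
LAMP scoring, used above to guide channel pruning, assigns each weight its squared magnitude divided by the sum of squared magnitudes of all weights in the same layer that are at least as large; pruning then removes the globally smallest scores. The sketch below illustrates the score at the individual-weight level (the paper applies the idea to channels of CML-RTDETR).

```python
# LAMP score sketch (weight-level illustration of Layer-Adaptive Magnitude Pruning).
import numpy as np

def lamp_scores(layer_weights: np.ndarray) -> np.ndarray:
    """Score each weight by |w|^2 divided by the sum of |w'|^2 over weights at least as large."""
    w2 = layer_weights.flatten() ** 2
    order = np.argsort(w2)                       # ascending by squared magnitude
    sorted_w2 = w2[order]
    # Suffix sums: for the k-th smallest weight, the sum of squared magnitudes from k to the end.
    suffix = np.cumsum(sorted_w2[::-1])[::-1]
    scores = np.empty_like(w2)
    scores[order] = sorted_w2 / suffix
    return scores.reshape(layer_weights.shape)

layer = np.array([[0.1, -0.5], [0.9, 0.05]])
scores = lamp_scores(layer)
print(scores)           # the largest-magnitude weight always gets a score of 1.0
print(scores.argmin())  # flattened index of the first weight a global LAMP pruner would remove
```
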
19 pages, 3139 KiB  
Article
Intelligent Recognition and Parameter Estimation of Radar Active Jamming Based on Oriented Object Detection
by Jiawei Lu, Yiduo Guo, Weike Feng, Xiaowei Hu, Jian Gong and Yu Zhang
Remote Sens. 2025, 17(15), 2646; https://doi.org/10.3390/rs17152646 - 30 Jul 2025
Abstract
To enhance the perception capability of radar in complex electromagnetic environments, this paper proposes an intelligent jamming recognition and parameter estimation method based on deep learning. The core idea is to reformulate the jamming perception problem as an object detection task in computer vision, and we pioneer the application of oriented object detection to this problem, enabling simultaneous jamming classification and key parameter estimation. The method takes the time–frequency spectrogram of jamming signals as input. First, it employs the oriented object detection network YOLOv8-OBB (You Only Look Once Version 8–Oriented Bounding Box) to identify three types of classic suppression jamming and five types of Interrupted Sampling Repeater Jamming (ISRJ) and outputs the positional information of the jamming in the time–frequency spectrogram. Second, for the five ISRJ types, a post-processing algorithm based on box fusion is designed to further extract features for secondary recognition. Finally, by integrating the detection box information and the secondary recognition results, the parameters of the different ISRJ types are estimated. Ablation experiments from the perspective of Non-Maximum Suppression (NMS) are conducted in simulation to compare the OBB method with traditional horizontal-bounding-box detection approaches, highlighting OBB’s detection superiority in dense jamming scenarios. Experimental results show that, compared with existing jamming detection methods, the proposed method achieves higher detection probabilities at jamming-to-noise ratios (JNRs) from 0 to 20 dB, with correct identification rates exceeding 98.5% for both the primary and secondary recognition stages. Moreover, benefiting from the advanced YOLOv8 network, the method exhibits an absolute error of less than 1.85% in estimating six types of jamming parameters, outperforming existing methods in estimation accuracy across different JNR conditions. Full article
(This article belongs to the Special Issue Array and Signal Processing for Radar (Second Edition))
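
The abstract reports estimating ISRJ parameters from the detection-box information on the time–frequency spectrogram. As a simple, hedged illustration (not the paper's post-processing algorithm), the sketch below converts an oriented box's pixel geometry into a coarse start time, duration, bandwidth, and sweep direction, assuming the spectrogram's per-pixel time and frequency resolutions are known from the STFT settings.

```python
# Illustrative conversion of an oriented detection box on a time-frequency spectrogram
# into coarse jamming parameters (not the paper's post-processing algorithm; the
# pixel resolutions dt_s and df_hz are assumed to be known from the STFT settings).
import numpy as np

def obb_to_jamming_params(cx, cy, w, h, angle_rad, dt_s, df_hz):
    """cx, cy, w, h in pixels; angle in radians; returns a dict of coarse estimates."""
    rot = np.array([[np.cos(angle_rad), -np.sin(angle_rad)],
                    [np.sin(angle_rad),  np.cos(angle_rad)]])
    half = np.array([[ w / 2,  h / 2], [ w / 2, -h / 2],
                     [-w / 2,  h / 2], [-w / 2, -h / 2]])
    corners = half @ rot.T + np.array([cx, cy])       # 4 corners in (time_px, freq_px)
    t_px, f_px = corners[:, 0], corners[:, 1]
    return {
        "start_time_s": t_px.min() * dt_s,            # earliest time covered by the box
        "duration_s": (t_px.max() - t_px.min()) * dt_s,
        "bandwidth_hz": (f_px.max() - f_px.min()) * df_hz,
        "sweeps_up": bool(np.sin(angle_rad) * np.cos(angle_rad) > 0),  # sign of the box tilt
    }

params = obb_to_jamming_params(cx=300, cy=128, w=200, h=12,
                               angle_rad=np.deg2rad(15), dt_s=1e-6, df_hz=5e4)
print(params)
```
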