Search Results (765)

Search Parameters:
Keywords = multi-focus image

26 pages, 27107 KiB  
Article
MSFUnet: A Semantic Segmentation Network for Crop Leaf Growth Status Monitoring
by Zhihan Cheng and He Yan
AgriEngineering 2025, 7(7), 238; https://doi.org/10.3390/agriengineering7070238 - 15 Jul 2025
Abstract
Monitoring the growth status of crop leaves is an integral part of agricultural management and involves important tasks such as leaf shape analysis and area calculation. To achieve this goal, accurate leaf segmentation is a critical step. However, this task presents a challenge, as crop leaf images often feature substantial overlap, obstructing the precise differentiation of individual leaf edges. Moreover, existing segmentation methods fail to preserve fine edge details, a deficiency that compromises precise morphological analysis. To overcome these challenges, we introduce MSFUnet, an innovative network for semantic segmentation. MSFUnet integrates a multi-path feature fusion (MFF) mechanism and an edge-detail focus (EDF) module. The MFF module integrates multi-scale features to improve the model’s capacity for distinguishing overlapping leaf areas, while the EDF module employs extended convolution to accurately capture fine edge details. Collectively, these modules enable MSFUnet to achieve high-precision individual leaf segmentation. In addition, standard image augmentations (e.g., contrast/brightness adjustments) were applied to mitigate the impact of variable lighting conditions on leaf appearance in the input images, thereby improving model robustness. Experimental results indicate that MSFUnet attains an MIoU of 93.35%, outperforming conventional segmentation methods and highlighting its effectiveness in crop leaf growth monitoring. Full article
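
As a rough illustration of the multi-path, multi-scale fusion idea described above, the sketch below shows a generic PyTorch block that fuses parallel dilated-convolution branches with a residual connection. Module and parameter names are hypothetical; this is not the authors' MFF/EDF code.

```python
# Minimal sketch of multi-branch, multi-scale feature fusion with dilated
# convolutions; names and dilation rates are illustrative assumptions.
import torch
import torch.nn as nn

class MultiPathFusionBlock(nn.Module):
    """Fuse parallel branches with different receptive fields (dilation rates)."""
    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        # 1x1 convolution merges the concatenated branch outputs back to `channels`.
        self.fuse = nn.Conv2d(channels * len(dilations), channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [branch(x) for branch in self.branches]
        return self.fuse(torch.cat(feats, dim=1)) + x  # residual connection

if __name__ == "__main__":
    block = MultiPathFusionBlock(channels=64)
    out = block(torch.randn(1, 64, 128, 128))
    print(out.shape)  # torch.Size([1, 64, 128, 128])
```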

34 pages, 4568 KiB  
Review
Nanoradiopharmaceuticals: Design Principles, Radiolabeling Strategies, and Biomedicine Applications
by Andrés Núñez-Salinas, Cristian Parra-Garretón, Daniel Acuña, Sofía Peñaloza, Germán Günther, Soledad Bollo, Francisco Arriagada and Javier Morales
Pharmaceutics 2025, 17(7), 912; https://doi.org/10.3390/pharmaceutics17070912 - 14 Jul 2025
Abstract
Nanoradiopharmaceuticals integrate nanotechnology with nuclear medicine to enhance the precision and effectiveness of radiopharmaceuticals used in diagnostic imaging and targeted therapies. Nanomaterials offer improved targeting capabilities and greater stability, helping to overcome several limitations. This review presents a comprehensive overview of the fundamental design principles, radiolabeling techniques, and biomedical applications of nanoradiopharmaceuticals, with a particular focus on their expanding role in precision oncology. It explores key areas, including single- and multi-modal imaging modalities (SPECT, PET), radionuclide therapies involving beta, alpha, and Auger emitters, and integrated theranostic systems. A diverse array of nanocarriers is examined, including liposomes, micelles, albumin nanoparticles, PLGA, dendrimers, and gold, iron oxide, and silica-based platforms, with an assessment of both preclinical and clinical research outcomes. Theranostic nanoplatforms, which integrate diagnostic and therapeutic functions within a single system, enable real-time monitoring and personalized dose optimization. Although some of these systems have progressed to clinical trials, several obstacles remain, including formulation stability, scalable manufacturing, regulatory compliance, and long-term safety considerations. In summary, nanoradiopharmaceuticals represent a promising frontier in personalized medicine, particularly in oncology. By combining diagnostic and therapeutic capabilities within a single nanosystem, they facilitate more individualized and adaptive treatment approaches. Continued innovation in formulation, radiochemistry, and regulatory harmonization will be crucial to their successful routine clinical use. Full article
(This article belongs to the Special Issue Nanosystems for Advanced Diagnostics and Therapy)

22 pages, 7562 KiB  
Article
FIGD-Net: A Symmetric Dual-Branch Dehazing Network Guided by Frequency Domain Information
by Luxia Yang, Yingzhao Xue, Yijin Ning, Hongrui Zhang and Yongjie Ma
Symmetry 2025, 17(7), 1122; https://doi.org/10.3390/sym17071122 - 13 Jul 2025
Abstract
Image dehazing technology is a crucial component in the fields of intelligent transportation and autonomous driving. However, most existing dehazing algorithms only process images in the spatial domain, failing to fully exploit the rich information in the frequency domain, which leads to residual haze in the images. To address this issue, we propose a novel Frequency-domain Information Guided Symmetric Dual-branch Dehazing Network (FIGD-Net), which utilizes the spatial branch to extract local haze features and the frequency branch to capture the global haze distribution, thereby guiding the feature learning process in the spatial branch. The FIGD-Net mainly consists of three key modules: the Frequency Detail Extraction Module (FDEM), the Dual-Domain Multi-scale Feature Extraction Module (DMFEM), and the Dual-Domain Guidance Module (DGM). First, the FDEM employs the Discrete Cosine Transform (DCT) to convert the spatial domain into the frequency domain. It then selectively extracts high-frequency and low-frequency features based on predefined proportions. The high-frequency features, which contain haze-related information, are correlated with the overall characteristics of the low-frequency features to enhance the representation of haze attributes. Next, the DMFEM utilizes stacked residual blocks and gradient feature flows to capture local detail features. Specifically, frequency-guided weights are applied to adjust the focus of feature channels, thereby improving the module’s ability to capture multi-scale features and distinguish haze features. Finally, the DGM adjusts channel weights guided by frequency information. This smooths out redundant signals and enables cross-branch information exchange, which helps to restore the original image colors. Extensive experiments demonstrate that the proposed FIGD-Net achieves superior dehazing performance on multiple synthetic and real-world datasets. Full article
(This article belongs to the Section Computer)
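
The DCT-based frequency split described in the abstract can be illustrated with a generic NumPy/SciPy sketch that keeps a predefined proportion of low-frequency coefficients and treats the residual as the high-frequency component; it is a simplified stand-in for the FDEM, not the published module.

```python
# Illustrative DCT split of an image into low- and high-frequency parts,
# keeping a predefined proportion of low-frequency coefficients.
import numpy as np
from scipy.fft import dctn, idctn

def dct_frequency_split(img: np.ndarray, low_ratio: float = 0.25):
    """Return (low_freq, high_freq) components of a 2-D image."""
    coeffs = dctn(img, norm="ortho")
    h, w = img.shape
    mask = np.zeros_like(coeffs)
    # Keep only the top-left (low-frequency) block of DCT coefficients.
    mask[: int(h * low_ratio), : int(w * low_ratio)] = 1.0
    low = idctn(coeffs * mask, norm="ortho")
    high = img - low  # residual carries edges, texture and haze-related detail
    return low, high

if __name__ == "__main__":
    image = np.random.rand(256, 256)
    low, high = dct_frequency_split(image, low_ratio=0.25)
    print(np.allclose(low + high, image))  # True: the two parts are complementary
```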

36 pages, 25361 KiB  
Article
Remote Sensing Image Compression via Wavelet-Guided Local Structure Decoupling and Channel–Spatial State Modeling
by Jiahui Liu, Lili Zhang and Xianjun Wang
Remote Sens. 2025, 17(14), 2419; https://doi.org/10.3390/rs17142419 - 12 Jul 2025
Abstract
As the resolution and data volume of remote sensing imagery continue to grow, achieving efficient compression without sacrificing reconstruction quality remains a major challenge, given that traditional handcrafted codecs often fail to balance rate-distortion performance and computational complexity, while deep learning-based approaches offer superior representational capacity. However, challenges remain in achieving a balance between fine-detail adaptation and computational efficiency. Mamba, a state–space model (SSM)-based architecture, offers linear-time complexity and excels at capturing long-range dependencies in sequences. It has been adopted in remote sensing compression tasks to model long-distance dependencies between pixels. However, despite its effectiveness in global context aggregation, Mamba’s uniform bidirectional scanning is insufficient for capturing high-frequency structures such as edges and textures. Moreover, existing visual state–space (VSS) models built upon Mamba typically treat all channels equally and lack mechanisms to dynamically focus on semantically salient spatial regions. To address these issues, we present an innovative architecture for remote sensing image compression, called the Multi-scale Channel Global Mamba Network (MGMNet). MGMNet integrates a spatial–channel dynamic weighting mechanism into the Mamba architecture, enhancing global semantic modeling while selectively emphasizing informative features. It comprises two key modules. The Wavelet Transform-guided Local Structure Decoupling (WTLS) module applies multi-scale wavelet decomposition to disentangle and separately encode low- and high-frequency components, enabling efficient parallel modeling of global contours and local textures. The Channel–Global Information Modeling (CGIM) module enhances conventional VSS by introducing a dual-path attention strategy that reweights spatial and channel information, improving the modeling of long-range dependencies and edge structures. We conducted extensive evaluations on three distinct remote sensing datasets to assess the MGMNet. The results show that MGMNet outperforms current SOTA models across various performance metrics. Full article
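
A hedged illustration of wavelet-guided decoupling of global contours from local textures, in the spirit of the WTLS description above; it uses a plain PyWavelets decomposition and is not the paper's implementation.

```python
# Split an image into a low-frequency approximation (global contours) and a
# high-frequency residual (local textures) via a multi-level wavelet transform.
import numpy as np
import pywt

def wavelet_decouple(img: np.ndarray, wavelet: str = "haar", level: int = 2):
    coeffs = pywt.wavedec2(img, wavelet, level=level)
    # Zero out all detail sub-bands to keep only the coarse approximation.
    approx_only = [coeffs[0]] + [
        tuple(np.zeros_like(band) for band in detail) for detail in coeffs[1:]
    ]
    low = pywt.waverec2(approx_only, wavelet)[: img.shape[0], : img.shape[1]]
    high = img - low
    return low, high

if __name__ == "__main__":
    img = np.random.rand(128, 128)
    low, high = wavelet_decouple(img)
    print(low.shape, high.shape)  # (128, 128) (128, 128)
```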

24 pages, 2440 KiB  
Article
A Novel Dynamic Context Branch Attention Network for Detecting Small Objects in Remote Sensing Images
by Huazhong Jin, Yizhuo Song, Ting Bai, Kaimin Sun and Yepei Chen
Remote Sens. 2025, 17(14), 2415; https://doi.org/10.3390/rs17142415 - 12 Jul 2025
Abstract
Detecting small objects in remote sensing images is challenging due to their size, which results in limited distinctive features. This limitation necessitates the effective use of contextual information for accurate identification. Many existing methods often struggle because they do not dynamically adjust the contextual scope based on the specific characteristics of each target. To address this issue and improve the detection performance of small objects (typically defined as objects with a bounding box area of less than 1024 pixels), we propose a novel backbone network called the Dynamic Context Branch Attention Network (DCBANet). We present the Dynamic Context Scale-Aware (DCSA) Block, which utilizes a multi-branch architecture to generate features with diverse receptive fields. Within each branch, a Context Adaptive Selection Module (CASM) dynamically weights information, allowing the model to focus on the most relevant context. To further enhance performance, we introduce an Efficient Branch Attention (EBA) module that adaptively reweights the parallel branches, prioritizing the most discriminative ones. Finally, to ensure computational efficiency, we design a Dual-Gated Feedforward Network (DGFFN), a lightweight yet powerful replacement for standard FFNs. Extensive experiments conducted on four public remote sensing datasets demonstrate that the DCBANet achieves impressive mAP@0.5 scores of 80.79% on DOTA, 89.17% on NWPU VHR-10, 80.27% on SIMD, and a remarkable 42.4% mAP@0.5:0.95 on the specialized small object benchmark AI-TOD. These results surpass RetinaNet, YOLOF, FCOS, Faster R-CNN, Dynamic R-CNN, SKNet, and Cascade R-CNN, highlighting its effectiveness in detecting small objects in remote sensing images. However, there remains potential for further improvement in multi-scale and weak target detection. Future work will integrate local and global context to enhance multi-scale object detection performance. Full article
(This article belongs to the Special Issue High-Resolution Remote Sensing Image Processing and Applications)
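
The pattern of adaptively reweighting parallel branches with different receptive fields, as in the DCSA/EBA description above, can be sketched in PyTorch as follows; layer names and dilation rates are illustrative assumptions, not the published modules.

```python
# Parallel branches with different receptive fields, reweighted by a learned
# attention gate driven by global context.
import torch
import torch.nn as nn

class BranchAttention(nn.Module):
    def __init__(self, channels: int, dilations=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        ])
        # Squeeze global context, then predict one weight per branch.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, len(dilations)),
            nn.Softmax(dim=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.stack([b(x) for b in self.branches], dim=1)  # (B, K, C, H, W)
        weights = self.gate(x).view(x.size(0), -1, 1, 1, 1)        # (B, K, 1, 1, 1)
        return (feats * weights).sum(dim=1)                        # weighted branch sum

if __name__ == "__main__":
    m = BranchAttention(32)
    print(m(torch.randn(2, 32, 64, 64)).shape)  # torch.Size([2, 32, 64, 64])
```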

22 pages, 4079 KiB  
Article
Breast Cancer Classification with Various Optimized Deep Learning Methods
by Mustafa Güler, Gamze Sart, Ömer Algorabi, Ayse Nur Adıguzel Tuylu and Yusuf Sait Türkan
Diagnostics 2025, 15(14), 1751; https://doi.org/10.3390/diagnostics15141751 - 10 Jul 2025
Abstract
Background/Objectives: In recent years, there has been a significant increase in the number of women with breast cancer. Breast cancer prediction is defined as a medical data analysis and image processing problem. Experts may need artificial intelligence technologies to distinguish between benign and malignant tumors in order to make decisions. When the studies in the literature are examined, it can be seen that applications of deep learning algorithms in the field of medicine have achieved very successful results. Methods: In this study, 11 different deep learning algorithms (Vanilla, ResNet50, ResNet152, VGG16, DenseNet152, MobileNetv2, EfficientB1, NasNet, DenseNet201, ensemble, and Tuned Model) were used. Images of pathological specimens from breast biopsies consisting of two classes, benign and malignant, were used for classification analysis. To limit the computational time and speed up the analysis process, 10,000 images, 6172 IDC-negative and 3828 IDC-positive, were selected. Of the images, 80% were used for training, 10% were used for validation, and 10% were used for testing the trained model. Results: The results demonstrate that DenseNet201 achieved the highest classification accuracy of 89.4%, with a precision of 88.2%, a recall of 84.1%, an F1 score of 86.1%, and an AUC score of 95.8%. Conclusions: In conclusion, this study highlights the potential of deep learning algorithms in breast cancer classification. Future research should focus on integrating multi-modal imaging data, refining ensemble learning methodologies, and expanding dataset diversity to further improve the classification accuracy and real-world clinical applicability. Full article
(This article belongs to the Topic Machine Learning and Deep Learning in Medical Imaging)
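
A typical transfer-learning setup of the kind the study describes, shown here with an ImageNet-pretrained DenseNet201 and a two-class head, is sketched below; the hyperparameters and head configuration are placeholders, not the study's configuration.

```python
# Transfer-learning sketch: swap the ImageNet classifier of DenseNet201 for a
# two-class head (e.g., IDC-negative vs. IDC-positive patches).
import torch
import torch.nn as nn
from torchvision import models

def build_densenet201(num_classes: int = 2) -> nn.Module:
    # Downloads ImageNet weights (torchvision >= 0.13 syntax).
    model = models.densenet201(weights="DEFAULT")
    model.classifier = nn.Linear(model.classifier.in_features, num_classes)
    return model

if __name__ == "__main__":
    model = build_densenet201()
    logits = model(torch.randn(4, 3, 224, 224))
    print(logits.shape)  # torch.Size([4, 2])
```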

18 pages, 16017 KiB  
Article
Design and Fabrication of Multi-Frequency and Low-Quality-Factor Capacitive Micromachined Ultrasonic Transducers
by Amirhossein Moshrefi, Abid Ali, Mathieu Gratuze and Frederic Nabki
Micromachines 2025, 16(7), 797; https://doi.org/10.3390/mi16070797 - 8 Jul 2025
Abstract
Capacitive micromachined ultrasonic transducers (CMUTs) have been developed for air-coupled applications to address key challenges such as noise, prolonged ringing, and side-lobe interference. This study introduces an optimized CMUT design that leverages the squeeze-film damping effect to achieve a low quality factor, enhancing resolution and temporal precision for imaging, one of the suggested airborne applications. The device was fabricated using the PolyMUMPs process, ensuring high structural accuracy and consistency. Finite element analysis (FEA) simulations validated the optimized parameters, demonstrating improved displacement, reduced side-lobe artifacts, and sharper main lobes for superior imaging performance. Experimental validation, including Laser Doppler Vibrometer (LDV) measurements of membrane displacement and mode shapes, along with ring oscillation tests to assess Q-factor and signal decay, confirmed the device’s reliability and consistency across four CMUT arrays. Additionally, this study explores the implementation of multi-frequency CMUT arrays, enhancing imaging versatility across different air-coupled applications. By integrating multiple frequency bands, the proposed CMUTs enable adaptable imaging focus, improving their suitability for diverse diagnostic scenarios. These advancements highlight the potential of the proposed design to deliver superior performance for airborne applications, paving the way for its integration into advanced diagnostic systems. Full article
(This article belongs to the Special Issue MEMS Ultrasonic Transducers)
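
For readers interested in the ring-down analysis mentioned above, the following sketch estimates a quality factor from a decaying oscillation via its Hilbert envelope, using Q = pi * f0 * tau for an amplitude decaying as exp(-t/tau); the resonance frequency, Q value, and signal are synthetic examples, not the reported LDV measurements.

```python
# Estimate Q from a synthetic ring-down record by fitting the log envelope.
import numpy as np
from scipy.signal import hilbert

def q_from_ringdown(t: np.ndarray, signal: np.ndarray, f0: float) -> float:
    """Amplitude ~ exp(-t/tau)  =>  Q = pi * f0 * tau."""
    envelope = np.abs(hilbert(signal))          # amplitude envelope
    valid = envelope > envelope.max() * 1e-3    # ignore the numerically tiny tail
    slope, _ = np.polyfit(t[valid], np.log(envelope[valid]), 1)
    return np.pi * f0 * (-1.0 / slope)

if __name__ == "__main__":
    f0, q_true = 200e3, 5.0                     # assumed 200 kHz device, heavy air damping
    t = np.linspace(0.0, 50e-6, 5000)
    tau = q_true / (np.pi * f0)
    ringdown = np.exp(-t / tau) * np.cos(2 * np.pi * f0 * t)
    print(round(q_from_ringdown(t, ringdown, f0), 2))  # close to 5.0
```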

17 pages, 7786 KiB  
Article
Video Coding Based on Ladder Subband Recovery and ResGroup Module
by Libo Wei, Aolin Zhang, Lei Liu, Jun Wang and Shuai Wang
Entropy 2025, 27(7), 734; https://doi.org/10.3390/e27070734 - 8 Jul 2025
Abstract
With the rapid development of video encoding technology in the field of computer vision, the demand for tasks such as video frame reconstruction, denoising, and super-resolution has been continuously increasing. However, traditional video encoding methods typically focus on extracting spatial or temporal domain information, often facing challenges of insufficient accuracy and information loss when reconstructing high-frequency details, edges, and textures of images. To address this issue, this paper proposes an innovative LadderConv framework, which combines discrete wavelet transform (DWT) with spatial and channel attention mechanisms. By progressively recovering wavelet subbands, it effectively enhances the video frame encoding quality. Specifically, the LadderConv framework adopts a stepwise recovery approach for wavelet subbands, first processing high-frequency detail subbands with relatively less information, then enhancing the interaction between these subbands, and ultimately synthesizing a high-quality reconstructed image through inverse wavelet transform. Moreover, the framework introduces spatial and channel attention mechanisms, which further strengthen the focus on key regions and channel features, leading to notable improvements in detail restoration and image reconstruction accuracy. To optimize the performance of the LadderConv framework, particularly in detail recovery and high-frequency information extraction tasks, this paper designs an innovative ResGroup module. By using multi-layer convolution operations along with feature map compression and recovery, the ResGroup module enhances the network’s expressive capability and effectively reduces computational complexity. The ResGroup module captures multi-level features from low level to high level and retains rich feature information through residual connections, thus improving the overall reconstruction performance of the model. In experiments, the combination of the LadderConv framework and the ResGroup module demonstrates superior performance in video frame reconstruction tasks, particularly in recovering high-frequency information, image clarity, and detail representation. Full article
(This article belongs to the Special Issue Rethinking Representation Learning in the Age of Large Models)
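
The stepwise sub-band recovery idea can be illustrated with a plain wavelet round trip in which the detail bands are processed before synthesis; the soft-thresholding step below is a simple placeholder for the paper's learned LadderConv/ResGroup modules.

```python
# Process wavelet detail sub-bands step by step, then synthesize the frame
# with the inverse transform.
import numpy as np
import pywt

def soft_threshold(band: np.ndarray, thr: float) -> np.ndarray:
    return np.sign(band) * np.maximum(np.abs(band) - thr, 0.0)

def ladder_recover(frame: np.ndarray, wavelet: str = "db2", thr: float = 0.02) -> np.ndarray:
    cA, (cH, cV, cD) = pywt.dwt2(frame, wavelet)
    cD = soft_threshold(cD, thr)                                # step 1: diagonal details
    cH, cV = soft_threshold(cH, thr), soft_threshold(cV, thr)   # step 2: remaining details
    return pywt.idwt2((cA, (cH, cV, cD)), wavelet)              # step 3: synthesis

if __name__ == "__main__":
    frame = np.random.rand(128, 128)
    print(ladder_recover(frame).shape)  # (128, 128)
```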

27 pages, 19258 KiB  
Article
A Lightweight Multi-Frequency Feature Fusion Network with Efficient Attention for Breast Tumor Classification in Pathology Images
by Hailong Chen, Qingqing Song and Guantong Chen
Information 2025, 16(7), 579; https://doi.org/10.3390/info16070579 - 6 Jul 2025
Abstract
The intricate and complex tumor cell morphology in breast pathology images is a key factor for tumor classification. This paper proposes a lightweight breast tumor classification model with multi-frequency feature fusion (LMFM) to tackle the problem of inadequate feature extraction and poor classification performance. The LMFM utilizes wavelet transform (WT) for multi-frequency feature fusion, integrating high-frequency (HF) tumor details with high-level semantic features to enhance feature representation. The network’s ability to extract irregular tumor characteristics is further reinforced by dynamic adaptive deformable convolution (DADC). The introduction of the token-based Region Focus Module (TRFM) reduces interference from irrelevant background information. At the same time, the incorporation of a linear attention (LA) mechanism lowers the model’s computational complexity and further enhances its global feature extraction capability. The experimental results demonstrate that the proposed model achieves classification accuracies of 98.23% and 97.81% on the BreaKHis and BACH datasets, with only 9.66 M parameters. Full article
(This article belongs to the Section Biomedical Information and Health)
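
The linear attention (LA) mechanism referenced above generally replaces the quadratic softmax attention with a kernel feature map, as in this compact PyTorch sketch of the standard formulation; the paper's exact LA variant may differ.

```python
# Kernelized linear attention: phi(Q)(phi(K)^T V) instead of softmax(QK^T)V,
# reducing cost from O(N^2) to O(N) in the sequence length.
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps: float = 1e-6):
    """q, k, v: (batch, seq_len, dim). Returns (batch, seq_len, dim)."""
    phi_q, phi_k = F.elu(q) + 1.0, F.elu(k) + 1.0            # positive feature maps
    kv = torch.einsum("bnd,bne->bde", phi_k, v)               # sum_n phi(k_n) v_n^T
    z = torch.einsum("bnd,bd->bn", phi_q, phi_k.sum(dim=1))   # per-query normalizer
    return torch.einsum("bnd,bde->bne", phi_q, kv) / (z.unsqueeze(-1) + eps)

if __name__ == "__main__":
    q = torch.randn(2, 1024, 64)
    print(linear_attention(q, q, q).shape)  # torch.Size([2, 1024, 64])
```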

21 pages, 10364 KiB  
Article
LightMFF: A Simple and Efficient Ultra-Lightweight Multi-Focus Image Fusion Network
by Xinzhe Xie, Zijian Lin, Buyu Guo, Shuangyan He, Yanzhen Gu, Yefei Bai and Peiliang Li
Appl. Sci. 2025, 15(13), 7500; https://doi.org/10.3390/app15137500 - 3 Jul 2025
Abstract
In recent years, deep learning-based multi-focus image fusion (MFF) methods have demonstrated remarkable performance. However, their reliance on complex network architectures often demands substantial computational resources, limiting practical applications. To address this, we propose LightMFF, an ultra-lightweight fusion network that achieves superior performance with minimal computational overhead. Our core insight is to reformulate the multi-focus fusion problem from a classification perspective to a refinement perspective, where coarse initial decision maps and explicit edge information are leveraged to guide the final decision map generation. This novel formulation enables a significantly simplified architecture, requiring only 0.02 M parameters while maintaining state-of-the-art fusion quality. Extensive experiments demonstrate that LightMFF achieves real-time performance at 0.02 s per image pair with merely 0.06 G FLOPs, representing a 98.05% reduction in computational cost compared to prior approaches. Crucially, LightMFF consistently surpasses existing methods across standard fusion quality metrics. Full article
(This article belongs to the Special Issue Advances in Optical Imaging and Deep Learning)
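
As context for the "refinement" formulation described above, the sketch below computes the kind of coarse, sharpness-based decision map that such a network could refine, using local Laplacian energy; it is a classical baseline, not LightMFF itself.

```python
# Coarse multi-focus decision map from local sharpness, then pixel-wise fusion.
import numpy as np
from scipy import ndimage

def coarse_decision_map(img_a: np.ndarray, img_b: np.ndarray, win: int = 9) -> np.ndarray:
    """Return a map that is 1 where img_a is locally sharper than img_b."""
    def sharpness(img):
        lap = ndimage.laplace(img.astype(np.float64))
        return ndimage.uniform_filter(lap ** 2, size=win)   # local Laplacian energy
    return (sharpness(img_a) >= sharpness(img_b)).astype(np.float64)

def fuse(img_a, img_b, decision):
    return decision * img_a + (1.0 - decision) * img_b

if __name__ == "__main__":
    a, b = np.random.rand(256, 256), np.random.rand(256, 256)
    d = coarse_decision_map(a, b)
    print(fuse(a, b, d).shape)  # (256, 256)
```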

18 pages, 2148 KiB  
Article
A Cross-Spatial Differential Localization Network for Remote Sensing Change Captioning
by Ruijie Wu, Hao Ye, Xiangying Liu, Zhenzhen Li, Chenhao Sun and Jiajia Wu
Remote Sens. 2025, 17(13), 2285; https://doi.org/10.3390/rs17132285 - 3 Jul 2025
Abstract
Remote Sensing Image Change Captioning (RSICC) aims to generate natural language descriptions of changes in bi-temporal remote sensing images, providing more semantically interpretable results than conventional pixel-level change detection methods. However, existing approaches often rely on stacked Transformer modules, leading to suboptimal feature discrimination. Moreover, direct difference computation after feature extraction tends to retain task-irrelevant noise, limiting the model’s ability to capture meaningful changes. This study proposes a novel cross-spatial Transformer and symmetric difference localization network (CTSD-Net) for RSICC to address these limitations. The proposed Cross-Spatial Transformer adaptively enhances spatial-aware feature representations by guiding the model to focus on key regions across temporal images. Additionally, a hierarchical difference feature integration strategy is introduced to suppress noise by fusing multi-level differential features, while residual-connected high-level features serve as query vectors to facilitate bidirectional change representation learning. Finally, a causal Transformer decoder creates accurate descriptions by linking visual information with text. CTSD-Net achieved BLEU-4 scores of 66.32 and 73.84 on the LEVIR-CC and WHU-CDC datasets, respectively, outperforming existing methods in accurately locating change areas and describing them semantically. This study provides a promising solution for enhancing interpretability in remote sensing change analysis. Full article
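
The hierarchical difference-feature integration described above follows a common pattern: per-level absolute feature differences are projected, upsampled, and summed. The PyTorch sketch below illustrates that pattern with hypothetical channel sizes; it is not the published CTSD-Net code.

```python
# Fuse multi-level bi-temporal feature differences into one change map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalDiffFusion(nn.Module):
    def __init__(self, in_channels=(64, 128, 256), out_channels: int = 64):
        super().__init__()
        self.proj = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])

    def forward(self, feats_t1, feats_t2):
        """feats_t1 / feats_t2: lists of feature maps from the two acquisition dates."""
        target_size = feats_t1[0].shape[-2:]
        fused = 0
        for proj, f1, f2 in zip(self.proj, feats_t1, feats_t2):
            diff = proj(torch.abs(f1 - f2))          # per-level change evidence
            fused = fused + F.interpolate(diff, size=target_size,
                                          mode="bilinear", align_corners=False)
        return fused

if __name__ == "__main__":
    t1 = [torch.randn(1, c, s, s) for c, s in zip((64, 128, 256), (64, 32, 16))]
    t2 = [torch.randn(1, c, s, s) for c, s in zip((64, 128, 256), (64, 32, 16))]
    print(HierarchicalDiffFusion()(t1, t2).shape)  # torch.Size([1, 64, 64, 64])
```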

19 pages, 2374 KiB  
Article
Tracking and Registration Technology Based on Panoramic Cameras
by Chao Xu, Guoxu Li, Ye Bai, Yuzhuo Bai, Zheng Cao and Cheng Han
Appl. Sci. 2025, 15(13), 7397; https://doi.org/10.3390/app15137397 - 1 Jul 2025
Abstract
Augmented reality (AR) has become a research focus in computer vision and graphics, with growing applications driven by advances in artificial intelligence and the emergence of the metaverse. Panoramic cameras offer new opportunities for AR due to their wide field of view but also pose significant challenges for camera pose estimation because of severe distortion and complex scene textures. To address these issues, this paper proposes a lightweight, unsupervised deep learning model for panoramic camera pose estimation. The model consists of a depth estimation sub-network and a pose estimation sub-network, both optimized for efficiency using network compression, multi-scale rectangular convolutions, and dilated convolutions. A learnable occlusion mask is incorporated into the pose network to mitigate errors caused by complex relative motion. Furthermore, a panoramic view reconstruction model is constructed to obtain effective supervisory signals from the predicted depth, pose information, and corresponding panoramic images and is trained using a designed spherical photometric consistency loss. The experimental results demonstrate that the proposed method achieves competitive accuracy while maintaining high computational efficiency, making it well-suited for real-time AR applications with panoramic input. Full article

14 pages, 6074 KiB  
Article
Cross-Modal Data Fusion via Vision-Language Model for Crop Disease Recognition
by Wenjie Liu, Guoqing Wu, Han Wang and Fuji Ren
Sensors 2025, 25(13), 4096; https://doi.org/10.3390/s25134096 - 30 Jun 2025
Abstract
Crop diseases pose a significant threat to agricultural productivity and global food security. Timely and accurate disease identification is crucial for improving crop yield and quality. While most existing deep learning-based methods focus primarily on image datasets for disease recognition, they often overlook the complementary role of textual features in enhancing visual understanding. To address this problem, we propose a cross-modal data fusion method based on a vision-language model for crop disease recognition. Our approach leverages the Zhipu.ai multi-model to generate comprehensive textual descriptions of crop leaf diseases, including global description, local lesion description, and color-texture description. These descriptions are encoded into feature vectors, while an image encoder extracts image features. A cross-attention mechanism then iteratively fuses multimodal features across multiple layers, and a classification prediction module generates classification probabilities. Extensive experiments on the Soybean Disease, AI Challenge 2018, and PlantVillage datasets demonstrate that our method outperforms state-of-the-art image-only approaches with higher accuracy and fewer parameters. Specifically, with only 1.14M model parameters, our model achieves recognition accuracies of 98.74%, 87.64%, and 99.08% on the three datasets, respectively. The results highlight the effectiveness of cross-modal learning in leveraging both visual and textual cues for precise and efficient disease recognition, offering a scalable solution for crop disease recognition. Full article
(This article belongs to the Section Smart Agriculture)
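
A minimal sketch of one cross-attention fusion step between image patch tokens and encoded text descriptions, the generic mechanism the abstract refers to; dimensions and layer names are assumptions rather than the paper's configuration.

```python
# Image tokens (queries) attend to textual disease-description tokens (keys/values).
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, image_tokens: torch.Tensor, text_tokens: torch.Tensor):
        attended, _ = self.attn(query=image_tokens, key=text_tokens, value=text_tokens)
        return self.norm(image_tokens + attended)  # residual connection + norm

if __name__ == "__main__":
    img = torch.randn(2, 196, 256)   # e.g. 14x14 patch embeddings
    txt = torch.randn(2, 32, 256)    # encoded description tokens
    print(CrossModalFusion()(img, txt).shape)  # torch.Size([2, 196, 256])
```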

26 pages, 23383 KiB  
Article
Multi-Focus Image Fusion Based on Dual-Channel Rybak Neural Network and Consistency Verification in NSCT Domain
by Ming Lv, Sensen Song, Zhenhong Jia, Liangliang Li and Hongbing Ma
Fractal Fract. 2025, 9(7), 432; https://doi.org/10.3390/fractalfract9070432 - 30 Jun 2025
Abstract
In multi-focus image fusion, accurately detecting and extracting focused regions remains a key challenge. Some existing methods suffer from misjudgment of focus areas, resulting in incorrect focus information or the unintended retention of blurred regions in the fused image. To address these issues, this paper proposes a novel multi-focus image fusion method that leverages a dual-channel Rybak neural network combined with consistency verification in the nonsubsampled contourlet transform (NSCT) domain. Specifically, the high-frequency sub-bands produced by NSCT decomposition are processed using the dual-channel Rybak neural network and a consistency verification strategy, allowing for more accurate extraction and integration of salient details. Meanwhile, the low-frequency sub-bands are fused using a simple averaging approach to preserve the overall structure and brightness information. The effectiveness of the proposed method has been thoroughly evaluated through comprehensive qualitative and quantitative experiments conducted on three widely used public datasets: Lytro, MFFW, and MFI-WHU. Experimental results show that our method consistently outperforms several state-of-the-art image fusion techniques, including both traditional algorithms and deep learning-based approaches, in terms of visual quality and objective performance metrics (QAB/F, QCB, QE, QFMI, QMI, QMSE, QNCIE, QNMI, QP, and QPSNR). These results clearly demonstrate the robustness and superiority of the proposed fusion framework in handling multi-focus image fusion tasks. Full article
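
The fusion rules mentioned above (averaging low-frequency sub-bands, selecting high-frequency coefficients by activity, and cleaning the decision map by consistency verification) can be sketched generically as follows; the NSCT itself and the dual-channel Rybak network are not reproduced here.

```python
# Generic sub-band fusion rules: average low-frequency bands, pick high-frequency
# coefficients by larger absolute activity, and smooth the binary decision map
# with a median filter as a simple stand-in for consistency verification.
import numpy as np
from scipy.ndimage import median_filter

def fuse_low(band_a: np.ndarray, band_b: np.ndarray) -> np.ndarray:
    return 0.5 * (band_a + band_b)                    # preserve structure and brightness

def fuse_high(band_a: np.ndarray, band_b: np.ndarray, win: int = 5) -> np.ndarray:
    decision = (np.abs(band_a) >= np.abs(band_b)).astype(np.float64)
    decision = median_filter(decision, size=win)      # consistency verification step
    return decision * band_a + (1.0 - decision) * band_b

if __name__ == "__main__":
    a, b = np.random.randn(64, 64), np.random.randn(64, 64)
    print(fuse_low(a, b).shape, fuse_high(a, b).shape)  # (64, 64) (64, 64)
```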

19 pages, 7851 KiB  
Article
Ship Plate Detection Algorithm Based on Improved RT-DETR
by Lei Zhang and Liuyi Huang
J. Mar. Sci. Eng. 2025, 13(7), 1277; https://doi.org/10.3390/jmse13071277 - 30 Jun 2025
Abstract
To address the challenges in ship plate detection under complex maritime scenarios—such as small target size, extreme aspect ratios, dense arrangements, and multi-angle rotations—this paper proposes a multi-module collaborative detection algorithm, RT-DETR-HPA, based on an enhanced RT-DETR framework. The proposed model integrates three core components: an improved High-Frequency Enhanced Residual Block (HFERB) embedded in the backbone to strengthen multi-scale high-frequency feature fusion, with deformable convolution added to handle occlusion and deformation; a Pinwheel-shaped Convolution (PConv) module employing multi-directional convolution kernels to achieve rotation-adaptive local detail extraction and accurately capture plate edges and character features; and an Adaptive Sparse Self-Attention (ASSA) mechanism incorporated into the encoder to automatically focus on key regions while suppressing complex background interference, thereby enhancing feature discriminability. Comparative experiments conducted on a self-constructed dataset of 20,000 ship plate images show that, compared to the original RT-DETR, RT-DETR-HPA achieves a 3.36% improvement in mAP@50 (up to 97.12%), a 3.23% increase in recall (reaching 94.88%), and maintains real-time detection speed at 40.1 FPS. Compared with mainstream object detection models such as the YOLO series and Faster R-CNN, RT-DETR-HPA demonstrates significant advantages in high-precision localization, adaptability to complex scenarios, and real-time performance. It effectively reduces missed and false detections caused by low resolution, poor lighting, and dense occlusion, providing a robust and high-accuracy solution for intelligent ship supervision. Future work will focus on lightweight model design and dynamic resolution adaptation to enhance its applicability on mobile maritime surveillance platforms. Full article
(This article belongs to the Section Ocean Engineering)
