Search Results (54)

Search Parameters:
Keywords = inverted residual convolution network

31 pages, 12308 KB  
Article
An Improved MSEM-Deeplabv3+ Method for Intelligent Detection of Rock Mass Fractures
by Chi Zhang, Shu Gan, Xiping Yuan, Weidong Luo, Chong Ma and Yi Li
Remote Sens. 2026, 18(7), 1041; https://doi.org/10.3390/rs18071041 - 30 Mar 2026
Viewed by 240
Abstract
Fractures, as critical discontinuous structural planes in rock masses, directly govern rock mass stability and serve as the core controlling factor in rock mechanics engineering. Existing deep learning models for fracture extraction face persistent challenges, including imbalanced integration of deep and shallow features, limited suppression of background noise, inadequate multi-scale feature representation, and large parameter sizes, making it difficult to balance detection accuracy against deployment efficiency. Focusing on the Wanshanshan quarry in Yunnan, this study first constructs a high-precision digital model using close-range photogrammetry and 3D real-scene reconstruction. A lightweight yet high-accuracy intelligent detection method, termed MSEM-Deeplabv3+, is then proposed for rock mass fracture extraction. The model adopts lightweight MobileNetV2 as the backbone network, incorporating inverted residual modules and depthwise separable convolutions, resulting in a parameter size of only 6.02 MB and 30.170 G FLOPs, substantially reducing computational overhead. Furthermore, the proposed MAGF (Multi-Scale Attention Gated Fusion) and SCSA (Spatial-Channel Synergistic Attention) modules are integrated to enhance the representation of fracture details and semantic consistency while effectively suppressing multi-source and multi-scale background interference. Experimental results demonstrate that the proposed model achieves an mPA of 89.69%, mIoU of 83.71%, F1-score of 90.41%, and Kappa coefficient of 80.81%, outperforming the classic Deeplabv3+ model by 5.81%, 6.18%, 4.53%, and 9.2%, respectively; it also significantly surpasses benchmark models such as U-Net and HRNet. The method accurately captures fine, continuous fracture details, preserves the spatial distribution of long-range continuous fractures, and maintains robust performance on the cross-scene CFD dataset, showing strong adaptability and generalization. This approach mitigates the risks of manual high-altitude inspection and provides a lightweight, high-precision, non-contact intelligent solution for fracture detection on high, steep rock slopes.
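
Several of the results in this listing (MSEM-Deeplabv3+ here, AMFA-DeepLab below) credit their small footprint to MobileNetV2's depthwise separable convolutions. A back-of-envelope sketch of where the savings come from; the channel counts are illustrative, not the papers' actual layer shapes:

```python
# Parameter-count arithmetic behind depthwise separable convolutions.
# Channel counts below are illustrative, not taken from the papers.

def standard_conv_params(c_in, c_out, k):
    # a k x k kernel for every (input, output) channel pair, bias ignored
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # depthwise step: one k x k spatial filter per input channel
    # pointwise step: a 1 x 1 convolution that mixes channels
    return c_in * k * k + c_in * c_out

std = standard_conv_params(128, 128, 3)        # 147456 parameters
sep = depthwise_separable_params(128, 128, 3)  # 17536 parameters
print(round(std / sep, 1))  # roughly 8.4x fewer parameters
```

The saving grows with kernel size and channel width, which is why nearly every lightweight backbone in these abstracts reaches for the same trick.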

20 pages, 3216 KB  
Article
AMFA-DeepLab: An Improved Lightweight DeepLabV3+ Adaptive Multi-Statistic Fusion Attention Network for Sea Ice Segmentation in GaoFen-1 Images
by Zengzhou Hao, Xin Li, Qiankun Zhu, Yunzhou Li, Zhihua Mao, Jianyu Chen and Delu Pan
Remote Sens. 2026, 18(5), 783; https://doi.org/10.3390/rs18050783 - 4 Mar 2026
Viewed by 336
Abstract
To address the difficulty of extracting fine detail and the low operating efficiency of large-area sea ice monitoring with wide-field-of-view images from the Chinese GaoFen-1 satellite, a lightweight, high-precision sea ice segmentation network with an adaptive multi-statistic fusion attention (AMFA) module, built on DeepLabV3+ (AMFA-DeepLab), is proposed. First, the backbone network is replaced with lightweight MobileNetV2, which preserves feature extraction capability while greatly reducing computational complexity through inverted residuals and depthwise separable convolution. Second, to address blurred fragmented-ice texture and speckle noise interference in optical images, the AMFA module is introduced on the decoder side. It integrates a global median pooling branch and adaptively recalibrates feature weights through a dynamic channel mixing mechanism, effectively enhancing the model's ability to capture fine sea ice edges and its noise robustness in complex backgrounds. Experimental results on a dataset from Liaodong Bay in the Bohai Sea of China show that AMFA-DeepLab reaches an intersection over union of 92.15% and an F1-score of 95.91%, increases of 3.06% and 1.68%, respectively, over the baseline model. The model needs only 5.85 million parameters, training time is shortened to 4.42 h, and inference runs at 281.76 frames per second. Visual analysis and generalization tests further demonstrate that the model accurately suppresses clutter from coastal land and seawater and extracts the fine filamentous structure of drift ice in complex melting-ice scenes. This research overcomes the precision bottleneck while achieving an extremely lightweight model, providing efficient technical support for operational dynamic monitoring of sea ice disasters with Chinese GaoFen-1 satellites.
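
The abstract does not spell out how AMFA's global median pooling branch is wired; as a hedged illustration only, the sketch below computes per-channel mean and median descriptors (the kind of multi-statistic signal such an attention branch could recalibrate with), showing why the median is more robust to speckle-like outliers:

```python
import statistics

def channel_descriptors(feature_map):
    """feature_map: list of 2-D channels (each a list of rows).
    Returns per-channel (mean, median) statistics; the shapes and the
    helper name are assumptions for illustration, not the paper's API."""
    stats = []
    for channel in feature_map:
        flat = [v for row in channel for v in row]
        stats.append((sum(flat) / len(flat), statistics.median(flat)))
    return stats

fm = [[[1.0, 9.0], [1.0, 1.0]],   # channel 0: one speckle-like outlier
      [[2.0, 2.0], [2.0, 2.0]]]   # channel 1: uniform
print(channel_descriptors(fm))  # [(3.0, 1.0), (2.0, 2.0)]
```

The outlier in channel 0 drags the mean to 3.0 while the median stays at 1.0, which is the intuition for fusing a median branch alongside the usual average pooling.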

28 pages, 4123 KB  
Article
Nonlinear Impacts of Air Pollutants and Meteorological Factors on PM2.5: An Interpretable GT-iFormer Model with SHAP Analysis
by Dong Li, Mengmeng Liu, Houzeng Han and Jian Wang
Atmosphere 2026, 17(3), 266; https://doi.org/10.3390/atmos17030266 - 3 Mar 2026
Viewed by 496
Abstract
Accurate prediction of PM2.5 concentration is crucial for air quality management and public health protection. However, existing methods often struggle to capture and interpret the nonlinear relationships among multiple atmospheric variables. This study proposes GT-iFormer, a novel interpretable deep learning model that integrates graph convolutional networks (GCNs), Temporal Convolutional Networks (TCNs), and the inverted Transformer (iTransformer) for PM2.5 concentration prediction. The model features a GTCN-Block that couples a GCN and a TCN with residual-style fusion, preserving feature-level dependencies alongside temporal patterns to prevent information degradation. Pearson correlation coefficients and the KNN algorithm are combined to build a data-driven graph structure, which lets the GCN flexibly model the nonlinear relationships between pollutants and meteorological factors from observed data, while the TCN captures multi-scale temporal patterns via causal dilated convolutions. The concatenated GTCN-Block representations are then fed into iTransformer, which models global inter-variable interactions with attention applied along the variable axis. SHAP (SHapley Additive exPlanations) analysis is incorporated to expose feature importance and the nonlinear relationships underlying the PM2.5 predictions. Results on hourly data from Beijing (2020-2021) and Shenzhen (2021) show that GT-iFormer surpasses all baseline models, with an RMSE of 8.781 μg/m3 and R2 of 0.978 for Beijing and an RMSE of 3.871 μg/m3 and R2 of 0.957 for Shenzhen in single-step prediction, RMSE reductions of 15.75% and 17.92%, respectively, over the best baseline. The SHAP analysis reveals clearly distinct regional patterns: combustion sources dominate in Beijing (represented by CO at 28.231%) and traffic emissions dominate in Shenzhen (represented by NO2 at 25.908%). Threshold effects are established for all variables, with significant cross-city differences that can guide city-specific air quality management policies.
(This article belongs to the Section Air Quality)
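
The Pearson-plus-KNN graph construction described in the GT-iFormer abstract can be sketched as follows; the toy series, the use of absolute correlation as similarity, and the value of k are illustrative assumptions, not the paper's configuration:

```python
import math

def pearson(x, y):
    # plain sample Pearson correlation coefficient
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def knn_graph(series, k=1):
    """Directed edge (i, j) if j is among i's k most |Pearson|-correlated
    variables: a data-driven graph a GCN could then operate on."""
    edges = set()
    for i, xi in enumerate(series):
        scores = [(abs(pearson(xi, xj)), j)
                  for j, xj in enumerate(series) if j != i]
        for _, j in sorted(scores, reverse=True)[:k]:
            edges.add((i, j))
    return edges

# three toy variables: v0 and v1 track each other, v2 only loosely
v0 = [1.0, 2.0, 3.0, 4.0]
v1 = [1.1, 2.0, 2.9, 4.2]
v2 = [2.0, 1.0, 4.0, 3.0]
print(knn_graph([v0, v1, v2], k=1))
```

With k=1, v0 and v1 link to each other and v2 attaches to its closest match, v0, so strongly dependent variables end up adjacent in the graph.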

22 pages, 6921 KB  
Article
SFE-DETR: An Enhanced Transformer-Based Face Detector for Small Target Faces in Open Complex Scenes
by Chenhao Yang, Yueming Jiang and Chunyan Song
Sensors 2026, 26(1), 125; https://doi.org/10.3390/s26010125 - 24 Dec 2025
Viewed by 626
Abstract
Face detection is an important computer vision task with wide application. However, in open, complex scenes with dense faces, occlusions, and image degradation, small face detection remains challenging due to the extremely small target scale, difficult localization, and severe background interference. To address these issues, this paper proposes SFE-DETR, a small-face detector for open complex scenes that aims to improve detection accuracy and computational efficiency simultaneously. The backbone adopts an inverted residual shift convolution and dilated reparameterization structure, which enhances shallow features and enables deep-feature self-adaptation, better preserving small-scale information while reducing the parameter count. Additionally, a multi-head multi-scale self-attention mechanism is introduced to fuse multi-scale convolutional features with channel-wise weighting, capturing fine-grained facial features while suppressing background noise. Moreover, a redesigned SFE-FPN introduces high-resolution layers and a novel feature fusion module with local, large-scale, and global branches, efficiently aggregating multi-level features and significantly improving small face detection performance. Experiments on two challenging small face detection datasets show that SFE-DETR reduces parameters by 28.1% compared with the original RT-DETR-R18 model, achieving a mAP50 of 94.7% and AP-s of 42.1% on the SCUT-HEAD dataset and a mAP50 of 86.3% on the WIDER FACE (Hard) subset. These results demonstrate that SFE-DETR delivers the best detection performance among models of the same scale while maintaining efficiency.
(This article belongs to the Section Optical Sensors)

17 pages, 3409 KB  
Article
Efficient Image Segmentation of Coal Blocks Using an Improved DIRU-Net Model
by Jingyi Liu, Gaoxia Fan and Balaiti Maimutimin
Mathematics 2025, 13(21), 3541; https://doi.org/10.3390/math13213541 - 4 Nov 2025
Cited by 1 | Viewed by 611
Abstract
Coal block image segmentation is of great significance for obtaining the particle size distribution and specific gravity of ores. However, existing methods are limited by harsh conditions such as dust, complex shapes, and uneven distribution of light, color, and texture. To address these challenges, we propose DIRU-Net, a lightweight deep convolutional network for coal block image segmentation that builds on the U-Net encoder-decoder backbone and combines the characteristics of dilated convolution and inverted residual structures. We also constructed a high-quality dataset of conveyor belt coal block images, addressing the current lack of publicly available datasets. We comprehensively evaluated DIRU-Net on this dataset against other state-of-the-art coal block segmentation methods; it outperforms all of them in both segmentation performance and compactness, reaching a segmentation accuracy of 94.8% with a parameter size of only 0.77 MB.
(This article belongs to the Special Issue Mathematical Methods for Image Processing and Computer Vision)

29 pages, 8732 KB  
Article
MFF-ClassificationNet: CNN-Transformer Hybrid with Multi-Feature Fusion for Breast Cancer Histopathology Classification
by Xiaoli Wang, Guowei Wang, Luhan Li, Hua Zou and Junpeng Cui
Biosensors 2025, 15(11), 718; https://doi.org/10.3390/bios15110718 - 29 Oct 2025
Cited by 2 | Viewed by 1415
Abstract
Breast cancer is one of the most prevalent malignant tumors among women worldwide, underscoring the urgent need for early and accurate diagnosis to reduce mortality. To this end, a Multi-Feature Fusion Classification Network (MFF-ClassificationNet) is proposed for breast histopathological image classification. The network adopts a two-branch parallel architecture in which a convolutional neural network captures local details and a Transformer models global dependencies. Their features are deeply integrated through a Multi-Feature Fusion module, which incorporates a Convolutional Block Attention Module-Squeeze and Excitation (CBAM-SE) fusion block combining convolutional block attention, squeeze-and-excitation mechanisms, and a residual inverted multilayer perceptron to enhance fine-grained feature representation and category-specific lesion characterization. On the BreakHis dataset, the model achieved accuracies of 98.30%, 97.62%, 98.81%, and 96.07% at magnifications of 40×, 100×, 200×, and 400×, respectively, and 97.50% on the BACH dataset. These results confirm that integrating local and global features significantly strengthens the model's ability to capture multi-scale, context-aware information, leading to superior classification performance. Overall, MFF-ClassificationNet surpasses conventional single-path approaches and provides a robust, generalizable framework for advancing computer-aided diagnosis of breast cancer.
(This article belongs to the Special Issue AI-Based Biosensors and Biomedical Imaging)

28 pages, 945 KB  
Article
Enhanced Heart Sound Detection via Multi-Scale Feature Extraction and Attention Mechanism Using Pitch-Shifting Data Augmentation
by Pengcheng Yue, Mingrong Dong and Yixuan Yang
Electronics 2025, 14(20), 4092; https://doi.org/10.3390/electronics14204092 - 17 Oct 2025
Viewed by 1012
Abstract
Cardiovascular diseases pose a major global health threat, making early automated detection through heart sound analysis crucial for prevention. However, existing deep learning-based heart sound detection methods fall short in feature extraction, and current attention mechanisms capture key heart sound features inadequately. To address this, we first introduce a Multi-Scale Feature Extraction Network composed of Multi-Scale Inverted Residual (MIR) and Dynamically Gated Convolution (DGC) modules. The MIR module efficiently extracts multi-scale heart sound features, and the DGC module enhances the network's representational ability by capturing feature interrelationships and dynamically adjusting information flow. A Multi-Scale Attention Prediction Network is then designed for heart sound feature classification; its multi-scale attention (MSA) module captures subtle pathological features of heart sound signals through multi-scale feature extraction and cross-scale feature interaction. Additionally, pitch-shifting techniques are applied in the preprocessing stage to enhance the model's generalization, and multiple feature extraction techniques are used for initial feature extraction of heart sounds. Under five-fold cross-validation, the model achieved accuracies of 98.89% and 98.86% on the PhysioNet/CinC 2016 and 2022 datasets, respectively, demonstrating superior performance and strong potential for clinical application.
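
Pitch-shifting augmentation of the kind mentioned above is normally done with a library routine (e.g. librosa.effects.pitch_shift); the naive resampling sketch below only illustrates the core idea and, unlike a proper implementation, changes the signal's duration along with its pitch:

```python
def pitch_shift(signal, factor):
    """Naive pitch shift by linear-interpolation resampling:
    factor > 1 raises pitch (and shortens the signal), factor < 1
    lowers it. Real augmentation pipelines add a time-stretch step
    to preserve duration; this sketch deliberately skips that."""
    n_out = int(len(signal) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor              # fractional read position
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out

sig = [0.0, 1.0, 2.0, 3.0]
print(pitch_shift(sig, 2.0))  # [0.0, 2.0] -- every other sample
```

Applying several such shifts to each recording multiplies the effective training set, which is the generalization benefit the abstract refers to.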

22 pages, 3632 KB  
Article
RFR-YOLO-Based Recognition Method for Dairy Cow Behavior in Farming Environments
by Congcong Li, Jialong Ma, Shifeng Cao and Leifeng Guo
Agriculture 2025, 15(18), 1952; https://doi.org/10.3390/agriculture15181952 - 15 Sep 2025
Cited by 2 | Viewed by 1594
Abstract
Cow behavior recognition is a fundamental element of effective cow health monitoring and intelligent farming systems. In large-scale farming environments, several critical challenges persist: behavioral feature information is difficult to capture accurately, multi-scale features vary substantially, and different cow behaviors show high inter-class similarity. To address these limitations, this study introduces RFR-YOLO, an enhanced target detection algorithm for cow behavior recognition built on the YOLOv11n framework. A well-structured dataset covering nine distinct cow behaviors (lying, standing, walking, eating, drinking, licking, grooming, estrus, and limping) is constructed, comprising 13,224 labeled samples. The algorithm incorporates three major technical improvements. First, an inverted dilated convolution module (Region Semantic Inverted Convolution, RsiConv) is designed and integrated with the C3K2 module to form C3K2_Rsi, which reduces computational overhead while enhancing feature representation. Second, a four-branch multi-scale dilated attention mechanism (Four Multi-Scale Dilated Attention, FMSDA) aligns scale-specific features with the corresponding receptive fields, improving the model's capacity to capture multi-scale characteristics. Third, a Reparameterized Generalized Residual Feature Pyramid Network (RepGRFPN) serves as the neck, letting features propagate through differentiated pathways with flexible control over multi-scale feature expression, which facilitates efficient feature fusion and mitigates the impact of behavioral similarity. Experimental results show that RFR-YOLO achieves precision, recall, mAP50, and mAP50:95 of 95.9%, 91.2%, 94.9%, and 85.2%, respectively, gains of 5.5%, 5%, 5.6%, and 3.5% over the baseline model. Despite a marginal 1.4 G increase in computational cost, the algorithm retains a high detection speed of 147.6 frames per second and significantly improves the accuracy and robustness of target detection in group cow farming scenarios.
(This article belongs to the Section Farm Animal Production)

19 pages, 11410 KB  
Article
A Pool Drowning Detection Model Based on Improved YOLO
by Wenhui Zhang, Lu Chen and Jianchun Shi
Sensors 2025, 25(17), 5552; https://doi.org/10.3390/s25175552 - 5 Sep 2025
Cited by 1 | Viewed by 3032
Abstract
Drowning is the leading cause of injury-related fatalities among adolescents. In swimming pool environments, traditional manual surveillance has clear limitations, wearable devices adapt poorly, and YOLO-based vision models still face challenges in edge deployment efficiency, robustness under complex water conditions, and multi-scale object detection. To address these issues, we propose YOLO11-LiB, a drowning detection model based on YOLO11n with three key enhancements. First, we design the Lightweight Feature Extraction Module (LGCBlock), which integrates the Lightweight Attention Encoding Block (LAE) and effectively combines Ghost Convolution (GhostConv) with dynamic convolution (DynamicConv), optimizing the downsampling structure and the C3k2 module in the YOLO11n backbone and significantly reducing model parameters and computational complexity. Second, we introduce the Cross-Channel Position-aware Spatial Attention Inverted Residual with Spatial-Channel Separate Attention module (C2PSAiSCSA) into the backbone; it embeds the Spatial-Channel Separate Attention (SCSA) mechanism within the Inverted Residual Mobile Block (iRMB) framework for more comprehensive and efficient feature extraction. Finally, we redesign the neck as the Bidirectional Feature Fusion Network (BiFF-Net), which integrates the Bidirectional Feature Pyramid Network (BiFPN) with Frequency-Aware Feature Fusion (FreqFusion). YOLO11-LiB was validated against mainstream algorithms in comparative experiments, and ablation studies were conducted. The results show that it achieves a drowning-class mean average precision (DmAP50) of 94.1% with merely 2.02 M parameters and a 4.25 MB model size, an effective balance between accuracy and efficiency that provides a high-performance solution for real-time drowning detection in swimming pool scenarios.
(This article belongs to the Section Intelligent Sensors)

22 pages, 8053 KB  
Article
Rolling Bearing Fault Diagnosis Based on Fractional Constant Q Non-Stationary Gabor Transform and VMamba-Conv
by Fengyun Xie, Chengjie Song, Yang Wang, Minghua Song, Shengtong Zhou and Yuanwei Xie
Fractal Fract. 2025, 9(8), 515; https://doi.org/10.3390/fractalfract9080515 - 6 Aug 2025
Viewed by 1239
Abstract
Rolling bearings are failure-prone, so intelligent fault diagnosis for this key transmission component of rotating machinery is crucial, and deep learning (DL) has significantly advanced its development. This paper proposes a novel rolling bearing fault diagnosis method based on the fractional constant Q non-stationary Gabor transform (FCO-NSGT) and VMamba-Conv. Firstly, a rolling bearing fault experimental platform is established, and vibration signals under various working conditions are collected with an acceleration sensor. Secondly, a kurtosis-to-entropy ratio (KER) method and the rotational kernel function of the fractional Fourier transform (FRFT) are proposed and applied to the original CO-NSGT to overcome its limitations: unsatisfactory time-frequency representation caused by manual parameter setting, and energy dispersion for frequency-modulated signals that vary with time. A lightweight fault diagnosis model, VMamba-Conv, is proposed as a restructured version of VMamba: it integrates an efficient selective scanning mechanism, a state space model, and a SimAX-based convolutional network into a dual-branch architecture, and uses inverted residual blocks to achieve a lightweight design while maintaining strong feature extraction capabilities. Finally, the time-frequency graph is input into VMamba-Conv to diagnose rolling bearing faults, reducing the number of parameters and the computational complexity while ensuring high accuracy and excellent noise resistance. The results show that the proposed method achieves an average accuracy of 99.81%; comparisons of the Adjusted Rand Index, Normalized Mutual Information, F1 score, and accuracy show that it outperforms the comparison methods, demonstrating its effectiveness and superiority.
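
The abstract names a kurtosis-to-entropy ratio (KER) but does not define it; one plausible reading, sample kurtosis divided by the Shannon entropy of the signal's normalized energy distribution, is sketched below. The function names and the exact formulation are assumptions for illustration:

```python
import math

def kurtosis(x):
    # non-excess sample kurtosis (a Gaussian scores about 3)
    n = len(x)
    m = sum(x) / n
    var = sum((v - m) ** 2 for v in x) / n
    return sum((v - m) ** 4 for v in x) / n / var ** 2

def shannon_entropy(p):
    # entropy of a normalized energy distribution
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def ker(x):
    """Hypothetical kurtosis-to-entropy ratio: impulsive fault
    signatures raise kurtosis and concentrate energy (low entropy),
    so KER grows sharply for fault-like signals."""
    energy = [v * v for v in x]
    total = sum(energy)
    p = [e / total for e in energy]
    return kurtosis(x) / shannon_entropy(p)
```

Under this reading, a signal with a single dominant impulse scores far higher than a flat oscillation, which is the kind of criterion a parameter-selection step could maximize.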

17 pages, 1927 KB  
Article
ConvTransNet-S: A CNN-Transformer Hybrid Disease Recognition Model for Complex Field Environments
by Shangyun Jia, Guanping Wang, Hongling Li, Yan Liu, Linrong Shi and Sen Yang
Plants 2025, 14(15), 2252; https://doi.org/10.3390/plants14152252 - 22 Jul 2025
Cited by 7 | Viewed by 2980
Abstract
To address the low recognition accuracy and substantial model complexity of crop disease identification models in complex field environments, this study proposes ConvTransNet-S, a novel hybrid model integrating Convolutional Neural Networks (CNNs) and Transformers for crop disease identification. Unlike existing hybrid approaches, ConvTransNet-S introduces three key innovations. First, a Local Perception Unit (LPU) and Lightweight Multi-Head Self-Attention (LMHSA) modules synergistically enhance the extraction of fine-grained plant disease details and the modeling of global dependencies, respectively. Second, an Inverted Residual Feed-Forward Network (IRFFN) optimizes the feature propagation path, enhancing robustness against interference such as lighting variations and leaf occlusions; this combination of an LPU, LMHSA, and an IRFFN achieves a dynamic equilibrium between local texture perception and global context modeling, effectively resolving the trade-offs inherent in standalone CNNs or Transformers. Finally, a phased architecture design fuses multi-scale disease features efficiently, enhancing feature discriminability while reducing model complexity. ConvTransNet-S achieved a recognition accuracy of 98.85% on the public PlantVillage dataset with only 25.14 million parameters, a computational load of 3.762 GFLOPs, and an inference time of 7.56 ms. On a self-built in-field complex-scene dataset of 10,441 images, it reached 88.53% accuracy, improvements of 14.22%, 2.75%, and 0.34% over EfficientNetV2, Vision Transformer, and Swin Transformer, respectively, while reducing the parameter count by 46.8%. This confirms that its multi-scale feature mechanism can effectively distinguish disease features from background, providing a novel technical approach for disease diagnosis in complex agricultural scenarios and demonstrating significant application value for intelligent agricultural management.
(This article belongs to the Section Plant Modeling)

23 pages, 8232 KB  
Article
Intelligent Identification of Tea Plant Seedlings Under High-Temperature Conditions via YOLOv11-MEIP Model Based on Chlorophyll Fluorescence Imaging
by Chun Wang, Zejun Wang, Lijiao Chen, Weihao Liu, Xinghua Wang, Zhiyong Cao, Jinyan Zhao, Man Zou, Hongxu Li, Wenxia Yuan and Baijuan Wang
Plants 2025, 14(13), 1965; https://doi.org/10.3390/plants14131965 - 27 Jun 2025
Cited by 5 | Viewed by 1268
Abstract
To achieve efficient, non-destructive, intelligent identification of tea plant seedlings under high-temperature stress, this study proposes an improved YOLOv11 model based on chlorophyll fluorescence imaging. With tea plant seedlings under varying degrees of high temperature as the research objects, raw fluorescence images were acquired with a chlorophyll fluorescence image acquisition device. Spearman correlation analysis identified the maximum photochemical efficiency (Fv/Fm) as the key fluorescence parameter, and fluorescence images of this parameter were used to construct the dataset. The YOLOv11 model was improved in two ways. First, to reduce the number of network parameters while keeping computational cost low, the lightweight MobileNetV4 network was introduced as a new backbone. Second, to achieve efficient feature upsampling, enhance the efficiency and accuracy of feature extraction, and reduce computational redundancy and memory access volume, the EUCB (Efficient Up Convolution Block), iRMB (Inverted Residual Mobile Block), and PConv (Partial Convolution) modules were introduced. The improved YOLOv11-MEIP model performs best, with precision, recall, and mAP50 of 99.25%, 99.19%, and 99.46%, respectively, increases of 4.05%, 7.86%, and 3.42% over the baseline YOLOv11, while the number of model parameters was reduced by 29.45%. This study provides a new intelligent method for classifying high-temperature stress levels in tea seedlings and for state detection and identification, offering theoretical support and a technical reference for monitoring and protecting tea plants and other crops in tea gardens under high temperatures.
(This article belongs to the Special Issue Practical Applications of Chlorophyll Fluorescence Measurements)
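As a side note on the parameter-selection step above: Spearman correlation ranks both variables (with midranks for ties) and then computes Pearson correlation on the ranks. A minimal pure-Python sketch — function names are hypothetical, and the study's actual analysis pipeline is not given in the abstract:

```python
def rankdata(values):
    # Assign 1-based ranks; tied values receive the mean of their positions (midrank).
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    # Pearson correlation computed on the rank vectors.
    rx, ry = rankdata(x), rankdata(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Because it works on ranks, Spearman's coefficient reaches 1.0 for any monotone relationship, which makes it a reasonable screen for fluorescence parameters whose response to stress need not be linear.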
27 pages, 6771 KB  
Article
A Deep Neural Network Framework for Dynamic Two-Handed Indian Sign Language Recognition in Hearing and Speech-Impaired Communities
by Vaidhya Govindharajalu Kaliyaperumal and Paavai Anand Gopalan
Sensors 2025, 25(12), 3652; https://doi.org/10.3390/s25123652 - 11 Jun 2025
Cited by 1 | Viewed by 1382
Abstract
Language is the medium through which effective communication is expressed, and sign language serves as a bridge across communication gaps for the hearing- and speech-impaired; recognizing it, however, remains challenging because hand gestures must be identified from varied and ambiguous palm configurations. This challenge is addressed with a novel Enhanced Convolutional Transformer with Adaptive Tuna Swarm Optimization (ECT-ATSO) recognition framework for two-handed sign language. To improve both model generalization and image quality, preprocessing is applied to images prior to prediction, and the proposed dataset is organized to handle multiple dynamic words. Feature graining is employed to obtain local features, and the ViT transformer architecture then captures global features from the preprocessed images. After concatenation, this yields a feature map that is classified into words using an Inverted Residual Feed-Forward Network (IRFFN). The Enhanced Convolutional Transformer (ECT) model is tuned with an enhanced Tuna Swarm Optimization (TSO) algorithm to handle the problem dimensions and convergence parameters, and a mutation operator is introduced into the tuna position-update step to overcome local optima. Performance of the proposed framework is measured through dataset visualization, recognition accuracy, and convergence behavior, demonstrating superior effectiveness compared with alternative state-of-the-art methods.
(This article belongs to the Section Intelligent Sensors)
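The abstract mentions a mutation operator added to the tuna position update to escape local optima, but does not specify its form. Below is a generic Gaussian-mutation sketch with boundary clamping — the per-coordinate mutation, the Gaussian form, and all names and default rates are assumptions for illustration, not the paper's actual operator:

```python
import random

def mutate_position(position, bounds, rate=0.2, sigma=0.1, rng=None):
    """Mutate a candidate position: each coordinate is perturbed with
    probability `rate` by Gaussian noise scaled to its search range,
    then clamped back inside its [lo, hi] bounds."""
    rng = rng or random.Random()
    new_pos = []
    for x, (lo, hi) in zip(position, bounds):
        if rng.random() < rate:
            x = x + rng.gauss(0.0, sigma * (hi - lo))
        new_pos.append(min(hi, max(lo, x)))
    return new_pos
```

In a swarm optimizer, such an operator is typically applied after the regular position update, injecting diversity so the population does not collapse onto a single local optimum.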
24 pages, 2119 KB  
Article
Multimodal Medical Image Fusion Using a Progressive Parallel Strategy Based on Deep Learning
by Peng Peng and Yaohua Luo
Electronics 2025, 14(11), 2266; https://doi.org/10.3390/electronics14112266 - 31 May 2025
Cited by 5 | Viewed by 6160
Abstract
Multimodal medical image fusion plays a critical role in enhancing diagnostic accuracy by integrating complementary information from different imaging modalities. However, existing methods often suffer from issues such as unbalanced feature fusion, structural blurring, loss of fine details, and limited global semantic modeling, particularly in low signal-to-noise modalities like PET. To address these challenges, we propose PPMF-Net, a novel progressive and parallel deep learning framework for PET–MRI image fusion. The network employs a hierarchical multi-path architecture to capture local details, global semantics, and high-frequency information in a coordinated manner. Specifically, it integrates three key modules: (1) a Dynamic Edge-Enhanced Module (DEEM) utilizing inverted residual blocks and channel attention to sharpen edge and texture features, (2) a Nonlinear Interactive Feature Extraction module (NIFE) that combines convolutional operations with element-wise multiplication to enable cross-modal feature coupling, and (3) a Transformer-Enhanced Global Modeling module (TEGM) with hybrid local–global attention to improve long-range dependency and structural consistency. A multi-objective unsupervised loss function is designed to jointly optimize structural fidelity, functional complementarity, and detail clarity. Experimental results on the Harvard MIF dataset demonstrate that PPMF-Net outperforms state-of-the-art methods across multiple metrics (SF: 38.27, SD: 96.55, SCD: 1.62, MS-SSIM: 1.14) and shows strong generalization and robustness in tasks such as SPECT–MRI and CT–MRI fusion, indicating its promising potential for clinical applications.
(This article belongs to the Special Issue AI-Driven Medical Image/Video Processing)
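One of the reported metrics, spatial frequency (SF), measures how much fine detail a fused image retains via row- and column-wise gradient energy. A minimal sketch using a common definition, SF = sqrt(RF² + CF²), where RF and CF are the mean squared horizontal and vertical differences — the paper's exact normalization may differ:

```python
import math

def spatial_frequency(img):
    """Spatial frequency of a 2-D grayscale image given as a list of rows.
    RF: energy of horizontal (row-wise) differences; CF: vertical."""
    m, n = len(img), len(img[0])
    rf = sum((img[i][j] - img[i][j - 1]) ** 2
             for i in range(m) for j in range(1, n)) / (m * n)
    cf = sum((img[i][j] - img[i - 1][j]) ** 2
             for i in range(1, m) for j in range(n)) / (m * n)
    return math.sqrt(rf + cf)
```

A flat image scores 0, while sharper edges and textures push SF up — which is why a higher SF is read as better detail preservation in fusion benchmarks.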
20 pages, 3955 KB  
Article
Lightweight Pepper Disease Detection Based on Improved YOLOv8n
by Yuzhu Wu, Junjie Huang, Siji Wang, Yujian Bao, Yizhe Wang, Jia Song and Wenwu Liu
AgriEngineering 2025, 7(5), 153; https://doi.org/10.3390/agriengineering7050153 - 12 May 2025
Cited by 4 | Viewed by 2575
Abstract
China is the world’s largest producer of chili peppers, which hold particularly important economic and social value in fields such as medicine, food, and industry. However, during production, chili peppers are affected by pests and diseases driven by temperature and environmental conditions, resulting in significant yield losses. In this study, a lightweight pepper disease identification method, DD-YOLO, based on the YOLOv8n model, is proposed. First, the deformable convolution module DCNv2 (Deformable ConvNets v2) and the inverted residual mobile block iRMB (Inverted Residual Mobile Block) are introduced into the C2F module to improve the accuracy of the sampling range and reduce computation. Second, the DySample (Dynamic Sample) upsampling operator is integrated into the head network to reduce data volume and computational complexity. Finally, Large Separable Kernel Attention (LSKA) is used to improve the SPPF (Spatial Pyramid Pooling Fast) module and enhance multi-scale feature fusion. The experimental results show that the precision, recall, and average precision of the DD-YOLO model are 91.6%, 88.9%, and 94.4%, respectively — improvements of 6.2, 2.3, and 2.8 percentage points over the base network YOLOv8n. The model weight is reduced by 22.6%, and throughput in floating-point operations per second is improved by 11.1%. This method provides a technical basis for intensive cultivation and management of chili peppers and accomplishes the task of identifying chili pepper pests and diseases efficiently and cost-effectively.
(This article belongs to the Topic Digital Agriculture, Smart Farming and Crop Monitoring)
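The lightweight modules recurring throughout these abstracts (iRMB, MobileNet-style backbones) rely on factorizing a standard convolution into a depthwise convolution plus a pointwise (1×1) convolution. A small arithmetic sketch of the parameter savings — the channel sizes are illustrative, not taken from any of the papers:

```python
def conv_params(c_in, c_out, k):
    # Standard k x k convolution, no bias: every output channel
    # filters all input channels.
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise k x k conv (one filter per input channel)
    # followed by a pointwise 1x1 conv that mixes channels.
    return k * k * c_in + c_in * c_out

standard = conv_params(128, 128, 3)                 # 147,456 parameters
separable = depthwise_separable_params(128, 128, 3) # 17,536 parameters
print(standard / separable)                         # roughly 8.4x fewer
```

The same factorization reduces FLOPs by a similar ratio, which is why these blocks dominate in models targeting small weight files and edge deployment.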