Search Results (2,849)

Search Parameters:
Keywords = backbone networks

24 pages, 6437 KiB  
Article
LEAD-YOLO: A Lightweight and Accurate Network for Small Object Detection in Autonomous Driving
by Yunchuan Yang, Shubin Yang and Qiqing Chan
Sensors 2025, 25(15), 4800; https://doi.org/10.3390/s25154800 (registering DOI) - 4 Aug 2025
Abstract
The accurate detection of small objects remains a critical challenge in autonomous driving systems, where improving detection performance typically comes at the cost of increased model complexity, conflicting with the lightweight requirements of edge deployment. To address this dilemma, this paper proposes LEAD-YOLO (Lightweight Efficient Autonomous Driving YOLO), an enhanced network architecture based on YOLOv11n that achieves superior small object detection while maintaining computational efficiency. The proposed framework incorporates three innovative components: First, the backbone integrates a lightweight Convolutional Gated Transformer (CGF) module, which employs normalized gating mechanisms with residual connections, and a Dilated Feature Fusion (DFF) structure that enables progressive multi-scale context modeling through dilated convolutions. These components synergistically enhance small object perception and environmental context understanding without compromising network efficiency. Second, the neck features a hierarchical feature fusion module (HFFM) that establishes guided feature aggregation paths through hierarchical structuring, facilitating collaborative modeling between local structural information and global semantics for robust multi-scale object detection in complex traffic scenarios. Third, the head implements a shared feature detection head (SFDH) structure, incorporating shared convolution modules for efficient cross-scale feature sharing and detail enhancement branches for improved texture and edge modeling. Extensive experiments validate the effectiveness of LEAD-YOLO: on the nuImages dataset, the method achieves 3.8% and 5.4% improvements in mAP@0.5 and mAP@[0.5:0.95], respectively, while reducing parameters by 24.1%. On the VisDrone2019 dataset, performance gains reach 7.9% and 6.4% for the corresponding metrics. These findings demonstrate that LEAD-YOLO achieves an excellent balance between detection accuracy and model efficiency, thereby showcasing substantial potential for applications in autonomous driving.
(This article belongs to the Section Vehicular Sensing)
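
The abstract above describes the DFF structure only at a high level. As an illustration of the general technique, here is a minimal PyTorch sketch of progressive multi-scale context modeling with stacked dilated convolutions; the module name, channel handling, and dilation rates (1, 2, 4) are assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class DilatedFeatureFusion(nn.Module):
    """Hypothetical DFF-style block: each stage re-convolves the previous one
    with a larger dilation, so the receptive field grows progressively."""
    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(channels),
                nn.SiLU(),
            )
            for d in dilations
        )
        self.project = nn.Conv2d(channels * (len(dilations) + 1), channels, 1)

    def forward(self, x):
        feats = [x]
        for branch in self.branches:
            feats.append(branch(feats[-1]))      # progressively wider context
        return self.project(torch.cat(feats, dim=1))

x = torch.randn(1, 64, 40, 40)
print(DilatedFeatureFusion(64)(x).shape)         # torch.Size([1, 64, 40, 40])
```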

19 pages, 2276 KiB  
Article
Segmentation of Stone Slab Cracks Based on an Improved YOLOv8 Algorithm
by Qitao Tian, Runshu Peng and Fuzeng Wang
Appl. Sci. 2025, 15(15), 8610; https://doi.org/10.3390/app15158610 (registering DOI) - 3 Aug 2025
Abstract
To tackle the challenges of detecting complex cracks on large stone slabs with noisy textures, this paper presents the first domain-optimized framework for stone slab cracks, an improved semantic segmentation model (YOLOv8-Seg) synergistically integrating U-NetV2, DSConv, and DySample. The network uses the lightweight U-NetV2 backbone combined with dynamic feature recalibration and multi-scale refinement to better capture fine crack details. The dynamic up-sampling module (DySample) helps to adaptively reconstruct curved boundaries. In addition, the dynamic snake convolution head (DSConv) improves the model’s ability to follow irregular crack shapes. Experiments on the custom-built ST stone crack dataset show that YOLOv8-Seg achieves an mAP@0.5 of 0.856 and an mAP@0.5–0.95 of 0.479. The model also reaches a mean intersection over union (MIoU) of 79.17%, outperforming both baseline and mainstream segmentation models. Ablation studies confirm the value of each module. Comparative tests and industrial validation demonstrate stable performance across different stone materials and textures and a 30% false-positive reduction in real production environments. Overall, YOLOv8-Seg greatly improves segmentation accuracy and robustness in industrial crack detection on natural stone slabs, offering a strong solution for intelligent visual inspection in real-world applications.
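
The MIoU figure of 79.17% quoted above is a confusion-matrix statistic. Here is a minimal NumPy sketch of how mean intersection over union is commonly computed from flattened label maps; the helper name and the tiny two-class crack/background example are illustrative, not from the paper.

```python
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Mean IoU over classes, computed from flat integer label arrays."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (gt.ravel(), pred.ravel()), 1)   # rows: truth, cols: prediction
    inter = np.diag(cm).astype(float)
    union = cm.sum(0) + cm.sum(1) - np.diag(cm)
    return float((inter / np.maximum(union, 1)).mean())

gt   = np.array([0, 0, 1, 1])    # 0 = background, 1 = crack (toy labels)
pred = np.array([0, 1, 1, 1])
print(mean_iou(pred, gt, num_classes=2))           # 0.5833...
```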

30 pages, 1142 KiB  
Review
Beyond the Backbone: A Quantitative Review of Deep-Learning Architectures for Tropical Cyclone Track Forecasting
by He Huang, Difei Deng, Liang Hu, Yawen Chen and Nan Sun
Remote Sens. 2025, 17(15), 2675; https://doi.org/10.3390/rs17152675 - 2 Aug 2025
Viewed by 57
Abstract
Accurate forecasting of tropical cyclone (TC) tracks is critical for disaster preparedness and risk mitigation. While traditional numerical weather prediction (NWP) systems have long served as the backbone of operational forecasting, they face limitations in computational cost and sensitivity to initial conditions. In recent years, deep learning (DL) has emerged as a promising alternative, offering data-driven modeling capabilities for capturing nonlinear spatiotemporal patterns. This paper presents a comprehensive review of DL-based approaches for TC track forecasting. We categorize all DL-based TC tracking models according to the architecture, including recurrent neural networks (RNNs), convolutional neural networks (CNNs), Transformers, graph neural networks (GNNs), generative models, and Fourier-based operators. To enable rigorous performance comparison, we introduce a Unified Geodesic Distance Error (UGDE) metric that standardizes evaluation across diverse studies and lead times. Based on this metric, we conduct a critical comparison of state-of-the-art models and identify key insights into their relative strengths, limitations, and suitable application scenarios. Building on this framework, we conduct a critical cross-model analysis that reveals key trends, performance disparities, and architectural tradeoffs. Our analysis also highlights several persistent challenges, such as long-term forecast degradation, limited physical integration, and generalization to extreme events, pointing toward future directions for developing more robust and operationally viable DL models for TC track forecasting. To support reproducibility and facilitate standardized evaluation, we release an open-source UGDE conversion tool on GitHub.
(This article belongs to the Section AI Remote Sensing)
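
The abstract introduces UGDE without reproducing its formula; the standard geodesic ingredient for track error is the great-circle (haversine) distance between forecast and observed cyclone centers, so the sketch below assumes that form. The function name and coordinates are illustrative.

```python
import math

def geodesic_error_km(lat_pred, lon_pred, lat_true, lon_true, radius_km=6371.0):
    """Great-circle (haversine) distance between two lat/lon positions, in km."""
    p1, p2 = math.radians(lat_pred), math.radians(lat_true)
    dphi = math.radians(lat_true - lat_pred)
    dlmb = math.radians(lon_true - lon_pred)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# 24 h forecast position vs. best-track observation (invented values)
print(round(geodesic_error_km(22.5, 131.0, 23.1, 130.2), 1), "km")
```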

22 pages, 4300 KiB  
Article
Optimised DNN-Based Agricultural Land Cover Mapping Using Sentinel-2 and Landsat-8 with Google Earth Engine
by Nisha Sharma, Sartajvir Singh and Kawaljit Kaur
Land 2025, 14(8), 1578; https://doi.org/10.3390/land14081578 - 1 Aug 2025
Viewed by 191
Abstract
Agriculture is the backbone of Punjab’s economy, and with much of India’s population dependent on agriculture, the requirement for accurate and timely monitoring of land has become even more crucial. Blending remote sensing with state-of-the-art machine learning algorithms enables the detailed classification of agricultural lands through thematic mapping, which is critical for crop monitoring, land management, and sustainable development. Here, a Hyper-tuned Deep Neural Network (Hy-DNN) model was created and used for land use and land cover (LULC) classification into four classes: agricultural land, vegetation, water bodies, and built-up areas. The technique made use of multispectral data from Sentinel-2 and Landsat-8, processed on the Google Earth Engine (GEE) platform. To measure classification performance, Hy-DNN was contrasted with traditional classifiers—Convolutional Neural Network (CNN), Random Forest (RF), Classification and Regression Tree (CART), Minimum Distance Classifier (MDC), and Naive Bayes (NB)—using performance metrics including producer’s and consumer’s accuracy, Kappa coefficient, and overall accuracy. Hy-DNN performed the best, with an overall accuracy of 97.60% using Sentinel-2 and 91.10% using Landsat-8, outperforming all base models. These results further highlight the superiority of the optimised Hy-DNN in agricultural land mapping and its potential use in crop health monitoring, disease diagnosis, and strategic agricultural planning.
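
All four metrics named above (producer's and consumer's accuracy, Kappa coefficient, overall accuracy) derive from one confusion matrix. A NumPy sketch with an invented four-class matrix mirroring the paper's LULC classes; none of the numbers come from the study.

```python
import numpy as np

def accuracy_report(cm: np.ndarray):
    """Overall accuracy, producer's/consumer's accuracy per class, Cohen's kappa."""
    n = cm.sum()
    overall = np.trace(cm) / n
    producers = np.diag(cm) / cm.sum(axis=1)      # rows assumed to be reference
    consumers = np.diag(cm) / cm.sum(axis=0)      # columns assumed to be predicted
    chance = (cm.sum(axis=1) @ cm.sum(axis=0)) / n**2
    kappa = (overall - chance) / (1 - chance)
    return overall, producers, consumers, kappa

# classes: agricultural land, vegetation, water bodies, built-up (toy counts)
cm = np.array([[95, 3, 1, 1], [4, 90, 2, 4], [0, 1, 98, 1], [2, 5, 1, 92]])
overall, prod, cons, kappa = accuracy_report(cm)
print(f"OA={overall:.3f}  kappa={kappa:.3f}")
```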

21 pages, 3136 KiB  
Review
The Role of Genomic Islands in the Pathogenicity and Evolution of Plant-Pathogenic Gammaproteobacteria
by Yuta Watanabe, Yasuhiro Ishiga and Nanami Sakata
Microorganisms 2025, 13(8), 1803; https://doi.org/10.3390/microorganisms13081803 - 1 Aug 2025
Viewed by 72
Abstract
Genomic islands (GIs), including integrative and conjugative elements (ICEs), prophages, and integrative plasmids, are central drivers of horizontal gene transfer in bacterial plant pathogens. These elements often carry cargo genes encoding virulence factors, antibiotic and metal resistance determinants, and metabolic functions that enhance environmental adaptability. In plant-pathogenic species such as Pseudomonas syringae, GIs contribute to host specificity, immune evasion, and the emergence of novel pathogenic variants. ICEclc and its homologs represent integrative and mobilizable elements whose tightly regulated excision and transfer are driven by a specialized transcriptional cascade, while ICEs in P. syringae highlight the ecological impact of cargo genes on pathogen virulence and fitness. Pathogenicity islands further modulate virulence gene expression in response to in planta stimuli. Beyond P. syringae, GIs in genera such as Erwinia, Pectobacterium, and Ralstonia underpin critical traits like toxin biosynthesis, secretion system acquisition, and topoisomerase-mediated stability. Leveraging high-throughput genomics and structural biology will be essential to dissect GI regulation and develop targeted interventions to curb disease spread. This review synthesizes the current understanding of GIs in plant-pathogenic gammaproteobacteria and outlines future research priorities for translating mechanistic insights into sustainable disease control strategies.

22 pages, 24173 KiB  
Article
ScaleViM-PDD: Multi-Scale EfficientViM with Physical Decoupling and Dual-Domain Fusion for Remote Sensing Image Dehazing
by Hao Zhou, Yalun Wang, Wanting Peng, Xin Guan and Tao Tao
Remote Sens. 2025, 17(15), 2664; https://doi.org/10.3390/rs17152664 - 1 Aug 2025
Viewed by 154
Abstract
Remote sensing images are often degraded by atmospheric haze, which not only reduces image quality but also complicates information extraction, particularly in high-level visual analysis tasks such as object detection and scene classification. State-space models (SSMs) have recently emerged as a powerful paradigm for vision tasks, showing great promise due to their computational efficiency and robust capacity to model global dependencies. However, most existing learning-based dehazing methods lack physical interpretability, leading to weak generalization. Furthermore, they typically rely on spatial features while neglecting crucial frequency domain information, resulting in incomplete feature representation. To address these challenges, we propose ScaleViM-PDD, a novel network that enhances an SSM backbone with two key innovations: a Multi-scale EfficientViM with Physical Decoupling (ScaleViM-P) module and a Dual-Domain Fusion (DD Fusion) module. The ScaleViM-P module synergistically integrates a Physical Decoupling block within a Multi-scale EfficientViM architecture. This design enables the network to mitigate haze interference in a physically grounded manner at each representational scale while simultaneously capturing global contextual information to adaptively handle complex haze distributions. To further address detail loss, the DD Fusion module replaces conventional skip connections by incorporating a novel Frequency Domain Module (FDM) alongside channel and position attention. This allows for a more effective fusion of spatial and frequency features, significantly improving the recovery of fine-grained details, including color and texture information. Extensive experiments on nine publicly available remote sensing datasets demonstrate that ScaleViM-PDD consistently surpasses state-of-the-art baselines in both qualitative and quantitative evaluations, highlighting its strong generalization ability.
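
The abstract states that the FDM supplies frequency-domain features but not how; purely as one plausible reading, here is a hedged PyTorch sketch that extracts amplitude and phase with a 2-D FFT, mixes them, and fuses the result back with the spatial features. The module structure and channel counts are assumptions.

```python
import torch
import torch.nn as nn

class FrequencyDomainFusion(nn.Module):
    """Hypothetical FDM-style block: spatial features + FFT amplitude/phase."""
    def __init__(self, channels: int):
        super().__init__()
        self.freq_mix = nn.Conv2d(2 * channels, channels, 1)  # mixes amp + phase
        self.out = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        spec = torch.fft.rfft2(x, norm="ortho")               # complex spectrum
        freq = self.freq_mix(torch.cat([spec.abs(), spec.angle()], dim=1))
        freq = torch.fft.irfft2(torch.complex(freq, torch.zeros_like(freq)),
                                s=x.shape[-2:], norm="ortho") # back to pixels
        return self.out(torch.cat([x, freq], dim=1))

x = torch.randn(1, 32, 64, 64)
print(FrequencyDomainFusion(32)(x).shape)                     # (1, 32, 64, 64)
```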

25 pages, 10331 KiB  
Article
Forest Fire Detection Method Based on Dual-Branch Multi-Scale Adaptive Feature Fusion Network
by Qinggan Wu, Chen Wei, Ning Sun, Xiong Xiong, Qingfeng Xia, Jianmeng Zhou and Xingyu Feng
Forests 2025, 16(8), 1248; https://doi.org/10.3390/f16081248 - 31 Jul 2025
Viewed by 166
Abstract
There are significant scale and morphological differences between fire and smoke features in forest fire detection. This paper proposes a detection method based on a dual-branch multi-scale adaptive feature fusion network (DMAFNet). In this method, a convolutional neural network (CNN) and a transformer form a dual-branch backbone network that extracts local texture and global context information, respectively. To overcome the differences in feature distribution and response scale between the two branches, a feature correction module (FCM) is designed; through spatial and channel correction mechanisms, the features of the two branches are adaptively aligned. The Fusion Feature Module (FFM) is further introduced to fully integrate the dual-branch features via a two-way cross-attention mechanism (sketched below) and effectively suppress redundant information. Finally, the Multi-Scale Fusion Attention Unit (MSFAU) is designed to enhance the multi-scale detection capability for fire targets. Experimental results show that the proposed DMAFNet achieves significant improvements in mAP (mean average precision) over existing mainstream detection methods.
(This article belongs to the Section Natural Hazards and Risk Management)
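
For the two-way cross-attention fusion mentioned above, a minimal sketch where each branch queries the other before the refined features are merged; the token shapes, head count, and residual-plus-merge layout are assumptions rather than the FFM's actual design.

```python
import torch
import torch.nn as nn

class BidirectionalCrossAttention(nn.Module):
    """Each branch attends to the other; refined features are concatenated."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.cnn_from_vit = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.vit_from_cnn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.merge = nn.Linear(2 * dim, dim)

    def forward(self, cnn_tokens, vit_tokens):    # both (B, N, dim)
        c, _ = self.cnn_from_vit(cnn_tokens, vit_tokens, vit_tokens)
        v, _ = self.vit_from_cnn(vit_tokens, cnn_tokens, cnn_tokens)
        return self.merge(torch.cat([cnn_tokens + c, vit_tokens + v], dim=-1))

cnn = torch.randn(2, 196, 256)   # flattened CNN feature map (toy shape)
vit = torch.randn(2, 196, 256)   # transformer branch tokens
print(BidirectionalCrossAttention(256)(cnn, vit).shape)  # (2, 196, 256)
```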

18 pages, 74537 KiB  
Article
SDA-YOLO: Multi-Scale Dynamic Branching and Attention Fusion for Self-Explosion Defect Detection in Insulators
by Zhonghao Yang, Wangping Xu, Nanxing Chen, Yifu Chen, Kaijun Wu, Min Xie, Hong Xu and Enhui Zheng
Electronics 2025, 14(15), 3070; https://doi.org/10.3390/electronics14153070 - 31 Jul 2025
Viewed by 158
Abstract
To enhance the performance of UAVs in detecting insulator self-explosion defects during power inspections, this paper proposes an insulator self-explosion defect recognition algorithm, SDA-YOLO, based on an improved YOLOv11s network. First, the SODL is added to YOLOv11 to fuse shallow features with deeper features, improving the model’s focus on small-sized self-explosion defect features. The OBB is also employed to reduce interference from the complex background. Second, the DBB module is incorporated into the C3k2 module in the backbone to extract target features through a multi-branch parallel convolutional structure (see the sketch below). Finally, the AIFI module replaces the C2PSA module, effectively directing and aggregating information between channels to improve detection accuracy and inference speed. The experimental results show that the average accuracy of SDA-YOLO reaches 96.0%, 6.6% higher than the YOLOv11s baseline model. While maintaining high accuracy, SDA-YOLO reaches an inference speed of 93.6 frames/s, achieving real-time detection of insulator faults.
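
The DBB module is described only as a multi-branch parallel convolutional structure; below is a sketch of that general pattern under the common Diverse-Branch-Block reading, where parallel 3x3, 1x1, and pooling branches are summed during training (a real DBB re-parameterizes the branches into one conv for inference, which is omitted here).

```python
import torch
import torch.nn as nn

class MultiBranchBlock(nn.Module):
    """Training-time multi-branch conv: three parallel paths, summed."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.k3 = nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1, bias=False),
                                nn.BatchNorm2d(c_out))
        self.k1 = nn.Sequential(nn.Conv2d(c_in, c_out, 1, bias=False),
                                nn.BatchNorm2d(c_out))
        self.pool = nn.Sequential(nn.Conv2d(c_in, c_out, 1, bias=False),
                                  nn.AvgPool2d(3, stride=1, padding=1),
                                  nn.BatchNorm2d(c_out))
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.k3(x) + self.k1(x) + self.pool(x))

x = torch.randn(1, 64, 80, 80)
print(MultiBranchBlock(64, 64)(x).shape)          # torch.Size([1, 64, 80, 80])
```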

24 pages, 4039 KiB  
Review
A Mathematical Survey of Image Deep Edge Detection Algorithms: From Convolution to Attention
by Gang Hu
Mathematics 2025, 13(15), 2464; https://doi.org/10.3390/math13152464 - 31 Jul 2025
Viewed by 235
Abstract
Edge detection, a cornerstone of computer vision, identifies intensity discontinuities in images, enabling applications from object recognition to autonomous navigation. This survey presents a mathematically grounded analysis of edge detection’s evolution, spanning traditional gradient-based methods, convolutional neural networks (CNNs), attention-driven architectures, transformer-backbone models, and generative paradigms. Beginning with the Sobel and Canny kernel-based approaches, we trace the shift to data-driven CNNs like Holistically Nested Edge Detection (HED) and Bidirectional Cascade Network (BDCN), which leverage multi-scale supervision and achieve ODS (Optimal Dataset Scale) scores of 0.788 and 0.806, respectively. Attention mechanisms, as in EdgeNAT (ODS 0.860) and RankED (ODS 0.824), enhance global context, while generative models like GED (ODS 0.870) achieve state-of-the-art precision via diffusion and GAN frameworks. Evaluated on BSDS500 and NYUDv2, these methods highlight a trajectory toward accuracy and robustness, yet challenges in efficiency, generalization, and multi-modal integration persist. By synthesizing mathematical formulations, performance metrics, and future directions, this survey equips researchers with a comprehensive understanding of edge detection’s past, present, and potential, bridging theoretical insights with practical advancements.
(This article belongs to the Special Issue Artificial Intelligence and Algorithms with Their Applications)
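
ODS, the score used throughout the survey's comparisons, selects the single binarization threshold that maximizes the F-measure over the whole dataset. The sketch below is simplified: standard BSDS evaluation additionally matches predicted and ground-truth edge pixels within a small distance tolerance, which is omitted here.

```python
import numpy as np

def ods_f_score(probs, gts, thresholds=np.linspace(0.01, 0.99, 99)):
    """Best dataset-wide F-measure over a shared threshold (tolerance omitted)."""
    best = 0.0
    for t in thresholds:
        tp = fp = fn = 0
        for p, g in zip(probs, gts):              # accumulate over all images
            pred = p >= t
            tp += np.sum(pred & g)
            fp += np.sum(pred & ~g)
            fn += np.sum(~pred & g)
        prec, rec = tp / max(tp + fp, 1), tp / max(tp + fn, 1)
        if prec + rec > 0:
            best = max(best, 2 * prec * rec / (prec + rec))
    return best

probs = [np.random.rand(32, 32) for _ in range(4)]      # edge probability maps
gts = [np.random.rand(32, 32) > 0.9 for _ in range(4)]  # toy binary edge labels
print(round(ods_f_score(probs, gts), 3))
```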

26 pages, 62045 KiB  
Article
CML-RTDETR: A Lightweight Wheat Head Detection and Counting Algorithm Based on the Improved RT-DETR
by Yue Fang, Chenbo Yang, Chengyong Zhu, Hao Jiang, Jingmin Tu and Jie Li
Electronics 2025, 14(15), 3051; https://doi.org/10.3390/electronics14153051 - 30 Jul 2025
Viewed by 147
Abstract
Wheat is one of the most important grain crops, and spike counting is crucial for predicting yield. However, in complex farmland environments, wheat spikes vary greatly in scale, their color closely resembles the background, and ears often overlap, all of which make wheat ear detection challenging. At the same time, the growing demand for high accuracy and fast response in wheat spike detection requires models to be lightweight so as to reduce hardware costs. This study therefore proposes a lightweight wheat ear detection model, CML-RTDETR, for efficient and accurate detection of wheat ears in real, complex farmland environments. In the model construction, the lightweight network CSPDarknet is first introduced as the backbone of CML-RTDETR to enhance feature extraction efficiency. In addition, the FM module is introduced to modify the bottleneck layer in the C2f component, realizing hybrid feature extraction by splicing spatial and frequency domain features to strengthen feature extraction for wheat ears in complex scenes. Second, to improve detection across targets of different scales, a multi-scale feature enhancement pyramid (MFEP) is designed, consisting of GHSDConv for efficiently obtaining low-level detail information and CSPDWOK for constructing a multi-scale semantic fusion structure. Finally, channel pruning based on Layer-Adaptive Magnitude Pruning (LAMP) scoring is performed to reduce model parameters and runtime memory (see the sketch below). Experimental results on the GWHD2021 dataset show that the AP50 of CML-RTDETR reaches 90.5%, an improvement of 1.2% over the baseline RTDETR-R18 model, while the parameters and GFLOPs decrease to 11.03 M and 37.8 G, reductions of 42% and 34%, respectively. The real-time frame rate reaches 73 fps, delivering both parameter reduction and speed improvement.
(This article belongs to the Section Artificial Intelligence)
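
LAMP scoring, as commonly defined, ranks each weight by its squared magnitude divided by the sum of squared magnitudes of all weights in the same layer that are at least as large; the lowest-scored weights are pruned across layers. A sketch of that score for one layer (the keep-threshold example is illustrative).

```python
import torch

def lamp_scores(weight: torch.Tensor) -> torch.Tensor:
    """LAMP score per weight: w^2 / sum of w'^2 over weights with |w'| >= |w|."""
    w2 = weight.detach().flatten() ** 2
    order = torch.argsort(w2)                         # ascending magnitude
    sorted_w2 = w2[order]
    tail = torch.flip(torch.cumsum(torch.flip(sorted_w2, [0]), 0), [0])  # suffix sums
    scores = torch.empty_like(w2)
    scores[order] = sorted_w2 / tail
    return scores.view_as(weight)

w = torch.randn(8, 8)
s = lamp_scores(w)
mask = s > s.flatten().kthvalue(16).values            # prune the 16 lowest scores
print(mask.float().mean())                            # fraction kept: 0.75
```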

23 pages, 3835 KiB  
Article
Computational Saturation Mutagenesis Reveals Pathogenic and Structural Impacts of Missense Mutations in Adducin Proteins
by Lennon Meléndez-Aranda, Jazmin Moreno Pereyda and Marina M. J. Romero-Prado
Genes 2025, 16(8), 916; https://doi.org/10.3390/genes16080916 - 30 Jul 2025
Viewed by 266
Abstract
Background and objectives: Adducins are cytoskeletal proteins essential for membrane stability, actin–spectrin network organization, and cell signaling. Mutations in the genes ADD1, ADD2, and ADD3 have been linked to hypertension, neurodevelopmental disorders, and cancer. However, no comprehensive in silico saturation mutagenesis study has systematically evaluated the pathogenic potential and structural consequences of all possible missense mutations in adducins. This study aimed to identify high-risk variants and their potential impact on protein stability and function. Methods: We performed computational saturation mutagenesis for all possible single amino acid substitutions across the adducin protein family. Pathogenicity predictions were conducted using four independent tools: AlphaMissense, Rhapsody, PolyPhen-2, and PMut. Predictions were validated against UniProt-annotated pathogenic variants. Predictive performance was assessed using Cohen’s Kappa, sensitivity, and precision. Mutations with a prediction probability ≥ 0.8 were further analyzed for structural stability using mCSM, DynaMut2, MutPred2, and Missense3D, with particular focus on functionally relevant domains such as phosphorylation and calmodulin-binding sites. Results: PMut identified the highest number of pathogenic mutations, while PolyPhen-2 yielded more conservative predictions. Several high-risk mutations clustered in known regulatory and binding regions. Substitutions involving glycine were consistently among the most destabilizing due to increased backbone flexibility. Validated variants showed strong agreement across multiple tools, supporting the robustness of the analysis. Conclusions: This study highlights the utility of multi-tool bioinformatic strategies for comprehensive mutation profiling. The results provide a prioritized list of high-impact adducin variants for future experimental validation and offer insights into potential therapeutic targets for disorders involving ADD1, ADD2, and ADD3 mutations.
(This article belongs to the Section Bioinformatics)

30 pages, 5307 KiB  
Article
Self-Normalizing Multi-Omics Neural Network for Pan-Cancer Prognostication
by Asim Waqas, Aakash Tripathi, Sabeen Ahmed, Ashwin Mukund, Hamza Farooq, Joseph O. Johnson, Paul A. Stewart, Mia Naeini, Matthew B. Schabath and Ghulam Rasool
Int. J. Mol. Sci. 2025, 26(15), 7358; https://doi.org/10.3390/ijms26157358 - 30 Jul 2025
Viewed by 240
Abstract
Prognostic markers such as overall survival (OS) and tertiary lymphoid structure (TLS) ratios, alongside diagnostic signatures like primary cancer-type classification, provide critical information for treatment selection, risk stratification, and longitudinal care planning across the oncology continuum. However, extracting these signals solely from sparse, high-dimensional multi-omics data remains a major challenge due to heterogeneity and frequent missingness in patient profiles. To address this challenge, we present SeNMo, a self-normalizing deep neural network trained on five heterogeneous omics layers—gene expression, DNA methylation, miRNA abundance, somatic mutations, and protein expression—along with the clinical variables, that learns a unified representation robust to missing modalities. Trained on more than 10,000 patient profiles across 32 tumor types from The Cancer Genome Atlas (TCGA), SeNMo provides a baseline that can be readily fine-tuned for diverse downstream tasks. On a held-out TCGA test set, the model achieved a concordance index of 0.758 for OS prediction, while external evaluation yielded 0.73 on the CPTAC lung squamous cell carcinoma cohort and 0.66 on an independent 108-patient Moffitt Cancer Center cohort. Furthermore, on Moffitt’s cohort, baseline SeNMo fine-tuned for TLS ratio prediction aligned with expert annotations (p < 0.05) and sharply separated high- versus low-TLS groups, reflecting distinct survival outcomes. Without altering the backbone, a single linear head classified primary cancer type with 99.8% accuracy across the 33 classes. By unifying diagnostic and prognostic predictions in a modality-robust architecture, SeNMo demonstrated strong performance across multiple clinically relevant tasks, including survival estimation, cancer classification, and TLS ratio prediction, highlighting its translational potential for multi-omics oncology applications.
(This article belongs to the Section Molecular Pathology, Diagnostics, and Therapeutics)
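
The concordance index used for the OS results measures how often the model ranks patient risk consistently with observed survival times. A minimal sketch of Harrell's C on invented data; real implementations (e.g., in survival-analysis libraries) also handle ties in time and censoring more carefully.

```python
import numpy as np

def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs ordered correctly by risk score."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:                  # censored subjects cannot anchor a pair
            continue
        for j in range(len(times)):
            if times[i] < times[j]:        # i failed first, so should be riskier
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

times = np.array([5.0, 8.0, 11.0, 13.0])   # follow-up in months (invented)
events = np.array([1, 1, 0, 1])            # 1 = event observed, 0 = censored
risks = np.array([0.9, 0.6, 0.5, 0.2])     # model risk scores
print(concordance_index(times, events, risks))  # 1.0, perfectly ordered
```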

35 pages, 4940 KiB  
Article
A Novel Lightweight Facial Expression Recognition Network Based on Deep Shallow Network Fusion and Attention Mechanism
by Qiaohe Yang, Yueshun He, Hongmao Chen, Youyong Wu and Zhihua Rao
Algorithms 2025, 18(8), 473; https://doi.org/10.3390/a18080473 - 30 Jul 2025
Viewed by 281
Abstract
Facial expression recognition (FER) is a critical research direction in artificial intelligence, widely used in intelligent interaction, medical diagnosis, security monitoring, and other domains. These applications highlight its considerable practical value and social significance. FER models often need to run efficiently on mobile or edge devices, so research on lightweight facial expression recognition is particularly important. However, the feature extraction and classification methods of most current lightweight convolutional neural network recognition algorithms are not specifically and fully optimized for the characteristics of facial expression images and fail to make full use of the feature information they contain. To address the lack of FER models that are both lightweight and effectively optimized for expression-specific feature extraction, this study proposes a novel network design tailored to the characteristics of facial expressions. Building on the backbone architecture of MobileNet V2, we design LightExNet, a lightweight convolutional neural network based on deep-shallow layer fusion, an attention mechanism, and a joint loss function. In LightExNet, deep and shallow features are first fused to fully extract the shallow features of the original image, reduce information loss, alleviate vanishing gradients as convolutional depth increases, and achieve multi-scale feature fusion; the MobileNet V2 architecture is also streamlined to seamlessly integrate the deep and shallow networks. Second, a new channel and spatial attention mechanism, designed around the characteristics of expression features, encodes feature information from the different expression regions as fully as possible, effectively improving recognition accuracy. Finally, an improved center loss function is superimposed to further improve classification accuracy, with measures taken to significantly reduce the computational cost of the joint loss function (a sketch of the underlying center loss follows below). LightExNet is evaluated on three mainstream facial expression datasets: Fer2013, CK+, and RAF-DB. It has 3.27 M parameters and 298.27 M FLOPs, and its accuracy on the three datasets is 69.17%, 97.37%, and 85.97%, respectively, giving better overall performance than current mainstream lightweight expression recognition algorithms such as MobileNet V2, IE-DBN, Self-Cure Net, Improved MobileViT, MFN, Ada-CM, and Parallel CNN (Convolutional Neural Network). Experimental results confirm that LightExNet effectively improves recognition accuracy and computational efficiency while reducing energy consumption and enhancing deployment flexibility. These advantages underscore its strong potential for real-world applications in lightweight facial expression recognition.
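
The paper's improved center loss is not detailed in the abstract; for orientation, here is a sketch of the standard center loss it builds on, combined with cross-entropy into a joint objective. The class count (7 basic expressions), feature dimension, and weighting factor are placeholders.

```python
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    """Pulls each embedding toward a learnable center of its expression class."""
    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features, labels):
        return ((features - self.centers[labels]) ** 2).sum(dim=1).mean() / 2

ce = nn.CrossEntropyLoss()
center = CenterLoss(num_classes=7, feat_dim=128)
feats, logits = torch.randn(16, 128), torch.randn(16, 7)
labels = torch.randint(0, 7, (16,))
loss = ce(logits, labels) + 0.01 * center(feats, labels)  # joint objective
print(loss.item())
```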

24 pages, 14323 KiB  
Article
GTDR-YOLOv12: Optimizing YOLO for Efficient and Accurate Weed Detection in Agriculture
by Zhaofeng Yang, Zohaib Khan, Yue Shen and Hui Liu
Agronomy 2025, 15(8), 1824; https://doi.org/10.3390/agronomy15081824 - 28 Jul 2025
Viewed by 356
Abstract
Weed infestation contributes significantly to global agricultural yield loss and increases the reliance on herbicides, raising both economic and environmental concerns. Effective weed detection in agriculture requires high accuracy and architectural efficiency. This is particularly important under challenging field conditions, including densely clustered targets, small weed instances, and low visual contrast between vegetation and soil. In this study, we propose GTDR-YOLOv12, an improved object detection framework based on YOLOv12, tailored for real-time weed identification in complex agricultural environments. The model is evaluated on the publicly available Weeds Detection dataset, which contains a wide range of weed species and challenging visual scenarios. To achieve better accuracy and efficiency, GTDR-YOLOv12 introduces several targeted structural enhancements. The backbone incorporates GDR-Conv, which integrates Ghost convolution and Dynamic ReLU (DyReLU) to improve early-stage feature representation while reducing redundancy. The GTDR-C3 module combines GDR-Conv with Task-Dependent Attention Mechanisms (TDAMs), allowing the network to adaptively refine spatial features critical for accurate weed identification and localization. In addition, the Lookahead optimizer is employed during training to improve convergence efficiency and reduce computational overhead, thereby contributing to the model’s lightweight design. GTDR-YOLOv12 outperforms several representative detectors, including YOLOv7, YOLOv9, YOLOv10, YOLOv11, YOLOv12, ATSS, RTMDet and Double-Head. Compared with YOLOv12, GTDR-YOLOv12 achieves notable improvements across multiple evaluation metrics. Precision increases from 85.0% to 88.0%, recall from 79.7% to 83.9%, and F1-score from 82.3% to 85.9%. In terms of detection accuracy, mAP:0.5 improves from 87.0% to 90.0%, while mAP:0.5:0.95 rises from 58.0% to 63.8%. Furthermore, the model reduces computational complexity. GFLOPs drop from 5.8 to 4.8, and the number of parameters is reduced from 2.51 M to 2.23 M. These reductions reflect a more efficient network design that not only lowers model complexity but also enhances detection performance. With a throughput of 58 FPS on the NVIDIA Jetson AGX Xavier, GTDR-YOLOv12 proves both resource-efficient and deployable for practical, real-time weeding tasks in agricultural settings.
(This article belongs to the Section Weed Science and Weed Management)
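
GDR-Conv is said to combine Ghost convolution with DyReLU; the composition isn't detailed, so here is a sketch of the Ghost-convolution half alone, which produces part of its output channels with cheap depthwise operations. The 1:1 primary/ghost ratio and kernel sizes are assumptions, and DyReLU is omitted.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Half the outputs from a 1x1 conv, half from cheap depthwise 'ghosts'."""
    def __init__(self, c_in: int, c_out: int):        # assumes c_out is even
        super().__init__()
        c_half = c_out // 2
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_half, 1, bias=False),
            nn.BatchNorm2d(c_half), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(
            nn.Conv2d(c_half, c_half, 3, padding=1, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.ReLU(inplace=True))

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

x = torch.randn(1, 32, 56, 56)
print(GhostConv(32, 64)(x).shape)                     # torch.Size([1, 64, 56, 56])
```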

25 pages, 2518 KiB  
Article
An Efficient Semantic Segmentation Framework with Attention-Driven Context Enhancement and Dynamic Fusion for Autonomous Driving
by Jia Tian, Peizeng Xin, Xinlu Bai, Zhiguo Xiao and Nianfeng Li
Appl. Sci. 2025, 15(15), 8373; https://doi.org/10.3390/app15158373 - 28 Jul 2025
Viewed by 310
Abstract
In recent years, a growing number of real-time semantic segmentation networks have been developed to improve segmentation accuracy. However, these advancements often come at the cost of increased computational complexity, which limits their inference efficiency, particularly in scenarios such as autonomous driving, where strict real-time performance is essential. Achieving an effective balance between speed and accuracy has thus become a central challenge in this field. To address this issue, we present a lightweight semantic segmentation model tailored for the perception requirements of autonomous vehicles. The architecture follows an encoder–decoder paradigm, which not only preserves the capability for deep feature extraction but also facilitates multi-scale information integration. The encoder leverages a high-efficiency backbone, while the decoder introduces a dynamic fusion mechanism designed to enhance information interaction between different feature branches. Recognizing the limitations of convolutional networks in modeling long-range dependencies and capturing global semantic context, the model incorporates an attention-based feature extraction component. This is further augmented by positional encoding, enabling better awareness of spatial structures and local details. The dynamic fusion mechanism employs an adaptive weighting strategy, adjusting the contribution of each feature channel to reduce redundancy and improve representation quality. To validate the effectiveness of the proposed network, experiments were conducted on a single RTX 3090 GPU. The Dynamic Real-time Integrated Vision Encoder–Segmenter Network (DriveSegNet) achieved a mean Intersection over Union (mIoU) of 76.9% and an inference speed of 70.5 FPS on the Cityscapes test dataset, 74.6% mIoU and 139.8 FPS on the CamVid test dataset, and 35.8% mIoU with 108.4 FPS on the ADE20K dataset. The experimental results demonstrate that the proposed method achieves an excellent balance between inference speed, segmentation accuracy, and model size.
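
The dynamic fusion mechanism above is described as adaptive per-channel weighting of feature branches; here is a hedged sketch of one common realization, squeeze-and-excitation-style gating over two branches. The module name, reduction ratio, and shapes are invented.

```python
import torch
import torch.nn as nn

class DynamicFusion(nn.Module):
    """Adaptively weights two same-shape branches channel-by-channel, then sums."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 2 * channels, 1), nn.Sigmoid())

    def forward(self, a, b):
        w = self.gate(torch.cat([a, b], dim=1))
        wa, wb = w.chunk(2, dim=1)        # per-channel weight for each branch
        return wa * a + wb * b            # redundant channels get down-weighted

a, b = torch.randn(1, 128, 32, 64), torch.randn(1, 128, 32, 64)
print(DynamicFusion(128)(a, b).shape)     # torch.Size([1, 128, 32, 64])
```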
