Search Results (4,402)

Search Parameters:
Keywords = multi-scale Attention

26 pages, 4105 KB  
Article
Robust Dual-Stream Diagnosis Network for Ultrasound Breast Tumor Classification with Cross-Domain Segmentation Priors
by Xiaokai Jiang, Xuewen Ding, Jinying Ma, Chunyu Liu and Xinyi Li
Sensors 2026, 26(3), 974; https://doi.org/10.3390/s26030974 (registering DOI) - 2 Feb 2026
Abstract
Ultrasound imaging is widely used for early breast cancer screening to enhance patient survival. However, interpreting these images is inherently challenging due to speckle noise, low lesion-to-tissue contrast, and highly variable tumor morphology within complex anatomical structures. Additionally, variations in image characteristics across institutions and devices further impede the development of robust and generalizable computer-aided diagnostic systems. To alleviate these issues, this paper presents a cross-domain segmentation prior guided classification strategy for robust breast tumor diagnosis in ultrasound imaging, implemented through a novel Dual-Stream Diagnosis Network (DSDNet). DSDNet adopts a decoupled dual-stream architecture, where a frozen segmentation branch supplies spatial priors to guide the classification backbone. This design enables stable and accurate performance across diverse imaging conditions and clinical settings. To realize the proposed DSDNet framework, three novel modules are created. The Dual-Stream Mask Attention (DSMA) module enhances lesion priors by jointly modeling foreground and background cues. The Segmentation Prior Guidance Fusion (SPGF) module integrates multi-scale priors into the classification backbone using cross-domain spatial cues, improving tumor morphology representation. The Mamba-Inspired Linear Transformer (MILT) block, built upon the Mamba-Inspired Linear Attention (MILA) mechanism, serves as an efficient attention-based feature extractor. On the BUSI, BUS, and GDPH_SYSUCC datasets, DSDNet achieves ACC values of 0.878, 0.836, and 0.882, and Recall scores of 0.866, 0.789, and 0.878, respectively. These results highlight the effectiveness and strong classification performance of our method in ultrasound breast cancer diagnosis. Full article
(This article belongs to the Section Biomedical Sensors)
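
As a rough illustration of the decoupled dual-stream idea described in this abstract, the sketch below shows a frozen segmentation branch whose output re-weights a trainable classification stream. The module names, shapes, and placeholder segmentation net are assumptions for illustration only, not the authors' DSDNet code.

```python
import torch
import torch.nn as nn

class SegPriorGuidedClassifier(nn.Module):
    """Toy dual-stream classifier: a frozen segmentation branch produces a
    lesion-probability map that re-weights the classification features."""
    def __init__(self, seg_net: nn.Module, num_classes: int = 2):
        super().__init__()
        self.seg_net = seg_net                   # frozen segmentation branch
        for p in self.seg_net.parameters():
            p.requires_grad = False
        self.backbone = nn.Sequential(           # trainable classification stream
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):
        with torch.no_grad():
            prior = torch.sigmoid(self.seg_net(x))   # B x 1 x H x W spatial prior
        feats = self.backbone(x)
        feats = feats * (1.0 + prior)                # emphasize lesion region, keep context
        pooled = feats.mean(dim=(2, 3))              # global average pooling
        return self.head(pooled)

# usage with a stand-in segmentation net (a single conv as placeholder)
seg = nn.Conv2d(1, 1, 3, padding=1)
model = SegPriorGuidedClassifier(seg)
logits = model(torch.randn(2, 1, 128, 128))
print(logits.shape)   # torch.Size([2, 2])
```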

48 pages, 4817 KB  
Review
Design and Application of Stimuli-Responsive Hydrogels for 4D Printing: A Review of Adaptive Materials in Engineering
by Muhammad F. Siddique, Farag K. Omar and Ali H. Al-Marzouqi
Gels 2026, 12(2), 138; https://doi.org/10.3390/gels12020138 (registering DOI) - 2 Feb 2026
Abstract
Stimuli-responsive hydrogels are an emerging class of smart materials with immense potential across biomedical engineering, soft robotics, environmental systems, and advanced manufacturing. In this review, we present an in-depth exploration of their material design, classification, fabrication strategies, and real-world applications. We examine how a wide range of external stimuli—such as temperature, pH, moisture, ions, electricity, magnetism, redox conditions, and light—interact with polymer composition and crosslinking chemistry to shape the responsive behavior of hydrogels. Special attention is given to the growing field of 4D printing, where time-dependent shape and property changes enable dynamic, programmable systems. Unlike existing reviews that often treat materials, stimuli, or applications in isolation, this work introduces a multidimensional comparative framework that connects stimulus-response behavior with fabrication techniques and end-use domains. We also highlight key challenges that limit practical deployment—including mechanical fragility, slow actuation, and scale-up difficulties—and outline engineering solutions such as hybrid material design, anisotropic structuring, and multi-stimuli integration. Our aim is to offer a forward-looking perspective that bridges material innovation with functional design, serving as a resource for researchers and engineers working to develop next-generation adaptive systems. Full article
(This article belongs to the Special Issue 3D Printing of Gel-Based Materials (2nd Edition))

27 pages, 2010 KB  
Article
Image Captioning Using Enhanced Cross-Modal Attention with Multi-Scale Aggregation for Social Hotspot and Public Opinion Monitoring
by Shan Jiang, Yingzhao Chen, Rilige Chaomu and Zheng Liu
Inventions 2026, 11(1), 13; https://doi.org/10.3390/inventions11010013 (registering DOI) - 2 Feb 2026
Abstract
Large volumes of images shared on social media have made image captioning an important tool for social hotspot identification and public opinion monitoring, where accurate visual–language alignment is essential for reliable analysis. However, existing image captioning models based on BLIP-2 (Bootstrapped Language–Image Pre-training) often struggle with complex, context-rich, and socially meaningful images in real-world social media scenarios, mainly due to insufficient cross-modal interaction, redundant visual token representations, and an inadequate ability to capture multi-scale semantic cues. As a result, the generated captions tend to be incomplete or less informative. To address these limitations, this paper proposes ECMA (Enhanced Cross-Modal Attention), a lightweight module integrated into the Querying Transformer (Q-Former) of BLIP-2. ECMA enhances cross-modal interaction through bidirectional attention between visual features and query tokens, enabling more effective information exchange, while a multi-scale visual aggregation strategy is introduced to model semantic representations at different levels of abstraction. In addition, a semantic residual gating mechanism is designed to suppress redundant information while preserving task-relevant features. ECMA can be seamlessly incorporated into BLIP-2 without modifying the original architecture or fine-tuning the vision encoder or the large language model, and is fully compatible with OPT (Open Pre-trained Transformer)-based variants. Experimental results on the COCO (Common Objects in Context) benchmark demonstrate consistent performance improvements, where ECMA improves the CIDEr (Consensus-based Image Description Evaluation) score from 144.6 to 146.8 and the BLEU-4 score from 42.5 to 43.9 on the OPT-6.7B model, corresponding to relative gains of 1.52% and 3.29%, respectively, while also achieving competitive METEOR (Metric for Evaluation of Translation with Explicit Ordering) scores. Further evaluations on social media datasets show that ECMA generates more coherent, context-aware, and socially informative captions, particularly for images involving complex interactions and socially meaningful scenes. Full article
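
A minimal sketch of the general pattern this abstract describes (bidirectional cross-attention between query tokens and visual features, followed by a sigmoid residual gate). Dimensions and the exact gating form are assumptions, not the published ECMA module.

```python
import torch
import torch.nn as nn

class BidirectionalGatedCrossAttention(nn.Module):
    """Illustrative block: query tokens attend to visual tokens and vice versa,
    then a learned sigmoid gate blends the update with the original queries."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.q2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v2q = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, queries, visual):
        q_upd, _ = self.q2v(queries, visual, visual)     # queries gather visual cues
        v_upd, _ = self.v2q(visual, queries, queries)    # visual tokens gather query cues
        v_ctx = v_upd.mean(dim=1, keepdim=True).expand_as(q_upd)
        g = self.gate(torch.cat([q_upd, v_ctx], dim=-1)) # semantic residual gate
        return queries + g * q_upd                       # suppress redundant updates

block = BidirectionalGatedCrossAttention()
out = block(torch.randn(2, 32, 256), torch.randn(2, 257, 256))
print(out.shape)  # torch.Size([2, 32, 256])
```
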
23 pages, 10699 KB  
Article
YOLOv11-IMP: Anchor-Free Multiscale Detection Model for Accurate Grape Yield Estimation in Precision Viticulture
by Shaoxiong Zheng, Xiaopei Yang, Peng Gao, Qingwen Guo, Jiahong Zhang, Shihong Chen and Yunchao Tang
Agronomy 2026, 16(3), 370; https://doi.org/10.3390/agronomy16030370 - 2 Feb 2026
Abstract
Estimating grape yields in viticulture is hindered by persistent challenges, including strong occlusion between grapes, irregular cluster morphologies, and fluctuating illumination throughout the growing season. This study introduces YOLOv11-IMP, an improved multiscale anchor-free detection framework extending YOLOv11, tailored to vineyard environments. Its architecture comprises five specialized components: (i) a viticulture-oriented backbone employing cross-stage partial fusion with depthwise convolutions for enriched feature extraction, (ii) a bifurcated neck enhanced by large-kernel attention to expand the receptive field coverage, (iii) a scale-adaptive anchor-free detection head for robust multiscale localization, (iv) a cross-modal processing module integrating visual features with auxiliary textual descriptors to enable fine-grained cluster-level yield estimation, and (v) an augmented spatial pyramid pooling module that aggregates contextual information across multiple scales. This work evaluated YOLOv11-IMP on five grape varieties collected under diverse environmental conditions. The framework achieved 94.3% precision and 93.5% recall for cluster detection, with a mean absolute error (MAE) of 0.46 kg per vine. The robustness tests found less than 3.4% variation in accuracy across lighting and weather conditions. These results demonstrate that YOLOv11-IMP can deliver high-fidelity, real-time yield data, supporting decision-making for precision viticulture and sustainable agricultural management. Full article
(This article belongs to the Special Issue Innovations in Agriculture for Sustainable Agro-Systems)
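
Large-kernel attention of the kind mentioned for the neck is commonly built by decomposing a large kernel into a small depthwise convolution, a dilated depthwise convolution, and a pointwise convolution. The snippet below sketches that generic pattern; it is not the authors' exact block, and the kernel sizes are assumptions.

```python
import torch
import torch.nn as nn

class LargeKernelAttention(nn.Module):
    """Decomposed large-kernel attention: a 5x5 depthwise conv, a dilated 7x7
    depthwise conv, and a 1x1 conv produce an attention map (large effective
    receptive field) that re-weights the input features."""
    def __init__(self, channels: int):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        self.dw_dilated = nn.Conv2d(channels, channels, 7, padding=9,
                                    dilation=3, groups=channels)
        self.pw = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        attn = self.pw(self.dw_dilated(self.dw(x)))
        return x * attn   # wide-context re-weighting

lka = LargeKernelAttention(64)
print(lka(torch.randn(1, 64, 80, 80)).shape)  # torch.Size([1, 64, 80, 80])
```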

26 pages, 6232 KB  
Article
MFE-YOLO: A Multi-Scale Feature Enhanced Network for PCB Defect Detection with Cross-Group Attention and FIoU Loss
by Ruohai Di, Hao Fan, Hanxiao Feng, Zhigang Lv, Lei Shu, Rui Xie and Ruoyu Qian
Entropy 2026, 28(2), 174; https://doi.org/10.3390/e28020174 - 2 Feb 2026
Abstract
The detection of defects in Printed Circuit Boards (PCBs) is a critical yet challenging task in industrial quality control, characterized by the prevalence of small targets and complex backgrounds. While deep learning models like YOLOv5 have shown promise, they often lack the ability to quantify predictive uncertainty, leading to overconfident errors in challenging scenarios—a major source of false alarms and reduced reliability in automated manufacturing inspection lines. From a Bayesian perspective, this overconfidence signifies a failure in probabilistic calibration, which is crucial for trustworthy automated inspection. To address this, we propose MFE-YOLO, a Bayesian-enhanced detection framework built upon YOLOv5 that systematically integrates uncertainty-aware mechanisms to improve both accuracy and operational reliability in real-world settings. First, we construct a multi-background PCB defect dataset with diverse substrate colors and shapes, enhancing the model’s ability to generalize beyond the single-background bias of existing data. Second, we integrate the Convolutional Block Attention Module (CBAM), reinterpreted through a Bayesian lens as a feature-wise uncertainty weighting mechanism, to suppress background interference and amplify salient defect features. Third, we propose a novel FIoU loss function, redesigned within a probabilistic framework to improve bounding box regression accuracy and implicitly capture localization uncertainty, particularly for small defects. Extensive experiments demonstrate that MFE-YOLO achieves state-of-the-art performance, with mAP@0.5 and mAP@0.5:0.95 values of 93.9% and 59.6%, respectively, outperforming existing detectors, including YOLOv8 and EfficientDet. More importantly, the proposed framework yields better-calibrated confidence scores, significantly reducing false alarms and enabling more reliable human-in-the-loop verification. This work provides a deployable, uncertainty-aware solution for high-throughput PCB inspection, advancing toward trustworthy and efficient quality control in modern manufacturing environments. Full article
(This article belongs to the Special Issue Bayesian Networks and Causal Discovery)
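
CBAM is a standard attention module; for reference, a compact PyTorch rendition of its channel-then-spatial attention is shown below. The paper's Bayesian reinterpretation and FIoU loss are not reproduced here, and the reduction ratio is an assumption.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """CBAM-style attention: channel attention from pooled descriptors,
    then spatial attention from channel-wise max/avg maps."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)          # channel attention
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))                 # spatial attention

cbam = CBAM(64)
print(cbam(torch.randn(2, 64, 40, 40)).shape)  # torch.Size([2, 64, 40, 40])
```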

18 pages, 11148 KB  
Article
YOLO-DSNet for Small Target Detection
by Haokun Xu, Huangleshuai He, Qike Zhi, Zhengyi Yang and Bocheng Han
Appl. Sci. 2026, 16(3), 1493; https://doi.org/10.3390/app16031493 - 2 Feb 2026
Abstract
Small target detection in Unmanned Aerial Vehicle (UAV) applications is often plagued by inherent challenges such as small object sizes, sparse information, and complex background interference. Traditional detection algorithms and existing YOLO series models suffer from limitations in detection accuracy and fine-grained detail preservation. To address this, this paper proposes YOLO-DSNet, a small target detection network based on YOLOv13n. First, we introduce the dual-stream attention module (DSAM), which enhances discriminative features by leveraging bidirectional context modeling. Second, we design the Multi-scale Attention C2f (MSA-C2f) module—an adaptive architecture that optimizes feature extraction via multi-scale enhancement, effectively preserving and integrating small target information. Finally, through dataset augmentation, we significantly improve the model’s detection performance. The proposed YOLO-DSNet achieves a mAP@0.5 improvement from 30.8% to 40.1% on the VisDrone2019 dataset with only 0.8 million additional parameters, yielding a 30% accuracy gain while increasing computational overhead by merely 11.6 Gigaflops (GFLOPs). Experiments demonstrate YOLO-DSNet’s effectiveness in small target detection tasks such as UAV aerial photography and remote sensing imagery, successfully balancing accuracy and efficiency with high practical value. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
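
As a loose, assumption-based sketch of what a multi-scale attention block of this flavor can look like, the snippet below fuses parallel depthwise branches of different kernel sizes with learned softmax weights. It is illustrative only, not the published MSA-C2f design.

```python
import torch
import torch.nn as nn

class MultiScaleAttentionBlock(nn.Module):
    """Illustrative multi-scale enhancement: parallel 3x3/5x5/7x7 depthwise
    branches, fused by softmax weights learned from a pooled descriptor."""
    def __init__(self, channels: int):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in (3, 5, 7)
        ])
        self.weights = nn.Linear(channels, 3)

    def forward(self, x):
        feats = torch.stack([b(x) for b in self.branches], dim=1)    # B x 3 x C x H x W
        w = torch.softmax(self.weights(x.mean(dim=(2, 3))), dim=-1)  # B x 3
        return x + (feats * w.view(-1, 3, 1, 1, 1)).sum(dim=1)

msa = MultiScaleAttentionBlock(32)
print(msa(torch.randn(2, 32, 64, 64)).shape)  # torch.Size([2, 32, 64, 64])
```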

27 pages, 4367 KB  
Article
MTFE-Net: A Deep Learning Vision Model for Surface Roughness Extraction Based on the Combination of Texture Features and Deep Learning Features
by Qiancheng Jin, Wangzhe Du, Huaxin Liu, Xuwei Li, Xiaomiao Niu, Yaxing Liu, Jiang Ji, Mingjun Qiu and Yuanming Liu
Metals 2026, 16(2), 179; https://doi.org/10.3390/met16020179 - 2 Feb 2026
Abstract
Surface roughness, critically measured by the Arithmetical Mean Roughness (Ra), is a vital determinant of workpiece functional performance. Traditional contact-based measurement methods are inefficient and unsuitable for online inspection. While machine vision offers a promising alternative, existing approaches lack robustness, and pure deep learning models suffer from poor interpretability. Therefore, MTFE-Net is proposed, which is a novel deep learning framework for surface roughness classification. The key innovation of MTFE-Net lies in its effective integration of traditional texture feature analysis with deep learning within a dual-branch architecture. The MTFE (Multi-dimensional Texture Feature Extraction) branch innovatively combines a comprehensive suite of texture descriptors including Gray-Level Co-occurrence Matrix (GLCM), gray-level difference statistic, first-order statistic, Tamura texture features, wavelet transform, and Local Binary Pattern (LBP). This multi-scale, multi-perspective feature extraction strategy overcomes the limitations of methods that focus on only specific texture aspects. These texture features are then refined using Multi-Head Self-Attention (MHA) mechanism and Mamba model. Experiments on a dataset of Q235 steel surfaces show that MTFE-Net achieves state-of-the-art performance with 95.23% accuracy, 94.89% precision, 94.67% recall and 94.74% F1-score, significantly outperforming comparable models. The results validate that the fusion strategy effectively enhances accuracy and robustness, providing a powerful solution for industrial non-contact roughness inspection. Full article
(This article belongs to the Section Computation and Simulation on Metals)
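
The handcrafted-texture branch combines standard descriptors. Below is a small scikit-image example of computing GLCM statistics and an LBP histogram for one image; the distances, angles, and bin choices are illustrative assumptions, not the paper's full MTFE feature set.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops, local_binary_pattern

def texture_descriptor(gray: np.ndarray) -> np.ndarray:
    """Concatenate GLCM statistics and an LBP histogram for one grayscale
    uint8 image. This mirrors only the 'handcrafted texture branch' idea."""
    glcm = graycomatrix(gray, distances=[1, 2], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    glcm_feats = np.hstack([
        graycoprops(glcm, prop).ravel()
        for prop in ("contrast", "homogeneity", "energy", "correlation")
    ])
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return np.concatenate([glcm_feats, lbp_hist])

img = (np.random.rand(128, 128) * 255).astype(np.uint8)
print(texture_descriptor(img).shape)   # (26,) = 16 GLCM stats + 10 LBP bins
```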

24 pages, 3790 KB  
Article
An Edge-Deployable Lightweight Intrusion Detection System for Industrial Control
by Zhenxiong Zhang, Lei Zhang, Jialong Xu, Zhengze Chen and Peng Wang
Electronics 2026, 15(3), 644; https://doi.org/10.3390/electronics15030644 - 2 Feb 2026
Abstract
Industrial Control Systems (ICSs), critical to infrastructure, face escalating cyber threats under Industry 4.0, yet existing intrusion detection methods are hindered by attack sample scarcity, spatiotemporal heterogeneity of industrial protocols, and resource constraints of embedded devices. This paper proposes a four-stage closed-loop intrusion detection framework for ICSs, with its core innovations integrating the following key components: First, a protocol-conditioned Conditional Generative Adversarial Network (CTGAN) is designed to synthesize realistic attack traffic by enforcing industrial protocol constraints and validating syntax through dual-path discriminators, ensuring generated traffic adheres to protocol specifications. Second, a three-tiered sliding window encoder transforms raw network flows into structured RGB images, capturing protocol syntax, device states, and temporal autocorrelation to enable multiresolution spatiotemporal analysis. Third, an Efficient Multiscale Attention Visual State Space Model (EMA-VSSM) is developed by integrating gate-enhanced state-space layers with multiscale attention mechanisms and contrastive learning, enhancing threat detection through improved long-range dependency modeling and spatial–temporal correlation capture. Finally, a lightweight EMA-VSSM student model, developed via hierarchical distillation, achieves a model compression rate of 64.8% and an inference efficiency enhancement of approximately 30% relative to the original model. Experimental results on a real-world ICS dataset demonstrate that this lightweight model attains an accuracy of 98.20% with a False Negative Rate (FNR) of 0.0316, outperforming state-of-the-art baseline methods such as XGBoost and Swin Transformer. By effectively balancing protocol compliance, multi-resolution feature extraction, and computational efficiency, this framework enables real-time deployment on resource-constrained ICS controllers. Full article
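
A toy version of the general idea behind the second component (encoding flow features into a multi-channel image using several window lengths) is sketched below. The window sizes, resampling, and scaling are assumptions, not the paper's three-tiered encoder.

```python
import numpy as np

def flows_to_rgb(features: np.ndarray, windows=(8, 16, 32), size: int = 32) -> np.ndarray:
    """Toy three-tier encoding: each window length yields one channel of a
    size x size image by smoothing feature rows over that window, then
    min-max scaling to [0, 255]. Illustrative only."""
    channels = []
    for w in windows:
        kernel = np.ones(w) / w                       # moving average over w records
        smoothed = np.apply_along_axis(
            lambda col: np.convolve(col, kernel, mode="same"), 0, features)
        rows = np.linspace(0, smoothed.shape[0] - 1, size).astype(int)
        cols = np.linspace(0, smoothed.shape[1] - 1, size).astype(int)
        tile = smoothed[np.ix_(rows, cols)]           # resample to a fixed grid
        tile = (tile - tile.min()) / (np.ptp(tile) + 1e-9) * 255
        channels.append(tile.astype(np.uint8))
    return np.stack(channels, axis=-1)                # H x W x 3 "RGB" image

flows = np.random.rand(500, 20)                       # 500 flow records x 20 features
print(flows_to_rgb(flows).shape)                      # (32, 32, 3)
```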

25 pages, 4090 KB  
Article
TPHFC-Net—A Triple-Path Heterogeneous Feature Collaboration Network for Enhancing Motor Imagery Classification
by Yuchen Jin, Chunxu Dou, Dingran Wang and Chao Liu
Technologies 2026, 14(2), 96; https://doi.org/10.3390/technologies14020096 (registering DOI) - 2 Feb 2026
Abstract
Electroencephalography-based motor imagery (EEG-MI) classification is a cornerstone of Brain–Computer Interface (BCI) systems, enabling the identification of motor intentions by decoding neural patterns within EEG signals. However, conventional methods, predominantly reliant on convolutional neural networks (CNNs), are proficient at extracting local temporal features but struggle to capture long-range dependencies and global contextual information. To address this limitation, we propose a Triple-path Heterogeneous Feature Collaboration Network (TPHFC-Net), which synergistically integrates three distinct temporal modeling pathways: a multi-scale Temporal Convolutional Network (TCN) to capture fine-grained local dynamics, a Transformer branch to model global dependencies via multi-head self-attention, and a Long Short-Term Memory (LSTM) network to track sequential state evolution. These heterogeneous features are subsequently fused adaptively by a dynamic gating mechanism. In addition, the model’s robustness and discriminative power are further augmented by a lightweight front-end denoising diffusion model for enhanced noisy feature representation and a back-end prototype attention mechanism to bolster the inter-class separability of non-stationary EEG features. Extensive experiments on the BCI Competition IV-2a and IV-2b datasets validate the superiority of the proposed model, achieving mean classification accuracies of 82.45% and 89.49%, respectively, on the subject-dependent MI task and significantly outperforming existing mainstream baselines. Full article
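
The dynamic gating fusion over three heterogeneous branches can be pictured as a learned softmax over branch outputs. A minimal sketch follows, assuming all three branches have already been projected to a shared width; it is not the published TPHFC-Net fusion code.

```python
import torch
import torch.nn as nn

class GatedTripleFusion(nn.Module):
    """Illustrative dynamic gating over three heterogeneous temporal branches
    (e.g., TCN, Transformer, and LSTM outputs projected to a shared width)."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.gate = nn.Linear(3 * dim, 3)

    def forward(self, f_tcn, f_trans, f_lstm):
        stacked = torch.stack([f_tcn, f_trans, f_lstm], dim=1)    # B x 3 x D
        w = torch.softmax(self.gate(stacked.flatten(1)), dim=-1)  # B x 3 branch weights
        return (stacked * w.unsqueeze(-1)).sum(dim=1)             # B x D fused feature

fusion = GatedTripleFusion(dim=128)
out = fusion(torch.randn(4, 128), torch.randn(4, 128), torch.randn(4, 128))
print(out.shape)  # torch.Size([4, 128])
```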

21 pages, 2928 KB  
Article
No Trade-Offs: Unified Global, Local, and Multi-Scale Context Modeling for Building Pixel-Wise Segmentation
by Zhiyu Zhang, Debao Yuan, Yifei Zhou and Renxu Yang
Remote Sens. 2026, 18(3), 472; https://doi.org/10.3390/rs18030472 - 2 Feb 2026
Abstract
Building extraction from remote sensing imagery plays a pivotal role in applications such as smart cities, urban planning, and disaster assessment. Although deep learning has significantly advanced this task, existing methods still struggle to strike an effective balance among global semantic understanding, local detail recovery, and multi-scale contextual awareness—particularly when confronted with challenges including extreme scale variations, complex spatial distributions, occlusions, and ambiguous boundaries. To address these issues, we propose TriadFlow-Net, an efficient end-to-end network architecture. First, we introduce the Multi-scale Attention Feature Enhancement Module (MAFEM), which employs parallel attention branches with varying neighborhood radii to adaptively capture multi-scale contextual information, thereby alleviating the problem of imbalanced receptive field coverage. Second, to enhance robustness under severe occlusion scenarios, we innovatively integrate a Non-Causal State Space Model (NC-SSD) with a Densely Connected Dynamic Fusion (DCDF) mechanism, enabling linear-complexity modeling of global long-range dependencies. Finally, we incorporate a Multi-scale High-Frequency Detail Extractor (MHFE) along with a channel–spatial attention mechanism to precisely refine boundary details while suppressing noise. Extensive experiments conducted on three publicly available building segmentation benchmarks demonstrate that the proposed TriadFlow-Net achieves state-of-the-art performance across multiple evaluation metrics, while maintaining computational efficiency—offering a novel and effective solution for high-resolution remote sensing building extraction. Full article

18 pages, 2539 KB  
Article
Squeeze-Excitation Attention-Guided 3D Inception ResNet for Aflatoxin B1 Classification in Almonds Using Hyperspectral Imaging
by Md. Ahasan Kabir, Ivan Lee and Sang-Heon Lee
Toxins 2026, 18(2), 76; https://doi.org/10.3390/toxins18020076 (registering DOI) - 2 Feb 2026
Abstract
Almonds are a highly valued nut due to their rich protein and nutritional content. However, they are vulnerable to aflatoxin B1 (AFB1) contamination in warm and humid environments. Consumption of AFB1-contaminated almonds can pose serious health risks, including kidney damage, and may lead to significant economic losses. Consequently, a rapid and non-destructive detection method is essential to ensure food safety by identifying and removing contaminated almonds from the supply chain. Hyperspectral imaging (HSI) and 3D deep learning provide a non-destructive, efficient alternative to current AFB1 detection methods. This study presents an attention-guided Inception ResNet 3D Network (AGIR-3DNet) for fast and precise detection of AFB1 contamination in almonds utilizing HSI. The proposed model integrates multi-scale feature extraction, residual learning, and attention mechanisms to enhance spatial-spectral feature representation, enabling more precise classification. The proposed 3D model was rigorously tested, and its performance was compared against 3D Inception and various conventional machine learning models. Compared to conventional machine learning models and deep learning architectures, AGIR-3DNet outperformed and achieved superior validation accuracy of 93.30%, an F1-score (harmonic mean of precision and recall) of 0.94, and an area under the receiver operating characteristic curve (AUC) value of 0.98. Furthermore, the model enhances processing efficiency, making it faster and more suitable for real-time industrial applications. Full article
(This article belongs to the Special Issue Mycotoxins in Food and Feeds: Human Health and Animal Nutrition)
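
Squeeze-and-excitation extends naturally to 3D spatial-spectral feature maps. A compact sketch of such a block is given below; it is a generic SE block with an assumed reduction ratio, not the exact AGIR-3DNet attention.

```python
import torch
import torch.nn as nn

class SqueezeExcite3D(nn.Module):
    """Squeeze-and-excitation for 3D (spatial-spectral) feature maps: global
    average pooling over D x H x W, a bottleneck MLP, and channel re-scaling."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                       # x: B x C x D x H x W
        w = self.fc(x.mean(dim=(2, 3, 4)))      # squeeze over the 3D volume
        return x * w[:, :, None, None, None]    # excite: per-channel re-weighting

se = SqueezeExcite3D(32)
print(se(torch.randn(2, 32, 16, 24, 24)).shape)  # torch.Size([2, 32, 16, 24, 24])
```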

27 pages, 7007 KB  
Article
A Developed YOLOv8 Model for the Rapid Detection of Surface Defects in Underground Structures
by Chao Ma, Xingyu Nie, Ping Fan and Guosheng Wang
Buildings 2026, 16(3), 610; https://doi.org/10.3390/buildings16030610 - 2 Feb 2026
Abstract
The YOLOv8 model has been shown to offer several advantages in detecting defects on concrete surfaces. However, it is ineffective at achieving multiscale feature extraction and accurate detection of underground structures under complex background conditions. Therefore, this study developed a YOLOv8-PSN model to detect surface defects in underground structures more rapidly and accurately. The model uses PSA (Pyramid Squeeze Attention) and Slim-neck to improve the original YOLOv8. The PSA module is adopted in the backbone and neck network to improve the model’s perception of multiscale features. Meanwhile, a Slim-neck structure is introduced into the neck to improve computational efficiency and feature fusion. Then, a dataset comprising six concrete surface defect categories, including cracks and spalling, is built and used to evaluate the performance of the developed YOLOv8-PSN. Experimental results show that, compared with the original YOLOv8, YOLOv10, YOLOv11, SSD, and Faster R-CNN, the mAP@50 of YOLOv8-PSN increases by 4.48%, 5.32%, 3.47%, 20.03%, and 20.93%, respectively, while maintaining real-time detection at up to 99 FPS. Therefore, the developed model has good robustness and practicality in a complex environment and can effectively and rapidly detect surface defects in underground structures. Full article
(This article belongs to the Topic Nondestructive Testing and Evaluation)

22 pages, 1588 KB  
Article
A Hybrid HOG-LBP-CNN Model with Self-Attention for Multiclass Lung Disease Diagnosis from CT Scan Images
by Aram Hewa, Jafar Razmara and Jaber Karimpour
Computers 2026, 15(2), 93; https://doi.org/10.3390/computers15020093 (registering DOI) - 1 Feb 2026
Abstract
Resource-limited settings continue to face challenges in distinguishing COVID-19, bacterial pneumonia, viral pneumonia, and normal lung conditions because of overlapping CT appearances and inter-observer variability. We present a hybrid deep learning architecture that combines hand-designed descriptors (Histogram of Oriented Gradients, Local Binary Patterns) with a 20-layer Convolutional Neural Network equipped with dual self-attention. Handcrafted features were classified with Support Vector Machines, and ensemble averaging was used to integrate their outputs with the CNN predictions. A confidence threshold of 0.7 was used to flag suspicious cases for manual review. On a balanced dataset of 14,000 chest CT scans (3500 per class), the model was trained and cross-validated five-fold on a patient-wise basis. It achieved 97.43% test accuracy and a macro F1-score of 0.97, a statistically significant improvement over a standalone CNN (92.0%), ResNet-50 (90.0%), a multiscale CNN (94.5%), and an ensemble CNN (96.0%). The self-attention module, which targets the diagnostically salient lung regions, contributed a further 2–3% gain. Only 5 percent of predictions fell below the confidence threshold, indicating reliability and clinical usefulness. The framework provides an interpretable and scalable method for multiclass lung disease diagnosis, particularly suited to deployment in healthcare settings with limited resources. Future work will involve multi-center validation, model optimization, and greater interpretability for real-world use. Full article
(This article belongs to the Special Issue AI in Bioinformatics)
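
The pipeline of handcrafted features, an SVM, ensemble averaging with CNN probabilities, and a 0.7 confidence gate can be sketched as follows. The toy data, the random CNN stand-in, and the descriptor settings are placeholders, not the paper's configuration.

```python
import numpy as np
from skimage.feature import hog, local_binary_pattern
from sklearn.svm import SVC

def handcrafted_features(img: np.ndarray) -> np.ndarray:
    """HOG + LBP-histogram descriptor for one grayscale uint8 slice (illustrative)."""
    h = hog(img, orientations=9, pixels_per_cell=(16, 16), cells_per_block=(2, 2))
    lbp = local_binary_pattern(img, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return np.concatenate([h, lbp_hist])

# toy data: 40 random 64x64 "slices", four balanced classes
X_img = (np.random.rand(40, 64, 64) * 255).astype(np.uint8)
y = np.arange(40) % 4
X = np.stack([handcrafted_features(im) for im in X_img])

svm = SVC(probability=True).fit(X, y)
p_svm = svm.predict_proba(X)                       # handcrafted-branch probabilities
p_cnn = np.random.dirichlet(np.ones(4), size=40)   # stand-in for CNN softmax output
p_ens = (p_svm + p_cnn) / 2                        # ensemble averaging of the two branches

conf = p_ens.max(axis=1)
flagged = conf < 0.7                               # below-threshold cases go to manual review
print(p_ens.argmax(axis=1)[:5], flagged.sum(), "cases flagged for review")
```
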
22 pages, 1028 KB  
Article
Foggy Ship Detection with Multi-Scale Feature and Attention Fusion
by Xiangjin Zeng, Jie Li and Ruifeng Xiong
Appl. Sci. 2026, 16(3), 1475; https://doi.org/10.3390/app16031475 - 1 Feb 2026
Abstract
To address the problem of insufficient detection accuracy, high false negative rate of small targets, and large positioning errors of ships in complex marine environments and foggy conditions, an improved DBL-YOLO method based on YOLOv11 is proposed. This method customizes and optimizes modules according to the characteristics of foggy scenes—the C3k2-MDSC module is designed to efficiently extract and fuse multi-scale spatial features, and a dynamic weight allocation mechanism is adopted to balance the contributions of features at different scales in the foggy and blurred environment; a lightweight BiFPN structure is introduced to enhance the efficiency of cross-scale feature transmission and solve the problem of feature attenuation in foggy conditions; a novel fusion of the Deformable-LKA attention mechanism is innovated, which combines a large receptive field and spatial adaptive adjustment capabilities to focus on the key contour features of blurred ships in foggy conditions; an Inner-SIoU regression loss function is proposed, which optimizes the positioning accuracy of dense and small targets through an auxiliary bounding box dynamic scaling strategy. Experimental results show that in foggy scenes, the recall rate is increased by 3.4%, the F1 score is increased by 1%, and mAP@0.5 and mAP@0.5:0.95 are increased by 1.4% and 3.1%, respectively. The final average precision reaches 98.6%, demonstrating excellent detection accuracy and robustness. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
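
The "auxiliary bounding box dynamic scaling" behind Inner-style IoU losses scales both boxes about their centers before computing IoU. A bare-bones sketch is below; the SIoU angle, distance, and shape terms of the paper's Inner-SIoU are omitted, and the fixed ratio is an assumption.

```python
import torch

def inner_iou(pred: torch.Tensor, target: torch.Tensor, ratio: float = 0.8) -> torch.Tensor:
    """Inner-IoU sketch: shrink (or grow) both boxes about their centers by
    `ratio` and compute IoU on these auxiliary boxes. Boxes are (x1, y1, x2, y2)."""
    def scale(box):
        cx, cy = (box[..., 0] + box[..., 2]) / 2, (box[..., 1] + box[..., 3]) / 2
        w, h = (box[..., 2] - box[..., 0]) * ratio, (box[..., 3] - box[..., 1]) * ratio
        return torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=-1)

    p, t = scale(pred), scale(target)
    lt = torch.maximum(p[..., :2], t[..., :2])
    rb = torch.minimum(p[..., 2:], t[..., 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=-1)
    area_p = (p[..., 2] - p[..., 0]) * (p[..., 3] - p[..., 1])
    area_t = (t[..., 2] - t[..., 0]) * (t[..., 3] - t[..., 1])
    return inter / (area_p + area_t - inter + 1e-9)

print(inner_iou(torch.tensor([[0., 0., 10., 10.]]),
                torch.tensor([[2., 2., 12., 12.]])))   # tensor([0.3913])
```
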
23 pages, 7288 KB  
Article
ECA-RepNet: A Lightweight Coal–Rock Recognition Network Using Recurrence Plot Transformation
by Jianping Zhou, Zhixin Jin, Hongwei Wang, Wenyan Cao, Xipeng Gu, Qingyu Kong, Jianzhong Li and Zeping Liu
Information 2026, 17(2), 140; https://doi.org/10.3390/info17020140 - 1 Feb 2026
Abstract
Coal and rock recognition is one of the key technologies in mining production, but traditional methods have limitations such as single-feature representation dimension, insufficient robustness, and unbalanced performance in lightweight design under noise interference and complex feature conditions. To address these issues, an Efficient Channel Attention Reparameterized Network (ECA-RepNet) based on recurrence plot and Efficient Channel Attention mechanism is proposed. The one-dimensional vibration signal is mapped to the two-dimensional image space through a recurrence plot (RP), which retains the dynamic characteristics of the time series while capturing the complex patterns in the signal. Multi-scale feature extraction and lightweight design are achieved through the reparameterized large kernel block (RepLK Block) and the depthwise separable convolution (DSConv) module. The ECA module is introduced to embed multiple convolutional layers. Through global average pooling, one-dimensional convolution, and dynamic weight allocation, the modeling ability of inter-channel dependencies is enhanced, the model robustness is improved, and the computational overhead is reduced. Experimental results demonstrate that the ECA-RepNet model achieves 97.33% accuracy, outperforming classic models including ResNet, CNN, and MobileNet in parameter efficiency, training time, and inference speed. Full article
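
A recurrence plot thresholds pairwise distances between (optionally delay-embedded) samples of the signal. The small NumPy example below shows this transformation feeding a 2-D network; the embedding parameters and threshold rule are illustrative, not the paper's settings.

```python
import numpy as np

def recurrence_plot(signal: np.ndarray, dim: int = 1, tau: int = 1, eps=None) -> np.ndarray:
    """Map a 1-D vibration signal to a binary recurrence plot: delay-embed the
    series (dimension dim, delay tau), then threshold pairwise distances.
    eps defaults to 10% of the maximum distance."""
    n = len(signal) - (dim - 1) * tau
    emb = np.stack([signal[i * tau: i * tau + n] for i in range(dim)], axis=1)
    dists = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    if eps is None:
        eps = 0.1 * dists.max()
    return (dists <= eps).astype(np.uint8)       # n x n image fed to the 2-D CNN

sig = np.sin(np.linspace(0, 20 * np.pi, 256)) + 0.1 * np.random.randn(256)
rp = recurrence_plot(sig, dim=2, tau=4)
print(rp.shape)   # (252, 252)
```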