Search Results (342)

Search Parameters:
Keywords = squeeze-and-excite module

14 pages, 3212 KB  
Article
High-Accuracy Woven Fabric Recognition with Optimized ResNet50 and ConvNeXt
by Yanqiu Du, Quan Peng, Fugang Liu and Zhongtian Ma
Appl. Sci. 2026, 16(11), 5590; https://doi.org/10.3390/app16115590 - 3 Jun 2026
Viewed by 134
Abstract
Traditional woven fabric identification heavily depends on manual experience and subjective judgment, which limits its applicability in modern intelligent textile manufacturing. To address this issue, this paper proposes improved woven fabric recognition approaches based on the ResNet50 and ConvNeXt architectures. For the classical ResNet50, the squeeze-and-excitation network (SENet) and the convolutional block attention module (CBAM) attention mechanisms are integrated separately to boost feature representation, and the Adam optimizer is adopted to accelerate convergence. For ConvNeXt, we optimize the network stacking blocks and design a warmup + cosine annealing learning rate scheduling strategy. With the transfer learning strategy, classification experiments are conducted on a self-constructed fabric dataset. The improved ResNet50 model achieves a recognition accuracy of 90.15%, while the optimized ConvNeXt model reaches 90.01%. The models outperform their baseline counterparts, demonstrating the effectiveness of the proposed improvements. This study provides a feasible reference for the research of automatic woven fabric classification and related intelligent textile inspection. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
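
Note on the "warmup + cosine annealing" schedule named in the abstract above: the listing gives no hyperparameters, so the following is only a minimal sketch of the general technique. The linear warmup shape, the step counts, the base rate of 1e-3, and the function name `warmup_cosine_lr` are all assumptions made for illustration, not values from the paper.

```python
import math

def warmup_cosine_lr(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Learning rate at `step` (0-indexed): linear warmup, then cosine decay.

    All arguments are illustrative assumptions; the abstract does not state them.
    """
    if step < warmup_steps:
        # Linear ramp that reaches base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Half-cosine from base_lr down to min_lr.
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Hypothetical run: 100 steps, 10 of them warmup.
schedule = [warmup_cosine_lr(s, total_steps=100, warmup_steps=10, base_lr=1e-3)
            for s in range(100)]
```

The warmup phase keeps early updates small while optimizer statistics are still noisy; the cosine phase then lowers the rate smoothly instead of in discrete drops.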

29 pages, 50937 KB  
Article
MAFT: A Lightweight Network for Martian Rock Segmentation Based on an Adaptive Frequency Transformer
by Chu Li, Yutong Jia, Gang Wan, Qifang Ma, Jia Liu, Yang Wang, Biao Wang, Jia Liu and Zhanji Wei
Remote Sens. 2026, 18(11), 1794; https://doi.org/10.3390/rs18111794 - 1 Jun 2026
Viewed by 252
Abstract
The segmentation of rocks on the Martian surface is crucial for navigation and obstacle avoidance by Mars rovers. However, frequent dust storms degrade rock surface textures, and the wide range of rock scales—from sub-meter to ten-meter—further complicates segmentation, especially under the strict computational constraints of rover hardware. This paper proposes a lightweight network named MAFT, specifically designed for Martian rock segmentation. The network builds upon the Adaptive Frequency Transformer (AFFormer) and constructs an improved backbone termed the Improved Adaptive Frequency Transformer (IAFFormer). By replacing the traditional self-attention mechanism with a frequency-domain approach, it captures global feature dependencies while reducing the computational complexity from quadratic to linear. The spatially isolated 1 × 1 convolutions in the pixel descriptor module are further replaced with Adaptive Kernel Convolution (AKConv), enabling the backbone to dynamically adjust its sampling positions to conform to the irregular and diverse morphologies of Martian rocks. An Enhanced Multidimensional Convolutional Attention (EMCA) module is introduced as the decoding structure. By integrating max-pooling in the squeeze stage and adaptive dilated convolutions in the excitation stage, EMCA strengthens the boundary perception and long-range dependency modeling of dust-covered rocks without increasing the parameter count. Additionally, we constructed a dataset of Martian rocks for the Zhurong rover (TWMARS-V2) and conducted experiments using a synthetic dataset (SynMars) and a real dataset (MarsData-V2). Experimental results demonstrate that MAFT achieves the highest segmentation accuracy among all compared methods, with only 2.97 M parameters and 15.49 G FLOPs. On the TWMARS-V2 dataset, Pixel Accuracy (PA) reaches 98.17%, and IoU reaches 88.90%. Full article

31 pages, 9088 KB  
Article
MaxI-Net: A 3D AI Framework for CBCT-Based Maxillofacial Defect Reconstruction and Patient-Specific Implant Generation with Biomechanical Validation
by Mamta Juneja, Maanya Kharbanda, Nitin Pandey, Agrima Sudhir, Aditya Poddar, Harleen Kaur, Prashant Prakash, Manoj Kumar Jaiswal, Prashant Jindal and Philip Breedon
Bioengineering 2026, 13(6), 619; https://doi.org/10.3390/bioengineering13060619 - 26 May 2026
Viewed by 505
Abstract
Maxillofacial defects impair facial aesthetics and oral function, arising from trauma, tumor resection, or congenital anomalies; however, reconstruction using Computer-Aided Design (CAD) and autologous grafts remains complex and time-intensive, and is associated with donor-site morbidity. Although deep learning (DL) has advanced automated reconstruction, existing models often address isolated tasks, lack integrated multi-scale feature learning, and rely on small datasets. This study proposes the Maxillofacial Implant-generation Network (MaxI-Net), a fast, resource-efficient three-dimensional DL framework for end-to-end maxillofacial defect reconstruction and patient-specific implant generation, with a completion step of cavity filling within the assembly. The model employs a 3D encoder–bottleneck-decoder architecture integrating hybrid dilated convolutions, residual connections, squeeze-and-excitation (SE) blocks, and 3D Convolutional Block Attention Modules (CBAM) with multi-scale feature fusion. It was trained on 921 Cone Beam-Computed Tomography (CBCT) scans, augmented to 11,973 maxillary defect pairs, using Dice loss and Adam optimisation with Automatic Mixed Precision, and benchmarked against UNet, UNETR, SegResNet, and SwinUNETR. MaxI-Net achieved the following: superior Dice Similarity Coefficient (DSC) = 0.778; 95th percentile Hausdorff Distance (HD95) = 3.453 mm; DSC Standard Deviation (SD) = 0.094; 95% confidence interval (CI) for mean DSC: 0.775–0.782. It was statistically validated against all competing architectures via pairwise Wilcoxon signed-rank tests, with significant DSC improvements confirmed across all comparators (p < 0.001) and rank-biserial effect sizes ranging from r = 0.250 against the closest competitor SegResNet* with high efficiency (0.06 s/volume; 9.6 min/epoch). Internal cavity filling of the generated implants was performed as a brief manual post-processing step in Autodesk Fusion 360 prior to biomechanical validation.
Biomechanical validation using a finite element analysis (FEA) of polyether–ether–ketone (PEEK) implants (~26.53 g) showed 41% stress reduction under physiological loads (100–400 N), predicting a ~9.2-year lifespan. Full article
(This article belongs to the Special Issue Artificial Intelligence (AI) in Bioengineering: Second Edition)

26 pages, 19025 KB  
Article
Integrating Hybrid Attention Mechanisms into CNN-Based Architectures to Enhance Image Classification and Interpretability
by Alidor M. Mbayandjambe, Selain K. Kasereka, Darren Kevin T. Nguemdjom, Petro M. Tshakwanda, Milena Savova-Mratsenkova and Tasho Tashev
Mach. Learn. Knowl. Extr. 2026, 8(6), 143; https://doi.org/10.3390/make8060143 - 25 May 2026
Viewed by 206
Abstract
Integrating complementary attention mechanisms into standard Convolutional Neural Networks (CNNs) is a promising strategy for improving feature discrimination without substantial computational overhead. This paper presents a controlled empirical study of a hybrid attention module that combines Squeeze-and-Excitation Networks (SENet) and the Convolutional Block Attention Module (CBAM) through an adaptive element-wise summation with a learnable weighting parameter α and a residual connection. This work contributes a systematic and statistically rigorous evaluation of attention fusion across four CNN backbones (ResNet18, VGG16, AlexNet, and SqueezeNet) on the CIFAR-10 benchmark at 32×32 resolution. All models were trained from scratch under a deliberately conservative protocol (50 epochs, no pretrained weights, standard augmentation) to isolate the incremental effect of attention mechanisms under controlled conditions. Under this protocol, the hybrid SENet+CBAM configuration achieves statistically significant accuracy improvements over the corresponding baselines (p<0.001, 5-fold cross-validation): ResNet18 improves from 77.93% to 90.71% (+12.78%), VGG16 from 55.78% to 70.17% (+14.39%), AlexNet from 62.67% to 71.82% (+9.15%), and SqueezeNet from 71.91% to 78.29% (+6.38%). These gains must be interpreted within the scope of this controlled setting. Absolute accuracy values are below fully optimized literature benchmarks. For VGG16 in particular, part of the improvement likely reflects correction of underfitting under the conservative protocol, not the full potential of the hybrid mechanism. Parameter overhead remains modest at 1.5–5.8%, and training convergence improves by 16.5% on average. The hybrid approach outperforms the best previously reported SENet+CBAM result for each architecture by an average of 2.32%. 
Grad-CAM visualizations and attention entropy analysis provide qualitative evidence of more concentrated spatial attention patterns under the hybrid configuration. These should be understood as proxy indicators rather than rigorous interpretability measures. Validation on higher-resolution benchmarks such as CIFAR-100, STL-10, and ImageNet subsets is a necessary next step before broader applicability can be claimed. Full article

32 pages, 2387 KB  
Article
LGP-Net: A Lightweight Gated-Fusion Network with Physics-Informed Features for Automatic Modulation Classification
by Xuanchen Liu and Zhuo Chen
Electronics 2026, 15(11), 2261; https://doi.org/10.3390/electronics15112261 - 23 May 2026
Viewed by 150
Abstract
The growing diversity of wireless standards and complex real-world channel effects render automatic modulation classification (AMC) increasingly challenging for spectrum monitoring and edge intelligence. However, most competitive deep-learning-based AMC networks still require 10⁵–10⁶ parameters, exceeding the memory available on resource-constrained edge platforms. We propose LGP-Net, a lightweight gated-fusion network that pairs a physics-informed expert branch with a compact temporal encoder built from depthwise separable convolution (DSConv), squeeze-and-excitation (SE) attention, and a single-layer gated recurrent unit (GRU). Specifically, unlike other dual-branch structures that directly concatenate the outputs of both pathways, this work designs a lightweight gating unit that requires no external signal-to-noise ratio (SNR) labels and adaptively reweights the two pathways according to signal-quality degradation. With fewer than 40 K parameters, a peak activation footprint of 26.00 KB and an amortised inference latency of 9.7 μs per sample under GPU acceleration, LGP-Net attains 65.00% overall accuracy on RadioML 2016.10B (91.48% at 0 dB) and 62.76% on RadioML 2016.10A, placing it in a competitive accuracy–efficiency regime relative to architectures consuming 5× to 500× more parameters. These characteristics support deployment-oriented feasibility under memory-constrained edge settings and high-throughput spectrum-monitoring pipelines. Full article

26 pages, 2458 KB  
Article
An Adaptive Audiovisual Fusion Method Based on Prediction Confidence for Fine Granularity Bird Species Recognition
by Xinliang Xu, Qiming Liu, Xin Wen, Heng Zhao, Zhenhao Wang and Chong Wang
Appl. Sci. 2026, 16(10), 5113; https://doi.org/10.3390/app16105113 - 20 May 2026
Viewed by 325
Abstract
To address the inherent limitations of single-modality approaches in fine-grained bird species recognition, this paper proposes an adaptive audiovisual fusion method based on prediction confidence. The proposed framework comprises three core components: an image classification branch, an audio classification branch, and a confidence–adaptive fusion module. The image branch employs EfficientNet-B3 to extract fine-grained visual features through compound scaling and squeeze-and-excitation (SE) attention. The audio branch utilizes ResNet-50 to classify Mel spectrograms converted from bird vocalizations, incorporating a dense sampling inference strategy to fully exploit complete audio information. For multimodal integration, a confidence–adaptive fusion strategy is introduced that jointly considers information entropy and probability gap to dynamically assess the reliability of each modality’s prediction, thereby assigning fusion weights at the sample level without any additional trainable parameters. Experiments on the SSW60 multimodal bird recognition dataset show that the image branch achieves a Top-1 accuracy of 91.55%, outperforming ResNet-50 (89.75%) and VGG-16 (83.81%); the audio branch reaches 68.20%, surpassing AST (63.29%) and VGG-16 (53.48%); and the fused model attains 95.30% Top-1 accuracy, a 3.75 percentage-point improvement over the image-only baseline and a 0.21 percentage-point gain over the learning-based TMC fusion baseline without introducing any trainable parameters, confirming the effectiveness of the proposed method. Full article
(This article belongs to the Special Issue AI-Based Supervised Prediction Models)
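
Note on the confidence-adaptive fusion described in the abstract above: the abstract says that information entropy and the probability gap are considered jointly to weight each modality per sample, with no trainable parameters, but it does not give the formula. The sketch below is therefore a hypothetical instance of that idea. The scoring rule in `confidence` (equal weighting of normalized certainty and top-1/top-2 gap), the function names, and the example probabilities are all assumptions.

```python
import math

def confidence(probs):
    """Hypothetical reliability score for one modality's class probabilities.

    Low entropy and a large top-1/top-2 gap both raise the score.
    Expects at least two classes.
    """
    k = len(probs)
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    certainty = 1.0 - entropy / math.log(k)   # about 1 for one-hot, about 0 for uniform
    top1, top2 = sorted(probs, reverse=True)[:2]
    gap = top1 - top2                         # probability gap
    return 0.5 * (certainty + gap)            # equal weighting is an assumption

def fuse(p_image, p_audio, eps=1e-8):
    """Sample-level weighted average of two predictions; nothing is learned."""
    c_img, c_aud = confidence(p_image), confidence(p_audio)
    w_img = (c_img + eps) / (c_img + c_aud + 2 * eps)
    w_aud = 1.0 - w_img
    fused = [w_img * pi + w_aud * pa for pi, pa in zip(p_image, p_audio)]
    return fused, w_img, w_aud

# Made-up example: a confident image branch and an uncertain audio branch.
fused, w_img, w_aud = fuse([0.90, 0.05, 0.05], [0.40, 0.35, 0.25])
```

Because the weights are recomputed for every sample from the predictions themselves, a noisy recording or an occluded image lowers only that modality's influence on that sample.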

17 pages, 3640 KB  
Communication
A Dual-Modal Mixture-of-Experts Attention U-Net (DMoE-AttU-Net) for Change Detection Using Heterogeneous Optical and SAR Remote Sensing Images
by Seyed Ehsan Khankeshizadeh, Ali Mohammadzadeh, Ali Jamali and Sadegh Jamali
Remote Sens. 2026, 18(10), 1508; https://doi.org/10.3390/rs18101508 - 11 May 2026
Viewed by 550
Abstract
Binary change detection (BCD) using heterogeneous optical and SAR imagery faces challenges due to modality-specific noise and the lack of adaptive fusion strategies. Existing methods often fail to suppress SAR speckle noise and accurately localize fine boundaries. This study proposes a novel deep architecture, termed Dual-Modal Mixture-of-Experts Attention U-Net (DMoE-AttU-Net), featuring (i) dual-stream encoders for modality-specific feature extraction, (ii) a mixture-of-experts (MoE) module in the SAR stream with a gating network for dynamic fusion, (iii) Squeeze-and-Excitation (SE) and spatial attention mechanisms in the decoder, and (iv) hierarchical skip connections for multi-scale fusion. Unlike existing multimodal change detection frameworks that apply uniform feature fusion, the proposed architecture introduces a modality-aware design in which the MoE mechanism is selectively applied to the SAR stream, enabling adaptive suppression of speckle noise while preserving complementary optical information. These components collectively enhance change localization and reduce noise-induced artifacts. The proposed model achieved a mean IoU of 0.855 and a kappa coefficient of 0.836 on three optical–SAR datasets, outperforming state-of-the-art methods in both accuracy and spatial consistency. Full article
(This article belongs to the Section Remote Sensing Perspective)

28 pages, 12703 KB  
Article
Multi-Scale Attention Network for Landslide Susceptibility Assessment
by Zhao Zhan, Shanxiong Chen, Min Zhang, Wenzhong Shi, Yangjie Sun and Hongbo Luo
Geosciences 2026, 16(5), 188; https://doi.org/10.3390/geosciences16050188 - 7 May 2026
Viewed by 301
Abstract
Landslide susceptibility assessment (LSA) is crucial for regional landslide risk evaluation and mitigation strategy formulation. Previous studies mostly adopted single-scale features, while landslide formation is influenced by multi-scale factors, making multi-scale information extraction more appropriate for assessment. This study proposes a deep learning framework integrating multi-scale and attention modules for object-based LSA. A multi-scale network extracts geo-environmental features at different scales, which are input into attention networks using multi-head attention and Squeeze-and-Excitation, termed MSMHA and MSSE, respectively, to enhance relevant features and suppress irrelevant ones. Finally, features are fused for classification and prediction. In a case study in Hong Kong, CNN-based and ML-based methods were compared using 9814 landslides and 11 influencing factors. Results show the proposed MSMHA (area under the curve, AUC 0.91) and MSSE (AUC 0.90) outperform conventional methods (e.g., random forest with AUC 0.86; multi-layer perceptron and support vector machine with AUC 0.85; DenseNet with AUC 0.86; CNN with AUC 0.88; VGG with AUC 0.87; GoogLeNet and ResNet with AUC 0.81). CNN-based methods outperformed ML-based ones, indicating that incorporating neighborhood information improves model performance. The rationality of the susceptibility map generated by MSMHA was verified via comparative analysis. Results confirm that the proposed multi-scale and attention-integrated framework outperforms traditional single-scale methods consistently. Equally importantly, the case study provides advanced CNN-based landslide susceptibility maps for Hong Kong, which can serve as a critical reference for regional landslide risk management and the formulation of targeted mitigation strategies. Full article

17 pages, 5695 KB  
Article
MDCNet: A Multi-Neighborhood Dense Connectivity Network for Infrared Transmission Line Clamp Segmentation
by Guocheng An, Wanrong Lu, Guohua Zhai, Xiaolong Wang and Yanwei Zhang
Electronics 2026, 15(9), 1926; https://doi.org/10.3390/electronics15091926 - 2 May 2026
Viewed by 303
Abstract
Advancements in infrared imaging technology have introduced a novel perspective for inspecting power transmission lines. Nevertheless, the inherent low contrast and indistinct edges of infrared images present significant challenges, rendering the direct application of traditional semantic segmentation algorithms unsatisfactory. To mitigate this problem, we propose a multi-neighborhood densely connected network architecture. This framework incorporates two pivotal modules: the Multi-Head Squeeze-and-Excitation (MHSE) module and the Multi-Neighborhood Feature Fusion (MNFF) module. The MHSE enhances local feature representations by capturing nuanced feature interactions, thereby alleviating the issue of imbalanced global feature weight distribution. The MNFF aggregates feature data from multiple adjacent nodes at each node’s input, which not only facilitates the integration of multi-scale target features but also leverages neighborhood information to precisely localize and amplify features within specific regions. Furthermore, we have built the first Infrared Dataset of Power Transmission Line Suspension Clamp (CLAMPTISS) to substantiate our approach. Empirical evidence demonstrates that our proposed network surpasses state-of-the-art networks across three key metrics: the mean Intersection over Union (mIoU) and localization accuracy (Pd) have increased by 8.3% and 13.3%, respectively, while the false alarm rate (Fa) has decreased by 38.2%. Full article

31 pages, 2300 KB  
Article
MDCAD-Net: A Multi-Dilated Convolution Attention Denoising Network for Bearing Fault Diagnosis
by Ran Duan, Ruopeng Yan and Guangyin Jin
Vibration 2026, 9(2), 30; https://doi.org/10.3390/vibration9020030 - 24 Apr 2026
Viewed by 389
Abstract
Bearing fault diagnosis is an important task for condition monitoring and predictive maintenance of rotating machinery. Nevertheless, many existing deep learning-based methods have difficulty in jointly modeling multi-scale fault characteristics, adaptively highlighting informative features, and maintaining robustness under noisy measurement conditions. To address these issues, this study presents MDCAD-Net, a multi-dilated convolution attention denoising network that integrates multi-scale temporal feature extraction, attention-based feature refinement, and explicit noise suppression within an end-to-end learning framework. Parallel dilated convolutions with different dilation rates are employed to capture short-duration transient impulses as well as long-range periodic patterns in vibration signals. Channel-wise feature recalibration using squeeze-and-excitation networks and spatial-temporal attention via a convolutional block attention module are combined to enhance informative representations. In addition, a denoising block with gated attention and residual connections is introduced to reduce noise interference while retaining fault-related signal components. Experiments conducted on the Case Western Reserve University bearing dataset show that the proposed method achieves a classification accuracy of 98.93% and yields competitive performance compared with several commonly used deep learning models. Ablation studies and feature visualization results further illustrate the contributions of the individual components and the separability of the learned feature representations under noisy conditions. The results indicate the potential of the proposed framework for practical bearing fault diagnosis under noisy operating conditions. Full article
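
Note on the parallel dilated convolutions mentioned in the abstract above: a kernel of length k with dilation d covers (k - 1)·d + 1 input samples, so small rates respond to short transients and large rates reach across longer periods. The sketch below illustrates only that mechanism. It reuses one fixed kernel for every branch and applies no padding, whereas a trained network would learn separate kernels per branch; the function names and the dilation rates (1, 2, 4) are assumptions.

```python
def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1D cross-correlation with `dilation` samples between kernel taps."""
    span = (len(kernel) - 1) * dilation + 1   # receptive field of one output sample
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal) - span + 1)
    ]

def multi_dilated_features(signal, kernel, dilations=(1, 2, 4)):
    """Apply the same kernel at several dilation rates; one output branch per rate."""
    return {d: dilated_conv1d(signal, kernel, d) for d in dilations}

# A single impulse at index 4 of a 12-sample signal.
signal = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
branches = multi_dilated_features(signal, kernel=[1, 1, 1])
```

With dilation 1 the impulse shows up in three adjacent outputs; with dilation 2 it shows up in every second output over a wider window, which is how the larger rates pick up longer-range structure at the same parameter count.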

21 pages, 3375 KB  
Article
Deep6DHead: A 6D Head Pose Estimation Method Based on Deep Feature Enhancement
by Fake Jiang, Shucheng Huang and Mingxing Li
Symmetry 2026, 18(5), 705; https://doi.org/10.3390/sym18050705 - 22 Apr 2026
Cited by 1 | Viewed by 333
Abstract
To address the bottlenecks of accuracy in head pose estimation caused by occlusion and rotational representation ambiguities, we propose Deep6DHead, a 6-degree-of-freedom (6DoF) head pose estimation method based on deep feature enhancement. This method innovatively integrates RGB and depth information to construct a four-channel input and achieves feature fusion of RGB-D through a dual-branch network. First, a Squeeze-and-Excitation (SE) module adaptively weights the depth geometric features of key anatomical regions to achieve channel recalibration. Second, based on the 6DoF rotation representation framework, we introduce an anatomical constraint loss using the nasal bridge normal. This constraint corrects rotation deviations caused by noise by enforcing consistency in local geometric orientation. Finally, the model outputs the rotation matrix end-to-end for final pose estimation. Experiments on the 300W-LP, BIWI, and AFLW2000 datasets demonstrate that our method significantly improves robustness and accuracy, particularly under extreme head poses. Notably, it achieves state-of-the-art performance on the roll axis (lowest error: 2.05) and a competitive overall MAE of 3.45, providing an effective solution for head pose estimation in complex real-world scenarios including extreme viewing angles. Full article
(This article belongs to the Section Computer)
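
Note on the squeeze-and-excitation (SE) channel recalibration that this and most other results in the listing rely on: the sketch below shows the standard three steps (squeeze, excitation, scale) in plain Python. It is not the Deep6DHead implementation; the weight matrices `w1` and `w2`, the omission of biases, and the tiny example tensor are assumptions chosen to keep the arithmetic easy to follow.

```python
import math

def se_block(feature_map, w1, w2):
    """Standard SE recalibration on a feature map given as nested lists.

    feature_map: C channels, each an H x W list of lists.
    w1: (C // r) x C weights, w2: C x (C // r) weights; biases omitted for brevity.
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: bottleneck layer with ReLU, then a layer with sigmoid gates in (0, 1).
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[g * v for v in row] for row in ch]
              for ch, g in zip(feature_map, gates)]
    return scaled, gates

# Hypothetical input: two 2x2 channels, reduction ratio r = 2 (one hidden unit).
x = [[[1.0, 1.0], [1.0, 1.0]],
     [[3.0, 3.0], [3.0, 3.0]]]
scaled, gates = se_block(x, w1=[[0.5, 0.5]], w2=[[1.0], [-1.0]])
```

In this toy case the hidden activation is 2.0, so the first channel is kept at sigmoid(2) of its value and the second is suppressed to sigmoid(-2) of its value; in a trained network the gates depend on learned weights and on the content of each input.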

13 pages, 4462 KB  
Article
A Lightweight 1D-CNN-Transformer for Bearing Fault Diagnosis Under Limited Data and AWGN Interference
by Yifan Guo, Yijie Zhi, Renyi Qi and Ming Cai
Sensors 2026, 26(9), 2574; https://doi.org/10.3390/s26092574 - 22 Apr 2026
Cited by 1 | Viewed by 688
Abstract
Intelligent bearing fault diagnosis is essential for maintaining the reliability of rotating machinery. However, deploying deep learning models in industrial environments is often constrained by a lack of labeled data, environmental noise, and strict hardware limits. To address these connected challenges, this paper proposes 1D-CNN-Trans, a flexible and resource-efficient hybrid framework. Designed for supervised diagnosis with restricted data, the configurable model combines a compact one-dimensional convolutional neural network (1D-CNN) for local feature extraction, a Transformer encoder for capturing long-range temporal dependencies, and an optional squeeze-and-excitation (SE) module for channel recalibration under favorable conditions. The method is evaluated on two standard mechanical benchmarks under limited sample conditions, controlled additive white Gaussian noise (AWGN), and dynamic non-stationary interference. Experimental results indicate that 1D-CNN-Trans shows improved robustness under interference compared to selected baselines, notably improving accuracy against a standard CNN backbone. Furthermore, findings indicate that while the Transformer ensures noise robustness, channel recalibration (via SE) introduces optimization instability under extreme sparsity and noise. Consequently, we reposition the architecture as a configurable framework where recalibration is conditionally activated. Finally, theoretical complexity analysis is provided to validate the model’s low computational burden, indicating its general feasibility for resource-constrained scenarios. Full article

21 pages, 1514 KB  
Article
Enhanced YOLOv11 with BiFormer and CoordSE for Real-Time Steel Continuous Casting Slag Object Detection
by Binhong Li, Pengfei Cheng and Xichen Liu
Appl. Sci. 2026, 16(8), 3965; https://doi.org/10.3390/app16083965 - 19 Apr 2026
Viewed by 366
Abstract
Precise slag addition monitoring in steel continuous casting is critical, yet harsh industrial environments make this task extremely challenging. This research proposes a novel deep learning framework by integrating BiFormer and coordinate-aware squeeze-and-excitation (CoordSE) modules into the YOLOv11 architecture. To efficiently extract features of small slag particles against complex molten steel backgrounds, the BiFormer component employs a dual-level routing attention strategy. Concurrently, the CoordSE module captures spatial and channel-wise feature dependencies by combining direction-aware feature aggregation with multi-branch fully connected layers. Evaluated on a custom dataset of 2847 high-resolution industrial images, the proposed BiFormer-CoordSEBlock-YOLOv11 model achieved 82.5 ± 0.2% precision, 69.1 ± 0.3% recall, and 80.6 ± 0.2% mAP@0.5. Comprehensive ablation studies confirm that the BiFormer and CoordSE modules improved the baseline mAP@0.5 by 23.4% and 12.3%, respectively. Operating at a real-time inference speed of 45.2 FPS on standard hardware, this model offers a highly competitive framework for metallurgical process monitoring. However, the current recall rate of 69.1% and the lack of physical validation on resource-constrained edge devices represent limitations that must be systematically addressed before full-scale industrial deployment. Full article
(This article belongs to the Section Computing and Artificial Intelligence)

25 pages, 8972 KB  
Article
Deep-MiSR: Multi-Scale Convolution and Attention-Enhanced DeepLabV3+ for Brain Tumor Segmentation in MRI
by Md Parvej Mosharaf, Jie Su and Jing Zhang
Appl. Sci. 2026, 16(8), 3900; https://doi.org/10.3390/app16083900 - 17 Apr 2026
Viewed by 310
Abstract
Accurate brain tumor segmentation in magnetic resonance imaging (MRI) is essential for diagnosis, treatment planning, and therapy monitoring. Conventional deep learning models often struggle with large variations in tumor shape, size, and contrast, as well as severe foreground–background imbalance. To address these challenges, this study presents Deep-MiSR, an enhanced encoder–decoder framework built upon DeepLabV3+ with a MobileNetV2 backbone, tailored for single-modality contrast-enhanced T1-weighted (T1CE) MRI segmentation. Three complementary components are integrated into the architecture: mixed depthwise convolution (MixConv) with heterogeneous kernels within the atrous spatial pyramid pooling module for multi-scale feature aggregation, a squeeze-and-excitation block for adaptive channel recalibration, and R-Drop regularization that enforces prediction consistency via symmetric Kullback–Leibler divergence. The model was evaluated on 3064 T1CE slices from 233 patients drawn from the publicly available Nanfang Hospital brain MRI dataset. Deep-MiSR achieved a Dice similarity coefficient of 0.9281, a mean intersection-over-union of 0.8738, a precision of 0.8839, and a 95th-percentile Hausdorff distance of 7.69 mm, demonstrating consistent improvements over both the DeepLabV3+ baseline and all prior methods evaluated on the same data. Ablation studies confirmed that each component contributes independently, with R-Drop providing the largest individual gain. These findings demonstrate that combining multi-scale convolution, channel attention, and consistency regularization constitutes an effective and computationally practical strategy for robust single-modality brain tumor segmentation. Full article
(This article belongs to the Special Issue Advances in Deep Learning-Based Medical Image Analysis: 2nd Edition)
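
The abstract above describes R-Drop as enforcing prediction consistency through a symmetric Kullback–Leibler divergence. The sketch below shows only that consistency term for a single prediction, assuming two logit vectors produced by two dropout-perturbed forward passes on the same input. The function names and the example logits are hypothetical, and how the authors weight this term against the segmentation loss is not stated in the abstract.

```python
import math


def softmax(logits):
    """Convert raw scores to a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def symmetric_kl(logits_a, logits_b):
    """Average of KL(p || q) and KL(q || p) for two sets of logits.

    Assumption: logits_a and logits_b come from two forward passes of the
    same network on the same input, each with a different dropout mask.
    """
    p = softmax(logits_a)
    q = softmax(logits_b)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


if __name__ == "__main__":
    # Hypothetical per-pixel logits (background vs. tumor) from two passes.
    pass_one = [2.0, 0.5]
    pass_two = [1.6, 0.9]
    print(symmetric_kl(pass_one, pass_two))
```

The term is zero when both passes agree exactly and grows as their predictions diverge, which is why it can act as a regularizer. In a segmentation setting it would typically be averaged over all pixels and added to the task loss with a tunable weight.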

18 pages, 4176 KB  
Article
An Attention-Enhanced Network for Visual Attitude Estimation
by Lu Liu, Jiahao Duan, Yaoyang Shen, Shihan Wang, Jiale Mao, Wei Liu, Yuyan Guo, Lan Wu, Ming Kong and Hang Yu
Algorithms 2026, 19(4), 309; https://doi.org/10.3390/a19040309 - 15 Apr 2026
Viewed by 277
Abstract
Accurate estimation of object attitude is essential for understanding motion behavior and achieving dynamic tracking. Existing image-based methods often suffer from low efficiency and limited accuracy, while the potential of deep learning has not been fully exploited in this field. To address these limitations, a lightweight deep learning method for attitude estimation is proposed and validated on spherical particles. A synthetic dataset is generated through VTK-based rendering and automatic annotation, providing large-scale training samples with known Euler angles. An improved MobileNetV1 backbone is developed by integrating Squeeze-and-Excitation blocks, a dual-scale Pyramid Pooling Module, global average pooling, and a regression-oriented multilayer perceptron, which enhances feature extraction and enables direct Euler angle prediction. Experimental results show that the proposed method achieves an average error of 0.308° on synthetic test images. Furthermore, a solid particle was fabricated through 3D printing and physical measurements were conducted, where the network combined with image preprocessing and augmentation achieved an average error of about 0.5° on real images, demonstrating a lightweight and deployment-friendly framework for practical attitude estimation. The results verify the effectiveness of the method and demonstrate its potential for accurate and computationally efficient attitude measurement in applications such as fluid dynamics, industrial inspection, and motion tracking. Full article
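
Several of the abstracts listed here rely on Squeeze-and-Excitation blocks for channel recalibration. As a rough illustration of the general mechanism, and not of any listed paper's implementation, the sketch below applies the three usual steps (global average pooling, a two-layer bottleneck with ReLU and sigmoid, and per-channel rescaling) to a small feature map stored as nested lists. The weight matrices `w1` and `w2` are hypothetical, and bias terms are omitted for brevity.

```python
import math


def se_block(feature_map, w1, w2):
    """Apply a bias-free Squeeze-and-Excitation step.

    feature_map: list of C channels, each an H x W list of lists.
    w1: reduction weights, shape (C // r) x C (r is a hypothetical ratio).
    w2: expansion weights, shape C x (C // r).
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, part 1: fully connected reduction followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1
    ]
    # Excitation, part 2: fully connected expansion followed by sigmoid.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


if __name__ == "__main__":
    # Two 2 x 2 channels and a bottleneck of width 1 (all values made up).
    fmap = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
    w1 = [[0.4, -0.2]]
    w2 = [[1.0], [-1.0]]
    scaled, gates = se_block(fmap, w1, w2)
    print(gates)
```

Each gate lies strictly between 0 and 1, so the block can only attenuate channels relative to one another. In a trained network the weights are learned jointly with the backbone.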