Search Results (116)

Search Parameters:
Keywords = multi-branch pooling

25 pages, 6283 KB  
Article
Surface Defect Detection in Liquid Crystal Display Polariser Coating Manufacturing Based on an Enhanced YOLOv10-N Approach
by Jiayue Zhang, Shanhui Liu, Minghui Chen, Kezhan Zhang, Yinfeng Li, Ming Peng and Yeting Teng
Coatings 2026, 16(4), 451; https://doi.org/10.3390/coatings16040451 - 8 Apr 2026
Abstract
To address the issues of uneven grayscale distribution, weak defect features, and small target scales on the coating surface of LCD polarizers during manufacturing, an improved YOLOv10-N-based method is proposed for surface defect detection. First, a polarizer coating defect dataset is constructed based on the LCD polarizer coating process and the characteristics of coating defects. Adaptive median filtering is then employed for image denoising, while a particle-swarm-optimization-based improved histogram equalization method is adopted for image enhancement. Next, the Scale-aware Pyramid Pooling (SCPP) module is introduced into the C2f module of the backbone network to construct the C2f_SCPP feature extraction module, thereby improving the model’s ability to detect coating defects with different morphologies through multi-scale semantic feature fusion. In addition, rotation-equivariant convolution PreCM is incorporated into the SPPF module of the backbone network to build the SPPF_PreCM module, which effectively suppresses feature redundancy and scale conflicts while strengthening the representation of tiny defects. Finally, while retaining the original Distribution Focal Loss (DFL) branch of YOLOv10, WIoU is used to replace CIoU as the IoU loss term in bounding box regression, thereby improving localization accuracy and accelerating model convergence during training. Experimental results show that, compared with YOLOv10-N, the proposed method improves mAP@0.5 and mAP@0.5:0.95 by 1.8 and 2.8 percentage points, respectively, demonstrating its effectiveness for polarizer coating defect detection. However, its generalization capability under diverse production environments, varying illumination conditions, and complex noise scenarios still requires further investigation.
(This article belongs to the Section High-Energy Beam Surface Engineering and Coatings)
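The WIoU regression term this abstract swaps in for CIoU can be sketched in isolation. A minimal, hedged illustration of the WIoU v1 idea for axis-aligned boxes (the paper may use a dynamic-focusing variant; function names here are our own): the plain 1 − IoU loss is scaled by an exponential factor of the normalized centre distance, computed from the smallest enclosing box, whose denominator is treated as a constant (detached from the gradient in the original formulation).

```python
import math

def iou(box_a, box_b):
    # Boxes as (x1, y1, x2, y2); standard intersection-over-union.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def wiou_v1(pred, gt):
    # Smallest box enclosing both pred and gt.
    ex1, ey1 = min(pred[0], gt[0]), min(pred[1], gt[1])
    ex2, ey2 = max(pred[2], gt[2]), max(pred[3], gt[3])
    wg, hg = ex2 - ex1, ey2 - ey1
    # Centre-distance penalty; the denominator is detached in the original.
    cx_p, cy_p = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cx_g, cy_g = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    r = math.exp(((cx_p - cx_g) ** 2 + (cy_p - cy_g) ** 2) / (wg ** 2 + hg ** 2 + 1e-9))
    return r * (1.0 - iou(pred, gt))
```

For a perfectly matched box the loss is zero; for a shifted box the penalty factor amplifies the loss relative to plain 1 − IoU.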

19 pages, 1666 KB  
Article
MTLL: A Novel Multi-Task Learning Approach for Lymphocytic Leukemia Classification and Nucleus Segmentation
by Cuisi Ou, Zhigang Hu, Xinzheng Wang, Kaiwen Cao and Yipei Wang
Electronics 2026, 15(7), 1419; https://doi.org/10.3390/electronics15071419 - 28 Mar 2026
Abstract
Bone marrow cell classification and nucleus segmentation in microscopic images are fundamental tasks for computer-aided diagnosis of lymphocytic leukemia. However, bone marrow cells from different subtypes exhibit high morphological similarity, and structural information is often constrained under optical microscopic imaging, posing challenges for stable and effective feature representation. To address this issue, we propose MTLL (Multitask Model on Lymphocytic Leukemia), a novel multitask approach that performs cell classification and nucleus segmentation within a unified network to exploit their complementary information. The model constructs a hybrid backbone for shared feature representation based on a CNN-Transformer architecture, in which Fuse-MBConv modules are tightly integrated with multilayer multi-scale transformers to enable deep fusion of local texture and global semantic information. For the segmentation branch, we design an AM (Atrous Multilayer Perceptron) decoder that combines atrous spatial pyramid pooling with multilayer perceptrons to fuse multi-scale information and accurately delineate nucleus boundaries. The classification branch incorporates prior knowledge of cell nuclei structures to capture subtle variations in cellular morphology and texture, thereby enhancing the model’s ability to distinguish between leukemia subtypes. Experimental results demonstrate that the MTLL model significantly outperforms existing advanced single-task and multi-task models in both lymphocytic leukemia classification and cell nucleus segmentation. These results validate the effectiveness of the multi-task feature-sharing strategy for lymphocytic leukemia diagnosis using bone marrow microscopic images.
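The AM decoder above builds on atrous (dilated) convolution at several rates. As an illustration of the core idea only, not the authors' implementation: a dilated convolution samples its input with gaps of `rate` positions, so parallel branches with different rates cover different receptive fields before their outputs are fused, which is the essence of atrous spatial pyramid pooling.

```python
def dilated_conv1d(x, kernel, rate):
    """Valid-mode 1-D convolution whose taps are `rate` samples apart."""
    span = (len(kernel) - 1) * rate  # receptive field minus one
    return [
        sum(k * x[i + j * rate] for j, k in enumerate(kernel))
        for i in range(len(x) - span)
    ]

def aspp_1d(x, kernel, rates=(1, 2, 4)):
    """Parallel dilated branches, as in atrous spatial pyramid pooling."""
    return {r: dilated_conv1d(x, kernel, r) for r in rates}
```

With a box kernel `[1, 1, 1]`, the rate-2 branch sums every other sample, covering a 5-sample window with only 3 taps.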

20 pages, 4497 KB  
Article
Remote Sensing Identification of Benggang Using a Two-Stream Network with Multimodal Feature Enhancement and Sparse Attention
by Xuli Rao, Qihao Chen, Kexin Zhu, Zhide Chen, Jinshi Lin and Yanhe Huang
Electronics 2026, 15(6), 1331; https://doi.org/10.3390/electronics15061331 - 23 Mar 2026
Abstract
Benggang, a typical erosional landform and geohazard in the red-soil hilly regions of southern China, is characterized by a fragmented texture, irregular boundaries, and high similarity to background objects such as bare soil and roads, which poses a dual challenge of “multiscale variability + strong noise” for automated identification at regional scales. To address insufficient information from a single modality and the limited representation of cross-scale features, this study proposes a dual-stream feature-fusion network (DF-Net) for multisource data consisting of a digital orthophoto map (DOM) and a digital elevation model (DEM). The method adopts ResNeSt50d as the backbone of the two branches: on the DOM side, a Canny-edge channel is stacked to enhance high-frequency boundary information; on the DEM side, derived terrain factors, including slope, aspect, curvature, and hillshade, are introduced to provide morphological constraints. In the cross-modal fusion stage, a multiscale sparse attention fusion module is designed, which acquires contextual information via multiwindow average pooling and suppresses noise interference through top-K sparsification. In the decision stage, a multibranch ensemble is employed to improve classification stability. Taking Anxi County, Fujian Province, as the study area, a coregistered dataset of GF-2 (1 m) DOM and ALOS (12.5 m) DEMs is constructed, and a zonal partitioning strategy is adopted to evaluate the model’s generalization ability. The experimental results show that DF-Net achieves 97.44% accuracy, 85.71% recall, and an 82.98% F1 score in the independent test zone, outperforming multiple mainstream CNN/transformer classification models. This study indicates that the strategy of “multimodal feature enhancement + sparse attention fusion” tailored to Benggang erosional landforms can significantly improve recognition performance under complex backgrounds, providing technical support for rapid Benggang surveys and governance-effectiveness assessments.
(This article belongs to the Section Artificial Intelligence)
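The top-K sparsification used in DF-Net's fusion module can be shown in isolation. A hedged sketch under our own simplifications (not the authors' code): attention weights are renormalized over only the K largest scores, so low-score context positions, which are the likely noise sources, contribute exactly zero.

```python
import math

def topk_sparse_attention(scores, k):
    """Softmax over only the k largest scores; all other positions get weight 0."""
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    exps = [math.exp(scores[i]) if i in keep else 0.0 for i in range(len(scores))]
    z = sum(exps)
    return [e / z for e in exps]
```

The kept weights still sum to one, so the attention output remains a convex combination of the retained positions.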

20 pages, 2605 KB  
Article
Spatial-Frequency Decoupling Alignment Encoding for Remote Sensing Change Detection
by Xu Zhang, Yue Du, Weiran Zhou and Kaihua Zhang
Sensors 2026, 26(6), 1979; https://doi.org/10.3390/s26061979 - 21 Mar 2026
Abstract
Existing remote sensing change detection methods often struggle to accurately capture the contours of complex change targets and subtle textural differences. This makes it difficult to effectively distinguish between the boundaries of change targets and the background. To address this challenge, we propose a novel method called spatial-frequency decoupling alignment encoding (SDA-Encoding), which is designed to fully leverage information from both the spatial and frequency domains. Specifically, we first use a Transformer encoder to extract bi-temporal features. Next, we apply wavelet transform to decouple these features into low-frequency and high-frequency components. In the multi-scale high-frequency interaction (MHI) module, we combine local spatial enhancement using spatial pyramid pooling with cross-scale dependency supplementation via the dual-domain alignment fusion (DAF) module. Meanwhile, in the position-aware low-frequency enhancement (PLE) module, spatial position sensitivity is restored using coordinate attention, and region-level contextual dependencies are captured through the selective fusion attention (SFA) module. Finally, the two frequency-domain branches are complementarily fused within the spatial domain to achieve unified detection of both fine-grained and structural changes. Experimental results on three benchmark datasets demonstrate the significant performance improvements of SDA-Encoding.
(This article belongs to the Special Issue Image Processing and Analysis for Object Detection: 3rd Edition)
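The wavelet decoupling step above splits features into low- and high-frequency bands. As an illustration only (assuming a single-level 1-D Haar transform; the paper operates on 2-D feature maps): pairwise averages carry the smooth, structural content and pairwise differences carry the fine, high-frequency detail, and the two bands reconstruct the signal exactly.

```python
def haar_split(x):
    """One-level Haar transform: pairwise averages (low) and differences (high)."""
    assert len(x) % 2 == 0
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return low, high

def haar_merge(low, high):
    """Exact inverse: sum and difference of the bands restore each pair."""
    out = []
    for l, h in zip(low, high):
        out += [l + h, l - h]
    return out
```

Losslessness is what makes this decoupling attractive: each branch can be enhanced independently and the fused result still covers the full signal content.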

31 pages, 1466 KB  
Article
Fusing Geometric and Semantic Features via Cosine Similarity Cross-Attention for Remote Sensing Scene Classification
by Xuefei Xu and Chengjun Xu
Sensors 2026, 26(5), 1613; https://doi.org/10.3390/s26051613 - 4 Mar 2026
Abstract
High-resolution remote sensing image scene classification (HRRSI-SC) is crucial for obtaining accurate Earth surface information. However, the task remains challenging due to significant background interference, high intra-class variation, and subtle inter-class similarities. Convolutional neural networks (CNNs) are constrained by their local receptive fields, which limits their ability to capture long-range spatial dependencies. On the other hand, Vision Transformers (e.g., ViT-B-16) excel at global feature extraction but often suffer from high computational complexity and may lack the inherent inductive biases for local feature modeling that CNNs possess. To address these limitations, this paper proposes a cross-level feature complementary classification framework based on Lie Group manifold space, termed CBCAM-LGM. Within the proposed CBCAM-LGM framework, multi-granularity features are first distilled via a global average pooling layer to suppress redundant information. The core of our approach, the cross-level bidirectional complementary attention module (CBCAM), then enables the adaptive fusion of features from both branches through a cross-query attention mechanism. Furthermore, by employing parallel dilated convolutions and a parameter-sharing strategy, the model captures multi-scale contextual information by sharing a single set of convolutional weights, which reduces the computational complexity to merely 1.21 GMACs while preserving multi-scale representation with minimal parameter overhead. Extensive experiments on challenging benchmarks demonstrate the model’s efficacy, as it achieves a state-of-the-art classification accuracy of 97.81% on the AID dataset, surpassing the ViT-B-16 baseline by 1.63%, while containing only 11.237 million parameters (an 87% reduction). These results collectively affirm that our model presents an efficient solution characterized by high accuracy and low complexity.
(This article belongs to the Section Remote Sensors)
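The cosine-similarity cross-attention named in the title can be sketched generically. This is our own minimal single-query illustration, not the CBCAM implementation: features from one branch (the values) are weighted by the softmax of cosine similarity between the other branch's query and the keys, so direction rather than magnitude drives the fusion.

```python
import math

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb + 1e-12)

def cosine_cross_attention(query, keys, values):
    """Weight `values` by softmax of cosine similarity between `query` and `keys`."""
    sims = [cosine_sim(query, k) for k in keys]
    exps = [math.exp(s) for s in sims]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
```

Because cosine similarity is bounded in [−1, 1], the attention logits are naturally scaled, which is one common motivation for preferring it over raw dot products.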

22 pages, 13735 KB  
Article
DBM-YOLO: A Dual-Branch Model with Feature Sharing for UAV Object Detection in Low-Illumination Environments
by Liwen Liu, Huilin Li, Gui Fu, Bo Zhou, You Wang and Rong Fan
Drones 2026, 10(3), 169; https://doi.org/10.3390/drones10030169 - 28 Feb 2026
Abstract
To resolve the issue of degraded detection accuracy for unmanned aerial vehicle object detection under low-illumination environments, this paper introduces a parallel object detection model. First, a dual-branch architecture is established by integrating, in parallel, a Zero-Reference Deep Curve Estimation (Zero-DCE) illumination enhancement network with a You Only Look Once (YOLOv11n)-based object detection network, enabling collaborative feature training and real-time updates. Through a feature-sharing mechanism, the two branches are jointly optimized during training, thus enhancing the model’s generalization capability in low-illumination environments. Furthermore, to further improve detection accuracy, a Dynamic Pooling Synergy Attention (DPSA) module is introduced into the backbone of YOLOv11n. By integrating dynamic pooling-based channel attention with spatial attention, this module improves feature representation, improves performance under complex environments, and increases adaptability to multi-scale targets. In addition, a High and Low Frequency Spatially-adaptive Feature Modulation (HLSAFM) module is added to the detection network’s Neck. Through high- and low-frequency feature refinement, segmented feature processing, and dynamic modulation, the network is able to capture richer feature information, thereby strengthening feature representation and discriminative capability. Extensive experiments on the VisDrone (Night) and DroneVehicle (Night) datasets demonstrate superior performance over multiple existing methods under low-illumination object detection tasks. Compared with the original YOLOv11n model, the proposed model’s mAP50 increases by 6.0% and 1.0% on the two datasets, and its mAP50:95 increases by 3.1% and 0.8%, respectively. These results confirm the enhanced detection capability achieved by our method in challenging low-illumination unmanned aerial vehicle (UAV) scenarios.
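Zero-DCE, the enhancement branch above, brightens an image by repeatedly applying a learned quadratic curve LE(x) = x + α·x·(1 − x) to each pixel. A hedged sketch of just that curve, with a scalar α standing in for the per-pixel curve maps the network actually predicts:

```python
def zero_dce_curve(pixel, alpha, iterations=8):
    """Iterated Zero-DCE light-enhancement curve; pixel in [0, 1], alpha in [-1, 1]."""
    x = pixel
    for _ in range(iterations):
        x = x + alpha * x * (1.0 - x)  # quadratic curve applied per iteration
    return x
```

The fixed points at 0 and 1 mean black and white are preserved while mid-tones are lifted (for α > 0), which is why the curve keeps outputs in a valid dynamic range without clipping.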

26 pages, 3736 KB  
Article
EIMDGNet: An Edge-Induced and Multi-Dimensional Grouped Difference Network for Remote Sensing Image Change Detection
by Le Sun, Mingxuan Ding, Qiaolin Ye, Yuhui Zheng, Zebin Wu and Wen Lu
Remote Sens. 2026, 18(4), 649; https://doi.org/10.3390/rs18040649 - 20 Feb 2026
Abstract
Change detection in remote sensing imagery is crucial for monitoring temporal variations in surface characteristics; nevertheless, it presents significant challenges owing to indistinct boundaries, limited semantic differentiation, and inadequate incorporation of multi-scale contextual information. To solve these problems, we propose EIMDGNet (Edge-Induced and Multi-Dimensional Grouped Difference Network), a novel architecture that enhances boundary representation and cross-scale feature interaction for accurate and robust change detection. EIMDGNet adopts a dual-branch ResNet18 backbone to extract multi-scale features from bi-temporal images, capturing both fine spatial detail and high-level semantic context. To improve boundary awareness and reduce pseudo-change interference, we introduce the Edge-Induced Differential Multi-Dimensional Group Enhancement Module (EID-MDGEM). This module enriches fine-grained spatial features through grouped pooling across spatial and channel dimensions, enabling precise localization of change contours. Within EID-MDGEM, the Edge Feature Enhancement Module (EFEM) integrates a parameter-free attention mechanism to generate edge-saliency maps, highlighting true change regions while suppressing background noise and irrelevant variations. To further enhance semantic consistency across feature scales, we design the Multi-Scale Hierarchical Progressive Fusion Module (MSHPM). This component employs a bottom-up progressive strategy to hierarchically integrate low-level spatial details with high-level semantic abstractions, thus increasing the continuity and completeness of detected change regions. By tightly coupling edge-aware enhancement with multi-scale hierarchical fusion, EIMDGNet effectively addresses major obstacles in change detection, including boundary ambiguity, inconsistent scale information, and feature misalignment. We evaluated EIMDGNet on five remote sensing change detection datasets: LEVIR-CD, DSIFN-CD, S2Looking, CLCD-CD and GVLM-CD. Our method consistently outperformed state-of-the-art approaches, achieving 91.49% F1 and 82.93% IoU on LEVIR-CD, 77.32% F1 and 69.39% IoU on DSIFN-CD, the highest 49.19% IoU and 99.20% OA on S2Looking, 81.65% F1 and 72.91% IoU on CLCD-CD, and 85.49% F1 and 76.08% IoU on GVLM-CD. These results demonstrate the superior accuracy and robustness of EIMDGNet across diverse change detection scenarios.

26 pages, 1749 KB  
Article
Institutional Governance and Entrepreneurship: A Multi-Branch Perspective on Policy Mixes in Emerging Economies
by Mohammad Ali Moradi and Mohammad Jahanbakht
Adm. Sci. 2026, 16(2), 97; https://doi.org/10.3390/admsci16020097 - 12 Feb 2026
Abstract
Institutions play a central role in shaping entrepreneurial behavior, yet much of the existing literature, even with the foundational insights of institutional economists such as Veblen, Mitchell, Commons, Coase, Ostrom, Williamson, and North, continues to view institutions as monolithic entities rather than as differentiated governance systems. This study addresses this gap by reconceptualizing institutions as multi-branch governance architectures in which legislative, executive, and judicial mechanisms interact to shape entrepreneurial outcomes, particularly in volatile emerging economies. The research asks how these disaggregated governance branches, mediated by institutional quality and external shocks, jointly influence entrepreneurial activity. Using Global Entrepreneurship Monitor (GEM) microdata for Iran over the period 2008–2020, merged with governance indicators and shock variables including sanctions and COVID-19, we employ pooled logistic regression to estimate the effects of governance functions and their policy mix interactions on Total Entrepreneurial Activity. The results show that executive policy quality has the strongest positive association with entrepreneurship, legislative coherence strengthens opportunity-driven activity, and judicial inefficiencies suppress entrepreneurial engagement by increasing uncertainty. Interaction effects further reveal that misalignment among governance branches weakens entrepreneurial activity, while coherent policy mixes mitigate the negative impact of external shocks. By integrating conceptual synthesis with empirical evidence, the study advances institutional theory, clarifies deficiencies in prevailing models, and demonstrates that entrepreneurial dynamism depends on the configuration and coordination of governance branches rather than on aggregate institutional scores. These insights provide policymakers with actionable guidance for designing coherent, adaptive, and resilient entrepreneurship-supporting ecosystems.
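The pooled logistic regression with policy-mix interaction terms described above can be sketched generically. This is our own minimal gradient-ascent fit on synthetic, hypothetical data (not GEM data, and not the authors' specification); the interaction column, built by the caller as a product of two covariates, models how two governance branches jointly shift the odds of entrepreneurial activity.

```python
import math

def fit_logit(rows, ys, lr=0.5, steps=2000):
    """Plain logistic regression via stochastic gradient ascent on the
    log-likelihood; rows may include an interaction column such as
    exec_quality * legislative_coherence (a hypothetical example)."""
    w = [0.0] * (len(rows[0]) + 1)  # intercept followed by one weight per column
    for _ in range(steps):
        for x, y in zip(rows, ys):
            z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            p = 1.0 / (1.0 + math.exp(-z))
            g = y - p  # gradient of the Bernoulli log-likelihood w.r.t. z
            w[0] += lr * g
            for i, xi in enumerate(x):
                w[i + 1] += lr * g * xi
    return w

def predict(w, x):
    z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))
```

On data where the outcome depends only on the product of the two main effects, the fitted interaction weight is positive while the main-effect weights stay near zero, which is the pattern an interaction-only relationship should produce.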

18 pages, 2702 KB  
Article
A Dual-Branch Ensemble Learning Method for Industrial Anomaly Detection: Fusion and Optimization of Scattering and PCA Features
by Jing Cai, Zhuo Wu, Runan Hua, Shaohua Mao, Yulun Zhang, Ran Guo and Ke Lin
Appl. Sci. 2026, 16(3), 1597; https://doi.org/10.3390/app16031597 - 5 Feb 2026
Abstract
Industrial visual anomaly detection remains challenging because practical inspection systems must achieve high detection accuracy while operating under highly imbalanced data, diverse defect patterns, limited computational resources, and increasing demands for interpretability. This work aims to develop a lightweight yet effective and explainable anomaly detection framework for industrial images in settings where a limited number of labeled anomalous samples are available. We propose a dual-branch feature-based supervised ensemble method that integrates complementary representations: a PCA branch to capture linear global structure and a scattering branch to model multi-scale textures. A heterogeneous pool of classical learners (SVM, RF, ET, XGBoost, and LightGBM) is trained on each feature branch, and stable probability outputs are obtained via stratified K-fold out-of-fold training, probability calibration, and a quantile-based threshold search. Decision-level fusion is then performed by stacking, where logistic regression, XGBoost, and LightGBM serve as meta-learners over the out-of-fold probabilities of the selected top-K base learners. Experiments on two public benchmarks (MVTec AD and BTAD) show that the proposed method substantially improves the best PCA-based single model, achieving relative F1-score gains of approximately 31% (MVTec AD) and 26% (BTAD), with maximum AUC values of about 0.91 and 0.96, respectively, under comparable inference complexity. Overall, the results demonstrate that combining high-quality handcrafted features with supervised ensemble fusion provides a practical and interpretable alternative/complement to heavier deep models for resource-constrained industrial anomaly detection, and future work will explore more category-adaptive decision strategies to further enhance robustness on challenging classes.
(This article belongs to the Special Issue AI and Data-Driven Methods for Fault Detection and Diagnosis)
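One pipeline component above, the quantile-based threshold search, is easy to show in isolation. A hedged sketch under our own simplifications (not the authors' code): the decision threshold on calibrated anomaly probabilities is swept over a small grid of score quantiles, and the F1-maximizing threshold is kept.

```python
def f1(y_true, y_pred):
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def quantile(xs, q):
    xs = sorted(xs)
    return xs[min(int(q * len(xs)), len(xs) - 1)]

def best_quantile_threshold(scores, labels, qs=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Pick the quantile of `scores` whose threshold maximizes F1 on `labels`."""
    best_q = max(qs, key=lambda q: f1(labels, [s >= quantile(scores, q) for s in scores]))
    return quantile(scores, best_q)
```

Searching over quantiles rather than raw values keeps the candidate set small and adapts the threshold to the score distribution of each category.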

24 pages, 5280 KB  
Article
MA-DeepLabV3+: A Lightweight Semantic Segmentation Model for Jixin Fruit Maturity Recognition
by Leilei Deng, Jiyu Xu, Di Fang and Qi Hou
AgriEngineering 2026, 8(2), 40; https://doi.org/10.3390/agriengineering8020040 - 23 Jan 2026
Abstract
Jixin fruit (Malus domestica ‘Jixin’) is a high-value specialty fruit of significant economic importance in northeastern and northwestern China. Automatic recognition of fruit maturity is a critical prerequisite for intelligent harvesting. However, challenges inherent to field environments—including heterogeneous ripeness levels among fruits on the same plant, gradual color transitions during maturation that result in ambiguous boundaries, and occlusion by branches and foliage—render traditional image recognition methods inadequate for simultaneously achieving high recognition accuracy and computational efficiency. Although existing deep learning models can improve recognition accuracy, their substantial computational demands and high hardware requirements preclude deployment on resource-constrained embedded devices such as harvesting robots. To achieve the rapid and accurate identification of Jixin fruit maturity, this study proposes Multi-Attention DeepLabV3+ (MA-DeepLabV3+), a streamlined semantic segmentation framework derived from an enhanced DeepLabV3+ model. First, a lightweight backbone network is adopted to replace the original complex structure, substantially reducing computational burden. Second, a Multi-Scale Self-Attention Module (MSAM) is proposed to replace the traditional Atrous Spatial Pyramid Pooling (ASPP) structure, reducing network computational cost while enhancing the model’s perception capability for fruits of different scales. Finally, an Attention and Convolution Fusion Module (ACFM) is introduced in the decoding stage to significantly improve boundary segmentation accuracy and small target recognition ability. Experimental results on a self-constructed Jixin fruit dataset demonstrated that the proposed MA-DeepLabV3+ model achieves an mIoU of 86.13%, mPA of 91.29%, and F1 score of 90.05%, while reducing the number of parameters by 89.8% and computational cost by 55.3% compared to the original model. The inference speed increased from 41 frames per second (FPS) to 81 FPS, representing an approximately two-fold improvement. The model memory footprint is only 21 MB, demonstrating potential for deployment on embedded devices such as harvesting robots. Experimental results demonstrate that the proposed model achieves significant reductions in computational complexity while maintaining high segmentation accuracy, exhibiting robust performance particularly in complex scenarios involving color gradients, ambiguous boundaries, and occlusion. This study provides technical support for the development of intelligent Jixin fruit harvesting equipment and offers a valuable reference for the application of lightweight deep learning models in smart agriculture.
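The mIoU metric reported above is the mean, over classes, of intersection-over-union between the predicted and ground-truth segmentation masks. A minimal sketch (flat label lists stand in for 2-D maps; classes absent from both prediction and ground truth are skipped, one common convention):

```python
def miou(pred, truth, num_classes):
    """Mean intersection-over-union over classes present in pred or truth."""
    ious = []
    for c in range(num_classes):
        inter = sum(p == c and t == c for p, t in zip(pred, truth))
        union = sum(p == c or t == c for p, t in zip(pred, truth))
        if union:  # skip classes with no pixels in either map
            ious.append(inter / union)
    return sum(ious) / len(ious)
```

Because every class contributes equally regardless of pixel count, mIoU penalizes poor performance on small classes (such as small fruit regions) that plain pixel accuracy would hide.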

26 pages, 55590 KB  
Article
Adaptive Edge-Aware Detection with Lightweight Multi-Scale Fusion
by Xiyu Pan, Kai Xiong and Jianjun Li
Electronics 2026, 15(2), 449; https://doi.org/10.3390/electronics15020449 - 20 Jan 2026
Abstract
In object detection, boundary blurring caused by occlusion and background interference often hinders effective feature extraction. To address this challenge, we propose Edge Aware-YOLO, a novel framework designed to enhance edge awareness and efficient feature fusion. Our method integrates three key contributions. First, the Variable Sobel Compact Inverted Block (VSCIB) employs convolution kernels with adjustable orientation and size, enabling robust multi-scale edge adaptation. Second, the Spatial Pyramid Shared Convolution (SPSC) replaces standard pooling with shared dilated convolutions, minimizing detail loss during feature reconstruction. Finally, the Efficient Downsampling Convolution (EDC) utilizes a dual-branch architecture to balance channel compression with semantic preservation. Extensive evaluations on public datasets demonstrate that Edge Aware-YOLO significantly outperforms state-of-the-art models. On MS COCO, it achieves 56.3% mAP50 and 40.5% mAP50–95 (gains of 1.5% and 1.0%) with only 2.4M parameters and 5.8 GFLOPs, surpassing advanced models like YOLOv11.
(This article belongs to the Topic Intelligent Image Processing Technology)
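The adjustable-orientation edge kernels in VSCIB can be illustrated with the classic steerable-filter identity (an assumption on our part; the authors' kernels may differ): the response of a first-derivative Sobel-like kernel at angle θ is cosθ·Gx + sinθ·Gy, so a single kernel pair yields edge detectors at any orientation.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # horizontal gradient
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # vertical gradient

def oriented_sobel(theta):
    """Steer the Sobel pair to angle theta (radians): cos*Gx + sin*Gy."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * gx + s * gy for gx, gy in zip(rx, ry)]
            for rx, ry in zip(SOBEL_X, SOBEL_Y)]
```

At θ = 0 the steered kernel reduces to Gx, and at θ = π/2 it reduces to Gy, with intermediate angles blending the two.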

27 pages, 32247 KB  
Article
A Dual-Resolution Network Based on Orthogonal Components for Building Extraction from VHR PolSAR Images
by Songhao Ni, Fuhai Zhao, Mingjie Zheng, Zhen Chen and Xiuqing Liu
Remote Sens. 2026, 18(2), 305; https://doi.org/10.3390/rs18020305 - 16 Jan 2026
Abstract
Sub-meter-resolution Polarimetric Synthetic Aperture Radar (PolSAR) imagery enables precise building footprint extraction but introduces complex scattering correlated with fine spatial structures. This change renders both traditional methods, which rely on simplified scattering models, and existing deep learning approaches, which sacrifice spatial detail through multi-looking, inadequate for high-precision extraction tasks. To address this, we propose an Orthogonal Dual-Resolution Network (ODRNet) for end-to-end, precise segmentation directly from single-look complex (SLC) data. Unlike complex-valued neural networks that suffer from high computational cost and optimization difficulties, our approach decomposes complex-valued data into its orthogonal real and imaginary components, which are then concurrently fed into a Dual-Resolution Branch (DRB) with Bilateral Information Fusion (BIF) to effectively balance the trade-off between semantic and spatial details. Crucially, we introduce an auxiliary Polarization Orientation Angle (POA) regression task to enforce physical consistency between the orthogonal branches. To tackle the challenge of diverse building scales, we designed a Multi-scale Aggregation Pyramid Pooling Module (MAPPM) to enhance contextual awareness and a Pixel-attention Fusion (PAF) module to adaptively fuse dual-branch features. Furthermore, we have constructed a VHR PolSAR building footprint segmentation dataset to support related research. Experimental results demonstrate that ODRNet achieves 64.3% IoU and 78.27% F1-score on our dataset, and 73.61% IoU with 84.8% F1-score on a large-scale SLC scene, confirming the method’s significant potential and effectiveness in high-precision building extraction directly from SLC.

26 pages, 3626 KB  
Article
A Lightweight Frozen Multi-Convolution Dual-Branch Network for Efficient sEMG-Based Gesture Recognition
by Shengbiao Wu, Zhezhe Lv, Yuehong Li, Chengmin Fang, Tao You and Jiazheng Gui
Sensors 2026, 26(2), 580; https://doi.org/10.3390/s26020580 - 15 Jan 2026
Viewed by 413
Abstract
Gesture recognition is important for rehabilitation assistance and intelligent prosthetic control. However, surface electromyography (sEMG) signals exhibit strong non-stationarity, and conventional deep-learning models require long training times and high computational cost, limiting their use on resource-constrained devices. This study proposes a Frozen Multi-Convolution Dual-Branch Network (FMC-DBNet) to address these challenges. The model employs randomly initialized and fixed convolutional kernels for training-free multi-scale feature extraction, substantially reducing computational overhead. A dual-branch architecture is adopted to capture complementary temporal and physiological patterns from raw sEMG signals and intrinsic mode functions (IMFs) obtained through variational mode decomposition (VMD). In addition, proportion-of-positive-values (PPV) and global-average-pooling (GAP) statistics provide a lightweight multi-resolution representation. Experiments on the Ninapro DB1 dataset show that FMC-DBNet achieves an average accuracy of 96.4% ± 1.9% across 27 subjects and reduces training time by approximately 90% compared with a conventional trainable CNN baseline. These results demonstrate that frozen random-convolution structures provide an efficient and robust alternative to fully trained deep networks, offering a promising solution for low-power and computationally efficient sEMG gesture recognition. Full article
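The frozen-kernel idea, convolving a signal with random filters that are never trained and summarizing each response with GAP and PPV statistics, can be sketched compactly. The following is a hedged NumPy sketch under assumed kernel sizes and counts (the function name, `kernel_sizes`, and `n_kernels` are illustrative, not the paper's configuration).

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_multiconv_features(signal, kernel_sizes=(3, 5, 9), n_kernels=4):
    """Training-free multi-scale features from a 1-D sEMG window.

    Random kernels are drawn once and kept fixed ("frozen"); each
    filtered response is summarized by global average pooling (GAP)
    and the proportion of positive values (PPV).
    """
    feats = []
    for k in kernel_sizes:
        for _ in range(n_kernels):
            w = rng.standard_normal(k)              # fixed random kernel
            y = np.convolve(signal, w, mode="valid")
            feats.append(y.mean())                  # GAP statistic
            feats.append((y > 0).mean())            # PPV statistic
    return np.array(feats)

# Stand-in for one sEMG window (a real pipeline would use recorded data).
x = np.sin(np.linspace(0, 8 * np.pi, 200))
f = frozen_multiconv_features(x)
print(f.shape)  # (24,): 3 kernel sizes x 4 kernels x 2 statistics
```

Because the kernels are never updated, only the small classifier on top of such features needs training, which is where the reported reduction in training time comes from.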
(This article belongs to the Section Electronic Sensors)
18 pages, 10421 KB  
Article
A Deep Learning Framework with Multi-Scale Texture Enhancement and Heatmap Fusion for Face Super Resolution
by Bing Xu, Lei Wang, Yanxia Wu, Xiaoming Liu and Lu Gan
AI 2026, 7(1), 20; https://doi.org/10.3390/ai7010020 - 9 Jan 2026
Viewed by 854
Abstract
Face super-resolution (FSR) has made great progress thanks to deep learning and facial priors. However, many existing methods do not fully exploit landmark heatmaps and lack effective multi-scale texture modeling, which often leads to texture loss and artifacts under large upscaling factors. To address these problems, we propose a Multi-Scale Residual Stacking Network (MRSNet), which integrates multi-scale texture enhancement with multi-stage heatmap fusion. The MRSNet is built upon Residual Attention-Guided Units (RAGUs) and incorporates a Face Detail Enhancer (FDE), which applies edge, texture, and region branches to achieve differentiated enhancement across facial components. Furthermore, we design a Multi-Scale Texture Enhancement Module (MTEM) that employs progressive average pooling to construct hierarchical receptive fields and applies heatmap-guided attention for adaptive texture refinement. In addition, we introduce a multi-stage heatmap fusion strategy that injects landmark priors into multiple phases of the network, including feature extraction, texture enhancement, and detail reconstruction, enabling deep sharing and progressive integration of prior knowledge. Extensive experiments on CelebA and Helen demonstrate that the proposed method achieves superior detail recovery and generates perceptually realistic high-resolution face images. Both quantitative and qualitative evaluations confirm that our approach outperforms state-of-the-art methods. Full article
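The "progressive average pooling to construct hierarchical receptive fields" step has a simple core: pooling the same feature map at increasingly coarse scales. Below is a minimal NumPy sketch of that idea, assuming non-overlapping pooling windows at powers-of-two scales; the function names and scale choices are illustrative, not the MTEM's actual design.

```python
import numpy as np

def avg_pool2d(x, k):
    """Non-overlapping k x k average pooling on a 2-D feature map."""
    h, w = x.shape
    x = x[:h - h % k, :w - w % k]          # crop to a multiple of k
    return x.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def progressive_pyramid(feat, scales=(2, 4, 8)):
    """Hierarchical receptive fields: progressively coarser pooled maps."""
    return [avg_pool2d(feat, s) for s in scales]

feat = np.arange(64, dtype=float).reshape(8, 8)  # toy 8x8 feature map
pyr = progressive_pyramid(feat)
print([p.shape for p in pyr])  # [(4, 4), (2, 2), (1, 1)]
```

Each level of such a pyramid summarizes a larger spatial neighborhood; the network can then upsample and fuse the levels (here, presumably under heatmap-guided attention) to refine textures at multiple scales.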
38 pages, 9342 KB  
Review
Monitoring and Control of the Direct Energy Deposition (DED) Additive Manufacturing Process Using Deep Learning Techniques: A Review
by Yonghui Liu, Haonan Ren, Qi Zhang, Peng Yuan, Hui Ma, Yanfeng Li, Yin Zhang and Jiawei Ning
Materials 2026, 19(1), 89; https://doi.org/10.3390/ma19010089 - 25 Dec 2025
Cited by 2 | Viewed by 1396
Abstract
Directed Energy Deposition (DED), as a core branch of additive manufacturing, encompasses two typical processes: laser directed energy deposition (LDED) and wire and arc additive manufacturing (WAAM), which are widely used in manufacturing aerospace engine blades and core components of high-end equipment. In recent years, with the increasing adoption of deep learning (DL) technologies, the research focus in DED has gradually shifted from traditional "process parameter optimization" to "AI-driven process optimization" and "online real-time monitoring". Given the complex and distinct influence mechanisms of key parameters (such as laser power/arc current and scanning/travel speed) on melt pool behavior and forming quality in the two processes, introducing artificial intelligence to address both their common and process-specific issues has become particularly necessary. This review systematically summarizes the application of DL techniques in both types of DED processes. It begins by outlining DL frameworks, such as artificial neural networks (ANNs), recurrent neural networks (RNNs), convolutional neural networks (CNNs), and reinforcement learning (RL), and their compatibility with DED data. Subsequently, it compares the application scenarios, monitoring accuracy, and applicability of AI in DED process monitoring across multiple sensing dimensions, including process parameters, optical signals, thermal fields, acoustic signals, and multi-sensor fusion. The review further explores the potential and value of DL in closed-loop parameter adjustment and reinforcement learning control. Finally, it addresses current bottlenecks such as data quality and model interpretability, and outlines future research directions, aiming to provide theoretical and engineering references for the intelligent upgrade and quality improvement of both DED processes. Full article