Search Results (552)

Search Parameters:
Keywords = multi-scale deep feature representation

23 pages, 12740 KB  
Article
SAM2-RoadNet: Topology-Aware Multi-Scale Road Extraction from High-Resolution Remote Sensing Images
by Ruyue Feng, Ziyou Guo, Xiao Du and Tieru Wu
Remote Sens. 2026, 18(6), 913; https://doi.org/10.3390/rs18060913 - 17 Mar 2026
Abstract
Road extraction from high-resolution remote sensing images (HRSIs) is a fundamental task for many geospatial applications, yet it remains challenging due to complex backgrounds, frequent occlusions, and the requirement to preserve the topological connectivity of elongated road networks. To address these issues, this paper proposes SAM2-RoadNet, a topology-aware multi-scale road extraction framework that adapts the powerful representation capability of the Segment Anything Model 2 (SAM2) to HRSI road segmentation. Unlike prompt-driven segmentation paradigms, SAM2-RoadNet employs the SAM2 image encoder solely as a feature extractor and introduces an adapter-based domain adaptation strategy to efficiently transfer pretrained knowledge to the remote sensing domain. Receptive field blocks are further integrated to enhance contextual perception and align channel dimensions, followed by a weighted bidirectional feature pyramid network (W-BiFPN) to fuse hierarchical features across multiple scales. Moreover, a topology-aware training strategy based on the soft-clDice loss is incorporated to explicitly enforce structural continuity and reduce road fragmentation. Extensive experiments on two challenging benchmarks, DeepGlobe and Massachusetts, demonstrate that SAM2-RoadNet outperforms state-of-the-art methods in both quantitative accuracy and qualitative visual quality, while showing promising cross-dataset transferability without additional fine-tuning. Full article
(This article belongs to the Section Remote Sensing Image Processing)
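The soft-clDice loss cited in the abstract originates in Shit et al.'s clDice work; the abstract does not give implementation details, so the following is a minimal NumPy sketch of the standard formulation, using a 3x3 neighborhood for soft min/max pooling and a placeholder iteration count:

```python
import numpy as np

def _shift_stack(x):
    # Stack the 3x3 neighborhood of every pixel (edge-padded).
    p = np.pad(x, 1, mode="edge")
    return np.stack([p[i:i + x.shape[0], j:j + x.shape[1]]
                     for i in range(3) for j in range(3)])

def soft_erode(x):   # soft (grayscale) erosion = neighborhood minimum
    return _shift_stack(x).min(axis=0)

def soft_dilate(x):  # soft (grayscale) dilation = neighborhood maximum
    return _shift_stack(x).max(axis=0)

def soft_skeleton(x, iters=5):
    # Morphological soft skeleton via iterated erosion/opening.
    skel = np.maximum(x - soft_dilate(soft_erode(x)), 0)
    for _ in range(iters):
        x = soft_erode(x)
        delta = np.maximum(x - soft_dilate(soft_erode(x)), 0)
        skel = skel + delta * (1 - skel)
    return skel

def soft_cldice_loss(pred, target, iters=5, eps=1e-6):
    sp, st = soft_skeleton(pred, iters), soft_skeleton(target, iters)
    tprec = ((sp * target).sum() + eps) / (sp.sum() + eps)  # topology precision
    tsens = ((st * pred).sum() + eps) / (st.sum() + eps)    # topology sensitivity
    return 1.0 - 2.0 * tprec * tsens / (tprec + tsens)
```

A perfect prediction of a thin road-like mask yields a loss near 0, while an empty prediction yields a loss near 1, which is why the loss penalizes broken or missing road centerlines rather than only per-pixel overlap.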

31 pages, 23615 KB  
Article
A Memory-Efficient Class-Incremental Learning Framework for Remote Sensing Scene Classification via Feature Replay
by Yunze Wei, Yuhan Liu, Ben Niu, Xiantai Xiang, Jingdun Lin, Yuxin Hu and Yirong Wu
Remote Sens. 2026, 18(6), 896; https://doi.org/10.3390/rs18060896 - 15 Mar 2026
Abstract
Most existing deep learning models for remote sensing scene classification (RSSC) adopt an offline learning paradigm, where all classes are jointly optimized on fixed-class datasets. In dynamic real-world scenarios with streaming data and emerging classes, such paradigms are inherently prone to catastrophic forgetting when models are incrementally trained on new data. Recently, a growing number of class-incremental learning (CIL) methods have been proposed to tackle these issues, some of which achieve promising performance by rehearsing training data from previous tasks. However, implementing such a strategy in real-world scenarios is often challenging, as the requirement to store historical data frequently conflicts with strict memory constraints and data privacy protocols. To address these challenges, we propose a novel memory-efficient feature-replay CIL framework (FR-CIL) for RSSC that retains compact feature embeddings, rather than raw images, as exemplars for previously learned classes. Specifically, a progressive multi-scale feature enhancement (PMFE) module is proposed to alleviate representation ambiguity. It adopts a progressive construction scheme to enable fine-grained and interactive feature enhancement, thereby improving the model’s representation capability for remote sensing scenes. Then, a specialized feature calibration network (FCN) is trained in a transductive learning paradigm with manifold consistency regularization to adapt stored feature descriptors to the updated feature space, thereby effectively compensating for feature space drift and enabling a unified classifier. Following feature calibration, a bias rectification (BR) strategy is employed to mitigate prediction bias by exclusively optimizing the classifier on a balanced exemplar set. As a result, this memory-efficient CIL framework not only addresses data privacy concerns but also mitigates representation drift and classifier bias. 
Extensive experiments on public datasets demonstrate the effectiveness and robustness of the proposed method. Notably, FR-CIL outperforms the leading state-of-the-art CIL methods in mean accuracy by margins of 3.75%, 3.09%, and 2.82% on the six-task AID, seven-task RSI-CB256, and nine-task NWPU-45 datasets, respectively. At the same time, it reduces memory storage requirements by over 94.7%, highlighting its strong potential for real-world RSSC applications under strict memory constraints. Full article
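The memory argument behind feature replay can be illustrated with back-of-the-envelope numbers; the 256x256 RGB image size and 512-d float32 embedding below are illustrative assumptions, not FR-CIL's actual configuration, and the buffer class is a generic sketch of the feature-exemplar idea rather than the paper's method:

```python
import numpy as np

# Hypothetical exemplar sizes: raw uint8 image vs. float32 embedding.
IMAGE_BYTES = 256 * 256 * 3   # 196,608 bytes per RGB exemplar image
FEATURE_BYTES = 512 * 4       # 2,048 bytes per stored feature vector

saving = 1.0 - FEATURE_BYTES / IMAGE_BYTES  # fraction of memory avoided

class FeatureReplayBuffer:
    """Stores per-class feature exemplars instead of raw images."""
    def __init__(self):
        self.exemplars = {}  # class_id -> (n_i, d) array of embeddings

    def add(self, class_id, feats):
        self.exemplars[class_id] = np.asarray(feats, dtype=np.float32)

    def nearest_mean_predict(self, feat):
        # Classify by distance to each stored class mean (one simple
        # way rehearsal features can be reused at inference time).
        means = {c: f.mean(axis=0) for c, f in self.exemplars.items()}
        return min(means, key=lambda c: np.linalg.norm(feat - means[c]))
```

Even under these rough assumptions, storing embeddings instead of images removes roughly 99% of the per-exemplar memory, which is consistent in spirit with the >94.7% reduction reported above.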

18 pages, 5377 KB  
Article
Prediction of Prestress Changes in Concrete Under Freeze–Thaw Cycles Based on Transformer Model
by Jiancheng Zhang, Xiaolin Yang and Wen Zhang
Eng 2026, 7(3), 133; https://doi.org/10.3390/eng7030133 - 14 Mar 2026
Abstract
Given that freeze–thaw damage of prestressed concrete significantly threatens structural service life and that existing conventional simulation techniques fail to capture prestress time series, this paper proposes a deep learning prediction model based on the Transformer model. The model integrates a multi-head self-attention mechanism and positional encoding to effectively capture long-range dependencies in prestress time series. It enhances temporal modeling capability through a 128-dimensional feature space (chosen to balance representation capacity and computational efficiency for the dataset scale) and a 4-layer encoder stacking structure. A dataset was constructed using time-series data from three prestressed concrete components subjected to 50 freeze–thaw cycles. The F-a component was used as the training set, while F-b and F-c served as the testing sets. During the training phase, a Noam learning rate scheduler, gradient clipping, and an early stopping strategy were employed. The results indicate that the training strategy enables the loss function to converge quickly without overfitting, demonstrating good generalization performance. The prediction model performs well on the F-b and F-c datasets, with determination coefficients (R2) of 0.8404 and 0.8425, and corresponding Mean Absolute Error (MAE) values of 61.71 MPa and 57.41 MPa, respectively. It can accurately track the periodic variation trend of prestress, demonstrating the model's effectiveness in prestress prediction. This model provides a new technical tool for the health monitoring and performance prediction of prestressed concrete structures in freeze–thaw environments. Full article
(This article belongs to the Section Chemical, Civil and Environmental Engineering)
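The abstract's 128-dimensional feature space and positional encoding suggest a standard Transformer front end. Assuming the classic sinusoidal form (the paper may instead use a learned variant), a minimal sketch is:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model=128):
    # pe[pos, 2i]   = sin(pos / 10000^(2i/d_model))
    # pe[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
    pos = np.arange(seq_len)[:, None].astype(float)
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

# One row per freeze-thaw time step; 50 is a placeholder length.
pe = sinusoidal_positional_encoding(50)
```

Adding `pe` to the input embeddings gives the self-attention layers access to the ordering of the prestress measurements, since attention alone is permutation-invariant.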

21 pages, 11196 KB  
Article
CR-MAT: Causal Representation Learning for Few-Shot Non-Intrusive Load Monitoring
by Xianglong Li, Shengxin Kong, Jiani Zeng, Hanqi Dai, Lu Zhang, Weixian Wang, Zihan Zhang and Liwen Xu
Electronics 2026, 15(6), 1195; https://doi.org/10.3390/electronics15061195 - 13 Mar 2026
Abstract
Non-intrusive load monitoring (NILM) is a key enabler for smart-grid applications, yet practical deployment is often hindered by limited appliance-level labels and severe distribution shifts across households and operating conditions. As a result, many deep learning approaches become unreliable in small-sample and out-of-distribution (OOD) settings. In this paper, we propose CR-MAT, a causality-driven representation learning framework for few-shot NILM classification. Instead of relying on large-scale training or heavy data augmentation, CR-MAT injects causal representation learning into multi-appliance task modeling, encouraging the network to learn appliance-discriminative features that are stable across environments while suppressing spurious, domain-specific correlations. We conduct extensive experiments under multiple OOD scenarios and consistently observe improved classification robustness compared with deep NILM baselines. Further analysis indicates that causal representation learning enhances resilience to non-stationary consumption patterns and improves generalization under OOD scenarios. The proposed framework provides a practical route toward reliable NILM classification and supports downstream smart-grid applications such as flexible load control and demand response. Full article

31 pages, 6867 KB  
Article
Field-Scale Detection of Rice Bacterial Leaf Blight Using UAV-Based Multispectral Imagery via Cross-Scale Sample-Label Transfer and Spatial–Spectral Feature Fusion
by Huiqin Ma, Zhiqin Gui, Yujin Jing, Dongmei Chen, Dayang Li, Dong Shen and Jingcheng Zhang
Remote Sens. 2026, 18(6), 880; https://doi.org/10.3390/rs18060880 - 13 Mar 2026
Abstract
Accurate field-scale crop disease detection is crucial for precise decisions and for highly efficient multi-scale collaboration. UAV-based multispectral imaging technology offers advantages in terms of high efficiency and low cost. Deep learning shows potential for deep representation and fusion of spectral and spatial features. However, traditional manual disease surveys are limited by efficiency and cost, making it difficult to meet the large sample sizes required by deep learning. Therefore, we proposed a method for rice bacterial leaf blight detection using UAV-based multispectral imagery. This method integrates cross-scale sample-label transfer with a spectral–spatial dual-branch feature fusion architecture (DualRiceNet). We first used RTK positioning to transfer disease labels from near-ground RGB images to high-altitude multispectral images, effectively expanding the dataset and alleviating the scarcity of labeled samples. DualRiceNet employs a cross-attention mechanism to couple its spectral and spatial branches, thereby isolating disease-specific spatial–spectral patterns from the complex interference of the farmland background. DualRiceNet achieved an overall accuracy (OA) of 92.3% on the same-distribution test set. On an independent test set spanning differences in geography, time, phenology, and variety, the model maintained the highest OA of 80.0%. Our method demonstrated excellent generalization to real-world environmental variations in rice fields. Full article
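Cross-attention coupling between two branches, as described above, can be sketched generically; the token counts, 16-d channel size, and single-head form below are assumptions for illustration, and DualRiceNet's actual design may differ:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, Wq, Wk, Wv):
    """One branch (queries) attends to the other branch (context)."""
    Q, K, V = queries @ Wq, context @ Wk, context @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # scaled dot-product
    attn = softmax(scores, axis=-1)          # each query row sums to 1
    return attn @ V, attn

rng = np.random.default_rng(0)
spectral = rng.normal(size=(5, 16))  # 5 spectral tokens (hypothetical)
spatial = rng.normal(size=(8, 16))   # 8 spatial tokens (hypothetical)
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
fused, attn = cross_attention(spectral, spatial, Wq, Wk, Wv)
```

Each spectral token ends up as a weighted mixture of spatial tokens, which is how cross-attention lets one branch selectively pull in evidence from the other.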

21 pages, 4501 KB  
Article
YOLOv8n-ALC: An Efficient Network for Bolt-Nut Fastener Detection in Complex Substation Environments
by Dazhang You, Fangke Li, Sicheng Wang and Yepeng Zhang
Appl. Sci. 2026, 16(6), 2716; https://doi.org/10.3390/app16062716 - 12 Mar 2026
Abstract
Bolt-nut fasteners are critical components of substation equipment, and their integrity directly affects the operational reliability of power systems. In practical inspection scenarios, however, the small physical scale of bolt-nut fasteners, together with complex background structures, often obscures their discriminative visual features, making accurate automated detection particularly challenging. Reliable detection is a prerequisite for downstream tasks such as loosening identification and defect diagnosis. To address these challenges, this paper proposes YOLOv8n-ALC, an enhanced detection network built upon the lightweight YOLOv8n framework. The backbone is redesigned by integrating the AdditiveBlock from CAS-ViT and a Convolutional Gated Linear Unit (CGLU) to strengthen fine-grained feature extraction and suppress background interference without increasing computational burden. In addition, an improved Large Separable Kernel Attention (LSKA) module is introduced to expand the effective receptive field while maintaining efficiency, enabling more robust multi-scale feature representation. To further alleviate feature degradation of small bolt-nut fasteners in deep layers, a Context-Guided Reconstruction Feature Pyramid Network (CGRFPN) is employed in the neck to optimize cross-layer feature fusion and enhance localization accuracy. Experimental results demonstrate that YOLOv8n-ALC achieves an mAP@0.5 of 92.1%, with precision and recall of 93.5% and 87.1%, respectively, outperforming the baseline by clear margins. These results confirm the effectiveness and robustness of the proposed method for intelligent substation inspection and bolt-nut fastener condition monitoring. Full article
(This article belongs to the Section Computing and Artificial Intelligence)

16 pages, 1192 KB  
Article
Multi-Scale Feature Mixing of Language Model Embeddings for Enhanced Prediction of Submitochondrial Protein Localization
by Rong Wang, Menghua Wang, Yibo Wu, Lixiang Yang and Xiao Wang
Algorithms 2026, 19(3), 212; https://doi.org/10.3390/a19030212 - 11 Mar 2026
Abstract
Accurate prediction of submitochondrial localization is fundamental to understanding mitochondrial biogenesis and cellular metabolic pathways. While deep representations from pre-trained protein language models (pLMs) have significantly advanced the field, traditional global average pooling methods often fail to capture critical, localized N-terminal targeting signals, particularly in long sequences where these motifs are mathematically diluted. To resolve this “signal dilution” bottleneck, we developed a multi-scale architecture that explicitly integrates high-resolution N-terminal features with global evolutionary context derived from ESM-2 embeddings. The proposed framework utilizes an orthogonal mixing strategy consisting of Token-mixing and Channel-mixing. Token-mixing is specifically designed to detect spatial rhythmic patterns across residue positions, while Channel-mixing refines the biochemical signatures within the latent feature space. Extensive benchmarking across diverse datasets demonstrates that our approach effectively maintains signal integrity. Compared to existing state-of-the-art methods, the model achieves a superior overall Generalized Correlation Coefficient (GCC) of 0.7443 on the SM424-18 dataset and 0.7878 on the SubMitoPred dataset, outperforming the latest benchmarks by 9.4% and 16.1%, respectively. Furthermore, on the independent M983 test set, our method maintained a high GCC of 0.6945, demonstrating a 9.9% improvement relative to the state-of-the-art methods. This robust and efficient framework provides a high-precision tool for large-scale mitochondrial proteomics. Full article
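The Token-mixing and Channel-mixing operations described above resemble MLP-Mixer-style blocks; the following is a generic sketch under that assumption (all sizes hypothetical, residual connections and GELU chosen as common defaults rather than taken from the paper):

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def token_mix(X, W1, W2):
    # Mix information ACROSS residue positions, one channel at a time;
    # this is where spatial rhythmic patterns can be detected.
    return X + W2 @ gelu(W1 @ X)       # W1: (h, L), W2: (L, h)

def channel_mix(X, W1, W2):
    # Mix information ACROSS feature channels, one position at a time;
    # this refines the biochemical signature in the latent space.
    return X + gelu(X @ W1) @ W2       # W1: (d, h), W2: (h, d)

rng = np.random.default_rng(0)
L, d, h = 10, 8, 16                    # positions, channels, hidden width
X = rng.normal(size=(L, d))            # per-residue pLM embeddings (toy)
Y = channel_mix(token_mix(X, rng.normal(size=(h, L)), rng.normal(size=(L, h))),
                rng.normal(size=(d, h)), rng.normal(size=(h, d)))
```

The two mixings are orthogonal in the sense that one only moves information along the sequence axis and the other only along the channel axis, so neither dilutes a localized N-terminal signal the way global average pooling does.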

19 pages, 7917 KB  
Article
A Line Selection Method for Small-Current Grounding Faults Based on Time–Frequency Graphs and Image Detection
by Lei Li, Shuai Hao and Weili Wu
Electronics 2026, 15(6), 1165; https://doi.org/10.3390/electronics15061165 - 11 Mar 2026
Abstract
Aiming at the problem that the multi-scale feature interaction ability of the traditional deep learning-based line selection algorithm is insufficient, resulting in the decline of line selection accuracy, a multi-scale feature fusion line selection method based on transfer learning is proposed, abbreviated as TLM-Net. Firstly, to address the issue of the insufficient generalization ability of the line selection network in small-sample scenarios, a simulation data pre-training framework is constructed, and a robust feature representation basis is established through a cross-domain knowledge transfer mechanism. Secondly, aiming at the problem of insufficient extraction of feature information by traditional algorithms, a multi-scale feature fusion network (MFFN) is designed to integrate global context information and local detail features, achieving cross-level semantic complementarity and spatial alignment optimization. Then, to enhance the representation ability of weak fault feature information, an EKA mechanism integrating variable kernel convolution is designed. The background interference is reduced through adaptive multi-region feature focusing, and the edge recognition accuracy of the model for irregular targets is improved. Finally, the pre-trained model is transferred to the target domain by adopting the transfer learning strategy, and the network parameters are fine-tuned in combination with the on-site data to achieve cross-domain adaptation of the feature space. The experimental results show that the TLM-Net algorithm’s mAP@0.5 reaches 98.5%, the accuracy rate and recall rate reach 98.3% and 96.5%, respectively, and the accuracy is improved by 37.5% compared with the original model. Full article
(This article belongs to the Special Issue Security Defense Technologies for the New-Type Power System)

19 pages, 2380 KB  
Article
DTBAffinity: A Multi-Modal Feature Engineering and Gradient-Boosting Framework for Drug–Target Binding Affinity on Davis and KIBA Benchmarks
by Meshari Alazmi
Computers 2026, 15(3), 182; https://doi.org/10.3390/computers15030182 - 10 Mar 2026
Abstract
An accurate prediction of how strongly a drug binds to its target (the site where it exerts its desired effect) is very important for drug discovery: it helps select the most promising compounds and saves money by requiring fewer experiments. We present DTBAffinity, a multi-modal regression framework that integrates chemically meaningful ligand descriptors with diverse protein sequence features in a unified gradient-boosting model. The representation of ligands includes physicochemical and topological descriptors (RDKit and Mordred), structural keys (MACCS and FP4), circular fingerprints (ECFP/Morgan), and SMILES-derived features from iFeatureOmega. For proteins, thousands of sequence-derived descriptors (composition, autocorrelations, physicochemical profiles, and evolutionary indices) from iFeatureOmega are used, together with contextual embeddings from large protein language models (ESM-1b, ESM-2). The feature matrices are cleaned, variance-filtered, z-score scaled, and univariately selected before being concatenated and modeled with regularized XGBoost ensembles. We evaluate DTBAffinity on two commonly used kinase-centric datasets: Davis (30,056 interactions; pKd values) and KIBA (118,254 interactions; integrated affinity scores). Performance is measured with various metrics, including MSE, R2, Pearson/Spearman correlations, Concordance Index (CI), rm2, and AUPR. On Davis, DTBAffinity yields MSE = 0.1885, CI = 0.9102, and AUPR = 0.8112, and on KIBA, it gives MSE = 0.1540, CI = 0.8686, and AUPR = 0.8361, outperforming state-of-the-art baselines such as KronRLS, SimBoost, DeepDTA, and GraphDTA. These findings suggest that combining interpretable descriptors with contextual embeddings in a robust boosting framework is an effective way to achieve accurate, interpretable, and generalizable DTBA prediction. Full article
(This article belongs to the Special Issue AI in Bioinformatics)
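The preprocessing chain named in the abstract (variance filtering, z-score scaling, univariate selection) can be sketched in plain NumPy; the variance threshold and `k` below are placeholders, and absolute Pearson correlation is one common univariate criterion, not necessarily the paper's:

```python
import numpy as np

def preprocess(X, y, var_thresh=1e-8, k=10):
    """Variance filter -> z-score scaling -> top-k univariate selection."""
    X = X[:, X.var(axis=0) > var_thresh]       # drop near-constant descriptors
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # z-score each feature
    yc = y - y.mean()
    # |Pearson r| between each scaled feature and the affinity target
    r = np.abs(X.T @ yc) / (np.linalg.norm(X, axis=0) * np.linalg.norm(yc))
    return X[:, np.argsort(-r)[:min(k, X.shape[1])]]
```

The reduced matrix would then be concatenated with the other feature blocks and passed to the boosted-tree regressor.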

22 pages, 11365 KB  
Article
Addressing Dense Small-Object Detection in Remote Sensing: An Open-Vocabulary Object Detection Framework
by Menghan Ju, Yingchao Feng, Wenhui Diao and Chunbo Liu
Remote Sens. 2026, 18(6), 851; https://doi.org/10.3390/rs18060851 - 10 Mar 2026
Abstract
Remote sensing open-vocabulary object detection focuses on identifying and localizing unseen categories within remote sensing imagery. However, constrained by characteristics such as dense target distribution, complex background interference, and drastic scale variations inherent to remote sensing scenarios, existing methods are prone to background noise interference when extracting features from dense, small target regions. This leads to weakened semantic representation and reduced localization accuracy. Therefore, we propose RS-DINO to address these challenges. Specifically: Firstly, to address the issue of small features being obscured by the background, the feature extraction module incorporates a multi-scale large-kernel attention mechanism. This expands the receptive field while enhancing local detail modelling, significantly improving the feature representation of minute targets. Secondly, a cross-modal feature fusion module employing bidirectional cross-attention achieves deep alignment between image and textual features. Subsequently, a language-guided query selection mechanism enhances detection accuracy through hybrid query strategies. Finally, to enhance the spatial sensitivity and channel adaptability of fusion features, the multimodal decoder integrates a convolutional gated feedforward network, significantly boosting the model’s robustness in dense, multi-scale scenes. Experiments on DIOR, DOTA v2.0, and NWPU-VHR10 demonstrate substantial gains, with fine-tuned RS-DINO surpassing existing methods by 3.5%, 3.7%, and 4.0% in accuracy, respectively. Full article

21 pages, 6660 KB  
Article
Infrared and Visible Multi-Scale Pyramid Cross-Layer Fusion Algorithm Based on Thermal Extended Target Separation
by An Liang, Laixian Zhang, Yingchun Li, Hao Ding, Haijing Zheng, Rong Li and Rui Zhu
Photonics 2026, 13(3), 263; https://doi.org/10.3390/photonics13030263 - 10 Mar 2026
Abstract
Infrared and visible image fusion aims to synergistically combine the thermal target saliency of infrared images with the rich textural details of visible images. To address the limitations of traditional multi-scale methods in terms of target-background contrast and detail preservation, this paper introduces a novel multi-scale pyramid cross-layer fusion framework. The core of this framework lies in a thermal expansion-based target separation mechanism for superior hierarchical decomposition. Source images are first decomposed via a Gaussian–Laplacian pyramid for multi-resolution representation. By exploiting infrared thermal saliency and visible geometric priors, the scene is explicitly segregated into a target layer and a background layer. The target layer employs deep feature extraction based on Iteratively Reweighted Nuclear Norm (IRNN) minimization to sharpen thermal prominences and enhance contrast; concurrently, the background layer undergoes a cross-modal, cross-layer consistency fusion strategy, integrating spatial textures across frequency bands to maintain structural fidelity and detail richness. This dual-layer paradigm, augmented by multi-scale aggregation, ensures seamless, artifact-free fusion. To comprehensively evaluate the proposed method, systematic experiments are conducted on two benchmark datasets, TNO and RoadScene; evaluations on both demonstrate that our method outperforms state-of-the-art baselines. Extended experiments on the MSRS dataset further confirm the strong generalization capability and robustness of our method. Furthermore, systematic hyperparameter experiments determine the optimal model configuration, and ablation studies substantiate the effective contribution of both the pyramid segregation module and the IRNN optimization module to the final fusion performance. Overall, the proposed fusion algorithm delivers consistently strong performance across all three benchmarks. Full article
(This article belongs to the Special Issue Computational Optical Imaging: Theories, Algorithms, and Applications)
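The Gaussian–Laplacian pyramid decomposition at the core of the framework can be sketched as follows; a 3x3 box blur stands in for the Gaussian kernel, and reconstruction is exact by construction because each level stores the residual of its own upsampled coarse version:

```python
import numpy as np

def _blur(x):
    # Simple 3x3 box blur as a stand-in for the Gaussian kernel.
    p = np.pad(x, 1, mode="edge")
    return sum(p[i:i + x.shape[0], j:j + x.shape[1]]
               for i in range(3) for j in range(3)) / 9.0

def _upsample(x, shape):
    up = np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)
    return up[:shape[0], :shape[1]]

def build_laplacian_pyramid(img, levels=3):
    gauss = [img]
    for _ in range(levels):
        gauss.append(_blur(gauss[-1])[::2, ::2])    # blur, then downsample
    laps = [g - _upsample(gauss[i + 1], g.shape)    # per-level detail residual
            for i, g in enumerate(gauss[:-1])]
    return laps, gauss[-1]

def reconstruct(laps, base):
    img = base
    for lap in reversed(laps):                      # coarse-to-fine rebuild
        img = _upsample(img, lap.shape) + lap
    return img
```

Fusion methods of this family operate on the per-level residuals (and the coarse base) separately, which is what makes cross-layer strategies like the one described above possible.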

22 pages, 3583 KB  
Article
SSFF-DETR: A Surface Contaminant Detection Transformer for Microsystem Devices with Scale Sequence Feature Fusion
by Mengxiao Cui, Liping Lu and Hanshan Li
Appl. Sci. 2026, 16(5), 2610; https://doi.org/10.3390/app16052610 - 9 Mar 2026
Abstract
Microsystem devices are widely used in key fields such as aerospace. The various contaminants generated during their manufacturing process take diverse forms and are easily confounded with background interference, making them difficult to detect. To solve this problem, this paper proposes a surface contaminant detection transformer for microsystem devices with scale sequence feature fusion (SSFF-DETR). This model is based on the real-time detection transformer (RT-DETR) framework. The faster efficient channel attention (Faster-ECA) module was constructed as the backbone network, enhancing the extraction ability and computational efficiency of key contaminant features. By introducing dynamic feature region collaborative attention (DFRCA) at the end of the backbone network, the contrast between contaminant features and the background was effectively enhanced, thereby improving the model's ability to identify contaminants. An encoder based on scale sequence features (SSF) and triple-branch feature fusion (TFF) is designed. By enhancing multi-scale representation, it effectively retains the detailed features of contaminants in complex backgrounds and alleviates the problem of feature loss during transmission in deep networks. The experimental results show that, compared with the RT-DETR model, the SSFF-DETR model achieves an increase of 2.6% in mean average precision (mAP). At the same time, the giga floating-point operations (GFLOPs) have decreased by 2 G, and the parameter count has been reduced by 0.8 M. This provides a feasible solution for the high-precision and high-efficiency automated detection of surface contaminants in microsystem devices. Full article

22 pages, 6170 KB  
Article
A Lightweight Net with Dual-Path Feature Enhancer and Bidirectional Gated Fusion for Cloud Detection
by Yan Mo, Puhui Chen, Shaowei Bai and Erbao Xiao
Sensors 2026, 26(5), 1727; https://doi.org/10.3390/s26051727 - 9 Mar 2026
Abstract
Cloud detection serves as a critical preprocessing step in remote sensing image processing and quantitative applications. However, prevailing deep learning-based models often depend on computationally intensive backbone networks to achieve high accuracy, which hinders their deployment in resource-constrained scenarios such as on-board processing or edge computing. To bridge the trade-off between accuracy and efficiency, this paper introduces a lightweight network for cloud detection. The core innovations of our network are twofold: (1) a dual-path feature enhancer that operates at the front end to extract and fuse multi-scale features through a parallel architecture, significantly enriching feature diversity and representational capacity, thereby alleviating the need for a complex backbone, and (2) a bidirectional gated fusion module, which adaptively integrates multi-scale features from the dual-path feature enhancer with deep semantic features from the backbone decoder through a gated attention mechanism and dynamic convolution, thereby enhancing feature discriminability. Comprehensive experiments on the public HRC_WHU dataset demonstrate that the proposed model achieves a high overall accuracy of 96.31% and a mean intersection-over-union of 92.82%, with only 12.04 GFLOPs of computational cost, outperforming several state-of-the-art methods. These results validate that our approach effectively balances high detection performance with computational efficiency, offering a practical solution for real-time, lightweight cloud detection in high-resolution remote sensing imagery. Full article
(This article belongs to the Special Issue AI-Based Computer Vision Sensors & Systems—2nd Edition)

15 pages, 2052 KB  
Article
A Dual-Branch Multi-Scale Network for Skin Lesion Classification
by Ying Liu, Xinyu Feng, Yuchai Wan, Huifu Li, Xun Zhang and Abdureyim Raxidin
Electronics 2026, 15(5), 1118; https://doi.org/10.3390/electronics15051118 - 8 Mar 2026
Abstract
Dermoscopic images are widely used for diagnosing skin diseases, and automatic classification of lesion types using deep learning can significantly enhance diagnostic efficiency. However, challenges such as variations in imaging conditions, subtle differences between classes, high variability within classes, and severe class imbalance complicate skin lesion analysis. This paper introduces a dual-branch deep learning model where two branches independently process high-frequency and low-frequency image features to generate multi-scale fused representations. To address class imbalance, the model employs cosine similarity to strengthen inter-class discrimination and incorporates a bias term to improve recognition of minority lesion classes. Experiments conducted on the ISIC 2017 and ISIC 2018 datasets demonstrate that the proposed method surpasses state-of-the-art approaches, achieving accuracies of 97.0% and 91.9%, respectively, with sensitivity and specificity both exceeding 90% on the two datasets. Full article
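The cosine-similarity head with a per-class bias term that the abstract credits for handling class imbalance can be sketched as follows. The function name, the scale factor, and the toy numbers are assumptions for illustration, not the authors' settings:

```python
import numpy as np

def cosine_logits(features, prototypes, bias, scale=10.0):
    # Hypothetical sketch: logits are scaled cosine similarities between each
    # feature vector and each class prototype, plus a per-class bias term
    # that can lift the logits of minority lesion classes.
    f = features / np.linalg.norm(features, axis=-1, keepdims=True)
    w = prototypes / np.linalg.norm(prototypes, axis=-1, keepdims=True)
    cos = f @ w.T                  # (batch, classes), each entry in [-1, 1]
    return scale * cos + bias      # bias shifts minority-class logits upward

# Toy usage: 2 samples, 3 classes, 4-dimensional features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(2, 4))
protos = rng.normal(size=(3, 4))
bias = np.array([0.0, 0.5, 0.0])   # e.g. boost a minority class
logits = cosine_logits(feats, protos, bias)
```

Normalising both features and prototypes bounds each similarity in [-1, 1], so the bias term has a predictable, class-wise effect regardless of feature magnitude.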
(This article belongs to the Special Issue Deep Learning for Computer Vision Application: Second Edition)

27 pages, 5957 KB  
Article
A Study of the Three-Dimensional Localization of an Underwater Glider Hull Using a Hierarchical Convolutional Neural Network Vision Encoder and a Variable Mixture-of-Experts Transformer
by Jungwoo Lee, Ji-Hyun Park, Jeong-Hwan Hwang, Kyoungseok Noh and Jinho Suh
Remote Sens. 2026, 18(5), 793; https://doi.org/10.3390/rs18050793 - 5 Mar 2026
Abstract
Although underwater gliders are highly energy-efficient platforms capable of long-duration and large-scale ocean observation, their lack of self-propulsion requires external assistance for recovery upon mission completion. In harsh and dynamic marine environments, reliably detecting the glider and accurately estimating its three-dimensional position are critical to safe and efficient recovery operations. This paper proposes a perception framework based on deep learning to detect underwater glider hulls and estimate their three-dimensional relative positions using camera–sonar multi-sensor fusion. This approach integrates a hierarchical convolutional neural network (CNN) vision encoder and a transformer-based architecture to estimate the glider’s spatial location and heading direction simultaneously. The hierarchical CNN encoder extracts multi-level, semantically rich visual features, thereby improving robustness to visual degradation and environmental disturbances common in underwater settings. Additionally, the transformer incorporates a variable mixture-of-experts (vMoE) mechanism that adaptively allocates expert networks across layers, enhancing representational capacity while maintaining computational efficiency. The resulting pose estimates enable precise, collision-free ROV navigation for automated recovery and onboard sensor inspection tasks. Experimental results, including ablation studies, validate the effectiveness of the proposed components and demonstrate their contributions to accurate glider hull detection and three-dimensional localization. Overall, the proposed framework provides a scalable, reliable perception solution that allows for the safe, autonomous recovery of underwater gliders with an ROV in realistic ocean environments. Full article
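The mixture-of-experts idea underlying the vMoE mechanism, routing each token to a small subset of expert networks and combining their outputs by gate weight, can be sketched with plain linear experts. The names, the top-k routing rule, and the gate renormalisation are illustrative assumptions, not the paper's vMoE:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(x, expert_weights, router_w, k=2):
    # Hypothetical sketch: a router scores every expert per token, only the
    # top-k experts run, and their outputs are mixed by renormalised gates.
    # Experts here are plain linear maps for illustration.
    gates = softmax(x @ router_w)                # (tokens, n_experts)
    topk = np.argsort(gates, axis=-1)[:, -k:]    # indices of top-k experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = gates[t, topk[t]]
        sel = sel / sel.sum()                    # renormalise over selected
        for g, e in zip(sel, topk[t]):
            out[t] += g * (x[t] @ expert_weights[e])
    return out

# Toy usage: 3 tokens, 4 experts, 4-dimensional features, top-2 routing.
rng = np.random.default_rng(1)
d, n_experts, tokens = 4, 4, 3
x = rng.normal(size=(tokens, d))
experts = rng.normal(size=(n_experts, d, d))
router = rng.normal(size=(d, n_experts))
y = moe_layer(x, experts, router, k=2)
```

Sparse routing is what keeps such layers efficient: representational capacity grows with the number of experts, while per-token compute depends only on k.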
