Search Results (447)

Search Parameters:
Keywords = low-level visual features

17 pages, 1020 KB  
Article
Research on a Portable Multispectral Imaging System for Starch Content Detection in Watermelon–Pumpkin Grafted Seedling Leaves
by Shengyong Xu, Honglei Yang, Yu Zeng, Shaodong Wang, Shuo Yang, Zhilong Bie and Yuan Huang
Agriculture 2026, 16(10), 1127; https://doi.org/10.3390/agriculture16101127 - 21 May 2026
Viewed by 100
Abstract
Plant leaf starch content is a critical indicator of metabolic status, yet traditional enzymatic methods are destructive, labor-intensive, and costly. This study proposes a novel non-destructive detection method using watermelon–pumpkin grafted seedlings. To optimize hardware design, 12 characteristic wavelengths were identified via competitive adaptive reweighted sampling (CARS). A portable multispectral imaging system was developed, featuring narrowband LEDs and integrated human–computer interaction software for real-time visualization. We constructed a multimodal deep learning architecture that integrates a convolutional neural network (CNN) for spatial feature extraction from RGB images, a fully connected neural network (FCNN) for spectral data, and a Transformer network for high-level feature fusion. Experimental results showed that the ShuffleNet v2-Transformer model achieved an R² of 0.956 (RMSE = 0.036) for watermelon leaves, while the EfficientNet b1-Transformer model reached an R² of 0.967 (RMSE = 0.052) for pumpkin leaves. This multimodal approach significantly outperformed conventional PLSR and single-modal CNN models, demonstrating superior ability in processing long-range dependencies within spectral–spatial data. The system enables accurate detection with a throughput of 120 samples per hour at a hardware cost approximately 90% lower than commercial multispectral cameras. This provides an efficient, low-cost solution for large-scale monitoring of plant physiological indicators in precision breeding. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
20 pages, 13558 KB  
Article
Deep Hybrid Synesthesia Model for Audio-Image Transfer
by Zhaojie Luo, Jiayong Jiang and Ladóczki Bence
Electronics 2026, 15(10), 2218; https://doi.org/10.3390/electronics15102218 - 21 May 2026
Viewed by 141
Abstract
Most artistic expressions are conveyed through images (e.g., painting) and audio (e.g., music), and deep learning has been successfully applied to neural style transfer within each of these modalities. However, there is still a lack of deep models that explicitly learn to transfer style between images and audio. Motivated by synesthesia, which reflects intrinsic connections between vision and hearing in the human brain, we propose a deep hybrid synesthesia model for audio–image style transfer. Our framework consists of two main components: (1) a component conversion module that learns cross-modal mappings between audio rhythm/spectrum and image color/shape in a continuous valence–arousal (VA) emotion space; and (2) a style conversion module that transfers high-level artistic styles between Eastern (ink-wash, shui-mo) and Western painting and their corresponding musical counterparts. We first learn emotion-aware feature networks that align low-level audio and visual components based on shared affective representations, and then model long-term stylistic structures for cross-modal style transfer. Experiments include “seeing the sound” (audio-to-image generation with controllable components) and full audio–image style transformations. Both objective analyses and subjective evaluations suggest that our model can produce cross-modal artworks whose perceived style and emotional content are consistent with human synesthetic impressions. Full article
27 pages, 72468 KB  
Article
Long-Tailed Remote Sensing Image Classification via Multi-Scale Data, Pre-Trained Model, and Efficient Inference Strategy
by Song Han, Xing Han, Yibo Xu, Yongqin Tian, Weidong Zhang and Wenyi Zhao
Remote Sens. 2026, 18(10), 1636; https://doi.org/10.3390/rs18101636 - 19 May 2026
Viewed by 225
Abstract
Remote sensing image classification is one of the fundamental tasks in the field of remote sensing and plays a critical role in Earth observation applications. However, the inherent multi-scale characteristics of this task pose significant challenges to scene classification. To address these issues, we propose a novel framework that integrates the Contrastive Language–Image Pre-training (CLIP) model, multi-scale data, and an efficient inference strategy. The proposed framework transfers general-purpose features learnt from natural images to remote sensing image classification. Specifically, the framework leverages the rich feature representations learnt by the CLIP model during contrastive learning and adopts CLIP as the backbone network to extract fine-grained and multi-scale features from remote sensing images. That is, the model can not only learn local fine-grained details but also encode global contextual information useful for classifying visually similar scene categories. An AdapterFormer module is then inserted into a few selected layers of the CLIP model, which enhances model performance at low computational overhead. This supports efficient knowledge sharing and introduces new features at the model level. Furthermore, to alleviate possible performance deterioration caused by multi-scale feature variation, a multi-scale training set is constructed at the data level, providing complementary multi-scale information. Through the synergy of these strategies, the proposed method greatly improves the classification performance of multi-scale remote sensing images. Extensive experiments on the MEET dataset (which includes 80 fine categories and more than 800,000 samples) show that the proposed method greatly improves performance. Compared with general-purpose classification networks and remote sensing-related models, the proposed method consistently achieves state-of-the-art results. Full article
22 pages, 19994 KB  
Article
A Dual-Channel and Multi-Sensor Fusion Framework for Coal Mine Image Dehazing
by Xinliang Wang and Yan Huo
Sensors 2026, 26(10), 3171; https://doi.org/10.3390/s26103171 - 17 May 2026
Viewed by 292
Abstract
Due to dust, haze and uneven lighting conditions, images captured in coal mines frequently suffer severe quality degradation. Traditional dehazing methods typically overlook color characteristics and employ single algorithms, and deep-learning-based approaches require substantial training data and demand high hardware specifications, which restricts their dehazing performance and efficiency. This research proposes an efficient image dehazing framework. This method integrates bright and dark channel information to derive contrast feature values based on their linear differences. These values reflect dust concentration levels in the environment. By incorporating dust sensor data, the adaptive scaling coefficient and dust compensation terms are established. The adaptive scaling coefficient serves as a dynamic pixel selection ratio during ambient light estimation, effectively preserving the brightest pixel points. The global color mean functions as the criterion for determining image color characteristics, distinguishing between color images and low-light grayscale images to enable different dehazing approaches. This process achieves state verification and information complementarity between visual perception and dust measurement. The weighted fusion of bright and dark channels yields more accurate estimation for ambient light and transmission. Additionally, a weighted guided filter is designed with dust compensation terms incorporated. Ablation studies were conducted to validate the effectiveness of this method in enhancing image features. Finally, comparative experiments were performed using a self-constructed coal mine hazy image dataset, along with SOTS-indoor and SOTS-outdoor datasets. Experimental results demonstrate that, compared with other state-of-the-art methods, this method effectively removes haze while restoring image features and details, exhibiting superior stability, adaptability, and computational efficiency. Full article
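The bright/dark channel step described in this abstract can be illustrated with a minimal sketch. It assumes an RGB image stored as nested lists of `(r, g, b)` tuples in the range 0 to 1, a square local patch, and a contrast feature taken as the mean bright-minus-dark difference. The function names, the patch size, and that exact definition are assumptions for illustration; the abstract does not give the authors' formula.

```python
def channel_extreme(image, patch, pick):
    """Apply `pick` (min or max) across RGB channels, then across a square patch."""
    h, w = len(image), len(image[0])
    r = patch // 2
    # Per-pixel extreme across the three colour channels.
    per_pixel = [[pick(image[y][x]) for x in range(w)] for y in range(h)]
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect the patch around (y, x), clipped at the image border.
            window = [
                per_pixel[j][i]
                for j in range(max(0, y - r), min(h, y + r + 1))
                for i in range(max(0, x - r), min(w, x + r + 1))
            ]
            out[y][x] = pick(window)
    return out


def contrast_feature(image, patch=3):
    """Mean difference between bright and dark channels (hypothetical definition)."""
    dark = channel_extreme(image, patch, min)
    bright = channel_extreme(image, patch, max)
    h, w = len(image), len(image[0])
    total = sum(bright[y][x] - dark[y][x] for y in range(h) for x in range(w))
    return total / (h * w)
```

A uniformly gray image yields a contrast feature of zero under this definition, while a single saturated pixel such as `(1.0, 0.0, 0.2)` yields one.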
28 pages, 125254 KB  
Article
Bridging Image-Based Detection and Field Evaluation: A Semi-Automated Pavement Distress Assessment Framework
by Betül Değer Şitilbay and Mehmet Ozan Yılmaz
Sustainability 2026, 18(10), 4935; https://doi.org/10.3390/su18104935 - 14 May 2026
Viewed by 176
Abstract
Accurate, rapid, and consistent evaluation of pavement condition across large-scale road networks is critical for sustainable maintenance and rehabilitation planning. However, conventional approaches largely rely on manual visual inspections, which are time-consuming, subjective, and difficult to implement at the network level. In this study, a semi-automated pavement distress evaluation framework that integrates field-based assessment with computer vision techniques is proposed. The study was conducted on a 3 km roadway network located within the Yıldız Technical University Davutpaşa Campus. Field-based distress observations were used as reference data, while street-level images obtained from the Mapillary platform were analyzed using a deep learning-based YOLOv8 model trained on the RDD2022 dataset, which was specifically developed for road distress detection. The analysis focuses on crack and pothole distress, which have a dominant influence on PCR and are highly distinguishable in image-based approaches. Correlation analyses between automated detection results and field-based data demonstrate a strong agreement, reaching values of approximately ρ = 0.90 in some routes. These findings indicate that these distress types are effective in representing variations in pavement condition. The results demonstrate that multi-source image data and deep learning-based detection methods can be reliably used for section-level pavement condition assessment. The proposed approach addresses a key gap in the literature by transforming image-level detections into engineering-based decision-support information. Furthermore, by leveraging publicly available data sources, the framework offers a low-cost and scalable solution that enables rapid preliminary assessment over large road networks, thereby providing significant potential for sustainable infrastructure management and the development of data-driven maintenance strategies.
Several practical challenges encountered during the detection process—including sensitivity to contrast enhancement parameters, false positives from shadows and surface reflections, heterogeneous image resolution across crowdsourced imagery, and training distribution gaps for locally prevalent infrastructure features—are discussed, and directions for reducing human intervention through adaptive preprocessing and targeted model refinement are identified. Full article
15 pages, 1850 KB  
Article
Unsupervised Head PD-to-T2 MR Image Translation via Multi-Scale Feature Regularization
by Xu Chen, Yuntian Bai and Yifeng Hong
Information 2026, 17(5), 474; https://doi.org/10.3390/info17050474 - 12 May 2026
Viewed by 124
Abstract
Unsupervised medical image translation remains challenging because model development often relies on unpaired training, whereas reliable evaluation requires well-matched reference images. PD-weighted and T2-weighted brain MR images provide a useful testbed for this problem because they are closely matched anatomically while still exhibiting distinct contrast characteristics. Existing methods often align only high-level features, overlooking low-level texture details that are important for structural fidelity. In this work, we propose the Multi-Scale Feature Regularization and Patch Mixup (MSFRPM) framework based on an encoder–decoder architecture. It aligns cross-domain features across multiple scales to preserve local details and employs a patch-based mixup strategy to augment training data. The framework was evaluated using an unsupervised learning protocol with strict data partitioning. Experimental results demonstrate that MSFRPM achieves strong performance relative to eight state-of-the-art methods. Our approach achieved improvements in MAE (6.26 ± 0.86), PSNR (23.53 ± 0.92), SSIM (0.83 ± 0.03), and GMSD (0.100 ± 0.010). Qualitative assessments confirmed improved structural fidelity, and t-SNE visualization validated enhanced cross-domain feature alignment. Overall, MSFRPM provides a useful approach for unsupervised PD-to-T2 image translation under the current experimental setting. Full article
(This article belongs to the Section Biomedical Information and Health)
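For readers unfamiliar with the MAE and PSNR figures quoted in this abstract, the following is a minimal sketch of both metrics using their standard definitions. It assumes two equally sized grayscale images stored as nested lists and a peak intensity of 255; the function names and the `max_value` default are illustrative assumptions, and the abstract does not state the intensity range the authors used.

```python
import math


def _pairs(pred, target):
    """Flatten two equally sized 2D images into (predicted, reference) pixel pairs."""
    return [(p, t) for row_p, row_t in zip(pred, target) for p, t in zip(row_p, row_t)]


def mae(pred, target):
    """Mean absolute error between two images."""
    pairs = _pairs(pred, target)
    return sum(abs(p - t) for p, t in pairs) / len(pairs)


def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    pairs = _pairs(pred, target)
    mse = sum((p - t) ** 2 for p, t in pairs) / len(pairs)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Lower MAE and higher PSNR both indicate a translated image closer to its reference.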
23 pages, 6159 KB  
Article
GIDNet: Infrared Small Target Detection Network Based on Gradient-Intensity Decoupled
by Xianwei Gao, Jingtao Wu, Dafeng Cao, Haotian Xu, Yingjie Ma, Lu Li and Mingjing Zhao
Remote Sens. 2026, 18(10), 1527; https://doi.org/10.3390/rs18101527 - 12 May 2026
Viewed by 277
Abstract
Infrared small target detection (IRSTD) plays a pivotal role in a wide range of applications. Despite extensive research and the numerous algorithms proposed in recent years, IRSTD remains a formidable task, primarily because of the inherently low signal-to-noise ratio (SNR) and the presence of intricate background clutter. Current models remain constrained by three critical bottlenecks: the degradation of spectral coupling between intensity and gradient information in deep layers, limited scale adaptability of static filters, and the loss of spatial precision caused by iterative downsampling. To address these issues, we propose GIDNet, a gradient-intensity decoupled network that balances target energy preservation and noise suppression. The GIDNet architecture incorporates three core components: a gradient-intensity synergistic convolution (GISC) designed to jointly encode intensity and gradient information for robust target enhancement; a multi-scale difference contrast (MSDC) module for scale-adaptive detection via adaptive contrast modeling; and a shallow feature projection (SFP) strategy aimed at maintaining precise spatial localization by bridging the gap between deep semantics and shallow spatial details. Comprehensive evaluations, encompassing both quantitative metrics and qualitative visualizations, consistently show that GIDNet outperforms 16 competing methods. Full article
(This article belongs to the Special Issue Remote Sensing Data Preprocessing and Calibration)
25 pages, 2707 KB  
Article
Recognition of Gait Alterations Induced by Alcohol-Impairment Simulation Goggles Using Smartphone Accelerometer Signals
by Paweł Marciniak and Mariusz Zubert
Sensors 2026, 26(10), 3038; https://doi.org/10.3390/s26103038 - 12 May 2026
Viewed by 235
Abstract
The reliable identification of impairment relevant to safety-critical activities remains a significant challenge for public safety, motivating the exploration of unobtrusive and widely accessible sensing technologies. This study examines the viability of utilising inertial data acquired from consumer-grade smartphones to characterise gait disturbances associated with simulated visual impairment. The study simulates intoxication-related effects using alcohol-impairment goggles and does not involve the measurement of real alcohol intoxication. Two supervised experimental protocols were conducted in which participants traversed predefined walking routes under normal conditions and while wearing alcohol-impairment simulation goggles representing five manufacturer-declared blood alcohol concentration (BAC)-related goggle conditions plus a no-goggles control condition. An initial indoor trial, conducted in a structured corridor environment, yielded limited discrimination of gait dynamics due to strong spatial and visual stabilisation cues. To address this limitation, a subsequent outdoor experiment was conducted along a 100 m path lacking prominent visual reference points, resulting in motion patterns that more closely reflect unconstrained, real-world locomotion. Tri-axial accelerometer and gyroscope signals were recorded using smartphones, followed by artefact removal, segmentation, and standardisation to ensure inter-trial comparability. The resulting curated dataset comprises 290,919 multi-channel samples derived from 96 walking trials involving 16 participants and is released as an openly accessible resource to support further research in gait analysis and classification of gait alterations associated with simulated impairment. Model evaluation was performed using an 80/20 train–test split conducted within each traversal, with training and test windows originating from the same participant and walking session. 
Consequently, the reported results reflect within-subject performance instead of subject-independent generalisation. Multiple deep learning architectures combining convolutional feature extraction, bidirectional long short-term memory layers, and self-attention mechanisms were systematically evaluated. Using a subject-dependent evaluation protocol, the best-performing architecture achieved an accuracy of 71.4% and a weighted F1-score of 71.5% in distinguishing gait patterns associated with different levels of simulated visual impairment. The best-performing architectures yielded classification performance consistent with exploratory, low-stakes assessment of gait alterations associated with simulated visual impairment, using accelerometer data alone. These findings illustrate the feasibility of using smartphones as auxiliary tools for exploratory, low-stakes screening or educational applications and contribute a publicly released dataset and benchmark results to facilitate methodological advancement in inertial sensor-based gait impairment analysis. Full article
(This article belongs to the Collection Sensors for Gait, Human Movement Analysis, and Health Monitoring)
22 pages, 6871 KB  
Article
GSC-YOLO: A Pedestrian Detection Method for Low-Light Security Surveillance Scenarios
by Wei Qing, Fan Li, Shuang Li and Pengfei Yin
Sensors 2026, 26(10), 2987; https://doi.org/10.3390/s26102987 - 9 May 2026
Viewed by 568
Abstract
Pedestrian detection in nighttime security surveillance and other low-light visual sensing tasks is an important foundation for intelligent perception in complex environments. Under low-light conditions, visible-light images often suffer from missing texture details, intensified noise, and reduced contrast, which can easily lead to insufficient target representation, unstable cross-scale feature fusion, and an increased risk of missed detections. Although multimodal schemes, such as RGB–infrared approaches, can improve detection performance by exploiting modal complementarity, they involve relatively high hardware costs, cross-modal calibration complexity, and system integration overhead, which impose deployment limitations in lightweight or cost-sensitive scenarios. Therefore, developing an efficient pedestrian detection method for low-light monocular RGB scenarios is of clear practical value. This study focuses on low-light monocular RGB pedestrian detection and proposes an application-oriented structurally optimized model, termed GSC-YOLO, built upon YOLOv13. First, GhostNetV3 is introduced as the backbone to enhance multi-scale feature representation under weak-texture conditions. Second, a Semantic–Spatial Alignment (SSA) module is designed to improve information compensation and suppress noise during the feature fusion stage. Finally, C2f_Faster is incorporated into the high-level semantic branch to optimize information flow and reduce redundant computation. On the RGB subsets of the two public datasets, LLVIP and KAIST, GSC-YOLO achieves mAP@0.5:0.95 values of 57.70% and 66.61%, respectively, and Recall values of 89.93% and 90.49%, respectively, consistently outperforming the YOLOv13 baseline. 
The results demonstrate that, under the experimental settings adopted in this study, the proposed method effectively improves pedestrian perception performance in low-light RGB scenes while maintaining favorable real-time inference capability, and may provide a useful reference for front-end vision sensing research in low-altitude intelligent networks. Full article
29 pages, 5881 KB  
Article
AFPN-ResUNet: A Residual Attention Mechanism-Guided Asymptotic Feature Pyramid Network for Complex Outcrop Lithology Segmentation
by Mingming Tang, Kang Fu, Lei Tian, Wanxin Chen, Yuhan Li, Zongxu Zhang and Zhiyuan Ma
Remote Sens. 2026, 18(10), 1457; https://doi.org/10.3390/rs18101457 - 7 May 2026
Viewed by 198
Abstract
Although the accurate lithological segmentation of outcrops plays a key role in hydrocarbon exploration, complex field environments and substantial scale variations within outcrops, particularly in extremely thin sand–mudstone interbeds, present considerable obstacles to precise segmentation. To overcome these complexities, we propose a Residual Attention Mechanism-Guided Asymptotic Feature Pyramid Network (AFPN-ResUNet). This architecture employs a structurally optimized RE-CBAM, which seamlessly integrates a Convolutional Block Attention Module (CBAM) into the residual network framework. This mechanism dynamically recalibrates channel and spatial feature responses, thereby effectively suppressing background artifacts while accentuating salient geological boundaries. Furthermore, we abandon traditional naive feature concatenation and instead utilize automatically generated spatially adaptive weights to guide the asymptotic fusion of features across different layers. This asymptotic fusion strategy effectively resolves the semantic discrepancies between distinct network levels, preserving the fine-grained spatial details crucial for delineating ultra-thin interbedded lithologies. To evaluate the architecture, a dedicated outcrop dataset was constructed. Compared to representative baselines (UNet, Vision Transformer, DeepLabV3+, PSPNet, and SegNeXt), AFPN-ResUNet achieves an mIoU of 93.41%, outperforming the baseline models by margins of 23.20%, 23.92%, 12.40%, 12.38%, and 26.04%, respectively. Additionally, ablation studies indicate that incorporating RE-CBAM and AFPN modules improves the mIoU by 13.11% and 13.98% over the backbone, respectively. These quantitative results demonstrate that AFPN-ResUNet effectively mitigates boundary blurring and preserves spatial continuity, an advantage visually corroborated by the Grad-CAM heatmaps. 
Notably, despite a relatively longer inference latency (33.99 ms), the model maintains a low computational overhead (179.79 G FLOPs), underscoring its practical application potential for outcrop lithology segmentation. Full article
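The mIoU values reported in this abstract follow the usual mean intersection-over-union idea, which can be sketched as below. The sketch assumes predictions and ground truth are 2D maps of integer class labels, and it skips classes absent from both maps. The function name and that skipping rule are assumptions; the abstract does not say how the authors treat absent classes.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = 0
        union = 0
        for row_p, row_t in zip(pred, target):
            for p, t in zip(row_p, row_t):
                in_pred, in_target = p == c, t == c
                inter += in_pred and in_target  # pixel labelled c in both maps
                union += in_pred or in_target   # pixel labelled c in either map
        if union:
            ious.append(inter / union)
    # No class present at all: report 0.0 rather than dividing by zero.
    return sum(ious) / len(ious) if ious else 0.0
```

For example, with `pred = [[0, 0], [1, 1]]` and `target = [[0, 1], [1, 1]]`, class 0 scores 1/2 and class 1 scores 2/3, so the mean is 7/12.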
23 pages, 5712 KB  
Article
A Visual Fault Detection System for Elevator Polyurethane Buffers Based on Multi-Scale Image Enhancement and Texture-Aware YOLO Network
by Li Lai, Shixuan Ding, Zewen Li, Zimin Luo and Hao Wang
Appl. Sci. 2026, 16(9), 4528; https://doi.org/10.3390/app16094528 - 4 May 2026
Viewed by 230
Abstract
Polyurethane buffers serve as critical safety protection devices for elevators, with their integrity directly impacting the effectiveness of protective functions during accidents. Current buffer inspections primarily rely on manual patrols, suffering from low inspection frequency, high subjectivity, and significant detection difficulties. To enhance the intelligence and real-time capability of buffer fault detection, this paper proposes a visual fault detection system for elevator buffers based on image enhancement. The system first designs a Hierarchical Fusion Enhancement Module, which effectively suppresses elastic artifacts and significantly enhances crack edge saliency through illumination correction, texture-sensitive guided filtering, and direction-frequency complementary enhancement. It then proposes a gradient-direction texture feature extractor that integrates a gradient-magnitude-weighted Grey-Level Co-occurrence Matrix with a completed local ternary pattern to construct strongly discriminative texture prior features. Finally, a Texture Fusion-Enhanced YOLO detector is developed, which incorporates texture features into the backbone network via a learnable mapping mechanism to achieve early alignment of texture knowledge with depth features. Experimental results indicate that under low-light and complex background conditions, the system achieves a detection accuracy (mAP@0.5) of 0.903 and an F1 Score of 0.891, showing competitive accuracy and robustness within the tested scenarios. Full article
23 pages, 1785 KB  
Article
Semantic Density-Guided ResNet for Dense Infrared Small Target Detection
by Xin Zhang, Wei An, Xinyi Ying, Ruojing Li, Nuo Chen, Boyang Li, Chao Xiao and Miao Li
Remote Sens. 2026, 18(9), 1397; https://doi.org/10.3390/rs18091397 - 1 May 2026
Viewed by 383
Abstract
Dense infrared small target detection (ISTD) in long-range remote sensing is critical for multi-target surveillance, yet existing benchmarks mostly contain only sparsely distributed targets and rarely reflect dense scenes. To address this limitation, we construct a new dense satellite ISTD dataset, IR-SatDense, by compositing small targets onto real satellite infrared backgrounds and partitioning it into subsets using the Average Minimum Inter-Target Distance (AMID) to explicitly control target density. By visualizing multi-stage backbone features, we observe that in dense scenes the deepest stage naturally forms compact, high-response target clusters in the semantic feature maps, while low- and middle-level features remain heavily cluttered. This motivates us to treat high-level semantic density as a global prior to guide low-level feature enhancement. Therefore, we propose Semantic Density-Guided ResNet (SDG-ResNet), a plug-in backbone that attaches a lightweight semantic density head to the deepest stage and injects the predicted density map into intermediate layers through Semantic Density-Guided Refine (SDGR) blocks with residual spatial gating. Integrated into representative transformer-based detectors, including Deformable DETR, DETA, and DINO, SDG-ResNet consistently improves the probability of detection (PD) at comparable false alarm (FA) levels on IR-SatDense while maintaining competitive performance on the sparse dataset IRSTD-1K. Full article
(This article belongs to the Section Remote Sensing Image Processing)
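The Average Minimum Inter-Target Distance (AMID) named in this abstract can plausibly be read as the mean, over all targets in an image, of each target's distance to its nearest neighbour. The sketch below follows that reading with Euclidean distance between target centres. The function name, the use of centres, and the handling of images with fewer than two targets are assumptions, since the abstract does not spell out the definition.

```python
import math


def amid(centers):
    """Average of each target's Euclidean distance to its nearest other target."""
    if len(centers) < 2:
        # Assumption: with fewer than two targets there is no neighbour distance.
        return math.inf
    total = 0.0
    for i, center in enumerate(centers):
        total += min(
            math.dist(center, other)
            for j, other in enumerate(centers)
            if j != i
        )
    return total / len(centers)
```

Under this reading, a smaller AMID means a denser scene, so thresholds on AMID could be used to split a dataset into density subsets.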
27 pages, 4169 KB  
Article
The Use of an Improved Lightweight Scalable Attention-Guided Super-Resolution Method for Remote Sensing Image Enhancement
by Boyu Pang and Yinnian Liu
Appl. Sci. 2026, 16(9), 4298; https://doi.org/10.3390/app16094298 - 28 Apr 2026
Viewed by 431
Abstract
To address the urgent demand for real-time reconstruction in remote sensing satellite imaging, as well as the difficulty of extracting sparse target features from dark backgrounds under low-illumination conditions, this paper proposes a lightweight, scalable attention-guided super-resolution reconstruction framework (SASR). The framework adopts an efficient, scalable visual backbone with staged feature extraction to capture discriminative information at three hierarchical scales. A refined multi-scale channel attention module, improved from the classic MS-CAM structure, is further introduced to fuse high-level semantic features and low-level texture details comprehensively. Finally, stacked sub-pixel convolution operations are employed to achieve high-precision image super-resolution enhancement. The proposed method maintains superior lightweight characteristics and fast inference efficiency while embedding effective channel attention optimisation for accurate feature representation. Experimental validations are conducted on the GF-5 satellite datasets: at 2× magnification, the proposed model achieves 32.2346 dB PSNR and 0.8791 SSIM; at 3× magnification, 31.6040 dB PSNR and 0.8376 SSIM; at 4× magnification, PSNR remains above 30 dB, and SSIM exceeds 0.8. The framework also exhibits robust generalization performance on marine remote sensing image datasets. Comparative experiments with recent super-resolution methods on multiple public datasets further verify the effectiveness and practical superiority of the proposed approach.
(This article belongs to the Section Computing and Artificial Intelligence)
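The abstract above mentions stacked sub-pixel convolution for upscaling. The rearrangement step at the heart of that technique (often called pixel shuffle) can be sketched in plain Python as below. This shows only the generic channel-to-space rearrangement, using the common layout in which input channel c*r*r + i*r + j feeds output pixel (h*r + i, w*r + j); it is not the SASR implementation, and the function name `pixel_shuffle` and the sample values are assumptions for illustration.

```python
def pixel_shuffle(feature_maps, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Generic sub-pixel rearrangement only; the preceding convolution that
    produces the C*r*r channels is omitted.
    """
    channels_in = len(feature_maps)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    channels_out = channels_in // (r * r)
    out = [
        [[0] * (width * r) for _ in range(height * r)]
        for _ in range(channels_out)
    ]
    for c in range(channels_out):
        for i in range(r):
            for j in range(r):
                src = feature_maps[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        # Each group of r*r channels fills one r-by-r output patch.
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out

# Hypothetical input: four 1x1 channels become one 2x2 image at r = 2.
low_res = [[[1]], [[2]], [[3]], [[4]]]
high_res = pixel_shuffle(low_res, 2)
print(high_res)
```

Stacking two such steps with r = 2 would give 4× upscaling, which is one common way to reach higher magnifications; whether the paper does exactly this is not stated in the abstract.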

25 pages, 5188 KB  
Article
MonoCrown for Crown-Level Tree Species Semantic Segmentation in Heterogeneous Forests Using UAV RGB Imagery
by Linzhi Wen and Guangsheng Chen
Remote Sens. 2026, 18(9), 1338; https://doi.org/10.3390/rs18091338 - 27 Apr 2026
Viewed by 355
Abstract
Crown-level tree species semantic segmentation enables fine-grained forest inventory and management. Current high-precision tree species classification typically relies on multi-source remote sensing data, the acquisition and processing of which remain costly for large-area applications, making low-cost unmanned aerial vehicle (UAV) RGB imagery an attractive option for large-scale forest mapping. However, in heterogeneous forests, complex canopy structures and the limited spectral discriminability of low-cost UAV RGB imagery make 2D appearance cues alone insufficient for reliable species discrimination, crown delineation, and accurate separation of adjacent crowns. This often leads to inter-class confusion, blurred crown boundaries, and poor recognition of small crowns. To address these limitations, this paper proposes MonoCrown (MCrown), which strengthens geometric and contextual representation for distinguishing visually similar species and delineating crowns from single-temporal UAV RGB imagery. To compensate for the insufficiency of appearance cues, MCrown introduces monocular depth inferred offline from the same RGB image as a frozen geometric prior, and integrates cross-window global–local attention (CW-GLA), bidirectional cross-modal attention (BiCoAttn), and depth-adaptive injection (DAI) to capture long-range dependencies and promote complementary use of appearance and geometric features, especially for small crowns with similar visual patterns in complex scenes. To validate the method’s effectiveness, a crown-level UAV RGB dataset covering approximately 40 km² was constructed. Systematic comparative experiments were conducted on the proposed dataset and on public benchmarks, supporting the effectiveness of the proposed approach across ten dominant classes, especially for small crowns and visually similar categories. Its mean Intersection over Union (mIoU) and overall accuracy (OA) reached 74.1% and 87.3%, respectively.
The method achieves high-precision crown-level tree species semantic segmentation using single-temporal UAV RGB as the sole acquired modality, while monocular depth inferred from the same RGB image serves only as a frozen geometric prior, without requiring multispectral, multi-temporal, or active-sensor acquisitions. This offers a practical solution for crown-level tree species mapping in heterogeneous forests.
(This article belongs to the Section Remote Sensing Image Processing)
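The abstract above reports mean Intersection over Union (mIoU) and overall accuracy (OA). As a reminder of how these two metrics are conventionally derived from a confusion matrix, here is a minimal sketch. It uses the standard textbook definitions, not the authors' evaluation code, and the 2-class matrix is made-up illustrative data, not results from the paper.

```python
def overall_accuracy(cm):
    """Fraction of all samples on the diagonal (rows = truth, cols = prediction)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def mean_iou(cm):
    """Average of per-class TP / (TP + FP + FN), skipping classes with no samples."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # truth is c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, truth is otherwise
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)

# Hypothetical 2-class confusion matrix, for illustration only.
example_cm = [
    [8, 2],
    [1, 9],
]
print(overall_accuracy(example_cm), mean_iou(example_cm))
```

Because IoU penalises both false positives and false negatives per class, mIoU is usually lower than OA on the same predictions, which is consistent with the gap between the two figures reported in the abstract.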

23 pages, 15571 KB  
Article
A Practical Weakly Supervised Framework for Dose-Up Translation of Low-Enhanced CT Under Clinical Acquisition Variability
by Jong Bub Lee, Se Hwan Lim, Yu Jin Jung, Jae Hwan Kim and Hyun Gyu Lee
J. Imaging 2026, 12(5), 190; https://doi.org/10.3390/jimaging12050190 - 27 Apr 2026
Viewed by 294
Abstract
Low-dose contrast-enhanced computed tomography (CT) is widely used to reduce contrast-induced toxicity, but reduced iodine concentration and inconsistent acquisition conditions often produce uneven contrast attenuation and spatial misalignment between scans. In this context, we define dose-up translation as the computational process of synthetically enhancing low-dose contrast images to approximate the visual and diagnostic quality of full-dose acquisitions. These factors limit the effective use of routinely acquired imaging data for dose-up translation, particularly in veterinary abdominal CT where respiratory motion and postural variability further degrade anatomical correspondence. We present a weakly aligned enhancement framework designed to operate under spatial misalignment and limited paired data. Registration-based pseudo-references are constructed using a hybrid strategy that combines deformable anatomical alignment with feature-level correspondence. Dose-up translation is performed using structure-preserving translation with multi-scale consistency and edge-aware regularization to maintain anatomical boundaries. To address limited low-dose datasets, a two-stage knowledge transfer strategy transfers anatomical and contrast priors from abundant pre-contrast data. Quantitative evaluation demonstrated region-level contrast-to-noise ratio improvements of up to 31.5% (e.g., from 5.55 to 8.38 in the caudal vena cava (CVC), p < 0.05) compared with baseline enhancement methods across 1171 test slices. Experiments demonstrate consistent improvements in structural fidelity, distributional realism, and region-level vascular conspicuity compared with paired, unpaired, and synthetic-pairing baselines. 
These findings suggest that the dose-up translation of low-enhanced CT is better formulated as a weakly aligned domain adaptation problem rather than a strictly paired reconstruction task, enabling practical image translation under realistic clinical acquisition variability.
(This article belongs to the Section Medical Imaging)