Search Results (111)

Search Parameters:
Keywords = low textured scene

17 pages, 2072 KiB  
Article
Barefoot Footprint Detection Algorithm Based on YOLOv8-StarNet
by Yujie Shen, Xuemei Jiang, Yabin Zhao and Wenxin Xie
Sensors 2025, 25(15), 4578; https://doi.org/10.3390/s25154578 - 24 Jul 2025
Viewed by 293
Abstract
This study proposes an optimized footprint recognition model based on an enhanced StarNet architecture for biometric identification in the security, medical, and criminal investigation fields. Conventional image recognition algorithms exhibit limitations in processing barefoot footprint images characterized by concentrated feature distributions and rich texture patterns. To address this, our framework integrates an improved StarNet into the backbone of the YOLOv8 architecture. Leveraging the unique advantages of element-wise multiplication, the redesigned backbone efficiently maps inputs to a high-dimensional nonlinear feature space without increasing channel dimensions, achieving enhanced representational capacity with low computational latency. Subsequently, an Encoder layer facilitates feature interaction within the backbone through multi-scale feature fusion and attention mechanisms, effectively extracting rich semantic information while maintaining computational efficiency. In the feature fusion part, a feature modulation block processes multi-scale features by synergistically combining global and local information, thereby reducing redundant computations and decreasing both parameter count and computational complexity to achieve model lightweighting. Experimental evaluations on a proprietary barefoot footprint dataset demonstrate that the proposed model exhibits significant advantages in terms of parameter efficiency, recognition accuracy, and computational complexity. The parameter count has been reduced by 0.73 million, further improving the model’s speed, and the computational cost has been reduced by 1.5 GFLOPs, lowering the hardware requirements for model deployment. Recognition accuracy has reached 99.5%, with further improvements in model precision. Future research will explore how to capture shoeprint images with complex backgrounds from shoes worn at crime scenes, aiming to further enhance the model’s recognition capabilities in more forensic scenarios. Full article
(This article belongs to the Special Issue Transformer Applications in Target Tracking)
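For readers unfamiliar with the "star" operation this abstract leans on, the sketch below illustrates the general idea: two pointwise projections of the same feature map are combined by element-wise multiplication, which acts like an implicit high-dimensional nonlinear mapping without widening the channels. This is an illustrative PyTorch sketch of a generic StarNet-style block, not the authors' implementation; the layer names and expansion factor are assumptions.

```python
import torch
import torch.nn as nn

class StarBlock(nn.Module):
    """Generic StarNet-style block: depthwise conv, two pointwise branches,
    element-wise multiplication ("star"), projection back, residual add."""
    def __init__(self, channels: int, expansion: int = 3):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.f1 = nn.Conv2d(channels, channels * expansion, 1)  # branch A
        self.f2 = nn.Conv2d(channels, channels * expansion, 1)  # branch B
        self.act = nn.ReLU6()
        self.g = nn.Conv2d(channels * expansion, channels, 1)   # project back

    def forward(self, x):
        y = self.dw(x)
        y = self.act(self.f1(y)) * self.f2(y)   # the "star" operation
        return x + self.g(y)

print(StarBlock(32)(torch.randn(1, 32, 64, 64)).shape)  # torch.Size([1, 32, 64, 64])
```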

17 pages, 1927 KiB  
Article
ConvTransNet-S: A CNN-Transformer Hybrid Disease Recognition Model for Complex Field Environments
by Shangyun Jia, Guanping Wang, Hongling Li, Yan Liu, Linrong Shi and Sen Yang
Plants 2025, 14(15), 2252; https://doi.org/10.3390/plants14152252 - 22 Jul 2025
Viewed by 355
Abstract
To address the challenges of low recognition accuracy and substantial model complexity in crop disease identification models operating in complex field environments, this study proposed a novel hybrid model named ConvTransNet-S, which integrates Convolutional Neural Networks (CNNs) and transformers for crop disease identification tasks. Unlike existing hybrid approaches, ConvTransNet-S uniquely introduces three key innovations: First, a Local Perception Unit (LPU) and Lightweight Multi-Head Self-Attention (LMHSA) modules were introduced to synergistically enhance the extraction of fine-grained plant disease details and model global dependency relationships, respectively. Second, an Inverted Residual Feed-Forward Network (IRFFN) was employed to optimize the feature propagation path, thereby enhancing the model’s robustness against interferences such as lighting variations and leaf occlusions. This novel combination of an LPU, LMHSA, and an IRFFN achieves a dynamic equilibrium between local texture perception and global context modeling, effectively resolving the trade-offs inherent in standalone CNNs or transformers. Finally, through a phased architecture design, efficient fusion of multi-scale disease features is achieved, which enhances feature discriminability while reducing model complexity. The experimental results indicated that ConvTransNet-S achieved a recognition accuracy of 98.85% on the PlantVillage public dataset. This model operates with only 25.14 million parameters, a computational load of 3.762 GFLOPs, and an inference time of 7.56 ms. Testing on a self-built in-field complex scene dataset comprising 10,441 images revealed that ConvTransNet-S achieved an accuracy of 88.53%, which represents improvements of 14.22%, 2.75%, and 0.34% over EfficientNetV2, Vision Transformer, and Swin Transformer, respectively. Furthermore, the ConvTransNet-S model achieved up to 14.22% higher disease recognition accuracy under complex background conditions while reducing the parameter count by 46.8%. This confirms that its unique multi-scale feature mechanism can effectively distinguish disease from background features, providing a novel technical approach for disease diagnosis in complex agricultural scenarios and demonstrating significant application value for intelligent agricultural management. Full article
(This article belongs to the Section Plant Modeling)
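The Inverted Residual Feed-Forward Network mentioned above is a recurring pattern in CNN-Transformer hybrids. The following is a minimal PyTorch sketch of that general pattern (1x1 expansion, depthwise 3x3 with an inner shortcut, 1x1 projection); it is not taken from the paper, and the expansion ratio is an assumption.

```python
import torch
import torch.nn as nn

class IRFFN(nn.Module):
    """Inverted residual feed-forward: expand -> depthwise mix -> project."""
    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        hidden = dim * expansion
        self.expand = nn.Sequential(nn.Conv2d(dim, hidden, 1), nn.GELU())
        self.dwconv = nn.Sequential(
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden), nn.GELU())
        self.project = nn.Conv2d(hidden, dim, 1)

    def forward(self, x):
        y = self.expand(x)
        y = y + self.dwconv(y)       # inner (inverted residual) shortcut
        return x + self.project(y)   # outer shortcut back to the input width

print(IRFFN(64)(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```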

27 pages, 1868 KiB  
Article
SAM2-DFBCNet: A Camouflaged Object Detection Network Based on the Heira Architecture of SAM2
by Cao Yuan, Libang Liu, Yaqin Li and Jianxiang Li
Sensors 2025, 25(14), 4509; https://doi.org/10.3390/s25144509 - 21 Jul 2025
Viewed by 365
Abstract
Camouflaged Object Detection (COD) aims to segment objects that are highly integrated with their background, presenting significant challenges such as low contrast, complex textures, and blurred boundaries. Existing deep learning methods often struggle to achieve robust segmentation under these conditions. To address these limitations, this paper proposes a novel COD network, SAM2-DFBCNet, built upon the SAM2 Hiera architecture. Our network incorporates three key modules: (1) the Camouflage-Aware Context Enhancement Module (CACEM), which fuses local and global features through an attention mechanism to enhance contextual awareness in low-contrast scenes; (2) the Cross-Scale Feature Interaction Bridge (CSFIB), which employs a bidirectional convolutional GRU for the dynamic fusion of multi-scale features, effectively mitigating representation inconsistencies caused by complex textures and deformations; and (3) the Dynamic Boundary Refinement Module (DBRM), which combines channel and spatial attention mechanisms to optimize boundary localization accuracy and enhance segmentation details. Extensive experiments on three public datasets—CAMO, COD10K, and NC4K—demonstrate that SAM2-DFBCNet outperforms twenty state-of-the-art methods, achieving maximum improvements of 7.4%, 5.78%, and 4.78% in key metrics such as S-measure (Sα), F-measure (Fβ), and mean E-measure (Eϕ), respectively, while reducing the Mean Absolute Error (M) by 37.8%. These results validate the superior performance and robustness of our approach in complex camouflage scenarios. Full article
(This article belongs to the Special Issue Transformer Applications in Target Tracking)
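As a rough illustration of the local/global fusion idea behind the context-enhancement module described above, the hedged sketch below gates a depthwise-convolution local branch with channel weights pooled from the whole image; the class and parameter names are invented for the example and do not come from the paper.

```python
import torch
import torch.nn as nn

class LocalGlobalContext(nn.Module):
    """Fuse a convolutional local branch with a pooled global attention gate."""
    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        self.local = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
        self.global_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, dim // reduction, 1), nn.ReLU(),
            nn.Conv2d(dim // reduction, dim, 1), nn.Sigmoid())

    def forward(self, x):
        local = self.local(x)          # fine-grained local features
        gate = self.global_gate(x)     # per-channel global context weights
        return x + local * gate        # attention-weighted residual fusion

print(LocalGlobalContext(48)(torch.randn(1, 48, 40, 40)).shape)  # (1, 48, 40, 40)
```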

23 pages, 10392 KiB  
Article
Dual-Branch Luminance–Chrominance Attention Network for Hydraulic Concrete Image Enhancement
by Zhangjun Peng, Li Li, Chuanhao Chang, Rong Tang, Guoqiang Zheng, Mingfei Wan, Juanping Jiang, Shuai Zhou, Zhenggang Tian and Zhigui Liu
Appl. Sci. 2025, 15(14), 7762; https://doi.org/10.3390/app15147762 - 10 Jul 2025
Viewed by 261
Abstract
Hydraulic concrete is a critical infrastructure material, with its surface condition playing a vital role in quality assessments for water conservancy and hydropower projects. However, images taken in complex hydraulic environments often suffer from degraded quality due to low lighting, shadows, and noise, making it difficult to distinguish defects from the background and thereby hindering accurate defect detection and damage evaluation. In this study, following systematic analyses of hydraulic concrete color space characteristics, we propose a Dual-Branch Luminance–Chrominance Attention Network (DBLCANet-HCIE) specifically designed for low-light hydraulic concrete image enhancement. Inspired by human visual perception, the network simultaneously improves global contrast and preserves fine-grained defect textures, which are essential for structural analysis. The proposed architecture consists of a Luminance Adjustment Branch (LAB) and a Chroma Restoration Branch (CRB). The LAB incorporates a Luminance-Aware Hybrid Attention Block (LAHAB) to capture both the global luminance distribution and local texture details, enabling adaptive illumination correction through comprehensive scene understanding. The CRB integrates a Channel Denoiser Block (CDB) for channel-specific noise suppression and a Frequency-Domain Detail Enhancement Block (FDDEB) to refine chrominance information and enhance subtle defect textures. A feature fusion block is designed to fuse and learn the features of the outputs from the two branches, resulting in images with enhanced luminance, reduced noise, and preserved surface anomalies. To validate the proposed approach, we construct a dedicated low-light hydraulic concrete image dataset (LLHCID). Extensive experiments conducted on both LOLv1 and LLHCID benchmarks demonstrate that the proposed method significantly enhances the visual interpretability of hydraulic concrete surfaces while effectively addressing low-light degradation challenges. Full article
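To make the dual-branch split concrete, here is a small sketch that separates an RGB image into luminance and chrominance with the standard BT.601 transform and routes each through its own placeholder branch. The tiny convolutional stacks merely stand in for the paper's LAB and CRB branches; everything except the colour transform is an assumption.

```python
import torch
import torch.nn as nn

def rgb_to_ycbcr(x):
    """x: (N, 3, H, W) in [0, 1]; returns luminance Y and chrominance CbCr."""
    r, g, b = x[:, 0:1], x[:, 1:2], x[:, 2:3]
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.5 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 0.5 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, torch.cat([cb, cr], dim=1)

class DualBranchEnhancer(nn.Module):
    def __init__(self):
        super().__init__()
        self.lum_branch = nn.Sequential(      # placeholder for the luminance branch
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 3, padding=1))
        self.chroma_branch = nn.Sequential(   # placeholder for the chroma branch
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 2, 3, padding=1))

    def forward(self, rgb):
        y, cbcr = rgb_to_ycbcr(rgb)
        return torch.cat([self.lum_branch(y), self.chroma_branch(cbcr)], dim=1)

print(DualBranchEnhancer()(torch.rand(1, 3, 64, 64)).shape)  # (1, 3, 64, 64) YCbCr
```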

23 pages, 9575 KiB  
Article
Infrared and Visible Image Fusion via Residual Interactive Transformer and Cross-Attention Fusion
by Liquan Zhao, Chen Ke, Yanfei Jia, Cong Xu and Zhijun Teng
Sensors 2025, 25(14), 4307; https://doi.org/10.3390/s25144307 - 10 Jul 2025
Viewed by 357
Abstract
Infrared and visible image fusion combines infrared and visible images of the same scene to produce a more informative and comprehensive fused image. Existing deep learning-based fusion methods fail to establish dependencies between global and local information during feature extraction. This results in unclear scene texture details and low contrast of the infrared thermal targets in the fused image. This paper proposes an infrared and visible image fusion network to address this issue via the use of a residual interactive transformer and cross-attention fusion. The network first introduces a residual dense module to extract shallow features from the input infrared and visible images. Next, the residual interactive transformer extracts global and local features from the source images and establishes interactions between them. Two identical residual interactive transformers are used for further feature extraction. A cross-attention fusion module is also designed to fuse the infrared and visible feature maps extracted by the residual interactive transformer. Finally, an image reconstruction network generates the fused image. The proposed method is evaluated on the RoadScene, TNO, and M3FD datasets. The experimental results show that the fused images produced by the proposed method contain more visible texture details and infrared thermal information. Compared to nine other methods, the proposed approach achieves superior fusion performance. Full article
(This article belongs to the Section Sensing and Imaging)
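The cross-attention fusion step described above can be pictured as each modality querying the other. The sketch below is one plausible way to wire that up with standard multi-head attention; it is an assumption-laden illustration, not the paper's module.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Infrared features attend to visible features and vice versa."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.ir_q = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.vis_q = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, ir, vis):                  # (N, C, H, W) feature maps
        n, c, h, w = ir.shape
        ir_t = ir.flatten(2).transpose(1, 2)     # (N, H*W, C) token sequences
        vis_t = vis.flatten(2).transpose(1, 2)
        a, _ = self.ir_q(ir_t, vis_t, vis_t)     # infrared queries visible
        b, _ = self.vis_q(vis_t, ir_t, ir_t)     # visible queries infrared
        fused = self.proj(torch.cat([a, b], dim=-1))
        return fused.transpose(1, 2).reshape(n, c, h, w)

ir, vis = torch.randn(1, 32, 16, 16), torch.randn(1, 32, 16, 16)
print(CrossAttentionFusion(32)(ir, vis).shape)   # torch.Size([1, 32, 16, 16])
```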

27 pages, 13245 KiB  
Article
LHRF-YOLO: A Lightweight Model with Hybrid Receptive Field for Forest Fire Detection
by Yifan Ma, Weifeng Shan, Yanwei Sui, Mengyu Wang and Maofa Wang
Forests 2025, 16(7), 1095; https://doi.org/10.3390/f16071095 - 2 Jul 2025
Viewed by 350
Abstract
Timely and accurate detection of forest fires is crucial for protecting forest ecosystems. However, traditional monitoring methods face significant challenges in effectively detecting forest fires, primarily due to the dynamic spread of flames and smoke, irregular morphologies, and the semi-transparent nature of smoke, which make it extremely difficult to extract key visual features. Additionally, deploying these detection systems to edge devices with limited computational resources remains challenging. To address these issues, this paper proposes a lightweight hybrid receptive field model (LHRF-YOLO), which leverages deep learning to overcome the shortcomings of traditional monitoring methods for fire detection on edge devices. Firstly, a hybrid receptive field extraction module is designed by integrating the 2D selective scan mechanism with a residual multi-branch structure. This significantly enhances the model’s contextual understanding of the entire image scene while maintaining low computational complexity. Second, a dynamic enhanced downsampling module is proposed, which employs feature reorganization and channel-wise dynamic weighting strategies to minimize the loss of critical details, such as fine smoke textures, while reducing image resolution. Furthermore, a scale weighted Fusion module is introduced to optimize multi-scale feature fusion through adaptive weight allocation, addressing the issues of information dilution and imbalance caused by traditional fusion methods. Finally, the Mish activation function replaces the SiLU activation function to improve the model’s ability to capture flame edges and faint smoke textures. Experimental results on the self-constructed Fire-SmokeDataset demonstrate that LHRF-YOLO achieves significant model compression while further improving accuracy compared to the baseline model YOLOv11. The parameter count is reduced to only 2.25M (a 12.8% reduction), computational complexity to 5.4 GFLOPs (a 14.3% decrease), and mAP50 is increased to 87.6%, surpassing the baseline model. Additionally, LHRF-YOLO exhibits leading generalization performance on the cross-scenario M4SFWD dataset. The proposed method balances performance and resource efficiency, providing a feasible solution for real-time and efficient fire detection on resource-constrained edge devices with significant research value. Full article
(This article belongs to the Special Issue Forest Fires Prediction and Detection—2nd Edition)
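A rough sketch of the "feature reorganization plus channel-wise dynamic weighting" downsampling idea follows: PixelUnshuffle halves the resolution without discarding pixels, a learned gate re-weights the reorganised channels, and Mish is used in place of SiLU as the abstract notes. The concrete layer choices are assumptions for illustration, not the paper's module.

```python
import torch
import torch.nn as nn

class DynamicDownsample(nn.Module):
    """Lossless 2x downsampling via pixel reorganisation + dynamic channel weights."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.reorg = nn.PixelUnshuffle(2)          # (C, H, W) -> (4C, H/2, W/2)
        self.gate = nn.Sequential(                 # channel-wise dynamic weights
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(4 * in_ch, 4 * in_ch, 1), nn.Sigmoid())
        self.proj = nn.Conv2d(4 * in_ch, out_ch, 1)
        self.act = nn.Mish()                       # Mish in place of SiLU

    def forward(self, x):
        y = self.reorg(x)
        return self.act(self.proj(y * self.gate(y)))

print(DynamicDownsample(32, 64)(torch.randn(1, 32, 64, 64)).shape)  # (1, 64, 32, 32)
```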

19 pages, 17180 KiB  
Article
Adaptive Support Weight-Based Stereo Matching with Iterative Disparity Refinement
by Alexander Richter, Till Steinmann, Andreas Reichenbach and Stefan J. Rupitsch
Sensors 2025, 25(13), 4124; https://doi.org/10.3390/s25134124 - 2 Jul 2025
Viewed by 410
Abstract
Real-time 3D reconstruction in minimally invasive surgery improves depth perception and supports intraoperative decision-making and navigation. However, endoscopic imaging presents significant challenges, such as specular reflections, low-texture surfaces, and tissue deformation. We present a novel, deterministic and iterative stereo-matching method based on adaptive support weights that is tailored to these constraints. The algorithm is implemented in CUDA and C++ to enable real-time performance. We evaluated our method on the Stereo Correspondence and Reconstruction of Endoscopic Data (SCARED) dataset and a custom synthetic dataset using the mean absolute error (MAE), root mean square error (RMSE), and frame rate as metrics. On SCARED datasets 8 and 9, our method achieves MAEs of 3.79 mm and 3.61 mm, achieving 24.9 FPS on a system with an AMD Ryzen 9 5950X and NVIDIA RTX 3090. To the best of our knowledge, these results are on par with or surpass existing deterministic stereo-matching approaches. On synthetic data, which eliminates real-world imaging errors, the method achieves an MAE of 140.06 μm and an RMSE of 251.9 μm, highlighting its performance ceiling under noise-free, idealized conditions. Our method focuses on single-shot 3D reconstruction as a basis for stereo frame stitching and full-scene modeling. It provides accurate, deterministic, real-time depth estimation under clinically relevant conditions and has the potential to be integrated into surgical navigation, robotic assistance, and augmented reality workflows. Full article
(This article belongs to the Special Issue Stereo Vision Sensing and Image Processing)
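Since the method builds on adaptive support weights, a worked example of the classic aggregation rule may help: each pixel in the matching window contributes according to its intensity similarity and spatial distance to the window centre. The simplified NumPy sketch below is grayscale, unoptimised, and uses assumed parameter values; it is not the paper's CUDA implementation.

```python
import numpy as np

def asw_cost(left, right, x, y, d, radius=5, gamma_c=10.0, gamma_p=7.0):
    """Adaptive-support-weight matching cost for disparity d at pixel (x, y)."""
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    lw = left[y + ys, x + xs]                      # left window
    rw = right[y + ys, x + xs - d]                 # right window shifted by d
    # support weights: intensity similarity and spatial proximity to the centre
    w_l = np.exp(-np.abs(lw - left[y, x]) / gamma_c - np.hypot(ys, xs) / gamma_p)
    w_r = np.exp(-np.abs(rw - right[y, x - d]) / gamma_c - np.hypot(ys, xs) / gamma_p)
    cost = np.abs(lw - rw)                         # raw per-pixel cost
    return np.sum(w_l * w_r * cost) / np.sum(w_l * w_r)

rng = np.random.default_rng(0)
left = rng.random((64, 64)) * 255.0
right = np.roll(left, -3, axis=1)                  # synthetic disparity of 3 px
costs = [asw_cost(left, right, x=32, y=32, d=d) for d in range(8)]
print(int(np.argmin(costs)))                       # 3
```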

17 pages, 12088 KiB  
Article
Edge-Guided DETR Model for Intelligent Sensing of Tomato Ripeness Under Complex Environments
by Jiamin Yao, Jianxuan Zhou, Yangang Nie, Jun Xue, Kai Lin and Liwen Tan
Mathematics 2025, 13(13), 2095; https://doi.org/10.3390/math13132095 - 26 Jun 2025
Cited by 1 | Viewed by 469
Abstract
Tomato ripeness detection in open-field environments is challenged by dense planting, heavy occlusion, and complex lighting conditions. Existing methods mainly rely on color and texture cues, limiting boundary perception and causing redundant predictions in crowded scenes. To address these issues, we propose an improved detection framework called Edge-Guided DETR (EG-DETR), based on the DEtection TRansformer (DETR). EG-DETR introduces edge prior information by extracting multi-scale edge features through an edge backbone network. These features are fused in the transformer decoder to guide queries toward foreground regions, which improves detection under occlusion. We further design a redundant box suppression strategy to reduce duplicate predictions caused by clustered fruits. We evaluated our method on a multimodal tomato dataset that included varied lighting conditions such as natural light, artificial light, low light, and sodium yellow light. Our experimental results show that EG-DETR achieves an AP of 83.7% under challenging lighting and occlusion, outperforming existing models. This work provides a reliable intelligent sensing solution for automated harvesting in smart agriculture. Full article
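To illustrate what an "edge prior" can look like in practice, the sketch below derives an edge map with fixed Sobel filters and embeds it into a feature map that a decoder could attend to. This is a hypothetical stand-in for the edge backbone described above; the class name and channel sizes are invented.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgePrior(nn.Module):
    """Sobel gradient magnitude turned into a learnable edge feature map."""
    def __init__(self, out_ch: int = 16):
        super().__init__()
        kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        self.register_buffer("kx", kx.view(1, 1, 3, 3))
        self.register_buffer("ky", kx.t().contiguous().view(1, 1, 3, 3))
        self.embed = nn.Conv2d(1, out_ch, 3, padding=1)

    def forward(self, gray):                         # gray: (N, 1, H, W)
        gx = F.conv2d(gray, self.kx, padding=1)
        gy = F.conv2d(gray, self.ky, padding=1)
        edges = torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)  # gradient magnitude
        return self.embed(edges)

print(EdgePrior()(torch.rand(1, 1, 128, 128)).shape)  # (1, 16, 128, 128)
```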

16 pages, 1058 KiB  
Article
Multi-Scale Context Enhancement Network with Local–Global Synergy Modeling Strategy for Semantic Segmentation on Remote Sensing Images
by Qibing Ma, Hongning Liu, Yifan Jin and Xinyue Liu
Electronics 2025, 14(13), 2526; https://doi.org/10.3390/electronics14132526 - 21 Jun 2025
Cited by 1 | Viewed by 319
Abstract
Semantic segmentation of remote sensing images is a fundamental task in geospatial analysis and Earth observation research, and has a wide range of applications in urban planning, land cover classification, and ecological monitoring. In complex geographic scenes, low target-background discriminability in overhead views (e.g., indistinct boundaries, ambiguous textures, and low contrast) significantly complicates local–global information modeling and results in blurred boundaries and classification errors in model predictions. To address this issue, in this paper we propose a novel Multi-Scale Local–Global Mamba Feature Pyramid Network (MLMFPN) built around a local–global information synergy modeling strategy, which guides and enhances cross-scale contextual information interaction during feature fusion to obtain high-quality semantic features that serve as cues for precise semantic reasoning. The proposed MLMFPN comprises two core components: Local–Global Align Mamba Fusion (LGAMF) and the Context-Aware Cross-attention Interaction Module (CCIM). Specifically, LGAMF performs local-enhanced global information modeling through asymmetric convolutions that cover the vertical and horizontal receptive fields, and further introduces the Vision Mamba structure to facilitate local–global information fusion. CCIM introduces positional encoding and cross-attention mechanisms to enrich the global-spatial semantic representation during multi-scale context information interaction, thereby achieving refined segmentation. The proposed method is evaluated on the ISPRS Potsdam and Vaihingen datasets, and the results verify its effectiveness. Full article
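The asymmetric-convolution idea used above for directional receptive fields can be sketched as parallel 1xk and kx1 depthwise branches whose outputs are merged, as below. The kernel size and the extra square branch are assumptions made for the example, not the paper's LGAMF design.

```python
import torch
import torch.nn as nn

class AsymmetricConv(nn.Module):
    """Horizontal (1xk) and vertical (kx1) depthwise branches plus a 3x3 branch."""
    def __init__(self, dim: int, k: int = 7):
        super().__init__()
        self.horizontal = nn.Conv2d(dim, dim, (1, k), padding=(0, k // 2), groups=dim)
        self.vertical = nn.Conv2d(dim, dim, (k, 1), padding=(k // 2, 0), groups=dim)
        self.square = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
        self.mix = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        return x + self.mix(self.horizontal(x) + self.vertical(x) + self.square(x))

print(AsymmetricConv(32)(torch.randn(1, 32, 64, 64)).shape)  # (1, 32, 64, 64)
```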

18 pages, 4774 KiB  
Article
InfraredStereo3D: Breaking Night Vision Limits with Perspective Projection Positional Encoding and Groundbreaking Infrared Dataset
by Yuandong Niu, Limin Liu, Fuyu Huang, Juntao Ma, Chaowen Zheng, Yunfeng Jiang, Ting An, Zhongchen Zhao and Shuangyou Chen
Remote Sens. 2025, 17(12), 2035; https://doi.org/10.3390/rs17122035 - 13 Jun 2025
Viewed by 458
Abstract
In fields such as military reconnaissance, forest fire prevention, and autonomous driving at night, there is an urgent need for high-precision three-dimensional reconstruction in low-light or night environments. The acquisition of remote sensing data by RGB cameras relies on external light, resulting in a significant decline in image quality and making it difficult to meet task requirements. Lidar-based methods perform poorly in rainy and foggy weather, in close-range scenes, and in scenarios requiring thermal imaging data. In contrast, infrared cameras can effectively overcome these challenges because their imaging mechanisms differ from those of RGB cameras and lidar. However, research on three-dimensional scene reconstruction from infrared images is relatively immature, especially in the field of infrared binocular stereo matching. This situation presents two main challenges: first, there is no dataset specifically for infrared binocular stereo matching; second, the lack of texture information in infrared images limits the direct extension of RGB-based methods to infrared reconstruction. To solve these problems, this study begins with the construction of an infrared binocular stereo matching dataset and then proposes an innovative perspective projection positional encoding-based transformer method for the infrared binocular stereo matching task. A stereo matching network combining a transformer with a cost volume is constructed. Existing work on transformer positional encoding usually adopts a parallel projection model to simplify the calculation. Our method is instead based on the actual perspective projection model, so that each pixel is associated with a different projection ray. This effectively addresses the feature extraction and matching difficulties caused by insufficient texture information in infrared images and significantly improves matching accuracy. Experiments conducted on the infrared binocular stereo matching dataset proposed in this paper demonstrate the effectiveness of the proposed method. Full article
(This article belongs to the Collection Visible Infrared Imaging Radiometers and Applications)
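The core geometric idea, assigning every pixel the direction of its own projection ray instead of a parallel-projection grid, can be written down in a few lines. The sketch below computes unit ray directions from assumed pinhole intrinsics; how such an encoding is injected into the transformer is not shown here and is specific to the paper.

```python
import torch

def ray_direction_encoding(height, width, fx, fy, cx, cy):
    """Per-pixel unit ray directions for a pinhole camera (perspective model)."""
    v, u = torch.meshgrid(torch.arange(height, dtype=torch.float32),
                          torch.arange(width, dtype=torch.float32), indexing="ij")
    x = (u - cx) / fx                 # back-project pixel coordinates
    y = (v - cy) / fy
    z = torch.ones_like(x)
    rays = torch.stack([x, y, z], dim=0)              # (3, H, W)
    return rays / rays.norm(dim=0, keepdim=True)      # normalise each ray

enc = ray_direction_encoding(480, 640, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(enc.shape)  # torch.Size([3, 480, 640])
```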

18 pages, 8414 KiB  
Article
Fish Body Pattern Style Transfer Based on Wavelet Transformation and Gated Attention
by Hongchun Yuan and Yixuan Wang
Appl. Sci. 2025, 15(9), 5150; https://doi.org/10.3390/app15095150 - 6 May 2025
Viewed by 420
Abstract
To address temporal jitter caused by low segmentation accuracy and the lack of high-precision transformations for specific object classes in video generation, we propose the fish body pattern sync-style network (FSSNet) for ornamental fish videos. This network innovatively integrates dynamic texture transfer with instance segmentation, adopting a two-stage processing architecture. First, high-precision video frame segmentation is performed using Mask2Former to eliminate background elements that do not participate in the style transfer process. Then, we introduce the wavelet-gated styling network, which reconstructs a multi-scale feature space via the discrete wavelet transform, enhancing the granularity of multi-scale style features during the image generation phase. Additionally, we embed a convolutional block attention module within the residual modules, not only improving the realism of the generated images but also effectively reducing boundary artifacts in foreground objects. Furthermore, to mitigate the frame-to-frame jitter commonly observed in generated videos, we incorporate a contrastive coherence preserving loss into the training process of the style transfer network. This enhances the perceptual loss function, thereby preventing video flickering and ensuring improved temporal consistency. In real-world aquarium scenes, compared to state-of-the-art methods, FSSNet effectively preserves localized texture details in generated videos and achieves competitive SSIM and PSNR scores. Moreover, temporal consistency is significantly improved, with the flow warping error index decreasing to 1.412. We use FNST (fast neural style transfer) as our baseline model and demonstrate improvements in both parameter count and runtime efficiency. In a user study, 43.75% of participants preferred the dynamic effects generated by this method. Full article
(This article belongs to the Special Issue Advanced Pattern Recognition & Computer Vision)
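The wavelet step above rests on a standard decomposition. For reference, a single-level Haar transform that splits a feature map into an approximation band and three detail bands (each at half resolution) looks like this; the paper's network presumably builds on such sub-bands, but the usage shown here is only illustrative.

```python
import torch

def haar_dwt(x):
    """Single-level 2D Haar DWT for x of shape (N, C, H, W) with even H and W."""
    a = x[:, :, 0::2, 0::2]
    b = x[:, :, 0::2, 1::2]
    c = x[:, :, 1::2, 0::2]
    d = x[:, :, 1::2, 1::2]
    ll = (a + b + c + d) / 2    # approximation (low-frequency) band
    lh = (a + b - c - d) / 2    # detail band along rows
    hl = (a - b + c - d) / 2    # detail band along columns
    hh = (a - b - c + d) / 2    # diagonal detail band
    return ll, lh, hl, hh

bands = haar_dwt(torch.randn(1, 3, 64, 64))
print([tuple(t.shape) for t in bands])  # four (1, 3, 32, 32) sub-bands
```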

31 pages, 7540 KiB  
Article
Temporal Denoising of Infrared Images via Total Variation and Low-Rank Bidirectional Twisted Tensor Decomposition
by Zhihao Liu, Weiqi Jin and Li Li
Remote Sens. 2025, 17(8), 1343; https://doi.org/10.3390/rs17081343 - 9 Apr 2025
Viewed by 798
Abstract
Temporal random noise (TRN) in uncooled infrared detectors significantly degrades image quality. Existing denoising techniques primarily address fixed-pattern noise (FPN) and do not effectively mitigate TRN. Therefore, a novel TRN denoising approach based on total variation regularization and low-rank tensor decomposition is proposed. This method effectively suppresses temporal noise by introducing twisted tensors in both horizontal and vertical directions while preserving spatial information in diverse orientations to protect image details and textures. Additionally, a Laplacian operator-based bidirectional twisted tensor truncated nuclear norm (bt-LPTNN) is proposed, which automatically assigns weights to different singular values based on their importance. Furthermore, a weighted spatiotemporal total variation regularization method for nonconvex tensor approximation is employed to preserve scene details. To recover spatial domain information lost during tensor estimation, robust principal component analysis is employed, and spatial information is extracted from the noise tensor. The proposed model, bt-LPTVTD, is solved using an augmented Lagrange multiplier algorithm and demonstrates improvements across all evaluation metrics compared to several state-of-the-art algorithms. Extensive experiments conducted on complex scenes underscore the strong adaptability and robustness of our algorithm. Full article
(This article belongs to the Special Issue Recent Advances in Infrared Target Detection)
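The low-rank machinery above ultimately revolves around shrinking singular values. As background, here is the plain singular-value soft-thresholding step (the proximal operator of the nuclear norm) that weighted and truncated variants such as the paper's refine; the example data and threshold are arbitrary and not taken from the paper.

```python
import numpy as np

def svt(matrix, tau):
    """Singular-value soft-thresholding: shrink all singular values by tau."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u @ np.diag(np.maximum(s - tau, 0.0)) @ vt

rng = np.random.default_rng(0)
low_rank = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 50))
noisy = low_rank + 0.3 * rng.standard_normal((50, 50))   # add random noise
denoised = svt(noisy, tau=3.0)
# reconstruction error typically drops once small singular values are suppressed
print(np.linalg.norm(noisy - low_rank), np.linalg.norm(denoised - low_rank))
```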

17 pages, 904 KiB  
Article
Apple Detection via Near-Field MIMO-SAR Imaging: A Multi-Scale and Context-Aware Approach
by Yuanping Shi, Yanheng Ma and Liang Geng
Sensors 2025, 25(5), 1536; https://doi.org/10.3390/s25051536 - 1 Mar 2025
Viewed by 1029
Abstract
Accurate fruit detection is of great importance for yield assessment, timely harvesting, and orchard management strategy optimization in precision agriculture. Traditional optical imaging methods are limited by lighting and meteorological conditions, making it difficult to obtain stable, high-quality data. Therefore, this study utilizes near-field millimeter-wave MIMO-SAR (Multiple Input Multiple Output Synthetic Aperture Radar) technology, which is capable of all-day and all-weather imaging, to perform high-precision detection of apple targets in orchards. This paper first constructs a near-field millimeter-wave MIMO-SAR imaging system and performs multi-angle imaging on real fruit tree samples, obtaining about 150 sets of SAR-optical paired data, covering approximately 2000 accurately annotated apple targets. Addressing challenges such as weak scattering, low texture contrast, and complex backgrounds in SAR images, we propose an innovative detection framework integrating Dynamic Spatial Pyramid Pooling (DSPP), Recursive Feature Fusion Network (RFN), and Context-Aware Feature Enhancement (CAFE) modules. DSPP employs a learnable adaptive mechanism to dynamically adjust multi-scale feature representations, enhancing sensitivity to apple targets of varying sizes and distributions; RFN uses a multi-round iterative feature fusion strategy to gradually refine semantic consistency and stability, improving the robustness of feature representation under weak texture and high noise scenarios; and the CAFE module, based on attention mechanisms, explicitly models global and local associations, fully utilizing the scene context in texture-poor SAR conditions to enhance the discriminability of apple targets. Experimental results show that the proposed method achieves significant improvements in average precision (AP), recall rate, and F1 score on the constructed near-field millimeter-wave SAR apple dataset compared to various classic and mainstream detectors. Ablation studies confirm the synergistic effect of DSPP, RFN, and CAFE. Qualitative analysis demonstrates that the detection framework proposed in this paper can still stably locate apple targets even under conditions of leaf occlusion, complex backgrounds, and weak scattering. This research provides a beneficial reference and technical basis for using SAR data in fruit detection and yield estimation in precision agriculture. Full article
(This article belongs to the Section Smart Agriculture)
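One way to picture a "dynamic" spatial pyramid pooling block is to blend pooled features at several scales with learned weights rather than fixed concatenation, as in the hedged sketch below; the kernel sizes, the softmax weighting, and the names are assumptions, not the paper's DSPP.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicSPP(nn.Module):
    """Blend multi-scale max-pooled features with learnable per-scale weights."""
    def __init__(self, dim: int, kernel_sizes=(5, 9, 13)):
        super().__init__()
        self.kernel_sizes = kernel_sizes
        self.scale_logits = nn.Parameter(torch.zeros(len(kernel_sizes) + 1))
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        pyramid = [x] + [F.max_pool2d(x, k, stride=1, padding=k // 2)
                         for k in self.kernel_sizes]
        weights = torch.softmax(self.scale_logits, dim=0)   # learned scale weights
        fused = sum(w * p for w, p in zip(weights, pyramid))
        return self.proj(fused)

print(DynamicSPP(64)(torch.randn(1, 64, 20, 20)).shape)  # (1, 64, 20, 20)
```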

27 pages, 9095 KiB  
Article
BMFusion: Bridging the Gap Between Dark and Bright in Infrared-Visible Imaging Fusion
by Chengwen Liu, Bin Liao and Zhuoyue Chang
Electronics 2024, 13(24), 5005; https://doi.org/10.3390/electronics13245005 - 19 Dec 2024
Viewed by 1121
Abstract
The fusion of infrared and visible light images is a crucial technology for enhancing visual perception in complex environments. It plays a pivotal role in improving visual perception and subsequent performance in advanced visual tasks. However, due to the significant degradation of visible light image quality in low-light or nighttime scenes, most existing fusion methods often struggle to obtain sufficient texture details and salient features when processing such scenes. This can lead to a decrease in fusion quality. To address this issue, this article proposes a new image fusion method called BMFusion. Its aim is to significantly improve the quality of fused images in low-light or nighttime scenes and generate high-quality fused images around the clock. This article first designs a brightness attention module composed of brightness attention units. It extracts multimodal features by combining the SimAm attention mechanism with a Transformer architecture. Effective enhancement of brightness and features has been achieved, with gradual brightness attention performed during feature extraction. Secondly, a complementary fusion module was designed. This module deeply fuses infrared and visible light features to ensure the complementarity and enhancement of each modal feature during the fusion process, minimizing information loss to the greatest extent possible. In addition, a feature reconstruction network combining CLIP-guided semantic vectors and neighborhood attention enhancement was proposed in the feature reconstruction stage. It uses the KAN module to perform channel adaptive optimization on the reconstruction process, ensuring semantic consistency and detail integrity of the fused image during the reconstruction phase. The experimental results on a large number of public datasets demonstrate that the BMFusion method can generate fusion images with higher visual quality and richer details in night and low-light environments compared with various existing state-of-the-art (SOTA) algorithms. At the same time, the fusion image can significantly improve the performance of advanced visual tasks. This shows the great potential and application prospect of this method in the field of multimodal image fusion. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)
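Since the brightness attention units build on SimAM, the commonly published parameter-free SimAM formulation is reproduced below for reference: a per-neuron energy term is turned into a sigmoid gate. This is the generic operator, not the paper's full brightness attention module.

```python
import torch

def simam(x, eps: float = 1e-4):
    """Parameter-free SimAM attention for x of shape (N, C, H, W)."""
    n = x.shape[2] * x.shape[3] - 1
    d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
    v = d.sum(dim=(2, 3), keepdim=True) / n          # per-channel spatial variance
    e_inv = d / (4 * (v + eps)) + 0.5                # inverse energy per neuron
    return x * torch.sigmoid(e_inv)

print(simam(torch.randn(1, 8, 16, 16)).shape)  # torch.Size([1, 8, 16, 16])
```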

18 pages, 5411 KiB  
Article
Leveraging Neural Radiance Fields for Large-Scale 3D Reconstruction from Aerial Imagery
by Max Hermann, Hyovin Kwak, Boitumelo Ruf and Martin Weinmann
Remote Sens. 2024, 16(24), 4655; https://doi.org/10.3390/rs16244655 - 12 Dec 2024
Viewed by 2509
Abstract
Since conventional photogrammetric approaches struggle with low-texture, reflective, and transparent regions, this study explores the application of Neural Radiance Fields (NeRFs) for large-scale 3D reconstruction of outdoor scenes, an area in which NeRF-based methods have recently shown very impressive results. We evaluate three approaches: Mega-NeRF, Block-NeRF, and Direct Voxel Grid Optimization, focusing on their accuracy and completeness compared to ground truth point clouds. In addition, we analyze the effects of using multiple sub-modules, estimating the visibility by an additional neural network, and varying the density threshold for the extraction of the point cloud. For performance evaluation, we use benchmark datasets that correspond to the setting of standard flight campaigns and therefore typically have a nadir camera perspective and relatively little image overlap, which can be challenging for NeRF-based approaches that are typically trained with significantly more images and varying camera angles. We show that despite lower quality compared to classic photogrammetric approaches, NeRF-based reconstructions provide visually convincing results in challenging areas. Furthermore, our study shows that increasing the number of sub-modules and predicting the visibility using an additional neural network, in particular, significantly improve the quality of the resulting reconstructions. Full article
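The "density threshold for the extraction of the point cloud" mentioned above amounts to keeping only sufficiently dense samples of the trained field. A minimal sketch, assuming a density grid has already been sampled from the NeRF, is shown below; the grid, threshold, and voxel size are placeholders.

```python
import numpy as np

def extract_point_cloud(density, origin, voxel_size, threshold):
    """Return voxel-centre coordinates where the sampled density exceeds a threshold."""
    idx = np.argwhere(density > threshold)            # indices of occupied voxels
    return origin + (idx + 0.5) * voxel_size          # convert to world coordinates

rng = np.random.default_rng(0)
density = rng.random((32, 32, 32)) * 50.0             # stand-in for a sampled NeRF grid
points = extract_point_cloud(density, origin=np.zeros(3), voxel_size=0.25, threshold=45.0)
print(points.shape)                                   # (number_of_points, 3)
```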
