Search Results (27)

Search Parameters:
Keywords = foreground occlusion

24 pages, 3235 KiB  
Article
A Cost-Sensitive Small Vessel Detection Method for Maritime Remote Sensing Imagery
by Zhuhua Hu, Wei Wu, Ziqi Yang, Yaochi Zhao, Lewei Xu, Lingkai Kong, Yunpei Chen, Lihang Chen and Gaosheng Liu
Remote Sens. 2025, 17(14), 2471; https://doi.org/10.3390/rs17142471 - 16 Jul 2025
Viewed by 230
Abstract
Vessel detection technology based on marine remote sensing imagery is of great importance. However, it often faces challenges, such as small vessel targets, cloud occlusion, insufficient data volume, and severely imbalanced class distribution in datasets. These issues result in conventional models failing to meet the accuracy requirements for practical applications. In this paper, we first construct a novel remote sensing vessel image dataset that includes various complex scenarios and enhance the data volume and diversity through data augmentation techniques. Secondly, we address the class imbalance between foreground (small vessels) and background in remote sensing imagery from two perspectives: the sensitivity of IoU metrics to small object localization errors and the innovative design of a cost-sensitive loss function. Specifically, at the dataset level, we select vessel targets appearing in the original dataset as templates and randomly copy–paste several instances onto arbitrary positions. This enriches the diversity of target samples per image and mitigates the impact of data imbalance on the detection task. At the algorithm level, we introduce the Normalized Wasserstein Distance (NWD) to compute the similarity between bounding boxes. This enhances the importance of small target information during training and strengthens the model’s cost-sensitive learning capabilities. Ablation studies reveal that detection performance is optimal when the weight assigned to the NWD metric in the model’s loss function matches the overall proportion of small objects in the dataset. Comparative experiments show that the proposed NWD-YOLO achieves Precision, Recall, and AP50 scores of 0.967, 0.958, and 0.971, respectively, meeting the accuracy requirements of real-world applications. Full article
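The NWD metric mentioned above has a standard closed form: each box is modeled as a 2D Gaussian and the 2-Wasserstein distance between the two Gaussians is turned into a similarity with an exponential. A minimal Python sketch — the normalizing constant `c` is dataset-dependent (12.8 is a value used in the original NWD work, not necessarily the one used here):

```python
import math

def nwd(box1, box2, c=12.8):
    """Normalized Wasserstein Distance between two boxes (cx, cy, w, h).

    Each box is modeled as a 2D Gaussian N([cx, cy], diag(w^2/4, h^2/4));
    the squared 2-Wasserstein distance between two such Gaussians has the
    closed form below. `c` is a dataset-dependent normalizing constant.
    """
    cx1, cy1, w1, h1 = box1
    cx2, cy2, w2, h2 = box2
    w2_sq = ((cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
             + ((w1 - w2) / 2) ** 2 + ((h1 - h2) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)
```

Unlike IoU, this similarity stays informative for tiny boxes that do not overlap at all, which is why it helps small-object training.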

17 pages, 12088 KiB  
Article
Edge-Guided DETR Model for Intelligent Sensing of Tomato Ripeness Under Complex Environments
by Jiamin Yao, Jianxuan Zhou, Yangang Nie, Jun Xue, Kai Lin and Liwen Tan
Mathematics 2025, 13(13), 2095; https://doi.org/10.3390/math13132095 - 26 Jun 2025
Viewed by 464
Abstract
Tomato ripeness detection in open-field environments is challenged by dense planting, heavy occlusion, and complex lighting conditions. Existing methods mainly rely on color and texture cues, limiting boundary perception and causing redundant predictions in crowded scenes. To address these issues, we propose an improved detection framework called Edge-Guided DETR (EG-DETR), based on the DEtection TRansformer (DETR). EG-DETR introduces edge prior information by extracting multi-scale edge features through an edge backbone network. These features are fused in the transformer decoder to guide queries toward foreground regions, which improves detection under occlusion. We further design a redundant box suppression strategy to reduce duplicate predictions caused by clustered fruits. We evaluated our method on a multimodal tomato dataset that included varied lighting conditions such as natural light, artificial light, low light, and sodium yellow light. Our experimental results show that EG-DETR achieves an AP of 83.7% under challenging lighting and occlusion, outperforming existing models. This work provides a reliable intelligent sensing solution for automated harvesting in smart agriculture. Full article
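The redundant box suppression strategy is the paper's own; as a point of reference for what it improves on, classic greedy non-maximum suppression looks roughly like this (illustrative baseline only — EG-DETR's variant for clustered fruits differs):

```python
def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression over (x1, y1, x2, y2) boxes.

    Repeatedly keeps the highest-scoring box and discards every remaining
    box that overlaps it by at least `iou_thresh`.
    """
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        union = area(a) + area(b) - inter
        return inter / union if union > 0 else 0.0

    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thresh]
    return keep
```

In dense scenes, greedy NMS is exactly where duplicates slip through or true neighbors get suppressed, which motivates a learned suppression strategy.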

22 pages, 7106 KiB  
Article
Enhancing Highway Scene Understanding: A Novel Data Augmentation Approach for Vehicle-Mounted LiDAR Point Cloud Segmentation
by Dalong Zhou, Yuanyang Yi, Yu Wang, Zhenfeng Shao, Yanjun Hao, Yuyan Yan, Xiaojin Zhao and Junkai Guo
Remote Sens. 2025, 17(13), 2147; https://doi.org/10.3390/rs17132147 - 23 Jun 2025
Viewed by 384
Abstract
The intelligent extraction of highway assets is pivotal for advancing transportation infrastructure and autonomous systems, yet traditional methods relying on manual inspection or 2D imaging struggle with sparse, occluded environments and class imbalance. This study proposes an enhanced MinkUNet-based framework to address data scarcity, occlusion, and imbalance in highway point cloud segmentation. A large-scale dataset (PEA-PC Dataset) covering six key asset categories was constructed, addressing the lack of specialized highway datasets. A hybrid conical masking augmentation strategy was designed to simulate natural occlusions and enhance local feature retention, while semi-supervised learning prioritizes foreground differentiation. The experimental results showed that the overall mIoU reached 73.8%, with the IoU of bridge railings and emergency obstacles exceeding 95%. The IoU of columnar assets increased from 2.6% to 29.4% through occlusion-perception enhancement, demonstrating the effectiveness of this method in improving object recognition accuracy. The framework balances computational efficiency and robustness, offering a scalable solution for sparse highway scenes. However, challenges remain in segmenting vegetation-occluded pole-like assets due to partial data loss. This work highlights the efficacy of tailored augmentation and semi-supervised strategies in refining 3D segmentation, advancing applications in intelligent transportation and digital infrastructure. Full article
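The conical-masking idea can be illustrated by dropping points that fall inside a vertical cone, mimicking an object blocking the sensor's line of sight. The abstract does not specify the exact masking geometry, so the cone axis, apex, and opening angle below are assumptions for illustration:

```python
import math

def conical_mask(points, apex, half_angle_deg=15.0):
    """Drop points inside a downward-opening cone to simulate an occlusion.

    `points` is a list of (x, y, z); the cone opens downward from `apex`
    (ax, ay, az). A point is inside when the angle between the downward
    axis (0, 0, -1) and the apex-to-point vector is below `half_angle_deg`.
    Geometry is illustrative; the paper's hybrid strategy may differ.
    """
    ax, ay, az = apex
    cos_t = math.cos(math.radians(half_angle_deg))
    kept = []
    for (x, y, z) in points:
        dx, dy, dz = x - ax, y - ay, z - az
        norm = math.sqrt(dx * dx + dy * dy + dz * dz)
        if norm == 0:
            continue  # the apex itself counts as masked
        if -dz / norm >= cos_t:
            continue  # inside the cone -> masked out
        kept.append((x, y, z))
    return kept
```

Training on clouds with such synthetic holes encourages the network to recognize assets from partial local evidence, which is the point of the augmentation.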

18 pages, 5033 KiB  
Article
Research on Multi-Target Detection and Tracking of Intelligent Vehicles in Complex Traffic Environments Based on Deep Learning Theory
by Xuewen Chen, Shilong Yan and Chenxi Xia
World Electr. Veh. J. 2025, 16(6), 325; https://doi.org/10.3390/wevj16060325 - 11 Jun 2025
Viewed by 1062
Abstract
To address missed and false detections of small targets caused by dense occlusion in complex traffic environments, a non-maximum suppression method, Bot-NMS, is proposed to achieve accurate prediction and localization of occluded targets. In the backbone network of YOLOv7, the Ghost module, the ECA attention mechanism, and a multi-scale feature detection structure are introduced to enhance the network's capacity to learn small-target features. The SCSTD and KITTI datasets were used to train and test the improved YOLOv7 target detection network. The results demonstrate that the improved YOLOv7 method significantly enhances the recall rate and detection accuracy for various targets. A multi-target tracking method based on target re-identification (ReID) is then proposed. Utilizing deep learning theory, a ReID model is constructed to comprehensively capture global and foreground target features. By building an association cost matrix from the cosine distance and IoU overlap, detections are associated with tracking trajectories according to ReID feature similarity. The VERI-776 vehicle re-identification dataset and the MARKET1501 pedestrian re-identification dataset were used to train the proposed ReID model, and multi-target tracking comparison experiments were conducted on the MOT16 dataset. The results show that the multi-target tracking method, by introducing the ReID model and improving the cost matrix, better handles dense occlusion and can effectively and accurately track road targets in realistic, complex traffic environments. Full article
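The fused association cost described above — appearance similarity from ReID features plus spatial overlap from IoU — can be sketched as a weighted sum. The weight `lam` and the exact fusion rule are assumptions for illustration; the paper's cost matrix construction may differ:

```python
def iou(a, b):
    """IoU of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def cosine_dist(u, v):
    """1 - cosine similarity between two ReID feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = sum(x * x for x in u) ** 0.5
    nv = sum(y * y for y in v) ** 0.5
    return 1.0 - dot / (nu * nv) if nu > 0 and nv > 0 else 1.0

def association_cost(tracks, dets, lam=0.5):
    """Cost matrix blending appearance (cosine) and motion (1 - IoU).

    `tracks` and `dets` are lists of (box, feature) pairs; a low entry
    means a likely track-detection match (e.g. for Hungarian assignment).
    """
    return [[lam * cosine_dist(tf, df) + (1 - lam) * (1.0 - iou(tb, db))
             for (db, df) in dets]
            for (tb, tf) in tracks]
```

Feeding this matrix to a linear assignment solver yields the track-to-detection matching; the appearance term is what keeps identities stable through occlusions where IoU alone fails.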
(This article belongs to the Special Issue Recent Advances in Intelligent Vehicle)

21 pages, 57861 KiB  
Article
Automatic Apple Detection and Counting with AD-YOLO and MR-SORT
by Xueliang Yang, Yapeng Gao, Mengyu Yin and Haifang Li
Sensors 2024, 24(21), 7012; https://doi.org/10.3390/s24217012 - 31 Oct 2024
Cited by 2 | Viewed by 2395
Abstract
In the production management of agriculture, accurate fruit counting plays a vital role in orchard yield estimation and appropriate production decisions. Although recent tracking-by-detection algorithms have emerged as a promising fruit-counting approach, they still cannot completely avoid fruit occlusion and light variations in complex orchard environments, making automatic and accurate apple counting difficult. In this paper, a video-based multiple-object tracking method, MR-SORT (Multiple Rematching SORT), is proposed based on improved YOLOv8 and BoT-SORT. First, we propose the AD-YOLO model, which aims to reduce the number of incorrect detections during object tracking. In the YOLOv8s backbone network, an Omni-dimensional Dynamic Convolution (ODConv) module is used to better extract local feature information; a Global Attention Mechanism (GAM) is introduced to improve the detection of the foreground object (apples) in the whole image; and a Soft Spatial Pyramid Pooling Layer (SSPPL) is designed to reduce feature dispersion and increase the receptive field of the network. Then, an improved BoT-SORT algorithm is proposed that fuses a verification mechanism, SURF feature descriptors, and the Vector of Locally Aggregated Descriptors (VLAD) algorithm, matching apples more accurately across adjacent video frames and reducing the probability of ID switching during tracking. The results show that the mAP of the proposed AD-YOLO model is 3.1% higher than that of YOLOv8, reaching 96.4%. The improved tracking algorithm produces 297 fewer ID switches, 35.6% less than the original algorithm. Its multiple-object tracking accuracy reached 85.6%, and the average counting error was reduced to 0.07. The coefficient of determination R2 between the ground truth and the predicted value reached 0.98. These metrics show that our method gives more accurate counting results for apples and potentially other types of fruit. Full article

20 pages, 39702 KiB  
Article
Spatial Information Enhancement with Multi-Scale Feature Aggregation for Long-Range Object and Small Reflective Area Object Detection from Point Cloud
by Hanwen Li, Huamin Tao, Qiuqun Deng, Shanzhu Xiao and Jianxiong Zhou
Remote Sens. 2024, 16(14), 2631; https://doi.org/10.3390/rs16142631 - 18 Jul 2024
Cited by 1 | Viewed by 1131
Abstract
Accurate and comprehensive 3D object detection is important for perception systems in autonomous driving. Nevertheless, contemporary mainstream methods tend to perform more effectively on large objects in regions close to the LiDAR, leaving long-range objects and small objects underexplored. The divergent point pattern of LiDAR, in which point density falls as distance increases, leads to a non-uniform point distribution that is ill-suited to discretized volumetric feature extraction. To address this challenge, we propose the Foreground Voxel Proposal (FVP) module, which effectively locates and generates voxels at the foreground of objects. The outputs are subsequently merged to mitigate the variation in point cloud density and complete the object shape. Furthermore, the susceptibility of small objects to occlusion results in a loss of feature space. To overcome this, we propose the Multi-Scale Feature Integration Network (MsFIN), which captures contextual information at different ranges. The outputs of these features are then integrated through a transformer-based cascade framework to supplement the object feature space. Extensive experimental results demonstrate that our network achieves remarkable results: our approach improves AP by 8.56% over the SECOND baseline on the Car detection task at distances beyond 20 m, and by 9.38% on the Cyclist detection task. Full article

16 pages, 6425 KiB  
Article
A Robust AR-DSNet Tracking Registration Method in Complex Scenarios
by Xiaomei Lei, Wenhuan Lu, Jiu Yong and Jianguo Wei
Electronics 2024, 13(14), 2807; https://doi.org/10.3390/electronics13142807 - 17 Jul 2024
Viewed by 1053
Abstract
A robust AR-DSNet (Augmented Reality method based on DSST and SiamFC networks) tracking registration method for complex scenarios is proposed to improve the ability of AR (Augmented Reality) tracking registration to distinguish the target foreground from semantically interfering background, and to address registration failures caused by drift toward similar targets when scale information is obtained from predicted target positions. First, the pre-trained network in SiamFC (Siamese Fully-Convolutional) is utilized to obtain the response map of a larger search area, and a threshold is set to filter out initial candidate positions of the target. Then, exploiting the DSST (Discriminative Scale Space Tracking) filter tracker's ability to update the template online, a new scale filter is trained from multi-scale images collected at the candidate positions to infer the target's scale change, and linear interpolation is used to update the correlation coefficient, determining the final tracking position from the difference between two frames. Finally, ORB (Oriented FAST and Rotated BRIEF) feature detection and matching are performed on the accurately localized target image, and the registration matrix is computed from the matching relationships to overlay the virtual model onto the real scene, achieving augmentation of the real world. Simulation experiments show that in complex scenarios such as similar-target interference, target occlusion, and local deformation, the proposed AR-DSNet method completes target registration for AR 3D tracking, ensuring real-time performance while improving the robustness of the AR tracking registration algorithm. Full article

14 pages, 5273 KiB  
Article
Mask Mixup Model: Enhanced Contrastive Learning for Few-Shot Learning
by Kai Xie, Yuxuan Gao, Yadang Chen and Xun Che
Appl. Sci. 2024, 14(14), 6063; https://doi.org/10.3390/app14146063 - 11 Jul 2024
Cited by 1 | Viewed by 1465
Abstract
Few-shot image classification aims to improve the performance of traditional image classification when faced with limited data. Its main challenge lies in effectively utilizing sparse labeled samples to accurately predict the true feature distribution. Recent approaches have employed data augmentation techniques such as random masking or mixture interpolation to enhance the diversity and generalization of labeled samples. However, these methods still encounter several issues: (1) random masking can completely block or expose the foreground, causing the loss of crucial sample information; and (2) the uniform data distribution after mixture interpolation makes it difficult for the model to differentiate between categories and effectively distinguish their boundaries. To address these challenges, this paper introduces a novel data augmentation method based on saliency mask blending. First, it selectively preserves key image features through adaptive selection and retention, using visual feature occlusion fusion and confidence clipping strategies. Second, a visual feature saliency fusion approach computes the importance of various image regions, guiding the blending process to produce more diverse and enriched images with clearer category boundaries. The proposed method achieves outstanding performance on multiple standard few-shot image classification datasets (miniImageNet, tieredImageNet, Few-shot FC100, and CUB), surpassing state-of-the-art methods by approximately 0.2–1%. Full article
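The general shape of saliency-mask blending can be sketched as a mask-weighted mixup in which the mixed label weight follows the mask's foreground fraction. This is a simplified sketch; the paper's adaptive selection, occlusion fusion, and confidence clipping are richer than this:

```python
def saliency_blend(img_a, img_b, mask):
    """Blend two images with a saliency mask (all three are HxW grids).

    `mask` holds values in [0, 1]: 1 keeps img_a's (salient) pixel, 0 takes
    img_b's. The returned label weight for img_a is the mean mask value,
    so the mixed label reflects how much of each image survives.
    """
    h, w = len(img_a), len(img_a[0])
    out = [[mask[i][j] * img_a[i][j] + (1 - mask[i][j]) * img_b[i][j]
            for j in range(w)] for i in range(h)]
    lam = sum(sum(row) for row in mask) / (h * w)  # label weight for img_a
    return out, lam
```

Because the mask tracks saliency rather than a random rectangle, the foreground is neither fully occluded nor fully exposed — addressing issue (1) above — while the area-derived label weight keeps category boundaries sharper than uniform interpolation.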

31 pages, 647 KiB  
Review
Retinal Vein Occlusion–Background Knowledge and Foreground Knowledge Prospects—A Review
by Maja Lendzioszek, Anna Bryl, Ewa Poppe, Katarzyna Zorena and Malgorzata Mrugacz
J. Clin. Med. 2024, 13(13), 3950; https://doi.org/10.3390/jcm13133950 - 5 Jul 2024
Cited by 8 | Viewed by 3350
Abstract
Thrombosis of retinal veins is one of the most common retinal vascular diseases that may lead to vascular blindness. The latest epidemiological data leave no illusions that the burden on the healthcare system, as impacted by patients with this diagnosis, will increase worldwide. This obliges scientists to search for new therapeutic and diagnostic options. In the 21st century, there has been tremendous progress in retinal imaging techniques, which has facilitated a better understanding of the mechanisms related to the development of retinal vein occlusion (RVO) and its complications, and consequently has enabled the introduction of new treatment methods. Moreover, artificial intelligence (AI) is likely to assist in selecting the best treatment option for patients in the near future. The aim of this comprehensive review is to re-evaluate the old but still relevant data on the RVO and confront them with new studies. The paper will provide a detailed overview of diagnosis, current treatment, prevention, and future therapeutic possibilities regarding RVO, as well as clarifying the mechanism of macular edema in this disease entity. Full article
(This article belongs to the Special Issue New Clinical Treatment for Ocular Vascular Disease and Fundus Disease)
21 pages, 9226 KiB  
Article
Moving Object Detection in Freely Moving Camera via Global Motion Compensation and Local Spatial Information Fusion
by Zhongyu Chen, Rong Zhao, Xindong Guo, Jianbin Xie and Xie Han
Sensors 2024, 24(9), 2859; https://doi.org/10.3390/s24092859 - 30 Apr 2024
Cited by 2 | Viewed by 4119
Abstract
Moving object detection (MOD) with freely moving cameras is a challenging task in computer vision. To extract moving objects, most studies have focused on the difference in motion features between foreground and background, which works well for dynamic scenes with relatively regular movements and variations. However, abrupt illumination changes and occlusions often occur in real-world scenes, and the camera may also pan, tilt, rotate, or jitter, resulting in local irregular variations and global discontinuities in motion features. Such complex and changing scenes make detecting moving objects very difficult. To solve this problem, this paper proposes a new MOD method that effectively leverages local and global visual information for foreground/background segmentation. Specifically, on the global side, to support a wider range of camera motion, the relative inter-frame transformations are optimized into absolute transformations referenced to intermediate frames after enriching the inter-frame matching pairs; the global transformation is then fine-tuned using a spatial transformer network (STN). On the local side, to handle dynamic background scenes, foreground object detection is optimized by utilizing the pixel differences between the current frame and a local background model, as well as the consistency of local spatial variations. The spatial information is then combined with optical flow segmentation, enhancing the precision of the object information. The experimental results show that our method improves detection accuracy by over 1.5% compared with state-of-the-art methods on the CDNET2014, FBMS-59, and CBD datasets. It is particularly effective in challenging scenarios such as shadows, abrupt illumination changes, camera jitter, occlusion, and moving backgrounds. Full article
(This article belongs to the Section Sensing and Imaging)

14 pages, 2317 KiB  
Article
Enhanced U-Net with GridMask (EUGNet): A Novel Approach for Robotic Surgical Tool Segmentation
by Mostafa Daneshgar Rahbar and Seyed Ziae Mousavi Mojab
J. Imaging 2023, 9(12), 282; https://doi.org/10.3390/jimaging9120282 - 18 Dec 2023
Cited by 2 | Viewed by 2587
Abstract
This study introduces EUGNet, an enhanced U-Net that incorporates GridMask image augmentation, a pixel-manipulation technique, to address U-Net's limitations. EUGNet features a deep contextual encoder, residual connections, a class-balancing loss, adaptive feature fusion, a GridMask augmentation module, an efficient implementation, and multi-modal fusion. These innovations enhance segmentation accuracy and robustness, making it well suited for medical image analysis. The GridMask algorithm is detailed, demonstrating its distinct approach to pixel elimination, which makes the model more adaptable to occlusions and local features. A comprehensive dataset of robotic surgical scenarios and instruments is used for evaluation, showcasing the framework's robustness. Specifically, there are improvements of 1.6 percentage points in balanced accuracy for the foreground, 1.7 points in intersection over union (IoU), and 1.7 points in mean Dice similarity coefficient (DSC). Inference speed, a critical factor in real-time applications, also improved: inference time decreased from 0.163 milliseconds for the U-Net without GridMask to 0.097 milliseconds for the U-Net with GridMask. Full article
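GridMask itself is a published augmentation: square blocks are zeroed out on a regular grid, so information is removed uniformly across the image rather than in one large patch. A minimal sketch (the original method also randomizes grid offsets and rotation, omitted here for brevity):

```python
def grid_mask(img, d=4, ratio=0.5):
    """Apply a GridMask-style pattern: zero square blocks tiled with period `d`.

    `img` is an HxW grid (list of lists). Within each d x d cell, a block of
    side int(d * ratio) anchored at the cell origin is set to 0. The input
    is not modified; a masked copy is returned.
    """
    size = int(d * ratio)
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(h):
        for j in range(w):
            if i % d < size and j % d < size:
                out[i][j] = 0
    return out
```

Because every object is only ever partially hidden, the network must learn to segment tools from fragmentary evidence — the same robustness to occlusion the abstract reports.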

21 pages, 12605 KiB  
Article
Semi-RainGAN: A Semisupervised Coarse-to-Fine Guided Generative Adversarial Network for Mixture of Rain Removal
by Rongwei Yu, Ni Shu, Peihao Zhang and Yizhan Li
Symmetry 2023, 15(10), 1832; https://doi.org/10.3390/sym15101832 - 27 Sep 2023
Viewed by 1398
Abstract
Rain removal for images taken in various real-world scenarios pursues the symmetrical goal of simultaneously removing foreground rain-induced occlusions and restoring the background details. This inspires us to recall the principle of symmetry: real-world rain is a mixture of rain streaks and rainy haze, and it degrades the visual quality of the background. Current efforts formulate image rain streak removal and rainy haze removal as separate models, which disrupts the symmetrical characteristics of real-world rain and background, leading to significant performance degradation. To achieve this symmetrical balance, we propose a novel semisupervised coarse-to-fine guided generative adversarial network (Semi-RainGAN) for mixture-of-rain removal. Beyond existing wisdom, Semi-RainGAN jointly learns the mixture of rain removal together with attention and depth estimation. It additionally introduces a coarse-to-fine guidance mechanism that effectively fuses estimated image, attention, and depth features, enabling symmetrically high-quality rain removal while preserving fine-grained details. To bridge the gap between synthetic and real-world rain, Semi-RainGAN makes full use of unpaired real-world rainy and clean images, enhancing its generalization to real-world scenarios. Extensive experiments on both synthetic and real-world rain datasets demonstrate clear visual and numerical improvements of Semi-RainGAN over sixteen state-of-the-art models. Full article
(This article belongs to the Section Computer)

19 pages, 5666 KiB  
Article
A Novel Moving Object Detection Algorithm Based on Robust Image Feature Threshold Segmentation with Improved Optical Flow Estimation
by Jing Ding, Zhen Zhang, Xuexiang Yu, Xingwang Zhao and Zhigang Yan
Appl. Sci. 2023, 13(8), 4854; https://doi.org/10.3390/app13084854 - 12 Apr 2023
Cited by 8 | Viewed by 2911
Abstract
The detection of moving objects in images is a crucial research objective; however, several challenges exist in its execution, such as low accuracy, fixed or moving backgrounds, 'ghost' artifacts, and warping, and the majority of approaches operate only with a fixed camera. This study proposes a robust feature-threshold moving object identification and segmentation method with enhanced optical flow estimation to overcome these challenges. Unlike most optical flow Otsu-segmentation approaches for fixed cameras, a background feature threshold segmentation technique based on a combination of the Horn–Schunck (HS) and Lucas–Kanade (LK) optical flow methods is presented in this paper to obtain the segmentation of moving objects. First, the HS and LK optical flows are integrated with an image pyramid to establish a high-precision, interference-resistant optical flow estimation equation. Next, Delaunay triangulation is used to solve the motion occlusion problem. Finally, the proposed robust feature threshold segmentation method is applied to the optical flow field to extract the moving object, using Harris features and an affine transformation model of the image background. The technique uses morphological image processing to create the final moving-target foreground area. Experimental results verified that this method successfully detected and segmented objects with high accuracy whether the camera was fixed or moving. Full article
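As a point of comparison for the feature-threshold step, the fixed-camera baseline the paper improves on — Otsu thresholding over optical flow magnitudes — can be sketched as follows (a generic Otsu implementation, not the paper's segmentation rule):

```python
def otsu_threshold(values, bins=256):
    """Otsu's threshold over a list of scalars (e.g. flow magnitudes).

    Builds a histogram, then returns the bin edge that maximizes the
    between-class variance; values above it would be labeled foreground.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # degenerate case: all values identical
    hist = [0] * bins
    for v in values:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        hist[idx] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_bg = sum_bg = 0
    best, best_t = -1.0, 0
    for t in range(bins):
        w_bg += hist[t]
        if w_bg == 0 or w_bg == total:
            continue
        sum_bg += t * hist[t]
        m_bg = sum_bg / w_bg                       # background mean
        m_fg = (sum_all - sum_bg) / (total - w_bg)  # foreground mean
        var = w_bg * (total - w_bg) * (m_bg - m_fg) ** 2
        if var > best:
            best, best_t = var, t
    return lo + (best_t + 1) / bins * (hi - lo)
```

With a moving camera, background flow is nonzero and a single global threshold fails — which is why the paper replaces this step with feature thresholds after compensating background motion with an affine model.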
(This article belongs to the Section Computing and Artificial Intelligence)

21 pages, 4801 KiB  
Article
Thermal Infrared Tracking Method Based on Efficient Global Information Perception
by Long Zhao, Xiaoye Liu, Honge Ren and Lingjixuan Xue
Sensors 2022, 22(19), 7408; https://doi.org/10.3390/s22197408 - 29 Sep 2022
Cited by 3 | Viewed by 1965
Abstract
To address the limited ability of current Thermal InfraRed (TIR) tracking methods to resist occlusion and interference from similar targets, we propose a TIR tracking method based on efficient global information perception. To efficiently obtain the global semantic information of images, we use a Transformer structure for feature extraction and fusion. In the feature extraction process, the Focal Transformer structure is used to improve the efficiency of long-range information modeling, closely resembling the human attention mechanism. The feature fusion process adds relative position encoding to the standard Transformer structure, which allows the model to continuously account for positional relationships during learning and to generalize to the different positional information of different input sequences, so that the Transformer models the semantic information contained in images more efficiently. To further improve tracking accuracy and robustness, a heterogeneous bi-prediction head is utilized in the object prediction process: a fully connected sub-network is responsible for foreground/background classification, while a convolutional sub-network is responsible for regression of the object bounding box. To alleviate the contradiction between the Transformer model's vast demand for training data and the insufficient scale of TIR tracking datasets, the LaSOT-TIR dataset is generated with a generative adversarial network for network training. Our method achieves the best performance compared with other state-of-the-art trackers on the VOT2015-TIR, VOT2017-TIR, PTB-TIR and LSOTB-TIR datasets, and performs especially well under severe occlusion or interference from similar objects. Full article
(This article belongs to the Special Issue Sensing and Processing for Infrared Vision: Methods and Applications)

12 pages, 1713 KiB  
Article
Scattering-Assisted Computational Imaging
by Yiwei Sun, Xiaoyan Wu, Jianhong Shi and Guihua Zeng
Photonics 2022, 9(8), 512; https://doi.org/10.3390/photonics9080512 - 23 Jul 2022
Cited by 3 | Viewed by 2173
Abstract
Imaging objects hidden behind an opaque shelter provides a crucial advantage when physically going around the obstacle is impossible or dangerous. Previous methods have demonstrated that it is possible to reconstruct the image of a target hidden from view. However, these methods rely on light reflected from a wall, which may not be available in the wild. Compared with a wall, a "plug and play" scattering medium, such as smog or fog, is more readily accessible, whether natural or artificial. Here, we introduce a scattering-assisted technique that requires only a remarkably small block of single-shot speckle to perform transmission imaging around line-of-sight barriers. With the help of extra inserted scattering layers and a deep learning algorithm, the target hidden from view can be stably recovered even when the directly uncovered view is reduced to 0.097% of the whole field of view, successfully removing the influence of large foreground occlusions. This scattering-assisted computational imaging has wide potential applications in real-life scenarios, such as covert imaging, rescue missions, and detecting hidden adversaries in real time. Full article
