Search Results (51)

Search Parameters:
Keywords = occluded human images

15 pages, 1662 KiB  
Article
YOLO-HVS: Infrared Small Target Detection Inspired by the Human Visual System
by Xiaoge Wang, Yunlong Sheng, Qun Hao, Haiyuan Hou and Suzhen Nie
Biomimetics 2025, 10(7), 451; https://doi.org/10.3390/biomimetics10070451 - 8 Jul 2025
Viewed by 423
Abstract
To address the challenges of background interference and limited multi-scale feature extraction in infrared small target detection, this paper proposes YOLO-HVS, a detection algorithm inspired by the human visual system. Based on YOLOv8, we design a multi-scale spatially enhanced attention module (MultiSEAM) that uses multi-branch depthwise-separable convolution to suppress background noise and enhance occluded targets, integrating local details with global context. In addition, the C2f_DWR (dilation-wise residual) module, with a regional-semantic dual residual structure, significantly improves the efficiency of capturing multi-scale contextual information through dilated convolution and a two-step feature extraction mechanism. We construct the DroneRoadVehicles dataset, containing 1028 infrared images captured at 70–300 m and covering complex occlusions and multi-scale targets. Experiments show that YOLO-HVS achieves an mAP50 of 83.4% on the public DroneVehicle dataset and 97.8% on the self-built dataset, improvements of 1.1% and 0.7% over the baseline YOLOv8, while the parameter count grows by only 2.3 M and the GFLOPs increase is held to 0.1 G. The experimental results demonstrate that the proposed approach is more robust to severe occlusion and low-SNR conditions while enabling efficient real-time infrared small target detection.
(This article belongs to the Special Issue Advanced Biologically Inspired Vision and Its Application)
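As background for the abstract above, here is a minimal PyTorch sketch of a multi-branch depthwise-separable attention block in the spirit of MultiSEAM; the branch count, kernel sizes, and gating rule are assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class MultiBranchSEAttention(nn.Module):
    """Toy multi-branch depthwise-separable attention block.

    Inspired by the MultiSEAM idea described in the abstract; the exact
    branch count, kernel sizes, and fusion rule here are assumptions.
    """
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                # depthwise conv captures spatial context per channel
                nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels),
                # pointwise conv mixes channels (the "separable" half)
                nn.Conv2d(channels, channels, 1),
                nn.BatchNorm2d(channels),
                nn.SiLU(),
            )
            for k in kernel_sizes
        ])
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),           # global context
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),                      # per-channel attention weights
        )

    def forward(self, x):
        multi_scale = sum(b(x) for b in self.branches) / len(self.branches)
        return x + multi_scale * self.gate(multi_scale)  # residual re-weighting

x = torch.randn(1, 64, 40, 40)
print(MultiBranchSEAttention(64)(x).shape)  # torch.Size([1, 64, 40, 40])
```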

17 pages, 939 KiB  
Article
Whole-Body 3D Pose Estimation Based on Body Mass Distribution and Center of Gravity Constraints
by Fan Wei, Guanghua Xu, Qingqiang Wu, Penglin Qin, Leijun Pan and Yihua Zhao
Sensors 2025, 25(13), 3944; https://doi.org/10.3390/s25133944 - 25 Jun 2025
Viewed by 537
Abstract
Estimating the 3D pose of a human body from monocular images is crucial for computer vision applications but remains challenging due to depth ambiguity and self-occlusion. Traditional methods often suffer from insufficient prior knowledge and weak constraints, resulting in inaccurate 3D keypoint estimation. In this paper, we propose a whole-body 3D pose estimation method based on a Transformer architecture that integrates body mass distribution and center of gravity constraints. The method maps the pose to the center of gravity position using the anatomical mass ratios of the human body and computes segment-level centers of gravity using the moment synthesis method. A combined loss function is designed to enforce consistency between the predicted keypoints and the center of gravity position, as well as the invariance of limb length. Extensive experiments on the Human3.6M WholeBody dataset demonstrate that the proposed method achieves state-of-the-art performance, with a whole-body mean per-joint position error (MPJPE) of 44.49 mm, 60.4% lower than the previous Large Simple Baseline method. Notably, it reduces the body-part keypoints' MPJPE from 112.6 mm to 40.41 mm, showcasing enhanced robustness and effectiveness in occluded scenes. This study highlights the effectiveness of integrating physical constraints into deep learning frameworks for accurate 3D pose estimation.
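The moment-synthesis rule the abstract describes reduces to a mass-weighted average of segment positions. A small PyTorch sketch, with placeholder mass fractions rather than the paper's anatomical table:

```python
import torch

# Illustrative segment mass fractions; placeholder values, not the
# anatomical table used in the paper.
MASS_FRACTION = {"head": 0.08, "trunk": 0.50, "arms": 0.10, "legs": 0.32}

def center_of_gravity(segment_centers):
    """Whole-body CoG via moment synthesis: sum(m_i * p_i) / sum(m_i)."""
    num = sum(MASS_FRACTION[k] * p for k, p in segment_centers.items())
    den = sum(MASS_FRACTION[k] for k in segment_centers)
    return num / den

def cog_consistency_loss(pred_segments, reference_cog):
    """Penalize disagreement between the CoG implied by the predicted pose
    and a reference CoG (one term of a combined loss)."""
    return torch.norm(center_of_gravity(pred_segments) - reference_cog)

segments = {k: torch.randn(3) for k in MASS_FRACTION}  # 3D segment centers
print(cog_consistency_loss(segments, torch.zeros(3)))
```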

27 pages, 22376 KiB  
Article
Performance Evaluation of Monocular Markerless Pose Estimation Systems for Industrial Exoskeletons
by Soocheol Yoon, Ya-Shian Li-Baboud, Ann Virts, Roger Bostelman, Mili Shah and Nishat Ahmed
Sensors 2025, 25(9), 2877; https://doi.org/10.3390/s25092877 - 2 May 2025
Cited by 1 | Viewed by 596
Abstract
Industrial exoskeletons (a.k.a. wearable robots) have been developed to reduce musculoskeletal fatigue and work injuries. Human joint kinematics and human–robot alignment are important measurements for understanding the effects of industrial exoskeletons. Recently, markerless pose estimation systems based on monocular color (red, green, blue—RGB) and depth cameras have been used to estimate human joint positions. This study analyzes the performance of monocular markerless pose estimation systems on human skeletal joint estimation while wearing exoskeletons. Two pose estimation systems, producing RGB and depth images from ten viewpoints, are evaluated for one subject in 14 industrial poses; the experiment was repeated for three different types of exoskeletons on the same subject. An optical tracking system (OTS) was used as the reference. The image acceptance rate was 56% for the RGB system, 22% for the depth system, and 78% for the OTS. The key sources of pose estimation error were occlusions from the exoskeletons, the industrial poses, and the viewpoints. The reference system itself showed decreased performance when the optical markers were occluded by the exoskeleton or when marker positions shifted with the exoskeleton. This study provides a systematic comparison of two types of monocular markerless pose estimation systems and an optical tracking system, and proposes a metric based on a tracking quality ratio to assess whether a skeletal joint estimate is acceptable for human kinematics analysis in exoskeleton studies.
(This article belongs to the Special Issue Wearable Robotics and Assistive Devices)
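A toy sketch of the two evaluation quantities mentioned above (image acceptance rate and a tracking quality ratio); the validity criteria are assumptions, not the study's definitions:

```python
def acceptance_rate(images_ok):
    """Fraction of captured images whose joint estimates pass quality checks
    (cf. the 56% RGB vs. 22% depth vs. 78% OTS figures in the abstract)."""
    return sum(images_ok) / len(images_ok)

def tracking_quality_ratio(valid_joints, total_joints):
    """Share of skeletal joints tracked well enough for kinematic analysis.
    The validity criterion (e.g., an error threshold vs. the OTS reference)
    is an assumption here."""
    return valid_joints / total_joints

print(acceptance_rate([True, True, False, True]))  # 0.75
print(tracking_quality_ratio(20, 25))              # 0.8
```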

29 pages, 13792 KiB  
Article
Improving Fire and Smoke Detection with You Only Look Once 11 and Multi-Scale Convolutional Attention
by Yuxuan Li, Lisha Nie, Fangrong Zhou, Yun Liu, Haoyu Fu, Nan Chen, Qinling Dai and Leiguang Wang
Fire 2025, 8(5), 165; https://doi.org/10.3390/fire8050165 - 22 Apr 2025
Cited by 1 | Viewed by 1441
Abstract
Fires pose significant threats to human safety, health, and property. Traditional methods, with their inefficient use of features, struggle to meet the demands of fire detection. You Only Look Once (YOLO), an efficient deep learning object detection framework, can rapidly locate and identify fire and smoke objects in visual images. However, research applying the latest YOLO11 to fire and smoke detection remains sparse, and handling the scale variability of fire and smoke objects as well as the practicality of detection models continues to be a research focus. This study first compares YOLO11 with classic models in the YOLO series to analyze its advantages in fire and smoke detection tasks. Then, to tackle scale variability and model practicality, we propose a Multi-Scale Convolutional Attention (MSCA) mechanism and integrate it into YOLO11 to create YOLO11s-MSCA. Experimental results show that YOLO11 outperforms other YOLO models by balancing accuracy, speed, and practicality. The YOLO11s-MSCA model performs exceptionally well on the D-Fire dataset, improving overall detection accuracy by 2.6% and smoke recognition accuracy by 2.8%, and demonstrates a stronger ability to identify small fire and smoke objects. Although challenges remain in handling occluded targets and complex backgrounds, the model exhibits strong robustness and generalization, maintaining efficient detection performance in complicated environments.
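For intuition, a rough PyTorch sketch of a multi-scale convolutional attention unit built from strip convolutions; the exact MSCA configuration in YOLO11s-MSCA is not reproduced here:

```python
import torch
import torch.nn as nn

class MSCASketch(nn.Module):
    """Rough sketch of a multi-scale convolutional attention unit.

    Strip convolutions (1xk then kx1) approximate large kernels cheaply;
    kernel sizes and fusion are assumptions, not the paper's design.
    """
    def __init__(self, c):
        super().__init__()
        self.local = nn.Conv2d(c, c, 5, padding=2, groups=c)
        self.strips = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(c, c, (1, k), padding=(0, k // 2), groups=c),
                nn.Conv2d(c, c, (k, 1), padding=(k // 2, 0), groups=c),
            )
            for k in (7, 11)
        ])
        self.mix = nn.Conv2d(c, c, 1)

    def forward(self, x):
        attn = self.local(x)
        attn = attn + sum(s(attn) for s in self.strips)  # multi-scale context
        return x * self.mix(attn)                        # attention gates input

print(MSCASketch(32)(torch.randn(1, 32, 64, 64)).shape)
```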

21 pages, 5409 KiB  
Article
Discriminative Deformable Part Model for Pedestrian Detection with Occlusion Handling
by Shahzad Siddiqi, Muhammad Faizan Shirazi and Yawar Rehman
AI 2025, 6(4), 70; https://doi.org/10.3390/ai6040070 - 3 Apr 2025
Viewed by 888
Abstract
Efficient pedestrian detection plays an important role in many practical daily-life applications, such as autonomous cars, video surveillance, and intelligent driving assistance systems. The main goal of pedestrian detection systems, especially in vehicles, is to prevent accidents: by recognizing pedestrians in real time, these systems can alert drivers or even autonomously apply the brakes, minimizing the possibility of collisions. However, occlusion is a major obstacle to pedestrian detection; pedestrians are typically occluded by trees, street poles, cars, and other pedestrians. State-of-the-art detection methods are built around fully visible or lightly occluded pedestrians, so their performance declines as the occlusion level increases. Meeting this challenge requires a pedestrian detector capable of handling occlusion. To increase detection accuracy for occluded pedestrians, we propose a new method called the Discriminative Deformable Part Model (DDPM), which decomposes the human image into deformable parts via machine learning. In existing works, this decomposition has been performed by human intuition; our approach instead learns it for deformable objects such as humans, combining the benefits of previous works while removing their drawbacks. We also propose a new pedestrian dataset based on Eastern clothes to evaluate the detector under different intra-class variations of pedestrians. The proposed method achieves higher detection accuracy on the Pascal VOC and VisDrone Detection datasets than other popular detection methods.
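Classic deformable-part scoring lets each part shift near its anchor while paying a deformation cost. A simplified NumPy sketch of that scoring rule, not the learned part decomposition the paper proposes:

```python
import numpy as np

def dpm_score(root_score, part_maps, anchors, defo_w=0.1):
    """Classic deformable-part scoring: each part searches the whole map
    and pays a quadratic deformation cost for drifting from its anchor.
    A simplified sketch of the general DPM idea."""
    total = root_score
    for score_map, (ay, ax) in zip(part_maps, anchors):
        h, w = score_map.shape
        ys, xs = np.mgrid[0:h, 0:w]
        cost = defo_w * ((ys - ay) ** 2 + (xs - ax) ** 2)
        total += (score_map - cost).max()   # best placement for this part
    return total

part = np.random.rand(16, 16)
print(dpm_score(1.0, [part, part], [(4, 4), (12, 12)]))
```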

16 pages, 2166 KiB  
Article
Integrating Pose Features and Cross-Relationship Learning for Human–Object Interaction Detection
by Lang Wu, Jie Li, Shuqin Li, Yu Ding, Meng Zhou and Yuntao Shi
AI 2025, 6(3), 55; https://doi.org/10.3390/ai6030055 - 12 Mar 2025
Viewed by 1104
Abstract
Background: The main challenge in human–object interaction (HOI) detection is accurately reasoning about ambiguous, complex, and difficult-to-recognize interactions. Existing methods rely on relatively uniform model structures, and occluded image inputs may not be recognized accurately. Methods: In this paper, we design a Pose-Aware Interaction Network (PAIN) based on a transformer architecture and human posture to address these issues through two innovations: a new feature fusion method that fuses human pose features and image features early, before the encoder, to improve feature expressiveness, while additionally strengthening the individual's motion-related features in the human branch; and the Cross-Attention Relationship fusion Module (CARM), which better fuses the three-branch outputs and captures detailed relationship information for HOI. Results: The proposed method achieves 64.51% AProle#1 and 66.42% AProle#2 on the public dataset V-COCO and 30.83% AP on HICO-DET, recognizing HOI instances more accurately.
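A minimal PyTorch sketch of the early-fusion idea (injecting pose features into image tokens before the encoder); layer sizes and the fusion rule are assumptions, not the PAIN architecture:

```python
import torch
import torch.nn as nn

class EarlyPoseFusion(nn.Module):
    """Sketch of pre-encoder fusion: project pose keypoints to the token
    width and add them to the image tokens. Sizes are illustrative."""
    def __init__(self, dim=256, num_kpts=17):
        super().__init__()
        self.pose_proj = nn.Linear(num_kpts * 2, dim)  # (x, y) per keypoint
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, img_tokens, keypoints):
        # img_tokens: (B, N, dim); keypoints: (B, num_kpts, 2)
        pose = self.pose_proj(keypoints.flatten(1)).unsqueeze(1)  # (B, 1, dim)
        return self.encoder(img_tokens + pose)  # fuse before encoding

tokens = torch.randn(2, 100, 256)
kpts = torch.rand(2, 17, 2)
print(EarlyPoseFusion()(tokens, kpts).shape)  # torch.Size([2, 100, 256])
```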

21 pages, 17223 KiB  
Article
Line-YOLO: An Efficient Detection Algorithm for Power Line Angle
by Chuanjiang Wang, Yuqing Chen, Zecong Wu, Baoqi Liu, Hao Tian, Dongxiao Jiang and Xiujuan Sun
Sensors 2025, 25(3), 876; https://doi.org/10.3390/s25030876 - 31 Jan 2025
Cited by 2 | Viewed by 1298
Abstract
To address the problem that manual judgment of the power line tilt angle is labor-intensive and prone to large errors, this paper proposes Line-YOLO, an improved algorithm based on YOLOv8s-seg. Firstly, introducing the deformable convolution DCNv4 handles the variable shape of power lines and improves detection accuracy. The BiFPN structure is also introduced in the Neck layer, which shortens the time required for feature fusion and improves detection efficiency. Next, EMA attention modules are added behind the second and third C2f modules of the original model, which improves the model's ability to recognize targets and effectively addresses losses and errors when power line targets overlap. Finally, a small-target detection head is added after the first EMA attention module to detect small or occluded targets in the image, improving small-target detection. We conduct experiments by collecting relevant power line connection images and building our own dataset. The experimental results show that the mAP@0.5 of Line-YOLO improves by 6.2% over the benchmark model, the number of parameters is reduced by 28.2%, floating-point throughput improves by 35.3%, and detection speed improves by 14 FPS. The experiments demonstrate that the enhanced Line-YOLO produces better detection results and can efficiently complete the power line angle detection task.
(This article belongs to the Section Electronic Sensors)
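As an illustration of the downstream task, a hypothetical post-processing step that derives a tilt angle from a line segmentation mask by PCA; the paper's own angle routine is not described in the abstract, so this is purely an assumed approach:

```python
import numpy as np

def line_angle_degrees(mask):
    """Estimate a power line's tilt from a binary segmentation mask by
    fitting the principal axis of the mask pixels (PCA via SVD)."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    pts -= pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts, full_matrices=False)
    dx, dy = vt[0]                    # dominant direction of the pixels
    if dy < 0:                        # normalize eigenvector sign
        dx, dy = -dx, -dy
    return float(np.degrees(np.arctan2(dy, dx)))

mask = np.zeros((64, 64), dtype=np.uint8)
for i in range(64):
    mask[i, min(63, i // 2)] = 1      # synthetic slanted line
print(round(line_angle_degrees(mask), 1))  # ~63.4
```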

23 pages, 22602 KiB  
Article
Enhancing Human Detection in Occlusion-Heavy Disaster Scenarios: A Visibility-Enhanced DINO (VE-DINO) Model with Reassembled Occlusion Dataset
by Zi-An Zhao, Shidan Wang, Min-Xin Chen, Ye-Jiao Mao, Andy Chi-Ho Chan, Derek Ka-Hei Lai, Duo Wai-Chi Wong and James Chung-Wai Cheung
Smart Cities 2025, 8(1), 12; https://doi.org/10.3390/smartcities8010012 - 16 Jan 2025
Cited by 2 | Viewed by 2288
Abstract
Natural disasters create complex environments where effective human detection is both critical and challenging, especially when individuals are partially occluded. While recent advancements in computer vision have improved detection capabilities, there remains a significant need for efficient solutions that can enhance search-and-rescue (SAR) operations in resource-constrained disaster scenarios. This study modified the original DINO (Detection Transformer with Improved Denoising Anchor Boxes) model to introduce the visibility-enhanced DINO (VE-DINO) model, designed for robust human detection in occlusion-heavy environments, with potential integration into SAR systems. VE-DINO enhances detection accuracy by incorporating body-part keypoint information and employing a specialized loss function. The model was trained and validated using the COCO2017 dataset, with additional external testing on the Disaster Occlusion Detection Dataset (DODD), which we developed by meticulously compiling relevant images from existing public datasets to represent occlusion scenarios in disaster contexts. VE-DINO achieved an average precision of 0.615 at IoU 0.50:0.90 on all bounding boxes, outperforming the original DINO model (0.491) on the testing set; external testing yielded an average precision of 0.500. An ablation study demonstrated the robustness of the model when confronted with varying degrees of body occlusion. Furthermore, to illustrate practicality, we conducted a case study demonstrating the usability of the model integrated into an unmanned aerial vehicle (UAV)-based SAR system, showcasing its potential in real-world scenarios.
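A hedged stand-in for a visibility-aware objective: weight each instance's box loss by its fraction of visible keypoints. VE-DINO's actual specialized loss differs; this only illustrates the general idea:

```python
import torch

def visibility_weighted_l1(pred_boxes, gt_boxes, kpt_visibility):
    """Down-weight box regression for heavily occluded instances, using
    the fraction of visible keypoints as a proxy for visibility."""
    vis_ratio = kpt_visibility.float().mean(dim=1)        # (N,) in [0, 1]
    per_box = (pred_boxes - gt_boxes).abs().sum(dim=1)    # L1 per instance
    return (per_box * (0.5 + 0.5 * vis_ratio)).mean()     # never zero weight

pred = torch.rand(4, 4)
gt = torch.rand(4, 4)
vis = torch.randint(0, 2, (4, 17))  # 17 keypoints, 0 = occluded
print(visibility_weighted_l1(pred, gt, vis))
```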

23 pages, 4473 KiB  
Article
A Study of Occluded Person Re-Identification for Shared Feature Fusion with Pose-Guided and Unsupervised Semantic Segmentation
by Junsuo Qu, Zhenguo Zhang, Yanghai Zhang and Chensong He
Electronics 2024, 13(22), 4523; https://doi.org/10.3390/electronics13224523 - 18 Nov 2024
Viewed by 1466
Abstract
The human body is often occluded by a variety of obstacles in surveillance systems, so occluded person re-identification remains a long-standing challenge. Recent methods based on pose guidance or external semantic clues have improved feature representation and related performance, but problems remain, such as weak model representation and unreliable semantic clues. To solve these problems, we propose a feature extraction network, shared feature fusion with pose-guided and unsupervised semantic segmentation (SFPUS), which extracts more discriminative features and reduces occlusion noise in pedestrian matching. Firstly, a multibranch joint feature extraction module (MFE) extracts feature sets containing pose information and high-order semantic information; this module not only provides robust extraction capabilities but can also precisely segment occlusions and the body. Secondly, to obtain multiscale discriminative features, a multiscale correlation feature matching fusion module (MCF) matches the two feature sets, and a Pose–Semantic Fusion Loss is designed to compute the similarity of feature sets across modes and fuse them into a single feature set. Thirdly, to counter image occlusion, we use unsupervised cascade clustering to better prevent occlusion interference. Finally, the proposed method is compared with various existing methods on the Occluded-Duke, Occluded-ReID, Market-1501, and DukeMTMC datasets: Rank-1 accuracy reaches 65.7%, 80.8%, 94.8%, and 89.6%, respectively, and mAP reaches 58.8%, 72.5%, 91.8%, and 80.1%. The results demonstrate that SFPUS holds promise and performs admirably compared with state-of-the-art methods.
(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)
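The Rank-1 and mAP figures quoted above come from the standard ReID evaluation protocol; a simplified NumPy sketch (the camera-ID filtering used by standard protocols is omitted):

```python
import numpy as np

def rank1_and_map(dist, q_ids, g_ids):
    """CMC Rank-1 and mean average precision for a query-by-gallery
    distance matrix."""
    aps, rank1_hits = [], 0
    for i, qid in enumerate(q_ids):
        order = np.argsort(dist[i])        # nearest gallery entries first
        matches = g_ids[order] == qid
        rank1_hits += bool(matches[0])
        hits = np.where(matches)[0]
        if hits.size == 0:
            continue                       # no true match for this query
        precision_at_hit = (np.arange(hits.size) + 1) / (hits + 1)
        aps.append(precision_at_hit.mean())
    return rank1_hits / len(q_ids), float(np.mean(aps))

dist = np.random.rand(5, 20)
q_ids, g_ids = np.random.randint(0, 3, 5), np.random.randint(0, 3, 20)
print(rank1_and_map(dist, q_ids, g_ids))
```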

23 pages, 23514 KiB  
Article
Deep-Learning-Based Automated Building Construction Progress Monitoring for Prefabricated Prefinished Volumetric Construction
by Wei Png Chua and Chien Chern Cheah
Sensors 2024, 24(21), 7074; https://doi.org/10.3390/s24217074 - 2 Nov 2024
Viewed by 2654
Abstract
Prefabricated prefinished volumetric construction (PPVC) is a relatively new technique that has recently gained popularity for its ability to improve flexibility in scheduling and resource management. Given the modular nature of PPVC assembly and the large amounts of visual data amassed throughout a construction project today, PPVC building construction progress can be monitored by quantifying assembled PPVC modules within images or videos. Because manually processing high volumes of visual data is extremely time-consuming and tedious, building construction progress monitoring can be automated to be more efficient and reliable. However, the complex nature of construction sites and the presence of nearby infrastructure can occlude or distort visual data, and imaging constraints can render it incomplete, so existing purely data-driven object detectors are hard to apply directly to automated progress monitoring at construction sites. In this paper, we propose a novel 2D window-based automated visual building construction progress monitoring (WAVBCPM) system that overcomes these issues by mimicking human decision making during manual progress monitoring, with a primary focus on PPVC building construction. WAVBCPM comprises three modules. A detection module first detects windows on the target building at two scales, using YOLOv5 as the backbone network for object detection, followed by a window detection filtering process that omits irrelevant detections from surrounding areas. Next, a rectification module accounts for windows missed in the mid-section and near-ground regions of the building due to occlusion and poor detection. Lastly, a progress estimation module checks the processed detections for missing or excess information before estimating building construction progress. The proposed method is tested on images from actual construction sites, and the experimental results demonstrate that WAVBCPM effectively addresses real-world challenges: by mimicking human inference, it overcomes imperfections in visual data and achieves higher progress monitoring accuracy than purely data-driven object detectors.
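The progress-estimation step reduces to counting detected windows against an expected total; a trivial sketch under an assumed fixed-windows-per-storey layout, omitting WAVBCPM's rectification logic:

```python
def ppvc_progress(windows_detected, windows_per_storey, total_storeys):
    """Estimate construction progress (in %) from counted windows,
    assuming a fixed window count per storey. The rectification of
    occluded or missed windows described in the paper is omitted."""
    total = windows_per_storey * total_storeys
    return min(100.0, 100.0 * windows_detected / total)

print(ppvc_progress(windows_detected=96, windows_per_storey=8,
                    total_storeys=20))  # 60.0
```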

33 pages, 57153 KiB  
Article
Evaluation of Automated Object-Detection Algorithms for Koala Detection in Infrared Aerial Imagery
by Laith A. H. Al-Shimaysawee, Anthony Finn, Delene Weber, Morgan F. Schebella and Russell S. A. Brinkworth
Sensors 2024, 24(21), 7048; https://doi.org/10.3390/s24217048 - 31 Oct 2024
Viewed by 1365
Abstract
Effective detection techniques are important for wildlife monitoring and conservation applications and are especially helpful for species that live in complex environments, such as arboreal animals like koalas (Phascolarctos cinereus). The use of infrared cameras and drones has demonstrated encouraging outcomes, whether detection is performed by human observers or automated algorithms. In the case of koala detection in eucalyptus plantations, there is a risk to spotters during forestry operations; in addition, the fatigue and tedium associated with the difficult and repetitive task of checking every tree make automated detection options particularly desirable. However, obtaining high detection rates with minimal false alarms remains challenging, particularly when there is low contrast between the animals and their surroundings. Koalas are also small and often partially or fully occluded by canopy, tree stems, or branches, and the background can be highly complex. Biologically inspired vision systems are known for their superior ability to suppress clutter and enhance the contrast of dim objects of interest against their surroundings. This paper introduces a biologically inspired detection algorithm to locate koalas in eucalyptus plantations and evaluates its performance against ten other detection techniques, including both image processing and neural-network-based approaches. The nature of koala occlusion by canopy cover in these plantations was also examined using a combination of simulated and real data. The results show that the biologically inspired approach significantly outperformed the competing neural-network- and computer-vision-based approaches by over 27%. The analysis of simulated and real data shows that occlusion by tree stems and canopy can have a significant impact on potential detection, with koalas fully occluded in up to 40% of images in which they were known to be present. Our analysis also shows that a koala's heat signature is more likely to be occluded when it is close to the centre of the image (i.e., directly under the drone) and less likely to be occluded away from the zenith, which has implications for flight planning. This paper also describes a new, accurate ground-truth dataset of aerial high-dynamic-range infrared imagery containing instances of koala heat signatures, made publicly available to support the research community.
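For intuition about biologically inspired contrast enhancement, here is a difference-of-Gaussians center-surround filter, a classic model of early visual processing offered as background rather than the paper's algorithm; sigma values are assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def center_surround(img, sigma_c=1.0, sigma_s=4.0):
    """Difference-of-Gaussians center-surround response: boosts small warm
    targets against cluttered backgrounds while suppressing slowly varying
    clutter."""
    return gaussian_filter(img, sigma_c) - gaussian_filter(img, sigma_s)

frame = np.random.rand(128, 128).astype(np.float32)
frame[60:64, 60:64] += 2.0            # synthetic dim hotspot
response = center_surround(frame)
print(np.unravel_index(response.argmax(), response.shape))  # near (61, 61)
```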

16 pages, 76534 KiB  
Article
KSL-POSE: A Real-Time 2D Human Pose Estimation Method Based on Modified YOLOv8-Pose Framework
by Tianyi Lu, Ke Cheng, Xuecheng Hua and Suning Qin
Sensors 2024, 24(19), 6249; https://doi.org/10.3390/s24196249 - 26 Sep 2024
Cited by 3 | Viewed by 2429
Abstract
Two-dimensional human pose estimation aims to equip computers with the ability to accurately recognize human keypoints and comprehend their spatial contexts within media content. However, the accuracy of real-time human pose estimation diminishes when processing images with occluded body parts or overlapping individuals. To address these issues, we propose a method based on the YOLO framework. We integrate the convolutional concepts of Kolmogorov–Arnold Networks (KANs) by introducing non-linear activation functions to enhance the feature extraction capabilities of the convolutional kernels. Moreover, to improve the detection of small target keypoints, we integrate the cross-stage partial (CSP) approach and utilize the small object enhance pyramid (SOEP) module for feature integration. We also incorporate a layered shared convolution with batch normalization detection head (LSCB), consisting of multiple shared convolutional and batch normalization layers, to enable cross-stage feature fusion and address the low utilization of model parameters. Given the structure and purpose of the proposed model, we name it KSL-POSE. Compared to the baseline model YOLOv8l-POSE, KSL-POSE achieves significant improvements, increasing average detection accuracy by 1.5% on the public MS COCO 2017 dataset. The model also demonstrates competitive performance on the CrowdPose dataset, validating its generalization ability.
(This article belongs to the Section Intelligent Sensors)
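A very loose PyTorch sketch of the idea of placing learnable non-linearities inside convolution; real KAN convolutions use learnable spline activations, which this toy replaces with a gated tanh path, so everything here is an assumption for illustration:

```python
import torch
import torch.nn as nn

class NonlinearKernelConv(nn.Module):
    """Toy stand-in for a KAN-style convolution: a linear conv path plus a
    learnable, bounded nonlinear path mixed by a per-channel parameter."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv_lin = nn.Conv2d(c_in, c_out, 3, padding=1)
        self.conv_nl = nn.Conv2d(c_in, c_out, 3, padding=1)
        self.alpha = nn.Parameter(torch.zeros(c_out, 1, 1))  # learnable mix

    def forward(self, x):
        base = self.conv_lin(x)
        gate = torch.tanh(self.conv_nl(x))   # bounded nonlinear response
        return base + self.alpha * base * gate

print(NonlinearKernelConv(3, 16)(torch.randn(1, 3, 32, 32)).shape)
```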

17 pages, 3759 KiB  
Article
A Multi-Scale Graph Attention-Based Transformer for Occluded Person Re-Identification
by Ming Ma, Jianming Wang and Bohan Zhao
Appl. Sci. 2024, 14(18), 8279; https://doi.org/10.3390/app14188279 - 13 Sep 2024
Cited by 1 | Viewed by 2115
Abstract
The objective of person re-identification (ReID) is to match a specific individual across different times, locations, or camera viewpoints. The prevalent issue of occlusion in real-world scenarios corrupts image information, rendering the affected features unreliable. The core challenge lies in effectively discerning and extracting visual features from human images under various complex conditions, including cluttered backgrounds, diverse postures, and the presence of occlusions. Some works have employed pose estimation or human keypoint detection to construct graph-structured information to counteract the effects of occlusion; however, this approach introduces new noise due to issues such as the invisibility of keypoints. To address these problems, we propose a model that employs multi-scale graph attention to reweight feature importance without requiring additional feature extractors. This allows features to concentrate on areas genuinely pertinent to the re-identification task, significantly enhancing the model's robustness against occlusions. Our experimental results demonstrate that, compared to baseline models, the proposed method achieves notable improvements on occluded datasets, with mAP increases of 0.5%, 31.5%, and 12.3%.
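A single-scale PyTorch sketch of attention-based reweighting over part features, the core mechanism the abstract describes; the paper aggregates several scales, and all sizes here are assumptions:

```python
import torch
import torch.nn as nn

class GraphAttentionReweight(nn.Module):
    """Parts attend to each other; parts that receive little attention
    (e.g., occluded ones) are down-weighted in the output."""
    def __init__(self, dim=128):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)

    def forward(self, parts):                     # parts: (B, P, dim)
        scores = self.q(parts) @ self.k(parts).transpose(1, 2)
        attn = torch.softmax(scores / parts.shape[-1] ** 0.5, dim=-1)
        weights = attn.mean(dim=1, keepdim=True)  # attention received per part
        return parts * weights.transpose(1, 2)    # reweight part features

print(GraphAttentionReweight()(torch.randn(2, 6, 128)).shape)
```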

23 pages, 76553 KiB  
Article
3DRecNet: A 3D Reconstruction Network with Dual Attention and Human-Inspired Memory
by Muhammad Awais Shoukat, Allah Bux Sargano, Lihua You and Zulfiqar Habib
Electronics 2024, 13(17), 3391; https://doi.org/10.3390/electronics13173391 - 26 Aug 2024
Viewed by 1340
Abstract
Humans inherently perceive 3D scenes using prior knowledge and visual perception, but 3D reconstruction in computer graphics is challenging due to complex object geometries, noisy backgrounds, and occlusions, leading to high time and space complexity. To address these challenges, this study introduces 3DRecNet, a compact 3D reconstruction architecture optimized for both efficiency and accuracy through five key modules. The first module, the Human-Inspired Memory Network (HIMNet), performs initial point cloud estimation, assisting in identifying and localizing objects in occluded and complex regions while preserving critical spatial information. Next, separate image and 3D encoders extract features from the input images and initial point clouds. These features are combined using a dual attention-based feature fusion module, which emphasizes features from the image branch over those from the 3D encoding branch. This approach ensures independence from proposals at inference time and filters out irrelevant information, leading to more accurate and detailed reconstructions. Finally, a decoder branch transforms the fused features into a 3D representation. The integration of attention-based fusion with the memory network in 3DRecNet significantly enhances the overall reconstruction process. Experimental results on benchmark datasets such as ShapeNet, ObjectNet3D, and Pix3D demonstrate that 3DRecNet outperforms existing methods.
(This article belongs to the Special Issue New Trends in Computer Vision and Image Processing)
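A PyTorch sketch of cross-attention fusion biased toward the image branch, as the abstract describes; dimensions and the bias mechanism are assumptions, not the 3DRecNet design:

```python
import torch
import torch.nn as nn

class DualAttentionFusion(nn.Module):
    """Image tokens query the 3D (point-cloud) tokens via cross-attention,
    and a small learnable weight keeps the image branch dominant."""
    def __init__(self, dim=256):
        super().__init__()
        self.img_from_3d = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.bias = nn.Parameter(torch.tensor(0.25))  # learnable 3D weight

    def forward(self, img_feat, pc_feat):
        ctx, _ = self.img_from_3d(img_feat, pc_feat, pc_feat)
        return img_feat + self.bias * ctx  # image features dominate the fusion

img = torch.randn(1, 196, 256)  # image tokens
pc = torch.randn(1, 512, 256)   # point-cloud tokens
print(DualAttentionFusion()(img, pc).shape)  # torch.Size([1, 196, 256])
```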

10 pages, 4728 KiB  
Communication
High-Resolution Iodine-Enhanced Micro-Computed Tomography of Intact Human Hearts for Detailed Coronary Microvasculature Analyses
by Joerg Reifart and Paul Iaizzo
J. Imaging 2024, 10(7), 173; https://doi.org/10.3390/jimaging10070173 - 18 Jul 2024
Viewed by 1671
Abstract
Identifying the detailed anatomies of the coronary microvasculature remains an area of active research; methods are needed for non-destructive, high-resolution, three-dimensional imaging of these vessels for computational modeling. Currently employed Micro-Computed Tomography (Micro-CT) protocols for vasa vasorum analyses require organ dissection and, in most cases, non-clearable contrast agents. Here, we describe a non-destructive, economical method for obtaining high-resolution images of the human coronary microvasculature without organ dissection. Formalin-fixed human hearts were cannulated using venogram balloon catheters, which were then fixed into the specimen's aortic root. The cannulated hearts, protected by a polyethylene bag, were placed in radiolucent containers filled with insulating polyurethane foam to reduce movement. For vasculature staining, iodine potassium iodide (IKI, Lugol's solution; 6.3% potassium iodide, 4.1% iodine) was injected. Contrast distributions were monitored using a North Star Imaging X3000 micro-CT scanner with low-radiation settings, followed by high-radiation scanning (3600 rad, 60 kV, 900 mA) for the final high-resolution imaging. We successfully imaged four intact human hearts presenting with chronic total occlusions of the right coronary artery, enabling detailed analyses of the vasa vasorum surrounding stenosed and occluded segments. After imaging, the hearts were cleared of iodine and excess polyurethane foam and returned to their initial formalin-fixed state for indefinite storage. Conclusions: the described methodologies allow for non-destructive, high-resolution micro-CT imaging of the coronary microvasculature in intact human hearts, paving the way for detailed computational 3D microvascular reconstructions within a macrovascular context.
(This article belongs to the Section Medical Imaging)
