Article

DualPose: Dual-Block Transformer Decoder with Contrastive Denoising for Multi-Person Pose Estimation

by Matteo Fincato and Roberto Vezzani *
Department of Engineering “Enzo Ferrari”, University of Modena and Reggio Emilia, Via P. Vivarelli 10, 41125 Modena, Italy
* Author to whom correspondence should be addressed.
Sensors 2025, 25(10), 2997; https://doi.org/10.3390/s25102997
Submission received: 3 April 2025 / Revised: 5 May 2025 / Accepted: 6 May 2025 / Published: 9 May 2025

Abstract

Multi-person pose estimation is the task of detecting and regressing the keypoint coordinates of multiple people in a single image. Significant progress has been achieved in recent years, especially with the introduction of transformer-based end-to-end methods. In this paper, we present DualPose, a novel framework that enhances multi-person pose estimation through a dual-block transformer decoding architecture. Class prediction and keypoint estimation are split into parallel blocks, so that each sub-task can be improved separately and the risk of interference between them is reduced. This architecture improves both the precision of keypoint localization and the model’s capacity to classify individuals accurately. Within the Keypoint-Block, self-attentions are processed in parallel, a novel strategy that further improves keypoint localization accuracy. Additionally, DualPose incorporates a contrastive denoising (CDN) mechanism that leverages positive and negative samples to stabilize training and improve robustness. CDN creates a variety of training samples by introducing controlled noise into the ground truth, improving the model’s ability to distinguish valid keypoints from incorrect ones. Extensive experiments on the MS COCO and CrowdPose datasets show that DualPose achieves state-of-the-art results, outperforming recent end-to-end methods. The code and pretrained models are publicly available.
Keywords: contrastive denoising; DualPose; human pose estimation; multi-person pose estimation; transformer-based models
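
To make the contrastive denoising idea from the abstract more concrete, the following is a minimal sketch of how positive and negative samples might be derived from ground-truth keypoints. It is not the authors' implementation: the function name `make_cdn_queries`, the noise thresholds `pos_scale` and `neg_scale`, and the choice to scale noise by the person's bounding-box size are all assumptions made for illustration. Positive samples receive small noise and would be trained to reconstruct the ground truth, while negative samples receive larger noise and would be trained to be rejected.

```python
import random


def make_cdn_queries(keypoints, box_size, pos_scale=0.1, neg_scale=0.3, seed=0):
    """Return (positive, negative) noised copies of one person's keypoints.

    keypoints: list of (x, y) ground-truth coordinates in pixels.
    box_size:  (width, height) of the person's box, used to scale the noise.
    pos_scale, neg_scale: hypothetical thresholds, not values from the paper.
    """
    rng = random.Random(seed)
    w, h = box_size

    def jitter(lo, hi):
        noised = []
        for x, y in keypoints:
            # Draw a relative magnitude in [lo, hi] and a random sign per axis.
            dx = rng.uniform(lo, hi) * rng.choice((-1.0, 1.0)) * w
            dy = rng.uniform(lo, hi) * rng.choice((-1.0, 1.0)) * h
            noised.append((x + dx, y + dy))
        return noised

    positive = jitter(0.0, pos_scale)        # small noise: target is the ground truth
    negative = jitter(pos_scale, neg_scale)  # larger noise: target is "no person"
    return positive, negative


# Example: two ground-truth keypoints inside a 40 x 80 pixel person box.
gt = [(100.0, 50.0), (110.0, 90.0)]
pos, neg = make_cdn_queries(gt, (40.0, 80.0))
```

Under these assumptions, every positive sample lies closer to its ground-truth keypoint (per axis) than the corresponding negative sample, which is what lets a decoder learn to separate valid from incorrect keypoint hypotheses.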
