Article

AFJ-PoseNet: Enhancing Simple Baselines with Attention-Guided Fusion and Joint-Aware Positional Encoding

School of Computer and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(15), 3150; https://doi.org/10.3390/electronics14153150 (registering DOI)
Submission received: 4 July 2025 / Revised: 29 July 2025 / Accepted: 6 August 2025 / Published: 7 August 2025
(This article belongs to the Section Computer Science & Engineering)

Abstract

Simple Baseline has become a widely used reference model in human pose estimation (HPE) because it combines strong performance with a simple design. However, its “strong encoder + simple decoder” paradigm has two core limitations: (1) its non-branching, linear deconvolutional path cannot exploit the fine-grained, multi-scale features produced by the encoder, and (2) the model has no explicit prior knowledge of either the absolute positions or the structural layout of human keypoints. To address these issues, this paper introduces AFJ-PoseNet, an architecture that extends the Simple Baseline framework in two ways. First, we restructure Simple Baseline’s linear decoder into a U-Net-like multi-scale fusion path that brings in intermediate encoder features through skip connections. To fuse these features efficiently, we design an Attention Fusion Module (AFM), which dynamically gates the incoming detailed features through a context-aware spatial attention mechanism. Second, we propose the Joint-Aware Positional Encoding (JAPE) module, which combines a fixed global coordinate system with learnable, joint-specific spatial priors. This design injects both absolute position awareness and statistical priors of human body structure. Ablation studies on the MPII dataset validate each proposed enhancement: the full model achieves a mean PCKh of 88.915, a 0.341 percentage point improvement over our re-implemented baseline. On the more challenging COCO val2017 dataset, our ResNet-50-based AFJ-PoseNet achieves an Average Precision (AP) of 72.6%. Although this comes with a slight reduction in Average Recall, it is a 2.2 percentage point improvement over our re-implemented baseline (70.4%) and also exceeds other strong, publicly available models such as DARK (72.4%) and SimCC (72.1%) under comparable settings, which indicates that the proposed enhancements are competitive.
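The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify layer shapes or how the modules are wired, so the function names (`attention_gated_fusion`, `coordinate_grid`, `add_joint_priors`), the single-projection gate, and the additive use of the joint priors are all assumptions made for illustration only.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, used to squash gate logits into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def attention_gated_fusion(decoder_feat, skip_feat, w_gate, b_gate=0.0):
    """Hypothetical spatial attention gate for a skip connection.

    decoder_feat, skip_feat: arrays of shape (C, H, W).
    w_gate: array of shape (2C,), standing in for a learned 1x1 projection.
    Returns the fused features (C, H, W) and the gate map (H, W).
    """
    # Context for the gate: decoder and skip features stacked on channels.
    context = np.concatenate([decoder_feat, skip_feat], axis=0)  # (2C, H, W)
    # A 1x1 projection over channels gives one logit per spatial location.
    logits = np.tensordot(w_gate, context, axes=([0], [0])) + b_gate  # (H, W)
    gate = sigmoid(logits)
    # The gate scales how much skip detail enters the decoder path.
    fused = decoder_feat + gate[None, :, :] * skip_feat
    return fused, gate


def coordinate_grid(height, width):
    """Fixed global coordinates in [-1, 1]: channel 0 is x, channel 1 is y."""
    ys = np.linspace(-1.0, 1.0, height)
    xs = np.linspace(-1.0, 1.0, width)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([xx, yy], axis=0)  # (2, H, W)


def add_joint_priors(heatmap_logits, joint_prior):
    """Assumed combination rule: add one (learnable) prior map per joint.

    heatmap_logits, joint_prior: arrays of shape (K, H, W) for K joints.
    """
    return heatmap_logits + joint_prior
```

In a trained model, `w_gate` and `joint_prior` would be learned parameters, and the coordinate grid would typically be concatenated to a feature map before the prediction head. With `w_gate` set to zeros the gate is 0.5 everywhere, so the fusion reduces to `decoder_feat + 0.5 * skip_feat`, which is a convenient sanity check.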
Keywords: human pose estimation; simple baseline; feature fusion; attention mechanism; positional encoding; U-Net-like architecture; deep learning; computer vision

Share and Cite

MDPI and ACS Style

Zhang, W.; Shi, Y.; Lin, J. AFJ-PoseNet: Enhancing Simple Baselines with Attention-Guided Fusion and Joint-Aware Positional Encoding. Electronics 2025, 14, 3150. https://doi.org/10.3390/electronics14153150

AMA Style

Zhang W, Shi Y, Lin J. AFJ-PoseNet: Enhancing Simple Baselines with Attention-Guided Fusion and Joint-Aware Positional Encoding. Electronics. 2025; 14(15):3150. https://doi.org/10.3390/electronics14153150

Chicago/Turabian Style

Zhang, Wenhui, Yu Shi, and Jiayi Lin. 2025. "AFJ-PoseNet: Enhancing Simple Baselines with Attention-Guided Fusion and Joint-Aware Positional Encoding" Electronics 14, no. 15: 3150. https://doi.org/10.3390/electronics14153150

APA Style

Zhang, W., Shi, Y., & Lin, J. (2025). AFJ-PoseNet: Enhancing Simple Baselines with Attention-Guided Fusion and Joint-Aware Positional Encoding. Electronics, 14(15), 3150. https://doi.org/10.3390/electronics14153150

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
