Open Access Article
AFJ-PoseNet: Enhancing Simple Baselines with Attention-Guided Fusion and Joint-Aware Positional Encoding
by Wenhui Zhang, Yu Shi * and Jiayi Lin
School of Computer and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(15), 3150; https://doi.org/10.3390/electronics14153150
Submission received: 4 July 2025 / Revised: 29 July 2025 / Accepted: 6 August 2025 / Published: 7 August 2025
Abstract
Simple Baseline has become a dominant benchmark in human pose estimation (HPE) due to its excellent performance and simple design. However, its “strong encoder + simple decoder” architectural paradigm suffers from two core limitations: (1) its non-branching, linear deconvolutional path prevents it from leveraging the rich, fine-grained features generated by the encoder at multiple scales, and (2) the model lacks explicit prior knowledge of both the absolute positions and the structural layout of human keypoints. To address these issues, this paper introduces AFJ-PoseNet, a new architecture that deeply enhances the Simple Baseline framework. First, we restructure Simple Baseline’s original linear decoder into a U-Net-like multi-scale fusion path, introducing intermediate features from the encoder via skip connections. For efficient fusion, we design a novel Attention Fusion Module (AFM), which dynamically gates the flow of incoming detailed features through a context-aware spatial attention mechanism. Second, we propose the Joint-Aware Positional Encoding (JAPE) module, which combines a fixed global coordinate system with learnable, joint-specific spatial priors. This design injects both absolute position awareness and statistical priors of the human body structure. Our ablation studies on the MPII dataset validate the effectiveness of each proposed enhancement, with our full model achieving a mean PCKh of 88.915%, a 0.341 percentage point improvement over our re-implemented baseline. On the more challenging COCO val2017 dataset, our ResNet-50-based AFJ-PoseNet achieves an Average Precision (AP) of 72.6%. While this involves a slight trade-off in Average Recall for higher precision, this result represents a significant 2.2 percentage point improvement over our re-implemented baseline (70.4%), and it also outperforms other strong, publicly available models such as DARK (72.4%) and SimCC (72.1%) under comparable settings, demonstrating the superiority and competitiveness of our proposed enhancements.
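The abstract describes the two proposed mechanisms without implementation detail. The following minimal PyTorch sketches illustrate how they could be realized; every module name, shape, and fusion choice below is an assumption for illustration, not the authors' published implementation.

First, a sketch of attention-gated skip fusion in the spirit of the Attention Fusion Module (AFM): the decoder's context features produce a spatial gate that admits or suppresses the encoder's fine-grained skip features before the two streams are fused.

import torch
import torch.nn as nn

class AttentionFusionModule(nn.Module):
    """Hypothetical AFM sketch: decoder context computes a spatial
    attention map that gates the encoder skip features before fusion."""

    def __init__(self, channels: int):
        super().__init__()
        # Context-aware spatial attention: 1x1 conv + sigmoid over decoder features.
        self.attn = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )
        # Fuse the gated skip features with the decoder features.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, decoder_feat: torch.Tensor, skip_feat: torch.Tensor) -> torch.Tensor:
        gate = self.attn(decoder_feat)   # (B, 1, H, W) spatial gate in [0, 1]
        gated_skip = skip_feat * gate    # dynamically gate incoming detail
        return self.fuse(torch.cat([decoder_feat, gated_skip], dim=1))

Second, a sketch of the Joint-Aware Positional Encoding (JAPE) idea: a fixed, normalized coordinate grid supplies absolute position awareness, while one learnable spatial map per keypoint acts as a joint-specific structural prior.

import torch
import torch.nn as nn

class JointAwarePositionalEncoding(nn.Module):
    """Hypothetical JAPE sketch: fixed global (x, y) coordinates plus
    learnable per-joint spatial priors, projected back to the feature width."""

    def __init__(self, channels: int, num_joints: int, height: int, width: int):
        super().__init__()
        # Fixed global coordinate system: x and y channels in [-1, 1].
        ys = torch.linspace(-1, 1, height).view(1, 1, height, 1).expand(1, 1, height, width)
        xs = torch.linspace(-1, 1, width).view(1, 1, 1, width).expand(1, 1, height, width)
        self.register_buffer("coords", torch.cat([xs, ys], dim=1))  # (1, 2, H, W)
        # Learnable joint-specific spatial priors, one map per keypoint.
        self.joint_prior = nn.Parameter(torch.zeros(1, num_joints, height, width))
        # Project the concatenated encoding back to the original channel count.
        self.proj = nn.Conv2d(channels + 2 + num_joints, channels, kernel_size=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        b = feat.size(0)
        enc = torch.cat([
            feat,
            self.coords.expand(b, -1, -1, -1),
            self.joint_prior.expand(b, -1, -1, -1),
        ], dim=1)
        return self.proj(enc)

A decoder stage might upsample, fuse against the matching encoder feature via the AFM, and apply JAPE before the final heatmap head; the actual ordering and hyperparameters in AFJ-PoseNet may differ.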