Abstract
Background: Accurate segmentation in radiographic imaging remains difficult because of heterogeneous contrast, acquisition artifacts, and fine-scale anatomical boundaries. Objective: This paper presents a Hybrid Attention U-Net, which pairs an EfficientNet-B3 encoder with a lightweight decoder whose CBAM and SCSE modules provide complementary channel-wise and spatial recalibration for sharper boundary recovery. Methods: The preprocessing phase uses percentile windowing, N4 bias-field correction, per-image normalization, and geometric standardization, together with sparingly applied geometric augmentations, to reduce domain shift and keep the pipeline practical. Results: For hand X-ray segmentation, the model achieves Dice = 0.8426, IoU ≈ 0.78, pixel accuracy = 0.9058, ROC-AUC = 0.9074, and PR-AUC = 0.8452; training converges quickly in the early epochs and remains stable in the later ones. Controlled ablations show that the EfficientNet-B3 encoder is the main driver of overlap quality and that smaller batches (bs = 16) consistently outperform larger ones, an effect attributed to gradient noise and implicit regularization. Qualitative overlays complement the quantitative gains, revealing more distinct cortical profiles and less background leakage. Conclusions: The model is computationally moderate, end-to-end trainable, and readily extensible to multi-class problems through a softmax head and class-balanced objectives, making it a strong, deployable option for musculoskeletal radiograph segmentation and a useful baseline for future clinical translation studies.
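For readers less familiar with the overlap metrics reported above, the following is a minimal sketch of how Dice and IoU can be computed for one pair of binary masks. It is illustrative only and is not the evaluation code used in this work: the function name dice_and_iou, the flattened 0/1 mask representation, and the convention that two empty masks score 1.0 are all assumptions made for the example.

```python
from typing import Sequence, Tuple


def dice_and_iou(pred: Sequence[int], target: Sequence[int]) -> Tuple[float, float]:
    """Return (Dice, IoU) for two flattened binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")

    # Count overlapping foreground pixels and the foreground size of each mask.
    tp = sum(1 for p, t in zip(pred, target) if p and t)
    pred_fg = sum(1 for p in pred if p)
    target_fg = sum(1 for t in target if t)
    union = pred_fg + target_fg - tp

    # Assumed convention: two empty masks count as a perfect match.
    if union == 0:
        return 1.0, 1.0

    dice = 2.0 * tp / (pred_fg + target_fg)
    iou = tp / union
    return dice, iou


# Toy example: one true positive, one false positive, one false negative.
pred = [1, 1, 0, 0]
target = [1, 0, 1, 0]
dice, iou = dice_and_iou(pred, target)
```

For a single mask pair the two scores are tied by Dice = 2·IoU / (1 + IoU). Scores averaged over a test set need not satisfy this identity exactly, because the mean of per-image IoU values is not the IoU of the mean Dice.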