Semantic Attention Guided Hierarchical Decision Network for Age Estimation
Abstract
1. Introduction
2. Related Works
2.1. Deep Age Estimation
2.2. Hierarchical Age Estimation
2.3. Fine-Grained Visual Classification
3. Hierarchical Age Estimation with Semantic Attention
3.1. Swin Transformer Based Global Regression
3.1.1. Swin Transformer Architecture
3.1.2. Local Adjusted Decision
3.2. Local Feature Learning and Semantic Attention Mechanism
3.2.1. Semantic Attention Mask Generation
3.2.2. Local Feature Learning
3.3. Joint Training
4. Experimental Results and Analysis
4.1. Datasets
4.2. Evaluation Protocol
4.3. Implementation Details
4.4. Comparisons with State-of-the-Arts
4.4.1. Experiments on Morph II Dataset
4.4.2. Experiments on MegaAge-Asian Dataset
4.4.3. Experiments on CACD Dataset
4.5. Ablation Study
4.5.1. Impact of Each Module
4.5.2. Impact of Radius in Local Adjustment
4.5.3. Impact of Multi-Task Learning with Parameters and
4.6. Speed Analysis
4.7. Qualitative Results
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Rahman, A.; Aonty, S.S.; Deb, K.; Sarker, I.H. Attention-Based Human Age Estimation from Face Images to Enhance Public Security. Data 2023, 8, 145.
- Zhang, Y.; Shou, Y.; Meng, T.; Ai, W.; Li, K. A Multi-view Mask Contrastive Learning Graph Convolutional Neural Network for Age Estimation. Knowl. Inf. Syst. 2024, 66, 7137–7162.
- Tan, Z.; Wan, J.; Lei, Z.; Zhi, R.; Guo, G.; Li, S.Z. Efficient Group-n Encoding and Decoding for Facial Age Estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 2610–2623.
- Li, Z.; Jiang, R.; Aarabi, P. Continuous Face Aging via Self-estimated Residual Age Embedding. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 15003–15012.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 9992–10002.
- Yi, D.; Lei, Z.; Li, S.Z. Age Estimation by Multi-scale Convolutional Network. In Computer Vision—ACCV 2014, Proceedings of the 2014 Asian Conference on Computer Vision (ACCV), Singapore, 1–5 November 2014; Springer: Singapore, 2014; pp. 144–158.
- Malli, R.C.; Aygun, M.; Ekenel, H.K. Apparent Age Estimation Using Ensemble of Deep Learning Models. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 714–721.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In Proceedings of the 2021 9th International Conference on Learning Representations (ICLR), Virtually, 3–7 May 2021.
- Chen, P.; Zhang, X.; Li, Y.; Tao, J.; Xiao, B.; Wang, B.; Jiang, Z. DAA: A Delta Age AdaIN operation for age estimation via binary code transformer. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 15836–15845.
- Xu, L.; Hu, C.; Shu, X.; Yu, H. Cross spatial and Cross-Scale Swin Transformer for fine-grained age estimation. Comput. Electr. Eng. 2025, 123, 110264.
- He, X.; Quan, Y.; Xu, R.; Ji, H. A Universal Scale-Adaptive Deformable Transformer for Image Restoration across Diverse Artifacts. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) 2025, Nashville, TN, USA, 11–15 June 2025; pp. 12731–12741.
- Wang, X.; Zhang, Y.; Liu, T.; Liu, X.; Xu, K.; Wan, J.; Guo, Y.; Wang, H. TopNet: Transformer-Efficient Occupancy Prediction Network for Octree-Structured Point Cloud Geometry Compression. In Proceedings of the 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 11–15 June 2025; pp. 27305–27314.
- Baek, S.; Lee, S.; Jo, H.; Choi, H.; Min, D. TADFormer: Task-Adaptive Dynamic TransFormer for Efficient Multi-Task Learning. arXiv 2025, arXiv:2501.04293.
- Zhao, Q.; Liu, J.; Wei, W. Mixture of deep networks for facial age estimation. Inf. Sci. 2024, 679, 121086.
- Yang, T.-Y.; Huang, Y.-H.; Lin, Y.-Y.; Hsiu, P.-C.; Chuang, Y.-Y. SSR-Net: A Compact Soft Stagewise Regression Network for Age Estimation. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; International Joint Conferences on Artificial Intelligence Organization: Stockholm, Sweden, 2018; pp. 1078–1084.
- Bekhouche, S.E.; Benlamoudi, A.; Dornaika, F.; Telli, H.; Bounab, Y. Facial Age Estimation Using Multi-Stage Deep Neural Networks. Electronics 2024, 13, 3259.
- Li, P.; Hu, Y.; He, R.; Sun, Z. A Coupled Evolutionary Network for Age Estimation. arXiv 2018, arXiv:1809.07447.
- Zhang, C.; Liu, S.; Xu, X.; Zhu, C. C3AE: Exploring the Limits of Compact Model for Age Estimation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 12579–12588.
- Li, W.; Lu, J.; Feng, J.; Xu, C.; Zhou, J.; Tian, Q. BridgeNet: A Continuity-Aware Probabilistic Network for Age Estimation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 1145–1154.
- Zhang, Y.; Shou, Y.; Ai, W.; Meng, T.; Li, K. GroupFace: Imbalanced Age Estimation Based on Multi-Hop Attention Graph Convolutional Network and Group-Aware Margin Optimization. IEEE Trans. Inf. Forensics Secur. 2025, 20, 605–619.
- Lin, T.-Y.; RoyChowdhury, A.; Maji, S. Bilinear CNN Models for Fine-Grained Visual Recognition. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1449–1457.
- Yu, C.; Zhao, X.; Zheng, Q.; Zhang, P.; You, X. Hierarchical Bilinear Pooling for Fine-Grained Visual Recognition. In Computer Vision—ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2018; Volume 11220, pp. 595–610. ISBN 978-3-030-01269-4.
- Xu, D.; Tang, Z.; Xu, W. Salient Object Detection Based on Regional Contrast and Relative Spatial Compactness. KSII Trans. Internet Inf. Syst. 2013, 7, 2737–2753.
- Jiao, Y.; Li, Z.; Huang, S.; Yang, X.; Liu, B.; Zhang, T. Three-Dimensional Attention-Based Deep Ranking Model for Video Highlight Detection. IEEE Trans. Multimed. 2018, 20, 2693–2705.
- Han, K.; Guo, J.; Zhang, C.; Zhu, M. Attribute-Aware Attention Model for Fine-grained Representation Learning. In Proceedings of the 26th ACM international conference on Multimedia, Seoul, Republic of Korea, 22–26 October 2018; ACM: New York, NY, USA, 2018; pp. 2040–2048.
- Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; IEEE: Piscataway, NJ, USA; pp. 7132–7141.
- Zhang, K.; Liu, N.; Yuan, X.; Guo, X.; Gao, C.; Zhao, Z.; Ma, Z. Fine-Grained Age Estimation in the Wild With Attention LSTM Networks. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 3140–3152.
- Wang, H.; Sanchez, V.; Li, C.-T. Improving Face-Based Age Estimation with Attention-Based Dynamic Patch Fusion. IEEE Trans. Image Process. 2022, 31, 1084–1096.
- Hu, C.; Gao, J.; Chen, J.; Jiang, D.; Shu, Y. Fine-Grained Age Estimation With Multi-Attention Network. IEEE Access 2020, 8, 196013–196023.
- Yu, H.; Mu, C.; Sun, C.; Yang, W.; Yang, X.; Zuo, X. Support vector machine-based optimized decision threshold adjustment strategy for classifying imbalanced data. Knowl.-Based Syst. 2015, 76, 67–78.
- Gao, B.-B.; Xing, C.; Xie, C.-W.; Wu, J.; Geng, X. Deep Label Distribution Learning With Label Ambiguity. IEEE Trans. Image Process. 2017, 26, 2825–2838.
- Rothe, R.; Timofte, R.; Gool, L.V. DEX: Deep EXpectation of Apparent Age from a Single Image. In Proceedings of the 2015 IEEE International Conference on Computer Vision Workshop (ICCVW), Santiago, Chile, 7–13 December 2015; pp. 252–257.
- Ricanek, K.; Tesafaye, T. MORPH: A Longitudinal Image Database of Normal Adult Age-Progression. In Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition (FGR06), Southampton, UK, 10–12 April 2006; pp. 341–345.
- Zhang, Y.; Liu, L.; Li, C.; Loy, C.C. Quantifying Facial Age by Posterior of Age Comparisons. arXiv 2017, arXiv:1708.09687.
- Chen, B.-C.; Chen, C.-S.; Hsu, W.H. Face Recognition and Retrieval Using Cross-Age Reference Coding With Cross-Age Celebrity Dataset. IEEE Trans. Multimed. 2015, 17, 804–815.
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Pan, H.; Han, H.; Shan, S.; Chen, X. Mean-Variance Loss for Deep Age Estimation From a Face. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 5285–5294.
- He, J.; Hu, C.; Wang, L. Facial age estimation based on asymmetrical label distribution. Multimed. Syst. 2023, 29, 753–762.
- Chen, S.; Zhang, C.; Dong, M.; Le, J.; Rao, M. Using Ranking-CNN for Age Estimation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 742–751.
- Liu, N.; Zhang, F.; Duan, F. Age estimation by extracting hierarchical age-related features. J. Vis. Commun. Image Represent. 2023, 95, 103884.
- Bao, Z.; Luo, Y.; Tan, Z.; Wan, J.; Ma, X.; Lei, Z. Deep domain-invariant learning for facial age estimation. Neurocomputing 2023, 534, 86–93.
- Wen, C.; Zhang, X.; Yao, X.; Yang, J. Ordinal Label Distribution Learning. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 23424–23434.
- Li, P.; Hu, Y.; Wu, X.; He, R.; Sun, Z. Deep label refinement for age estimation. Pattern Recognit. 2020, 100, 107178.
- Tan, Z.; Zhou, S.; Wan, J.; Lei, Z.; Li, S.Z. Age Estimation Based on a Single Network with Soft Softmax of Aging Modeling. In Computer Vision—ACCV 2016, Proceedings of the 13th Asian Conference on Computer Vision, Taipei, Taiwan, 20–24 November 2016; Lai, S.-H., Lepetit, V., Nishino, K., Sato, Y., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2017; Volume 10113, pp. 203–216. ISBN 978-3-319-54186-0.
- Shen, W.; Zhao, K.; Guo, Y.; Yuille, A. Label Distribution Learning Forests. In Proceedings of the 31st Conference on Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 1–10.
- Wan, J.; Tan, Z.; Lei, Z.; Guo, G.; Li, S.Z. Auxiliary Demographic Information Assisted Age Estimation With Cascaded Structure. IEEE Trans. Cybern. 2018, 48, 2531–2541.
- Li, S.; Cheng, K.T. Facial Age Estimation by Deep Residual Decision Making. arXiv 2019, arXiv:1908.10737.
- Shen, W.; Guo, Y.; Wang, Y.; Zhao, K.; Wang, B.; Yuille, A. Deep Regression Forests for Age Estimation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2304–2313.
Category | Method | Year | Pretrained | MAE | CS(3) | CS(5)
---|---|---|---|---|---|---
Non-Hierarchical | DEX [32] | 2018 | ImageNet | 3.25 | - | 66.90
Non-Hierarchical | DEX [32] | 2018 | IMDB-WIKI | 2.68 | 64.50 | 78.10
Non-Hierarchical | MV [37] | 2018 | IMDB-WIKI | 2.16 | 75.00 | 89.90
Non-Hierarchical | DAA [9] | 2023 | IMDB-WIKI | 2.06 | - | -
Non-Hierarchical | MSDNN [16] | 2024 | ImageNet | 2.59 | 67.00 | 86.00
Hierarchical | MTWGP [31] | 2017 | - | 6.28 | - | -
Hierarchical | Posterior [34] | 2017 | - | 2.87 | - | -
Hierarchical | Posterior [34] | 2017 | IMDB-WIKI | 2.52 | - | 80.40
Hierarchical | Ranking-CNN [39] | 2017 | IMDB-WIKI | 2.96 | - | 83.70
Hierarchical | SSR-Net [15] | 2018 | IMDB-WIKI | 3.16 | - | 76.30
Hierarchical | C3AE [18] | 2019 | IMDB-WIKI | 3.13 | - | 65.00
Hierarchical | BridgeNet [19] | 2019 | IMDB-WIKI | 2.38 | 72.00 | 88.00
Hierarchical | LSTM [27] | 2020 | IMDB-WIKI | 2.36 | - | -
Hierarchical | ADPF [28] | 2022 | - | 2.71 | 73.00 | 87.00
Hierarchical | ALD-Net [38] | 2023 | - | 2.65 | - | -
Hierarchical | J-MIMO [40] | 2023 | ImageNet | 2.28 | - | -
Hierarchical | DDIL [41] | 2023 | ImageNet | 4.96 | - | -
Hierarchical | OLDL [42] | 2023 | ImageNet | 2.45 | - | -
Hierarchical | GroupFace [20] | 2024 | IMDB-WIKI | 2.01 | - | -
Hierarchical | OURS | 2025 | - | 2.29 | 74.70 | 88.45
Hierarchical | OURS | 2025 | ImageNet | 2.18 | 79.93 | 90.96
Category | Method | Year | Pretrained | MAE | CS(3) | CS(5)
---|---|---|---|---|---|---
Non-Hierarchical | MobileNet [15] | 2018 | IMDB-WIKI | - | 44.00 | 60.60
Non-Hierarchical | DenseNet [15] | 2018 | IMDB-WIKI | - | 51.70 | 69.40
Non-Hierarchical | LRN [43] | 2020 | IMDB-WIKI | 2.15 | 62.86 | 81.47
Non-Hierarchical | DAA [9] | 2023 | IMDB-WIKI | 2.25 | 68.82 | 84.89
Hierarchical | Posterior [34] | 2017 | IMDB-WIKI | - | 62.08 | 80.43
Hierarchical | Posterior [34] | 2017 | MS-Celeb | - | 64.23 | 82.15
Hierarchical | SSR-Net [15] | 2018 | IMDB-WIKI | - | 54.90 | 74.10
Hierarchical | ALD-Net [38] | 2023 | - | - | 61.40 | 81.20
Hierarchical | OURS | 2025 | ImageNet | 3.09 | 63.11 | 82.30
Category | Method | Year | Pretrained | MAE | CS(3) | CS(5)
---|---|---|---|---|---|---
Non-Hierarchical | Soft softmax [44] | 2015 | IMDB-WIKI | 5.19 | - | -
Non-Hierarchical | dLDLF [45] | 2017 | - | 6.16 | 50.50 | 71.10
Non-Hierarchical | DEX [32] | 2018 | - | 6.52 | 48.50 | 68.20
Non-Hierarchical | CasCNN [46] | 2018 | - | 5.23 | 36.20 | 56.30
Non-Hierarchical | RNDF [47] | 2019 | IMDB-WIKI | 4.59 | - | -
Non-Hierarchical | DRF [48] | 2019 | IMDB-WIKI | 5.63 | 51.40 | 72.80
Hierarchical | ADPF [28] | 2022 | IMDB-WIKI | 5.39 | - | -
Hierarchical | ALD-Net [38] | 2023 | - | 4.62 | - | -
Hierarchical | J-MIMO [40] | 2023 | ImageNet | 4.78 | - | -
Hierarchical | GroupFace [20] | 2024 | IMDB-WIKI | 4.32 | - | -
Hierarchical | MoE [14] | 2024 | IMDB-WIKI | 5.19 | 48.50 | 70.00
Hierarchical | OURS | 2025 | ImageNet | 4.47 | 52.91 | 77.63
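
The MAE and CS(3)/CS(5) columns in the comparison tables can be made concrete with a small sketch. It assumes the definitions commonly used in age estimation: MAE is the mean absolute difference between predicted and true age in years, and CS(θ) is the percentage of test images whose absolute error is at most θ years. The function names and the toy ages below are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def mean_absolute_error(true_ages: Sequence[float], pred_ages: Sequence[float]) -> float:
    """Mean absolute difference between predicted and true ages, in years."""
    if len(true_ages) != len(pred_ages) or len(true_ages) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(true_ages, pred_ages)) / len(true_ages)


def cumulative_score(true_ages: Sequence[float], pred_ages: Sequence[float],
                     threshold: float) -> float:
    """Percentage of samples whose absolute error is at most `threshold` years."""
    if len(true_ages) != len(pred_ages) or len(true_ages) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(true_ages, pred_ages) if abs(t - p) <= threshold)
    return 100.0 * hits / len(true_ages)


# Hypothetical toy data (not from any dataset in the paper).
true_ages = [23, 31, 45, 52, 60]
pred_ages = [25, 30, 49, 52, 66]  # absolute errors: 2, 1, 4, 0, 6

print(f"MAE   = {mean_absolute_error(true_ages, pred_ages):.2f}")
print(f"CS(3) = {cumulative_score(true_ages, pred_ages, 3):.2f}")
print(f"CS(5) = {cumulative_score(true_ages, pred_ages, 5):.2f}")
```

Under these assumed definitions a lower MAE and a higher CS(θ) both indicate better accuracy, which is how the rows in the tables above are meant to be compared.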
Swin Transformer | Local Adjusted | Semantic Attention | HBP | Morph II MAE | Morph II CS(3) | MegaAge-Asian MAE | MegaAge-Asian CS(3) | CACD MAE | CACD CS(3)
---|---|---|---|---|---|---|---|---|---
√ |  |  |  | 2.33 | 72.99 | 3.18 | 61.98 | 5.15 | 42.91
√ | √ |  |  | 2.31 | 74.53 | 3.12 | 60.23 | 5.02 | 43.80
√ |  | √ |  | 2.28 | 74.37 | 3.20 | 59.49 | 5.11 | 42.74
√ | √ | √ |  | 2.25 | 74.27 | 3.06 | 61.12 | 4.51 | 47.73
√ | √ | √ | √ | 2.18 | 79.93 | 3.09 | 63.11 | 4.47 | 52.91
Radius Parameter | Morph II MAE | Morph II CS(3) | MegaAge-Asian MAE | MegaAge-Asian CS(3) | CACD MAE | CACD CS(3)
---|---|---|---|---|---|---
r = 5 | 2.43 | 70.69 | 3.13 | 61.21 | 4.72 | 50.30
r = 10 | 2.18 | 79.93 | 3.09 | 63.11 | 4.47 | 52.91
r = 15 | 2.21 | 79.12 | 3.18 | 64.55 | 4.63 | 50.12
r = 20 | 2.23 | 78.41 | 3.10 | 63.13 | 4.58 | 49.02
Swin Transformer | Local Adjusted | Semantic Attention | HBP | Runtime | FPS | MACC | MAE
---|---|---|---|---|---|---|---
√ |  |  |  | 1.69 | 591 | 47.82 | 2.33
√ | √ |  |  | 1.97 | 507 | 48.16 | 2.31
√ |  | √ |  | 1.91 | 523 | 47.99 | 2.28
√ | √ | √ |  | 2.14 | 467 | 48.33 | 2.25
√ | √ | √ | √ | 3.52 | 284 | 50.51 | 2.18
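
The Runtime and FPS columns of the speed table appear to be two views of the same measurement. The sketch below assumes that Runtime is the per-image inference time in milliseconds (the table itself does not state the unit) and checks that FPS ≈ 1000 / runtime for each row. The helper name `fps_from_runtime_ms` is hypothetical.

```python
def fps_from_runtime_ms(runtime_ms: float) -> float:
    """Frames per second implied by a per-image runtime given in milliseconds."""
    if runtime_ms <= 0:
        raise ValueError("runtime must be positive")
    return 1000.0 / runtime_ms


# Runtime and FPS values copied from the speed table, one pair per row.
runtimes_ms = [1.69, 1.97, 1.91, 2.14, 3.52]
reported_fps = [591, 507, 523, 467, 284]

for runtime, reported in zip(runtimes_ms, reported_fps):
    derived = fps_from_runtime_ms(runtime)
    # Assumption: Runtime is in ms; under it, derived and reported FPS differ by < 1.
    print(f"runtime={runtime:.2f}  derived FPS={derived:.1f}  reported FPS={reported}")
```

If the millisecond assumption holds, the full model (last row) runs at a bit under half the throughput of the Swin Transformer backbone alone, in exchange for the MAE reduction from 2.33 to 2.18 reported in the same table.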
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Qiao, X.; Hu, C.; Qin, B. Semantic Attention Guided Hierarchical Decision Network for Age Estimation. Appl. Sci. 2025, 15, 7707. https://doi.org/10.3390/app15147707