An Efficient Pedestrian Gender Recognition Method Based on Key Area Feature Extraction and Information Fusion
Abstract
1. Introduction
- A discrete cosine transform (DCT) image fusion algorithm [11] is combined with an FSRCNN-based super-resolution model [12], using two consecutive frames from surveillance video as dual inputs. The input images are divided into blocks, DCT coefficients are computed to select the higher-quality blocks, and super-resolution processing is applied to the blocks rich in high-frequency information. Finally, all image blocks are fused to output a high-resolution image.
- The facial gender recognition algorithm is improved by selecting and fusing intermediate-layer features of the CNN, which yields more comprehensive feature information. Based on the EfficientNet model [13], the output features of the intermediate and high layers are mapped to a common dimension and fused, and the fused features are then classified to perform face-based gender recognition.
- A method that recognizes a pedestrian’s gender from the entire body is proposed. It first preprocesses the image, then extracts features from the face, hair, and clothing (lower-body) regions, and trains a separate local classifier for each region. A Bayesian fusion strategy [14] combines the outputs of the local classifiers to produce the final gender prediction.
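The DCT-based block selection in the first contribution can be illustrated with a minimal NumPy sketch. It assumes two already-aligned grayscale frames whose sides are multiples of an 8 × 8 block size, and it scores each block by the energy of its DCT coefficients with u + v at or above a cutoff, a stand-in for the quality criterion, which is not spelled out here. The names `dct_matrix`, `high_freq_energy`, and `fuse_frames`, the block size, and the cutoff are all hypothetical, and the super-resolution stage (FSRCNN in the article) is left out.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    k = np.arange(n).reshape(-1, 1)   # frequency index
    i = np.arange(n).reshape(1, -1)   # sample index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)        # DC row uses the smaller normalization
    return c


def high_freq_energy(block: np.ndarray, c: np.ndarray, cutoff: int) -> float:
    """Energy of 2-D DCT coefficients (u, v) with u + v >= cutoff (a sharpness score)."""
    coeffs = c @ block @ c.T          # separable 2-D DCT-II of the block
    u, v = np.indices(coeffs.shape)
    return float(np.sum(coeffs[(u + v) >= cutoff] ** 2))


def fuse_frames(frame_a: np.ndarray, frame_b: np.ndarray,
                block: int = 8, cutoff: int = 4) -> np.ndarray:
    """Keep, at each block position, the block with more high-frequency energy."""
    if frame_a.shape != frame_b.shape:
        raise ValueError("frames must have the same shape")
    h, w = frame_a.shape
    if h % block or w % block:
        raise ValueError("frame sides must be multiples of the block size")
    c = dct_matrix(block)
    fused = np.empty((h, w), dtype=float)
    for y in range(0, h, block):
        for x in range(0, w, block):
            a = frame_a[y:y + block, x:x + block].astype(float)
            b = frame_b[y:y + block, x:x + block].astype(float)
            # A fuller pipeline would pass high-frequency blocks to a
            # super-resolution model here; that stage is omitted.
            better = a if high_freq_energy(a, c, cutoff) >= high_freq_energy(b, c, cutoff) else b
            fused[y:y + block, x:x + block] = better
    return fused


if __name__ == "__main__":
    yy, xx = np.indices((16, 16))
    sharp = ((yy + xx) % 2) * 255.0      # checkerboard: strong high-frequency content
    smooth = np.full((16, 16), 128.0)    # flat: no AC energy at all
    frame_a = smooth.copy()
    frame_a[:, :8] = sharp[:, :8]        # left half sharp in frame A
    frame_b = smooth.copy()
    frame_b[:, 8:] = sharp[:, 8:]        # right half sharp in frame B
    print(np.array_equal(fuse_frames(frame_a, frame_b), sharp))  # True
```

Choosing the block with more high-frequency energy favors whichever frame is sharper at each location; a practical pipeline would also need frame registration and some treatment of block seams.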
2. Related Work
2.1. Gender Recognition in Low-Resolution Surveillance
2.2. Image Super-Resolution for Enhancement
2.3. Multi-Region and Attribute-Based Recognition
2.4. Information Fusion Strategies
2.5. Theoretical Foundations of Key Techniques
2.5.1. Discrete Cosine Transform (DCT) and Its Inverse
2.5.2. Bayesian Decision Fusion Framework
2.5.3. Evaluation Metrics for Classification
3. Proposed Algorithm Framework
4. DCT-PFSR-CNN Model
4.1. DCT-Based Best-Quality Block Selection
4.2. Block-Based Super-Resolution Technology
4.3. CNN-Based Face Gender Classification
- (1) Feature Extraction and Fusion
- (2) Classifier Design
- (3) Model Training
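The fusion of intermediate- and high-layer features described for the facial classifier can be sketched as follows, under stated assumptions: each feature map is globally average-pooled, projected to a shared dimension by a linear map, the two projections are added element-wise, and a logistic output is read as P(male). The class `FusionHead`, the shared size of 128, the feature-map shapes, and the random (untrained) weights are illustrative only; the article's actual layer choices and fusion operator may differ.

```python
import numpy as np


def global_avg_pool(fmap: np.ndarray) -> np.ndarray:
    """Collapse a (C, H, W) feature map to a length-C vector."""
    return fmap.mean(axis=(1, 2))


class FusionHead:
    """Hypothetical head: project mid- and high-level features to one size, add, classify."""

    def __init__(self, c_mid: int, c_high: int, dim: int = 128, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Random placeholder weights; in practice these would be learned by training.
        self.w_mid = rng.normal(0.0, 0.02, size=(dim, c_mid))
        self.w_high = rng.normal(0.0, 0.02, size=(dim, c_high))
        self.w_out = rng.normal(0.0, 0.02, size=dim)
        self.b_out = 0.0

    def fuse(self, f_mid: np.ndarray, f_high: np.ndarray) -> np.ndarray:
        """Pool each map, project both to `dim`, add element-wise, then apply ReLU."""
        z_mid = self.w_mid @ global_avg_pool(f_mid)
        z_high = self.w_high @ global_avg_pool(f_high)
        return np.maximum(z_mid + z_high, 0.0)

    def predict_proba(self, f_mid: np.ndarray, f_high: np.ndarray) -> float:
        """Logistic output, interpreted here as P(male) for a binary label."""
        logit = float(self.w_out @ self.fuse(f_mid, f_high)) + self.b_out
        return float(1.0 / (1.0 + np.exp(-logit)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    f_mid = rng.normal(size=(112, 14, 14))   # assumed intermediate-layer shape
    f_high = rng.normal(size=(1280, 7, 7))   # assumed final-layer shape
    head = FusionHead(c_mid=112, c_high=1280)
    print(head.predict_proba(f_mid, f_high))
```

Projecting both feature sets to the same size is what makes element-wise addition possible; concatenating the projected vectors would be an equally plausible fusion choice.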
5. Multi-Region Pedestrian Gender Recognition Model (MPGRM)
5.1. Pedestrian Gender Recognition Model
5.2. Feature Extraction
5.2.1. Facial Feature Extraction
5.2.2. Hair Feature Region
5.3. Gender Classifier Based on YOLOv8
5.4. Information Fusion Strategy
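One common way to realize Bayesian decision fusion of region classifiers is the naive Bayes product rule. If the face, hair, and clothing observations are assumed conditionally independent given gender, and every local classifier was trained under the same prior, then P(m | x_1, ..., x_R) ∝ P(m)^(1−R) ∏_r P(m | x_r). The sketch below applies that rule in log-odds form and skips regions with no classifier output (for example, when no face is detected). This formulation is an assumption for illustration; the article cites a Bayesian-network reference [14], and its exact fusion model may differ. `bayes_fuse` and its parameters are hypothetical.

```python
import math
from typing import Optional, Sequence


def logit(p: float) -> float:
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def bayes_fuse(region_probs: Sequence[Optional[float]],
               prior_male: float = 0.5, eps: float = 1e-6) -> float:
    """Fuse per-region P(male | region) assuming conditional independence.

    Entries that are None (region unavailable) are ignored.
    Returns the fused P(male | all available regions).
    """
    # Clamp to avoid log(0) and drop missing regions.
    used = [min(max(p, eps), 1.0 - eps) for p in region_probs if p is not None]
    if not used:
        return prior_male
    # Log-odds form of P(m | x_1..x_R) ∝ P(m)^(1-R) * prod_r P(m | x_r)
    log_odds = (1 - len(used)) * logit(prior_male) + sum(logit(p) for p in used)
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    # face, hair, clothing scores from hypothetical local classifiers
    print(bayes_fuse([0.7, 0.6, 0.4]))    # ≈ 0.7
    print(bayes_fuse([None, 0.6, 0.55]))  # face not detected
```

With a uniform prior, `bayes_fuse([0.7, 0.6, 0.4])` returns approximately 0.7: the hair and clothing scores have opposite log-odds and cancel, leaving the face score.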
6. Experimental Results and Analysis
6.1. Evaluation Metrics
6.2. Experimental Setup
6.3. Baseline Models and Comparison Fairness
6.4. Results of Facial Gender Recognition (DCT-PFSR-CNN)
6.4.1. DCT-Based Super-Resolution Experiment
6.4.2. Comparison with State-of-the-Art Models
6.5. Results of Full-Body Pedestrian Gender Recognition (MPGRM)
6.5.1. Overall Performance and Case Analysis
6.5.2. Benchmarking Against State-of-the-Art
6.6. Comprehensive Discussion
6.6.1. Interpretation of Results
6.6.2. Failure Analysis and Limitations
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- [1] Cao, M.; Tian, Q.; Ma, T.H.; Chen, S.C. Human Facial Attributes Estimation: A Survey. J. Softw. 2019, 30, 2188–2207.
- [2] Ertam, F. An Effective Gender Recognition Approach Using Voice Data via Deeper LSTM Networks. Appl. Acoust. 2019, 156, 351–358.
- [3] O’Shea, K.; Nash, R. An Introduction to Convolutional Neural Networks. arXiv 2015, arXiv:1511.08458.
- [4] Ranjan, R.; Patel, V.M.; Chellappa, R. HyperFace: A Deep Multi-task Learning Framework for Face Detection, Landmark Localization, Pose Estimation and Gender Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 121–135.
- [5] Lin, C.J.; Lin, C.H.; Jeng, S.Y. Using Feature Fusion and Parameter Optimization of Dual-Input Convolutional Neural Network for Face Gender Recognition. Appl. Sci. 2020, 10, 3166.
- [6] Chen, W.B.; Li, Y.L.; Chen, Y.J. An Age and Gender Recognition Model Based on CNN-SE-ELM. Comput. Eng. Sci. 2021, 43, 872–882.
- [7] Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a Deep Convolutional Network for Image Super-Resolution. In Computer Vision–ECCV 2014, Proceedings of the 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part V; Springer International Publishing: Cham, Switzerland, 2014; pp. 184–199.
- [8] Dong, C.; Loy, C.C.; Tang, X. Accelerating the Super-Resolution Convolutional Neural Network. In Computer Vision–ECCV 2016; Lecture Notes in Computer Science, Volume 9906; Springer: Cham, Switzerland, 2016; pp. 391–407.
- [9] Tian, Y.; Jia, R.S.; Deng, M.D.; Zhao, C.Y. A Super-Resolution Reconstruction Method Based on Convolutional Neural Network in the Field of Fuzzy License Plate Image. Comput. Appl. Softw. 2020, 37, 159–164+228.
- [10] Deshpande, A.; Razmjooy, N.; Estrela, V.V. Introduction to Computational Intelligence and Super-Resolution. In Computational Intelligence Methods for Super-Resolution in Image Processing Applications; Springer International Publishing: Cham, Switzerland, 2021; pp. 3–23.
- [11] Raviraja, H.D.S. Enhancing Laryngeal Spinocellular Carcinoma Image Security with DCT. Indian J. Otolaryngol. Head Neck Surg. 2023, 76, 695–701.
- [12] Zhang, H.C.; Ji, F.; Zhong, X.X. Super-Resolution Reconstruction Algorithm of Single Image Based on CNN with Gaussian Blur. Comput. Appl. Softw. 2022, 39, 231–235+295.
- [13] Luo, H.W.; Liu, B.; Yao, H.; Wang, J.B.; Yuan, H.Q.; Liu, G.H. Lightweight Gender and Age Estimation Algorithm Based on Improved EfficientNet. Transducer Microsyst. Technol. 2023, 42, 114–118.
- [14] Li, B.Y.; Liu, Q.G. Study on Decision-making of Safety Risk Factors of High-speed Railway Station Based on Fuzzy Fault Tree and BN. Railw. Stand. Des. 2024, 68, 145–152.
- [15] Liu, Z.; Luo, P.; Wang, X.; Tang, X. Deep Learning Face Attributes in the Wild. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 3730–3738.
- [16] Cheng, Z.; Zhu, X.; Gong, S. Low-resolution face recognition. In Proceedings of the 14th Asian Conference on Computer Vision (ACCV 2018), Perth, Australia, 2–6 December 2018; Springer International Publishing: Cham, Switzerland, 2018; pp. 605–621.
- [17] Cai, L.; Zhu, J.Q.; Zeng, H.Q.; Chen, J.; Cai, C.H.; Ma, K.K. HOG-assisted deep feature learning for pedestrian gender recognition. J. Frankl. Inst. 2017, 28, 13–30.
- [18] Cao, K.; Rong, Y.; Li, C.; Tang, X.; Loy, C.C. Pose-robust face recognition via deep residual equivariant mapping. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5187–5196.
- [19] Lu, Y.; Ebrahimi, T. Cross-resolution face recognition via identity-preserving network and knowledge distillation. In Proceedings of the 2023 IEEE International Conference on Visual Communications and Image Processing (VCIP), Jeju, Republic of Korea, 4–7 December 2023; pp. 1–5.
- [20] Yue, L.; Shen, H.; Li, J.; Yuan, Q.; Zhang, H.; Zhang, L. Image super-resolution: The techniques, applications, and future. Signal Process. 2016, 128, 389–408.
- [21] Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.P.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690.
- [22] Zhou, L.; Li, Y.; Feng, Y.; Shen, D.; Wang, H.; Dong, F. Super-Resolution Task Inference Acceleration for In-Vehicle Real-Time Video via Edge–End Collaboration. Appl. Sci. 2025, 15, 11828.
- [23] Xiang, X.; Morton, J.; Reda, F.A.; Young, L.; Perazzi, F.; Ranjan, R.; Kumar, A.; Colaco, A.; Allebach, J. HIME: Efficient Headshot Image Super-Resolution with Multiple Exemplars. In Proceedings of the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2–7 January 2023; pp. 1694–1704.
- [24] Abdelwhab, A.; Viriri, S. A survey on soft biometrics for human identification. In Machine Learning and Biometrics; IntechOpen: Rijeka, Croatia, 2018; p. 37.
- [25] Li, B.; Lian, X.C.; Lu, B.L. Gender classification by combining clothing, hair and facial component classifiers. Neurocomputing 2012, 76, 18–27.
- [26] Sun, Y.; Zheng, L.; Yang, Y.; Tian, Q.; Wang, S. Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline). In Computer Vision–ECCV 2018, Proceedings of the 15th European Conference, Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 480–496.
- [27] Chen, Y.; Duffner, S.; Stoian, A.; Dufour, J.-Y.; Baskurt, A. Pedestrian attribute recognition with part-based CNN and combined feature representations. In Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Funchal, Portugal, 27–29 January 2018.
- [28] Viola, P.; Jones, M. Robust real-time face detection. Int. J. Comput. Vis. 2004, 57, 137–154.
- [29] Liggins, M.E.; Hall, D.L.; Llinas, J. Handbook of Multisensor Data Fusion: Theory and Practice; CRC Press: Boca Raton, FL, USA, 2017.
- [30] Mudawi, A.N.; Qureshi, A.M.; Abdelhaq, M.; Alshahrani, A.; Alazeb, A.; Alonazi, M.; Algarni, A. Vehicle Detection and Classification via YOLOv8 and Deep Belief Network over Aerial Image Sequences. Sustainability 2023, 15, 14597.
- [31] Madan, A. Face Recognition using Haar Cascade Classifier. Int. J. Mod. Trends Sci. Technol. 2021, 7, 85–87.
- [32] Du, Y.F.; Zhao, H.N.; Wang, Z.Y. Super-resolution stress imaging for terahertz-elastic based on SRCNN. J. Exp. Mech. 2022, 37, 323–331.
- [33] Arjun, A.P.; Suryanarayan, S.; Viswamanav, S.R.; Abhishek, S.; Anjali, T. Unveiling Underwater Structures: MobileNet vs. EfficientNet in Sonar Image Detection. Procedia Comput. Sci. 2024, 233, 518–527.
- [34] Duan, M.X.; Li, L.L.; Yang, C.Q.; Li, K.Q. A Hybrid Deep Learning CNN-ELM for Age and Gender Classification. Neurocomputing 2018, 275, 448–461.
- [35] Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
- [36] Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- [37] Buyukyilmaz, M.; Cibikdiken, A.O. Voice Gender Recognition Using Deep Learning. In Proceedings of the 2016 International Conference on Modeling, Simulation and Optimization Technologies and Applications (MSOTA2016), Xiamen, China, 18–19 December 2016; Atlantis Press: Dordrecht, The Netherlands, 2016; pp. 409–411.







| Algorithm Model | Precision | Recall | F1-Score |
|---|---|---|---|
| RCNN-Gender | 0.76 | 0.8 | 0.78 |
| CNN-ELM | 0.81 | 0.83 | 0.82 |
| CNN-SE-ELM | 0.83 | 0.82 | 0.82 |
| LNets-ANet | 0.85 | 0.86 | 0.85 |
| DCT-PFSR-CNN | 0.89 | 0.88 | 0.88 |
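If the first two numeric columns of this table are precision P and recall R, each reported F1-Score should equal their harmonic mean, F1 = 2PR / (P + R). The short check below recomputes F1 from the rounded table values; because the inputs are already rounded to two decimals, a tolerance of 0.01 is allowed. The names `ROWS`, `f1`, and `check_rows` are illustrative.

```python
# (precision, recall, reported F1) transcribed from the table above,
# assuming the first two numeric columns are precision and recall
ROWS = {
    "RCNN-Gender": (0.76, 0.80, 0.78),
    "CNN-ELM": (0.81, 0.83, 0.82),
    "CNN-SE-ELM": (0.83, 0.82, 0.82),
    "LNets-ANet": (0.85, 0.86, 0.85),
    "DCT-PFSR-CNN": (0.89, 0.88, 0.88),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def check_rows(rows, tol: float = 0.01):
    """Return the models whose reported F1 differs from the recomputed value by more than tol."""
    return [name for name, (p, r, reported) in rows.items()
            if abs(f1(p, r) - reported) > tol]


if __name__ == "__main__":
    for name, (p, r, reported) in ROWS.items():
        print(f"{name}: recomputed F1 = {f1(p, r):.4f}, reported = {reported:.2f}")
    print("inconsistent rows:", check_rows(ROWS))
```

For DCT-PFSR-CNN, 2 × 0.89 × 0.88 / 1.77 = 0.88497..., which rounds to the reported 0.88; every row passes the check.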

| Real Situation | Predicted: Male | Predicted: Female |
|---|---|---|
| Male | 174 | 19 |
| Female | 13 | 118 |

| Real Situation | Predicted: Male | Predicted: Female |
|---|---|---|
| Male | 97 | 11 |
| Female | 12 | 72 |

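Accuracy and per-class precision, recall, and F1 follow directly from a 2 × 2 confusion matrix. The sketch below assumes the orientation shown in the tables (rows are the real class, columns the predicted class, ordered male then female). Which model or test split each matrix belongs to is not stated alongside the tables, so the numbers serve only as an arithmetic illustration. The function name `confusion_metrics` is hypothetical.

```python
from typing import List, Tuple

Metrics = Tuple[float, float, float]  # (precision, recall, F1)


def confusion_metrics(cm: List[List[int]]) -> Tuple[float, List[Metrics], Metrics]:
    """cm[i][j] counts samples of real class i predicted as class j.

    Returns (accuracy, per-class metrics, macro-averaged metrics).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n)) / total
    per_class: List[Metrics] = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum: all predicted as k
        actual_k = sum(cm[k])                          # row sum: all truly of class k
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class.append((precision, recall, f1))
    macro = (
        sum(m[0] for m in per_class) / n,
        sum(m[1] for m in per_class) / n,
        sum(m[2] for m in per_class) / n,
    )
    return accuracy, per_class, macro


if __name__ == "__main__":
    matrices = {"first": [[174, 19], [13, 118]], "second": [[97, 11], [12, 72]]}
    for name, cm in matrices.items():
        acc, _, (p, r, f) = confusion_metrics(cm)
        print(f"{name} matrix: accuracy={acc:.3f}, macro P/R/F1={p:.3f}/{r:.3f}/{f:.3f}")
```

For the first matrix, accuracy is (174 + 118) / 324 ≈ 0.901 and male precision is 174 / 187 ≈ 0.930; for the second, accuracy is (97 + 72) / 192 ≈ 0.880.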
| Algorithm Model | mAP | AUC |
|---|---|---|
| Mini-CNN | 0.6 | 0.65 |
| VGGNet16 | 0.73 | 0.79 |
| GoogLeNet | 0.76 | 0.80 |
| ResNet50 | 0.79 | 0.82 |
| HDFL | 0.83 | 0.84 |
| MPGRM | 0.85 | 0.86 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zhang, Y.; Yan, W.; Liu, G.; Jin, N.; Han, L. An Efficient Pedestrian Gender Recognition Method Based on Key Area Feature Extraction and Information Fusion. Appl. Sci. 2026, 16, 1298. https://doi.org/10.3390/app16031298
Zhang Y, Yan W, Liu G, Jin N, Han L. An Efficient Pedestrian Gender Recognition Method Based on Key Area Feature Extraction and Information Fusion. Applied Sciences. 2026; 16(3):1298. https://doi.org/10.3390/app16031298
Chicago/Turabian Style: Zhang, Ye, Weidong Yan, Guoqi Liu, Ning Jin, and Lu Han. 2026. "An Efficient Pedestrian Gender Recognition Method Based on Key Area Feature Extraction and Information Fusion" Applied Sciences 16, no. 3: 1298. https://doi.org/10.3390/app16031298
APA Style: Zhang, Y., Yan, W., Liu, G., Jin, N., & Han, L. (2026). An Efficient Pedestrian Gender Recognition Method Based on Key Area Feature Extraction and Information Fusion. Applied Sciences, 16(3), 1298. https://doi.org/10.3390/app16031298
