SFE-DETR: An Enhanced Transformer-Based Face Detector for Small Target Faces in Open Complex Scenes

Yang, Chenhao; Jiang, Yueming; Song, Chunyan

doi:10.3390/s26010125

This is an early access version, the complete PDF, HTML, and XML versions will be available soon.

Open AccessArticle

SFE-DETR: An Enhanced Transformer-Based Face Detector for Small Target Faces in Open Complex Scenes

by

Chenhao Yang

,

Yueming Jiang

^* and

Chunyan Song

School of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China

^*

Author to whom correspondence should be addressed.

Sensors 2026, 26(1), 125; https://doi.org/10.3390/s26010125

Submission received: 20 November 2025 / Revised: 19 December 2025 / Accepted: 22 December 2025 / Published: 24 December 2025

(This article belongs to the Section Optical Sensors)

Download Versions Notes

Abstract

Face detection is an important task in the field of computer vision and is widely applied in various applications. However, in open and complex scenes with dense faces, occlusions, and image degradation, small face detection still faces significant challenges due to the extremely small target scale, difficult localization, and severe background interference. To address these issues, this paper proposes a small face detector for open complex scenes, SFE-DETR, which aims to simultaneously improve detection accuracy and computational efficiency. The backbone network of the model adopts an inverted residual shift convolution and dilated reparameterization structure, which enhances shallow features and enables deep feature self-adaptation, thereby better preserving small-scale information and reducing the number of parameters. Additionally, a multi-head multi-scale self-attention mechanism is introduced to fuse multi-scale convolutional features with channel-wise weighting, capturing fine-grained facial features while suppressing background noise. Moreover, a redesigned SFE-FPN introduces high-resolution layers and incorporates a novel feature fusion module consisting of local, large-scale, and global branches, efficiently aggregating multi-level features and significantly improving small face detection performance. Experimental results on two challenging small face detection datasets show that SFE-DETR reduces parameters by 28.1% compared to the original RT-DETR-R18 model, achieving a mAP50 of 94.7% and AP-s of 42.1% on the SCUT-HEAD dataset, and a mAP50 of 86.3% on the WIDER FACE (Hard) subset. These results demonstrate that SFE-DETR achieves optimal detection performance among models of the same scale while maintaining efficiency.

Keywords: object detection; small face detection; feature extraction; feature fusion pyramid; RT-DETR; open complex scenes

Share and Cite

MDPI and ACS Style

Yang, C.; Jiang, Y.; Song, C. SFE-DETR: An Enhanced Transformer-Based Face Detector for Small Target Faces in Open Complex Scenes. Sensors 2026, 26, 125. https://doi.org/10.3390/s26010125

AMA Style

Yang C, Jiang Y, Song C. SFE-DETR: An Enhanced Transformer-Based Face Detector for Small Target Faces in Open Complex Scenes. Sensors. 2026; 26(1):125. https://doi.org/10.3390/s26010125

Chicago/Turabian Style

Yang, Chenhao, Yueming Jiang, and Chunyan Song. 2026. "SFE-DETR: An Enhanced Transformer-Based Face Detector for Small Target Faces in Open Complex Scenes" Sensors 26, no. 1: 125. https://doi.org/10.3390/s26010125

APA Style

Yang, C., Jiang, Y., & Song, C. (2026). SFE-DETR: An Enhanced Transformer-Based Face Detector for Small Target Faces in Open Complex Scenes. Sensors, 26(1), 125. https://doi.org/10.3390/s26010125

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

SFE-DETR: An Enhanced Transformer-Based Face Detector for Small Target Faces in Open Complex Scenes

Abstract

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI