Concealed Face Analysis and Facial Reconstruction via a Multi-Task Approach and Cross-Modal Distillation in Terahertz Imaging
Abstract
1. Introduction
Applying Deep Learning Approaches for Terahertz and Sub-Millimeter-Wave Imaging
2. Materials and Methods
2.1. THz Image Acquisition System
2.2. Dataset
2.3. A Unified Multi-Task Learning Using Sub-MMW Imagery Only
2.3.1. A Shared Feature Encoding via a Shared CNN Encoder
2.3.2. Concealed Face Verification Head
2.3.3. Facial Posture Classification Head
2.3.4. Unconcealed Face Reconstruction Decoder
2.4. Integration of a Visible-Range Modality: A Teacher–Student Model
2.4.1. Dual-Encoder Architecture
2.4.2. Cross-Modal Fusion via Symmetric Cross-Attention
2.4.3. Teacher–Student Knowledge Distillation
3. Results
3.1. Performance Results of the THz-Only and the Distilled Multi-Modal Models
3.2. Open-Set Generalization Analysis
3.3. Single-Task vs. Multi-Task Learning Ablation
3.4. Knowledge Distillation Loss-Weight Analysis
3.5. Unconcealed Facial Reconstruction Fidelity
3.6. Cross-Attention Fusion Study
3.7. Cross-Modal Distillation Semantic Transfer
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References






Section 3.1 results: performance of the THz-only and the distilled multi-modal models.

| Metric | THz-Only Model | Distilled Multi-Modal Model |
|---|---|---|
| Verification Acc. (%) | 99.43 ± 0.76 | 99.70 ± 0.53 |
| Classification Acc. (%) | 99.36 ± 0.69 | 99.50 ± 0.72 |
| Avg. Cosine Positive Distance | 0.045 ± 0.007 | 0.027 ± 0.009 |
| Avg. Cosine Negative Distance | 0.774 ± 0.004 | 0.755 ± 0.010 |
| L2 Distance Margin | 0.996 ± 0.021 | 1.028 ± 0.015 |
| Windowed SSIM | 0.615 ± 0.007 | 0.559 ± 0.012 |
| PSNR (dB) | 25.50 ± 0.10 | 25.14 ± 0.12 |
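
The distance rows in the preceding table are not defined in the surviving text. The sketch below shows one plausible set of definitions under stated assumptions: embeddings are L2-normalized, cosine distance is 1 − cosine similarity, pairs with the same identity label are positives, and the L2 distance margin is the mean negative-pair L2 distance minus the mean positive-pair L2 distance. The function `separation_metrics` is hypothetical and is not the authors' code.

```python
import math
from itertools import combinations


def l2_normalize(v):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_distance(a, b):
    """1 - cosine similarity; the dot product equals cosine only for unit-length inputs."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(values):
    return sum(values) / len(values)


def separation_metrics(embeddings, labels):
    """Hypothetical pairwise metrics: same-label pairs are positives, all others negatives."""
    embs = [l2_normalize(e) for e in embeddings]
    pos_cos, neg_cos, pos_l2, neg_l2 = [], [], [], []
    for i, j in combinations(range(len(embs)), 2):
        if labels[i] == labels[j]:
            pos_cos.append(cosine_distance(embs[i], embs[j]))
            pos_l2.append(l2_distance(embs[i], embs[j]))
        else:
            neg_cos.append(cosine_distance(embs[i], embs[j]))
            neg_l2.append(l2_distance(embs[i], embs[j]))
    return {
        "cos_pos": mean(pos_cos),
        "cos_neg": mean(neg_cos),
        "l2_margin": mean(neg_l2) - mean(pos_l2),
    }


embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["A", "A", "B", "B"]
print(separation_metrics(embeddings, labels))
```

For unit vectors the squared L2 distance equals twice the cosine distance, because ‖a − b‖² = 2(1 − cos θ). The two kinds of rows therefore carry closely related information. They are not identical because the averaging is done before or after the square root.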

Section 3.2 results: open-set generalization analysis.

| Metric | THz-Only Model | Distilled Multi-Modal Model |
|---|---|---|
| Verification Acc. (%) | 75.24 ± 10.52 | 73.42 ± 15.45 |
| Classification Acc. (%) | 96.13 ± 5.58 | 95.88 ± 3.76 |
| Avg. Cosine Positive Distance | 0.369 ± 0.115 | 0.307 ± 0.121 |
| Avg. Cosine Negative Distance | 0.592 ± 0.112 | 0.496 ± 0.158 |
| L2 Distance Margin | 0.245 ± 0.108 | 0.223 ± 0.120 |
| Windowed SSIM | 0.301 ± 0.020 | 0.300 ± 0.016 |
| PSNR (dB) | 21.45 ± 0.50 | 21.44 ± 0.54 |
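
Open-set evaluation usually means that the test identities never appear during training. The sketch below shows a minimal identity-disjoint split of that kind. The split ratio, the seed, the sample layout, and the helper `identity_disjoint_split` are all assumptions, not the paper's protocol.

```python
import random


def identity_disjoint_split(samples, test_fraction=0.2, seed=0):
    """Split (identity, image_id) samples so that no identity is in both sets.

    Hypothetical helper: test_fraction is the fraction of *identities*
    (not images) held out for open-set evaluation.
    """
    identities = sorted({ident for ident, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(identities)
    n_test = max(1, round(len(identities) * test_fraction))
    test_ids = set(identities[:n_test])
    train = [s for s in samples if s[0] not in test_ids]
    test = [s for s in samples if s[0] in test_ids]
    return train, test


# Ten hypothetical identities with three images each.
samples = [(f"id{i}", f"img{i}_{k}") for i in range(10) for k in range(3)]
train, test = identity_disjoint_split(samples, test_fraction=0.2)
train_ids = {ident for ident, _ in train}
test_ids = {ident for ident, _ in test}
print(len(train_ids), len(test_ids), train_ids & test_ids)  # prints: 8 2 set()
```

The split is done over identities rather than images. A model evaluated this way cannot rely on having memorized the test subjects' faces.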

Section 3.3 results: single-task vs. multi-task learning ablation (V = verification, C = classification, R = reconstruction).

| Configuration | Verification Acc. (%) | Classification Acc. (%) | F1 Macro | W-SSIM | PSNR (dB) | Verification Margin |
|---|---|---|---|---|---|---|
| V (Verification) | 99.12 | - | - | - | - | 0.964 |
| C (Classification) | - | 99.38 | 0.994 | - | - | - |
| R (Reconstruction) | - | - | - | 0.482 | 24.29 | - |
| V + C | 99.17 | 100 | 1 | - | - | 0.894 |
| V + R | 99.17 | - | - | 0.482 | 24.32 | 0.999 |
| C + R | - | 99.07 | 0.991 | 0.48 | 24.29 | - |
| V + C + R (Full MTL) | 99.81 | 100 | 1 | 0.485 | 24.36 | 1.028 |
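
Each ablation row switches a subset of the task heads on. The sketch below shows how such configurations could map onto a weighted sum of per-task losses. The surviving text does not give the paper's loss weights, so equal default weights are assumed, and the names `TASKS` and `multitask_loss` are hypothetical.

```python
# Hypothetical keys for the verification, classification, and reconstruction heads.
TASKS = ("V", "C", "R")


def multitask_loss(task_losses, config, weights=None):
    """Weighted sum of the losses of the heads enabled in `config`.

    task_losses: dict mapping "V"/"C"/"R" to a scalar loss for the current batch.
    config: string such as "V + C + R" naming the enabled heads.
    weights: optional per-task weights; a missing weight defaults to 1.0 (assumption).
    """
    enabled = {t.strip() for t in config.split("+")}
    unknown = enabled - set(TASKS)
    if unknown:
        raise ValueError(f"unknown task(s): {sorted(unknown)}")
    weights = weights or {}
    return sum(weights.get(t, 1.0) * task_losses[t] for t in enabled)


batch_losses = {"V": 0.40, "C": 0.10, "R": 0.05}  # illustrative values only
for cfg in ("V", "V + C", "V + R", "C + R", "V + C + R"):
    print(cfg, round(multitask_loss(batch_losses, cfg), 3))
```

In this formulation, the shared encoder receives gradients from every enabled head. That is one common explanation for why joint training can help the individual tasks, but the sketch does not reproduce the paper's results.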

Section 3.4 results: knowledge distillation loss-weight analysis.

| Variant | Feature Loss Weight | Logit Loss Weight | Temperature | Positive Distance | Margin | Norm Margin | Verif. Accuracy (%) | W-SSIM |
|---|---|---|---|---|---|---|---|---|
| No_KD | 0 | 0 | 3 | 0.2276 | 0.9676 | 2.355 | 99.56 | 0.467 |
| Feat_only | 1 | 0 | 3 | 0.218 | 0.9682 | 2.486 | 99.61 | 0.471 |
| Logit_only | 0 | 0.5 | 3 | 0.2174 | 0.9761 | 2.451 | 99.71 | 0.469 |
| Default | 1 | 0.5 | 3 | 0.2116 | 0.98 | 2.454 | 99.46 | 0.468 |
| Logit_heavy | 0.3 | 1 | 3 | 0.2071 | 0.9858 | 2.474 | 99.81 | 0.469 |
| Feat_heavy | 2 | 0.15 | 3 | 0.2238 | 0.9703 | 2.414 | 99.66 | 0.462 |
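
The variants differ in the weights on a feature-matching term and a logit-matching term, with the temperature fixed at T = 3. The exact terms used in the paper are not given in the surviving text. Assuming the common formulation, the combined loss is:

L_KD = w_f · MSE(f_student, f_teacher) + w_l · T² · KL(softmax(z_teacher / T) ‖ softmax(z_student / T))

Here the KL term is the temperature-scaled soft-label loss of Hinton et al. The sketch below implements that formula. `kd_loss` and its argument names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; q must be strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def kd_loss(student_feat, teacher_feat, student_logits, teacher_logits,
            feat_weight=1.0, logit_weight=0.5, temperature=3.0):
    """Assumed distillation loss: weighted feature MSE plus T^2-scaled soft-label KL."""
    feat_term = mse(student_feat, teacher_feat)
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # The T^2 factor keeps soft-label gradients comparable in size across temperatures.
    logit_term = temperature ** 2 * kl_divergence(p_teacher, p_student)
    return feat_weight * feat_term + logit_weight * logit_term


# (feature weight, logit weight) pairs taken from the table rows.
variants = {"No_KD": (0, 0), "Feat_only": (1, 0), "Logit_only": (0, 0.5),
            "Default": (1, 0.5), "Logit_heavy": (0.3, 1), "Feat_heavy": (2, 0.15)}
s_f, t_f = [0.2, 0.4], [0.0, 0.5]              # toy student/teacher features
s_z, t_z = [1.0, 0.0, -1.0], [2.0, 0.0, -1.0]  # toy student/teacher logits
for name, (wf, wl) in variants.items():
    print(name, round(kd_loss(s_f, t_f, s_z, t_z, wf, wl), 4))
```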

Section 3.5 results: reconstruction fidelity of the proposed model and an autoencoder, with concealed and unconcealed inputs.

| Metric | Proposed (Concealed Input) | Proposed (Unconcealed Input) | Autoencoder (Concealed Input) | Autoencoder (Unconcealed Input) |
|---|---|---|---|---|
| W-SSIM | 0.468 | 0.428 | 0.216 | 0.991 |
| PSNR (dB) | 24.21 | 23.72 | 19.93 | 41.43 |
| Edge SSIM | 0.223 | 0.193 | 0.111 | 0.987 |
| Edge MAE | 0.086 | 0.090 | 0.100 | 0.011 |
| Emb. Cosine Similarity | 0.965 | 0.940 | 0.928 | 1.000 |
| Verif. Accuracy (%) | 99.27 | 99.27 | 99.12 | 99.12 |
| Class. Accuracy (%) | 99.38 | 99.38 | 99.38 | 99.38 |
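
PSNR is reported in dB. For intensities in [0, MAX], PSNR = 10 · log10(MAX² / MSE). The sketch below assumes images scaled to [0, 1] (MAX = 1) and given as flat pixel lists. The paper's normalization is not stated in the surviving text.

```python
import math


def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists.

    Assumes intensities in [0, max_value]; returns inf for identical images.
    """
    if len(reference) != len(reconstruction):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [0.5] * 100
reconstruction = [0.6] * 100  # uniform error of 0.1 on every pixel
print(psnr(reference, reconstruction))  # approximately 20 dB
```

Inverting the formula gives RMSE = MAX · 10^(−PSNR/20). Roughly 24 dB therefore corresponds to an RMS error of about 6% of the full intensity range. The autoencoder's 41.43 dB on unconcealed input corresponds to under 1%.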

Section 3.5 results: region-wise reconstruction quality.

| Region | Proposed SSIM | Autoencoder SSIM | Proposed PSNR (dB) | Autoencoder PSNR (dB) |
|---|---|---|---|---|
| Upper Left (eye) | 0.537 | 0.27 | 24.35 | 19.94 |
| Upper Right (eye) | 0.515 | 0.265 | 24.04 | 19.82 |
| Center (nose) | 0.483 | 0.173 | 22.21 | 18.08 |
| Lower Center (mouth) | 0.449 | 0.185 | 23.45 | 19.28 |
| Periphery | 0.495 | 0.23 | 25.49 | 20.98 |
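
A region-wise breakdown like the one in the preceding table needs a rule that assigns each pixel to a facial region. The boundaries are not given in the surviving text. The fractional rectangles in `REGIONS` below are purely hypothetical placeholders, and any pixel outside them counts as periphery. The sketch then computes PSNR separately per region.

```python
import math

# Hypothetical facial regions as fractions of image height and width:
# (row_start, row_end, col_start, col_end). Not the paper's actual boundaries.
REGIONS = {
    "upper_left_eye": (0.20, 0.45, 0.15, 0.50),
    "upper_right_eye": (0.20, 0.45, 0.50, 0.85),
    "center_nose": (0.45, 0.65, 0.35, 0.65),
    "lower_center_mouth": (0.65, 0.85, 0.30, 0.70),
}


def region_of(r, c, height, width):
    """Name of the region containing pixel (r, c); everything else is 'periphery'."""
    y, x = r / height, c / width
    for name, (r0, r1, c0, c1) in REGIONS.items():
        if r0 <= y < r1 and c0 <= x < c1:
            return name
    return "periphery"


def regional_psnr(reference, reconstruction, max_value=1.0):
    """Per-region PSNR (dB) for two equally sized 2-D images given as lists of rows."""
    height, width = len(reference), len(reference[0])
    sq_err, counts = {}, {}
    for r in range(height):
        for c in range(width):
            name = region_of(r, c, height, width)
            err = (reference[r][c] - reconstruction[r][c]) ** 2
            sq_err[name] = sq_err.get(name, 0.0) + err
            counts[name] = counts.get(name, 0) + 1
    result = {}
    for name, total in sq_err.items():
        mse = total / counts[name]
        result[name] = math.inf if mse == 0 else 10.0 * math.log10(max_value ** 2 / mse)
    return result


ref = [[0.5] * 20 for _ in range(20)]
rec = [row[:] for row in ref]
for r in range(10, 13):
    for c in range(8, 12):
        rec[r][c] = 0.6  # introduce error only inside the hypothetical nose region
print(regional_psnr(ref, rec))
```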

Section 3.6 results: cross-attention fusion study.

| Fusion Strategy | Description | Verif. Accuracy (%) | Class. Accuracy (%) | W-SSIM | PSNR (dB) | Dist. Margin |
|---|---|---|---|---|---|---|
| No Fusion | Concatenation + linear projection | 99.32 | 99.38 | 0.467 | 24.22 | 0.970 |
| Vis-Guided Only | Q = Vis, K/V = THz | 99.56 | 99.38 | 0.466 | 24.13 | 0.970 |
| THz-Guided Only | Q = THz, K/V = Vis | 98.98 | 99.38 | 0.466 | 24.18 | 0.961 |
| Dual Cross-Attention | Symmetric bidirectional | 99.27 | 99.38 | 0.469 | 24.18 | 0.967 |
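
The fusion variants differ in which modality supplies the attention queries (Q) and which supplies the keys and values (K/V). The pure-Python sketch below shows scaled dot-product cross-attention, softmax(QKᵀ/√d)V, plus a symmetric variant that runs both directions. Several simplifications are assumed for illustration: learned Q/K/V projections and multiple heads are omitted, and the two directions are returned separately rather than merged. The function names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Row-wise numerically stable softmax."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def cross_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d)) V with identity projections (illustrative only)."""
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))
    scores = [[s / math.sqrt(d) for s in row] for row in scores]
    return matmul(softmax_rows(scores), values)


def dual_cross_attention(vis_tokens, thz_tokens):
    """Symmetric variant: each modality attends to the other."""
    vis_guided = cross_attention(vis_tokens, thz_tokens, thz_tokens)  # Q = Vis, K/V = THz
    thz_guided = cross_attention(thz_tokens, vis_tokens, vis_tokens)  # Q = THz, K/V = Vis
    return vis_guided, thz_guided


vis = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy visible-range tokens
thz = [[0.5, 0.5], [1.0, 0.0]]              # two toy THz tokens
vis_guided, thz_guided = dual_cross_attention(vis, thz)
print(vis_guided)
print(thz_guided)
```

Each output token is a convex combination of the other modality's value vectors. The query modality decides *where* to look, while the key/value modality supplies *what* is returned.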

Section 3.7 results: cross-modal distillation semantic transfer.

| Variant | CKA (proj) | ρ (proj) | Cos (proj) | CKA (emb) | ρ (emb) | Ret@5 (%) | Verif. Accuracy (%) | W-SSIM |
|---|---|---|---|---|---|---|---|---|
| Proposed | 0.839 | 0.634 | 0.948 | 0.511 | 0.477 | 30 | 98.68 | 0.464 |
| No Distillation | 0.558 | 0.343 | −0.036 | 0.554 | 0.514 | 20 | 99.17 | 0.467 |
| Random Teacher | 0.927 | 0.917 | 0.844 | 0.467 † | 0.335 † | 10 † | 99.56 | 0.468 |
| Permuted | 0.480 | 0.330 | 0.922 | 0.503 | 0.466 | 30 | 99.42 | 0.469 |
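
CKA (centered kernel alignment) measures how similar two sets of representations of the same inputs are. The surviving text does not state which CKA variant is used, or exactly which representations the (proj) and (emb) columns compare. Assuming the linear variant described by Kornblith et al. (2019), CKA is ‖YᵀX‖²_F / (‖XᵀX‖_F · ‖YᵀY‖_F), computed after centering each feature column. `linear_cka` is a hypothetical name.

```python
import math


def center_columns(m):
    """Subtract each feature column's mean (rows are samples, columns are features)."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def cross_gram_fro_sq(a, b):
    """Squared Frobenius norm of a^T b for matrices with the same number of rows."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA between representations x (n x d1) and y (n x d2) of the same n inputs.

    Undefined (division by zero) if either representation is constant across samples.
    """
    x, y = center_columns(x), center_columns(y)
    numerator = cross_gram_fro_sq(y, x)  # ||Y^T X||_F^2
    denominator = math.sqrt(cross_gram_fro_sq(x, x) * cross_gram_fro_sq(y, y))
    return numerator / denominator


student = [[1.0, 2.0], [3.0, 1.0], [0.0, 5.0], [2.0, 2.0]]
teacher_rotated = [[b, -a] for a, b in student]  # 90-degree rotation of the same space
print(linear_cka(student, teacher_rotated))
```

Linear CKA does not change under rotations and isotropic scaling of either representation. A high CKA therefore indicates shared structure, not identical coordinates. This matters when reading a column such as Cos (proj), which does depend on the coordinates.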

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Bergman, N.; Yildirim, I.O.; Sahin, A.B.; Altan, H.; Yitzhaky, Y. Concealed Face Analysis and Facial Reconstruction via a Multi-Task Approach and Cross-Modal Distillation in Terahertz Imaging. Sensors 2026, 26, 1341. https://doi.org/10.3390/s26041341
Bergman N, Yildirim IO, Sahin AB, Altan H, Yitzhaky Y. Concealed Face Analysis and Facial Reconstruction via a Multi-Task Approach and Cross-Modal Distillation in Terahertz Imaging. Sensors. 2026; 26(4):1341. https://doi.org/10.3390/s26041341
Chicago/Turabian Style: Bergman, Noam, Ihsan Ozan Yildirim, Asaf Behzat Sahin, Hakan Altan, and Yitzhak Yitzhaky. 2026. "Concealed Face Analysis and Facial Reconstruction via a Multi-Task Approach and Cross-Modal Distillation in Terahertz Imaging." Sensors 26, no. 4: 1341. https://doi.org/10.3390/s26041341
APA Style: Bergman, N., Yildirim, I. O., Sahin, A. B., Altan, H., & Yitzhaky, Y. (2026). Concealed Face Analysis and Facial Reconstruction via a Multi-Task Approach and Cross-Modal Distillation in Terahertz Imaging. Sensors, 26(4), 1341. https://doi.org/10.3390/s26041341

