Lightweight Deep Learning Models for Face Mask Detection in Real-Time Edge Environments: A Review and Future Research Directions
Abstract
1. Introduction
1.1. Problem Statement and Rationale
- (1) Maintaining high recognition reliability under real-world variability such as occlusion, lighting changes, and diverse mask types.
- (2)
1.2. Review Objectives
- (a) Examine conventional, lightweight, and hybrid deep learning architectures used for face mask detection.
- (b) Compare reported performance with respect to accuracy, inference efficiency, and deployment suitability.
- (c) Analyze the core challenges affecting real-world deployment, including improper mask detection, domain shift, and computational constraints.
- (d) Identify future research directions focused on model compression, knowledge distillation, domain adaptation, and broader compliance-oriented applications.
2. Methodology
2.1. Literature Search Strategy
2.2. Inclusion and Exclusion Criteria
2.3. Screening and Selection Approach
2.4. Data Extraction and Categorization
3. Architectural Landscape of Face Mask Detection Models
3.1. Conventional CNN-Based Approaches
3.2. Lightweight Convolutional Models
3.3. Hybrid Architectures
4. Comparative Performance Analysis
4.1. Evaluation Metrics
1. Accuracy
2. Precision
3. Recall (Sensitivity)
4. F1-Score
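These four metrics can be computed directly from binary confusion-matrix counts. A minimal sketch in plain Python (the function name and example counts are illustrative, not taken from any reviewed paper):

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # a.k.a. sensitivity
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1

# Hypothetical detector: 90 masked faces found correctly, 5 false alarms,
# 10 missed masks, 95 correct "no mask" decisions.
acc, prec, rec, f1 = classification_metrics(tp=90, fp=5, fn=10, tn=95)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
# → 0.925 0.947 0.9 0.923
```

Note that accuracy alone is misleading on imbalanced mask datasets (improperly worn masks are usually the rarest class), which is why precision, recall, and F1 are reported alongside it.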
4.2. Trade-Offs Between Accuracy and Efficiency
4.2.1. Impact of Model Size and Architecture on Accuracy and Efficiency
4.2.2. Comparative Performance of Lightweight and Heavyweight Models
4.2.3. Energy Consumption Considerations for Edge Deployment
4.2.4. Strategies to Improve the Trade-Off
4.2.5. Deployment-Oriented Architecture Selection Framework
5. Future Research Directions
5.1. Improper Mask Detection and Multi-Class Analysis
5.2. Domain Adaptation and Real-World Variability
- Unsupervised domain adaptation (UDA) for aligning feature distributions across environments;
- Self-supervised representation learning to reduce dependency on labels;
- Cross-dataset training pipelines that incorporate heterogeneous noise, mask materials, and cultural variations;
- Synthetic domain randomization to simulate low-quality or occluded footage.
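As a rough illustration of the last point, synthetic domain randomization amounts to sampling capture-condition parameters per training image, so the detector never over-fits to one camera or environment. The sketch below is a minimal example under stated assumptions: all parameter names and ranges are illustrative and not drawn from the cited works.

```python
import random

def sample_domain_randomization(rng, img_w=224, img_h=224):
    """Sample one synthetic capture-condition configuration; each training
    image would be re-rendered under many such configurations."""
    box_w = rng.randint(img_w // 8, img_w // 3)   # synthetic occluder size
    box_h = rng.randint(img_h // 8, img_h // 3)
    return {
        "brightness": rng.uniform(0.4, 1.6),      # under-/over-exposed footage
        "contrast": rng.uniform(0.5, 1.5),
        "jpeg_quality": rng.randint(20, 95),      # low-quality CCTV compression
        "blur_sigma": rng.uniform(0.0, 2.5),      # defocus / motion-blur proxy
        "occluder": (rng.randint(0, img_w - box_w),
                     rng.randint(0, img_h - box_h), box_w, box_h),  # x, y, w, h
    }

rng = random.Random(0)  # fixed seed for reproducible augmentation streams
for cfg in (sample_domain_randomization(rng) for _ in range(3)):
    print(cfg)
```

In practice these sampled parameters would drive an augmentation library applied on the fly during training, so the same labeled face appears under many simulated domains.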
5.3. Knowledge Distillation and Model Compression
- Teacher–student pipelines using powerful hybrids (e.g., enhanced YOLOv5/YOLOv8) as teachers and MobileNet- or ShuffleNet-based students;
- Quantization-aware training to reduce model size without introducing significant accuracy loss;
- Structured pruning of convolutional layers to remove redundant channels;
- Neural Architecture Search (NAS) for identifying optimal low-complexity architectures tailored to improper-mask detection.
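To make the teacher–student idea concrete, a Hinton-style distillation loss combines a soft-target term (teacher vs. student distributions at a raised temperature) with the ordinary hard-label cross-entropy. The plain-Python sketch below is a minimal illustration under stated assumptions; the example logits, temperature, and three-class layout (mask / no mask / improper mask) are hypothetical, and a real pipeline would compute this inside a deep learning framework.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax at a given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_idx,
                      temperature=4.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) at temperature T,
    plus (1 - alpha) * cross-entropy against the hard label."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    ce = -math.log(softmax(student_logits)[true_idx])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce

teacher = [4.0, 0.5, 1.5]   # confident, well-trained hybrid teacher
student = [2.0, 1.0, 1.2]   # lightweight student being trained
print(distillation_loss(student, teacher, true_idx=0))
```

The `T^2` factor keeps the soft-target gradients on the same scale as the hard-label term as the temperature changes; a student that matches the teacher's logits drives the KL term to zero.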
5.4. Expanding Applications Beyond Mask Detection
- Personal Protective Equipment compliance monitoring (helmets, gloves, lab coats, face shields, safety goggles, etc.);
- Human behavior analysis (face-touching detection, cough detection, proximity violations);
- Health screening (visible respiratory cues, temperature screening integration);
- Access-control and identity verification under occlusion;
- Crowd analytics and anomaly detection for smart-city infrastructure.
5.5. Standardized Evaluation Protocols and Benchmarking
5.6. Energy-Aware Evaluation and Power-Centric Benchmarking
5.7. Emerging Transformer-Based Architectures for Edge Mask Detection
6. Conclusions
Funding
Data Availability Statement
Conflicts of Interest
References
- Liang, M.; Gao, L.; Cheng, C.; Zhou, Q.; Uy, J.P.; Heiner, K.; Sun, C. Efficacy of face mask in preventing respiratory virus transmission: A systematic review and meta-analysis. Travel Med. Infect. Dis. 2020, 36, 101751. [Google Scholar] [CrossRef] [PubMed]
- Sethi, S.; Kathuria, M.; Kaushik, T. Face mask detection using deep learning: An approach to reduce risk of Coronavirus spread. J. Biomed. Inform. 2021, 120, 103848. [Google Scholar] [CrossRef]
- Wu, P.; Li, H.; Zeng, N.; Li, F. FMD-Yolo: An efficient face mask detection method for COVID-19 prevention and control in public. Image Vis. Comput. 2022, 117, 104341. [Google Scholar] [CrossRef]
- Kolosov, D.; Kelefouras, V.; Kourtessis, P.; Mporas, I. Anatomy of Deep Learning Image Classification and Object Detection on Commercial Edge Devices: A Case Study on Face Mask Detection. IEEE Access 2022, 10, 109167. [Google Scholar] [CrossRef]
- Ullah, N.; Javed, A.; Ghazanfar, M.A.; Alsufyani, A.; Bourouis, S. A novel DeepMaskNet model for face mask detection and masked facial recognition. J. King Saud Univ.—Comput. Inf. Sci. 2022, 34, 9905–9914. [Google Scholar] [CrossRef]
- Abbas, S.F.; Shaker, S.H.; Abdullatif, F.A. Face Mask Detection Based on Deep Learning: A Review. J. Soft Comput. Comput. Appl. 2024, 1, 7. [Google Scholar] [CrossRef]
- Amer, F.; Ali, M.; Al-Tamimi, M.S.H. Face mask detection methods and techniques: A review. Int. J. Nonlinear Anal. Appl. 2022, 13, 2008–6822. [Google Scholar] [CrossRef]
- Vibhuti; Jindal, N.; Singh, H.; Rana, P.S. Face mask detection in COVID-19: A strategic review. Multimed. Tools Appl. 2022, 81, 40013–40042. [Google Scholar] [CrossRef]
- Alturki, R.; Alharbi, M.; AlAnzi, F.; Albahli, S. Deep learning techniques for detecting and recognizing face masks: A survey. Front. Public Health 2022, 10, 955332. [Google Scholar] [CrossRef]
- Anggraini, N.; Ramadhani, S.H.; Wardhani, L.K.; Hakiem, N.; Shofi, I.M.; Rosyadi, M.T. Development of Face Mask Detection using SSDLite MobilenetV3 Small on Raspberry Pi 4. In Proceedings of the 2022 5th International Conference on Computer and Informatics Engineering, IC2IE 2022, Jakarta, Indonesia, 13–14 September 2022; Institute of Electrical and Electronics Engineers: Piscataway, NJ, USA, 2022; pp. 209–214. [Google Scholar] [CrossRef]
- Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, Long Beach, CA, USA, 10–15 June 2019; Volume 2019, pp. 10691–10700. Available online: https://arxiv.org/pdf/1905.11946 (accessed on 1 December 2025).
- Sanjaya, S.A.; Rakhmawan, S.A. Face Mask Detection Using MobileNetV2 in the Era of COVID-19 Pandemic. In Proceedings of the 2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy, ICDABI 2020, Sakheer, Bahrain, 26–27 October 2020; Institute of Electrical and Electronics Engineers: Piscataway, NJ, USA, 2020. [Google Scholar] [CrossRef]
- Shao, Y.; Ning, J.; Shao, H.; Zhang, D.; Chu, H.; Ren, Z. Lightweight face mask detection algorithm with attention mechanism. Eng. Appl. Artif. Intell. 2024, 137, 109077. [Google Scholar] [CrossRef]
- Dodda, R.; Raghavendra, C.; Swamy, U.R.; Azmera, C.N.; Sreenu, M.; Nimmala, S. Real-Time Face Mask Detection Using Deep Learning: Enhancing Public Health and Safety. E3S Web Conf. 2025, 616, 02013. [Google Scholar] [CrossRef]
- Sheikh, B.U.H.; Zafar, A. RRFMDS: Rapid Real-Time Face Mask Detection System for Effective COVID-19 Monitoring. SN Comput. Sci. 2023, 4, 288. [Google Scholar] [CrossRef]
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2323. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015—Conference Track Proceedings, San Diego, CA, USA, 7–9 May 2015; Available online: https://arxiv.org/pdf/1409.1556 (accessed on 4 December 2025).
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar] [CrossRef]
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A.A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, AAAI 2017, San Francisco, CA, USA, 4–9 February 2017; Association for the Advancement of Artificial Intelligence: Palo Alto, CA, USA, 2017; pp. 4278–4284. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; Institute of Electrical and Electronics Engineers: Piscataway, NJ, USA, 2016; pp. 770–778. [Google Scholar] [CrossRef]
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017; Institute of Electrical and Electronics Engineers: Piscataway, NJ, USA, 2017; pp. 2261–2269. [Google Scholar] [CrossRef]
- Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017; Institute of Electrical and Electronics Engineers: Piscataway, NJ, USA, 2017; pp. 1800–1807. [Google Scholar] [CrossRef]
- Radosavovic, I.; Kosaraju, R.P.; Girshick, R.; He, K.; Dollár, P. Designing network design spaces. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; Institute of Electrical and Electronics Engineers: Piscataway, NJ, USA, 2020; pp. 10425–10433. [Google Scholar] [CrossRef]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; Institute of Electrical and Electronics Engineers: Piscataway, NJ, USA, 2014; pp. 580–587. [Google Scholar] [CrossRef]
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; Institute of Electrical and Electronics Engineers: Piscataway, NJ, USA, 2015; pp. 1440–1448. [Google Scholar] [CrossRef]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 39, 1137–1149. [Google Scholar] [CrossRef]
- He, K.; Gkioxari, G.; Dollar, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar] [CrossRef]
- Cabani, A.; Hammoudi, K.; Benhabiles, H.; Melkemi, M. MaskedFace-Net—A dataset of correctly/incorrectly masked face images in the context of COVID-19. Smart Health 2020, 19, 100144. [Google Scholar] [CrossRef]
- Jiang, X.; Gao, T.; Zhu, Z.; Zhao, Y. Real-Time Face Mask Detection Method Based on YOLOv3. Electronics 2021, 10, 837. [Google Scholar] [CrossRef]
- Mahmoud, M.; Kasem, M.S.E.; Kang, H.S. A Comprehensive Survey of Masked Faces: Recognition, Detection, and Unmasking. Appl. Sci. 2024, 14, 8781. [Google Scholar] [CrossRef]
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar] [CrossRef]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar] [CrossRef]
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for mobileNetV3. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, South Korea, 27 October–2 November 2019; Institute of Electrical and Electronics Engineers: Piscataway, NJ, USA, 2019; pp. 1314–1324. [Google Scholar] [CrossRef]
- Al-Rammahi, A.H.I. Face mask recognition system using MobileNetV2 with optimization function. Appl. Artif. Intell. 2022, 36, 2145638. [Google Scholar] [CrossRef]
- Fadly, F.; Kurniawan, T.B.; Dewi, D.A.; Zakaria, M.Z.; Hisham, P.A.A.B. Deep Learning Based Face Mask Detection System Using MobileNetV2 for Enhanced Health Protocol Compliance. J. Appl. Data Sci. 2024, 5, 2067–2078. [Google Scholar] [CrossRef]
- Nagrath, P.; Jain, R.; Madan, A.; Arora, R.; Kataria, P.; Hemanth, J. SSDMNV2: A real time DNN-based face mask detection system using single shot multibox detector and MobileNetV2. Sustain. Cities Soc. 2021, 66, 102692. [Google Scholar] [CrossRef]
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; Institute of Electrical and Electronics Engineers: Piscataway, NJ, USA, 2018; pp. 6848–6856. [Google Scholar] [CrossRef]
- Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and <0.5 MB Model Size. arXiv 2016, arXiv:1602.07360. Available online: https://arxiv.org/pdf/1602.07360 (accessed on 1 December 2025).
- Sharma, M.; Gunwant, H.; Saggar, P.; Gupta, L.; Gupta, D. EfficientNet-B0 Model for Face Mask Detection Based on Social Information Retrieval. Int. J. Inf. Syst. Model. Des. 2022, 13, 15. [Google Scholar] [CrossRef]
- Azouji, N.; Sami, A.; Taheri, M. EfficientMask-Net for face authentication in the era of COVID-19 pandemic. Signal Image Video Process. 2022, 16, 1991–1999. [Google Scholar] [CrossRef]
- Thuan, C.H.; Nguyen, V.D. Face Mask Detection Using YOLOv8 with Fine-Tuning and EfficientNet Backbone. In Proceedings of the International Conference on Sustainable Computing. ICSC 2025, Ho Chi Minh, Vietnam, 16–17 June 2025; Lecture Notes in Electrical Engineering; Goyal, N., Nguyen, T.N., Lata, M., Ogunmola, G.A., Eds.; Springer: Singapore, 2026; Volume 1530. [Google Scholar] [CrossRef]
- Benitez-Garcia, G.; Prudente-Tixteco, L.; Olivares-Mercado, J.; Takahashi, H. SqueezeMaskNet: Real-Time Mask-Wearing Recognition for Edge Devices. Big Data Cogn. Comput. 2025, 9, 10. [Google Scholar] [CrossRef]
- Loey, M.; Manogaran, G.; Taha, M.H.N.; Khalifa, N.E.M. Fighting against COVID-19: A novel deep learning model based on YOLO-v2 with ResNet-50 for medical face mask detection. Sustain. Cities Soc. 2021, 65, 102600. [Google Scholar] [CrossRef] [PubMed]
- Karthikeyan, B.; Gowri, S. A Real-Time Face Mask Detection Using SSD and MobileNetV2. In Proceedings of the 2021 4th International Conference on Computing and Communications Technologies, ICCCT 2021, Chennai, India, 16–17 December 2021; pp. 144–148. [Google Scholar] [CrossRef]
- Pham, T.N.; Nguyen, V.H.; Huh, J.H. Integration of improved YOLOv5 for face mask detector and auto-labeling to generate dataset for fighting against COVID-19. J. Supercomput. 2023, 79, 8966–8992. [Google Scholar] [CrossRef]
- Loey, M.; Manogaran, G.; Taha, M.H.N.; Khalifa, N.E.M. A hybrid deep transfer learning model with machine learning methods for face mask detection in the era of the COVID-19 pandemic. Measurement 2021, 167, 108288. [Google Scholar] [CrossRef] [PubMed]
- Tabassum, T.; Talukder, A.; Rahman, M.; Rashiduzzaman; Kabir, Z.; Islam, M.; Uddin, A. A Parallel Convolutional Neural Network for Accurate Face Mask Detection in the Fight Against COVID-19. Biomed. Mater. Devices 2025, 4, 2347–2357. [Google Scholar] [CrossRef]
- Haque, S.B.U. A fuzzy-based frame transformation to mitigate the impact of adversarial attacks in deep learning-based real-time video surveillance systems. Appl. Soft Comput. 2024, 167, 112440. [Google Scholar] [CrossRef]
- Dubey, P.; Dubey, P.; Iwendi, C.; Biamba, C.N.; Rao, D.D. Enhanced IoT-Based Face Mask Detection Framework Using Optimized Deep Learning Models: A Hybrid Approach with Adaptive Algorithms. IEEE Access 2025, 13, 17325–17339. [Google Scholar] [CrossRef]
- Parikh, D.; Karthikeyan, A.; Ravi, V.; Shibu, M.; Singh, R.; Sofana, R.S. IoT and ML-driven framework for managing infectious disease risks in communal spaces: A post-COVID perspective. Front. Public Health 2025, 13, 1552515. [Google Scholar] [CrossRef]
- Truong, C.D.; Mishra, S.; Long, N.Q.; Ngoc, L.A. Efficient Face Mask Detection for Banking Information Systems. In Creative Approaches Towards Development of Computing and Multidisciplinary IT Solutions for Society; Scrivener Publishing LLC: Beverly, MA, USA, 2024; pp. 435–454. [Google Scholar] [CrossRef]
- Himeur, Y.; Al-Maadeed, S.; Varlamis, I.; Al-Maadeed, N.; Abualsaud, K.; Mohamed, A. Face Mask Detection in Smart Cities Using Deep and Transfer Learning: Lessons Learned from the COVID-19 Pandemic. Systems 2023, 11, 107. [Google Scholar] [CrossRef]
- George, A.; Ecabert, C.; Shahreza, H.O.; Kotwal, K.; Marcel, S. EdgeFace: Efficient Face Recognition Model for Edge Devices. IEEE Trans. Biom. Behav. Identity Sci. 2024, 6, 158–168. [Google Scholar] [CrossRef]
- Anh, T.N.; Nguyen, V.D. MAPBoost: Augmentation-resilient real-time object detection for edge deployment. J. Real-Time Image Process. 2026, 23, 10. [Google Scholar] [CrossRef]
- Hamdi, A.; Noura, H.; Azar, J.; Pujolle, G. Frugal Object Detection Models: Solutions, Challenges and Future Directions. In Proceedings of the 21st International Wireless Communications and Mobile Computing Conference, IWCMC 2025, Montreal, QC, Canada, 12–16 May 2025; Institute of Electrical and Electronics Engineers: Piscataway, NJ, USA, 2025; pp. 1694–1701. [Google Scholar] [CrossRef]
- Qian, J.; Mu, S.; Lu, H.; Xu, S. Two-stage model re-optimization and application in face recognition. Neurocomputing 2025, 651, 130805. [Google Scholar] [CrossRef]
- Mostafa, S.A.; Ravi, S.; Zebari, D.A.; Zebari, N.A.; Mohammed, M.A.; Nedoma, J.; Martinek, R.; Deveci, M.; Ding, W. A YOLO-based deep learning model for Real-Time face mask detection via drone surveillance in public spaces. Inf. Sci. 2024, 676, 120865. [Google Scholar] [CrossRef]
- Hussain, D.; Ismail, M.; Hussain, I.; Alroobaea, R.; Hussain, S.; Ullah, S.S. Face Mask Detection Using Deep Convolutional Neural Network and MobileNetV2-Based Transfer Learning. Wirel. Commun. Mob. Comput. 2022, 2022, 1536318. [Google Scholar] [CrossRef]
- Hagui, I.; Msolli, A.; Helali, A.; Fredj, H. Face Mask Detection using CNN: A Fusion of Cryptography and Blockchain. Eng. Technol. Appl. Sci. Res. 2024, 14, 17156–17161. [Google Scholar] [CrossRef]
- Umer, M.; Sadiq, S.; Alhebshi, R.M.; Alsubai, S.; Al Hejaili, A.; Eshmawi, A.A.; Nappi, M.; Ashraf, I. Face mask detection using deep convolutional neural network and multi-stage image processing. Image Vis. Comput. 2023, 133, 104657. [Google Scholar] [CrossRef]
- Benifa, J.V.B.; Chola, C.; Muaad, A.Y.; Bin Hayat, M.A.; Bin Heyat, B.; Mehrotra, R.; Akhtar, F.; Hussein, H.S.; Vargas, D.L.R.; Castilla, Á.K.; et al. FMDNet: An Efficient System for Face Mask Detection Based on Lightweight Model during COVID-19 Pandemic in Public Areas. Sensors 2023, 23, 6090. [Google Scholar] [CrossRef]
- Bania, R.K. Ensemble of deep transfer learning models for real-time automatic detection of face mask. Multimed. Tools Appl. 2023, 82, 1. [Google Scholar] [CrossRef]
- Habeeb, Z.Q.; Al-Zaydi, I. Incorrect facemask-wearing detection using image processing and deep learning. Bull. Electr. Eng. Inform. 2023, 12, 2212–2219. [Google Scholar] [CrossRef]
- Kumar, A.; Kalia, A.; Kalia, A. ETL-YOLO v4: A face mask detection algorithm in era of COVID-19 pandemic. Optik 2022, 259, 169051. [Google Scholar] [CrossRef]
- Hosny, K.M.; Ibrahim, N.A.; Mohamed, E.R.; Hamza, H.M. Artificial intelligence-based masked face detection: A survey. Intell. Syst. Appl. 2024, 22, 200391. [Google Scholar] [CrossRef]
- Mbunge, E.; Simelane, S.; Fashoto, S.G.; Akinnuwesi, B.; Metfula, A.S. Application of deep learning and machine learning models to detect COVID-19 face masks—A review. Sustain. Oper. Comput. 2021, 2, 235–245. [Google Scholar] [CrossRef]
- Mulani, A.O.; Kulkarni, T.M. Face Mask Detection System Using Deep Learning: A Comprehensive Survey. Commun. Comput. Inf. Sci. 2025, 2439, 25–33. [Google Scholar] [CrossRef]
- Jayaswal, R.; Dixit, M. AI-based face mask detection system: A straightforward proposition to fight with Covid-19 situation. Multimed. Tools Appl. 2022, 82, 13241–13273. [Google Scholar] [CrossRef]
- Vukicevic, A.M.; Petrovic, M.; Milosevic, P.; Peulic, A.; Jovanovic, K.; Novakovic, A. A systematic review of computer vision-based personal protective equipment compliance in industry practice: Advancements, challenges and future directions. Artif. Intell. Rev. 2024, 57, 319. [Google Scholar] [CrossRef]
- Benitez-Baltazar, V.H.; Pacheco-Ramírez, J.H.; Moreno-Ruiz, J.R.; Núñez-Gurrola, C. Autonomic Face Mask Detection with Deep Learning: An IoT Application. Rev. Mex. De Ing. Biomédica 2021, 42, 160–170. [Google Scholar] [CrossRef]
- Han, Z.; Huang, H.; Fan, Q.; Li, Y.; Li, Y.; Chen, X. SMD-YOLO: An efficient and lightweight detection method for mask wearing status during the COVID-19 pandemic. Comput. Methods Programs Biomed. 2022, 221, 106888. [Google Scholar] [CrossRef] [PubMed]
- Biswas, A.K.; Roy, K. A comparative study on ‘face mask detection’ using machine learning and deep learning algorithms. In Artificial Intelligence in e-Health Framework, Volume 1: AI, Classification, Wearable Devices, and Computer-Aided Diagnosis; Academic Press: Cambridge, MA, USA, 2025; pp. 193–200. [Google Scholar] [CrossRef]
- Masud, U.; Siddiqui, M.; Sadiq, M.; Masood, S. SCS-Net: An efficient and practical approach towards Face Mask Detection. Procedia Comput. Sci. 2023, 218, 1878–1887. [Google Scholar] [CrossRef]
- Sahoo, M.P.; Sridevi, M.; Sridhar, R. Covid prevention based on identification of incorrect position of face-mask. Procedia Comput. Sci. 2024, 235, 1222–1234. [Google Scholar] [CrossRef]
- Koklu, M.; Cinar, I.; Taspinar, Y.S. CNN-based bi-directional and directional long-short term memory network for determination of face mask. Biomed. Signal Process. Control 2022, 71, 103216. [Google Scholar] [CrossRef]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 9992–10002. [Google Scholar] [CrossRef]
- Mao, Y.; Lv, Y.; Zhang, G.; Gui, X. Exploring Transformer for Face Mask Detection. IEEE Access 2024, 12, 118377–118388. [Google Scholar] [CrossRef]
- Kuriakose, B.; Shrestha, R.; Sandnes, F.E. DeepNAVI: A deep learning based smartphone navigation assistant for people with visual impairments. Expert Syst. Appl. 2023, 212, 118720. [Google Scholar] [CrossRef]
- Tomiło, P.; Oleszczuk, P.; Laskowska, A.; Wilczewska, W.; Gnapowski, E. Effect of Architecture and Inference Parameters of Artificial Neural Network Models in the Detection Task on Energy Demand. Energies 2024, 17, 5417. [Google Scholar] [CrossRef]
- Lahmer, S.; Khoshsirat, A.; Rossi, M.; Zanella, A. Energy Consumption of Neural Networks on NVIDIA Edge Boards: An Empirical Model. In Proceedings of the 2022 20th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), Torino, Italy, 19–23 September 2022. [Google Scholar] [CrossRef]
- Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation through attention. Proc. Mach. Learn. Res. 2021, 139, 10347–10357. [Google Scholar]
- d’Ascoli, S.; Touvron, H.; Leavitt, M.; Morcos, A.; Biroli, G.; Sagun, L. ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases. J. Stat. Mech. Theory Exp. 2021, 2022, 139. [Google Scholar] [CrossRef]
- Wu, H.; Xiao, B.; Codella, N.; Liu, M.; Dai, X.; Yuan, L.; Zhang, L. CvT: Introducing Convolutions to Vision Transformers. In Proceedings of the IEEE International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 22–31. Available online: http://arxiv.org/abs/2103.15808 (accessed on 8 February 2026).
- Mehta, S.; Rastegari, M. MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer. In Proceedings of the ICLR 2022—10th International Conference on Learning Representations, Virtual, 25–29 April 2022; Available online: https://arxiv.org/pdf/2110.02178 (accessed on 8 February 2026).
- Chen, Y.; Dai, X.; Chen, D.; Liu, M.; Dong, X.; Yuan, L.; Liu, Z. Mobile-Former: Bridging MobileNet and Transformer. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5260–5269. Available online: http://arxiv.org/abs/2108.05895 (accessed on 8 February 2026).


| Ref. | Experiment | Goal | Materials | Methods | Results | Conclusion |
|---|---|---|---|---|---|---|
| [3] | Face mask detection | To propose a novel face mask detection framework, FMD-Yolo, that monitors whether people in public wear masks correctly, an effective way to block virus transmission. | Im-Res2Net-101 feature extractor, enhanced path aggregation network (En-PAN), localization loss, Matrix NMS method | Im-Res2Net-101 used for feature extraction with En-PAN for feature fusion; localization loss applied during training and Matrix NMS used at inference. | FMD-Yolo achieved the best AP50 of 92.0% and 88.4% on the two datasets, and its AP75 (Intersection over Union, IoU = 0.75) improved by 5.5% and 3.9%, respectively, over the second-best method. | The results demonstrate the superiority of FMD-Yolo in face mask detection, with both theoretical value and practical significance. |
| [10] | Object detection | To develop multi-class mask compliance detection on Raspberry Pi 4 using SSDLite MobileNetV3. | Raspberry Pi 4 Model B 4 GB, Raspberry Pi Camera V1, monitor, non-momentary push-button switch, fan, 1N4001 diode, three 470 Ω resistors, 2N2222 transistor | 1. Trained the SSDLite MobileNetV3 Small model with and without fine-tuning. 2. Compared its detection performance against other models. 3. Evaluated the detection accuracy, FPS, and power consumption of the models. | SSDLite MobileNetV3 Small achieved the highest FPS but showed limited accuracy for incorrect mask detection; overall accuracy was 70%. | The SSDLite MobileNetV3 Small model offers faster detection than the others but is less effective than SSDLite MobileNetV2 in identifying incorrect mask usage. |
| | Object detection model comparison | Comparing SSDLite MobileNetV3 Small, SSDLite MobileNetV3 Large, and SSDLite MobileNetV2. | Raspberry Pi 4 Model B 4 GB, Raspberry Pi Camera V1, dataset of face images with and without masks | 1. Trained the different object detection models on the face mask dataset. 2. Evaluated the detection accuracy, FPS, and power consumption of the models. | The fine-tuned SSDLite MobileNetV2 model performed best overall; the SSDLite MobileNetV3 Small model had the highest FPS but limited detection accuracy. | The SSDLite MobileNetV2 model is the most suitable for face mask detection on Raspberry Pi 4. |
| [11] | Empirical study | To study model scaling and balance depth, width, and resolution for improved performance. | Convolutional Neural Networks (ConvNets) | Systematically studied scaling up ConvNets by adjusting network depth, width, and resolution. | Scaling up any dimension of network width, depth, or resolution improves accuracy, but the gain diminishes for bigger models. | Carefully balancing network width, depth, and resolution is an important but previously missing step toward better accuracy and efficiency. |
| | Methodology development | To propose a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient. | Convolutional Neural Networks (ConvNets) | Proposed a compound scaling method that uniformly scales network width, depth, and resolution with a set of fixed scaling coefficients. | The proposed compound scaling method can achieve better accuracy and efficiency compared to conventional single-dimension scaling methods. | The compound scaling method enables scaling up a baseline ConvNet to any target resource constraints in a more principled way, while maintaining model efficiency. |
| | Neural architecture search and model scaling | To design a new baseline network and scale it up to obtain a family of models, called EfficientNets. | Convolutional Neural Networks (ConvNets) | Used neural architecture search to develop a new baseline network called EfficientNet-B0, and then applied the proposed compound scaling method to scale it up and obtain a family of EfficientNet models. | The scaled EfficientNet models significantly outperform other ConvNets in terms of accuracy and efficiency. | The EfficientNet models, developed using the proposed compound scaling method, achieve much better accuracy and efficiency than previous ConvNets. |
| [12] | Image classification using MobileNetV2 | To develop a face mask detection model that authorities can use for mitigation, evaluation, prevention, and action planning against COVID-19. | 1916 images of people wearing masks, 1930 images of people not wearing masks, image size of 224 × 224 pixels | Data collection and preprocessing; MobileNetV2-based model training with augmentation; evaluation using accuracy, precision, recall, and F1-score. | The trained model detects people wearing and not wearing face masks with an accuracy of 96.85%. | Supports monitoring and enforcement of face mask policies for COVID-19 mitigation. |
| | Application of the face mask detection model to real-world data | To apply the developed face mask detection model to images from 25 cities in Indonesia and analyze the percentage of people wearing face masks in each city. | Images from various sources (public place CCTV, shops, traffic cameras) in 25 cities in Indonesia, selected based on data availability | Apply the trained face mask detection model to the images from the 25 cities, calculate the percentage of people wearing and not wearing face masks in each city. | The percentage of people not wearing face masks ranged from 64.14% (Surabaya) to 82.76% (Jambi). | Face mask usage differs across cities, with some showing notably lower compliance. This helps authorities target interventions and allocate resources to areas with the weakest mask-wearing. |
| | Correlation analysis | To evaluate the validity of the face mask wearing percentage data by correlating it with the COVID-19 vigilance index. | Percentage of people wearing face masks in the 25 cities, COVID-19 vigilance index data | Conduct a bivariate correlation analysis between the percentage of people wearing face masks in the cities and the COVID-19 vigilance index. | The percentage of people wearing face masks and the COVID-19 vigilance index have a strong, negative, and significant correlation of −0.62. | The model’s mask-wearing data aligns with the COVID-19 vigilance index, showing that cities with lower mask-wearing rates require higher vigilance against transmission. |
| [13] | Algorithm development | To propose a novel lightweight face mask detector, LFMD-YOLO (lightweight FMD through You Only Look Once), that achieves an excellent balance of precision and speed. | Cross Stage Partial bottleneck with three convolutions and Efficient Channel Attention (C3E), Max-pooling Efficient Channel Attention Pyramid Fast (MECAPF) module, custom backbone, Enhanced Bidirectional Feature Pyramid Network (E-BiFPN), detection heads, IoU | Designed C3E and MECAPF modules, proposed a custom backbone, integrated E-BiFPN for multi-scale feature fusion, and enhanced the detection heads with improved IoU. | LFMD-YOLO achieves higher detection accuracy, with mAPs of 68.7% and 60.1% on the two evaluation datasets, respectively, while requiring fewer parameters and giga floating-point operations (GFLOPs). | LFMD-YOLO achieves an excellent balance of precision and speed for lightweight face mask detection. |
| [14] | Deep learning-based face mask detection | To develop a deep learning-based system for real-time face mask detection to enhance public health monitoring in environments where mask compliance is critical. | Convolutional Neural Network (CNN) built with TensorFlow and Keras, diverse input images, Google Colab, Google Drive | Utilize a CNN model to effectively classify individuals as mask-wearing or non-mask-wearing. Apply data preprocessing and augmentation techniques to improve model robustness and generalizability. Leverage cloud-based resources for efficient model training and deployment. | The system achieved high training and validation accuracy, consistent loss reduction, and strong real-time detection. It remained reliable despite minor validation fluctuations, demonstrating resilience and suitability for varied environments. | The DL-based system detects mask usage in real time. Data augmentation improves generalization, allowing reliable performance across varied scenarios and image conditions. |
| [15] | Face mask detection system development | To develop a rapid real-time face mask detection system (RRFMDS) for effective COVID-19 monitoring. | Single-shot multi-box detector based on ResNet-10, fine-tuned MobileNetV2, custom dataset of 14,535 images with 5000 incorrect masks, 4789 with masks, and 4746 without masks | Used single-shot multi-box detector for face detection and fine-tuned MobileNetV2 for face mask classification. Trained the system on the custom dataset. | The system detects all three classes (incorrect mask, with mask, and without mask) with average accuracies of 99.15% and 97.81% on training and testing data, respectively, and processes a single frame in roughly 0.142 s on average. | The proposed RRFMDS is a lightweight and efficient approach for real-time face mask detection from video data. It outperforms existing state-of-the-art models in terms of accuracy and processing speed. |
| Architecture | Year | Key Innovation | Parameter Count | Strengths | Limitations | Ref. |
|---|---|---|---|---|---|---|
| LeNet-5 | 1998 | Early CNN architecture (convolution + pooling) | ~60 K | Simple, stable | Too shallow for modern tasks | [16] |
| AlexNet | 2012 | ReLU, dropout, GPU training | ~60 M | Started modern deep learning | Heavy; not edge-friendly | [17] |
| VGG16/VGG19 | 2014 | Deep stacks of 3 × 3 convolution layers | ~138 M | Strong features | Extremely large & slow | [18] |
| Inception-v1 | 2015 | Multi-branch convolutions | ~6.8 M | Efficient, flexible | Complex structure | [19] |
| Inception-ResNet | 2017 | Residual + inception blocks | 23–55 M | Very accurate | Heavy | [20] |
| ResNet (18–101) | 2016 | Skip connections | 11–44 M | Deep & stable | Still heavy for edge | [21] |
| DenseNet121 | 2017 | Dense connectivity | ~8 M | High feature reuse | Slow inference | [22] |
| Xception | 2017 | Depthwise separable convolution | ~22 M | Good efficiency | Not lightweight enough | [23] |
| Faster R-CNN | 2015 | Two-stage region detector | Backbone-dependent | Accurate | Slow without GPU | [27] |
| Mask R-CNN | 2017 | Adds segmentation branch | Backbone-dependent | Detects improper masks | Heavy for edge | [28] |
| RegNet | 2020 | Regular network design space | 10–50 M | Strong accuracy | Rarely used in mask detection | [24] |
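As a sanity check on the parameter scales quoted in the table, the "~60 K" figure for LeNet-5 can be reproduced with a few lines of arithmetic. The sketch below assumes the classic 1998 layer configuration (two convolutions plus three fully connected layers); it is an illustration of how such counts arise, not taken from any cited study.

```python
# Reproducing LeNet-5's "~60 K" parameter count from its layer shapes.
def conv_params(k, c_in, c_out):
    """Parameters of a k x k convolution: weights plus one bias per filter."""
    return (k * k * c_in + 1) * c_out

def fc_params(n_in, n_out):
    """Parameters of a fully connected layer, including biases."""
    return (n_in + 1) * n_out

lenet5 = (
    conv_params(5, 1, 6)          # C1: 5x5 conv, 1 -> 6 channels
    + conv_params(5, 6, 16)       # C3: 5x5 conv, 6 -> 16 channels
    + fc_params(16 * 5 * 5, 120)  # C5: flattened 400 -> 120
    + fc_params(120, 84)          # F6: 120 -> 84
    + fc_params(84, 10)           # output: 84 -> 10 classes
)
print(lenet5)  # 61706, i.e. roughly 60 K parameters
```

The same per-layer arithmetic, scaled up, explains the ~138 M figure for VGG16, whose parameters are dominated by its fully connected layers.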
| Model Type | Key Architectural Concept | Approx. Parameters/Complexity | Typical Usage in Mask Detection |
|---|---|---|---|
| MobileNetV2 | Depthwise separable convolutions with inverted residual bottlenecks | ~3.4 M parameters (α = 1.0) | Most widely adopted lightweight backbone; real-time mask/no-mask or 3-class classification on embedded devices. |
| EfficientNet-B0 | Compound scaling of depth, width, and resolution | ~5.3 M parameters | Used in high-accuracy systems (e.g., EfficientMask-Net); suitable for improper mask detection with slightly higher computational needs. |
| ShuffleNet | Grouped 1 × 1 convolution with channel shuffle | ~2.3 M parameters (1.0×) | Limited adoption; tested in low-resource conditions but less consistent than MobileNet. |
| SqueezeNet/SqueezeMaskNet | Fire module (1 × 1 squeeze + expand) with attention extensions | ~1.2 M (SqueezeNet), ~1.5 M (SqueezeMaskNet) | Designed for real-time multi-class classification; high FPS on Jetson-class edge hardware. |
| EfficientMask-Net | EfficientNet-B0 backbone with large-margin piecewise-linear classifier (LMPL) | ~5.3 M parameters | Achieves up to 99.6% accuracy; offers detailed detection of improper mask positioning (nose/chin uncovered). |
| Hybrid CNN–YOLO variants (e.g., MobileNetV2 + YOLO) | Lightweight backbone with optimised detection head | Varies (<8 M total) | Used for real-time detection + localisation in surveillance and compliance monitoring; effective for streaming environments. |
| Hybrid Architecture | Backbone Type | Detection/Classification Head | Key Idea | Reported Strengths | Ref. |
|---|---|---|---|---|---|
| YOLOv3-Based Hybrid Detector | CSPDarknet-style backbone | YOLOv3 detection head | Full detector tailored to mask usage | Real-time performance with strong localization | [30] |
| YOLOv2–ResNet50 | ResNet50 (heavy backbone) | YOLOv2 one-stage detector | Combine high-level semantic features with fast one-stage detection | High accuracy in medical mask detection; good robustness | [44] |
| MobileNetV2 + SSD | MobileNetV2 (lightweight) | SSD one-stage detector | Lightweight backbone with efficient localization | Real-time mask detection on edge devices | [45] |
| YOLOv5 + Coordinate Attention | YOLOv5 backbone | Attention-enhanced detection head | Spatial refinement + auto-labelling | Strong mean Average Precision (mAP) improvement; suitable for embedded devices | [46] |
| CNN Feature Extractor + SVM/ML Classifier | VGG19, ResNet, MobileNet | SVM/KNN/RF classifiers | Deep features + classical ML | Good performance on small datasets; simpler deployment | [47] |
| Smart-City System-Level Hybrid | CNN/YOLO backbone | IoT + Edge-tier inference pipeline | Combines DL, transfer learning, and IoT | Scalable deployment across large environments | [53] |
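Several of these hybrids (e.g., SSDMNV2 [37] and RRFMDS [15]) share a detect-then-classify control flow: a fast detector proposes face regions, and a lightweight classifier labels each crop. The sketch below shows only that structure; both stages are stubbed out, so the function names, boxes, and labels are placeholders rather than any cited implementation.

```python
# Structural sketch of a two-stage hybrid mask-detection pipeline.
# In a real system, detect_faces would be an SSD face detector and
# classify_mask a fine-tuned MobileNetV2 classifier.
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height

def detect_faces(frame) -> List[Box]:
    """Stub for the face-detection stage (returns hypothetical boxes)."""
    return [(10, 10, 50, 50), (80, 20, 48, 48)]

def classify_mask(frame, box: Box) -> str:
    """Stub for the per-crop mask-classification stage."""
    return "with_mask"  # placeholder label

def process_frame(frame) -> List[Tuple[Box, str]]:
    """One pass of the pipeline: detect faces, classify each crop."""
    return [(box, classify_mask(frame, box)) for box in detect_faces(frame)]

print(process_frame(frame=None))  # no real image needed for the sketch
```

The appeal of this split is that the expensive localization step runs once per frame, while the cheap classifier runs once per detected face, which keeps latency low on embedded hardware.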
| Study (Ref.) | Accuracy | Precision | Recall | F1-Score | AP | mAP | ROC/AUC | Use Case/Interpretation in Mask Detection |
|---|---|---|---|---|---|---|---|---|
| [2] (single-stage and two-stage object detectors) | ✓ | ✓ | ✓ | ✓ | | | | Binary classifier; strong balanced metrics on curated datasets |
| [3] (FMD-YOLO) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | YOLO detection; AP/mAP used for bounding-box evaluation |
| [5] (DeepMaskNet) | ✓ | ✓ | ✓ | ✓ | | | | Detection + masked-face recognition; reports full metric suite |
| [18] (VGG16/VGG19) | (✓ ImageNet) | ✓ | | | | | | Backbone for early mask-classification pipelines |
| [21] ResNet | (✓ ImageNet) | ✓ | | | | | | Backbone widely reused in mask detection & compliance tasks |
| [25] R-CNN | | | | | ✓ | ✓ | | Basis for two-stage detectors adapted for mask detection |
| [27] Faster R-CNN | | | | | ✓ | ✓ | | Used in early mask detectors assessing region-level AP/mAP |
| [33] MobileNetV2 | ✓ | ✓ | ✓ | ✓ | | | | Lightweight backbone for fast mask/no-mask classification |
| [37] (SSDMNV2) | ✓ | ✓ | ✓ | ✓ | | | | SSD + MobileNetV2; used in real-time mask detection systems |
| [44] (YOLOv2–ResNet50) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | Hybrid YOLO-based medical mask detector |
| [63] (Ensemble Classification Model) | ✓ | ✓ | ✓ | ✓ | | | ✓ | Ensemble ResNet50/Inception/VGG; includes ROC curve & AUC ≈ 0.99 |
| [71] (IoT Mask Detection) | ✓ | ✓ | ✓ | ✓ | | | ✓ | IoT access-control system; explicitly reports ROC curve & AUC ≈ 0.96 |
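All of the classification metrics checked in the table derive from the same four confusion-matrix counts. The sketch below computes them for a binary mask/no-mask classifier; the counts are hypothetical and not drawn from any cited study.

```python
# The four metrics from Section 4.1, computed from confusion-matrix counts.
tp, fp, fn, tn = 95, 4, 5, 96  # hypothetical test-set counts

accuracy = (tp + tn) / (tp + tn + fp + fn)   # overall correct fraction
precision = tp / (tp + fp)                   # of predicted "mask", how many correct
recall = tp / (tp + fn)                      # sensitivity: masks actually caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

AP and mAP extend this per-detection view to bounding-box outputs by integrating precision over recall at a given IoU threshold, which is why they appear only for the detector rows above.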
| Architecture Family | Representative Models | Parameter Scale | FPS on Edge Devices (Jetson/RPi/Low-Power GPU) | Memory/Deployment Characteristics | Suitability for Real-Time Mask Detection |
|---|---|---|---|---|---|
| Conventional CNN Backbones | VGG16/19, ResNet50 [18,21], DenseNet121 [22], InceptionV3 [19], classical transfer-learning approaches | High (8 M–140 M+) | Low–Moderate (<10–15 FPS without optimization) | Require GPU-class memory; heavy compute | High accuracy under controlled datasets but generally not suitable for real-time edge deployment |
| Two-Stage Detectors (R-CNN Family) | R-CNN [25], Fast R-CNN [26], Faster R-CNN [27], Mask R-CNN [28] | High + region proposal overhead | Low (<5–10 FPS on Jetson; often <5 FPS on RPi) | Large VRAM usage; very slow on CPUs | Excellent detection accuracy, but too slow for practical edge-device mask monitoring |
| Single-Stage Detectors (Heavy Backbones) | YOLOv2–ResNet50 [44], ETL-YOLOv4 [65], drone-based YOLO [58] | Moderate–High (40 M–60 M+) | Moderate (10–30 FPS on Jetson Xavier; <15 FPS on Nano/RPi) | Need GPU acceleration; moderate memory | Suitable for edge devices only with optimization; strong accuracy but mixed speed |
| Lightweight CNN Backbones (Classification) | MobileNetV1/V2/V3 [32,33,34], EfficientNet-B0 [11], ShuffleNet [38], SqueezeNet [39], mask-detection works [42] | Low (1 M–5 M range) | High (30–60 FPS on Jetson Nano; usable on RPi) | Very small footprint; easy to quantize and prune; CPU-friendly | Excellent for fast mask classification once faces are detected; ideal for edge and mobile deployment |
| Lightweight Single-Stage Detectors | SSD-MobileNetV2 (SSDMNV2) [37,45], EfficientMask-Net [41], YOLOv4-tiny/YOLOv5-s variants | Low–Moderate (2 M–10 M) | High (25–90 FPS depending on platform) | Optimized for low memory; fits into IoT/embedded systems | Best trade-off between accuracy and speed; preferred choice for real-time mask detection on edge devices |
| Hybrid & Attention-Enhanced Architectures | YOLOv5 + CoordAttention [46], IoT-optimized deep learning [50] | Low–Moderate (slightly higher due to attention modules) | High (25–60 FPS with optimized pipelines) | Slightly heavier than lightweight CNNs but still edge-deployable | Very promising direction: improved robustness (occlusion, clutter) while remaining efficient |
| Extreme Lightweight/Frugal/Deployment-Engineered Models | Frugal object detectors [56], augmentation-resilient object detectors [55] | Very Low (<1 M–3 M) | Very High (60+ FPS even on modest devices) | Minimal memory; optimized for microcontrollers, Neural Processing Units (NPUs), or minimal-GPU boards | Ideal for massive IoT, smart-city nodes, or hundreds of camera feeds with strict power limits; slight accuracy trade-off |
| Ref | Device/Platform | Model/Backbone | Task | FPS/Latency | Power (W)/Energy (J) | Measured vs. Estimated |
|---|---|---|---|---|---|---|
| [4] | Raspberry Pi 4 | MobileNetV3 | Image Classification | 19.2 ms latency | 9 W (max) | Latency measured; power estimated |
| [4] | Intel NCS2 + Raspberry Pi 4 | MobileNetV3 | Image Classification | 9.5 ms latency | 2 W (max) | Latency measured; power estimated |
| [4] | Jetson Nano | MobileNetV3 | Image Classification | 5.09 ms latency | 10 W (max) | Latency measured; power estimated |
| [4] | Jetson Xavier NX | MobileNetV3 | Image Classification | 1.22 ms latency | 15 W (max) | Latency measured; power estimated |
| [4] | Raspberry Pi 4 | SSDLite MobileNetV3 | Object Detection | 47 ms latency | 9 W (max) | Latency measured; power estimated |
| [4] | Jetson Xavier NX | SSDLite MobileNetV3 | Object Detection | 2.9 ms latency | 15 W (max) | Latency measured; power estimated |
| [10] | Raspberry Pi 4 | SSDLite MobileNetV3 Small | Object Detection | 8.67–9.79 FPS | 7.4–8.0 W | Measured |
| [10] | Raspberry Pi 4 | SSDLite MobileNetV3 Large | Object Detection | 3.81–4.26 FPS | 7.3–8.0 W | Measured |
| [10] | Raspberry Pi 4 | SSDLite MobileNetV2 | Object Detection | 3.33–3.57 FPS | 7.2–7.9 W | Measured |
| [37] | Laptop (i7-8750H + GTX1050Ti) | SSDMNV2 (SSD-ResNet10 + MobileNetV2) | Object Detection | 15.71 FPS | Not reported | Measured |
| [43] | Jetson Orin NX | SqueezeMaskNet | Object Detection | 96 FPS | Not reported | Measured |
| [43] | Jetson Xavier NX | SqueezeMaskNet | Object Detection | 84 FPS | Not reported | Measured |
| [43] | Jetson Orin Nano | SqueezeMaskNet | Object Detection | 74 FPS | Not reported | Measured |
| [43] | RTX 2080 Super GPU | SqueezeMaskNet | Object Detection | 297 FPS | Not reported | Measured |
| [80] | RTX 3090 GPU | YOLOv8n/YOLOv9t/YOLOv10n | Object Detection | 1.78–3.16 min inference time | ~144 W | Power measured |
| [80] | Jetson Xavier NX | YOLOv8n | Object Detection | 3.82 min inference | 7.29 W | Measured |
| [81] | Jetson TX2 | Conv & Fully Connected NN layers | Neural Network Inference | Not reported | Energy per inference (J) | Measured + modeled |
| [81] | Jetson Xavier NX | Conv & Fully Connected NN layers | Neural Network Inference | Not reported | Energy per inference (J) | Measured + modeled |
| Note: Devices reported in this table include Raspberry Pi 4 (Raspberry Pi Ltd., Cambridge, UK), Intel NCS2 (Intel Corporation, Santa Clara, CA, USA), and NVIDIA Jetson platforms and GPUs (NVIDIA Corporation, Santa Clara, CA, USA). Manufacturer information refers to official product developers; procurement or sourcing details were not reported in the original studies. | ||||||
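The throughput and power figures in the table are linked by two simple identities: FPS is the reciprocal of per-frame latency, and energy per inference is average power draw times per-frame time. The sketch below applies them to the Raspberry Pi 4 MobileNetV3 row as an illustrative input; the derived energy figure is an estimate, since that study reported only a power ceiling, not measured draw.

```python
# Relating latency, FPS, and energy per inference for edge devices.
def fps_from_latency(latency_ms: float) -> float:
    """Frames per second achievable at a given per-frame latency."""
    return 1000.0 / latency_ms

def energy_per_inference(power_w: float, latency_ms: float) -> float:
    """Energy in joules = watts x seconds spent on one frame."""
    return power_w * (latency_ms / 1000.0)

latency_ms = 19.2  # Raspberry Pi 4, MobileNetV3 classification (table above)
power_w = 9.0      # reported maximum power draw for the same setup

print(round(fps_from_latency(latency_ms), 1))               # ~52.1 FPS
print(round(energy_per_inference(power_w, latency_ms), 3))  # ~0.173 J/frame
```

These identities also explain why a faster accelerator can be more energy-efficient despite higher wattage: the Jetson Xavier NX draws more power than the Pi, but its much shorter latency yields less energy per frame.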
| Model | Parameters (Approx.) | Accuracy (%) | Speed/Resource Use | Notes |
|---|---|---|---|---|
| YOLOv4-tiny | ~6 M | Lower than YOLOv4 | Fast, low resource | About one-tenth the parameters of YOLOv4 [72] |
| MobileNetV2 | Lightweight | ~92.6 | Real-time, embedded devices | Robust for real-time use [33] |
| DenseMaskNet (DenseNet201) | Heavyweight | 99 | Slower, high resource | Highest accuracy in comparison [75] |
| Mask R-CNN | Heavyweight | Highest | Not suitable for real-time | Best accuracy, poor efficiency [28] |
| Custom Lightweight Net (SCS-Net) | 0.12 M | ~95.5 | Highly efficient | Up to 496× parameter reduction [74] |
| Ensemble of Single-Stage and Two-Stage Detectors | - | 98.2 | 0.05 s/image | High accuracy and speed [2] |
| Deployment Environment | Typical Hardware | Recommended Architecture | Key Rationale |
|---|---|---|---|
| Ultra-low-power edge | Microcontrollers, ARM CPUs | MobileNetV2, ShuffleNet | Minimal parameter count |
| Edge AI devices | Jetson Nano, Raspberry Pi + NCS2 | SSD-MobileNetV2, YOLO-tiny | Real-time detection |
| Embedded GPU platforms | Jetson Xavier NX, Orin | YOLOv5s, SqueezeMaskNet | Balanced speed and accuracy |
| Cloud/server systems | GPU clusters | Faster R-CNN, ConvNeXt | Maximum accuracy |
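The selection framework in the table can be read as a simple lookup from deployment tier to recommended architecture. The sketch below mirrors the table directly; the tier keys are illustrative labels introduced here, not a standard taxonomy.

```python
# Deployment-tier lookup mirroring the architecture selection table.
def recommend_architecture(platform: str) -> str:
    """Map a deployment tier to the architectures suggested in the table."""
    tiers = {
        "microcontroller": "MobileNetV2 / ShuffleNet (minimal parameter count)",
        "edge_ai": "SSD-MobileNetV2 / YOLO-tiny (real-time detection)",
        "embedded_gpu": "YOLOv5s / SqueezeMaskNet (speed-accuracy balance)",
        "cloud": "Faster R-CNN / ConvNeXt (maximum accuracy)",
    }
    return tiers.get(platform, "unknown platform")

print(recommend_architecture("edge_ai"))
```

In practice the choice would also weigh power budget, camera count, and whether improper-mask classes are required, but a coarse tier lookup like this captures the table's main trade-off axis.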
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2026 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Rasheed, S. Lightweight Deep Learning Models for Face Mask Detection in Real-Time Edge Environments: A Review and Future Research Directions. Mach. Learn. Knowl. Extr. 2026, 8, 102. https://doi.org/10.3390/make8040102

