1. Introduction
The need to protect sensitive data and tangible assets has increased due to the growing importance of technology and connected networks. Cyber-attacks, identity theft, and security lapses routinely target government and commercial projects. The level of risk is illustrated by well-known cyber-fraud cases that resulted in financial damages [
1]. Many such offenses exploit a single failing: conventional access control systems that rely on tokens such as ID cards, physical keys, passwords, or PINs. These systems authenticate users based on “what we have” or “what we know” instead of “who we are.” In contrast, biometric systems that capture unique physiological and behavioral markers offer a more secure and tailored option for identity verification. Face recognition systems are becoming increasingly popular across different applications, making them among the most accessible biometric authentication technologies [
2,
3].
In recent years, person recognition has evolved into one of the most dynamic and important fields of study in security and authentication. Nevertheless, certain problems persist; for example, effective recognition typically requires a dataset containing images, voice samples, and biometric records from a single individual to enable confident recognition. The use of biometric technologies has drastically modernized the process of identifying individuals. Unlike older approaches, which require authentication through passwords, ID cards, token keys, and PINs, biometric systems provide access using human characteristics. These systems verify identity through both behavioral and physiological traits. All traditional security systems have their advantages and disadvantages. Passwords and PINs can be difficult to remember yet easy to guess. Physical keys and tokens are prone to being misplaced, and ID cards can be damaged, forged, or rendered unreadable. Although biometric identifiers are generally more difficult to share or replicate than passwords or physical tokens, they are not immune to presentation attacks. Modern facial recognition systems can be deceived by high-resolution printouts, 3D masks, replayed video attacks, or digitally generated adversarial samples. To mitigate these risks, current research includes face anti-spoofing (FAS) and presentation attack detection (PAD) techniques such as texture-based analysis, liveness detection, and deep learning-based spoof detection. Modern biometric identification systems make use of fingerprint scanning, facial recognition, hand geometry, palm scanning, hand vein pattern analysis, iris scanning, ear recognition, and voice identification. With these methods, both physical and digital access control can be made more secure, user-friendly, and tamper-resistant [
4,
5].
The field of intelligent environments has brought innovations that have advanced security systems over the last few decades. Biometric identifiers have many forms, but perhaps the most recognizable and easily used is the human face. Facial recognition systems (FRS) identify and authenticate a person by comparing their face to the images in the database [
6]. Capturing an individual’s image, extracting unique attributes, and saving that data for comparison are the basic steps in storing faces. These systems can identify individuals through photographs or live video feeds. With technological advancements, the system can compare faces against thousands of profiles within seconds. These rapid recognition capabilities make facial scanning one of the most convenient biometric technologies [
7]. An approach to improving safety-related issues is discussed in [
8]. Several researchers are currently investigating person recognition methods, especially facial recognition, which has become a very useful and efficient application of biometric technology. Compared to other biometric technologies, face recognition has a number of benefits. Facial recognition can function in a non-intrusive and contactless way, in contrast to techniques that require direct physical involvement, such as placing a hand on a scanner for hand geometry, carefully positioning a finger for fingerprint identification, or aligning one’s eye for retina or iris scans. This makes it more convenient and easier to use in real-world situations. To guarantee precise image capture, most biometric systems still need some human cooperation, such as standing in front of a camera [
9]. However, one of the primary advantages of face recognition technology is its capacity for passive identification. Since cameras can collect images automatically, no intentional action is needed from the individual. Despite this convenience, several factors limit the wider adoption of facial recognition software. Its unreliability arises from problems such as occlusions, poor image quality, insufficient illumination, and variations in head pose or expression. These limitations affect accuracy and reliability. Researchers continue to develop new ways to overcome such obstacles and make facial recognition systems more effective, secure, and reliable in practical use [
4,
10]. Person recognition is defined as the application of pattern recognition technology to identify a person from a set list of individuals, classify known individuals, and distinguish them from unknown individuals. Out of all biometric traits, the human face is one of the most difficult to capture due to its highly dynamic nature, which includes expressions, head poses, and lighting. These difficulties make facial recognition one of the most intensive and intricate tasks in pattern recognition. To meet these challenges, improved algorithms and models designed to enhance accuracy, robustness, and reliability of face recognition systems have been developed through computer vision and through the incorporation of Artificial Intelligence (AI) [
11].
The contribution of this review lies in its structured synthesis of face recognition research spanning classical feature-based approaches and modern learning-based models. Rather than treating these paradigms in isolation, this work presents a unified taxonomy that explicitly connects traditional feature extraction and distance-based methods with convolutional and transformer-based architectures. In contrast to prior surveys that emphasize isolated accuracy improvements, this review provides comparative insights focused on practical trade-offs, including robustness to variations, computational complexity, dataset dependence, and deployment constraints. Furthermore, the review highlights research challenges that have gained increasing relevance in recent years, such as demographic bias, privacy-preserving learning, robustness under unconstrained conditions, and lightweight deployment for edge and real-time systems. By positioning these themes within a single analytical framework, this paper aims to serve as a consolidated and up-to-date reference for researchers and practitioners working on contemporary face recognition systems.
There remains a need for current face recognition studies to assess the risks linked to the technology’s growing divergence from democratic and ethical considerations about privacy, social control, and passive citizen oversight. Recent studies have shown that face recognition (FR) systems are consistently and disproportionately inaccurate for certain demographic subgroups, such as women and people with certain skin tones [
12]. Such inaccuracies are not simply reflections of the technology’s current growing pains, but instead reflect systems of social injustice. FR systems misclassify certain subordinated groups, including people with darker skin tones and women, which can perpetuate social and economic marginalization [
13]. The capacity to analyze and build face recognition systems should not outstrip the capacity to deploy such systems ethically and without bias or discrimination [
14].
The study is organized as follows. Section 1 gives an introduction. Section 2 outlines the fundamental overview and taxonomic landscape. Section 3 explores various applications of face recognition technology. In Section 4, different face recognition datasets are discussed. Section 5 presents the feature extraction techniques. Section 6 describes deep learning approaches. Section 7 discusses current challenges. Finally, Section 8 discusses emerging directions and ethical considerations, and Section 9 concludes the paper.
2. Face Recognition: Fundamental Overview and Taxonomic Landscape
2.1. Literature Search Strategy
This review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines, and the study selection process is illustrated in
Figure 1. Literature searches were performed on Google Scholar, IEEE Xplore, and the Web of Science Core Collection, covering publications from 1998 to 2025. Google Scholar was used as a supplementary search source to reduce the risk of missing relevant studies not indexed in curated databases. Due to the lack of fully reproducible and exportable record counts in Google Scholar, approximate figures were used during the identification and screening stages. In contrast, IEEE Xplore and Web of Science provided reproducible indexing and were used as the primary sources for final study inclusion. A broad set of keywords was employed, including terms such as face, facial, person, identification, recognition, and technique, to capture a comprehensive range of studies related to face recognition technologies.
Publications not aligned with biometric face recognition or algorithmic developments were excluded, including works focused on unrelated image classification or medical diagnostics. Because Google Scholar does not provide exact exportable record counts, the approximate figures used during identification and screening reflect iterative query refinement, manual duplicate removal, and relevance-based filtering. Pursuant to the PRISMA 2020 guidelines, study selection consisted of four stages: Identification, Screening, Determination of Eligibility, and Final Inclusion. Records retrieved from the selected databases were first screened by title and abstract and then assessed in full text to ensure relevance to biometric face recognition methodologies and algorithmic contributions.
2.2. Temporal Trends in Publication Volume
To explore patterns in publication dynamics, we analyzed the Final Publication Year (Final Year) metadata field indexed by Web of Science. This field denotes the calendar year in which an article is officially assigned to a journal issue and is particularly relevant for bibliometric analyses that depend on finalized citation records and stable publication timestamps.
2.2.1. Final Publication Year (Final Year)
As shown in
Figure 2, the number of final year publications increased from 10,579 in 2020 to 16,952 in 2024. A drop to 12,963 in 2025 may be attributed to many early-access articles not yet being formally assigned to issues at the time of data collection. The minor decline in 2023 (12,967) followed by a peak in 2024 may indicate editorial backlog clearance or strategic issue assignments by journals.
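The relative changes implied by these counts can be checked with a few lines of arithmetic. The sketch below uses only the four yearly figures quoted above; the helper name `relative_change` is introduced here purely for illustration.

```python
# Final-year publication counts quoted in the text (Web of Science).
counts = {2020: 10579, 2023: 12967, 2024: 16952, 2025: 12963}


def relative_change(start: int, end: int) -> float:
    """Percentage change from `start` to `end`."""
    return 100.0 * (end - start) / start


growth_2020_2024 = relative_change(counts[2020], counts[2024])
drop_2024_2025 = relative_change(counts[2024], counts[2025])

print(f"2020 -> 2024: {growth_2020_2024:+.1f}%")
print(f"2024 -> 2025: {drop_2024_2025:+.1f}%")
```

The 2025 figure is likely incomplete for the reason given above, so the second value should be read as a snapshot at the time of data collection rather than a settled trend.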
2.2.2. Interpretation and Implications
The publication trend observed from the Final Year data highlights the evolving nature of academic publishing workflows. While Early Access data typically provide an immediate reflection of research activity, the Final Year offers a more stable reference point for longitudinal analyses and archival bibliometric studies. The observed fluctuations suggest the influence of editorial practices, indexing delays, and backlog dynamics on yearly publication counts. Researchers conducting systematic reviews or meta-analyses should consider such temporal factors when designing inclusion criteria or interpreting trends. Beyond reflecting publication volume, these temporal trends also align with major technological developments in face recognition research. The sustained growth after 2020 corresponds to the widespread adoption of deep CNN-based embedding models, large-scale face datasets, and, more recently, transformer-based architectures, indicating a strong relationship between methodological advances and research output.
3. Applications of Face Recognition
In recent years, biometric security systems have seen significant advancement, particularly in the domain of facial recognition. This technology has emerged as a reliable and practical approach for ensuring personal security across various applications, including smart cards, smart homes and governmental systems [
11,
15]. Furthermore, face recognition plays a critical role in numerous areas such as human-computer interaction, information security, online banking, smartphone authentication, home video surveillance systems, and automated identification [
16].
Recent deployment surveys report that more than 95% of modern smartphones now integrate face-based authentication as a primary or secondary unlock mechanism, reflecting its widespread adoption in mobile ecosystems. Empirical evaluations further show that face recognition performance varies substantially across operational contexts: controlled indoor environments frequently achieve verification rates above 98%, whereas unconstrained outdoor conditions with variations in pose, illumination, and occlusion typically reduce accuracy to the 85–92% range. Real-time surveillance systems must also operate under strict latency requirements, with many commercial systems targeting sub-100 ms end-to-end processing to maintain actionable responsiveness.
3.1. Security and Surveillance
Recent empirical studies on cybercrime show that historic fraud statistics from the 1990s no longer reflect the current threat landscape. A decade-long analysis of cybercrime in India, for example, reports that cases registered under the IT Act increased from 9622 in 2014 to 77,858 by August 2024, with financial fraud, phishing, identity theft, online harassment, and ransomware all exhibiting a sustained upward trajectory over this period. In parallel, deepfake technology has emerged as a distinct category of cybercrime, where AI-generated facial images, videos, and audio are misused for extortion, misinformation, and impersonation. Taken together, these trends indicate that modern face recognition systems operate in an environment marked by large-scale, AI-driven spoofing and identity abuse, underscoring the necessity of incorporating robust liveness detection, anti-spoofing mechanisms, and secure deployment protocols into contemporary system design [
17].
Face recognition plays a critical role in modern security and surveillance systems, offering automated identification of individuals in public and private spaces. Surveillance cameras integrated with face recognition algorithms enable real-time monitoring, threat detection, and criminal identification in crowded environments such as airports, train stations, and public events [
18]. Law enforcement agencies are increasingly deploying these systems to aid in crime prevention and suspect tracking, with face recognition enhancing traditional video surveillance by providing person-specific analytics rather than relying solely on behavioral cues [
19]. Despite its effectiveness, concerns about privacy and potential misuse have sparked significant public debate [
20].
3.2. Access Control Systems
Face recognition is widely adopted in access control systems, providing a contactless, convenient, and secure alternative to traditional authentication methods like passwords or ID cards. It is implemented in corporate buildings, research facilities, and restricted zones to regulate entry based on biometric verification [
21]. The technology reduces risks associated with lost or stolen credentials and enhances security by preventing unauthorized access. Furthermore, advancements in deep learning have markedly improved recognition accuracy under varying lighting and environmental conditions, making it suitable for real-world deployment [
22]. However, spoofing attacks using photographs or masks remain an active area of research concern [
23].
In practical surveillance deployments, algorithmic performance is strongly influenced by operational constraints such as real-time processing requirements, camera placement, and environmental variability. A key technical challenge is maintaining a low False Positive Rate (FPR), as excessive false alarms can overwhelm monitoring personnel and diminish system reliability. Equally important is system latency, the end-to-end delay between frame acquisition and identity decision, which must remain within tens of milliseconds for real-time threat detection. Modern surveillance pipelines often integrate lightweight CNN backbones [
24] and quantized models to meet computational budgets while preserving recognition accuracy. Furthermore, occlusion, motion blur, and low-light conditions introduce significant intra-class variation, necessitating robust detection and tracking modules to ensure stable embedding extraction in dynamic scenes.
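The two operational quantities discussed above, the False Positive Rate and end-to-end latency, can be illustrated with a minimal sketch. The similarity scores are hypothetical values chosen for illustration, and the function names are not taken from any particular library; a real evaluation would use scores produced by the deployed matcher on labeled genuine and impostor pairs.

```python
import time


def false_positive_rate(impostor_scores, threshold):
    """Fraction of impostor comparisons wrongly accepted at this threshold."""
    accepted = sum(1 for s in impostor_scores if s >= threshold)
    return accepted / len(impostor_scores)


def false_negative_rate(genuine_scores, threshold):
    """Fraction of genuine comparisons wrongly rejected at this threshold."""
    rejected = sum(1 for s in genuine_scores if s < threshold)
    return rejected / len(genuine_scores)


def timed_ms(fn, *args):
    """Run fn(*args) and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0


# Hypothetical similarity scores (higher means more similar).
genuine = [0.91, 0.84, 0.77, 0.69, 0.88]
impostor = [0.12, 0.35, 0.58, 0.71, 0.22, 0.40, 0.05, 0.63]

# Raising the threshold lowers the FPR but raises the FNR.
for threshold in (0.5, 0.6, 0.7, 0.8):
    fpr = false_positive_rate(impostor, threshold)
    fnr = false_negative_rate(genuine, threshold)
    print(f"threshold={threshold:.1f}  FPR={fpr:.3f}  FNR={fnr:.3f}")

# Latency of one stage; in a real pipeline fn would be the full
# detect-align-embed-match path rather than this stand-in.
_, elapsed_ms = timed_ms(false_positive_rate, impostor, 0.6)
print(f"stage latency: {elapsed_ms:.3f} ms")
```

Sweeping the threshold in this way makes the trade-off explicit: a surveillance deployment that cannot tolerate false alarms would pick a higher threshold and accept more missed matches.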
3.3. Human-Computer Interaction
In human-computer interaction (HCI), face recognition contributes to personalized, adaptive interfaces that respond to the presence and identity of users. Applications include automatic login to computing devices, personalized content delivery, and gaze-based interaction systems [
25]. Face recognition also facilitates emotion-aware computing, where systems adapt based on the detected emotional state of the user [
26]. Such advancements promote intuitive and efficient interaction, particularly in intelligent tutoring systems, gaming, and virtual reality [
27]. Nonetheless, achieving high accuracy under unconstrained conditions, such as varying head poses and occlusions, remains a technical challenge.
3.4. Mobile & IoT Systems
The proliferation of smartphones and Internet of Things (IoT) devices has accelerated the integration of face recognition for user authentication and device control. Face unlock features, popularized by devices such as Apple’s Face ID, offer secure and convenient access to personal mobile devices [
28]. In smart home environments, face recognition enables personalized automation, such as adjusting lighting or temperature based on the detected individual [
29]. Lightweight, energy-efficient face recognition models are actively being developed to overcome computational limitations in IoT systems [
30]. However, ensuring robustness against spoofing attacks and maintaining user privacy are critical research priorities.
3.5. Smart Cities and Law Enforcement
Face recognition is a cornerstone technology in the development of smart cities, supporting public safety, efficient governance, and intelligent transportation systems. Real-time facial analytics assist law enforcement in identifying wanted individuals, monitoring public spaces, and managing large-scale events. For instance, several metropolitan areas have deployed city-wide face recognition networks integrated with surveillance infrastructure [
31]. These systems can also be linked with databases to expedite forensic investigations. Despite its benefits, ethical concerns regarding mass surveillance and civil liberties have prompted regulatory discussions worldwide [
32].
3.6. Healthcare and Emotion Detection
In healthcare, face recognition is emerging as a tool for patient identification, access control, and contactless monitoring, particularly valuable in post-pandemic healthcare environments [
33]. Moreover, emotion detection based on facial expressions aids in mental health assessment, stress detection, and therapeutic interventions [
27]. Assistive technologies for individuals with autism or cognitive impairments leverage face recognition to enhance social interaction and emotional awareness [
34]. These applications require high accuracy and sensitivity to subtle facial cues, often in dynamic unconstrained environments. Ethical considerations around privacy and data security are paramount, especially in sensitive healthcare contexts.
Restricted Mode: Restricted operational modes in face recognition systems typically refer to settings where the model is constrained to verify identities against a predefined enrollment set. These systems are widely used in high-security environments such as enterprise authentication, e-governance platforms, and controlled-access infrastructures. Verification performance in restricted modes depends strongly on image quality, enrollment conditions, and robustness to intra-class variation. Similarly, biometric identity verification plays a central role in applications such as newborn identification, electronic transactions, passport control, and employee authentication. Modern mobile and IoT ecosystems incorporate face-based verification through optimized models capable of real-time inference under resource-constrained hardware. These deployments highlight the increasing shift toward contactless, privacy-aware authentication without relying on user-dependent actions beyond image capture.
Identity Verification: Typical use cases include newborn identification, internet transactions, banking, passports, national identification, and employee identification [
35].
Face recognition has been incorporated into numerous smartphone applications owing to the availability of the Open Source Computer Vision library (OpenCV: opencv.org) on Android and iOS. Indeed, over 500 Android apps were found to make use of facial recognition. For example, Android now offers face-unlocking software that uses facial recognition in place of the conventional password-based access control system [
35,
36]. Suppose a user wants to set up face recognition protection on a phone or app. The user must first scan their face with the camera, which saves the captured facial data in the database.
4. Datasets for Face Recognition
While
Table 1 and
Table 2 list dataset characteristics, we further provide a horizontal comparison of their applicability. Legacy datasets (e.g., Yale, ORL) are suitable for controlled-condition algorithm benchmarking, whereas in-the-wild datasets such as LFW, VGGFace2, and MegaFace are essential for evaluating robustness to pose, illumination, and demographic diversity. Researchers selecting datasets should consider the intended application: LFW for unconstrained identity verification, VGGFace2 for deep metric learning, and IJB-series datasets for extreme pose/occlusion scenarios. This guidance supports task-specific dataset selection.
Controlled datasets are collected under standardized capture conditions in which illumination, camera angle, background, and subject pose are intentionally regulated. These datasets typically include uniform labeling procedures, consistent resolution, and limited demographic variability, making them suitable for benchmarking algorithmic performance under idealized settings. In contrast, unconstrained or “in-the-wild” datasets are captured in highly variable real-world environments where illumination, pose, occlusion, expression, and imaging devices fluctuate significantly. Such datasets also tend to exhibit demographic imbalance across age, skin tone, and gender, which introduces additional bias factors. The distinction is therefore not only about visual variability but also about annotation difficulty, environmental unpredictability, and the broader range of performance challenges they impose on recognition systems.
A critical consideration when working with face recognition datasets is the range of challenges they inherently present. Many widely used datasets exhibit demographic bias, with disproportionate representation across gender, age, and skin-tone groups. Such imbalances can lead to skewed model performance and contribute to systematic disparities during deployment. In-the-wild datasets also frequently contain noisy or incorrectly labeled images due to automated or large-scale collection procedures, which can introduce inconsistencies in training and evaluation. Privacy concerns constitute an additional challenge, as many facial datasets include individuals whose images were collected without explicit consent, raising ethical and regulatory issues regarding data use, redistribution, and long-term storage. Beyond these challenges, face recognition datasets support a variety of benchmark tasks that extend beyond general “performance benchmarking.” Controlled datasets typically facilitate verification (1:1 matching) and identification (1:N search) under standardized conditions, while large-scale in-the-wild datasets enable more complex evaluations such as open-set recognition, face clustering, cross-pose matching, and robustness analysis under uncontrolled illumination, occlusion, and sensor variability. Clarifying these benchmark tasks highlights the functional role of each dataset type and underscores how their inherent characteristics influence the development and assessment of face recognition algorithms.
4.1. Dataset Evolution Trends
Face recognition datasets have evolved significantly over the years. Early datasets such as FERET, ORL, and Yale focused on controlled environments, where variations in lighting, pose, and expression were minimized. These datasets were essential in benchmarking initial algorithms but lacked the diversity and complexity needed for real-world applications.
Recent trends have shifted toward in-the-wild datasets like LFW, VGGFace2, and MegaFace, which capture more realistic and challenging variations in illumination, pose, and occlusion. These datasets better reflect the complexities that modern face recognition systems must address.
Moreover, datasets are now increasingly being curated with a focus on demographic diversity, addressing challenges such as gender, ethnicity, and age imbalance. This trend is essential for ensuring the fairness and robustness of face recognition systems. However, despite these advancements, issues like the underrepresentation of certain groups in datasets remain an ongoing challenge.
Identity Verification (1:1 Matching): Datasets like LFW and VGGFace2 are suitable for evaluating the performance of face recognition systems in unconstrained environments. These datasets are designed for identity verification and are useful for tasks requiring high accuracy.
Face Clustering and Identification (1:N Matching): Large-scale datasets such as MegaFace and CelebA are ideal for assessing the ability of systems to recognize and identify individuals across a wide range of images.
Robustness to Pose, Occlusion, and Illumination: Datasets like CASIA-WebFace and WIDER FACE are designed to challenge algorithms with varying head poses, lighting conditions, and partial occlusions, making them suitable for testing the robustness of recognition systems.
Deep Metric Learning and Face Embedding: Datasets such as VGGFace2 and IJB-A are used to evaluate face embedding models that aim to map faces into a compact space for comparison. These datasets allow for detailed performance evaluation in metric-based approaches.
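The difference between 1:1 verification and 1:N identification on face embeddings can be sketched as follows. This is a minimal illustration, assuming that embeddings are compared by cosine similarity and that a single fixed threshold is used; the three-dimensional vectors, the identity names, and the threshold of 0.9 are hypothetical, whereas real embeddings typically have hundreds of dimensions and thresholds are tuned on a validation set.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(probe, claimed_template, threshold=0.9):
    """1:1 matching: does the probe match the claimed identity?"""
    return cosine_similarity(probe, claimed_template) >= threshold


def identify(probe, gallery, threshold=0.9):
    """1:N search: return the best-matching identity, or None if no
    gallery entry is similar enough (open-set rejection)."""
    best_id, best_score = None, -1.0
    for identity, template in gallery.items():
        score = cosine_similarity(probe, template)
        if score > best_score:
            best_id, best_score = identity, score
    return best_id if best_score >= threshold else None


# Hypothetical enrolled templates and probes.
gallery = {
    "alice": [0.9, 0.1, 0.2],
    "bob": [0.1, 0.8, 0.3],
    "carol": [0.2, 0.2, 0.9],
}
probe = [0.85, 0.15, 0.25]   # close to the "alice" template
unknown = [0.5, 0.5, 0.5]    # not close to any enrolled identity

print(verify(probe, gallery["alice"]))
print(verify(probe, gallery["bob"]))
print(identify(probe, gallery))
print(identify(unknown, gallery))
```

Verification answers a yes/no question about one claimed identity, while identification must search the whole gallery and, in the open-set case, be able to reject a probe that belongs to nobody enrolled.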
4.2. Dataset Bias and Its Impact
One of the significant challenges in modern face recognition datasets is the presence of demographic bias. Many widely used datasets exhibit a disproportionate representation of certain groups, particularly in terms of gender, age, and skin tone. This bias can lead to skewed model performance, with certain demographic groups being misclassified more often than others.
For instance, several studies have shown that face recognition systems are less accurate for women and individuals with darker skin tones. This bias is often due to the overrepresentation of lighter-skinned male subjects in many datasets, which leads to a lack of generalization for other groups.
To mitigate this bias, researchers are encouraged to consider dataset augmentation techniques, such as generating synthetic data through GANs or diffusion models. Active learning approaches and fairness-aware training techniques can also be employed to ensure that models perform equitably across different demographic groups. Furthermore, using fairness metrics to assess model performance across diverse subgroups is essential for developing more inclusive face recognition systems.
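One simple way to apply a fairness metric across subgroups is to compute the error rate per group and report the largest gap. The sketch below assumes each evaluation record is a pair of a group label and a correctness flag; the group names and counts are hypothetical and do not come from any dataset discussed here, and the max-minus-min gap is only one of several disparity measures in use.

```python
from collections import defaultdict


def per_group_error_rates(records):
    """records: iterable of (group, is_correct) pairs.
    Returns a dict mapping each group to its error rate."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, is_correct in records:
        totals[group] += 1
        if not is_correct:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


def max_disparity(rates):
    """Gap between the worst and best per-group error rates."""
    return max(rates.values()) - min(rates.values())


# Hypothetical evaluation outcomes: 10 trials per group.
records = (
    [("group_a", True)] * 9 + [("group_a", False)] * 1
    + [("group_b", True)] * 7 + [("group_b", False)] * 3
)

rates = per_group_error_rates(records)
gap = max_disparity(rates)

for group, rate in sorted(rates.items()):
    print(f"{group}: error rate {rate:.2f}")
print(f"max disparity: {gap:.2f}")
```

Reporting the per-group rates alongside the aggregate accuracy keeps a good overall score from hiding poor performance on an underrepresented group.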
4.3. In-the-Wild Datasets
In-the-wild face recognition datasets are designed to reflect real-world, unconstrained environments, where factors such as varying lighting, pose, occlusion, expression, and image quality introduce significant challenges for automated face recognition systems. Unlike controlled datasets captured under laboratory conditions, these datasets aim to evaluate the robustness and abilities of face recognition algorithms under complex, naturally occurring variations [
19,
51]. Over the past decade, numerous large-scale, publicly available datasets have emerged, playing a key role in benchmarking and advancing the state of the art in face recognition.
5. Feature Extraction Techniques
5.1. Holistic/Statistical Approaches
Holistic or statistical approaches treat the entire facial region as a unified entity for feature extraction, focusing on global information rather than local details. Techniques like Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA) and Independent Component Analysis (ICA) have been widely applied in this context [
57,
58,
59]. These methods reduce the dimensionality of face images while preserving the most discriminative features, thereby supporting recognition accuracy and enhancing computational efficiency. However, holistic approaches are often sensitive to variations in illumination, facial expression, and pose, which limits their robustness in unconstrained environments [
19]. While holistic methods provided the foundational statistical viewpoint for early face recognition, their reliance on global appearance and linear projections makes them insufficient for capturing the complex geometric and nonlinear variations present in real-world data. These limitations naturally motivated the development of model-based approaches that explicitly encode 3D structure and local facial geometry, offering a more resilient alternative under challenging pose and illumination conditions.
Additionally, holistic statistical methods offer computational efficiency and interpretable feature representations, making them suitable for small-scale or controlled-environment applications. However, their reliance on linear projections and global facial structure limits robustness under unconstrained conditions involving significant pose changes, illumination variation, or occlusion. PCA excels in dimensionality reduction but remains highly illumination-sensitive; LDA improves class separability but requires sufficient samples per identity; ICA captures localized texture information but incurs higher computational cost. Although these approaches laid the foundation for early face recognition systems, their performance degrades markedly in real-world scenarios where nonlinear and spatially localized variations dominate.
A comparative analysis of these classical approaches shows that each method excels under specific conditions. PCA performs best on small to medium-sized datasets where dimensionality reduction is essential and illumination changes are limited; however, it remains highly sensitive to lighting and pose variations. LDA is advantageous when sufficient labeled samples are available for each class, as it explicitly maximizes inter-class separability, but it can underperform when class distributions are imbalanced or when training data are scarce. ICA is more robust to noise and local texture variation, making it preferable in scenarios with moderate occlusion or expression changes, though at the cost of greater computational complexity. Kernel-based extensions such as KLDA and KPCA improve nonlinear modeling capability and are effective in handling complex variability, but rely heavily on appropriate kernel selection and tend to require more computational resources. These distinctions provide practical guidance on choosing the appropriate classical method based on data scale, labeling conditions, and environmental variability.
5.1.1. Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is a widely used dimensionality reduction technique that facilitates feature extraction in face recognition systems [
60]. It was first introduced in the context of human face recognition by Turk and Pentland [
61], building upon earlier work by Kirby and Sirovich, who reconstructed human faces by isolating the most significant components within high-dimensional data [
62]. PCA effectively reduces the dimensionality of original datasets by retaining the principal features [
58,
63], which improves computational efficiency and recognition accuracy.
This technique forms the basis of the eigenfaces approach, which represents each face as a weighted combination of the principal eigenvectors (eigenfaces) of the training images, operating on the whole face rather than on isolated features such as the eyes, nose, mouth, or cheeks. A key requirement of eigenfaces is consistent lighting conditions. To address storage and processing constraints, PCA computes compact low-dimensional representations of face data [
61,
64,
65,
66].
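The eigenfaces computation described above can be illustrated with a minimal NumPy sketch. It assumes each face image has already been aligned and flattened into a row vector; the random array `faces` and the helper names `fit_eigenfaces`, `project`, and `reconstruct` are hypothetical and stand in for a real dataset and pipeline.

```python
import numpy as np

def fit_eigenfaces(images, n_components):
    """images: (n_samples, n_pixels) array of flattened, aligned face images."""
    mean_face = images.mean(axis=0)
    centered = images - mean_face
    # Rows of vt are the principal axes of the centered data (the eigenfaces).
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    eigenfaces = vt[:n_components]
    explained_variance = singular_values[:n_components] ** 2 / (len(images) - 1)
    return mean_face, eigenfaces, explained_variance

def project(images, mean_face, eigenfaces):
    """Low-dimensional weights of each image in eigenface space."""
    return (images - mean_face) @ eigenfaces.T

def reconstruct(weights, mean_face, eigenfaces):
    """Approximate images from their eigenface weights."""
    return weights @ eigenfaces + mean_face

# Hypothetical stand-in data: 20 "images" of 8x8 pixels.
rng = np.random.default_rng(0)
faces = rng.random((20, 64))
mean_face, eigenfaces, variances = fit_eigenfaces(faces, n_components=5)
weights = project(faces, mean_face, eigenfaces)  # shape (20, 5)
```

Recognition then reduces to comparing the low-dimensional `weights` of a probe image with those of the gallery, which is why the choice of distance measure matters for PCA-based systems.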
In [
61], three experiments were conducted to enhance PCA’s efficiency by reducing computational time without sacrificing accuracy. Results demonstrated that the second experiment matched the accuracy of the first yet required significantly less processing time (a 35% reduction compared to the standard PCA method), a saving that is especially valuable for large databases.
Moreover, integrating PCA with multiple distance classifiers has led to the development of more robust face recognition systems [
67]. For example, the use of the Olivetti Research Laboratory (ORL) database showed improved performance when PCA was combined with the Euclidean Distance (ED) classifier. Comparative analysis revealed that classifiers such as the Squared Euclidean Distance Classifier (SEDC) and the City Block Distance Classifier (CBDC) yielded better performance than the Squared Chebyshev Distance Classifier (SCDC). Notably, recognition rates using ED and SEDC were found to be nearly identical.
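The distance classifiers compared above can be sketched as nearest-neighbor matching in the PCA feature space. The snippet below is illustrative only: the toy `gallery`, `labels`, and `probe` are hypothetical, and the plain Chebyshev distance is shown because the exact definition of the squared variant used in the cited study is not given here.

```python
import numpy as np

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2, axis=-1))

def squared_euclidean(a, b):
    return np.sum((a - b) ** 2, axis=-1)

def city_block(a, b):
    return np.sum(np.abs(a - b), axis=-1)

def chebyshev(a, b):
    return np.max(np.abs(a - b), axis=-1)

def nearest_identity(probe, gallery, labels, distance):
    """Return the label of the gallery vector closest to the probe."""
    distances = distance(gallery, probe)
    return labels[int(np.argmin(distances))]

# Hypothetical projected features for three enrolled identities.
gallery = np.array([[0.0, 0.0], [3.0, 4.0], [10.0, 10.0]])
labels = ["a", "b", "c"]
probe = np.array([2.5, 3.5])

for metric in (euclidean, squared_euclidean, city_block, chebyshev):
    print(metric.__name__, nearest_identity(probe, gallery, labels, metric))
```

Because squaring is monotonic for non-negative values, the Euclidean and squared Euclidean distances always select the same nearest neighbor, which is consistent with the nearly identical recognition rates reported for ED and SEDC.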
Further advancements include the incorporation of illumination-invariant preprocessing techniques to enhance PCA performance. One such method, Gradient Faces, demonstrated improved recognition accuracy during the preprocessing stage, as reported in [
68]. Additionally, a system combining Discrete Cosine Transform (DCT) with PCA and a Backpropagation Neural Network (BPNN) was introduced by Barnouti (2016) [
69], achieving recognition rates exceeding 90% on the Face94 and Grimace databases.
An alternative approach, known as Normalized Principal Component Analysis (NPCA), was proposed by Fares Jalled (2017) [
70], which was evaluated using the ORL and Indian Face Database. The results underscore the potential of PCA-based methods, especially when paired with preprocessing and classification enhancements, in achieving high-accuracy face recognition.
5.1.2. Independent Component Analysis (ICA)
Independent Component Analysis (ICA) is a statistical method that models observed signals as linear combinations of statistically independent sources. Unlike PCA, which aims to find uncorrelated components, ICA seeks a representation that maximizes statistical independence, offering more robust handling of nonstationary data [
71]. ICA reduces both second-order and higher-order dependencies, making it well-suited for face recognition applications. It is grounded in the framework of Blind Source Separation (BSS), which attempts to decompose a mixed signal into its constituent independent sources [
72,
73].
Hybrid models integrating PCA and ICA have shown promise in enhancing recognition performance. Sharma and Dubey (2014) [
64] proposed a face recognition system that extracts invariant facial features using both PCA and ICA, followed by neural network training. This approach improves accuracy by leveraging both global and local feature representations. Further, Karande et al. [
74] emphasized the importance of maximizing feature independence and mutual information with target labels. Their study incorporated edge-based global features and modular ICA-based local features, presenting a novel direction for robust biometric systems.
5.1.3. Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis (LDA), referred to in face recognition as the Fisherface method, is a supervised learning method that aims to maximize class separability by locating linear combinations of features that best differentiate between classes [
75]. Unlike PCA, which focuses solely on capturing variance across the dataset, LDA prioritizes inter-class variance, making it more effective in distinguishing among individuals. Moreover, LDA demonstrates resilience to variations in lighting, pose, and facial expression.
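As a rough illustration of the class-separability criterion, the following sketch computes Fisher discriminant directions from within-class and between-class scatter matrices. It is a textbook formulation rather than the implementation of any cited work; the function name `fit_lda` and the synthetic two-class data are assumptions made for the example.

```python
import numpy as np

def fit_lda(features, labels, n_components):
    """Return projection directions maximizing between/within-class scatter."""
    classes = np.unique(labels)
    overall_mean = features.mean(axis=0)
    dim = features.shape[1]
    s_within = np.zeros((dim, dim))
    s_between = np.zeros((dim, dim))
    for c in classes:
        samples = features[labels == c]
        class_mean = samples.mean(axis=0)
        centered = samples - class_mean
        s_within += centered.T @ centered
        diff = (class_mean - overall_mean)[:, None]
        s_between += len(samples) * (diff @ diff.T)
    # Directions are the leading eigenvectors of pinv(S_w) @ S_b.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(s_within) @ s_between)
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs.real[:, order]

# Synthetic data: classes differ along axis 0, but axis 1 has far more variance.
rng = np.random.default_rng(0)
class_a = rng.normal([0.0, 0.0], [0.5, 5.0], size=(100, 2))
class_b = rng.normal([3.0, 0.0], [0.5, 5.0], size=(100, 2))
features = np.vstack([class_a, class_b])
labels = np.array([0] * 100 + [1] * 100)

w = fit_lda(features, labels, n_components=1)
projected = features @ w
```

In this toy setting the direction of greatest variance (which PCA would select) lies along axis 1, whereas the discriminant direction lies almost entirely along axis 0, where the classes actually differ.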
Ref. [
9] proposed an enhanced version of LDA using an empirical decomposition technique based on the Lower-Upper (LU) factorization of image matrices. Fisher Linear Discriminant Analysis (FLDA) is then applied to compute the projection space, followed by Euclidean distance for classification. This method improved accuracy across several standard datasets, including FERET, AR, ORL, and Yale B.
Additionally, Arabia Soula et al. (2016) [
76] introduced a classification approach combining Kernel Fisher Discriminant Analysis (KFDA) with Gabor features and ordinal measures. PCA was used for dimensionality reduction, and a multi-class KFDA classifier based on the radial basis function kernel was applied. Experiments on the ORL and Yale databases showed that this approach achieved an accuracy of 88.8%, significantly outperforming standard LDA, which achieved 33.3%.
5.1.4. Kernel Linear Discriminant Analysis (KLDA)
Kernel Linear Discriminant Analysis (KLDA) extends the traditional LDA by incorporating kernel methods to handle non-linear relationships within the data. It maps input vectors into a higher-dimensional feature space utilizing kernel functions that satisfy Mercer’s theorem [
77]. This enables more complex decision boundaries while maintaining computational efficiency.
Naveen Kumar H. N. et al. [
78] utilized KLDA in combination with Histogram of Oriented Gradients (HOG) and Support Vector Machines (SVM) for face recognition using the Cohn–Kanade dataset. Their findings indicated improved performance when structural and appearance-based features were used instead of purely geometric ones.
Farag G. Zbeda et al. (2016) [
79] compared Gabor-PCA and Gabor-KPCA methods, where feature vectors were first extracted using HOG and then reduced using PCA or KPCA. Experimental results using distance metrics such as Euclidean, Cosine, City Block, and Mahalanobis-Cosine (MAHCOS) revealed that Gabor-PCA outperformed Gabor-KPCA by 6.67%, 0.83%, 12.00%, and 4.17%, respectively, in recognition performance on the ORL database, particularly under low-resolution conditions.
The choice of kernel function in KLDA critically determines the type of nonlinear relationships the model can capture. Linear kernels preserve the original feature space structure and are effective when class boundaries remain approximately linear. Polynomial kernels introduce higher-order interactions, enabling KLDA to model moderate nonlinearities such as subtle texture variations or smooth pose changes. Radial Basis Function (RBF) kernels provide the highest flexibility by mapping samples into an infinite-dimensional space, making them particularly suitable for datasets with complex intra-class variability. However, RBF kernels require careful tuning of hyperparameters to avoid overfitting. Understanding these trade-offs is essential for applying KLDA effectively in unconstrained face recognition scenarios.
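The three kernel families discussed above can be written compactly as Gram-matrix functions. This is a minimal sketch under assumed hyperparameters (`degree`, `coef0`, and `gamma` are illustrative defaults, not values taken from the cited studies).

```python
import numpy as np

def linear_kernel(x, y):
    """Plain inner product: keeps the original feature space."""
    return x @ y.T

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Adds interactions between features up to the given degree."""
    return (x @ y.T + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian kernel; gamma controls how quickly similarity decays."""
    sq_dists = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

rng = np.random.default_rng(0)
samples = rng.random((6, 4))  # hypothetical feature vectors
gram_rbf = rbf_kernel(samples, samples)
```

A large `gamma` makes the RBF kernel respond only to very close neighbors, which is the overfitting risk noted above, while a small `gamma` makes it behave increasingly like a smooth, nearly linear similarity.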
5.1.5. Hidden Markov Model (HMM)
Hidden Markov Models (HMMs), traditionally used in speech recognition, have been successfully adapted for face recognition. In this context, the face is segmented into regions such as nose, mouth, and eyes, which align naturally with the sequential state-based modeling of HMMs [
64,
73,
74,
80].
Phaneemdra et al. (2015) [
81] introduced a face recognition system where facial images are partitioned into blocks, and Discrete Cosine Transform (DCT) is applied to each block. When combined with PCA, this reduces the feature dimensionality and accelerates the recognition process. Their method achieved a recognition accuracy of 95.21% using only half the images in the ORL database for training.
5.1.6. Kernel PCA (KPCA)
Kernel Principal Component Analysis (KPCA) is a nonlinear extension of PCA that projects the input data into a high-dimensional feature space where linear PCA is then performed. Unlike traditional PCA, KPCA does not require prior specification of significant components or complex optimization procedures [
82].
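A minimal sketch of this idea is given below, assuming an RBF kernel and projecting only the training samples. The function name `kernel_pca` and the value of `gamma` are illustrative assumptions; the feature-space mapping is never computed explicitly, since only the centered kernel matrix is decomposed.

```python
import numpy as np

def kernel_pca(features, n_components, gamma=1.0):
    """Project training samples onto the leading kernel principal components."""
    sq_norms = np.sum(features ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    kernel = np.exp(-gamma * np.maximum(sq_dists, 0.0))

    # Center the kernel matrix, i.e. center the data in feature space.
    n = len(features)
    one_n = np.full((n, n), 1.0 / n)
    centered = kernel - one_n @ kernel - kernel @ one_n + one_n @ kernel @ one_n

    # Linear PCA in feature space corresponds to an eigendecomposition here.
    eigvals, eigvecs = np.linalg.eigh(centered)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    projections = eigvecs * np.sqrt(np.maximum(eigvals, 0.0))
    return projections, eigvals

rng = np.random.default_rng(0)
data = rng.random((30, 5))  # hypothetical feature vectors
projections, eigvals = kernel_pca(data, n_components=2)
```

The kernel matrix grows quadratically with the number of training samples, which is the main source of the computational cost associated with KPCA on large datasets.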
Ref. [
83] applied KPCA using a polynomial kernel for facial expression recognition. They employed Euclidean distance and k-nearest neighbor (k-NN) classifiers, achieving results comparable to conventional PCA-based systems.
Vinay et al. (2015) [
84] conducted a comparative analysis of Gabor-PCA and Gabor-KPCA methods using the ORL database. Recognition performance was evaluated across various distance metrics. Gabor-PCA outperformed Gabor-KPCA by notable margins—6.67% using Euclidean distance, 0.83% using Cosine, 12.00% using City Block, and 4.17% using MAHCOS—demonstrating the superior effectiveness of linear Gabor-PCA in low-resolution scenarios.
Similarly, kernel selection plays a central role in KPCA performance. Polynomial kernels allow KPCA to extract higher-order intensity correlations that are not captured by linear PCA, which is beneficial when expression or illumination changes induce nonlinear variation. RBF kernels provide even more expressive power by enabling KPCA to reconstruct highly nonlinear manifolds characteristic of in-the-wild facial imagery; while RBF-based KPCA often yields superior recognition accuracy, its computational cost increases substantially with dataset size, and generalization remains sensitive to kernel bandwidth choice. Thus, kernel selection represents a critical balance between representational capacity, robustness, and computational feasibility in practical face recognition systems.
5.2. Model Based Approaches
Model-based techniques emerged as a response to the shortcomings of holistic statistical methods by shifting focus from 2D pixel-level variation toward explicit modeling of facial geometry. Instead of describing the face as a single appearance vector, these approaches incorporate structural constraints, often in 3D, to better handle pose, illumination, and expression changes that statistical projections struggle to normalize.
5.2.1. 3D Morphable Model (3DMM)
3D Morphable Models (3DMMs) utilize 3D facial geometry to improve recognition under varying pose and illumination conditions. These models are generally categorized into two types: 3D face reconstruction and 3D pose estimation [
85].
One of the strengths of 3DMM is its intrinsic ability to normalize illumination and pose by explicitly modeling facial shape and texture as continuous 3D surfaces. Because the model separates identity-dependent geometry from lighting parameters, it can re-render a face under canonical illumination and a frontal pose, significantly improving robustness in unconstrained settings. This makes 3DMM particularly effective in scenarios where 2D appearance-based methods fail due to self-shadowing, extreme viewpoints, or uneven lighting.
A notable advancement is the Albedo-Based 3D Morphable Model (AB3DMM), which incorporates a shine normalization technique during preprocessing to eliminate lighting variations. Experimental results on the Multi-PIE database demonstrated a recognition rate of 86.76%, particularly when paired with the SSR+LPQ method [
86]. Additionally, Hu et al. [
86] proposed projecting 3D facial landmarks in a grating format onto 2D images and aligning five key landmarks with a generic 3D face model to enhance geometric consistency.
Despite operating on 3D geometry, classical 3DMMs maintain relatively moderate computational cost because they rely on low-dimensional shape and texture bases derived through PCA-like decomposition. This compact representation allows efficient optimization during fitting and makes 3DMMs suitable for practical deployment in pre-deep-learning pipelines.
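The low-dimensional representation mentioned above is commonly written as a linear model over shape and texture bases. The notation below follows the standard morphable-model formulation and is an assumption of this illustration rather than notation taken from the cited works: the mean shape and texture are denoted by barred symbols, s_i and t_i are PCA basis vectors, and alpha_i and beta_i are the coefficients estimated during fitting.

```latex
S = \bar{S} + \sum_{i=1}^{m} \alpha_i \, s_i ,
\qquad
T = \bar{T} + \sum_{i=1}^{m} \beta_i \, t_i
```

Fitting a face therefore amounts to estimating a few hundred coefficients, together with pose and lighting parameters, instead of every vertex position and color individually.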
Modern deep learning-based 3D morphable networks build directly on the foundations of 3DMM. Approaches such as 3DDFA, Deep3DMM, and regressors built on the Basel Face Model extend classical morphable models by learning to predict 3D parameters from a single 2D image using convolutional or transformer backbones. These methods retain the interpretability and geometric priors of 3DMM while dramatically improving fitting accuracy and robustness under large pose variations. In this way, the classical 3DMM framework continues to serve as the backbone for contemporary 3D-aware face recognition systems.
5.2.2. Elastic Bunch Graph Matching (EBGM)
Elastic Bunch Graph Matching (EBGM) is a powerful face recognition technique that leverages Gabor jets to extract feature vectors from predefined facial landmarks. It is designed for accuracy rather than real-time performance [
72]. The algorithm operates in three stages: first, facial landmarks are detected by matching Gabor jets from new images to those in the training set; second, a “FaceGraph” is generated as a compact representation of the face; and third, similarity is computed between FaceGraphs by comparing their respective Gabor jet descriptors.
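The third stage, comparing Gabor jets, can be sketched with the magnitude-based similarity commonly used in the EBGM literature: a normalized dot product of the jet magnitudes. The sketch assumes jets are already extracted as complex coefficient vectors (40 coefficients is a typical choice, assumed here) and ignores phase information and the elastic graph deformation step; the function names are hypothetical.

```python
import numpy as np

def jet_similarity(jet_a, jet_b):
    """Normalized dot product of Gabor jet magnitudes (phase is ignored)."""
    mag_a, mag_b = np.abs(jet_a), np.abs(jet_b)
    denom = np.sqrt(np.sum(mag_a ** 2) * np.sum(mag_b ** 2))
    return float(np.sum(mag_a * mag_b) / denom) if denom > 0 else 0.0

def graph_similarity(graph_a, graph_b):
    """Average jet similarity over corresponding landmarks of two face graphs."""
    return float(np.mean([jet_similarity(a, b) for a, b in zip(graph_a, graph_b)]))

# Hypothetical face graphs: 5 landmarks, each with a 40-coefficient complex jet.
rng = np.random.default_rng(0)
graph_one = rng.standard_normal((5, 40)) + 1j * rng.standard_normal((5, 40))
graph_two = rng.standard_normal((5, 40)) + 1j * rng.standard_normal((5, 40))
score = graph_similarity(graph_one, graph_two)
```

Because the similarity is normalized, it is unaffected by a uniform scaling of the jet magnitudes, which gives some tolerance to global contrast changes.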
As shown in [
9], EBGM provides high recognition accuracy, with its key advantage being robustness to head rotation: it can recognize faces at up to 22° of rotation [
87]. This robustness makes EBGM a strong candidate for face recognition systems where pose variation is a concern.
EBGM remains one of the most influential early model-based techniques because of its ability to encode localized texture variations through Gabor jets while preserving the relational structure of facial landmarks. Although computationally heavier than holistic methods, its flexibility in handling pose and expression changes continues to inspire modern graph-based and landmark-driven recognition approaches.
5.3. Hybrid Techniques
Despite the advantages of geometric modeling, model-based approaches often require higher computational resources and specialized fitting procedures. To balance the descriptive power of statistical features with the discriminative strength of modern classifiers, hybrid methods emerged as an intermediate solution. These techniques integrate components from both holistic and model-based perspectives, creating pipelines that exploit complementary strengths while mitigating individual weaknesses.
In face recognition, methods such as PCA+ANN and ICA+SVM are categorized as hybrid approaches because they integrate complementary stages of the recognition pipeline rather than relying on a single feature extractor or classifier. These combinations typically operate at the feature level: PCA or ICA performs dimensionality reduction and extracts discriminative representations, which are then fed into a secondary classifier such as an Artificial Neural Network (ANN) or Support Vector Machine (SVM). This feature-level hybridization leverages the strengths of both components (statistical feature extraction for compact representation and machine learning classifiers for nonlinear decision boundaries), resulting in improved robustness compared to standalone statistical methods. Unlike decision-level fusion, where outputs of multiple classifiers are merged, these approaches form a unified pipeline in which the extracted features directly drive the final classification stage.
5.3.1. Integration of Principal Component Analysis with ANN
Hybrid face recognition techniques leverage complementary strengths of multiple feature extraction or classification methods. Most studies apply either feature-level fusion (e.g., concatenating PCA, Gabor, or CNN features) or score-level fusion to improve robustness against illumination, pose, and noise variations [
88]. Such combinations often lead to better discrimination, although they may increase computational cost and require careful parameter tuning. Several studies have explored the integration of Principal Component Analysis (PCA) with Artificial Neural Networks (ANNs), including Convolutional Neural Networks (CNNs) and Backpropagation Neural Networks (BPNNs), to enhance the performance of face recognition systems. PCA is utilized primarily for dimensionality reduction, isolating the most useful features from high-dimensional facial image data. These extracted features are subsequently used as input to ANNs for classification tasks. The combination of PCA and ANNs has been shown to improve computational efficiency and classification accuracy in face recognition applications. PCA and ANN systems are effective because PCA removes redundant information and stabilizes the ANN input. However, PCA captures only linear variance, which limits performance under nonlinear variations such as expression changes or partial occlusion [
89]. The effectiveness also depends on selecting an appropriate number of principal components.
5.3.2. Optimizing Face Recognition with PCA and KNN: A Machine Learning Approach
Chen et al. (2017) [
90] presented a machine learning approach combining PCA with the K-Nearest Neighbors (KNN) algorithm to optimize face recognition performance. In this study, PCA was applied to the Labeled Faces in the Wild (LFW) dataset to reduce feature space dimensionality, effectively retaining essential discriminative information. The KNN algorithm was then employed to classify the reduced feature vectors by identifying the nearest neighbors within the dataset. This combined approach achieved an overall recognition accuracy of 88%, highlighting its effectiveness in real-world face detection scenarios. The PCA–KNN combination works well in low-dimensional spaces because PCA reduces noise while KNN performs distance-based matching. However, KNN remains sensitive to intra-class variability, and its performance decreases on large datasets due to high computational cost during nearest-neighbor search [
91].
5.3.3. Face Recognition PCA Using Convolutional Neural Networks
Winarno et al. (2019) [
92] proposed a hybrid face recognition method that integrates PCA with Convolutional Neural Networks (CNNs). In this framework, PCA serves as a pre-processing step for feature extraction, reducing the complexity of facial image data before feeding it into CNNs for classification. This method leverages the interpretability of PCA and the high recognition capabilities of CNNs, resulting in enhanced classification performance in facial recognition tasks. Using PCA before a CNN reduces feature dimensionality and helps prevent overfitting when training data is limited. A limitation is that PCA applies a fixed linear projection, which may restrict the CNN from learning fully optimized end-to-end features [
93].
5.3.4. Combination of Independent Component Analysis with Gabor Filters or Support Vector Machines
Al-Dahhan et al. (2024) investigated a multi-stage facial recognition pipeline that incorporates Independent Component Analysis (ICA), Gabor filters, t-distributed Stochastic Neighbor Embedding (t-SNE), and multiclass Support Vector Machines (SVMs). This method begins with feature extraction using Gabor wavelet transforms, followed by dimensionality reduction through t-SNE, and classification using kernelized SVMs. Evaluated on the Yale, ORL, and JAFFE face databases, the model demonstrated high recognition accuracy and robustness in identifying individual faces [
94]. Similarly, Alphonse et al. (2017) explored the synergistic combination of Gabor filters and kernelized SVMs for facial recognition tasks. Their approach emphasized the superior classification performance of SVMs in complex feature spaces and reported improvements in both accuracy and robustness when applied to face recognition benchmarks [
95]. Hybrid pipelines combining ICA, Gabor features, and SVMs benefit from both texture representation and nonlinear classification. However, these multi-stage systems involve higher computational cost and are sensitive to parameter selection such as the number of ICA components or SVM kernel choice [
96,
97].
Hybrid methods aim to compensate for the limitations of individual feature extractors by combining complementary representations. Approaches such as PCA+ANN, PCA+KNN, or ICA+Gabor+SVM often achieve superior accuracy compared to single-method pipelines, particularly in datasets with moderate expression or illumination variation. Their primary advantage lies in leveraging both global and local descriptors, enabling richer discriminative features. However, these systems typically require more computational resources, careful parameter tuning, and larger training datasets to avoid overfitting. Moreover, multi-stage pipelines introduce latency and reduce scalability compared to end-to-end deep learning models; while hybrids provide a useful intermediate step between classical and modern methods, their practical adoption is limited in real-time or large-scale deployments.
6. Deep Learning Approaches
6.1. CNN-Based Methods
Convolutional Neural Networks (CNNs) have fundamentally transformed face recognition by enabling end-to-end learning of discriminative features directly from pixel intensities. Instead of relying on hand-crafted descriptors, CNNs learn hierarchical representations and map face images into a compact embedding space, where similarity between identities is measured using distance metrics. This embedding learning paradigm was popularized by DeepFace [
98] and later strengthened by models such as FaceNet [
99]. DeepFace primarily uses a cross-entropy loss for identity classification, whereas FaceNet optimizes the embedding space directly with the triplet loss, which enforces that embeddings of the same identity remain closer together than those of different identities. This formulation made metric learning a core component of modern face recognition systems. Subsequent architectures, including VGGFace [
49] and ArcFace [
100], significantly enhanced recognition accuracy by refining the embedding space. In particular, ArcFace introduced the additive angular margin loss, producing highly discriminative features with improved inter-class separability. Further research has also explored explainability; for example, the work in [
101] integrates a CNN model with Scaled Directed Divergence (SDD) to generate class activation maps that highlight influential facial regions. Overall, CNN-based methods represent the foundation for state-of-the-art face recognition systems due to their robustness to variations in pose, illumination, and expression. Beyond face recognition, CNNs are used in applications such as CCTV-based trash detection [
102] as well as for traffic accident prevention [
103]. CNNs have also been employed in video saliency prediction, where enhanced spatial feature extraction and temporal modeling are used to highlight visually important regions in surveillance footage, supporting more efficient downstream video understanding tasks [
104]. Building on these CNN-based foundations, recent research has also focused on developing lightweight and transformer-driven architectures to meet the demands of real-time and large-scale face recognition. The next subsection discusses these emerging models.
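The two loss formulations discussed in this subsection can be illustrated with a small NumPy sketch. It is a simplified forward computation only (no training loop or gradients); the default values `margin=0.2`, `scale=64.0`, and `margin=0.5` are commonly quoted settings and should be treated as assumptions, and the angular-margin function omits the special handling that practical implementations apply when the angle plus the margin exceeds pi.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Penalize triplets where the positive is not closer than the negative by a margin."""
    pos_dist = np.sum((anchor - positive) ** 2, axis=-1)
    neg_dist = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(pos_dist - neg_dist + margin, 0.0).mean()

def angular_margin_logits(embeddings, class_weights, labels, scale=64.0, margin=0.5):
    """Add an angular margin to the target-class angle before scaling the cosine."""
    cos = l2_normalize(embeddings) @ l2_normalize(class_weights).T
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    is_target = np.zeros_like(cos, dtype=bool)
    is_target[np.arange(len(labels)), labels] = True
    return scale * np.where(is_target, np.cos(theta + margin), cos)

# Toy example: one 2-D embedding and two class centers.
embedding = np.array([[1.0, 0.0]])
centers = np.array([[1.0, 0.0], [0.0, 1.0]])
logits = angular_margin_logits(embedding, centers, labels=np.array([0]))
```

The margin lowers the logit of the correct class during training, so the network must pull embeddings closer to their class center to compensate, which is the source of the improved inter-class separability.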
6.2. Lightweight & Transformer-Based
While CNNs dominate conventional face recognition pipelines, practical deployment requires models that are either computationally efficient or capable of capturing global relationships beyond local convolutional filters. This motivates the development of lightweight CNNs and transformer-based architectures. Mobile CNN architectures, such as MobileFaceNets [
105], FaceLiVT [
106], ShuffleFaceNet [
107] and EfficientFace [
108], have been introduced due to the increasing need for real-time face recognition (FR) on resource-limited devices like mobile phones or embedded systems. These models achieve high speed and low memory usage while maintaining a reasonable level of accuracy. At the same time, transformer models have started to emerge in the FR domain due to their ability to model long-range and global dependencies. Vision Transformers (ViT) [
109] and face-specific versions, including TransFace [
110], employ self-attention to go beyond the local representations captured by CNN filters. Compared to CNN-based models, transformer approaches such as ViT and TransFace often achieve higher recognition accuracy on large-scale benchmarks, but this improvement typically comes at the cost of significantly more computational resources and training data. CNNs remain more efficient in low-power environments, whereas transformer models demonstrate advantages mainly when large-scale pretraining is available and require further optimization for low-power deployments. Recent research therefore focuses on mitigating these limitations through architectural and system-level optimizations, including token pruning, low-rank attention approximations, hybrid CNN-Transformer designs, and post-training compression techniques such as pruning and quantization. These strategies aim to reduce memory footprint and inference latency while preserving the representational advantages of self-attention, making transformer-based FR more viable for edge and real-time applications. Overall, while lightweight CNNs and transformers extend the design space of face recognition systems, both paradigms still depend heavily on large-scale datasets and face challenges in data efficiency and robustness under data-scarce or high-variation conditions. This has motivated growing interest in generative models, particularly diffusion-based approaches, as complementary mechanisms to enhance representation learning through synthetic data generation and augmentation.
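To make the contrast with local convolutional filters concrete, the sketch below implements single-head scaled dot-product self-attention over a set of patch tokens. The token count, dimensions, and random projection matrices are hypothetical; real ViT-style models add multiple heads, positional embeddings, and learned weights.

```python
import numpy as np

def softmax(x, axis=-1):
    shifted = x - x.max(axis=axis, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_query, w_key, w_value):
    """Single-head scaled dot-product self-attention over patch tokens."""
    queries = tokens @ w_query
    keys = tokens @ w_key
    values = tokens @ w_value
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    weights = softmax(scores, axis=-1)  # each row: attention over all tokens
    return weights @ values, weights

rng = np.random.default_rng(0)
patch_tokens = rng.standard_normal((5, 8))  # 5 image patches, 8-D embeddings
w_q, w_k, w_v = (rng.standard_normal((8, 4)) for _ in range(3))
attended, attention = self_attention(patch_tokens, w_q, w_k, w_v)
```

Every token receives a nonzero weight from every other token, so distant facial regions can influence each other in a single layer. The attention matrix grows quadratically with the number of tokens, which is the cost that token pruning and low-rank approximations aim to reduce.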
6.3. Diffusion Based Model
Beyond purely discriminative models, generative approaches offer new capabilities for augmenting limited datasets and simulating realistic intra-class variations. Diffusion models, one of the most promising generative families for these purposes, are now being studied for face recognition tasks. These models learn the data distribution by reversing a diffusion process that incrementally adds noise to the input data. In face recognition, diffusion models are used for data augmentation and for creating high-fidelity synthetic faces for training [
111]. Furthermore, they show great potential for enhancing robustness by simulating diverse intra-class variations. Recent work demonstrates that diffusion models simulate intra-class variation by progressively denoising latent representations conditioned on identity, enabling controlled generation of pose, illumination, or expression changes while preserving identity features. In few-shot face recognition, diffusion priors are used to synthesize identity-consistent samples from very limited real images, thereby improving generalization in data-scarce scenarios. These conditional, identity-preserving generation mechanisms make diffusion models particularly useful for building robust FR pipelines. Although integrating diffusion-based generative priors into recognition systems for forensic tasks is still a developing branch of research, it shows considerable potential for high-resolution imaging and deep representation learning [
112]. However, diffusion models introduce their own challenges, most notably high computational cost and slow generation speed due to their iterative sampling process. These limitations hinder their direct applicability in real-time or large-scale training pipelines. Consequently, recent research has focused on accelerating diffusion inference through techniques such as fewer-step sampling, distillation, and latent diffusion formulations, aiming to balance generation efficiency with sample quality. It is worth noting that diffusion models are not intended to replace discriminative recognition architectures but rather to complement them by addressing data limitations and robustness challenges. Other generative frameworks, particularly GAN-based models, continue to play a significant role in controlled face synthesis and domain adaptation. The following subsection therefore reviews GAN-based contributions in the context of face recognition.
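The noise-adding process that diffusion models learn to reverse can be sketched in closed form. The example below follows the widely used denoising-diffusion formulation with a linear noise schedule; the schedule endpoints, the number of steps, and the 8x8 stand-in image are assumptions for illustration, and the learned reverse (denoising) network is not shown.

```python
import numpy as np

def linear_beta_schedule(n_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, increasing linearly over the schedule."""
    return np.linspace(beta_start, beta_end, n_steps)

def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t directly from x_0: sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * noise."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, noise

betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)  # fraction of original signal retained at each step

rng = np.random.default_rng(0)
face = rng.random((8, 8))  # hypothetical stand-in for a normalized face image
slightly_noisy, _ = forward_diffuse(face, t=10, alpha_bar=alpha_bar, rng=rng)
almost_pure_noise, _ = forward_diffuse(face, t=999, alpha_bar=alpha_bar, rng=rng)
```

Generation runs this process in reverse one step at a time with a trained denoiser, which explains the slow sampling noted above: a schedule of 1000 steps implies up to 1000 network evaluations per image unless accelerated samplers are used.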
6.4. GANs and Face Synthesis in FR
Complementing diffusion-based approaches, Generative Adversarial Networks (GANs) have historically been the dominant generative technique in face synthesis due to their ability to produce high-resolution, attribute-controllable face images. They have been extensively used to augment training data, improve recognition robustness, and enable face synthesis tasks. Models such as StarGAN [
113], StyleGAN [
114], and its improved versions have been instrumental in generating realistic face images with controllable attributes (e.g., pose, expression, age). In face recognition, GANs are used to perform pose-invariant recognition, domain adaptation, and data enhancement for imbalanced datasets [
115]. Disentangled representation learning with GANs further enables better identity-preserving synthesis, which is critical for training FR models in data-scarce scenarios. Recent studies have explicitly leveraged GAN-based data augmentation for few-shot face recognition, demonstrating improved identity discrimination when only a limited number of samples per subject are available [
116].
Nonetheless, deploying GAN-based synthesis carries significant risks, notably identity leakage and privacy violations. Because generative models can replicate faces, they may unintentionally reproduce faces and other personally identifiable information contained in the training set. Another major issue is bias amplification: when a GAN is trained on demographically imbalanced data, the synthetic samples it generates tend to reinforce that imbalance rather than mitigate it. These concerns are especially critical in recognition settings where different demographic groups receive different amounts of augmented data and consequently achieve different recognition accuracies. Finally, model inversion and attribute inference attacks target generative models directly and can allow malicious users to recover identifiable aspects of faces present in the training set.
Consequently, the advantages of GAN-based synthesis must be weighed against how well its risks can be mitigated within the face recognition system. Privacy, demographic bias, and equity of recognition performance are the key considerations when deploying synthesis in a face recognition system. Depending on these considerations, GAN-based synthesis may require safeguards such as differential privacy, fairness-preserving sample selection, dataset balancing, and robustness testing against adversarial attacks. To provide a clear comparison of the deep learning-based face recognition models discussed in
Section 6.1,
Section 6.2,
Section 6.3 and
Section 6.4,
Table 3 summarizes their accuracy, data requirements, and computational costs across standard datasets. This comparison highlights the trade-offs between CNN-based, lightweight, transformer-based, and diffusion models in terms of performance and resource demands.
7. Current Challenges
Face recognition systems continue to face significant challenges due to the growing complexity of deployment environments and increasing performance expectations. One major barrier is the heavy computational demand of modern architectures. Transformer-based models, while highly effective at capturing global dependencies, remain computationally intensive and memory-heavy, creating practical obstacles for real-time applications and edge deployment [
117].
A second persistent challenge is achieving consistent robustness in unconstrained environments. Variations in illumination, pose, occlusion, image modality, and sensor quality continue to degrade model performance, particularly for systems trained on limited or biased datasets. Even advanced architectures struggle when forced to generalize across extreme lighting conditions, severe pose deviations, or partial facial obstructions.
Ensuring efficiency on resource-limited hardware also remains a core difficulty. Although lightweight families such as MobileNet, ShuffleNet, and EfficientNet-lite have improved computational feasibility, maintaining high accuracy with compressed models is still a delicate trade-off [
118]. Techniques like pruning, quantization, and knowledge distillation reduce model size but often introduce accuracy degradation or require complex tuning.
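To make the size-versus-accuracy trade-off concrete, the following is a minimal sketch of one compression technique, symmetric per-tensor int8 quantization. It is an illustration under stated assumptions rather than the scheme used by any model surveyed here: the function names `quantize_int8` and `dequantize` and the example weights are hypothetical, and practical toolchains typically add per-channel scales and calibration data.

```python
def quantize_int8(weights):
    """Map float weights to integer codes in [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # The scale maps the largest magnitude onto 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.42, -1.27, 0.003, 0.9]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Small weights such as 0.003 collapse to zero here, which is one source of the accuracy degradation noted above.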
Privacy and data governance represent another active challenge. Edge devices frequently operate in sensitive contexts, yet conventional training methods require centralized data aggregation, posing risks of identity leakage and misuse. While federated or privacy-preserving learning frameworks have been explored, their stability, communication overhead, and vulnerability to poisoning attacks remain open concerns.
Furthermore, real-world applications require face recognition systems to remain adaptive over time. Appearance changes, aging, and environmental variation shift the data distribution, and naively updating a neural network on new data can cause catastrophic forgetting of what it learned earlier, highlighting the need for lifelong learning, incremental adaptation, and improved domain generalization techniques.
8. Emerging Directions and Ethical Considerations
Advancements in face recognition must increasingly integrate fairness, privacy, and accountability into their design. A key emerging direction is the development of algorithms capable of delivering equitable performance across diverse demographic groups. This requires constructing balanced datasets, enforcing bias-detection protocols, and designing models whose representations remain robust across age, gender, and ethnicity [
119].
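As a rough illustration of what a bias-detection protocol might measure, the sketch below computes recognition accuracy per demographic group and the gap between the best- and worst-served groups. The record format, the group labels, and the helper names `per_group_accuracy` and `accuracy_gap` are assumptions made for this example; real audits usually report error rates such as false match and false non-match rates at a fixed threshold.

```python
from collections import defaultdict


def per_group_accuracy(records):
    """records: iterable of (group_label, is_correct) pairs."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, correct in records:
        totals[group] += 1
        hits[group] += int(correct)
    return {group: hits[group] / totals[group] for group in totals}


def accuracy_gap(accuracies):
    """Difference between the best and worst per-group accuracy."""
    return max(accuracies.values()) - min(accuracies.values())


# Hypothetical evaluation outcomes for two groups "A" and "B".
records = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", True), ("B", False),
]
accuracies = per_group_accuracy(records)
gap = accuracy_gap(accuracies)
```

A gap near zero is necessary but not sufficient evidence of equitable performance, since small groups yield noisy estimates.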
Future approaches are also expected to employ more sophisticated mechanisms for handling occlusion and incomplete facial information. Techniques such as attention-based local patch recovery, identity-preserving GAN inpainting, and 3D morphable reconstruction networks offer promising pathways for resolving complex occlusion scenarios while maintaining identity fidelity.
Another avenue gaining traction is privacy-preserving machine learning. Federated learning, differential privacy, homomorphic encryption, and secure multiparty computation enable distributed training while minimizing exposure of sensitive facial data. Ensuring scalability, reliability, and regulatory compliance of these technologies will be critical for their adoption in real-world biometric systems.
Ongoing research continues to emphasize computational efficiency. Hardware-aware model compression, neural architecture search optimized for microcontrollers, and edge-enhanced inference strategies aim to deliver real-time performance without sacrificing accuracy [
120]. These innovations support sustainable deployment across smart devices, IoT nodes, and large-scale surveillance systems.
Multimodal integration is also emerging as a strong direction for next-generation biometric systems. Combining facial features with complementary modalities such as voice, gait, or thermal imaging can significantly enhance recognition reliability in challenging conditions and broaden operational contexts.
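One simple way to combine modalities is score-level fusion, sketched below as a weighted average of per-modality match scores. This is only one of several possible fusion levels and is an assumption of this example, as are the modality weights, the score range of 0 to 1, and the function name `fuse_scores`.

```python
def fuse_scores(scores, weights):
    """Weighted average of match scores over the modalities that are present.

    scores:  dict mapping modality name -> match score in [0, 1]
    weights: dict mapping modality name -> non-negative weight
    """
    total_weight = sum(weights[m] for m in scores)
    if total_weight == 0:
        raise ValueError("no usable modality weights")
    # Renormalizing by the present weights lets a missing modality drop out cleanly.
    return sum(scores[m] * weights[m] for m in scores) / total_weight


# Hypothetical weights and scores: the face score is weak (e.g. poor lighting),
# but voice and gait still support the match.
weights = {"face": 0.6, "voice": 0.25, "gait": 0.15}
fused = fuse_scores({"face": 0.40, "voice": 0.90, "gait": 0.80}, weights)
fused_without_gait = fuse_scores({"face": 0.40, "voice": 0.90}, weights)
```

In practice the weights would be tuned on validation data, and scores from different matchers would need to be normalized to a common range first.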
Finally, the evolution of face recognition technology must be accompanied by robust ethical frameworks and enforceable standards. Transparent model auditing, accountability mechanisms, informed consent practices, and legal guidelines will be essential for protecting civil liberties and ensuring responsible system deployment. Continued collaboration among researchers, policymakers, ethicists, and industry stakeholders will shape the trustworthy future of biometric technology.
8.1. Mitigating Algorithmic Bias and Privacy Concerns
In addition to recognizing the challenges posed by algorithmic bias and privacy risks, several advanced techniques have been proposed to address these issues from a technical perspective.
8.1.1. Fair Machine Learning Algorithms
Fair machine learning aims to reduce bias in model predictions by ensuring equitable performance across different demographic groups. Approaches such as adversarial debiasing, fairness constraints during model training, and reweighting techniques are commonly used to ensure that the model does not unfairly disadvantage certain groups. These methods are increasingly integrated into face recognition systems to promote fairness in identity verification tasks, especially in diverse populations.
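Of the approaches listed above, reweighting is the simplest to sketch. The example below assigns each training sample a weight inversely proportional to the size of its demographic group, so that every group contributes the same total weight to the loss. The group labels and the function name `group_balanced_weights` are hypothetical, and this is one reweighting scheme among many rather than the method of any particular cited work.

```python
from collections import Counter


def group_balanced_weights(groups):
    """Return one weight per sample so that each group has equal total weight.

    groups: list of group labels, one per training sample.
    """
    counts = Counter(groups)
    n_groups = len(counts)
    n_samples = len(groups)
    # Each group receives total weight n_samples / n_groups, split evenly among its members.
    return [n_samples / (n_groups * counts[g]) for g in groups]


# Hypothetical imbalanced dataset: three samples from group "A", one from group "B".
groups = ["A", "A", "A", "B"]
weights = group_balanced_weights(groups)

total_a = sum(w for w, g in zip(weights, groups) if g == "A")
total_b = sum(w for w, g in zip(weights, groups) if g == "B")
```

These weights would then multiply the per-sample loss during training; the overall weight sum stays equal to the number of samples, so the effective learning rate is roughly unchanged.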
8.1.2. Federated Learning
Federated learning is a decentralized approach that allows models to be trained across distributed devices while keeping the data local. Because raw data never leaves the device, this approach reduces the exposure of sensitive data during training, although the shared model updates can still leak information and remain vulnerable to poisoning. By aggregating model updates from individual devices without sharing raw data, federated learning helps maintain data privacy while still allowing for robust model training.
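The aggregation step can be sketched as a sample-weighted average of client parameters, in the style of federated averaging. This is a minimal illustration that assumes each client reports a flat list of parameters together with its local sample count; the function name `federated_average` and the example values are hypothetical, and secure aggregation, client sampling, and communication are omitted.

```python
def federated_average(client_updates):
    """Average client parameters, weighting each client by its sample count.

    client_updates: list of (parameters, n_samples) pairs, where parameters
    is a list of floats with the same length for every client.
    """
    total_samples = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [
        sum(params[i] * n for params, n in client_updates) / total_samples
        for i in range(dim)
    ]


# Hypothetical updates from two devices; only parameters and counts are shared,
# never the face images themselves.
updates = [
    ([1.0, 2.0], 10),
    ([3.0, 4.0], 30),
]
global_params = federated_average(updates)
```

The server would broadcast `global_params` back to the devices for the next round of local training.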
8.1.3. Differential Privacy
Differential privacy is another technique, which adds calibrated noise to data, gradients, or query outputs so that the presence or absence of any single individual's record has only a bounded effect on the result. By obscuring the contribution of any single data point, differential privacy provides a formal, quantifiable privacy guarantee, which is crucial in applications that process facial data. Applying differential privacy in face recognition systems allows models to be trained with a bounded privacy loss for the individuals whose data is used, typically at some cost in accuracy.
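In model training, the noise is commonly applied to gradients rather than to the images themselves. The sketch below shows the core of that idea: clip each per-example gradient to a maximum L2 norm, sum, and add Gaussian noise scaled to the clipping bound. The function names, the gradient values, and the parameters `max_norm` and `noise_multiplier` are assumptions for illustration, and the resulting privacy budget depends on a privacy accounting step that is not shown.

```python
import math
import random


def clip_l2(grad, max_norm):
    """Scale a gradient down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each gradient, sum, add Gaussian noise, then average."""
    clipped = [clip_l2(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    # Noise standard deviation is proportional to the clipping bound (the sensitivity).
    sigma = noise_multiplier * max_norm
    noisy_sum = [
        sum(g[i] for g in clipped) + rng.gauss(0.0, sigma)
        for i in range(dim)
    ]
    return [s / len(clipped) for s in noisy_sum]


# Hypothetical per-example gradients; the first has norm 5 and will be clipped.
rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]
noisy = private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Clipping bounds how much any one person's sample can move the model, and the noise masks what remains; larger `noise_multiplier` values give stronger privacy at the cost of noisier updates.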
9. Conclusions
Face recognition has undergone significant advancements, transitioning from traditional handcrafted feature extraction methods to sophisticated deep learning and hybrid approaches. This evolution has enabled the development of more accurate and robust systems capable of operating reliably in diverse real-world scenarios. The integration of classical techniques such as PCA and ICA with modern deep learning models has shown promise in improving recognition performance, particularly in challenging conditions that involve variations in pose, illumination, and occlusion. Despite these advancements, several challenges persist. Issues such as demographic biases, privacy concerns, and the need for large, diverse datasets remain critical areas requiring attention. Furthermore, the deployment of face recognition technologies in sensitive applications necessitates careful consideration of ethical implications and the establishment of appropriate regulatory frameworks.