1. Introduction
Facial reconstruction has a long history of combining scientific methods with artistic interpretation. At the end of the nineteenth century, pioneers like Hermann Welcker and Wilhelm His experimented with approximating facial appearance from skeletal remains, using clay or wax on bone casts guided by anatomical measurements and early soft tissue data [
1]. This work laid the foundation for forensic facial reconstruction [
2]. Over time, the field shifted from artistic representations to more systematic, anatomically based methods. Mikhail Gerasimov was a key figure in this transition, and his mid-twentieth-century work introduced reconstruction methods based on detailed anatomical studies [
3]. His technique involved reconstructing facial muscles and soft tissues directly on the skull while respecting established anatomical relationships and skeletal landmarks, providing a more anatomically structured and comparatively reproducible framework for facial reconstruction [
4]. Despite these advances, traditional manual reconstruction remained labor-intensive, highly dependent on professional expertise, and prone to substantial variation between reconstructions [
5]. The overall workflow of AI-assisted forensic facial reconstruction is illustrated in Figure 1.
Digital techniques such as computer-aided design (CAD), three-dimensional (3D) scanning, and photogrammetry made high-resolution digital capture of skull morphology and soft-tissue locations possible. With these tools, researchers can create reproducible, manipulable 3D images of facial structures, which reduces some of the subjectivity inherent in manual techniques and permits quantitative analysis of the reconstruction process [
6,
7]. Among the most important subsequent developments was the integration of craniometry-based statistical modelling approaches, including principal component analysis. These methods enabled more population-sensitive modelling of facial soft tissue thickness and craniofacial geometry [
7,
8]. At the same time, population-specific databases of facial soft tissue thickness (FSTT) were developed. Detailed measurements accounting for demographic variables such as age, sex, and body mass index are now available for several populations, including adult males from Nigeria, Turkey, Brazil, and Germany [
8,
9,
10,
11]. A major recent development has been the integration of artificial intelligence (AI) and machine learning to generate increasingly dense soft-tissue datasets. This supports probabilistic modelling approaches that predict facial contours even in cases involving incomplete skeletal remains [
12].
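As an illustration of the statistical modelling mentioned above, the following minimal sketch applies principal component analysis to a small table of soft-tissue thickness values. It is not the method of any cited study: the function names `fit_pca` and `synthesize`, the use of NumPy, and the thickness values are assumptions made purely for demonstration.

```python
import numpy as np

def fit_pca(X, n_components):
    """PCA via SVD on mean-centred data.

    X has shape (n_subjects, n_measurements), e.g. FSTT values in mm at a
    fixed set of landmarks. Returns the mean vector, the principal axes
    (one per row), and the fraction of variance explained by each axis.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    centred = X - mean
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    variance = singular_values ** 2 / (X.shape[0] - 1)
    explained = variance / variance.sum()
    return mean, vt[:n_components], explained[:n_components]

def synthesize(mean, components, weights):
    """Generate a measurement vector from component weights."""
    return mean + np.asarray(weights, dtype=float) @ components

# Hypothetical thickness values (mm) for four subjects at three landmarks;
# invented for illustration, not taken from any FSTT database.
X = [
    [4.0, 10.0, 6.0],
    [5.0, 12.0, 7.0],
    [6.0, 14.0, 8.0],
    [7.0, 16.0, 9.0],
]
mean, components, explained = fit_pca(X, n_components=1)
average_profile = synthesize(mean, components, [0.0])
```

Setting all weights to zero returns the sample mean, and non-zero weights move along the main axis of variation. In a population-sensitive model, a separate mean and set of axes would be fitted per demographic group, or the demographic variables would be included as covariates.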
Early applications of AI in forensic science focused primarily on pattern recognition and classification tasks. As deep learning techniques have matured, researchers have increasingly explored how neural networks can model the complex relationships between cranial morphology and facial features. Convolutional neural networks (CNNs), in particular, learn statistical associations between cranial morphology and facial traits from training data [
13]. Recent pipelines combine 3D skull models, FSTT data, and algorithmic surface optimization to generate visually plausible facial contours [
14]. Moreover, deep learning models can incorporate demographic metadata, allowing reconstructions to account for variation related to age, sex, or population background and to yield multiple phenotypic hypotheses under conditions of uncertainty. This capability may support forensic investigations of unidentified remains; however, such reconstructions are generally intended as investigative support tools rather than definitive identification methods [
15]. In addition, the continued expansion of international databases of facial soft-tissue thickness increases the applicability of AI-based reconstruction methods, especially in cases where skeletal preservation or demographic information is limited [
16].
Recent developments in AI have expanded the methodological capabilities of forensic facial reconstruction, particularly through the use of generative models such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Diffusion Models. These models allow researchers to produce visually detailed facial images and explore a wide range of possible reconstructions. GANs can generate detailed facial textures and simulate variations in soft tissue, while VAEs provide probabilistic reconstructions that reflect natural variability in tissue distribution. Diffusion Models, representing the newest generation of generative algorithms, may improve visual image fidelity in generative imaging tasks; however, improved visual realism does not necessarily correspond to improved anatomical accuracy or forensic validity, and their applicability in forensic craniofacial reconstruction remains under active investigation [
17,
18]. These generative techniques automate parts of the reconstruction process and make it possible to train models on large datasets. They also enable rapid reconstructions even when skeletal material is incomplete, which has important implications for forensic investigations. AI-assisted reconstructions may support investigations involving unidentified remains or unresolved cases and facilitate communication with law enforcement agencies and the public. By combining computational power, demographic information, and generative modeling techniques, AI-assisted approaches may improve the scalability and reproducibility of forensic facial reconstruction workflows while enabling the generation of probabilistic reconstruction hypotheses. Nevertheless, these systems remain dependent on dataset representativeness, methodological transparency, and further large-scale empirical validation before their forensic reliability can be fully established [
17,
19].
Importantly, forensic facial reconstruction should be distinguished from automated facial recognition or biometric identification systems. Reconstruction methods are primarily intended to generate investigative leads or visual approximations based on skeletal morphology, rather than to establish identity through automated matching procedures. Consequently, reconstructed facial images should be interpreted as probabilistic investigative aids rather than deterministic representations suitable for direct biometric identification.
At present, the principal challenges facing AI-assisted forensic facial reconstruction concern the limited interpretability of generative models, demographic imbalance in training datasets, uncertainty in craniofacial-to-soft tissue relationships, and the absence of standardized validation and governance frameworks. These limitations raise important methodological, ethical, and legal questions regarding the evidentiary interpretation and operational use of reconstructed facial images in forensic contexts.
Accordingly, this review critically evaluates not only recent technological advances but also the evidentiary, ethical, and governance-related limitations that currently constrain the forensic applicability of AI-assisted facial reconstruction systems.
2. Methodology of the Narrative Review
This study was conducted as a narrative review aimed at critically evaluating current developments, limitations, and future perspectives of artificial intelligence-assisted forensic facial reconstruction. Literature searches were performed in PubMed, Scopus, Web of Science, and Google Scholar between January and March 2026 using combinations of terms related to forensic facial reconstruction, craniofacial reconstruction, facial soft tissue thickness, artificial intelligence, machine learning, deep learning, generative models (including GANs, VAEs, and diffusion models), explainable AI, forensic imaging, and forensic biometrics. Additional relevant studies were identified through manual screening of reference lists from key publications.
The review included both historical and contemporary publications relevant to forensic facial reconstruction and AI-assisted craniofacial modelling, with particular emphasis on recent advances in generative artificial intelligence, explainable AI, and forensic imaging methodologies. Peer-reviewed forensic, anthropological, medical imaging, and computer vision studies were prioritized, while selected preprints and technical reports were additionally considered in areas where rapidly evolving AI methodologies remain underrepresented in the peer-reviewed forensic literature.
Given the rapidly evolving, interdisciplinary, and methodologically heterogeneous nature of AI-assisted forensic facial reconstruction research, a narrative review approach was considered more appropriate than a formal systematic review. This approach enabled broader critical discussion of methodological limitations, forensic applicability, interpretability challenges, demographic bias, ethical considerations, and governance issues associated with AI-assisted forensic facial reconstruction. Nevertheless, the authors acknowledge that narrative reviews remain inherently selective and may not comprehensively capture all emerging developments within this rapidly evolving field.
3. Deep Learning Models in Forensic Visualization
Deep learning has introduced new computational approaches to forensic facial reconstruction by modeling the relationships between skull structure and facial soft tissues. Early CNNs allowed researchers to identify statistical patterns within skeletal imaging data. Later models added demographic information, including age, sex, and population background, to improve population-sensitive modelling [
13,
15]. CNN-based approaches infer statistical associations between cranial landmarks and facial soft-tissue characteristics from training datasets. Including population-specific data may support more demographically informed reconstruction hypotheses, so that predictions can be generated automatically while accounting for natural variation [
13]. Generative adversarial networks (GANs) generate visually plausible facial images with detailed textures and varying soft-tissue patterns. They can produce multiple plausible reconstructions from the same skeleton, which is valuable when the material is incomplete or damaged [
20]. Variational autoencoders (VAEs) complement GANs by providing probabilistic reconstructions. They capture natural variability and may support anatomically coherent soft-tissue approximations, even with limited data [
21]. Diffusion models represent the newest class of generative algorithms. They produce high-resolution and visually detailed facial representations and allow exploration of many facial variations. These models may provide improved image fidelity in generative imaging tasks; however, direct comparative validation of their forensic reconstruction performance relative to conventional methods remains limited [
22]. All of these methods rely on facial soft-tissue and anthropometric measurements. With population-specific data, they can simulate faces of different sexes, ages, and ethnicities. This supports semi-automated generation of 2D and 3D reconstruction hypotheses and may assist in modelling demographic variation, soft-tissue distributions, and selected craniofacial characteristics. These approaches are currently applied primarily in research-oriented, experimental, and emerging forensic workflows, where they may support investigations involving unidentified remains or unresolved cases and facilitate visual communication for law enforcement and public appeals. Nevertheless, the forensic reliability, reproducibility, and cross-population generalizability of these systems remain insufficiently validated across large and demographically diverse populations [
14].
Beyond demographic imbalance, the forensic reliability of AI-assisted facial reconstruction is strongly influenced by dataset quality and annotation consistency. Existing craniofacial datasets frequently originate from heterogeneous imaging modalities, including CT, CBCT, MRI, photogrammetry, and surface 3D scans, each characterized by different spatial resolutions, soft-tissue contrast properties, and acquisition artifacts [
23,
24]. Such heterogeneity complicates standardization and may reduce cross-dataset generalizability of trained models. An additional challenge concerns skull–face registration accuracy, as precise spatial alignment between skeletal structures and corresponding facial surfaces remains technically difficult and highly sensitive to landmark selection, segmentation quality, and image preprocessing protocols [
23,
25].
Variability in facial soft tissue thickness acquisition methods further contributes to inconsistency, since FSTT measurements may differ depending on imaging modality, observer methodology, anatomical landmark definitions, and population sampling procedures [
9,
10,
12]. Importantly, the field also remains constrained by the scarcity of large-scale paired skull–face datasets containing reliably matched cranial and ante-mortem facial data [
24]. This limitation substantially restricts external validation, increases the risk of overfitting, and limits the forensic generalizability of current deep learning models. Collectively, these dataset-related factors represent a major source of uncertainty affecting reconstruction accuracy, reproducibility, and forensic applicability.
Despite rapid methodological development, most deep learning-based facial reconstruction systems remain evaluated primarily in experimental or proof-of-concept settings. Large-scale comparative studies assessing reproducibility, cross-population generalizability, and forensic validity relative to conventional reconstruction methods remain limited. Consequently, current AI-assisted approaches should be interpreted primarily as probabilistic modelling tools rather than fully validated identification technologies.
At present, there is insufficient large-scale cross-population empirical evidence to conclude that deep learning-based facial reconstruction methods systematically outperform conventional expert-driven reconstruction approaches in forensic practice.
Likewise, there is currently no broad forensic consensus on the operational evidentiary validation of fully AI-driven facial reconstruction systems for autonomous forensic identification purposes.
Importantly, current AI-assisted forensic facial reconstruction technologies should be distinguished according to their stage of methodological maturity and forensic applicability. Many published systems remain proof-of-concept research models developed under controlled experimental conditions without independent external validation. A smaller subset of approaches has undergone limited experimental validation using morphometric or recognition-based evaluation protocols. However, only very few reconstruction methodologies have been incorporated into operational forensic workflows, and even these continue to function primarily as investigative support tools rather than fully validated identification systems suitable for autonomous evidentiary use.
These architectures differ in terms of generative strategy, interpretability, computational demands, and current levels of forensic validation. A comparative overview of these deep learning architectures is provided in Table 1.
Importantly, current comparisons between deep learning architectures in forensic facial reconstruction remain methodologically limited due to the absence of standardized validation benchmarks, heterogeneous datasets, and the predominance of proof-of-concept studies rather than large-scale operational forensic evaluations.
4. Accuracy and Reproducibility Assessment
One of the most established quantitative approaches for evaluating the accuracy of facial reconstruction is to compare a 3D surface model of the reconstructed face with a reference facial model derived from imaging data, most commonly computed tomography (CT). In one such study, computer-based facial reconstructions (CCFR) were geometrically compared with corresponding CT-derived facial models using CloudCompare® v2.6.2 software (CloudCompare software project, Paris, France), which enables both numerical distance measurements between surfaces and visualization of deviations through color mapping. Within this framework, positive values indicated overestimation of the reconstructed surface relative to the actual anatomy, while negative values reflected underestimation. The results showed that approximately 63.2% to 73.67% of reconstructed surface points fell within ±2.5 mm of the reference model, with mean point-to-surface deviations ranging from around −1.66 mm to 0.33 mm across individuals. The largest discrepancies were consistently observed in anatomically complex regions of the face, particularly the periocular and midfacial areas. The eye region and cheeks were more often underestimated, whereas the chin and zygomatic regions were slightly overestimated. Importantly, the study did not restrict itself to geometric evaluation. An image similarity assessment was also performed with the Picasa® 3.9 recognition tool (Google LLC, Mountain View, CA, USA), in which reconstructed facial images were compared with ante-mortem photographs. The software detected all input images as faces and correctly matched three out of four CT models and two out of four reconstructions to the corresponding individuals. These findings suggest that relatively small geometric deviations do not necessarily translate into proportional changes in recognition performance. They also reinforce the need to combine morphometric evaluation with recognition-based testing when validating forensic facial reconstruction methods. Nevertheless, recognition performance metrics alone do not establish the forensic validity or evidentiary admissibility of reconstructed images, particularly in contexts involving investigative or judicial decision-making [26].
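The distance-based metrics described above can be sketched as follows. This is a minimal illustration that assumes the signed point-to-surface deviations have already been exported from a surface-comparison tool; the function name `summarize_deviations` and the example values are hypothetical and do not reproduce data from the cited study.

```python
from statistics import mean

def summarize_deviations(signed_mm, tolerance_mm=2.5):
    """Summarize signed point-to-surface deviations (in millimetres).

    Positive values mean the reconstruction lies outside the reference
    surface (overestimation); negative values mean underestimation.
    """
    if not signed_mm:
        raise ValueError("at least one deviation is required")
    n = len(signed_mm)
    return {
        "mean_mm": mean(signed_mm),
        "share_within": sum(1 for d in signed_mm if abs(d) <= tolerance_mm) / n,
        "share_over": sum(1 for d in signed_mm if d > 0) / n,
        "share_under": sum(1 for d in signed_mm if d < 0) / n,
    }

# Hypothetical deviations for five surface points (invented values).
example = [-3.0, -1.0, 0.5, 1.5, 2.0]
summary = summarize_deviations(example)
```

Reporting the mean together with the share of points inside the tolerance band matters because positive and negative deviations can cancel out: in this example the mean deviation is zero even though one point lies 3 mm below the reference surface.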
More broadly, assessing reconstruction accuracy remains a major methodological challenge in forensic craniofacial reconstruction. The process is inherently dependent on a series of assumptions about the relationship between cranial morphology and soft tissue structures, which vary considerably across populations and are not always predictable in a deterministic way. For this reason, accuracy is typically assessed using multiple complementary approaches, including 3D comparisons of cranial and facial models, quantitative distance-based analyses between reconstructed and reference faces, and visual comparisons with ante-mortem photographs. In some cases, recognition-based evaluations are also employed, whereby human observers or automated facial comparison systems assess visual similarity between reconstructed and reference images. Nevertheless, such approaches remain limited as proxies for forensic validity and should not be interpreted as definitive evidence of identification reliability.
Reproducibility represents a separate but closely related issue. Empirical studies show that different practitioners reconstructing the same skull may produce noticeably different facial outcomes. This variability stems largely from subjective interpretation of anatomical landmarks and the limitations of existing morphological standards. In response to this, AI-based approaches are increasingly proposed to improve standardization and reduce inter-operator variability by providing more consistent reconstruction outputs from identical skeletal inputs. Nevertheless, a fundamental limitation persists: bone-to-soft tissue relationships remain only partially understood, particularly with respect to population-specific variation. Features such as hair color, eye color, skin texture, and wrinkles are not directly encoded in skeletal structures, which further increases uncertainty. As a result, even with advanced imaging and statistical modelling, facial reconstruction inevitably retains a probabilistic character. While AI systems may enhance visual plausibility, increased visual plausibility does not necessarily correspond to increased anatomical or forensic accuracy. Generative models can occasionally introduce unrealistic or unsupported details—often described as “hallucinations”—which are not directly justified by underlying data. This is compounded by the limited interpretability of many deep learning systems, which makes it difficult to trace the origin of specific errors or quantify uncertainty in a transparent way [
27].
From a forensic governance perspective, the probabilistic nature of reconstruction outcomes highlights the importance of transparent uncertainty communication. Emerging discussions increasingly emphasize that reconstructed images should be accompanied by methodological documentation, demographic assumptions, and confidence limitations to avoid overinterpretation in investigative or judicial contexts.
Artificial intelligence-assisted forensic facial reconstruction increasingly requires formal uncertainty quantification frameworks capable of representing the probabilistic nature of skull-to-face inference. Probabilistic generative approaches, including Bayesian modelling strategies, latent-space probability distributions, and ensemble-based reconstruction pipelines, may enable estimation of confidence ranges associated with specific reconstructed facial regions or anatomical features [
28,
29,
30,
31]. Rather than generating a single deterministic facial approximation, such frameworks may support the generation of multiple probabilistic reconstruction variants reflecting the inherent uncertainty of craniofacial prediction [
29,
30].
In this context, uncertainty may be represented quantitatively through confidence intervals, probabilistic morphometric distributions, or spatial uncertainty mapping techniques highlighting regions of greater or lower reconstruction reliability [
29,
31]. Visualization approaches such as uncertainty heatmaps or probabilistic facial overlays may further assist forensic experts, investigators, and courts in distinguishing between relatively stable anatomical approximations and highly uncertain reconstructed features [
31]. However, standardized forensic protocols for uncertainty estimation, visualization, and probabilistic reporting remain largely undeveloped within current forensic facial reconstruction practice [
32].
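One simple way to obtain the kind of spatial uncertainty map discussed above is to measure how much an ensemble of reconstructions disagrees at each vertex. The sketch below assumes that every ensemble member reports soft-tissue depth at the same ordered set of vertices; the function names, the threshold, and the depth values are illustrative assumptions rather than an established forensic protocol.

```python
from statistics import mean, stdev

def per_vertex_uncertainty(ensemble):
    """Return (mean, sample standard deviation) of depth for each vertex.

    `ensemble` is a list of reconstructions; each reconstruction is a list
    of depths in millimetres, one per vertex, in the same vertex order.
    """
    if len(ensemble) < 2:
        raise ValueError("need at least two reconstructions")
    n_vertices = len(ensemble[0])
    if any(len(r) != n_vertices for r in ensemble):
        raise ValueError("all reconstructions must share the same vertices")
    return [(mean(col), stdev(col)) for col in zip(*ensemble)]

def flag_uncertain(stats, threshold_mm):
    """Indices of vertices whose spread exceeds the threshold."""
    return [i for i, (_, sd) in enumerate(stats) if sd > threshold_mm]

# Three hypothetical reconstructions over three vertices (invented values).
ensemble = [
    [5.0, 10.0, 3.0],
    [5.0, 12.0, 3.5],
    [5.0, 14.0, 4.0],
]
stats = per_vertex_uncertainty(ensemble)
uncertain = flag_uncertain(stats, threshold_mm=1.0)
```

Mapping the per-vertex standard deviation to a color scale would give an uncertainty heatmap. Note that ensemble spread reflects disagreement between models only; it does not capture errors shared by all ensemble members, such as bias inherited from a common training dataset.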
Looking ahead, developments in AI may contribute to improved standardisation, for instance through semi-automated landmark detection or large-scale comparative modelling. However, the extent to which these advances will meaningfully enhance reproducibility has yet to be validated systematically. A key limiting factor remains the availability and completeness of soft tissue datasets. Facial soft tissue thickness (FSTT) at anatomical landmarks varies substantially between individuals and populations, influenced by age, sex, body habitus, and ancestry. Since these variables directly affect reconstruction algorithms, incomplete or non-representative datasets can introduce systematic inaccuracies [
9,
33,
34].
An additional layer of complexity concerns the prediction of externally visible traits not determined by skeletal morphology. DNA phenotyping systems in forensic science, such as HIrisPlex-S, have been developed to infer characteristics like eye, hair, and skin color from genetic data. However, their performance varies considerably across traits. Reported AUC values range from 0.74 to 0.99 for eye color, 0.64 to 0.94 for hair color, and 0.72 to 0.99 for skin color, depending on the model used. In general, eye color prediction tends to achieve higher accuracy, whereas intermediate pigmentation categories remain more difficult to classify reliably. Reviews of forensic DNA phenotyping further indicate that hair color prediction often yields lower positive predictive values than eye color prediction, reflecting the complex polygenic architecture of pigmentation traits [
35]. These findings further illustrate that the prediction of externally visible traits remains probabilistic and trait-dependent, reinforcing the need for cautious forensic interpretation.
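For readers less familiar with the AUC values quoted above, the metric can be read as the probability that a randomly chosen individual who has a trait receives a higher predicted score than a randomly chosen individual who does not. The following sketch computes it directly from that definition for a single trait category treated as one-versus-rest; the function name `auc` and the example scores are hypothetical and are not output from HIrisPlex-S.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Counts the share of positive/negative pairs in which the positive case
    has the higher score; tied pairs count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted probabilities for one pigmentation category (1)
# versus all other categories (0); values invented for illustration.
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
value = auc(scores, labels)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. Because AUC measures ranking quality only, a high value does not by itself guarantee a high positive predictive value for a given category.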
At present, few studies have evaluated whether reconstruction performance remains stable across independent datasets, reconstruction teams, imaging modalities, and algorithmic architectures, limiting formal assessment of reproducibility and external forensic validity.
5. Sources of Bias: Population Ancestry, Sex, and Training Data Imbalance
Population ancestry is one of the most significant sources of variation in facial soft tissue thickness (FSTT). Comparative studies, for example, between southeastern and central-western Brazilian samples, have demonstrated measurable regional differences. In these analyses, males showed greater variation across several midline and bilateral landmarks, whereas females showed fewer statistically significant differences. The authors suggest that population-specific FSTT tables can improve reconstruction outcomes, while also emphasizing that facial morphology is not determined solely by soft-tissue thickness. Features such as the eyes, lips, nose, and overall facial proportions also contribute significantly to inter-population differences [
11].
Similar findings have been reported in studies comparing Nigerian, South African, and African American male samples, in which differences were most evident in the lower facial region, particularly around the perioral and chin areas. Increased bilateral variability was also observed, further supporting the need for population-specific reference data in forensic reconstruction contexts [
35]. Likewise, research on the adult Turkish population has shown that average FSTT values tend to fall between those reported for Korean and European white populations, reinforcing the broader conclusion that soft tissue thickness cannot be reliably generalized across ethnic groups [
10].
Practical forensic observations align with these findings. It has been noted that applying reference values derived from African American populations to reconstruct the faces of Black children from South Africa leads to suboptimal outcomes. Such cases illustrate how the use of non-local or overly general datasets may propagate systematic bias within both conventional and AI-assisted reconstruction workflows, particularly when demographic representation in training data is limited, thereby reducing reconstruction reliability when population-specific standards are unavailable [
36].
Sex-based variation represents another important factor influencing FSTT. In Brazilian samples, males generally present thicker soft tissue across multiple anatomical landmarks, while females tend to show lower overall variability [
35]. Studies in Korean adult populations similarly indicate that sex differences are particularly evident in the upper and lower lip regions, with males exhibiting greater thickness across all skeletal classifications. These patterns support the use of sex- and skeletal class-specific reference values in forensic and clinical applications, including forensic art, anthropology, dentistry, and oral and maxillofacial surgery, although substantial inter-individual variability remains present within demographic groups [
34]. Although average differences between sexes are often modest—typically below 2.5 mm at individual landmarks—they are consistently observed across datasets, suggesting a stable pattern of sexual dimorphism in facial soft tissue distribution [
35].
Age-related changes further complicate reconstruction. Thickness at several anatomical landmarks, including the mid-philtrum, prosthion, and ectomolare2, tends to decrease with age, particularly in females, reflecting progressive soft tissue atrophy associated with facial aging processes [
30]. Despite the existence of multiple adult FSTT databases, reference material for pediatric and adolescent populations remains limited. Only a small number of studies have established dedicated datasets for younger individuals, resulting in a significant gap in forensic reference standards. Gibelli et al. explicitly highlight this limitation and emphasize the need to expand pediatric FSTT databases to support more population-appropriate reconstruction modelling in minors [
37].
Taken together, these age-, sex-, and population-related differences underscore the importance of a thorough anthropological assessment of skeletal remains prior to reconstruction, including determination of sex, age, and ancestry [
9,
10,
34]. More broadly, they demonstrate that demographic imbalance in available datasets remains a critical limitation in forensic facial reconstruction. Addressing this issue requires the systematic development of diverse, population-specific FSTT databases and the expansion of existing reference collections. Such efforts are essential not only for traditional reconstruction methods but also for AI-based approaches, which are highly dependent on representative training data to achieve robust and demographically generalizable reconstruction modelling.
From a practical standpoint, these sources of bias have direct implications for forensic work. Although facial reconstruction is primarily used as an investigative aid rather than a definitive identification tool, mismatches between demographic characteristics and reference datasets may reduce reconstruction reliability and increase the risk of misleading visual approximations [
38]. Methodological analyses consistently show that reconstruction reliability is strongly dependent on the appropriate selection of population-specific standards, while the use of generalized datasets increases the risk of systematic error [
38].
From a methodological perspective, mitigating demographic bias in forensic facial reconstruction may require standardized reporting of dataset composition, including population ancestry, age distribution, sex balance, and imaging acquisition methods. Increasingly, forensic AI discussions also emphasize the importance of documenting uncertainty ranges, demographic assumptions, and model training limitations to improve transparency and reduce the risk of overinterpretation.
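A dataset composition report of the kind described above could be generated along the following lines. This is a minimal sketch under stated assumptions: the metadata field names (`ancestry`, `sex`, `age_band`, `modality`), the category labels, and the function name `composition_report` are all hypothetical, since no standard reporting schema currently exists.

```python
from collections import Counter

def composition_report(records, fields=("ancestry", "sex", "age_band", "modality")):
    """Return, for each field, the share of records in each category.

    Missing values are counted explicitly as "unreported" so that gaps in
    the metadata remain visible in the report.
    """
    if not records:
        raise ValueError("no records supplied")
    report = {}
    for field in fields:
        counts = Counter(r.get(field, "unreported") for r in records)
        report[field] = {k: v / len(records) for k, v in counts.items()}
    return report

# Hypothetical metadata for four training scans (invented for illustration).
records = [
    {"ancestry": "A", "sex": "M", "age_band": "18-39", "modality": "CT"},
    {"ancestry": "A", "sex": "M", "age_band": "40-59", "modality": "CT"},
    {"ancestry": "A", "sex": "F", "age_band": "18-39", "modality": "CBCT"},
    {"ancestry": "B", "sex": "M", "age_band": "18-39"},
]
report = composition_report(records)
```

Publishing such a table alongside a trained model would let readers see at a glance which demographic groups and imaging modalities are underrepresented in the training data.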
As summarised in Table 2, sources of error in forensic facial reconstruction are therefore not purely technical in nature. Rather, they emerge from the interaction between biological variability, limitations in available datasets, and uncertainty introduced by probabilistic modelling approaches themselves.
6. Legal and Ethical Perspectives on Digital Reconstructions
Legal and ethical assessment of AI-assisted forensic facial reconstruction requires a clear distinction between probabilistic reconstruction systems and automated facial recognition technologies used for biometric identification. Although both forensic reconstruction and biometric recognition attempt to associate evidence with individuals, they differ substantially in operational goals, interpretative frameworks, and evidentiary use [
39]. Forensic facial reconstruction is generally intended to generate investigative leads or support public appeals based on skeletal remains or limited visual evidence, whereas facial recognition systems are designed to establish identity through automated comparison against known image databases. Because these technologies differ substantially in purpose, evidentiary function, and operational use, their legal and ethical implications also differ significantly. Accordingly, reconstructed facial images used in public appeals should not be legally equated with automated biometric identification systems operating on one-to-many database matching.
Within the European Union and the United Kingdom, the legal regulation of biometric technologies is shaped primarily by the General Data Protection Regulation (GDPR) and, in the United Kingdom, its retained post-Brexit counterpart (the UK GDPR). Article 9 of the GDPR restricts the processing of biometric data for the purpose of uniquely identifying individuals unless explicit consent or specific legal exemptions apply. However, reconstructed facial images occupy a more ambiguous legal position than conventional biometric templates, as they function primarily as probabilistic investigative approximations rather than direct automated identification tools. Existing forensic interpretation frameworks increasingly emphasize probabilistic reasoning and uncertainty assessment rather than binary identification conclusions [
39]. In practice, legal exemptions more commonly apply to facial recognition and biometric surveillance systems used in criminal investigations, whereas the regulatory status of AI-assisted forensic facial reconstructions used for investigative or public appeal purposes remains less clearly defined across jurisdictions [
40].
In April 2021, the European Commission introduced its broader strategy for regulating artificial intelligence through the proposed AI Act. Under this framework, AI systems used in law enforcement contexts are generally classified as “high-risk,” requiring safeguards related to accuracy, reliability, transparency, accountability, and human oversight [
40]. From a forensic perspective, these concerns are particularly important because reconstructed facial images represent probabilistic outputs generated through statistical and modelling assumptions rather than direct representations of identity. Consequently, transparency regarding dataset composition, demographic assumptions, model limitations, uncertainty ranges, and expert oversight may become important components of methodological and governance evaluation in investigative or judicial contexts. Existing forensic AI frameworks additionally emphasize explainability, accountability, transparency, and reliability as prerequisites for operational deployment in forensic environments [
25,
39]. Critics have additionally argued that existing EU policy proposals remain insufficiently specific regarding biometric technologies and do not yet provide standardized operational frameworks governing the forensic use of AI-assisted facial reconstruction systems [
41].
An additional challenge concerns the evidentiary admissibility of reconstructed facial images in judicial proceedings. Given the probabilistic and interpretative nature of forensic facial reconstruction, reconstructed images are more commonly discussed in the literature as investigative aids than as standalone identification evidence. Discussions concerning admissibility frequently emphasize factors such as methodological transparency, reproducibility, known error rates, expert oversight, and clear communication of uncertainty and methodological limitations [
39,
42]. In jurisdictions such as the United States, admissibility of scientific and forensic evidence is frequently evaluated under standards derived from the Frye or Daubert doctrines. Frye asks whether a method is generally accepted in the relevant scientific community, whereas Daubert directs courts to consider factors such as testability, peer review, known or potential error rates, and general acceptance [
43,
44].
Within such frameworks, visually persuasive AI-generated reconstructions may present additional evidentiary concerns if their probabilistic nature, modelling assumptions, and methodological limitations are insufficiently communicated to courts or investigators. Importantly, visual similarity or recognition-based performance metrics alone should not be interpreted as direct indicators of identification reliability or courtroom evidentiary validity. Several reviewed forensic reconstruction systems demonstrated improved similarity-based performance metrics while simultaneously lacking sufficient forensic evaluation or explainability for judicial suitability [
39]. Consequently, the evidentiary threshold required for courtroom admissibility of AI-assisted facial reconstructions should be considered substantially higher than the threshold applicable to their use as investigative aids or public appeal instruments.
By contrast, the United States has adopted a more fragmented regulatory approach. Facial recognition and biometric surveillance technologies are widely used by law enforcement agencies across multiple operational contexts [
40,
45]. Privacy regulation largely operates at the state level, with California often cited as one of the jurisdictions providing stronger consumer data protection mechanisms. The California Privacy Rights Act expanded consumer rights regarding personal data collection and established the California Privacy Protection Agency. Nevertheless, these reforms did not substantially restrict law enforcement access to facial recognition technologies. At the same time, some municipalities, including Berkeley and San Francisco, have introduced local restrictions or bans on the use of facial recognition systems by public authorities [
40]. However, legal approaches specifically addressing AI-assisted forensic facial reconstruction remain comparatively underdeveloped.
Data protection and ethical governance remain central concerns in the implementation of AI-assisted forensic facial reconstruction. Since reconstructed images may be disseminated during investigations or public appeals, safeguards are necessary to limit unauthorized access, prevent misuse, and reduce the risk of misleading investigative conclusions. The increasing realism of generative reconstruction systems may also increase the likelihood that probabilistic facial approximations are interpreted by investigators, courts, or the public as definitive representations of identity. Photorealistic reconstructions may additionally introduce cognitive biases such as confirmation bias, anchoring effects, or contextual bias during investigative and judicial interpretation. Recent research on AI-generated image forensics notes that increasingly convincing synthetic imagery may undermine trust and complicate human interpretation of authenticity [
29]. This creates additional ethical concerns regarding reputational harm, stigmatization of suspects, and the dignity of unidentified deceased individuals and their families [
46].
Further ethical challenges emerge from controversial or poorly regulated applications of generative reconstruction technologies, including deepfake-like reconstructions, commercial use of facial data, and speculative historical reconstructions. While high-quality datasets and expert supervision may reduce some forms of error, current reconstruction methodologies remain subject to substantial uncertainty regarding predicted facial appearance. Several reconstruction approaches may unintentionally bias outputs toward the statistical properties of underlying reference models, potentially limiting forensic suitability [
39]. Emerging governance discussions therefore increasingly emphasize the need for standardized documentation protocols, reconstruction audit trails, uncertainty disclosure requirements, demographic transparency, and human expert review prior to operational or public deployment of AI-assisted forensic facial reconstructions. Responsible AI frameworks for forensic science additionally advocate governance structures emphasizing accountability, explainability, transparency, fairness, and documented oversight procedures [
40].
7. Toward Transparent and Interpretable AI Models
Interpretability and procedural transparency are critical requirements in AI-assisted forensic facial reconstruction (FFR), particularly because reconstruction outcomes remain probabilistic, inferential, and potentially influential in investigative and judicial contexts. Transparent analytical workflows enable forensic practitioners to critically evaluate how input data are processed, how reconstruction outputs are generated, and how uncertainty propagates throughout the reconstruction pipeline. These considerations are especially important given the increasing use of “black box” AI systems in forensic domains such as DNA mixture interpretation, facial recognition, and recidivism risk assessment tools, where algorithmic opacity has raised substantial concerns regarding transparency, reproducibility, evidentiary reliability, and independent scrutiny of computationally derived conclusions [
46]. In the context of forensic facial reconstruction, these concerns are amplified by the highly persuasive visual realism of contemporary generative AI systems, which may unintentionally encourage overinterpretation of probabilistic reconstructions despite unresolved methodological and biological limitations.
To mitigate these risks, principles derived from Explainable AI (XAI) should be integrated into the design of AI-assisted forensic reconstruction systems. XAI encompasses methods intended to improve human understanding of machine learning outputs and analytical decision pathways [
47]. In forensic reconstruction contexts, explainability mechanisms may assist experts in identifying modelling inconsistencies, evaluating uncertainty sources, and examining relationships between cranial morphology and reconstructed facial approximations. Potentially useful approaches include saliency or attention-map visualizations, uncertainty heatmaps, and model-traceability frameworks linking reconstruction outputs to specific anatomical or statistical parameters [
48]. However, the forensic utility of such approaches remains limited. Attention maps and related post hoc visualizations do not necessarily reflect the true internal computational logic of deep neural networks and may therefore provide only simplified or partially misleading representations of model behaviour [
49].
Importantly, many currently available XAI techniques provide only indirect or post hoc approximations of model behaviour rather than true mechanistic explanations of how generative systems produce specific reconstruction features [
48,
50,
51]. In complex architectures such as GANs and diffusion models, reconstructed facial characteristics emerge from high-dimensional latent-space interactions that often cannot be directly traced to anatomically interpretable decision pathways [
51,
52]. Consequently, attention maps, saliency visualizations, and feature-attribution techniques may create an appearance of interpretability without necessarily providing genuine forensic traceability or causal explanation of reconstruction outputs [
48,
50].
Moreover, latent-space generative processes remain particularly difficult to audit in forensic contexts because visually plausible outputs may result from statistical correlations learned during training rather than biologically meaningful craniofacial relationships [
48,
52]. This creates a significant risk of “false interpretability,” whereby AI-generated reconstructions appear scientifically transparent despite limited ability to independently verify, reproduce, or causally explain specific generated facial features [
50]. From a forensic perspective, such limitations raise important concerns regarding evidentiary reliability, expert scrutiny, and courtroom defensibility of AI-assisted reconstruction systems [
25,
48].
Beyond interpretability alone, future XAI-oriented forensic reconstruction systems may additionally incorporate uncertainty-aware modelling capable of explicitly communicating confidence estimates associated with reconstructed anatomical regions [
30,
32]. Such approaches could improve transparency by enabling experts and legal decision-makers to distinguish between relatively well-supported reconstruction features and regions characterized by high inferential uncertainty [
32]. Nevertheless, uncertainty visualization should not be interpreted as equivalent to forensic validation, as visually interpretable confidence representations may still fail to reflect the true evidentiary reliability of reconstruction outputs [
25,
53].
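As one illustration of what region-level confidence reporting could look like, the following minimal Python sketch assumes that a generative model can be sampled several times for the same skull and that each sample yields soft-tissue depth estimates (in mm) at named landmarks. The landmark names, the grouping into regions, and the 2 mm threshold are hypothetical choices made for illustration; they are not taken from the reviewed studies, and the resulting labels describe model spread only, not forensic validity.

```python
from statistics import mean, stdev

# Hypothetical grouping of landmarks into anatomical regions (illustrative only).
REGIONS = {
    "nasal": ["nasion", "rhinion"],
    "oral": ["upper_lip", "lower_lip"],
}


def region_uncertainty(samples, regions=REGIONS, threshold_mm=2.0):
    """Summarise per-region spread across repeated reconstructions.

    samples: list of dicts mapping landmark name -> predicted depth in mm,
             one dict per sampled reconstruction (at least two samples).
    Returns a dict: region -> (mean spread in mm, "high" or "low" label).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate spread")
    report = {}
    for region, landmarks in regions.items():
        # Sample standard deviation of each landmark across reconstructions.
        spreads = [stdev([s[lm] for s in samples]) for lm in landmarks]
        avg = mean(spreads)
        # The threshold is an assumed cut-off, not a validated standard.
        report[region] = (avg, "high" if avg > threshold_mm else "low")
    return report
```

A report of this kind would flag, for example, the oral region as "high" spread when repeated samples disagree by several millimetres there, while leaving the question of whether the well-agreeing regions are actually correct to separate validation.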
Importantly, explainability alone cannot resolve the broader methodological limitations associated with AI-assisted forensic reconstruction. Reconstruction outputs remain strongly dependent on the quality and representativeness of training datasets, demographic variability, skull–face correspondence assumptions, imaging heterogeneity, and subjective interpretative decisions made throughout the analytical workflow. Moreover, the increasing photorealism achievable by diffusion and GAN-based systems may mask anatomical inaccuracies or synthetic artefacts, thereby increasing the persuasive impact of reconstructed images despite unresolved uncertainty [
54,
55]. For this reason, visual realism should not be conflated with forensic accuracy or identification reliability.
Accordingly, future forensic AI governance frameworks should prioritize auditability, reproducibility, and explicit uncertainty disclosure. Reconstruction workflows should incorporate standardized documentation and reporting procedures allowing independent forensic review and evidentiary scrutiny. Such protocols should include dataset provenance records, demographic assumptions, model versioning, reconstruction parameters, uncertainty estimates, confidence reporting, and documentation of all expert interventions performed during the reconstruction process [
56]. Maintaining complete reconstruction audit trails may improve chain-of-custody integrity and facilitate independent assessment of analytical assumptions, methodological limitations, and reproducibility in adversarial legal settings. The allocation of responsibility for erroneous or misleading AI-assisted reconstructions remains legally and operationally unresolved, particularly in workflows involving commercial software, multidisciplinary expert teams, and partially automated decision pipelines. Additional safeguards could include mandatory disclosure that AI-generated reconstructions constitute investigative aids rather than biometric identification evidence, as well as independent expert review prior to operational or public release.
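The documentation items listed above can be sketched as a single structured record. The following Python example is a minimal illustration under stated assumptions: the class name, field names, and the use of a SHA-256 fingerprint over a canonical JSON form are hypothetical design choices, not an established forensic standard or a schema proposed in the cited literature.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ReconstructionRecord:
    """One audit-trail entry for a single reconstruction run (hypothetical schema)."""

    case_id: str
    model_version: str
    dataset_provenance: str
    demographic_assumptions: dict
    parameters: dict
    uncertainty_summary: dict
    expert_interventions: list = field(default_factory=list)

    def fingerprint(self) -> str:
        """Return a SHA-256 digest of the record in canonical JSON form.

        Sorting keys makes the digest independent of dict insertion order,
        so any later change to the record content changes the digest.
        """
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Storing the fingerprint alongside each released image would let an independent reviewer check that the documented parameters and expert interventions have not been altered after the fact. It does not by itself establish that the recorded values were correct when entered.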
These considerations are particularly important because reconstructed facial approximations may have different legal implications depending on their intended use. Reconstructions disseminated publicly to generate investigative leads differ substantially from systems used for automated facial comparison or identity determination, both in terms of evidentiary function and applicable legal safeguards. AI-generated reconstructions intended for investigative dissemination may require different evidentiary safeguards than visual materials formally introduced as courtroom exhibits. AI-assisted forensic facial reconstruction should therefore continue to be understood primarily as an investigative support methodology intended to generate reconstructive hypotheses and assist investigative prioritization rather than as an autonomous identification system or a substitute for corroborating forensic evidence.
From a legal perspective, the admissibility of AI-assisted reconstructions may ultimately depend on whether reconstruction methodologies satisfy broader standards of scientific reliability, transparency, reproducibility, and expert interpretability applied within individual jurisdictions. Courts may additionally face challenges associated with the potentially prejudicial or overly persuasive character of photorealistic AI-generated reconstructions, particularly when uncertainty and methodological limitations are insufficiently disclosed. Consequently, transparent reporting of methodological assumptions, uncertainty ranges, dataset limitations, and model constraints should be regarded as an essential prerequisite for responsible forensic and judicial use. For the same reason, fully autonomous AI-driven forensic facial reconstruction without meaningful human expert oversight should presently be regarded as incompatible with responsible forensic governance and evidentiary best practices.
Future research should therefore focus not only on improving visual realism, but also on developing probabilistic reconstruction frameworks, uncertainty quantification methods, cross-population validation studies, benchmark forensic datasets, and standardized evaluation protocols capable of supporting scientifically robust and legally defensible forensic applications of AI-assisted facial reconstruction [
51,
52].
Accordingly, explainability mechanisms in forensic AI should be interpreted primarily as tools supporting partial procedural transparency and expert review rather than as guarantees of scientific validity, evidentiary reliability, causal interpretability, or forensic admissibility [
51,
52].
8. Future Directions
8.1. Hybrid Expert–AI Frameworks and Standardization
In recent years, research on facial reconstruction using deep learning has increasingly highlighted the importance of combining artificial intelligence with domain expertise, particularly that of anthropologists and forensic practitioners. Generative models, including those based on GAN architectures or domain translation techniques (e.g., skull-to-face mapping), are now capable of generating visually plausible reconstruction outputs. Despite these advances, their outputs still require substantive evaluation and correction by specialists [
56]. Experimental studies indicate that GAN-based systems and cyclic translation models (such as CycleGAN) can generate visually realistic facial approximations from skull-derived input data; however, these outputs remain probabilistic reconstructions rather than validated representations of true facial appearance. Consequently, such reconstructions must be interpreted cautiously and within the context of anatomical, anthropological, and biological expertise, which makes expert involvement indispensable [
57]. Machine learning models developed in 2025 further extend these capabilities by enabling probabilistic estimation of selected facial dimensions from dental data and skeletal morphology. This supports anthropologists in rapidly generating multiple facial variants, effectively accelerating the initial stages of reconstruction [
58]. The availability of automatically generated variants allows experts to focus on evaluating and refining the most plausible reconstructions and facilitates comparative assessment of multiple reconstruction hypotheses [
58]. Moreover, combining computational predictions with biological and clinical knowledge supports the analysis of anatomical relationships that may be difficult to detect by either human observers or AI systems alone. This underscores the complementary nature of hybrid approaches, where neither component is sufficient in isolation [
57]. As a result, a hybrid framework—where AI generates candidate reconstructions and experts assess their anatomical and biological plausibility—is becoming increasingly central to the development of effective reconstruction tools. Within this perspective, AI systems in forensic medicine should be treated primarily as decision-support tools rather than replacements for expert judgment [
59].
8.2. Standardization: The Need for Consistent Protocols
A significant limitation in applying AI to facial reconstruction and forensic science more broadly is the lack of standardization. The literature consistently emphasizes that there are currently no universally accepted evaluation protocols or minimal methodological requirements for assessing reconstruction quality and evidentiary value. While AI is increasingly used as a support tool, its forensic applicability depends on the establishment of validation frameworks that support scientific reliability and improve legal defensibility [
59]. This issue is closely related to the fragmentation of benchmarks and evaluation procedures. Different studies rely on diverse datasets, metrics, and training strategies, making it difficult to compare results and identify methodologically robust and reproducible approaches. Similar challenges have been reported in related fields, such as AI-generated content detection, where inconsistencies in evaluation protocols limit comparability and slow methodological progress [
60]. Attempts to address these challenges can be observed in adjacent areas of forensic AI. For instance, integrated platforms such as ForensicHub have been proposed to standardize datasets, models, and evaluation metrics, thereby enabling more consistent comparisons between methods. Although such frameworks are not specifically designed for facial reconstruction, they illustrate a broader trend toward interoperability and standardization in forensic research [
60]. In addition to standardization, the development of clear validation and interpretability protocols remains essential. Evaluation should not be limited to predictive accuracy but should also include transparency, robustness, and the extent to which expert users can understand and critically assess AI-generated outputs. This is particularly important in forensic contexts, where the credibility of results depends on both performance and interpretability [
59].
Future standardization efforts should additionally establish harmonized validation frameworks incorporating benchmark datasets, minimum demographic reporting standards, cross-population testing procedures, standardized morphometric error metrics, uncertainty scoring protocols, and reproducibility assessments across independent reconstruction teams. The development of such technical validation protocols may improve inter-study comparability, facilitate independent forensic evaluation, and support more robust assessment of the evidentiary reliability of AI-assisted facial reconstruction systems.
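To make the idea of a standardized morphometric error metric with cross-population reporting more concrete, the following Python sketch computes the mean Euclidean distance between predicted and reference 3D landmarks and then stratifies it by demographic group. The function names and the input layout are assumptions made for illustration, and a landmark-distance average is only one of several possible metrics, not a protocol endorsed by the reviewed literature.

```python
from collections import defaultdict
from math import dist
from statistics import mean


def mean_landmark_error(predicted, reference):
    """Mean Euclidean distance (same units as input) between paired 3D landmarks."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("landmark lists must be non-empty and of equal length")
    return mean([dist(p, r) for p, r in zip(predicted, reference)])


def error_by_group(cases):
    """Stratify reconstruction error by demographic group.

    cases: iterable of (group_label, predicted_landmarks, reference_landmarks).
    Returns {group_label: (number of cases, mean per-case error)}.
    """
    per_group = defaultdict(list)
    for group, predicted, reference in cases:
        per_group[group].append(mean_landmark_error(predicted, reference))
    return {g: (len(errs), mean(errs)) for g, errs in per_group.items()}
```

Reporting the case count next to each group mean is a deliberate choice: a low error computed from only a handful of cases should not be read as evidence of robust performance for that population.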
8.3. Development of Global Datasets and the Challenge of Diversity
Another key issue concerns the limited diversity of datasets used in facial reconstruction research. Studies in computer vision consistently show that model performance and generalization are strongly influenced by the representativeness of the training data, especially with respect to ethnicity, age, and population characteristics. Many commonly used datasets remain dominated by specific groups, often consisting primarily of White individuals from the United States and the United Kingdom, and lack sufficient demographic annotation [
61]. Addressing this imbalance requires developing more representative datasets that capture a wider range of human variation. Projects such as 3D2M aim to provide 3D mesh datasets covering dozens of ethnic groups, thereby supporting more inclusive research in facial reconstruction and analysis [
62]. Such initiatives may help reduce demographic bias and improve model generalizability across populations in forensic contexts. The importance of diversity is also evident in applications such as age estimation and facial classification, where imbalanced datasets can lead to systematic errors. Research on demographic balancing demonstrates that incorporating age- and ethnicity-diverse data may improve demographic robustness and support broader generalization across datasets [
63]. In this context, synthetic data generation has emerged as an additional strategy. Models such as StyleGAN2 enable the creation of demographically balanced datasets that can complement real-world data and partially mitigate existing demographic imbalances [
64]. Furthermore, age-structured datasets—covering different stages of life—allow for the analysis of aging processes and support applications such as age progression in facial reconstruction [
65].
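A simple way to inspect the demographic imbalance described above is to tabulate group shares in a dataset and derive inverse-frequency sample weights. The Python sketch below is illustrative only: the function names are hypothetical, the group labels are placeholders, and reweighting is a generic technique that can complement, but not replace, the collection of more representative data.

```python
from collections import Counter


def group_shares(labels):
    """Fraction of samples belonging to each demographic group."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels must not be empty")
    return {g: n / total for g, n in counts.items()}


def inverse_frequency_weights(labels):
    """Per-group sample weights under which every group has equal total weight.

    With k groups and N samples, a group of size n receives weight N / (k * n),
    so each group's summed weight equals N / k.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels must not be empty")
    k = len(counts)
    return {g: total / (k * n) for g, n in counts.items()}
```

For a dataset with eight samples from one group and two from another, the minority group receives a four times larger per-sample weight. Such weights cannot add morphological variation that the minority samples do not contain, which is why the dataset initiatives and synthetic-data strategies discussed above remain relevant.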
8.4. Integration with Other Techniques
Future developments in facial reconstruction are likely to depend on integrating AI with complementary analytical and measurement techniques. One important direction involves combining deep learning with 3D morphometric approaches, which provide anatomically grounded representations of craniofacial structures. The use of statistical shape models and landmark-based methods allows for more anatomically constrained prediction of soft-tissue features while maintaining anatomical consistency [
39]. Tools such as FaceDig facilitate this process by automating the placement of morphometric landmarks, thereby increasing both efficiency and reproducibility [
39,
66]. Advances in data acquisition technologies also play a significant role. High-resolution 3D scanning and LiDAR-based methods enable precise capture of skull geometry, thereby improving input data quality and potentially reducing selected sources of reconstruction uncertainty [
39]. Another promising area involves the incorporation of biomechanical models of soft tissues. Approaches such as PhysSFI-Net simulate the behavior of muscles and skin, supporting anatomically informed modelling of soft-tissue deformation and facial surface reconstruction. These methods are particularly relevant in applications requiring high anatomical precision, such as surgical planning [
67]. Increasingly, these techniques are combined within multimodal frameworks that integrate geometric, imaging, and biological data. Such approaches may improve the stability, interpretability, and demographic robustness of reconstruction modelling across populations and are widely regarded as a key direction for future development [
39,
67].
8.5. The Potential of Future Forensic Medicine
The ongoing development of AI-based reconstruction systems, particularly those capable of integrating multiple data modalities, may support faster generation and comparative assessment of investigative reconstruction hypotheses. Emerging approaches combine craniofacial morphology with genetic data through forensic DNA phenotyping, enabling the prediction of visible traits such as eye color, skin pigmentation, and age, although such predictions remain probabilistic and trait-dependent [
68]. Frameworks such as SPOT-Face further extend these capabilities by enabling comparisons between skull and facial features using neural networks and optimal transport methods, representing a step toward more automated comparative analysis workflows within forensic investigations. Despite these advances, careful validation and expert oversight remain essential, especially in cases involving the prediction of phenotypic traits from genetic information [
69]. Overall, future progress in forensic facial reconstruction will depend on integrating AI with expert knowledge while maintaining transparency, uncertainty disclosure, and human expert oversight, alongside the development of standardized methodologies, diverse datasets, and multimodal analytical frameworks. Only through such a combined approach can these systems achieve both scientific reliability and practical applicability in forensic contexts.
9. Discussion
This review demonstrates that forensic facial reconstruction (FFR) has undergone a fundamental transition from expert-driven, anatomically informed approximation toward data-driven computational modeling integrating deep learning, 3D morphometrics, and multimodal data sources. Although this shift has clearly improved visual realism and enabled greater scalability, it has not eliminated the core inferential limitations associated with reconstructing facial appearance from skeletal remains [
13,
16,
39].
A central issue emerging from this analysis concerns the non-deterministic relationship between cranial morphology and facial appearance. Deep learning architectures—including convolutional neural networks, generative adversarial networks, variational autoencoders, and diffusion models—learn statistical associations rather than biologically causal relationships. Consequently, reconstructed faces should not be interpreted as definitive representations of an individual’s appearance, but rather as probabilistic estimates shaped by the structure and limitations of the training data. This distinction becomes especially important in cases where traits are only weakly, or not at all, encoded in skeletal structures, where the margin of uncertainty is inherently higher [
17,
22,
51].
At the same time, it is important to note that improvements in visual fidelity do not automatically translate into increased forensic utility. Quantitative geometric accuracy—typically evaluated using surface deviation metrics against CT-derived ground truth models—offers valuable technical insight, yet it does not fully capture how reconstructions perform in practice. Recognition-based studies suggest that even relatively minor geometric discrepancies can meaningfully influence identification outcomes. In this sense, the forensic value of facial reconstruction emerges from a more complex interplay between morphometric precision, perceptual recognizability, and the interpretive context in which the image is used, highlighting the need for multi-layered validation frameworks [
26,
27].
The review also indicates that dataset composition remains the primary source of error in AI-assisted reconstruction. Differences in facial soft tissue thickness across ancestry, sex, and age introduce systematic biases when models are trained on datasets that lack sufficient representation. These biases are not incidental; rather, they become embedded in the model’s behavior and can lead to consistent distortions when applied to underrepresented groups. As a result, further progress in reconstruction reliability may depend less on increasing algorithmic sophistication and more on improving the diversity and completeness of training data [
26,
29,
61].
A related concern involves reproducibility and methodological transparency. Traditional reconstruction methods are already subject to inter-expert variability, but the introduction of deep learning adds another layer of complexity due to limited interpretability. The “black box” nature of many generative models makes it difficult to trace errors or justify specific outputs, raising questions about evidentiary reliability in forensic settings. In this context, the development and integration of explainable AI (XAI) techniques—such as attention mapping or interpretable feature attribution—becomes particularly important for ensuring auditability and maintaining expert oversight [
24,
47,
51].
Legal and ethical considerations further complicate the practical use of AI-driven facial reconstruction. While existing regulatory frameworks, including the GDPR and emerging AI legislation, treat biometric processing and law-enforcement AI systems as sensitive or high-risk, they do not yet provide sufficiently detailed, domain-specific guidelines for validation, accountability, or admissibility in forensic contexts. Questions related to biometric data protection, the risk of misidentification, and the public dissemination of reconstructed images therefore remain only partially addressed and require more clearly defined standards and protocols [
9,
34,
39].
Taken together, these findings support a hybrid operational model in which AI systems are used as decision-support tools rather than fully autonomous reconstruction engines. Within such a framework, generative models can be understood as tools for generating a range of plausible facial hypotheses, which forensic experts then critically assess and refine. This approach helps to limit the impact of algorithmic bias, reduces the risk of overinterpreting model outputs, and preserves the central role of expert judgment in evaluating anatomical plausibility and uncertainty [
40,
58,
59].
Looking ahead, further advances in forensic facial reconstruction are likely to depend on integrative approaches rather than purely algorithmic improvements. Combining deep learning with 3D morphometrics, biomechanical modeling of soft tissues, and forensic DNA phenotyping may offer more biologically grounded reconstructions. Even so, these methods remain probabilistic and vary in predictive reliability depending on the trait in question, underscoring the importance of explicitly modeling and communicating uncertainty throughout the reconstruction process [
28,
39,
67].
Future forensic validation frameworks may therefore require standardized probabilistic reporting protocols and quantitative uncertainty estimation procedures analogous to confidence reporting practices used in other forensic and biomedical domains [
27,
39,
56].
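As an illustration of what such probabilistic reporting could look like in practice, the following minimal sketch summarizes a set of sampled reconstructions as a per-landmark mean with an empirical interval. It is not a method taken from the reviewed studies: the function names, the landmark labels, the thickness values, and the choice of a simple percentile interval are all assumptions made for the example.

```python
from statistics import mean


def percentile(sorted_values, q):
    """Linear-interpolation percentile for q in [0, 1] on pre-sorted data."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


def summarize_hypotheses(samples, landmark_names, coverage=0.95):
    """Report mean and empirical interval of soft-tissue thickness per landmark.

    samples: list of hypothetical reconstructions, each a list of thickness
             values in mm (one value per landmark, same order as landmark_names).
    coverage: central fraction of the samples the interval should span.
    """
    if len(samples) < 2:
        raise ValueError("at least two sampled reconstructions are required")
    if any(len(s) != len(landmark_names) for s in samples):
        raise ValueError("every sample needs one value per landmark")
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must lie in (0, 1]")

    tail = (1.0 - coverage) / 2.0
    report = {}
    for i, name in enumerate(landmark_names):
        values = sorted(s[i] for s in samples)
        report[name] = {
            "mean_mm": mean(values),
            "lower_mm": percentile(values, tail),
            "upper_mm": percentile(values, 1.0 - tail),
            "n_samples": len(values),
        }
    return report


# Illustrative values only; not measurements from any dataset.
example_samples = [[4.0, 10.0], [5.0, 12.0], [6.0, 14.0]]
example_report = summarize_hypotheses(example_samples, ["glabella", "pogonion"])
```

A report of this kind would let an expert see, landmark by landmark, where the sampled hypotheses agree and where they diverge. Whether a percentile interval is an adequate uncertainty measure for forensic use is itself a validation question that this sketch does not address.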
Collectively, the reviewed AI architectures reveal a consistent trade-off between visual realism, interpretability, computational complexity, and forensic reliability. CNN-based approaches generally provide greater procedural stability and more interpretable feature extraction, but remain limited in generative flexibility. GANs and diffusion models produce substantially more photorealistic reconstructions; however, their outputs are more vulnerable to hallucinated features, reduced traceability, and limited biological interpretability [
63,
64]. VAEs occupy an intermediate position by enabling probabilistic modeling of facial variability, although often at the cost of lower visual fidelity. Importantly, no currently available architecture simultaneously satisfies the key forensic requirements of reproducibility, demographic robustness, interpretability, and large-scale external validation. Consequently, the current forensic value of AI-assisted facial reconstruction may depend less on maximizing visual realism and more on developing transparent hybrid frameworks capable of integrating probabilistic modeling with expert-driven anatomical evaluation [
17,
22,
51,
58].
10. Conclusions
Forensic facial reconstruction currently operates within a constrained inferential framework defined by incomplete biological observability, demographic imbalance in training data, and limited model interpretability. Advances in deep learning have significantly improved visual fidelity and operational efficiency; however, they have not eliminated the fundamental epistemic uncertainty underlying the mapping between skeletal remains and facial appearance. Future progress will therefore depend less on increasing algorithmic complexity and more on establishing standardized, demographically balanced datasets, transparent validation protocols, and hybrid expert–AI frameworks that explicitly model and communicate uncertainty.