Forensic Facial Reconstruction in the Age of Deep Learning: Accuracy, Bias, and Future Perspectives

Bąk, Bartłomiej; Bąk, Dawid; Osińska, Aleksandra; Bednarz, Michał; Banaszek, Jakub; Baj, Jacek; Forma, Alicja; Zembala, Patryk; Teresiński, Grzegorz

doi:10.3390/app16125814

Open Access Review

by Bartłomiej Bąk ¹, Dawid Bąk ¹, Aleksandra Osińska ¹, Michał Bednarz ¹, Jakub Banaszek ¹, Jacek Baj ², Alicja Forma ³,*, Patryk Zembala ⁴ and Grzegorz Teresiński ³

¹ Student Scientific Society of Forensic Medicine, Medical University of Lublin, Jaczewskiego 8b, 20-090 Lublin, Poland

² Department of Correct, Clinical, and Imaging Anatomy, Medical University of Lublin, Jaczewskiego 4, 20-090 Lublin, Poland

³ Chair and Department of Forensic Medicine, Medical University of Lublin, Jaczewskiego 8b, 20-090 Lublin, Poland

⁴ Multispecialist Hospital, Karola Szymanowskiego 11, 27-400 Ostrowiec Świętokrzyski, Poland

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(12), 5814; https://doi.org/10.3390/app16125814 (registering DOI)

Submission received: 4 May 2026 / Revised: 4 June 2026 / Accepted: 7 June 2026 / Published: 9 June 2026

(This article belongs to the Special Issue Digital Innovations in Healthcare—2nd Edition)

Abstract

The following narrative review discusses the use of deep learning and 3D modeling in facial reconstruction from skeletal remains, focusing on accuracy, algorithmic bias, and evidential reliability. Forensic facial reconstruction (FFR) is a multidisciplinary field combining anthropology, medicine, and visual sciences to approximate the facial appearance of unidentified individuals from skeletal remains. Traditional manual methods, based on anatomical knowledge and facial soft tissue thickness (FSTT) measurements, are limited by subjectivity, labor intensity, and inter-expert variability. This narrative review summarizes contemporary AI-assisted approaches, with emphasis on convolutional neural networks (CNNs), generative adversarial networks (GANs), variational autoencoders (VAEs), and diffusion models, which enable probabilistic prediction of facial morphology while accounting for demographic variables such as sex, age, and population ancestry. Key challenges affecting reconstruction accuracy—including dataset limitations, population-specific variability, and algorithmic bias—are discussed, alongside quantitative validation methods and concerns regarding model transparency. Legal and ethical considerations, such as privacy, biometric data protection, and the need for explainable AI (XAI) frameworks, are highlighted. Future perspectives include hybrid expert–AI workflows, the development of globally representative datasets, and the integration of multimodal data sources, including DNA phenotyping, 3D morphometrics, and biomechanical modeling. These advances aim to create standardized, interpretable, and biologically informed frameworks that enable AI to support expert judgment and enhance the reliability of forensic facial reconstructions.

Keywords: forensic facial reconstruction; deep learning; 3D modeling; artificial intelligence; reconstruction accuracy; algorithmic bias; forensic ethics; future perspectives

1. Introduction

Facial reconstruction has a long history of combining scientific methods with artistic interpretation. At the end of the nineteenth century, pioneers like Hermann Welcker and Wilhelm His experimented with approximating facial appearance from skeletal remains, using clay or wax on bone casts guided by anatomical measurements and early soft tissue data [1]. This work laid the foundation for forensic facial reconstruction [2]. Over time, the field shifted from artistic representations to more systematic, anatomically based methods. Mikhail Gerasimov was a key figure in this transition, and his mid-twentieth-century work introduced reconstruction methods based on detailed anatomical studies [3]. His technique involved reconstructing facial muscles and soft tissues directly on the skull while respecting established anatomical relationships and skeletal landmarks, providing a more anatomically structured and comparatively reproducible framework for facial reconstruction [4]. Despite these advances, traditional manual reconstruction remained labor-intensive, highly dependent on professional expertise, and prone to substantial variation between reconstructions [5]. The overall workflow of AI-assisted forensic facial reconstruction is illustrated in Figure 1.

Digital techniques such as computer-aided design (CAD), three-dimensional (3D) scanning, and photogrammetry made high-resolution digital capture of skull morphology and soft-tissue locations possible. These tools allow researchers to create reproducible, manipulable 3D images of facial structures, reducing some of the subjectivity inherent in manual techniques and permitting quantitative analysis of the reconstruction process [6,7]. Among the most important subsequent developments was the integration of craniometric-based statistical modeling approaches, including principal component analysis, which supported more population-sensitive modelling of facial soft tissue thickness and craniofacial geometry [7,8]. At the same time, population-specific databases of facial soft tissue thickness (FSTT) were developed. Detailed measurements accounting for demographic variables such as age, sex, and body mass index are now available for several populations, including adult males from Nigeria, Turkey, Brazil, and Germany [8,9,10,11]. A major recent development has been the integration of artificial intelligence (AI) and machine learning to generate increasingly dense soft-tissue datasets. This supports probabilistic modelling approaches that predict facial contours even in cases involving incomplete skeletal remains [12].

Early applications of AI in forensic science primarily focused on pattern recognition and classification tasks. However, as deep learning techniques have become more advanced, researchers have increasingly explored how neural networks can model complex relationships between cranial morphology and facial features. Convolutional neural networks (CNNs) can model statistical associations between cranial morphology and facial traits based on training data [13]. Recent pipelines combine 3D skull models, FSTT data, and algorithmic surface optimization to generate visually plausible facial contours [14]. Moreover, deep learning models can incorporate demographic metadata, allowing reconstructions to account for variation related to age, sex, or population background and enabling the generation of multiple phenotypic hypotheses under conditions of uncertainty. This capability may support forensic investigations involving unidentified remains; however, such reconstructions are generally intended as investigative support tools rather than definitive identification methods [15]. In addition, the continued expansion of international databases of facial soft-tissue thickness increases the applicability of AI-based reconstruction methods, especially in cases where skeletal preservation or demographic information is limited [16].

Recent developments in AI have expanded the methodological capabilities of forensic facial reconstruction, particularly through the use of generative models such as generative adversarial networks (GANs), variational autoencoders (VAEs), and diffusion models. These models allow researchers to produce visually detailed facial images and explore a wide range of possible reconstructions. GANs can generate detailed facial textures and simulate variations in soft tissue, while VAEs provide probabilistic reconstructions that reflect natural variability in tissue distribution. Diffusion models, representing the newest generation of generative algorithms, may improve visual image fidelity in generative imaging tasks; however, improved visual realism does not necessarily correspond to improved anatomical accuracy or forensic validity, and their applicability in forensic craniofacial reconstruction remains under active investigation [17,18]. These generative techniques automate parts of the reconstruction process and enable the use of large datasets for model training. They also enable rapid reconstructions even when skeletal material is incomplete, which has important implications for forensic investigations. AI-assisted reconstructions may support investigations involving unidentified remains and unresolved cases, and may facilitate communication with law enforcement agencies and the public. By combining computational power, demographic information, and generative modeling techniques, AI-assisted approaches may improve the scalability and reproducibility of forensic facial reconstruction workflows while enabling the generation of probabilistic reconstruction hypotheses. Nevertheless, these systems remain dependent on dataset representativeness, methodological transparency, and further large-scale empirical validation before their forensic reliability can be fully established [17,19].

Importantly, forensic facial reconstruction should be distinguished from automated facial recognition or biometric identification systems. Reconstruction methods are primarily intended to generate investigative leads or visual approximations based on skeletal morphology, rather than to establish identity through automated matching procedures. Consequently, reconstructed facial images should be interpreted as probabilistic investigative aids rather than deterministic representations suitable for direct biometric identification.

At present, the principal challenges facing AI-assisted forensic facial reconstruction concern the limited interpretability of generative models, demographic imbalance in training datasets, uncertainty in craniofacial-to-soft tissue relationships, and the absence of standardized validation and governance frameworks. These limitations raise important methodological, ethical, and legal questions regarding the evidentiary interpretation and operational use of reconstructed facial images in forensic contexts.

Accordingly, this review critically evaluates not only recent technological advances but also the evidentiary, ethical, and governance-related limitations that currently constrain the forensic applicability of AI-assisted facial reconstruction systems.

2. Methodology of the Narrative Review

This study was conducted as a narrative review aimed at critically evaluating current developments, limitations, and future perspectives of artificial intelligence-assisted forensic facial reconstruction. Literature searches were performed in PubMed, Scopus, Web of Science, and Google Scholar between January and March 2026 using combinations of terms related to forensic facial reconstruction, craniofacial reconstruction, facial soft tissue thickness, artificial intelligence, machine learning, deep learning, generative models (including GANs, VAEs, and diffusion models), explainable AI, forensic imaging, and forensic biometrics. Additional relevant studies were identified through manual screening of reference lists from key publications.

The review included both historical and contemporary publications relevant to forensic facial reconstruction and AI-assisted craniofacial modelling, with particular emphasis on recent advances in generative artificial intelligence, explainable AI, and forensic imaging methodologies. Peer-reviewed forensic, anthropological, medical imaging, and computer vision studies were prioritized, while selected preprints and technical reports were additionally considered in areas where rapidly evolving AI methodologies remain underrepresented in the peer-reviewed forensic literature.

Given the rapidly evolving, interdisciplinary, and methodologically heterogeneous nature of AI-assisted forensic facial reconstruction research, a narrative review approach was considered more appropriate than a formal systematic review. This approach enabled broader critical discussion of methodological limitations, forensic applicability, interpretability challenges, demographic bias, ethical considerations, and governance issues associated with AI-assisted forensic facial reconstruction. Nevertheless, the authors acknowledge that narrative reviews remain inherently selective and may not comprehensively capture all emerging developments within this rapidly evolving field.

3. Deep Learning Models in Forensic Visualization

Deep learning has introduced new computational approaches to forensic facial reconstruction by modeling the relationships between skull structure and facial soft tissues. Early CNNs allowed researchers to identify statistical patterns within skeletal imaging data. Later models added demographic information, including age, sex, and population background, to improve population-sensitive modelling [13,15]. CNN-based approaches infer statistical associations between cranial landmarks and facial soft-tissue distributions from training datasets. Including population-specific data may support more demographically informed reconstruction hypotheses, enabling automatic predictions while accounting for natural variation [13]. Generative adversarial networks (GANs) generate visually plausible facial images with detailed textures and varying soft-tissue patterns. They can produce multiple plausible reconstructions from the same skeleton, which is valuable when the material is incomplete or damaged [20]. Variational autoencoders (VAEs) complement GANs by providing probabilistic reconstructions. They capture natural variability and may support anatomically coherent soft-tissue approximations, even with limited data [21]. Diffusion models represent the newest class of generative algorithms. They produce high-resolution and visually detailed facial representations and allow exploration of many facial variations. These models may provide improved image fidelity in generative imaging tasks; however, direct comparative validation of their forensic reconstruction performance relative to conventional methods remains limited [22]. All of these methods rely on facial soft tissue and anthropometric measurements. With population-specific data, they can simulate faces of different sexes, ages, and ethnicities. This supports semi-automated generation of multiple 2D and 3D reconstruction hypotheses, facilitates simulation of facial variability, and may assist in modelling demographic variation, soft-tissue distributions, and selected craniofacial characteristics. These approaches are currently applied primarily in research-oriented, experimental, and emerging forensic workflows, where they may support investigative efforts involving unidentified remains, contribute to unresolved case investigations, and facilitate visual communication for law enforcement and public appeals. Nevertheless, the forensic reliability, reproducibility, and cross-population generalizability of these systems remain insufficiently validated across large and demographically diverse populations [14].

Beyond demographic imbalance, the forensic reliability of AI-assisted facial reconstruction is strongly influenced by dataset quality and annotation consistency. Existing craniofacial datasets frequently originate from heterogeneous imaging modalities, including CT, CBCT, MRI, photogrammetry, and surface 3D scans, each characterized by different spatial resolutions, soft-tissue contrast properties, and acquisition artifacts [23,24]. Such heterogeneity complicates standardization and may reduce cross-dataset generalizability of trained models. An additional challenge concerns skull–face registration accuracy, as precise spatial alignment between skeletal structures and corresponding facial surfaces remains technically difficult and highly sensitive to landmark selection, segmentation quality, and image preprocessing protocols [23,25].

Variability in facial soft tissue thickness acquisition methods further contributes to inconsistency, since FSTT measurements may differ depending on imaging modality, observer methodology, anatomical landmark definitions, and population sampling procedures [9,10,12]. Importantly, the field also remains constrained by the scarcity of large-scale paired skull–face datasets containing reliably matched cranial and ante-mortem facial data [24]. This limitation substantially restricts external validation, increases the risk of overfitting, and limits the forensic generalizability of current deep learning models. Collectively, these dataset-related factors represent a major source of uncertainty affecting reconstruction accuracy, reproducibility, and forensic applicability.

Despite rapid methodological development, most deep learning-based facial reconstruction systems remain evaluated primarily in experimental or proof-of-concept settings. Large-scale comparative studies assessing reproducibility, cross-population generalizability, and forensic validity relative to conventional reconstruction methods remain limited. Consequently, current AI-assisted approaches should be interpreted primarily as probabilistic modelling tools rather than fully validated identification technologies.

At present, there is insufficient large-scale cross-population empirical evidence to conclude that deep learning-based facial reconstruction methods systematically outperform conventional expert-driven reconstruction approaches in forensic practice. Likewise, no broad forensic consensus exists regarding the operational evidentiary validation of fully AI-driven facial reconstruction systems for autonomous forensic identification purposes.

Importantly, current AI-assisted forensic facial reconstruction technologies should be distinguished according to their stage of methodological maturity and forensic applicability. Many published systems remain proof-of-concept research models developed under controlled experimental conditions without independent external validation. A smaller subset of approaches has undergone limited experimental validation using morphometric or recognition-based evaluation protocols. However, only very few reconstruction methodologies have been incorporated into operational forensic workflows, and even these continue to function primarily as investigative support tools rather than fully validated identification systems suitable for autonomous evidentiary use.

These architectures differ in terms of generative strategy, interpretability, computational demands, and current levels of forensic validation. A comparative overview of these deep learning architectures is provided in Table 1.

Importantly, current comparisons between deep learning architectures in forensic facial reconstruction remain methodologically limited due to the absence of standardized validation benchmarks, heterogeneous datasets, and the predominance of proof-of-concept studies rather than large-scale operational forensic evaluations.

4. Accuracy and Reproducibility Assessment

One of the most established quantitative approaches for evaluating the accuracy of facial reconstruction involves comparing a 3D surface model of the reconstructed face with a reference facial model derived from imaging data, most commonly computed tomography (CT). In one such study, computer-based facial reconstructions (CCFR) were geometrically compared with corresponding CT-derived facial models using CloudCompare® v2.6.2 software (CloudCompare software project, Paris, France), which enables both numerical distance measurements between surfaces and visualization of deviations through color mapping. Within this framework, positive values indicated overestimation of the reconstructed surface relative to the actual anatomy, while negative values reflected underestimation. The results showed that approximately 63.2% to 73.67% of reconstructed surface points fell within ±2.5 mm of the reference model, with mean point-to-surface deviations ranging from around −1.66 mm to 0.33 mm across individuals. The largest discrepancies were consistently observed in anatomically complex regions of the face, particularly the periocular and midfacial areas. The eye region and cheeks were more often underestimated, whereas the chin and zygomatic regions were slightly overestimated. Importantly, the study did not restrict itself to geometric evaluation alone. An image similarity assessment using the Picasa® 3.9 recognition tool (Google LLC, Mountain View, CA, USA) was also performed, in which reconstructed facial images were compared with ante-mortem photographs. The software successfully detected all input images as faces and correctly matched three out of four CT models and two out of four reconstructions to the corresponding individuals. These findings suggest that relatively small geometric deviations do not necessarily translate into proportional changes in recognition performance. They also reinforce the need to combine morphometric evaluation with recognition-based testing when validating forensic facial reconstruction methods. Nevertheless, recognition performance metrics alone do not establish the forensic validity or evidentiary admissibility of reconstructed images, particularly in contexts involving investigative or judicial decision-making [26].
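The summary statistics used in this kind of surface comparison can be illustrated with a minimal Python sketch. It assumes that signed point-to-surface distances (in mm) have already been exported from a surface comparison tool; the function name `summarize_deviations` and the toy values are hypothetical and are not taken from the cited study.

```python
from statistics import mean


def summarize_deviations(signed_mm, tolerance_mm=2.5):
    """Summarize signed point-to-surface deviations given in millimetres.

    Positive values mean the reconstructed surface lies outside the
    reference (overestimation); negative values mean underestimation.
    """
    if not signed_mm:
        raise ValueError("need at least one deviation value")
    n = len(signed_mm)
    within = sum(1 for d in signed_mm if abs(d) <= tolerance_mm)
    over = sum(1 for d in signed_mm if d > 0)
    under = sum(1 for d in signed_mm if d < 0)
    return {
        "mean_mm": mean(signed_mm),
        "pct_within_tolerance": 100.0 * within / n,
        "pct_overestimated": 100.0 * over / n,
        "pct_underestimated": 100.0 * under / n,
    }


# Toy deviations for illustration only; not data from any study.
toy_deviations = [-3.0, -1.0, 0.0, 0.5, 1.5, 4.0]
summary = summarize_deviations(toy_deviations)
```

In this toy case four of the six points lie within ±2.5 mm, so `pct_within_tolerance` is about 66.7. Because the mean of signed values can hide large positive and negative deviations that cancel out, it is best read together with the within-tolerance share.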

More broadly, assessing reconstruction accuracy remains a major methodological challenge in forensic craniofacial reconstruction. The process is inherently dependent on a series of assumptions about the relationship between cranial morphology and soft tissue structures, which vary considerably across populations and are not always predictable in a deterministic way. For this reason, accuracy is typically assessed using multiple complementary approaches, including 3D comparisons of cranial and facial models, quantitative distance-based analyses between reconstructed and reference faces, and visual comparisons with ante-mortem photographs. In some cases, recognition-based evaluations are also employed, whereby human observers or automated facial comparison systems assess visual similarity between reconstructed and reference images. Nevertheless, such approaches remain limited as proxies for forensic validity and should not be interpreted as definitive evidence of identification reliability.

Reproducibility represents a separate but closely related issue. Empirical studies show that different practitioners reconstructing the same skull may produce noticeably different facial outcomes. This variability stems largely from subjective interpretation of anatomical landmarks and the limitations of existing morphological standards. In response to this, AI-based approaches are increasingly proposed to improve standardization and reduce inter-operator variability by providing more consistent reconstruction outputs from identical skeletal inputs. Nevertheless, a fundamental limitation persists: bone-to-soft tissue relationships remain only partially understood, particularly with respect to population-specific variation. Features such as hair color, eye color, skin texture, and wrinkles are not directly encoded in skeletal structures, which further increases uncertainty. As a result, even with advanced imaging and statistical modelling, facial reconstruction inevitably retains a probabilistic character. While AI systems may enhance visual plausibility, increased visual plausibility does not necessarily correspond to increased anatomical or forensic accuracy. Generative models can occasionally introduce unrealistic or unsupported details—often described as “hallucinations”—which are not directly justified by underlying data. This is compounded by the limited interpretability of many deep learning systems, which makes it difficult to trace the origin of specific errors or quantify uncertainty in a transparent way [27].

From a forensic governance perspective, the probabilistic nature of reconstruction outcomes highlights the importance of transparent uncertainty communication. Emerging discussions increasingly emphasize that reconstructed images should be accompanied by methodological documentation, demographic assumptions, and confidence limitations to avoid overinterpretation in investigative or judicial contexts.

Artificial intelligence-assisted forensic facial reconstruction increasingly requires formal uncertainty quantification frameworks capable of representing the probabilistic nature of skull-to-face inference. Probabilistic generative approaches, including Bayesian modelling strategies, latent-space probability distributions, and ensemble-based reconstruction pipelines, may enable estimation of confidence ranges associated with specific reconstructed facial regions or anatomical features [28,29,30,31]. Rather than generating a single deterministic facial approximation, such frameworks may support the generation of multiple probabilistic reconstruction variants reflecting the inherent uncertainty of craniofacial prediction [29,30].

In this context, uncertainty may be represented quantitatively through confidence intervals, probabilistic morphometric distributions, or spatial uncertainty mapping techniques highlighting regions of greater or lower reconstruction reliability [29,31]. Visualization approaches such as uncertainty heatmaps or probabilistic facial overlays may further assist forensic experts, investigators, and courts in distinguishing between relatively stable anatomical approximations and highly uncertain reconstructed features [31]. However, standardized forensic protocols for uncertainty estimation, visualization, and probabilistic reporting remain largely undeveloped within current forensic facial reconstruction practice [32].
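One simple way to obtain a spatial uncertainty map from an ensemble-based pipeline is sketched below in Python. It assumes that each ensemble member outputs the same ordered set of 3D landmarks in a common coordinate frame, and it uses the root-mean-square spread around the mean position as the uncertainty measure. That choice, the function name `landmark_uncertainty`, and the toy coordinates are illustrative assumptions, not a standardized forensic protocol.

```python
from math import sqrt


def landmark_uncertainty(ensemble):
    """Per-landmark mean position and RMS spread across an ensemble.

    ensemble: list of reconstructions; each reconstruction is a list of
    (x, y, z) landmark coordinates in mm, in the same landmark order.
    Returns a list of (mean_point, rms_spread_mm), one entry per landmark.
    """
    if len(ensemble) < 2:
        raise ValueError("need at least two reconstructions")
    n_landmarks = len(ensemble[0])
    if any(len(rec) != n_landmarks for rec in ensemble):
        raise ValueError("all reconstructions must share the same landmarks")

    n = len(ensemble)
    results = []
    for i in range(n_landmarks):
        points = [rec[i] for rec in ensemble]
        mean_point = tuple(sum(p[k] for p in points) / n for k in range(3))
        # Squared Euclidean distance of each member from the mean position.
        squared = [
            sum((p[k] - mean_point[k]) ** 2 for k in range(3)) for p in points
        ]
        results.append((mean_point, sqrt(sum(squared) / n)))
    return results


# Toy ensemble: three reconstructions with two landmarks each.
# Coordinates are invented for illustration, not measured data.
toy_ensemble = [
    [(0.0, 0.0, 10.0), (30.0, 0.0, 8.0)],
    [(0.0, 0.0, 10.0), (30.0, 0.0, 10.0)],
    [(0.0, 0.0, 10.0), (30.0, 0.0, 12.0)],
]
uncertainty = landmark_uncertainty(toy_ensemble)
```

Here the first landmark is identical across members and has zero spread, while the second varies along one axis and has a larger spread. Mapping such per-landmark values to a color scale would give a basic uncertainty heatmap of the kind described above.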

Looking ahead, developments in AI may contribute to improved standardization, for instance through semi-automated landmark detection or large-scale comparative modelling. However, the extent to which these advances will meaningfully enhance reproducibility has yet to be validated systematically. A key limiting factor remains the availability and completeness of soft tissue datasets. Facial soft tissue thickness (FSTT) at anatomical landmarks varies substantially between individuals and populations, influenced by age, sex, body habitus, and ancestry. Since these variables directly affect reconstruction algorithms, incomplete or non-representative datasets can introduce systematic inaccuracies [9,33,34].

An additional layer of complexity concerns the prediction of externally visible traits not determined by skeletal morphology. DNA phenotyping systems in forensic science, such as HIrisPlex-S, have been developed to infer characteristics like eye, hair, and skin color from genetic data. However, their performance varies considerably across traits. Reported AUC values range from 0.74 to 0.99 for eye color, 0.64 to 0.94 for hair color, and 0.72 to 0.99 for skin color, depending on the model used. In general, eye color prediction tends to achieve higher accuracy, whereas intermediate pigmentation categories remain more difficult to classify reliably. Reviews of forensic DNA phenotyping further indicate that hair color prediction often yields lower positive predictive values than eye color prediction, reflecting the complex polygenic architecture of pigmentation traits [35]. These findings further illustrate that the prediction of externally visible traits remains probabilistic and trait-dependent, reinforcing the need for cautious forensic interpretation.
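For readers less familiar with the AUC metric quoted above, the following Python sketch computes it directly from its rank-based definition: the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the scores are hypothetical and do not come from HIrisPlex-S or any published evaluation.

```python
def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve from its pairwise-ranking definition.

    scores_pos: predicted scores for cases that truly have the trait.
    scores_neg: predicted scores for cases that truly lack the trait.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(scores_pos) * len(scores_neg))


# Invented scores for illustration only.
toy_pos = [0.9, 0.8, 0.4]
toy_neg = [0.7, 0.3, 0.4]
toy_auc = roc_auc(toy_pos, toy_neg)
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation, which helps put the lower ends of the ranges reported above into perspective.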

At present, few studies have evaluated whether reconstruction performance remains stable across independent datasets, reconstruction teams, imaging modalities, and algorithmic architectures, limiting formal assessment of reproducibility and external forensic validity.

5. Sources of Bias: Population Ancestry, Sex, and Training Data Imbalance

Population ancestry is one of the most significant sources of variation in facial soft tissue thickness (FSTT). Comparative studies, for example, between southeastern and central-western Brazilian samples, have demonstrated measurable regional differences. In these analyses, males showed greater variation across several midline and bilateral landmarks, whereas females showed fewer statistically significant differences. The authors suggest that population-specific FSTT tables can improve reconstruction outcomes, while also emphasizing that facial morphology is not determined solely by soft-tissue thickness. Features such as the eyes, lips, nose, and overall facial proportions also contribute significantly to inter-population differences [11].

Similar findings have been reported in studies comparing Nigerian, South African, and African American male samples, in which differences were most evident in the lower facial region, particularly around the perioral and chin areas. Increased bilateral variability was also observed, further supporting the need for population-specific reference data in forensic reconstruction contexts [35]. Likewise, research on the adult Turkish population has shown that average FSTT values tend to fall between those reported for Korean and European white populations, reinforcing the broader conclusion that soft tissue thickness cannot be reliably generalized across ethnic groups [10].

Practical forensic observations align with these findings. It has been noted that applying reference values derived from African American populations to reconstruct the faces of Black children from South Africa leads to suboptimal outcomes. Such cases illustrate how the use of non-local or overly general datasets may propagate systematic bias within both conventional and AI-assisted reconstruction workflows, particularly when demographic representation in training data is limited, thereby reducing reconstruction reliability when population-specific standards are unavailable [36].

Sex-based variation represents another important factor influencing FSTT. In Brazilian samples, males generally present thicker soft tissue across multiple anatomical landmarks, while females tend to show lower overall variability [35]. Studies in Korean adult populations similarly indicate that sex differences are particularly evident in the upper and lower lip regions, with males exhibiting greater thickness across all skeletal classifications. These patterns support the use of sex- and skeletal class-specific reference values in forensic and clinical applications, including forensic art, anthropology, dentistry, and oral and maxillofacial surgery, although substantial inter-individual variability remains present within demographic groups [34]. Although average differences between sexes are often modest—typically below 2.5 mm at individual landmarks—they are consistently observed across datasets, suggesting a stable pattern of sexual dimorphism in facial soft tissue distribution [35].

Age-related changes further complicate reconstruction. Thickness at several anatomical landmarks, including the mid-philtrum, prosthion, and ectomolare2, tends to decrease with age, particularly in females, reflecting progressive soft tissue atrophy associated with facial aging processes [30]. Despite the existence of multiple adult FSTT databases, reference material for pediatric and adolescent populations remains limited. Only a small number of studies have established dedicated datasets for younger individuals, resulting in a significant gap in forensic reference standards. Gibelli et al. explicitly highlight this limitation and emphasize the need to expand pediatric FSTT databases to support more population-appropriate reconstruction modelling in minors [37].

Taken together, these age-, sex-, and population-related differences underscore the importance of a thorough anthropological assessment of skeletal remains prior to reconstruction, including determination of sex, age, and ancestry [9,10,34]. More broadly, they demonstrate that demographic imbalance in available datasets remains a critical limitation in forensic facial reconstruction. Addressing this issue requires the systematic development of diverse, population-specific FSTT databases and the expansion of existing reference collections. Such efforts are essential not only for traditional reconstruction methods but also for AI-based approaches, which are highly dependent on representative training data to achieve robust and demographically generalizable reconstruction modelling.

From a practical standpoint, these sources of bias have direct implications for forensic work. Although facial reconstruction is primarily used as an investigative aid rather than a definitive identification tool, mismatches between demographic characteristics and reference datasets may reduce reconstruction reliability and increase the risk of misleading visual approximations [38]. Methodological analyses consistently show that reconstruction reliability is strongly dependent on the appropriate selection of population-specific standards, while the use of generalized datasets increases the risk of systematic error [38].

From a methodological perspective, mitigating demographic bias in forensic facial reconstruction may require standardized reporting of dataset composition, including population ancestry, age distribution, sex balance, and imaging acquisition methods. Increasingly, forensic AI discussions also emphasize the importance of documenting uncertainty ranges, demographic assumptions, and model training limitations to improve transparency and reduce the risk of overinterpretation.
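The reporting items listed above can be illustrated with a minimal sketch of a dataset composition summary. The record fields, the age bands, and the function name `composition_report` are assumptions introduced here for illustration and are not taken from any cited standard; the demo records are invented placeholders, not real data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SubjectRecord:
    """One subject in a hypothetical FSTT or training dataset."""
    ancestry: str   # assessed or self-reported population ancestry
    sex: str        # "F" or "M"
    age: int        # age in years at image acquisition
    modality: str   # imaging method, e.g. "CT", "CBCT", "ultrasound"


def composition_report(records, age_bins=((0, 17), (18, 39), (40, 59), (60, 120))):
    """Summarise ancestry, sex, age-band and modality shares as fractions."""
    n = len(records)
    if n == 0:
        raise ValueError("dataset is empty")

    def shares(values):
        # Fraction of the dataset falling into each category.
        return {k: round(v / n, 3) for k, v in sorted(Counter(values).items())}

    def age_band(age):
        for lo, hi in age_bins:
            if lo <= age <= hi:
                return f"{lo}-{hi}"
        return "unknown"

    return {
        "n": n,
        "ancestry": shares(r.ancestry for r in records),
        "sex": shares(r.sex for r in records),
        "age_band": shares(age_band(r.age) for r in records),
        "modality": shares(r.modality for r in records),
    }


# Illustrative placeholder records only.
demo = [
    SubjectRecord("A", "F", 25, "CT"),
    SubjectRecord("A", "M", 47, "CT"),
    SubjectRecord("A", "M", 61, "CBCT"),
    SubjectRecord("B", "F", 12, "ultrasound"),
]
report = composition_report(demo)
```

A summary of this kind makes under-represented groups visible before a model is trained or a reference table is applied; it does not by itself correct the imbalance.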

As summarised in Table 2, sources of error in forensic facial reconstruction are therefore not purely technical in nature. Rather, they emerge from the interaction between biological variability, limitations in available datasets, and uncertainty introduced by probabilistic modelling approaches themselves.

6. Legal and Ethical Perspectives on Digital Reconstructions

Legal and ethical assessment of AI-assisted forensic facial reconstruction requires a clear distinction between probabilistic reconstruction systems and automated facial recognition technologies used for biometric identification. Although both forensic reconstruction and biometric recognition attempt to associate evidence with individuals, they differ substantially in operational goals, interpretative frameworks, and evidentiary use [39]. Forensic facial reconstruction is generally intended to generate investigative leads or support public appeals based on skeletal remains or limited visual evidence, whereas facial recognition systems are designed to establish identity through automated comparison against known image databases. Their legal and ethical implications differ accordingly: reconstructed facial images used in public appeals should not be legally equated with automated biometric identification systems operating on one-to-many database matching.

Within the European Union and the United Kingdom, the legal regulation of biometric technologies is shaped primarily by the General Data Protection Regulation (GDPR). Article 9 of the GDPR restricts the processing of biometric data used for uniquely identifying individuals unless explicit consent or specific legal exemptions apply. However, reconstructed facial images occupy a more ambiguous legal position than conventional biometric templates, as they function primarily as probabilistic investigative approximations rather than direct automated identification tools. Existing forensic interpretation frameworks increasingly emphasize probabilistic reasoning and uncertainty assessment rather than binary identification conclusions [39]. In practice, legal exemptions more commonly apply to facial recognition and biometric surveillance systems used in criminal investigations, whereas the regulatory status of AI-assisted forensic facial reconstructions used for investigative or public appeal purposes remains less clearly defined across jurisdictions [40].

In April 2021, the European Commission introduced its broader strategy for regulating artificial intelligence through the proposed AI Act. Under this framework, AI systems used in law enforcement contexts are generally classified as “high-risk,” requiring safeguards related to accuracy, reliability, transparency, accountability, and human oversight [40]. From a forensic perspective, these concerns are particularly important because reconstructed facial images represent probabilistic outputs generated through statistical and modelling assumptions rather than direct representations of identity. Consequently, transparency regarding dataset composition, demographic assumptions, model limitations, uncertainty ranges, and expert oversight may become important components of methodological and governance evaluation in investigative or judicial contexts. Existing forensic AI frameworks likewise emphasize explainability, accountability, transparency, and reliability as prerequisites for operational deployment in forensic environments [25,39]. Critics have also argued that existing EU policy proposals remain insufficiently specific regarding biometric technologies and do not yet provide standardized operational frameworks governing the forensic use of AI-assisted facial reconstruction systems [41].

An additional challenge concerns the evidentiary admissibility of reconstructed facial images in judicial proceedings. Given the probabilistic and interpretative nature of forensic facial reconstruction, reconstructed images are more commonly discussed in the literature as investigative aids than as standalone identification evidence. Discussions concerning admissibility frequently emphasize factors such as methodological transparency, reproducibility, known error rates, expert oversight, and communication of uncertainty and limitations [39,42]. In jurisdictions such as the United States, admissibility of scientific and forensic evidence is frequently evaluated under standards derived from the Frye or Daubert doctrines, which place emphasis on methodological reliability, transparency, reproducibility, and scientifically grounded error assessment [43,44].

Within such frameworks, visually persuasive AI-generated reconstructions may present additional evidentiary concerns if their probabilistic nature, modelling assumptions, and methodological limitations are insufficiently communicated to courts or investigators. Importantly, visual similarity or recognition-based performance metrics alone should not be interpreted as direct indicators of identification reliability or courtroom evidentiary validity. Several reviewed forensic reconstruction systems demonstrated improved similarity-based performance metrics while simultaneously lacking sufficient forensic evaluation or explainability for judicial suitability [39]. Consequently, the evidentiary threshold required for courtroom admissibility of AI-assisted facial reconstructions should be considered substantially higher than the threshold applicable to their use as investigative aids or public appeal instruments.

By contrast, the United States has adopted a more fragmented regulatory approach. Facial recognition and biometric surveillance technologies are widely used by law enforcement agencies across multiple operational contexts [40,45]. Privacy regulation largely operates at the state level, with California often cited as one of the jurisdictions providing stronger consumer data protection mechanisms. The California Privacy Rights Act expanded consumer rights regarding personal data collection and established the California Privacy Protection Agency. Nevertheless, these reforms did not substantially restrict law enforcement access to facial recognition technologies. At the same time, some municipalities, including Berkeley and San Francisco, have introduced local restrictions or bans on the use of facial recognition systems by public authorities [40]. However, legal approaches specifically addressing AI-assisted forensic facial reconstruction remain comparatively underdeveloped.

Data protection and ethical governance remain central concerns in the implementation of AI-assisted forensic facial reconstruction. Since reconstructed images may be disseminated during investigations or public appeals, safeguards are necessary to limit unauthorized access, prevent misuse, and reduce the risk of misleading investigative conclusions. The increasing realism of generative reconstruction systems may also increase the likelihood that probabilistic facial approximations are interpreted by investigators, courts, or the public as definitive representations of identity. Photorealistic reconstructions may additionally introduce cognitive biases such as confirmation bias, anchoring effects, or contextual bias during investigative and judicial interpretation. Recent research on AI-generated image forensics notes that increasingly convincing synthetic imagery may undermine trust and complicate human interpretation of authenticity [29]. This creates additional ethical concerns regarding reputational harm, stigmatization of suspects, and the dignity of unidentified deceased individuals and their families [46].

Further ethical challenges emerge from controversial or poorly regulated applications of generative reconstruction technologies, including deepfake-like reconstructions, commercial use of facial data, and speculative historical reconstructions. While high-quality datasets and expert supervision may reduce some forms of error, current reconstruction methodologies remain subject to substantial uncertainty regarding predicted facial appearance. Several reconstruction approaches may unintentionally bias outputs toward the statistical properties of underlying reference models, potentially limiting forensic suitability [39]. Emerging governance discussions therefore increasingly emphasize the need for standardized documentation protocols, reconstruction audit trails, uncertainty disclosure requirements, demographic transparency, and human expert review prior to operational or public deployment of AI-assisted forensic facial reconstructions. Responsible AI frameworks for forensic science additionally advocate governance structures emphasizing accountability, explainability, transparency, fairness, and documented oversight procedures [40].

7. Toward Transparent and Interpretable AI Models

Interpretability and procedural transparency are critical requirements in AI-assisted forensic facial reconstruction (FFR), particularly because reconstruction outcomes remain probabilistic, inferential, and potentially influential in investigative and judicial contexts. Transparent analytical workflows enable forensic practitioners to critically evaluate how input data are processed, how reconstruction outputs are generated, and how uncertainty propagates throughout the reconstruction pipeline. These considerations are especially important given the increasing use of “black box” AI systems in forensic domains such as DNA mixture interpretation, facial recognition, and recidivism risk assessment, where algorithmic opacity has raised substantial concerns regarding transparency, reproducibility, evidentiary reliability, and independent scrutiny of computationally derived conclusions [46]. In the context of forensic facial reconstruction, these concerns are amplified by the highly persuasive visual realism of contemporary generative AI systems, which may unintentionally encourage overinterpretation of probabilistic reconstructions despite unresolved methodological and biological limitations.

To mitigate these risks, principles derived from Explainable AI (XAI) should be integrated into the design of AI-assisted forensic reconstruction systems. XAI encompasses methods intended to improve human understanding of machine learning outputs and analytical decision pathways [47]. In forensic reconstruction contexts, explainability mechanisms may assist experts in identifying modelling inconsistencies, evaluating uncertainty sources, and examining relationships between cranial morphology and reconstructed facial approximations. Potentially useful approaches include saliency or attention-map visualizations, uncertainty heatmaps, and model-traceability frameworks linking reconstruction outputs to specific anatomical or statistical parameters [48]. However, the forensic utility of such approaches remains limited. Attention maps and related post hoc visualizations do not necessarily reflect the true internal computational logic of deep neural networks and may therefore provide only simplified or partially misleading representations of model behaviour [49].

Importantly, many currently available XAI techniques provide only indirect or post hoc approximations of model behaviour rather than true mechanistic explanations of how generative systems produce specific reconstruction features [48,50,51]. In complex architectures such as GANs and diffusion models, reconstructed facial characteristics emerge from high-dimensional latent-space interactions that often cannot be directly traced to anatomically interpretable decision pathways [51,52]. Consequently, attention maps, saliency visualizations, and feature-attribution techniques may create an appearance of interpretability without necessarily providing genuine forensic traceability or causal explanation of reconstruction outputs [48,50].

Moreover, latent-space generative processes remain particularly difficult to audit in forensic contexts because visually plausible outputs may result from statistical correlations learned during training rather than biologically meaningful craniofacial relationships [48,52]. This creates a significant risk of “false interpretability,” whereby AI-generated reconstructions appear scientifically transparent despite limited ability to independently verify, reproduce, or causally explain specific generated facial features [50]. From a forensic perspective, such limitations raise important concerns regarding evidentiary reliability, expert scrutiny, and courtroom defensibility of AI-assisted reconstruction systems [25,48].

Beyond interpretability alone, future XAI-oriented forensic reconstruction systems may additionally incorporate uncertainty-aware modelling capable of explicitly communicating confidence estimates associated with reconstructed anatomical regions [30,32]. Such approaches could improve transparency by enabling experts and legal decision-makers to distinguish between relatively well-supported reconstruction features and regions characterized by high inferential uncertainty [32]. Nevertheless, uncertainty visualization should not be interpreted as equivalent to forensic validation, as visually interpretable confidence representations may still fail to reflect the true evidentiary reliability of reconstruction outputs [25,53].
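One simple way to picture region-level confidence reporting is to draw several reconstructions from a generative model and summarise how much each region varies across draws. The sketch below assumes that such samples are already available as dictionaries of predicted soft tissue thickness in millimetres; the region names, the values, and the 2.0 mm flagging threshold are hypothetical choices made for illustration, not parameters from the cited studies.

```python
from statistics import mean, pstdev


def landmark_uncertainty(samples, threshold_mm=2.0):
    """Summarise per-region spread across sampled reconstructions.

    samples: list of dicts mapping region name -> predicted thickness (mm).
    threshold_mm: hypothetical cut-off above which a region is flagged.
    """
    if len(samples) < 2:
        raise ValueError("need at least two sampled reconstructions")
    summary = {}
    for name in samples[0]:
        values = [s[name] for s in samples]
        spread = pstdev(values)  # population standard deviation across draws
        summary[name] = {
            "mean_mm": round(mean(values), 2),
            "sd_mm": round(spread, 2),
            "high_uncertainty": spread > threshold_mm,
        }
    return summary


# Illustrative values only: three draws for two hypothetical regions.
samples = [
    {"region_a": 12.0, "region_b": 20.0},
    {"region_a": 12.4, "region_b": 26.0},
    {"region_a": 11.6, "region_b": 17.0},
]
summary = landmark_uncertainty(samples)
```

The spread computed here reflects only disagreement among the model's own samples. A small spread can coexist with a large error against the true face, which is why such summaries support transparency but do not substitute for validation.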

Importantly, explainability alone cannot resolve the broader methodological limitations associated with AI-assisted forensic reconstruction. Reconstruction outputs remain strongly dependent on the quality and representativeness of training datasets, demographic variability, skull–face correspondence assumptions, imaging heterogeneity, and subjective interpretative decisions made throughout the analytical workflow. Moreover, the increasing photorealism achievable by diffusion and GAN-based systems may mask anatomical inaccuracies or synthetic artefacts, thereby increasing the persuasive impact of reconstructed images despite unresolved uncertainty [54,55]. For this reason, visual realism should not be conflated with forensic accuracy or identification reliability.

Accordingly, future forensic AI governance frameworks should prioritize auditability, reproducibility, and explicit uncertainty disclosure. Reconstruction workflows should incorporate standardized documentation and reporting procedures allowing independent forensic review and evidentiary scrutiny. Such protocols should include dataset provenance records, demographic assumptions, model versioning, reconstruction parameters, uncertainty estimates, confidence reporting, and documentation of all expert interventions performed during the reconstruction process [56]. Maintaining complete reconstruction audit trails may improve chain-of-custody integrity and facilitate independent assessment of analytical assumptions, methodological limitations, and reproducibility in adversarial legal settings. The allocation of responsibility for erroneous or misleading AI-assisted reconstructions remains legally and operationally unresolved, particularly in workflows involving commercial software, multidisciplinary expert teams, and partially automated decision pipelines. Additional safeguards could include mandatory disclosure that AI-generated reconstructions constitute investigative aids rather than biometric identification evidence, as well as independent expert review prior to operational or public release.
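A reconstruction audit trail of the kind described above could, for example, be kept as an append-only log in which each entry is hashed together with the hash of the previous entry, so that later alterations become detectable. The following is a minimal sketch under that assumption; the event names, field names, and identifiers are hypothetical, and an operational log would also need timestamps, operator identity, and secure storage.

```python
import copy
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash preceding the first entry


def _digest(body):
    # Canonical JSON (sorted keys) so the same content always hashes identically.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(trail, event, details):
    """Append an entry whose hash covers its content and the previous hash."""
    prev_hash = trail[-1]["hash"] if trail else GENESIS
    body = {"index": len(trail), "event": event, "details": details, "prev_hash": prev_hash}
    trail.append({**body, "hash": _digest(body)})
    return trail


def verify(trail):
    """Return True only if every entry is unmodified and correctly chained."""
    prev_hash = GENESIS
    for i, entry in enumerate(trail):
        body = {k: entry[k] for k in ("index", "event", "details", "prev_hash")}
        if entry["index"] != i or entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical workflow: model provenance, run parameters, expert intervention.
trail = []
append_entry(trail, "model_loaded", {"model_version": "hypothetical-1.0", "dataset_id": "hypothetical-set"})
append_entry(trail, "reconstruction_run", {"seed": 42, "assumed_sex": "F", "assumed_age_band": "18-39"})
append_entry(trail, "expert_intervention", {"expert": "examiner-01", "note": "adjusted nasal profile"})

# Simulate an after-the-fact change to a recorded demographic assumption.
tampered = copy.deepcopy(trail)
tampered[1]["details"]["assumed_sex"] = "M"

intact_ok = verify(trail)
tamper_detected = not verify(tampered)
```

Hash chaining only shows that a record was changed after the fact; it does not establish that the recorded assumptions were appropriate, which remains a matter for expert review.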

These considerations are particularly important because reconstructed facial approximations may have different legal implications depending on their intended use. Reconstructions disseminated publicly to generate investigative leads differ substantially from systems used for automated facial comparison or identity determination, both in terms of evidentiary function and applicable legal safeguards. AI-generated reconstructions intended for investigative dissemination may require different evidentiary safeguards than visual materials formally introduced as courtroom exhibits. AI-assisted forensic facial reconstruction should therefore continue to be understood primarily as an investigative support methodology intended to generate reconstructive hypotheses and assist investigative prioritization rather than as an autonomous identification system or a substitute for corroborating forensic evidence.

From a legal perspective, the admissibility of AI-assisted reconstructions may ultimately depend on whether reconstruction methodologies satisfy broader standards of scientific reliability, transparency, reproducibility, and expert interpretability applied within individual jurisdictions. Courts may additionally face challenges associated with the potentially prejudicial or overly persuasive character of photorealistic AI-generated reconstructions, particularly when uncertainty limitations are insufficiently disclosed. Consequently, transparent reporting of methodological assumptions, uncertainty ranges, dataset limitations, and model constraints should be regarded as essential prerequisites for responsible forensic and judicial use. Accordingly, fully autonomous AI-driven forensic facial reconstruction without meaningful human expert oversight should presently be regarded as incompatible with responsible forensic governance and evidentiary best practices.

Future research should therefore focus not only on improving visual realism, but also on developing probabilistic reconstruction frameworks, uncertainty quantification methods, cross-population validation studies, benchmark forensic datasets, and standardized evaluation protocols capable of supporting scientifically robust and legally defensible forensic applications of AI-assisted facial reconstruction [51,52].

Accordingly, explainability mechanisms in forensic AI should be interpreted primarily as tools supporting partial procedural transparency and expert review rather than as guarantees of scientific validity, evidentiary reliability, causal interpretability, or forensic admissibility [51,52].

8. Future Directions

8.1. Hybrid Expert–AI Frameworks and Standardization

In recent years, research on facial reconstruction using deep learning has increasingly highlighted the importance of combining artificial intelligence with domain expertise, particularly that of anthropologists and forensic practitioners. Generative models, including those based on GAN architectures or domain translation techniques (e.g., skull-to-face mapping), are now capable of generating visually plausible reconstruction outputs. Despite these advances, their outputs still require substantive evaluation and correction by specialists [56]. Experimental studies indicate that GAN-based systems and cyclic translation models (such as CycleGAN) can generate visually realistic facial approximations from skull-derived input data; however, these outputs remain probabilistic reconstructions rather than validated representations of true facial appearance. Consequently, such reconstructions must be interpreted cautiously and within the context of anatomical, anthropological, and biological expertise, which makes expert involvement indispensable [57]. Machine learning models developed in 2025 further extend these capabilities by enabling probabilistic estimation of selected facial dimensions from dental data and skeletal morphology. This supports anthropologists in rapidly generating multiple facial variants, effectively accelerating the initial stages of reconstruction [58]. The availability of automatically generated variants allows experts to focus on evaluating and refining the most plausible reconstructions, thereby facilitating comparative assessment of multiple reconstruction hypotheses [58]. Moreover, combining computational predictions with biological and clinical knowledge supports the analysis of anatomical relationships that may be difficult for either human observers or AI systems to detect alone. This underscores the complementary nature of hybrid approaches, where neither component is sufficient in isolation [57]. As a result, a hybrid framework, in which AI generates candidate reconstructions and experts assess their anatomical and biological plausibility, is becoming increasingly central to the development of effective reconstruction tools. Within this perspective, AI systems in forensic medicine should be treated primarily as decision-support tools rather than replacements for expert judgment [59].

8.2. Standardization: The Need for Consistent Protocols

A significant limitation in applying AI to facial reconstruction and forensic science more broadly is the lack of standardization. The literature consistently emphasizes that there are currently no universally accepted evaluation protocols or minimal methodological requirements for assessing reconstruction quality and evidentiary value. While AI is increasingly used as a support tool, its forensic applicability depends on the establishment of validation frameworks that support scientific reliability and improve legal defensibility [59]. This issue is closely related to the fragmentation of benchmarks and evaluation procedures. Different studies rely on diverse datasets, metrics, and training strategies, making it difficult to compare results and identify methodologically robust and reproducible approaches. Similar challenges have been reported in related fields, such as AI-generated content detection, where inconsistencies in evaluation protocols limit comparability and slow methodological progress [60]. Attempts to address these challenges can be observed in adjacent areas of forensic AI. For instance, integrated platforms such as ForensicHub have been proposed to standardize datasets, models, and evaluation metrics, thereby enabling more consistent comparisons between methods. Although such frameworks are not specifically designed for facial reconstruction, they illustrate a broader trend toward interoperability and standardization in forensic research [60]. In addition to standardization, the development of clear validation and interpretability protocols remains essential. Evaluation should not be limited to predictive accuracy but should also include transparency, robustness, and the extent to which expert users can understand and critically assess AI-generated outputs. This is particularly important in forensic contexts, where the credibility of results depends on both performance and interpretability [59].

Future standardization efforts should additionally establish harmonized validation frameworks incorporating benchmark datasets, minimum demographic reporting standards, cross-population testing procedures, standardized morphometric error metrics, uncertainty scoring protocols, and reproducibility assessments across independent reconstruction teams. The development of such technical validation protocols may improve inter-study comparability, facilitate independent forensic evaluation, and support more robust assessment of the evidentiary reliability of AI-assisted facial reconstruction systems.
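As an illustration of what a standardized morphometric error metric with cross-population reporting might look like, the sketch below computes the mean absolute landmark error per case and averages it within each demographic group. The group labels and measurements are invented placeholders, and the choice of mean absolute error is an assumption made here rather than a metric prescribed by the cited literature.

```python
from collections import defaultdict
from statistics import mean


def error_by_group(cases):
    """Mean absolute landmark error (mm) averaged within each group.

    cases: iterable of (group_label, predicted_mm, measured_mm), where the
    two lists hold thickness values at the same landmarks in the same order.
    """
    per_group = defaultdict(list)
    for group, predicted, measured in cases:
        if len(predicted) != len(measured):
            raise ValueError("predicted and measured landmark counts differ")
        case_mae = mean([abs(p - m) for p, m in zip(predicted, measured)])
        per_group[group].append(case_mae)
    return {g: round(mean(v), 2) for g, v in sorted(per_group.items())}


# Illustrative placeholder cases only.
cases = [
    ("group_1", [10.0, 12.0], [11.0, 12.0]),
    ("group_1", [8.0, 9.0], [8.0, 10.0]),
    ("group_2", [10.0, 12.0], [13.0, 15.0]),
]
result = error_by_group(cases)
# Largest difference in mean error between any two groups.
gap = max(result.values()) - min(result.values())
```

Reporting the per-group values together with the gap between the best and worst group would make population-dependent performance visible, instead of hiding it inside a single pooled figure.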

8.3. Development of Global Datasets and the Challenge of Diversity

Another key issue concerns the limited diversity of datasets used in facial reconstruction research. Studies in computer vision consistently show that model performance and generalization are strongly influenced by the representativeness of the training data, especially with respect to ethnicity, age, and population characteristics. Many commonly used datasets remain dominated by specific groups, often consisting primarily of White individuals from the United States and the United Kingdom, and lack sufficient demographic annotation [61]. Addressing this imbalance requires developing more representative datasets that capture a wider range of human variation. Projects such as 3D2M aim to provide 3D mesh datasets covering dozens of ethnic groups, thereby supporting more inclusive research in facial reconstruction and analysis [62]. Such initiatives may help reduce demographic bias and improve model generalizability across populations in forensic contexts. The importance of diversity is also evident in applications such as age estimation and facial classification, where imbalanced datasets can lead to systematic errors. Research on demographic balancing demonstrates that incorporating age- and ethnicity-diverse data may improve demographic robustness and support broader generalization across datasets [63]. In this context, synthetic data generation has emerged as an additional strategy. Models such as StyleGAN2 enable the creation of demographically balanced datasets that can complement real-world data and partially mitigate existing demographic imbalances [64]. Furthermore, age-structured datasets—covering different stages of life—allow for the analysis of aging processes and support applications such as age progression in facial reconstruction [65].
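Where collecting additional data is not immediately possible, one common and generic mitigation is to reweight training samples so that each demographic group contributes equally in aggregate. The sketch below shows inverse-frequency weighting; it is offered as a generic illustration, not as the balancing procedure used in the cited studies, and the group labels are placeholders.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-sample weights such that every group has the same total weight.

    Each sample in group g receives total / (n_groups * count[g]), so the
    weights sum to the number of samples.
    """
    if not labels:
        raise ValueError("no labels provided")
    counts = Counter(labels)
    n_groups = len(counts)
    total = len(labels)
    return [total / (n_groups * counts[g]) for g in labels]


# Illustrative imbalance: three samples from group "a", one from group "b".
labels = ["a", "a", "a", "b"]
weights = inverse_frequency_weights(labels)
```

Reweighting cannot add morphological variation that the minority samples do not contain, so it complements rather than replaces the collection of more representative data.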

8.4. Integration with Other Techniques

Future developments in facial reconstruction are likely to depend on integrating AI with complementary analytical and measurement techniques. One important direction involves combining deep learning with 3D morphometric approaches, which provide anatomically grounded representations of craniofacial structures. The use of statistical shape models and landmark-based methods allows for more anatomically constrained prediction of soft-tissue features while maintaining anatomical consistency [39]. Tools such as FaceDig facilitate this process by automating the placement of morphometric landmarks, thereby increasing both efficiency and reproducibility [39,66]. Advances in data acquisition technologies also play a significant role. High-resolution 3D scanning and LiDAR-based methods enable precise capture of skull geometry, thereby improving input data quality and potentially reducing selected sources of reconstruction uncertainty [39]. Another promising area involves the incorporation of biomechanical models of soft tissues. Approaches such as PhysSFI-Net simulate the behavior of muscles and skin, supporting anatomically informed modelling of soft-tissue deformation and facial surface reconstruction. These methods are particularly relevant in applications requiring high anatomical precision, such as surgical planning [67]. Increasingly, these techniques are combined within multimodal frameworks that integrate geometric, imaging, and biological data. Such approaches may improve the stability, interpretability, and demographic robustness of reconstruction modelling across populations and are widely regarded as a key direction for future development [39,67].

8.5. The Potential of Future Forensic Medicine

The ongoing development of AI-based reconstruction systems, particularly those capable of integrating multiple data modalities, may support faster generation and comparative assessment of investigative reconstruction hypotheses. Emerging approaches combine craniofacial morphology with genetic data through forensic DNA phenotyping, enabling the prediction of visible traits such as eye color, skin pigmentation, and age, although such predictions remain probabilistic and trait-dependent [68]. Frameworks such as SPOT-Face further extend these capabilities by enabling comparisons between skull and facial features using neural networks and optimal transport methods, representing a step toward more automated comparative analysis workflows within forensic investigations. However, despite these advances, careful validation and expert oversight remain essential, especially in cases involving the prediction of phenotypic traits from genetic information [69]. Overall, future progress in forensic facial reconstruction will depend on integrating AI with expert knowledge while maintaining transparency, uncertainty disclosure, and human expert oversight, alongside the development of standardized methodologies, diverse datasets, and multimodal analytical frameworks. Only through such a combined approach can these systems achieve both scientific reliability and practical applicability in forensic contexts.

9. Discussion

This review demonstrates that forensic facial reconstruction (FFR) has undergone a fundamental transition from expert-driven, anatomically informed approximation toward data-driven computational modeling integrating deep learning, 3D morphometrics, and multimodal data sources. Although this shift has clearly improved visual realism and enabled greater scalability, it has not eliminated the core inferential limitations associated with reconstructing facial appearance from skeletal remains [13,16,39].

A central issue emerging from this analysis concerns the non-deterministic relationship between cranial morphology and facial appearance. Deep learning architectures—including convolutional neural networks, generative adversarial networks, variational autoencoders, and diffusion models—learn statistical associations rather than biologically causal relationships. Consequently, reconstructed faces should not be interpreted as definitive representations of an individual’s appearance, but rather as probabilistic estimates shaped by the structure and limitations of the training data. This distinction becomes especially important in cases where traits are only weakly, or not at all, encoded in skeletal structures, where the margin of uncertainty is inherently higher [17,22,51].

At the same time, it is important to note that improvements in visual fidelity do not automatically translate into increased forensic utility. Quantitative geometric accuracy—typically evaluated using surface deviation metrics against CT-derived ground truth models—offers valuable technical insight, yet it does not fully capture how reconstructions perform in practice. Recognition-based studies suggest that even relatively minor geometric discrepancies can meaningfully influence identification outcomes. In this sense, the forensic value of facial reconstruction emerges from a more complex interplay between morphometric precision, perceptual recognizability, and the interpretive context in which the image is used, highlighting the need for multi-layered validation frameworks [26,27].
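A surface deviation metric of the type mentioned above can be sketched as the root-mean-square of nearest-neighbour distances from points on the reconstructed surface to points on the reference surface. The version below is a deliberately simple, brute-force illustration: it assumes both surfaces are already registered in the same coordinate frame and sampled as point lists, and the coordinates are placeholders.

```python
import math


def rms_surface_deviation(reconstructed, reference):
    """RMS nearest-neighbour distance from reconstructed to reference points.

    Both arguments are sequences of (x, y, z) tuples in the same units and
    coordinate frame. The measure is one-directional and O(n * m).
    """
    if not reconstructed or not reference:
        raise ValueError("both point sets must be non-empty")
    squared = []
    for p in reconstructed:
        nearest = min(math.dist(p, q) for q in reference)
        squared.append(nearest ** 2)
    return math.sqrt(sum(squared) / len(squared))


# Illustrative placeholder coordinates (arbitrary units).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
reconstructed = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 2.0)]
deviation = rms_surface_deviation(reconstructed, reference)
```

A single global number of this kind averages over the whole face, so a reconstruction can score well while still deviating in the regions that matter most for recognition, which is consistent with the need for multi-layered validation.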

The review also indicates that dataset composition remains the primary source of error in AI-assisted reconstruction. Differences in facial soft tissue thickness across ancestry, sex, and age introduce systematic biases when models are trained on datasets that lack sufficient representation. These biases are not incidental; rather, they become embedded in the model’s behavior and can lead to consistent distortions when applied to underrepresented groups. As a result, further progress in reconstruction reliability may depend less on increasing algorithmic sophistication and more on improving the diversity and completeness of training data [26,29,61].
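One simple way to make such dataset-driven bias visible is to stratify reconstruction error by demographic group instead of reporting a single pooled figure. The sketch below assumes that a per-case error (for example, a mean surface deviation in millimetres) and a group label are already available for each validation case; the group names, the error values, and the worst-to-best ratio used as a disparity summary are all hypothetical choices for illustration, not a standard from the reviewed literature.

```python
from collections import defaultdict
from statistics import mean


def error_by_group(records):
    """Aggregate per-case errors by group label.

    `records` is an iterable of (group_label, error_mm) pairs.
    Returns {group_label: {"n": case count, "mean_error_mm": mean error}}.
    """
    grouped = defaultdict(list)
    for group, error_mm in records:
        grouped[group].append(error_mm)
    return {
        group: {"n": len(errors), "mean_error_mm": mean(errors)}
        for group, errors in grouped.items()
    }


def disparity_ratio(summary):
    """Ratio of the worst to the best group mean error (1.0 means parity)."""
    means = [stats["mean_error_mm"] for stats in summary.values()]
    return max(means) / min(means)


# Invented toy values, for illustration only (not results from any study).
records = [
    ("group_A", 2.0), ("group_A", 2.5), ("group_A", 1.5),
    ("group_B", 4.0), ("group_B", 5.0),
]

summary = error_by_group(records)
ratio = disparity_ratio(summary)
print(summary, ratio)
```

Reporting the case count alongside each group mean matters here, because a group with very few validation cases yields an unstable estimate even when its mean error looks acceptable.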

A related concern involves reproducibility and methodological transparency. Traditional reconstruction methods are already subject to inter-expert variability, but the introduction of deep learning adds another layer of complexity due to limited interpretability. The “black box” nature of many generative models makes it difficult to trace errors or justify specific outputs, raising questions about evidentiary reliability in forensic settings. In this context, the development and integration of explainable AI (XAI) techniques, such as attention mapping or interpretable feature attribution, become particularly important for ensuring auditability and maintaining expert oversight [24,47,51].

Legal and ethical considerations further complicate the practical use of AI-driven facial reconstruction. Existing regulatory frameworks, including the GDPR and emerging AI legislation, treat biometric and law-enforcement applications of this kind as high-risk or specially protected, but they do not yet provide sufficiently detailed, domain-specific guidelines for validation, accountability, or admissibility in forensic contexts. Questions related to biometric data protection, the risk of misidentification, and the public dissemination of reconstructed images therefore remain only partially addressed and require more clearly defined standards and protocols [9,34,39].

Taken together, these findings support a hybrid operational model in which AI systems are used as decision-support tools rather than fully autonomous reconstruction engines. Within such a framework, generative models can be understood as tools for generating a range of plausible facial hypotheses, which forensic experts then critically assess and refine. This approach helps to limit the impact of algorithmic bias, reduces the risk of overinterpreting model outputs, and preserves the central role of expert judgment in evaluating anatomical plausibility and uncertainty [40,58,59].

Looking ahead, further advances in forensic facial reconstruction are likely to depend on integrative approaches rather than purely algorithmic improvements. Combining deep learning with 3D morphometrics, biomechanical modeling of soft tissues, and forensic DNA phenotyping may offer more biologically grounded reconstructions. Even so, these methods remain probabilistic and vary in predictive reliability depending on the trait in question, underscoring the importance of explicitly modeling and communicating uncertainty throughout the reconstruction process [28,39,67].

Future forensic validation frameworks may therefore require standardized probabilistic reporting protocols and quantitative uncertainty estimation procedures analogous to confidence reporting practices used in other forensic and biomedical domains [27,39,56].
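A minimal sketch of what such probabilistic reporting could look like is given below. It assumes that a generative model has already produced an ensemble of candidate reconstructions, each reduced to predicted soft tissue depths at named landmarks, and it reports a median with an empirical interval per landmark. The landmark selection, the 90% coverage level, the interpolation rule, and the simulated values are illustrative assumptions, not an established forensic protocol.

```python
import random


def percentile(sorted_values, fraction):
    """Linearly interpolated percentile of an ascending list (fraction in [0, 1])."""
    position = fraction * (len(sorted_values) - 1)
    low = int(position)
    high = min(low + 1, len(sorted_values) - 1)
    weight = position - low
    return sorted_values[low] * (1.0 - weight) + sorted_values[high] * weight


def landmark_report(candidates, coverage=0.90):
    """Summarise an ensemble of candidate reconstructions per landmark.

    `candidates` is a list of dicts mapping landmark name -> predicted
    soft tissue depth (mm). Returns, per landmark, the median and the
    bounds of a central empirical interval with the requested coverage.
    """
    tail = (1.0 - coverage) / 2.0
    report = {}
    for name in candidates[0]:
        values = sorted(candidate[name] for candidate in candidates)
        report[name] = {
            "lower": percentile(values, tail),
            "median": percentile(values, 0.5),
            "upper": percentile(values, 1.0 - tail),
        }
    return report


# Simulated ensemble standing in for model samples (hypothetical numbers).
rng = random.Random(42)
candidates = [
    {"nasion": rng.gauss(6.0, 0.8), "pogonion": rng.gauss(10.0, 1.5)}
    for _ in range(200)
]

for name, stats in landmark_report(candidates).items():
    print(f"{name}: {stats['median']:.1f} mm "
          f"({stats['lower']:.1f} to {stats['upper']:.1f} mm)")
```

An interval of this kind describes only the spread of the model's own outputs. It does not capture error arising from unrepresentative training data, so it would need to be calibrated against external ground truth before being interpreted as a confidence statement.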

Collectively, the reviewed AI architectures reveal a consistent trade-off between visual realism, interpretability, computational complexity, and forensic reliability. CNN-based approaches generally provide greater procedural stability and more interpretable feature extraction, but remain limited in generative flexibility. GANs and diffusion models produce substantially more photorealistic reconstructions; however, their outputs are more vulnerable to hallucinated features, reduced traceability, and limited biological interpretability [63,64]. VAEs occupy an intermediate position by enabling probabilistic modeling of facial variability, although often at the cost of lower visual fidelity. Importantly, no currently available architecture simultaneously satisfies the key forensic requirements of reproducibility, demographic robustness, interpretability, and large-scale external validation. Consequently, the current forensic value of AI-assisted facial reconstruction may depend less on maximizing visual realism and more on developing transparent hybrid frameworks capable of integrating probabilistic modeling with expert-driven anatomical evaluation [17,22,51,58].

10. Conclusions

Forensic facial reconstruction currently operates within a constrained inferential framework defined by incomplete biological observability, demographic imbalance in training data, and limited model interpretability. Advances in deep learning have significantly improved visual fidelity and operational efficiency; however, they have not eliminated the fundamental epistemic uncertainty underlying the mapping between skeletal remains and facial appearance. Future progress will therefore depend less on increasing algorithmic complexity and more on establishing standardized, demographically balanced datasets, transparent validation protocols, and hybrid expert–AI frameworks that explicitly model and communicate uncertainty.

Author Contributions

Conceptualization, A.F. and J.B. (Jakub Banaszek); investigation, B.B., D.B., A.O., M.B. and J.B. (Jakub Banaszek); resources, B.B., D.B., A.O., M.B. and J.B. (Jakub Banaszek); data curation, B.B., D.B., A.O., M.B. and J.B. (Jakub Banaszek); writing—original draft preparation, B.B., D.B., A.O., M.B. and J.B. (Jakub Banaszek); writing—review and editing, J.B. (Jacek Baj), A.F., P.Z. and G.T.; visualization, A.F. and J.B. (Jacek Baj); supervision, G.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

3D	three-dimensional
3D2M	3D2M project; a large-scale 3D mesh facial dataset covering multiple ethnic groups, used to improve demographic diversity and reduce bias in AI-based facial reconstruction
AI	artificial intelligence
AUC	area under the curve
BMI	body mass index
CAD	computer-aided design
CCFR	computer-based facial reconstruction
CNN	convolutional neural network
CT	computed tomography
CycleGAN	cycle-consistent generative adversarial network
DNA	deoxyribonucleic acid
EU	European Union
FDP	forensic DNA phenotyping
FFR	forensic facial reconstruction
FRT	facial reconstruction technology
FSTT	facial soft tissue thickness
GAN	generative adversarial network
GDPR	General Data Protection Regulation
LiDAR	light detection and ranging
NN	neural network
SPOT-Face	neural network-based skull–face comparison framework
UK	United Kingdom
US	United States
VAE	variational autoencoder
XAI	explainable artificial intelligence

References

Verzé, L. History of facial reconstruction. Acta Biomed. 2009, 80, 5–12.
Guleria, A.; Krishan, K.; Sharma, V.; Kanchan, T. Methods of forensic facial reconstruction and human identification: Historical background, significance, and limitations. Sci. Nat. 2023, 110, 8.
Ullrich, H.; Stephan, C.N. On Gerasimov’s plastic facial reconstruction technique: New insights to facilitate repeatability. J. Forensic Sci. 2011, 56, 470–474.
De Greef, S.; Willems, G. Three-dimensional cranio-facial reconstruction in forensic identification: Latest progress and new tendencies in the 21st century. J. Forensic Sci. 2005, 50, 12–17.
Claes, P.; Vandermeulen, D.; De Greef, S.; Willems, G.; Suetens, P. Craniofacial reconstruction using a combined statistical model of face shape and soft tissue depths: Methodology and validation. Forensic Sci. Int. 2006, 159, S147–S158.
Shrimpton, S.; Daniels, K.; de Greef, S.; Tilotta, F.; Willems, G.; Vandermeulen, D.; Suetens, P.; Claes, P. A spatially-dense regression study of facial form and tissue depth: Towards an interactive tool for craniofacial reconstruction. Forensic Sci. Int. 2014, 234, 103–110.
Berar, M.; Tilotta, F.M.; Glaunès, J.A.; Rozenholc, Y. Craniofacial reconstruction as a prediction problem using a Latent Root Regression model. Forensic Sci. Int. 2011, 210, 228–236.
Thiemann, N.; Keil, V.; Roy, U. In vivo facial soft tissue depths of a modern adult population from Germany. Int. J. Leg. Med. 2017, 131, 1455–1488.
Adegbite, N.; Mura, M.; Shafiu, H.; Avery, C.; Ahmed, W. Forensic facial reconstruction: A computer tomography study of facial soft tissue thickness in Nigerian adult male multi-ethnic population. Int. J. Leg. Med. 2025, 139, 1953–1970.
Bulut, O.; Sipahioglu, S.; Hekimoglu, B. Facial soft tissue thickness database for craniofacial reconstruction in the Turkish adult population. Forensic Sci. Int. 2014, 242, 44–61.
Moritsugui, D.S.; Fugiwara, F.V.G.; Vassallo, F.N.S.; Mazzilli, L.E.N.; Beaini, T.L.; Melani, R.F.H. Facial soft tissue thickness in forensic facial reconstruction: Impact of regional differences in Brazil. PLoS ONE 2022, 17, e0270980.
Shui, W.; Zhou, M.; Deng, Q.; Wu, Z.; Ji, Y.; Li, K.; He, T.; Jiang, H. Densely calculated facial soft tissue thickness for craniofacial reconstruction in Chinese adults. Forensic Sci. Int. 2016, 266, 573.e1–573.e12.
Thurzo, A.; Kosnáčová, H.S.; Kurilová, V.; Kosmeľ, S.; Beňuš, R.; Moravanský, N.; Kováč, P.; Kuracinová, K.M.; Palkovič, M.; Varga, I. Use of Advanced Artificial Intelligence in Forensic Medicine, Forensic Anthropology and Clinical Anatomy. Healthcare 2021, 9, 1545.
Asghar, N.; Noreen, S.; Javed, U.; Ali, F. Advancements in craniofacial reconstruction: Approaches and applications in forensics. Forensic Sci. Med. Pathol. 2025, 21, 1863–1879.
Tapuskovic, T.; Nenezic, D.; Radojevic, N.; Dedeic, R. Anthropological and forensic significance of facial soft tissue thickness in Montenegrin population. Leg. Med. 2024, 71, 102537.
Navic, P.; Inthasan, C.; Chaimongkhol, T.; Mahakkanukrauh, P. Facial reconstruction using 3-D computerized method: A scoping review of Methods, current Status, and future developments. Leg. Med. 2023, 62, 102239.
Sordo, Z.; Chagnon, E.; Hu, Z.; Donatelli, J.J.; Andeer, P.; Nico, P.S.; Northen, T.; Ushizima, D. Synthetic Scientific Image Generation with VAE, GAN, and Diffusion Model Architectures. J. Imaging 2025, 11, 252.
VanRullen, R.; Reddy, L. Reconstructing faces from fMRI patterns using deep generative neural networks. Commun. Biol. 2019, 2, 193.
Haider, S.A.; Prabha, S.; Gomez-Cabello, C.A.; Borna, S.; Pressman, S.M.; Genovese, A.; Trabilsy, M.; Galvao, A.; Aziz, K.T.; Murray, P.M.; et al. A Validity Analysis of Text-to-Image Generative Artificial Intelligence Models for Craniofacial Anatomy Illustration. J. Clin. Med. 2025, 14, 2136.
Nie, D.; Trullo, R.; Lian, J.; Wang, L.; Petitjean, C.; Ruan, S.; Wang, Q.; Shen, D. Medical Image Synthesis with Deep Convolutional Adversarial Networks. IEEE Trans. Biomed. Eng. 2018, 65, 2720–2730, Erratum in IEEE Trans. Biomed. Eng. 2020, 67, 2706. https://doi.org/10.1109/TBME.2020.3006296.
Celard, P.; Iglesias, E.L.; Sorribes-Fdez, J.M.; Romero, R.; Vieira, A.S.; Borrajo, L. A survey on deep learning applied to medical images: From simple artificial neural networks to generative models. Neural Comput. Appl. 2023, 35, 2291–2323.
Müller-Franzes, G.; Niehues, J.M.; Khader, F.; Arasteh, S.T.; Haarburger, C.; Kuhl, C.; Wang, T.; Han, T.; Nolte, T.; Nebelung, S.; et al. A multimodal comparison of latent denoising diffusion probabilistic models and generative adversarial networks for medical image synthesis. Sci. Rep. 2023, 13, 12098.
Wang, Z.; Xu, M.; Xu, M.; Ma, H.; Zhao, J.; Li, X.; Zhu, X.; Lei, Z. BFSM: 3D Bidirectional Face-Skull Morphable Model for Forensic Reconstruction. arXiv 2025.
Gietzen, T.; Brylka, R.; Achenbach, J.; Hebel, K.Z.; Schömer, E.; Botsch, M.; Schwanecke, U.; Schulze, R. A Method for Automatic Forensic Facial Reconstruction Based on Dense Statistics of Soft Tissue Thickness. arXiv 2018.
Liang, Y.; Zhang, C.; Zhao, J.; Wang, W.; Li, X. Skull-to-Face: Anatomy-Guided 3D Facial Reconstruction and Editing. arXiv 2024.
Miranda, G.E.; Wilkinson, C.; Roughley, M.; Beaini, T.L.; Melani, R.F.H. Assessment of accuracy and recognition of three-dimensional computerized forensic craniofacial reconstruction. PLoS ONE 2018, 13, e0196770.
Wilkinson, C.; Liu, C.Y.J.; Shrimpton, S.; Greenway, E. Craniofacial identification standards: A review of reliability, reproducibility, and implementation. Forensic Sci. Int. 2024, 359, 111993.
Li, Y.; Tian, Y.; Huang, Y.; Lu, W.; Wang, S.; Lin, W.; Rocha, A. FakeScope: Large Multimodal Expert Model for Transparent AI-Generated Image Forensics. arXiv 2025.
Zou, K.; Chen, Z.; Yuan, X.; Shen, X.; Wang, M.; Fu, H. A Review of Uncertainty Estimation and its Application in Medical Imaging. arXiv 2023.
Jungo, A.; Reyes, M. Assessing Reliability and Challenges of Uncertainty Estimations for Medical Image Segmentation. Med. Image Anal. 2022, 80, 102532.
Lambert, B.; Forbes, F.; Doyle, S.; Dehaene, H.; Dojat, M. Trustworthy clinical AI solutions: A unified review of uncertainty quantification in deep learning models for medical image analysis. Artif. Intell. Med. 2024, 150, 102830.
Stacey, J.; Fleming, R.; Sheppard, D.; Sheppard, J.; Dobbie, G.; Karunakaran, D. A responsible artificial intelligence framework for forensic science. Forensic Sci. Int. 2025, 375, 112548.
Chaimongkhol, T.; Navic, P.; Sinthubua, A.; Palee, P.; Pattamapaspong, N.; Prasitwattanaseree, S.; Charuakkra, A. Utility of 3D facial reconstruction for forensic identification: A focus on facial soft tissue thickness and customized techniques. Forensic Sci. Med. Pathol. 2025, 21, 1112–1126.
Park, E.; Chang, J.; Park, J. Facial Soft Tissue Thickness Differences among Three Skeletal Classes in Korean Population Using CBCT. Int. J. Environ. Res. Public Health 2023, 20, 2658.
Schneider, P.M.; Prainsack, B.; Kayser, M. The Use of Forensic DNA Phenotyping in Predicting Appearance and Biogeographic Ancestry. Dtsch. Arztebl. Int. 2019, 51–52, 873–880.
Briers, N.; Briers, T.M.; Becker, P.J.; Steyn, M. Soft tissue thickness values for black and coloured South African children aged 6–13 years. Forensic Sci. Int. 2015, 252, 188.e1–188.e10.
Gibelli, D.; Collini, F.; Porta, D.; Zago, M.; Dolci, C.; Cattaneo, C.; Sforza, C. Variations of midfacial soft-tissue thickness in subjects aged between 6 and 18 years for the reconstruction of the profile: A study on an Italian sample. Leg. Med. 2016, 22, 68–74.
Swift, L.; Obertova, Z.; Franklin, D. Demonstrating the empirical effect of population specificity of anthropological standards in a contemporary Australian population. Int. J. Leg. Med. 2024, 138, 537–545.
La Cava, S.M.; Orrù, G.; Drahansky, M.; Marcialis, G.L.; Roli, F. 3D Face Reconstruction: The Road to Forensics. ACM Comput. Surv. 2023, 56, 77.
Almeida, D.; Shmarko, K.; Lomas, E. The ethics of facial recognition technologies, surveillance, and accountability in an age of artificial intelligence: A comparative analysis of US, EU, and UK regulatory frameworks. AI Ethics 2022, 2, 377–387.
Wiewiórowski, W. Artificial Intelligence Act: A Welcomed Initiative, but Ban on Remote Biometric Identification in Public Space Is Necessary, Press Release; European Commission: Brussels, Belgium, 2021; Available online: https://edps.europa.eu/system/files/2021-04/EDPS-2021-09-Artificial-Intelligence_EN.pdf (accessed on 23 April 2021).
European Commission. Proposal for a Regulation of the European Parliament and of the Council Laying down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act) and Amending Certain Union Legislative Acts; European Commission: Brussels, Belgium, 2021.
Pergolizzi, J.; LeQuang, J.A.K. Black Robes and White Coats: Daubert Standard and Medical and Legal Considerations for Medical Expert Witnesses. Cureus 2024, 16, e69346.
Kopitnik, N.L.; Nouhan, P.P. Expert Witness. In StatPearls [Internet]; StatPearls Publishing: Treasure Island, FL, USA, 2026.
Stiernströmer, E. Facial recognition technology in law enforcement-a scoping review of existing empirical studies. Police Pract. Res. 2026, 1–25.
Gasiokwu, P.I.; Oyibodoro, U.G.; Nwabuoku, M.O.I. GDPR Safeguards for Facial Recognition Technology: A Critical Analysis. Int. Res. J. Multidiscip. Scope 2025, 6, 407–423.
Solanke, A. Explainable digital forensics AI: Towards mitigating distrust in AI-based digital forensics analysis using interpretable models. Forensic Sci. Int. Digit. Investig. 2022, 42, 301403.
Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215.
Ahmed, F.; Naz, N.S.; Khan, S.; Rehman, A.U.; Ismael, W.M.; Khan, M.A. Explainable artificial intelligence (XAI) in medical imaging: A systematic review of techniques, applications, and challenges. BMC Med. Imaging 2026, 26, 37.
Ghassemi, M.; Oakden-Rayner, L.; Beam, A.L. The false hope of current approaches to explainable artificial intelligence in health care. Lancet Digit. Health 2021, 3, e745–e750.
Doshi-Velez, F.; Kim, B. Towards A Rigorous Science of Interpretable Machine Learning. arXiv 2017.
Koh, P.W.; Nguyen, T.; Tang, Y.S.; Mussmann, S.; Pierson, E.; Kim, B.; Liang, P. Concept Bottleneck Models. Proc. Mach. Learn. Res. 2020, 119, 5338–5348.
Siwan, D.; Krishan, K.; Sharma, V.; Garg, A.K. A novel approach of developing machine learning based models for the prediction of facial dimensions from dental parameters. Sci. Rep. 2025, 15, 41047.
Pandey, S.; Bansal, I.S.; Robert, N.J. Generative AI-assisted face synthesis for forensic and criminal investigations. Adv. Eng. Res. 2026, 126–136.
Gye, S.; Ko, J.; Shon, H.; Kwon, M.; Kim, J. SFLD: Reducing the content bias for AI-generated image detection. arXiv 2025.
Subramani, P.; National Institute of Standards and Technology (NIST). AI Risk Management Framework (AI RMF 1.0); U.S. Department of Commerce: Gaithersburg, MD, USA, 2023. Available online: https://www.nist.gov/itl/ai-risk-management-framework (accessed on 6 June 2026).
Ballantyne, K.N.; Summersby, S.; Pearson, J.R.; Nicol, K.; Pirie, E.; Quinn, C.; Kogios, R. A transparent approach: Openness in forensic science reporting. Forensic Sci. Int. Synerg. 2024, 8, 100474.
Prasad, R.S.; Singh, D. FCR: Investigating Generative AI models for Forensic Craniofacial Reconstruction. arXiv 2025.
Morán-Torres, R.; Feld, K.; Hesser, J.; Taalab, Y.M.; Yen, K. Artificial intelligence and computer vision in forensic sciences. Rechtsmedizin 2025, 35, 219–225.
Nusrat, T.; Kutub, U.; Khalid, M. AI-Generated Image Detection: An Empirical Study and Future Research Directions. arXiv 2025.
Zargaran, A.; Silva, K.; Sousi, S.; Zargaran, D.; Mosahebi, A. 1182 Ethnic Diversity in Facial Image Databases–the State of Play for Aesthetic Plastic Surgery. Br. J. Surg. 2024, 111, znae163.476.
Sankarshan, D. 3D2M Dataset: A 3-Dimension diverse Mesh Dataset. arXiv 2024.
Panić, N.; Marjanović, M.; Bezdan, T. Addressing Demographic Bias in Age Estimation Models through Optimized Dataset Composition. Mathematics 2024, 12, 2358.
Jain, A.; Dholakia, R.; Memon, N.; Togelius, J. Zero-shot demographically unbiased image generation from an existing biased StyleGAN. arXiv 2023.
Pot, A.; Carstensen, L.L. A generated image repository of aging faces. Sci. Data 2025, 12, 1610.
Kleisner, K.; Trnka, J.; Tureček, P. FACEDIG automated tool for placing landmarks on facial portraits for geometric morphometrics users. Sci. Rep. 2025, 15, 24330.
Bao, J.; Liu, H.; Zhuang, Y.; Tao, L.; Xu, X.; Shi, Y.; Cheng, M.; Wang, Y.; Ku, C.; Zeng, T.; et al. PhysSFI-Net: Physics-informed Geometric Learning of Skeletal and Facial Interactions for Orthognathic Surgical Outcome Prediction. arXiv 2026.
Jiao, M.; Li, J.; Zhong, B.; Du, S.; Li, S.; Zhang, M.; Zhang, Q.; Liang, Z.; Liu, F.; Zuo, C.; et al. De Novo Reconstruction of 3D Human Facial Images from DNA Sequence. Adv. Sci. 2025, 12, e2414507.
Tiwari, V.; Dasari, V.S.R.; Wang, J. The Future of Artificial Intelligence in Forensics: Advancements, Challenges, and Ethical Considerations; Springer Nature Singapore: Singapore, 2025.

Figure 1. An overview of the proposed AI-assisted forensic facial reconstruction pipeline integrating deep learning models, expert validation, and sources of uncertainty.

Table 1. Methodological comparison of deep learning architectures used in forensic facial reconstruction, summarizing their functional roles, advantages, and methodological limitations.

Model	Primary Function in Reconstruction	Potential Advantages	Key Limitations	Current Forensic Validation Status
Convolutional Neural Networks (CNNs)	Model statistical associations between cranial morphology and facial soft-tissue distributions using geometric and landmark-based input data	Automated feature extraction; integration of demographic metadata; support for reproducible computational workflows	Limited generative capacity; dependent on training dataset representativeness; constrained interpretability regarding inferred facial traits	Applied primarily in experimental and research-oriented reconstruction workflows; limited large-scale forensic validation
Generative Adversarial Networks (GANs)	Generate visually plausible facial representations and textures from skeletal or morphometric input data through adversarial training	Ability to generate multiple candidate reconstructions; high visual detail; flexibility in incomplete or degraded skeletal cases	Susceptibility to artefacts and biologically unsupported “hallucinated” features; instability during training; limited interpretability	Primarily proof-of-concept and experimental applications; lack of standardized forensic benchmark comparisons
Variational Autoencoders (VAEs)	Model probabilistic distributions of facial morphology within latent feature spaces to support variability-aware reconstruction	Capture probabilistic variation; support anatomically coherent approximations; stable latent-space modelling	Lower visual fidelity relative to other generative architectures; limited fine-detail representation	Limited forensic-specific validation; mainly evaluated in exploratory modelling contexts
Diffusion Models	Iteratively generate facial representations through denoising processes approximating learned data distributions	High image fidelity; stable generative performance; capacity to model diverse facial hypotheses	High computational demands; limited explainability; uncertain biological interpretability; insufficient forensic validation	Emerging methodology with limited direct validation in forensic craniofacial reconstruction scenarios; no established standardized evaluation framework

Table 2. Summary of major sources of error and bias in forensic facial reconstruction, including biological, demographic, technical, and methodological factors affecting reconstruction reliability.

Category	Source of Error or Bias	Description	Impact on Reconstruction
Biological variability	Variation in facial soft tissue thickness (FSTT)	FSTT varies as a function of age, sex, body composition, and population ancestry, introducing intrinsic biological variability	Systematic distortion of facial volume, contour, and proportional relationships
Demographic bias	Non-representative population datasets	Limited availability of demographically diverse datasets and overrepresentation of specific populations in training data	Reduced generalizability and systematic bias in reconstructions of underrepresented groups
Anatomical limitations	Absence of soft tissue determinants in skeletal structure	Many externally visible traits (e.g., eye colour, skin tone, hair characteristics, skin texture) are not encoded in cranial morphology	Fundamental uncertainty requiring probabilistic inference rather than deterministic reconstruction
Technical (AI-related)	Hallucinations in generative models	Generative architectures may produce visually plausible but biologically unsupported features due to learned statistical correlations	Misleading realism and overconfidence in reconstructed features
Training data limitations	Imbalanced and incomplete datasets	Insufficient representation of certain age groups (e.g., children), populations, and phenotypic variability	Decreased robustness and increased susceptibility to bias propagation
Methodological constraints	Lack of standardized protocols and evaluation benchmarks	Heterogeneity in reconstruction workflows, validation metrics, and reporting standards	Limited reproducibility and comparability across studies
Expert-related variability	Subjectivity in manual reconstruction processes	Differences in expert interpretation of anatomical structures and reconstruction techniques	Inter-expert variability and reduced reproducibility
Model interpretability	Black-box nature of deep learning systems	Limited transparency in model decision-making and feature generation processes	Reduced auditability, limited error traceability, and challenges in forensic validation
Operational mismatch	Application of non-specific reference data	Use of inappropriate FSTT tables or population models not matched to the case context	Distortion of reconstructed features and decreased identification reliability
