Evidence of Age Estimation Procedures in Forensic Dentistry: Results from an Umbrella Review

Background and objective: Age estimation is an important tool when dealing with human remains or undocumented minors. Although the skull, the skeleton or the hand-wrist are used as maturity indicators in age estimation, they are often not in good enough condition for a correct identification or estimation. Few systematic reviews (SRs) have been published recently; therefore, this umbrella review critically assesses their level of evidence and provides a general, comprehensive view. Materials and methods: Considering the review question “What is the current evidence on age determination approaches in Forensic Dentistry?”, an electronic search was conducted in four databases (PubMed, Cochrane, WoS, LILACS) up to December 2022, focusing on SRs of age estimation through forensic dentistry procedures. Methodological quality was analyzed using A Measurement Tool to Assess Systematic Reviews 2 (AMSTAR 2). Results: Eighteen SRs were included: five of critically low quality, six of low quality, three of moderate quality and four of high quality. The SRs posited that Willems’ method is more accurate and less prone to overestimation; most methods seem to be geographically sensitive; and 3D-imaging and artificial intelligence tools demonstrate high potential. Conclusions: The quality of evidence on age estimation using dental approaches was rated as low to moderate. Well-designed clinical trials and high-standard systematic reviews are essential to corroborate the accuracy of the different procedures for age estimation in forensic dentistry.


Introduction
Age estimation is a key forensic and archeological element. Often useful for forensic identification of human remains, legal assistance involving minors or clinical diagnosis and planning [1-4], it is also helpful in contexts of mass migration and lack of valid identification [1,2,5]. Several methods have been developed to this end, among them skeletal and dental development, sexual maturation or height/weight ratios [6,7].
Although the skull, the skeleton or the hand-wrist are used in age estimation as maturity indicators, they are often not in good enough condition for a correct identification or estimation [8,9]. Teeth are the hardest human organs and are often found in adequate condition [10-12]. Furthermore, dental measurements and indices are considered more useful and reliable, owing to lower variability during development and the greater resistance of teeth to systemic, environmental or destructive factors [6,13].
Dental age may be estimated through several strategies, depending on whether tooth development (complete at around 20 years of age) or body development has been completed. On the one hand, methodologies based on tooth development are more accurate and have a smaller margin of error [8,14]. On the other hand, what is estimated is the biological age of the individual, always expressed as an age range whose precision depends on the method used. Chronological age will, at best, fall within this range [12,14].
Several systematic reviews have been published covering numerous dental methods based on radiographic (panoramic radiographs or otherwise) and non-radiographic approaches, but most of them evaluate only one or two methodologies. Considering the variety of the methods and the discrepancies between them, it is helpful to compare and summarize the evidence previously published regarding age determination in Forensic Dentistry. The purpose of this comprehensive review was to assess the existing evidence on age determination procedures in forensic dentistry. Our focus was twofold: to determine the quality of the evidence and the overall clinical accuracy of each procedure.

Materials and Methods
We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline [15] (Supplementary File S1) and the guide for systematic reviews of systematic reviews [16]. The review protocol was approved a priori by all authors and registered on Open Science Framework (DOI 10.17605/OSF.IO/CPBZY).
The review question was: "What is the current evidence on age determination approaches in Forensic Dentistry?".

Eligibility criteria
To answer the proposed research question, the inclusion criteria were: (1) systematic review (with or without meta-analysis); (2) addressing age determination in Forensic Dentistry; (3) absence of data duplication within the included studies in the meta-analysis. No restrictions on the year of publication or language were applied.

Information sources search
Four electronic databases were searched: PubMed, Cochrane Database of Systematic Reviews, LILACS (Latin American scientific literature in health sciences) and Web of Science. The key words and subject headings were combined in accordance with the thesaurus of each database and the subject headings were exploded, with the following syntax: "((age determination) OR (age determination forensic) OR (age estimation) OR (dental age estimation) OR (forensic age estimation) OR (age estimation methods) OR (age prediction) OR (dental age prediction)) AND ((tooth) OR (teeth) OR (dental) OR forensic OR (forensic dentistry) OR (forensic odontology)) AND ((Systematic Review) OR (Metaanalysis))". Grey literature searches were conducted in three appropriate databases (opensigle.inist.fr, https://www.ntis.gov/, https://www.apa.org/pubs/databases/psycextra, accessed on 22 November 2022).

Study selection
Two researchers (JAN and LBL) independently reviewed titles and abstracts. Agreement between the reviewers was assessed using kappa statistics. Any paper deemed potentially eligible by either of the two reviewers was retrieved as a full-text article and screened independently by both reviewers. Disagreements were discussed with a third reviewer (JB).
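As an illustration of the agreement statistic mentioned above, the following is a minimal sketch of Cohen's kappa for two reviewers, assuming each reviewer assigns one categorical decision per record. The function name and the screening decisions are hypothetical and are not the data of this review.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each label the same items once."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        # Both raters used a single identical label; kappa is undefined (0/0).
        return float("nan")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical screening decisions for four records (illustrative only).
reviewer_1 = ["include", "include", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude"]
print(cohens_kappa(reviewer_1, reviewer_2))  # 0.5
```

A value of 1.00 corresponds to complete agreement beyond chance, and 0 to agreement no better than chance.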

Data extraction process and data items
Two reviewers (JAN and LBL) separately extracted the following: authors and year of publication, objective/focal question, databases scanned, number of studies included, type of studies included, main results and main conclusions. Any differences of opinion were resolved by discussion with a third reviewer (JB).

Risk of bias assessment
To determine the methodological quality of the included systematic reviews, two researchers (JAN and LBL) used A Measurement Tool to Assess Systematic Reviews (AMSTAR 2) [16]. AMSTAR 2 is a comprehensive 16-item tool that ranks the overall methodological quality of a systematic review. Accordingly, the quality is ranked as follows: High means 'Zero or one non-critical weakness'; Moderate means 'More than one non-critical weakness'; Low means 'One critical flaw with or without non-critical weaknesses'; and Critically low means 'More than one critical flaw with or without non-critical weaknesses'. The AMSTAR 2 online tool was used to calculate the AMSTAR quality score for each study (https://amstar.ca/Amstar_Checklist.php, accessed on 11 February 2023).
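The ranking rule quoted above can be sketched as a small function. This is only an illustration of the four categories as worded in the text, assuming the number of critical flaws and non-critical weaknesses has already been counted for a review; the function name is hypothetical and it does not reproduce the online AMSTAR 2 checklist.

```python
def amstar2_overall_rating(critical_flaws: int, non_critical_weaknesses: int) -> str:
    """Map counts of flaws and weaknesses to the overall rating described in the text."""
    if critical_flaws < 0 or non_critical_weaknesses < 0:
        raise ValueError("counts must be non-negative")
    if critical_flaws > 1:
        # More than one critical flaw, with or without non-critical weaknesses.
        return "Critically low"
    if critical_flaws == 1:
        # One critical flaw, with or without non-critical weaknesses.
        return "Low"
    if non_critical_weaknesses > 1:
        # No critical flaw, more than one non-critical weakness.
        return "Moderate"
    # No critical flaw and zero or one non-critical weakness.
    return "High"


print(amstar2_overall_rating(critical_flaws=0, non_critical_weaknesses=1))  # High
print(amstar2_overall_rating(critical_flaws=1, non_critical_weaknesses=4))  # Low
```

Critical flaws are checked first because, under this rule, a single critical flaw caps the rating at Low regardless of how few non-critical weaknesses are present.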

Study selection
A total of 738 titles were identified by the electronic search. Thirty-four potentially eligible full texts were screened after manual assessment of the title/abstract and deletion of duplicates (Figure 1). During the full-text screening, 16 studies were excluded with justification (Supplementary File S2). A total of 18 systematic reviews met the inclusion criteria. The inter-rater reliability of the full-text screening process was found to be high (kappa score = 1.00).
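The selection counts reported above can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the text and abstract; the variable names are illustrative.

```python
# Counts reported in the text for the search, screening and inclusion stages.
titles_identified = 738
full_texts_screened = 34
full_texts_excluded = 16

# Records removed before full-text screening (duplicates plus title/abstract exclusions).
removed_before_full_text = titles_identified - full_texts_screened
included_reviews = full_texts_screened - full_texts_excluded

# AMSTAR 2 quality tally as reported in the abstract.
quality_counts = {"critically low": 5, "low": 6, "moderate": 3, "high": 4}

print(removed_before_full_text)  # 704
print(included_reviews)  # 18
print(sum(quality_counts.values()) == included_reviews)  # True
```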

Synthesis of Results
Overall, three main topics of research were found among the included SRs: panoramic radiographs-based methods; three-dimensional imaging methods; and artificial intelligence (AI)-based methods.
With regard to Demirjian's method, all studies agree on an overestimation that varies between 4 and 9 months [6,20-22,24,26]. The majority affirm that this method is geographically sensitive [6,20,21,24], the exceptions being the single-population studies [22,26]. With respect to sex, two studies reported greater overestimation in females [21,24], two showed more overestimation in males, one reported no differences between sexes [6] and one did not analyze sex subgroups [22]. Demirjian's method was also applied solely to the third molar in two papers. Haglund et al. [23] reported the accuracy of the method for the 18-year threshold as 71%, and Rolseth et al. [33] identified that third molar mineralization development varied over a range of 4 to 7 years.
As for Willems' method, most studies showed a slight age overestimation, varying between 1 and 5 months [21,22,25,28,29], aside from one [26] that found an underestimation of one month. They also conclude that males are more susceptible to this overestimation [25,26,28,29], except one [21], which concludes that females are more affected. One other study [22] did not analyze sex subgroups. With respect to age and geographic subgroups, some authors [21,25,28,29] reported differences between populations, whereas the rest studied only a single population [22,26]; Esan et al., Yusof et al. and Wang et al. also stated that a few age subgroups are more prone to overestimation [21,25,29].
Regarding Cameriere's method, studies demonstrated an overestimation that varied from 3 months to one year, slightly greater in males but without a statistically significant difference [19,22,32]. Marroquin et al. [32] also reported that in the Indian subpopulation, the overestimation can be as high as 10 years. Cameriere also developed an index to assess the 18-year threshold, with a percentage of correct classification ranging from 72 to 96% and better accuracy in males [27].
Kvaal's method [32] overestimates age by between 1 to 2 and 12 to 13 years, and Chaillet's method [18] by 6 to 8 months. Both methods overestimate age more in females and yield different results across geographical subpopulations.
Other methods included in this overview were only investigated by Franco et al. [22] for the Brazilian population. Nolla's method demonstrated 2 to 3 months of overestimation, Liliequist and Lundberg's method 1 to 2 months of underestimation, Mörnstad's method 3 to 4 months of overestimation and, lastly, Haavikko's method 10 to 12 months of underestimation.
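The over- and underestimation figures reported for these methods can be read as a mean signed difference between estimated dental age and chronological age. The sketch below shows that calculation under this assumption; the function name and the ages are hypothetical, and individual SRs may define their error measures differently.

```python
from statistics import mean


def estimation_errors_months(dental_ages_years, chronological_ages_years):
    """Return (mean signed error, mean absolute error) in months.

    A positive signed error indicates overestimation, a negative one underestimation.
    """
    if len(dental_ages_years) != len(chronological_ages_years) or not dental_ages_years:
        raise ValueError("both inputs must be non-empty and of equal length")
    signed = [
        (dental - chronological) * 12
        for dental, chronological in zip(dental_ages_years, chronological_ages_years)
    ]
    return mean(signed), mean([abs(error) for error in signed])


# Hypothetical sample: every estimate is half a year above chronological age.
signed_error, absolute_error = estimation_errors_months([8.5, 10.0, 12.5], [8.0, 9.5, 12.0])
print(signed_error, absolute_error)  # 6.0 6.0
```

Reporting the absolute error alongside the signed error matters because over- and underestimations in different individuals can cancel out in the signed mean.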

Three-dimensional imaging methods
Overall, the level of evidence of the SRs relating to forensic methods based on 3-dimensional imaging was of low quality.
Cone Beam Computed Tomography (CBCT) and Computed Tomography (CT) scans reported different margins of error according to the method applied, ranging from 3.5 to 28 years [32]. When using the pulp/tooth ratio, between 30 and 90% of individuals were correctly identified, with the majority of results around 60% [30]. Both studies [30,32] agree that these values differ depending on the sex of the individual and the type of tooth.
As for Magnetic Resonance Imaging (MRI), all studies concluded that results from images obtained through MRI are comparable to those from panoramic radiographs, without the ionizing radiation [17,30].

AI-based methods
The level of evidence on AI-based methods for sex prediction using dental measures was collectively based on low-quality SRs. AI displayed precision and accuracy similar to those of trained examiners while overcoming observer subjectivity. However, accuracy in real-life testing and validation is yet to be proven [31].

Discussion
This umbrella review was able to sum up the evidence provided by the available SRs on age estimation methods in Forensic Dentistry. The collective knowledge is currently based on evidence of low to moderate confidence at best, drawn from SRs ranging from critically low to high quality. Overall, these results show that, due to the poor quality of many of the studies mentioned here, some of the forensic tools used today may be outdated or misused, and some results must be analyzed with caution.
Demirjian's method obtained global acceptance and became the most widely used method for dental age estimation [21]. Nonetheless, most studies concluded that this method is geographically sensitive and varies according to the subpopulation studied [6,20,21,24]. A possible reason is the origin of the reference dataset, a Caucasian subpopulation with low heterogeneity that differs from the other subpopulations studied [6,20,21,24]. Demirjian's method tends to overestimate the age of individuals regardless of sex. For these reasons, Demirjian's method proves a poor forensic tool when misapplied [20,24].
Willems' method is also geographically sensitive [21,25,28,29]. Sehrawat et al. [28], for instance, determined that this method overestimates age in the majority of countries, except China and India. Sex seems to be a factor to consider, because this method tends to overestimate age more in males than in females [21,25,26,28,29].
Three studies comparing both Demirjian and Willems' methods are all in agreement that the latter is more accurate and less prone to overestimation [21,22,26].
Demirjian's method was introduced in the 1970s and Willems' method in the 2000s. Since these methods (and most of the remaining methods) are based on tooth maturation, which is growth dependent, they will likely need to be updated on a regular basis. Growth patterns have evolved with changes in healthcare, nutrition and genetics, so new methods must be developed to keep pace with them [21,34].
Dental methods and indexes have been misused not only in Forensic Dentistry. In Orthodontics, indexes such as Bolton's, developed from a specific subpopulation, have proven incorrect when generalized to other populations [35].
Cameriere's method is one of the geographically stable methods and tends to overestimate age by 4 months in both boys and girls, without a statistically significant difference between them [19,22,32].
According to Hostiuc et al., this method seems to outperform several others, including Demirjian's and Willems' methods [19].
Santiago et al. concluded that the third molar maturity index (I3M) has been validated in several subpopulations throughout the world, since it has a high accuracy in discriminating whether a person has reached the age of 18 years, regardless of the population studied. In terms of sex, results tend to be better in males, but accuracy, sensitivity and specificity are also high in females [27].
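For readers less familiar with the three measures named above, the following minimal sketch computes accuracy, sensitivity and specificity for a binary adult/minor classification at the 18-year threshold. It assumes "adult" is the positive class; the function name and the classifications are hypothetical and are not taken from the included SRs.

```python
def threshold_metrics(predicted_adult, actual_adult):
    """Accuracy, sensitivity and specificity, treating 'adult' as the positive class."""
    if len(predicted_adult) != len(actual_adult) or not actual_adult:
        raise ValueError("both inputs must be non-empty and of equal length")
    pairs = list(zip(predicted_adult, actual_adult))
    tp = sum(1 for p, a in pairs if p and a)          # adults classified as adults
    tn = sum(1 for p, a in pairs if not p and not a)  # minors classified as minors
    fp = sum(1 for p, a in pairs if p and not a)      # minors classified as adults
    fn = sum(1 for p, a in pairs if not p and a)      # adults classified as minors
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("sample must contain at least one adult and one minor")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical sample of eight individuals (True = 18 years or older).
predicted = [True, True, True, False, False, False, False, True]
actual = [True, True, True, True, False, False, False, False]
print(threshold_metrics(predicted, actual))
# {'accuracy': 0.75, 'sensitivity': 0.75, 'specificity': 0.75}
```

In a legal setting, specificity is of particular interest under this convention, since a false positive corresponds to a minor being classified as an adult.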
Forensic age estimation based on dental measures has been, until recently, based on two-dimensional imaging. Three-dimensional imaging is gaining relevance in all areas of dentistry, including forensic dentistry, where the first study reporting the use of this technology was published in 2004 [36]. CBCT and CT scan reconstructions allow investigators to analyze the pulp/tooth volume ratio. The volume ratio might be an interesting tool for predicting age after root maturation of the third molar, around the second decade of life [30]. Micro-CT scans provide images of greater quality and more accurate measurements than CBCT because of their higher spatial resolution, but they require extracted teeth; their application in live subjects is therefore not viable, and a model based on this type of imaging may not be replicable for forensic purposes in living individuals [30,32]. Major limitations of this method are the artefacts produced by adjacent metal structures and restorations, such as implants and amalgam fillings [30]. Moreover, the difficulty of reproducing the measurement site might lead to an inaccurate analysis; the lack of a simple method of investigation should be the focus of future research [30].
Owing to the ethical implications of using ionizing radiation for other than diagnostic indications, MRI arose as a valid alternative, since it uses strong magnetic fields and radio waves to generate images. It is a relatively new tool in age estimation, first published in 2015 [37]. Regarding the methodology itself, MRI can be used in association with other methods previously stated, such as Demirjian's. MRI tends to be more accurate in the early stages of development but more challenging and inaccurate in the later stages because of the lack of contrast between dental and bone tissue. It also takes more time and is more expensive than an ordinary panoramic radiograph [30]. Owing to the scarcity of research on this subject, inter-ethnic variability has not yet been established [17]. Discrepancies between the MRI approaches make it inappropriate to pool data and perform a proper systematic review with meta-analysis. Furthermore, future age estimation methods based on MRI will probably rely on multiple sites and measures [17].
AI-based automated systems have been developed to overcome examiner subjectivity. The best-performing AI model was the Deep Learning Convolutional Neural Network approach, with accuracy similar to that of trained researchers. AI models that combine two Convolutional Neural Networks, the first to predict sex and the second to predict age, outperform a single Convolutional Neural Network approach. However, AI-based models have not yet proven themselves in the field sufficiently to be routinely applied [31].
The TRIPOD [38] checklist should be followed to improve research quality. Particular attention should be paid to the choice of study design and the reasons for exclusion, and the research question should be more clearly formulated. To avoid bias, future SRs should consider the risk of bias (RoB) of individual trials and the number of authors who performed data extraction.

Strengths and limitations
There are several strengths of the present umbrella review. Overall, using a transparent and evidence-based methodology, these results provide a comprehensive overview of the available SRs for age determination in Forensic Dentistry. As a limitation, the individual studies included in each SR were not reviewed, so the conclusions rely on the interpretation of the authors of the systematic reviews, and we recommend a cautious interpretation.

Table 1. Characteristics of included SRs.
Briggs Institute; MA-meta-analysis; MR-magnetic resonance; N-number of included studies; NI-no information; NOS-Newcastle-Ottawa Scale; NR-not reported; QAS-quality assessment tool; QUADAS-quality assessment and diagnostic accuracy tool; RCT-randomized controlled trials; SR-systematic review; STROBE-Strengthening the reporting of observational studies in epidemiology. * Detailed information regarding the methodological quality assessment is presented in Table 2.

Table 2. Methodological quality of the included SRs.