Performance of Artificial Intelligence Models Designed for Automated Estimation of Age Using Dento-Maxillofacial Radiographs—A Systematic Review

Automatic age estimation has attracted considerable research interest because of its potential practical applications. This systematic review was undertaken to critically appraise the development and performance of AI models designed for automated age estimation from dento-maxillofacial radiographic images. To ensure methodological consistency, the review followed the diagnostic test accuracy guidelines of PRISMA-DTA. An electronic search of PubMed, Scopus, Embase, Cochrane, Web of Science, Google Scholar, and the Saudi Digital Library identified relevant articles published between 2000 and 2024. The 26 articles that met the inclusion criteria were assessed for risk of bias using QUADAS-2, which found no risk of bias in either arm (risk of bias and applicability concerns) of the patient-selection domain. The certainty of evidence was evaluated using the GRADE approach. AI has primarily been applied to automated age estimation through tooth development stages, tooth and bone parameters, bone age measurements, and the pulp–tooth ratio. The best-performing models reached a precision of 99.05% for tooth development stages and an accuracy of 99.98% for bone age measurements. AI shows significant promise as an additional diagnostic tool for age estimation.


Introduction
Age plays a crucial role in defining a person's identity [1]. Accurate age estimation methods have been pursued for a long time, and reliable estimation remains important in many scenarios involving both living and deceased individuals. A person's age can be expressed as chronological age, skeletal age, or dental age [2]. Chronological age is the time elapsed since birth and is the primary way of defining age [3].
Chronological age, together with biological sex and ethnicity, is a crucial factor in anthropological and forensic studies [4]. Chronological age has been successfully estimated by assessing bone development, using skeletal sites such as the pubic symphysis, auricular surface, and sternal ribs. No single bone-based method consistently outperforms the others, because the effectiveness of each method depends on numerous factors [5].
Dental maturity is a highly dependable approach for estimating chronological age in criminal, forensic, and anthropological contexts and can serve as a reliable indicator of age [6,7,8]. Teeth are frequently used in age estimation because they are less susceptible to influences such as genetics or environment [9]. Because of their highly mineralized structure, teeth resist decomposition after death and can withstand flames, alkalis, and acids [10]. Whereas bones may degrade over time, teeth can be preserved for extended periods, making them a dependable means of identification in emergency scenarios [11,12].
Dental age is determined using a combination of visual, radiographic, chemical, and histological techniques. Visual assessment tracks the sequence of tooth emergence and the functional changes that accompany aging, such as wear and alterations in tooth hue. Radiographs reveal the developmental stage of teeth, from the onset of mineralization through crown formation to root apex maturation. Biochemical techniques identify age-related changes in ion levels, and histological methods involve preparing tissues for detailed microscopic analysis [8,13,14]. Morphological and radiographic techniques such as Schour and Massler's method, Demirjian's method, and Kvaal's method are effective for determining age in living adolescents and adults. For deceased individuals, histological and biochemical techniques such as Gustafson's and Johanson's methods, the Bang and Ramm method, aspartic acid racemization, and the cemental annulation technique are used to determine age accurately [1,15].
Dental age estimation relies on two distinct approaches: assessing the timing of tooth eruption and analyzing the progression of dental maturity stages. Dental maturity is considered more dependable because of its high heritability, low coefficient of variation, and independence from external factors such as nutrition, hormones, and environmental influences [7,16]. Dental radiographic records can help determine a person's age through several characteristics: jaw bone development, tooth germ appearance, stage of crown completion and eruption, extent of deciduous tooth resorption, measurement of open apices, size of the pulp chamber and root canals, secondary dentin formation, tooth-to-pulp ratio, and third molar development and structure [17,18].
Estimating dental age is complicated because teeth vary widely in shape and size, and variation within and among individuals further complicates the process [19]. With age, people experience changes such as reduced alveolar bone levels and altered pulp-to-tooth ratios; however, relying only on direct measurements of the first molar may produce an error margin as large as 8.84 years [19]. Moreover, previous dental age estimation methods have focused only on specific aspects of teeth and have often produced large error margins. Because existing age estimation methods are prone to error and bias, we hypothesized that accuracy could be improved by removing subjective elements and automating the process. Over the past ten years, there have been continuous efforts to improve the precision of AI-based age estimation, for example by using deep learning algorithms [5].
Studies on dental radiographs have demonstrated the reliability of convolutional neural networks (CNNs) for a range of dental conditions, including dental caries [20], periodontal disease [21], odontogenic cysts and tumors [22], and conditions affecting the maxillary sinus and temporomandibular joints [23,24]. CNNs have an advantage over traditional feature-based techniques because they learn end to end, automatically extracting relevant features from raw data without human intervention. Because they do not require hand-engineered features, an AI system could greatly reduce the workload of human interpreters or observers in predicting dental age [25]. In addition, because CNNs derive a comprehensive feature set directly from the data, they perform well with large datasets. This systematic review was therefore carried out to assess and report on the performance of AI models developed for automated age estimation from dento-maxillofacial radiographs.

Materials and Methods
To ensure methodological quality, the authors adhered to the diagnostic test accuracy criteria of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension (PRISMA-DTA) [26]. The review protocol is registered in PROSPERO (registration number CRD42024528182). The literature search was guided by the PICO (Problem/Patient, Intervention/Indicator, Comparison, and Outcome) criteria detailed in Table 1.

Search Strategy
We searched PubMed, Scopus, Embase, Cochrane, Web of Science, Google Scholar, and the Saudi Digital Library for articles published between 2000 and 2024, combining search terms with Boolean operators (AND, OR) and applying an English-language filter. To supplement the electronic search, we manually screened relevant publications and their citations, including the reference lists of previously retrieved articles held in the college library. The search was carried out independently by two authors who had been trained to perform the same task.

Study Selection
Articles were selected based on the relevance of their titles and abstracts to the field of study. Two researchers (S.B.K. and F.B.) conducted the search independently, identifying 580 articles: 578 through the electronic search and 2 through manual searching. Two additional team members then checked for duplicates and removed 387, leaving 193 manuscripts that were assessed for eligibility.

Eligibility Criteria
Papers were included if they (a) were original works exploring AI; (b) reported quantitative data for examination and analysis; and (c) clearly described the data used to evaluate the AI-based models. No restrictions were placed on study design. Articles not addressing AI, conference papers that were never formally published or were available only online, unpublished works, articles without full-text availability, pilot studies, and non-English articles were excluded.

Data Extraction
After screening titles and abstracts and removing duplicates, the authors examined the full texts, after which 28 articles met the criteria for inclusion in the systematic review. To ensure impartiality, journal names and author details were removed before two independent reviewers (M.A. and A.S.), who had not been involved in the initial search, appraised the articles. Relevant data were extracted into a Microsoft Excel spreadsheet, including authors, publication dates, research goals, types of AI algorithms used, and the data used for model training, validation, and testing, together with each study's results, findings, and recommendations. The reviewers disagreed about including four articles whose results and conclusions were insufficiently supported by evidence; after consultation with another qualified author (A.F.), these four articles were excluded. A total of 26 articles were therefore included and evaluated in this systematic review, as shown in Figure 1.
The quality of the papers was assessed with QUADAS-2 [27], which covers the patient selection, index test, reference standard, and flow and timing domains. This evaluation aimed to gauge the applicability of the data across clinical settings and patient cohorts and to identify possible sources of bias. Agreement between the two reviewers was high (Cohen's κ = 0.82).
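Cohen's kappa corrects the raw proportion of agreement, p_o, for the agreement expected by chance, p_e, as κ = (p_o − p_e)/(1 − p_e). The sketch below is a minimal Python illustration of that formula for two reviewers' include/exclude decisions; the decisions in the example are hypothetical and are not the data of this review.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items with categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement p_o: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e: products of the two raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        # Both raters used one identical label throughout; kappa is undefined,
        # and this sketch returns 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical screening decisions for ten articles (not data from this review).
reviewer_1 = ["inc", "inc", "exc", "exc", "inc", "exc", "inc", "inc", "exc", "exc"]
reviewer_2 = ["inc", "inc", "exc", "inc", "inc", "exc", "inc", "exc", "exc", "exc"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 2))
```

In this example the reviewers agree on 8 of 10 articles (p_o = 0.8), but with balanced labels half of that agreement is expected by chance (p_e = 0.5), so κ = 0.6, noticeably lower than the raw 80% agreement.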

Results
Qualitative data were extracted from an in-depth analysis of the 26 articles. Most were published in the last four years, indicating a rising trend in research on AI models for automated age estimation.

Study Characteristics
The extracted study characteristics comprised the authors, year of publication, research goals, AI algorithms used for model development, sources of training, validation, and testing data, methods of assessing model accuracy, research findings, and any recommendations offered by the authors.

Outcome Measures
Model performance was evaluated with various outcome metrics, including the receiver operating characteristic (ROC) curve, area under the curve (AUC), accuracy, sensitivity, specificity, precision, recall, F-measure, mean absolute error (MAE), root mean squared error (RMSE), coefficient of determination (R²), and root mean squared percentage error (RMSPE).
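For readers less familiar with these metrics, the sketch below computes the regression metrics (MAE, RMSE, R², RMSPE) for estimated versus true ages and the classification metrics (accuracy, sensitivity/recall, specificity, precision, F-measure) for a binary decision. It uses common textbook definitions; individual studies may define RMSPE or average classification metrics across classes differently, and all numbers in the example are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, RMSE, R^2 and RMSPE (in %) for estimated vs. true ages."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("need equally long, non-empty sequences")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
        "R2": 1 - ss_res / ss_tot,
        # RMSPE uses relative errors, so true values must be non-zero.
        "RMSPE": 100 * math.sqrt(sum((e / t) ** 2 for t, e in zip(y_true, errors)) / n),
    }


def _ratio(num, den):
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else float("nan")


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall), specificity, precision and F-measure for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision, recall = _ratio(tp, tp + fp), _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": recall,
        "specificity": _ratio(tn, tn + fp),
        "precision": precision,
        "F1": _ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical true and estimated ages in years.
print(regression_metrics([8, 10, 12, 14], [9, 10, 11, 15]))
# Hypothetical "adult (1) vs. minor (0)" decisions.
print(binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```

MAE and RMSE share the units of age, whereas R² and RMSPE are unitless, which is one reason studies reporting different metrics are hard to compare directly.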

Risk of Bias Assessment and Applicability Concern
The quality and risk of bias of the studies were evaluated using the QUADAS-2 tool (Table S1). All studies used patient-derived secondary data, in the form of dento-maxillofacial radiographs, as input for the CNNs, so randomized and non-randomized patient selection was presumed to be equally distributed across the primary studies. The patient-selection domain was therefore judged to have no risk of bias. Standardized methods for entering data into the AI systems helped mitigate bias in the flow and timing domain. Nevertheless, two studies (7.69%) [37,46] did not clearly describe the reference standard used, raising inherent bias concerns in the index test, reference standard, and flow and timing domains. In addition, one study (3.85%) [46] relied on annotations from a single observer as the gold standard, resulting in a high risk of bias for the index test. Despite these issues, both arms (risk of bias and applicability concerns) showed minimal risk across the included studies. The risk of bias and applicability concerns of the analyzed studies are presented in Table S1 and Figure 2.

Table 2 fragments (layout lost in extraction): for dental panoramic radiographs (DPRs), the accuracy of the conventional method on the internal test set was slightly higher than that of the data mining models (mean absolute error difference < 0.21 years; root mean square error difference < 0.24 years), and the threshold was similar between the conventional method and the data mining models. Other recoverable entries: one study supported the use of ML algorithms instead of standard population tables; in another, the RF algorithm gave the models' highest accuracy and confidence intervals; in a third, the models' performance was low, but they were considered a different approach. Two entries were rated Neutral (N).

Assessment of Strength of Evidence
The certainty of evidence was evaluated using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach [54]. GRADE rates certainty as very low, low, moderate, or high, based on five domains: risk of bias, inconsistency, indirectness, imprecision, and publication bias. The overall certainty of evidence from the studies included in this review was high, as shown in Table 3.
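The GRADE rating logic can be sketched as a simple downgrade count. In GRADE for diagnostic test accuracy, evidence from cross-sectional or cohort accuracy studies starts at high certainty and is rated down one level for a serious concern, or two for a very serious concern, in each of the five domains. The Python sketch below encodes only that arithmetic; the judgments behind each downgrade, and any rating up, are outside its scope, and the example inputs are hypothetical.

```python
LEVELS = ["very low", "low", "moderate", "high"]
DOMAINS = ("risk of bias", "inconsistency", "indirectness",
           "imprecision", "publication bias")


def grade_certainty(downgrades, start="high"):
    """Overall certainty after rating down.

    downgrades: domain -> 0 (no serious concern), 1 (serious) or 2 (very serious).
    Domains that are not listed are treated as having no serious concern.
    """
    level = LEVELS.index(start)
    for domain, steps in downgrades.items():
        if domain not in DOMAINS:
            raise ValueError(f"unknown GRADE domain: {domain!r}")
        if steps not in (0, 1, 2):
            raise ValueError("each domain can be rated down by 0, 1 or 2 levels")
        level -= steps
    return LEVELS[max(level, 0)]   # certainty cannot fall below "very low"


print(grade_certainty({}))                                     # no serious concerns
print(grade_certainty({"risk of bias": 1, "imprecision": 1}))  # two serious concerns
```

A "high" overall rating, as reported for this review, corresponds to no domain being rated down.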

Discussion
Determining age is crucial in many areas, such as forensic science, where it helps identify individuals in mass casualties and criminal cases [55]. It also helps verify the ages of athletes and immigrants, upholding equal rights and fairness [56], and aids orthodontic and pediatric treatment planning by predicting jaw growth spurts [57]. Accurate age estimation requires assessing sexual characteristics, bone development, and tooth development [12].
Chronological age can be estimated using three main categories of methods: laboratory-guided molecular biology studies, dental indicators, and bone markers [50]. Dental age assessment compares the developmental stages of primary and permanent teeth with dental development charts created by various researchers [50], who have established different scales based on the developmental stages observed in radiographs. The age of 14 years, when the permanent second molars erupt, marks the end of the childhood and mixed dentition phase and serves as a reliable milestone in age estimation [37]. Two conventional methods commonly used for age determination are the atlas method and Demirjian's method. The former compares radiographic dental development (mineralization) with published standards; the latter is a scoring method that rates the development of seven left mandibular permanent teeth on eight stages (A to H) [58].
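The scoring step of Demirjian's method can be sketched as follows: each of the seven teeth is assigned a stage, each stage is converted to a sex-specific score, the scores are summed into a maturity score, and the maturity score is converted to dental age with a conversion table. This is a minimal Python sketch under stated assumptions: the published score and conversion tables are not reproduced and must be supplied by the caller; the FDI tooth numbers, the linear interpolation between table entries, and the toy values in the example are illustrative choices, not Demirjian's published data.

```python
from bisect import bisect_left


def maturity_score(stages, stage_scores):
    """Sum the per-tooth scores.

    stages: tooth -> assigned stage letter, e.g. {"31": "H", ..., "37": "E"}.
    stage_scores: tooth -> {stage letter: score}, from a published,
    sex-specific table supplied by the caller (not reproduced here).
    """
    return sum(stage_scores[tooth][stage] for tooth, stage in stages.items())


def score_to_age(score, conversion):
    """Convert a maturity score to dental age.

    conversion: list of (maturity score, age in years) pairs sorted by score.
    Between entries the age is linearly interpolated; outside the table it is
    clamped to the first or last entry (both are assumptions of this sketch).
    """
    scores = [s for s, _ in conversion]
    i = bisect_left(scores, score)
    if i == 0:
        return conversion[0][1]
    if i == len(conversion):
        return conversion[-1][1]
    (s0, a0), (s1, a1) = conversion[i - 1], conversion[i]
    return a0 + (a1 - a0) * (score - s0) / (s1 - s0)


# Toy tables with made-up values for two teeth only; NOT Demirjian's published scores.
toy_scores = {"36": {"G": 10.0, "H": 15.0}, "37": {"E": 5.0, "F": 8.0}}
toy_conversion = [(10.0, 6.0), (30.0, 10.0)]
score = maturity_score({"36": "H", "37": "E"}, toy_scores)
print(score, score_to_age(score, toy_conversion))
```

Separating the stage assignment (the subjective, observer-dependent step) from the table arithmetic shows where automated models intervene: most AI approaches in this review replace or bypass the staging step rather than the arithmetic.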
Although these manual methods have been applied successfully in various populations, they have specific drawbacks in clinical settings, such as the subjectivity of the technique and potential measurement bias. They can also be tedious and time-consuming [31]. The conventional image-processing approach to dental age estimation involves a series of manual steps, including image pre-processing, segmentation, feature extraction, classification, and regression. Each step carries a risk of error and can introduce variability into the final result. For instance, radiographic bone images may differ between dry and wet conditions, even within the same age group [51].
Deep learning methods are end-to-end approaches: deep neural networks such as CNNs process input images directly and generate the desired output without intermediate steps such as segmentation and feature extraction [29]. Automated dental age estimation is therefore needed to improve the accuracy and repeatability of age estimation [7]. Hence, this systematic review was undertaken to assess the development and performance of AI models in automated age estimation.

Effectiveness of AI in Automated Age Estimation Using Tooth Development Stages
Dental radiological techniques for age assessment typically involve parameters such as tooth development stages, tooth eruption, open apices, jaw bone development, and the pulp-tooth ratio. Tooth development is used more often than eruption because eruption can be affected by external factors, whereas formation is a continuous, cumulative, and progressive process [59]. Of the studies reviewed, 20 applied AI to estimating age from tooth development stages. The study by Mulla et al. [29] achieved the highest accuracy (98.8%) and precision (99.05%) of all the AI models for age estimation. Their approach was evaluated with various performance metrics on a dataset of 1429 dental X-ray images, and features based on AlexNet outperformed those based on ResNet. In addition, the k-NN classifier outperformed the other classifiers across different metrics [29].
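Mulla et al. paired CNN-derived features with a k-NN classifier. The sketch below shows only the k-NN step in plain Python, assuming the feature vectors have already been extracted (for example, from a pre-trained CNN); the two-dimensional features, the age-group labels, and k = 3 are hypothetical choices for illustration, not settings from the study.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training samples."""
    if not 0 < k <= len(train_features):
        raise ValueError("k must be between 1 and the number of training samples")
    # Rank training samples by Euclidean distance to the query feature vector.
    order = sorted(range(len(train_features)),
                   key=lambda i: math.dist(train_features[i], query))
    nearest_labels = [train_labels[i] for i in order[:k]]
    # Counter keeps first-seen (nearest-first) order and most_common() sorts stably,
    # so a tied vote goes to the tied label that appears first among the neighbours.
    return Counter(nearest_labels).most_common()[0][0]


# Hypothetical 2-D "features" for two age groups.
features = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["6-8 y", "6-8 y", "6-8 y", "9-11 y", "9-11 y", "9-11 y"]
print(knn_predict(features, labels, (0.5, 0.5)))
print(knn_predict(features, labels, (5.5, 5.2)))
```

Real CNN feature vectors have hundreds or thousands of dimensions, but the voting logic is the same; k-NN has no training phase, which keeps it cheap to pair with a frozen feature extractor.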
Our review found that the major drawbacks of traditional age estimation methods were underestimation and overestimation of age. In the study by Bunyarit SS et al. [28], the Chaillet and Demirjian method underestimated the dental age of Malaysian Chinese individuals. A population-specific predictive model was therefore created using an artificial neural network-multilayer perceptron (ANN-MLP) to improve the accuracy of age estimation. With the ANN-MLP model, the discrepancy between chronological age (CA) and dental age (DA) was much smaller (−0.05 ± 0.92 years for boys and −0.06 ± 1.11 years for girls). In contrast, Galibourg et al. [30] reported that both Demirjian's and Willems' methods overestimated age, by a mean of 257 days and 80 days, respectively, and affirmed that machine learning methods outperformed traditional approaches for age estimation from radiographic dental staging from childhood to early adulthood. These findings agree with a meta-analysis indicating that Demirjian's method overestimates age by 0.65 years in females and 0.60 years in males on average [60].
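The mean ± SD values above summarize the signed difference between dental and chronological age, which is how systematic underestimation (negative mean) or overestimation (positive mean) is detected. A minimal Python sketch with hypothetical ages follows; it also reports the MAE, because errors of opposite sign cancel in the mean difference but not in the MAE. Sign conventions vary, and a study reporting CA − DA would show the opposite sign.

```python
import statistics


def age_discrepancy(chronological, dental):
    """Return (mean, SD, MAE) of the signed difference DA - CA, in the input units."""
    if len(chronological) != len(dental) or len(dental) < 2:
        raise ValueError("need paired ages for at least two subjects")
    # Positive differences mean the method overestimates age; negative, underestimates.
    diffs = [da - ca for ca, da in zip(chronological, dental)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)            # sample standard deviation
    mae = statistics.mean(abs(d) for d in diffs)
    return mean_diff, sd_diff, mae


# Hypothetical ages in years: errors of +1 and -1 cancel in the mean difference.
ca = [8.0, 9.0, 10.0, 11.0]
da = [9.0, 8.0, 11.0, 10.0]
mean_diff, sd_diff, mae = age_discrepancy(ca, da)
print(f"DA - CA = {mean_diff:.2f} ± {sd_diff:.2f} years, MAE = {mae:.2f} years")
```

This is why a mean difference close to zero, such as −0.05 years, indicates little systematic bias but says nothing on its own about individual error; the SD (or MAE) carries that information.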
Han M et al. [36] reported that a fully automated dental age estimation model, with no human involvement, outperformed one that relied on manually defined features. The automated model achieved a mean absolute error (MAE) of 0.83 years, half that of the manual model. Such an autonomous method may reveal previously unrecognized age-related features, thereby improving the model's overall performance [36].
In another study, Kumagai et al. [42] found that the conventional method was slightly more accurate than the AI models on the internal test set, with differences of less than 0.21 years in mean absolute error and less than 0.24 years in root mean square error, corresponding to approximately 44 to 77 days and 62 to 88 days, respectively. Although the conventional method had a slight edge in accuracy in this study, it is uncertain whether such a small difference has significant clinical or practical relevance. These findings suggest that AI models can estimate dental age with nearly the same precision as the conventional method.
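The year-based and day-based figures reported for this study are consistent. Converting the year-level upper bounds to days, assuming 365.25 days per year, reproduces the upper ends of the reported day ranges:

```latex
\begin{aligned}
0.21~\text{years} \times 365.25~\tfrac{\text{days}}{\text{year}} &\approx 76.7~\text{days} \approx 77~\text{days} \quad (\text{MAE}),\\
0.24~\text{years} \times 365.25~\tfrac{\text{days}}{\text{year}} &\approx 87.7~\text{days} \approx 88~\text{days} \quad (\text{RMSE}).
\end{aligned}
```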
Shen et al. [34] estimated the dental age of Eastern Chinese individuals aged 5 to 13 years from the seven permanent teeth of the left mandible. They compared the Cameriere method with linear regression, support vector machine (SVM), and random forest (RF) models and found that all three machine learning models were more accurate than the conventional method. The improved accuracy may reflect the inclusion of younger participants: because age estimation becomes more precise as the number of developing teeth increases, younger subjects in the sample can yield a more accurate derived method [61].

Effectiveness of AI in Automated Age Estimation Using Tooth and Bone Parameters
The neural model developed by Zaborowicz M et al. [48] achieved the lowest prediction error, 2.34 months, in estimating the metric age of boys. It used a specific set of 21 tooth and bone parameters, expressed as mathematical proportions, previously developed by the same author [62]. The study used panoramic radiographs of people with normal dental development and no systemic illnesses; cases with root canal treatment or extensive fillings were excluded to improve network construction.
In another study [49], the optimal EfficientNet-B5 model had the smallest prediction error in females aged 22 to 31 years (MAE 0.96 years, RMSE 1.52 years) and the largest in males aged 52 to 61 years (MAE 5.12 years, RMSE 7.03 years); the discrepancy between estimated and actual age increased with age. The dentition, maxillary sinus, mandibular body, and mandibular angle all contributed to age estimation, and class activation mapping showed that different anatomical structures were relevant in different age groups. In the younger groups (12 to 21 and 22 to 31 years), the salient features were located predominantly in the teeth, in line with conventional techniques; in the middle age groups (32 to 41 and 42 to 51 years), the emphasis shifted to the maxillary sinus; and in the older groups (52 to 61 and 62 to 71 years), the mandibular body and mandibular angle were emphasized [49].
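Class activation mapping highlights which image regions drive a CNN's output. In its original formulation, which assumes the network ends in global average pooling followed by a linear output layer, the map is a weighted sum of the last convolutional feature maps, M(x, y) = Σ_k w_k f_k(x, y), where w_k is the output-layer weight for feature map k; for an age regressor, the weights of the single age output would play that role. The plain-Python sketch below computes this sum and rescales it to [0, 1] for display. The exact CAM variant used in the study is not stated here, so this illustrates the general idea only, and the toy feature maps and weights are hypothetical.

```python
def class_activation_map(feature_maps, weights):
    """Weighted sum of K feature maps (each H x W, as lists of rows), rescaled to [0, 1].

    feature_maps: output of the last convolutional layer, one H x W map per channel.
    weights: output-layer weight for each channel (length K).
    """
    if len(feature_maps) != len(weights) or not feature_maps:
        raise ValueError("need one weight per feature map")
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[sum(w * fmap[y][x] for w, fmap in zip(weights, feature_maps))
            for x in range(width)]
           for y in range(height)]
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    if hi == lo:                      # flat map: nothing to highlight
        return [[0.0] * width for _ in range(height)]
    return [[(v - lo) / (hi - lo) for v in row] for row in cam]


# Two hypothetical 2 x 2 feature maps with different output weights.
maps = [[[1.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 1.0]]]
print(class_activation_map(maps, [2.0, 1.0]))
```

In practice the low-resolution map is upsampled to the radiograph's size and overlaid on it, which is how region-level patterns such as the shift from teeth to sinus to mandible become visible.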
Wang J et al. [50] found that VGG16 outperformed ResNet101 in estimating DA from teeth and bone parameters on a large set of orthopantomograms (OPGs). The VGG16 model gave satisfactory predictions for younger age groups, with an accuracy of up to 93.63% in the 6- to 8-year-old category [50].

Effectiveness of AI in Automated Age Estimation Using Bone Parameters
Sharifonnasabi F et al. [51] proposed a novel automated machine learning model for bone age estimation. Current bone age estimation models are mainly in the research phase and have not been widely adopted in industry. The proposed HCNN-KNN model achieved very high accuracy (99.98%) across different age ranges and outperformed existing models, and testing on diverse datasets and races confirmed its superior performance, making it a promising tool for bone age measurement [51].

Effectiveness of AI in Automated Age Estimation Using Pulp-Tooth Ratio
Dental age estimation in adults is based on quantifying age-related morphological changes in teeth, such as the deposition of secondary dentin. Odontoblasts remain functional after root formation is complete and continue producing secondary dentin throughout life, so the dimensions of the pulp chamber gradually decrease [63]. Various age estimation methods have been developed based on this decrease. The assessment can be performed most definitively with nonradiological methods, such as histological and biochemical approaches, but because these require tooth extraction, they are unsuitable for living individuals or for situations where tissue cannot be collected from human remains. Radiological methods are therefore more readily applicable to dental age estimation, and they have progressed significantly, enabling three-dimensional imaging of the hard tissues of the jaws [64].
Pulp chamber volumes, and the pulp-to-tooth ratios derived from them, are used to estimate dental age and can be analyzed through deep learning. Although the models reported by Dogan B et al. [53] performed poorly, they represent a different approach. The algorithms performed most accurately for the 18-25 age group. Exploring different parameters derived from various measurement techniques in cone-beam computed tomography (CBCT) data could aid the development of machine learning algorithms for age classification in forensic scenarios. Measurements should always be taken on the axial section at the level of the cementoenamel junction to capture three-dimensional secondary dentine deposition accurately [53].
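As an illustration of how a pulp-to-tooth volume ratio can be obtained from CBCT data once the tooth and pulp have been segmented, the sketch below counts voxels in two binary masks. It assumes both masks come from the same scan (so the voxel volume cancels in the ratio) and that the tooth mask includes the pulp; the masks are hypothetical nested lists, not the segmentation pipeline of any reviewed study. Because secondary dentin narrows the pulp with age, lower ratios would be expected in older individuals.

```python
def voxel_count(mask):
    """Number of non-zero voxels in a 3-D mask given as slices -> rows -> values."""
    return sum(1 for slice_ in mask for row in slice_ for value in row if value)


def pulp_tooth_volume_ratio(tooth_mask, pulp_mask):
    """Pulp volume divided by whole-tooth volume (the tooth mask must include the pulp).

    Both masks come from the same scan, so the voxel volume cancels out.
    """
    tooth_voxels = voxel_count(tooth_mask)
    pulp_voxels = voxel_count(pulp_mask)
    if tooth_voxels == 0:
        raise ValueError("empty tooth mask")
    if pulp_voxels > tooth_voxels:
        raise ValueError("pulp mask larger than tooth mask; check the segmentation")
    return pulp_voxels / tooth_voxels


# Hypothetical 2 x 2 x 2 volume: the whole block is tooth, two voxels are pulp.
tooth = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]
pulp = [[[0, 0], [0, 1]], [[0, 0], [0, 1]]]
print(pulp_tooth_volume_ratio(tooth, pulp))
```

The ratio itself is simple; in practice, the accuracy of the segmentation that produces the masks dominates the quality of the estimate.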
Another study evaluated Kvaal's age estimation method and various ML attribute extraction approaches and algorithms, based on pulp-tooth ratios, in a population from northeastern Brazil. The findings favored the semantic-radiomic association attribute. Both Kvaal's method and ML performed better on the male dataset, with ML outperforming Kvaal's method by around 1 year in all analyzed scenarios [52].

Challenges and Future Considerations in AI
AI models developed for age estimation from dento-maxillofacial radiographs can be applied to tasks such as identifying victims of explosions and bomb blasts, evaluating athletes in competitive sports, adjudicating juvenile delinquency cases, supporting the adoption of undocumented children of uncertain age, handling international refugees, and planning patient treatment, as well as other clinical and forensic purposes. Despite promising results from studies evaluating AI models for automated age estimation, several factors must be considered before a definitive conclusion can be reached. The limitations and challenges reported in most studies relate mainly to small datasets and the scarcity of previously reported studies against which to compare results. More studies with larger samples and diverse populations are therefore needed to improve the applicability of this approach, and the need for abundant data can also be addressed through data augmentation. Training datasets must also be precise, reliable, and free of significant errors to ensure optimal performance [65]. Some reviewed studies used far fewer dental radiographs than is typical in medical AI research. This may lead to overly specialized AI models and overly optimistic results, because AI algorithms typically require substantial amounts of data to generalize to diverse scenarios. It is therefore crucial to justify the sample size and, where possible, conduct statistical analyses to ensure that the findings are broadly applicable.
Overall, the included studies suggest that AI-based models, including ML and DL, achieve high accuracy and low mean error and outperform the classical methods used for age estimation [34,39]. However, the mean error of deep learning techniques has been reported to be somewhat greater than that of machine learning regression approaches, although deep learning can save time in object detection [34]. When applying ML models to age estimation, individual variability should be considered and additional predictors used to reduce it [66]. Deep learning techniques work directly on input images and produce the desired output without intermediate steps such as feature extraction and segmentation. However, designing and training a deep neural network is expensive, time-consuming, and difficult. Transfer learning methods address this by reusing pre-trained deep neural networks for new tasks [31], and such approaches have contributed to major advances in computer vision. Age estimation models developed with transfer learning were more feasible in terms of cost and development time and performed the task more precisely [31,49].
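Data augmentation, one way to offset small datasets, creates modified copies of existing training images. The minimal sketch below assumes a grayscale radiograph stored as a list of rows with intensities in [0, 1] and applies two common perturbations, a random horizontal flip and a small brightness shift; the function name and the 0.1 shift limit are illustrative choices, not settings from the reviewed studies.

```python
import random


def augment(image, rng, max_shift=0.1):
    """Return a randomly perturbed copy of a grayscale image (rows of values in [0, 1])."""
    out = [row[:] for row in image]               # copy so the original is untouched
    if rng.random() < 0.5:
        out = [row[::-1] for row in out]          # horizontal flip: mirrors left and right
    shift = rng.uniform(-max_shift, max_shift)
    # Brightness jitter, clipped back into the valid intensity range.
    return [[min(1.0, max(0.0, v + shift)) for v in row] for row in out]


rng = random.Random(42)                           # seeded for reproducibility
image = [[0.0, 0.5, 1.0],
         [0.2, 0.4, 0.6]]
augmented = [augment(image, rng) for _ in range(4)]
print(augmented[0])
```

A horizontal flip swaps the patient's left and right sides, which matters for methods that score specific teeth, such as the left mandibular teeth in Demirjian's method, so whether to flip is a task-specific design decision.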

Conclusions
This systematic review found that AI models demonstrated superior performance in automatic age estimation from dento-maxillofacial radiographic images, with higher accuracy and precision and lower mean absolute errors. In this context, AI could act as a valuable partner for dental and forensic professionals, allowing them to process many images simultaneously. However, AI radiographic analyses are not inherently flawless: their precision depends on the quality of the training data and on the model selection and training procedures. It therefore remains essential for experts to provide the final interpretation.

Figure 1. PRISMA 2020 flow diagram for new systematic reviews including searches of databases, registers and other sources.

Figure 2. QUADAS-2 assessment of the individual risk of bias domains and applicability concerns.

The following key terms were used to search the electronic databases: artificial intelligence, age, age estimation, chronological age, precise age, age prediction, age detection, age evaluation, age assessment, dental age, age classification, tooth staging, tooth parameter, bone parameters, tooth development, convolutional neural network, automated, machine learning, deep learning, X-rays, dental radiographs, panoramic radiographs, and forensics.

Table 2. Details of the studies that have used automation-based models for age estimation.

Table 3. Assessment of strength of evidence.