Review

Artificial Intelligence-Based Models for Automated Bone Age Assessment from Posteroanterior Wrist X-Rays: A Systematic Review

by
Isidro Miguel Martín Pérez
1,*,
Sofia Bourhim
2,* and
Sebastián Eustaquio Martín Pérez
1,3
1
Escuela de Doctorado y Estudios de Posgrado, Universidad de La Laguna, 38203 San Cristóbal de La Laguna, Spain
2
ENSIAS, Mohammed V University in Rabat, Rabat 10010, Morocco
3
Faculty of Health Sciences, Universidad Europea de Canarias, 38300 Santa Cruz de Tenerife, Spain
*
Authors to whom correspondence should be addressed.
Appl. Sci. 2025, 15(11), 5978; https://doi.org/10.3390/app15115978
Submission received: 30 April 2025 / Revised: 15 May 2025 / Accepted: 20 May 2025 / Published: 26 May 2025
(This article belongs to the Special Issue Radiology and Biomedical Imaging in Musculoskeletal Research)

Abstract:
Introduction: Bone-age assessment using posteroanterior left hand–wrist radiographs is indispensable in pediatric endocrinology and forensic age determination. Traditional methods—the Greulich–Pyle atlas and Tanner–Whitehouse scoring—are time-consuming, operator-dependent, and prone to inter- and intra-observer variability. Aim: To systematically review the performance of AI-based models for automated bone-age estimation from left PA hand–wrist radiographs. Materials and Methods: A systematic review, pre-registered in PROSPERO (CRD42024619808), was carried out in MEDLINE (PubMed), Google Scholar, ELSEVIER (Scopus), EBSCOhost, Cochrane Library, Web of Science (WoS), IEEE Xplore, and ProQuest for original studies published between 2019 and 2024. Two independent reviewers extracted study characteristics and outcomes, assessed methodological quality via the Newcastle–Ottawa Scale, and evaluated bias using ROBINS-E. Results: Seventy-six studies met the inclusion criteria, encompassing convolutional neural networks, ensemble and hybrid models, and transfer-learning approaches. Commercial systems (e.g., BoneXpert®, Physis®, VUNO Med®-BoneAge) achieved mean absolute errors of 2–31.8 months—significantly surpassing Greulich–Pyle and Tanner–Whitehouse benchmarks—and reduced reading times by up to 87%. Common limitations included demographic bias, heterogeneous imaging protocols, and scarce external validation. Conclusions: AI-based approaches have substantially advanced automated bone-age estimation, delivering clinical-grade speed and mean absolute errors below 6 months. To ensure equitable, generalizable performance, future work must prioritize demographically diverse training cohorts, implement bias-mitigation strategies, and perform local calibration against region-specific standards.

1. Introduction

Bone age (BA) assessment is a critical tool for evaluating skeletal maturation in children and adolescents [1,2]. It plays an essential role in pediatric endocrinology for diagnosing growth disorders [3], monitoring pubertal development [4], and guiding treatment strategies [5]. In forensic medicine, BA evaluation is important for estimating the chronological age (CA) of undocumented individuals, such as unaccompanied minors and asylum seekers [6,7], affecting their access to vital services such as healthcare and education [8].
Traditionally, BA assessment relies on posteroanterior radiographs of the left hand and wrist (PA-HW), applying qualitative manual scoring systems—such as the Greulich–Pyle (GPA) [9] and Gilsanz–Ratib [10] atlases—and quantitative approaches like the Tanner–Whitehouse 3 (TW3) method [11]. While these techniques remain foundational, they are inherently laborious, demand substantial expertise, and exhibit notable inter- and intra-observer variability. Furthermore, they complicate the accurate monitoring of longitudinal growth and frequently lack validation across diverse ethnic and demographic cohorts, accentuating the need for tailored, region-specific atlases to enhance precision [12,13,14].
In order to address these challenges, efforts to automate BA assessment began as early as the late 1980s, with the HANDX system by Michael and Nelson (1989) [15], followed by the PROI-based framework of Pietka et al. (1991) [16], and Tanner et al.’s Computer-based Skeletal Aging Scoring System (CASAS) in 1994 [17]. While these pioneering platforms marked significant progress, they were still hindered by inefficiencies and lengthy processing times, underscoring the imperative for more advanced, truly automated solutions.
Today, artificial intelligence (AI) has emerged as a compelling alternative to classical BA assessment methods. AI-powered platforms such as BoneXpert® can generate BA estimates based on GPA and TW3 standards in under 15 s per PA-HW radiograph [18]. Acting as a local DICOM node that neither stores nor transmits patient data externally, BoneXpert® complies with stringent privacy regulations while employing classic machine-learning techniques. At the same time, deep-learning architectures—most notably convolutional neural networks (CNNs)—have recently shown even greater speed, accuracy, and consistency in automating BA evaluation [19].
Despite these advances, no comprehensive synthesis studies have yet evaluated AI-based BA models for accuracy, precision, and clinical applicability using PA-HW X-rays. Therefore, this systematic review aims to benchmark AI-based models for automated BA estimation from left-hand PA-HW radiographs against traditional methods. Additionally, this review explores the challenges and opportunities AI presents for improving BA assessment, while providing insights into future research directions in this rapidly evolving field.

2. Materials and Methods

2.1. Data Sources and Search Strategy

A systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [20], and the protocol was pre-registered in the PROSPERO database (CRD42024619808). A comprehensive literature search was carried out from 27 November 2024 to 23 December 2024 across eight databases: MEDLINE (PubMed), Google Scholar, ELSEVIER (Scopus), EBSCOhost, Cochrane Library, Web of Science (WoS), IEEE Xplore, and ProQuest. Specific search strategies were tailored to each database using combinations of key terms and Boolean operators. For example, the MEDLINE (PubMed) search included the following terms: (“artificial intelligence” [Title/Abstract] OR “machine learning” [Title/Abstract] OR “deep learning” [Title/Abstract] OR “convolutional neural networks” [Title/Abstract]) AND (“bone age” [Title/Abstract] OR “bone age assessment” [Title/Abstract] OR “skeletal maturity” [Title/Abstract]) AND (“radiology” [Title/Abstract] OR “medical imaging” [Title/Abstract] OR “pediatric imaging” [Title/Abstract]). Equivalent strategies were applied across the other databases to ensure comprehensive coverage of the relevant literature.
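The database-specific strategies all follow the same pattern of OR-joined synonym groups combined with AND. A minimal illustrative sketch of how such a query string can be assembled (the term groups and [Title/Abstract] field tag come from the PubMed example above; the helper function itself is hypothetical):

```python
# Illustrative sketch: assemble the PubMed (MEDLINE) Boolean query from the
# three term groups reported above. Other databases would swap field tags.
def build_query(term_groups, field="[Title/Abstract]"):
    """OR-join the terms within each group, then AND-join the groups."""
    clauses = []
    for group in term_groups:
        clauses.append("(" + " OR ".join(f'"{t}"{field}' for t in group) + ")")
    return " AND ".join(clauses)

groups = [
    ["artificial intelligence", "machine learning", "deep learning",
     "convolutional neural networks"],
    ["bone age", "bone age assessment", "skeletal maturity"],
    ["radiology", "medical imaging", "pediatric imaging"],
]

query = build_query(groups)
print(query)
```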
The literature search was independently conducted by two researchers (I.M.M.P. and S.B.). Articles were initially screened based on title and abstract, with potentially relevant studies undergoing a full-text review to determine eligibility. In the case of any discrepancies between researchers, an independent and blinded researcher (S.E.M.P.) was consulted to reach a final consensus. Table S1 provides a detailed outline of the search strategy used across all databases.

2.2. Selection of Studies

Studies were eligible if they (1) involved children and adolescents undergoing BA assessment through left PA-HW radiographs, and (2) implemented AI-based models for BA estimation. Although not required, (3) studies comparing AI-driven approaches with conventional manual methods (GPA or TW3) or other computational techniques were also included. Outcomes of interest encompassed (4) model accuracy and precision, predictive validity, inter- and intra-observer variability, processing time, and clinical applicability. Table 1 presents further details.
In addition to the PECO criteria, studies were eligible if they met the following conditions: (1) original retrospective or prospective design (e.g., diagnostic accuracy, cohort, case–control, cross-sectional, validation, case series, chapters or conference proceedings); (2) publication as full-text, peer-reviewed articles; (3) publication date between 1 January 2019, and 23 December 2024; and (4) language of publication in English, Spanish, French, Portuguese or Arabic. Conversely, studies were excluded if they exhibited any of the following: (5) randomized controlled trials, clinical trials, abstracts, editorials or opinion pieces; (6) publication before 1 January 2019 or after 23 December 2024; (7) lack of availability as full-text, peer-reviewed publications; or (8) publication in languages other than those specified. Further details are provided in Table 2.

2.3. Data Extraction

Data extraction was conducted independently by two authors (I.M.M.P. and S.B.), with a third author (S.E.M.P.) available to resolve any discrepancies. A standardized data extraction form, structured according to key study components, was employed to ensure systematic and comprehensive data collection. The extracted information included authors and publication year, study design, population characteristics (e.g., age, gender, sample size), exposure (e.g., model architecture and training methods), comparison groups, outcomes (e.g., precision, accuracy, and performance metrics such as mean absolute error (MAE), mean absolute deviation (MAD), root mean square error (RMSE), standard deviation, and reading time), study setting, and conclusions. The extraction process adhered to the Cochrane Handbook for Systematic Reviews of Interventions v.6.5.0 [21], ensuring methodological rigor. To enhance reliability, the data extraction template was pre-tested on a representative subset of studies to verify accuracy and consistency.
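For clarity, the headline error metrics extracted from each study follow their standard definitions, where $y_i$ is the reference BA and $\hat{y}_i$ the model estimate for radiograph $i$ (note that the reviewed studies use “MAD” for both the mean and the median absolute deviation; the median form is shown):

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2},
\qquad
\mathrm{MAD} = \operatorname{median}_{i} \left| \hat{y}_i - y_i \right|
```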

2.4. Methodological Quality Assessment (Newcastle–Ottawa Scale)

The Newcastle–Ottawa Scale (NOS) [22] was applied by S.E.M.P. to assess the methodological quality of the observational studies. This scale evaluates three key domains: the selection of study participants, the comparability of cohorts, and outcome assessment. Each study was rated using a star-based system, with higher scores indicating greater methodological rigor and lower risk of bias. By utilizing the NOS, a standardized and transparent framework was maintained, ensuring consistency in quality assessment and facilitating the identification of both methodological strengths and potential biases within the included studies.

2.5. Risk of Bias Assessment (ROBINS-E)

The Risk Of Bias In Non-randomized Studies of Exposures (ROBINS-E) tool [23] was used to assess the risk of bias in non-randomized studies and was conducted by S.E.M.P. This tool evaluates multiple domains, including confounding, participant selection, classification of exposure, deviations from intended interventions, missing data, outcome measurement, and selective reporting. Each study was systematically reviewed, and the risk was categorized as low, moderate, serious, or critical. The ROBINS-E framework offered a structured and transparent approach for identifying potential sources of bias, ensuring a rigorous and consistent appraisal of study reliability.

3. Results

3.1. Study Selection

The study selection process followed PRISMA guidelines to ensure transparency and rigor. A systematic search across eight electronic databases—MEDLINE (PubMed) (n = 331), Google Scholar (n = 145), Scopus (n = 189), EBSCOhost (n = 94), Cochrane Library (n = 49), Web of Science (WoS) (n = 171), IEEE Xplore (n = 97), and ProQuest (n = 126)—yielded 1202 records. After removal of 920 duplicates, 282 unique records remained. Title and abstract screening excluded 143 of these, leaving 139 full-text articles assessed for eligibility. Of these, 63 were excluded: 12 focused on adult populations, 19 did not employ AI-based BA models, 24 relied on alternative imaging techniques, and 8 lacked sufficient methodological detail. Ultimately, 76 studies satisfied all inclusion criteria and were included in the qualitative and quantitative synthesis. The PRISMA flow diagram in Figure 1 provides a visual summary of this process.

3.2. Study Characteristics

This systematic review included a total of 76 studies employing AI models for BA assessment, encompassing a wide range of methodological approaches. The majority of the studies were retrospective cross-sectional designs [24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87], primarily aimed at analyzing historical data to identify patterns and correlations. Additionally, prospective validation studies [43,73,88,89] were incorporated to provide evidence regarding the clinical applicability and efficacy of the developed models. Furthermore, comparative experimental studies [46,90,91,92,93] were included, which assessed the relative effectiveness of various methodologies or algorithms through direct comparison. The review also comprised longitudinal cohort studies [69], designed to evaluate the accuracy and stability of the models over time. Finally, benchmarking analyses [31,70,74,94,95], which set performance standards and enabled systematic comparison of AI approaches, were included.
All studies targeted pediatric cohorts evaluated by left PA-HW radiographs (n = 318,518) for skeletal maturity or growth-disorder assessment. The vast majority relied on the RSNA Pediatric Bone Age Challenge dataset for model development and validation [24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99]. A focused subset incorporated the Digital Hand Atlas (DHA) to enhance feature representation and improve BA prediction accuracy [24,25,26,27,29,88]. Moreover, for external validation and benchmarking, several investigations drew on clinical radiographic image repositories from tertiary pediatric hospitals and multinational image archives [31,32,33,34,36,37,38,39,40,41,46,92,94]. The included studies were conducted across 17 countries. Notably, China [24,25,26,29,30,32,33,34,40,46,47,48,50,51,54,56,62,63,64,65,68,78,79,80,81,85,86,90], South Korea [28,43,44,45,49,74,84,88,89], and the United States [31,53,57,90,92,96] contributed extensively to the research. Additional studies were gathered from a variety of other nations, including India [37,38,42,59,87,95], Turkey [36,39,41,77], Germany [27,76,82], South Africa [55], Pakistan [58,61], Indonesia [91], Canada [31], Malaysia [94], Algeria [52], France [93], Brazil [66], Australia [70], Saudi Arabia [75], and Italy [83].
In these studies, a wide range of computational techniques were employed, notably AI-assisted software solutions such as BoneXpert® (Visiana, Hørsholm, Denmark; versions 1.0.3, 2.1, 2.5.4.1, 3.0.3, and 3.2.2), VUNO Med®-BoneAge (VUNO Inc., Seoul, South Korea; versions 1.0.3 and 1.1.0), BoneView® (Gleamer, Saint-Mandé/Paris, France; version 2.3.1.1), and IB Lab PANDA (IB Lab GmbH, Vienna, Austria; versions 1.06 and 1.13.21) [27,30,45,53,76,77,83,89,93,99]. Furthermore, deep learning (DL) architectures were also commonly utilized by the included studies, particularly convolutional neural networks (CNNs) [24,25,26,28,31,32,33,34,35,36,37,38,40,41,42,46,47,48,49,50,51,52,53,54,55,58,59,60,61,62,63,64,65,66,67,69,70,71,72,74,75,78,79,80,84,85,86,87,88,90,91,92,94,96,97], along with transfer learning based on ImageNet-pretrained models such as InceptionV3 (Google Inc., Mountain View, CA, USA), VGG16 (Visual Geometry Group, University of Oxford, UK), ResNet50 (Microsoft, Redmond, WA, USA), MobileNetV2 (Google Inc., Mountain View, CA, USA), and EfficientNetV2B0 (Google Brain, Mountain View, CA, USA) [36,38,42,44,59,87,97].
Several studies also implemented custom neural network architectures, including Approximation-Aware Neural Network (AXNet) (Shanghai Jiao Tong University, Shanghai, China), Margin and Modality-Aware Network (MMANet) (Huazhong University of Science and Technology, Wuhan, China), and Dual Attention Dual-Path Network (DADPN) (City University of Hong Kong, Hong Kong, China) [70,86,94], as well as multi-domain neural networks [62,71,72]. Moreover, model integration strategies were adopted, including ensemble learning [31,38,90] and hybrid approaches combining CNN outputs with GPA or TW3 scores [28,43,73,88]. Region-based processing techniques were also applied, such as segmentation via U-Net (University of Freiburg, Freiburg, Germany) [32,33,46,48,85], attention mechanisms [32,64,71,81,94], and region localization using YOLOv3 (Joseph Redmon, Seattle, WA, USA) and YOLOv5 (Ultralytics, London, UK) [47,48,71,85].
Among the metrics employed, the most frequently utilized were, firstly, the mean absolute error (MAE) or median absolute deviation (MAD) [24,25,26,27,28,29,31,32,33,34,36,37,38,39,40,41,42,43,45,46,47,48,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,73,74,75,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,97,98,99], secondly, the root mean square error (RMSE) [25,26,27,28,32,88,90,91], thirdly, the accuracy within predefined temporal windows [25,26,88,90,91,94], and finally, the correlation with established reference standards such as GPA and TW3 [24,26,45,88]. In most studies, follow-up was limited to a single time point, with BA estimates for the AI models compared directly against expert assessments or commercial software outputs immediately after image acquisition [24,28,43,49,88]. A subset incorporated intermediate follow-up intervals of 6 to 12 months (mos.) to evaluate the reproducibility, inter- and intra-observer agreement, and temporal stability of predictions, reporting significant improvements in RMSE and MAD within these windows [45,54,56,69,76].
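To make the reported quantities concrete, these metrics can be computed from paired reference and predicted bone ages as in the following sketch (the values are synthetic and purely illustrative, not data from any reviewed study):

```python
import numpy as np

def evaluate_ba(y_true, y_pred, windows=(0.5, 1.0)):
    """Compute the error metrics most often reported by the reviewed studies.

    y_true, y_pred: reference and model bone-age estimates, in years.
    windows: temporal windows in years for accuracy-within-window,
             e.g. 0.5 yr = 6 months, 1.0 yr = 12 months.
    """
    err = np.asarray(y_pred) - np.asarray(y_true)
    out = {
        "MAE":  float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAD":  float(np.median(np.abs(err))),  # median absolute deviation
    }
    for w in windows:
        # Fraction of predictions falling within +/- w years of the reference.
        out[f"acc@{int(w * 12)}mo"] = float(np.mean(np.abs(err) <= w))
    return out

# Illustrative (synthetic) reference vs. predicted bone ages in years.
y_true = [5.0, 8.5, 12.0, 14.2, 10.1]
y_pred = [5.4, 8.1, 12.9, 14.0, 10.3]
print(evaluate_ba(y_true, y_pred))
```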
Only a few studies carried out long-term follow-up; for instance, Suh et al. correlated baseline AI predictions with final adult height outcomes [69], while Farooq et al. linked initial assessments to subsequent growth-disorder detection over multi-year periods [61], underscoring the prognostic value of DL in pediatric growth monitoring. See details in Table S2: Study characteristics.

3.3. Methodological Quality Assessment (Newcastle–Ottawa Scale)

The methodological quality assessment using the Newcastle–Ottawa Scale (NOS) revealed that a subset of studies achieved the highest possible score of nine points, indicating strong methodological quality across all domains [33,69,80,90,97]. These studies showed robust participant selection with clearly defined eligibility criteria and representative cohorts [69,90], appropriate control or comparison groups [80,97], and validated exposure or outcome assessments [33]. They also ensured comparability by adequately controlling for confounding factors [80,97] and conducted rigorous follow-up procedures with low risk of attrition [33,69]. The majority of studies scored between six and eight points, suggesting moderate methodological quality with some limitations. In the Comparability domain, most studies demonstrated inadequate confounder control and failed to match groups or adjust for key outcome-influencing variables [25,26,28,34,35,38,39,40,41,42,43,46,62,67,81,85,94], resulting in descriptions that were too brief or incomplete for meaningful interpretation.
In the Exposure or Outcome assessment domain, several studies lacked long follow-up periods or failed to clearly describe outcome ascertainment methods, which may limit the reliability of their findings [31,65,77,83,84,86]. Their loss to follow-up or lack of outcome validation was evident. A smaller number of studies scored the lowest on the NOS, with only five points, indicating substantial methodological limitations [68,74]. These studies often suffered from weaknesses in the Selection domain, such as non-representative cohorts or unclear inclusion criteria [68,74], and from inadequate exposure assessment. A detailed breakdown of NOS domain scores and overall quality ratings for each study is provided in Table S3: Methodological Quality Assessment (Newcastle–Ottawa Scale).

3.4. Risk of Bias Assessment (ROBINS-E)

The risk of bias across non-randomized studies was assessed using the Risk of Bias In Non-randomized Studies of Exposures (ROBINS-E) tool. The results revealed considerable methodological limitations, particularly in controlling for confounding factors and in participant selection. A small subset of studies received an overall rating of “Some concerns”, indicating a relatively low risk of bias and appropriate control across most domains [26,29,37,45,68,74,75,83,93,99]. These studies generally demonstrated adequate classification of exposures, low levels of missing data, and well-defined outcome assessments.
However, the majority of studies were judged to have a high overall risk of bias. The domains most commonly affected were confounding and participant selection. Several studies were classified as having a very high or high risk of bias due to insufficient adjustment for key confounding variables [25,29,36,53,61,75,82,89]. Likewise, issues with participant selection were observed in studies that lacked adequate reporting or had flawed selection mechanisms, leading to a high risk of bias [31,38,46,62,69,77,84,90].
Regarding the classification of exposures, many studies performed adequately, receiving low risk ratings [26,37,54,68,88,96], though some raised concerns or were rated as high risk [58,65,80]. The outcome measurement domain also showed variability. Some studies were rated as low risk, with clearly defined and validated outcome criteria [29,44,60,99], while others were rated as high risk due to subjective measures or lack of assessor blinding [43,66,85,88,92]. A detailed domain-specific and overall risk of bias summary is provided in Table S4: Risk Of Bias In Non-randomized Studies of Exposures Assessment (ROBINS-E).

3.5. Main Results

3.5.1. AI-Assisted Software for BA Assessment Through Posteroanterior Hand and Wrist Radiographs

Several AI-based systems have been developed and commercially certified for clinical bone age assessment, most notably BoneXpert®, which debuted in 2009. Fully automated and requiring no expert oversight, BoneXpert® has proven its utility in multiple studies: Bowden et al. (2022) [53] reported strong concordance with experienced pediatric radiologists (R2 = 0.96), and Özmen et al. (2024) [77] showed excellent inter-method reliability (ICC = 0.984) against manual readings. However, VUNO Med®-BoneAge—a clinically approved, AI-driven bone age estimator seamlessly integrated into PACS workflows—slightly outperformed BoneXpert® in prepubescent girls.
Similarly, Alaimo et al. (2024) [83] reported robust correlations (r > 0.80) between BoneXpert® and the GPA, highlighting the tool’s capacity to reduce variability in BA evaluations. In addition, Pape et al. (2024) [99] identified BoneXpert® as the system with the lowest RMSE among commercial tools, underscoring its stability and reproducibility.
In terms of accuracy, several studies have emphasized BoneXpert®’s capacity to deliver BA estimations closely aligned with established reference standards. Booz et al. (2020) [27] observed superior accuracy compared to traditional GPA assessments, with BoneXpert® achieving a MAD of 0.34 years (≈4.08 mos.) versus 0.79 years (≈9.48 mos.) for manual readings. Jani et al. (2024) [95] further identified BoneXpert® as one of the most accurate AI-based systems, with an MAE as low as 0.17 years (≈2.04 mos.) in fully automated hand-and-wrist assessments. Subsequently, Bowden et al. (2022) [53] reported a minimal mean difference (MD) of only 0.12 years (≈1.44 mos.) between BoneXpert® and manual scoring. Complementing these findings, Wang et al. (2020b) [30] evaluated BoneXpert® (v2.5.4.1) in Taiwanese children, comparing the GPA and TW3 methods. Although they noted systematic discrepancies between the two, the authors confirmed the tool’s capacity for accurate assessment and called for population-specific calibrations.
Beyond its precision and accuracy, BoneXpert® also offers significant benefits in workflow efficiency. In this regard, Booz et al. (2020) [27] reported a substantial 87% reduction in radiographic reading time compared to manual GPA scoring, thereby streamlining clinical practice. Similarly, Lee and Lee (2021) [43] observed a decrease in evaluation time from 54.29 to 35.37 s per image when radiologists used BoneXpert®-based AI, highlighting its potential to accelerate diagnostic processes without compromising reliability. Collectively, these findings position BoneXpert® not only as a precise and accurate solution, but also as a time-efficient tool for routine pediatric BA assessments.

3.5.2. Deep Learning Architectures for BA Assessment Through Posteroanterior Hand and Wrist Radiographs

  • Convolutional Neural Networks (CNNs)
CNNs have shown substantial advancements in BA estimation, consistently delivering high precision and low error rates. For instance, Bui et al. (2019) [24] applied a two-stage approach—Faster R-CNN for region-of-interest (ROI) detection followed by InceptionV4 for classification and regression—on 1375 left PA-HW X-rays (ages 0–18 years) from the DHA, achieving an MAE of 0.59 years (≈7.08 mos.). In the same way, Hao et al. (2019) [25] focused exclusively on carpal-bone regression in a cohort of 432 pediatric images (ages 0–6 years), reporting an even lower MAE of 0.23 years (≈2.75 mos.)—a domain where traditional techniques often struggle.
In addition, Liu et al. (2019) [26] enhanced performance by applying a non-subsampled contourlet transform (NSCT) prior to CNN analysis of roughly 1400 radiographs (ages 0–19 years), improving MAE by over 0.1 years (≈1.2 mos.) compared with classical methods. More recently, Li et al. (2022) [46] introduced a fully automated pipeline—combining preprocessing, unsupervised segmentation, and gender-aware modeling—and demonstrated MAEs of 0.52 years (≈6.2 mos.) on the RSNA dataset and 0.43 years (≈5.1 mos.) on an independent cohort.
Commercial systems have shown similar accuracy: Physis® (16-bit AI™) [57] achieved an MAE of 0.57 years (≈6.8 mos.) in internal validation and 0.58 years (≈6.9 mos.) on external data, albeit with noted biases across sex, age, and Tanner stage. In architecture comparisons, Kasani et al. (2023) [97] showed that their stacked MobileNetV2 achieved an MAE of 0.32 years (≈3.85 mos.) on the RSNA test set—outperforming InceptionV3’s MAE of 0.33 years (≈3.99 mos.)—thereby underscoring MobileNetV2’s robustness for this task.
Advanced segmentation strategies also contribute: Zhou et al. (2020) [34] embedded U-Net within an active-learning loop and reported MAEs between 0.58 and 0.61 years (≈6.96 to 7.35 mos.), varying by gender. Finally, Nguyen et al. (2023) [93] evaluated the Gleamer BoneView® system and found it outperformed general radiologists, yielding an MAE of 0.49 years (≈5.9 mos.) and higher sensitivity (72.1% vs. 57.4%) in detecting pathological cases.
Taken together, these studies highlight CNNs’ capacity to deliver MAEs as low as 0.23 years (≈2.75 mos.) and generally below 0.59 years (≈7.08 mos.), making them highly promising for fast, accurate BA assessment in pediatric radiology.
  • Transfer Learning
Transfer learning—fine-tuning models pre-trained on large, general datasets—has proven especially valuable for BA assessment from left PA-HW X-rays, where labeled pediatric images are often scarce. Hao et al. (2019) [25] leveraged a pre-trained regression CNN on carpal-bone features, reporting 90.15% of predictions within six months and 99.43% within one year of true BA, despite a relatively small training set. Liu et al. (2019) [26] combined multi-scale data fusion with transfer learning to reduce MAE by over 0.1 years (≈1.2 mos.) versus classical approaches, underscoring the method’s robustness on limited, domain-specific data.
Building on these foundations, Kim et al. (2021) [44] fine-tuned InceptionV3 and VGG16 on the 12,611-radiograph RSNA Pediatric Bone Age dataset, applying ROI masking and gamma correction; the InceptionV3 model achieved an MAE of 0.495 years (≈5.94 mos.) while cutting computational time by 30% relative to VGG16. Metha et al. (2021) [42] similarly used InceptionV3 with enhanced preprocessing and reached an MAE of 0.493 years (≈5.92 mos.), confirming the efficiency gains of transfer learning coupled with optimized image handling.
More recent comparisons highlight architecture-specific advantages: Prasanna et al. (2023) [59] found VGG19 to outperform ResNet50 on pediatric X-rays—MAEs of 1.625 years (≈19.5 mos.) versus 2.65 years (≈31.8 mos.), respectively—while Kasani et al. (2023) [97] demonstrated that lightweight models such as MobileNetV2 and EfficientNetV2B0 can, via transfer learning, reach MAEs as low as 0.32 years (≈3.85 mos.) on the RSNA dataset.
To sum up, these studies confirm that transfer learning not only boosts prediction accuracy (often cutting MAE to under 0.50 years) but also accelerates inference—making it a highly practical strategy for pediatric BA assessment in resource-constrained clinical settings.
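Conceptually, the transfer-learning recipe in these studies freezes a pretrained backbone and fits only a lightweight regression head on its features. The dependency-light sketch below illustrates that head-fitting step; the random-projection "backbone" and the synthetic radiographs are stand-ins for illustration only, not any study's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a frozen pretrained backbone: a fixed projection mapping
# flattened 32x32 "radiographs" to 64-d feature vectors, with a ReLU.
W_backbone = rng.normal(size=(32 * 32, 64))
extract = lambda x: np.maximum(x @ W_backbone, 0.0)  # weights never updated

# Synthetic training set: images whose mean intensity encodes bone age (years).
ages = rng.uniform(1, 18, size=500)
images = rng.normal(loc=ages[:, None] / 18.0, scale=0.1, size=(500, 32 * 32))

# "Fine-tuning" here = fitting only a linear regression head on frozen features.
F = extract(images)
F1 = np.hstack([F, np.ones((F.shape[0], 1))])      # add a bias column
head, *_ = np.linalg.lstsq(F1, ages, rcond=None)   # least-squares head fit

pred = F1 @ head
mae = float(np.mean(np.abs(pred - ages)))
print(f"training MAE of the linear head: {mae:.2f} yr")
```

In a real pipeline, the frozen projection would be an ImageNet-pretrained CNN and the head would typically be trained by gradient descent, but the division of labor is the same: the backbone stays fixed while only the small head adapts to the scarce pediatric data.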

3.5.3. Model Integration Techniques for BA Assessment Through Posteroanterior Hand and Wrist Radiographs

  • Ensemble Learning
Ensemble strategies—combining multiple models—routinely outperform single-network approaches in bone-age estimation from left PA-HW radiographs. For example, Pan et al. (2019) [90] simply averaged the outputs of ten RSNA Challenge models, cutting the MAD from 0.380 years (≈4.6 mos.) to 0.316 years (≈3.8 mos.). Likewise, Wibisono et al. (2019) [91] contrasted pure DL and hybrid pipelines: a standalone VGG16 CNN reached an MAE of 1.23 years (≈14.78 mos.), an RMSE of 1.58 years (≈18.93 mos.), and a Symmetric Mean Absolute Percentage Error (SMAPE) of 38.92% in ≈91 min, whereas a Support Vector Regression (SVR) on CNN-extracted features delivered an MAE of 2.71 years (≈32.49 mos.), an RMSE of 3.38 years (≈40.58 mos.), and a SMAPE of 28.34% in ≈8 min (XGBoost in ≈9.2 min).
Furthermore, Bui et al. (2019) [24] combined the TW3 atlas method with a Faster R-CNN/Inception-V4 pipeline, reducing MAE to 0.59 years (≈7.08 mos.) and achieving 99.8% mAP. Hao et al. (2019) [25] fine-tuned a regression CNN on carpal-bone regions, placing 90.15% of predictions within 6 mos. and 99.43% within one year of true BA. Finally, Liu et al. (2019) [26] introduced an NSCT as a preprocessing step before CNN inference, cutting MAE by over 0.10 years (≈1.2 mos.) relative to traditional methods.
Building on these results, Pan et al. (2020) [33] merged U-Net segmentation with active-learning regression, reporting MAEs of 0.58 years for males and 0.61 years for females. Zhou et al. (2020) [34] benchmarked their TW3-AI ensemble against expert radiologists, achieving an RMSE of 0.50 years (≈6.0 mos.).
Overall, these studies demonstrate that ensemble learning significantly enhances precision, robustness, and efficiency—making it a highly practical strategy for pediatric BA assessment.
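The output-averaging strategy reported by Pan et al. [90] can be illustrated with synthetic predictions; the ten "members" below are noisy stand-ins for independently trained networks, not the actual Challenge models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ground-truth bone ages (years) for 200 hypothetical radiographs.
y_true = rng.uniform(1, 18, size=200)

# Ten stand-in "models": each predicts the true BA plus its own small bias and
# independent noise, mimicking independently trained networks.
members = [
    y_true + rng.normal(loc=rng.normal(0, 0.1), scale=0.5, size=y_true.size)
    for _ in range(10)
]

mae = lambda pred: float(np.mean(np.abs(pred - y_true)))

# Error of each single member vs. the simple averaged ensemble.
single_maes = [mae(m) for m in members]
ensemble_mae = mae(np.mean(members, axis=0))

print(f"mean single-model MAE: {np.mean(single_maes):.3f} yr")
print(f"averaged-ensemble MAE: {ensemble_mae:.3f} yr")
```

Because the members' errors are largely independent, averaging cancels much of the noise, which is the same mechanism behind the MAD reduction reported for the ten-model RSNA ensemble.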
  • Hybrid models
The fusion of DL with established BA techniques has driven remarkable improvements in both accuracy and precision, providing powerful alternatives to manual procedures. For example, Bui et al. (2019) [24] augmented the TW3 method with a Faster R-CNN/Inception-V4 pipeline, cutting MAE to just 0.59 years (≈7.08 mos.) and achieving 99.8% mAP—an enhancement that both denoised the input data and streamlined inference. Likewise, Hao et al. (2019) [25] tailored a regression CNN model to carpal-bone features, reaching 90.15% of estimates within 6 mos. and an MAE of only 0.23 years (≈2.75 mos.), particularly helpful for assessing preschool children, where traditional methods often struggle [13]. Beyond precision gains, hybrid frameworks also accelerate throughput. For instance, Booz et al. (2020) [27] showed that AI systems like BoneXpert® reduce reading times by 87% while maintaining expert-level accuracy.
Crucially, these hybrid solutions can be applied across populations. Beheshtian et al. (2023) [57] validated a 16-Bit AI model on internal and external cohorts, yielding a MAD of 0.57 years (≈6.8 mos.), though they cautioned that sex, age, and Tanner stage can introduce subtle biases. Nevertheless, automated TW3-based systems such as Son et al.’s [88] achieved an MAE of 0.46 years (≈5.52 mos.), outperforming GPA-based approaches and underscoring the consistency of hybrid models.
Finally, field-specific adaptations further highlight their versatility: Wang et al. (2020) [29] applied a hybrid AI model to Tibetan and Han cohorts, matching expert readings with an MAE of 0.56 years (≈6.72 mos.), while Lee et al. (2021) [35] noted greater variability among younger age groups—a reminder that ongoing refinement is needed to mitigate demographic biases. Further details are provided in Table 3.
In conclusion, these studies confirm that integrating CNNs with traditional BA methods not only elevates precision but also accelerates workflows and broadens applicability—laying a robust foundation for next-generation pediatric radiology.

4. Discussion

The present study systematically reviews recent literature on the deployment of AI-based models for automated BA assessment from left PA-HW X-rays. In recent years, supervised ML—especially CNNs—has transformed BA assessment by automating the identification of ossification patterns more quickly and accurately than manual GPA or TW methods [100,101]. These CNN models outperform earlier automated systems such as HANDX [15], developed according to the GPA scoring system, and the PROI/CASAS approaches [16,17] based on the TW method, delivering both higher precision and greater speed.
Fully automated CNN-based solutions now achieve clinical-grade performance. Lee and Lee (2021) highlighted the efficiency and accuracy of BoneXpert®, a commercial system certified for clinical use [43]. Similarly, Lee et al. (2017) built a CNN model that estimates BA within ±1 year in over 90% of cases and processes each image in under 2 s [102]. In the same line, Spampinato et al. (2017) showed the generalizability of CNNs across ages, sexes, and races, reporting an MAE of 0.8 years (≈9.6 mos.) on a public dataset and releasing their code for others to build on [103]. Bowden et al. (2022) further confirmed strong agreement with expert pediatric radiologists (R2 = 0.96, MD = 1.44 mos.) across diverse populations [53].
AI-based models have substantially improved automated bone age (BA) assessment from left PA hand–wrist radiographs. Reported MAEs range from 2 to 31.8 months, with most results below 6 months, well within the clinically acceptable threshold of <0.8 years [104]. These advances are particularly relevant in pediatric endocrinology, where bone age guides treatment decisions, and in forensic medicine, especially for age estimation in unaccompanied minors. Keeping this acceptable error range in mind is essential for contextualizing the practical applicability of automated systems, although slightly higher MAEs may still be acceptable depending on the clinical scenario [104,105].
Despite these advances, BA assessment remains subject to population-specific influences (genetic, nutritional, environmental, cultural) [106,107]. For example, BoneXpert® exhibited systematic bias in Taiwanese cohorts until local recalibration [30], and the traditional GP and TW3 standards—derived from North American Caucasian and British samples, respectively—yielded errors of up to 11.8 months in girls of African descent [14]. To mitigate such biases, AI models must be trained and validated on demographically representative, balanced datasets, explicitly incorporating variables such as biological sex, ethnic background, and socioeconomic status [108]. Practical strategies include dataset diversification (e.g., oversampling under-represented groups, subsampling over-represented ones) [109] and the deployment of fairness-aware algorithms (e.g., reweighting schemes, adversarial debiasing, constrained-optimization methods) [110].
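The two simplest of these strategies can be sketched as follows, using synthetic group labels rather than patient data: inverse-frequency reweighting, so that a weighted training loss treats each demographic group equally, and oversampling of under-represented groups:

```python
import numpy as np

def inverse_frequency_weights(groups):
    """Per-sample weights inversely proportional to group size, so each
    demographic group contributes equally to a weighted training loss."""
    groups = np.asarray(groups)
    labels, counts = np.unique(groups, return_counts=True)
    freq = dict(zip(labels, counts))
    w = np.array([1.0 / freq[g] for g in groups])
    return w * len(groups) / w.sum()  # normalize so the mean weight is 1

def oversample_indices(groups, rng):
    """Resample every group (with replacement) up to the largest group's size."""
    groups = np.asarray(groups)
    labels, counts = np.unique(groups, return_counts=True)
    target = counts.max()
    chosen = [rng.choice(np.flatnonzero(groups == lab), size=target, replace=True)
              for lab in labels]
    return np.concatenate(chosen)

# Imbalanced toy cohort: 90 samples from group A, 10 from group B.
groups = ["A"] * 90 + ["B"] * 10
w = inverse_frequency_weights(groups)
idx = oversample_indices(groups, np.random.default_rng(0))

# Each minority sample carries 9x the weight of a majority sample,
# and oversampling yields a balanced index set of 180 samples.
print(w[0], w[-1], len(idx))
```

Either output (per-sample weights or the resampled index set) can be fed directly into a standard training loop; adversarial debiasing and constrained optimization require deeper changes to the model itself.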
Several studies illustrate this approach. For example, Mame et al. (2022) [55] trained a DL model on over 12,600 PA-HW X-rays plus clinical covariates such as CA and sex, achieving an MAE of 0.67 years (≈8.04 mos.) and robust performance in adolescents [11,13]. Kim et al. (2024) combined EfficientNetV2S features with age and sex data, reaching an MAE of 0.41 years (≈4.92 mos.) with 91.1% accuracy overall, and an MAE of 0.267 years (≈3.2 mos.) with 95.0% accuracy in girls under 11 years of age [74]. Similarly, Kim et al. (2023) tailored a model to Korean children, reducing MAE from 0.875 to 0.683 years (≈10.5 to 8.2 mos.) versus VUNO Med®-BoneAge [67]. Along the same lines, Koitka et al. (2020) used a two-stage method—Faster R-CNN for ossification detection followed by sex- and region-specific regression—to achieve an MAE of 0.38 years (≈4.56 mos.) on the RSNA test set and produce interpretable outputs for radiologists [31]. Finally, Hwang et al. (2022) and Nguyen et al. (2023) emphasize validating tools like VUNO Med®-BoneAge and Gleamer BoneView® across varied populations to ensure accuracy, equity, and long-term reliability [45,93]. Overall, these studies show that combining clinical covariates with optimized DL architectures and validating models on specific populations enables automated BA estimation from left PA-HW radiographs with MAEs below 6 months, yielding more accurate, equitable, and sustainable assessments.
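The covariate-fusion idea shared by these studies reduces to a small sketch: concatenate image-derived features with clinical covariates such as sex before the regression head. Everything below is synthetic and illustrative (a least-squares head stands in for the deep regression heads used in the cited systems):

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_features = 300, 8

# Synthetic stand-ins: "deep" image features plus a sex covariate.
img_feats = rng.normal(size=(n_samples, n_features))
sex = rng.integers(0, 2, size=n_samples)          # 0 or 1, toy encoding
w_img = rng.normal(size=n_features)
bone_age = (10.0 + 0.4 * (img_feats @ w_img)
            - 0.6 * sex                            # arbitrary sex effect in this toy model
            + rng.normal(scale=0.2, size=n_samples))

def fit_linear_head(X, y):
    """Least-squares regression head (with intercept) over fused features."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef

def predict(X, coef):
    return np.hstack([X, np.ones((len(X), 1))]) @ coef

# Fusion step: image features and the covariate enter the head side by side.
fused = np.hstack([img_feats, sex[:, None]])
coef = fit_linear_head(fused[:250], bone_age[:250])
mae = float(np.mean(np.abs(predict(fused[250:], coef) - bone_age[250:])))
print(f"hold-out MAE with covariate fusion: {mae:.2f} years")
```

Because the covariate carries signal the image features cannot express, the fused model recovers the sex effect in its final coefficient; an image-only model would absorb that effect into its residual error.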

4.1. Limitations

The life cycle of an ML model—from problem definition to deployment—requires rigor at every step. For automated BA estimation, CNNs have shown outstanding accuracy (MAE = 0.59 years, ≈7.08 mos.) in Bui et al. (2019) [24], but they also face technical challenges: image preprocessing, context integration, and the need for standardized X-rays to ensure that models generalize and can be compared across populations. Furthermore, for data-scarce clinical settings, Liu et al. (2019) [26] proposed a multi-scale data-fusion framework that fuses CNN features with NSCT representations at multiple scales, markedly improving accuracy over conventional spatial-domain CNNs and highlighting its potential when annotated data are limited.
However, the high cost of AI-based models limits widespread adoption. Commercial systems like BoneXpert® are precise but expensive; open-source architectures such as ResNet-50 and VGG-19 offer a cheaper alternative. In fact, Kasani et al. (2023) achieved an MAE of 0.32 years (≈3.85 mos.) on the RSNA dataset using a divide-and-conquer strategy with a lightweight CNN architecture [97]. To remove manual annotation entirely, Li et al. (2023) [111] proposed a two-stage, annotation-free DL pipeline that uses a Convolutional Block Attention Module (CBAM) to localize bone regions and adds gender as an auxiliary input, reaching an MAE of 0.45 years (≈5.45 mos.) on RSNA and 0.28 years (≈3.34 mos.) on a private set, offering an efficient and interpretable option for clinical use.

4.2. Recommendations for Clinical Practice

To maximize the clinical impact of AI-based BA assessment, it is essential to employ reference models tailored to the target population. By calibrating against locally derived standards—such as the GP-Canary Atlas [13,112]—diagnostic precision improves and systematic bias is reduced when traditional atlases fail to capture regional growth patterns. Equally important is the curation of diverse training cohorts: deep-learning algorithms must be exposed to the full spectrum of patient variability in sex, ethnicity, socioeconomic background, and geography to mitigate bias and ensure consistent performance across populations.
Standardizing image acquisition—via uniform left PA-HW radiograph protocols and DICOM/FHIR-compliant pipelines—reduces inter-site variability, supports rigorous multicenter validation, and eases integration with existing PACS/RIS systems [113,114]. Equally critical is early engagement with regulatory agencies such as the EMA and FDA to secure Software as a Medical Device (SaMD) status and fast-track market clearance [115]. Concurrently, organizations must adopt an ISO 13485–certified quality management system [116], perform risk management per ISO 14971 [117], and enforce GDPR- and HIPAA-aligned data governance [118]. While these best practices apply broadly, commercial offerings like BoneXpert® and VUNO Med®-BoneAge have set the benchmark by combining regulatory approval, user-centric interfaces, validated clinical performance, and turnkey PACS integration.
Finally, fostering interdisciplinary collaboration among radiologists, endocrinologists, data scientists, ethicists, and policymakers ensures that technical innovation remains anchored to clinical needs, ethical standards, and operational realities [119,120]. The development of international benchmarking frameworks and prospective, multicenter validation studies will further unify evaluation criteria, streamline regulatory approval, and accelerate the adoption of AI-based bone-age tools in both pediatric and forensic settings.

5. Conclusions

AI-based models have markedly advanced automated BA assessment from left PA hand–wrist radiographs—achieving clinical-grade speed and mean absolute errors under six months. Despite outperforming manual and early automated methods, they remain vulnerable to population biases and data variability. Ensuring fairness and generalizability requires training on diverse cohorts, implementing bias-mitigation techniques, and locally calibrating models to regional standards.

Supplementary Materials

The following supporting materials are available online at https://www.mdpi.com/article/10.3390/app15115978/s1; Table S1: Search Strategy; Table S2: Study Characteristics; Table S3: Methodological Quality Assessment (NOS); Table S4: Risk of Bias Assessment (ROBINS-E).

Author Contributions

Conceptualization, I.M.M.P., S.B. and S.E.M.P.; methodology, I.M.M.P., S.B. and S.E.M.P.; software, I.M.M.P., S.B. and S.E.M.P.; validation, I.M.M.P., S.B. and S.E.M.P.; formal analysis, I.M.M.P., S.B. and S.E.M.P.; investigation, I.M.M.P., S.B. and S.E.M.P.; resources, I.M.M.P., S.B. and S.E.M.P.; data curation, I.M.M.P., S.B. and S.E.M.P.; writing—original draft preparation, I.M.M.P., S.B. and S.E.M.P.; writing—review and editing, I.M.M.P., S.B. and S.E.M.P.; visualization, I.M.M.P., S.B. and S.E.M.P.; supervision, I.M.M.P., S.B. and S.E.M.P.; project administration, I.M.M.P., S.B. and S.E.M.P.; funding acquisition, I.M.M.P., S.B. and S.E.M.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study is a systematic review that did not involve human or animal subjects. The review was conducted in accordance with the PRISMA guidelines and was prospectively registered on the International Prospective Register of Systematic Reviews (PROSPERO; registration number CRD42024619808).

Informed Consent Statement

This study is a systematic review that did not involve human subjects.

Data Availability Statement

The data supporting the reported results can be found in the manuscript.

Acknowledgments

We are grateful for the collaboration and encouragement provided by Mohammed V University in Rabat, the University of La Laguna, and the General Directorate of Relations with Africa of the Government of the Canary Islands.

Conflicts of Interest

The authors declare that there are no conflicts of interest. The funders had no role in the design of the study, in the collection, analyses, or interpretation of data, in the writing of the manuscript, or in the decision to publish the results.

References

  1. Satoh, M. Bone Age: Assessment Methods and Clinical Applications. Clin. Pediatr. Endocrinol. 2015, 24, 143–152.
  2. Manzoor Mughal, A.; Hassan, N.; Ahmed, A. Bone Age Assessment Methods: A Critical Review. Pak. J. Med. Sci. 2014, 30, 211–215.
  3. Cavallo, F.; Mohn, A.; Chiarelli, F.; Giannini, C. Evaluation of Bone Age in Children: A Mini-Review. Front. Pediatr. 2021, 9, 580314.
  4. De Sanctis, V.; Di Maio, S.; Soliman, A.T.; Raiola, G.; Elalaily, R.; Millimaggi, G. Hand X-Ray in Pediatric Endocrinology: Skeletal Age Assessment and Beyond. Indian J. Endocrinol. Metab. 2014, 18, S63–S71.
  5. Mishori, R. The Use of Age Assessment in the Context of Child Migration: Imprecise, Inaccurate, Inconclusive and Endangers Children’s Rights. Children 2019, 6, 85.
  6. Herzmann, C.; Golakov, M.; Malekzada, F.; Lonnroth, K.; Kranzer, K. Radiological Screening of Refugees in Germany. Eur. Respir. J. 2017, 49, 1602487.
  7. Lossois, M.; Cyteval, C.; Baccino, E.; Peyron, P.A. Forensic Age Assessments of Alleged Unaccompanied Minors at the Medicolegal Institute of Montpellier: A 4-Year Retrospective Study. Int. J. Leg. Med. 2022, 136, 853–859.
  8. Greulich, W.W.; Pyle, S.I. Radiographic Atlas of Skeletal Development of the Hand and Wrist, 2nd ed.; Stanford University Press: Stanford, CA, USA, 1959.
  9. Gilsanz, V.; Ratib, O. Hand Bone Age: A Digital Atlas of Skeletal Maturity; Springer: Berlin/Heidelberg, Germany; New York, NY, USA, 2005.
  10. Tanner, J.M.; Healy, M.J.R.; Goldstein, H. Assessment of Skeletal Maturity and Prediction of Adult Height (TW3 Method); Harcourt Publishers: New York, NY, USA, 2001; pp. 1–200.
  11. Martín Pérez, S.E.; Martín Pérez, I.M.; Molina Suárez, R.; Vega González, J.M.; García Hernández, A.M. The Validation of the Tanner–Whitehouse 3 Method for Radiological Bone Assessments in a Pediatric Population from the Canary Islands. Osteology 2025, 5, 6.
  12. Prokop-Piotrkowska, M.; Marszałek-Dziuba, K.; Moszczyńska, E.; Szalecki, M.; Jurkiewicz, E. Traditional and New Methods of Bone Age Assessment—An Overview. J. Clin. Res. Pediatr. Endocrinol. 2021, 13, 251–262.
  13. Martín Pérez, I.M.; Martín Pérez, S.E.; Vega González, J.M.; Molina Suárez, R.; García Hernández, A.M.; Rodríguez Hernández, F.; Herrera Pérez, M. The Validation of the Greulich and Pyle Atlas for Radiological Bone Age Assessments in a Pediatric Population from the Canary Islands. Healthcare 2024, 12, 1847.
  14. Martín Pérez, S.E.; Martín Pérez, I.M.; Vega González, J.M.; Molina Suárez, R.; León Hernández, C.; Rodríguez Hernández, F.; Herrera Pérez, M. Precision and Accuracy of Radiological Bone Age Assessment in Children among Different Ethnic Groups: A Systematic Review. Diagnostics 2023, 13, 3124.
  15. Michael, D.J.; Nelson, A.C. HANDX: A Model-Based System for Automatic Segmentation of Bones from Digital Hand Radiographs. IEEE Trans. Med. Imaging 1989, 8, 64–69.
  16. Pietka, E.; McNitt-Gray, M.F.; Kuo, M.L.; Huang, H.K. Computer-Assisted Phalangeal Analysis in Skeletal Age Assessment. IEEE Trans. Med. Imaging 1991, 10, 616–620.
  17. Tanner, J.M.; Oshman, D.; Lindgren, G.; Grunbaum, J.A.; Elsouki, R.; Labarthe, D. Reliability and Validity of Computer-Assisted Estimates of Tanner-Whitehouse Skeletal Maturity (CASAS): Comparison with the Manual Method. Horm. Res. 1994, 42, 288–294.
  18. Maratova, K.; Zemkova, D.; Sedlak, P.; Pavlikova, M.; Amaratunga, S.A.; Krasnicanova, H.; Soucek, O.; Sumnik, Z. A Comprehensive Validation Study of the Latest Version of BoneXpert on a Large Cohort of Caucasian Children and Adolescents. Front. Endocrinol. 2023, 14, 1130580.
  19. Peng, C.-T.; Chan, Y.-K.; Yuh, Y.-S.; Yu, S.-S. Applying Convolutional Neural Network in Automatic Assessment of Bone Age Using Multi-Stage and Cross-Category Strategy. Appl. Sci. 2022, 12, 12798.
  20. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 Statement: An Updated Guideline for Reporting Systematic Reviews. BMJ 2021, 372, n71.
  21. Higgins, J.P.T.; Thomas, J.; Chandler, J.; Cumpston, M.; Li, T.; Page, M.J.; Welch, V.A. (Eds.) Cochrane Handbook for Systematic Reviews of Interventions Version 6.5 (Updated August 2024); Cochrane: London, UK, 2024; Available online: https://training.cochrane.org/handbook (accessed on 10 January 2025).
  22. Wells, G.; Shea, B.; O’Connell, D.; Peterson, J.; Welch, V.; Losos, M.; Tugwell, P. The Newcastle-Ottawa Scale (NOS) for Assessing the Quality of Nonrandomised Studies in Meta-Analyses; Ottawa Health Research Institute: Ottawa, ON, Canada, 2011; Available online: http://www.ohri.ca/programs/clinical_epidemiology/oxford.asp (accessed on 15 January 2025).
  23. Higgins, J.P.T.; Morgan, R.L.; Rooney, A.A.; Taylor, K.W.; Thayer, K.A.; Silva, R.A.; Lemeris, C.; Akl, E.A.; Bateson, T.F.; Berkman, N.D.; et al. A Tool to Assess Risk of Bias in Non-Randomized Follow-Up Studies of Exposure Effects (ROBINS-E). Environ. Int. 2024, 182, 108602.
  24. Bui, T.D.; Lee, J.J.; Shin, J. Incorporated Region Detection and Classification Using Deep Convolutional Networks for Bone Age Assessment. Artif. Intell. Med. 2019, 97, 1–8.
  25. Hao, P.Y.; Chokuwa, S.; Xie, X.H.; Wu, F.L.; Wu, J.; Bai, C. Skeletal Bone Age Assessments for Young Children Based on Regression Convolutional Neural Networks. Math. Biosci. Eng. 2019, 16, 6454–6466.
  26. Liu, Y.; Zhang, C.; Cheng, J.; Chen, X.; Wang, Z.J. A Multi-Scale Data Fusion Framework for Bone Age Assessment with Convolutional Neural Networks. Comput. Biol. Med. 2019, 108, 161–173.
  27. Booz, C.; Yel, I.; Wichmann, J.L.; Boettger, S.; Al Kamali, A.; Albrecht, M.H.; Martin, S.S.; Lenga, L.; Huizinga, N.A.; D’Angelo, T.; et al. Artificial Intelligence in Bone Age Assessment: Accuracy and Efficiency of a Novel Fully Automated Algorithm Compared to the Greulich–Pyle Method. Eur. Radiol. Exp. 2020, 4, 6.
  28. Shin, N.-Y.; Lee, B.-D.; Kang, J.-H.; Kim, H.-R.; Oh, D.H.; Lee, B.I.; Kim, S.H.; Lee, M.S.; Heo, M.-S. Evaluation of the Clinical Efficacy of a TW3-Based Fully Automated Bone Age Assessment System Using Deep Neural Networks. Imaging Sci. Dent. 2020, 50, 237–243.
  29. Wang, F.; Gu, X.; Chen, S.; Liu, Y.; Shen, Q.; Pan, H.; Shi, L.; Jin, Z. Artificial Intelligence System Can Achieve Comparable Results to Experts for Bone Age Assessment of Chinese Children with Abnormal Growth and Development. PeerJ 2020, 8, e8854.
  30. Wang, Y.-M.; Tsai, T.-H.; Hsu, J.-S.; Chao, M.-F.; Wang, Y.-T.; Jaw, T.-S. Automatic Assessment of Bone Age in Taiwanese Children: A Comparison of the Greulich and Pyle Method and the Tanner and Whitehouse 3 Method. J. Med. Imaging Radiat. Oncol. 2020, 64, 704–712.
  31. Koitka, S.; Kim, M.S.; Qu, M.; Fischer, A.; Friedrich, C.M.; Nensa, F. Mimicking the Radiologists’ Workflow: Estimating Pediatric Hand Bone Age with Stacked Deep Neural Networks. Med. Image Anal. 2020, 64, 101743.
  32. Gao, Y.; Zhu, T.; Xu, X. Bone Age Assessment Based on Deep Convolution Neural Network Incorporated with Segmentation. Int. J. Comput. Assist. Radiol. Surg. 2020, 15, 1951–1962.
  33. Pan, I.; Baird, G.L.; Mutasa, S.; Merck, D.; Ruzal-Shapiro, C.; Swenson, D.W.; Ayyala, R.S. Rethinking Greulich and Pyle: A Deep Learning Approach to Pediatric Bone Age Assessment Using Pediatric Trauma Hand Radiographs. Radiol. Artif. Intell. 2020, 2, e190198.
  34. Zhou, X.-L.; Wang, E.-G.; Lin, Q.; Dong, G.-P.; Wu, W.; Huang, K.; Lai, C.; Yu, G.; Zhou, H.-C.; Ma, X.-H.; et al. Diagnostic Performance of Convolutional Neural Network-Based Tanner-Whitehouse 3 Bone Age Assessment System. Quant. Imaging Med. Surg. 2020, 10, 657–667.
  35. Wang, Z.J. Probing an AI Regression Model for Hand Bone Age Determination Using Gradient-Based Saliency Mapping. Sci. Rep. 2021, 11, 10610.
  36. Ozdemir, C.; Gedik, M.A.; Kaya, Y. Age Estimation from Left-Hand Radiographs with Deep Learning Methods. Trait. Signal 2021, 38, 1565–1574.
  37. Rani, N.S.; Yadhu, C.R.; Karthik, U. Chronological Age Assessment Based on Wrist Radiograph Processing: Some Novel Approaches. J. Intell. Fuzzy Syst. 2021, 40, 8651–8663.
  38. Poojary, N.B.; Pokhare, P.G.; Poojary, P.P.; Khanapuri, J. A Novel Approach for Bone Age Assessment Using Deep Learning. Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol. 2021, 6, 1–5.
  39. Narin, N.G.; Yeniçeri, İ.Ö.; Yüksel, G. Estimation of Bone Age from Radiological Images with Machine Learning. Med. J. Mugla Sitki Kocman Univ. 2021, 8, 119–126.
  40. Mao, K.; Chen, L.; Wang, M.; Xu, R.; Zhao, X. Classification of Hand–Wrist Maturity Level Based on Similarity Matching. IET Image Process. 2021, 15, 2866–2879.
  41. Senel, F.A.; Dursun, A.; Ozturk, K.; Ayyildiz, V.A. Determination of Bone Age Using Deep Convolutional Neural Networks. Ann. Med. Res. 2021, 28, 1381–1386.
  42. Mehta, C.; Ayeesha, B.; Sotakanal, A.; Desai, S.D.; Ganguly, A.D.; Shetty, V. Deep Learning Framework for Automatic Bone Age Assessment. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. 2021, 2021, 3093–3096.
  43. Lee, B.D.; Lee, M.S. Automated Bone Age Assessment Using Artificial Intelligence: The Future of Bone Age Assessment. Korean J. Radiol. 2021, 22, 792–800.
  44. Kim, D.W.; Kim, J.; Kim, T.; Kim, T.; Kim, Y.J.; Song, I.S.; Ahn, B.; Choo, J.; Lee, D.Y. Prediction of Hand-Wrist Maturation Stages Based on Cervical Vertebrae Images Using Artificial Intelligence. Orthod. Craniofac. Res. 2021, 24 (Suppl. S2), 68–75.
  45. Hwang, J.; Yoon, H.M.; Hwang, J.Y.; Kim, P.H.; Bak, B.; Bae, B.U.; Sung, J.; Kim, H.J.; Jung, A.Y.; Cho, Y.A.; et al. Re-Assessment of Applicability of Greulich and Pyle-Based Bone Age to Korean Children Using Manual and Deep Learning-Based Automated Method. Yonsei Med. J. 2022, 63, 683–691.
  46. Li, S.; Liu, B.; Li, S.; Zhu, X.; Yan, Y.; Zhang, D. A Deep Learning-Based Computer-Aided Diagnosis Method of X-Ray Images for Bone Age Assessment. Complex Intell. Syst. 2022, 8, 1929–1939.
  47. Hui, Q.; Wang, C.; Weng, J.; Chen, M.; Kong, D. A Global-Local Feature Fusion Convolutional Neural Network for Bone Age Assessment of Hand X-ray Images. Appl. Sci. 2022, 12, 7218.
  48. Xu, X.; Xu, H.; Li, Z. Automated Bone Age Assessment: A New Three-Stage Assessment Method from Coarse to Fine. Healthcare 2022, 10, 2170.
  49. Kang, B.-K.; Han, Y.; Oh, J.; Lim, J.; Ryu, J.; Yoon, M.S.; Lee, J.; Ryu, S. Automatic Segmentation for Favourable Delineation of Ten Wrist Bones on Wrist Radiographs Using Convolutional Neural Network. J. Pers. Med. 2022, 12, 776.
  50. Zhang, Y.; Zhu, W.; Li, K.; Yan, D.; Liu, H.; Bai, J.; Liu, F.; Cheng, X.; Wu, T. SMANet: Multi-Region Ensemble of Convolutional Neural Network Model for Skeletal Maturity Assessment. Quant. Imaging Med. Surg. 2022, 12, 3556–3568.
  51. Cheng, C.F.; Liao, K.Y.; Lee, K.J.; Tsai, F.J. A Study to Evaluate Accuracy and Validity of the EFAI Computer-Aided Bone Age Diagnosis System Compared with Qualified Physicians. Front. Pediatr. 2022, 10, 829372.
  52. Zerari, A.; Djedidi, O.; Kahloul, L.; Carlo, R.; Remadna, I. Paediatric Bone Age Assessment from Hand X-ray Using Deep Learning Approach. In Advances in Computing Systems and Applications; Lecture Notes in Networks and Systems; Senouci, M.R., Boulahia, S.Y., Benatia, M.A., Eds.; Springer: Cham, Switzerland, 2022; Volume 513.
  53. Bowden, J.J.; Bowden, S.A.; Ruess, L.; Adler, B.H.; Hu, H.; Krishnamurthy, R.; Krishnamurthy, R. Validation of Automated Bone Age Analysis from Hand Radiographs in a North American Pediatric Population. Pediatr. Radiol. 2022, 52, 1347–1355.
  54. Zhao, K.; Ma, S.; Sun, Z.; Liu, X.; Zhu, Y.; Xu, Y.; Wang, X. Effect of AI-Assisted Software on Inter- and Intra-Observer Variability for the X-ray Bone Age Assessment of Preschool Children. BMC Pediatr. 2022, 22, 644.
  55. Mame, A.B.; Tapamo, J.-R. Hand Bone Age Estimation Using Deep Convolutional Neural Networks. In Advanced Data Mining and Applications, Proceedings of the 17th International Conference, ADMA 2021, Sydney, NSW, Australia, 2–4 February 2022; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2022; Volume 13022, pp. 55–65.
  56. Wang, X.; Zhou, B.; Gong, P.; Zhang, T.; Mo, Y.; Tang, J.; Shi, X.; Wang, J.; Yuan, X.; Bai, F.; et al. Artificial Intelligence–Assisted Bone Age Assessment to Improve the Accuracy and Consistency of Physicians with Different Levels of Experience. Front. Pediatr. 2022, 10, 818061.
  57. Beheshtian, E.; Putman, K.; Santomartino, S.M.; Parekh, V.S.; Yi, P.H. Generalizability and Bias in a Deep Learning Pediatric Bone Age Prediction Model Using Hand Radiographs. Radiology 2023, 306, e220505.
  58. Umer, M.; Eshmawi, A.A.; Alnowaiser, K.; Mohamed, A.; Alrashidi, H.; Ashraf, I. Skeletal Age Evaluation Using Hand X-rays to Determine Growth Problems. PeerJ Comput. Sci. 2023, 9, e1512.
  59. Prasanna, R.G.V.; Shaik, M.F.; Sastry, L.V.; Sahithi, C.G.; Jagadeesh, J.; Raja, I.R. Evaluation of Bone Age by Deep Learning Based on Hand X-Rays. In Expert Clouds and Applications; Lecture Notes in Networks and Systems; Jacob, I.J., Ed.; Springer Nature Singapore Pte Ltd.: Singapore, 2023; Volume 673, pp. 523–532.
  60. Nandavardhan, R.; Somanathan, R.; Suresh, V.; Savaridassan, P. Comparative Analysis of Machine Learning Approaches for Bone Age Assessment: A Comprehensive Study on Three Distinct Models. arXiv 2024, arXiv:2411.10345.
  61. Farooq, H.; Umer, M.; Saidani, O.; Almuqren, L.; Distasi, R. Improving Prediction of Skeletal Growth Problems for Age Evaluation Using Hand X-rays. Multimed. Tools Appl. 2024, 83, 80027–80049.
  62. Tang, H.; Pei, X.; Li, X.; Tong, H.; Li, X.; Huang, S. End-to-End Multi-Domain Neural Networks with Explicit Dropout for Automated Bone Age Assessment. Appl. Intell. 2023, 53, 3736–3749.
  63. Wang, X.; Xu, M.; Hu, M.; Ren, F. A Multi-Scale Framework Based on Jigsaw Patches and Focused Label Smoothing for Bone Age Assessment. Vis. Comput. 2023, 39, 1015–1025.
  64. He, B.; Xu, Z.; Zhou, D.; Chen, Y. Multi-Branch Attention Learning for Bone Age Assessment with Ambiguous Label. Sensors 2023, 23, 4834.
  65. Jian, K.; Li, S.; Yang, M.; Wang, S.; Song, C. Multi-Characteristic Reinforcement of Horizontally Integrated TENet Based on Wrist Bone Development Criteria for Pediatric Bone Age Assessment. Appl. Intell. 2023, 53, 22743–22752.
  66. Sarquis Serpa, A.; Elias Neto, A.; Kitamura, F.C.; Monteiro, S.S.; Ragazzini, R.; Duarte, G.A.R.; Caricati, L.A.; Abdala, N. Validation of a Deep Learning Algorithm for Bone Age Estimation Among Patients in the City of São Paulo, Brazil. Radiol. Bras. 2023, 56, 263–268.
  67. Kim, P.H.; Yoon, H.M.; Kim, J.R.; Hwang, J.-Y.; Choi, J.-H.; Hwang, J.; Lee, J.; Sung, J.; Jung, K.-H.; Bae, B.; et al. Bone Age Assessment Using Artificial Intelligence in Korean Pediatric Population: A Comparison of Deep-Learning Models Trained with Healthy Chronological and Greulich-Pyle Ages as Labels. Korean J. Radiol. 2023, 24, 1151–1163.
  68. Bai, M.; Gao, L.; Ji, M.; Ge, J.; Huang, L.; Qiao, H.; Xiao, J.; Chen, X.; Yang, B.; Sun, Y.; et al. The Uncovered Biases and Errors in Clinical Determination of Bone Age by Using Deep Learning Models. Eur. Radiol. 2023, 33, 3544–3556.
  69. Suh, J.; Heo, J.; Kim, S.J.; Park, S.; Jung, M.K.; Choi, H.S.; Choi, Y.; Oh, J.S.; Lee, H.I.; Lee, M.; et al. Bone Age Estimation and Prediction of Final Adult Height Using Deep Learning. Yonsei Med. J. 2023, 64, 679–686.
  70. Yang, Z.; Cong, C.; Pagnucco, M.; Song, Y. Multi-Scale Multi-Reception Attention Network for Bone Age Assessment in X-ray Images. Neural Netw. 2023, 158, 249–257.
  71. Mao, X.; Hui, Q.; Zhu, S.; Du, W.; Qiu, C.; Ouyang, X.; Kong, D. Automated Skeletal Bone Age Assessment with Two-Stage Convolutional Transformer Network Based on X-ray Images. Diagnostics 2023, 13, 1837.
  72. Huang, S.; Su, Z.; Liu, S.; Chen, J.; Su, Q.; Su, H.; Shang, Y.; Jiao, Y. Combined Assisted Bone Age Assessment and Adult Height Prediction Methods in Chinese Girls with Early Puberty: Analysis of Three Artificial Intelligence Systems. Pediatr. Radiol. 2023, 53, 1108–1116.
  73. Liu, Y.; Ouyang, L.; Wu, W.; Zhou, X.; Huang, K.; Wang, Z.; Song, C.; Chen, Q.; Su, Z.; Zheng, R.; et al. Validation of an Established TW3 Artificial Intelligence Bone Age Assessment System: A Prospective, Multicenter, Confirmatory Study. Quant. Imaging Med. Surg. 2024, 14, 144–159.
  74. Kim, J.K.; Park, D.; Chang, M.C. Assessment of Bone Age Based on Hand Radiographs Using Regression-Based Multi-Modal Deep Learning. Life 2024, 14, 774.
  75. Hamd, Z.Y.; Alorainy, A.I.; Alharbi, M.A.; Hamdoun, A.; Alkhedeiri, A.; Alhegail, S.; Absar, N.; Khandaker, M.U.; Osman, A.F.I. Deep Learning-Based Automated Bone Age Estimation for Saudi Patients on Hand Radiograph Images: A Retrospective Study. BMC Med. Imaging 2024, 24, 199.
  76. Pape, J.; Rosolowski, M.; Pfäffle, R.; Beeskow, A.B.; Gräfe, D. A Critical Comparative Study of the Performance of Three AI-Assisted Programs for Bone Age Determination. Eur. Radiol. 2024, 35, 1190–1196.
  77. Özmen, E.; Özen Atalay, H.; Uzer, E.; Veznikli, M. A Comparison of Two Artificial Intelligence-Based Methods for Assessing Bone Age in Turkish Children: BoneXpert and VUNO Med-Bone Age. Diagn. Interv. Radiol. 2024, in press.
  78. Lu, Y.; Zhang, X.; Jing, L.; Fu, X. Data Enhancement and Deep Learning for Bone Age Assessment Using the Standards of Skeletal Maturity of Hand and Wrist for Chinese. In Proceedings of the 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Guadalajara, Mexico, 1–5 November 2021.
  79. Wang, L.; Zhang, X.; Chen, P.; Zhou, D. Doctor Simulator: Delta-Age-Sex-AdaIn Enhancing Bone Age Assessment through AdaIn Style Transfer. Pediatr. Radiol. 2024, 54, 1704–1712.
  80. Lin, Q.; Wang, H.; Wangjiu, C.; Awang, T.; Yang, M.; Qiongda, P.; Yang, X.; Pan, H.; Wang, F. An Artificial Intelligence-Based Bone Age Assessment Model for Han and Tibetan Children. Front. Physiol. 2024, 15, 1329145.
  81. Liang, Y.; Chen, X.; Zheng, R.; Cheng, X.; Su, Z.; Wang, X.; Du, H.; Zhu, M.; Li, G.; Zhong, Y.; et al. Validation of an AI-Powered Automated X-ray Bone Age Analyzer in Chinese Children and Adolescents: A Comparison with the Tanner–Whitehouse 3 Method. Adv. Ther. 2024, 41, 3664–3677. [Google Scholar] [CrossRef]
  82. Gräfe, D.; Beeskow, A.B.; Pfäffle, R.; Rosolowski, M.; Chung, T.S.; DiFranco, M.D. Automated Bone Age Assessment in a German Pediatric Cohort: Agreement between an Artificial Intelligence Software and the Manual Greulich and Pyle Method. Eur. Radiol. 2024, 34, 4407–4413. [Google Scholar] [CrossRef]
  83. Alaimo, D.; Terranova, M.C.; Palizzolo, E.; De Angelis, M.; Avella, V.; Paviglianiti, G.; Lo Re, G.; Matranga, D.; Salerno, S. Performance of Two Different Artificial Intelligence (AI) Methods for Assessing Carpal Bone Age Compared to the Standard Greulich and Pyle Method. Radiol. Med. 2024, 129, 1507–1512. [Google Scholar] [CrossRef]
  84. Lee, K.-C.; Kang, C.H.; Ahn, K.-S.; Lee, K.-H.; Lee, J.J.; Cho, K.R.; Oh, S. A Comparison of Automatic Bone Age Assessments between the Left and Right Hands: A Tool for Filtering Measurement Errors. Appl. Sci. 2024, 14, 8135. [Google Scholar] [CrossRef]
  85. Deng, Y.; Song, T.; Wang, X.; Liao, Y.; Chen, Y.; He, Q. ARAA-Net: Adaptive Region-Aware Attention Network for Epiphysis and Articular Surface Segmentation from Hand Radiographs. IEEE Trans. Instrum. Meas. 2024, 73, 2514814. [Google Scholar] [CrossRef]
  86. Wang, S.; Jin, S.; Xu, K.; She, J.; Fan, J.; He, M.; Shaoyi, L.S.; Gao, Z.; Liu, X.; Yao, K. A Pediatric Bone Age Assessment Method for Hand Bone X-ray Images Based on Dual-Path Network. Neural Comput. Appl. 2024, 36, 9737–9752. [Google Scholar] [CrossRef]
  87. Sharma, P. Bone Age Estimation with HS-Optimized ResNet and YOLO for Child Growth Disorder. Expert Syst. Appl. 2025, 259, 125160. [Google Scholar] [CrossRef]
  88. Son, S.J.; Song, Y.; Kim, N.; Do, Y.; Kwak, N.; Lee, M.S.; Lee, B.-D. TW3-Based Fully Automated Bone Age Assessment System Using Deep Neural Networks. IEEE Access 2019, 7, 33346–33358. [Google Scholar] [CrossRef]
  89. Lea, W.W.; Hong, S.J.; Nam, H.K.; Kang, W.Y.; Yang, Z.P.; Noh, E.J. External Validation of Deep Learning-Based Bone-Age Software: A Preliminary Study with Real World Data. Sci. Rep. 2022, 12, 1232. [Google Scholar] [CrossRef]
  90. Pan, I.; Thodberg, H.H.; Halabi, S.S.; Kalpathy-Cramer, J.; Larson, D.B. Improving Automated Pediatric Bone Age Estimation Using Ensembles of Models from the 2017 RSNA Machine Learning Challenge. Radiol. Artif. Intell. 2019, 1, e190053. [Google Scholar] [CrossRef]
  91. Wibisono, A.; Saputri, M.S.; Mursanto, P.; Rachmad, J.; Alberto; Yudasubrata, A.T.W.; Rizki, F.; Anderson, E. Deep Learning and Classic Machine Learning Approach for Automatic Bone Age Assessment. In Proceedings of the 2019 4th Asia-Pacific Conference on Intelligent Robot Systems, Tokushima, Japan, 19–22 August 2019; pp. 235–240. [Google Scholar] [CrossRef]
  92. Reddy, N.E.; Rayan, J.C.; Annapragada, A.V.; Mahmood, N.F.; Scheslinger, A.E.; Zhang, W.; Kan, J.H. Bone Age Determination Using Only the Index Finger: A Novel Approach Using a Convolutional Neural Network Compared with Human Radiologists. Pediatr. Radiol. 2020, 50, 516–523. [Google Scholar] [CrossRef]
  93. Nguyen, T.; Hermann, A.-L.; Ventre, J.; Ducarouge, A.; Pourchot, A.; Marty, V.; Regnard, N.-E.; Guermazi, A. High Performance for Bone Age Estimation with an Artificial Intelligence Solution. Diagn. Interv. Imaging 2023, 104, 330–336. [Google Scholar] [CrossRef]
  94. Zulkifley, M.A.; Mohamed, N.A.; Abdani, S.R.; Kamari, N.A.M.; Moubark, A.M.; Ibrahim, A.A. Intelligent Bone Age Assessment: An Automated System to Detect a Bone Growth Problem Using Convolutional Neural Networks with Attention Mechanism. Diagnostics 2021, 11, 765. [Google Scholar] [CrossRef]
  95. Jani, G.; Patel, B. Charting the Growth through Intelligence: A SWOC Analysis on AI-Assisted Radiologic Bone Age Estimation. Int. J. Leg. Med. 2025, 139, 679–694. [Google Scholar] [CrossRef]
  96. Tajmir, S.H.; Lee, H.; Shailam, R.; Gale, H.I.; Nguyen, J.C.; Westra, S.J.; Lim, R.; Yune, S.; Gee, M.S.; Do, S. Artificial Intelligence-Assisted Interpretation of Bone Age Radiographs Improves Accuracy and Decreases Variability. Skelet. Radiol. 2019, 48, 275–283. [Google Scholar] [CrossRef]
  97. Kasani, A.A.; Sajedi, H. Hand Bone Age Estimation Using Divide and Conquer Strategy and Lightweight Convolutional Neural Networks. Eng. Appl. Artif. Intell. 2023, 120, 105935. [Google Scholar] [CrossRef]
  98. Gonca, M.; Sert, M.F.; Gunacar, D.N.; Kose, T.E.; Beser, B. Determination of Growth and Developmental Stages in Hand-Wrist Radiographs: Can Fractal Analysis in Combination with Artificial Intelligence Be Used? J. Orofac. Orthop. 2024, 85 (Suppl. S2), 1–15. [Google Scholar] [CrossRef]
  99. Pape, J.; Hirsch, F.W.; Deffaa, O.J.; DiFranco, M.D.; Rosolowski, M.; Gräfe, D. Applicability and Robustness of an Artificial Intelligence-Based Assessment for Greulich and Pyle Bone Age in a German Cohort. Fortschr. Röntgenstr. 2024, 196, 600–606. [Google Scholar] [CrossRef]
  100. Offiah, A.C. Current and Emerging Artificial Intelligence Applications for Pediatric Musculoskeletal Radiology. Pediatr. Radiol. 2022, 52, 2149–2158. [Google Scholar] [CrossRef]
  101. Lee, J.H.; Kim, K.G. Applying Deep Learning in Medical Images: The Case of Bone Age Estimation. Healthc. Inform. Res. 2018, 24, 86–92. [Google Scholar] [CrossRef]
  102. Lee, H.; Tajmir, S.; Lee, J.; Zissen, M.; Yeshiwas, B.A.; Alkasab, T.K.; Choy, G.; Do, S. Fully Automated Deep Learning System for Bone Age Assessment. J. Digit. Imaging 2017, 30, 427–441. [Google Scholar] [CrossRef]
  103. Spampinato, C.; Palazzo, S.; Giordano, D.; Aldinucci, M.; Leonardi, R. Deep Learning for Automated Skeletal Bone Age Assessment in X-ray Images. Med. Image Anal. 2017, 36, 41–51. [Google Scholar] [CrossRef]
  104. Thodberg, H.H. Clinical Review: An Automated Method for Determination of Bone Age. J. Clin. Endocrinol. Metab. 2009, 94, 2239–2244. [Google Scholar] [CrossRef]
  105. Dallora, A.L.; Anderberg, P.; Kvist, O.; Mendes, E.; Diaz Ruiz, S.; Sanmartin Berglund, J. Bone Age Assessment with Various Machine Learning Techniques: A Systematic Literature Review and Meta-Analysis. PLoS ONE 2019, 14, e0220242. [Google Scholar] [CrossRef]
  106. Boas, F. The Growth of Children. Science 1897, 5, 570–573. [Google Scholar] [CrossRef]
  107. Boas, F. Plasticity in Child Development. In Anthropology and Child Development: A Cross-Cultural Reader, 1st ed.; LeVine, R.A., New, R.S., Eds.; Blackwell: Oxford, UK, 2008; pp. 18–21. [Google Scholar]
  108. Kamiran, F.; Calders, T.G.K. Data Preprocessing Techniques for Classification without Discrimination. Knowl. Inf. Syst. 2012, 33, 1–33. [Google Scholar] [CrossRef]
  109. Naseer, M.; Prabakaran, B.S.; Hasan, O.; Shafique, M. UnbiasedNets: A Dataset Diversification Framework for Robustness Bias Alleviation in Neural Networks. Mach. Learn. 2024, 113, 2499–2526. [Google Scholar] [CrossRef]
  110. Ferrara, C.; Sellitto, G.; Ferrucci, F.; Palomba, F.; De Lucia, A. Fairness-Aware Machine Learning Engineering: How Far Are We? Empir. Softw. Eng. 2024, 29, 9. [Google Scholar] [CrossRef]
  111. Li, Z.; Chen, W.; Ju, Y.; Chen, Y.; Hou, Z.; Li, X.; Jiang, Y. Bone Age Assessment Based on Deep Neural Networks with Annotation-Free Cascaded Critical Bone Region Extraction. Front. Artif. Intell. 2023, 6, 1142895. [Google Scholar] [CrossRef]
  112. Toledo Rodríguez, F.; Rodríguez, F. Atlas Radiológico de Referencia de la Edad Ósea en la Población Canaria; Fundación Canaria Salud y Sanidad, Cabildo de Tenerife: Tenerife, Spain, 2009. [Google Scholar]
  113. Tang, S.-T.; Tjia, V.; Noga, T.; Febri, J.; Lien, C.-Y.; Chu, W.-C.; Chen, C.-Y.; Hsiao, C.-H. Creating a Medical Imaging Workflow Based on FHIR, DICOMweb, and SVG. J. Digit. Imaging 2023, 36, 794–803. [Google Scholar] [CrossRef] [PubMed]
  114. Diaz, O.; Kushibar, K.; Osuala, R.; Linardos, A.; Garrucho, L.; Igual, L.; Radeva, P.; Prior, F.; Gkontra, P.; Lekadir, K. Data Preparation for Artificial Intelligence in Medical Imaging: A Comprehensive Guide to Open-Access Platforms and Tools. Phys. Med. 2021, 83, 25–37. [Google Scholar] [CrossRef] [PubMed]
  115. Ebad, S.A.; Alhashmi, A.; Amara, M.; Miled, A.B.; Saqib, M. Artificial Intelligence-Based Software as a Medical Device (AI-SaMD): A Systematic Review. Healthcare 2025, 13, 817. [Google Scholar] [CrossRef] [PubMed]
  116. ISO 13485:2016; Medical Devices—Quality Management Systems—Requirements for Regulatory Purposes. International Organization for Standardization (ISO): Geneva, Switzerland, 2016.
  117. ISO 14971:2019; Medical Devices—Application of Risk Management to Medical Devices. International Organization for Standardization (ISO): Geneva, Switzerland, 2019.
  118. Bartels, R.; Dudink, J.; Haitjema, S.; Oberski, D.; van’t Veen, A. A Perspective on a Quality Management System for AI/ML-Based Clinical Decision Support in Hospital Care. Front. Digit. Health 2022, 4, 942588. [Google Scholar] [CrossRef]
  119. Stogiannos, N.; Gillan, C.; Precht, H.; Sa Dos Reis, C.; Kumar, A.; O’Regan, T.; Ellis, V.; Barnes, A.; Meades, R.; Pogose, M.; et al. A Multidisciplinary Team and Multiagency Approach for AI Implementation: A Commentary for Medical Imaging and Radiotherapy Key Stakeholders. J. Med. Imaging Radiat. Sci. 2024, 55, 101717. [Google Scholar] [CrossRef]
  120. Lang, O.; Yaya-Stupp, D.; Traynis, I.; Cole-Lewis, H.; Bennett, C.R.; Lyles, C.; Lau, C.; Irani, M.; Semturs, C.; Webster, D.R.; et al. Using Generative AI to Investigate Medical Imagery Models and Datasets. eBioMedicine 2024, 102, 105075. [Google Scholar] [CrossRef]
Figure 1. PRISMA flow diagram.
Table 1. PECO Framework for Study Inclusion Criteria.
Component | Study Design
Population (P) | Children and adolescents undergoing BA assessment using left PA-HW radiographs
Exposure (E) | AI-based models for BA estimation using left PA-HW radiographs
Comparator (C) | Conventional manual BA methods or alternative computational techniques
Outcomes (O) | Model accuracy, precision, predictive validity, inter-observer variability, intra-observer variability, processing time, clinical applicability
PECO framework outlining the Population, Exposure, Comparator, and Outcome criteria guiding the selection of studies on AI-based BA assessment from posteroanterior hand and wrist radiographs.
Table 2. Study Inclusion and Exclusion Criteria.
Criteria | Inclusion Criteria | Exclusion Criteria
Study Design | Diagnostic accuracy, cohort, case–control, cross-sectional, validation, or case series studies; chapters or conference proceedings | Randomized controlled trials, clinical trials, abstracts, editorials, opinion pieces
Publication Date | Between 1 January 2019 and 23 December 2024 | Before 1 January 2019 or after 23 December 2024
Availability | Full-text, peer-reviewed publications | Not available as full-text, peer-reviewed publications
Language | English, Spanish, French, Portuguese, Arabic | Any other language
Study inclusion and exclusion criteria applied during systematic screening, specifying eligible study designs, publication timeframe, language, and full-text, peer-reviewed availability.
Table 3. Comparison of AI-Based Models for Automated Bone Age Assessment from Posteroanterior Hand and Wrist X-Rays.
Computational Technique | Datasets | Performance | References
AI-assisted Software
AI-assisted software (BoneXpert®, VUNO Med®-BoneAge, BoneView®, etc.) | RSNA, DHA, TW3 sets | MAE: 2–4 mos. | [27,30,45,53,76,77,83,89,93,99]
Deep Learning Architectures
Convolutional Neural Networks (CNNs) | RSNA, Digital Hand Atlas, private datasets | MAE: 2.75–7.08 mos. | [24,25,26,28,31,32,33,34,35,36,37,38,40,41,42,46,47,48,49,50,51,52,53,54,55,58,59,60,61,62,63,64,65,66,67,69,70,71,72,74,75,78,79,80,84,85,86,87,88,90,91,92,94,96,97]
Transfer Learning (InceptionV3, VGG16, ResNet50, MobileNetV2, EfficientNetV2B0, etc.) | RSNA | MAE: 3.85–31.8 mos. | [36,38,42,44,59,87,97]
Custom DL Architectures (AXNet, MMANet, DADPN, etc.) | RSNA | MAE: 4–5.8 mos. | [70,86,94]
Multi-domain Neural Networks | RSNA, local datasets | MAE: ~4–5.5 mos. | [62,71,72]
Model Integration Techniques
Ensemble Learning | RSNA | MAD: 3.79–4.55 mos. | [31,38,90,91]
Hybrid Models (CNN + TW3/GPA) | RSNA, DHA, TW3 sets | MAE: 5.52–7.08 mos. | [24,28,43,73,88]
Region Processing and Enhancement
U-Net Segmentation | RSNA | MAE: 6–7.35 mos. | [32,33,46,48,85]
Attention Mechanisms | RSNA, Chinese datasets | MAE: ~4–6 mos. | [32,64,71,81,94]
Region Localization (YOLOv3, YOLOv5) | RSNA | MAE: 4.8–6.2 mos. | [47,48,71,85]
Comparison of AI-based models for automated bone age assessment from posteroanterior wrist X-rays. Abbreviations: AXNet (Attention-X Network), CNN (Convolutional Neural Network), DHA (Digital Hand Atlas), GPA (Greulich and Pyle Atlas), MAE (Mean Absolute Error), MAD (Median Absolute Deviation), RSNA (Radiological Society of North America), TW3 (Tanner–Whitehouse 3), U-Net (U-shaped Network), YOLO (You Only Look Once).
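The MAE and MAD figures in Table 3 summarize how far each model's predicted bone age falls from the reference reading, in months. As a minimal sketch of how these two metrics are conventionally computed (the functions and example values below are illustrative, not taken from any reviewed study):

```python
def mae_months(predicted, reference):
    """Mean absolute error (months) between predicted and reference bone ages."""
    errors = [abs(p - r) for p, r in zip(predicted, reference)]
    return sum(errors) / len(errors)

def mad_months(predicted, reference):
    """Median of the absolute errors (months), less sensitive to outliers than MAE."""
    errors = sorted(abs(p - r) for p, r in zip(predicted, reference))
    n = len(errors)
    mid = n // 2
    return errors[mid] if n % 2 else (errors[mid - 1] + errors[mid]) / 2

# Hypothetical predicted vs. radiologist-reference bone ages, in months
pred = [120.0, 96.5, 150.2, 84.0]
ref = [118.0, 100.0, 148.0, 86.5]
print(round(mae_months(pred, ref), 2))  # 2.55
print(round(mad_months(pred, ref), 2))  # 2.35
```

Because MAD reports the median rather than the mean, a single badly mis-read radiograph inflates MAE but barely moves MAD, which is why the two metrics in Table 3 are not directly interchangeable.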