1. Introduction
The post-natal growth of the human mandible holds great significance within the field of orthodontics, as it boasts the highest growth potential among craniofacial structures [
1]. The majority of mandibular growth takes place during adolescence, which coincides with the common treatment period for orthodontic patients [
2]. Normal mandibular growth is typically observed in Class I patients, where the development of the mandible proceeds without significant deviations or abnormalities. Unanticipated mandibular growth can notably influence outcomes, particularly in Class III patients where excessive growth poses challenges. Conversely, in Class II cases, there is often a prominent deficiency in mandibular growth, characterized by insufficient horizontal and/or vertical development [
3,
4]. This deficiency limits the potential for self-correction without the intervention of orthodontic treatment [
5]. To address these challenges, growth modification therapies have been utilized for decades to enhance mandibular growth while concurrently restricting maxillary growth [
6]. If orthodontists could accurately predict mandibular growth, they could plan treatment in advance and time interventions to guide mandibular development, which could improve treatment outcomes and long-term stability.
In the 1960s, Bjork sought to understand normal growth variation by placing metallic implants in the jaws of developing children [
7,
8,
9]. His approach aimed to unravel the mysteries of mandibular growth, enabling the prediction of both the magnitude and direction of growth with greater precision. In his research, Bjork deduced that the mandible exhibits a predominant downward and forward growth pattern, with the condyle serving as the primary site of substantial growth [
9]. Skieller et al. made a significant contribution to mandibular growth prediction using longitudinal studies and cephalometric analysis. Their research focused on predicting growth patterns from the intermolar angle, the shape of the lower border of the mandible, the inclination of the symphysis, and mandibular inclination [
10]. Though their methods were thought to have high accuracy, subsequent clinical evaluation revealed notable inaccuracy and limitations [
11]. Thereafter, Ricketts and his colleagues [
12,
13] proposed an arcial prediction method, which showed promising clinical utility in preliminary testing with a limited sample and is currently integrated into the Dolphin Imaging 11.0 software. Overall, predictions derived from anatomical structures have demonstrated inconsistent accuracy.
In the pursuit of more accurate predictions, several mathematical models have been developed. Rudolph et al. combined Bayes' theorem with a Gaussian distribution to develop a statistical model that predicted mandibular growth based on observed variables and their probabilistic relationships [
14]. Their approach leveraged both prior knowledge and the data at hand to estimate and predict mandibular growth. However, this method demonstrated a prediction accuracy of only 82%. Buschang et al. developed a mathematical model that involved comparing the average yearly growth velocities with a population-based growth curve [
15]. Their findings demonstrated a prediction accuracy of 76%; the researchers acknowledged the presence of bias due to anticipated growth variations that could not be fully accounted for by the prediction methods employed. In 2021, Jiménez-Silva et al. conducted a systematic review investigating Class II mandibular growth and reached a significant conclusion, highlighting the overall low-to-moderate methodological quality of existing predictors and underscoring the pressing need for reliable prediction methods [
16]. Numerous studies were found to have an elevated risk of bias and employed broad sample selections, further emphasizing the need for rigorous investigation. Accordingly, the systematic review advocated a meticulously designed longitudinal cohort study, based on lateral cephalometric radiographs and adhering to stringent quality standards, to address this research gap and provide more accurate predictions. Despite concerted efforts, it remains challenging for human-made models to comprehensively account for the intricate and multifaceted variation among human beings.
Walker was one of the first in the field of orthodontics to postulate and conduct mandibular predictions using computer software [
17]. Since then, technology has advanced so significantly that artificial intelligence (AI) and machine learning (ML) are now used in almost every aspect of daily life. AI is the development of computer systems capable of performing tasks that normally require human intelligence [
18]. Within AI, ML uses a set of example inputs and outputs to fit an algorithm that can then predict the output for new inputs [
19]. AI and ML have been utilized for several tasks in orthodontics, such as for automated cephalometric analyses [
20,
21,
22,
23,
24], predicting extraction vs. non-extraction treatment decisions [
25,
26,
27,
28,
29,
30,
31,
32,
33], predicting orthodontic extraction patterns [
34], determining the need for surgery in Class III patients [
35], and growth assessment [
36,
37,
38,
39,
40,
41,
42,
43,
44]. However, little research has been conducted on the use of AI to predict mandibular growth. Niño-Sandoval et al. utilized automated learning techniques to predict mandibular morphology in Class I, II, and III patients [
45]. This study used the coordinates of craniofacial landmarks as variables for Artificial Neural Networks and Support Vector Regression (SVR) to predict morphological outcomes. This research yielded exceptional predictability, showcasing the remarkable ability of AI to accurately forecast jaw morphology. The same group of researchers used AI to classify skeletal patterns through craniomaxillary variables selected from the mandible for forensic use [
46]. This resulted in 74% accuracy in correctly predicting the skeletal patterns. In an unpublished master's thesis, Jiwa et al. employed deep learning techniques to construct an algorithm for mandibular growth prediction [
47]. Their approach generated predictions from the X and Y coordinates of 17 mandibular landmarks on selected cephalograms and compared them with Ricketts' growth prediction. However, the predictions proved generally inaccurate, highlighting the need for larger and more targeted sample populations to enhance predictive capability. Recently, Wood et al. utilized 39 linear and angular measurements from lateral cephalograms to predict mandibular growth in untreated Class I male patients [
48]. This study employed seven distinct ML algorithms to analyze the measurements, predict the magnitude and direction of mandibular growth, and compare the results to the final cephalogram of each patient. They were able to predict mandibular growth within 3 mm and the y-axis within 1°.
To the best of our knowledge, no previous research has investigated the application of ML to predict both the magnitude and direction of mandibular growth in adolescent males with a Class II malocclusion during the circumpubertal period. As noted above, existing predictions of mandibular growth have often lacked the desired precision and accuracy. The ability to forecast mandibular growth in Class II patients would help clinicians determine the optimal timing of treatment, assess the need for growth modification, and select the most effective treatment plan. The goal of this investigation is to develop an ML algorithm that can reliably and accurately forecast the magnitude and direction of mandibular growth within this specific patient subgroup.
2. Materials and Methods
2.1. Ethics
This retrospective study was approved as a non-human subjects research (NHSR) by the Institutional Review Board (IRB) of Indiana University Human Research Protection Program (HRPP) (Protocol #14987).
2.2. Study Sample
The sample of this study consisted of digital cephalometric radiographs from subjects in the American Association of Orthodontists Foundation (AAOF) Craniofacial Legacy Collection, which includes data from the Bolton Brush Growth, Burlington Growth, Denver Growth, Fels Longitudinal, Forsyth Twin, Iowa Growth, Matthews Growth, Michigan Growth, and Oregon Growth studies [
49]. The inclusion criteria consisted of males with Class II malocclusion or an ANB > 3.5° with pre-pubertal (T1) (mean age ± SD: 12.0 ± 0.29 years), pubertal (T2) (mean age ± SD: 14.1 ± 0.27 years), and post-pubertal (T3) (mean age ± SD: 15.9 ± 0.48 years) cephalograms. Subjects with craniofacial anomalies, apparent skeletal asymmetries, missing teeth (excluding third molars), missing cephalometric records, or lateral cephalograms lacking necessary structures were excluded from this study. A total of 123 cases met the inclusion criteria and were selected for this study.
2.3. Sample Size Justification
The study used 92 of the cases for training and the remaining 31 for the testing set. With this sample size, the 95% confidence interval for the intra-class correlation coefficients (ICCs) had a width of 0.28, extending from 0.62 to 0.90, if the ICC was 0.80; higher ICCs had shorter confidence interval widths.
2.4. Data Collection
Images obtained from the AAOF collection were imported into Dolphin Imaging V. 11.95 (Dolphin Imaging and Management Solutions, Chatsworth, CA, USA) for further analyses. A single investigator (G.Z.) identified and annotated 25 hard tissue landmarks on each image (Figure 1). This process enabled the calculation of 38 linear and angular measurements, which were subsequently used as input features (predictors) for the models (Table S1). Several cephalograms did not show adequate soft tissue; therefore, soft tissue landmarks and associated cephalometric measurements were not included in the study.
The AAOF provided dots-per-inch (DPI) calibration for measurements; however, when magnification discrepancies were detected, images were printed at a 1:1 scale, the ruler length was verified for accuracy, and the digital ruler was then used to recalibrate the measurements. Demographic and cephalometric data were compiled and stored in a secure cloud service (OneDrive, Microsoft Co., Redmond, WA, USA). For the intra-examiner repeatability assessment, a research randomizer was used to randomly choose 20 images for retracing. ICCs were used to evaluate the repeatability of the measurements.
2.5. Algorithm Training and Testing
The dataset was randomly split into a training set (75%) and a testing set (25%). The training set was used to fit the ML models so that they could forecast the post-pubertal mandibular length and y-axis. Input data from both T1 and T2 were used for the 2-year prediction, whereas input data from T1 alone were used for the 4-year prediction (Figure 2).
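The split and the two input configurations described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the record layout (a dictionary with "T1" and "T2" measurement dictionaries), and the random seed are assumptions made for the example.

```python
import random

TRAIN_FRACTION = 0.75  # 75% training, 25% testing, as stated in the text


def split_cases(case_ids, train_fraction=TRAIN_FRACTION, seed=0):
    """Randomly split case identifiers into training and testing lists.

    The seed is a hypothetical choice made only for reproducibility here.
    """
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_fraction)  # 123 cases -> 92 training
    return ids[:n_train], ids[n_train:]


def build_inputs(record, horizon_years):
    """Assemble the predictor dictionary for one subject.

    record is assumed to look like {"T1": {...}, "T2": {...}}, where each
    inner dictionary maps a measurement name to its value.
    """
    features = {f"T1_{name}": value for name, value in record["T1"].items()}
    if horizon_years == 4:
        # 4-year prediction: pre-pubertal (T1) measurements only
        return features
    if horizon_years == 2:
        # 2-year prediction: T1 and T2 measurements together
        features.update(
            {f"T2_{name}": value for name, value in record["T2"].items()}
        )
        return features
    raise ValueError("horizon_years must be 2 or 4")


train_ids, test_ids = split_cases(range(123))
print(len(train_ids), len(test_ids))
```

With 123 cases, truncating 0.75 × 123 = 92.25 gives the 92 training and 31 testing cases reported in the sample size justification.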
Six fundamental traditional regression techniques (XGBoost, Random Forest, Lasso, Ridge, Linear Regression, and Support Vector Regression (SVR)), along with a Multilayer Perceptron (MLP) regressor, were used to ensure the robustness of our investigation. To investigate possible linear associations, we employed Linear Regression using the least squares method, along with its L1-regularized (Lasso) and L2-regularized (Ridge) variants. Linear Regression is a long-established statistical tool that provides approximate solutions to problems in which the number of equations exceeds the number of unknowns, and it is well suited to uncovering linear relationships that might underlie the data. To capture relationships that deviate from linearity, we used non-linear methods: kernel-based SVR, tree-based algorithms (XGBoost and Random Forest), and the MLP regressor. Random Forest, an ensemble of decision trees, was employed to reduce prediction error and to limit overfitting, which is a particular concern given our small training dataset. All experiments were conducted in Spyder 4.1.5 using Python 3.7.9 (Python Software Foundation, Fredericksburg, VA, USA). The following packages were used: sklearn version 1.0.2 (NumFOCUS, Austin, TX, USA) for least squares, Ridge, Lasso, and Random Forest; XGBoost version 1.5.0 (DMLC, Seattle, WA, USA) for XGBoost; and Keras version 2.4.0 (Keras, Mountain View, CA, USA) in the TensorFlow version 2.4.3 (Keras, Mountain View, CA, USA) platform for the neural network.
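A rough sketch of how such a model comparison might be set up with scikit-learn is given below. It runs on synthetic data, not the study data, and every hyperparameter shown is an assumption because the text does not report the settings used. XGBoost is left out to keep the example to a single package, and scikit-learn's MLPRegressor stands in for the Keras network.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for 123 subjects x 38 cephalometric measurements.
rng = np.random.default_rng(0)
X = rng.normal(size=(123, 38))
true_coef = np.zeros(38)
true_coef[:4] = [2.0, -1.0, 0.5, 0.5]  # only a few informative predictors
y = 110.0 + X @ true_coef + rng.normal(scale=1.0, size=123)

# 75% / 25% split as in the text; the seed is an arbitrary choice.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# All hyperparameters below are illustrative assumptions.
models = {
    "Linear Regression": LinearRegression(),
    "Lasso": Lasso(alpha=0.1),
    "Ridge": Ridge(alpha=1.0),
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
    "MLP": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    ),
}

mae_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mae_by_model[name] = mean_absolute_error(y_test, model.predict(X_test))

for name, mae in sorted(mae_by_model.items(), key=lambda item: item[1]):
    print(f"{name}: MAE = {mae:.2f}")
```

The scale-sensitive models (SVR and MLP) are wrapped in a pipeline with standardization, which is a common default and not something the text specifies.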
2.6. Statistical Analysis
The mean absolute error (MAE), root mean square error (RMSE), mean error (ME), ICCs, and Bland–Altman plots were calculated for each technique to evaluate the agreement between the predicted and actual outcome measurements. The accuracy percentage of each method was calculated using the formula (1 − MAE/Actual Value) × 100. The directional and absolute differences between the predicted and actual measurements were calculated and compared between models using analysis of variance (ANOVA), with random effects to account for the correlation among data from the same patient. Paired t-tests were used to test for a significant mean directional difference between predicted and actual measurements. A two-sided 5% significance level was used for all tests. All analyses were performed using SAS version 9.4 (SAS Institute, Inc., Cary, NC, USA).
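The agreement measures named above can be illustrated with a short, standard-library sketch. Two points are assumptions of this example and not statements from the text: "Actual Value" in the accuracy formula is taken to be the mean of the actual measurements, and the Bland–Altman limits of agreement use the conventional mean difference ± 1.96 standard deviations. The function name and the sample values are hypothetical.

```python
import math
from statistics import mean, stdev


def agreement_metrics(actual, predicted):
    """Return MAE, RMSE, ME, accuracy percentage, and Bland-Altman limits."""
    errors = [p - a for a, p in zip(actual, predicted)]  # predicted - actual
    mae = mean(abs(e) for e in errors)
    rmse = math.sqrt(mean(e * e for e in errors))
    me = mean(errors)  # signed (directional) error
    # Assumption: "Actual Value" is the mean actual measurement.
    accuracy_pct = (1 - mae / mean(actual)) * 100
    sd = stdev(errors)  # sample standard deviation of the differences
    return {
        "mae": mae,
        "rmse": rmse,
        "me": me,
        "accuracy_pct": accuracy_pct,
        "loa_lower": me - 1.96 * sd,
        "loa_upper": me + 1.96 * sd,
    }


# Hypothetical mandibular lengths in mm, for illustration only.
actual = [110.0, 112.0, 108.0, 115.0]
predicted = [111.0, 110.0, 109.0, 115.0]
for key, value in agreement_metrics(actual, predicted).items():
    print(f"{key}: {value:.3f}")
```

Because the ME keeps the sign of each error, over- and under-predictions can cancel out; the MAE and RMSE do not have this property, which is why the three are reported together.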
4. Discussion
There is a significant degree of variability in both the magnitude and direction of pubertal mandibular growth across different genders, races, and even among individuals of the same age and gender [
51]. To thoroughly investigate the complex growth patterns of the mandible, we employed a targeted approach by selecting specific samples based on malocclusion, gender, and age. This study specifically focused on analyzing records exclusively from Class II males in the circumpubertal stage. By utilizing data from individuals aged 11 to 16 years, we were able to examine the peak growth and maturation stages that most males experience, capturing a more stable estimate of the final position of the mandible as growth approaches its plateau. Our intention was to create a novel ML model that can predict the magnitude and direction of pubertal mandibular growth in males with Class II malocclusion.
This study contributes to a growing series of investigations that use ML techniques to forecast mandibular growth. Baumrind et al. conducted a study in which orthodontists attempted to forecast the mandibular growth of Class II patients and concluded that human predictions fare no better than chance [52]. In contrast, the ML models in our study predicted the post-pubertal mandibular length within a margin of 2.5 mm. Similarly, Wood et al. predicted the mandibular length among Class I males within 3 mm [48]. The ML models also predicted the y-axis within 1°, which matches the 1° range reported by Wood et al. [48].
Different predictors were prominent in each ML model. For mandibular length, the significant predictors included chronological age, upper and lower face heights, and upper incisor position. The strong predictive power of chronological age is logical, given that the patients were in the circumpubertal period, which is characterized by accelerated growth and development. The algorithm likely detected the average peak height velocity, which typically occurs around the age of 14 years, enabling more accurate predictions [
2]. Lower face height also contributed significantly to precise predictions. Hypodivergent patients with a short lower face height tend to exhibit more forward growth, whereas hyperdivergent patients with a long lower face height exhibit more vertical growth [
53]. Furthermore, the position of the upper incisor plays a role in this regard. Class II Division 1 malocclusion is typified by protruded maxillary incisors, whereas Class II Division 2 patients exhibit retruded maxillary incisors. Since Class II Division 2 patients commonly have a shorter lower face height [
54], the algorithm may have leveraged this information to identify them as forward growers. These predictive factors indicate that the ML algorithms were possibly capable of differentiating between Class II Division 1 and Class II Division 2 patients to discern the appropriate growth pattern more accurately.
Regarding the y-axis, the most predictive factors were SN-MP, SN-Pog, SNB, SNA, and SN-Palatal plane. SN-MP measures the mandibular plane angle relative to the cranial base and classifies the mandibular rotation pattern as hypodivergent, normodivergent, or hyperdivergent. The y-axis is another cephalometric measurement used to assess the direction of mandibular growth: downward and backward or downward and forward. Both the SN-MP angle and the y-axis angle are used to evaluate skeletal and growth patterns in orthodontics and orthognathic surgery. They help orthodontists and surgeons understand the vertical dimensions of the face, the inclination of the mandible, and the overall skeletal relationships between the cranial base and the jaws. It is therefore understandable that the vertical relationship between the mandible and the cranial base helps predict the vertical direction of growth via the y-axis. This agrees with Schudy, who found that SN-MP is closely associated with the growth and morphology of the mandible when he sought to identify the specific increments of growth responsible for the rotation of the mandible [54]. Schudy found that the larger the SN-MP angle, the more the mandible tends to become steeper and the more the chin moves backward; the smaller the angle, the greater the tendency of the mandible to become flatter and the chin to grow forward [54]. Additionally, the ML models used anterior–posterior measurements, such as SNA and SNB, to predict the y-axis. A larger SNB may indicate more forward mandibular growth. By assessing these sagittal measurements, the models could make predictions about how the mandible will likely grow in relation to the rest of the face.
When comparing the ML techniques to one another, none showed a clear superiority to the others. However, Linear Regression may have performed worse than the others due to its inherent limitations. Linear Regression assumes a linear relationship between the predictor variables and the response variable, which may not accurately capture the non-linearities present in human growth patterns [55]. The Lasso and Ridge techniques are also linear models, but they incorporate regularization, which helps address overfitting and model complexity. The Lasso performs both variable selection and regularization by imposing a penalty on the absolute values of the coefficients, which can shrink the coefficients of less important predictors exactly to zero [55]. This feature helps identify the most relevant predictors for growth prediction. Ridge instead adds a penalty based on the squares of the coefficients, which allows for a better balance between bias and variance [56]. By incorporating regularization, Lasso and Ridge are better equipped than unpenalized Linear Regression to handle many correlated predictors in a small sample. When assessing the overall performance, the authors would consider further studies using the Lasso prediction model.
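The different effects of the two penalties can be seen in the simplest textbook setting, in which the predictors are orthonormal and each coefficient can be treated separately. Under that assumption, and with the objectives written as ½‖y − Xβ‖² + λ‖β‖₁ for Lasso and ½‖y − Xβ‖² + (λ/2)‖β‖² for Ridge, the penalized coefficients have closed forms in terms of the least squares coefficient b. The sketch below uses hypothetical coefficient values and a hypothetical λ; it does not reproduce the study's fitted models.

```python
def lasso_shrink(b, lam):
    """Soft-thresholding: the Lasso solution for an orthonormal design."""
    magnitude = max(abs(b) - lam, 0.0)
    return magnitude if b >= 0 else -magnitude


def ridge_shrink(b, lam):
    """Proportional shrinkage: the Ridge solution for an orthonormal design."""
    return b / (1.0 + lam)


# Hypothetical least squares coefficients, for illustration only.
ols_coefficients = {"age": 1.8, "lower_face_height": 0.9, "weak_measure": 0.2}
lam = 0.5  # hypothetical penalty strength

lasso = {name: lasso_shrink(b, lam) for name, b in ols_coefficients.items()}
ridge = {name: ridge_shrink(b, lam) for name, b in ols_coefficients.items()}

for name, b in ols_coefficients.items():
    print(f"{name}: OLS={b:.2f}  Lasso={lasso[name]:.2f}  Ridge={ridge[name]:.2f}")
```

In this setting Lasso sets any coefficient whose magnitude is below λ exactly to zero, which is the variable selection behavior described above, whereas Ridge scales every coefficient toward zero without removing any predictor.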
The authors acknowledge certain limitations of the current study. First, the sample size was relatively small due to the constraints of the available records in the AAOF Legacy Collection. When employing ML techniques, a larger sample size is desirable because it provides a more representative and diverse dataset. This, in turn, increases the likelihood of capturing the true underlying patterns and characteristics of the population, thereby reducing sampling bias and enhancing the model's ability to make predictions on unseen data. Moreover, a larger sample size would help mitigate the impact of random variation and reduce overfitting. Another limitation is that many lateral cephalograms did not include sufficient soft tissue; soft tissue measurements could potentially have improved the accuracy of the prediction methods. Additionally, the use of automated cephalometric landmark identification methods could have ensured consistency in the cephalometric analyses.
5. Conclusions
The tested ML algorithms predicted the post-pubertal mandibular length within 2.5 mm and the y-axis within 1°. Beyond the initial mandibular length, several key predictors emerged for mandibular length, including chronological age, upper and lower face heights, and upper and lower incisor positions and inclinations. Similarly, for the y-axis, significant predictive factors included y-axis measurements at earlier time points, as well as the SN-MP, SN-Pog, SNB, and SNA angles. When the prediction methods were compared for both the 2-year and 4-year forecasts of mandibular length, no substantial differences were found among the methods in either absolute or directional differences. However, for the y-axis, the 2-year prediction resulted in significantly larger absolute deviations between the predicted and actual values than the 4-year prediction when Linear Regression was used. While ML techniques show promise for anticipating future mandibular growth in Class II cases, further research is imperative. Larger sample sizes and more extensive data points are needed to refine the precision of these predictions.